This is the Open Source GDPR (General Data Protection Regulation) repository
PontusVision GDPR Open Source

The PontusVision GDPR Open Source IT solution can help companies comply with GDPR in three steps:

  • Extract – Enables customers to extract personal information from a variety of sources, including e-mails, MS Office documents, relational databases, CRM systems, and Big Data lakes.

  • Track – Enables customers to track the origin of the data (where it came from, and how to delete or update it) and stores the data in a graph database.

  • Comply – Gives data protection officers a web portal with a single view of the data, including the ability to fulfil subject access requests and to perform data breach analysis (working out which data was impacted by security breaches).

Why Pontus

PontusVision GDPR Open Source IT Solution is the only one in the market that combines the following features in one product:

  • Open Source – all of Pontus Vision GDPR software is open sourced. The UK Government department where the platform was born has very progressive attitudes towards using and producing open source software. This gives customers a clear view of the code, and prevents vendor lock-in.

  • Cloud Neutral – our solution does not rely on any cloud vendor-specific technologies. The solution can be deployed on-prem, within any cloud vendor that supports Linux Servers, and even across cloud vendors for extra resiliency.

  • Cyber Security – our architecture and design have been reviewed by a number of accreditors, including NCSC/GCHQ. This reassures customers that the platform is as safe as their needs require.

  • Scalable Automation – The Pontus Vision GDPR architecture and design have as few manual steps as possible, so that vast quantities of data can be processed. The solution is able to scale to hundreds of billions of records.

  • Many Formats – Pontus Vision GDPR was designed and built as a modular solution that is capable of taking data from hundreds of different formats. We also include the ability to create bespoke sources and create a reusable library of components.


Our architecture follows our three simple steps of Extract, Track, and Comply:

In the Extract part of the design, we use a powerful open source flow management infrastructure (Pontus-NiFi), based on the Apache NiFi project, that enables users to convert data from a variety of platforms so that it is ready for the Track phase.

In the Track part of the design, we store data in a canonical format and can run either Online Transaction Processing (OLTP) or Online Analytical Processing (OLAP) queries on the data to clean up the application. We use a Gremlin/TinkerPop 3.3.0-compliant graph database to front those queries, store the data in Apache HBase 1.3.1, and index it with Elasticsearch 5.6.3. We can also apply very rich redaction/filtering rules inside these stores to ensure that not even an administrator can see sensitive data. All the data is encrypted both in flight (TLS) and at rest (using dm-crypt), with keys optionally stored in a Hardware Security Module (HSM).

Lastly, the Comply part of the architecture is what gives users the ability to query the data. We ensure that all users are authenticated by using either Apache Knox or Nginx as the HTTPS gateway, with Keycloak to authenticate users and generate a JSON Web Token (JWT) that can then be used to track user queries throughout the system. Keycloak can authenticate users from a variety of external sources (OpenID, SAML, OAuth2) as well as internal sources (e.g. Active Directory). The user queries can easily be modified to cater for users' needs without writing any new code.
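
Because the JWT is what ties a query back to a user, it can be useful to look at the claims a token carries. The following is a minimal sketch in plain shell, assuming a standard three-part token (header.payload.signature); the function name jwt_payload is hypothetical and not part of the platform, and the sketch only decodes the payload without verifying the signature.

```shell
# Hypothetical helper: print the decoded payload (claims) of the JWT in $1.
# This is for inspection only; the signature is NOT verified.
jwt_payload() {
  # take the second dot-separated segment and map base64url to plain base64
  segment=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # restore the '=' padding that base64url encoding omits
  case $(( ${#segment} % 4 )) in
    2) segment="${segment}==" ;;
    3) segment="${segment}=" ;;
  esac
  printf '%s' "$segment" | base64 -d
}

# usage (with a token you obtained yourself):
#   jwt_payload "$TOKEN"
```

The exact claims depend on how the identity provider is configured, so treat the output as something to inspect rather than a fixed schema.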


To start using the sandbox image in this repository, you must have the latest version of Docker available. On Windows, we strongly recommend that you run Docker from a RHEL (or CentOS) 7 virtual machine rather than use the standard Windows Docker.
Here are the steps to install Docker on a fresh CentOS 7 VM:

# remove old docker packages:
sudo yum -y remove docker docker-common docker-selinux docker-engine

# pre-reqs for the docker community edition (docker-ce)
sudo yum install -y yum-utils device-mapper-persistent-data lvm2
# add the docker-ce yum repository (append its .repo URL to the command below)
sudo yum-config-manager --add-repo
sudo yum -y install docker-ce

# create conf file to throttle down the simultaneous connections to 1;
# this helps avoid timeout errors of 'Auth failed' when downloading large images
sudo mkdir -p /etc/docker
sudo tee /etc/docker/daemon.json > /dev/null << 'EOF'
{
  "max-concurrent-uploads": 1,
  "max-concurrent-downloads": 1
}
EOF

sudo systemctl start docker
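
Before pulling the image, it can be worth confirming that the throttling settings were actually written. The following is a minimal sketch, not part of the original steps: the function name check_throttle is hypothetical, and it only pattern-matches the two keys rather than parsing the JSON.

```shell
# Hypothetical helper: succeed only if the given daemon.json sets both
# concurrency limits to 1. Defaults to the path used in the steps above.
check_throttle() {
  conf="${1:-/etc/docker/daemon.json}"
  for key in max-concurrent-uploads max-concurrent-downloads; do
    # match  "key" : 1  followed by a non-digit or the end of the line
    grep -Eq "\"$key\"[[:space:]]*:[[:space:]]*1([^0-9]|\$)" "$conf" || return 1
  done
}

# usage:
#   check_throttle && echo "daemon.json throttles uploads and downloads to 1"
```

A pattern match keeps the check free of extra tools; a real JSON parser would be stricter if one is available on the VM.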

Then ensure that you forward TCP ports 8443 and 5005 from your virtual machine to your host.

Port Forwarding in VirtualBox
If using VirtualBox, you can forward ports by clicking on Settings > Network > Advanced > Port Forwarding.
The table should look similar to this (the Host IP and Guest IP columns are left empty):

| Name | Protocol | Host IP | Host Port | Guest IP | Guest Port |
| SSH  | TCP      |         | 22        |          | 22         |
| GUI  | TCP      |         | 8443      |          | 8443       |
| NIFI | TCP      |         | 8080      |          | 8080       |
| Dbg  | TCP      |         | 5005      |          | 5005       |
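
The same rules can also be set from the command line with VBoxManage instead of the GUI. The sketch below only prints the commands so they can be reviewed first. It assumes the VM's first network adapter is in NAT mode and that the VM is powered off when the commands are run; the VM name "centos7-docker" and both function names are made up for illustration.

```shell
# Build one NAT rule in VBoxManage's
# "name,protocol,hostip,hostport,guestip,guestport" form; both IP fields
# are left empty, as in the table above.
natpf_rule() {
  printf '%s,tcp,,%s,,%s' "$1" "$2" "$3"
}

# Print (not run) one VBoxManage command per rule in the table.
# $1 is the VM name.
print_forwarding_commands() {
  vm_name="$1"
  while read -r rule_name host_port guest_port; do
    printf 'VBoxManage modifyvm "%s" --natpf1 "%s"\n' \
      "$vm_name" "$(natpf_rule "$rule_name" "$host_port" "$guest_port")"
  done << 'EOF'
SSH 22 22
GUI 8443 8443
NIFI 8080 8080
Dbg 5005 5005
EOF
}

# "centos7-docker" is a hypothetical VM name; substitute your own.
print_forwarding_commands "centos7-docker"
```

On Linux and macOS hosts, binding host port 22 normally requires elevated privileges, so a higher host port for the SSH rule may be more practical there.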

Getting Started

Once the pre-reqs have been applied, simply run the following:

docker run --privileged --name pontus_sandbox -v /sys/fs/cgroup:/sys/fs/cgroup:ro -p 8443:8443 -p 5005:5005 -p 8080:8080 -d pontusvision/open-source-gdpr
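
Running the command above a second time fails with a name conflict, because a container called pontus_sandbox already exists. A minimal sketch of a wrapper that reuses the existing container is shown below; the function name start_sandbox is hypothetical, and the docker run arguments are copied from the command above.

```shell
# Hypothetical wrapper: start the sandbox, reusing the container if it
# was already created by an earlier run.
start_sandbox() {
  if docker ps -a --format '{{.Names}}' | grep -qx 'pontus_sandbox'; then
    # container exists (running or stopped): just start it
    docker start pontus_sandbox
  else
    docker run --privileged --name pontus_sandbox \
      -v /sys/fs/cgroup:/sys/fs/cgroup:ro \
      -p 8443:8443 -p 5005:5005 -p 8080:8080 \
      -d pontusvision/open-source-gdpr
  fi
}

# usage:
#   start_sandbox
```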

This will kick-start parts of the architecture above in a CentOS/systemd-based Docker image with the following processes running:

  • pontus-graph - a graph database that stores the GDPR records in the local pontus-hbase and pontus-elastic instances. It also currently handles some of the pontus-gui REST requests.
  • pontus-nifi - a flow management tool that handles the Extract business logic.
  • pontus-knox - an API gateway that proxies all traffic to the NiFi GUI and the pontus-gui under https://localhost:8443/gateway/sandbox/pvgdpr_gui
  • pontus-elastic - a local Elasticsearch instance that handles indexing of the data.
  • pontus-hbase - a local HBase instance (with built-in ZooKeeper) that stores the data.
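
Since the image is systemd-based, the state of these processes can be checked from the host. The sketch below assumes that each process runs as a systemd unit with the same name as in the list above, which this page does not confirm; the function name sandbox_status is likewise hypothetical.

```shell
# Assumption: each process listed above is a systemd unit of the same name.
SANDBOX_SERVICES="pontus-graph pontus-nifi pontus-knox pontus-elastic pontus-hbase"

# Print "<unit>: <state>" for each service, as reported inside the container.
sandbox_status() {
  for unit in $SANDBOX_SERVICES; do
    # is-active exits non-zero for anything but "active", so ignore the status
    state=$(docker exec pontus_sandbox systemctl is-active "$unit" 2>/dev/null) || true
    printf '%s: %s\n' "$unit" "${state:-unknown}"
  done
}

# usage:
#   sandbox_status
```

If a unit name does not match, running docker exec pontus_sandbox systemctl list-units inside the container shows the names actually in use.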

Then, open your browser at https://localhost:8443/gateway/sandbox/pvgdpr_gui
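
The services inside the container may need some time to come up, so the page may not load immediately. A minimal sketch of a polling loop follows; the function name wait_for_url and the 300-second default are assumptions, and curl's -k flag is used on the assumption that the sandbox gateway presents a self-signed certificate. Any HTTP response, including an error page, counts as "answering" here.

```shell
# Hypothetical helper: poll a URL until the server answers or the timeout
# (in seconds, default 300) runs out. Returns 0 on success, 1 on timeout.
wait_for_url() {
  url="$1"
  remaining="${2:-300}"
  while [ "$remaining" -gt 0 ]; do
    # -k: skip certificate verification; -s: quiet; discard the body
    if curl -k -s -o /dev/null --max-time 5 "$url"; then
      return 0
    fi
    sleep 5
    remaining=$((remaining - 5))
  done
  return 1
}

# usage:
#   wait_for_url https://localhost:8443/gateway/sandbox/pvgdpr_gui && echo "GUI is reachable"
```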
