Internet2 Technology Exchange 2018 October, 2018 Kris Steinhoff
Goals: An Ethical, Privacy-Preserving Platform. Enable researchers to ask aggregate questions across multiple data sets in an ethical, privacy-preserving manner. Allow a privacy and ethics review body to ensure that only appropriate, aggregate questions are asked. Allow researchers to ask aggregate questions across multiple data sets while no researcher has direct access to the data sets. Enable U-M ITS to support such queries in a scalable, effective manner.
Wi-Fi Mobility Data [Diagram labels: device location, identity, AP location, multiple-AP triangulation, MAC address, unique ID, time, building, sub-campus, AP name, role, room, device MAC address, signal strength, home base, path, AP direction, collision, cohort, device location/time series (at rest/in transit), campus GIS (x, y, z)]
PrivaScope 1.0 Portal
Overview [Diagram labels: Data Sources (People, Wifi, ...); Researcher (direct query, request study); Sandbox Database (anonymized subset) with running code; Data Loader; Enclave Database with running code, inside the PrivaScope Secure Enclave; PrivaScope Infrastructure (study request, study approval, code run scheduling, approval); results reviewed before release]
Technical Architecture [Diagram: same components as the Overview diagram]
The web application is written in Django and uses the django-fsm library to manage workflow. It is deployed outside the PrivaScope enclave, currently in an on-prem OpenShift cluster.
Job queueing is handled with the Celery Python library.
Jobs are run in Docker containers to achieve process isolation.
Horizontal Scaling: This architecture allows for horizontal scaling at the processing node level. [Diagram labels: Kubernetes Cluster, HPC, VM]
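To make the scaling idea concrete, here is a minimal sketch of several processing nodes pulling jobs from one shared queue. It uses only the Python standard library as a stand-in for Celery, so it is not the PrivaScope implementation; `run_workers`, the job names, and the handler are hypothetical.

```python
import queue
import threading


def run_workers(jobs, num_workers, handler):
    """Drain `jobs` using `num_workers` threads.

    Returns a dict mapping each job to (worker_id, handler result).
    Threads stand in for processing nodes; this is not Celery's API.
    """
    q = queue.Queue()
    for job in jobs:
        q.put(job)

    results = {}
    lock = threading.Lock()

    def worker(worker_id):
        while True:
            try:
                job = q.get_nowait()
            except queue.Empty:
                return  # queue drained, this worker exits
            outcome = handler(job)
            with lock:
                results[job] = (worker_id, outcome)

    threads = [
        threading.Thread(target=worker, args=(i,)) for i in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# Hypothetical job names; str.upper stands in for "build and run the job".
done = run_workers(
    ["job-1", "job-2", "job-3", "job-4"], num_workers=2, handler=str.upper
)
```

Adding capacity in this sketch means raising `num_workers`; the queue and the job handler stay unchanged, which is the property the slide attributes to the architecture.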
Workflow
1. Researcher: submits algorithm/code through the PrivaScope portal.
2. PrivaScope Review Board: reviews privacy protection attributes of the code.
IF APPROVED
3. PrivaScope staging processing: queues algorithm for execution in the secure enclave.
4. PrivaScope query engine: runs algorithm in the secure enclave.
5. PrivaScope Review Board: reviews the output to ensure privacy protection compliance.
IF APPROVED
6. Output is released to the researcher for publishing.
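The six steps can be read as a small state machine with two approval gates. The sketch below is plain Python, not django-fsm, and the state names, action names, and the `REJECTED` state are assumptions made for illustration; the slides only describe the approved path.

```python
from enum import Enum


class State(Enum):
    SUBMITTED = "submitted"
    CODE_APPROVED = "code_approved"
    QUEUED = "queued"
    RUN_COMPLETE = "run_complete"
    RELEASED = "released"
    REJECTED = "rejected"  # assumed; the slides do not name this state


# (current state, action) -> next state. Anything not listed is refused.
ALLOWED = {
    (State.SUBMITTED, "approve_code"): State.CODE_APPROVED,
    (State.SUBMITTED, "reject"): State.REJECTED,
    (State.CODE_APPROVED, "enqueue"): State.QUEUED,
    (State.QUEUED, "finish_run"): State.RUN_COMPLETE,
    (State.RUN_COMPLETE, "approve_output"): State.RELEASED,
    (State.RUN_COMPLETE, "reject"): State.REJECTED,
}


class Study:
    """A submitted job moving through the review workflow."""

    def __init__(self):
        self.state = State.SUBMITTED

    def apply(self, action):
        key = (self.state, action)
        if key not in ALLOWED:
            raise ValueError(
                f"cannot {action!r} from state {self.state.value!r}"
            )
        self.state = ALLOWED[key]
        return self.state


study = Study()
for step in ["approve_code", "enqueue", "finish_run", "approve_output"]:
    study.apply(step)
```

Because output can only reach `RELEASED` from `RUN_COMPLETE` via `approve_output`, there is no path in this table that skips either review.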
Technical Architecture [Diagram: a submitted job passes to the runner, which uses Docker to Build, Run, and Collect; results are then released]
The researcher submits job code and dependencies.
The code is reviewed by the PrivaScope team.
If approved, the job is queued for execution.
The runner retrieves the job from the queue and builds the image in Docker.
Format

Dockerfile (required):

    FROM python3:latest
    RUN mkdir /usr/src/app
    WORKDIR /usr/src/app
    COPY . /usr/src/app/
    CMD venv/bin/python3 analysis.py

analysis.py:

    import os
    from mongo import Connection
    import pandas as pd

    # Connect using the URL that PrivaScope places in the environment
    wifi = Connection(os.getenv('MONGODB_URL')).wifi
    df = pd.DataFrame(list(wifi.find()))
    # ... analysis
    df.to_csv('results.csv')
PrivaScope uses the Dockerfile to create a Docker image.
The researcher can include dependencies with their job to support their analysis code.
PrivaScope populates several variables into the environment of the running container so that the analysis code can connect to data in the enclave.
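Analysis code may want to fail early, with a clear message, if an expected variable is missing. A minimal sketch follows; `MONGODB_URL` is the only variable named on the slides, while the `require_env` helper and the example URL are hypothetical.

```python
import os


def require_env(name, environ=None):
    """Return the value of `name`, or raise if it is unset or empty.

    `environ` defaults to the real process environment; passing a dict
    makes the helper easy to exercise outside the enclave.
    """
    environ = os.environ if environ is None else environ
    value = environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set in the container environment")
    return value


# Demonstration with a made-up mapping instead of the real environment.
fake_env = {"MONGODB_URL": "mongodb://enclave.example.invalid:27017"}
url = require_env("MONGODB_URL", fake_env)
```

Inside a running job the call would simply be `require_env("MONGODB_URL")`.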
The analysis code can write its results to a standard location, for example df.to_csv('/srv/data/results.csv'), from which PrivaScope collects them for review.
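As an illustration of output that is aggregate by construction, the sketch below counts records per building and writes only the counts. It uses the standard library rather than pandas; the `building` field, the sample rows, and the `write_counts` helper are assumptions, and the demonstration writes to a temporary directory rather than `/srv/data`.

```python
import csv
import os
import tempfile
from collections import Counter


def write_counts(records, out_dir="/srv/data", filename="results.csv"):
    """Write one row per building with its record count; return the path."""
    counts = Counter(r["building"] for r in records)
    path = os.path.join(out_dir, filename)
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["building", "count"])
        for building, n in sorted(counts.items()):
            writer.writerow([building, n])
    return path


# Made-up sample rows; real records would come from the enclave database.
sample = [{"building": "A"}, {"building": "B"}, {"building": "A"}]
demo_dir = tempfile.mkdtemp()
out_path = write_counts(sample, out_dir=demo_dir)
```

No per-device field ever reaches the file, so reviewers see only building-level totals.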
The job is run in a Docker container. The container is not given any network access outside the PrivaScope enclave.
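One way a runner could express the build and run steps is as `docker` command lines. The sketch below only constructs the argument lists and does not execute them. The image tag, the network name, and the idea of collecting results through a volume mounted at `/srv/data` are assumptions; the network is assumed to have been created beforehand with no route outside the enclave (for instance with `docker network create --internal`).

```python
def build_command(image_tag, context_dir):
    """Arguments for building the job image from the submitted Dockerfile."""
    return ["docker", "build", "--tag", image_tag, context_dir]


def run_command(image_tag, network, env, results_dir):
    """Arguments for running the job on a restricted network.

    `env` is injected into the container; `results_dir` on the host is
    mounted at /srv/data so that output can be collected afterwards.
    """
    cmd = ["docker", "run", "--rm", "--network", network]
    for key, value in sorted(env.items()):
        cmd += ["--env", f"{key}={value}"]
    cmd += ["--volume", f"{results_dir}:/srv/data", image_tag]
    return cmd


# Hypothetical names throughout.
build = build_command("privascope-job-1", "./job")
run = run_command(
    "privascope-job-1",
    network="enclave-net",
    env={"MONGODB_URL": "mongodb://db"},
    results_dir="/tmp/out",
)
```

A runner could pass these lists to `subprocess.run(..., check=True)` so that a failed build or run stops the job.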
The job results are returned to the web application workflow.
The results are reviewed by the PrivaScope team to ensure that they contain only aggregate results.
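The slides describe this review as a human step. As a purely hypothetical aid for reviewers, a pre-check could flag result rows whose group size is small enough to risk identifying individuals. The `small_cells` helper, the `count` column, and the threshold of 10 are all assumptions, not PrivaScope policy.

```python
def small_cells(rows, count_field="count", min_count=10):
    """Return the rows whose count is below `min_count`.

    An empty list means no row was flagged; it does not replace review.
    """
    return [row for row in rows if int(row[count_field]) < min_count]


# Made-up result rows, shaped like a parsed results.csv.
rows = [
    {"building": "A", "count": "120"},
    {"building": "B", "count": "3"},
    {"building": "C", "count": "10"},
]
flagged = small_cells(rows)
```

Here only building B is flagged, since a count of 3 falls below the assumed threshold while 10 meets it.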
If approved, the results are made available to the researcher.
Future Plans Refine PrivaScope 1.0 workflows and administration. Integration with Git (GitLab merge requests and/or CI/CD). Our goal for PrivaScope 2.0 is to build an API that allows users to query arbitrarily and have the API enforce privacy preservation.
Questions