Data Privacy in Big Data Applications. Sreagni Banerjee CS-846

Outline
- Motivation
- Goal and Approach
- Introduction to Big Data Privacy
- Privacy-preserving methods in Big Data applications
- Progress
- Next Steps
- References

Motivation
- Data are diverse, and user privacy is a major concern.
- A review of the privacy-preservation mechanisms in Big Data is needed to clarify the challenges facing existing mechanisms.
- There is a lack of comparative studies of the various Big Data privacy techniques.
- The privacy and security aspects of healthcare in Big Data need to be analyzed.

Research Questions
- RQ1. What privacy-preserving techniques are available in the market, and are they good enough?
- RQ2. How should computing infrastructures be configured and intelligently managed to fulfill the most notable privacy requirements of Big Data applications?
- RQ3. How have Big Data privacy-preserving mechanisms been used in healthcare?

Goal
- Identifying Big Data privacy requirements.
- Investigating how Big Data privacy-preserving techniques have been used in healthcare.
- Investigating privacy challenges in each phase of the Big Data life cycle.
- Determining the advantages and disadvantages of existing privacy-preserving technologies in the context of Big Data applications.


Big Data Privacy in the Data Life Cycle
- Data generation phase
  - Access restriction (e.g., anti-tracking extensions).
  - Data falsification (e.g., sockpuppet identities, MaskMe).
- Data storage phase
  - Privacy-preserving storage in the cloud through encryption, e.g., attribute-based encryption (ABE) and identity-based encryption (IBE).
  - Integrity verification of Big Data storage.
- Data processing phase
  - First: safeguarding the information.
  - Second: extracting meaningful information from the data without violating privacy.
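
The storage-phase bullet names integrity verification but does not say which scheme is meant. One widely used building block is a Merkle hash tree. The data owner keeps only the root hash, and any single block fetched back from the cloud can be checked against that root with a proof whose size is logarithmic in the number of blocks. The sketch below is a minimal illustration using Python's `hashlib`. The function names are our own, and a production design would also domain-separate leaf and interior hashes, which this sketch omits.

```python
import hashlib


def _h(data):
    return hashlib.sha256(data).digest()


def _next_level(level):
    """Hash adjacent pairs; an odd last node is paired with itself."""
    if len(level) % 2 == 1:
        level = level + [level[-1]]
    return [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(blocks):
    """Root hash the data owner keeps locally before outsourcing the blocks."""
    if not blocks:
        raise ValueError("need at least one block")
    level = [_h(b) for b in blocks]
    while len(level) > 1:
        level = _next_level(level)
    return level[0]


def merkle_proof(blocks, index):
    """Sibling hashes the storage server would return along with block `index`."""
    level = [_h(b) for b in blocks]
    proof = []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))  # (hash, sibling is on the left)
        level = _next_level(level)
        index //= 2
    return proof


def verify_block(block, proof, root):
    """Owner-side check: recompute the path from the block up to the trusted root."""
    h = _h(block)
    for sibling_hash, sibling_is_left in proof:
        h = _h(sibling_hash + h) if sibling_is_left else _h(h + sibling_hash)
    return h == root
```

The owner calls `merkle_root` once before upload. Later it verifies a downloaded block with `verify_block(block, proof, root)`, so it never has to re-download the whole dataset.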

Privacy-Preserving Methods in Big Data
- De-identification: the process used to prevent a person's identity from being connected with information.
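
The slide defines de-identification without showing how it is carried out. The sketch below is a minimal example under three assumptions: a record is a Python dict, `name` and `ssn` are the direct identifiers, and `age` and `zip` are the quasi-identifiers. It drops the direct identifiers and generalizes the quasi-identifiers. The field names and bucket widths are illustrative choices and do not come from the source.

```python
# Hypothetical field names; which fields count as direct identifiers is an assumption.
DIRECT_IDENTIFIERS = {"name", "ssn"}


def generalize_age(age, width=10):
    """Replace an exact age with a range, e.g. 34 -> '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalize_zip(zip_code, keep=3):
    """Keep only a ZIP-code prefix, e.g. '47906' -> '479**'."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)


def deidentify(record):
    """Drop direct identifiers and coarsen the quasi-identifiers age and zip."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    out["age"] = generalize_age(record["age"])
    out["zip"] = generalize_zip(record["zip"])
    return out


print(deidentify({"name": "Alice", "ssn": "123-45-6789", "age": 34, "zip": "47906", "diagnosis": "flu"}))
# {'age': '30-39', 'zip': '479**', 'diagnosis': 'flu'}
```

Dropping names alone is usually not enough. The remaining quasi-identifiers can still be linked to other datasets, and k-anonymity (listed under Related Work) addresses exactly that risk.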

Privacy-Preserving Methods in Big Data
- HybrEx: the model uses public clouds only for non-sensitive data and keeps sensitive data and computation on the organization's private cloud. Execution modes:
  - Map hybrid
  - Vertical partitioning
  - Horizontal partitioning
  - Hybrid
- Privacy-preserving aggregation: limited to the collection and storage phases of Big Data.
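
To make the HybrEx split concrete, here is a sketch of a counting job in the spirit of the "map hybrid" mode. It follows our reading of that mode: map tasks run in both clouds, each on its own partition, and the reduce step runs only in the private cloud. The record fields `category` and `has_phi` and all function names are hypothetical. A real deployment would schedule these steps on separate clusters, and this in-process sketch only marks that placement in comments.

```python
from collections import Counter


def partition(records, is_sensitive):
    """Split records so that sensitive ones never leave the private cloud."""
    private = [r for r in records if is_sensitive(r)]
    public = [r for r in records if not is_sensitive(r)]
    return public, private


def map_phase(records):
    """Map step: emit one count per record category, pre-combined into a Counter."""
    return Counter(r["category"] for r in records)


def reduce_phase(partials):
    """Reduce step: merge partial counts."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def map_hybrid_job(records, is_sensitive):
    public, private = partition(records, is_sensitive)
    public_partial = map_phase(public)    # would be scheduled on the public cloud
    private_partial = map_phase(private)  # would be scheduled on the private cloud
    # Only intermediate counts derived from non-sensitive data cross into the private cloud;
    # the reduce step itself stays private.
    return reduce_phase([public_partial, private_partial])
```

The choice of `is_sensitive` predicate is the crucial policy decision. Whatever it classifies as non-sensitive is exposed to the public cloud.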

Recent Techniques of Privacy Preservation in Big Data
- Differential privacy
  - Analysts are not given direct access to the database containing personal information.
  - Instead, an intermediate software layer, the privacy guard, sits between the analyst and the database and mediates every query.
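
The slide names the privacy guard but does not say how it answers queries. A standard way to realize differential privacy for counting queries is the Laplace mechanism. It adds noise drawn from Laplace(0, sensitivity/epsilon), and the sensitivity of a count is 1. The sketch below assumes a guard that also tracks a total privacy budget, using the fact that per-query privacy losses add up under sequential composition. The class name, parameters, and budget policy are hypothetical and are not taken from the source.

```python
import random


class PrivacyGuard:
    """Hypothetical privacy guard: analysts query it instead of the raw records."""

    def __init__(self, records, epsilon, total_budget, rng=None):
        self._records = list(records)   # never returned to the analyst
        self.epsilon = epsilon          # privacy loss charged per query
        self._remaining = total_budget  # sequential composition: per-query losses add up
        self._rng = rng if rng is not None else random.Random()

    def _laplace(self, scale):
        # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
        return self._rng.expovariate(1 / scale) - self._rng.expovariate(1 / scale)

    def count(self, predicate):
        """Answer 'how many records satisfy predicate?' with Laplace noise."""
        if self._remaining < self.epsilon:
            raise PermissionError("privacy budget exhausted")
        self._remaining -= self.epsilon
        true_count = sum(1 for r in self._records if predicate(r))
        sensitivity = 1  # adding or removing one person changes a count by at most 1
        return true_count + self._laplace(sensitivity / self.epsilon)
```

For example, `PrivacyGuard(records, epsilon=0.5, total_budget=1.0)` answers two counting queries and then refuses further ones. A smaller epsilon means more noise per answer and stronger privacy.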

Recent Techniques of Privacy Preservation in Big Data
- Identity-based anonymization
  - Combines anonymization, privacy protection, and Big Data techniques.
  - An initiative taken by Intel.

Recent Techniques of Privacy Preservation in Big Data
- Hiding a needle in a haystack
  - The idea: just as a needle is hard to find in a haystack, a rare class of data is hard to detect when it is hidden in a large volume of data.
  - Example: the service provider adds dummy items as noise to the original transaction data collected by the data provider.
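
The example can be illustrated with a simplified sketch. It assumes that the dummy items are fresh labels known only to the service provider. The provider adds them to each transaction, the untrusted cloud counts itemset supports on the noisy data, and the provider then discards every itemset that contains a dummy item. Because dummy labels never coincide with real ones, the supports of real itemsets come back unchanged. All function and item names are hypothetical, and this is not presented as the published algorithm.

```python
import random
from collections import Counter
from itertools import combinations


def add_dummy_items(transactions, dummy_items, per_txn, rng):
    """Service provider: add `per_txn` distinct dummy items to every transaction."""
    return [set(t) | set(rng.sample(dummy_items, per_txn)) for t in transactions]


def count_itemsets(transactions, size):
    """Untrusted cloud: count the support of every itemset of the given size."""
    counts = Counter()
    for t in transactions:
        for itemset in combinations(sorted(t), size):
            counts[itemset] += 1
    return counts


def remove_dummy_effects(counts, dummy_items, min_support):
    """Service provider: keep frequent itemsets that contain no dummy item."""
    dummies = set(dummy_items)
    return {s: c for s, c in counts.items() if c >= min_support and not dummies & set(s)}
```

The cloud sees many spurious frequent itemsets involving the dummy labels, which makes the real ones harder to single out. The provider pays for this with the extra counting work.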

Related Work
- Only limited research has focused on privacy in the context of Big Data applications.
- "Achieving k-anonymity privacy protection using generalization and suppression", a paper by Latanya Sweeney (School of Computer Science, Carnegie Mellon University).
- "l-diversity: Privacy Beyond k-anonymity", a paper by Ashwin Machanavajjhala, Johannes Gehrke, Daniel Kifer, and Muthuramakrishnan Venkitasubramaniam (Department of Computer Science, Cornell University).
- "t-closeness: Privacy Beyond k-anonymity and l-diversity", a paper by Ninghui Li and Tiancheng Li (Department of Computer Science, Purdue University) and Suresh Venkatasubramanian (AT&T Labs Research).
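
The properties defined in the k-anonymity and l-diversity papers can be checked directly on a released table. The sketch below assumes that rows are dicts. It groups rows into equivalence classes by their quasi-identifier values. It then requires every class to have at least k rows (k-anonymity) and at least l distinct sensitive values (distinct l-diversity, the simplest interpretation of the principle). t-closeness, which additionally compares each class's sensitive-value distribution with the overall distribution, is not shown. Function names are our own.

```python
from collections import defaultdict


def equivalence_classes(table, quasi_identifiers):
    """Group rows that share the same quasi-identifier values."""
    classes = defaultdict(list)
    for row in table:
        classes[tuple(row[q] for q in quasi_identifiers)].append(row)
    return classes


def is_k_anonymous(table, quasi_identifiers, k):
    return all(len(rows) >= k for rows in equivalence_classes(table, quasi_identifiers).values())


def is_distinct_l_diverse(table, quasi_identifiers, sensitive, ell):
    return all(
        len({row[sensitive] for row in rows}) >= ell
        for rows in equivalence_classes(table, quasi_identifiers).values()
    )


table = [
    {"age": "20-29", "zip": "479**", "disease": "flu"},
    {"age": "20-29", "zip": "479**", "disease": "flu"},
    {"age": "30-39", "zip": "479**", "disease": "cancer"},
    {"age": "30-39", "zip": "479**", "disease": "flu"},
]
print(is_k_anonymous(table, ["age", "zip"], 2))                    # True
print(is_distinct_l_diverse(table, ["age", "zip"], "disease", 2))  # False
```

The example table is 2-anonymous, yet everyone in the "20-29" class has the flu. This is the homogeneity problem that motivated l-diversity.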

Fast Anonymization of Big Data Streams
Problem: existing privacy-preserving techniques cannot be applied directly to Big Data streams, because:
- Unlike static data, data streams require real-time processing.
- To reduce information loss, existing static k-anonymization algorithms scan the data repeatedly during anonymization, which is impossible when processing a data stream.
- The volume of data streams in some applications is growing tremendously.
Proposed solution: fast anonymization of Big Data streams.
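
The sketch below is a simplified single-pass illustration of why stream anonymization differs from the static case. It assumes tuples of the form (age, sensitive value). Each tuple is seen once and buffered in a bucket keyed by its generalized age. A bucket is published as soon as it holds k tuples, so every published group is k-anonymous with respect to age without any rescanning. This is not the FAST algorithm itself, and the bucketing rule and names are our assumptions.

```python
from collections import defaultdict


def generalize_age(age, width=10):
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def anonymize_stream(stream, k, width=10):
    """Single pass: buffer tuples per generalized age bucket and publish a bucket
    as soon as it holds k tuples, so each published group shares one generalized
    quasi-identifier and has exactly k members."""
    buffers = defaultdict(list)
    for age, sensitive in stream:
        bucket = generalize_age(age, width)
        buffers[bucket].append(sensitive)
        if len(buffers[bucket]) == k:
            yield [(bucket, s) for s in buffers.pop(bucket)]
    # Tuples left in partially filled buckets are suppressed (never published) here.
```

Tuples in buckets that never fill are suppressed in this sketch. Practical stream anonymizers usually also bound how long a tuple may wait before it must be released.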

Big Data Privacy in Healthcare
- Privacy-preserving techniques have become increasingly relevant in healthcare, especially in areas such as pervasive healthcare computing.
- Patient information is stored in data centres with varying levels of security.
- Big Data governance is necessary before data are exposed to analytics. Some important aspects:
  - Data governance
  - Real-time security analytics
  - Privacy-preserving analytics
  - Data quality
  - Data sharing and privacy

Progress
- Analyzed privacy techniques in the different phases of the Big Data life cycle.
- Surveying some recent techniques for privacy in Big Data applications in depth (e.g., the techniques mentioned previously).
- Currently conducting comparative studies of these techniques and investigating how they are already used in healthcare.

Future Work
- New perspectives are needed to achieve effective solutions to the scalability problem of privacy and security in the era of Big Data, especially the problem of reconciling security and privacy models by exploiting the MapReduce framework.
- Privacy-preserving techniques for data streams.
- Differential privacy is one area with significant potential for further use.

References
- http://www.cs.colostate.edu/~cs656/reading/ldiversity.pdf
- https://www.researchgate.net/publication/322330948_a_study_on_k-anonymity_ldiversity_and_t-closeness_techniques_focusing_medical_data
- http://ijesc.org/upload/53e8ff6d52d14fefd7f4c55763537023.Controlling%20Privacy%20by%20using%20Methodology%20and%20techniques%20of%20big%20data.pdf
- https://dataprivacylab.org/dataprivacy/projects/kanonymity/kanonymity2.pdf
- https://www.utdallas.edu/~muratk/courses/privacy08f_files/ldiversity.pdf
- https://www.cs.purdue.edu/homes/ninghui/papers/t_closeness_icde07.pdf
- https://www.rand.org/content/dam/rand/pubs/working_papers/wr1100/wr1161/RAND_WR1161.pdf
- https://arxiv.org/ftp/arxiv/papers/1601/1601.06206.pdf

Thank you! Questions?