Data Privacy in Big Data Applications
Sreagni Banerjee
CS-846
Outline
- Motivation
- Goal and Approach
- Introduction to Big Data Privacy
- Privacy-Preserving Methods in Big Data Applications
- Progress
- Next Steps
- References
Motivation
- Data are diverse, and user privacy is a major concern.
- A review of privacy-preservation mechanisms in Big Data is needed to clarify the challenges of the existing mechanisms.
- There is no comparative study of the various Big Data privacy techniques.
- The privacy and security aspects of Big Data in healthcare need to be analyzed.
Research Questions
- RQ1: What privacy-preserving techniques are available on the market, and are they good enough?
- RQ2: How should computing infrastructures be configured and intelligently managed to fulfill the most notable privacy requirements of Big Data applications?
- RQ3: How have Big Data privacy-preserving mechanisms been used in healthcare?
Goal
- Identify Big Data privacy requirements.
- Investigate how Big Data privacy-preserving techniques have been used in healthcare.
- Investigate the privacy challenges in each phase of the Big Data life cycle.
- Determine the advantages and disadvantages of existing privacy-preserving technologies in the context of Big Data applications.
Big Data Privacy in Data Life Cycle
Big Data privacy in the data generation phase
- Access restriction (e.g. anti-tracking extensions).
- Falsifying data (e.g. sockpuppet accounts, MaskMe).
Big Data privacy in the data storage phase
- Privacy-preserving storage in the cloud: encryption, such as attribute-based encryption (ABE) and identity-based encryption (IBE).
- Integrity verification of Big Data storage.
Big Data privacy in the data processing phase
- First phase: safeguarding information.
- Second phase: extracting meaningful information from the data without violating privacy.
Privacy Preserving Methods in Big Data
- De-identification: a process used to prevent a person's identity from being connected with information.
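As a rough illustration of de-identification, the sketch below drops direct identifiers and coarsens quasi-identifiers by generalization. The column names (`name`, `ssn`, `age`, `zip`), the 10-year age bands, and the 3-digit ZIP prefix are assumptions made for this example only; they do not come from any particular dataset or standard.

```python
from typing import Dict

# Hypothetical column names, chosen only for illustration.
DIRECT_IDENTIFIERS = {"name", "ssn"}


def generalize_age(age: int, width: int = 10) -> str:
    """Replace an exact age with a range such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalize_zip(zip_code: str, keep: int = 3) -> str:
    """Keep the first `keep` characters of a ZIP code and mask the rest."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)


def deidentify(record: Dict[str, object]) -> Dict[str, object]:
    """Drop direct identifiers and coarsen the quasi-identifiers."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "age" in out:
        out["age"] = generalize_age(int(out["age"]))
    if "zip" in out:
        out["zip"] = generalize_zip(str(out["zip"]))
    return out
```

Generalization alone does not guarantee anonymity: whether a record can still be singled out depends on how many other records share the same generalized values.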
Privacy Preserving Methods in Big Data
- HybrEx: uses public clouds only for non-sensitive data and computation; sensitive data and computation stay in the organization's private cloud. Four categories:
  - Map hybrid
  - Vertical partitioning
  - Horizontal partitioning
  - Hybrid
- Privacy-preserving aggregation: limited to the collection and storage phases of Big Data.
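The core HybrEx idea of keeping sensitive data off the public cloud can be sketched as a simple routing step. This is only an assumed illustration, not the HybrEx implementation: the `sensitive` flag and the function name are hypothetical, and real deployments classify data by policy rather than by a per-record boolean.

```python
from typing import Dict, Iterable, List, Tuple

Record = Dict[str, object]


def split_by_sensitivity(
    records: Iterable[Record], sensitive_flag: str = "sensitive"
) -> Tuple[List[Record], List[Record]]:
    """Return (private_cloud_records, public_cloud_records)."""
    private: List[Record] = []
    public: List[Record] = []
    for rec in records:
        # Records without a label are treated as sensitive, so an unlabeled
        # record is never sent to the public cloud by accident.
        if rec.get(sensitive_flag, True):
            private.append(rec)
        else:
            public.append(rec)
    return private, public
```

Defaulting unlabeled records to the private side is a deliberate fail-closed choice in this sketch.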
Recent Techniques of Privacy Preserving in Big Data
- Differential privacy:
  - Analysts are not given direct access to the database containing personal information.
  - Intermediary software, the privacy guard, sits between the analyst and the database.
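One common way a privacy guard can answer a query is the Laplace mechanism: it computes the true answer and adds noise scaled to the query's sensitivity divided by the privacy parameter epsilon. The sketch below shows this for a counting query. The function names and the example data are assumptions for illustration, and the standard-library random generator used here is not suitable for production use.

```python
import random
from typing import Callable, Iterable, Optional


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(
    rows: Iterable[object],
    predicate: Callable[[object], bool],
    epsilon: float,
    rng: Optional[random.Random] = None,
) -> float:
    """Answer 'how many rows satisfy predicate?' with epsilon-DP noise."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for row in rows if predicate(row))
    # A counting query has sensitivity 1: adding or removing one person
    # changes the count by at most 1, so the noise scale is 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    ages = [23, 35, 41, 52, 67]
    # Smaller epsilon means more noise and stronger privacy.
    print(private_count(ages, lambda a: a >= 40, epsilon=0.5))
```

The analyst only ever sees the noisy value returned by `private_count`, never the rows themselves.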
Recent Techniques of Privacy Preserving in Big Data
- Identity-based anonymization:
  - Combines anonymization, privacy protection, and Big Data techniques.
  - An initiative taken by Intel.
Recent Techniques of Privacy Preserving in Big Data
- Hiding a needle in a haystack:
  - Idea: a rare class of data (the needle) is hard to detect within a large volume of data (the haystack).
  - Example: the service provider adds dummy items as noise to the original transaction data collected by the data provider.
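The dummy-item example can be sketched as follows. This is a minimal illustration under assumptions, not the published scheme: the function name, the item universe, and the fixed number of dummies per transaction are all hypothetical, and the bookkeeping needed to filter the dummies out again after mining is omitted.

```python
import random
from typing import Iterable, List, Optional, Sequence, Set


def add_dummy_items(
    transactions: Iterable[Set[str]],
    item_universe: Sequence[str],
    dummies_per_txn: int,
    rng: Optional[random.Random] = None,
) -> List[Set[str]]:
    """Return copies of the transactions with random dummy items mixed in."""
    rng = rng or random.Random()
    noisy: List[Set[str]] = []
    for txn in transactions:
        # Only items the transaction does not already contain can be dummies.
        candidates = sorted(set(item_universe) - set(txn))
        n = min(dummies_per_txn, len(candidates))
        noisy.append(set(txn) | set(rng.sample(candidates, n)))
    return noisy
```

Every original item survives in the noisy transaction; an observer simply cannot tell which items are real.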
Related Work
- Few research results focus on privacy in the context of Big Data applications.
- "Achieving k-Anonymity Privacy Protection Using Generalization and Suppression", Latanya Sweeney (School of Computer Science, Carnegie Mellon University).
- "l-Diversity: Privacy Beyond k-Anonymity", Ashwin Machanavajjhala, Johannes Gehrke, Daniel Kifer, and Muthuramakrishnan Venkitasubramaniam (Department of Computer Science, Cornell University).
- "t-Closeness: Privacy Beyond k-Anonymity and l-Diversity", Ninghui Li and Tiancheng Li (Department of Computer Science, Purdue University) and Suresh Venkatasubramanian (AT&T Labs Research).
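The first two properties named in these papers can be checked on a small table as sketched below. The sketch assumes the table is a list of dictionaries and uses the simplest ("distinct") form of l-diversity; the helper names and column names are hypothetical. t-closeness, which compares the distribution of sensitive values in each group with the overall distribution, is left out.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, str]


def _groups(rows: Sequence[Row], quasi_identifiers: Sequence[str]) -> Dict[Tuple[str, ...], List[Row]]:
    """Group rows that share the same quasi-identifier values."""
    groups: Dict[Tuple[str, ...], List[Row]] = defaultdict(list)
    for row in rows:
        groups[tuple(row[q] for q in quasi_identifiers)].append(row)
    return groups


def is_k_anonymous(rows: Sequence[Row], quasi_identifiers: Sequence[str], k: int) -> bool:
    """True if every quasi-identifier group contains at least k rows."""
    return all(len(g) >= k for g in _groups(rows, quasi_identifiers).values())


def is_l_diverse(rows: Sequence[Row], quasi_identifiers: Sequence[str], sensitive: str, l: int) -> bool:
    """True if every group holds at least l distinct sensitive values."""
    return all(
        len({r[sensitive] for r in g}) >= l
        for g in _groups(rows, quasi_identifiers).values()
    )


if __name__ == "__main__":
    table = [
        {"age": "30-39", "zip": "130**", "disease": "flu"},
        {"age": "30-39", "zip": "130**", "disease": "flu"},
        {"age": "40-49", "zip": "148**", "disease": "cancer"},
        {"age": "40-49", "zip": "148**", "disease": "flu"},
    ]
    print(is_k_anonymous(table, ["age", "zip"], 2))
    print(is_l_diverse(table, ["age", "zip"], "disease", 2))
```

The example table is 2-anonymous but not 2-diverse: everyone in the first group has the same disease, so knowing a person belongs to that group reveals the sensitive value.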
Fast Anonymization of Big Data Streams
Problem: existing privacy-preserving techniques cannot be applied directly to Big Data streams because:
- Unlike static data, data streams need real-time processing.
- Existing static k-anonymization algorithms scan the data repeatedly during anonymization to reduce information loss, which is impossible in data stream processing.
- The scale of data streams in some applications is growing tremendously.
Proposed solution: Fast Anonymization.
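To make the single-pass constraint concrete, the sketch below anonymizes a stream without rescanning it: each tuple is generalized on arrival, buffered with tuples that share the same generalized value, and released only once k of them have accumulated. This is a simple buffering scheme assumed for illustration, not the Fast Anonymization algorithm itself; the function name and the (quasi-identifier, payload) tuple shape are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def anonymize_stream(
    stream: Iterable[Tuple[int, str]],
    generalize: Callable[[int], str],
    k: int,
) -> Iterator[List[Tuple[str, str]]]:
    """Yield batches of k tuples that share one generalized quasi-identifier."""
    if k < 1:
        raise ValueError("k must be at least 1")
    buffers: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for quasi_id, payload in stream:
        key = generalize(quasi_id)
        buffers[key].append((key, payload))
        # Each tuple is seen once; a batch is released as soon as it has k
        # members, so every published tuple is indistinguishable from k - 1
        # others on the quasi-identifier.
        if len(buffers[key]) >= k:
            yield buffers.pop(key)
    # Tuples still buffered when the stream ends are withheld, not published.
```

A practical stream anonymizer would also bound how long a tuple may wait in a buffer; this sketch does not.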
Big Data Privacy in Healthcare
- Privacy-preserving techniques have become increasingly relevant in healthcare, especially in areas such as pervasive healthcare computing.
- Patient information is stored in data centres with varying levels of security.
- Big Data governance is necessary before exposing data to analytics.
Some important aspects:
- Data governance
- Real-time security analytics
- Privacy-preserving analytics
- Data quality
- Data sharing and privacy
Progress
- Analyzed privacy techniques in the different phases of the Big Data life cycle.
- Surveying some recent privacy techniques for Big Data applications in depth (e.g. the techniques mentioned previously).
- Currently comparing the techniques and investigating how they are already used in healthcare.
Future work:
- Effective solutions are needed for the scalability of privacy and security in the era of Big Data, and especially for reconciling security and privacy models by exploiting the MapReduce framework.
- Privacy-preserving techniques for data streams.
- Differential privacy is one area with significant potential for further use.
References:
- http://www.cs.colostate.edu/~cs656/reading/ldiversity.pdf
- https://www.researchgate.net/publication/322330948_a_study_on_k-anonymity_ldiversity_and_t-closeness_techniques_focusing_medical_data
- http://ijesc.org/upload/53e8ff6d52d14fefd7f4c55763537023.Controlling%20Privacy%20by%20using%20Methodology%20and%20techniques%20of%20big%20data.pdf
- https://dataprivacylab.org/dataprivacy/projects/kanonymity/kanonymity2.pdf
- https://www.utdallas.edu/~muratk/courses/privacy08f_files/ldiversity.pdf
- https://www.cs.purdue.edu/homes/ninghui/papers/t_closeness_icde07.pdf
- https://www.rand.org/content/dam/rand/pubs/working_papers/wr1100/wr1161/RAND_WR1161.pdf
- https://arxiv.org/ftp/arxiv/papers/1601/1601.06206.pdf
Thank you! Questions?