CRCS and Berkman Center Working Group Educational Document ...
Bridging Notions of Privacy (a.k.a. de-identification WG)
Kobbi Nissim (BGU and [email protected])
Privacy Tools for Sharing Research Data
NSF site visit, October 2015

WG Goals
1. Help Dataverse depositors navigate the complex privacy landscape (hence, enabling more sharing)
  Pedagogical document
  Excerpts may be integrated with a future tagging system
2. Bridging law and mathematical definitions of privacy
  In what sense does differential privacy satisfy the language of the law?
3. Building our own common understanding of legal and technological aspects of privacy

Who?
Discussion open to all
CRCS: Kobbi Nissim (lead), Salil Vadhan, Marco Gaboardi; Postdoctoral researcher: Or Sheffet; Ph.D. students: Thomas Steinke, Mark Bun, Aaron Bembenek; REU students
Berkman: Alex Wood, David O'Brien; Ph.D. student: Ann Kristen; Law students
IQSS: Deborah Hurley
Visitors: Latanya Sweeney (Harvard), Vitaly Shmatikov (Cornell), Micah Altman (MIT), Sonia Barbosa (Harvard)

The Pedagogical Document
Goal: Help social scientists (Dataverse depositors) navigate the complex privacy landscape
Target audience: Social scientists conducting studies using personal information
Format: collection of 3-4 documents, ~ Dec 15 ~ Nov 15
  Importance of data privacy, implications of privacy breaches
  Relevant laws and best practices
  Common de-identification methods and re-identification risks
  Differential privacy
Planned use:
  Stand-alone documents
  Language to explain topics to future Dataverse users as they consider whether and how to use tools developed in the Privacy Tools project

Pedagogical document (DP)
Not this way: "A mechanism \(M\) is \(\varepsilon\)-differentially private if for all datasets \(x, x'\) s.t. \(x\) and \(x'\) differ in one record, and for all sets \(S\) of outputs, \(\Pr[M(x) \in S] \le e^{\varepsilon} \cdot \Pr[M(x') \in S]\)."

Structure:
1. Introduction
2. What is the differential privacy guarantee?
3. The privacy loss parameter
4. How does differential privacy address privacy risks?
5. Differential privacy and legal requirements
6. How are differentially private analyses constructed?
7. Limits of differential privacy
8. Tools for differentially private analyses
9. Summary
10. Further discussion
11. Further reading

Simple language and technical terms
But mathematically accurate and factual
Illustrative examples
  What is the privacy guarantee?
  Demonstration of differencing attack
  Interpreting risk via replacing probability with dollar amounts
Incorporated feedback from our social science REU students

An Example: Gertrude's Life Insurance
Gertrude is 65, her life insurance policy is $100,000, and she is considering her risks from participating in a medical study performed with DP
Gertrude's baseline risk: 1% chance of dying next year, fair premium $1,000
Gertrude is a coffee drinker; if the study shows that 65-year-old female coffee drinkers have a 2% chance of dying next year, her fair premium would be $2,000
Gertrude is worried that the study may reveal more: maybe she has a 50% chance of dying. Would that increase her premium from $2,000 to $50,000?

Reasoning about Gertrude's risk
Study done with \(\varepsilon = 0.01\)
Insurance company's estimate of Gertrude's probability of dying can increase to at most \((1 + \varepsilon) \cdot 2\% = 2.02\%\)
Fair premium would increase to at most $2,020; Gertrude's risk would be at most $20

What have we done?
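The arithmetic in Gertrude's example can be sketched in a few lines of Python. This is a minimal illustration, not part of the working group's materials: the function names are hypothetical, the grid of beliefs and privacy parameters in the table is made up for illustration, and the bound uses the slide's (1 + ε) factor, which approximates e^ε only for small ε.

```python
"""Worst-case premium arithmetic for the Gertrude example (illustrative sketch)."""


def bounded_belief(belief: float, epsilon: float) -> float:
    """Upper bound on the insurer's belief after a study with privacy loss epsilon.

    Assumption: uses the slide's (1 + epsilon) factor and caps the result at 1.
    """
    return min(1.0, (1.0 + epsilon) * belief)


def fair_premium(probability: float, payout: float) -> float:
    """Fair premium = probability of a claim times the policy payout."""
    return probability * payout


def added_risk(belief: float, epsilon: float, payout: float) -> float:
    """Largest possible premium increase attributable to taking part in the study."""
    worst = fair_premium(bounded_belief(belief, epsilon), payout)
    return worst - fair_premium(belief, payout)


def risk_table(beliefs, epsilons, payout):
    """Map each (belief, epsilon) pair to its worst-case premium increase."""
    return {(b, e): added_risk(b, e, payout) for b in beliefs for e in epsilons}


if __name__ == "__main__":
    PAYOUT = 100_000  # Gertrude's policy, from the example

    # Gertrude's case: belief of 2% without her data, study run with epsilon = 0.01.
    print("worst-case premium:", round(fair_premium(bounded_belief(0.02, 0.01), PAYOUT), 2))
    print("added risk:", round(added_risk(0.02, 0.01, PAYOUT), 2))

    # Hypothetical grid of beliefs and epsilons (values chosen for illustration).
    for (b, e), risk in risk_table([0.01, 0.02, 0.05], [0.01, 0.05, 0.1], PAYOUT).items():
        print(f"belief={b:.2f} epsilon={e:.2f} added risk=${risk:,.2f}")
```

For Gertrude's numbers (belief 2%, ε = 0.01, payout $100,000) the sketch reproduces the figures above: a worst-case premium of about $2,020 and an added risk of about $20.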
Simplified but somewhat realistic situation
Translated a complicated notion of probability into easier-to-understand dollar amounts
Provided a table for performing similar calculations (with varying values of posterior beliefs and \(\varepsilon\))

Exploring/bridging law and mathematical definitions of privacy
Does differential privacy satisfy the legal privacy standards?

Why ask?
Essential for making differential privacy usable!
De-identification is the only technique specifically endorsed by standards like FERPA and HIPAA
  E.g., HIPAA's Safe Harbor method: Remove all 18 listed identifiers
No clear standard w.r.t. other techniques
  HIPAA's Expert Determination method: Obtain confirmation from a qualified statistician that the risk of identification is very small
  Who is an expert? How should s/he determine that the risk is small?
We're here to help!

A gap to be bridged
CS paradigm of security definitions
Security defined as a game with an attacker
Attacker defined by:
  Computational power (how many resources, such as time and memory, it can spend)
  External knowledge it can bring from outside the system (a.k.a. auxiliary information)
Not a uniquely specified attacker, but a large family of potential attackers
  Capture all plausible misuses
Game defines:
  Access to the system
  What it means for an attacker to win
System secure: If no attacker can win too much
Privacy definitions in FERPA/HIPAA/...
Not technically rigorous, open for interpretation
Refer to the obvious extreme cases, not to the hard-to-determine grey areas
Advocate redaction of identifying information
  Not as clear about other techniques
No explicit attacker model, but regulations do contain hints:
  Who is the attacker?
  What would be considered a win?

Opportunities for Bridging the Gap
Many shared goals:
  Understanding privacy
  Minimizing harms from data usage while obtaining as much utility as possible
Differential privacy: Not conforming to regulation would be a barrier for usage
Law and regulation: Need to understand technology to approve its use

Bridging the legal and CS views
[Figure, shown over several builds. Recoverable labels: "BAD!", "copy input to output", "redact", "this", "analyses", "Good?", "it depends..."; the final build adds "DP".]
Methodology:
1. Search explicit requirements and hints on attacker model
  E.g., FERPA defines the attacker as "a reasonable person in the school community that does not have personal knowledge of the relevant circumstances"
  Directory information can be made public
  Attacker's goal: identification of sensitive (non-directory) data
  Etc.
2. Create a formal mathematical attacker model for the regulation
  Always err on the conservative side
3. Provide a formal mathematical proof
  I.e., differential privacy satisfies the resulting security definition
4. Suggest how to set up the privacy parameter
  Based on the regulation
Provide explanation suitable for CS and legal scholars alike!

Summary
WG active for ~one year
Regular weekly meeting, persistent core of participants bringing expertise in TCS and law, field expert visitors
Productive cross-fertilization
  Knowledge transfer between law and CS
  Brainstorming and testing of ideas
  Collaboration on explaining privacy landscape to non-specialists
New collaborative interdisciplinary research: a quantifiable, formal approach to privacy regulation
  Involving PhD students and postdoctoral researchers
Planned products:
  Educational document (for comments) on project and Berkman Center sites, as well as SSRN
  Presentation of bridging work at Berkman lunch, November 2015
  Paper in first steps of preparation