Federal Statistics, Multiple Data Sources, and Privacy Protection: Next Steps


Robert Groves, Georgetown University and Chair of the Committee on National Statistics
Washington, DC, October 20, 2017

Panel on Improving Federal Statistics for Policy and Social Science Research Using Multiple Data Sources and State-of-the-Art Estimation Methods
Robert M. Groves (Chair), Georgetown University
Michael E. Chernew, Harvard University
Piet Daas, Statistics Netherlands
Cynthia Dwork, Harvard University
Ophir Frieder, Georgetown University
Hosagrahar V. Jagadish, University of Michigan
Frauke Kreuter, University of Maryland
Sharon Lohr, Westat, Inc.
James P. Lynch, University of Maryland
Colm O'Muircheartaigh, University of Chicago
Trivellore Raghunathan, University of Michigan
Roberto Rigobon, MIT
Marc Rotenberg, Electronic Privacy Information Center

Statement of Task
An ad hoc panel of nationally renowned experts in social science research, computing technology, statistical methods, privacy, and use of alternative data sources in the United States and abroad will conduct a study with the goal of fostering a paradigm shift in federal statistical programs. In place of the current paradigm of providing users with the output from a single census, survey, or administrative records source, a new paradigm would use combinations of diverse data sources from government and private sector sources combined with state-of-the-art methods to give users richer and more reliable statistics leading to new insights about policy and socioeconomic behavior. The motivation for the study stems from the increasing challenges to the current paradigm, such as declining response rates and increasing cost and burden for surveys. The panel will prepare two reports as part of this study.

Acknowledgements
The Laura and John Arnold Foundation, with additional support from the National Academy of Sciences Kellogg Fund.

The Panel's First Report
Innovations in Federal Statistics: Combining Data Sources While Protecting Privacy

Topics in the Panel's First Report
Introduction to Sample Survey Paradigm
Use of Administrative Records
Use of Private-Sector Data
Protecting Privacy and Confidentiality While Allowing Researcher Access
Advancing the Paradigm of Federal Statistics

Advancing the Paradigm of Combining Data Sources
RECOMMENDATION 6-1: A new entity or an existing entity should be designed to facilitate secure access to data for statistical purposes to enhance the quality of federal statistics.
RECOMMENDATION 6-2: The proposed new entity should maximize the utility of the data for which it is responsible while protecting privacy by using modern database, cryptography, privacy-preserving, and privacy-enhancing technologies.

The Panel's Second Report
Federal Statistics, Multiple Data Sources, and Privacy Protection: Next Steps

Table of Contents
1. Introduction
2. Statistical Methods for Combining Multiple Data Sources
3. Implications of Using Multiple Data Sources for Information Technology Infrastructure
4. Legal and Scientific Approaches for Privacy
5. Preserving Privacy Using Technology from Computer Science, Statistical Methods, and Administrative Procedures
6. Quality Frameworks for Statistics Using Multiple Data Sources
7. A New Entity to Provide Vital Information through Enhanced Federal Statistics


2. Statistical Methods for Combining Multiple Data Sources
The driving force behind statistics that combine data sources is the desire for statistics with more temporal and spatial granularity.

Statistical Methods for Combining Data
Record Linkage Approaches
Multiple Frame Methods
Imputation-Based Methods
Small Area Estimation Models, with a variety of hierarchical structures
Redesign of Surveys to Be Mated to Nonsurvey Data
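As a rough illustration of the first method listed, record linkage, the sketch below pairs survey records with administrative records. It is a minimal example under stated assumptions, not the panel's method: the record fields (`id`, `name`, `birth_year`), the sample records, the use of birth year as a blocking key, and the 0.85 similarity threshold are all hypothetical, and the string comparison comes from Python's standard `difflib` module rather than a dedicated linkage package.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    kept = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1] from the standard library's SequenceMatcher."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_records(survey, admin, threshold=0.85):
    """Pair each survey record with its best-matching administrative record.

    Candidates are first restricted ("blocked") to records with the same
    birth year; names are then compared, and the best score is kept only if
    it reaches the threshold. The threshold value is an arbitrary assumption.
    """
    links = []
    for s in survey:
        best, best_score = None, 0.0
        for a in admin:
            if a["birth_year"] != s["birth_year"]:
                continue  # blocking: skip implausible candidates
            score = similarity(s["name"], a["name"])
            if score > best_score:
                best, best_score = a, score
        if best is not None and best_score >= threshold:
            links.append((s["id"], best["id"]))
    return links


# Hypothetical records, invented for illustration only.
survey = [
    {"id": "S1", "name": "Maria Lopez", "birth_year": 1980},
    {"id": "S2", "name": "Jon Smith", "birth_year": 1975},
    {"id": "S3", "name": "Ann Lee", "birth_year": 1990},
]
admin = [
    {"id": "A1", "name": "maria  lopez.", "birth_year": 1980},
    {"id": "A2", "name": "John Smith", "birth_year": 1975},
    {"id": "A3", "name": "John Smith", "birth_year": 1990},
]

links = link_records(survey, admin)
print(links)
```

Production linkage systems typically use probabilistic match weights and clerical review rather than a single threshold; the point here is only that linkage involves choices (blocking keys, comparison functions, cutoffs) that introduce linkage error and therefore need to be documented.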

Demands for More Granular Statistics
RECOMMENDATION 2-1: Multiple data sources should be used to redesign current data collection efforts and estimation tasks to improve the utility, timeliness, and cost-efficiency of federal statistics.

Statistical Methods for Combining Data
RECOMMENDATION 2-2: To achieve transparency, federal statistical agencies should document the processes used to collect, combine, and analyze data from multiple sources and make that documentation publicly available.
RECOMMENDATION 7-6: The recommended new entity should strive to facilitate replicability of the linkage, processing and analyses conducted through the entity by compiling and storing metadata and documentation for authorized data users.

Next Steps for Combining Data Sources
Implementation Issues
RECOMMENDATION 2-4: Federal statistical agencies should ensure their statistical staff receive training for the new skills needed for combining data from different sources.
RECOMMENDATION 2-5: Federal statistical agencies should develop partnerships with academia and external research organizations to develop methods needed for design and analysis using multiple data sources.

4. Legal and Computer Science Approaches to Privacy
Personally Identifiable Information and Privacy Law
Legal View of Privacy in the Context of Statistical Data Analysis
The Scope of PII
Examples Elucidating the PII/non-PII Issue
Synthesis: A Proposed Liability Rule for PII
Implications for Federal Statistical Agencies

The Scope of PII
Currently, legal experts and computer scientists have differing views of PII.
Combining data sources increases privacy risks through auxiliary data.
Data that previously could not identify an individual may now contain auxiliary information that could identify an individual.
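The auxiliary-data point can be made concrete with a toy check of how many records become unique once attributes from a second source are attached. This is a hedged sketch: the attribute names (`zip3`, `age_band`, `occupation`) and the four records are invented for illustration, and "share of unique records" is only one simple proxy for re-identification risk, not a measure taken from the report.

```python
from collections import Counter


def unique_share(records, keys):
    """Share of records whose combination of values on `keys` occurs exactly once."""
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    unique = sum(1 for r in records if combos[tuple(r[k] for k in keys)] == 1)
    return unique / len(records)


# Hypothetical linked file: zip3 and age_band come from one source,
# occupation is auxiliary information attached from a second source.
linked = [
    {"zip3": "200", "age_band": "30-39", "occupation": "nurse"},
    {"zip3": "200", "age_band": "30-39", "occupation": "teacher"},
    {"zip3": "200", "age_band": "40-49", "occupation": "nurse"},
    {"zip3": "200", "age_band": "40-49", "occupation": "nurse"},
]

before = unique_share(linked, ["zip3", "age_band"])
after = unique_share(linked, ["zip3", "age_band", "occupation"])
print(before, after)
```

In this toy file no record is unique on the first source's attributes alone, but half become unique once the auxiliary attribute is added, which is the sense in which linked datasets carry more risk than either source separately.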

Implications for Federal Statistical Agencies
RECOMMENDATION 4-1: Because linked datasets offer greater privacy threats than single datasets, federal statistical agencies should develop and implement strategies to safeguard privacy while increasing accessibility to linked datasets for statistical purposes.

6. Quality Frameworks for Statistics Using Multiple Data Sources
A Quality Framework for Survey Research
Broader Framework for Assessing Quality
Assessing the Quality of Administrative and Private-Sector Data
The Quality of Alternative Data Sources: Two Illustrations

Broader Frameworks for Assessing Quality
RECOMMENDATION 6-1: Federal statistical agencies should adopt a broader framework for statistical information than total survey error to include additional dimensions that better capture user needs, such as timeliness, relevance, accuracy, accessibility, coherence, integrity, privacy, transparency, and interpretability.
E.g., data linkage errors, timeliness, spatial granularity.

Assessing the Quality of Administrative and Private-Sector Data
RECOMMENDATION 6-2: Federal statistical agencies should outline and evaluate the strengths and weaknesses of alternative data sources on the basis of a comprehensive quality framework, and, if possible, quantify the quality attributes and make them transparent to users. Agencies should focus more attention on the tradeoffs between different quality aspects, such as trading precision for timeliness and granularity, rather than focusing primarily on accuracy.

THANK YOU! For further information contact: Bob Groves (bgroves@georgetown.edu)

EXTRA SLIDES

3. Implications of Using Multiple Data Sources for Information Technology Infrastructure and Data Processing
Issues for Federal Statistical Agency IT Systems
System Architecture
Data Processing Issues
System Migration
Personnel Staffing and Skills

System Architecture
CONCLUSION 3-1: Moving to a paradigm of using multiple data sources requires a new and different information technology architecture than a paradigm based on a single data source. Federal statistical agencies will need to create research and production systems capable of using multiple, diverse data sources to create statistics.
CONCLUSION 3-2: A range of possible computing environments could enable use of multiple data sources for statistics. Federal statistical agencies will need to consider the governance, functionality, and flexibility of a system, as well as the implications for protecting privacy and addressing data providers' concerns regarding privacy.

Data Processing Issues
CONCLUSION 3-3: Creating statistics using multiple data sources often requires complex methodology to generate even relatively simple statistics. With the advent of new and different sources and innovations in statistical products, federal statistical agencies need to figure out ways to provide transparency of their methods and to clearly communicate these methods to users.

Personnel Staffing and Skills
RECOMMENDATION 3-1: Because technology changes continuously and understanding those changes is critical for the statistical agencies' products, federal statistical agencies should ensure that their information technology staff receive continuous training to keep pace with these changes. Training programs should be set up to meet the current and expected future training needs for technology, and recruitment plans should account for future technology demands.

5. Preserving Privacy Using Technology from Computer Science, Statistical Methods, and Administrative Procedures
Two Avenues to a Breach of Privacy
Inference Control Techniques
Implications for Federal Statistical Agencies

Implications for Federal Statistical Agencies
RECOMMENDATION 5-1: Federal statistical agencies should ensure their technical staff receive appropriate training in modern computer science technology including but not limited to database, cryptography, privacy-preserving, and privacy-enhancing technologies.
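The slides do not name a specific privacy-preserving technology. As one widely used example of the kind Recommendation 5-1 points to, the sketch below applies the Laplace mechanism from differential privacy to a simple count. The choice of technique, the function names, and the epsilon values are assumptions made for illustration; this is not presented as the panel's recommended method.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Release a count with Laplace noise of scale 1/epsilon.

    Adding or removing one person changes a count by at most 1, so the
    sensitivity is 1 and the noise scale is sensitivity / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(0)  # fixed seed only so the sketch is repeatable
for eps in (0.1, 1.0):
    # Smaller epsilon means stronger privacy protection and noisier output.
    print(eps, round(noisy_count(100, eps, rng), 2))
```

The sketch shows the basic tradeoff staff would need to understand: a smaller epsilon gives stronger protection but a less accurate published statistic.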

7. A New Entity to Provide Vital Information through Enhanced Federal Statistics
Attributes of the New Entity
Organizational Location
Functions
Technological Environment for Data Access
Access by Outside Researchers
Privacy
Transparency
Financing
Governance
Implementation

Core Requirements for New Entity
It has to have legal authority to access data that can be useful for statistical purposes. The legal authority needs to span cabinet-level departments and independent agencies.
It has to have strong authority to protect the privacy of data that are accessed and prevent misuse. At minimum, that authority needs to be commensurate with existing laws (CIPSEA, the Privacy Act), but it may also require new legislation.
It has to have authority to permit appropriate uses for the extraction of statistical information from the multiple datasets relevant to program evaluation and the monitoring of policy-relevant social and economic phenomena. The authority needs to delimit what uses are forbidden as well as what uses are encouraged.
It needs to be staffed with personnel whose skills fit the needs of the recommended entity, including advanced IT architectures, data transmission, record linkage, statistical computing, cryptography, data curation, cybersecurity, and privacy regulations.

Organizational Location
The report discusses different options for possible location of the new entity, including:
A New Federal Statistical Agency
Within an Existing Federal Statistical Agency
A Federally Funded Research and Development Center
A University-Based Public-Private Research Center
The report also discusses the advantages and disadvantages of each of these locations.
RECOMMENDATION 7-1: The recommended new entity for meeting the statistical needs of the nation should follow the principles and practices for federal statistical agencies and permit information accessed through it to be used only for statistical purposes.

Functions
The report discusses different approaches to allowing users to access data for statistical purposes within the entity.
The entity could serve solely as a location to access and combine data, with its staff doing no data analysis.
The entity could also be a full-service research institution that employs economists, statisticians, and other experts who can analyze data.
RECOMMENDATION 7-2: The recommended new entity should assist federal statistical agencies in identifying data sources that can most effectively inform the creation of national statistics, help develop techniques to use data from these sources to compute national statistics while respecting privacy and other protection obligations on the data, and nurture the expertise required to perform these functions.

Access by Outside Researchers
Currently, researchers face a variety of lengthy approval processes before they can access statistical data.
The panel believes that the entity could be useful in streamlining the research approval process.
RECOMMENDATION 7-3: Statistical agencies and the recommended new entity should strive to provide federal agency researchers and external researchers access to data for exclusively statistical purposes, in a timely manner, in a way that is not administratively burdensome and with strict adherence to confidentiality, privacy, and data security requirements.

Transparency
It will be critical that the new entity acknowledge people's right to know how their data are being used and that the concerns of the public and data providers guide its practices.
The new entity can also serve a valuable scientific function by facilitating the creation of new statistics and research through maintaining metadata, code, and appropriate documentation for other users.

Transparency
RECOMMENDATION 7-5: The recommended new entity should endeavor to maximize the transparency of its statistical activities by posting a summary of the data sources accessed through the entity on a public website. The summary should include the purpose and public benefit of the study, the data sources used, a brief description of the methodology, and links to resulting statistical products.
RECOMMENDATION 7-6: The recommended new entity should strive to facilitate replicability of the linkage, processing and analyses conducted through the entity by compiling and storing metadata and documentation for authorized data users.

Financing
The main source of funding is also clearly linked to the organizational location of the entity.
If the entity is located in the federal government, it would presumably either receive direct appropriations from Congress or receive funding outside of the congressional appropriations process.
This primary source of funding could be supplemented by additional reimbursable agreements.
If the entity is part of a private-public partnership, it could be funded by a federal statistical agency or it might receive some funding from other federal agencies supporting scientific research.

Governance
Governance of the entity will be driven by the location of the organization and the authorizing legislation.
Given the mission and nature of the recommended new entity, consideration should be given to additional structures and mechanisms for governance of the entity.
RECOMMENDATION 7-7: The director of the recommended new entity should report to a board of directors that includes representatives of the federal statistical agencies, experts on privacy, holders of data used in the entity, and users of statistical data.

Governance
RECOMMENDATION 7-8: The recommended new entity should have an advisory committee on privacy to inform and advise the federal statistical system on policies and current best practices. The advisory committee should include privacy advocates, data users, and members of the public whose data may be accessed, as well as experts from statistics, computer science, and the legal profession.
RECOMMENDATION 7-9: The legal foundation of the recommended new entity should foster independence from political and other undue external influence in providing access to data, linking and analyzing data, and in producing and disseminating statistical information.

Implementation
A strategic plan will be needed for expanding the data sources accessible through the entity. This plan will need to be carefully structured in phases, detailing outcomes for each phase and decision points.
The first phase might cover 5 years, at which time it would be useful to have a comprehensive review. The first phase needs to include expanded access to federal administrative and operational data that could be useful for federal statistics.
Private-sector data could be included as part of the new entity in a later phase.
How this entity is created and how it functions will determine its ability to be an effective resource of and for the federal statistical system.