The NIH Big Data to Knowledge Initiative: Raising the Prominence of Data
  Michael F. Huerta, Ph.D.
  Associate Director, National Library of Medicine
  Director, Office of Health Information Programs Development
  Society for Risk Assessment
  December 10, 2013
Biomedical Research Enterprise of Today
  Lots and lots of data in individual labs
  [Figure: six individual labs, Lab 1 through Lab 6]
Biomedical Research Enterprise of Today
  Lots and lots of data in individual labs
  Few data broadly available to research community
  Major public products are concepts in scientific papers, not data
  Today's enterprise is not data centric
Biomedical Research Enterprise of Tomorrow
  Increased data sharing will make data available
  Community-based standards will make data useable
  Data will be made part of the research ecosystem
    Discoverable, citable & linked to literature, data & tools
  Data science & tools will enable scientific innovation
  Tomorrow's enterprise will be data centric
NIH Big Data to Knowledge Initiative for Research Data
  [Diagram: Today, BD2K, Tomorrow]
NIH Considers Data & Informatics
  Data and Informatics Working Group of the Advisory Committee to the Director of NIH
    Report, 6/15/12
BD2K
  BD2K is a significant and unique initiative
    Will fund research, development & training
    Supported jointly by all NIH Institutes & Centers
    Make biomedical research more data centric
  BD2K has three major thrusts
    Advance the science & technology of biomedical big data
    Enhance & develop the workforce in biomedical big data
    Facilitate the broad use of biomedical research data
Advance the Science & Technology of Big Data
  Centers of Excellence
    Nov 20: Applications received
    Summer-Fall 2014: Awards made
    Spring 2014: Workshop planned on data integration
  Research Project Grants to develop software, tools & methods
    Aug 8: RFI on methods and software tools
    February 2014: Workshop planned
    FY14/15: FOAs planned
  Expedite the wide use of large scale computing
    Ongoing discussions with intramural NIH, National Labs, DARPA, Google, Microsoft, Amazon, etc.
Enhance & Develop Workforce in Big Data
  From undergraduate students to senior investigators
  A variety of short and long term training initiatives & resources
    Feb 20: RFI on training needs in biomedical big data
    July 29 & 30: Workshop on training
      Co-chaired by K. Bandeen-Roche & I. Kohane
      Report is being drafted
    Dec-Jan: FOAs planned
Facilitate Broad Use of Biomedical Data
  Change policy, practice & culture to increase availability of data
  Establish frameworks to routinely support community-based standards efforts
  Catalog of information about data to bring data into the ecosystem of research and scholarship
  Commenced August 2012
  Recommendations January 2013
OSTP Memo: Increasing Access to Data & Publications from Federally Supported Research
  Use existing infrastructure & use public-private partnerships
  Improve ability to locate & access
  Optimize search, archiving & dissemination
  Notification of obligations to share
  Measure & enforce compliance
  Identify existing resources for implementation
  Provide timeline for implementation
  Identify barriers
  Memo: February 2013
  Agency Plan Submitted: August 2013
Facilitate Broad Use of Biomedical Data
  Change policy, practice & culture to increase availability of data
  Recommendations:
    Data management & sharing plans for all research projects supported by any mechanism
      Grants, contracts, intramural projects, etc.
    Peer review of data management & sharing plans
    Investigators provide information about data for catalog
    Investigators use data repositories and standards
  Increase availability of clinical data for research
    Sept 11 & 12: Workshop on enabling research use of clinical data
      Co-chaired by R. Califf & D. Masys
      Policy changes, as well as tools, methods & infrastructure
      Report is being drafted
Facilitate Broad Use of Biomedical Data
  Establish frameworks to routinely support community-based standards efforts
    Standards are used when community wants & supports them
    Frameworks of policies, funding mechanisms, administrative processes, etc., exist at NIH to support various research types
      Research Project Grants
      SBIR Grants
    Such frameworks are necessary for routine support of such projects
    No such frameworks exist for supporting community-based standards efforts
      They will be established under BD2K
    Sept 25 & 26: Workshop on community-based standards efforts
      Chaired by S. Sansone & D. Kennedy
      Report is being drafted
Facilitate Broad Use of Biomedical Data
  Catalog of information about data to bring data into the ecosystem of research and scholarship
    Information such as:
      Authors of the data, description of the data, whether, where, when and how data are available
    Information about data catalog / a registry of info about data sets
    Index that information for efficient search & retrieval
    When registered & indexed, info about particular data sets can be:
      Discoverable: Expands use of data
      Citable: Provides incentives for data-related activities
      Linked to data, tools, & literature: Brings data into ecosystem
      Linked to NIH administrative systems: Informs NIH processes
      Available to public: Allows analysis and use by third parties
    June 6: RFI on data catalog
    Aug 21 & 22: Workshop on data catalog
      Chaired by Francine Berman
      Report is being drafted
    Jan-Feb: FOA planned
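The catalog idea on this slide (register information about a data set, then index it for search and retrieval) can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the slides do not specify a schema or an implementation, so the class names, field names, and sample record below are hypothetical and do not describe any actual NIH or BD2K system.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical catalog entry: information ABOUT a data set, not the data."""
    identifier: str          # citable handle for the data set (made-up format)
    authors: list            # who produced the data
    description: str         # what the data are
    available: bool          # whether the data can be obtained
    location: str = ""       # where the data live, if available
    linked_publications: list = field(default_factory=list)


class DataCatalog:
    """Toy registry with a keyword index over descriptions and authors."""

    def __init__(self):
        self._records = {}
        self._index = defaultdict(set)   # lowercase word -> record identifiers

    def register(self, record):
        # Store the record, then index every word of its description and authors.
        self._records[record.identifier] = record
        text = " ".join([record.description] + list(record.authors))
        for word in text.lower().split():
            self._index[word].add(record.identifier)

    def search(self, term):
        # Return matching records in a stable (sorted-by-identifier) order.
        ids = self._index.get(term.lower(), set())
        return [self._records[i] for i in sorted(ids)]


# Illustrative usage with an invented record.
catalog = DataCatalog()
catalog.register(DatasetRecord(
    identifier="example-001",
    authors=["A. Researcher"],
    description="Mouse cortex imaging data",
    available=True,
    location="example-repository",
))
hits = catalog.search("imaging")
```

In this sketch the index makes a registered record discoverable by keyword, the identifier gives it something to cite, and `linked_publications` stands in for links to the literature; a real catalog would need persistent identifiers and a community-agreed metadata standard.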
Today BD2K? Tomorrow
Biomedical Research Enterprise of Tomorrow
  Recommendations in accord with OSTP interests
    Increased data sharing will make data available
  Frameworks will be established to encourage & support community-based standards efforts
    Community-based standards will make data useable
  Data catalog will make data discoverable, citable & linked to data, tools & literature
    Data will be made part of the research ecosystem
  Advancing science & technology via centers, research project grants, & large scale computing
    Data science & tools will enable scientific innovation
Today BD2K! Tomorrow