Metadata Ingestion and Processing
Transcription
1 biomedical and healthcare Data Discovery Index Ecosystem: Ingestion and Processing. Jeffrey S. Grethe, Ph.D. BioCADDIE All Hands Meeting
2 Prototype (architecture diagram; recoverable labels): Repositories, Online datasets, Funding Agencies, Publishers, Data producers, Ingestion, Indexing, ElasticSearch, Terminology server, Searching, User Interface. Supported by the NIH grant #1-U24-AI to the University of California, San Diego
3 Data Indexing Pipeline
1. Configuration file developed by curator
2. [...] of metadata/data from data resource or dataset via ingestion module
   - Cache information for further processing
3. Mapping of metadata/data to metadata model(s)
4. Process metadata/data via a set of processing modules
5. [...] to target endpoint(s) via export modules
6. Search via ElasticSearch APIs
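The flow from ingestion through export in the pipeline slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `run_pipeline`, the callables passed to it, and the sample record are hypothetical names, not part of the bioCADDIE code base.

```python
from typing import Callable, Iterable, List

Record = dict


def run_pipeline(
    raw_records: Iterable[Record],
    mapper: Callable[[Record], Record],
    processors: List[Callable[[Record], Record]],
    exporters: List[Callable[[Record], None]],
) -> List[Record]:
    """Hypothetical driver: ingest and cache, map, process, export."""
    cache = list(raw_records)          # ingest, then cache for further processing
    finished = []
    for raw in cache:
        doc = mapper(raw)              # map source fields onto the metadata model
        for process in processors:     # run each processing module in order
            doc = process(doc)
        for export in exporters:       # hand the result to every export target
            export(doc)
        finished.append(doc)
    return finished


# Example run with an in-memory list standing in for the search index.
index: List[Record] = []
docs = run_pipeline(
    [{"Title": " Example dataset "}],
    mapper=lambda raw: {"title": raw["Title"]},
    processors=[lambda d: {**d, "title": d["title"].strip()}],
    exporters=[index.append],
)
```

Keeping each stage a plain callable means a new processing or export module can be added without touching the driver.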
4 Ingestion and Indexing: Transport
- Scalable metadata document store
  - Cloud based NoSQL datastore
  - Current implementation utilizes MongoDB
- Multiple transport mechanisms developed
  - E.g. Rsync, Aspera, FTP, REST web services, Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
- Source configuration based on format
- All incoming metadata converted to JSON
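As a rough illustration of "all incoming metadata converted to JSON", the sketch below turns a small XML record into JSON using only the Python standard library. The conversion rules (attributes ignored, repeated tags collected into a list) and the sample record are assumptions made for this example, not the rules used by the actual pipeline.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Convert an XML element to nested dicts, lists and strings.

    Simplifications for this sketch: attributes are ignored, and text
    mixed with child elements is dropped.
    """
    children = list(elem)
    if not children:
        return (elem.text or "").strip()
    result = {}
    for child in children:
        value = element_to_dict(child)
        if child.tag in result:
            # A repeated tag becomes a list of values.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


def xml_to_json(xml_text):
    root = ET.fromstring(xml_text)
    return json.dumps({root.tag: element_to_dict(root)}, sort_keys=True)


sample = "<dataset><title>A</title><keyword>x</keyword><keyword>y</keyword></dataset>"
converted = json.loads(xml_to_json(sample))
```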
5 Ingestion and Indexing
Ingestor type; sample biocaddie sources; number of sources using this ingestor
- Web Service Ingestor; Clinical Trials, UniProt; 42
- Database Ingestor; NeuroMorpho, PeptideAtlas, Clinical Trials Network; [...]
- OAI-PMH Ingestor; Dryad, CVRG; 2
- Two-stage Web Service Ingestor *; ICPSR, Dataverse Native; 2
- Rsync Ingestor; PDB, dbGaP; 2
- FTP Ingestor; BioProject, Biological Magnetic Resonance Data Bank (BMRB); [...]
- Aspera Ingestor; GEO Datasets; 1
- CSV Ingestor; Gemma; [...]
- XML Ingestor; ArrayExpress; 1
6 Ingestion and Indexing
7 Ingestion and Indexing
Based on DATS metadata model developed by WG3
transform column "[...] = 'primary')].'pdbx:pdbx_database_id_pubmed'.'_$'" to "primarypublication.id" apply {{ result = 'pmid:' + value }};
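The transform rule on this slide reads a PubMed identifier from the source record, prefixes it with 'pmid:', and writes it to primarypublication.id. A minimal Python sketch of that behaviour follows. The helper `apply_transform`, the `citation` wrapper key, and the sample value are hypothetical; the real source path is truncated in the transcription and is not reconstructed here.

```python
def apply_transform(source, source_path, target_path, fn):
    """Read one value from `source`, transform it, and place it at `target_path`.

    `source_path` is a list of keys to follow; `target_path` is a dotted
    path in the output document. Both conventions are assumptions.
    """
    value = source
    for key in source_path:
        value = value[key]
    target = {}
    node = target
    parts = target_path.split(".")
    for part in parts[:-1]:
        node = node.setdefault(part, {})
    node[parts[-1]] = fn(value)
    return target


# Hypothetical source record; only the last two keys appear on the slide.
record = {"citation": {"pdbx:pdbx_database_id_pubmed": {"_$": "12345"}}}

mapped = apply_transform(
    record,
    ["citation", "pdbx:pdbx_database_id_pubmed", "_$"],
    "primarypublication.id",
    lambda value: "pmid:" + value,   # mirrors: result = 'pmid:' + value
)
```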
8 Data
9 Curation Process (workflow diagram; recoverable labels): programmer; creates; Source descriptor JSON file; is input of; Ingestor; generates; generates; Transform; web service; validates; uses; finalized by; Sample Data; Script. Ingestion pipeline provides transformation template and sample data for curators.
10 Ingestion and Indexing: Collaboration across biocaddie
- Development of the underlying index infrastructure driven by biocaddie community
- Refinement and reconciliation of DATS
- biocaddie WG2 Data Identifiers Recommendation
- biocaddie WG3 Descriptive DATS Model
- biocaddie WG9 End User Evaluation Criteria
- biocaddie WG5 Dataset Citation Metrics
- biocaddie WG7 Accessibility
- biocaddie WG10 Repository Collaboration
11 Statistics
- Number of Sources: 58
- Total Number of Records: 77,085,123
  - Approximately 3X size of PubMed
- Total Size of [...]: [...] GB
- Average Record Size
  - Average: 0.89 KB
  - Median: 2.89 KB
  - Min: 0.57 KB
  - Max: 3.4 MB
12 Ingestion and Indexing of [...]
Pipeline allows for addition of enhancement modules
1. NLP Entity Recognition module developed by biocaddie CDT (UT Health)
2. Primary Publication Information
3. Data Citation module (Pilot Project 3.2)
4. Addition of dataset analytics (e.g. Diploid)
13 Ingestion and Indexing
- Final pipeline retains both originally extracted metadata and transformed metadata
  - Provenance for metadata generation is available
- Scalable document processing: pipeline processing can be scaled horizontally
- Large scale tests with full enhancement pipeline and latest DATS model
  - ~20,000 documents per hour per light-weight node
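One way to keep the originally extracted metadata next to the transformed metadata, with provenance for each step, is an envelope document like the one sketched below. The field names (`original`, `transformed`, `provenance`) and the module names are hypothetical and chosen for this example only.

```python
import copy


def build_envelope(original, mapped, modules):
    """Wrap a record so the untouched source metadata travels with the result.

    `modules` is a list of (name, function) pairs applied in order to the
    transformed metadata; each applied module is noted in `provenance`.
    """
    envelope = {
        "original": copy.deepcopy(original),     # never modified afterwards
        "transformed": copy.deepcopy(mapped),
        "provenance": ["mapping"],
    }
    for name, module in modules:
        envelope["transformed"] = module(envelope["transformed"])
        envelope["provenance"].append(name)
    return envelope


source = {"Title": "Example dataset"}
envelope = build_envelope(
    source,
    {"title": "Example dataset"},
    [("keyword-tagger", lambda doc: {**doc, "keywords": ["example"]})],
)
```

Because every module only sees the `transformed` part, a failed or revised module can be rerun from the stored `original` without downloading the source again.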
14 Source Processing
- Total Ingestion Time: 23 Hours, 35 Minutes
  - 77+ Million Records
  - Did not include data download time
- TrEMBL ingestion rate: 224 documents per second
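The figures on this slide imply an overall rate, worked out below. The calculation assumes that the 23 hours and 35 minutes is wall-clock time covering all 77,085,123 records reported on the statistics slide; under that assumption the overall rate comes to roughly 908 documents per second.

```python
# Figures quoted on the slides.
TOTAL_RECORDS = 77_085_123
INGESTION_SECONDS = 23 * 3600 + 35 * 60   # 23 hours, 35 minutes
TREMBL_RATE = 224                          # documents per second


def documents_per_second(records, seconds):
    """Average throughput over the whole ingestion run."""
    return records / seconds


overall_rate = documents_per_second(TOTAL_RECORDS, INGESTION_SECONDS)
print(f"overall: {overall_rate:.0f} documents per second")
print(f"ratio to the TrEMBL rate: {overall_rate / TREMBL_RATE:.1f}x")
```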
15 Horizontal Scaling: process timings with different numbers of consumers. Consumer containers running on Amazon t2.medium EC2 nodes (4 GB RAM, 2 virtual CPUs).
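The consumer pattern behind horizontal scaling can be sketched with a shared work queue. In this example threads stand in for the consumer containers and an in-process queue stands in for whatever messaging layer the pipeline uses; both substitutions, and the function name, are assumptions.

```python
import queue
import threading


def process_with_consumers(documents, handler, consumer_count):
    """Drain a shared queue with `consumer_count` workers and collect results."""
    tasks = queue.Queue()
    for doc in documents:
        tasks.put(doc)

    results = []
    lock = threading.Lock()

    def consume():
        while True:
            try:
                doc = tasks.get_nowait()
            except queue.Empty:
                return                  # queue drained, this consumer stops
            outcome = handler(doc)
            with lock:                  # guard the shared result list
                results.append(outcome)

    workers = [threading.Thread(target=consume) for _ in range(consumer_count)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return results


doubled = process_with_consumers(range(100), lambda n: n * 2, consumer_count=4)
```

Results arrive in no particular order, so anything that depends on ordering should carry its own identifier.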
16 Ingestion and Indexing: Scalable index services
- Pipeline's primary export is to Elasticsearch
- Scalable cloud based indexing/search platform
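An export to Elasticsearch is commonly sent through its bulk endpoint, which takes newline-delimited JSON: an action line followed by the document itself. The sketch below only builds that request body and sends nothing. The index name `datasets` and the sample documents are hypothetical, and whether the bioCADDIE exporter uses the bulk endpoint is an assumption.

```python
import json


def build_bulk_body(index_name, documents):
    """Build a newline-delimited JSON body for the Elasticsearch bulk endpoint.

    `documents` is an iterable of (document id, document) pairs.
    """
    lines = []
    for doc_id, doc in documents:
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        lines.append(json.dumps(doc))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


body = build_bulk_body(
    "datasets",   # hypothetical index name
    [("1", {"title": "A"}), ("2", {"title": "B"})],
)
```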
17 Thank you