Integrating large, fast-moving, and heterogeneous data sets in biology.

1 Integrating large, fast-moving, and heterogeneous data sets in biology. C. Titus Brown Asst Prof, CSE and Microbiology; BEACON NSF STC Michigan State University

2 Introduction. Background: modeling & data analysis undergrad => open source software development + software engineering + developmental biology + genomics PhD => bio + computer science faculty => data-driven biology. Currently working with next-gen sequencing data (mRNAseq, metagenomics, difficult genomes). Thinking hard about how to do data-driven modeling & model-driven data analysis.

3 Goal & outline. Goal: address the challenges and opportunities of heterogeneous data integration (a 1000-ft view). Outline: What types of analysis and discovery do we want to enable? What are the technical challenges, common solutions, and common failure points? Where might we look for success stories, and what lessons can we port to biology? My conclusions.

4 Specific types of questions: I have a known chemical/gene interaction; do I see it in this other data set? I have a known chemical/gene interaction; what other gene expression is affected? What does chemical X do to overall phenotype, gene expression, protein localization, and patterns of histone modification? More complex/combinatorial interactions: What does this chemical do in this genetic background? What additional gene expression changes are generated by the combination of these two chemicals? What are the common effects of this class of chemicals?

5 What general behavior do we want to enable? Reuse of data by groups that did not/could not produce it. Publication of reusable/forkable data analysis pipelines and models. Integration of data and models. Serendipitous uses and cross-referencing of data sets ("mashups"). Rapid scientific exploration and hypothesis generation in data space.

6 (Executable papers & data reuse) ENCODE: all data is available; all processing scripts for papers are available on a virtual machine. QIIME (microbial ecology): Amazon virtual machine containing software and data for "Collaborative cloud-enabled tools allow rapid, reproducible biological insights" (pmid ). Digital normalization paper: Amazon virtual machine, again:

7 Executable papers can support easy replication & reuse of code, data. (IPython Notebook; also see RStudio)

8 What general behavior do we want to enable? Reuse of data by groups that did not/could not produce it. Publication of reusable/forkable data analysis pipelines and models. Integration of data and models. Serendipitous uses and cross-referencing of data sets ("mashups"). Rapid scientific exploration and hypothesis generation in data space.

9 An entertaining digression -- A mashup of Facebook top 10 books by college and per-college SAT rankings
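At its core, a mashup like this is a join between two data sets that were produced independently, and most of the effort goes into getting the keys to match. A minimal Python sketch, assuming two small hypothetical tables (the college names, book titles, and ranks are placeholders, not the data behind the slide) and a deliberately crude name-normalization step:

```python
def normalize(name: str) -> str:
    """Crude entity resolution: strip whitespace and case-fold the key."""
    return name.strip().casefold()


def mashup(books: dict[str, list[str]], ranks: dict[str, int]) -> list[tuple[str, int, list[str]]]:
    """Inner-join two independently produced tables on a normalized key."""
    ranks_by_key = {normalize(name): rank for name, rank in ranks.items()}
    joined = []
    for college, titles in books.items():
        key = normalize(college)
        if key in ranks_by_key:  # rows with no partner are silently dropped
            joined.append((key, ranks_by_key[key], titles))
    # Order by rank so the result reads like the ranked list it came from.
    return sorted(joined, key=lambda row: row[1])


# Hypothetical toy data; note the inconsistent spelling of the same colleges.
top_books = {"College A": ["Book 1", "Book 2"], " college b": ["Book 3"], "College C": ["Book 1"]}
sat_rank = {"college a": 1, "COLLEGE B": 2, "College D": 3}

for key, rank, titles in mashup(top_books, sat_rank):
    print(rank, key, titles)
```

Colleges C and D vanish from the result because neither has a partner in the other table; in a real mashup, those silent drops are where mismatched identifiers hide.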

10 Technical obstacles. Syntactic incompatibility: the first 90% of bioinformatics; your IDs are different from my IDs. Semantic incompatibility: the second 90% of bioinformatics; what does "gene" mean in your database? Impedance mismatch: SQL is notoriously bad at representing intervals and hierarchies. Genomes consist of intervals; ontologies consist of hierarchies! Yet SQL databases dominate (vs. graph or object DBs). Data volume & velocity: large & expanding data sets just make everything harder. Unstructured data, aka publications: most scientific knowledge is locked up there.
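To make the interval point concrete: a genomic overlap query is easy to write in SQL but hard to make fast. A minimal sketch using Python's built-in sqlite3 module, assuming a hypothetical features table with half-open [start, end) coordinates:

```python
import sqlite3

# Hypothetical schema: one row per genomic feature, half-open [start_pos, end_pos).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE features (chrom TEXT, start_pos INTEGER, end_pos INTEGER, name TEXT)"
)
conn.executemany(
    "INSERT INTO features VALUES (?, ?, ?, ?)",
    [
        ("chr1", 100, 200, "geneA"),
        ("chr1", 150, 400, "geneB"),
        ("chr1", 500, 600, "geneC"),
        ("chr2", 100, 200, "geneD"),
    ],
)


def overlapping(conn: sqlite3.Connection, chrom: str, qstart: int, qend: int) -> list[str]:
    """Names of features on `chrom` that overlap the half-open query [qstart, qend).

    Two half-open intervals overlap iff each one starts before the other ends.
    """
    cur = conn.execute(
        "SELECT name FROM features "
        "WHERE chrom = ? AND start_pos < ? AND end_pos > ? "
        "ORDER BY start_pos",
        (chrom, qend, qstart),
    )
    return [name for (name,) in cur]


print(overlapping(conn, "chr1", 180, 450))
print(overlapping(conn, "chr1", 200, 500))  # touching endpoints do not count
```

The predicate is correct, but a B-tree index on start_pos can only bound one side of it, so on large tables the query still scans every feature that starts before the query end. Genome-scale tools typically add binning schemes or interval trees on top, which is one face of the impedance mismatch.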

11 Typical solutions. Entity resolution: accession numbers or other common identifiers; requires a global naming system OR translators. Top-down imposition of structure: a centralized DB ("Here is the schema you will all use"); limits flexibility, prevents use of unstructured data, heavyweight. Ontologies to enable correct communication: a centrally coordinated vocabulary; slow, hard to get right, doesn't solve the unstructured data problem. Balancing theoretical rigor with practical applicability is particularly hard. Ad hoc entity resolution ("winging it"): the common solution; doesn't work that well.
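The "translators" route, and the reason ad hoc resolution goes wrong, can be sketched in a few lines. This is a minimal Python illustration, assuming a hypothetical mapping table between a local lab namespace and a central one (real mappings would come from a curated cross-reference resource). The design choice is to report unresolved IDs and many-to-one collisions rather than silently dropping or merging records:

```python
# Hypothetical cross-reference table: local lab ID -> central accession.
LAB_TO_CENTRAL = {
    "lab:0001": "central:G42",
    "lab:0002": "central:G43",
    "lab:0003": "central:G43",  # two local IDs collapse onto one central entity
}


def translate(ids, mapping):
    """Translate IDs through `mapping`, reporting where information is lost.

    Returns (translated, unresolved, collisions):
      translated -- {local_id: central_id} for every ID with a mapping
      unresolved -- local IDs with no mapping at all
      collisions -- {central_id: [local_ids]} where several local IDs merge
    """
    translated, unresolved, collisions = {}, [], {}
    first_seen = {}  # central_id -> first local ID that mapped to it
    for local_id in ids:
        target = mapping.get(local_id)
        if target is None:
            unresolved.append(local_id)
            continue
        translated[local_id] = target
        if target in first_seen:
            collisions.setdefault(target, [first_seen[target]]).append(local_id)
        else:
            first_seen[target] = local_id
    return translated, unresolved, collisions


translated, unresolved, collisions = translate(
    ["lab:0001", "lab:0002", "lab:0003", "lab:9999"], LAB_TO_CENTRAL
)
print(unresolved)
print(collisions)
```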

12 Are better standards the solution?

13 Rephrasing technical goals. How can we best provide a platform or platforms to support flexible data integration and data investigation across a wide range of data sets and data types in biology? My interests: avoid a master data manager and centralization; support federated roll-out of new data and functionality; provide flexible extensibility of ontologies and hierarchies; support a diverse ecology of databases, interfaces, and analysis software.
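One way to read "flexible extensibility of ontologies and hierarchies" without a master data manager is to let each group layer its own terms on top of a shared hierarchy instead of editing the shared copy. A minimal Python sketch, assuming a hypothetical ontology stored as term -> parent terms (is_a edges) and using collections.ChainMap to overlay a local extension:

```python
from collections import ChainMap

# Hypothetical shared ontology: term -> set of parent terms (is_a edges).
CENTRAL = {
    "process": set(),
    "development": {"process"},
    "signaling": {"process"},
    "wnt_signaling": {"signaling"},
}

# Hypothetical local extension: a new term with two parents, added without
# touching CENTRAL.
LOCAL = {
    "canonical_wnt_in_embryo": {"wnt_signaling", "development"},
}


def descendants(term, parents_of):
    """All terms transitively below `term` in a term -> parents mapping."""
    children = {}
    for child, parents in parents_of.items():
        for parent in parents:
            children.setdefault(parent, set()).add(child)
    found, stack = set(), [term]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in found:  # also guards against accidental cycles
                found.add(child)
                stack.append(child)
    return found


merged = ChainMap(LOCAL, CENTRAL)  # lookups check LOCAL first, then CENTRAL
print(sorted(descendants("signaling", CENTRAL)))  # shared view only
print(sorted(descendants("signaling", merged)))   # shared view + local extension
```

The shared CENTRAL dictionary is never modified; a group that does not load the extension simply sees the original hierarchy. Real ontologies (e.g. in OBO or OWL formats) carry much more than is_a edges, so this shows only the layering idea.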

14 Success stories outside of biology? Look for domains that have really large amounts of heterogeneous data; that are continually increasing in size; that are being effectively mined on an ongoing basis; that have widely used programmatic interfaces supporting mashups and other cross-database stuff; and that are intentional, with principles that we can steal or adapt.

15 Success stories outside of biology? Look for domains that have really large amounts of heterogeneous data; that are continually increasing in size; that are being effectively mined on an ongoing basis; that have widely used programmatic interfaces supporting mashups and other cross-database stuff; and that are intentional, with principles that we can steal or adapt. Amazon.

16 Amazon: > 50 million users, > 1 million product partners, billions of reviews, dozens of compute services. Continually changing/updating data sets. Explicitly adopted a service-oriented architecture that enables both internal and external use of this data. For example, the amazon.com Web site is itself built from over 150 independent services. Amazon routinely deploys new services and functionality.

17 Sources: "The Platform Rant" (Steve Yegge), in which he compares the Google and Amazon approaches; and a summary at HighScalability.com. (They are both long and tech-y, note, but the first is especially entertaining.)

18 A brief summary of core principles. Mandates from the CEO: 1. All teams must expose data and functionality solely through a service interface. 2. All communication between teams happens through that service interface. 3. All service interfaces must be designed so that they can be exposed to the outside world.
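The first two mandates amount to "callers may depend only on a published interface, never on another team's storage." A minimal in-process Python sketch of that idea using an abstract base class; all class and method names are hypothetical, and in a real deployment the interface would be a network API rather than a Python class, which is what makes the third mandate possible:

```python
from abc import ABC, abstractmethod


class ExpressionService(ABC):
    """Hypothetical published interface for an expression data set."""

    @abstractmethod
    def genes(self) -> list[str]:
        """Gene IDs this service can answer for."""

    @abstractmethod
    def expression(self, gene_id: str) -> dict[str, float]:
        """{sample_name: value} for one gene; empty if the gene is unknown."""


class MicroarrayService(ExpressionService):
    def __init__(self, table: dict[str, dict[str, float]]):
        self._table = table  # private layout: gene -> sample -> value

    def genes(self) -> list[str]:
        return sorted(self._table)

    def expression(self, gene_id: str) -> dict[str, float]:
        return dict(self._table.get(gene_id, {}))


class RnaSeqService(ExpressionService):
    def __init__(self, records: list[tuple[str, str, float]]):
        self._records = records  # different private layout: (gene, sample, value)

    def genes(self) -> list[str]:
        return sorted({gene for gene, _, _ in self._records})

    def expression(self, gene_id: str) -> dict[str, float]:
        return {sample: value for gene, sample, value in self._records if gene == gene_id}


def shared_genes(*services: ExpressionService) -> set[str]:
    """A client that relies only on the interface, not on either storage layout."""
    gene_sets = [set(svc.genes()) for svc in services]
    return set.intersection(*gene_sets) if gene_sets else set()


array = MicroarrayService({"g1": {"s1": 1.0}, "g2": {"s1": 2.0}})
seq = RnaSeqService([("g2", "s1", 5.0), ("g3", "s1", 1.0)])
print(shared_genes(array, seq))
```

Because shared_genes never touches _table or _records, either team can change its internal storage without breaking the client.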

19 More colloquially: you should eat your own dogfood. Design and implement the database and database functionality to meet your own needs, and use only the functionality you've explicitly made available to everyone. To adapt this to research: database functionality should be designed in tight integration with the researchers who are using it, both at the user interface level and programmatically. (Genome databases have done a really good job of this, albeit generally in a centralized model.)

20 If the customers aren't integrated into the development loop:

21 A platform view? (Diagram of interconnected services: metabolic model; differential gene expression query; data exploration; WWW; gene ID translator; chemical relationships; expression normalization; isoform resolution/comparison; expression data (tiling); expression data (microarray); expression data (mRNAseq); expression data II (mRNAseq).)
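The diagram's boxes can be read as small services chained together. A minimal Python sketch of one such chain (gene ID translator, then expression normalization, then a differential gene expression query), assuming hypothetical gene IDs and counts. Counts-per-million scaling and a pseudocount log2 fold change stand in for the far more careful normalization and statistical testing a real mRNAseq analysis would use:

```python
import math


def translate_ids(counts: dict[str, int], id_map: dict[str, str]) -> dict[str, int]:
    """Gene ID translator: rename genes, dropping any without a mapping.

    Assumes id_map is one-to-one; a many-to-one map would overwrite counts.
    """
    return {id_map[g]: c for g, c in counts.items() if g in id_map}


def normalize_cpm(counts: dict[str, int]) -> dict[str, float]:
    """Expression normalization: scale raw counts to counts per million."""
    total = sum(counts.values())
    return {g: c * 1e6 / total for g, c in counts.items()}


def log2_fold_changes(control: dict[str, float], treated: dict[str, float],
                      pseudocount: float = 1.0) -> dict[str, float]:
    """Differential expression query: log2(treated / control) for shared genes."""
    return {
        g: math.log2((treated[g] + pseudocount) / (control[g] + pseudocount))
        for g in sorted(control.keys() & treated.keys())
    }


# Two hypothetical data sets that use different local gene IDs.
control = normalize_cpm(translate_ids({"a1": 500, "a2": 500}, {"a1": "G1", "a2": "G2"}))
treated = normalize_cpm(translate_ids({"b1": 900, "b2": 100}, {"b1": "G1", "b2": "G2"}))
print(log2_fold_changes(control, treated))
```

Each function only consumes and produces plain dictionaries, so any one of them could be swapped for a better implementation (or moved behind a service interface) without touching the others.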

22 A few points. Open source and agile software development approaches can be surprisingly effective and inexpensive. Developing services in small groups that include customer-facing developers helps ensure utility. Implementing services in the cloud (e.g. on virtual machines, or on top of infrastructure-as-a-service offerings) gives developers flexibility in tools, approaches, and implementation; it also enables scaling and reusability.

23 Combining modeling with data. Data-driven modeling: connections and parameters can, to some extent, be determined from data. Model-driven data investigation: data that doesn't fit the known model is particularly interesting. The second approach is essentially how particle physicists work with accelerator data: build a model, then interpret the data using the model. (In biology, models are less constraining, though; there are more unknowns.)
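Both directions fit in a few lines for the simplest possible model. A minimal Python sketch, assuming a hypothetical linear relationship: the model's parameters are estimated from reference data by least squares (data-driven modeling), and new observations are then checked against it, keeping only those it fails to explain (model-driven investigation). The 3-standard-deviation threshold is an arbitrary illustrative choice:

```python
from statistics import mean, pstdev


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float, float]:
    """Data-driven modeling: least-squares slope, intercept, and residual spread."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    return slope, intercept, pstdev(residuals)


def surprising(points: list[tuple[float, float]], model: tuple[float, float, float],
               k: float = 3.0) -> list[tuple[float, float]]:
    """Model-driven investigation: keep new points whose residual exceeds k * spread."""
    slope, intercept, spread = model
    return [(x, y) for x, y in points if abs(y - (slope * x + intercept)) > k * spread]


# Hypothetical reference measurements (e.g. a well-characterized control series).
xs = [0, 1, 2, 3, 4]
ys = [0.1, 2.0, 3.9, 6.1, 8.0]
model = fit_line(xs, ys)

# New observations: the first agrees with the model, the second does not.
print(surprising([(5, 10.0), (6, 20.0)], model))
```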

24

25 Using developmental models Davidson et al.,

26 Using developmental models. Models can contain useful abstractions of specific processes; here, the direct effects of blocking nuclearization of β-catenin can be predicted by following the connections. Models provide a common language for (dis)agreement in a community.
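Following the connections is, computationally, a graph traversal. A minimal Python sketch, assuming a small hypothetical network stored as regulator -> activated targets (the node names are placeholders, not the published network in the figure): a breadth-first search from the blocked node lists every node the model predicts could be affected, with distance 1 marking the direct effects.

```python
from collections import deque

# Hypothetical regulatory network: regulator -> targets it activates.
NETWORK = {
    "nuclear_b_catenin": ["regA", "regB"],
    "regA": ["regC"],
    "regB": ["regC", "regD"],
    "regC": ["marker1"],
    "regE": ["marker2"],  # an independent branch that should be unaffected
}


def predicted_effects(network: dict[str, list[str]], blocked: str) -> dict[str, int]:
    """Map each node downstream of `blocked` to its distance in edges (BFS)."""
    distance: dict[str, int] = {}
    queue = deque([(blocked, 0)])
    while queue:
        node, d = queue.popleft()
        for target in network.get(node, []):
            if target != blocked and target not in distance:
                distance[target] = d + 1
                queue.append((target, d + 1))
    return distance


effects = predicted_effects(NETWORK, "nuclear_b_catenin")
direct = sorted(n for n, d in effects.items() if d == 1)
print(direct)
print(effects)
```

This treats every edge as simple activation; developmental network models also encode repression and combinatorial cis-regulatory logic, so a fuller version would evaluate each node's input logic rather than plain reachability.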

27 Using developmental models Davidson et al.,

28 Social obstacles. Training of biologically aware software developers is lacking. Molecular biologists are still very much of a computationally naïve mindset: "give me the answer so I can do the real work." Incentives for data sharing, much less useful data sharing, are not yet very strong (pubs, grants, respect...). Patterns for useful data sharing are still not well understood, in general.

29 Other places to look. NEON and other NSF centers (e.g. NCEAS) are collecting vast heterogeneous data sets, and are explicitly tackling the data management/use/integration/reuse problem. SBML (Systems Biology Markup Language) is a model description language that enables interoperability of modeling software. Software Carpentry runs free workshops on effective use of computation for science.

30 My conclusions. We need a platform mentality to make the most use of our data, even if we don't completely embrace loose coupling and distribution. Agile and end-user-focused software development methodologies have worked well in other areas; much of the hard technical space has already been explored by Internet companies (and probably social networking companies, too). Data is most useful in the context of an explicit model; models can be generated from data, and models can feed back into data gathering.

31 Things I didn't discuss. Database maintenance and active curation are incredibly important. Most data only makes sense in the context of other data (think: controls; wild type vs. knockout; other backgrounds; etc.), so we will need lots more data to interpret the data we already have. Deep learning is a promising field for extracting correlations from multiple large data sets. All of these technical problems are easier to solve than the social problems (incentives; training).

32 Thanks! This talk and ancillary notes will be available on my blog ~soon. Please do contact me at ctb@msu.edu if you have questions or comments.

Sensor Data Collection and Processing Sensor Data Collection and Processing Applying Web Scale To Sensor Data Today s speaker Josh Patterson josh@cloudera.com / twitter: @jpatanooga Master s Thesis: self-organizing mesh networks Published

More information

Overview. Data-mining. Commercial & Scientific Applications. Ongoing Research Activities. From Research to Technology Transfer

Overview. Data-mining. Commercial & Scientific Applications. Ongoing Research Activities. From Research to Technology Transfer Data Mining George Karypis Department of Computer Science Digital Technology Center University of Minnesota, Minneapolis, USA. http://www.cs.umn.edu/~karypis karypis@cs.umn.edu Overview Data-mining What

More information

BUILDING the VIRtUAL enterprise

BUILDING the VIRtUAL enterprise BUILDING the VIRTUAL ENTERPRISE A Red Hat WHITEPAPER www.redhat.com As an IT shop or business owner, your ability to meet the fluctuating needs of your business while balancing changing priorities, schedules,

More information

Semantic Web in a Constrained Environment

Semantic Web in a Constrained Environment Semantic Web in a Constrained Environment Laurens Rietveld and Stefan Schlobach Department of Computer Science, VU University Amsterdam, The Netherlands {laurens.rietveld,k.s.schlobach}@vu.nl Abstract.

More information

RiskSense Attack Surface Validation for IoT Systems

RiskSense Attack Surface Validation for IoT Systems RiskSense Attack Surface Validation for IoT Systems 2018 RiskSense, Inc. Surfacing Double Exposure Risks Changing Times and Assessment Focus Our view of security assessments has changed. There is diminishing

More information

Introduction. October 5, Petr Křemen Introduction October 5, / 31

Introduction. October 5, Petr Křemen Introduction October 5, / 31 Introduction Petr Křemen petr.kremen@fel.cvut.cz October 5, 2017 Petr Křemen (petr.kremen@fel.cvut.cz) Introduction October 5, 2017 1 / 31 Outline 1 About Knowledge Management 2 Overview of Ontologies

More information

Discovery Net : A UK e-science Pilot Project for Grid-based Knowledge Discovery Services. Patrick Wendel Imperial College, London

Discovery Net : A UK e-science Pilot Project for Grid-based Knowledge Discovery Services. Patrick Wendel Imperial College, London Discovery Net : A UK e-science Pilot Project for Grid-based Knowledge Discovery Services Patrick Wendel Imperial College, London Data Mining and Exploration Middleware for Distributed and Grid Computing,

More information

This video is part of the Microsoft Virtual Academy.

This video is part of the Microsoft Virtual Academy. This video is part of the Microsoft Virtual Academy. 1 In this session we re going to talk about building for the private cloud using the Microsoft deployment toolkit 2012, my name s Mike Niehaus, I m

More information

Data Integrity in Stateful Services. Percona Live, Santa Clara, 2017

Data Integrity in Stateful Services. Percona Live, Santa Clara, 2017 Data Integrity in Stateful Services Percona Live, Santa Clara, 2017 Data Integrity Bringing Sexy Back Protect the Data. -Every DBA who doesn t want to be fired Breaking Integrity Down Physical Integrity

More information

10 Cloud Myths Demystified

10 Cloud Myths Demystified 10 Cloud s Demystified The Realities for Modern Campus Transformation Higher education is in an era of transformation. To stay competitive, institutions must respond to changing student expectations, demanding

More information

from the idea to the experience

from the idea to the experience User Interface Design and the Semantic Web from the idea to the experience Duane Degler Design for Context www.designforcontext.com Copyright D. Degler, Design for Context. 11.16.2010 Slide 1 Semantic

More information

Copyright 2012 EMC Corporation. All rights reserved. Obrigado

Copyright 2012 EMC Corporation. All rights reserved. Obrigado Copyright 20132012 EMC Corporation. EMC Corporation. All rights reserved. All rights reserved. 1 EMC FORUM 2013 2 Obrigado 3 SOFTWARE DEFINED DATA CENTER WORLD IS CHANGING RAPID CHANGE APP / INFRA INCREASED

More information

Tackling network heterogeneity head-on

Tackling network heterogeneity head-on Tackling network heterogeneity head-on Timothy Roscoe Networks and Operating Systems Group ETH Zürich ETH Zürich Scene setting Different dimension of MAD networks: Independent evolution Arbitrary policies

More information

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK REVIEW PAPER ON IMPLEMENTATION OF DOCUMENT ANNOTATION USING CONTENT AND QUERYING

More information

Proposal for Implementing Linked Open Data on Libraries Catalogue

Proposal for Implementing Linked Open Data on Libraries Catalogue Submitted on: 16.07.2018 Proposal for Implementing Linked Open Data on Libraries Catalogue Esraa Elsayed Abdelaziz Computer Science, Arab Academy for Science and Technology, Alexandria, Egypt. E-mail address:

More information

NDARC Web Refresh 2011

NDARC Web Refresh 2011 NDARC Web Refresh 2011 Update to Staff Luc Betbeder Status Update Where we were What did we want What did we do Where are we up to Releasing 25 Aug What now? Where we were An image is worth a 1000 bullet

More information

What s a BA to do with Data? Discover and define standard data elements in business terms

What s a BA to do with Data? Discover and define standard data elements in business terms What s a BA to do with Data? Discover and define standard data elements in business terms Susan Block, Lead Business Systems Analyst The Vanguard Group Discussion Points Discovering Business Data The Data

More information

Improving Decision-Making Support

Improving Decision-Making Support Improving Decision-Making Support by Linking Database results to Simulations Gio Wiederhold Stanford University May 2014 Gio Wiederhold SimQL 1 Problem : Mismatch Database Technology should support Decision-Making

More information

Lesson 14 SOA with REST (Part I)

Lesson 14 SOA with REST (Part I) Lesson 14 SOA with REST (Part I) Service Oriented Architectures Security Module 3 - Resource-oriented services Unit 1 REST Ernesto Damiani Università di Milano Web Sites (1992) WS-* Web Services (2000)

More information

Choosing the perfect CMS

Choosing the perfect CMS ... Choosing the perfect CMS 4 Pillars of picking the perfect Content Management System www.milestoneinternet.com 1-866-615-2516 Introduction Your website and mobile presence are the most powerful channels

More information

Database and Knowledge-Base Systems: Data Mining. Martin Ester

Database and Knowledge-Base Systems: Data Mining. Martin Ester Database and Knowledge-Base Systems: Data Mining Martin Ester Simon Fraser University School of Computing Science Graduate Course Spring 2006 CMPT 843, SFU, Martin Ester, 1-06 1 Introduction [Fayyad, Piatetsky-Shapiro

More information

Semantic Web and Web2.0. Dr Nicholas Gibbins

Semantic Web and Web2.0. Dr Nicholas Gibbins Semantic Web and Web2.0 Dr Nicholas Gibbins Web 2.0 is the business revolution in the computer industry caused by the move to the internet as platform, and an attempt to understand the rules for success

More information

Q&A TAKING ENTERPRISE SECURITY TO THE NEXT LEVEL. An interview with John Summers, Enterprise VP and GM, Akamai

Q&A TAKING ENTERPRISE SECURITY TO THE NEXT LEVEL. An interview with John Summers, Enterprise VP and GM, Akamai TAKING ENTERPRISE SECURITY TO THE NEXT LEVEL An interview with John Summers, Enterprise VP and GM, Akamai Q&A What are the top things that business leaders need to understand about today s cybersecurity

More information

Semantics Modeling and Representation. Wendy Hui Wang CS Department Stevens Institute of Technology

Semantics Modeling and Representation. Wendy Hui Wang CS Department Stevens Institute of Technology Semantics Modeling and Representation Wendy Hui Wang CS Department Stevens Institute of Technology hwang@cs.stevens.edu 1 Consider the following data: 011500 18.66 0 0 62 46.271020111 25.220010 011500

More information

Building A Business Online. A Crash Course in Creating an Online Presence for Your Business

Building A Business Online. A Crash Course in Creating an Online Presence for Your Business A Crash Course in Creating an Online Presence for Your Business A little bit about me Graphic Design graduate from George Brown College Been in industry for the past 15 years Experience with clients ranging

More information

Chapter 6 VIDEO CASES

Chapter 6 VIDEO CASES Chapter 6 Foundations of Business Intelligence: Databases and Information Management VIDEO CASES Case 1a: City of Dubuque Uses Cloud Computing and Sensors to Build a Smarter, Sustainable City Case 1b:

More information

Interoperability and Semantics in Use- Application of UML, XMI and MDA to Precision Medicine and Cancer Research

Interoperability and Semantics in Use- Application of UML, XMI and MDA to Precision Medicine and Cancer Research Interoperability and Semantics in Use- Application of UML, XMI and MDA to Precision Medicine and Cancer Research Ian Fore, D.Phil. Associate Director, Biorepository and Pathology Informatics Senior Program

More information

Overview of Web Mining Techniques and its Application towards Web

Overview of Web Mining Techniques and its Application towards Web Overview of Web Mining Techniques and its Application towards Web *Prof.Pooja Mehta Abstract The World Wide Web (WWW) acts as an interactive and popular way to transfer information. Due to the enormous

More information

Exploring and Exploiting the Biological Maze. Presented By Vidyadhari Edupuganti Advisor Dr. Zoe Lacroix

Exploring and Exploiting the Biological Maze. Presented By Vidyadhari Edupuganti Advisor Dr. Zoe Lacroix Exploring and Exploiting the Biological Maze Presented By Vidyadhari Edupuganti Advisor Dr. Zoe Lacroix Motivation An abundance of biological data sources contain data about scientific entities, such as

More information

Modern Data Warehouse The New Approach to Azure BI

Modern Data Warehouse The New Approach to Azure BI Modern Data Warehouse The New Approach to Azure BI History On-Premise SQL Server Big Data Solutions Technical Barriers Modern Analytics Platform On-Premise SQL Server Big Data Solutions Modern Analytics

More information

LeakDAS Version 4 The Complete Guide

LeakDAS Version 4 The Complete Guide LeakDAS Version 4 The Complete Guide SECTION 4 LEAKDAS MOBILE Second Edition - 2014 Copyright InspectionLogic 2 Table of Contents CONNECTING LEAKDAS MOBILE TO AN ANALYZER VIA BLUETOOTH... 3 Bluetooth Devices...

More information

By Snappy. Advanced SEO

By Snappy. Advanced SEO Advanced SEO 1 Table of Contents Chapter 4 Page Speed 9 Site Architecture 13 Content Marketing 25 Rich Results 01 Page Speed Advanced SEO ebook CHAPTER 1 Page Speed CHAPTER 1 CHAPTER ONE Page Speed ONE

More information

Transformative characteristics and research agenda for the SDI-SKI step change:

Transformative characteristics and research agenda for the SDI-SKI step change: Transformative characteristics and research agenda for the SDI-SKI step change: A Cadastral Case Study Dr Lesley Arnold Research Fellow, Curtin University, CRCSI Director Geospatial Frameworks World Bank

More information

Making the most of metadata with Metadata 2020

Making the most of metadata with Metadata 2020 Making the most of metadata with Metadata 2020 Patricia Feeney, Crossref and Metadata2020 CSE Annual Meeting April 2018 What is Metadata 2020? Metadata 2020 is a collaboration that advocates richer, connected,

More information

Enterprise Knowledge Map: Toward Subject Centric Computing. March 21st, 2007 Dmitry Bogachev

Enterprise Knowledge Map: Toward Subject Centric Computing. March 21st, 2007 Dmitry Bogachev Enterprise Knowledge Map: Toward Subject Centric Computing March 21st, 2007 Dmitry Bogachev Are we ready?...the idea of an application is an artificial one, convenient to the programmer but not to the

More information