HUMIT Interactive Data Integration in a Data Lake System for the Life Sciences
1 HUMIT Interactive Data Integration in a Data Lake System for the Life Sciences
PD Dr. Christoph Quix
Fraunhofer-Institut für Angewandte Informationstechnik FIT, Life Science Informatics
Head of Department, High Content Analysis & Information-intensive Instruments
christoph.quix@fit.fraunhofer.de
Interim Professor of Data Science, Head of the Research Group Big Data & Model Management, RWTH Aachen University
2 Funding period: , funded by BMBF Use Case Partners Regulation requirements & Quality assurance Coordinator / Technology Partner
3 High Content Screening Automatic analysis by substructure Systematic variation in parameters, e.g. by compound or sequence
4 Big Data in Life Sciences
- High-Content Analysis
- Systematic analysis of huge image sets
- Automated image analysis
- Metadata extraction from multimedia data
- Data management, not only in life sciences
- Scientific data management
- Workflow integration
5 Zeta: Application Specific Platform Plugins Plugin Toolbar View Component Overlays Image Galleries Directory Tree Time Line Animation
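6 Example Configuration: Cell-Cycle Analysis
Registration, FB Detection, Segmentation, Tracking, Classification, Evaluation
Result: Cell-ID, Position[x,y], Mother-ID, Time-ID, Size, MeanIntensity, TotalIntensity, G phase, Mitosis, ImageName, Well, Site, Wavelength
- Cell-ID 1: Position 29,35; Mother-ID -1; Time-ID t; ImageName SR100702Live_G12_w1_s1_t171.tif; Well G12; Site s1; Wavelength w1
- Cell-ID 2: Position 44,82; Mother-ID -1; Time-ID t; ImageName SR100702Live_G12_w1_s1_t171.tif; Well G12; Site s1; Wavelength w1
- Cell-ID 3: Position 63,465; Mother-ID -1; Time-ID t; ImageName SR100702Live_G12_w1_s1_t171.tif; Well G12; Site s1; Wavelength w1
- Cell-ID 4: Position 97,363; Mother-ID -1; Time-ID t; ImageName SR100702Live_G12_w1_s1_t171.tif; Well G12; Site s1; Wavelength w1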
7 Metadata and data are managed in files and file names!
- Assay: cell cycle inhibition
- trichostatin A is an inhibitor of Histone deacetylase 1
- File name: TSA_HDAC1_2.png
- Table name
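As a minimal sketch of what recovering metadata from a file name could look like, the Python snippet below parses an image name of the kind shown in the cell-cycle example (read here as SR100702Live_G12_w1_s1_t171.tif). The naming pattern, the field names, and the function `parse_image_name` are assumptions inferred from that single example; they are not part of HUMIT or Zeta.

```python
import re
from typing import Optional

# Assumed naming convention, inferred from one example name only:
#   <plate>_<well>_w<wavelength>_s<site>_t<timepoint>.<extension>
NAME_PATTERN = re.compile(
    r"^(?P<plate>[^_]+)_(?P<well>[A-Z]\d+)_(?P<wavelength>w\d+)"
    r"_(?P<site>s\d+)_t(?P<time>\d+)\.(?P<ext>\w+)$"
)


def parse_image_name(name: str) -> Optional[dict]:
    """Return the metadata encoded in a file name, or None if it does not fit."""
    match = NAME_PATTERN.match(name)
    if match is None:
        return None
    metadata = match.groupdict()
    metadata["time"] = int(metadata["time"])  # time point as a number
    return metadata


if __name__ == "__main__":
    print(parse_image_name("SR100702Live_G12_w1_s1_t171.tif"))
    # A name that follows a different convention is rejected, not guessed at.
    print(parse_image_name("TSA_HDAC1_2.png"))
```

Names that follow another convention, such as TSA_HDAC1_2.png, return `None` here, which illustrates the problem the slide points at: each naming scheme needs its own hand-written rule before the metadata becomes usable.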
8 Agenda
1. Motivation: Data Management in the Life Sciences
2. Requirements for Scientific Data Management
3. Data Lake Architecture in HUMIT
4. Summary
9 Scientific Data
- Data collected during the work of scientists: measurement results, test data, reports, analyses
- Various file formats: Excel, CSV, images/audio/video, text, XML, proprietary formats
- Heterogeneous semantics: test vs. result data, own vs. other data, timeframe
- Idea, Proposal, Experiment, Result, Report
10 Heterogeneity is unavoidable
- Islands of data in separate projects and applications
- Integrated data analysis requires huge manual effort
- Traceability and reproducibility are difficult because of manual processes
- Goal: from isolated data islands to (partially) integrated data landscapes
11 Requirements for Scientific Data Management
- Integration: combined analysis of different data sources
- Traceability: reproducibility of research results
- Evidence in lawsuits: IP protection
- Reusability: accessibility for future usage
- Flexibility: adapt to changes in the research processes
- Documentation, semantics, models
13 Data Lakes
"If you think of a datamart as a store of bottled water, cleansed and packaged and structured for easy consumption, the data lake is a large body of water in a more natural state. The contents of the data lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples." James Dixon (Pentaho)
- Maintain source data in its original structure
- Postpone (semantic) integration tasks
- Manage metadata about sources, mappings, and data quality
- Provide interfaces for uniform querying and interactive exploration of the data lake
14 HUMIT: Data Integration for High-Content Analysis
- Integration based on the pay-as-you-go idea
- Incremental extraction and integration of data
- Interactive tools for exploration and querying of data, definition of semantic relationships and mappings, and data visualization
- Separation of data storage and data processing/transformation: raw data is stored with metadata in a data lake and is thereby immediately available for data analysis; data integration and mapping are done later (ELT instead of ETL)
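The ELT idea on this slide (load raw data with metadata first, transform only when someone reads it) can be sketched in a few lines of Python. The class `TinyLake`, its methods, and the metadata fields are hypothetical names chosen for illustration; this is a toy in-memory stand-in, not the HUMIT implementation.

```python
import csv
import hashlib
import io
from datetime import datetime, timezone


class TinyLake:
    """Toy in-memory data lake: raw bytes are kept untouched next to metadata."""

    def __init__(self):
        self.raw = {}       # object id -> original bytes, never modified
        self.metadata = {}  # object id -> descriptive metadata

    def load(self, name: str, payload: bytes, **annotations) -> str:
        """E and L of ELT: store the payload as-is and record basic metadata."""
        object_id = hashlib.sha256(payload).hexdigest()[:12]
        self.raw[object_id] = payload
        self.metadata[object_id] = {
            "name": name,
            "size_bytes": len(payload),
            "loaded_at": datetime.now(timezone.utc).isoformat(),
            **annotations,
        }
        return object_id

    def read_csv(self, object_id: str, delimiter: str = ",") -> list:
        """T of ELT: interpret the raw bytes as CSV only at read time."""
        text = self.raw[object_id].decode("utf-8")
        return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


if __name__ == "__main__":
    lake = TinyLake()
    oid = lake.load("cells.csv", b"cell_id,size\n1,120\n2,95\n", assay="cell cycle")
    print(lake.metadata[oid]["name"], lake.read_csv(oid))
```

Because the original bytes are never rewritten, a later and better transformation can be applied to the same object without reloading it from the source.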
15 Proposal for a Data Lake Architecture
16 Ingestion Layer
- Low effort for loading data (ELT instead of ETL)
- Support for the extraction of metadata and data
  - Degree of automation (especially for metadata extraction)?
  - Schema extraction for semi-structured data (JSON, XML)
  - Schema-on-read
  - Lazy loading
- Data quality control
  - Specify minimal requirements for ingested data
  - Complement and annotate extracted metadata
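Schema extraction for semi-structured data can be illustrated with a small Python sketch that walks parsed JSON documents and records which value types occur at each path. The function `infer_schema` and the path notation (dots for nested keys, `[]` for list items) are assumptions made for this example; the slides do not describe the extraction algorithm used in HUMIT.

```python
import json
from collections import defaultdict


def infer_schema(documents):
    """Map each attribute path to the sorted list of value types seen there."""
    schema = defaultdict(set)

    def walk(value, path):
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, f"{path}.{key}" if path else key)
        elif isinstance(value, list):
            for item in value:
                walk(item, path + "[]")
        else:
            # Leaf value: remember its Python type name for this path.
            schema[path].add(type(value).__name__)

    for document in documents:
        walk(document, "")
    return {path: sorted(types) for path, types in schema.items()}


if __name__ == "__main__":
    docs = [
        json.loads('{"cell": {"id": 1, "pos": [29, 35]}, "well": "G12"}'),
        json.loads('{"cell": {"id": "c2"}, "well": "G12", "mitosis": true}'),
    ]
    for path, types in sorted(infer_schema(docs).items()):
        print(path, types)
```

Paths with more than one type (here `cell.id`) are the kind of finding that would be recorded as metadata and left for later integration, in line with schema-on-read.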
17 Storage Layer
- Choice of data storage: HDFS? NoSQL? RDBMS?
  - A hybrid solution is required, but with a uniform interface for data access and a uniform query language (query rewriting and data transformation)
- Metadata repository and metadata model
  - Manage schemata, mappings, data quality information, and data lineage
  - Close integration of data and metadata
- Data quality management
  - Monitor data quality of data stores
  - Semantic enrichment of metadata
  - Prepare data marts for specific data sets
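A metadata repository that tracks schemata, quality information, and lineage across different stores could be modelled roughly as follows. `DatasetEntry`, `MetadataRepository`, and all field names are hypothetical; the sketch only shows how lineage can be answered from recorded "derived from" links, not how HUMIT models its metadata.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    name: str
    store: str                                        # e.g. "hdfs", "nosql", "rdbms"
    schema: dict = field(default_factory=dict)        # attribute -> type
    derived_from: list = field(default_factory=list)  # direct upstream datasets
    quality: dict = field(default_factory=dict)       # metric -> value


class MetadataRepository:
    def __init__(self):
        self._entries = {}

    def register(self, entry: DatasetEntry) -> None:
        self._entries[entry.name] = entry

    def lineage(self, name: str) -> list:
        """Return all upstream dataset names, nearest ancestors first."""
        seen, order = set(), []
        queue = list(self._entries[name].derived_from)
        while queue:
            parent = queue.pop(0)
            if parent in seen:
                continue
            seen.add(parent)
            order.append(parent)
            if parent in self._entries:
                queue.extend(self._entries[parent].derived_from)
        return order


if __name__ == "__main__":
    repo = MetadataRepository()
    repo.register(DatasetEntry("raw_images", "hdfs"))
    repo.register(DatasetEntry("segmentation", "nosql", derived_from=["raw_images"]))
    repo.register(DatasetEntry("cell_cycle_mart", "rdbms",
                               derived_from=["segmentation", "compounds"]))
    print(repo.lineage("cell_cycle_mart"))
```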
18 Interaction Layer
- Explore & search in the data repository
  - Fewer direct queries (SQL), more Google-like queries
  - Query for metadata and data
- User interaction should be captured as metadata
  - Definition of exact queries
  - Identification of new data relationships
- Metadata & data quality management
  - Exploration of the data lake (what kind of information is available)
  - Capture semantic annotations of users
  - Provide data quality information to users & collect feedback
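A keyword search that covers both metadata and data, and that records each query as metadata, might look like the Python sketch below. The catalog layout, the function `keyword_search`, and the simple substring scoring are assumptions made for illustration; a real system would use a proper search index.

```python
def keyword_search(catalog: dict, query: str, query_log: list) -> list:
    """Rank datasets by how many query terms occur in their metadata or data."""
    query_log.append(query)  # user interaction captured as metadata
    terms = query.lower().split()
    hits = []
    for name, entry in catalog.items():
        parts = [name]
        parts += [str(value) for value in entry["metadata"].values()]
        parts += [str(value) for row in entry["rows"] for value in row.values()]
        text = " ".join(parts).lower()
        score = sum(term in text for term in terms)
        if score:
            hits.append((score, name))
    # Highest score first; ties are broken alphabetically for stable output.
    return [name for score, name in sorted(hits, key=lambda hit: (-hit[0], hit[1]))]


if __name__ == "__main__":
    catalog = {
        "TSA_HDAC1_2": {
            "metadata": {"assay": "cell cycle inhibition", "compound": "trichostatin A"},
            "rows": [{"well": "G12"}],
        },
        "plate_layout": {
            "metadata": {"description": "well layout"},
            "rows": [{"well": "G12"}, {"well": "A01"}],
        },
    }
    log = []
    print(keyword_search(catalog, "trichostatin G12", log))
    print(log)
```

The recorded queries could later serve as input for identifying data relationships that users look for repeatedly.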
19 Data Quality
- Comprehensive data quality management is necessary for a data lake
- Data quality management is more than just data cleaning: goals, metrics, measurements, analysis, improvements
- Data quality needs to be checked already for ingested data
  - Minimal requirements for data sources (e.g., provide metadata or certain data items such as identifiers)
- Manage data quality information in the metadata repository and make it available to data users
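Checking minimal requirements at ingestion time can be sketched as a small validation function that returns a quality report instead of silently rejecting data. The function `check_minimal_requirements`, the report layout, and the completeness metric are assumptions for this example, not the checks defined in HUMIT.

```python
def check_minimal_requirements(rows, metadata, required_metadata, required_columns):
    """Check required metadata keys and identifier columns; return a quality report."""
    issues = []
    for key in required_metadata:
        if not metadata.get(key):
            issues.append(f"missing metadata: {key}")

    completeness = {}
    for column in required_columns:
        filled = sum(1 for row in rows if row.get(column) not in (None, ""))
        completeness[column] = filled / len(rows) if rows else 0.0
        if completeness[column] < 1.0:
            issues.append(f"incomplete column: {column}")

    # The report itself is quality metadata that can be stored with the dataset.
    return {"accepted": not issues, "issues": issues, "completeness": completeness}


if __name__ == "__main__":
    rows = [{"cell_id": "1", "size": "120"}, {"cell_id": "", "size": "95"}]
    report = check_minimal_requirements(
        rows,
        metadata={"assay": "cell cycle"},
        required_metadata=["assay", "owner"],
        required_columns=["cell_id"],
    )
    print(report)
```

Storing the report next to the data, even when the data is accepted, matches the slide's point that quality information should be available to data users.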
21 Summary
- Data management in the life sciences is often file-based, which limits reuse and reproducibility of experiments
- Making the data available in a data lake system provides query, search, and exploration features to users
- The data lake is still an early-stage concept and requires more research
- Within the HUMIT project, we are developing several components and the framework for a data lake system
  - Metadata extraction (CAiSE Forum 2016)
  - Constance Data Lake Framework (SIGMOD 2016)
  - Data quality management (QDB workshop at VLDB 2016)
  - User interaction and data visualization