PANDA A System for Provenance and Data. Example: Sales Prediction Workflow. Example: Sales Prediction Workflow. Backward Tracing. Item Sales.

Size: px
Start display at page:

Download "PANDA A System for Provenance and Data. Example: Sales Prediction Workflow. Example: Sales Prediction Workflow. Backward Tracing. Item Sales."

Transcription

1 PANDA A System for Provenance and Data Example: Prediction Workflow Union Predict Agg -1 s 2 Example: Prediction Workflow Backward Tracing Union Predict Agg -1 s Name Amelie Jacques Isabelle Name Address ProbDemand Paris, Amelie Cowboy Hat Hat.98 high Paris, Texas Jacques Cowboy Hat.98 Paris, Texas? Isabelle Cowboy Hat.98 Texas?? 3

2 Example: Prediction Workflow Backward Tracing Union Predict Agg -1 s Name Amelie Jacques Isabelle Address Name 65, quai d'orsay, Paris Amelie 39, rue de Bretagne, Jacques Paris 20, rue d Orsel, Paris Isabelle? Address Paris, Paris, Texas Paris, Texas Texas 4 Example: Prediction Workflow... Forward Propagation Backward Tracing Dedup Split Union Predict Agg -1 s Name Amelie Jacques Isabelle Address Name Address 65, quai d'orsay, Amelie Paris Paris, France 39, rue de Bretagne, Jacques Paris Paris, France 20, rue d Orsel, Isabelle Paris Paris, France Beret Demand high 5 Provenance (very general) Where data came from How it was derived, manipulated, combined, processed, How it has evolved over time 6

3 Uses for Provenance (very general) Explanation Sources and evolution of data; deeper understanding Verification Buggy or stale source data? Buggy processing? Auditing Recomputation Propagate changes to affected downstream data 7 Some Application Domains prediction workflows Scientific-data workflows Including human-curated data Including evolving versions of data Any analytic pipeline Extract-transform-load (ETL) processes Information-extraction pipelines 8 Third Time s a Charm Isn t provenance the same thing as lineage? Haven t you worked on it before? Yes 1. Data Warehousing project (long ago) Lineage of relational views in warehouse: formal foundations, system/caching issues Lineage in ETL pipelines: foundations & algorithms 2. Trio project (recently) Data + Uncertainty + Lineage Lineage primarily in support of uncertainty Pretty much 9

4 Panda s Ambitions Previous provenance work tends to be Panda will Either data-based or process-based Capture both: data-oriented workflows Either fine-grained or coarse-grained Cover the spectrum in a unified fashion Focused on modeling and capturing provenance Also support provenance operators and queries Geared to specific functions or domains End with a general-purpose open-source system 10 Remainder of Talk Processing nodes and provenance capture Exploiting provenance Built-in operations Ad-hoc queries Other uses Current research 11 Processing Nodes... Dedup Split Union Predict Agg -1 s Relational Structured, well-understood operations Opaque (or semi-opaque) Incomplete knowledge of internals 12

5 Provenance Capture Model Ultimate model yet to be determined From bipartite graph to Open Provenance Model Goals: Support provenance at spectrum of granularities Mesh data-oriented and process-oriented provenance Composability/transitivity Understandability For remainder of talk: think bipartite graph 13 Provenance Capture Interface... Dedup Split Union Predict Agg -1 s Processing nodes provide provenance information along with output Eager or Lazy 14 Provenance Capture Interface... Dedup Split Union Predict Agg -1 Straightforward provenance s Relational, other Worst well-understood Case: transformations Automatic No access previous to fine-grained work provenance Opaque (or semi-opaque) Provided via provenance interface, or inferred 15

6 Provenance Operations Basic Backward tracing Where did the Cowboy Hat record come from? Forward tracing Which sales predictions did Amelie contribute to? Union Predict Agg -1 s 16 Additional Functionality Forward propagation Update all affected predictions after customers move from Texas to France Union Predict Agg -1 s 17 Additional Functionality Refresh Get latest prediction for Cowboy Hat sales (only) based on modified buying patterns Backward tracing + Forward propagation Union Predict Agg -1 s 18

7 Provenance Queries How many people from each country contributed to the Cowboy Hat prediction? Which customer list contributed the most to the top 100 predicted items? Union Predict Agg -1 s 19 Provenance Queries For a specific customer list, which items have higher demand than for the entire customer set? Which customers have more duplication those processed by or by? Union Predict Agg -1 s 20 Provenance Queries For a specific customer list, which items have higher demand than for the entire customer set? Which customers have more duplication those processed by or by? Query language goals Declarative ad-hoc queries à la database systems Seamlessly combine provenance and data Amenable to optimization Many interesting optimization issues 21

8 Current Research System Workflow infrastructure Basic provenance capture & operations Algorithms For refresh problem (also foundations & implementation) Theory Decision problems: eager/lazy, intermediate data Provenance foundations for semi-opaque transformations 22 Current Research System Workflow infrastructure Basic provenance capture & operations Algorithms For refresh problem See recent (also foundations & implementation) paper Theory Decision problems: eager/lazy, intermediate data Provenance foundations for semi-opaque transformations 23 Panda System (version 0.1) Command-line Client Data s Workflow Provenance Predicate s Panda Layer ite Forward Filter s Graphical Interface File System Arbitrary acyclic workflows processing nodes automatic capture processing nodes one-one, one-many Backward-tracing and forwardpropagation 24

9 Future Work The large majority of this talk Stay tuned 25 Backup slides Optimization Issues Query-driven provenance capture Eager vs. lazy provenance capture Space-time & query-update tradeoffs Processing-node dependent Ex: Dedup, Agg Retain intermediate data sets? Fine-grained vs. coarse-grained Approximate provenance 27

10 Panda System (version 0.1) Command-line Client Graphical Interface Panda Layer Data s Workflow Provenance Predicate s ite Forward Filter s File System 28 System: Key Features Arbitrary acyclic workflows processing nodes Automatic provenance capture processing nodes One-one and one-many transformations only Backward-tracing and forward-propagation 29 Panda System (version 0.1) Command-line Client Graphical Interface Create Data s Workflow Provenance Predicate s Panda Layer ite Forward Filter s File System 30

11 Panda System (version 0.1) Command-line Client Graphical Interface Create Transformatio n Data s Workflow Provenance Predicate s Panda Layer ite Forward Filter s File System 31 Panda System (version 0.1) Command-line Client Graphical Interface Create Transformatio n Data s Workflow Provenance Predicate s Panda Layer ite Forward Filter s File System 32 Panda System (version 0.1) Command-line Client Graphical Interface Refresh Output Element Panda Layer ite Data s Workflow Provenance Predicate s Forward Filter s File System 33

12 Provenance-Based Refresh Problem Exploit provenance to efficiently compute the up-to-date value of selected output elements Challenges: See recent Formally defining provenance paper & refresh When is new value up-to-date version of old one? Provenance as key When can selective refresh be done correctly and efficiently? Properties of transformations, workflows, provenance Algorithms for wide class of transformations and workflows 34

PANDA A System for Provenance and Data. Jennifer Widom Stanford University

PANDA A System for Provenance and Data. Jennifer Widom Stanford University PANDA A System for Provenance and Data Stanford University Example: Sales Prediction Workflow CustList 1 Europe CustList 2... Dedup Split Union Predict ItemAgg CustList n-1 CustList n USA Catalog Items

More information

Robert Ikeda Jennifer Widom. Stanford University

Robert Ikeda Jennifer Widom. Stanford University Jennifer Widom Stanford University Example Dedup Union Predict ItemAgg ItemVolumes 1 Pipeline for sales predictions 2 Example Dedup Union Predict ItemAgg ItemVolumes 1 3 Example Dedup Union Predict ItemAgg

More information

Unified Governance for Amazon S3 Data Lakes

Unified Governance for Amazon S3 Data Lakes WHITEPAPER Unified Governance for Amazon S3 Data Lakes Core Capabilities and Best Practices for Effective Governance Introduction Data governance ensures data quality exists throughout the complete lifecycle

More information

CSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 8 - Data Warehousing and Column Stores

CSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 8 - Data Warehousing and Column Stores CSE 544 Principles of Database Management Systems Alvin Cheung Fall 2015 Lecture 8 - Data Warehousing and Column Stores Announcements Shumo office hours change See website for details HW2 due next Thurs

More information

CSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 10 Parallel Programming Models: Map Reduce and Spark

CSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 10 Parallel Programming Models: Map Reduce and Spark CSE 544 Principles of Database Management Systems Alvin Cheung Fall 2015 Lecture 10 Parallel Programming Models: Map Reduce and Spark Announcements HW2 due this Thursday AWS accounts Any success? Feel

More information

Handout 12 Data Warehousing and Analytics.

Handout 12 Data Warehousing and Analytics. Handout 12 CS-605 Spring 17 Page 1 of 6 Handout 12 Data Warehousing and Analytics. Operational (aka transactional) system a system that is used to run a business in real time, based on current data; also

More information

Best practices for building a Hadoop Data Lake Solution CHARLOTTE HADOOP USER GROUP

Best practices for building a Hadoop Data Lake Solution CHARLOTTE HADOOP USER GROUP Best practices for building a Hadoop Data Lake Solution CHARLOTTE HADOOP USER GROUP 07.29.2015 LANDING STAGING DW Let s start with something basic Is Data Lake a new concept? What is the closest we can

More information

1 Dulcian, Inc., 2001 All rights reserved. Oracle9i Data Warehouse Review. Agenda

1 Dulcian, Inc., 2001 All rights reserved. Oracle9i Data Warehouse Review. Agenda Agenda Oracle9i Warehouse Review Dulcian, Inc. Oracle9i Server OLAP Server Analytical SQL Mining ETL Infrastructure 9i Warehouse Builder Oracle 9i Server Overview E-Business Intelligence Platform 9i Server:

More information

Agile Data Management Challenges in Enterprise Big Data Landscape

Agile Data Management Challenges in Enterprise Big Data Landscape Agile Data Management Challenges in Enterprise Big Data Landscape Eric Simon, SAP Big Data October, 2017 1 Evolution Towards Enterprise Big Data Landscape administrator Data analyst Athena Redshift #123

More information

Perm Integrating Data Provenance Support in Database Systems

Perm Integrating Data Provenance Support in Database Systems Perm Integrating Data Provenance Support in Database Systems Boris Glavic Database Technology Group Department of Informatics University of Zurich glavic@ifi.uzh.ch Gustavo Alonso Systems Group Department

More information

Datameer Big Data Governance. Bringing open-architected and forward-compatible governance controls to Hadoop analytics

Datameer Big Data Governance. Bringing open-architected and forward-compatible governance controls to Hadoop analytics Datameer Big Data Governance Bringing open-architected and forward-compatible governance controls to Hadoop analytics As big data moves toward greater mainstream adoption, its compliance with long-standing

More information

SAMPLE. Preface xi 1 Introducting Microsoft Analysis Services 1

SAMPLE. Preface xi 1 Introducting Microsoft Analysis Services 1 contents Preface xi 1 Introducting Microsoft Analysis Services 1 1.1 What is Analysis Services 2005? 1 Introducing OLAP 2 Introducing Data Mining 4 Overview of SSAS 5 SSAS and Microsoft Business Intelligence

More information

COMPARISON WHITEPAPER. Snowplow Insights VS SaaS load-your-data warehouse providers. We do data collection right.

COMPARISON WHITEPAPER. Snowplow Insights VS SaaS load-your-data warehouse providers. We do data collection right. COMPARISON WHITEPAPER Snowplow Insights VS SaaS load-your-data warehouse providers We do data collection right. Background We were the first company to launch a platform that enabled companies to track

More information

Securing Mediated Trace Access Using Black-box Permutation Analysis

Securing Mediated Trace Access Using Black-box Permutation Analysis Securing Mediated Trace Access Using Black-box Permutation Analysis Prateek Mittal (UIUC), Vern Paxson (UCB/ICSI), Robin Sommer (ICSI/LBNL), Mark Winterrowd(UCB) 1 Thirst for Data Need real world network

More information

Datameer for Data Preparation:

Datameer for Data Preparation: Datameer for Data Preparation: Explore, Profile, Blend, Cleanse, Enrich, Share, Operationalize DATAMEER FOR DATA PREPARATION: EXPLORE, PROFILE, BLEND, CLEANSE, ENRICH, SHARE, OPERATIONALIZE Datameer Datameer

More information

Jennifer Widom. Stanford University

Jennifer Widom. Stanford University Principled Research in Database Systems Stanford University What Academics Give Talks About Other people s papers Thesis and new results Significant research projects The research field BIG VISION Other

More information

The Seven Steps to Implement DataOps

The Seven Steps to Implement DataOps The Seven Steps to Implement Ops ABSTRACT analytics teams challenged by inflexibility and poor quality have found that Ops can address these and many other obstacles. Ops includes tools and process improvements

More information

The Emerging Data Lake IT Strategy

The Emerging Data Lake IT Strategy The Emerging Data Lake IT Strategy An Evolving Approach for Dealing with Big Data & Changing Environments bit.ly/datalake SPEAKERS: Thomas Kelly, Practice Director Cognizant Technology Solutions Sean Martin,

More information

BRMS: Best (and worst) Practices and Real World Examples

BRMS: Best (and worst) Practices and Real World Examples BRMS: Best (and worst) Practices and Real World Examples Edson Tirelli Senior Software Engineer JBoss, by Red Hat 1 It all starts with a question What is the performance of ? 2 Followed

More information

Modernizing Business Intelligence and Analytics

Modernizing Business Intelligence and Analytics Modernizing Business Intelligence and Analytics Justin Erickson Senior Director, Product Management 1 Agenda What benefits can I achieve from modernizing my analytic DB? When and how do I migrate from

More information

Replication in Distributed Systems

Replication in Distributed Systems Replication in Distributed Systems Replication Basics Multiple copies of data kept in different nodes A set of replicas holding copies of a data Nodes can be physically very close or distributed all over

More information

Data Warehousing. Data Warehousing and Mining. Lecture 8. by Hossen Asiful Mustafa

Data Warehousing. Data Warehousing and Mining. Lecture 8. by Hossen Asiful Mustafa Data Warehousing Data Warehousing and Mining Lecture 8 by Hossen Asiful Mustafa Databases Databases are developed on the IDEA that DATA is one of the critical materials of the Information Age Information,

More information

Representation Formalisms for Uncertain Data

Representation Formalisms for Uncertain Data Representation Formalisms for Uncertain Data Jennifer Widom with Anish Das Sarma Omar Benjelloun Alon Halevy and other participants in the Trio Project !! Warning!! This work is preliminary and in flux

More information

Logical Provenance in Data-Oriented Workflows

Logical Provenance in Data-Oriented Workflows Logical Provenance in Data-Oriented Workflows Robert Ikeda 1, Akash Das Sarma 2, Jennifer Widom 3 Stanford University, USA 1 rmikeda@cs.stanford.edu 2 akashds.iitk@gmail.com 3 widom@cs.stanford.edu Abstract

More information

MCSA SQL SERVER 2012

MCSA SQL SERVER 2012 MCSA SQL SERVER 2012 1. Course 10774A: Querying Microsoft SQL Server 2012 Course Outline Module 1: Introduction to Microsoft SQL Server 2012 Introducing Microsoft SQL Server 2012 Getting Started with SQL

More information

Ontology based Model and Procedure Creation for Topic Analysis in Chinese Language

Ontology based Model and Procedure Creation for Topic Analysis in Chinese Language Ontology based Model and Procedure Creation for Topic Analysis in Chinese Language Dong Han and Kilian Stoffel Information Management Institute, University of Neuchâtel Pierre-à-Mazel 7, CH-2000 Neuchâtel,

More information

This tutorial will help computer science graduates to understand the basic-to-advanced concepts related to data warehousing.

This tutorial will help computer science graduates to understand the basic-to-advanced concepts related to data warehousing. About the Tutorial A data warehouse is constructed by integrating data from multiple heterogeneous sources. It supports analytical reporting, structured and/or ad hoc queries and decision making. This

More information

Provenance-aware Secure Networks

Provenance-aware Secure Networks Provenance-aware Secure Networks Wenchao Zhou Eric Cronin Boon Thau Loo University of Pennsylvania Motivation Network accountability Real-time monitoring and anomaly detection Identifying and tracing malicious

More information

Logi Info v12.5 WHAT S NEW

Logi Info v12.5 WHAT S NEW Logi Info v12.5 WHAT S NEW Introduction Logi empowers companies to embed analytics into the fabric of their organizations and products enabling anyone to analyze data, share insights, and make informed

More information

MICROSOFT BUSINESS INTELLIGENCE

MICROSOFT BUSINESS INTELLIGENCE SSIS MICROSOFT BUSINESS INTELLIGENCE 1) Introduction to Integration Services Defining sql server integration services Exploring the need for migrating diverse Data the role of business intelligence (bi)

More information

Data Mining & Data Warehouse

Data Mining & Data Warehouse Data Mining & Data Warehouse Asso. Profe. Dr. Raed Ibraheem Hamed University of Human Development, College of Science and Technology Department of Information Technology 2016 2017 (1) Points to Cover Problem:

More information

An InterSystems Guide to the Data Galaxy. Benjamin De Boe Product Manager

An InterSystems Guide to the Data Galaxy. Benjamin De Boe Product Manager An InterSystems Guide to the Data Galaxy Benjamin De Boe Product Manager Analytics 3 InterSystems Corporation. All rights reserved. 4 InterSystems Corporation. All rights reserved. 5 InterSystems Corporation.

More information

Parallel Programming Patterns Overview CS 472 Concurrent & Parallel Programming University of Evansville

Parallel Programming Patterns Overview CS 472 Concurrent & Parallel Programming University of Evansville Parallel Programming Patterns Overview CS 472 Concurrent & Parallel Programming of Evansville Selection of slides from CIS 410/510 Introduction to Parallel Computing Department of Computer and Information

More information

Chapter 3: Google Penguin, Panda, & Hummingbird

Chapter 3: Google Penguin, Panda, & Hummingbird Chapter 3: Google Penguin, Panda, & Hummingbird Search engine algorithms are based on a simple premise: searchers want an answer to their queries. For any search, there are hundreds or thousands of sites

More information

A Survey on Database Systems Handling Computable and Real-World Dependencies

A Survey on Database Systems Handling Computable and Real-World Dependencies A Survey on Database Systems Handling Computable and Real-World Dependencies Beena J Stuvert 1, Preeja V 2 P.G. Student, Department of CSE, SCTCE, Trivandrum, Kerala, India 1 Asst. Professor, Department

More information

360SCIENCE. use case. Customer Data Matching for Business Intelligence and Finance M&A

360SCIENCE. use case. Customer Data Matching for Business Intelligence and Finance M&A 360SCIENCE use case Customer Data Matching for Business Intelligence and Finance M&A Amir Javidan, SVP Operations PayPal - TIO Networks Overview TIO Networks (a PayPal company) is a financial services

More information

The Hadoop Paradigm & the Need for Dataset Management

The Hadoop Paradigm & the Need for Dataset Management The Hadoop Paradigm & the Need for Dataset Management 1. Hadoop Adoption Hadoop is being adopted rapidly by many different types of enterprises and government entities and it is an extraordinarily complex

More information

Deccansoft Software Services. SSIS Syllabus

Deccansoft Software Services. SSIS Syllabus Overview: SQL Server Integration Services (SSIS) is a component of Microsoft SQL Server database software which can be used to perform a broad range of data migration, data integration and Data Consolidation

More information

Chapter 6. Foundations of Business Intelligence: Databases and Information Management VIDEO CASES

Chapter 6. Foundations of Business Intelligence: Databases and Information Management VIDEO CASES Chapter 6 Foundations of Business Intelligence: Databases and Information Management VIDEO CASES Case 1a: City of Dubuque Uses Cloud Computing and Sensors to Build a Smarter, Sustainable City Case 1b:

More information

Full file at

Full file at Chapter 2 Data Warehousing True-False Questions 1. A real-time, enterprise-level data warehouse combined with a strategy for its use in decision support can leverage data to provide massive financial benefits

More information

FOUR INDEPENDENT TOOLS TO MANAGE COMPLEXITY INHERENT TO DEVELOPING STATE OF THE ART SYSTEMS. DEVELOPER SPECIFIER TESTER

FOUR INDEPENDENT TOOLS TO MANAGE COMPLEXITY INHERENT TO DEVELOPING STATE OF THE ART SYSTEMS. DEVELOPER SPECIFIER TESTER TELECOM AVIONIC SPACE AUTOMOTIVE SEMICONDUCTOR IOT MEDICAL SPECIFIER DEVELOPER FOUR INDEPENDENT TOOLS TO MANAGE COMPLEXITY INHERENT TO DEVELOPING STATE OF THE ART SYSTEMS. TESTER PragmaDev Studio is a

More information

From Single Purpose to Multi Purpose Data Lakes. Thomas Niewel Technical Sales Director DACH Denodo Technologies March, 2019

From Single Purpose to Multi Purpose Data Lakes. Thomas Niewel Technical Sales Director DACH Denodo Technologies March, 2019 From Single Purpose to Multi Purpose Data Lakes Thomas Niewel Technical Sales Director DACH Denodo Technologies March, 2019 Agenda Data Lakes Multiple Purpose Data Lakes Customer Example Demo Takeaways

More information

Data Warehouse Testing. By: Rakesh Kumar Sharma

Data Warehouse Testing. By: Rakesh Kumar Sharma Data Warehouse Testing By: Rakesh Kumar Sharma Index...2 Introduction...3 About Data Warehouse...3 Data Warehouse definition...3 Testing Process for Data warehouse:...3 Requirements Testing :...3 Unit

More information

Data Governance: Data Usage Labeling and Enforcement in Adobe Cloud Platform

Data Governance: Data Usage Labeling and Enforcement in Adobe Cloud Platform Data Governance: Data Usage Labeling and Enforcement in Adobe Cloud Platform Contents What is data governance? Why data governance? Data governance roles. The Adobe Cloud Platform advantage. A framework

More information

Data Immersion : Providing Integrated Data to Infinity Scientists. Kevin Gilpin Principal Engineer Infinity Pharmaceuticals October 19, 2004

Data Immersion : Providing Integrated Data to Infinity Scientists. Kevin Gilpin Principal Engineer Infinity Pharmaceuticals October 19, 2004 Data Immersion : Providing Integrated Data to Infinity Scientists Kevin Gilpin Principal Engineer Infinity Pharmaceuticals October 19, 2004 Informatics at Infinity Understand the nature of the science

More information

SQL Maestro and the ELT Paradigm Shift

SQL Maestro and the ELT Paradigm Shift SQL Maestro and the ELT Paradigm Shift Abstract ELT extract, load, and transform is replacing ETL (extract, transform, load) as the usual method of populating data warehouses. Modern data warehouse appliances

More information

Implementing a Data Warehouse with Microsoft SQL Server 2014

Implementing a Data Warehouse with Microsoft SQL Server 2014 Course 20463D: Implementing a Data Warehouse with Microsoft SQL Server 2014 Page 1 of 5 Implementing a Data Warehouse with Microsoft SQL Server 2014 Course 20463D: 4 days; Instructor-Led Introduction This

More information

Aggregating Knowledge in a Data Warehouse and Multidimensional Analysis

Aggregating Knowledge in a Data Warehouse and Multidimensional Analysis Aggregating Knowledge in a Data Warehouse and Multidimensional Analysis Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd rafal@projectbotticelli.com Objectives Explain the basics of: 1. Data

More information

DC Area Business Objects Crystal User Group (DCABOCUG) Data Warehouse Architectures for Business Intelligence Reporting.

DC Area Business Objects Crystal User Group (DCABOCUG) Data Warehouse Architectures for Business Intelligence Reporting. DC Area Business Objects Crystal User Group (DCABOCUG) Data Warehouse Architectures for Business Intelligence Reporting April 14, 2009 Whitemarsh Information Systems Corporation 2008 Althea Lane Bowie,

More information

CS639: Data Management for Data Science. Lecture 1: Intro to Data Science and Course Overview. Theodoros Rekatsinas

CS639: Data Management for Data Science. Lecture 1: Intro to Data Science and Course Overview. Theodoros Rekatsinas CS639: Data Management for Data Science Lecture 1: Intro to Data Science and Course Overview Theodoros Rekatsinas 1 2 Big science is data driven. 3 Increasingly many companies see themselves as data driven.

More information

CONSOLIDATING RISK MANAGEMENT AND REGULATORY COMPLIANCE APPLICATIONS USING A UNIFIED DATA PLATFORM

CONSOLIDATING RISK MANAGEMENT AND REGULATORY COMPLIANCE APPLICATIONS USING A UNIFIED DATA PLATFORM CONSOLIDATING RISK MANAGEMENT AND REGULATORY COMPLIANCE APPLICATIONS USING A UNIFIED PLATFORM Executive Summary Financial institutions have implemented and continue to implement many disparate applications

More information

CSE 344 Final Review. August 16 th

CSE 344 Final Review. August 16 th CSE 344 Final Review August 16 th Final In class on Friday One sheet of notes, front and back cost formulas also provided Practice exam on web site Good luck! Primary Topics Parallel DBs parallel join

More information

Educating a New Breed of Data Scientists for Scientific Data Management

Educating a New Breed of Data Scientists for Scientific Data Management Educating a New Breed of Data Scientists for Scientific Data Management Jian Qin School of Information Studies Syracuse University Microsoft escience Workshop, Chicago, October 9, 2012 Talk points Data

More information

Contractors Guide to Search Engine Optimization

Contractors Guide to Search Engine Optimization Contractors Guide to Search Engine Optimization CONTENTS What is Search Engine Optimization (SEO)? Why Do Businesses Need SEO (If They Want To Generate Business Online)? Which Search Engines Should You

More information

Decision Support, Data Warehousing, and OLAP

Decision Support, Data Warehousing, and OLAP Decision Support, Data Warehousing, and OLAP : Contents Terminology : OLAP vs. OLTP Data Warehousing Architecture Technologies References 1 Decision Support and OLAP Information technology to help knowledge

More information

Data Mining. Asso. Profe. Dr. Raed Ibraheem Hamed. University of Human Development, College of Science and Technology Department of CS (1)

Data Mining. Asso. Profe. Dr. Raed Ibraheem Hamed. University of Human Development, College of Science and Technology Department of CS (1) Data Mining Asso. Profe. Dr. Raed Ibraheem Hamed University of Human Development, College of Science and Technology Department of CS 2016 2017 (1) Points to Cover Problem: Heterogeneous Information Sources

More information

Data Warehouses Chapter 12. Class 10: Data Warehouses 1

Data Warehouses Chapter 12. Class 10: Data Warehouses 1 Data Warehouses Chapter 12 Class 10: Data Warehouses 1 OLTP vs OLAP Operational Database: a database designed to support the day today transactions of an organization Data Warehouse: historical data is

More information

This tutorial has been prepared for computer science graduates to help them understand the basic-to-advanced concepts related to data mining.

This tutorial has been prepared for computer science graduates to help them understand the basic-to-advanced concepts related to data mining. About the Tutorial Data Mining is defined as the procedure of extracting information from huge sets of data. In other words, we can say that data mining is mining knowledge from data. The tutorial starts

More information

Information Management course

Information Management course Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 05(b) : 23/10/2012 Data Mining: Concepts and Techniques (3 rd ed.) Chapter

More information

Informatica Enterprise Information Catalog

Informatica Enterprise Information Catalog Data Sheet Informatica Enterprise Information Catalog Benefits Automatically catalog and classify all types of data across the enterprise using an AI-powered catalog Identify domains and entities with

More information

Social-Aware Routing in Delay Tolerant Networks

Social-Aware Routing in Delay Tolerant Networks Social-Aware Routing in Delay Tolerant Networks Jie Wu Dept. of Computer and Info. Sciences Temple University Challenged Networks Assumptions in the TCP/IP model are violated DTNs Delay-Tolerant Networks

More information

Pouya Kousha Fall 2018 CSE 5194 Prof. DK Panda

Pouya Kousha Fall 2018 CSE 5194 Prof. DK Panda Pouya Kousha Fall 2018 CSE 5194 Prof. DK Panda 1 Motivation And Intro Programming Model Spark Data Transformation Model Construction Model Training Model Inference Execution Model Data Parallel Training

More information

MarkLogic 8 Overview of Key Features COPYRIGHT 2014 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.

MarkLogic 8 Overview of Key Features COPYRIGHT 2014 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. MarkLogic 8 Overview of Key Features Enterprise NoSQL Database Platform Flexible Data Model Store and manage JSON, XML, RDF, and Geospatial data with a documentcentric, schemaagnostic database Search and

More information

Data Warehousing & Mining Techniques

Data Warehousing & Mining Techniques Data Warehousing & Mining Techniques Wolf-Tilo Balke Kinda El Maarry Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de 2. Summary Last week: What is a Data

More information

SOLUTION OVERVIEW: DATA CATALOGS FOR DATA RATIONALIZATION

SOLUTION OVERVIEW: DATA CATALOGS FOR DATA RATIONALIZATION SOLUTION OVERVIEW: DATA CATALOGS FOR DATA RATIONALIZATION Introduction How big of a problem is data redundancy? If you are like most companies, it is much bigger than you would care to admit. For most

More information

Data Mining Concepts & Techniques

Data Mining Concepts & Techniques Data Mining Concepts & Techniques Lecture No. 01 Databases, Data warehouse Naeem Ahmed Email: naeemmahoto@gmail.com Department of Software Engineering Mehran Univeristy of Engineering and Technology Jamshoro

More information

What is Data Warehouse like

What is Data Warehouse like What is Data Warehouse like in the Big Data Era? Sales (Asia) Data Warehouse Sales (US) ETL ETL Collects and organizes historical data from multiple sources Inventory Advertising ETL ETL So far Ø Star

More information

Multi-event IDS Categories. Introduction to Misuse Intrusion Detection Systems (IDS) Formal Specification of Intrusion Signatures and Detection Rules

Multi-event IDS Categories. Introduction to Misuse Intrusion Detection Systems (IDS) Formal Specification of Intrusion Signatures and Detection Rules Formal Specification of Intrusion Signatures and Detection Rules By Jean-Philippe Pouzol and Mireille Ducassé 15 th IEEE Computer Security Foundations Workshop 2002 Presented by Brian Kellogg CSE914: Formal

More information

Software Reuse and Component-Based Software Engineering

Software Reuse and Component-Based Software Engineering Software Reuse and Component-Based Software Engineering Minsoo Ryu Hanyang University msryu@hanyang.ac.kr Contents Software Reuse Components CBSE (Component-Based Software Engineering) Domain Engineering

More information

2. Summary. 2.1 Basic Architecture. 2. Architecture. 2.1 Staging Area. 2.1 Operational Data Store. Last week: Architecture and Data model

2. Summary. 2.1 Basic Architecture. 2. Architecture. 2.1 Staging Area. 2.1 Operational Data Store. Last week: Architecture and Data model 2. Summary Data Warehousing & Mining Techniques Wolf-Tilo Balke Kinda El Maarry Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de Last week: What is a Data

More information

Categorical databases

Categorical databases Categorical databases David I. Spivak dspivak@math.mit.edu Mathematics Department Massachusetts Institute of Technology Presented on 2017/03/29 at the Broad Institute David I. Spivak (MIT) Categorical

More information

CSSE 490 Model-Based Software Engineering: Transformation Systems

CSSE 490 Model-Based Software Engineering: Transformation Systems CSSE 490 Model-Based Software Engineering: Transformation Systems Shawn Bohner Office: Moench Room F212 Phone: (812) 877-8685 Email: bohner@rose-hulman.edu Plan for Today FacePamphlet Demo and Discussion

More information

Requirements and Design Overview

Requirements and Design Overview Requirements and Design Overview Robert B. France Colorado State University Robert B. France O-1 Why do we model? Enhance understanding and communication Provide structure for problem solving Furnish abstractions

More information

The Six Principles of BW Data Validation

The Six Principles of BW Data Validation The Problem The Six Principles of BW Data Validation Users do not trust the data in your BW system. The Cause By their nature, data warehouses store large volumes of data. For analytical purposes, the

More information

Data Warehousing and Data Mining. Announcements (December 1) Data integration. CPS 116 Introduction to Database Systems

Data Warehousing and Data Mining. Announcements (December 1) Data integration. CPS 116 Introduction to Database Systems Data Warehousing and Data Mining CPS 116 Introduction to Database Systems Announcements (December 1) 2 Homework #4 due today Sample solution available Thursday Course project demo period has begun! Check

More information

Designing Data Warehouses. Data Warehousing Design. Designing Data Warehouses. Designing Data Warehouses

Designing Data Warehouses. Data Warehousing Design. Designing Data Warehouses. Designing Data Warehouses Designing Data Warehouses To begin a data warehouse project, need to find answers for questions such as: Data Warehousing Design Which user requirements are most important and which data should be considered

More information

Introduction to Data Science

Introduction to Data Science UNIT I INTRODUCTION TO DATA SCIENCE Syllabus Introduction of Data Science Basic Data Analytics using R R Graphical User Interfaces Data Import and Export Attribute and Data Types Descriptive Statistics

More information

Data Governance Data Usage Labeling and Enforcement in Adobe Experience Platform

Data Governance Data Usage Labeling and Enforcement in Adobe Experience Platform Contents What is data governance? Why data governance? Data governance roles The Adobe Experience Platform advantage A framework for data governance Data usage patterns Data governance in action Conclusion

More information

Best ETL Design Practices. Helpful coding insights in SAS DI studio. Techniques and implementation using the Key transformations in SAS DI studio.

Best ETL Design Practices. Helpful coding insights in SAS DI studio. Techniques and implementation using the Key transformations in SAS DI studio. SESUG Paper SD-185-2017 Guide to ETL Best Practices in SAS Data Integration Studio Sai S Potluri, Synectics for Management Decisions; Ananth Numburi, Synectics for Management Decisions; ABSTRACT This Paper

More information

The AXML Artifact Model

The AXML Artifact Model 1 The AXML Artifact Model Serge Abiteboul INRIA Saclay & ENS Cachan & U. Paris Sud [Time09] Context: Data management in P2P systems 2 Large number of distributed peers Peers are autonomous Large volume

More information

Enabling Secure Hadoop Environments

Enabling Secure Hadoop Environments Enabling Secure Hadoop Environments Fred Koopmans Sr. Director of Product Management 1 The future of government is data management What s your strategy? 2 Cloudera s Enterprise Data Hub makes it possible

More information

Master's Thesis Defense

Master's Thesis Defense Master's Thesis Defense Development of a Data Management Architecture for the Support of Collaborative Computational Biology Lance Feagan 1 Acknowledgements Ed Komp My mentor. Terry Clark My advisor. Victor

More information

Implementing Trusted Digital Repositories

Implementing Trusted Digital Repositories Implementing Trusted Digital Repositories Reagan W. Moore, Arcot Rajasekar, Richard Marciano San Diego Supercomputer Center 9500 Gilman Drive, La Jolla, CA 92093-0505 {moore, sekar, marciano}@sdsc.edu

More information

Toward Models for Forensic Analysis

Toward Models for Forensic Analysis Toward Models for Forensic Analysis Sean Peisert (UC San Diego) Matt Bishop (UC Davis) Sid Karin (UC San Diego) Keith Marzullo (UC San Diego) SADFE 07 April 11, 2007 1 What is Forensic Analysis? Forensic

More information

An Introduction to Parallel Programming

An Introduction to Parallel Programming An Introduction to Parallel Programming Ing. Andrea Marongiu (a.marongiu@unibo.it) Includes slides from Multicore Programming Primer course at Massachusetts Institute of Technology (MIT) by Prof. SamanAmarasinghe

More information

Analytics Fundamentals by Mark Peco

Analytics Fundamentals by Mark Peco Analytics Fundamentals by Mark Peco All rights reserved. Reproduction in whole or part prohibited except by written permission. Product and company names mentioned herein may be trademarks of their respective

More information

Data Mining. Ryan Benton Center for Advanced Computer Studies University of Louisiana at Lafayette Lafayette, La., USA.

Data Mining. Ryan Benton Center for Advanced Computer Studies University of Louisiana at Lafayette Lafayette, La., USA. Data Mining Ryan Benton Center for Advanced Computer Studies University of Louisiana at Lafayette Lafayette, La., USA January 13, 2011 Important Note! This presentation was obtained from Dr. Vijay Raghavan

More information

Fig 1.2: Relationship between DW, ODS and OLTP Systems

Fig 1.2: Relationship between DW, ODS and OLTP Systems 1.4 DATA WAREHOUSES Data warehousing is a process for assembling and managing data from various sources for the purpose of gaining a single detailed view of an enterprise. Although there are several definitions

More information

Previous Webinar. Access to the video and slides:

Previous Webinar. Access to the video and slides: Previous Webinar Access to the video and slides: http://www.prometheusresearch.com/webinar-bridging-the-clinic-and-research-divide/ Mission: Empower teams to securely harness complex, sensitive data in

More information

Tackling network heterogeneity head-on

Tackling network heterogeneity head-on Tackling network heterogeneity head-on Timothy Roscoe Networks and Operating Systems Group ETH Zürich ETH Zürich Scene setting Different dimension of MAD networks: Independent evolution Arbitrary policies

More information

Top 20 Data Quality Solutions for Data Science

Top 20 Data Quality Solutions for Data Science Top 20 Data Quality Solutions for Data Science Data Science & Business Analytics Meetup Boulder, CO 2014-12-03 Ken Farmer DQ Problems for Data Science Loom Large & Frequently 4000000 Strikingly visible

More information

Enterprise Architect. User Guide Series. Time Aware Models. Author: Sparx Systems. Date: 30/06/2017. Version: 1.0 CREATED WITH

Enterprise Architect. User Guide Series. Time Aware Models. Author: Sparx Systems. Date: 30/06/2017. Version: 1.0 CREATED WITH Enterprise Architect User Guide Series Time Aware Models Author: Sparx Systems Date: 30/06/2017 Version: 1.0 CREATED WITH Table of Contents Time Aware Models 3 Clone Structure as New Version 5 Clone Diagram

More information

Minsoo Ryu. College of Information and Communications Hanyang University.

Minsoo Ryu. College of Information and Communications Hanyang University. Software Reuse and Component-Based Software Engineering Minsoo Ryu College of Information and Communications Hanyang University msryu@hanyang.ac.kr Software Reuse Contents Components CBSE (Component-Based

More information

Computer-based Tracking Protocols: Improving Communication between Databases

Computer-based Tracking Protocols: Improving Communication between Databases Computer-based Tracking Protocols: Improving Communication between Databases Amol Deshpande Database Group Department of Computer Science University of Maryland Overview Food tracking and traceability

More information

Data Warehousing 3. ICS 421 Spring Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa

Data Warehousing 3. ICS 421 Spring Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa ICS 421 Spring 2010 Data Warehousing 3 Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa 4/1/2010 Lipyeow Lim -- University of Hawaii at Manoa 1 Implementation

More information

The future of Subsurface Data Management? Building a Data Science Lab Data Lake Jane McConnell, Practice Partner Oil and Gas, Teradata DEJ KL, 3

The future of Subsurface Data Management? Building a Data Science Lab Data Lake Jane McConnell, Practice Partner Oil and Gas, Teradata DEJ KL, 3 The future of Subsurface Data Management? Building a Data Science Lab Data Lake Jane McConnell, Practice Partner Oil and Gas, Teradata DEJ KL, 3 October 2017 Analytics and AI is gaining ground in Subsurface

More information

SQL Server Evolution. SQL 2016 new innovations. Trond Brande

SQL Server Evolution. SQL 2016 new innovations. Trond Brande SQL Server Evolution SQL 2016 new innovations Trond Brande SQL Server 2016 Editions Enterprise Express SMALL-SCALE DATABASES Development and management tools Easy backup and restore to Microsoft Azure

More information

Updates through Views

Updates through Views 1 of 6 15 giu 2010 00:16 Encyclopedia of Database Systems Springer Science+Business Media, LLC 2009 10.1007/978-0-387-39940-9_847 LING LIU and M. TAMER ÖZSU Updates through Views Yannis Velegrakis 1 (1)

More information

What s New at AWS? looking at just a few new things for Enterprise. Philipp Behre, Enterprise Solutions Architect, Amazon Web Services

What s New at AWS? looking at just a few new things for Enterprise. Philipp Behre, Enterprise Solutions Architect, Amazon Web Services What s New at AWS? looking at just a few new things for Enterprise Philipp Behre, Enterprise Solutions Architect, Amazon Web Services 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

More information

Outline. Managing Information Resources. Concepts and Definitions. Introduction. Chapter 7

Outline. Managing Information Resources. Concepts and Definitions. Introduction. Chapter 7 Outline Managing Information Resources Chapter 7 Introduction Managing Data The Three-Level Database Model Four Data Models Getting Corporate Data into Shape Managing Information Four Types of Information

More information