PANDA: A System for Provenance and Data. Example: Sales Prediction Workflow. Backward Tracing. Item Sales.
Transcription
PANDA: A System for Provenance and Data

Slide 2: Example: Sales Prediction Workflow
  [Workflow diagram: Union, Predict, Agg]

Slide 3: Example: Sales Prediction Workflow (Backward Tracing)
  [Workflow diagram: Union, Predict, Agg]
  Output record (Item, Prob, Demand):
    Cowboy Hat, .98, high
  Traced back to customers (Name, Address):
    Amelie, Paris, Texas
    Jacques, Paris, Texas
    Isabelle, Paris, Texas
  Paris, Texas??
Slide 4: Example: Sales Prediction Workflow (Backward Tracing)
  [Workflow diagram: Union, Predict, Agg]
  Customers after processing (Name, Address):
    Amelie, Paris, Texas
    Jacques, Paris, Texas
    Isabelle, Paris, Texas
  Source records (Name, Address):
    Amelie, 65, quai d'Orsay, Paris
    Jacques, 39, rue de Bretagne, Paris
    Isabelle, 20, rue d'Orsel, Paris
  ?

Slide 5: Example: Sales Prediction Workflow (Backward Tracing, Forward Propagation)
  [Workflow diagram: ..., Dedup, Split, Union, Predict, Agg]
  Source records (Name, Address):
    Amelie, 65, quai d'Orsay, Paris
    Jacques, 39, rue de Bretagne, Paris
    Isabelle, 20, rue d'Orsel, Paris
  Customers after processing (Name, Address):
    Amelie, Paris, France
    Jacques, Paris, France
    Isabelle, Paris, France
  Output record (Item, Demand):
    Beret, high

Slide 6: Provenance (very general)
  Where data came from
  How it was derived, manipulated, combined, processed, ...
  How it has evolved over time
Slide 7: Uses for Provenance (very general)
  Explanation
    Sources and evolution of data; deeper understanding
  Verification
    Buggy or stale source data? Buggy processing?
    Auditing
  Recomputation
    Propagate changes to affected downstream data

Slide 8: Some Application Domains
  Prediction workflows
  Scientific-data workflows
    Including human-curated data
    Including evolving versions of data
  Any analytic pipeline
  Extract-transform-load (ETL) processes
  Information-extraction pipelines

Slide 9: Third Time's a Charm
  Isn't provenance the same thing as lineage? Pretty much
  Haven't you worked on it before? Yes
  1. Data Warehousing project (long ago)
     Lineage of relational views in warehouse: formal foundations, system/caching issues
     Lineage in ETL pipelines: foundations & algorithms
  2. Trio project (recently)
     Data + Uncertainty + Lineage
     Lineage primarily in support of uncertainty
Slide 10: Panda's Ambitions
  Previous provenance work tends to be          | Panda will
  Either data-based or process-based            | Capture both: data-oriented workflows
  Either fine-grained or coarse-grained         | Cover the spectrum in a unified fashion
  Focused on modeling and capturing provenance  | Also support provenance operators and queries
  Geared to specific functions or domains       | End with a general-purpose open-source system

Slide 11: Remainder of Talk
  Processing nodes and provenance capture
  Exploiting provenance
    Built-in operations
    Ad-hoc queries
    Other uses
  Current research

Slide 12: Processing Nodes
  [Workflow diagram: ..., Dedup, Split, Union, Predict, Agg]
  Relational
    Structured, well-understood operations
  Opaque (or semi-opaque)
    Incomplete knowledge of internals
Slide 13: Provenance Capture Model
  Ultimate model yet to be determined
    From bipartite graph to Open Provenance Model
  Goals:
    Support provenance at spectrum of granularities
    Mesh data-oriented and process-oriented provenance
    Composability/transitivity
    Understandability
  For remainder of talk: think bipartite graph

Slide 14: Provenance Capture Interface
  [Workflow diagram: ..., Dedup, Split, Union, Predict, Agg]
  Processing nodes provide provenance information along with output
  Eager or Lazy

Slide 15: Provenance Capture Interface
  [Workflow diagram: ..., Dedup, Split, Union, Predict, Agg]
  Straightforward provenance
  Relational, other well-understood transformations
    Automatic (previous work)
  Opaque (or semi-opaque)
    Provided via provenance interface, or inferred
  Worst case: no access to fine-grained provenance
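
As a rough illustration of the "think bipartite graph" model and of a processing node that provides provenance along with its output, here is a minimal Python sketch. The class `ProvenanceGraph`, the function `dedup_node`, and the element ids are hypothetical names invented for this example; they are not Panda's actual interface. Provenance is captured eagerly, at the moment the node produces its output.

```python
from collections import defaultdict


class ProvenanceGraph:
    """Bipartite provenance: edges link a node's output elements to its input elements."""

    def __init__(self):
        self.back = defaultdict(set)  # output id -> ids of the inputs it was derived from
        self.fwd = defaultdict(set)   # input id -> ids of the outputs derived from it

    def link(self, out_id, in_id):
        self.back[out_id].add(in_id)
        self.fwd[in_id].add(out_id)


def dedup_node(records, graph):
    """Hypothetical Dedup node: one output per distinct name, provenance recorded eagerly."""
    seen = {}
    for rec_id, rec in records.items():
        # All records sharing a name map to the same output element.
        out_id = seen.setdefault(rec["name"], "dedup:" + rec["name"])
        graph.link(out_id, rec_id)
    return {out_id: {"name": name} for name, out_id in seen.items()}


graph = ProvenanceGraph()
customers = {
    "c1": {"name": "Amelie"},
    "c2": {"name": "Jacques"},
    "c3": {"name": "Amelie"},  # duplicate of c1
}
deduped = dedup_node(customers, graph)
print(sorted(deduped))
print(sorted(graph.back["dedup:Amelie"]))
```

A lazy variant would skip the `graph.link` calls and instead recompute the input set on demand when a trace is requested; which choice is better is one of the eager/lazy tradeoffs the talk lists as open.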
Slide 16: Provenance Operations
  [Workflow diagram: Union, Predict, Agg]
  Basic
    Backward tracing
      Where did the Cowboy Hat record come from?
    Forward tracing
      Which sales predictions did Amelie contribute to?

Slide 17: Additional Functionality
  [Workflow diagram: Union, Predict, Agg]
  Forward propagation
    Update all affected predictions after customers move from Texas to France

Slide 18: Additional Functionality
  [Workflow diagram: Union, Predict, Agg]
  Refresh
    Get latest prediction for Cowboy Hat sales (only) based on modified buying patterns
    Backward tracing + Forward propagation
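
The two basic operations above can be sketched as transitive traversals over recorded provenance edges. This is only an illustration under assumptions: the element ids (`custlist:...`, `union:...`, `predict:...`, `agg:...`), the fourth customer "Pierre", and the helper names are hypothetical, and the edges are hand-written rather than captured by a real workflow.

```python
from collections import defaultdict

back = defaultdict(set)  # element -> elements it was derived from
fwd = defaultdict(set)   # element -> elements derived from it


def link(out_id, in_id):
    back[out_id].add(in_id)
    fwd[in_id].add(out_id)


# Hypothetical provenance for three customers who contribute to the Cowboy Hat prediction.
for customer in ("Amelie", "Jacques", "Isabelle"):
    link("union:" + customer, "custlist:" + customer)
    link("predict:" + customer + ":CowboyHat", "union:" + customer)
    link("agg:CowboyHat", "predict:" + customer + ":CowboyHat")

# A hypothetical fourth customer who contributes to a different prediction.
link("union:Pierre", "custlist:Pierre")
link("predict:Pierre:Beret", "union:Pierre")
link("agg:Beret", "predict:Pierre:Beret")


def trace(start, edges):
    """Follow provenance edges transitively from start; return every element reached."""
    reached, stack = set(), [start]
    while stack:
        for nxt in edges.get(stack.pop(), ()):
            if nxt not in reached:
                reached.add(nxt)
                stack.append(nxt)
    return reached


def backward_trace(element):
    return trace(element, back)


def forward_trace(element):
    return trace(element, fwd)


# Where did the Cowboy Hat record come from?
sources = sorted(e for e in backward_trace("agg:CowboyHat") if e.startswith("custlist:"))
# Which sales predictions did Amelie contribute to?
affected = sorted(e for e in forward_trace("custlist:Amelie") if e.startswith("agg:"))
print(sources)
print(affected)
```

Storing both edge directions makes each trace a plain graph walk; the composability/transitivity goal of the capture model is what allows per-node provenance to be chained across the whole workflow this way.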
Slide 19: Provenance Queries
  [Workflow diagram: Union, Predict, Agg]
  How many people from each country contributed to the Cowboy Hat prediction?
  Which customer list contributed the most to the top 100 predicted items?

Slides 20-21: Provenance Queries
  [Workflow diagram: Union, Predict, Agg]
  For a specific customer list, which items have higher demand than for the entire customer set?
  Which customers have more duplication those processed by or by?
  Query language goals
    Declarative ad-hoc queries à la database systems
    Seamlessly combine provenance and data
    Amenable to optimization
  Many interesting optimization issues
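
To make "seamlessly combine provenance and data" concrete, the first query above (how many people from each country contributed to the Cowboy Hat prediction) can be sketched in Python as a join between a provenance mapping and the customer data. The `country` field, the customer ids, the customer "Tex", and the contents of `provenance` are all assumptions made up for this example; the talk does not define a query language syntax, so this is procedural code rather than a declarative query.

```python
from collections import Counter

# Hypothetical customer data; the country attribute is an assumption.
customers = {
    "c1": {"name": "Amelie", "country": "France"},
    "c2": {"name": "Jacques", "country": "France"},
    "c3": {"name": "Isabelle", "country": "France"},
    "c4": {"name": "Tex", "country": "USA"},
}

# Hypothetical backward-tracing result: predicted item -> contributing customer ids.
provenance = {
    "Cowboy Hat": {"c1", "c2", "c3", "c4"},
    "Beret": {"c1"},
}


def contributors_by_country(item):
    """Join the item's provenance with customer data and count contributors per country."""
    return Counter(customers[cid]["country"] for cid in provenance[item])


counts = contributors_by_country("Cowboy Hat")
print(dict(counts))
```

A declarative language with the stated goals would let a system choose, for instance, whether to trace backward from the one prediction or forward from every customer, which is the kind of optimization issue the slide alludes to.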
Slides 22-23: Current Research
  System
    Workflow infrastructure
    Basic provenance capture & operations
  Algorithms
    For refresh problem (also foundations & implementation); see recent paper
  Theory
    Decision problems: eager/lazy, intermediate data
    Provenance foundations for semi-opaque transformations

Slide 24: Panda System (version 0.1)
  [Architecture diagram: Command-line Client, Graphical Interface, Panda Layer (Data, Workflow, Provenance, Predicate, Forward Filter), File System]
  Arbitrary acyclic workflows
  processing nodes: automatic capture
  processing nodes: one-one, one-many
  Backward-tracing and forward-propagation
Slide 25: Future Work
  The large majority of this talk
  Stay tuned

Backup slides

Slide 27: Optimization Issues
  Query-driven provenance capture
  Eager vs. lazy provenance capture
    Space-time & query-update tradeoffs
    Processing-node dependent (Ex: Dedup, Agg)
  Retain intermediate data sets?
  Fine-grained vs. coarse-grained
  Approximate provenance
Slide 28: Panda System (version 0.1)
  [Architecture diagram: Command-line Client, Graphical Interface, Panda Layer (Data, Workflow, Provenance, Predicate, Forward Filter), File System]

Slide 29: System: Key Features
  Arbitrary acyclic workflows
  processing nodes: automatic provenance capture
  processing nodes: one-one and one-many transformations only
  Backward-tracing and forward-propagation

Slide 30: Panda System (version 0.1)
  [Architecture diagram: Command-line Client, Graphical Interface, Panda Layer (Data, Workflow, Provenance, Predicate, Forward Filter), File System]
  Create
Slides 31-32: Panda System (version 0.1)
  [Architecture diagram: Command-line Client, Graphical Interface, Panda Layer (Data, Workflow, Provenance, Predicate, Forward Filter), File System]
  Create Transformation

Slide 33: Panda System (version 0.1)
  [Architecture diagram: Command-line Client, Graphical Interface, Panda Layer (Data, Workflow, Provenance, Predicate, Forward Filter), File System]
  Refresh Output Element
Slide 34: Provenance-Based Refresh Problem
  Exploit provenance to efficiently compute the up-to-date value of selected output elements
  Challenges (see recent paper):
    Formally defining provenance & refresh
      When is new value up-to-date version of old one? Provenance as key
    When can selective refresh be done correctly and efficiently?
      Properties of transformations, workflows, provenance
    Algorithms for wide class of transformations and workflows
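
A toy Python sketch of selective refresh follows: backward-trace one output element to its inputs, then re-run the workflow on those inputs only. Everything here is an assumption for illustration. `predict`, `aggregate`, `refresh`, and the rule that maps addresses to items are hypothetical, and this is not the algorithm from the paper the talk refers to.

```python
def predict(customer):
    """Hypothetical Predict step: choose an item from the customer's address."""
    return "Beret" if customer["address"].endswith("France") else "Cowboy Hat"


def aggregate(predictions):
    """Hypothetical Agg step: count predicted buyers per item."""
    counts = {}
    for item in predictions:
        counts[item] = counts.get(item, 0) + 1
    return counts


customers = {
    "Amelie": {"address": "Paris, Texas"},
    "Jacques": {"address": "Paris, Texas"},
    "Isabelle": {"address": "Paris, Texas"},
}

# Initial run, with provenance recorded as: item -> names of contributing customers.
output = aggregate(predict(c) for c in customers.values())
provenance = {"Cowboy Hat": set(customers)}

# The source data is corrected afterwards.
for customer in customers.values():
    customer["address"] = "Paris, France"


def refresh(item):
    """Backward-trace item to its inputs, then recompute using those inputs only."""
    inputs = [customers[name] for name in provenance.get(item, ())]
    return aggregate(predict(c) for c in inputs).get(item, 0)


print(output)
print(refresh("Cowboy Hat"))
```

The sketch also shows why correctness is listed as a challenge: refreshing "Cowboy Hat" gives the right answer here, but the old provenance says nothing about "Beret", so the new Beret demand is only found by forward propagation from the changed inputs, not by refreshing an existing output element.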
PANDA A System for Provenance and Data. Jennifer Widom Stanford University
More informationComputer-based Tracking Protocols: Improving Communication between Databases
Computer-based Tracking Protocols: Improving Communication between Databases Amol Deshpande Database Group Department of Computer Science University of Maryland Overview Food tracking and traceability
More informationData Warehousing 3. ICS 421 Spring Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa
ICS 421 Spring 2010 Data Warehousing 3 Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa 4/1/2010 Lipyeow Lim -- University of Hawaii at Manoa 1 Implementation
More informationThe future of Subsurface Data Management? Building a Data Science Lab Data Lake Jane McConnell, Practice Partner Oil and Gas, Teradata DEJ KL, 3
The future of Subsurface Data Management? Building a Data Science Lab Data Lake Jane McConnell, Practice Partner Oil and Gas, Teradata DEJ KL, 3 October 2017 Analytics and AI is gaining ground in Subsurface
More informationSQL Server Evolution. SQL 2016 new innovations. Trond Brande
SQL Server Evolution SQL 2016 new innovations Trond Brande SQL Server 2016 Editions Enterprise Express SMALL-SCALE DATABASES Development and management tools Easy backup and restore to Microsoft Azure
More informationUpdates through Views
1 of 6 15 giu 2010 00:16 Encyclopedia of Database Systems Springer Science+Business Media, LLC 2009 10.1007/978-0-387-39940-9_847 LING LIU and M. TAMER ÖZSU Updates through Views Yannis Velegrakis 1 (1)
More informationWhat s New at AWS? looking at just a few new things for Enterprise. Philipp Behre, Enterprise Solutions Architect, Amazon Web Services
What s New at AWS? looking at just a few new things for Enterprise Philipp Behre, Enterprise Solutions Architect, Amazon Web Services 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
More informationOutline. Managing Information Resources. Concepts and Definitions. Introduction. Chapter 7
Outline Managing Information Resources Chapter 7 Introduction Managing Data The Three-Level Database Model Four Data Models Getting Corporate Data into Shape Managing Information Four Types of Information
More information