Detecting the Temporal Context of Queries. Oliver Kennedy, Ying Yang, Jan Chomicki, Ronny Fehling, Zhen Hua Liu, and Dieter Gawlick 09/01/2014


1 Detecting the Temporal Context of Queries Oliver Kennedy, Ying Yang, Jan Chomicki, Ronny Fehling, Zhen Hua Liu, and Dieter Gawlick 09/01/2014

2 Outline
Motivation
Contextual Analysis
Practical Temporal Dependency Detection
Experiment

3 Motivation: Contextual Dependency Errors in ETL
Name   Account type  balance
Alice  saving        10,000
Bob    checking
Name   Account type  balance
Alice  saving        0
Bob    checking      500
Because ETL processes hide both the source data and the transformations applied to it, changes between stages can mislead analysts.

4 Motivation: Optimizing Operational Business Intelligence
Each item has a quantity threshold, and we want to add more goods when the quantity falls below it. For example, the threshold for P1345 is set to 400 and the threshold for Pt454 is set to 30.
PARTKEY SUPPKEY QUANTITY PRICE P1345 S Pt454 S ,000
Such thresholds are determined manually and are typically static.

5 Motivation: Optimizing Operational Business Intelligence
Predicting the optimal thresholds based on live sales data has resulted in higher efficiency and profits.
However, predicting these thresholds can be very challenging and costly.

6 Motivation: Volatility Analysis
If a column has historically had a certain update rate and that rate suddenly moves outside the norm, how do we determine what the norm is? Can we monitor the difference efficiently?

7 Motivation
Many assumptions are made about data, for example during extraction, transformation, and schema and query creation. They are so implicit that (1) a mistaken assumption is hard to detect and (2) the result is hard to explain: why does the result change for the same data and query, or why is a certain tuple missing from the result set?
For example, when we aggregate temperature over a table, we may assume the values are temperatures for a period of time. During ETL, we may assume that null in some column means 0, but later it may mean something else.

8 Motivation
What do people do right now? List all the assumptions.
Why are these solutions not good enough? There are too many assumptions and the database is too large.
Why would good schema design not solve all of these problems? A good schema cannot compress the tuples.

9 Contextual Analysis
What are the assumptions? We call the set of assumptions the context.
What is context? Contexts for the assessment and usage of a data source at hand are modeled as collections of external databases, that can be materialized or virtual, and mappings within the collections and with the data source at hand [1].
[1] From "Data quality is context dependent" by Leopoldo Bertossi, Flavio Rizzolo, and Lei Jiang.

10 Contextual Analysis
What is context? Context is the metadata for the tables below: the balance is the AVERAGE over time.
Name   Account type  balance
Alice  saving        10,000
Bob    checking      500
Name   Account type  balance
Alice  saving        0
Bob    checking      500

11 Contextual Analysis
What is context?
G:
ID Date Time Temperature(Context)
/08/30 12:00 C:28 F:
/08/30 24:00 C:10 F:50
D_C:
ID Date Time Temperature(Context:C)
/08/30 12:
/08/30 24:00 10
D_F:
ID Date Time Temperature(Context:F)
/08/30 12:
/08/30 24:00 50

12 Contextual Analysis: Time as Context
An analyst decides to run a new promotion for customers with low balances.
T1 - Tn:
Name   Account type  balance
Alice  saving        0
Bob    checking      500
Tn+1:
Name   Account type  balance
Alice  saving        1000
Bob    checking      500
Q: SELECT * FROM D WHERE balance < 1000
Result:
Name   Account type  balance
Bob    checking      500
Alice's balance stayed below 1000 for a long time, so she should be in the result set.

13 Contextual Analysis: Detecting temporal dependencies
We have N versions and 1 query. We want to evaluate the query on each version and summarize the results in a well-understood way.
a. Detecting temporal dependencies in the entire result relation.
b. Detecting temporal dependencies in individual attributes.
   a. Naïve dependencies
   b. Row matching

14 Contextual Analysis: Detecting temporal dependencies
D:
Name   Account type  balance
Alice  saving        0
Cathy  checking      30,000
Q(D):
Name   Account type  balance  Context
Alice  saving        1000     [0,1000]
Bob    checking      500      [500]
Cathy  checking      2,000    [2000, 30000]
Legend: Row changed, Cell changed, No change

15 Contextual Analysis: Detecting temporal dependencies
c. Quantifying the effects of temporal dependencies
T1 - Tn:
Name   Account type  balance
Alice  saving        0
Bob    checking      500
Tn+1:
Name   Account type  balance
Alice  saving        1000
Bob    checking      500
Q: SELECT * FROM D WHERE balance < 1000
Result:
Name   Account type  balance
Alice  saving        [0,1000]
Bob    checking      500
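The naive strategy on this slide can be sketched in a few lines of Python. This is only one reading of the slide, not the authors' implementation: it assumes a row belongs in the summary if it satisfies Q in at least one version, and that its context is the range of balances the row takes across all versions. The names `q` and `naive_summary` are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, str, int]  # (name, account_type, balance)


def q(rows: List[Row]) -> List[Row]:
    """Q: SELECT * FROM D WHERE balance < 1000."""
    return [r for r in rows if r[2] < 1000]


def naive_summary(versions: List[List[Row]]) -> Dict[str, Tuple[int, int]]:
    """Evaluate Q on every version and report a balance range per qualifying name."""
    qualifying = set()                       # names returned by Q in some version
    ranges: Dict[str, Tuple[int, int]] = {}  # name -> (min, max) balance over all versions
    for rows in versions:
        for name, _, balance in rows:
            lo, hi = ranges.get(name, (balance, balance))
            ranges[name] = (min(lo, balance), max(hi, balance))
        qualifying.update(name for name, _, _ in q(rows))
    return {name: ranges[name] for name in sorted(qualifying)}


versions = [
    [("Alice", "saving", 0), ("Bob", "checking", 500)],     # stands in for T1..Tn
    [("Alice", "saving", 1000), ("Bob", "checking", 500)],  # Tn+1
]
summary = naive_summary(versions)
print(summary)
```

The cost is one query evaluation per version, which is what the approximate strategies later in the deck try to avoid.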

16 Practical Temporal Dependency Detection
1. Monte Carlo Approximation
Instead of evaluating the query at every version, the query is evaluated only on a fixed set of sample versions S. The time complexity of detecting temporal dependencies in Q_S scales linearly in |S| rather than in N, at the cost of result accuracy.
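A minimal sketch of the sampling idea, under stated assumptions: versions are held as an in-memory list, the query returns (key, value) pairs with at most one pair per key per version, and confidence is taken to be the fraction of sampled versions in which a key appears. The slides do not define these details, and `monte_carlo_summary` and `low_balance` are hypothetical names.

```python
import random
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, int]  # (key, value) produced by the query


def monte_carlo_summary(
    versions: List[List[Pair]],
    query: Callable[[List[Pair]], List[Pair]],
    sample_size: int,
    seed: int = 0,
) -> Tuple[Dict[str, Tuple[int, int, float]], List[int]]:
    """Evaluate `query` on a fixed-size random sample of versions only."""
    rng = random.Random(seed)
    k = min(sample_size, len(versions))
    sample_ids = sorted(rng.sample(range(len(versions)), k))
    stats: Dict[str, Tuple[int, int, int]] = {}  # key -> (min, max, hits)
    for i in sample_ids:
        for key, value in query(versions[i]):
            lo, hi, hits = stats.get(key, (value, value, 0))
            stats[key] = (min(lo, value), max(hi, value), hits + 1)
    summary = {key: (lo, hi, hits / k) for key, (lo, hi, hits) in stats.items()}
    return summary, sample_ids


def low_balance(rows: List[Pair]) -> List[Pair]:
    return [(name, bal) for name, bal in rows if bal < 500]


# 100 synthetic versions in which Alice's balance grows by 10 per version.
versions = [[("Alice", 10 * i)] for i in range(100)]
approx, sampled = monte_carlo_summary(versions, low_balance, sample_size=5)
exact, _ = monte_carlo_summary(versions, low_balance, sample_size=len(versions))
print(sampled, approx, exact)
```

The work is proportional to the sample size rather than to the number of versions. The sampled range can only be narrower than the exact one, and a key that appears in no sampled version is missed entirely.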

17 Practical Temporal Dependency Detection
2. Dynamic Analysis
R(a) and S(b) are joined on column B; the dynamic analysis of the join result is shown in (c).

18 Practical Temporal Dependency Detection
2. Dynamic Analysis
Name   Account Type  Balance  ROWID  XID
Alice  saving        100      r1     v1
Bob    checking      200      r2     v1
Carol  saving        5,000    r3     v1
UPDATE Acct SET Balance = Balance WHERE Name = 'Alice';
INSERT INTO Acct VALUES ('Carol', 'saving', 55);
COMMIT; -- Creates v2
DELETE FROM Acct WHERE Name = 'Bob';
COMMIT; -- Creates v3
Name   Account Type  Balance  ROWID  XID
Alice  saving        10,000   r1     v2
Carol  saving        5,000    r3     v1
Carol  saving        55       r4     v2

19 Practical Temporal Dependency Detection
2. Dynamic Analysis
An interval encoding of the history of the relation Acct.
Acct:
Name Account Type Balance ROWID start end
Alice saving 100 r1 - v2
Alice saving 10,000 r1 v2
Bob checking 200 r2 -
Carol saving 5,000 r3 -
Carol saving 55 r4 v2
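The interval encoding can be derived from per-version snapshots, as in the sketch below. The semantics are an assumption on my part: each interval is half-open, [start, end), with None standing in for the "-" on the slide (present since the first version, or still current). The function name `interval_encode` is hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Values = Tuple[str, str, int]  # (name, account_type, balance)
Encoded = Tuple[str, Values, Optional[str], Optional[str]]  # (rowid, values, start, end)


def interval_encode(snapshots: List[Tuple[str, Dict[str, Values]]]) -> List[Encoded]:
    """snapshots: (version_id, {rowid: values}) pairs in commit order."""
    open_rows: Dict[str, Tuple[Values, Optional[str]]] = {}  # rowid -> (values, start)
    encoded: List[Encoded] = []
    first = True
    for version_id, rows in snapshots:
        # Close intervals for rows that changed or disappeared in this version.
        for rowid in list(open_rows):
            values, start = open_rows[rowid]
            if rows.get(rowid) != values:
                encoded.append((rowid, values, start, version_id))
                del open_rows[rowid]
        # Open intervals for rows that are new or have new values.
        for rowid, values in rows.items():
            if rowid not in open_rows:
                open_rows[rowid] = (values, None if first else version_id)
        first = False
    # Rows still open at the end are current: their end stays unbounded.
    for rowid, (values, start) in open_rows.items():
        encoded.append((rowid, values, start, None))
    return sorted(encoded, key=lambda e: (e[0], e[2] or ""))


snapshots = [
    ("v1", {"r1": ("Alice", "saving", 100), "r2": ("Bob", "checking", 200),
            "r3": ("Carol", "saving", 5000)}),
    ("v2", {"r1": ("Alice", "saving", 10000), "r2": ("Bob", "checking", 200),
            "r3": ("Carol", "saving", 5000), "r4": ("Carol", "saving", 55)}),
    ("v3", {"r1": ("Alice", "saving", 10000), "r3": ("Carol", "saving", 5000),
            "r4": ("Carol", "saving", 55)}),
]
history = interval_encode(snapshots)
for entry in history:
    print(entry)
```

Each distinct state of a row is stored once with its validity interval, so a query can be rewritten to run over the encoded relation instead of once per version.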

20 Practical Temporal Dependency Detection
2. Dynamic Analysis
The result of query Q on Acct.
Original query: SELECT Name, SUM(Balance) FROM ACCT GROUP BY Name
Query rewrite Q(Acct):
Name Balance start end
Alice v2
Alice 10,000 v2
Bob v2
Carol 5,000 - v2
Carol 5,055 v2

21 Practical Temporal Dependency Detection
2. Dynamic Analysis
The impact assessment process result.
Name   Account Type  Balance      Confidence
Alice  saving        [100,10000]  1
Bob    checking      [200,200]    0.75
Carol  saving        [5000,5055]  1
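The collapse from an interval-encoded query result to per-group ranges can be sketched as follows. The input values are illustrative, chosen to agree with the ranges on the slide. Confidence is assumed here to be the fraction of versions in which a group is present with every version weighted equally; the slides do not say how versions are weighted, so the number for Bob need not match the slide's 0.75. The name `assess_impact` is hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Interval = Tuple[str, int, Optional[str], Optional[str]]  # (group, value, start, end)


def assess_impact(
    result_intervals: List[Interval], version_order: List[str]
) -> Dict[str, Tuple[int, int, float]]:
    """Collapse half-open [start, end) intervals into (min, max, confidence) per group."""
    pos = {v: i for i, v in enumerate(version_order)}
    n = len(version_order)
    acc: Dict[str, Tuple[int, int, int]] = {}  # group -> (min, max, versions covered)
    for group, value, start, end in result_intervals:
        first = 0 if start is None else pos[start]
        last = n if end is None else pos[end]
        lo, hi, live = acc.get(group, (value, value, 0))
        acc[group] = (min(lo, value), max(hi, value), live + (last - first))
    return {g: (lo, hi, live / n) for g, (lo, hi, live) in acc.items()}


# Illustrative interval-encoded result of SUM(Balance) GROUP BY Name.
q_acct = [
    ("Alice", 100, None, "v2"),
    ("Alice", 10000, "v2", None),
    ("Bob", 200, None, "v3"),
    ("Carol", 5000, None, "v2"),
    ("Carol", 5055, "v2", None),
]
impact = assess_impact(q_acct, ["v1", "v2", "v3"])
print(impact)
```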

22 Practical Temporal Dependency Detection
3. Analysis Composition
D:
Name Balance Time
Alice 90 T1
Alice 120 T2
Alice 85 T5
Alice 85 T6
Q: SELECT * FROM D WHERE Balance >= 120
Precomputation:
R:
Name CONF Balancemin Balancemax
Alice 2/

23 Practical Temporal Dependency Detection
3. Analysis Composition
Query: SPJ
R:
Name CONF Balancemin Balancemax
Alice 2/
Query rewriting:
SELECT name, cmp_conf(balance >= 120) * R.CONF AS CONF, Balancemin, Balancemax FROM R WHERE Balance >= 120
Result:
Name CONF Balancemin Balancemax
Alice (1/(120-85)*2/3)
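The slides do not define cmp_conf, so the sketch below is only one hypothetical reading that reproduces the arithmetic in the result row, 1/(120-85) * 2/3. It assumes integer balances spread evenly over the precomputed range, counts the integer values that satisfy the predicate, and divides by the width of the range. The CONF of 2/3, the minimum of 85, and the maximum of 120 are read off that result expression. The names `cmp_conf_ge` and `compose` are hypothetical.

```python
from fractions import Fraction
from typing import Tuple

SummaryRow = Tuple[str, Fraction, int, int]  # (name, conf, balance_min, balance_max)


def cmp_conf_ge(lo: int, hi: int, threshold: int) -> Fraction:
    """Assumed chance that a value in [lo, hi] satisfies value >= threshold."""
    if threshold <= lo:
        return Fraction(1)   # the whole range satisfies the predicate
    if threshold > hi:
        return Fraction(0)   # no value in the range satisfies it
    # Satisfying integer values divided by the range width, capped at 1.
    return min(Fraction(1), Fraction(hi - threshold + 1, hi - lo))


def compose(row: SummaryRow, threshold: int) -> SummaryRow:
    """Apply 'balance >= threshold' to a precomputed summary row."""
    name, conf, lo, hi = row
    return (name, cmp_conf_ge(lo, hi, threshold) * conf, lo, hi)


r_alice: SummaryRow = ("Alice", Fraction(2, 3), 85, 120)
composed = compose(r_alice, 120)
print(composed)
```

Because the estimate uses only the precomputed summary and never revisits the individual versions, it is cheap, which fits the pessimistic confidence estimates reported in the experiments.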

24 Experiments
Database: with temporal features activated.
Data: TPC-H 1GB with about 100,000 events from the synthetic workload applied. These events simulate the day-to-day operation of a company.
Queries:
Query                  Class  Output Size  Raw Time
TPCH1                  SPA    4 rows       3s
TPCH3                  SPJA   Many         1.5s
TPCH9                  SPJA   Many         3.5s
TPCH20 (with changes)  SPJ    Many         5s

25 Experiments
[Figure: Query 1. Execution time (s) versus number of versions considered, for Naive, MonteCarlo-5, MonteCarlo-10, Dynamic, and Composable.]
Approximate algorithms outperform the exact ones. Dynamic analysis remains a constant factor (approximately 3-4 times) faster than the Naive strategy. Both approximate strategies show the expected near-constant-time performance.

26 Experiments
Accuracy of the approximate algorithms on TPC-H Query 1
[Figure: three panels (Composable Analysis, 5-Sample MC, 10-Sample MC). Relative error versus number of versions considered, for CONF, SUM_QTY, SUM_BASE_PRICE, SUM_DISC_PRICE, SUM_CHARGE, AVG_QTY, AVG_PRICE, AVG_DISC, and COUNT_ORDER.]
Composable analysis is quite accurate, generating estimates to within four nines for most aggregate values. However, it generates extremely pessimistic estimates of confidence. This results in final confidence estimates with as much as 50% error.

27 Experiments
[Figure: Query 3. Execution time (s) versus number of versions considered, for Naive, MonteCarlo-5, MonteCarlo-10, Dynamic, and Composable.]
Approximate methods show linear scaling. Dynamic analysis also shows linear scaling, due to the hierarchical nature of the query.

28 Experiments
[Figure: Query 9. Execution time (s) versus number of versions considered, for Naive, MonteCarlo-5, MonteCarlo-10, Dynamic, and Composable.]
Dynamic analysis is a constant factor faster than Naive analysis. There is a slight linear increase in the cost of the Monte Carlo approach as the number of versions considered approaches 10,000; we suspect that this increase is due to the join width.

29 Experiments
Accuracy of the approximate algorithms on TPC-H Query 9
[Figure: three panels (Composable Analysis, 5-Sample MC, 10-Sample MC). Relative error versus number of versions considered, for CONF and SUM_PROFIT.]
Accuracy increases as more versions are considered: this is an artifact of the actual confidence getting closer to the pessimistic bound. The Monte Carlo methods fared significantly better at estimating the aggregate values. However, they did not identify some output groups, creating an extremely high confidence error.

30 Summary
What is Context?
Practical Temporal Dependency Detection
a. Monte Carlo Approximation
b. Dynamic Analysis
c. Analysis Composition


More information

TPC-H Benchmark Set. TPC-H Benchmark. DDL for TPC-H datasets

TPC-H Benchmark Set. TPC-H Benchmark. DDL for TPC-H datasets TPC-H Benchmark Set TPC-H Benchmark TPC-H is an ad-hoc and decision support benchmark. Some of queries are available in the current Tajo. You can download the TPC-H data generator here. DDL for TPC-H datasets

More information

Data Warehousing and Decision Support

Data Warehousing and Decision Support Data Warehousing and Decision Support Chapter 23, Part A Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke 1 Introduction Increasingly, organizations are analyzing current and historical

More information

A Script- Based Autotuning Compiler System to Generate High- Performance CUDA code

A Script- Based Autotuning Compiler System to Generate High- Performance CUDA code A Script- Based Autotuning Compiler System to Generate High- Performance CUDA code Malik Khan, Protonu Basu, Gabe Rudy, Mary Hall, Chun Chen, Jacqueline Chame Mo:va:on Challenges to programming the GPU

More information

In-Memory Data Management for Enterprise Applications. BigSys 2014, Stuttgart, September 2014 Johannes Wust Hasso Plattner Institute (now with SAP)

In-Memory Data Management for Enterprise Applications. BigSys 2014, Stuttgart, September 2014 Johannes Wust Hasso Plattner Institute (now with SAP) In-Memory Data Management for Enterprise Applications BigSys 2014, Stuttgart, September 2014 Johannes Wust Hasso Plattner Institute (now with SAP) What is an In-Memory Database? 2 Source: Hector Garcia-Molina

More information

Database design and implementation CMPSCI 645. Lectures 18: Transactions and Concurrency

Database design and implementation CMPSCI 645. Lectures 18: Transactions and Concurrency Database design and implementation CMPSCI 645 Lectures 18: Transactions and Concurrency 1 DBMS architecture Query Parser Query Rewriter Query Op=mizer Query Executor Lock Manager Concurrency Control Access

More information

SCHISM: A WORKLOAD-DRIVEN APPROACH TO DATABASE REPLICATION AND PARTITIONING

SCHISM: A WORKLOAD-DRIVEN APPROACH TO DATABASE REPLICATION AND PARTITIONING SCHISM: A WORKLOAD-DRIVEN APPROACH TO DATABASE REPLICATION AND PARTITIONING ZEYNEP KORKMAZ CS742 - PARALLEL AND DISTRIBUTED DATABASE SYSTEMS UNIVERSITY OF WATERLOO OUTLINE. Background 2. What is Schism?

More information

Information Management course

Information Management course Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 05(b) : 23/10/2012 Data Mining: Concepts and Techniques (3 rd ed.) Chapter

More information

Handling heterogeneous storage devices in clusters

Handling heterogeneous storage devices in clusters Handling heterogeneous storage devices in clusters André Brinkmann University of Paderborn Toni Cortes Barcelona Supercompu8ng Center Randomized Data Placement Schemes n Randomized Data Placement Schemes

More information

DATA MINING AND WAREHOUSING

DATA MINING AND WAREHOUSING DATA MINING AND WAREHOUSING Qno Question Answer 1 Define data warehouse? Data warehouse is a subject oriented, integrated, time-variant, and nonvolatile collection of data that supports management's decision-making

More information

Data Warehousing & Mining. Data integration. OLTP versus OLAP. CPS 116 Introduction to Database Systems

Data Warehousing & Mining. Data integration. OLTP versus OLAP. CPS 116 Introduction to Database Systems Data Warehousing & Mining CPS 116 Introduction to Database Systems Data integration 2 Data resides in many distributed, heterogeneous OLTP (On-Line Transaction Processing) sources Sales, inventory, customer,

More information

Data Warehousing and Decision Support

Data Warehousing and Decision Support Data Warehousing and Decision Support [R&G] Chapter 23, Part A CS 4320 1 Introduction Increasingly, organizations are analyzing current and historical data to identify useful patterns and support business

More information

CSE 562 Final Exam Solutions

CSE 562 Final Exam Solutions CSE 562 Final Exam Solutions May 12, 2014 Question Points Possible Points Earned A.1 7 A.2 7 A.3 6 B.1 10 B.2 10 C.1 10 C.2 10 D.1 10 D.2 10 E 20 Bonus 5 Total 105 CSE 562 Final Exam 2014 Relational Algebra

More information

Data Flow Analysis. Suman Jana. Adopted From U Penn CIS 570: Modern Programming Language Implementa=on (Autumn 2006)

Data Flow Analysis. Suman Jana. Adopted From U Penn CIS 570: Modern Programming Language Implementa=on (Autumn 2006) Data Flow Analysis Suman Jana Adopted From U Penn CIS 570: Modern Programming Language Implementa=on (Autumn 2006) Data flow analysis Derives informa=on about the dynamic behavior of a program by only

More information

Experiences Implemen.ng Usable MPC For Social Good

Experiences Implemen.ng Usable MPC For Social Good Experiences Implemen.ng Usable MPC For Social Good Mayank Varia Hariri Ins.tute, Boston University Based on joint work with BU: Azer Bestavros, Eric Dunton, Frederick Jansen, Kyle Holzinger, Andrei Lapets,

More information

OpenWorld 2015 Oracle Par22oning

OpenWorld 2015 Oracle Par22oning OpenWorld 2015 Oracle Par22oning Did You Think It Couldn t Get Any Be6er? Safe Harbor Statement The following is intended to outline our general product direc2on. It is intended for informa2on purposes

More information

Logisland Event mining at scale. Thomas [ ]

Logisland Event mining at scale. Thomas [ ] Logisland Event mining at scale Thomas Bailet @hurence [2017-01-19] Overview Logisland provides a stream analy0cs solu0on that can handle all enterprise-scale event data and processing Big picture Open

More information

Columnstore Technology Improvements in SQL Server Presented by Niko Neugebauer Moderated by Nagaraj Venkatesan

Columnstore Technology Improvements in SQL Server Presented by Niko Neugebauer Moderated by Nagaraj Venkatesan Columnstore Technology Improvements in SQL Server 2016 Presented by Niko Neugebauer Moderated by Nagaraj Venkatesan Thank You microsoft.com hortonworks.com aws.amazon.com red-gate.com Empower users with

More information

CS 4604: Introduc0on to Database Management Systems. B. Aditya Prakash Lecture #21: Data Mining and Warehousing

CS 4604: Introduc0on to Database Management Systems. B. Aditya Prakash Lecture #21: Data Mining and Warehousing CS 4604: Introduc0on to Database Management Systems B. Aditya Prakash Lecture #21: Data Mining and Warehousing Overview Tradi8onal database systems are tuned to many, small, simple queries. New applica8ons

More information

Seman&c Aware Anomaly Detec&on in Real World Parking Data

Seman&c Aware Anomaly Detec&on in Real World Parking Data Seman&c Aware Anomaly Detec&on in Real World Parking Data Arnamoy Bha+acharyya 1, Weihan Wang 2, Chris&ne Tsang 2, Cris&ana Amza 1 1 University of Toronto, 2 Smarking Inc Mo&va&on Mo&va&on heps://www.engadget.com/2017/01/17/google-

More information

An Overview of various methodologies used in Data set Preparation for Data mining Analysis

An Overview of various methodologies used in Data set Preparation for Data mining Analysis An Overview of various methodologies used in Data set Preparation for Data mining Analysis Arun P Kuttappan 1, P Saranya 2 1 M. E Student, Dept. of Computer Science and Engineering, Gnanamani College of

More information

Recommending Materialized Views and Indexes with the IBM DB2 Design Advisor

Recommending Materialized Views and Indexes with the IBM DB2 Design Advisor Recommending Materialized Views and Indexes with the IBM DB2 Design Advisor Daniel C. Zilio et al Proceedings of the International Conference on Automatic Computing (ICAC 04) Rolando Blanco CS848 - Spring

More information

On-Line Application Processing

On-Line Application Processing On-Line Application Processing WAREHOUSING DATA CUBES DATA MINING 1 Overview Traditional database systems are tuned to many, small, simple queries. Some new applications use fewer, more time-consuming,

More information

RBF: A New Storage Structure for Space- Efficient Queries for Multidimensional Metadata in OSS

RBF: A New Storage Structure for Space- Efficient Queries for Multidimensional Metadata in OSS RBF: A New Storage Structure for Space- Efficient Queries for Multidimensional Metadata in OSS Yu Hua 1, Dan Feng 1, Hong Jiang 2, Lei Tian 1 1 School of Computer, Huazhong University of Science and Technology,

More information

ECS 165B: Database System Implementa6on Lecture 14

ECS 165B: Database System Implementa6on Lecture 14 ECS 165B: Database System Implementa6on Lecture 14 UC Davis April 28, 2010 Acknowledgements: por6ons based on slides by Raghu Ramakrishnan and Johannes Gehrke, as well as slides by Zack Ives. Class Agenda

More information

cstore_fdw Columnar store for analytic workloads Hadi Moshayedi & Ben Redman

cstore_fdw Columnar store for analytic workloads Hadi Moshayedi & Ben Redman cstore_fdw Columnar store for analytic workloads Hadi Moshayedi & Ben Redman What is CitusDB? CitusDB is a scalable analytics database that extends PostgreSQL Citus shards your data and automa/cally parallelizes

More information

Indexing Bi-temporal Windows. Chang Ge 1, Martin Kaufmann 2, Lukasz Golab 1, Peter M. Fischer 3, Anil K. Goel 4

Indexing Bi-temporal Windows. Chang Ge 1, Martin Kaufmann 2, Lukasz Golab 1, Peter M. Fischer 3, Anil K. Goel 4 Indexing Bi-temporal Windows Chang Ge 1, Martin Kaufmann 2, Lukasz Golab 1, Peter M. Fischer 3, Anil K. Goel 4 1 2 3 4 Outline Introduction Bi-temporal Windows Related Work The BiSW Index Experiments Conclusion

More information

Fig 1.2: Relationship between DW, ODS and OLTP Systems

Fig 1.2: Relationship between DW, ODS and OLTP Systems 1.4 DATA WAREHOUSES Data warehousing is a process for assembling and managing data from various sources for the purpose of gaining a single detailed view of an enterprise. Although there are several definitions

More information