How Achaeans Would Construct Columns in Troy. Alekh Jindal, Felix Martin Schuhknecht, Jens Dittrich, Karen Khachatryan, Alexander Bunte

1 How Achaeans Would Construct Columns in Troy Alekh Jindal, Felix Martin Schuhknecht, Jens Dittrich, Karen Khachatryan, Alexander Bunte

2 Number of Visas Received (bar chart: Alekh vs. Jens)

3 Health Level 5 days before CIDR, in percent (bar chart: Alekh vs. Jens)

4 Average Number of Slides per 20-minute talk (bar chart: Alekh vs. Jens)

5 Number of Slides Actually Prepared (bar chart: Alekh vs. Jens)

6

7 What is the problem?

8 Row-stores

9 Column-stores

10 OLTP OLAP

11

12 OLTP OLAP? Can we do efficient OLAP in Row-stores?

13 Any solutions out there?

14 C-Tables*
Diagram: Application, User, Database, Query Processor, Relations, Physical Representation, File 1, File 2, File 3 ... File n
* Nicolas Bruno. Teaching an Old Elephant New Tricks. CIDR

15 C-Tables*
Relation Customer

name   phone  market_segment
smith  2134   automobile
john   3425   household
kim    6756   furniture
joe    9878   building
mark   4312   building
steve  2435   automobile
jim    5766   household
ian    8789   household

* Nicolas Bruno. Teaching an Old Elephant New Tricks. CIDR

16 C-Tables*
Sorted Relation Customer

name   phone  market_segment
smith  2134   automobile
steve  2435   automobile
mark   4312   building
joe    9878   building
kim    6756   furniture
john   3425   household
jim    5766   household
ian    8789   household

Physical Table T_market_segment

f  v           c
1  automobile  2
3  building    2
5  furniture   1
6  household   3

* Nicolas Bruno. Teaching an Old Elephant New Tricks. CIDR
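
The physical table on this slide looks like a run-length encoding of the sorted market_segment column. A minimal Python sketch of that encoding follows. It assumes that f is the 1-based position of the first tuple in a run, v the value, and c the run length; the function name build_c_table is hypothetical and not taken from the slides.

```python
from itertools import groupby


def build_c_table(sorted_values):
    """Run-length encode a sorted column into (f, v, c) rows.

    Assumed meaning: f = 1-based position of the first tuple in the run,
    v = the value, c = number of consecutive tuples with that value.
    """
    rows = []
    position = 1
    for value, run in groupby(sorted_values):
        count = sum(1 for _ in run)
        rows.append((position, value, count))
        position += count
    return rows


# market_segment column of the sorted Customer relation from the slide
market_segment = [
    "automobile", "automobile", "building", "building",
    "furniture", "household", "household", "household",
]
c_table = build_c_table(market_segment)
for row in c_table:
    print(row)
```

Under these assumptions the sketch yields the same four rows as T_market_segment above.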

20 C-Tables*
Sorted Relation Customer

name   phone  market_segment
smith  2134   automobile
steve  2435   automobile
mark   4312   building
joe    9878   building
kim    6756   furniture
john   3425   household
jim    5766   household
ian    8789   household

Physical Table T_market_segment

f  v           c
1  automobile  2
3  building    2
5  furniture   1
6  household   3

Physical Table T_phone: columns f, v (rows missing in the source)

Physical Table T_name

f  v
1  smith
2  steve
3  mark
4  joe
5  kim
6  john
7  jim
8  ian

JOINS!

* Nicolas Bruno. Teaching an Old Elephant New Tricks. CIDR
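
The "JOINS!" remark can be made concrete: reading back tuples from several C-Tables requires a join on tuple position. The Python sketch below is only an illustration under the assumption that f is the first position of a run and c its length; join_c_tables is a hypothetical helper, and a real system would express this as a SQL join rather than nested loops.

```python
def join_c_tables(t_name, t_segment):
    """Join T_name (f, v) with T_market_segment (f, v, c) on tuple position.

    A name at position p matches the segment run that covers p,
    i.e. the run with first <= p < first + count.
    """
    result = []
    for name_pos, name in t_name:
        for first, segment, count in t_segment:
            if first <= name_pos < first + count:
                result.append((name, segment))
                break
    return result


t_name = [(1, "smith"), (2, "steve"), (3, "mark"), (4, "joe"),
          (5, "kim"), (6, "john"), (7, "jim"), (8, "ian")]
t_market_segment = [(1, "automobile", 2), (3, "building", 2),
                    (5, "furniture", 1), (6, "household", 3)]

rows = join_c_tables(t_name, t_market_segment)
for row in rows:
    print(row)
```

Every additional referenced attribute adds one more such join, which is the cost the slide points at.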

21 C-Tables*
JOINS!
Figure: Query Time (sec) vs. # referenced Attributes for C-Table and Standard Row; (a) Cardinality = 10
* Nicolas Bruno. Teaching an Old Elephant New Tricks. CIDR

22 C-Tables*
Figure: Query Time (sec) vs. # referenced Attributes for C-Table and Standard Row; (b) Cardinality = 100
Caption (truncated in the source): Comparing query times of C-Table and standard row
* Nicolas Bruno. Teaching an Old Elephant New Tricks. CIDR

23 C-Tables*
Diagram: Application, User, Database, Query Processor, Relations, Physical Representation, File 1, File 2, File 3 ... File n
* Nicolas Bruno. Teaching an Old Elephant New Tricks. CIDR

24 Column Index*
Diagram: Application, User, Database, Query Processor, Relations, Physical Representation, File 1, File 2, File 3 ... File n
* P. Larson et al. SQL Server Column Store Indexes. SIGMOD 2011

25 Column Index*
Relation Customer

name   phone  market_segment
smith  2134   automobile
john   3425   household
kim    6756   furniture
joe    9878   building
mark   4312   building
steve  2435   automobile
jim    5766   household
ian    8789   household

Physical Table Customer_trojan (segment size = 4)

segment_id  attribute_id    blob_data
1           name            smith, john, kim, joe
1           phone           2134, 3425, 6756, 9878
1           market_segment  automobile, household, furniture, building
2           name            mark, steve, jim, ian
2           phone           4312, 2435, 5766, 8789
2           market_segment  building, automobile, household, household

* P. Larson et al. SQL Server Column Store Indexes. SIGMOD 2011

28 Column Index*
Diagram: Application, User, Database, Query Processor, Relations, Physical Representation, File 1, File 2, File 3 ... File n
DEEP CHANGES!
* P. Larson et al. SQL Server Column Store Indexes. SIGMOD 2011

29 Column Index* (same diagram)
LONG TIME!

30 Column Index* (same diagram)
SOURCE CODE!

31 Column Index* (same diagram, no annotation)

32 What do we propose?

33 Trojan Columns
Diagram: User, Database, Application, Query Processor, Relations, UDF Storage Layer, Physical Representation, File 1, File 2, File 3 ... File n

34 Trojan Columns
Relation Customer

name   phone  market_segment
smith  2134   automobile
john   3425   household
kim    6756   furniture
joe    9878   building
mark   4312   building
steve  2435   automobile
jim    5766   household
ian    8789   household

Physical Table Customer_trojan

segment_id  attribute_id    blob_data
1           name            smith, john, kim, joe
1           phone           2134, 3425, 6756, 9878
1           market_segment  automobile, household, furniture, building
2           name            mark, steve, jim, ian
2           phone           4312, 2435, 5766, 8789
2           market_segment  building, automobile, household, household

35 Trojan Columns: write-udf
Components: Data Parser, Tuple Iterator, Data Accessor
(a) Convert row tuples into blobs
(b) Store blob data
(c) Get next row data
Input: Relation Customer. Output: Physical Table Customer_trojan (both as on slide 34).
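
Steps (a) to (c) can be sketched in Python as follows. This is not the authors' UDF, which runs inside the database; write_udf, its parameters, and the comma-separated blob encoding are assumptions made for illustration, with the segment size of 4 taken from the Customer_trojan example.

```python
def write_udf(rows, attributes, segment_size=4):
    """Convert row tuples into one blob per attribute and segment.

    Returns (segment_id, attribute_id, blob_data) rows, mimicking
    the layout of the Customer_trojan table on the slides.
    """
    physical_table = []
    for start in range(0, len(rows), segment_size):
        # (c) get the next batch of row data
        segment = rows[start:start + segment_size]
        segment_id = start // segment_size + 1
        for index, attribute in enumerate(attributes):
            # (a) convert row tuples into a blob for this attribute
            blob = ", ".join(str(row[index]) for row in segment)
            # (b) store the blob data
            physical_table.append((segment_id, attribute, blob))
    return physical_table


customer = [
    ("smith", 2134, "automobile"), ("john", 3425, "household"),
    ("kim", 6756, "furniture"), ("joe", 9878, "building"),
    ("mark", 4312, "building"), ("steve", 2435, "automobile"),
    ("jim", 5766, "household"), ("ian", 8789, "household"),
]
customer_trojan = write_udf(customer, ["name", "phone", "market_segment"])
for row in customer_trojan:
    print(row)
```

A production version would likely use a binary, possibly compressed, blob format instead of joined strings.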

36 Trojan Columns: read-udf
Components: Data Parser, Tuple Iterator, Data Accessor
(d) Parse blob data
(e) Reconstruct row tuples
(f) Fetch blob data
(g) End of table
Input: Physical Table Customer_trojan. Output: Relation Customer (both as on slide 34).
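
The read path, steps (d) to (g), can be sketched the same way. The code below is a hedged illustration rather than the actual UDF: read_udf and its wanted parameter are hypothetical, blobs are assumed to be comma-separated strings, and rows of the physical table are assumed to be ordered by segment_id.

```python
from itertools import groupby
from operator import itemgetter


def read_udf(physical_table, wanted):
    """Yield row tuples containing only the wanted attributes.

    physical_table holds (segment_id, attribute_id, blob_data) rows
    and is assumed to be ordered by segment_id.
    """
    # (f) fetch blob data, one segment at a time
    for _segment_id, blobs in groupby(physical_table, key=itemgetter(0)):
        # (d) parse only the blobs of referenced attributes
        columns = {
            attribute: blob.split(", ")
            for _, attribute, blob in blobs
            if attribute in wanted
        }
        # (e) reconstruct row tuples from the parsed columns
        yield from zip(*(columns[attribute] for attribute in wanted))
    # (g) end of table: the generator simply stops


customer_trojan = [
    (1, "name", "smith, john, kim, joe"),
    (1, "phone", "2134, 3425, 6756, 9878"),
    (1, "market_segment", "automobile, household, furniture, building"),
    (2, "name", "mark, steve, jim, ian"),
    (2, "phone", "4312, 2435, 5766, 8789"),
    (2, "market_segment", "building, automobile, household, household"),
]
tuples = list(read_udf(customer_trojan, ["name", "market_segment"]))
for row in tuples:
    print(row)
```

Blobs of attributes that a query does not reference (phone here) are never parsed, which is where a column layout saves work.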

37 Example: TPC-H Query 6
Query plan, top to bottom (date literals missing in the source):
Result
γ agg (extendedprice * discount)
σ shipdate BETWEEN <date> AND <date> AND discount BETWEEN 0.05 AND 0.07 AND quantity < 24
π quantity, discount, extendedprice, shipdate
SCAN lineitem

38 Example: TPC-H Query 6
Two plans side by side: the plan from slide 37, and the same plan with SCAN lineitem replaced by scanudf.

39 Example: TPC-H Query 6
Three plans side by side: the plan from slide 37, the scanudf plan, and a plan in which σ, π, and SCAN are grouped under selectudf.
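
To illustrate the difference between the scanudf and selectudf plans, here is a small Python sketch of Query 6 over a column layout. Everything in it is an assumption for illustration: the four sample tuples, the function signatures, and in particular the date bounds, which are missing from the slides and are passed in as parameters.

```python
from datetime import date

# Hypothetical column data for four lineitem tuples (not from the slides).
lineitem = {
    "quantity":      [10, 30, 5, 20],
    "discount":      [0.06, 0.06, 0.02, 0.05],
    "extendedprice": [100.0, 200.0, 300.0, 400.0],
    "shipdate":      [date(1994, 3, 1), date(1994, 5, 1),
                      date(1994, 7, 1), date(1996, 1, 1)],
    "comment":       ["a", "b", "c", "d"],  # never read by Query 6
}


def scan_udf(table, attributes):
    """Reconstruct tuples from the referenced columns only."""
    return zip(*(table[attribute] for attribute in attributes))


def select_udf(table, attributes, predicate):
    """Like scan_udf, but the selection is applied inside the UDF."""
    return (row for row in scan_udf(table, attributes) if predicate(row))


def q6(table, date_low, date_high):
    """SUM(extendedprice * discount) over the qualifying tuples."""
    attributes = ["quantity", "discount", "extendedprice", "shipdate"]

    def predicate(row):
        quantity, discount, _price, shipdate = row
        return (date_low <= shipdate <= date_high
                and 0.05 <= discount <= 0.07
                and quantity < 24)

    return sum(price * discount
               for _qty, discount, price, _day
               in select_udf(table, attributes, predicate))


# The date bounds are assumed values, chosen only for this example.
revenue = q6(lineitem, date(1994, 1, 1), date(1995, 1, 1))
print(revenue)
```

Only the first sample tuple satisfies all three predicates, so the aggregate is computed over a single tuple.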

40 Example: TPC-H Query 14
Query plan, top to bottom (date literals missing in the source):
Result
γ agg 100 * SUM(CASE WHEN type LIKE 'PROMO%' THEN extendedprice*(1-discount) ELSE 0 END) / SUM(extendedprice*(1-discount))
⋈ partkey
Left input: π type, partkey over SCAN part
Right input: σ shipdate BETWEEN <date> AND <date> over π shipdate, discount, extendedprice, partkey over SCAN lineitem
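
The Query 14 plan adds a join on partkey. A rough Python sketch of the same computation over column data follows. The sample data, the function name q14, the dictionary-based join, and the date bounds are all assumptions made for this example; none of them come from the slides.

```python
from datetime import date

# Hypothetical column data (not from the slides).
part = {
    "partkey": [1, 2],
    "type": ["PROMO BRUSHED", "STANDARD"],
}
lineitem = {
    "partkey":       [1, 2, 1, 2],
    "extendedprice": [100.0, 200.0, 50.0, 1000.0],
    "discount":      [0.0, 0.5, 0.0, 0.0],
    "shipdate":      [date(1995, 9, 5), date(1995, 9, 10),
                      date(1995, 9, 20), date(1996, 1, 1)],
}


def q14(part, lineitem, date_low, date_high):
    """Percentage of revenue in the date range that comes from PROMO parts.

    Raises ZeroDivisionError if no lineitem tuple falls in the range.
    """
    # Build side of the join: partkey -> type
    part_type = dict(zip(part["partkey"], part["type"]))
    promo = total = 0.0
    rows = zip(lineitem["partkey"], lineitem["extendedprice"],
               lineitem["discount"], lineitem["shipdate"])
    for partkey, price, discount, shipdate in rows:
        if not (date_low <= shipdate <= date_high):
            continue  # selection on shipdate
        revenue = price * (1 - discount)
        total += revenue
        if part_type[partkey].startswith("PROMO"):  # type LIKE 'PROMO%'
            promo += revenue
    return 100 * promo / total


# The date bounds are assumed values, chosen only for this example.
promo_revenue = q14(part, lineitem, date(1995, 9, 1), date(1995, 9, 30))
print(promo_revenue)
```

With this sample data, three tuples fall in the date range, with revenues 100, 100, and 50, of which 150 belongs to the PROMO part.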

41 Trojan Columns
Diagram: User, Database, Application, Query Processor, Relations, UDF Storage Layer, Physical Representation, File 1, File 2, File 3 ... File n
Plug-and-play

42 Trojan Columns (same diagram)
Quick Deployment

43 Trojan Columns (same diagram)
Closed-source

44 Trojan Columns (same diagram, no annotation)

45 Will this work?

46 Experimental Setup
Systems:
Commercial closed-source Row-store (Standard Row)
Trojan Columns in commercial closed-source Row-store (Trojan Columns)
Three variants of TPC-H benchmark:
1. simplified queries, simplified dataset
2. simplified queries, original dataset
3. original queries, original dataset

47 Simplified Queries, Simplified Dataset*
Figure: Query Time (sec) for Q1 to Q7, Standard Row vs. Trojan Columns; (a) Simplified queries, Simplified dataset
* Mike Stonebraker et al. C-Store: A Column Oriented DBMS. VLDB 2005

48 Simplified Queries, Simplified Dataset* (same figure)
Annotation: 5x

49 Simplified Queries, Original Dataset
Figure: Query Time (sec) for Q1 to Q7, Standard Row vs. Trojan Columns; (b) Simplified queries, Unmodified dataset

50 Simplified Queries, Original Dataset (same figure)
Annotation: a factor "x" whose value is missing in the source

51 Original Queries, Original Dataset*: The Good Queries
Figure: Query Time (sec) for Q1, Q6, Q12, Q14, Standard Row vs. Trojan Columns
* tpch.org/tpch

52 Original Queries, Original Dataset*: The Good Queries (same figure)
Annotation: 9x

53 What are the trade-offs?

54 Original Queries, Original Dataset*: The Bad Queries
Figure: Query Time (sec) for Q3, Q5, Q10, Q19, Standard Row vs. Trojan Columns
* tpch.org/tpch

55 Micro-Benchmark: Improvement over Row-store
Figure 7: Trojan Columns improvement factor in DBMS (caption truncated in the source). Axes: # referenced attributes (r); selectivity (fraction of tuples accessed), 1E-06 to 1E+00

56 Micro-Benchmark: Improvement over Row-store (same figure)
Annotation: Not Affected

57 Micro-Benchmark: Improvement over Row-store (same figure)
Annotations: Not Affected, Affected

58 How far are we?

59 Four Systems
Commercial Row-store (Standard Row)
Trojan Columns in commercial Row-store
Commercial Row-store with vendor support for column technology (DBMS-Y)
Commercial Column-store (DBMS-Z): (a) default TPC-H schema, (b) tuned schema

60 TPC-H Benchmark
Figure: Query Time (sec) for Q1, Q6, Q12, Q14; systems: Standard Row, DBMS-Y, DBMS-Z (b), Trojan Columns, DBMS-Z (a)

61 TPC-H Benchmark (same figure)
Annotation: Comparable or Better!

62 TPC-H Benchmark (same figure)
Annotations: Comparable or Better! Still to catch up!

63 What about query optimization?

64 Rules out query optimization?

65 Rules out query optimization? NO!

66 Rules out query optimization? NO!
QO with aggregate UDFs [SIGMOD 06]
Manimal [WebDB 10]
HadoopToSQL [EuroSys 10]
Black box QO [VLDB 12]

67

68 The UDF Business Model

69 UDFs
Not just for application-specific code
Integrate core database functionality after the fact
Column layouts are just one example!
Meet customer demands quickly
Provide quick feedback before new product release

70 Summary
Performance axis: slow, good-enough, fast
Shown: row-store

71 Summary
Performance axis: slow, good-enough, fast
Shown: row-store, native column-store

72 Summary
Performance axis: slow, good-enough, fast
Shown: row-store, Trojan Columns, native column-store


More information

Processing a Trillion Cells per Mouse Click

Processing a Trillion Cells per Mouse Click Processing a Trillion Cells per Mouse Click Common Sense 13/01 21.3.2013 Alex Hall, Google Zurich Olaf Bachmann, Robert Buessow, Silviu Ganceanu, Marc Nunkesser Outline of the Talk AdSpam team at Google

More information

Agent 7 which languages? skills?

Agent 7 which languages? skills? Agent 7 which languages? skills? select * from languagerel where agent_id = 7 lang_id agent_id 3 7 14 7 19 7 20 7 agent 7 speaks 4 languages select * from skillrel where agent_id = 7 skill_id agent_id

More information

Time Complexity and Parallel Speedup to Compute the Gamma Summarization Matrix

Time Complexity and Parallel Speedup to Compute the Gamma Summarization Matrix Time Complexity and Parallel Speedup to Compute the Gamma Summarization Matrix Carlos Ordonez, Yiqun Zhang Department of Computer Science, University of Houston, USA Abstract. We study the serial and parallel

More information

Toward timely, predictable and cost-effective data analytics. Renata Borovica-Gajić DIAS, EPFL

Toward timely, predictable and cost-effective data analytics. Renata Borovica-Gajić DIAS, EPFL Toward timely, predictable and cost-effective data analytics Renata Borovica-Gajić DIAS, EPFL Big data proliferation Big data is when the current technology does not enable users to obtain timely, cost-effective,

More information

Column-Oriented Database Systems. Liliya Rudko University of Helsinki

Column-Oriented Database Systems. Liliya Rudko University of Helsinki Column-Oriented Database Systems Liliya Rudko University of Helsinki 2 Contents 1. Introduction 2. Storage engines 2.1 Evolutionary Column-Oriented Storage (ECOS) 2.2 HYRISE 3. Database management systems

More information

Query Processing & Optimization

Query Processing & Optimization Query Processing & Optimization 1 Roadmap of This Lecture Overview of query processing Measures of Query Cost Selection Operation Sorting Join Operation Other Operations Evaluation of Expressions Introduction

More information

From SQL-query to result Have a look under the hood

From SQL-query to result Have a look under the hood From SQL-query to result Have a look under the hood Classical view on RA: sets Theory of relational databases: table is a set Practice (SQL): a relation is a bag of tuples R π B (R) π B (R) A B 1 1 2

More information

XML Systems & Benchmarks

XML Systems & Benchmarks XML Systems & Benchmarks Christoph Staudt Peter Chiv Saarland University, Germany July 1st, 2003 Main Goals of our talk Part I Show up how databases and XML come together Make clear the problems that arise

More information

Query Processing. Debapriyo Majumdar Indian Sta4s4cal Ins4tute Kolkata DBMS PGDBA 2016

Query Processing. Debapriyo Majumdar Indian Sta4s4cal Ins4tute Kolkata DBMS PGDBA 2016 Query Processing Debapriyo Majumdar Indian Sta4s4cal Ins4tute Kolkata DBMS PGDBA 2016 Slides re-used with some modification from www.db-book.com Reference: Database System Concepts, 6 th Ed. By Silberschatz,

More information

Graph Analytics using Vertica Relational Database

Graph Analytics using Vertica Relational Database Graph Analytics using ertica Relational Database Alekh Jindal* Samuel Madden Malú Castellanos Meichun Hsu Microsoft MIT ertica ertica * work done while at MIT Motivation for graphs on DB Data anyways in

More information

HYRISE In-Memory Storage Engine

HYRISE In-Memory Storage Engine HYRISE In-Memory Storage Engine Martin Grund 1, Jens Krueger 1, Philippe Cudre-Mauroux 3, Samuel Madden 2 Alexander Zeier 1, Hasso Plattner 1 1 Hasso-Plattner-Institute, Germany 2 MIT CSAIL, USA 3 University

More information

EECS 647: Introduction to Database Systems

EECS 647: Introduction to Database Systems EECS 647: Introduction to Database Systems Instructor: Luke Huan Spring 2009 Stating Points A database A database management system A miniworld A data model Conceptual model Relational model 2/24/2009

More information

class 17 updates prof. Stratos Idreos

class 17 updates prof. Stratos Idreos class 17 updates prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ UPDATE table_name SET column1=value1,column2=value2,... WHERE some_column=some_value INSERT INTO table_name VALUES (value1,value2,value3,...)

More information

Optimizing Communication for Multi- Join Query Processing in Cloud Data Warehouses

Optimizing Communication for Multi- Join Query Processing in Cloud Data Warehouses Optimizing Communication for Multi- Join Query Processing in Cloud Data Warehouses Swathi Kurunji, Tingjian Ge, Xinwen Fu, Benyuan Liu, Cindy X. Chen Computer Science Department, University of Massachusetts

More information

Chapter 12: Query Processing

Chapter 12: Query Processing Chapter 12: Query Processing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Overview Chapter 12: Query Processing Measures of Query Cost Selection Operation Sorting Join

More information

Accelerating Queries with Group-By and Join by Groupjoin

Accelerating Queries with Group-By and Join by Groupjoin Accelerating Queries with Group-By and Join by Groupjoin Guido Moerkotte and Thomas Neumann Sept. 2011 Guido Moerkotte A Week at Max Data Seattle 1 / 36 A Week at Max Data Guido Moerkotte A Week at Max

More information

A Sample Solution to the Midterm Test

A Sample Solution to the Midterm Test CS3600.1 Introduction to Database System Fall 2016 Dr. Zhizhang Shen A Sample Solution to the Midterm Test 1. A couple of W s(10) (a) Why is it the case that, by default, there are no duplicated tuples

More information

CSE 344 FEBRUARY 14 TH INDEXING

CSE 344 FEBRUARY 14 TH INDEXING CSE 344 FEBRUARY 14 TH INDEXING EXAM Grades posted to Canvas Exams handed back in section tomorrow Regrades: Friday office hours EXAM Overall, you did well Average: 79 Remember: lowest between midterm/final

More information

CSE 344 APRIL 20 TH RDBMS INTERNALS

CSE 344 APRIL 20 TH RDBMS INTERNALS CSE 344 APRIL 20 TH RDBMS INTERNALS ADMINISTRIVIA OQ5 Out Datalog Due next Wednesday HW4 Due next Wednesday Written portion (.pdf) Coding portion (one.dl file) TODAY Back to RDBMS Query plans and DBMS

More information

Dynamic Optimization of Generalized SQL Queries with Horizontal Aggregations Using K-Means Clustering

Dynamic Optimization of Generalized SQL Queries with Horizontal Aggregations Using K-Means Clustering Dynamic Optimization of Generalized SQL Queries with Horizontal Aggregations Using K-Means Clustering Abstract Mrs. C. Poongodi 1, Ms. R. Kalaivani 2 1 PG Student, 2 Assistant Professor, Department of

More information

Benchmarking ETL Workflows

Benchmarking ETL Workflows Benchmarking ETL Workflows Alkis Simitsis 1, Panos Vassiliadis 2, Umeshwar Dayal 1, Anastasios Karagiannis 2, Vasiliki Tziovara 2 1 HP Labs, Palo Alto, CA, USA, {alkis, Umeshwar.Dayal}@hp.com 2 University

More information

Chapter 12: Query Processing

Chapter 12: Query Processing Chapter 12: Query Processing Overview Catalog Information for Cost Estimation $ Measures of Query Cost Selection Operation Sorting Join Operation Other Operations Evaluation of Expressions Transformation

More information

Database Group Research Overview. Immanuel Trummer

Database Group Research Overview. Immanuel Trummer Database Group Research Overview Immanuel Trummer Talk Overview User Query Data Analysis Result Processing Talk Overview Fact Checking Query User Data Vocalization Data Analysis Result Processing Query

More information

Beyond EXPLAIN. Query Optimization From Theory To Code. Yuto Hayamizu Ryoji Kawamichi. 2016/5/20 PGCon Ottawa

Beyond EXPLAIN. Query Optimization From Theory To Code. Yuto Hayamizu Ryoji Kawamichi. 2016/5/20 PGCon Ottawa Beyond EXPLAIN Query Optimization From Theory To Code Yuto Hayamizu Ryoji Kawamichi 2016/5/20 PGCon 2016 @ Ottawa Historically Before Relational Querying was physical Need to understand physical organization

More information

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe CHAPTER 19 Query Optimization Introduction Query optimization Conducted by a query optimizer in a DBMS Goal: select best available strategy for executing query Based on information available Most RDBMSs

More information

Announcement. Reading Material. Overview of Query Evaluation. Overview of Query Evaluation. Overview of Query Evaluation 9/26/17

Announcement. Reading Material. Overview of Query Evaluation. Overview of Query Evaluation. Overview of Query Evaluation 9/26/17 Announcement CompSci 516 Database Systems Lecture 10 Query Evaluation and Join Algorithms Project proposal pdf due on sakai by 5 pm, tomorrow, Thursday 09/27 One per group by any member Instructor: Sudeepa

More information

Database Design. Wenfeng Xu Hanxiang Zhao

Database Design. Wenfeng Xu Hanxiang Zhao Database Design Wenfeng Xu Hanxiang Zhao Automated Partitioning Design in Parallel Database Systems MPP system: A distributed computer system which consists of many individual nodes, each of which is essentially

More information

In-Memory Data Management Jens Krueger

In-Memory Data Management Jens Krueger In-Memory Data Management Jens Krueger Enterprise Platform and Integration Concepts Hasso Plattner Intitute OLTP vs. OLAP 2 Online Transaction Processing (OLTP) Organized in rows Online Analytical Processing

More information

Robustness in Automatic Physical Database Design

Robustness in Automatic Physical Database Design Robustness in Automatic Physical Database Design Kareem El Gebaly David R. Cheriton School of Computer Science University of Waterloo Technical Report CS-2007-29 Robustness in Automatic Physical Database

More information

(Extended) Entity Relationship

(Extended) Entity Relationship 03 - Database Design, UML and (Extended) Entity Relationship Modeling CS530 Database Architecture Models and Design Prof. Ian HORROCKS Dr. Robert STEVENS In this Section Topics Covered Database Design

More information

Horizontal Aggregation in SQL to Prepare Dataset for Generation of Decision Tree using C4.5 Algorithm in WEKA

Horizontal Aggregation in SQL to Prepare Dataset for Generation of Decision Tree using C4.5 Algorithm in WEKA Horizontal Aggregation in SQL to Prepare Dataset for Generation of Decision Tree using C4.5 Algorithm in WEKA Mayur N. Agrawal 1, Ankush M. Mahajan 2, C.D. Badgujar 3, Hemant P. Mande 4, Gireesh Dixit

More information

On-Line Application Processing

On-Line Application Processing On-Line Application Processing WAREHOUSING DATA CUBES DATA MINING 1 Overview Traditional database systems are tuned to many, small, simple queries. Some new applications use fewer, more time-consuming,

More information

Database Applications (15-415)

Database Applications (15-415) Database Applications (15-415) DBMS Internals- Part VI Lecture 14, March 12, 2014 Mohammad Hammoud Today Last Session: DBMS Internals- Part V Hash-based indexes (Cont d) and External Sorting Today s Session:

More information

Optimizing OLAP Cube Processing on Solid State Drives

Optimizing OLAP Cube Processing on Solid State Drives Optimizing OLAP Cube Processing on Solid State Drives Zhibo Chen University of Houston Houston, TX 77204, USA Carlos Ordonez University of Houston Houston, TX 77204, USA ABSTRACT Hardware technology has

More information

Actian Vector Benchmarks. Cloud Benchmarking Summary Report

Actian Vector Benchmarks. Cloud Benchmarking Summary Report Actian Vector Benchmarks Cloud Benchmarking Summary Report April 2018 The Cloud Database Performance Benchmark Executive Summary The table below shows Actian Vector as evaluated against Amazon Redshift,

More information

Introduction to Data Management CSE 344

Introduction to Data Management CSE 344 Introduction to Data Management CSE 344 Lecture 24: MapReduce CSE 344 - Fall 2016 1 HW8 is out Last assignment! Get Amazon credits now (see instructions) Spark with Hadoop Due next wed CSE 344 - Fall 2016

More information

An Initial Study of Overheads of Eddies

An Initial Study of Overheads of Eddies An Initial Study of Overheads of Eddies Amol Deshpande University of California Berkeley, CA USA amol@cs.berkeley.edu Abstract An eddy [2] is a highly adaptive query processing operator that continuously

More information

A Composite Benchmark for Online Transaction Processing and Operational Reporting

A Composite Benchmark for Online Transaction Processing and Operational Reporting A Composite Benchmark for Online Transaction Processing and Operational Reporting Anja Bog, Jens Krüger, Jan Schaffner Hasso Plattner Institute, University of Potsdam August-Bebel-Str 88, 14482 Potsdam,

More information

Sub-Second Response Times with New In-Memory Analytics in MicroStrategy 10. Onur Kahraman

Sub-Second Response Times with New In-Memory Analytics in MicroStrategy 10. Onur Kahraman Sub-Second Response Times with New In-Memory Analytics in MicroStrategy 10 Onur Kahraman High Performance Is No Longer A Nice To Have In Analytical Applications Users expect Google Like performance from

More information

Query Optimization Overview

Query Optimization Overview Query Optimization Overview parsing, syntax checking semantic checking check existence of referenced relations and attributes disambiguation of overloaded operators check user authorization query rewrites

More information

Generating Data Sets for Data Mining Analysis using Horizontal Aggregations in SQL

Generating Data Sets for Data Mining Analysis using Horizontal Aggregations in SQL Generating Data Sets for Data Mining Analysis using Horizontal Aggregations in SQL Sanjay Gandhi G 1, Dr.Balaji S 2 Associate Professor, Dept. of CSE, VISIT Engg College, Tadepalligudem, Scholar Bangalore

More information

Preparation of Data Set for Data Mining Analysis using Horizontal Aggregation in SQL

Preparation of Data Set for Data Mining Analysis using Horizontal Aggregation in SQL Preparation of Data Set for Data Mining Analysis using Horizontal Aggregation in SQL Vidya Bodhe P.G. Student /Department of CE KKWIEER Nasik, University of Pune, India vidya.jambhulkar@gmail.com Abstract

More information

1 Algebraic Query Optimization

1 Algebraic Query Optimization 1 Algebraic Query Optimization 1.1 Relational Query Languages We have encountered different query languages for relational databases: Relational Algebra Tuple Relational Calculus SQL (Structured Query

More information

CSCI1270 Introduction to Database Systems

CSCI1270 Introduction to Database Systems CSCI1270 Introduction to Database Systems with thanks to Prof. George Kollios, Boston University Prof. Mitch Cherniack, Brandeis University Prof. Avi Silberschatz, Yale University 1.1 What is a Database

More information

Principles of Database Management Systems

Principles of Database Management Systems Principles of Database Management Systems 5: Query Processing Pekka Kilpeläinen (partially based on Stanford CS245 slide originals by Hector Garcia-Molina, Jeff Ullman and Jennifer Widom) Query Processing

More information

Shark: Hive (SQL) on Spark

Shark: Hive (SQL) on Spark Shark: Hive (SQL) on Spark Reynold Xin UC Berkeley AMP Camp Aug 21, 2012 UC BERKELEY SELECT page_name, SUM(page_views) views FROM wikistats GROUP BY page_name ORDER BY views DESC LIMIT 10; Stage 0: Map-Shuffle-Reduce

More information

Toward a Progress Indicator for Database Queries

Toward a Progress Indicator for Database Queries Toward a Progress Indicator for Database Queries Gang Luo Jeffrey F. Naughton Curt J. Ellmann Michael W. Watzke University of Wisconsin-Madison NCR Advance Development Lab {gangluo, naughton}@cs.wisc.edu

More information

Avoiding Sorting and Grouping In Processing Queries

Avoiding Sorting and Grouping In Processing Queries Avoiding Sorting and Grouping In Processing Queries Outline Motivation Simple Example Order Properties Grouping followed by ordering Order Property Optimization Performance Results Conclusion Motivation

More information

April Copyright 2013 Cloudera Inc. All rights reserved.

April Copyright 2013 Cloudera Inc. All rights reserved. Hadoop Beyond Batch: Real-time Workloads, SQL-on- Hadoop, and the Virtual EDW Headline Goes Here Marcel Kornacker marcel@cloudera.com Speaker Name or Subhead Goes Here April 2014 Analytic Workloads on

More information