Overview of Data Exploration Techniques. Stratos Idreos, Olga Papaemmanouil, Surajit Chaudhuri
1 Overview of Data Exploration Techniques Stratos Idreos, Olga Papaemmanouil, Surajit Chaudhuri
2 data exploration: not always sure what we are looking for (until we find it); data has always been big: volume, velocity, variety, veracity
3 content structure: user interaction, middleware, database kernel
4 user interaction visualization 45 min interfaces prefetching approximation sampling 45 min middleware adaptive[ indexing, loading, storage] kernel 45 min
5 Part 3* (*for Parts 1 and 2, please look at the websites of the tutorial co-authors)
6 applications sql algorithms/operators database kernel cpu caches memory gpu data data index disk
7 too many preparation options lead to complex installation schema storage load indexes query timeline expert users - idle time - workload knowledge
8 users/applications declarative interface ask what you want DBA db system
9 users/applications need to choose the proper system db administrator 1 db administrator 2 data system 1 data system 2
10 how can we prepare if we do not know what we are up against? (loading, indexing, storage, ...)
11 how can we prepare if we do not know what we are up against? (loading, indexing, storage, ...) data systems kernels tailored for data exploration: no preparation - easy to use - fast
12 adaptive loading adaptive indexing adaptive storage vision declarative and flexible systems sampling dbtouch
13 indexing load tune query tune= create proper indices offline performance X
14 indexing load tune query tune= create proper indices offline performance X but it depends on the workload! which indices to build? on which data parts? and when to build them?
15 big data V's: volume, velocity, variety, veracity; what can go wrong? not enough space to index all data; not enough idle time to finish proper tuning; by the time we finish tuning, the workload changes; not enough money - energy - resources
17 database cracking
18 database cracking idle time workload knowledge external tools human control
19 database cracking auto-tuning database kernels incremental, adaptive, partial indexing idle time workload knowledge external tools human control
20 initialization querying indexing database cracking auto-tuning database kernels incremental, adaptive, partial indexing idle time workload knowledge external tools human control
22 initialization querying indexing database cracking auto-tuning database kernels incremental, adaptive, partial indexing; every query is treated as an advice on how data should be stored (idle time, workload knowledge, external tools, human control)
23 column-store database a fixed-width and dense array per attribute relation1/table Database Cracking CIDR 2007
25 column A Q1: select R.A from R where R.A>10 and R.A<14 Database Cracking CIDR 2007
27 column A Q1: select R.A from R where R.A>10 and R.A<14 sort Database Cracking CIDR 2007
28 column A Q1: select R.A from R where R.A>10 and R.A<14 sort binary search Database Cracking CIDR 2007
29 column A Q1: select R.A from R where R.A>10 and R.A<14 sort binary search result Database Cracking CIDR 2007
30 column A Q1: select R.A from R where R.A>10 and R.A<14 sort binary search result time + knowledge Database Cracking CIDR 2007
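The sort-then-binary-search path on these slides can be sketched as follows. This is a minimal illustration, not code from the talk: the function names and the sample values are assumptions, and the strict bounds mirror Q1 (R.A>10 and R.A<14).

```python
import bisect

def build_full_index(column):
    """Pay the full sorting cost up front (the 'time + knowledge' investment)."""
    return sorted(column)

def range_select(sorted_column, low, high):
    """Return values v with low < v < high using two binary searches."""
    start = bisect.bisect_right(sorted_column, low)   # first position with v > low
    end = bisect.bisect_left(sorted_column, high)     # first position with v >= high
    return sorted_column[start:end]

# Illustrative column values (hypothetical data).
column_a = [13, 16, 4, 9, 2, 12, 7, 1, 19, 3, 14, 11, 8, 6]
index_a = build_full_index(column_a)
q1_result = range_select(index_a, 10, 14)
```

Every later query is cheap, but the first query cannot start until the whole column has been sorted.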
31 Q1: select R.A from R where R.A>10 and R.A<14 column A Database Cracking CIDR 2007
32 Q1: select R.A from R where R.A>10 and R.A<14 column A piece1: A<=10 Database Cracking CIDR 2007
33 Q1: select R.A from R where R.A>10 and R.A<14 column A piece1: A<=10 piece2: 10<A<14 Database Cracking CIDR 2007
34 Q1: select R.A from R where R.A>10 and R.A<14 column A piece1: A<=10 piece2: 10<A<14 piece3: A>=14 Database Cracking CIDR 2007
35 column A Q1: select R.A from R where R.A>10 and R.A<14 piece1: A<=10 piece2: 10<A<14 piece3: A>=14 result Database Cracking CIDR 2007
36 gain knowledge on how data is organized column A Q1: select R.A from R where R.A>10 and R.A<14 piece1: A<=10 piece2: 10<A<14 piece3: A>=14 result Database Cracking CIDR 2007
37 gain knowledge on how data is organized column A Q1: select R.A from R where R.A>10 and R.A<14 piece1: A<=10 piece2: 10<A<14 piece3: A>=14 result dynamically/on-the-fly within the select-operator Database Cracking CIDR 2007
38 Q1: select R.A from R where R.A>10 and R.A<14 Q2: select R.A from R where R.A>7 and R.A<=16 column A piece1: A<=10 piece2: 10<A<14 piece3: A>=14 dynamically/on-the-fly within the select-operator Database Cracking CIDR 2007
40 column A Q1: select R.A from R where R.A>10 and R.A<14 Q2: select R.A from R where R.A>7 and R.A<=16 after Q1: piece1: A<=10 piece2: 10<A<14 piece3: A>=14 after Q2: piece1: A<=7 piece2: 7<A<=10 dynamically/on-the-fly within the select-operator Database Cracking CIDR 2007
41 column A Q1: select R.A from R where R.A>10 and R.A<14 Q2: select R.A from R where R.A>7 and R.A<=16 after Q1: piece1: A<=10 piece2: 10<A<14 piece3: A>=14 after Q2: piece1: A<=7 piece2: 7<A<=10 piece3: 10<A<14 dynamically/on-the-fly within the select-operator Database Cracking CIDR 2007
42 column A Q1: select R.A from R where R.A>10 and R.A<14 Q2: select R.A from R where R.A>7 and R.A<=16 after Q1: piece1: A<=10 piece2: 10<A<14 piece3: A>=14 after Q2: piece1: A<=7 piece2: 7<A<=10 piece3: 10<A<14 piece4: 14<=A<=16 piece5: A>16 dynamically/on-the-fly within the select-operator Database Cracking CIDR 2007
44 the more we crack, the more we learn column A Q1: select R.A from R where R.A>10 and R.A<14 Q2: select R.A from R where R.A>7 and R.A<=16 after Q1: piece1: A<=10 piece2: 10<A<14 piece3: A>=14 after Q2: piece1: A<=7 piece2: 7<A<=10 piece3: 10<A<14 piece4: 14<=A<=16 piece5: A>16 dynamically/on-the-fly within the select-operator Database Cracking CIDR 2007
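The cracking steps on these slides can be sketched as below. This is a toy version under stated assumptions, not the kernel implementation from the paper: the class name `CrackerColumn`, the list-based cracker index, and the sample values are all hypothetical, and ranges are expressed as half-open integer intervals (Q1, 10<A<14, becomes [11, 14); Q2, 7<A<=16, becomes [8, 17)).

```python
import bisect

class CrackerColumn:
    """Toy cracker column: a copy of the column plus a cracker index."""

    def __init__(self, values):
        self.data = list(values)
        self.pivots = []      # sorted pivot values seen so far
        self.positions = []   # positions[i] = first index holding a value >= pivots[i]

    def _crack(self, pivot):
        """Partition the one piece that contains pivot; return the split position."""
        i = bisect.bisect_left(self.pivots, pivot)
        if i < len(self.pivots) and self.pivots[i] == pivot:
            return self.positions[i]              # already cracked here, nothing to do
        lo = self.positions[i - 1] if i > 0 else 0
        hi = self.positions[i] if i < len(self.positions) else len(self.data)
        piece = self.data[lo:hi]
        smaller = [v for v in piece if v < pivot]
        self.data[lo:hi] = smaller + [v for v in piece if v >= pivot]
        split = lo + len(smaller)
        self.pivots.insert(i, pivot)
        self.positions.insert(i, split)
        return split

    def select_range(self, low, high):
        """Return values v with low <= v < high, cracking on both bounds."""
        start = self._crack(low)
        end = self._crack(high)
        return self.data[start:end]

# Illustrative column values (hypothetical data).
values = [13, 16, 4, 9, 2, 12, 7, 1, 19, 3, 14, 11, 8, 6]
col = CrackerColumn(values)
q1 = col.select_range(11, 14)   # Q1: 10 < A < 14
q2 = col.select_range(8, 17)    # Q2: 7 < A <= 16
```

Each query reorganizes at most the two pieces that contain its bounds, and the qualifying values end up contiguous, so the result is a plain slice of the column.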
45 Database Cracking CIDR 2007
46 select [15,55] Database Cracking CIDR 2007
49 select [15,55] select [15,55] Database Cracking CIDR 2007
51 touch at most two pieces at a time pieces become smaller and smaller select [15,55] select [15,55] Database Cracking CIDR 2007
52 continuous adaptation set-up: 100K random selections, random selectivity, random value ranges in a 10 million integer column [Figure: response time (secs) vs. query sequence (x1000) for Scan, Crack, and Full Index] Database Cracking CIDR 2007
53 continuous adaptation set-up: 100K random selections, random selectivity, random value ranges in a 10 million integer column; almost no initialization overhead [Figure: response time (secs) vs. query sequence (x1000) for Scan, Crack, and Full Index] Database Cracking CIDR 2007
54 continuous adaptation set-up: 100K random selections, random selectivity, random value ranges in a 10 million integer column; almost no initialization overhead; continuous improvement [Figure: response time (secs) vs. query sequence (x1000) for Scan, Crack, and Full Index] Database Cracking CIDR 2007
56 continuous adaptation set-up: 10K random selections, selectivity 10%, random value ranges in a 30 million integer column [Figure: cumulative average response time (secs) vs. query sequence for Full Index, Scan, and Crack] Database Cracking CIDR 2007
58 continuous adaptation set-up: 10K random selections, selectivity 10%, random value ranges in a 30 million integer column; 10K queries later, Full Index still has not amortized the initialization costs [Figure: cumulative average response time (secs) vs. query sequence for Full Index, Scan, and Crack] Database Cracking CIDR 2007
59 A table1
60 table
61 select R.A from R where R.A>10 and R.A<14 table
62 select R.A from R where R.A>10 and R.A<14 table1 select max(r.a),max(r.b),max(s.a),max(s.b) from R,S where v1 <R.C<v2 and v3 <R.D<v4 and v5 <R.E<v6 and k1 <S.C<k2 and k3 <S.D<k4 and k5 <S.E<k and R.F = S.F......
63 select R.A from R where R.A>10 and R.A<14 table1 select max(r.a),max(r.b),max(s.a),max(s.b) from R,S where v1 <R.C<v2 and v3 <R.D<v4 and v5 <R.E<v6 and k1 <S.C<k2 and k3 <S.D<k4 and k5 <S.E<k and R.F = S.F updates... joins concurrency control......
64 cracking databases: basics (CIDR07), updates (SIGMOD07), >1 columns (SIGMOD09), storage restrictions (SIGMOD09), robustness (PVLDB12), algorithms (PVLDB11), benchmarking (TPCTC10), concurrency control (PVLDB12), adaptive storage (SIGMOD14), time-series (SIGMOD14), hadoop (Yale/Saarland), multi-cores (SIGMOD15), b-trees (HP Labs)
65 cracking tangram base data table 1 as queries arrive... table 2
66 cracking tangram base data table 1 as queries arrive... table 1 table 2 table 2
67 cracking tangram base data table 1 as queries arrive... table 1 partial materialization table 2 table 2
68 cracking tangram base data table 1 as queries arrive... table 1 partial materialization partial indexing table 2 table 2
69 cracking tangram base data table 1 as queries arrive... table 1 partial materialization partial indexing continuous adaptation table 2 table 2
70 cracking tangram base data table 1 as queries arrive... table 1 x partial materialization partial indexing continuous adaptation storage adaptation table 2 table 2 x x
71 cracking tangram base data table 1 as queries arrive... table 1 partial materialization partial indexing continuous adaptation storage adaptation table 2 table 2
72 cracking tangram base data table 1 table 2 as queries arrive... table 1 table 2 partial materialization partial indexing continuous adaptation storage adaptation no tuple reconstruction
73 cracking tangram base data table 1 table 2 as queries arrive... table 1 table 2 partial materialization partial indexing continuous adaptation storage adaptation no tuple reconstruction adaptive alignment
75 cracking tangram base data table 1 table 2 as queries arrive... table 1 table 2 partial materialization partial indexing continuous adaptation storage adaptation no tuple reconstruction adaptive alignment sort in caches
76 cracking tangram base data table 1 table 2 as queries arrive... table 1 table 2 partial materialization partial indexing continuous adaptation storage adaptation no tuple reconstruction adaptive alignment sort in caches crack joins
77 cracking tangram base data table 1 table 2 as queries arrive... table 1 table 2 q1 q2 partial materialization partial indexing continuous adaptation storage adaptation no tuple reconstruction adaptive alignment sort in caches crack joins lightweight locking
78 cracking tangram base data table 1 table 2 as queries arrive... table 1 table 2 query random partial materialization partial indexing continuous adaptation storage adaptation no tuple reconstruction adaptive alignment sort in caches crack joins lightweight locking stochastic cracking
79 adaptive loading adaptive indexing adaptive storage vision declarative and flexible systems sampling dbtouch
80 loading: load, tune, query; copy data inside the database; database now has full control; slow process; not all data might be needed all the time Adaptive Loading, CIDR 11
81 database vs. unix tools single query cost (secs) 1 file, 4 attributes, 1 billion tuples DB Awk Adaptive Loading, CIDR 11
82 database vs. unix tools single query cost (secs) 1 file, 4 attributes, 1 billion tuples DB Awk break down db cost Adaptive Loading, CIDR 11
83 database vs. unix tools single query cost (secs) 1 file, 4 attributes, 1 billion tuples DB Awk break down db cost loading is a major bottleneck Adaptive Loading, CIDR 11
84 database vs. unix tools single query cost (secs) 1 file, 4 attributes, 1 billion tuples DB Awk break down db cost loading is a major bottleneck but writing/maintaining scripts does not scale Adaptive Loading, CIDR 11
85 adaptive loading load/touch only what is needed and only when it is needed Adaptive Loading, CIDR 11
86 but raw data access is expensive tokenizing - parsing - no indexing - no statistics challenge: fast raw data access Adaptive Loading, CIDR 11
87 query plan Adaptive Loading, CIDR 11
88 query plan scan Adaptive Loading, CIDR 11
89 query plan scan db Adaptive Loading, CIDR 11
90 query plan scan loading Adaptive Loading, CIDR 11
91 query plan scan files loading access raw data adaptively on-the-fly Adaptive Loading, CIDR 11
92 query plan scan files access raw data adaptively on-the-fly cache loading Adaptive Loading, CIDR 11
93 selective parsing file indexing file splitting online statistics query plan scan files access raw data adaptively on-the-fly cache loading Adaptive Loading, CIDR 11
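The idea of touching raw data only when a query needs it, and caching what was parsed, can be sketched as below. This is a toy illustration under stated assumptions, not the system from the Adaptive Loading or NoDB papers: the class `RawFileTable`, its methods, the comma-separated integer format, and the `parsed_fields` counter are all hypothetical.

```python
import io

class RawFileTable:
    """Toy in-situ table: a column is parsed from raw CSV text on first use only."""

    def __init__(self, raw_text, column_names):
        self.raw_text = raw_text
        self.column_names = list(column_names)
        self.cache = {}           # column name -> parsed list of ints
        self.parsed_fields = 0    # how many fields were converted so far

    def column(self, name):
        """Return one column, parsing it from the raw text only if not cached."""
        if name not in self.cache:
            idx = self.column_names.index(name)
            values = []
            for line in io.StringIO(self.raw_text):
                if not line.strip():
                    continue
                fields = line.rstrip("\n").split(",")   # tokenizing
                values.append(int(fields[idx]))          # selective parsing: one field per row
                self.parsed_fields += 1
            self.cache[name] = values
        return self.cache[name]

    def select_where(self, out_col, pred_col, predicate):
        """Evaluate a one-column filter, loading only the two columns involved."""
        preds = self.column(pred_col)
        outs = self.column(out_col)
        return [o for o, p in zip(outs, preds) if predicate(p)]

raw = "1,10,100\n2,20,200\n3,30,300\n"
table = RawFileTable(raw, ["a", "b", "c"])
first = table.select_where("a", "b", lambda v: v > 15)
fields_after_first = table.parsed_fields
second = table.select_where("a", "b", lambda v: v > 25)
```

Column `c` is never converted because no query asked for it, and the second query is served entirely from the cache.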
94 three graphs from paper NoDB, SIGMOD 2012
99 reducing data-to-query time three graphs from paper NoDB, SIGMOD 2012
100 adaptive loading adaptive indexing adaptive storage vision declarative and flexible systems sampling dbtouch
101 rows & columns row-store column-store H2O, SIGMOD 14
102 no fixed optimal solution [Figure: execution time (sec) vs. attributes accessed (%) for DBMS-C and DBMS-R] H2O, SIGMOD 14
103 rows & columns row-store column-store H2O, SIGMOD 14
104 rows & columns row-store column-store hybrid-store H2O, SIGMOD 14
105 which layout is best? H2O, SIGMOD 14
108 which layout is best? too many combinations to maintain in parallel H2O, SIGMOD 14
109 query cost $q(L) = \sum_{i=1}^{|L|} \max(cost_i^{IO}, cost_i^{CPU})$; for a given query we can know which layout is best: the one that will cause the fewest cache misses H2O, SIGMOD 14
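The cost formula on this slide can be evaluated with a few lines of code. This is a minimal sketch: the function names and all cost numbers are made up for illustration, and each query is assumed to come with one (IO cost, CPU cost) pair per layout it accesses.

```python
def query_cost(layout_costs):
    """q(L): sum over the accessed layouts of max(IO cost, CPU cost)."""
    return sum(max(io_cost, cpu_cost) for io_cost, cpu_cost in layout_costs)

def best_layout(candidates):
    """Pick the candidate whose estimated query cost is lowest."""
    return min(candidates, key=lambda name: query_cost(candidates[name]))

# Hypothetical (IO, CPU) estimates for one query under two candidate designs.
candidates = {
    "column-store": [(1.0, 2.0), (1.0, 2.0)],  # two column groups touched
    "row-store": [(5.0, 1.0)],                 # one wide group touched
}
chosen = best_layout(candidates)
```

Taking the maximum rather than the sum per layout is read here as charging only the dominant resource for each layout; that reading is an assumption about the formula, not a statement from the slide.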
110 if we know all queries up front we can choose the layouts; adaptive storage: continuously adapt layouts based on incoming queries H2O, SIGMOD 14
111 but computing all possible combinations is expensive; query: select A+B+C+D from R where A<10 and E>10; 1. deal only with attributes referenced in queries 2. handle select clause separately from where clause 3. start from pure column-store and build up 4. stop when no improvement possible H2O, SIGMOD 14
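The four steps above can be sketched as a greedy search over column groups. This is only an illustration under stated assumptions, not the H2O algorithm itself: the toy cost model (group width plus a fixed `OVERHEAD` per group touched), the function names, and the pairwise-merge strategy are all hypothetical.

```python
from itertools import combinations

OVERHEAD = 2  # hypothetical fixed cost for every column group a clause touches

def layout_cost(groups, patterns):
    """Toy cost: each access pattern pays width + OVERHEAD for every group it touches."""
    return sum(len(g) + OVERHEAD for p in patterns for g in groups if g & p)

def greedy_layout(select_attrs, where_attrs):
    # Step 2: the select clause and the where clause are separate access patterns.
    patterns = [frozenset(select_attrs), frozenset(where_attrs)]
    # Step 1: consider only attributes referenced by the query.
    referenced = sorted(set(select_attrs) | set(where_attrs))
    # Step 3: start from a pure column-store (one group per attribute).
    groups = [frozenset([a]) for a in referenced]
    best = layout_cost(groups, patterns)
    while len(groups) > 1:
        candidates = []
        for g1, g2 in combinations(groups, 2):
            merged = [g for g in groups if g not in (g1, g2)] + [g1 | g2]
            candidates.append((layout_cost(merged, patterns), merged))
        cost, merged = min(candidates, key=lambda c: c[0])
        # Step 4: stop when no merge improves the cost.
        if cost >= best:
            break
        groups, best = merged, cost
    return groups, best

# select A+B+C+D from R where A<10 and E>10
layout, cost = greedy_layout(select_attrs="ABCD", where_attrs="AE")
```

Under this toy cost model the search ends with the groups {A, E} and {B, C, D}; a different cost model would generally lead to a different layout.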
112 [Figure: query response time (sec) vs. query sequence for Row-store, Column-store, Best Hybrid, and H2O] H2O, SIGMOD 14
113 adaptive loading adaptive indexing adaptive storage vision declarative and flexible systems sampling dbtouch
114 querying load tune query SQL interface correct and complete answers dbtouch, CIDR 2013
115 querying load tune query complex and slow - not fit for exploration SQL interface correct and complete answers dbtouch, CIDR 2013
116 just touch the data you need dbtouch, CIDR 2013
117 just touch the data you need this is not about query building it is about query processing dbtouch, CIDR 2013
118 cracking example dbtouch CIDR 2013
119 what does this mean for db kernels? dbtouch, CIDR 2013
120 db select R.a from R what does this mean for db kernels? dbtouch, CIDR 2013
121 db select R.a from R what does this mean for db kernels? dbtouch process only what you touch dbtouch, CIDR 2013
122 from touch to query processing: touch location (x), object size (width); row ID = (tuples * x) / size; storage dbtouch, CIDR 2013
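The mapping from a touch location to a row ID can be written out directly from the formula on this slide. This is a minimal sketch: the function name, the use of integer division, and the clamping of a touch at the far edge to the last row are assumptions added for illustration.

```python
def touch_to_row_id(x, size, tuples):
    """Map touch location x on an object of the given size to a row ID.

    Implements row ID = (tuples * x) / size with integer division.
    """
    if size <= 0 or tuples <= 0:
        raise ValueError("size and tuples must be positive")
    if not 0 <= x <= size:
        raise ValueError("touch location must lie on the object")
    row_id = (tuples * x) // size
    return min(row_id, tuples - 1)   # assumption: a touch at the far edge maps to the last row

# A touch halfway along a 100-pixel object that represents 1000 tuples.
middle_row = touch_to_row_id(x=50, size=100, tuples=1000)
```

A slide gesture would call this once per registered touch location, so only the touched rows are processed.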
123 select avg(r.c) where R.a=S.b and S.b<20 S.b R.c R.a avg(r.c) R.a=S.b σ(<20) S.b R.a dbtouch, CIDR 2013
126 visual objects samples hierarchy dbtouch, CIDR 2013
127 visual objects samples hierarchy base data initial sample dbtouch, CIDR 2013
128 visual objects samples hierarchy base data initial sample create incrementally and on demand dbtouch, CIDR 2013
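A hierarchy of samples that is created incrementally and on demand can be sketched as below. This is a toy illustration, not the dbtouch implementation: the class name, the systematic every-k-th-value sampling, and the default `factor` are assumptions.

```python
class SampleHierarchy:
    """Toy sample hierarchy: level 0 is the base data, each higher level is coarser."""

    def __init__(self, base, factor=2):
        self.levels = {0: list(base)}   # only the base data exists at start
        self.factor = factor

    def level(self, k):
        """Return sample level k, building it (and missing parents) only when asked."""
        if k < 0:
            raise ValueError("level must be non-negative")
        if k not in self.levels:
            parent = self.level(k - 1)
            self.levels[k] = parent[::self.factor]   # assumption: keep every factor-th value
        return self.levels[k]

hierarchy = SampleHierarchy(range(16))
coarse = hierarchy.level(2)
```

Levels that no gesture ever asks for are never materialized.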
129 dbtouch, CIDR 2013
131 adaptive loading adaptive indexing adaptive storage vision declarative and flexible systems sampling dbtouch
132 sampling outside the engine application sampling logic/query rewrite Kernel Aqua, VLDB 1999 BlinkDB, Eurosys 2013
133 sampling with SciBORG queries data Kernel maintain hierarchy of biased samples SciBORG, CIDR 2011
134 sampling with SciBORG impression 1 impression 2 impression 3 base column continuously reorganized based on the workload SciBORG, CIDR 2011
135 adaptive loading adaptive indexing adaptive storage vision declarative and flexible systems sampling dbtouch
136 building systems declaratively abstraction performance vision: being able to define system components in a higher level language without significant performance penalty RodentStore, CIDR 2009 Abstraction without regrets, IEEE Data Engin. Bulletin/PVLDB 2014
137 data systems today allow us to answer queries fast db data systems for exploration should allow us to find fast which queries to ask db explore + approximate processing techniques
138 data systems today allow us to answer queries fast db data systems for exploration should allow us to find fast which queries to ask db explore + approximate processing techniques thank you!
More informationHybrid Shipping Architectures: A Survey
Hybrid Shipping Architectures: A Survey Ivan Bowman itbowman@acm.org http://plg.uwaterloo.ca/~itbowman CS748T 14 Feb 2000 Outline Partitioning query processing Partitioning client code Optimization of
More informationCSE 544 Principles of Database Management Systems
CSE 544 Principles of Database Management Systems Alvin Cheung Fall 2015 Lecture 5 - DBMS Architecture and Indexing 1 Announcements HW1 is due next Thursday How is it going? Projects: Proposals are due
More informationCSE 544: Principles of Database Systems
CSE 544: Principles of Database Systems Anatomy of a DBMS, Parallel Databases 1 Announcements Lecture on Thursday, May 2nd: Moved to 9am-10:30am, CSE 403 Paper reviews: Anatomy paper was due yesterday;
More informationChapter 17: Parallel Databases
Chapter 17: Parallel Databases Introduction I/O Parallelism Interquery Parallelism Intraquery Parallelism Intraoperation Parallelism Interoperation Parallelism Design of Parallel Systems Database Systems
More informationColumn-Oriented Database Systems. Liliya Rudko University of Helsinki
Column-Oriented Database Systems Liliya Rudko University of Helsinki 2 Contents 1. Introduction 2. Storage engines 2.1 Evolutionary Column-Oriented Storage (ECOS) 2.2 HYRISE 3. Database management systems
More informationParallel In-situ Data Processing Techniques
Parallel In-situ Data Processing Techniques Florin Rusu, Yu Cheng University of California, Merced Outline Background SCANRAW Operator Speculative Loading Evaluation Astronomy: FITS Format Sloan Digital
More informationColumn-Stores vs. Row-Stores: How Different Are They Really?
Column-Stores vs. Row-Stores: How Different Are They Really? Daniel Abadi, Samuel Madden, Nabil Hachem Presented by Guozhang Wang November 18 th, 2008 Several slides are from Daniel Abadi and Michael Stonebraker
More informationIntroduction to Database Systems CSE 344
Introduction to Database Systems CSE 344 Lecture 6: Basic Query Evaluation and Indexes 1 Announcements Webquiz 2 is due on Tuesday (01/21) Homework 2 is posted, due week from Monday (01/27) Today: query
More informationAdvanced Databases. Lecture 15- Parallel Databases (continued) Masood Niazi Torshiz Islamic Azad University- Mashhad Branch
Advanced Databases Lecture 15- Parallel Databases (continued) Masood Niazi Torshiz Islamic Azad University- Mashhad Branch www.mniazi.ir Parallel Join The join operation requires pairs of tuples to be
More informationB.H.GARDI COLLEGE OF ENGINEERING & TECHNOLOGY (MCA Dept.) Parallel Database Database Management System - 2
Introduction :- Today single CPU based architecture is not capable enough for the modern database that are required to handle more demanding and complex requirements of the users, for example, high performance,
More informationAnastasia Ailamaki. Performance and energy analysis using transactional workloads
Performance and energy analysis using transactional workloads Anastasia Ailamaki EPFL and RAW Labs SA students: Danica Porobic, Utku Sirin, and Pinar Tozun Online Transaction Processing $2B+ industry Characteristics:
More informationOptimization Overview
Lecture 17 Optimization Overview Lecture 17 Lecture 17 Today s Lecture 1. Logical Optimization 2. Physical Optimization 3. Course Summary 2 Lecture 17 Logical vs. Physical Optimization Logical optimization:
More informationJignesh M. Patel. Blog:
Jignesh M. Patel Blog: http://bigfastdata.blogspot.com Go back to the design Query Cache from Processing for Conscious 98s Modern (at Algorithms Hardware least for Hash Joins) 995 24 2 Processor Processor
More informationDatabase Systems CSE 414
Database Systems CSE 414 Lecture 10: Basics of Data Storage and Indexes 1 Reminder HW3 is due next Tuesday 2 Motivation My database application is too slow why? One of the queries is very slow why? To
More informationHYRISE In-Memory Storage Engine
HYRISE In-Memory Storage Engine Martin Grund 1, Jens Krueger 1, Philippe Cudre-Mauroux 3, Samuel Madden 2 Alexander Zeier 1, Hasso Plattner 1 1 Hasso-Plattner-Institute, Germany 2 MIT CSAIL, USA 3 University
More information! Parallel machines are becoming quite common and affordable. ! Databases are growing increasingly large
Chapter 20: Parallel Databases Introduction! Introduction! I/O Parallelism! Interquery Parallelism! Intraquery Parallelism! Intraoperation Parallelism! Interoperation Parallelism! Design of Parallel Systems!
More informationChapter 20: Parallel Databases
Chapter 20: Parallel Databases! Introduction! I/O Parallelism! Interquery Parallelism! Intraquery Parallelism! Intraoperation Parallelism! Interoperation Parallelism! Design of Parallel Systems 20.1 Introduction!
More informationChapter 20: Parallel Databases. Introduction
Chapter 20: Parallel Databases! Introduction! I/O Parallelism! Interquery Parallelism! Intraquery Parallelism! Intraoperation Parallelism! Interoperation Parallelism! Design of Parallel Systems 20.1 Introduction!
More informationPerformance Tuning and Optimizing SQL Databases (10987)
Performance Tuning and Optimizing SQL Databases (10987) Formato do curso: Presencial Preço: 1420 Nível: Avançado Duração: 28 horas This four-day instructor-led course provides students who manage and maintain
More informationSystems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2014/15
Systems Infrastructure for Data Science Web Science Group Uni Freiburg WS 2014/15 Lecture II: Indexing Part I of this course Indexing 3 Database File Organization and Indexing Remember: Database tables
More informationModule 4. Implementation of XQuery. Part 0: Background on relational query processing
Module 4 Implementation of XQuery Part 0: Background on relational query processing The Data Management Universe Lecture Part I Lecture Part 2 2 What does a Database System do? Input: SQL statement Output:
More informationCSE 344 FEBRUARY 14 TH INDEXING
CSE 344 FEBRUARY 14 TH INDEXING EXAM Grades posted to Canvas Exams handed back in section tomorrow Regrades: Friday office hours EXAM Overall, you did well Average: 79 Remember: lowest between midterm/final
More informationIntroduction to Query Processing and Query Optimization Techniques. Copyright 2011 Ramez Elmasri and Shamkant Navathe
Introduction to Query Processing and Query Optimization Techniques Outline Translating SQL Queries into Relational Algebra Algorithms for External Sorting Algorithms for SELECT and JOIN Operations Algorithms
More informationChapter 18: Parallel Databases
Chapter 18: Parallel Databases Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 18: Parallel Databases Introduction I/O Parallelism Interquery Parallelism Intraquery
More informationChapter 18: Parallel Databases. Chapter 18: Parallel Databases. Parallelism in Databases. Introduction
Chapter 18: Parallel Databases Chapter 18: Parallel Databases Introduction I/O Parallelism Interquery Parallelism Intraquery Parallelism Intraoperation Parallelism Interoperation Parallelism Design of
More informationI am: Rana Faisal Munir
Self-tuning BI Systems Home University (UPC): Alberto Abelló and Oscar Romero Host University (TUD): Maik Thiele and Wolfgang Lehner I am: Rana Faisal Munir Research Progress Report (RPR) [1 / 44] Introduction
More informationDASlab: The Data Systems Laboratory
DASlab: The Data Systems Laboratory at Harvard SEAS Stratos Idreos Harvard University http://daslab.seas.harvard.edu ABSTRACT DASlab is a new laboratory at the Harvard School of Engineering and Applied
More informationIndexing. Week 14, Spring Edited by M. Naci Akkøk, , Contains slides from 8-9. April 2002 by Hector Garcia-Molina, Vera Goebel
Indexing Week 14, Spring 2005 Edited by M. Naci Akkøk, 5.3.2004, 3.3.2005 Contains slides from 8-9. April 2002 by Hector Garcia-Molina, Vera Goebel Overview Conventional indexes B-trees Hashing schemes
More informationCrescando: Predictable Performance for Unpredictable Workloads
Crescando: Predictable Performance for Unpredictable Workloads G. Alonso, D. Fauser, G. Giannikis, D. Kossmann, J. Meyer, P. Unterbrunner Amadeus S.A. ETH Zurich, Systems Group (Funded by Enterprise Computing
More informationclass 13 scans vs indexes prof. Stratos Idreos
class 13 scans vs indexes prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ b-tree - dynamic tree - always balanced 35,50 35, 12,20 50, 1,2,3 12,15,17 20, Stratos Idreos 2 /24 select from
More informationDatabase Systems CSE 414
Database Systems CSE 414 Lecture 10-11: Basics of Data Storage and Indexes (Ch. 8.3-4, 14.1-1.7, & skim 14.2-3) 1 Announcements No WQ this week WQ4 is due next Thursday HW3 is due next Tuesday should be
More informationclass 12 b-trees 2.0 prof. Stratos Idreos
class 12 b-trees 2.0 prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ A B C A B C clustered/primary index on A Stratos Idreos /26 2 A B C A B C clustered/primary index on A pos C pos
More informationDatabase Systems CSE 414
Database Systems CSE 414 Lecture 15-16: Basics of Data Storage and Indexes (Ch. 8.3-4, 14.1-1.7, & skim 14.2-3) 1 Announcements Midterm on Monday, November 6th, in class Allow 1 page of notes (both sides,
More informationAdministrivia Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications
Administrivia Carnegie Mellon Univ. Dept. of Computer Science 15-415/615 - DB Applications HW6 is due right now. HW7 is out today Phase 1: Wed Nov 9 th Phase 2: Mon Nov 28 th C. Faloutsos A. Pavlo Lecture#18:
More informationWeaving Relations for Cache Performance
Weaving Relations for Cache Performance Anastassia Ailamaki Carnegie Mellon David DeWitt, Mark Hill, and Marios Skounakis University of Wisconsin-Madison Memory Hierarchies PROCESSOR EXECUTION PIPELINE
More informationPrinciples of Data Management. Lecture #9 (Query Processing Overview)
Principles of Data Management Lecture #9 (Query Processing Overview) Instructor: Mike Carey mjcarey@ics.uci.edu Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Today s Notable News v Midterm
More informationColumn-Stores vs. Row-Stores: How Different Are They Really?
Column-Stores vs. Row-Stores: How Different Are They Really? Daniel J. Abadi, Samuel Madden and Nabil Hachem SIGMOD 2008 Presented by: Souvik Pal Subhro Bhattacharyya Department of Computer Science Indian
More informationCSC 261/461 Database Systems Lecture 19
CSC 261/461 Database Systems Lecture 19 Fall 2017 Announcements CIRC: CIRC is down!!! MongoDB and Spark (mini) projects are at stake. L Project 1 Milestone 4 is out Due date: Last date of class We will
More informationLecture 12. Lecture 12: The IO Model & External Sorting
Lecture 12 Lecture 12: The IO Model & External Sorting Announcements Announcements 1. Thank you for the great feedback (post coming soon)! 2. Educational goals: 1. Tech changes, principles change more
More informationINSTITUTO SUPERIOR TÉCNICO Administração e optimização de Bases de Dados
-------------------------------------------------------------------------------------------------------------- INSTITUTO SUPERIOR TÉCNICO Administração e optimização de Bases de Dados Exam 1 - solution
More informationPerformance Monitoring
Performance Monitoring Performance Monitoring Goals Monitoring should check that the performanceinfluencing database parameters are correctly set and if they are not, it should point to where the problems
More informationGoals for Today. CS 133: Databases. Relational Model. Multi-Relation Queries. Reason about the conceptual evaluation of an SQL query
Goals for Today CS 133: Databases Fall 2018 Lec 02 09/06 Relational Model & Memory and Buffer Manager Prof. Beth Trushkowsky Reason about the conceptual evaluation of an SQL query Understand the storage
More informationHash table example. B+ Tree Index by Example Recall binary trees from CSE 143! Clustered vs Unclustered. Example
Student Introduction to Database Systems CSE 414 Hash table example Index Student_ID on Student.ID Data File Student 10 Tom Hanks 10 20 20 Amy Hanks ID fname lname 10 Tom Hanks 20 Amy Hanks Lecture 26:
More informationReCache: Reactive Caching for Fast Analytics over Heterogeneous Data
: Reactive Caching for Fast Analytics over Heterogeneous Data Tahir Azim, Manos Karpathiotakis and Anastasia Ailamaki École Polytechnique Fédérale de Lausanne RAW Labs SA {tahir.azim, manos.karpathiotakis,
More informationJyotheswar Kuricheti
Jyotheswar Kuricheti 1 Agenda: 1. Performance Tuning Overview 2. Identify Bottlenecks 3. Optimizing at different levels : Target Source Mapping Session System 2 3 Performance Tuning Overview: 4 What is
More informationOracle Database 10g The Self-Managing Database
Oracle Database 10g The Self-Managing Database Benoit Dageville Oracle Corporation benoit.dageville@oracle.com Page 1 1 Agenda Oracle10g: Oracle s first generation of self-managing database Oracle s Approach
More information