Overview of Data Exploration Techniques. Stratos Idreos, Olga Papaemmanouil, Surajit Chaudhuri

Size: px
Start display at page:

Download "Overview of Data Exploration Techniques. Stratos Idreos, Olga Papaemmanouil, Surajit Chaudhuri"

Transcription

1 Overview of Data Exploration Techniques Stratos Idreos, Olga Papaemmanouil, Surajit Chaudhuri

2 data exploration not always sure what we are looking for (until we find it) data has always been big volume velocity variety veracity

3 content structure user interaction middleware database kernel

4 user interaction visualization 45 min interfaces prefetching approximation sampling 45 min middleware adaptive[ indexing, loading, storage] kernel 45 min

5 Part 3* *for Part1 and 2 please look at the websites of the tutorial co-authors

6 applications sql algorithms/operators database kernel cpu caches memory gpu data data index disk

7 too many preparation options lead to complex installation schema storage load indexes query timeline expert users - idle time - workload knowledge

8 users/applications declarative interface ask what you want DBA db system

9 users/applications need to choose the proper system db administrator 1 db administrator 2 data system 1 data system 2

10 how can we prepare if we do not know what we are up against? (loading, indexing, storage, )

11 how can we prepare if we do not know what we are up against? (loading, indexing, storage, ) data systems kernels tailored for data exploration no preparation - easy to use - fast

12 adaptive loading adaptive indexing adaptive storage vision declarative and flexible systems sampling dbtouch

13 indexing load tune query tune= create proper indices offline performance X

14 indexing load tune query tune= create proper indices offline performance X but it depends on the workload! which indices to build? on which data parts? and when to build them?

15 big data V s volume velocity variety veracity not enough space to index all data what can go wrong? not enough idle time to finish proper tuning by the time we finish tuning, the workload changes not enough money - energy - resources

16 big data V s volume velocity variety veracity not enough space to index all data what can go wrong? not enough idle time to finish proper tuning by the time we finish tuning, the workload changes not enough money - energy - resources

17 database cracking

18 database cracking idle time workload knowledge external tools human control

19 database cracking auto-tuning database kernels incremental, adaptive, partial indexing idle time workload knowledge external tools human control

20 initialization querying indexing database cracking auto-tuning database kernels incremental, adaptive, partial indexing idle time workload knowledge external tools human control

21 initialization querying indexing database cracking auto-tuning database kernels incremental, adaptive, partial indexing idle time workload knowledge external tools human control

22 initialization querying indexing database cracking auto-tuning database kernels incremental, adaptive, partial indexing every query workload is treated external as an advice human idle time knowledge tools control on how data should be stored

23 column-store database a fixed-width and dense array per attribute relation1/table Database Cracking CIDR 2007

24 column-store database a fixed-width and dense array per attribute relation1/table Database Cracking CIDR 2007

25 column A Q1: select R.A from R where R.A>10 and R.A< Database Cracking CIDR 2007

26 column A Q1: select R.A from R where R.A>10 and R.A< Database Cracking CIDR 2007

27 column A Q1: select R.A from R where R.A>10 and R.A< sort Database Cracking CIDR 2007

28 column A Q1: select R.A from R where R.A>10 and R.A< sort binary search Database Cracking CIDR 2007

29 column A Q1: select R.A from R where R.A>10 and R.A< sort binary search result Database Cracking CIDR 2007

30 column A Q1: select R.A from R where R.A>10 and R.A< sort binary search result time + knowledge Database Cracking CIDR 2007

31 Q1: select R.A from R where R.A>10 and R.A<14 column A Database Cracking CIDR 2007

32 Q1: select R.A from R where R.A>10 and R.A<14 column A piece1: A<=10 Database Cracking CIDR 2007

33 Q1: select R.A from R where R.A>10 and R.A<14 column A piece1: A<=10 piece2: 10<A<14 Database Cracking CIDR 2007

34 Q1: select R.A from R where R.A>10 and R.A<14 column A piece1: A<=10 piece2: 10<A<14 piece3: A>=14 Database Cracking CIDR 2007

35 column A Q1: select R.A from R where R.A>10 and R.A< piece1: A<=10 piece2: 10<A<14 piece3: A>=14 result Database Cracking CIDR 2007

36 gain knowledge on how data is organized column A Q1: select R.A from R where R.A>10 and R.A< piece1: A<=10 piece2: 10<A<14 piece3: A>=14 result Database Cracking CIDR 2007

37 gain knowledge on how data is organized column A Q1: select R.A from R where R.A>10 and R.A< piece1: A<=10 piece2: 10<A<14 piece3: A>=14 result dynamically/on-the-fly within the select-operator Database Cracking CIDR 2007

38 Q1: select R.A from R where R.A>10 and R.A<14 Q2: select R.A from R where R.A>7 and R.A<=16 column A piece1: A<=10 piece2: 10<A<14 piece3: A>=14 dynamically/on-the-fly within the select-operator Database Cracking CIDR 2007

39 Q1: select R.A from R where R.A>10 and R.A<14 Q2: select R.A from R where R.A>7 and R.A<=16 column A piece1: A<=10 piece2: 10<A<14 piece3: A>=14 dynamically/on-the-fly within the select-operator Database Cracking CIDR 2007

40 column A Q1: select R.A from R where R.A>10 and R.A<14 Q2: select R.A from R where R.A>7 and R.A<= piece1: A<=10 piece2: 10<A<14 piece3: A>= piece1: A<=7 piece2: 7<A<=10 dynamically/on-the-fly within the select-operator Database Cracking CIDR 2007

41 column A Q1: select R.A from R where R.A>10 and R.A<14 Q2: select R.A from R where R.A>7 and R.A<= piece1: A<=10 piece2: 10<A<14 piece3: A>= piece1: A<=7 piece2: 7<A<=10 piece3: 10<A<14 dynamically/on-the-fly within the select-operator Database Cracking CIDR 2007

42 column A Q1: select R.A from R where R.A>10 and R.A<14 Q2: select R.A from R where R.A>7 and R.A<= piece1: A<=10 piece2: 10<A<14 piece3: A>= piece1: A<=7 piece2: 7<A<=10 piece3: 10<A<14 piece4: 14<=A<=16 piece5: A>16 dynamically/on-the-fly within the select-operator Database Cracking CIDR 2007

43 column A Q1: select R.A from R where R.A>10 and R.A<14 Q2: select R.A from R where R.A>7 and R.A<= piece1: A<=10 piece2: 10<A<14 piece3: A>= piece1: A<=7 piece2: 7<A<=10 piece3: 10<A<14 piece4: 14<=A<=16 piece5: A>16 dynamically/on-the-fly within the select-operator Database Cracking CIDR 2007

44 the more we crack, the more we learn column A Q1: select R.A from R where R.A>10 and R.A<14 Q2: select R.A from R where R.A>7 and R.A<= piece1: A<=10 piece2: 10<A<14 piece3: A>= piece1: A<=7 piece2: 7<A<=10 piece3: 10<A<14 piece4: 14<=A<=16 piece5: A>16 dynamically/on-the-fly within the select-operator Database Cracking CIDR 2007

45 Database Cracking CIDR 2007

46 select [15,55] Database Cracking CIDR 2007

47 select [15,55] Database Cracking CIDR 2007

48 select [15,55] Database Cracking CIDR 2007

49 select [15,55] select [15,55] Database Cracking CIDR 2007

50 select [15,55] select [15,55] Database Cracking CIDR 2007

51 touch at most two pieces at a time pieces become smaller and smaller select [15,55] select [15,55] Database Cracking CIDR 2007

52 continuous adaptation set-up 100K random selections random selectivity random value ranges in a 10 million integer column Response time (secs) Scan Crack 0.01 Full Index Query sequence (x1000) Database Cracking CIDR 2007

53 continuous adaptation set-up 100K random selections random selectivity random value ranges in a 10 million integer column almost no initialization overhead Response time (secs) Full Index Scan Crack Query sequence (x1000) Database Cracking CIDR 2007

54 continuous adaptation set-up 100K random selections random selectivity random value ranges in a 10 million integer column almost no initialization overhead Response time (secs) Full Index Scan Crack continuous improvement Query sequence (x1000) Database Cracking CIDR 2007

55 continuous adaptation set-up 100K random selections random selectivity random value ranges in a 10 million integer column almost no initialization overhead Response time (secs) Full Index Scan Crack continuous improvement Query sequence (x1000) Database Cracking CIDR 2007

56 continuous adaptation set-up 10K random selections selectivity 10% random value ranges in a 30 million integer column Cumulative average response time (secs) Full Index Scan Crack Query sequence Database Cracking CIDR 2007

57 continuous adaptation set-up 10K random selections selectivity 10% random value ranges in a 30 million integer column Cumulative average response time (secs) Full Index Scan Crack Query sequence Database Cracking CIDR 2007

58 continuous adaptation set-up 10K random selections selectivity 10% random value ranges in a 30 million integer column 10K queries later, Full Index still has not amortized the initialization costs Cumulative average response time (secs) Full Index Scan Crack Query sequence Database Cracking CIDR 2007

59 A table1

60 table

61 select R.A from R where R.A>10 and R.A<14 table

62 select R.A from R where R.A>10 and R.A<14 table1 select max(r.a),max(r.b),max(s.a),max(s.b) from R,S where v1 <R.C<v2 and v3 <R.D<v4 and v5 <R.E<v6 and k1 <S.C<k2 and k3 <S.D<k4 and k5 <S.E<k and R.F = S.F......

63 select R.A from R where R.A>10 and R.A<14 table1 select max(r.a),max(r.b),max(s.a),max(s.b) from R,S where v1 <R.C<v2 and v3 <R.D<v4 and v5 <R.E<v6 and k1 <S.C<k2 and k3 <S.D<k4 and k5 <S.E<k and R.F = S.F updates... joins concurrency control......

64 basics (CIDR07) cracking databases updates (SIGMOD07) >1 columns >1 columns (SIGMOD09) storagerestrictions (SIGMOD09) robustness robustne (PVLDB12) algorithms (PVLDB11) benchmarking (TPCTC10) concurrency control (PVLDB12) adaptive storage (SIGMOD14) time-series (SIGMOD14) hadoop (Yale/Saarland) multi-cores (SIGMOD15) b-trees (HP Labs)

65 cracking tangram base data table 1 as queries arrive... table 2

66 cracking tangram base data table 1 as queries arrive... table 1 table 2 table 2

67 cracking tangram base data table 1 as queries arrive... table 1 partial materialization table 2 table 2

68 cracking tangram base data table 1 as queries arrive... table 1 partial materialization partial indexing table 2 table 2

69 cracking tangram base data table 1 as queries arrive... table 1 partial materialization partial indexing continuous adaptation table 2 table 2

70 cracking tangram base data table 1 as queries arrive... table 1 x partial materialization partial indexing continuous adaptation storage adaptation table 2 table 2 x x

71 cracking tangram base data table 1 as queries arrive... table 1 partial materialization partial indexing continuous adaptation storage adaptation table 2 table 2

72 cracking tangram base data table 1 table 2 as queries arrive... table 1 table 2 partial materialization partial indexing continuous adaptation storage adaptation no tuple reconstruction

73 cracking tangram base data table 1 table 2 as queries arrive... table 1 table 2 partial materialization partial indexing continuous adaptation storage adaptation no tuple reconstruction adaptive alignment

74 cracking tangram base data table 1 table 2 as queries arrive... table 1 table 2 partial materialization partial indexing continuous adaptation storage adaptation no tuple reconstruction adaptive alignment

75 cracking tangram base data table 1 table 2 as queries arrive... table 1 table 2 partial materialization partial indexing continuous adaptation storage adaptation no tuple reconstruction adaptive alignment sort in caches

76 cracking tangram base data table 1 table 2 as queries arrive... table 1 table 2 partial materialization partial indexing continuous adaptation storage adaptation no tuple reconstruction adaptive alignment sort in caches crack joins

77 cracking tangram base data table 1 table 2 as queries arrive... table 1 table 2 q1 q2 partial materialization partial indexing continuous adaptation storage adaptation no tuple reconstruction adaptive alignment sort in caches crack joins lightweight locking

78 cracking tangram base data table 1 table 2 as queries arrive... table 1 table 2 query random partial materialization partial indexing continuous adaptation storage adaptation no tuple reconstruction adaptive alignment sort in caches crack joins lightweight locking stochastic cracking

79 adaptive loading adaptive indexing adaptive storage vision declarative and flexible systems sampling dbtouch

80 loading load tune query copy data inside the database database now has full control slow process not all data might be needed all the time Adaptive Loading, CIDR 11

81 database vs. unix tools single query cost (secs) 1 file, 4 attributes, 1 billion tuples DB Awk Adaptive Loading, CIDR 11

82 database vs. unix tools single query cost (secs) 1 file, 4 attributes, 1 billion tuples DB Awk break down db cost Adaptive Loading, CIDR 11

83 database vs. unix tools single query cost (secs) 1 file, 4 attributes, 1 billion tuples DB Awk break down db cost loading is a major bottleneck Adaptive Loading, CIDR 11

84 database vs. unix tools single query cost (secs) 1 file, 4 attributes, 1 billion tuples DB Awk break down db cost loading is a major bottleneck but writing/maintaining scripts does not scale Adaptive Loading, CIDR 11

85 adaptive loading load/touch only what is needed and only when it is needed Adaptive Loading, CIDR 11

86 but raw data access is expensive tokenizing - parsing - no indexing - no statistics challenge: fast raw data access Adaptive Loading, CIDR 11

87 query plan Adaptive Loading, CIDR 11

88 query plan scan Adaptive Loading, CIDR 11

89 query plan scan db Adaptive Loading, CIDR 11

90 query plan scan loading Adaptive Loading, CIDR 11

91 query plan scan files loading access raw data adaptively on-the-fly Adaptive Loading, CIDR 11

92 query plan scan files access raw data adaptively on-the-fly cache loading Adaptive Loading, CIDR 11

93 selective parsing file indexing file splitting online statistics query plan scan files access raw data adaptively on-the-fly cache loading Adaptive Loading, CIDR 11

94 three graphs from paper NoDB, SIGMOD 2012

95 three graphs from paper NoDB, SIGMOD 2012

96 three graphs from paper NoDB, SIGMOD 2012

97 three graphs from paper NoDB, SIGMOD 2012

98 three graphs from paper NoDB, SIGMOD 2012

99 reducing data-to-query time three graphs from paper NoDB, SIGMOD 2012

100 adaptive loading adaptive indexing adaptive storage vision declarative and flexible systems sampling dbtouch

101 rows & columns row-store column-store H20, SIGMOD 14

102 no fixed optimal solution Execution Time (sec) DBMS-C DBMS-R Attributes Accessed (%) H20, SIGMOD 14

103 rows & columns row-store column-store H20, SIGMOD 14

104 rows & columns row-store column-store hybrid-store H20, SIGMOD 14

105 which layout is best? H20, SIGMOD 14

106 which layout is best? H20, SIGMOD 14

107 which layout is best? H20, SIGMOD 14

108 which layout is best? too many combinations to maintain in parallel H20, SIGMOD 14

109 query cost q(l)= L Â i=1 max(costi IO,costi CPU ) for a given query we can know which layout is best the one that will cause the fewer cache misses H20, SIGMOD 14

110 if we know all queries up front we can choose the layouts adaptive storage: continuously adapt layouts based on incoming queries H20, SIGMOD 14

111 but computing all possible combinations is expensive query select A+B+C+D from R where A<10 and E>10 1. deal only with attributes referenced in queries 2. handle select clause separately from where clause 3. start from pure column-store and build up 4. stop when no improvement possible H20, SIGMOD 14

112 Query&Response&Time&(sec)& 8" 7" 6" 5" 4" 3" 2" 1" 0" Row.store" Column.store" Best.Hybrid" H2O" 0" 5" 10" 15" 20" 25" 30" 35" 40" 45" 50" Query&Sequence& H20, SIGMOD 14

113 adaptive loading adaptive indexing adaptive storage vision declarative and flexible systems sampling dbtouch

114 querying load tune query SQL interface correct and complete answers dbtouch, CIDR 2013

115 querying load tune query complex and slow - not fit for exploration SQL interface correct and complete answers dbtouch, CIDR 2013

116 just touch the data you need dbtouch, CIDR 2013

117 just touch the data you need this is not about query building it is about query processing dbtouch, CIDR 2013

118 cracking example dbtouch CIDR 2013

119 what does this mean for db kernels? dbtouch, CIDR 2013

120 db select R.a from R what does this mean for db kernels? dbtouch, CIDR 2013

121 db select R.a from R what does this mean for db kernels? dbtouch process only what you touch dbtouch, CIDR 2013

122 from touch to query processing touch location (x) v size(width) row ID=(tuples*x)/size storage dbtouch, CIDR 2013

123 select avg(r.c) where R.a=S.b and S.b<20 S.b R.c R.a avg(r.c) R.a=S.b σ(<20) S.b R.a dbtouch, CIDR 2013

124 select avg(r.c) where R.a=S.b and S.b<20 S.b R.c R.a avg(r.c) R.a=S.b σ(<20) S.b R.a dbtouch, CIDR 2013

125 select avg(r.c) where R.a=S.b and S.b<20 S.b R.c R.a avg(r.c) R.a=S.b σ(<20) S.b R.a dbtouch, CIDR 2013

126 visual objects samples hierarchy dbtouch, CIDR 2013

127 visual objects samples hierarchy base data initial sample dbtouch, CIDR 2013

128 visual objects samples hierarchy base data initial sample create incrementally and on demand dbtouch, CIDR 2013

129 dbtouch, CIDR 2013

130 dbtouch, CIDR 2013

131 adaptive loading adaptive indexing adaptive storage vision declarative and flexible systems sampling dbtouch

132 sampling outside the engine application sampling logic/query rewrite Kernel Aqua, VLDB 1999 BlinkDB, Eurosys 2013

133 sampling with SciBORG queries data Kernel maintain hierarchy of biased samples SciBORG, CIDR 2011

134 sampling with SciBORG impression 1 impression 2 impression 3 base column continuously reorganized based on the workload SciBORG, CIDR 2011

135 adaptive loading adaptive indexing adaptive storage vision declarative and flexible systems sampling dbtouch

136 building systems declaratively abstraction performance vision: being able to define system components in a higher level language without significant performance penalty RodentStore, CIDR 2009 Abstraction without regrets, IEEE Data Engin. Bulletin/PVLDB 2014

137 data systems today allow us to answer queries fast db data systems for exploration should allow us to find fast which queries to ask db explore + approximate processing techniques

138 data systems today allow us to answer queries fast db data systems for exploration should allow us to find fast which queries to ask db explore + approximate processing techniques thank you!

Data Systems that are Easy to Design, Tune and Use. Stratos Idreos

Data Systems that are Easy to Design, Tune and Use. Stratos Idreos Data Systems that are Easy to Design, Tune and Use data systems that are easy to: (years) (months) design & build set-up & tune (hours/days) use e.g., adapt to new applications, new hardware, spin off

More information

NoDB: Querying Raw Data. --Mrutyunjay

NoDB: Querying Raw Data. --Mrutyunjay NoDB: Querying Raw Data --Mrutyunjay Overview Introduction Motivation NoDB Philosophy: PostgreSQL Results Opportunities NoDB in Action: Adaptive Query Processing on Raw Data Ioannis Alagiannis, Renata

More information

class 5 column stores 2.0 prof. Stratos Idreos

class 5 column stores 2.0 prof. Stratos Idreos class 5 column stores 2.0 prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ worth thinking about what just happened? where is my data? email, cloud, social media, can we design systems

More information

complex plans and hybrid layouts

complex plans and hybrid layouts class 7 complex plans and hybrid layouts prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ essential column-stores features virtual ids late tuple reconstruction (if ever) vectorized execution

More information

Query processing on raw files. Vítor Uwe Reus

Query processing on raw files. Vítor Uwe Reus Query processing on raw files Vítor Uwe Reus Outline 1. Introduction 2. Adaptive Indexing 3. Hybrid MapReduce 4. NoDB 5. Summary Outline 1. Introduction 2. Adaptive Indexing 3. Hybrid MapReduce 4. NoDB

More information

basic db architectures & layouts

basic db architectures & layouts class 4 basic db architectures & layouts prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ videos for sections 3 & 4 are online check back every week (1-2 sections weekly) there is a schedule

More information

Bridging the Processor/Memory Performance Gap in Database Applications

Bridging the Processor/Memory Performance Gap in Database Applications Bridging the Processor/Memory Performance Gap in Database Applications Anastassia Ailamaki Carnegie Mellon http://www.cs.cmu.edu/~natassa Memory Hierarchies PROCESSOR EXECUTION PIPELINE L1 I-CACHE L1 D-CACHE

More information

Column Stores vs. Row Stores How Different Are They Really?

Column Stores vs. Row Stores How Different Are They Really? Column Stores vs. Row Stores How Different Are They Really? Daniel J. Abadi (Yale) Samuel R. Madden (MIT) Nabil Hachem (AvantGarde) Presented By : Kanika Nagpal OUTLINE Introduction Motivation Background

More information

class 6 more about column-store plans and compression prof. Stratos Idreos

class 6 more about column-store plans and compression prof. Stratos Idreos class 6 more about column-store plans and compression prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ query compilation an ancient yet new topic/research challenge query->sql->interpet

More information

column-stores basics

column-stores basics class 3 column-stores basics prof. HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS265/ project description is now online First background info will be given this Friday and detailed lecture on Feb 21 Basic Readings

More information

Holistic Indexing in Main-memory Column-stores

Holistic Indexing in Main-memory Column-stores Holistic Indexing in Main-memory Column-stores Eleni Petraki CWI Amsterdam petraki@cwi.nl Stratos Idreos Harvard University stratos@seas.harvard.edu Stefan Manegold CWI Amsterdam manegold@cwi.nl ABSTRACT

More information

class 17 updates prof. Stratos Idreos

class 17 updates prof. Stratos Idreos class 17 updates prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ early/late tuple reconstruction, tuple-at-a-time, vectorized or bulk processing, intermediates format, pushing selects

More information

column-stores basics

column-stores basics class 3 column-stores basics prof. HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS265/ Goetz Graefe Google Research guest lecture Justin Levandoski Microsoft Research projects option 1: systems project (now

More information

data systems 101 prof. Stratos Idreos class 2

data systems 101 prof. Stratos Idreos class 2 class 2 data systems 101 prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS265/ 2 classes per week - OH/Labs every day 1 presentation/discussion lead - 2 reviews each week research (or systems)

More information

Weaving Relations for Cache Performance

Weaving Relations for Cache Performance Weaving Relations for Cache Performance Anastassia Ailamaki Carnegie Mellon Computer Platforms in 198 Execution PROCESSOR 1 cycles/instruction Data and Instructions cycles

More information

Toward timely, predictable and cost-effective data analytics. Renata Borovica-Gajić DIAS, EPFL

Toward timely, predictable and cost-effective data analytics. Renata Borovica-Gajić DIAS, EPFL Toward timely, predictable and cost-effective data analytics Renata Borovica-Gajić DIAS, EPFL Big data proliferation Big data is when the current technology does not enable users to obtain timely, cost-effective,

More information

H2O: A Hands-free Adaptive Store

H2O: A Hands-free Adaptive Store H2O: A Hands-free Adaptive Store Ioannis Alagiannis Stratos Idreos Anastasia Ailamaki Ecole Polytechnique Fédérale de Lausanne {ioannis.alagiannis, anastasia.ailamaki}@epfl.ch Harvard University stratos@seas.harvard.edu

More information

Parallel In-situ Data Processing with Speculative Loading

Parallel In-situ Data Processing with Speculative Loading Parallel In-situ Data Processing with Speculative Loading Yu Cheng, Florin Rusu University of California, Merced Outline Background Scanraw Operator Speculative Loading Evaluation SAM/BAM Format More than

More information

Data Exploration. Heli Helskyaho Seminar on Big Data Management

Data Exploration. Heli Helskyaho Seminar on Big Data Management Data Exploration Heli Helskyaho 21.4.2016 Seminar on Big Data Management References [1] Marcello Buoncristiano, Giansalvatore Mecca, Elisa Quintarelli, Manuel Roveri, Donatello Santoro, Letizia Tanca:

More information

Course Outline. Performance Tuning and Optimizing SQL Databases Course 10987B: 4 days Instructor Led

Course Outline. Performance Tuning and Optimizing SQL Databases Course 10987B: 4 days Instructor Led Performance Tuning and Optimizing SQL Databases Course 10987B: 4 days Instructor Led About this course This four-day instructor-led course provides students who manage and maintain SQL Server databases

More information

Citation for published version (APA): Ydraios, E. (2010). Database cracking: towards auto-tunning database kernels

Citation for published version (APA): Ydraios, E. (2010). Database cracking: towards auto-tunning database kernels UvA-DARE (Digital Academic Repository) Database cracking: towards auto-tunning database kernels Ydraios, E. Link to publication Citation for published version (APA): Ydraios, E. (2010). Database cracking:

More information

systems & research project

systems & research project class 4 systems & research project prof. HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS265/ index index knows order about the data data filtering data: point/range queries index data A B C sorted A B C initial

More information

Data Modeling and Databases Ch 10: Query Processing - Algorithms. Gustavo Alonso Systems Group Department of Computer Science ETH Zürich

Data Modeling and Databases Ch 10: Query Processing - Algorithms. Gustavo Alonso Systems Group Department of Computer Science ETH Zürich Data Modeling and Databases Ch 10: Query Processing - Algorithms Gustavo Alonso Systems Group Department of Computer Science ETH Zürich Transactions (Locking, Logging) Metadata Mgmt (Schema, Stats) Application

More information

[MS10987A]: Performance Tuning and Optimizing SQL Databases

[MS10987A]: Performance Tuning and Optimizing SQL Databases [MS10987A]: Performance Tuning and Optimizing SQL Databases Length : 4 Days Audience(s) : IT Professionals Level : 300 Technology : Microsoft SQL Server Delivery Method : Instructor-led (Classroom) Course

More information

Data Modeling and Databases Ch 9: Query Processing - Algorithms. Gustavo Alonso Systems Group Department of Computer Science ETH Zürich

Data Modeling and Databases Ch 9: Query Processing - Algorithms. Gustavo Alonso Systems Group Department of Computer Science ETH Zürich Data Modeling and Databases Ch 9: Query Processing - Algorithms Gustavo Alonso Systems Group Department of Computer Science ETH Zürich Transactions (Locking, Logging) Metadata Mgmt (Schema, Stats) Application

More information

data systems 101 prof. Stratos Idreos class 2

data systems 101 prof. Stratos Idreos class 2 class 2 data systems 101 prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS265/ big data V s (it is not about size only) volume velocity variety veracity actually none of that is really new

More information

DBMS Data Loading: An Analysis on Modern Hardware. Adam Dziedzic, Manos Karpathiotakis*, Ioannis Alagiannis, Raja Appuswamy, Anastasia Ailamaki

DBMS Data Loading: An Analysis on Modern Hardware. Adam Dziedzic, Manos Karpathiotakis*, Ioannis Alagiannis, Raja Appuswamy, Anastasia Ailamaki DBMS Data Loading: An Analysis on Modern Hardware Adam Dziedzic, Manos Karpathiotakis*, Ioannis Alagiannis, Raja Appuswamy, Anastasia Ailamaki Data loading: A necessary evil Volume => Expensive 4 zettabytes

More information

class 8 b-trees prof. Stratos Idreos

class 8 b-trees prof. Stratos Idreos class 8 b-trees prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ I spend a lot of time debugging am I doing something wrong? maybe but probably not 1. learn to use gdb 2. after spending

More information

Performance Tuning & Optimizing SQL Databases Microsoft Official Curriculum (MOC 10987)

Performance Tuning & Optimizing SQL Databases Microsoft Official Curriculum (MOC 10987) Performance Tuning & Optimizing SQL Databases Microsoft Official Curriculum (MOC 10987) Course Length: 4 days Course Delivery: Traditional Classroom Online Live Course Overview This 4-day instructor-led

More information

NoDB: Efficient Query Execution on Raw Data Files

NoDB: Efficient Query Execution on Raw Data Files NoDB: Efficient Query Execution on Raw Data Files Ioannis Alagiannis Renata Borovica-Gajic Miguel Branco Stratos Idreos Anastasia Ailamaki Ecole Polytechnique Fédérale de Lausanne {ioannisalagiannis, renataborovica,

More information

Segregating Data Within Databases for Performance Prepared by Bill Hulsizer

Segregating Data Within Databases for Performance Prepared by Bill Hulsizer Segregating Data Within Databases for Performance Prepared by Bill Hulsizer When designing databases, segregating data within tables is usually important and sometimes very important. The higher the volume

More information

Benchmarking Adaptive Indexing

Benchmarking Adaptive Indexing Benchmarking Adaptive Indexing Goetz Graefe 2, Stratos Idreos 1, Harumi Kuno 2, and Stefan Manegold 1 1 CWI Amsterdam The Netherlands first.last@cwi.nl 2 Hewlett-Packard Laboratories Palo Alto, CA first.last@hp.com

More information

How Achaeans Would Construct Columns in Troy. Alekh Jindal, Felix Martin Schuhknecht, Jens Dittrich, Karen Khachatryan, Alexander Bunte

How Achaeans Would Construct Columns in Troy. Alekh Jindal, Felix Martin Schuhknecht, Jens Dittrich, Karen Khachatryan, Alexander Bunte How Achaeans Would Construct Columns in Troy Alekh Jindal, Felix Martin Schuhknecht, Jens Dittrich, Karen Khachatryan, Alexander Bunte Number of Visas Received 1 0,75 0,5 0,25 0 Alekh Jens Health Level

More information

DB2 Performance Essentials

DB2 Performance Essentials DB2 Performance Essentials Philip K. Gunning Certified Advanced DB2 Expert Consultant, Lecturer, Author DISCLAIMER This material references numerous hardware and software products by their trade names.

More information

Storage hierarchy. Textbook: chapters 11, 12, and 13

Storage hierarchy. Textbook: chapters 11, 12, and 13 Storage hierarchy Cache Main memory Disk Tape Very fast Fast Slower Slow Very small Small Bigger Very big (KB) (MB) (GB) (TB) Built-in Expensive Cheap Dirt cheap Disks: data is stored on concentric circular

More information

MIS Database Systems.

MIS Database Systems. MIS 335 - Database Systems http://www.mis.boun.edu.tr/durahim/ Ahmet Onur Durahim Learning Objectives Database systems concepts Designing and implementing a database application Life of a Query in a Database

More information

BIS Database Management Systems.

BIS Database Management Systems. BIS 512 - Database Management Systems http://www.mis.boun.edu.tr/durahim/ Ahmet Onur Durahim Learning Objectives Database systems concepts Designing and implementing a database application Life of a Query

More information

SQL Server Administration 10987: Performance Tuning and Optimizing SQL Databases. Upcoming Dates. Course Description.

SQL Server Administration 10987: Performance Tuning and Optimizing SQL Databases. Upcoming Dates. Course Description. SQL Server Administration 10987: Performance Tuning and Optimizing SQL Databases Learn the high level architectural overview of SQL Server 2016 and explore SQL Server execution model, waits and queues

More information

Triangle SQL Server User Group Adaptive Query Processing with Azure SQL DB and SQL Server 2017

Triangle SQL Server User Group Adaptive Query Processing with Azure SQL DB and SQL Server 2017 Triangle SQL Server User Group Adaptive Query Processing with Azure SQL DB and SQL Server 2017 Joe Sack, Principal Program Manager, Microsoft Joe.Sack@Microsoft.com Adaptability Adapt based on customer

More information

Column-Stores vs. Row-Stores. How Different are they Really? Arul Bharathi

Column-Stores vs. Row-Stores. How Different are they Really? Arul Bharathi Column-Stores vs. Row-Stores How Different are they Really? Arul Bharathi Authors Daniel J.Abadi Samuel R. Madden Nabil Hachem 2 Contents Introduction Row Oriented Execution Column Oriented Execution Column-Store

More information

6.830 Lecture 8 10/2/2017. Lab 2 -- Due Weds. Project meeting sign ups. Recap column stores & paper. Join Processing:

6.830 Lecture 8 10/2/2017. Lab 2 -- Due Weds. Project meeting sign ups. Recap column stores & paper. Join Processing: Lab 2 -- Due Weds. Project meeting sign ups 6.830 Lecture 8 10/2/2017 Recap column stores & paper Join Processing: S = {S} tuples, S pages R = {R} tuples, R pages R < S M pages of memory Types of joins

More information

Hyrise - a Main Memory Hybrid Storage Engine

Hyrise - a Main Memory Hybrid Storage Engine Hyrise - a Main Memory Hybrid Storage Engine Philippe Cudré-Mauroux exascale Infolab U. of Fribourg - Switzerland & MIT joint work w/ Martin Grund, Jens Krueger, Hasso Plattner, Alexander Zeier (HPI) and

More information

Andrew Pavlo, Erik Paulson, Alexander Rasin, Daniel Abadi, David DeWitt, Samuel Madden, and Michael Stonebraker SIGMOD'09. Presented by: Daniel Isaacs

Andrew Pavlo, Erik Paulson, Alexander Rasin, Daniel Abadi, David DeWitt, Samuel Madden, and Michael Stonebraker SIGMOD'09. Presented by: Daniel Isaacs Andrew Pavlo, Erik Paulson, Alexander Rasin, Daniel Abadi, David DeWitt, Samuel Madden, and Michael Stonebraker SIGMOD'09 Presented by: Daniel Isaacs It all starts with cluster computing. MapReduce Why

More information

Chapter 9. Cardinality Estimation. How Many Rows Does a Query Yield? Architecture and Implementation of Database Systems Winter 2010/11

Chapter 9. Cardinality Estimation. How Many Rows Does a Query Yield? Architecture and Implementation of Database Systems Winter 2010/11 Chapter 9 How Many Rows Does a Query Yield? Architecture and Implementation of Database Systems Winter 2010/11 Wilhelm-Schickard-Institut für Informatik Universität Tübingen 9.1 Web Forms Applications

More information

Cost Models. the query database statistics description of computational resources, e.g.

Cost Models. the query database statistics description of computational resources, e.g. Cost Models An optimizer estimates costs for plans so that it can choose the least expensive plan from a set of alternatives. Inputs to the cost model include: the query database statistics description

More information

Adaptivity. Luca Schroeder & Thomas Lively

Adaptivity. Luca Schroeder & Thomas Lively Adaptivity Luca Schroeder & Thomas Lively H2O: A Hands-free Adaptive Store. Ioannis Alagiannis, Stratos Idreos and Anastassia Ailamaki ACM SIGMOD International Conference on Data Management, 2014 Three

More information

Self Tuning Databases

Self Tuning Databases Sections Self Tuning Databases Projects from Microsoft and IBM 1. Microsoft s Database Research Group 1. Situation and Vision 2. The AutoAdmin Project 3. Strategies for Index Selection 4. Components of

More information

Databasesystemer, forår 2005 IT Universitetet i København. Forelæsning 8: Database effektivitet. 31. marts Forelæser: Rasmus Pagh

Databasesystemer, forår 2005 IT Universitetet i København. Forelæsning 8: Database effektivitet. 31. marts Forelæser: Rasmus Pagh Databasesystemer, forår 2005 IT Universitetet i København Forelæsning 8: Database effektivitet. 31. marts 2005 Forelæser: Rasmus Pagh Today s lecture Database efficiency Indexing Schema tuning 1 Database

More information

Datenbanksysteme II: Modern Hardware. Stefan Sprenger November 23, 2016

Datenbanksysteme II: Modern Hardware. Stefan Sprenger November 23, 2016 Datenbanksysteme II: Modern Hardware Stefan Sprenger November 23, 2016 Content of this Lecture Introduction to Modern Hardware CPUs, Cache Hierarchy Branch Prediction SIMD NUMA Cache-Sensitive Skip List

More information

7. Query Processing and Optimization

7. Query Processing and Optimization 7. Query Processing and Optimization Processing a Query 103 Indexing for Performance Simple (individual) index B + -tree index Matching index scan vs nonmatching index scan Unique index one entry and one

More information

Performance in the Multicore Era

Performance in the Multicore Era Performance in the Multicore Era Gustavo Alonso Systems Group -- ETH Zurich, Switzerland Systems Group Enterprise Computing Center Performance in the multicore era 2 BACKGROUND - SWISSBOX SwissBox: An

More information

Advanced Database Systems

Advanced Database Systems Lecture II Storage Layer Kyumars Sheykh Esmaili Course s Syllabus Core Topics Storage Layer Query Processing and Optimization Transaction Management and Recovery Advanced Topics Cloud Computing and Web

More information

DATABASE PERFORMANCE AND INDEXES. CS121: Relational Databases Fall 2017 Lecture 11

DATABASE PERFORMANCE AND INDEXES. CS121: Relational Databases Fall 2017 Lecture 11 DATABASE PERFORMANCE AND INDEXES CS121: Relational Databases Fall 2017 Lecture 11 Database Performance 2 Many situations where query performance needs to be improved e.g. as data size grows, query performance

More information

Outline. Database Management and Tuning. Outline. Join Strategies Running Example. Index Tuning. Johann Gamper. Unit 6 April 12, 2012

Outline. Database Management and Tuning. Outline. Join Strategies Running Example. Index Tuning. Johann Gamper. Unit 6 April 12, 2012 Outline Database Management and Tuning Johann Gamper Free University of Bozen-Bolzano Faculty of Computer Science IDSE Unit 6 April 12, 2012 1 Acknowledgements: The slides are provided by Nikolaus Augsten

More information

Hybrid Shipping Architectures: A Survey

Hybrid Shipping Architectures: A Survey Hybrid Shipping Architectures: A Survey Ivan Bowman itbowman@acm.org http://plg.uwaterloo.ca/~itbowman CS748T 14 Feb 2000 Outline Partitioning query processing Partitioning client code Optimization of

More information

CSE 544 Principles of Database Management Systems

CSE 544 Principles of Database Management Systems CSE 544 Principles of Database Management Systems Alvin Cheung Fall 2015 Lecture 5 - DBMS Architecture and Indexing 1 Announcements HW1 is due next Thursday How is it going? Projects: Proposals are due

More information

CSE 544: Principles of Database Systems

CSE 544: Principles of Database Systems CSE 544: Principles of Database Systems Anatomy of a DBMS, Parallel Databases 1 Announcements Lecture on Thursday, May 2nd: Moved to 9am-10:30am, CSE 403 Paper reviews: Anatomy paper was due yesterday;

More information

Chapter 17: Parallel Databases

Chapter 17: Parallel Databases Chapter 17: Parallel Databases Introduction I/O Parallelism Interquery Parallelism Intraquery Parallelism Intraoperation Parallelism Interoperation Parallelism Design of Parallel Systems Database Systems

More information

Column-Oriented Database Systems. Liliya Rudko University of Helsinki

Column-Oriented Database Systems. Liliya Rudko University of Helsinki Column-Oriented Database Systems Liliya Rudko University of Helsinki 2 Contents 1. Introduction 2. Storage engines 2.1 Evolutionary Column-Oriented Storage (ECOS) 2.2 HYRISE 3. Database management systems

More information

Parallel In-situ Data Processing Techniques

Parallel In-situ Data Processing Techniques Parallel In-situ Data Processing Techniques Florin Rusu, Yu Cheng University of California, Merced Outline Background SCANRAW Operator Speculative Loading Evaluation Astronomy: FITS Format Sloan Digital

More information

Column-Stores vs. Row-Stores: How Different Are They Really?

Column-Stores vs. Row-Stores: How Different Are They Really? Column-Stores vs. Row-Stores: How Different Are They Really? Daniel Abadi, Samuel Madden, Nabil Hachem Presented by Guozhang Wang November 18 th, 2008 Several slides are from Daniel Abadi and Michael Stonebraker

More information

Introduction to Database Systems CSE 344

Introduction to Database Systems CSE 344 Introduction to Database Systems CSE 344 Lecture 6: Basic Query Evaluation and Indexes 1 Announcements Webquiz 2 is due on Tuesday (01/21) Homework 2 is posted, due week from Monday (01/27) Today: query

More information

Advanced Databases. Lecture 15- Parallel Databases (continued) Masood Niazi Torshiz Islamic Azad University- Mashhad Branch

Advanced Databases. Lecture 15- Parallel Databases (continued) Masood Niazi Torshiz Islamic Azad University- Mashhad Branch Advanced Databases Lecture 15- Parallel Databases (continued) Masood Niazi Torshiz Islamic Azad University- Mashhad Branch www.mniazi.ir Parallel Join The join operation requires pairs of tuples to be

More information

B.H.GARDI COLLEGE OF ENGINEERING & TECHNOLOGY (MCA Dept.) Parallel Database Database Management System - 2

B.H.GARDI COLLEGE OF ENGINEERING & TECHNOLOGY (MCA Dept.) Parallel Database Database Management System - 2 Introduction :- Today single CPU based architecture is not capable enough for the modern database that are required to handle more demanding and complex requirements of the users, for example, high performance,

More information

Anastasia Ailamaki. Performance and energy analysis using transactional workloads

Anastasia Ailamaki. Performance and energy analysis using transactional workloads Performance and energy analysis using transactional workloads Anastasia Ailamaki EPFL and RAW Labs SA students: Danica Porobic, Utku Sirin, and Pinar Tozun Online Transaction Processing $2B+ industry Characteristics:

More information

Optimization Overview

Optimization Overview Lecture 17 Optimization Overview Lecture 17 Lecture 17 Today s Lecture 1. Logical Optimization 2. Physical Optimization 3. Course Summary 2 Lecture 17 Logical vs. Physical Optimization Logical optimization:

More information

Jignesh M. Patel. Blog:

Jignesh M. Patel. Blog: Jignesh M. Patel Blog: http://bigfastdata.blogspot.com Go back to the design Query Cache from Processing for Conscious 98s Modern (at Algorithms Hardware least for Hash Joins) 995 24 2 Processor Processor

More information

Database Systems CSE 414

Database Systems CSE 414 Database Systems CSE 414 Lecture 10: Basics of Data Storage and Indexes 1 Reminder HW3 is due next Tuesday 2 Motivation My database application is too slow why? One of the queries is very slow why? To

More information

HYRISE In-Memory Storage Engine

HYRISE In-Memory Storage Engine HYRISE In-Memory Storage Engine Martin Grund 1, Jens Krueger 1, Philippe Cudre-Mauroux 3, Samuel Madden 2 Alexander Zeier 1, Hasso Plattner 1 1 Hasso-Plattner-Institute, Germany 2 MIT CSAIL, USA 3 University

More information

! Parallel machines are becoming quite common and affordable. ! Databases are growing increasingly large

! Parallel machines are becoming quite common and affordable. ! Databases are growing increasingly large Chapter 20: Parallel Databases Introduction! Introduction! I/O Parallelism! Interquery Parallelism! Intraquery Parallelism! Intraoperation Parallelism! Interoperation Parallelism! Design of Parallel Systems!

More information

Chapter 20: Parallel Databases

Chapter 20: Parallel Databases Chapter 20: Parallel Databases! Introduction! I/O Parallelism! Interquery Parallelism! Intraquery Parallelism! Intraoperation Parallelism! Interoperation Parallelism! Design of Parallel Systems 20.1 Introduction!

More information

Chapter 20: Parallel Databases. Introduction

Chapter 20: Parallel Databases. Introduction Chapter 20: Parallel Databases! Introduction! I/O Parallelism! Interquery Parallelism! Intraquery Parallelism! Intraoperation Parallelism! Interoperation Parallelism! Design of Parallel Systems 20.1 Introduction!

More information

Performance Tuning and Optimizing SQL Databases (10987)

Performance Tuning and Optimizing SQL Databases (10987) Performance Tuning and Optimizing SQL Databases (10987) Formato do curso: Presencial Preço: 1420 Nível: Avançado Duração: 28 horas This four-day instructor-led course provides students who manage and maintain

More information

Systems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2014/15

Systems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2014/15 Systems Infrastructure for Data Science Web Science Group Uni Freiburg WS 2014/15 Lecture II: Indexing Part I of this course Indexing 3 Database File Organization and Indexing Remember: Database tables

More information

Module 4. Implementation of XQuery. Part 0: Background on relational query processing

Module 4. Implementation of XQuery. Part 0: Background on relational query processing Module 4 Implementation of XQuery Part 0: Background on relational query processing The Data Management Universe Lecture Part I Lecture Part 2 2 What does a Database System do? Input: SQL statement Output:

More information

CSE 344 FEBRUARY 14 TH INDEXING

CSE 344 FEBRUARY 14 TH INDEXING CSE 344 FEBRUARY 14 TH INDEXING EXAM Grades posted to Canvas Exams handed back in section tomorrow Regrades: Friday office hours EXAM Overall, you did well Average: 79 Remember: lowest between midterm/final

More information

Introduction to Query Processing and Query Optimization Techniques. Copyright 2011 Ramez Elmasri and Shamkant Navathe

Introduction to Query Processing and Query Optimization Techniques. Copyright 2011 Ramez Elmasri and Shamkant Navathe Introduction to Query Processing and Query Optimization Techniques Outline Translating SQL Queries into Relational Algebra Algorithms for External Sorting Algorithms for SELECT and JOIN Operations Algorithms

More information

Chapter 18: Parallel Databases

Chapter 18: Parallel Databases Chapter 18: Parallel Databases Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 18: Parallel Databases Introduction I/O Parallelism Interquery Parallelism Intraquery

More information

Chapter 18: Parallel Databases. Chapter 18: Parallel Databases. Parallelism in Databases. Introduction

Chapter 18: Parallel Databases. Chapter 18: Parallel Databases. Parallelism in Databases. Introduction Chapter 18: Parallel Databases Chapter 18: Parallel Databases Introduction I/O Parallelism Interquery Parallelism Intraquery Parallelism Intraoperation Parallelism Interoperation Parallelism Design of

More information

I am: Rana Faisal Munir

I am: Rana Faisal Munir Self-tuning BI Systems Home University (UPC): Alberto Abelló and Oscar Romero Host University (TUD): Maik Thiele and Wolfgang Lehner I am: Rana Faisal Munir Research Progress Report (RPR) [1 / 44] Introduction

More information

DASlab: The Data Systems Laboratory

DASlab: The Data Systems Laboratory DASlab: The Data Systems Laboratory at Harvard SEAS Stratos Idreos Harvard University http://daslab.seas.harvard.edu ABSTRACT DASlab is a new laboratory at the Harvard School of Engineering and Applied

More information

Indexing. Week 14, Spring Edited by M. Naci Akkøk, , Contains slides from 8-9. April 2002 by Hector Garcia-Molina, Vera Goebel

Indexing. Week 14, Spring Edited by M. Naci Akkøk, , Contains slides from 8-9. April 2002 by Hector Garcia-Molina, Vera Goebel Indexing Week 14, Spring 2005 Edited by M. Naci Akkøk, 5.3.2004, 3.3.2005 Contains slides from 8-9. April 2002 by Hector Garcia-Molina, Vera Goebel Overview Conventional indexes B-trees Hashing schemes

More information

Crescando: Predictable Performance for Unpredictable Workloads

Crescando: Predictable Performance for Unpredictable Workloads Crescando: Predictable Performance for Unpredictable Workloads G. Alonso, D. Fauser, G. Giannikis, D. Kossmann, J. Meyer, P. Unterbrunner Amadeus S.A. ETH Zurich, Systems Group (Funded by Enterprise Computing

More information

class 13 scans vs indexes prof. Stratos Idreos

class 13 scans vs indexes prof. Stratos Idreos class 13 scans vs indexes prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ b-tree - dynamic tree - always balanced 35,50 35, 12,20 50, 1,2,3 12,15,17 20, Stratos Idreos 2 /24 select from

More information

Database Systems CSE 414

Database Systems CSE 414 Database Systems CSE 414 Lecture 10-11: Basics of Data Storage and Indexes (Ch. 8.3-4, 14.1-1.7, & skim 14.2-3) 1 Announcements No WQ this week WQ4 is due next Thursday HW3 is due next Tuesday should be

More information

class 12 b-trees 2.0 prof. Stratos Idreos

class 12 b-trees 2.0 prof. Stratos Idreos class 12 b-trees 2.0 prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ A B C A B C clustered/primary index on A Stratos Idreos /26 2 A B C A B C clustered/primary index on A pos C pos

More information

Database Systems CSE 414

Database Systems CSE 414 Database Systems CSE 414 Lecture 15-16: Basics of Data Storage and Indexes (Ch. 8.3-4, 14.1-1.7, & skim 14.2-3) 1 Announcements Midterm on Monday, November 6th, in class Allow 1 page of notes (both sides,

More information

Administrivia Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications

Administrivia Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications Administrivia Carnegie Mellon Univ. Dept. of Computer Science 15-415/615 - DB Applications HW6 is due right now. HW7 is out today Phase 1: Wed Nov 9 th Phase 2: Mon Nov 28 th C. Faloutsos A. Pavlo Lecture#18:

More information

Weaving Relations for Cache Performance

Weaving Relations for Cache Performance Weaving Relations for Cache Performance Anastassia Ailamaki Carnegie Mellon David DeWitt, Mark Hill, and Marios Skounakis University of Wisconsin-Madison Memory Hierarchies PROCESSOR EXECUTION PIPELINE

More information

Principles of Data Management. Lecture #9 (Query Processing Overview)

Principles of Data Management. Lecture #9 (Query Processing Overview) Principles of Data Management Lecture #9 (Query Processing Overview) Instructor: Mike Carey mjcarey@ics.uci.edu Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Today s Notable News v Midterm

More information

Column-Stores vs. Row-Stores: How Different Are They Really?

Column-Stores vs. Row-Stores: How Different Are They Really? Column-Stores vs. Row-Stores: How Different Are They Really? Daniel J. Abadi, Samuel Madden and Nabil Hachem SIGMOD 2008 Presented by: Souvik Pal Subhro Bhattacharyya Department of Computer Science Indian

More information

CSC 261/461 Database Systems Lecture 19

CSC 261/461 Database Systems Lecture 19 CSC 261/461 Database Systems Lecture 19 Fall 2017 Announcements CIRC: CIRC is down!!! MongoDB and Spark (mini) projects are at stake. L Project 1 Milestone 4 is out Due date: Last date of class We will

More information

Lecture 12. Lecture 12: The IO Model & External Sorting

Lecture 12. Lecture 12: The IO Model & External Sorting Lecture 12 Lecture 12: The IO Model & External Sorting Announcements Announcements 1. Thank you for the great feedback (post coming soon)! 2. Educational goals: 1. Tech changes, principles change more

More information

INSTITUTO SUPERIOR TÉCNICO Administração e optimização de Bases de Dados

INSTITUTO SUPERIOR TÉCNICO Administração e optimização de Bases de Dados -------------------------------------------------------------------------------------------------------------- INSTITUTO SUPERIOR TÉCNICO Administração e optimização de Bases de Dados Exam 1 - solution

More information

Performance Monitoring

Performance Monitoring Performance Monitoring Performance Monitoring Goals Monitoring should check that the performanceinfluencing database parameters are correctly set and if they are not, it should point to where the problems

More information

Goals for Today. CS 133: Databases. Relational Model. Multi-Relation Queries. Reason about the conceptual evaluation of an SQL query

Goals for Today. CS 133: Databases. Relational Model. Multi-Relation Queries. Reason about the conceptual evaluation of an SQL query Goals for Today CS 133: Databases Fall 2018 Lec 02 09/06 Relational Model & Memory and Buffer Manager Prof. Beth Trushkowsky Reason about the conceptual evaluation of an SQL query Understand the storage

More information

Hash table example. B+ Tree Index by Example Recall binary trees from CSE 143! Clustered vs Unclustered. Example

Hash table example. B+ Tree Index by Example Recall binary trees from CSE 143! Clustered vs Unclustered. Example Student Introduction to Database Systems CSE 414 Hash table example Index Student_ID on Student.ID Data File Student 10 Tom Hanks 10 20 20 Amy Hanks ID fname lname 10 Tom Hanks 20 Amy Hanks Lecture 26:

More information

ReCache: Reactive Caching for Fast Analytics over Heterogeneous Data

ReCache: Reactive Caching for Fast Analytics over Heterogeneous Data : Reactive Caching for Fast Analytics over Heterogeneous Data Tahir Azim, Manos Karpathiotakis and Anastasia Ailamaki École Polytechnique Fédérale de Lausanne RAW Labs SA {tahir.azim, manos.karpathiotakis,

More information

Jyotheswar Kuricheti

Jyotheswar Kuricheti Jyotheswar Kuricheti 1 Agenda: 1. Performance Tuning Overview 2. Identify Bottlenecks 3. Optimizing at different levels : Target Source Mapping Session System 2 3 Performance Tuning Overview: 4 What is

More information

Oracle Database 10g The Self-Managing Database

Oracle Database 10g The Self-Managing Database Oracle Database 10g The Self-Managing Database Benoit Dageville Oracle Corporation benoit.dageville@oracle.com Page 1 1 Agenda Oracle10g: Oracle s first generation of self-managing database Oracle s Approach

More information