Mammals Flourished Long Before Dinosaurs Became Extinct
|
|
- Kristina Blake
- 6 years ago
- Views:
Transcription
1 Mammals Flourished Long Before Dinosaurs Became Extinct VLDB 2009 Lyon - Ten Year Award Database Architecture Optimized For The New Bottleneck: Memory Access (VLDB 1999) Stefan Manegold (manegold@cwi.nl) Peter Boncz (boncz@cwi.nl) Martin Kersten (mk@cwi.nl)
2 MICHAEL STONEBRAKER What I see happening is that the large database vendors, whom I ll call the elephants, are selling a one-size-fits-all, 30-year-old architecture that dates from somewhere in the late 1970s. ACM QUEUE May/June 2007
3 MICHAEL STONEBRAKER What I see happening is that the large database vendors, whom I ll call the elephants, are selling a one-size-fits-all, 30-year-old architecture that dates from somewhere in the late 1970s. ACM QUEUE May/June 2007
4 Mammals Reptiles 200 MYA Dinosaur
5 Mammals Reptiles 200 MYA Dinosaur
6 Mamals Reptiles 200 MYA Dinosaur
7 Mammals Reptiles 200 MYA Dinosaur
8 Mammels Reptiles 200 MYA Dinosaur
9 Mamals Reptiles 200 MYA Dinosaur
10 Mamals Reptiles 200 MYA 60 MYA Dinosaur
11 Mamals Reptiles 200 MYA 60 MYA Dinosaur
12 Mamals Reptiles 200 MYA 60 MYA Dinosaur
13 Mammals Reptiles 200 MYA 60 MYA Dinosaur
14 Large mammals once dined on dinosaurs Repenomamus giganticus Repenomamus robustus Yaoming Hu, Jin Meng, Yuanqing Wang and Chuankui Li149 Large Mesozoic mammals fed on young dinosaurs Nature (vol 433, p 149, 2005)
15 Evolution It is not the strongest of the species that survives, nor the most intelligent, but the one most responsive to change. Charles Darwin ( )
16 MonetDB Ingres Mamals Postgresql MySQL Reptiles IMS Dinosaur Oracle SQLserver
17 The genes of a species Reptiles Mammals Dinosaur -SQL86, SQL92,SQL99,SQL03 -n-ary storage scheme -relational algebra + DDL -5+ way indexing schemes -slotted pages of records -Volcano-style computation
18 The evolution of the Fox Troll a relational engine to simplify relational database programming SWI Prolog made a much better relational engine then my first system and Ingres, Oracle
19 The evolution of the Fox Troll a relational engine to simplify relational database programming SWI Prolog made a much better relational engine then my first system and Ingres, Oracle
20 The evolution of the Fox Troll a relational engine to simplify relational database programming 512 bytes Non-first-normal-form disease Object-orientation religion IO pages size increase
21 The evolution of the Fox Troll a relational engine to simplify relational database programming 512 bytes Non-first-normal-form disease Object-orientation religion IO pages size increase
22 The evolution of the Fox Troll a relational engine to simplify relational database programming 512 bytes Non-first-normal-form disease Object-orientation religion IO pages size increase
23 The evolution of the Fox Troll a relational engine to simplify relational database programming MOSAIC FILES PAX 512 bytes Non-first-normal-form disease Object-orientation religion IO pages size increase
24 Albert Einstein We can't solve problems by using the same kind of thinking we used when we created them.
25 SIGMOD 1985
26 SIGMOD 1985
27 The genes of a new species -SQL86, SQL92,SQL99,SQL03 -n-ary storage scheme -relational algebra + DDL -5+ way indexing schemes -slotted pages of records -Volcano-style computation
28 The genes of a new species Binary Association Tables Self managing Arrays Materialize operator -SQL86, SQL92,SQL99,SQL03 -n-ary storage scheme -relational algebra + DDL -5+ way indexing schemes -slotted pages of records -Volcano-style computation
29 SQL 03 Optimizers MonetDB 5 MonetDB kernel The Software Stack
30 Cache-Conscious Query Processing in Stefan Manegold Peter Boncz Martin Kersten
31 Evolution == Progress? Simple Scan: select max(c) from t
32 Evolution == Progress? Hardware Evolution (Moore's Law) => BUT Performance Software Stagnation Improvement!?
33 Databases hit The Memory Wall Detailed and exhaustive analysis for different workloads using 4 RDBMSs by Anastassia Ailamaki et al. in DBMSs On A Modern Processor: Where Does Time Go? (VLDB 1999) CPU is 50%-90% idle, waiting for memory: L1 data stalls L1 instruction stalls L2 data stalls TLB stalls Branch mispredictions Resource stalls
34 CPU & Hierarchical Memory System (1999) 2009: L2 (+L3) on CPU die Memory access: up to 1000 cycles
35 Required DBMS Evolution
36 Data Structure Evolution Row-storage Column-storage wastes bandwidth exploits full bandwidth
37 Algorithm Evolution: Joins Nested-loop: + sequential access to both inner & outer input -- quadratic complexity Sort-merge: + single sequential scan during merge ( benefit ) -- random access during sort ( investment ) Hash-join: + sequential scan over both inputs -- random access to hash table (build & probe)
38 Algorithm Evolution: Partitioned Hash-Joins Phase 1: Cluster both input relations Create clusters that fit in CPU cache Restrict random data access to (smallest) cache Avoid cache capacity misses Phase 2: Join matching clusters
39 Partitioned Hash-Join: Joining (Phase 2) Points measured => Phase 2 solved; but what about Phase 1? Lines modeled [VLDB2002] k 1M 32M Number of clusters
40 Algorithm Evolution: Clustering Problem: Number of clusters exceeds number of cache lines / TLB entries => cache / TLB thrashing Solution: Multi-pass clustering
41 Algorithm Evolution: Multi-Pass Clustering Limit number of clusters per pass Avoid cache / TLB thrashing Trade memory cost for CPU cost
42 Partitioned Hash-Join: Multi-Pass Clustering k 1M 32M Number of clusters
43 Partitioned Hash-Join
44 Joins in Column-Stores: Handling Payload Problem: Join result: pairs of tuple IDs; Out-of order => random access during projection / tuple-reconstruction Solutions: Jive-Join (Li, Ross; VLDB-Journal 1998) Flash-Join (Tsirogiannis, Harizopoulos, Shah, Wiener, Graefe; SIGMOD 2009) Radix-Decluster (Boncz, Manegold, Kersten; VLDB 2004) (Sideways Cracking (Idreos, Kersten, Manegold; SIGMOD 2009)) => post-projection / late materialization
45 Algorithm Evolution: Multi-pass Clustering 1 pass P passes P passes, CPU optimized
46 Algorithm Evolution: Partitioned Hash-Join CPU optimized
47 Cost Model Evolution: Data Access Total data access cost is sum over all cache/memory levels Cost per level is number of cache misses scored by latency Simple tool to measure latency per cache level ( The Calibrator ) few simple basic access patterns sequential, random,... compound access patterns: combinations of basic access patterns basic cost functions: estimate number of cache misses of basic access patterns Rules how to create compound cost functions using basic cost functions Describe data access of algorithms using access patterns
48 The Bigger Picture: Evolving Columnar Database Architecture Stefan Manegold Peter Boncz Martin Kersten
49 RISC Relational Algebra Simple, hardcoded semantics in operators SELECT id, name, (age-30)*50 as bonus FROM people WHERE age > 30
50 RISC Relational Algebra CPU J? Give it nice code! - few dependencies (control,data) - CPU gets out-of-order execution - compiler can e.g. generate SIMD One loop for an entire column batcalc_minus_int(int* - no per-tuple interpretation res, int* col, - arrays: no record navigation int val, - better instruction cache int locality n) { SELECT for(i=0; id, name, i<n; (age-30)*50 i++) as bonus FROM res[i] people = col[i] - val; } WHERE age > 30 Simple, hardcoded semantics in operators
51 RISC Relational Algebra SELECT id, name, (age-30)*50 as bonus FROM people WHERE age > 30 MATERIALIZED intermediate results
52 Materialization vs Pipelining SELECT id, name (age-30)*50 AS bonus FROM employee WHERE age > 30 next() 102 ivan * ivan next() > 30? PROJECT ivan alice FALSE TRUE SELECT next() alice ivan SCAN
53 MonetDB spin-off: MonetDB/X100 Materialization vs Pipelining
54 next() next() PROJECT SELECT next() alice ivan peggy victor SCAN
55 Vectorized In Cache Processing next() vector = array of ~100 CPU cache Resident next() alice ivan peggy victor > 30? FALSE TRUE TRUE FALSE PROJECT SELECT next() alice ivan peggy victor SCAN
56 Observations: ivan peggy next() called much less often Ł more time spent in primitives less in overhead primitive calls process an array of values in a loop: next() next() ivan peggy alice ivan peggy victor * > 30? FALSE TRUE TRUE FALSE PROJECT SELECT next() alice ivan peggy victor SCAN
57 Observations: next() called much less often Ł more time spent in primitives less in overhead primitive calls process an array of values in a loop: CPU J? Give it nice code! - few dependencies (control,data) - CPU gets out-of-order execution - compiler can e.g. generate SIMD Hey!! stop reading this, please! One loop for an entire column - no per-tuple interpretation - arrays: no record navigation - better instruction cache locality > 30? FALSE TRUE TRUE FALSE * for(i=0; i<n; i++) res[i] = (col[i] > x) for(i=0; i<n; i++) res[i] = (col[i] - x) for(i=0; i<n; i++) res[i] = (col[i] * x)
58 Observations: next() called much less often Ł more time spent in primitives less in overhead > 30? FALSE TRUE TRUE FALSE for(i=0; i<n; i++) res[i] = (col[i] > x) primitive calls process an array of values in a loop: CPU J? Give it nice code! - few dependencies (control,data) - CPU gets out-of-order execution - compiler can e.g. generate SIMD Hey!! stop reading this, please! One loop for an entire column - no per-tuple interpretation - arrays: no record navigation - better instruction cache locality * for(i=0; i<n; i++) res[i] = (col[i] - x) for(i=0; i<n; i++) res[i] = (col[i] * x) alice ivan peggy victor Loops with SCAN
59 Observations: Vectorized In Cache Processing next() called much less often Ł more time spent in primitives less vector in overhead = array of ~100 primitive CPU cache calls Resident process an array of values in a loop: CPU J? Give it nice code! - few dependencies (control,data) - CPU gets out-of-order execution - compiler can e.g. generate SIMD Hey!! stop reading this, please! One loop for an entire column - no per-tuple interpretation - arrays: no record navigation - better instruction cache locality > 30? FALSE TRUE TRUE FALSE next() next() * next() 102 ivan for(i=0; peggy 750 i<n; i++) res[i] = - 30 (col[i] * 50 > x) ivan peggy for(i=0; i<n; > 30 i++)? PROJECT 101 res[i] alice = 22(col[i] FALSE - x) 102 ivan 37 TRUE 104 peggy 45 TRUE for(i=0; 105 victor i<n; 25 FALSE i++) SELECT res[i] = (col[i] * x) alice ivan peggy victor Loops with SCAN
60 The optimal diet? TPCH Q1 Vector Size Ł
61 The optimal diet? TPCH Q1 Less and less iterator.next() and primitive function calls ( interpretation overhead ) Vector Size Ł
62 The optimal diet? Vectors start to exceed the CPU cache, causing additional memory traffic TPCH Q1 Vector Size Ł
63 More on Very fast Ł 1 core consumes 5GBs/sec! Research on achieving I/O balance Columnar data storage [Boncz et al., CIDR 05] New compression techniques [Heman et al., ICDE 06] Cooperative Scans [Zukowski et al., VLDB 07] Built with and exclusively distributed by..
64 MonetDB Highlights Architecture-Conscious Query Processing Data layout, algorithms, cost models Multi-Model: ODMG, SQL, XQuery,.. SPARQL Columns as the building block for complex data structures RISC Relational Algebra (vs CISC) Faster through simplicity: no tuple expression interpreter Decoupling of Transactions from Execution/Buffering ACID, but not ARIES.. Pay as you need transaction overhead. differential, lazy, optimistic, snapshot Run-Time Indexing and Query Optimization Extensible Optimizer Framework cracking, recycling, sampling-based runtime optimization
65 MonetDB vs Traditional Architecture Architecture-Conscious Query Processing vs Magnetic disk I/O conscious processing Multi-Model: ODMG, SQL, XQuery,.. SPARQL vs Relational with Bolt-on Subsystems RISC Relational Algebra vs Tuple-at-a-time Iterator Model Decoupling of Transactions from Execution/Buffering vs ARIES integrated into Execution/Buffering/Indexing Run-Time Indexing and Query Optimization vs Static DBA/Workload-driven Optimization & Indexing
66 The Software Stack SQL 03 Optimizers MonetDB 5 Orthogonal extension of SQL03 Clear computational semantics Extensions Minimal extension to MonetDB MonetDB kernel
67 The Software Stack XQuery SQL 03 RDF Arrays Optimizers X100 MonetDB 4 MonetDB 5 MonetDB kernel compile
68 The Software Stack Extensible query lang frontend framework XQuery.. SGL? SQL 03 RDF Arrays Optimizers SOAP MonetDB 4 MonetDB 5 MonetDB kernel
69 The Software Stack XQuery Extensible Dynamic Runtime QOPT Framework! SOAP SQL 03 Optimizers MonetDB 4 MonetDB 5 RDF Arrays MonetDB kernel
70 The Software Stack XQuery SQL 03 RDF Arrays Optimizers SOAP MonetDB 4 MonetDB 5 MonetDB kernel OGIS Extensible Architecture-Conscious Execution platform compile
71 Farming new species
72 Armada Fabian Groffen Sky server Milena Ivanova Data cell Erietta Liarou Cyclotron Romulo Gonçalves Cracking Stratos Idreos XRPC Jenny Zhang XML pattern search Nan Tang RDF Graphs Lefteris Sidirourgos
73 Acknowledgements Martin Kersten Peter Boncz Niels Nes Stefan Manegold Fabian Groffen Sjoerd Mullender Steffen Goeldner Arjen de Vries Menzo Windhouwer Tim Ruhl Romulo Goncalves Alex van Ballegooij Johan List Georgina Ramirez Marcin Zukowski Roberto Cornacchia Sandor Heman Torsten Grust Jens Teubner Maurice van Keulen Jan Flokstra Milena Ivanova Lefteris Sidirourgos Jan Rittinger Wouter Alink Jennie Zhang Stratos Idreos Erietta Liarou Lefteris Sidirourgos Florian Waas Albrecht Schmidt Jonas Karlsson Martin van Dinther Peter Bosch Carel van den Berg Wilco Quak
74 Whoa MonetDB! Speed lines! PostgreSQL SQLserver Cstore MySQL DB2
MonetDB: Open-source Columnar Database Technology Beyond Textbooks
MonetDB: Open-source Columnar Database Technology Beyond Textbooks http://wwwmonetdborg/ Stefan Manegold StefanManegold@cwinl http://homepagescwinl/~manegold/ >5k downloads per month Why? Why? Motivation
More informationclass 9 fast scans 1.0 prof. Stratos Idreos
class 9 fast scans 1.0 prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ 1 pass to merge into 8 sorted pages (2N pages) 1 pass to merge into 4 sorted pages (2N pages) 1 pass to merge into
More informationclass 5 column stores 2.0 prof. Stratos Idreos
class 5 column stores 2.0 prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ worth thinking about what just happened? where is my data? email, cloud, social media, can we design systems
More informationArchitecture-Conscious Database Systems
Architecture-Conscious Database Systems 2009 VLDB Summer School Shanghai Peter Boncz (CWI) Sources Thank You! l l l l Database Architectures for New Hardware VLDB 2004 tutorial, Anastassia Ailamaki Query
More informationclass 17 updates prof. Stratos Idreos
class 17 updates prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ early/late tuple reconstruction, tuple-at-a-time, vectorized or bulk processing, intermediates format, pushing selects
More informationclass 6 more about column-store plans and compression prof. Stratos Idreos
class 6 more about column-store plans and compression prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ query compilation an ancient yet new topic/research challenge query->sql->interpet
More informationLarge-Scale Data Engineering. Modern SQL-on-Hadoop Systems
Large-Scale Data Engineering Modern SQL-on-Hadoop Systems Analytical Database Systems Parallel (MPP): Teradata Paraccel Pivotal Vertica Redshift Oracle (IMM) DB2-BLU SQLserver (columnstore) Netteza InfoBright
More informationA high performance database kernel for query-intensive applications. Peter Boncz
MonetDB: A high performance database kernel for query-intensive applications Peter Boncz CWI Amsterdam The Netherlands boncz@cwi.nl Contents The Architecture of MonetDB The MIL language with examples Where
More informationBig Data Infrastructures & Technologies
Big Data Infrastructures & Technologies SQL on Big Data THE DEBATE: DATABASE SYSTEMS VS MAPREDUCE A major step backwards? MapReduce is a step backward in database access Schemas are good Separation of
More informationPathfinder: XQuery Compilation Techniques for Relational Database Targets
EMPTY_FRAG SERIALIZE (item, pos) ROW# (pos:) (pos1, item) X (iter = iter1) ELEM (iter1, item:) (iter1, item1, pos) ROW# (pos:/iter1) (iter1, pos1, item1) access
More informationcomplex plans and hybrid layouts
class 7 complex plans and hybrid layouts prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ essential column-stores features virtual ids late tuple reconstruction (if ever) vectorized execution
More informationAccelerating Foreign-Key Joins using Asymmetric Memory Channels
Accelerating Foreign-Key Joins using Asymmetric Memory Channels Holger Pirk Stefan Manegold Martin Kersten holger@cwi.nl manegold@cwi.nl mk@cwi.nl Why? Trivia: Joins are important But: Many Joins are (Indexed)
More informationBig Data Infrastructures & Technologies. SQL on Big Data
Big Data Infrastructures & Technologies SQL on Big Data THE DEBATE: DATABASE SYSTEMS VS MAPREDUCE A major step backwards? MapReduce is a step backward in database access Schemas are good Separation of
More informationLarge-Scale Data Engineering
Large-Scale Data Engineering SQL on Big Data THE DEBATE: DATABASE SYSTEMS VS MAPREDUCE A major step backwards? MapReduce is a step backward in database access Schemas are good Separation of the schema
More informationColumn-Oriented Database Systems
Column-Oriented Database Systems Tutorial Peter Boncz (CWI) Adapted from VLDB 29 Tutorial Column-Oriented Database Systems with Daniel Abadi (Yale) Stavros Harizopuolos (HP Labs) What is a column-store?
More informationData Modeling and Databases Ch 10: Query Processing - Algorithms. Gustavo Alonso Systems Group Department of Computer Science ETH Zürich
Data Modeling and Databases Ch 10: Query Processing - Algorithms Gustavo Alonso Systems Group Department of Computer Science ETH Zürich Transactions (Locking, Logging) Metadata Mgmt (Schema, Stats) Application
More informationData Modeling and Databases Ch 9: Query Processing - Algorithms. Gustavo Alonso Systems Group Department of Computer Science ETH Zürich
Data Modeling and Databases Ch 9: Query Processing - Algorithms Gustavo Alonso Systems Group Department of Computer Science ETH Zürich Transactions (Locking, Logging) Metadata Mgmt (Schema, Stats) Application
More informationSandor Heman, Niels Nes, Peter Boncz. Dynamic Bandwidth Sharing. Cooperative Scans: Marcin Zukowski. CWI, Amsterdam VLDB 2007.
Cooperative Scans: Dynamic Bandwidth Sharing in a DBMS Marcin Zukowski Sandor Heman, Niels Nes, Peter Boncz CWI, Amsterdam VLDB 2007 Outline Scans in a DBMS Cooperative Scans Benchmarks DSM version VLDB,
More informationclass 17 updates prof. Stratos Idreos
class 17 updates prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ UPDATE table_name SET column1=value1,column2=value2,... WHERE some_column=some_value INSERT INTO table_name VALUES (value1,value2,value3,...)
More informationHash Joins for Multi-core CPUs. Benjamin Wagner
Hash Joins for Multi-core CPUs Benjamin Wagner Joins fundamental operator in query processing variety of different algorithms many papers publishing different results main question: is tuning to modern
More informationPerformance Issues and Query Optimization in Monet
Performance Issues and Query Optimization in Monet Stefan Manegold Stefan.Manegold@cwi.nl 1 Contents Modern Computer Architecture: CPU & Memory system Consequences for DBMS - Data structures: vertical
More informationbasic db architectures & layouts
class 4 basic db architectures & layouts prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ videos for sections 3 & 4 are online check back every week (1-2 sections weekly) there is a schedule
More informationDesigning Database Operators for Flash-enabled Memory Hierarchies
Designing Database Operators for Flash-enabled Memory Hierarchies Goetz Graefe Stavros Harizopoulos Harumi Kuno Mehul A. Shah Dimitris Tsirogiannis Janet L. Wiener Hewlett-Packard Laboratories, Palo Alto,
More informationCache-Aware Database Systems Internals Chapter 7
Cache-Aware Database Systems Internals Chapter 7 1 Data Placement in RDBMSs A careful analysis of query processing operators and data placement schemes in RDBMS reveals a paradox: Workloads perform sequential
More informationCOLUMN STORE DATABASE SYSTEMS. Prof. Dr. Uta Störl Big Data Technologies: Column Stores - SoSe
COLUMN STORE DATABASE SYSTEMS Prof. Dr. Uta Störl Big Data Technologies: Column Stores - SoSe 2016 1 Telco Data Warehousing Example (Real Life) Michael Stonebraker et al.: One Size Fits All? Part 2: Benchmarking
More informationLOD2 Creating Knowledge out of Interlinked Data. Project Number: Start Date of Project: 01/09/2010 Duration: 48 months
Collaborative Project LOD2 Creating Knowledge out of Interlinked Data Project Number: 257943 Start Date of Project: 01/09/2010 Duration: 48 months Deliverable 2.3 Integration of MonetDB Technology in Virtuoso
More informationPerformance in the Multicore Era
Performance in the Multicore Era Gustavo Alonso Systems Group -- ETH Zurich, Switzerland Systems Group Enterprise Computing Center Performance in the multicore era 2 BACKGROUND - SWISSBOX SwissBox: An
More informationCache-Aware Database Systems Internals. Chapter 7
Cache-Aware Database Systems Internals Chapter 7 Data Placement in RDBMSs A careful analysis of query processing operators and data placement schemes in RDBMS reveals a paradox: Workloads perform sequential
More informationMain-Memory Databases 1 / 25
1 / 25 Motivation Hardware trends Huge main memory capacity with complex access characteristics (Caches, NUMA) Many-core CPUs SIMD support in CPUs New CPU features (HTM) Also: Graphic cards, FPGAs, low
More informationWeaving Relations for Cache Performance
VLDB 2001, Rome, Italy Best Paper Award Weaving Relations for Cache Performance Anastassia Ailamaki David J. DeWitt Mark D. Hill Marios Skounakis Presented by: Ippokratis Pandis Bottleneck in DBMSs Processor
More informationCrescando: Predictable Performance for Unpredictable Workloads
Crescando: Predictable Performance for Unpredictable Workloads G. Alonso, D. Fauser, G. Giannikis, D. Kossmann, J. Meyer, P. Unterbrunner Amadeus S.A. ETH Zurich, Systems Group (Funded by Enterprise Computing
More informationBridging the Processor/Memory Performance Gap in Database Applications
Bridging the Processor/Memory Performance Gap in Database Applications Anastassia Ailamaki Carnegie Mellon http://www.cs.cmu.edu/~natassa Memory Hierarchies PROCESSOR EXECUTION PIPELINE L1 I-CACHE L1 D-CACHE
More informationCitation for published version (APA): Ydraios, E. (2010). Database cracking: towards auto-tunning database kernels
UvA-DARE (Digital Academic Repository) Database cracking: towards auto-tunning database kernels Ydraios, E. Link to publication Citation for published version (APA): Ydraios, E. (2010). Database cracking:
More informationColumn-Oriented Database Systems. Liliya Rudko University of Helsinki
Column-Oriented Database Systems Liliya Rudko University of Helsinki 2 Contents 1. Introduction 2. Storage engines 2.1 Evolutionary Column-Oriented Storage (ECOS) 2.2 HYRISE 3. Database management systems
More informationclass 12 b-trees 2.0 prof. Stratos Idreos
class 12 b-trees 2.0 prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ A B C A B C clustered/primary index on A Stratos Idreos /26 2 A B C A B C clustered/primary index on A pos C pos
More informationA Comparison of Memory Usage and CPU Utilization in Column-Based Database Architecture vs. Row-Based Database Architecture
A Comparison of Memory Usage and CPU Utilization in Column-Based Database Architecture vs. Row-Based Database Architecture By Gaurav Sheoran 9-Dec-08 Abstract Most of the current enterprise data-warehouses
More informationWalking Four Machines by the Shore
Walking Four Machines by the Shore Anastassia Ailamaki www.cs.cmu.edu/~natassa with Mark Hill and David DeWitt University of Wisconsin - Madison Workloads on Modern Platforms Cycles per instruction 3.0
More informationData Systems that are Easy to Design, Tune and Use. Stratos Idreos
Data Systems that are Easy to Design, Tune and Use data systems that are easy to: (years) (months) design & build set-up & tune (hours/days) use e.g., adapt to new applications, new hardware, spin off
More informationHardware Acceleration for Database Systems using Content Addressable Memories
Hardware Acceleration for Database Systems using Content Addressable Memories Nagender Bandi, Sam Schneider, Divyakant Agrawal, Amr El Abbadi University of California, Santa Barbara Overview The Memory
More informationMonetDB/XQuery: High Performance, Purely Relational XQuery Processing
ADT 7 Lecture 4 MonetDB/XQuery: High Performance, Purely Relational XQuery Processing http://pathfinder xquery.org/ http://monetdb xquery.org/ Stefan Manegold Stefan.Manegold@cwi.nl http://www.cwi.nl/~manegold/
More informationHardware-Conscious DBMS Architecture for Data-Intensive Applications
Hardware-Conscious DBMS Architecture for Data-Intensive Applications Marcin Zukowski Centrum voor Wiskunde en Informatica Kruislaan 13 1090 GB Amsterdam The Netherlands M.Zukowski@cwi.nl Abstract Recent
More informationSPEED is but one of the design criteria of a database. Parallelism in Database Operations. Single data Multiple data
1 Parallelism in Database Operations Kalle Kärkkäinen Abstract The developments in the memory and hard disk bandwidth latencies have made databases CPU bound. Recent studies have shown that this bottleneck
More informationDATABASE CRACKING: Fancy Scan, not Poor Man s Sort! Don. Holger Pirk Eleni Petraki Strato Idreos
DATABASE CRACKING: Fancy Scan, not Poor Man s Sort! Hardware Folks Cracking Folks Don Holger Pirk Eleni Petraki Strato Idreos Stefan Manegold Martin Kersten EVALUATING RANGE PREDICATES COMPLEXITY ON PAPER
More informationColumn Stores vs. Row Stores How Different Are They Really?
Column Stores vs. Row Stores How Different Are They Really? Daniel J. Abadi (Yale) Samuel R. Madden (MIT) Nabil Hachem (AvantGarde) Presented By : Kanika Nagpal OUTLINE Introduction Motivation Background
More informationTrack Join. Distributed Joins with Minimal Network Traffic. Orestis Polychroniou! Rajkumar Sen! Kenneth A. Ross
Track Join Distributed Joins with Minimal Network Traffic Orestis Polychroniou Rajkumar Sen Kenneth A. Ross Local Joins Algorithms Hash Join Sort Merge Join Index Join Nested Loop Join Spilling to disk
More informationH2O: A Hands-free Adaptive Store
H2O: A Hands-free Adaptive Store Ioannis Alagiannis Stratos Idreos Anastasia Ailamaki Ecole Polytechnique Fédérale de Lausanne {ioannis.alagiannis, anastasia.ailamaki}@epfl.ch Harvard University stratos@seas.harvard.edu
More informationIn-Memory Data Management
In-Memory Data Management Martin Faust Research Assistant Research Group of Prof. Hasso Plattner Hasso Plattner Institute for Software Engineering University of Potsdam Agenda 2 1. Changed Hardware 2.
More informationJignesh M. Patel. Blog:
Jignesh M. Patel Blog: http://bigfastdata.blogspot.com Go back to the design Query Cache from Processing for Conscious 98s Modern (at Algorithms Hardware least for Hash Joins) 995 24 2 Processor Processor
More informationArchitecture-Conscious Database Systems
Architecture-Conscious Database Systems Anastassia Ailamaki Ph.D. Examination November 30, 2000 A DBMS on a 1980 Computer DBMS Execution PROCESSOR 10 cycles/instruction DBMS Data and Instructions 6 cycles
More informationArchitecture and Implementation of Database Systems (Winter 2014/15)
Jens Teubner Architecture & Implementation of DBMS Winter 2014/15 1 Architecture and Implementation of Database Systems (Winter 2014/15) Jens Teubner, DBIS Group jens.teubner@cs.tu-dortmund.de Winter 2014/15
More informationQuery Processing on Multi-Core Architectures
Query Processing on Multi-Core Architectures Frank Huber and Johann-Christoph Freytag Department for Computer Science, Humboldt-Universität zu Berlin Rudower Chaussee 25, 12489 Berlin, Germany {huber,freytag}@dbis.informatik.hu-berlin.de
More informationDatabase Systems and Modern CPU Architecture
Database Systems and Modern CPU Architecture Prof. Dr. Torsten Grust Winter Term 2006/07 Hard Disk 2 RAM Administrativa Lecture hours (@ MI HS 2): Monday, 09:15 10:00 Tuesday, 14:15 15:45 No lectures on
More informationclass 10 fast scans 2.0 prof. Stratos Idreos
class 10 fast scans 2.0 prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ always want to minimize data movement - computation & utilize all resources! registers on chip cache on board
More informationColumn-Stores vs. Row-Stores: How Different Are They Really?
Column-Stores vs. Row-Stores: How Different Are They Really? Daniel J. Abadi, Samuel Madden and Nabil Hachem SIGMOD 2008 Presented by: Souvik Pal Subhro Bhattacharyya Department of Computer Science Indian
More informationLecture #14 Optimizer Implementation (Part I)
15-721 ADVANCED DATABASE SYSTEMS Lecture #14 Optimizer Implementation (Part I) Andy Pavlo / Carnegie Mellon University / Spring 2016 @Andy_Pavlo // Carnegie Mellon University // Spring 2017 2 TODAY S AGENDA
More informationCopyright 2016 Ramez Elmasri and Shamkant B. Navathe
CHAPTER 19 Query Optimization Introduction Query optimization Conducted by a query optimizer in a DBMS Goal: select best available strategy for executing query Based on information available Most RDBMSs
More informationclass 10 b-trees 2.0 prof. Stratos Idreos
class 10 b-trees 2.0 prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ CS Colloquium HV Jagadish Prof University of Michigan 10/6 Stratos Idreos /29 2 CS Colloquium Magdalena Balazinska
More informationBuffering Database Operations for Enhanced Instruction Cache Performance
ing Database Operations for Enhanced Instruction Cache Performance Jingren Zhou Columbia University jrzhou@cs.columbia.edu Kenneth A. Ross Columbia University kar@cs.columbia.edu ABSTRACT As more and more
More informationFirebird in 2011/2012: Development Review
Firebird in 2011/2012: Development Review Dmitry Yemanov mailto:dimitr@firebirdsql.org Firebird Project http://www.firebirdsql.org/ Packages Released in 2011 Firebird 2.1.4 March 2011 96 bugs fixed 4 improvements,
More informationINTELLIGENT DATABASE GROUP. Foundations of Information Systems. 5 DBMS Architecture. Prof. Dr.-Ing. Wolfgang Lehner
Prof. Dr.-Ing. Wolfgang Lehner INTELLIGENT DATABASE GROUP 5 DBMS Architecture What is in the Lecture?. Database Usage Query Programming Design 2. Database Architecture Indexes Transactions Query Processing
More informationArchitecture and Implementation of Database Systems (Summer 2018)
Jens Teubner Architecture & Implementation of DBMS Summer 2018 1 Architecture and Implementation of Database Systems (Summer 2018) Jens Teubner, DBIS Group jens.teubner@cs.tu-dortmund.de Summer 2018 Jens
More informationAccess Path Selection in Main-Memory Optimized Data Systems
Access Path Selection in Main-Memory Optimized Data Systems Should I Scan or Should I Probe? Manos Athanassoulis Harvard University Talk at CS265, February 16 th, 2018 1 Access Path Selection SELECT x
More informationCSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 8 - Data Warehousing and Column Stores
CSE 544 Principles of Database Management Systems Alvin Cheung Fall 2015 Lecture 8 - Data Warehousing and Column Stores Announcements Shumo office hours change See website for details HW2 due next Thurs
More informationADT 2010 ADT XQuery Updates in MonetDB/XQuery & Other Approaches to XQuery Processing
1 XQuery Updates in MonetDB/XQuery & Other Approaches to XQuery Processing Stefan Manegold Stefan.Manegold@cwi.nl http://www.cwi.nl/~manegold/ MonetDB/XQuery: Updates Schedule 9.11.1: RDBMS back-end support
More informationJoin Processing for Flash SSDs: Remembering Past Lessons
Join Processing for Flash SSDs: Remembering Past Lessons Jaeyoung Do, Jignesh M. Patel Department of Computer Sciences University of Wisconsin-Madison $/MB GB Flash Solid State Drives (SSDs) Benefits of
More informationSystems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2014/15
Systems Infrastructure for Data Science Web Science Group Uni Freiburg WS 2014/15 Lecture II: Indexing Part I of this course Indexing 3 Database File Organization and Indexing Remember: Database tables
More information7. Query Processing and Optimization
7. Query Processing and Optimization Processing a Query 103 Indexing for Performance Simple (individual) index B + -tree index Matching index scan vs nonmatching index scan Unique index one entry and one
More informationclass 13 scans vs indexes prof. Stratos Idreos
class 13 scans vs indexes prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ b-tree - dynamic tree - always balanced 35,50 35, 12,20 50, 1,2,3 12,15,17 20, Stratos Idreos 2 /24 select from
More informationQuery Processing: A Systems View. Announcements (March 1) Physical (execution) plan. CPS 216 Advanced Database Systems
Query Processing: A Systems View CPS 216 Advanced Database Systems Announcements (March 1) 2 Reading assignment due Wednesday Buffer management Homework #2 due this Thursday Course project proposal due
More informationWeaving Relations for Cache Performance
Weaving Relations for Cache Performance Anastassia Ailamaki Carnegie Mellon Computer Platforms in 198 Execution PROCESSOR 1 cycles/instruction Data and Instructions cycles
More informationUniversity of Waterloo Midterm Examination Sample Solution
1. (4 total marks) University of Waterloo Midterm Examination Sample Solution Winter, 2012 Suppose that a relational database contains the following large relation: Track(ReleaseID, TrackNum, Title, Length,
More informationModule 4. Implementation of XQuery. Part 0: Background on relational query processing
Module 4 Implementation of XQuery Part 0: Background on relational query processing The Data Management Universe Lecture Part I Lecture Part 2 2 What does a Database System do? Input: SQL statement Output:
More informationSTEPS Towards Cache-Resident Transaction Processing
STEPS Towards Cache-Resident Transaction Processing Stavros Harizopoulos joint work with Anastassia Ailamaki VLDB 2004 Carnegie ellon CPI OLTP workloads on modern CPUs 6 4 2 L2-I stalls L2-D stalls L1-I
More informationDSM vs. NSM: CPU Performance Tradeoffs in Block-Oriented Query Processing
DSM vs. NSM: CPU Performance Tradeoffs in Block-Oriented Query Processing Marcin Zukowski Niels Nes Peter Boncz CWI, Amsterdam, The Netherlands {Firstname.Lastname}@cwi.nl ABSTRACT Comparisons between
More informationRunning CLEF-IP experiments using a graphical query builder
Running CLEF-IP experiments using a graphical query builder W. Alink, R. Cornacchia, A. P. de Vries Centrum Wiskunde en Informatica {alink,roberto,arjen}@cwi.nl Abstract The CWI submission for the CLEF-IP
More informationcolumn-stores basics
class 3 column-stores basics prof. HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS265/ Goetz Graefe Google Research guest lecture Justin Levandoski Microsoft Research projects option 1: systems project (now
More informationIntroduction to Databases
Introduction to Databases Matthew J. Graham CACR Methods of Computational Science Caltech, 2009 January 27 - Acknowledgements to Julian Bunn and Ed Upchurch what is a database? A structured collection
More informationDatenbanksysteme II: Modern Hardware. Stefan Sprenger November 23, 2016
Datenbanksysteme II: Modern Hardware Stefan Sprenger November 23, 2016 Content of this Lecture Introduction to Modern Hardware CPUs, Cache Hierarchy Branch Prediction SIMD NUMA Cache-Sensitive Skip List
More informationB.H.GARDI COLLEGE OF ENGINEERING & TECHNOLOGY (MCA Dept.) Parallel Database Database Management System - 2
Introduction :- Today single CPU based architecture is not capable enough for the modern database that are required to handle more demanding and complex requirements of the users, for example, high performance,
More informationEvaluation of Relational Operations
Evaluation of Relational Operations Chapter 14 Comp 521 Files and Databases Fall 2010 1 Relational Operations We will consider in more detail how to implement: Selection ( ) Selects a subset of rows from
More informationAdvanced Database Systems
Lecture II Storage Layer Kyumars Sheykh Esmaili Course s Syllabus Core Topics Storage Layer Query Processing and Optimization Transaction Management and Recovery Advanced Topics Cloud Computing and Web
More informationToward timely, predictable and cost-effective data analytics. Renata Borovica-Gajić DIAS, EPFL
Toward timely, predictable and cost-effective data analytics Renata Borovica-Gajić DIAS, EPFL Big data proliferation Big data is when the current technology does not enable users to obtain timely, cost-effective,
More informationSomething to think about. Problems. Purpose. Vocabulary. Query Evaluation Techniques for large DB. Part 1. Fact:
Query Evaluation Techniques for large DB Part 1 Fact: While data base management systems are standard tools in business data processing they are slowly being introduced to all the other emerging data base
More informationCurriculum Vitae. Stefan Manegold September 28, 2009
Curriculum Vitae Stefan Manegold September 28, 2009 Personalia Full name Affiliation Telephone E-mail address WWW address Stefan Manegold Centrum Wiskunde & Informatica (CWI) Science Park 123 1098 XG Amsterdam
More informationStorage. Holger Pirk
Storage Holger Pirk Purpose of this lecture We are going down the rabbit hole now We are crossing the threshold into the system (No more rules without exceptions :-) Understand storage alternatives in-memory
More informationExamples of Physical Query Plan Alternatives. Selected Material from Chapters 12, 14 and 15
Examples of Physical Query Plan Alternatives Selected Material from Chapters 12, 14 and 15 1 Query Optimization NOTE: SQL provides many ways to express a query. HENCE: System has many options for evaluating
More informationDesign and Implementation of Bit-Vector filtering for executing of multi-join qureies
Undergraduate Research Opportunity Program (UROP) Project Report Design and Implementation of Bit-Vector filtering for executing of multi-join qureies By Cheng Bin Department of Computer Science School
More informationBDCC: Exploiting Fine-Grained Persistent Memories for OLAP. Peter Boncz
BDCC: Exploiting Fine-Grained Persistent Memories for OLAP Peter Boncz NVRAM System integration: NVMe: block devices on the PCIe bus NVDIMM: persistent RAM, byte-level access Low latency Lower than Flash,
More informationChapter 1: Introduction
Chapter 1: Introduction Chapter 1: Introduction Purpose of Database Systems Database Languages Relational Databases Database Design Data Models Database Internals Database Users and Administrators Overall
More informationAnnouncements (March 1) Query Processing: A Systems View. Physical (execution) plan. Announcements (March 3) Physical plan execution
Announcements (March 1) 2 Query Processing: A Systems View CPS 216 Advanced Database Systems Reading assignment due Wednesday Buffer management Homework #2 due this Thursday Course project proposal due
More informationArchitecture and Implementation of Database Systems Query Processing
Architecture and Implementation of Database Systems Query Processing Ralf Möller Hamburg University of Technology Acknowledgements The course is partially based on http://www.systems.ethz.ch/ education/past-courses/hs08/archdbms
More informationWhat happens. 376a. Database Design. Execution strategy. Query conversion. Next. Two types of techniques
376a. Database Design Dept. of Computer Science Vassar College http://www.cs.vassar.edu/~cs376 Class 16 Query optimization What happens Database is given a query Query is scanned - scanner creates a list
More informationOutline. Parallel Database Systems. Information explosion. Parallelism in DBMSs. Relational DBMS parallelism. Relational DBMSs.
Parallel Database Systems STAVROS HARIZOPOULOS stavros@cs.cmu.edu Outline Background Hardware architectures and performance metrics Parallel database techniques Gamma Bonus: NCR / Teradata Conclusions
More informationImplementation of Relational Operations
Implementation of Relational Operations Module 4, Lecture 1 Database Management Systems, R. Ramakrishnan 1 Relational Operations We will consider how to implement: Selection ( ) Selects a subset of rows
More informationEvaluation of Relational Operations. SS Chung
Evaluation of Relational Operations SS Chung Cost Metric Query Processing Cost = Disk I/O Cost + CPU Computation Cost Disk I/O Cost = Disk Access Time + Data Transfer Time Disk Acess Time = Seek Time +
More informationMonetDB/DataCell: leveraging the column-store database technology for efficient and scalable stream processing Liarou, E.
UvA-DARE (Digital Academic Repository) MonetDB/DataCell: leveraging the column-store database technology for efficient and scalable stream processing Liarou, E. Link to publication Citation for published
More informationCSE 544 Principles of Database Management Systems
CSE 544 Principles of Database Management Systems Alvin Cheung Fall 2015 Lecture 5 - DBMS Architecture and Indexing 1 Announcements HW1 is due next Thursday How is it going? Projects: Proposals are due
More informationAdvanced Database Systems
Lecture IV Query Processing Kyumars Sheykh Esmaili Basic Steps in Query Processing 2 Query Optimization Many equivalent execution plans Choosing the best one Based on Heuristics, Cost Will be discussed
More informationDB Wide Table Storage. Summer Torsten Grust Universität Tübingen, Germany
DB 2 01 03 Wide Table Storage Summer 2018 Torsten Grust Universität Tübingen, Germany 1 Q₂ Querying a Wider Table 02 The next SQL probe Q₂ looks just Q₁. We query a wider table now, however: SELECT t.*
More informationCompSci 516: Database Systems. Lecture 20. Parallel DBMS. Instructor: Sudeepa Roy
CompSci 516 Database Systems Lecture 20 Parallel DBMS Instructor: Sudeepa Roy Duke CS, Fall 2017 CompSci 516: Database Systems 1 Announcements HW3 due on Monday, Nov 20, 11:55 pm (in 2 weeks) See some
More information