data systems 101 prof. Stratos Idreos class 2
|
|
- Coleen Phelps
- 5 years ago
- Views:
Transcription
1 class 2 data systems 101 prof. Stratos Idreos
2 2 classes per week - OH/Labs every day 1 presentation/discussion lead - 2 reviews each week research (or systems) project + midway check-in ~15-20 hours per week Stratos Idreos 2 /39
3 systems project research project individual project c/c++ groups of three analysis only open to cs165 students (unless proven otherwise) Stratos Idreos 3 /39
4 nosql key-value stores are everywhere project= basic state-of-the-art design Stratos Idreos 4 /39
5 Give me a design given read/write ratio. How new hardware X would impact performance? What hardware is needed to achieve throughput X? Stratos Idreos 5 /39
6 big data V s (it is not about size only) volume velocity variety veracity actually none of that is really new new: our ability to gather and store machine generated data broad understanding that we cannot just manually get value out of data Stratos Idreos 6 /39
7 a data system stores data and provides access to data & makes knowledge generation easy Stratos Idreos 7 /39
8 a data system stores data and provides access to data & makes knowledge generation easy data system data analysis knowledge Stratos Idreos 7 /39
9 a data system stores data and provides access to data & makes knowledge generation easy data system X data analysis knowledge Stratos Idreos 7 /39
10 declarative interface ask what you want data system the system decides how to best store and access data Stratos Idreos 8 /39
11 applications sql database kernel algorithms/operators cpu memory data data data disk Stratos Idreos 9 /39
12 SQL complex legacy tuning expensive nosql simple clean just enough as apps become more complex as apps need to be more scalable newsql goal: understand why, how and what is next Stratos Idreos 10/39
13 SQL complex legacy tuning expensive nosql simple clean just enough Gartner: DBMS Market = ~36Billion nosql/hadoop =~700Million SQL = ~rest as apps become more complex as apps need to be more scalable newsql goal: understand why, how and what is next Stratos Idreos 10/39
14 more applications more data continuous need for new systems more h/w Stratos Idreos 11/39
15 1 soon everyone will need to be a data scientist hmm, my data is too big :( how far away are we from a future where a data system sits in the critical path of everything we do? new applications/requirements Stratos Idreos 12/39
16 2 data exploration not always sure what we are looking for (until we find it) Stratos Idreos 13 /39
17 3 daily data data* daily skills data years [IBMbigdata] years [StratosGuess] data system design, set-up, tune, use Stratos Idreos 14/39
18 sql cpu - cpu - cpu - cpu cpu registers smaller/faster caches memory disk - disk - disk - disk + flash + non volatile memory memory hierarchy system where db runs Stratos Idreos 15/39
19 registers 2x on chip cache 10x on board cache 100x memory my head ~0 this room 1 min this building 10 min New York 1.5 hours Jim Gray, IBM, Tandem, DEC, Microsoft ACM Turing award ACM SIGMOD Edgar F. Codd Innovations award 100Kx disk Pluto 2 years Stratos Idreos 16/39
20 cheaper faster CPU registers on chip cache on board cache memory disk SRAM DRAM cache miss: looking for something which is not in the cache ~1ns ~10ns ~100ns speed memory wall memory miss: looking for something which is not in memory cpu mem time Stratos Idreos 17/39
21 and touch/access only what you need design of storage/access methods/algorithms should minimize: data misses + instruction misses Stratos Idreos 18/39
22 random access & page-based access CPU need to only read x but have to read all of page 1 registers data value x on chip cache page1 page2 page3 data move on board cache memory disk Stratos Idreos 19/39
23 query x<5 (size=120 bytes) memory level N memory level N page size: 5x8 bytes Stratos Idreos 20/39
24 query x<5 scan (size=120 bytes) memory level N memory level N page size: 5x8 bytes Stratos Idreos 20/39
25 query x<5 (size=120 bytes) memory level N memory level N-1 scan page size: 5x8 bytes Stratos Idreos 20/39
26 40 bytes query x<5 (size=120 bytes) memory level N memory level N-1 scan page size: 5x8 bytes Stratos Idreos 20/39
27 40 bytes query x<5 (size=120 bytes) memory level N memory level N-1 scan scan page size: 5x8 bytes Stratos Idreos 20/39
28 40 bytes query x<5 (size=120 bytes) memory level N memory level N-1 scan scan page size: 5x8 bytes Stratos Idreos 20/39
29 80 bytes query x<5 (size=120 bytes) memory level N memory level N-1 scan scan page size: 5x8 bytes Stratos Idreos 20/39
30 80 bytes query x<5 (size=120 bytes) memory level N memory level N page size: 5x8 bytes Stratos Idreos 20/39
31 80 bytes query x<5 (size=120 bytes) memory level N memory level N-1 scan page size: 5x8 bytes Stratos Idreos 20/39
32 80 bytes query x<5 scan (size=120 bytes) memory level N memory level N page size: 5x8 bytes Stratos Idreos 20/39
33 120 bytes query x<5 scan (size=120 bytes) memory level N memory level N page size: 5x8 bytes Stratos Idreos 20/39
34 an oracle gives us the positions query x<5 (size=120 bytes) memory level N memory level N page size: 5x8 bytes Stratos Idreos 21/39
35 an oracle gives us the positions query x<5 oracle (size=120 bytes) memory level N memory level N page size: 5x8 bytes Stratos Idreos 21/39
36 an oracle gives us the positions query x<5 oracle (size=120 bytes) memory level N memory level N page size: 5x8 bytes Stratos Idreos 21/39
37 an oracle gives us the positions 40 bytes query x<5 (size=120 bytes) memory level N memory level N-1 oracle page size: 5x8 bytes Stratos Idreos 21/39
38 an oracle gives us the positions 40 bytes query x<5 (size=120 bytes) memory level N memory level N-1 oracle oracle page size: 5x8 bytes Stratos Idreos 21/39
39 an oracle gives us the positions 40 bytes query x<5 (size=120 bytes) memory level N memory level N-1 oracle oracle page size: 5x8 bytes Stratos Idreos 21/39
40 an oracle gives us the positions 80 bytes query x<5 (size=120 bytes) memory level N memory level N-1 oracle oracle page size: 5x8 bytes Stratos Idreos 21/39
41 an oracle gives us the positions 80 bytes query x<5 (size=120 bytes) memory level N memory level N page size: 5x8 bytes Stratos Idreos 21/39
42 an oracle gives us the positions 80 bytes query x<5 (size=120 bytes) memory level N memory level N-1 oracle page size: 5x8 bytes Stratos Idreos 21/39
43 an oracle gives us the positions 80 bytes query x<5 oracle (size=120 bytes) memory level N memory level N page size: 5x8 bytes Stratos Idreos 21/39
44 an oracle gives us the positions 120 bytes query x<5 oracle (size=120 bytes) memory level N memory level N page size: 5x8 bytes Stratos Idreos 21/39
45 when does it make sense to have an oracle how can we minimize the cost e.g., query x< Stratos Idreos 22/39
46 design space it all starts with how we store data every bit matters Stratos Idreos 23/39
47 in parallel/prefetching sequential access: read one block; consume it completely; discard it; read next what is next? hardware can better predict/buffer sequential pages to be read e.g., 2MB buffers in modern DRAM amortize cost of moving disk arms Stratos Idreos 24/39
48 random access: read one block; consume it partially; discard it; might have to read it again in future; read random next; Stratos Idreos 25/39
49 C/C++ no libraries unless we explicitly allow it we expect you build everything from scratch so you can control storage and access 100% Stratos Idreos 26/39
50 MAIN-MEMORY OPTIMIZED DATA SYSTEMS Stratos Idreos 27/39
51 a simple example assume an array of N integers: find all positions where value>x qualifying positions select operator exists in all systems: sql, nosql, newsql data Stratos Idreos 28/39
52 assume an array of N integers: find all positions where value>x res=new array[data.size] j=0 for (i=0; i<data.size; i++) if data[i]>x res[j++]=i Stratos Idreos 29/39
53 assume an array of N integers: find all positions where value>x res=new array[data.size] j=0 for (i=0; i<data.size; i++) if data[i]>x res[j++]=i what if only 1% qualifies? Stratos Idreos 29/39
54 assume an array of N integers: find all positions where value>x res=new array[data.size] what if only 1% qualifies? j=0 for (i=0; i<data.size; i++) if data[i]>x res[j++]=i memory data Stratos Idreos 29/39
55 assume an array of N integers: find all positions where value>x res=new array[data.size] what if only 1% qualifies? j=0 for (i=0; i<data.size; i++) if data[i]>x res[j++]=i memory data res Stratos Idreos 29/39
56 assume an array of N integers: find all positions where value>x res=new array[data.size] what if only 1% qualifies? j=0 for (i=0; i<data.size; i++) if data[i]>x res[j++]=i memory data Stratos Idreos 29/39
57 assume an array of N integers: find all positions where value>x res=new array[data.size] what if only 1% qualifies? j=0 for (i=0; i<data.size; i++) if data[i]>x res[j++]=i memory data Stratos Idreos 29/39
58 assume an array of N integers: find all positions where value>x res=new array[data.size] what if only 1% qualifies? j=0 for (i=0; i<data.size; i++) if data[i]>x res[j++]=i memory data Stratos Idreos 29/39
59 assume an array of N integers: find all positions where value>x res=new array[data.size] what if only 1% qualifies? j=0 for (i=0; i<data.size; i++) if data[i]>x res[j++]=i memory data Stratos Idreos 29/39
60 assume an array of N integers: find all positions where value>x res=new array[data.size] what if only 1% qualifies? j=0 for (i=0; i<data.size; i++) if data[i]>x res[j++]=i memory data Stratos Idreos 29/39
61 assume an array of N integers: find all positions where value>x res=new array[data.size] what if only 1% qualifies? j=0 for (i=0; i<data.size; i++) if data[i]>x res[j++]=i memory data Stratos Idreos 29/39
62 assume an array of N integers: find all positions where value>x res=new array[data.size] what if only 1% qualifies? j=0 for (i=0; i<data.size; i++) if data[i]>x res[j++]=i memory data Stratos Idreos 29/39
63 assume an array of N integers: find all positions where value>x res=new array[data.size] what if only 1% qualifies? j=0 for (i=0; i<data.size; i++) if data[i]>x res[j++]=i memory data copy res Stratos Idreos 29/39
64 assume an array of N integers: find all positions where value>x res=new array[data.size] what if only 1% qualifies? j=0 for (i=0; i<data.size; i++) if data[i]>x res[j++]=i data but how can we know? memory copy res Stratos Idreos 29/39
65 assume an array of N integers: find all positions where value>x res=new array[data.size] j=0 for (i=0; i<data.size; i++) if data[i]>x res[j++]=i what if 90% qualifies? result size= qualifying values*x bytes Stratos Idreos 30/39
66 assume an array of N integers: find all positions where value>x res=new array[data.size] what if 90% qualifies? j=0 for (i=0; i<data.size; i++) if data[i]>x res[j++]=i result size= qualifying values*x bytes bit vector for res? Stratos Idreos 30/39 1 vs
67 assume an array of N integers: find all positions where value>x res=new array[data.size] what if 90% qualifies? j=0 for (i=0; i<data.size; i++) if data[i]>x res[j++]=i result size= qualifying values*x bytes bit vector for res? if statements= bad, bad, bad Stratos Idreos 30/ vs
68 assume an array of N integers: find all positions where value>x and we haven t even started discussing about how to find the qualifying values res=new array[data.size] what if 90% qualifies? j=0 for (i=0; i<data.size; i++) if data[i]>x res[j++]=i result size= qualifying values*x bytes bit vector for res? if statements= bad, bad, bad Stratos Idreos 30/ vs
69 assume an array of N integers: find all positions where value>x thread1, core1 res=new array[data.size] thread2, core2 j=0 for (i=0; i<data.size; i++) if data[i]>x res[j++]=i logically partition thread3, core3 Stratos Idreos 31/39
70 assume an array of N integers: find all positions where value>x thread1, core1 res=new array[data.size] thread2, core2 j=0 for (i=0; i<data.size; i++) if data[i]>x res[j++]=i logically partition thread3, core3 NUMA architectures? SIMD functionality? & what about result writing? not as simple as spinning off N threads Stratos Idreos 31/39
71 assume an array of N integers: find all positions where value>x N>>1 queries in parallel res=new array[data.size] j=0 for (i=0; i<data.size; i++) if data[i]>x res[j++]=i q1,q2,q3 Stratos Idreos 32/39
72 assume an array of N integers: find all positions where value>x N>>1 queries in parallel res=new array[data.size] j=0 for (i=0; i<data.size; i++) if data[i]>x res[j++]=i q1,q2,q3 Stratos Idreos 32/39
73 assume an array of N integers: find all positions where value>x N>>1 queries in parallel res=new array[data.size] j=0 for (i=0; i<data.size; i++) if data[i]>x res[j++]=i q1,q2,q3 Stratos Idreos 32/39
74 assume an array of N integers: find all positions where value>x N>>1 queries in parallel res=new array[data.size] j=0 for (i=0; i<data.size; i++) if data[i]>x res[j++]=i q1,q2,q3 q4 Stratos Idreos 32/39
75 assume an array of N integers: find all positions where value>x N>>1 queries in parallel res=new array[data.size] j=0 for (i=0; i<data.size; i++) if data[i]>x res[j++]=i q1,q2,q3 q4 q4 Stratos Idreos 32/39
76 assume an array of N integers: find all positions where value>x res=new array[10] j=0 for (i=0; i<10; i++) if data[i]>x res[j++]=i vs res=new array[10] j=0 if data[0]>x res[j++]=i if data[1]>x res[j++]=i if data[2]>x res[j++]=i if data[3]>x res[j++]=i if data[4]>x res[j++]=i if data[5]>x res[j++]=i if data[6]>x res[j++]=i if data[7]>x res[j++]=i if data[8]>x res[j++]=i if data[9]>x res[j++]=i Stratos Idreos 33/39
77 assume an array of N integers: find all positions where value>x option 1: scan all data option 2: use a tree (do not consider tree generation costs) which one is best Stratos Idreos 34/39
78 cost: data touched & computation speed cpu mem time Stratos Idreos 35/39
79 design logical design physical design system design Stratos Idreos 36/39
80 how can I prepare? 1) start browsing some basic texts Get familiar with the very basics of traditional database architectures: Architecture of a Database System. By J. Hellerstein, M. Stonebraker and J. Hamilton. Foundations and Trends in Databases, 2007 Get familiar with very basics of modern database architectures: The Design and Implementation of Modern Column-store Database Systems. By D. Abadi, P. Boncz, S. Harizopoulos, S. Idreos, S. Madden. Foundations and Trends in Databases, 2013 Get familiar with the very basics of modern large scale systems: Massively Parallel Databases and MapReduce Systems. By Shivnath Babu and Herodotos Herodotou. Foundations and Trends in Databases, ) play with basic data structures implementation in C (linked list/hash table/tree) Stratos Idreos 37/39
81 Action steps: 1) Read the syllabus & website carefully, 2) Register to Piazza, 3) Do P0 if you have not taken CS165 and check self-test, 4) Register for paper presentation (week 2), 5) Start submitting your paper reviews (week 3) web site: piazza: piazza.com/harvard/spring2018/cs265/home office hours: Stratos: Wed/Thur/Fri, 3-4pm, MD139 TF office hours: check website textbook: nope research papers will be available from the Harvard network Stratos Idreos 38/39
82 class 2 data systems 101 BIG DATA SYSTEMS prof. Stratos Idreos next week modern main-memory optimized data systems & semester projects
data systems 101 prof. Stratos Idreos class 2
class 2 data systems 101 prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS265/ big data V s (it is not about size only) volume velocity variety veracity actually none of that is really new
More informationcolumn-stores basics
class 3 column-stores basics prof. HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS265/ Goetz Graefe Google Research guest lecture Justin Levandoski Microsoft Research projects option 1: systems project (now
More informationHOW INDEX TO STORE DATA DATA
Stratos Idreos HOW INDEX DATA TO STORE DATA ALGORITHMS data structure decisions define the algorithms that access data INDEX DATA ALGORITHMS unordered [7,4,2,6,1,3,9,10,5,8] INDEX DATA ALGORITHMS unordered
More informationbasic db architectures & layouts
class 4 basic db architectures & layouts prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ videos for sections 3 & 4 are online check back every week (1-2 sections weekly) there is a schedule
More informationfrom bits to systems
class 2 from bits to systems prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ today logistics, goals, etc big data & systems (cont d) designing a data system algorithm: what can go wrong
More informationcolumn-stores basics
class 3 column-stores basics prof. HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS265/ project description is now online First background info will be given this Friday and detailed lecture on Feb 21 Basic Readings
More informationclass 17 updates prof. Stratos Idreos
class 17 updates prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ early/late tuple reconstruction, tuple-at-a-time, vectorized or bulk processing, intermediates format, pushing selects
More informationSQL & intro to db architectures
class 3 SQL & intro to db architectures prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ welcome brave cs165 students! 35+62 Stratos Idreos 2 /55 guest lecture Laura Haas Data Systems
More informationclass 5 column stores 2.0 prof. Stratos Idreos
class 5 column stores 2.0 prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ worth thinking about what just happened? where is my data? email, cloud, social media, can we design systems
More informationclass 17 updates prof. Stratos Idreos
class 17 updates prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ UPDATE table_name SET column1=value1,column2=value2,... WHERE some_column=some_value INSERT INTO table_name VALUES (value1,value2,value3,...)
More informationclass 20 updates 2.0 prof. Stratos Idreos
class 20 updates 2.0 prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ UPDATE table_name SET column1=value1,column2=value2,... WHERE some_column=some_value INSERT INTO table_name VALUES
More informationModels & Intro to DB Architectures
class 3 Models & Intro to DB Architectures prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ welcome brave cs165 students! 42+44 Stratos Idreos 2 /49 NO LAPTOP/PHONE POLICY class is based
More informationclass 9 fast scans 1.0 prof. Stratos Idreos
class 9 fast scans 1.0 prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ 1 pass to merge into 8 sorted pages (2N pages) 1 pass to merge into 4 sorted pages (2N pages) 1 pass to merge into
More informationModels & Intro to DB Architectures
class 3 Models & Intro to DB Architectures prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ welcome brave cs165 students! Stratos Idreos 2 /55 NO LAPTOP/PHONE POLICY class is based on
More informationclass 6 more about column-store plans and compression prof. Stratos Idreos
class 6 more about column-store plans and compression prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ query compilation an ancient yet new topic/research challenge query->sql->interpet
More informationclass 11 b-trees prof. Stratos Idreos
class 11 b-trees prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ Midway check-in: Two design docs tmr (Canvas) & tests on Sunday Next weekend: Lab marathon for midway check-in & tests
More informationclass 10 b-trees 2.0 prof. Stratos Idreos
class 10 b-trees 2.0 prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ CS Colloquium HV Jagadish Prof University of Michigan 10/6 Stratos Idreos /29 2 CS Colloquium Magdalena Balazinska
More informationsystems & research project
class 4 systems & research project prof. HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS265/ index index knows order about the data data filtering data: point/range queries index data A B C sorted A B C initial
More informationDatabase Architecture 2 & Storage. Instructor: Matei Zaharia cs245.stanford.edu
Database Architecture 2 & Storage Instructor: Matei Zaharia cs245.stanford.edu Summary from Last Time System R mostly matched the architecture of a modern RDBMS» SQL» Many storage & access methods» Cost-based
More informationcomplex plans and hybrid layouts
class 7 complex plans and hybrid layouts prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ essential column-stores features virtual ids late tuple reconstruction (if ever) vectorized execution
More informationCourse Introduction & History of Database Systems
Course Introduction & History of Database Systems CMPT 843, SPRING 2018 JIANNAN WANG https://sfu-db.github.io/dbsystems/ CMPT 843-2018 SPRING - SFU 1 Introduce Yourself What s your name? Where are you
More informationPrinciples of Data Management. Lecture #2 (Storing Data: Disks and Files)
Principles of Data Management Lecture #2 (Storing Data: Disks and Files) Instructor: Mike Carey mjcarey@ics.uci.edu Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Today s Topics v Today
More informationOverview of Data Exploration Techniques. Stratos Idreos, Olga Papaemmanouil, Surajit Chaudhuri
Overview of Data Exploration Techniques Stratos Idreos, Olga Papaemmanouil, Surajit Chaudhuri data exploration not always sure what we are looking for (until we find it) data has always been big volume
More informationIntroduction to Data Management. Lecture 14 (Storage and Indexing)
Introduction to Data Management Lecture 14 (Storage and Indexing) Instructor: Mike Carey mjcarey@ics.uci.edu Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Announcements v HW s and quizzes:
More informationCMPSCI 645 Database Design & Implementation
Welcome to CMPSCI 645 Database Design & Implementation Instructor: Gerome Miklau Overview of Databases Gerome Miklau CMPSCI 645 Database Design & Implementation UMass Amherst Jan 19, 2010 Some slide content
More informationStoring Data: Disks and Files
Storing Data: Disks and Files CS 186 Fall 2002, Lecture 15 (R&G Chapter 7) Yea, from the table of my memory I ll wipe away all trivial fond records. -- Shakespeare, Hamlet Stuff Rest of this week My office
More informationCS 245: Principles of Data-Intensive Systems. Instructor: Matei Zaharia cs245.stanford.edu
CS 245: Principles of Data-Intensive Systems Instructor: Matei Zaharia cs245.stanford.edu Outline Why study data-intensive systems? Course logistics Key issues and themes A bit of history CS 245 2 My Background
More informationCSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 8 - Data Warehousing and Column Stores
CSE 544 Principles of Database Management Systems Alvin Cheung Fall 2015 Lecture 8 - Data Warehousing and Column Stores Announcements Shumo office hours change See website for details HW2 due next Thurs
More informationclass 12 b-trees 2.0 prof. Stratos Idreos
class 12 b-trees 2.0 prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ A B C A B C clustered/primary index on A Stratos Idreos /26 2 A B C A B C clustered/primary index on A pos C pos
More informationCS 261 Fall Mike Lam, Professor. Memory
CS 261 Fall 2016 Mike Lam, Professor Memory Topics Memory hierarchy overview Storage technologies SRAM DRAM PROM / flash Disk storage Tape and network storage I/O architecture Storage trends Latency comparisons
More informationCAS CS 460/660 Introduction to Database Systems. Fall
CAS CS 460/660 Introduction to Database Systems Fall 2017 1.1 About the course Administrivia Instructor: George Kollios, gkollios@cs.bu.edu MCS 283, Mon 2:30-4:00 PM and Tue 1:00-2:30 PM Teaching Fellows:
More informationOS and Hardware Tuning
OS and Hardware Tuning Tuning Considerations OS Threads Thread Switching Priorities Virtual Memory DB buffer size File System Disk layout and access Hardware Storage subsystem Configuring the disk array
More informationEvolution of Database Systems
Evolution of Database Systems Krzysztof Dembczyński Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland Intelligent Decision Support Systems Master studies, second
More informationOS and HW Tuning Considerations!
Administração e Optimização de Bases de Dados 2012/2013 Hardware and OS Tuning Bruno Martins DEI@Técnico e DMIR@INESC-ID OS and HW Tuning Considerations OS " Threads Thread Switching Priorities " Virtual
More informationCaches. Hiding Memory Access Times
Caches Hiding Memory Access Times PC Instruction Memory 4 M U X Registers Sign Ext M U X Sh L 2 Data Memory M U X C O N T R O L ALU CTL INSTRUCTION FETCH INSTR DECODE REG FETCH EXECUTE/ ADDRESS CALC MEMORY
More informationclass 13 scans vs indexes prof. Stratos Idreos
class 13 scans vs indexes prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ b-tree - dynamic tree - always balanced 35,50 35, 12,20 50, 1,2,3 12,15,17 20, Stratos Idreos 2 /24 select from
More informationA Comparison of Memory Usage and CPU Utilization in Column-Based Database Architecture vs. Row-Based Database Architecture
A Comparison of Memory Usage and CPU Utilization in Column-Based Database Architecture vs. Row-Based Database Architecture By Gaurav Sheoran 9-Dec-08 Abstract Most of the current enterprise data-warehouses
More informationCSE 451: Operating Systems Spring Module 12 Secondary Storage
CSE 451: Operating Systems Spring 2017 Module 12 Secondary Storage John Zahorjan 1 Secondary storage Secondary storage typically: is anything that is outside of primary memory does not permit direct execution
More informationLow Overhead Concurrency Control for Partitioned Main Memory Databases. Evan P. C. Jones Daniel J. Abadi Samuel Madden"
Low Overhead Concurrency Control for Partitioned Main Memory Databases Evan P. C. Jones Daniel J. Abadi Samuel Madden" Banks" Payment Processing" Airline Reservations" E-Commerce" Web 2.0" Problem:" Millions
More informationPicture of memory. Word FFFFFFFD FFFFFFFE FFFFFFFF
Memory Sequential circuits all depend upon the presence of memory A flip-flop can store one bit of information A register can store a single word, typically 32-64 bits Memory allows us to store even larger
More informationclass 8 b-trees prof. Stratos Idreos
class 8 b-trees prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ I spend a lot of time debugging am I doing something wrong? maybe but probably not 1. learn to use gdb 2. after spending
More informationCMPT 354: Database System I. Lecture 1. Course Introduction
CMPT 354: Database System I Lecture 1. Course Introduction 1 Outline Motivation for studying this course Course admin and set up Overview of course topics 2 Trend 1: Data grows exponentially 1 ZB = 1,
More informationGoals for Today. CS 133: Databases. Relational Model. Multi-Relation Queries. Reason about the conceptual evaluation of an SQL query
Goals for Today CS 133: Databases Fall 2018 Lec 02 09/06 Relational Model & Memory and Buffer Manager Prof. Beth Trushkowsky Reason about the conceptual evaluation of an SQL query Understand the storage
More informationCycle Time for Non-pipelined & Pipelined processors
Cycle Time for Non-pipelined & Pipelined processors Fetch Decode Execute Memory Writeback 250ps 350ps 150ps 300ps 200ps For a non-pipelined processor, the clock cycle is the sum of the latencies of all
More informationUser Perspective. Module III: System Perspective. Module III: Topics Covered. Module III Overview of Storage Structures, QP, and TM
Module III Overview of Storage Structures, QP, and TM Sharma Chakravarthy UT Arlington sharma@cse.uta.edu http://www2.uta.edu/sharma base Management Systems: Sharma Chakravarthy Module I Requirements analysis
More informationThe Memory Hierarchy 10/25/16
The Memory Hierarchy 10/25/16 Transition First half of course: hardware focus How the hardware is constructed How the hardware works How to interact with hardware Second half: performance and software
More informationAdvanced Database Systems
Lecture II Storage Layer Kyumars Sheykh Esmaili Course s Syllabus Core Topics Storage Layer Query Processing and Optimization Transaction Management and Recovery Advanced Topics Cloud Computing and Web
More informationIntroduction to Data Management. Lecture #13 (Indexing)
Introduction to Data Management Lecture #13 (Indexing) Instructor: Mike Carey mjcarey@ics.uci.edu Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Announcements v Homework info: HW #5 (SQL):
More informationThe Role of Database Aware Flash Technologies in Accelerating Mission- Critical Databases
The Role of Database Aware Flash Technologies in Accelerating Mission- Critical Databases Gurmeet Goindi Principal Product Manager Oracle Flash Memory Summit 2013 Santa Clara, CA 1 Agenda Relational Database
More informationMemory Hierarchy. Memory Flavors Principle of Locality Program Traces Memory Hierarchies Associativity. (Study Chapter 5)
Memory Hierarchy Why are you dressed like that? Halloween was weeks ago! It makes me look faster, don t you think? Memory Flavors Principle of Locality Program Traces Memory Hierarchies Associativity (Study
More informationLocality. CS429: Computer Organization and Architecture. Locality Example 2. Locality Example
Locality CS429: Computer Organization and Architecture Dr Bill Young Department of Computer Sciences University of Texas at Austin Principle of Locality: Programs tend to reuse data and instructions near
More informationDisks and Files. Jim Gray s Storage Latency Analogy: How Far Away is the Data? Components of a Disk. Disks
Review Storing : Disks and Files Lecture 3 (R&G Chapter 9) Aren t bases Great? Relational model SQL Yea, from the table of my memory I ll wipe away all trivial fond records. -- Shakespeare, Hamlet A few
More informationCompSci 516 Database Systems
CompSci 516 Database Systems Lecture 20 NoSQL and Column Store Instructor: Sudeepa Roy Duke CS, Fall 2018 CompSci 516: Database Systems 1 Reading Material NOSQL: Scalable SQL and NoSQL Data Stores Rick
More informationChapter Seven Morgan Kaufmann Publishers
Chapter Seven Memories: Review SRAM: value is stored on a pair of inverting gates very fast but takes up more space than DRAM (4 to 6 transistors) DRAM: value is stored as a charge on capacitor (must be
More informationOracle Performance on M5000 with F20 Flash Cache. Benchmark Report September 2011
Oracle Performance on M5000 with F20 Flash Cache Benchmark Report September 2011 Contents 1 About Benchware 2 Flash Cache Technology 3 Storage Performance Tests 4 Conclusion copyright 2011 by benchware.ch
More informationCS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 3
CS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 3 Instructors: Bernhard Boser & Randy H. Katz http://inst.eecs.berkeley.edu/~cs61c/ 10/24/16 Fall 2016 - Lecture #16 1 Software
More informationCISC 7610 Lecture 2b The beginnings of NoSQL
CISC 7610 Lecture 2b The beginnings of NoSQL Topics: Big Data Google s infrastructure Hadoop: open google infrastructure Scaling through sharding CAP theorem Amazon s Dynamo 5 V s of big data Everyone
More informationModule 1: Basics and Background Lecture 4: Memory and Disk Accesses. The Lecture Contains: Memory organisation. Memory hierarchy. Disks.
The Lecture Contains: Memory organisation Example of memory hierarchy Memory hierarchy Disks Disk access Disk capacity Disk access time Typical disk parameters Access times file:///c /Documents%20and%20Settings/iitkrana1/My%20Documents/Google%20Talk%20Received%20Files/ist_data/lecture4/4_1.htm[6/14/2012
More informationStoring Data: Disks and Files. Administrivia (part 2 of 2) Review. Disks, Memory, and Files. Disks and Files. Lecture 3 (R&G Chapter 7)
Storing : Disks and Files Yea, from the table of my memory I ll wipe away all trivial fond records. -- Shakespeare, Hamlet Lecture 3 (R&G Chapter 7) Administrivia Greetings Office Hours Prof. Franklin
More informationCS 261 Fall Mike Lam, Professor. Memory
CS 261 Fall 2017 Mike Lam, Professor Memory Topics Memory hierarchy overview Storage technologies I/O architecture Storage trends Latency comparisons Locality Memory Until now, we've referred to memory
More information18-447: Computer Architecture Lecture 16: Virtual Memory
18-447: Computer Architecture Lecture 16: Virtual Memory Justin Meza Carnegie Mellon University (with material from Onur Mutlu, Michael Papamichael, and Vivek Seshadri) 1 Notes HW 2 and Lab 2 grades will
More informationLet!s go back to a course goal... Let!s go back to a course goal... Question? Lecture 22 Introduction to Memory Hierarchies
1 Lecture 22 Introduction to Memory Hierarchies Let!s go back to a course goal... At the end of the semester, you should be able to......describe the fundamental components required in a single core of
More informationCS122A: Introduction to Data Management. Lecture #14: Indexing. Instructor: Chen Li
CS122A: Introduction to Data Management Lecture #14: Indexing Instructor: Chen Li 1 Indexing in MySQL (w/innodb) CREATE [UNIQUE FULLTEXT SPATIAL] INDEX index_name [index_type] ON tbl_name (index_col_name,...)
More informationCS425 Fall 2016 Boris Glavic Chapter 1: Introduction
CS425 Fall 2016 Boris Glavic Chapter 1: Introduction Modified from: Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Textbook: Chapter 1 1.2 Database Management System (DBMS)
More informationAlgorithm Performance Factors. Memory Performance of Algorithms. Processor-Memory Performance Gap. Moore s Law. Program Model of Memory I
Memory Performance of Algorithms CSE 32 Data Structures Lecture Algorithm Performance Factors Algorithm choices (asymptotic running time) O(n 2 ) or O(n log n) Data structure choices List or Arrays Language
More informationCS162 Operating Systems and Systems Programming Lecture 10 Caches and TLBs"
CS162 Operating Systems and Systems Programming Lecture 10 Caches and TLBs" October 1, 2012! Prashanth Mohan!! Slides from Anthony Joseph and Ion Stoica! http://inst.eecs.berkeley.edu/~cs162! Caching!
More informationL9: Storage Manager Physical Data Organization
L9: Storage Manager Physical Data Organization Disks and files Record and file organization Indexing Tree-based index: B+-tree Hash-based index c.f. Fig 1.3 in [RG] and Fig 2.3 in [EN] Functional Components
More informationPaging! 2/22! Anthony D. Joseph and Ion Stoica CS162 UCB Fall 2012! " (0xE0)" " " " (0x70)" " (0x50)"
CS162 Operating Systems and Systems Programming Lecture 10 Caches and TLBs" February 22, 2011! Anthony D. Joseph and Ion Stoica! http//inst.eecs.berkeley.edu/~cs162! Segmentation! Paging! Recap Segmentation
More informationLECTURE 10: Improving Memory Access: Direct and Spatial caches
EECS 318 CAD Computer Aided Design LECTURE 10: Improving Memory Access: Direct and Spatial caches Instructor: Francis G. Wolff wolff@eecs.cwru.edu Case Western Reserve University This presentation uses
More information4 Myths about in-memory databases busted
4 Myths about in-memory databases busted Yiftach Shoolman Co-Founder & CTO @ Redis Labs @yiftachsh, @redislabsinc Background - Redis Created by Salvatore Sanfilippo (@antirez) OSS, in-memory NoSQL k/v
More informationEECS151/251A Spring 2018 Digital Design and Integrated Circuits. Instructors: John Wawrzynek and Nick Weaver. Lecture 19: Caches EE141
EECS151/251A Spring 2018 Digital Design and Integrated Circuits Instructors: John Wawrzynek and Nick Weaver Lecture 19: Caches Cache Introduction 40% of this ARM CPU is devoted to SRAM cache. But the role
More informationCS550. TA: TBA Office: xxx Office hours: TBA. Blackboard:
CS550 Advanced Operating Systems (Distributed Operating Systems) Instructor: Xian-He Sun Email: sun@iit.edu, Phone: (312) 567-5260 Office hours: 1:30pm-2:30pm Tuesday, Thursday at SB229C, or by appointment
More informationCSE544 Database Architecture
CSE544 Database Architecture Tuesday, February 1 st, 2011 Slides courtesy of Magda Balazinska 1 Where We Are What we have already seen Overview of the relational model Motivation and where model came from
More informationLECTURE 11. Memory Hierarchy
LECTURE 11 Memory Hierarchy MEMORY HIERARCHY When it comes to memory, there are two universally desirable properties: Large Size: ideally, we want to never have to worry about running out of memory. Speed
More informationInformation Systems (Informationssysteme)
Information Systems (Informationssysteme) Jens Teubner, TU Dortmund jens.teubner@cs.tu-dortmund.de Summer 2018 c Jens Teubner Information Systems Summer 2018 1 Part IX B-Trees c Jens Teubner Information
More informationCSE 544 Principles of Database Management Systems
CSE 544 Principles of Database Management Systems Alvin Cheung Fall 2015 Lecture 5 - DBMS Architecture and Indexing 1 Announcements HW1 is due next Thursday How is it going? Projects: Proposals are due
More informationComputer Systems C S Cynthia Lee Today s materials adapted from Kevin Webb at Swarthmore College
Computer Systems C S 0 7 Cynthia Lee Today s materials adapted from Kevin Webb at Swarthmore College 2 Today s Topics TODAY S LECTURE: Caching ANNOUNCEMENTS: Assign6 & Assign7 due Friday! 6 & 7 NO late
More informationCompSci 516: Database Systems. Lecture 20. Parallel DBMS. Instructor: Sudeepa Roy
CompSci 516 Database Systems Lecture 20 Parallel DBMS Instructor: Sudeepa Roy Duke CS, Fall 2017 CompSci 516: Database Systems 1 Announcements HW3 due on Monday, Nov 20, 11:55 pm (in 2 weeks) See some
More informationclass 10 fast scans 2.0 prof. Stratos Idreos
class 10 fast scans 2.0 prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ always want to minimize data movement - computation & utilize all resources! registers on chip cache on board
More informationCSCI1270 Introduction to Database Systems
CSCI1270 Introduction to Database Systems with thanks to Prof. George Kollios, Boston University Prof. Mitch Cherniack, Brandeis University Prof. Avi Silberschatz, Yale University 1.1 What is a Database
More informationCrescando: Predictable Performance for Unpredictable Workloads
Crescando: Predictable Performance for Unpredictable Workloads G. Alonso, D. Fauser, G. Giannikis, D. Kossmann, J. Meyer, P. Unterbrunner Amadeus S.A. ETH Zurich, Systems Group (Funded by Enterprise Computing
More informationRegisters. Instruction Memory A L U. Data Memory C O N T R O L M U X A D D A D D. Sh L 2 M U X. Sign Ext M U X ALU CTL INSTRUCTION FETCH
PC Instruction Memory 4 M U X Registers Sign Ext M U X Sh L 2 Data Memory M U X C O T R O L ALU CTL ISTRUCTIO FETCH ISTR DECODE REG FETCH EXECUTE/ ADDRESS CALC MEMOR ACCESS WRITE BACK A D D A D D A L U
More informationLecture 17 Introduction to Memory Hierarchies" Why it s important " Fundamental lesson(s)" Suggested reading:" (HP Chapter
Processor components" Multicore processors and programming" Processor comparison" vs." Lecture 17 Introduction to Memory Hierarchies" CSE 30321" Suggested reading:" (HP Chapter 5.1-5.2)" Writing more "
More informationQuestion?! Processor comparison!
1! 2! Suggested Readings!! Readings!! H&P: Chapter 5.1-5.2!! (Over the next 2 lectures)! Lecture 18" Introduction to Memory Hierarchies! 3! Processor components! Multicore processors and programming! Question?!
More informationStoring Data: Disks and Files
Storing Data: Disks and Files Module 2, Lecture 1 Yea, from the table of my memory I ll wipe away all trivial fond records. -- Shakespeare, Hamlet Database Management Systems, R. Ramakrishnan 1 Disks and
More informationDisks and Files. Storage Structures Introduction Chapter 8 (3 rd edition) Why Not Store Everything in Main Memory?
Why Not Store Everything in Main Memory? Storage Structures Introduction Chapter 8 (3 rd edition) Sharma Chakravarthy UT Arlington sharma@cse.uta.edu base Management Systems: Sharma Chakravarthy Costs
More informationCS 405G: Introduction to Database Systems. Storage
CS 405G: Introduction to Database Systems Storage It s all about disks! Outline That s why we always draw databases as And why the single most important metric in database processing is the number of disk
More informationChapter 1 Introduction
Chapter 1 Introduction Contents The History of Database System Overview of a Database Management System (DBMS) Three aspects of database-system studies the state of the art Introduction to Database Systems
More informationWeb Traffic - pct of Page Views
CS101 Lecture 30: Databases and Data-Driven Applications for example Aaron Stevens 23 November 2010 1 Web Traffic - pct of Page Views Source: alexa.com, 11/22/2010 2 1 What You ll Learn Today What is Facebook?
More informationOverview of Data Management
Overview of Data Management School of Computer Science University of Waterloo Databases CS348 (University of Waterloo) Overview of Data Management 1 / 21 What is Data ANSI definition of data: 1 A representation
More informationCS252 S05. Main memory management. Memory hardware. The scale of things. Memory hardware (cont.) Bottleneck
Main memory management CMSC 411 Computer Systems Architecture Lecture 16 Memory Hierarchy 3 (Main Memory & Memory) Questions: How big should main memory be? How to handle reads and writes? How to find
More informationCPU Pipelining Issues
CPU Pipelining Issues What have you been beating your head against? This pipe stuff makes my head hurt! L17 Pipeline Issues & Memory 1 Pipelining Improve performance by increasing instruction throughput
More informationModern Database Concepts
Modern Database Concepts Introduction to the world of Big Data Doc. RNDr. Irena Holubova, Ph.D. holubova@ksi.mff.cuni.cz What is Big Data? buzzword? bubble? gold rush? revolution? Big data is like teenage
More informationCS 31: Intro to Systems Caching. Kevin Webb Swarthmore College March 24, 2015
CS 3: Intro to Systems Caching Kevin Webb Swarthmore College March 24, 205 Reading Quiz Abstraction Goal Reality: There is no one type of memory to rule them all! Abstraction: hide the complex/undesirable
More informationReview: Performance Latency vs. Throughput. Time (seconds/program) is performance measure Instructions Clock cycles Seconds.
Performance 980 98 982 983 984 985 986 987 988 989 990 99 992 993 994 995 996 997 998 999 2000 7/4/20 CS 6C: Great Ideas in Computer Architecture (Machine Structures) Caches Instructor: Michael Greenbaum
More informationChapter Seven. Memories: Review. Exploiting Memory Hierarchy CACHE MEMORY AND VIRTUAL MEMORY
Chapter Seven CACHE MEMORY AND VIRTUAL MEMORY 1 Memories: Review SRAM: value is stored on a pair of inverting gates very fast but takes up more space than DRAM (4 to 6 transistors) DRAM: value is stored
More informationCSE 451: Operating Systems Spring Module 12 Secondary Storage. Steve Gribble
CSE 451: Operating Systems Spring 2009 Module 12 Secondary Storage Steve Gribble Secondary storage Secondary storage typically: is anything that is outside of primary memory does not permit direct execution
More information+ Random-Access Memory (RAM)
+ Memory Subsystem + Random-Access Memory (RAM) Key features RAM is traditionally packaged as a chip. Basic storage unit is normally a cell (one bit per cell). Multiple RAM chips form a memory. RAM comes
More informationCS 261 Fall Mike Lam, Professor. Virtual Memory
CS 261 Fall 2016 Mike Lam, Professor Virtual Memory Topics Operating systems Address spaces Virtual memory Address translation Memory allocation Lingering questions What happens when you call malloc()?
More informationCS162 Operating Systems and Systems Programming Lecture 13. Caches and TLBs. Page 1
CS162 Operating Systems and Systems Programming Lecture 13 Caches and TLBs March 12, 2008 Prof. Anthony D. Joseph http//inst.eecs.berkeley.edu/~cs162 Review Multi-level Translation What about a tree of
More information