CS 5614: (Big) Data Management Systems. B. Aditya Prakash Lecture #4: Constraints, Storing and Indexes
|
|
- Gillian Ross
- 6 years ago
- Views:
Transcription
1 CS 5614: (Big) Data Management Systems B. Aditya Prakash Lecture #4: Constraints, Storing and Indexes
2 Constraints in Rela;onal Algebra and SQL
3 Maintaining Integrity of Data Data is dirty. How does an applica>on ensure that a database modifica>on does not corrupt the tables? Two approaches: Applica>on programs check that database modifica>ons are consistent. Use the features provided by SQL. 3
4 Maintaining Integrity of Data Data is dirty. How does an applica>on ensure that a database modifica>on does not corrupt the tables? Two approaches: Applica>on programs check that database modifica>ons are consistent. Use the features provided by SQL. 4
5 Integrity Checking in SQL PRIMARY KEY and UNIQUE constraints. FOREIGN KEY constraints. Constraints on a\ributes and tuples. Triggers (schema- level constraints). How do we express these constraints? How do we check these constraints? What do we do when a constraint is violated? 5
6 Keys in SQL A set of a\ributes S is a key for a rela>on R if every pair of tuples in R disagree on at least one a\ribute in S. Select one key to be the PRIMARY KEY; declare other keys using UNIQUE. 6
7 Primary Keys in SQL Modify the schema of Students to declare PID to be the key. CREATE TABLE Students( PID VARCHAR(8) PRIMARY KEY, Name CHAR(20), Address VARCHAR(255)); What about Courses, which has two a\ributes in its key? CREATE TABLE Courses(Number integer, DeptName: VARCHAR(8), CourseName VARCHAR(255), Classroom VARCHAR(30), Enrollment integer, PRIMARY KEY (Number, DeptName) ); 7
8 Effect of Declaring PRIMARY KEYs Two tuples in a rela>on cannot agree on all the a\ributes in the key. DBMS will reject any ac>on that inserts or updates a tuple in viola>on of this rule. A tuple cannot have a NULL value in a key a\ribute. 8
9 Other Keys in SQL If a rela>on has other keys, declare them using the UNIQUE keyword. Use UNIQUE in exactly the same places as PRIMARY KEY. There are two differences between PRIMARY KEY and UNIQUE: A table may have only one PRIMARY KEY but more than one set of a\ributes declared UNIQUE. A tuple may have NULL values in UNIQUE a\ributes. 9
10 Enforcing Key Constraints Upon which ac>ons should an RDBMS enforce a key constraint? Only tuple update and inser>on. RDMBS searches the tuples in the table to find if any tuple exists that agrees with the new tuple on all a\ributes in the primary key. To speed this process, an RDBMS automa>cally creates an efficient search index on the primary key. User can instruct the RDBMS to create an index on one or more a\ributes 10
11 Foreign Key Constraints Referen>al integrity constraint: in the rela>on Teach (that connects Courses and Professors), if Teach relates a course to a professor, then a tuple corresponding to the professor must exist in Professors. How do we express such constraints in Rela>onal Algebra? Consider the Teach(ProfessorPID, Number, DeptName) rela>on. every non- NULL value of ProfessorPID inteach must be a valid ProfessorPID in Professors. RA πprofessorpid(teach) π PID(Professors). 11
12 Foreign Key Constraints in SQL every non- NULL value of ProfessorPID inteach must be a valid ProfessorPID in Professors. In Teach, declare ProfessorPID to be a foreign key. CREATE TABLE Teach(ProfessorPID VARCHAR(8) REFERENCES Professor(PID), Name VARCHAR(30)...); CREATE TABLE Teach(ProfessorPID VARCHAR(8), Name VARCHAR(30)..., FOREIGN KEY ProfessorPID REFERENCES Professor(PID)); If the foreign key has mul>ple a\ributes, use the second type of declara>on. 12
13 Requirements for FOREIGN KEYs If a rela>on R declares that some of its a\ributes refer to foreign keys in another rela>on S, then these a\ributes must be declared UNIQUE or PRIMARY KEY in S. Values of the foreign key in R must appear in the referenced a\ributes of some tuple in S. 13
14 Enforcing Referen;al Integrity Three policies for maintaining referen>al integrity. Default policy: reject viola>ng modifica>ons. Cascade policy: mimic changes to the referenced a\ributes at the foreign key. Set- NULL policy: set appropriate a\ributes to NULL. 14
15 Specifying Referen;al Integrity Policies in SQL SQL allows the database designer to specify the policy for deletes and updates independently. Op>onally follow the declara>on of the foreign key with ON DELETE and/or ON UPDATE followed by the policy: SET NULL or CASCADE. Constraints can be circular, e.g., if there is a one- one mapping between two rela>ons. In this case, SQL allows us to defer the checking of constraints. 15
16 Constraining ATributes and Tuples SQL also allows us to specify constraints on a\ributes in a rela>on and on tuples in a rela>on. Disallow courses with a maximum enrollment greater than 100. A chairperson of a department must teach at most one course every semester. How do we express such constraints in SQL? How can we change our minds about constraints? A simple constraint: NOT NULL Declare an a\ribute to be NOT NULL ajer its type in a CREATE TABLE statement. Effect is to disallow tuples in which this a\ribute is NULL. 16
17 ATribute- Based CHECK Constraints Disallow courses with a maximum enrollment greater than 100. This constraint only affects the value of a single a\ribute in each tuple. CREATE TABLE Courses(... Enrollment INT CHECK (Enrollment <= 100)...); The condi>on can be any condi>on that can appear in a WHERE clause. CHECK statement may use a subquery to men>on other a\ributes of the same or other rela>ons. An a\ribute- based CHECK constraint is checked only when the value of that a\ribute changes. 17
18 Tuple- Based CHECK Constraints Tuple- based CHECK constraints are checked whenever a tuple is inserted into or updated in a rela;on. Designer may add these constraints ajer the list of a\ributes in a CREATE TABLE statement. A chairperson of a department teaches at most one course in any semester. CREATE TABLE Teach(... CHECK ProfessorPID NOT IN ((SELECT ProfessorPID FROM Teach) INTERSECT (SELECT ChairmanPID FROM Departments) ) ); 18
19 Modifying Constraints SQL allows constraints to be named. Use CONSTRAINT followed by the name of the constraint in front of PRIMARY KEY, UNIQUE, or CHECK. Can use constraint names in ALTER TABLE statements to delete constraints: say DROP CONSTRAINT followed by the name of the constraint. Can add constraints in an ALTER TABLE statement using ADD CONSTRAINT followed by an op>onal name followed by the (required) CHECK statement. 19
20 Asser;ons These are database- schema elements, like rela>ons Defined by: CREATE ASSERTION <name> CHECK (<condi>on>); Condi>on may refer to any rela>on or a\ribute in the database schema. 20
21 Asser;ons: Example Can t have more courses than students ( Pigeonhole Principle ) CREATE ASSERTION FewStudents CHECK ( (SELECT COUNT(*) FROM Students) <= (SELECT COUNT(*) FROM Courses) ); 21
22 Triggers: Mo;va;on Asser>ons are powerful, but the DBMS ojen can t tell when they need to be checked. A\ribute- and tuple- based checks are checked at known >mesà but not powerful. Triggers let the user decide when to check for any condi>on. 22
23 OK, what could have been done? 23
24 Announcements Check updated webpage More details on Grading scheme Homework dates Project deliverables Lecture Schedule 24
25 STORING DATA 25
26 DBMS Layers: Queries Query Optimization and Execution Relational Operators Files and Access Methods TODAY! Buffer Management Disk Space Management DB 26
27 Leverage OS for disk/file management? Layers of abstrac>on are good but: 27
28 Leverage OS for disk/file management? Layers of abstrac>on are good but: Unfortunately, OS ojen gets in the way of DBMS 28
29 Leverage OS for disk/file management? DBMS wants/needs to do things its own way Specialized prefetching Control over buffer replacement policy LRU not always best (some>mes worst!!) Control over thread/process scheduling Convoy problem Arises when OS scheduling conflicts with DBMS locking Control over flushing data to disk WAL protocol requires flushing log entries to disk 29
30 Disks and Files DBMS stores informa>on on disks. but: disks are (rela>vely) VERY slow! Major implica>ons for DBMS design! 30
31 Disks and Files Major implica>ons for DBMS design: READ: disk - > main memory (RAM). WRITE: reverse Both are high- cost opera>ons, rela>ve to in- memory opera>ons, so must be planned carefully! 31
32 Why Not Store It All in Main Memory? 32
33 Why Not Store It All in Main Memory? Costs too much. disk: ~$1/Gb; memory: ~$100/Gb High- end Databases today in the TB range. Approx 60% of the cost of a produc>on system is in the disks. Main memory is volable. Note: some specialized systems do store en>re database in main memory. 33
34 The Storage Hierarchy Smaller, Faster Bigger, Slower 34
35 Main memory (RAM) for currently used data. Disk for the main database (secondary storage). Tapes for archiving older versions of the data (tertiary storage). The Storage Hierarchy Registers L1 Cache. Main Memory Magnetic Disk Magnetic Tape Smaller, Faster Bigger, Slower 35
36 Jim Gray s Storage Latency Analogy: How Far Away is the Data? 10 9 Tape Andromeda The image cannot be displayed. Your computer The image may not have enough memory to cannot open the be image, or the image may have been displayed. corrupted. Your Restart your computer, and computer then open may the file again. If the red x still not appears, have you may have to delete the image and then insert it The again. image cannot be displayed. Your computer may not have enough 2,000 Years 10 6 Disk Pluto 2 Years 100 Memory Boston 1.5 hr On Board Cache On Chip Cache Registers This Building This Room My Head 10 min 1 min 36
37 Disks Secondary storage device of choice. Main advantage over tapes: random access vs. sequenbal. Data is stored and retrieved in units called disk blocks or pages. Unlike RAM, >me to retrieve a disk page varies depending upon loca>on on disk. rela>ve placement of pages on disk is important! 37
38 Anatomy of a Disk Disk head Spindle Tracks Sector Track Cylinder Platter Block size = multiple of sector size (which is fixed) Arm assembly Arm movement Platters Sector #38
39 Accessing a Disk Page Time to access (read/write) a disk block:... 39
40 Accessing a Disk Page Time to access (read/write) a disk block: seek Bme: moving arms to posi>on disk head on track rotabonal delay: wai>ng for block to rotate under head transfer Bme: actually moving data to/from disk surface 40
41 Accessing a Disk Page Rela>ve >mes? seek Bme: rotabonal delay: transfer Bme: 41
42 Accessing a Disk Page Rela>ve >mes? seek Bme: about 1 to 20msec rotabonal delay: 0 to 10msec transfer Bme: < 1msec per 4KB page Seek Rotate transfer Transfer 42
43 Seek ;me & rota;onal delay dominate Key to lower I/O cost: reduce seek/rota>on delays! Also note: For shared disks, much >me spent wai>ng in queue for access to arm/controller Seek Rotate transfer 43
44 Arranging Pages on Disk Next block concept: blocks on same track, followed by blocks on same cylinder, followed by blocks on adjacent cylinder Accesing next block is cheap A useful op>miza>on: pre- fetching See textbook page
45 Rules of thumb 1. Memory access much faster than disk I/O (~ 1000x) Sequen>al I/O faster than random I/O (~ 10x) 45
46 Disk Arrays: RAID Just FYI Logical Physical Benefits: Higher throughput (via data striping ) Longer MTTF (via redundancy) 46
47 Recall: DBMS Layers Queries Query Optimization and Execution Relational Operators Files and Access Methods TODAY! Buffer Management Disk Space Management DB 47
48 Buffer Management in a DBMS (copy of a) disk page free frame Page Requests from Higher Levels Just FYI buffer pool MAIN MEMORY DISK DB choice of frame dictated by replacement policy 48
49 Files FILE: A collec>on of pages, each containing a collec>on of records. Must support: insert/delete/modify record read a par>cular record (specified using record id) scan all records (possibly with some condi>ons on the records to be retrieved) 49
50 Alterna;ve File Organiza;ons Several alterna>ves (w/ trade- offs): Heap files: Suitable when typical access is a file scan retrieving all records. Sorted Files: Index File Organiza>ons: 50
51 Variable length records SLOTTED PAGE organiza>on - popular. occupied records... page header 51
52 Conclusions- - - Storing Memory hierarchy Disks: (>1000x slower) - thus pack info in blocks try to fetch nearby blocks (sequen>ally) Record organiza>on: Slo\ed page 52
53 TREE INDEXES 53
54 Declaring Indexes No standard! Typical syntax: CREATE INDEX StudentsInd ON Students(ID); CREATE INDEX CoursesInd ON Courses(Number, DeptName); 54
55 Types of Indexes Primary: index on a key Used to enforce constraints Secondary: index on non- key a\ribute Clustering: order of the rows in the data pages correspond to the order of the rows in the index Only one clustered index can exist in a given table Useful for range predicates Non- clustering: physical order not the same as index order 55
56 Using Indexes (1): Equality Searches Given a value v, the index takes us to only those tuples that have v in the a\ribute(s) of the index. E.g. (use CourseInd index) SELECT Enrollment FROM Courses WHERE Number = 4604 and DeptName = CS 56
57 Using Indexes (1): Equality Searches Given a value v, the index takes us to only those tuples that have v in the a\ribute(s) of the index. Can use Hashes, but see next 57
58 Using Indexes (2): Range Searches ``Find all students with gpa > 3.0 may be slow, even on sorted file Hashes not a good idea! What to do? Page 1 Page 2 Page 3 Page N Data File 58
59 Range Searches ``Find all students with gpa > 3.0 may be slow, even on sorted file Solu>on: Create an `index file. k1 k2 kn Index File Page 1 Page 2 Page 3 Page N Data File 59
60 Range Searches More details: if index file is small, do binary search there Otherwise?? k1 k2 kn Index File Page 1 Page 2 Page 3 Page N Data File 60
61 B- trees the most successful family of index schemes (B- trees, B+- trees, B*- trees) Can be used for primary/secondary, clustering/non- clustering index. balanced n- way search trees Original Paper: Rudolf Bayer and McCreight, E. M. Organiza>on and Maintenance of Large Ordered Indexes. Acta Informa>ca 1, ,
62 B- trees Eg., B- tree of order d=1: <6 6 >6 <9 9 >
63 B - tree proper;es: each node, in a B- tree of order d: Key order at most n=2d keys at least d keys (except root, which may have just 1 key) all leaves at the same level if number of pointers is k, then node has exactly p1 pn k- 1 keys (leaves are empty) v1 v2 v n- 1 63
64 Proper;es block aware nodes: each node is a disk page O(log (N)) for everything! (ins/del/search) typically, if d = , then 2-3 levels u>liza>on >= 50%, guaranteed; on average 69% 64
65 Queries Algo for exact match query? (eg., ssn=8?) <6 6 >6 <9 9 >
66 JAVA anima;on h\p://slady.net/java/bt/ 66
67 Queries Algo for exact match query? (eg., ssn=8?) <6 6 >6 <9 9 >
68 Queries Algo for exact match query? (eg., ssn=8?) <6 6 >6 <9 9 >
69 Queries Algo for exact match query? (eg., ssn=8?) <6 6 >6 <9 9 >
70 Queries Algo for exact match query? (eg., ssn=8?) <6 6 >6 <9 9 >9 H steps (= disk accesses)
71 Queries what about range queries? (eg., 5<salary<8) Proximity/ nearest neighbor searches? (eg., salary ~ 8 ) 71
72 Queries what about range queries? (eg., 5<salary<8) Proximity/ nearest neighbor searches? (eg., salary ~ 8 ) <6 6 9 >6 <9 >
73 Queries what about range queries? (eg., 5<salary<8) Proximity/ nearest neighbor searches? (eg., salary ~ 8 ) <6 6 9 >6 <9 >
74 Queries what about range queries? (eg., 5<salary<8) Proximity/ nearest neighbor searches? (eg., salary ~ 8 ) <6 6 9 >6 <9 >
75 Queries what about range queries? (eg., 5<salary<8) Proximity/ nearest neighbor searches? (eg., salary ~ 8 ) <6 6 9 >6 <9 >
76 Varia;ons How could we do even be\er than the B- trees above? 76
77 B+ trees - Mo;va;on B- tree print keys in sorted order: <6 6 >6 <9 9 >
78 B+ trees - Mo;va;on B- tree needs back- tracking how to avoid it? <6 6 >6 <9 9 >
79 B+ trees - Mo;va;on Stronger reason: for clustering index, data records are sca\ered: <6 6 >6 <9 9 >
80 Solu;on: B+ - trees facilitate sequen>al ops They string all leaf nodes together AND replicate keys from non- leaf nodes, to make sure every key appears at the leaf level (vital, for clustering index!) 80
81 B+ trees <6 6 >=6 <9 9 >=
82 B+ trees <6 6 9 Index Pages >=6 <9 >= Data Pages 82
83 B+ trees More details: next (and textbook) In short: on split at leaf level: COPY middle key upstairs at non- leaf level: push middle key upstairs (as in plain B- tree) 83
84 Example B+ Tree Search begins at root, and key comparisons direct it to a leaf Search for 5*, 15*, all data entries >= 24*... Root * 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39* Based on the search for 15*, we know it is not in the tree! 84
85 Inser;ng a Data Entry into a B+ Tree Find correct leaf L. Put data entry onto L. If L has enough space, done! Else, must split L (into L and a new node L2) Redistribute entries evenly, copy up middle key. parent node may overflow but then: push up middle key. Splits grow tree; root split increases height. 85
86 Example B+ Tree Inser;ng 30* Root * 3* 5* 7* 14* 16* 19* 20* 22* 23* 24* 27* 29* 86
87 Example B+ Tree Inser;ng 30* Root * 3* 5* 7* 14* 16* 19* 20* 22* 23* 24* 27* 29* 30* 87
88 Example B+ Tree - Inser;ng 8* Root * 3* 5* 7* 14* 16* 19* 20* 22* 23* 24* 27* 29* 88
89 Example B+ Tree - Inser;ng 8* Root * 3* 5* 7* 14* 16* 19* 20* 22* 23* 24* 27* 29* No Space 89
90 Example B+ Tree - Inser;ng 8* Root So Split! * 3* 5* 7* 14* 16* 19* 20* 22* 23* 24* 27* 29* * 3* 5* 14* 16* 19* 20* 22* 23* 24* 27* 5* 7* 8* 29* 90
91 Example B+ Tree - Inser;ng 8* Root So Split! * 3* 5* 7* 14* 16* 19* 20* 22* 23* 24* 27* 29* And then push middle UP * 3* 5* 14* 16* 19* 20* 22* 23* 24* 27* 5* 7* 8* 29* 91
92 Example B+ Tree - Inser;ng 8* Root * 3* 5* 7* 14* 16* 19* 20* 22* 23* 24* 27* 29* Final State <5 >= * 3* 14* 16* 19* 20* 22* 23* 24* 27* 5* 7* 8* 29* 92
93 Example B+ Tree - Inser;ng 21* Root * 3* 5* 7* 8* 14* 16* 19* 20* 22* 23* 24* 27* 29* 2* 3* 5* 7* 8* 14* 16* 19* 20* 22* 23* 24* 27* 29* 93
94 Example B+ Tree - Inser;ng 21* Root * 3* 5* 7* 8* 14* 16* 19* 20* 22* 23* 24* 27* 29* Root is Full, so split recursively 2* 3* 5* 7* 8* 14* 16* 19* 20* 21* 22* 23* 24* 27* 29* 94
95 Example B+ Tree: Recursive split Root * 3* 5* 7* 8* 14* 16* 19* 20* 21* 22* 23* 24* 27* 29* Notice that root was also split, increasing height. 95
96 Example: Data vs. Index Page Split leaf: copy non- leaf: push Data Page Split 2* 3* 5* 7* 8* 5 2* 3* 5* 7* 8* why not non- leaves? Index Page Split #96
97 Same Inser;ng 21*: The Deferred Root Split * 3* 5* 7* 8* 14* 16* 19* 20* 22* 23* 24* 27* 29* Note this has free space. So 97
98 Inser;ng 21*: The Deferred Split Root * 3* 5* 7* 8* 14* 16* 19* 20* 22* 23* 24* 27* 29* Root LEND keys to sibling, through PARENT! 2* 3* 5* 7* 8* 14* 16* 19* 20* 21* 22* 23* 24* 27* 29* 98
99 Inser;ng 21*: The Deferred Split Root * 3* 5* 7* 8* 14* 16* 19* 20* 22* 23* 24* 27* 29* Root Shorter, more packed, faster tree * 3* 5* 7* 8* 14* 16* 19* 20* 21* 22* 23* 24* 27* 29* 99
100 Inser;on examples for you to try Root (not shown) 2* 3* 5* 7* 8* 11* 14* 16* 21* 22* 23* Insert the following data entries (in order): 28*, 6*, 25* 100
101 Answer After inserting 28*, 6* * 3* 5* 6* 7* 8* 11* 14* 16* 21* 22* 23* 28* After inserting 25* 101
102 Answer After inserting 25* * 3* 5* 6* 7* 8* 11* 14* 16* 21* 22* 23* 25* 28* 102
103 Dele;ng a Data Entry from a B+ Tree Start at root, find leaf L where entry belongs. Remove the entry. If L is at least half- full, done! If L underflows Try to re- distribute, borrowing from sibling (adjacent node with same parent as L). If re- distribu>on fails, merge L and sibling. update parent and possibly merge, recursively 103
104 Dele;on from B+Tree Root * 3* 5* 7* 8* 14* 16* 19* 20* 21* 22* 23* 24* 27* 29* 104
105 Example: Delete 19* & 20* Root Dele>ng 19* is easy: * 3* 5* 7* 8* 14* 16* 19* 20* 20* 22* 22* 24* 27* 29* 33* 34* 38* 39* 3 Root * 3* 5* 7* 8* 14* 16* 22* 24* 27* 29* 33* 34* 38* 39* Dele>ng 20* - > re- distribu>on (no>ce: 27 copied up) 105
106 ... And Then Dele;ng 24* 3 Root * 3* 5* 7* 8* 14* 16* 22* 24* 27* 29* 33* 34* 38* 39* 4 Root * 3* 5* 7* 8* 14* 16* 22* 27* 29* 33* 34* 38* 39* Must merge leaves: OPPOSITE of insert 106
107 ... And Then Dele;ng 24* 3 Root * 3* 5* 7* 8* 14* 16* 22* 24* 27* 29* 33* 34* 38* 39* 4 Root 17 but are we done?? * 3* 5* 7* 8* 14* 16* 22* 27* 29* 33* 34* 38* 39* Must merge leaves: OPPOSITE of insert 107
108 ... Merge Non- Leaf Nodes, Shrink Tree 4 Root * 3* 5* 7* 8* 14* 16* 22* 27* 29* 33* 34* 38* 39* 5 Root * 3* 5* 7* 8* 14* 16* 22* 27* 29* 33* 34* 38* 39* 108
109 Example of Non- leaf Re- distribu;on Tree is shown below during dele>on of 24*. Now, we can re- distribute keys Root * 3* 5* 7* 8* 14* 16* 17* 18* 20* 21* 22* 27* 29* 33* 34* 38* 39* 109
110 Aqer Re- distribu;on need only re- distribute 20 ; did 17, too why would we want to re- distribute more keys? Root * 3* 5* 7* 8* 14* 16* 17* 18* 20* 21* 22* 27* 29* 33* 34* 38* 39* 110
111 Main observa;ons for dele;on If a key value appears twice (leaf + nonleaf), the above algorithms delete it from the leaf, only why not non- leaf, too? 111
112 Main observa;ons for dele;on If a key value appears twice (leaf + nonleaf), the above algorithms delete it from the leaf, only why not non- leaf, too? lazy dele>ons - in fact, some vendors just mark entries as deleted (~ underflow), and reorganize/compact later 112
113 Recap: main ideas on overflow, split (and push, or copy ) or consider deferred split on underflow, borrow keys; or merge or let it underflow
114 B+ Trees in Prac;ce Typical order: 100. Typical fill- factor: 67%. average fanout = 2*100*0.67 = 134 Typical capaci>es: Height 4: 1334 = 312,900,721 entries Height 3: 1333 = 2,406,104 entries 114
115 B+ Trees in Prac;ce Can ojen keep top levels in buffer pool: Level 1 = 1 page = 8 KB Level 2 = 134 pages = 1 MB Level 3 = 17,956 pages = 140 MB 115
116 B+ trees with duplicates Everything so far: assumed unique key values How to extend B+- trees for duplicates? Alt. 2: <key, rid> Alt. 3: <key, {rid list}> 2 approaches, roughly equivalent 116
117 B+ trees with duplicates approach#1: repeat the key values, and extend B+ tree algo s appropriately - eg. many 14 s * 3* 5* 7* 13* 14* 14* 14* 14* 14* 22* 23* 24* 27* 29* 117
118 B+ trees with duplicates approach#1: subtle problem with dele>on: treat rid as part of the key, thus making it unique * 3* 5* 7* 13* 14* 14* 14* 14* 14* 22* 23* 24* 27* 29* 118
119 B+ trees with duplicates approach#2: store each key value: once but store the {rid list} as variable- length field (and use overflow pages, if needed) * 3* 5* 7* 13* 14* {rid list} 22* 23* 24* 27* 29* {rid list, cont d} 119
120 B+trees in Prac;ce prefix compression; bulk- loading; order 120
121 Prefix Key Compression Important to increase fan- out. (Why?) Key values in index entries only `direct traffic ; can ojen compress them. Papadopoulos Pernikovskaya 121
122 Prefix Key Compression Important to increase fan- out. (Why?) Key values in index entries only `direct traffic ; can ojen compress them. Pap Per <room for more separators/keys> 122
123 Bulk Loading of a B+ Tree In an empty tree, insert many keys Why not one- at- a- >me? Too slow! 123
124 Bulk Loading of a B+ Tree Ini>aliza>on: Sort all data entries scan list; whenever enough for a page, pack <repeat for upper level> Root Sorted pages of data entries; not yet in B+ tree 3* 4* 6* 9* 10* 11* 12* 13* 20* 22* 23* 31* 35* 36* 38* 41* 44* 124
125 Bulk Loading of a B+ Tree Root Data entry pages not yet in B+ tree 3* 4* 6* 9* 10* 11* 12* 13* 20* 22* 23* 31* 35* 36* 38* 41* 44* Root Data entry pages not yet in B+ tree Prakash * 4* 6* 9* VT 10* CS 11* * 13* 20* 22* 23* 31* 35* 36* 38* 41* 44* 125
126 A Note on `Order Order (d) concept replaced by physical space criterion in prac>ce (`at least half- full ). Why do we need it? Index pages can typically hold many more entries than leaf pages. Variable sized records and search keys mean different nodes will contain different numbers of entries. Even with fixed length fields, mul>ple records with the same search key value (duplicates) can lead to variable- sized data entries (if we use Alterna>ve (3)). 126
127 A Note on `Order Many real systems are even sloppier than this: they allow underflow, and only reclaim space when a page is completely empty. (what are the benefits of such slopiness?) 127
128 Conclusions B+tree is the prevailing indexing method Excellent, O(logN) worst- case performance for ins/del/search; (~3-4 disk accesses in prac>ce) guaranteed 50% space u>liza>on; avg 69% 128
129 Conclusions Can be used for any type of index: primary/ secondary, sparse (clustering), or dense (non- clustering) Several fine- extensions on the basic algorithm deferred split; prefix compression; (underflows) bulk- loading duplicate handling 129
CS 5614: (Big) Data Management Systems. B. Aditya Prakash Lecture #4: Constraints, Storing and Indexes
CS 5614: (Big) Data Management Systems B. Aditya Prakash Lecture #4: Constraints, Storing and Indexes Constraints in Rela;onal Algebra and SQL Maintaining Integrity of Data Data is dirty. How does an applica>on
More informationCS 4604: Introduc0on to Database Management Systems. B. Aditya Prakash Lecture #4: SQL---Part 2
CS 4604: Introduc0on to Database Management Systems B. Aditya Prakash Lecture #4: SQL---Part 2 Overview - detailed - SQL DML other parts: views modifications joins DDL constraints Prakash 2016 VT CS 4604
More informationDatabase design and implementation CMPSCI 645. Lecture 08: Storage and Indexing
Database design and implementation CMPSCI 645 Lecture 08: Storage and Indexing 1 Where is the data and how to get to it? DB 2 DBMS architecture Query Parser Query Rewriter Query Op=mizer Query Executor
More informationProblem. Indexing with B-trees. Indexing. Primary Key Indexing. B-trees: Example. B-trees. primary key indexing
15-82 Advanced Topics in Database Systems Performance Problem Given a large collection of records, Indexing with B-trees find similar/interesting things, i.e., allow fast, approximate queries 2 Indexing
More informationStoring Data: Disks and Files
Storing Data: Disks and Files CS 186 Fall 2002, Lecture 15 (R&G Chapter 7) Yea, from the table of my memory I ll wipe away all trivial fond records. -- Shakespeare, Hamlet Stuff Rest of this week My office
More informationDisks and Files. Jim Gray s Storage Latency Analogy: How Far Away is the Data? Components of a Disk. Disks
Review Storing : Disks and Files Lecture 3 (R&G Chapter 9) Aren t bases Great? Relational model SQL Yea, from the table of my memory I ll wipe away all trivial fond records. -- Shakespeare, Hamlet A few
More informationCS 405G: Introduction to Database Systems. Storage
CS 405G: Introduction to Database Systems Storage It s all about disks! Outline That s why we always draw databases as And why the single most important metric in database processing is the number of disk
More informationDatabase Systems. November 2, 2011 Lecture #7. topobo (mit)
Database Systems November 2, 2011 Lecture #7 1 topobo (mit) 1 Announcement Assignment #2 due today Assignment #3 out today & due on 11/16. Midterm exam in class next week. Cover Chapters 1, 2,
More informationStoring Data: Disks and Files
Storing Data: Disks and Files Module 2, Lecture 1 Yea, from the table of my memory I ll wipe away all trivial fond records. -- Shakespeare, Hamlet Database Management Systems, R. Ramakrishnan 1 Disks and
More informationStorage hierarchy. Textbook: chapters 11, 12, and 13
Storage hierarchy Cache Main memory Disk Tape Very fast Fast Slower Slow Very small Small Bigger Very big (KB) (MB) (GB) (TB) Built-in Expensive Cheap Dirt cheap Disks: data is stored on concentric circular
More informationStoring Data: Disks and Files
Storing Data: Disks and Files Chapter 9 CSE 4411: Database Management Systems 1 Disks and Files DBMS stores information on ( 'hard ') disks. This has major implications for DBMS design! READ: transfer
More informationPrinciples of Data Management. Lecture #2 (Storing Data: Disks and Files)
Principles of Data Management Lecture #2 (Storing Data: Disks and Files) Instructor: Mike Carey mjcarey@ics.uci.edu Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Today s Topics v Today
More informationIndexing. Jan Chomicki University at Buffalo. Jan Chomicki () Indexing 1 / 25
Indexing Jan Chomicki University at Buffalo Jan Chomicki () Indexing 1 / 25 Storage hierarchy Cache Main memory Disk Tape Very fast Fast Slower Slow (nanosec) (10 nanosec) (millisec) (sec) Very small Small
More informationDisks, Memories & Buffer Management
Disks, Memories & Buffer Management The two offices of memory are collection and distribution. - Samuel Johnson CS3223 - Storage 1 What does a DBMS Store? Relations Actual data Indexes Data structures
More informationL9: Storage Manager Physical Data Organization
L9: Storage Manager Physical Data Organization Disks and files Record and file organization Indexing Tree-based index: B+-tree Hash-based index c.f. Fig 1.3 in [RG] and Fig 2.3 in [EN] Functional Components
More informationStoring Data: Disks and Files. Storing and Retrieving Data. Why Not Store Everything in Main Memory? Database Management Systems need to:
Storing : Disks and Files base Management System, R. Ramakrishnan and J. Gehrke 1 Storing and Retrieving base Management Systems need to: Store large volumes of data Store data reliably (so that data is
More informationStoring and Retrieving Data. Storing Data: Disks and Files. Solution 1: Techniques for making disks faster. Disks. Why Not Store Everything in Tapes?
Storing and Retrieving Storing : Disks and Files base Management Systems need to: Store large volumes of data Store data reliably (so that data is not lost!) Retrieve data efficiently Alternatives for
More informationStoring and Retrieving Data. Storing Data: Disks and Files. Solution 1: Techniques for making disks faster. Disks. Why Not Store Everything in Tapes?
Storing and Retrieving Storing : Disks and Files Chapter 9 base Management Systems need to: Store large volumes of data Store data reliably (so that data is not lost!) Retrieve data efficiently Alternatives
More informationStoring Data: Disks and Files. Storing and Retrieving Data. Why Not Store Everything in Main Memory? Chapter 7
Storing : Disks and Files Chapter 7 base Management Systems, R. Ramakrishnan and J. Gehrke 1 Storing and Retrieving base Management Systems need to: Store large volumes of data Store data reliably (so
More informationSTORING DATA: DISK AND FILES
STORING DATA: DISK AND FILES CS 564- Spring 2018 ACKs: Dan Suciu, Jignesh Patel, AnHai Doan WHAT IS THIS LECTURE ABOUT? How does a DBMS store data? disk, SSD, main memory The Buffer manager controls how
More informationDisks and Files. Storage Structures Introduction Chapter 8 (3 rd edition) Why Not Store Everything in Main Memory?
Why Not Store Everything in Main Memory? Storage Structures Introduction Chapter 8 (3 rd edition) Sharma Chakravarthy UT Arlington sharma@cse.uta.edu base Management Systems: Sharma Chakravarthy Costs
More informationDisks & Files. Yanlei Diao UMass Amherst. Slides Courtesy of R. Ramakrishnan and J. Gehrke
Disks & Files Yanlei Diao UMass Amherst Slides Courtesy of R. Ramakrishnan and J. Gehrke DBMS Architecture Query Parser Query Rewriter Query Optimizer Query Executor Lock Manager for Concurrency Access
More informationStoring Data: Disks and Files
Storing Data: Disks and Files Chapter 7 (2 nd edition) Chapter 9 (3 rd edition) Yea, from the table of my memory I ll wipe away all trivial fond records. -- Shakespeare, Hamlet Database Management Systems,
More informationIntroduction to Data Management. Lecture #13 (Indexing)
Introduction to Data Management Lecture #13 (Indexing) Instructor: Mike Carey mjcarey@ics.uci.edu Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Announcements v Homework info: HW #5 (SQL):
More informationStoring Data: Disks and Files
Storing Data: Disks and Files Chapter 9 Yea, from the table of my memory I ll wipe away all trivial fond records. -- Shakespeare, Hamlet Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke Disks
More informationData Organization B trees
Data Organization B trees Data organization and retrieval File organization can improve data retrieval time SELECT * FROM depositors WHERE bname= Downtown 100 blocks 200 recs/block Query returns 150 records
More informationB-Tree. CS127 TAs. ** the best data structure ever
B-Tree CS127 TAs ** the best data structure ever Storage Types Cache Fastest/most costly; volatile; Main Memory Fast access; too small for entire db; volatile Disk Long-term storage of data; random access;
More informationAnnouncements. Reading Material. Recap. Today 9/17/17. Storage (contd. from Lecture 6)
CompSci 16 Intensive Computing Systems Lecture 7 Storage and Index Instructor: Sudeepa Roy Announcements HW1 deadline this week: Due on 09/21 (Thurs), 11: pm, no late days Project proposal deadline: Preliminary
More informationCS122A: Introduction to Data Management. Lecture #14: Indexing. Instructor: Chen Li
CS122A: Introduction to Data Management Lecture #14: Indexing Instructor: Chen Li 1 Indexing in MySQL (w/innodb) CREATE [UNIQUE FULLTEXT SPATIAL] INDEX index_name [index_type] ON tbl_name (index_col_name,...)
More informationIntroduction to Data Management. Lecture 14 (Storage and Indexing)
Introduction to Data Management Lecture 14 (Storage and Indexing) Instructor: Mike Carey mjcarey@ics.uci.edu Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Announcements v HW s and quizzes:
More informationStoring Data: Disks and Files. Administrivia (part 2 of 2) Review. Disks, Memory, and Files. Disks and Files. Lecture 3 (R&G Chapter 7)
Storing : Disks and Files Yea, from the table of my memory I ll wipe away all trivial fond records. -- Shakespeare, Hamlet Lecture 3 (R&G Chapter 7) Administrivia Greetings Office Hours Prof. Franklin
More informationTHE B+ TREE INDEX. CS 564- Spring ACKs: Jignesh Patel, AnHai Doan
THE B+ TREE INDEX CS 564- Spring 2018 ACKs: Jignesh Patel, AnHai Doan WHAT IS THIS LECTURE ABOUT? The B+ tree index Basics Search/Insertion/Deletion Design & Cost 2 INDEX RECAP We have the following query:
More informationUser Perspective. Module III: System Perspective. Module III: Topics Covered. Module III Overview of Storage Structures, QP, and TM
Module III Overview of Storage Structures, QP, and TM Sharma Chakravarthy UT Arlington sharma@cse.uta.edu http://www2.uta.edu/sharma base Management Systems: Sharma Chakravarthy Module I Requirements analysis
More informationReview 1-- Storing Data: Disks and Files
Review 1-- Storing Data: Disks and Files Chapter 9 [Sections 9.1-9.7: Ramakrishnan & Gehrke (Text)] AND (Chapter 11 [Sections 11.1, 11.3, 11.6, 11.7: Garcia-Molina et al. (R2)] OR Chapter 2 [Sections 2.1,
More informationTree-Structured Indexes
Tree-Structured Indexes CS 186, Fall 2002, Lecture 17 R & G Chapter 9 If I had eight hours to chop down a tree, I'd spend six sharpening my ax. Abraham Lincoln Introduction Recall: 3 alternatives for data
More informationSystem Structure Revisited
System Structure Revisited Naïve users Casual users Application programmers Database administrator Forms DBMS Application Front ends DML Interface CLI DDL SQL Commands Query Evaluation Engine Transaction
More informationCSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2009 Lecture 6 - Storage and Indexing
CSE 544 Principles of Database Management Systems Magdalena Balazinska Winter 2009 Lecture 6 - Storage and Indexing References Generalized Search Trees for Database Systems. J. M. Hellerstein, J. F. Naughton
More informationAdministrivia. Tree-Structured Indexes. Review. Today: B-Tree Indexes. A Note of Caution. Introduction
Administrivia Tree-Structured Indexes Lecture R & G Chapter 9 Homeworks re-arranged Midterm Exam Graded Scores on-line Key available on-line If I had eight hours to chop down a tree, I'd spend six sharpening
More informationKathleen Durant PhD Northeastern University CS Indexes
Kathleen Durant PhD Northeastern University CS 3200 Indexes Outline for the day Index definition Types of indexes B+ trees ISAM Hash index Choosing indexed fields Indexes in InnoDB 2 Indexes A typical
More informationTopics to Learn. Important concepts. Tree-based index. Hash-based index
CS143: Index 1 Topics to Learn Important concepts Dense index vs. sparse index Primary index vs. secondary index (= clustering index vs. non-clustering index) Tree-based vs. hash-based index Tree-based
More informationCSE 444: Database Internals. Lectures 5-6 Indexing
CSE 444: Database Internals Lectures 5-6 Indexing 1 Announcements HW1 due tonight by 11pm Turn in an electronic copy (word/pdf) by 11pm, or Turn in a hard copy in my office by 4pm Lab1 is due Friday, 11pm
More informationStoring Data: Disks and Files
Storing Data: Disks and Files Yea, from the table of my memory I ll wipe away all trivial fond records. -- Shakespeare, Hamlet Data Access Disks and Files DBMS stores information on ( hard ) disks. This
More informationIndexing. Chapter 8, 10, 11. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1
Indexing Chapter 8, 10, 11 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Tree-Based Indexing The data entries are arranged in sorted order by search key value. A hierarchical search
More informationCSE 544 Principles of Database Management Systems
CSE 544 Principles of Database Management Systems Alvin Cheung Fall 2015 Lecture 5 - DBMS Architecture and Indexing 1 Announcements HW1 is due next Thursday How is it going? Projects: Proposals are due
More informationDatabase Applications (15-415)
Database Applications (15-415) DBMS Internals: Part II Lecture 10, February 17, 2014 Mohammad Hammoud Last Session: DBMS Internals- Part I Today Today s Session: DBMS Internals- Part II Brief summaries
More informationCS143: Index. Book Chapters: (4 th ) , (5 th ) , , 12.10
CS143: Index Book Chapters: (4 th ) 12.1-3, 12.5-8 (5 th ) 12.1-3, 12.6-8, 12.10 1 Topics to Learn Important concepts Dense index vs. sparse index Primary index vs. secondary index (= clustering index
More informationTree-Structured Indexes. Chapter 10
Tree-Structured Indexes Chapter 10 1 Introduction As for any index, 3 alternatives for data entries k*: Data record with key value k 25, [n1,v1,k1,25] 25,
More informationECS 165B: Database System Implementa6on Lecture 3
ECS 165B: Database System Implementa6on Lecture 3 UC Davis April 4, 2011 Acknowledgements: some slides based on earlier ones by Raghu Ramakrishnan, Johannes Gehrke, Jennifer Widom, Bertram Ludaescher,
More informationStoring Data: Disks and Files
Storing Data: Disks and Files Lecture 3 (R&G Chapter 7) Yea, from the table of my memory I ll wipe away all trivial fond records. -- Shakespeare, Hamlet Administrivia Greetings Office Hours Prof. Franklin
More informationTree-Structured Indexes
Tree-Structured Indexes Yanlei Diao UMass Amherst Slides Courtesy of R. Ramakrishnan and J. Gehrke Access Methods v File of records: Abstraction of disk storage for query processing (1) Sequential scan;
More informationParser. Select R.text from Report R, Weather W where W.image.rain() and W.city = R.city and W.date = R.date and R.text.
Select R.text from Report R, Weather W where W.image.rain() and W.city = R.city and W.date = R.date and R.text. Lifecycle of an SQL Query CSE 190D base System Implementation Arun Kumar Query Query Result
More informationSpring 2017 B-TREES (LOOSELY BASED ON THE COW BOOK: CH. 10) 1/29/17 CS 564: Database Management Systems, Jignesh M. Patel 1
Spring 2017 B-TREES (LOOSELY BASED ON THE COW BOOK: CH. 10) 1/29/17 CS 564: Database Management Systems, Jignesh M. Patel 1 Consider the following table: Motivation CREATE TABLE Tweets ( uniquemsgid INTEGER,
More informationTree-Structured Indexes
Tree-Structured Indexes Chapter 9 Database Management Systems, R. Ramakrishnan and J. Gehrke 1 Introduction As for any index, 3 alternatives for data entries k*: Data record with key value k
More informationSystems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2014/15
Systems Infrastructure for Data Science Web Science Group Uni Freiburg WS 2014/15 Lecture II: Indexing Part I of this course Indexing 3 Database File Organization and Indexing Remember: Database tables
More informationPrinciples of Data Management. Lecture #5 (Tree-Based Index Structures)
Principles of Data Management Lecture #5 (Tree-Based Index Structures) Instructor: Mike Carey mjcarey@ics.uci.edu Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Today s Headlines v Project
More informationTree-Structured Indexes
Tree-Structured Indexes Chapter 10 Comp 521 Files and Databases Fall 2010 1 Introduction As for any index, 3 alternatives for data entries k*: index refers to actual data record with key value k index
More informationMaterial You Need to Know
Review Quiz 2 Material You Need to Know Normalization Storage and Disk File Layout Indexing B-trees and B+ Trees Extensible Hashing Linear Hashing Decomposition Goals: Lossless Joins, Dependency preservation
More informationLecture 8 Index (B+-Tree and Hash)
CompSci 516 Data Intensive Computing Systems Lecture 8 Index (B+-Tree and Hash) Instructor: Sudeepa Roy Duke CS, Fall 2017 CompSci 516: Database Systems 1 HW1 due tomorrow: Announcements Due on 09/21 (Thurs),
More informationFile Structures and Indexing
File Structures and Indexing CPS352: Database Systems Simon Miner Gordon College Last Revised: 10/11/12 Agenda Check-in Database File Structures Indexing Database Design Tips Check-in Database File Structures
More informationCSE 190D Database System Implementation
CSE 190D Database System Implementation Arun Kumar Topic 1: Data Storage, Buffer Management, and File Organization Chapters 8 and 9 (except 8.5.4 and 9.2) of Cow Book Slide ACKs: Jignesh Patel, Paris Koutris
More informationCMSC424: Database Design. Instructor: Amol Deshpande
CMSC424: Database Design Instructor: Amol Deshpande amol@cs.umd.edu Databases Data Models Conceptual representa1on of the data Data Retrieval How to ask ques1ons of the database How to answer those ques1ons
More informationTree-Structured Indexes. A Note of Caution. Range Searches ISAM. Example ISAM Tree. Introduction
Tree-Structured Indexes Lecture R & G Chapter 9 If I had eight hours to chop down a tree, I'd spend six sharpening my ax. Abraham Lincoln Introduction Recall: 3 alternatives for data entries k*: Data record
More informationCS 5614: (Big) Data Management Systems. B. Aditya Prakash Lecture #5: Hashing and Sor5ng
CS 5614: (Big) Data Management Systems B. Aditya Prakash Lecture #5: Hashing and Sor5ng (Sta7c) Hashing Problem: find EMP record with ssn=123 What if disk space was free, and 5me was at premium? Prakash
More informationLecture 13. Lecture 13: B+ Tree
Lecture 13 Lecture 13: B+ Tree Lecture 13 Announcements 1. Project Part 2 extension till Friday 2. Project Part 3: B+ Tree coming out Friday 3. Poll for Nov 22nd 4. Exam Pickup: If you have questions,
More informationProject is due on March 11, 2003 Final Examination March 18, pm to 10.30pm
Announcements Please remember to send a mail to Deepa to register for a timeslot for your project demo by March 6, 2003 See Project Guidelines on class web page for more details Project is due on March
More informationPhysical Disk Structure. Physical Data Organization and Indexing. Pages and Blocks. Access Path. I/O Time to Access a Page. Disks.
Physical Disk Structure Physical Data Organization and Indexing Chapter 11 1 4 Access Path Refers to the algorithm + data structure (e.g., an index) used for retrieving and storing data in a table The
More informationTree-Structured Indexes ISAM. Range Searches. Comments on ISAM. Example ISAM Tree. Introduction. As for any index, 3 alternatives for data entries k*:
Introduction Tree-Structured Indexes Chapter 10 As for any index, 3 alternatives for data entries k*: Data record with key value k
More informationWhy Is This Important? Overview of Storage and Indexing. Components of a Disk. Data on External Storage. Accessing a Disk Page. Records on a Disk Page
Why Is This Important? Overview of Storage and Indexing Chapter 8 DB performance depends on time it takes to get the data from storage system and time to process Choosing the right index for faster access
More informationInformation Systems (Informationssysteme)
Information Systems (Informationssysteme) Jens Teubner, TU Dortmund jens.teubner@cs.tu-dortmund.de Summer 2018 c Jens Teubner Information Systems Summer 2018 1 Part IX B-Trees c Jens Teubner Information
More informationData Storage and Query Answering. Data Storage and Disk Structure (2)
Data Storage and Query Answering Data Storage and Disk Structure (2) Review: The Memory Hierarchy Swapping, Main-memory DBMS s Tertiary Storage: Tape, Network Backup 3,200 MB/s (DDR-SDRAM @200MHz) 6,400
More informationDatabase Applications (15-415)
Database Applications (15-415) DBMS Internals: Part II Lecture 11, February 17, 2015 Mohammad Hammoud Last Session: DBMS Internals- Part I Today Today s Session: DBMS Internals- Part II A Brief Summary
More informationInstructor: Amol Deshpande
Instructor: Amol Deshpande amol@cs.umd.edu } Storage and Query Processing Storage and memory hierarchy Query plans and how to interpret EXPLAIN output } Other things Midterm grades: later today Look out
More informationCSIT5300: Advanced Database Systems
CSIT5300: Advanced Database Systems L08: B + -trees and Dynamic Hashing Dr. Kenneth LEUNG Department of Computer Science and Engineering The Hong Kong University of Science and Technology Hong Kong SAR,
More informationTree-Structured Indexes
Tree-Structured Indexes Chapter 9 Database Management Systems, R. Ramakrishnan and J. Gehrke 1 Introduction As for any index, 3 alternatives for data entries k*: ➀ Data record with key value k ➁
More informationCMSC424: Database Design. Instructor: Amol Deshpande
CMSC424: Database Design Instructor: Amol Deshpande amol@cs.umd.edu Databases Data Models Conceptual representa1on of the data Data Retrieval How to ask ques1ons of the database How to answer those ques1ons
More informationCS 5614: (Big) Data Management Systems. B. Aditya Prakash Lecture #2: The Rela0onal Model, and SQL/Rela0onal Algebra
CS 5614: (Big) Data Management Systems B. Aditya Prakash Lecture #2: The Rela0onal Model, and SQL/Rela0onal Algebra Data Model A Data Model is a nota0on for describing data or informa0on. Structure of
More informationDatabase Applications (15-415)
Database Applications (15-415) DBMS Internals- Part V Lecture 13, March 10, 2014 Mohammad Hammoud Today Welcome Back from Spring Break! Today Last Session: DBMS Internals- Part IV Tree-based (i.e., B+
More informationTree-Structured Indexes (Brass Tacks)
Tree-Structured Indexes (Brass Tacks) Chapter 10 Ramakrishnan and Gehrke (Sections 10.3-10.8) CPSC 404, Laks V.S. Lakshmanan 1 What will I learn from this set of lectures? How do B+trees work (for search)?
More informationCARNEGIE MELLON UNIVERSITY DEPT. OF COMPUTER SCIENCE DATABASE APPLICATIONS
CARNEGIE MELLON UNIVERSITY DEPT. OF COMPUTER SCIENCE 15-415 DATABASE APPLICATIONS C. Faloutsos Indexing and Hashing 15-415 Database Applications http://www.cs.cmu.edu/~christos/courses/dbms.s00/ general
More informationCPSC 421 Database Management Systems. Lecture 11: Storage and File Organization
CPSC 421 Database Management Systems Lecture 11: Storage and File Organization * Some material adapted from R. Ramakrishnan, L. Delcambre, and B. Ludaescher Today s Agenda Start on Database Internals:
More informationChapter 11: Indexing and Hashing
Chapter 11: Indexing and Hashing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 11: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files B-Tree
More informationOutlines. Chapter 2 Storage Structure. Structure of a DBMS (with some simplification) Structure of a DBMS (with some simplification)
Outlines Chapter 2 Storage Structure Instructor: Churee Techawut 1) Structure of a DBMS 2) The memory hierarchy 3) Magnetic tapes 4) Magnetic disks 5) RAID 6) Disk space management 7) Buffer management
More informationChapter 11: Indexing and Hashing
Chapter 11: Indexing and Hashing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 11: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files B-Tree
More informationUnit 3 Disk Scheduling, Records, Files, Metadata
Unit 3 Disk Scheduling, Records, Files, Metadata Based on Ramakrishnan & Gehrke (text) : Sections 9.3-9.3.2 & 9.5-9.7.2 (pages 316-318 and 324-333); Sections 8.2-8.2.2 (pages 274-278); Section 12.1 (pages
More informationTree-Structured Indexes
Introduction Tree-Structured Indexes Chapter 10 As for any index, 3 alternatives for data entries k*: Data record with key value k
More informationPhysical Data Organization. Introduction to Databases CompSci 316 Fall 2018
Physical Data Organization Introduction to Databases CompSci 316 Fall 2018 2 Announcements (Tue., Nov. 6) Homework #3 due today Project milestone #2 due Thursday No separate progress update this week Use
More information10.1 Physical Design: Introduction. 10 Physical schema design. Physical Design: I/O cost. Physical Design: I/O cost.
10 Physical schema design 10.1 Introduction Motivation Disk technology RAID 10.2 Index structures in DBS Indexing concept Primary and Secondary indexes 10.3. ISAM and B + -Trees 10.4. SQL and indexes Criteria
More informationLecture 12. Lecture 12: The IO Model & External Sorting
Lecture 12 Lecture 12: The IO Model & External Sorting Lecture 12 Today s Lecture 1. The Buffer 2. External Merge Sort 2 Lecture 12 > Section 1 1. The Buffer 3 Lecture 12 > Section 1 Transition to Mechanisms
More informationGoals for Today. CS 133: Databases. Relational Model. Multi-Relation Queries. Reason about the conceptual evaluation of an SQL query
Goals for Today CS 133: Databases Fall 2018 Lec 02 09/06 Relational Model & Memory and Buffer Manager Prof. Beth Trushkowsky Reason about the conceptual evaluation of an SQL query Understand the storage
More informationRAID in Practice, Overview of Indexing
RAID in Practice, Overview of Indexing CS634 Lecture 4, Feb 04 2014 Slides based on Database Management Systems 3 rd ed, Ramakrishnan and Gehrke 1 Disks and Files: RAID in practice For a big enterprise
More informationDatabase System Concepts, 6 th Ed. Silberschatz, Korth and Sudarshan See for conditions on re-use
Chapter 11: Indexing and Hashing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files Static
More informationWhy Sort? Data requested in sorted order. Sor,ng is first step in bulk loading B+ tree index. e.g., find students in increasing GPA order
External Sor,ng Outline Exam will be graded a5er everyone takes it There are two,mes to be fair on an exam, when it s wriaen and when it s graded you only need to trust that I ll be fair at one of them.
More informationEXTERNAL SORTING. Sorting
EXTERNAL SORTING 1 Sorting A classic problem in computer science! Data requested in sorted order (sorted output) e.g., find students in increasing grade point average (gpa) order SELECT A, B, C FROM R
More informationDatabase Architecture 2 & Storage. Instructor: Matei Zaharia cs245.stanford.edu
Database Architecture 2 & Storage Instructor: Matei Zaharia cs245.stanford.edu Summary from Last Time System R mostly matched the architecture of a modern RDBMS» SQL» Many storage & access methods» Cost-based
More informationIndexes. File Organizations and Indexing. First Question to Ask About Indexes. Index Breakdown. Alternatives for Data Entries (Contd.
File Organizations and Indexing Lecture 4 R&G Chapter 8 "If you don't find it in the index, look very carefully through the entire catalogue." -- Sears, Roebuck, and Co., Consumer's Guide, 1897 Indexes
More informationCSC 261/461 Database Systems Lecture 17. Fall 2017
CSC 261/461 Database Systems Lecture 17 Fall 2017 Announcement Quiz 6 Due: Tonight at 11:59 pm Project 1 Milepost 3 Due: Nov 10 Project 2 Part 2 (Optional) Due: Nov 15 The IO Model & External Sorting Today
More informationCS 4604: Introduc0on to Database Management Systems. B. Aditya Prakash Lecture #3: SQL and Rela2onal Algebra- - - Part 1
CS 4604: Introduc0on to Database Management Systems B. Aditya Prakash Lecture #3: SQL and Rela2onal Algebra- - - Part 1 Reminder: Rela0onal Algebra Rela2onal algebra is a nota2on for specifying queries
More informationLecture 4. ISAM and B + -trees. Database Systems. Tree-Structured Indexing. Binary Search ISAM. B + -trees
Lecture 4 and Database Systems Binary Multi-Level Efficiency Partitioned 1 Ordered Files and Binary How could we prepare for such queries and evaluate them efficiently? 1 SELECT * 2 FROM CUSTOMERS 3 WHERE
More informationCS542. Algorithms on Secondary Storage Sorting Chapter 13. Professor E. Rundensteiner. Worcester Polytechnic Institute
CS542 Algorithms on Secondary Storage Sorting Chapter 13. Professor E. Rundensteiner Lesson: Using secondary storage effectively Data too large to live in memory Regular algorithms on small scale only
More informationChapter 12: Indexing and Hashing
Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B+-Tree Index Files B-Tree Index Files Static Hashing Dynamic Hashing Comparison of Ordered Indexing and Hashing Index Definition in SQL
More information