CS 5614: (Big) Data Management Systems. B. Aditya Prakash Lecture #4: Constraints, Storing and Indexes

Size: px
Start display at page:

Download "CS 5614: (Big) Data Management Systems. B. Aditya Prakash Lecture #4: Constraints, Storing and Indexes"

Transcription

1 CS 5614: (Big) Data Management Systems B. Aditya Prakash Lecture #4: Constraints, Storing and Indexes

2 Constraints in Rela;onal Algebra and SQL

3 Maintaining Integrity of Data Data is dirty. How does an applica>on ensure that a database modifica>on does not corrupt the tables? Two approaches: Applica>on programs check that database modifica>ons are consistent. Use the features provided by SQL. 3

4 Maintaining Integrity of Data Data is dirty. How does an applica>on ensure that a database modifica>on does not corrupt the tables? Two approaches: Applica>on programs check that database modifica>ons are consistent. Use the features provided by SQL. 4

5 Integrity Checking in SQL PRIMARY KEY and UNIQUE constraints. FOREIGN KEY constraints. Constraints on a\ributes and tuples. Triggers (schema- level constraints). How do we express these constraints? How do we check these constraints? What do we do when a constraint is violated? 5

6 Keys in SQL A set of a\ributes S is a key for a rela>on R if every pair of tuples in R disagree on at least one a\ribute in S. Select one key to be the PRIMARY KEY; declare other keys using UNIQUE. 6

7 Primary Keys in SQL Modify the schema of Students to declare PID to be the key. CREATE TABLE Students( PID VARCHAR(8) PRIMARY KEY, Name CHAR(20), Address VARCHAR(255)); What about Courses, which has two a\ributes in its key? CREATE TABLE Courses(Number integer, DeptName: VARCHAR(8), CourseName VARCHAR(255), Classroom VARCHAR(30), Enrollment integer, PRIMARY KEY (Number, DeptName) ); 7

8 Effect of Declaring PRIMARY KEYs Two tuples in a rela>on cannot agree on all the a\ributes in the key. DBMS will reject any ac>on that inserts or updates a tuple in viola>on of this rule. A tuple cannot have a NULL value in a key a\ribute. 8

9 Other Keys in SQL If a rela>on has other keys, declare them using the UNIQUE keyword. Use UNIQUE in exactly the same places as PRIMARY KEY. There are two differences between PRIMARY KEY and UNIQUE: A table may have only one PRIMARY KEY but more than one set of a\ributes declared UNIQUE. A tuple may have NULL values in UNIQUE a\ributes. 9

10 Enforcing Key Constraints Upon which ac>ons should an RDBMS enforce a key constraint? Only tuple update and inser>on. RDMBS searches the tuples in the table to find if any tuple exists that agrees with the new tuple on all a\ributes in the primary key. To speed this process, an RDBMS automa>cally creates an efficient search index on the primary key. User can instruct the RDBMS to create an index on one or more a\ributes 10

11 Foreign Key Constraints Referen>al integrity constraint: in the rela>on Teach (that connects Courses and Professors), if Teach relates a course to a professor, then a tuple corresponding to the professor must exist in Professors. How do we express such constraints in Rela>onal Algebra? Consider the Teach(ProfessorPID, Number, DeptName) rela>on. every non- NULL value of ProfessorPID inteach must be a valid ProfessorPID in Professors. RA πprofessorpid(teach) π PID(Professors). 11

12 Foreign Key Constraints in SQL every non- NULL value of ProfessorPID inteach must be a valid ProfessorPID in Professors. In Teach, declare ProfessorPID to be a foreign key. CREATE TABLE Teach(ProfessorPID VARCHAR(8) REFERENCES Professor(PID), Name VARCHAR(30)...); CREATE TABLE Teach(ProfessorPID VARCHAR(8), Name VARCHAR(30)..., FOREIGN KEY ProfessorPID REFERENCES Professor(PID)); If the foreign key has mul>ple a\ributes, use the second type of declara>on. 12

13 Requirements for FOREIGN KEYs If a rela>on R declares that some of its a\ributes refer to foreign keys in another rela>on S, then these a\ributes must be declared UNIQUE or PRIMARY KEY in S. Values of the foreign key in R must appear in the referenced a\ributes of some tuple in S. 13

14 Enforcing Referen;al Integrity Three policies for maintaining referen>al integrity. Default policy: reject viola>ng modifica>ons. Cascade policy: mimic changes to the referenced a\ributes at the foreign key. Set- NULL policy: set appropriate a\ributes to NULL. 14

15 Specifying Referen;al Integrity Policies in SQL SQL allows the database designer to specify the policy for deletes and updates independently. Op>onally follow the declara>on of the foreign key with ON DELETE and/or ON UPDATE followed by the policy: SET NULL or CASCADE. Constraints can be circular, e.g., if there is a one- one mapping between two rela>ons. In this case, SQL allows us to defer the checking of constraints. 15

16 Constraining ATributes and Tuples SQL also allows us to specify constraints on a\ributes in a rela>on and on tuples in a rela>on. Disallow courses with a maximum enrollment greater than 100. A chairperson of a department must teach at most one course every semester. How do we express such constraints in SQL? How can we change our minds about constraints? A simple constraint: NOT NULL Declare an a\ribute to be NOT NULL ajer its type in a CREATE TABLE statement. Effect is to disallow tuples in which this a\ribute is NULL. 16

17 ATribute- Based CHECK Constraints Disallow courses with a maximum enrollment greater than 100. This constraint only affects the value of a single a\ribute in each tuple. CREATE TABLE Courses(... Enrollment INT CHECK (Enrollment <= 100)...); The condi>on can be any condi>on that can appear in a WHERE clause. CHECK statement may use a subquery to men>on other a\ributes of the same or other rela>ons. An a\ribute- based CHECK constraint is checked only when the value of that a\ribute changes. 17

18 Tuple- Based CHECK Constraints Tuple- based CHECK constraints are checked whenever a tuple is inserted into or updated in a rela;on. Designer may add these constraints ajer the list of a\ributes in a CREATE TABLE statement. A chairperson of a department teaches at most one course in any semester. CREATE TABLE Teach(... CHECK ProfessorPID NOT IN ((SELECT ProfessorPID FROM Teach) INTERSECT (SELECT ChairmanPID FROM Departments) ) ); 18

19 Modifying Constraints SQL allows constraints to be named. Use CONSTRAINT followed by the name of the constraint in front of PRIMARY KEY, UNIQUE, or CHECK. Can use constraint names in ALTER TABLE statements to delete constraints: say DROP CONSTRAINT followed by the name of the constraint. Can add constraints in an ALTER TABLE statement using ADD CONSTRAINT followed by an op>onal name followed by the (required) CHECK statement. 19

20 Asser;ons These are database- schema elements, like rela>ons Defined by: CREATE ASSERTION <name> CHECK (<condi>on>); Condi>on may refer to any rela>on or a\ribute in the database schema. 20

21 Asser;ons: Example Can t have more courses than students ( Pigeonhole Principle ) CREATE ASSERTION FewStudents CHECK ( (SELECT COUNT(*) FROM Students) <= (SELECT COUNT(*) FROM Courses) ); 21

22 Triggers: Mo;va;on Asser>ons are powerful, but the DBMS ojen can t tell when they need to be checked. A\ribute- and tuple- based checks are checked at known >mesà but not powerful. Triggers let the user decide when to check for any condi>on. 22

23 OK, what could have been done? 23

24 Announcements Check updated webpage More details on Grading scheme Homework dates Project deliverables Lecture Schedule 24

25 STORING DATA 25

26 DBMS Layers: Queries Query Optimization and Execution Relational Operators Files and Access Methods TODAY! Buffer Management Disk Space Management DB 26

27 Leverage OS for disk/file management? Layers of abstrac>on are good but: 27

28 Leverage OS for disk/file management? Layers of abstrac>on are good but: Unfortunately, OS ojen gets in the way of DBMS 28

29 Leverage OS for disk/file management? DBMS wants/needs to do things its own way Specialized prefetching Control over buffer replacement policy LRU not always best (some>mes worst!!) Control over thread/process scheduling Convoy problem Arises when OS scheduling conflicts with DBMS locking Control over flushing data to disk WAL protocol requires flushing log entries to disk 29

30 Disks and Files DBMS stores informa>on on disks. but: disks are (rela>vely) VERY slow! Major implica>ons for DBMS design! 30

31 Disks and Files Major implica>ons for DBMS design: READ: disk - > main memory (RAM). WRITE: reverse Both are high- cost opera>ons, rela>ve to in- memory opera>ons, so must be planned carefully! 31

32 Why Not Store It All in Main Memory? 32

33 Why Not Store It All in Main Memory? Costs too much. disk: ~$1/Gb; memory: ~$100/Gb High- end Databases today in the TB range. Approx 60% of the cost of a produc>on system is in the disks. Main memory is volable. Note: some specialized systems do store en>re database in main memory. 33

34 The Storage Hierarchy Smaller, Faster Bigger, Slower 34

35 Main memory (RAM) for currently used data. Disk for the main database (secondary storage). Tapes for archiving older versions of the data (tertiary storage). The Storage Hierarchy Registers L1 Cache. Main Memory Magnetic Disk Magnetic Tape Smaller, Faster Bigger, Slower 35

36 Jim Gray s Storage Latency Analogy: How Far Away is the Data? 10 9 Tape Andromeda The image cannot be displayed. Your computer The image may not have enough memory to cannot open the be image, or the image may have been displayed. corrupted. Your Restart your computer, and computer then open may the file again. If the red x still not appears, have you may have to delete the image and then insert it The again. image cannot be displayed. Your computer may not have enough 2,000 Years 10 6 Disk Pluto 2 Years 100 Memory Boston 1.5 hr On Board Cache On Chip Cache Registers This Building This Room My Head 10 min 1 min 36

37 Disks Secondary storage device of choice. Main advantage over tapes: random access vs. sequenbal. Data is stored and retrieved in units called disk blocks or pages. Unlike RAM, >me to retrieve a disk page varies depending upon loca>on on disk. rela>ve placement of pages on disk is important! 37

38 Anatomy of a Disk Disk head Spindle Tracks Sector Track Cylinder Platter Block size = multiple of sector size (which is fixed) Arm assembly Arm movement Platters Sector #38

39 Accessing a Disk Page Time to access (read/write) a disk block:... 39

40 Accessing a Disk Page Time to access (read/write) a disk block: seek Bme: moving arms to posi>on disk head on track rotabonal delay: wai>ng for block to rotate under head transfer Bme: actually moving data to/from disk surface 40

41 Accessing a Disk Page Rela>ve >mes? seek Bme: rotabonal delay: transfer Bme: 41

42 Accessing a Disk Page Rela>ve >mes? seek Bme: about 1 to 20msec rotabonal delay: 0 to 10msec transfer Bme: < 1msec per 4KB page Seek Rotate transfer Transfer 42

43 Seek ;me & rota;onal delay dominate Key to lower I/O cost: reduce seek/rota>on delays! Also note: For shared disks, much >me spent wai>ng in queue for access to arm/controller Seek Rotate transfer 43

44 Arranging Pages on Disk Next block concept: blocks on same track, followed by blocks on same cylinder, followed by blocks on adjacent cylinder Accesing next block is cheap A useful op>miza>on: pre- fetching See textbook page

45 Rules of thumb 1. Memory access much faster than disk I/O (~ 1000x) Sequen>al I/O faster than random I/O (~ 10x) 45

46 Disk Arrays: RAID Just FYI Logical Physical Benefits: Higher throughput (via data striping ) Longer MTTF (via redundancy) 46

47 Recall: DBMS Layers Queries Query Optimization and Execution Relational Operators Files and Access Methods TODAY! Buffer Management Disk Space Management DB 47

48 Buffer Management in a DBMS (copy of a) disk page free frame Page Requests from Higher Levels Just FYI buffer pool MAIN MEMORY DISK DB choice of frame dictated by replacement policy 48

49 Files FILE: A collec>on of pages, each containing a collec>on of records. Must support: insert/delete/modify record read a par>cular record (specified using record id) scan all records (possibly with some condi>ons on the records to be retrieved) 49

50 Alterna;ve File Organiza;ons Several alterna>ves (w/ trade- offs): Heap files: Suitable when typical access is a file scan retrieving all records. Sorted Files: Index File Organiza>ons: 50

51 Variable length records SLOTTED PAGE organiza>on - popular. occupied records... page header 51

52 Conclusions- - - Storing Memory hierarchy Disks: (>1000x slower) - thus pack info in blocks try to fetch nearby blocks (sequen>ally) Record organiza>on: Slo\ed page 52

53 TREE INDEXES 53

54 Declaring Indexes No standard! Typical syntax: CREATE INDEX StudentsInd ON Students(ID); CREATE INDEX CoursesInd ON Courses(Number, DeptName); 54

55 Types of Indexes Primary: index on a key Used to enforce constraints Secondary: index on non- key a\ribute Clustering: order of the rows in the data pages correspond to the order of the rows in the index Only one clustered index can exist in a given table Useful for range predicates Non- clustering: physical order not the same as index order 55

56 Using Indexes (1): Equality Searches Given a value v, the index takes us to only those tuples that have v in the a\ribute(s) of the index. E.g. (use CourseInd index) SELECT Enrollment FROM Courses WHERE Number = 4604 and DeptName = CS 56

57 Using Indexes (1): Equality Searches Given a value v, the index takes us to only those tuples that have v in the a\ribute(s) of the index. Can use Hashes, but see next 57

58 Using Indexes (2): Range Searches ``Find all students with gpa > 3.0 may be slow, even on sorted file Hashes not a good idea! What to do? Page 1 Page 2 Page 3 Page N Data File 58

59 Range Searches ``Find all students with gpa > 3.0 may be slow, even on sorted file Solu>on: Create an `index file. k1 k2 kn Index File Page 1 Page 2 Page 3 Page N Data File 59

60 Range Searches More details: if index file is small, do binary search there Otherwise?? k1 k2 kn Index File Page 1 Page 2 Page 3 Page N Data File 60

61 B- trees the most successful family of index schemes (B- trees, B+- trees, B*- trees) Can be used for primary/secondary, clustering/non- clustering index. balanced n- way search trees Original Paper: Rudolf Bayer and McCreight, E. M. Organiza>on and Maintenance of Large Ordered Indexes. Acta Informa>ca 1, ,

62 B- trees Eg., B- tree of order d=1: <6 6 >6 <9 9 >

63 B - tree proper;es: each node, in a B- tree of order d: Key order at most n=2d keys at least d keys (except root, which may have just 1 key) all leaves at the same level if number of pointers is k, then node has exactly p1 pn k- 1 keys (leaves are empty) v1 v2 v n- 1 63

64 Proper;es block aware nodes: each node is a disk page O(log (N)) for everything! (ins/del/search) typically, if d = , then 2-3 levels u>liza>on >= 50%, guaranteed; on average 69% 64

65 Queries Algo for exact match query? (eg., ssn=8?) <6 6 >6 <9 9 >

66 JAVA anima;on h\p://slady.net/java/bt/ 66

67 Queries Algo for exact match query? (eg., ssn=8?) <6 6 >6 <9 9 >

68 Queries Algo for exact match query? (eg., ssn=8?) <6 6 >6 <9 9 >

69 Queries Algo for exact match query? (eg., ssn=8?) <6 6 >6 <9 9 >

70 Queries Algo for exact match query? (eg., ssn=8?) <6 6 >6 <9 9 >9 H steps (= disk accesses)

71 Queries what about range queries? (eg., 5<salary<8) Proximity/ nearest neighbor searches? (eg., salary ~ 8 ) 71

72 Queries what about range queries? (eg., 5<salary<8) Proximity/ nearest neighbor searches? (eg., salary ~ 8 ) <6 6 9 >6 <9 >

73 Queries what about range queries? (eg., 5<salary<8) Proximity/ nearest neighbor searches? (eg., salary ~ 8 ) <6 6 9 >6 <9 >

74 Queries what about range queries? (eg., 5<salary<8) Proximity/ nearest neighbor searches? (eg., salary ~ 8 ) <6 6 9 >6 <9 >

75 Queries what about range queries? (eg., 5<salary<8) Proximity/ nearest neighbor searches? (eg., salary ~ 8 ) <6 6 9 >6 <9 >

76 Varia;ons How could we do even be\er than the B- trees above? 76

77 B+ trees - Mo;va;on B- tree print keys in sorted order: <6 6 >6 <9 9 >

78 B+ trees - Mo;va;on B- tree needs back- tracking how to avoid it? <6 6 >6 <9 9 >

79 B+ trees - Mo;va;on Stronger reason: for clustering index, data records are sca\ered: <6 6 >6 <9 9 >

80 Solu;on: B+ - trees facilitate sequen>al ops They string all leaf nodes together AND replicate keys from non- leaf nodes, to make sure every key appears at the leaf level (vital, for clustering index!) 80

81 B+ trees <6 6 >=6 <9 9 >=

82 B+ trees <6 6 9 Index Pages >=6 <9 >= Data Pages 82

83 B+ trees More details: next (and textbook) In short: on split at leaf level: COPY middle key upstairs at non- leaf level: push middle key upstairs (as in plain B- tree) 83

84 Example B+ Tree Search begins at root, and key comparisons direct it to a leaf Search for 5*, 15*, all data entries >= 24*... Root * 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39* Based on the search for 15*, we know it is not in the tree! 84

85 Inser;ng a Data Entry into a B+ Tree Find correct leaf L. Put data entry onto L. If L has enough space, done! Else, must split L (into L and a new node L2) Redistribute entries evenly, copy up middle key. parent node may overflow but then: push up middle key. Splits grow tree; root split increases height. 85

86 Example B+ Tree Inser;ng 30* Root * 3* 5* 7* 14* 16* 19* 20* 22* 23* 24* 27* 29* 86

87 Example B+ Tree Inser;ng 30* Root * 3* 5* 7* 14* 16* 19* 20* 22* 23* 24* 27* 29* 30* 87

88 Example B+ Tree - Inser;ng 8* Root * 3* 5* 7* 14* 16* 19* 20* 22* 23* 24* 27* 29* 88

89 Example B+ Tree - Inser;ng 8* Root * 3* 5* 7* 14* 16* 19* 20* 22* 23* 24* 27* 29* No Space 89

90 Example B+ Tree - Inser;ng 8* Root So Split! * 3* 5* 7* 14* 16* 19* 20* 22* 23* 24* 27* 29* * 3* 5* 14* 16* 19* 20* 22* 23* 24* 27* 5* 7* 8* 29* 90

91 Example B+ Tree - Inser;ng 8* Root So Split! * 3* 5* 7* 14* 16* 19* 20* 22* 23* 24* 27* 29* And then push middle UP * 3* 5* 14* 16* 19* 20* 22* 23* 24* 27* 5* 7* 8* 29* 91

92 Example B+ Tree - Inser;ng 8* Root * 3* 5* 7* 14* 16* 19* 20* 22* 23* 24* 27* 29* Final State <5 >= * 3* 14* 16* 19* 20* 22* 23* 24* 27* 5* 7* 8* 29* 92

93 Example B+ Tree - Inser;ng 21* Root * 3* 5* 7* 8* 14* 16* 19* 20* 22* 23* 24* 27* 29* 2* 3* 5* 7* 8* 14* 16* 19* 20* 22* 23* 24* 27* 29* 93

94 Example B+ Tree - Inser;ng 21* Root * 3* 5* 7* 8* 14* 16* 19* 20* 22* 23* 24* 27* 29* Root is Full, so split recursively 2* 3* 5* 7* 8* 14* 16* 19* 20* 21* 22* 23* 24* 27* 29* 94

95 Example B+ Tree: Recursive split Root * 3* 5* 7* 8* 14* 16* 19* 20* 21* 22* 23* 24* 27* 29* Notice that root was also split, increasing height. 95

96 Example: Data vs. Index Page Split leaf: copy non- leaf: push Data Page Split 2* 3* 5* 7* 8* 5 2* 3* 5* 7* 8* why not non- leaves? Index Page Split #96

97 Same Inser;ng 21*: The Deferred Root Split * 3* 5* 7* 8* 14* 16* 19* 20* 22* 23* 24* 27* 29* Note this has free space. So 97

98 Inser;ng 21*: The Deferred Split Root * 3* 5* 7* 8* 14* 16* 19* 20* 22* 23* 24* 27* 29* Root LEND keys to sibling, through PARENT! 2* 3* 5* 7* 8* 14* 16* 19* 20* 21* 22* 23* 24* 27* 29* 98

99 Inser;ng 21*: The Deferred Split Root * 3* 5* 7* 8* 14* 16* 19* 20* 22* 23* 24* 27* 29* Root Shorter, more packed, faster tree * 3* 5* 7* 8* 14* 16* 19* 20* 21* 22* 23* 24* 27* 29* 99

100 Inser;on examples for you to try Root (not shown) 2* 3* 5* 7* 8* 11* 14* 16* 21* 22* 23* Insert the following data entries (in order): 28*, 6*, 25* 100

101 Answer After inserting 28*, 6* * 3* 5* 6* 7* 8* 11* 14* 16* 21* 22* 23* 28* After inserting 25* 101

102 Answer After inserting 25* * 3* 5* 6* 7* 8* 11* 14* 16* 21* 22* 23* 25* 28* 102

103 Dele;ng a Data Entry from a B+ Tree Start at root, find leaf L where entry belongs. Remove the entry. If L is at least half- full, done! If L underflows Try to re- distribute, borrowing from sibling (adjacent node with same parent as L). If re- distribu>on fails, merge L and sibling. update parent and possibly merge, recursively 103

104 Dele;on from B+Tree Root * 3* 5* 7* 8* 14* 16* 19* 20* 21* 22* 23* 24* 27* 29* 104

105 Example: Delete 19* & 20* Root Dele>ng 19* is easy: * 3* 5* 7* 8* 14* 16* 19* 20* 20* 22* 22* 24* 27* 29* 33* 34* 38* 39* 3 Root * 3* 5* 7* 8* 14* 16* 22* 24* 27* 29* 33* 34* 38* 39* Dele>ng 20* - > re- distribu>on (no>ce: 27 copied up) 105

106 ... And Then Dele;ng 24* 3 Root * 3* 5* 7* 8* 14* 16* 22* 24* 27* 29* 33* 34* 38* 39* 4 Root * 3* 5* 7* 8* 14* 16* 22* 27* 29* 33* 34* 38* 39* Must merge leaves: OPPOSITE of insert 106

107 ... And Then Dele;ng 24* 3 Root * 3* 5* 7* 8* 14* 16* 22* 24* 27* 29* 33* 34* 38* 39* 4 Root 17 but are we done?? * 3* 5* 7* 8* 14* 16* 22* 27* 29* 33* 34* 38* 39* Must merge leaves: OPPOSITE of insert 107

108 ... Merge Non- Leaf Nodes, Shrink Tree 4 Root * 3* 5* 7* 8* 14* 16* 22* 27* 29* 33* 34* 38* 39* 5 Root * 3* 5* 7* 8* 14* 16* 22* 27* 29* 33* 34* 38* 39* 108

109 Example of Non- leaf Re- distribu;on Tree is shown below during dele>on of 24*. Now, we can re- distribute keys Root * 3* 5* 7* 8* 14* 16* 17* 18* 20* 21* 22* 27* 29* 33* 34* 38* 39* 109

110 Aqer Re- distribu;on need only re- distribute 20 ; did 17, too why would we want to re- distribute more keys? Root * 3* 5* 7* 8* 14* 16* 17* 18* 20* 21* 22* 27* 29* 33* 34* 38* 39* 110

111 Main observa;ons for dele;on If a key value appears twice (leaf + nonleaf), the above algorithms delete it from the leaf, only why not non- leaf, too? 111

112 Main observa;ons for dele;on If a key value appears twice (leaf + nonleaf), the above algorithms delete it from the leaf, only why not non- leaf, too? lazy dele>ons - in fact, some vendors just mark entries as deleted (~ underflow), and reorganize/compact later 112

113 Recap: main ideas on overflow, split (and push, or copy ) or consider deferred split on underflow, borrow keys; or merge or let it underflow

114 B+ Trees in Prac;ce Typical order: 100. Typical fill- factor: 67%. average fanout = 2*100*0.67 = 134 Typical capaci>es: Height 4: 1334 = 312,900,721 entries Height 3: 1333 = 2,406,104 entries 114

115 B+ Trees in Prac;ce Can ojen keep top levels in buffer pool: Level 1 = 1 page = 8 KB Level 2 = 134 pages = 1 MB Level 3 = 17,956 pages = 140 MB 115

116 B+ trees with duplicates Everything so far: assumed unique key values How to extend B+- trees for duplicates? Alt. 2: <key, rid> Alt. 3: <key, {rid list}> 2 approaches, roughly equivalent 116

117 B+ trees with duplicates approach#1: repeat the key values, and extend B+ tree algo s appropriately - eg. many 14 s * 3* 5* 7* 13* 14* 14* 14* 14* 14* 22* 23* 24* 27* 29* 117

118 B+ trees with duplicates approach#1: subtle problem with dele>on: treat rid as part of the key, thus making it unique * 3* 5* 7* 13* 14* 14* 14* 14* 14* 22* 23* 24* 27* 29* 118

119 B+ trees with duplicates approach#2: store each key value: once but store the {rid list} as variable- length field (and use overflow pages, if needed) * 3* 5* 7* 13* 14* {rid list} 22* 23* 24* 27* 29* {rid list, cont d} 119

120 B+trees in Prac;ce prefix compression; bulk- loading; order 120

121 Prefix Key Compression Important to increase fan- out. (Why?) Key values in index entries only `direct traffic ; can ojen compress them. Papadopoulos Pernikovskaya 121

122 Prefix Key Compression Important to increase fan- out. (Why?) Key values in index entries only `direct traffic ; can ojen compress them. Pap Per <room for more separators/keys> 122

123 Bulk Loading of a B+ Tree In an empty tree, insert many keys Why not one- at- a- >me? Too slow! 123

124 Bulk Loading of a B+ Tree Ini>aliza>on: Sort all data entries scan list; whenever enough for a page, pack <repeat for upper level> Root Sorted pages of data entries; not yet in B+ tree 3* 4* 6* 9* 10* 11* 12* 13* 20* 22* 23* 31* 35* 36* 38* 41* 44* 124

125 Bulk Loading of a B+ Tree Root Data entry pages not yet in B+ tree 3* 4* 6* 9* 10* 11* 12* 13* 20* 22* 23* 31* 35* 36* 38* 41* 44* Root Data entry pages not yet in B+ tree Prakash * 4* 6* 9* VT 10* CS 11* * 13* 20* 22* 23* 31* 35* 36* 38* 41* 44* 125

126 A Note on `Order Order (d) concept replaced by physical space criterion in prac>ce (`at least half- full ). Why do we need it? Index pages can typically hold many more entries than leaf pages. Variable sized records and search keys mean different nodes will contain different numbers of entries. Even with fixed length fields, mul>ple records with the same search key value (duplicates) can lead to variable- sized data entries (if we use Alterna>ve (3)). 126

127 A Note on `Order Many real systems are even sloppier than this: they allow underflow, and only reclaim space when a page is completely empty. (what are the benefits of such slopiness?) 127

128 Conclusions B+tree is the prevailing indexing method Excellent, O(logN) worst- case performance for ins/del/search; (~3-4 disk accesses in prac>ce) guaranteed 50% space u>liza>on; avg 69% 128

129 Conclusions Can be used for any type of index: primary/ secondary, sparse (clustering), or dense (non- clustering) Several fine- extensions on the basic algorithm deferred split; prefix compression; (underflows) bulk- loading duplicate handling 129

CS 5614: (Big) Data Management Systems. B. Aditya Prakash Lecture #4: Constraints, Storing and Indexes

CS 5614: (Big) Data Management Systems. B. Aditya Prakash Lecture #4: Constraints, Storing and Indexes CS 5614: (Big) Data Management Systems B. Aditya Prakash Lecture #4: Constraints, Storing and Indexes Constraints in Rela;onal Algebra and SQL Maintaining Integrity of Data Data is dirty. How does an applica>on

More information

CS 4604: Introduc0on to Database Management Systems. B. Aditya Prakash Lecture #4: SQL---Part 2

CS 4604: Introduc0on to Database Management Systems. B. Aditya Prakash Lecture #4: SQL---Part 2 CS 4604: Introduc0on to Database Management Systems B. Aditya Prakash Lecture #4: SQL---Part 2 Overview - detailed - SQL DML other parts: views modifications joins DDL constraints Prakash 2016 VT CS 4604

More information

Database design and implementation CMPSCI 645. Lecture 08: Storage and Indexing

Database design and implementation CMPSCI 645. Lecture 08: Storage and Indexing Database design and implementation CMPSCI 645 Lecture 08: Storage and Indexing 1 Where is the data and how to get to it? DB 2 DBMS architecture Query Parser Query Rewriter Query Op=mizer Query Executor

More information

Problem. Indexing with B-trees. Indexing. Primary Key Indexing. B-trees: Example. B-trees. primary key indexing

Problem. Indexing with B-trees. Indexing. Primary Key Indexing. B-trees: Example. B-trees. primary key indexing 15-82 Advanced Topics in Database Systems Performance Problem Given a large collection of records, Indexing with B-trees find similar/interesting things, i.e., allow fast, approximate queries 2 Indexing

More information

Storing Data: Disks and Files

Storing Data: Disks and Files Storing Data: Disks and Files CS 186 Fall 2002, Lecture 15 (R&G Chapter 7) Yea, from the table of my memory I ll wipe away all trivial fond records. -- Shakespeare, Hamlet Stuff Rest of this week My office

More information

Disks and Files. Jim Gray s Storage Latency Analogy: How Far Away is the Data? Components of a Disk. Disks

Disks and Files. Jim Gray s Storage Latency Analogy: How Far Away is the Data? Components of a Disk. Disks Review Storing : Disks and Files Lecture 3 (R&G Chapter 9) Aren t bases Great? Relational model SQL Yea, from the table of my memory I ll wipe away all trivial fond records. -- Shakespeare, Hamlet A few

More information

CS 405G: Introduction to Database Systems. Storage

CS 405G: Introduction to Database Systems. Storage CS 405G: Introduction to Database Systems Storage It s all about disks! Outline That s why we always draw databases as And why the single most important metric in database processing is the number of disk

More information

Database Systems. November 2, 2011 Lecture #7. topobo (mit)

Database Systems. November 2, 2011 Lecture #7. topobo (mit) Database Systems November 2, 2011 Lecture #7 1 topobo (mit) 1 Announcement Assignment #2 due today Assignment #3 out today & due on 11/16. Midterm exam in class next week. Cover Chapters 1, 2,

More information

Storing Data: Disks and Files

Storing Data: Disks and Files Storing Data: Disks and Files Module 2, Lecture 1 Yea, from the table of my memory I ll wipe away all trivial fond records. -- Shakespeare, Hamlet Database Management Systems, R. Ramakrishnan 1 Disks and

More information

Storage hierarchy. Textbook: chapters 11, 12, and 13

Storage hierarchy. Textbook: chapters 11, 12, and 13 Storage hierarchy Cache Main memory Disk Tape Very fast Fast Slower Slow Very small Small Bigger Very big (KB) (MB) (GB) (TB) Built-in Expensive Cheap Dirt cheap Disks: data is stored on concentric circular

More information

Storing Data: Disks and Files

Storing Data: Disks and Files Storing Data: Disks and Files Chapter 9 CSE 4411: Database Management Systems 1 Disks and Files DBMS stores information on ( 'hard ') disks. This has major implications for DBMS design! READ: transfer

More information

Principles of Data Management. Lecture #2 (Storing Data: Disks and Files)

Principles of Data Management. Lecture #2 (Storing Data: Disks and Files) Principles of Data Management Lecture #2 (Storing Data: Disks and Files) Instructor: Mike Carey mjcarey@ics.uci.edu Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Today s Topics v Today

More information

Indexing. Jan Chomicki University at Buffalo. Jan Chomicki () Indexing 1 / 25

Indexing. Jan Chomicki University at Buffalo. Jan Chomicki () Indexing 1 / 25 Indexing Jan Chomicki University at Buffalo Jan Chomicki () Indexing 1 / 25 Storage hierarchy Cache Main memory Disk Tape Very fast Fast Slower Slow (nanosec) (10 nanosec) (millisec) (sec) Very small Small

More information

Disks, Memories & Buffer Management

Disks, Memories & Buffer Management Disks, Memories & Buffer Management The two offices of memory are collection and distribution. - Samuel Johnson CS3223 - Storage 1 What does a DBMS Store? Relations Actual data Indexes Data structures

More information

L9: Storage Manager Physical Data Organization

L9: Storage Manager Physical Data Organization L9: Storage Manager Physical Data Organization Disks and files Record and file organization Indexing Tree-based index: B+-tree Hash-based index c.f. Fig 1.3 in [RG] and Fig 2.3 in [EN] Functional Components

More information

Storing Data: Disks and Files. Storing and Retrieving Data. Why Not Store Everything in Main Memory? Database Management Systems need to:

Storing Data: Disks and Files. Storing and Retrieving Data. Why Not Store Everything in Main Memory? Database Management Systems need to: Storing : Disks and Files base Management System, R. Ramakrishnan and J. Gehrke 1 Storing and Retrieving base Management Systems need to: Store large volumes of data Store data reliably (so that data is

More information

Storing and Retrieving Data. Storing Data: Disks and Files. Solution 1: Techniques for making disks faster. Disks. Why Not Store Everything in Tapes?

Storing and Retrieving Data. Storing Data: Disks and Files. Solution 1: Techniques for making disks faster. Disks. Why Not Store Everything in Tapes? Storing and Retrieving Storing : Disks and Files base Management Systems need to: Store large volumes of data Store data reliably (so that data is not lost!) Retrieve data efficiently Alternatives for

More information

Storing and Retrieving Data. Storing Data: Disks and Files. Solution 1: Techniques for making disks faster. Disks. Why Not Store Everything in Tapes?

Storing and Retrieving Data. Storing Data: Disks and Files. Solution 1: Techniques for making disks faster. Disks. Why Not Store Everything in Tapes? Storing and Retrieving Storing : Disks and Files Chapter 9 base Management Systems need to: Store large volumes of data Store data reliably (so that data is not lost!) Retrieve data efficiently Alternatives

More information

Storing Data: Disks and Files. Storing and Retrieving Data. Why Not Store Everything in Main Memory? Chapter 7

Storing Data: Disks and Files. Storing and Retrieving Data. Why Not Store Everything in Main Memory? Chapter 7 Storing : Disks and Files Chapter 7 base Management Systems, R. Ramakrishnan and J. Gehrke 1 Storing and Retrieving base Management Systems need to: Store large volumes of data Store data reliably (so

More information

STORING DATA: DISK AND FILES

STORING DATA: DISK AND FILES STORING DATA: DISK AND FILES CS 564- Spring 2018 ACKs: Dan Suciu, Jignesh Patel, AnHai Doan WHAT IS THIS LECTURE ABOUT? How does a DBMS store data? disk, SSD, main memory The Buffer manager controls how

More information

Disks and Files. Storage Structures Introduction Chapter 8 (3 rd edition) Why Not Store Everything in Main Memory?

Disks and Files. Storage Structures Introduction Chapter 8 (3 rd edition) Why Not Store Everything in Main Memory? Why Not Store Everything in Main Memory? Storage Structures Introduction Chapter 8 (3 rd edition) Sharma Chakravarthy UT Arlington sharma@cse.uta.edu base Management Systems: Sharma Chakravarthy Costs

More information

Disks & Files. Yanlei Diao UMass Amherst. Slides Courtesy of R. Ramakrishnan and J. Gehrke

Disks & Files. Yanlei Diao UMass Amherst. Slides Courtesy of R. Ramakrishnan and J. Gehrke Disks & Files Yanlei Diao UMass Amherst Slides Courtesy of R. Ramakrishnan and J. Gehrke DBMS Architecture Query Parser Query Rewriter Query Optimizer Query Executor Lock Manager for Concurrency Access

More information

Storing Data: Disks and Files

Storing Data: Disks and Files Storing Data: Disks and Files Chapter 7 (2 nd edition) Chapter 9 (3 rd edition) Yea, from the table of my memory I ll wipe away all trivial fond records. -- Shakespeare, Hamlet Database Management Systems,

More information

Introduction to Data Management. Lecture #13 (Indexing)

Introduction to Data Management. Lecture #13 (Indexing) Introduction to Data Management Lecture #13 (Indexing) Instructor: Mike Carey mjcarey@ics.uci.edu Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Announcements v Homework info: HW #5 (SQL):

More information

Storing Data: Disks and Files

Storing Data: Disks and Files Storing Data: Disks and Files Chapter 9 Yea, from the table of my memory I ll wipe away all trivial fond records. -- Shakespeare, Hamlet Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke Disks

More information

Data Organization B trees

Data Organization B trees Data Organization B trees Data organization and retrieval File organization can improve data retrieval time SELECT * FROM depositors WHERE bname= Downtown 100 blocks 200 recs/block Query returns 150 records

More information

B-Tree. CS127 TAs. ** the best data structure ever

B-Tree. CS127 TAs. ** the best data structure ever B-Tree CS127 TAs ** the best data structure ever Storage Types Cache Fastest/most costly; volatile; Main Memory Fast access; too small for entire db; volatile Disk Long-term storage of data; random access;

More information

Announcements. Reading Material. Recap. Today 9/17/17. Storage (contd. from Lecture 6)

Announcements. Reading Material. Recap. Today 9/17/17. Storage (contd. from Lecture 6) CompSci 16 Intensive Computing Systems Lecture 7 Storage and Index Instructor: Sudeepa Roy Announcements HW1 deadline this week: Due on 09/21 (Thurs), 11: pm, no late days Project proposal deadline: Preliminary

More information

CS122A: Introduction to Data Management. Lecture #14: Indexing. Instructor: Chen Li

CS122A: Introduction to Data Management. Lecture #14: Indexing. Instructor: Chen Li CS122A: Introduction to Data Management Lecture #14: Indexing Instructor: Chen Li 1 Indexing in MySQL (w/innodb) CREATE [UNIQUE FULLTEXT SPATIAL] INDEX index_name [index_type] ON tbl_name (index_col_name,...)

More information

Introduction to Data Management. Lecture 14 (Storage and Indexing)

Introduction to Data Management. Lecture 14 (Storage and Indexing) Introduction to Data Management Lecture 14 (Storage and Indexing) Instructor: Mike Carey mjcarey@ics.uci.edu Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Announcements v HW s and quizzes:

More information

Storing Data: Disks and Files. Administrivia (part 2 of 2) Review. Disks, Memory, and Files. Disks and Files. Lecture 3 (R&G Chapter 7)

Storing Data: Disks and Files. Administrivia (part 2 of 2) Review. Disks, Memory, and Files. Disks and Files. Lecture 3 (R&G Chapter 7) Storing : Disks and Files Yea, from the table of my memory I ll wipe away all trivial fond records. -- Shakespeare, Hamlet Lecture 3 (R&G Chapter 7) Administrivia Greetings Office Hours Prof. Franklin

More information

THE B+ TREE INDEX. CS 564- Spring ACKs: Jignesh Patel, AnHai Doan

THE B+ TREE INDEX. CS 564- Spring ACKs: Jignesh Patel, AnHai Doan THE B+ TREE INDEX CS 564- Spring 2018 ACKs: Jignesh Patel, AnHai Doan WHAT IS THIS LECTURE ABOUT? The B+ tree index Basics Search/Insertion/Deletion Design & Cost 2 INDEX RECAP We have the following query:

More information

User Perspective. Module III: System Perspective. Module III: Topics Covered. Module III Overview of Storage Structures, QP, and TM

User Perspective. Module III: System Perspective. Module III: Topics Covered. Module III Overview of Storage Structures, QP, and TM Module III Overview of Storage Structures, QP, and TM Sharma Chakravarthy UT Arlington sharma@cse.uta.edu http://www2.uta.edu/sharma base Management Systems: Sharma Chakravarthy Module I Requirements analysis

More information

Review 1-- Storing Data: Disks and Files

Review 1-- Storing Data: Disks and Files Review 1-- Storing Data: Disks and Files Chapter 9 [Sections 9.1-9.7: Ramakrishnan & Gehrke (Text)] AND (Chapter 11 [Sections 11.1, 11.3, 11.6, 11.7: Garcia-Molina et al. (R2)] OR Chapter 2 [Sections 2.1,

More information

Tree-Structured Indexes

Tree-Structured Indexes Tree-Structured Indexes CS 186, Fall 2002, Lecture 17 R & G Chapter 9 If I had eight hours to chop down a tree, I'd spend six sharpening my ax. Abraham Lincoln Introduction Recall: 3 alternatives for data

More information

System Structure Revisited

System Structure Revisited System Structure Revisited Naïve users Casual users Application programmers Database administrator Forms DBMS Application Front ends DML Interface CLI DDL SQL Commands Query Evaluation Engine Transaction

More information

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2009 Lecture 6 - Storage and Indexing

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2009 Lecture 6 - Storage and Indexing CSE 544 Principles of Database Management Systems Magdalena Balazinska Winter 2009 Lecture 6 - Storage and Indexing References Generalized Search Trees for Database Systems. J. M. Hellerstein, J. F. Naughton

More information

Administrivia. Tree-Structured Indexes. Review. Today: B-Tree Indexes. A Note of Caution. Introduction

Administrivia. Tree-Structured Indexes. Review. Today: B-Tree Indexes. A Note of Caution. Introduction Administrivia Tree-Structured Indexes Lecture R & G Chapter 9 Homeworks re-arranged Midterm Exam Graded Scores on-line Key available on-line If I had eight hours to chop down a tree, I'd spend six sharpening

More information

Kathleen Durant PhD Northeastern University CS Indexes

Kathleen Durant PhD Northeastern University CS Indexes Kathleen Durant PhD Northeastern University CS 3200 Indexes Outline for the day Index definition Types of indexes B+ trees ISAM Hash index Choosing indexed fields Indexes in InnoDB 2 Indexes A typical

More information

Topics to Learn. Important concepts. Tree-based index. Hash-based index

Topics to Learn. Important concepts. Tree-based index. Hash-based index CS143: Index 1 Topics to Learn Important concepts Dense index vs. sparse index Primary index vs. secondary index (= clustering index vs. non-clustering index) Tree-based vs. hash-based index Tree-based

More information

CSE 444: Database Internals. Lectures 5-6 Indexing

CSE 444: Database Internals. Lectures 5-6 Indexing CSE 444: Database Internals Lectures 5-6 Indexing 1 Announcements HW1 due tonight by 11pm Turn in an electronic copy (word/pdf) by 11pm, or Turn in a hard copy in my office by 4pm Lab1 is due Friday, 11pm

More information

Storing Data: Disks and Files

Storing Data: Disks and Files Storing Data: Disks and Files Yea, from the table of my memory I ll wipe away all trivial fond records. -- Shakespeare, Hamlet Data Access Disks and Files DBMS stores information on ( hard ) disks. This

More information

Indexing. Chapter 8, 10, 11. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1

Indexing. Chapter 8, 10, 11. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Indexing Chapter 8, 10, 11 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Tree-Based Indexing The data entries are arranged in sorted order by search key value. A hierarchical search

More information

CSE 544 Principles of Database Management Systems

CSE 544 Principles of Database Management Systems CSE 544 Principles of Database Management Systems Alvin Cheung Fall 2015 Lecture 5 - DBMS Architecture and Indexing 1 Announcements HW1 is due next Thursday How is it going? Projects: Proposals are due

More information

Database Applications (15-415)

Database Applications (15-415) Database Applications (15-415) DBMS Internals: Part II Lecture 10, February 17, 2014 Mohammad Hammoud Last Session: DBMS Internals- Part I Today Today s Session: DBMS Internals- Part II Brief summaries

More information

CS143: Index. Book Chapters: (4 th ) , (5 th ) , , 12.10

CS143: Index. Book Chapters: (4 th ) , (5 th ) , , 12.10 CS143: Index Book Chapters: (4 th ) 12.1-3, 12.5-8 (5 th ) 12.1-3, 12.6-8, 12.10 1 Topics to Learn Important concepts Dense index vs. sparse index Primary index vs. secondary index (= clustering index

More information

Tree-Structured Indexes. Chapter 10

Tree-Structured Indexes. Chapter 10 Tree-Structured Indexes Chapter 10 1 Introduction As for any index, 3 alternatives for data entries k*: Data record with key value k 25, [n1,v1,k1,25] 25,

More information

ECS 165B: Database System Implementa6on Lecture 3

ECS 165B: Database System Implementa6on Lecture 3 ECS 165B: Database System Implementa6on Lecture 3 UC Davis April 4, 2011 Acknowledgements: some slides based on earlier ones by Raghu Ramakrishnan, Johannes Gehrke, Jennifer Widom, Bertram Ludaescher,

More information

Storing Data: Disks and Files

Storing Data: Disks and Files Storing Data: Disks and Files Lecture 3 (R&G Chapter 7) Yea, from the table of my memory I ll wipe away all trivial fond records. -- Shakespeare, Hamlet Administrivia Greetings Office Hours Prof. Franklin

More information

Tree-Structured Indexes

Tree-Structured Indexes Tree-Structured Indexes Yanlei Diao UMass Amherst Slides Courtesy of R. Ramakrishnan and J. Gehrke Access Methods v File of records: Abstraction of disk storage for query processing (1) Sequential scan;

More information

Parser. Select R.text from Report R, Weather W where W.image.rain() and W.city = R.city and W.date = R.date and R.text.

Parser. Select R.text from Report R, Weather W where W.image.rain() and W.city = R.city and W.date = R.date and R.text. Select R.text from Report R, Weather W where W.image.rain() and W.city = R.city and W.date = R.date and R.text. Lifecycle of an SQL Query CSE 190D base System Implementation Arun Kumar Query Query Result

More information

Spring 2017 B-TREES (LOOSELY BASED ON THE COW BOOK: CH. 10) 1/29/17 CS 564: Database Management Systems, Jignesh M. Patel 1

Spring 2017 B-TREES (LOOSELY BASED ON THE COW BOOK: CH. 10) 1/29/17 CS 564: Database Management Systems, Jignesh M. Patel 1 Spring 2017 B-TREES (LOOSELY BASED ON THE COW BOOK: CH. 10) 1/29/17 CS 564: Database Management Systems, Jignesh M. Patel 1 Consider the following table: Motivation CREATE TABLE Tweets ( uniquemsgid INTEGER,

More information

Tree-Structured Indexes

Tree-Structured Indexes Tree-Structured Indexes Chapter 9 Database Management Systems, R. Ramakrishnan and J. Gehrke 1 Introduction As for any index, 3 alternatives for data entries k*: Data record with key value k

More information

Systems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2014/15

Systems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2014/15 Systems Infrastructure for Data Science Web Science Group Uni Freiburg WS 2014/15 Lecture II: Indexing Part I of this course Indexing 3 Database File Organization and Indexing Remember: Database tables

More information

Principles of Data Management. Lecture #5 (Tree-Based Index Structures)

Principles of Data Management. Lecture #5 (Tree-Based Index Structures) Principles of Data Management Lecture #5 (Tree-Based Index Structures) Instructor: Mike Carey mjcarey@ics.uci.edu Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Today s Headlines v Project

More information

Tree-Structured Indexes

Tree-Structured Indexes Tree-Structured Indexes Chapter 10 Comp 521 Files and Databases Fall 2010 1 Introduction As for any index, 3 alternatives for data entries k*: index refers to actual data record with key value k index

More information

Material You Need to Know

Material You Need to Know Review Quiz 2 Material You Need to Know Normalization Storage and Disk File Layout Indexing B-trees and B+ Trees Extensible Hashing Linear Hashing Decomposition Goals: Lossless Joins, Dependency preservation

More information

Lecture 8 Index (B+-Tree and Hash)

Lecture 8 Index (B+-Tree and Hash) CompSci 516 Data Intensive Computing Systems Lecture 8 Index (B+-Tree and Hash) Instructor: Sudeepa Roy Duke CS, Fall 2017 CompSci 516: Database Systems 1 HW1 due tomorrow: Announcements Due on 09/21 (Thurs),

More information

File Structures and Indexing

File Structures and Indexing File Structures and Indexing CPS352: Database Systems Simon Miner Gordon College Last Revised: 10/11/12 Agenda Check-in Database File Structures Indexing Database Design Tips Check-in Database File Structures

More information

CSE 190D Database System Implementation

CSE 190D Database System Implementation CSE 190D Database System Implementation Arun Kumar Topic 1: Data Storage, Buffer Management, and File Organization Chapters 8 and 9 (except 8.5.4 and 9.2) of Cow Book Slide ACKs: Jignesh Patel, Paris Koutris

More information

CMSC424: Database Design. Instructor: Amol Deshpande

CMSC424: Database Design. Instructor: Amol Deshpande CMSC424: Database Design Instructor: Amol Deshpande amol@cs.umd.edu Databases Data Models Conceptual representa1on of the data Data Retrieval How to ask ques1ons of the database How to answer those ques1ons

More information

Tree-Structured Indexes. A Note of Caution. Range Searches ISAM. Example ISAM Tree. Introduction

Tree-Structured Indexes. A Note of Caution. Range Searches ISAM. Example ISAM Tree. Introduction Tree-Structured Indexes Lecture R & G Chapter 9 If I had eight hours to chop down a tree, I'd spend six sharpening my ax. Abraham Lincoln Introduction Recall: 3 alternatives for data entries k*: Data record

More information

CS 5614: (Big) Data Management Systems. B. Aditya Prakash Lecture #5: Hashing and Sor5ng

CS 5614: (Big) Data Management Systems. B. Aditya Prakash Lecture #5: Hashing and Sor5ng CS 5614: (Big) Data Management Systems B. Aditya Prakash Lecture #5: Hashing and Sor5ng (Sta7c) Hashing Problem: find EMP record with ssn=123 What if disk space was free, and 5me was at premium? Prakash

More information

Lecture 13. Lecture 13: B+ Tree

Lecture 13. Lecture 13: B+ Tree Lecture 13 Lecture 13: B+ Tree Lecture 13 Announcements 1. Project Part 2 extension till Friday 2. Project Part 3: B+ Tree coming out Friday 3. Poll for Nov 22nd 4. Exam Pickup: If you have questions,

More information

Project is due on March 11, 2003 Final Examination March 18, pm to 10.30pm

Project is due on March 11, 2003 Final Examination March 18, pm to 10.30pm Announcements Please remember to send a mail to Deepa to register for a timeslot for your project demo by March 6, 2003 See Project Guidelines on class web page for more details Project is due on March

More information

Physical Disk Structure. Physical Data Organization and Indexing. Pages and Blocks. Access Path. I/O Time to Access a Page. Disks.

Physical Disk Structure. Physical Data Organization and Indexing. Pages and Blocks. Access Path. I/O Time to Access a Page. Disks. Physical Disk Structure Physical Data Organization and Indexing Chapter 11 1 4 Access Path Refers to the algorithm + data structure (e.g., an index) used for retrieving and storing data in a table The

More information

Tree-Structured Indexes ISAM. Range Searches. Comments on ISAM. Example ISAM Tree. Introduction. As for any index, 3 alternatives for data entries k*:

Tree-Structured Indexes ISAM. Range Searches. Comments on ISAM. Example ISAM Tree. Introduction. As for any index, 3 alternatives for data entries k*: Introduction Tree-Structured Indexes Chapter 10 As for any index, 3 alternatives for data entries k*: Data record with key value k

More information

Why Is This Important? Overview of Storage and Indexing. Components of a Disk. Data on External Storage. Accessing a Disk Page. Records on a Disk Page

Why Is This Important? Overview of Storage and Indexing. Components of a Disk. Data on External Storage. Accessing a Disk Page. Records on a Disk Page Why Is This Important? Overview of Storage and Indexing Chapter 8 DB performance depends on time it takes to get the data from storage system and time to process Choosing the right index for faster access

More information

Information Systems (Informationssysteme)

Information Systems (Informationssysteme) Information Systems (Informationssysteme) Jens Teubner, TU Dortmund jens.teubner@cs.tu-dortmund.de Summer 2018 c Jens Teubner Information Systems Summer 2018 1 Part IX B-Trees c Jens Teubner Information

More information

Data Storage and Query Answering. Data Storage and Disk Structure (2)

Data Storage and Query Answering. Data Storage and Disk Structure (2) Data Storage and Query Answering Data Storage and Disk Structure (2) Review: The Memory Hierarchy Swapping, Main-memory DBMS s Tertiary Storage: Tape, Network Backup 3,200 MB/s (DDR-SDRAM @200MHz) 6,400

More information

Database Applications (15-415)

Database Applications (15-415) Database Applications (15-415) DBMS Internals: Part II Lecture 11, February 17, 2015 Mohammad Hammoud Last Session: DBMS Internals- Part I Today Today s Session: DBMS Internals- Part II A Brief Summary

More information

Instructor: Amol Deshpande

Instructor: Amol Deshpande Instructor: Amol Deshpande amol@cs.umd.edu } Storage and Query Processing Storage and memory hierarchy Query plans and how to interpret EXPLAIN output } Other things Midterm grades: later today Look out

More information

CSIT5300: Advanced Database Systems

CSIT5300: Advanced Database Systems CSIT5300: Advanced Database Systems L08: B + -trees and Dynamic Hashing Dr. Kenneth LEUNG Department of Computer Science and Engineering The Hong Kong University of Science and Technology Hong Kong SAR,

More information

Tree-Structured Indexes

Tree-Structured Indexes Tree-Structured Indexes Chapter 9 Database Management Systems, R. Ramakrishnan and J. Gehrke 1 Introduction As for any index, 3 alternatives for data entries k*: ➀ Data record with key value k ➁

More information

CMSC424: Database Design. Instructor: Amol Deshpande

CMSC424: Database Design. Instructor: Amol Deshpande CMSC424: Database Design Instructor: Amol Deshpande amol@cs.umd.edu Databases Data Models Conceptual representa1on of the data Data Retrieval How to ask ques1ons of the database How to answer those ques1ons

More information

CS 5614: (Big) Data Management Systems. B. Aditya Prakash Lecture #2: The Rela0onal Model, and SQL/Rela0onal Algebra

CS 5614: (Big) Data Management Systems. B. Aditya Prakash Lecture #2: The Rela0onal Model, and SQL/Rela0onal Algebra CS 5614: (Big) Data Management Systems B. Aditya Prakash Lecture #2: The Rela0onal Model, and SQL/Rela0onal Algebra Data Model A Data Model is a nota0on for describing data or informa0on. Structure of

More information

Database Applications (15-415)

Database Applications (15-415) Database Applications (15-415) DBMS Internals- Part V Lecture 13, March 10, 2014 Mohammad Hammoud Today Welcome Back from Spring Break! Today Last Session: DBMS Internals- Part IV Tree-based (i.e., B+

More information

Tree-Structured Indexes (Brass Tacks)

Tree-Structured Indexes (Brass Tacks) Tree-Structured Indexes (Brass Tacks) Chapter 10 Ramakrishnan and Gehrke (Sections 10.3-10.8) CPSC 404, Laks V.S. Lakshmanan 1 What will I learn from this set of lectures? How do B+trees work (for search)?

More information

CARNEGIE MELLON UNIVERSITY DEPT. OF COMPUTER SCIENCE DATABASE APPLICATIONS

CARNEGIE MELLON UNIVERSITY DEPT. OF COMPUTER SCIENCE DATABASE APPLICATIONS CARNEGIE MELLON UNIVERSITY DEPT. OF COMPUTER SCIENCE 15-415 DATABASE APPLICATIONS C. Faloutsos Indexing and Hashing 15-415 Database Applications http://www.cs.cmu.edu/~christos/courses/dbms.s00/ general

More information

CPSC 421 Database Management Systems. Lecture 11: Storage and File Organization

CPSC 421 Database Management Systems. Lecture 11: Storage and File Organization CPSC 421 Database Management Systems Lecture 11: Storage and File Organization * Some material adapted from R. Ramakrishnan, L. Delcambre, and B. Ludaescher Today s Agenda Start on Database Internals:

More information

Chapter 11: Indexing and Hashing

Chapter 11: Indexing and Hashing Chapter 11: Indexing and Hashing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 11: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files B-Tree

More information

Outlines. Chapter 2 Storage Structure. Structure of a DBMS (with some simplification) Structure of a DBMS (with some simplification)

Outlines. Chapter 2 Storage Structure. Structure of a DBMS (with some simplification) Structure of a DBMS (with some simplification) Outlines Chapter 2 Storage Structure Instructor: Churee Techawut 1) Structure of a DBMS 2) The memory hierarchy 3) Magnetic tapes 4) Magnetic disks 5) RAID 6) Disk space management 7) Buffer management

More information

Chapter 11: Indexing and Hashing

Chapter 11: Indexing and Hashing Chapter 11: Indexing and Hashing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 11: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files B-Tree

More information

Unit 3 Disk Scheduling, Records, Files, Metadata

Unit 3 Disk Scheduling, Records, Files, Metadata Unit 3 Disk Scheduling, Records, Files, Metadata Based on Ramakrishnan & Gehrke (text) : Sections 9.3-9.3.2 & 9.5-9.7.2 (pages 316-318 and 324-333); Sections 8.2-8.2.2 (pages 274-278); Section 12.1 (pages

More information

Tree-Structured Indexes

Tree-Structured Indexes Introduction Tree-Structured Indexes Chapter 10 As for any index, 3 alternatives for data entries k*: Data record with key value k

More information

Physical Data Organization. Introduction to Databases CompSci 316 Fall 2018

Physical Data Organization. Introduction to Databases CompSci 316 Fall 2018 Physical Data Organization Introduction to Databases CompSci 316 Fall 2018 2 Announcements (Tue., Nov. 6) Homework #3 due today Project milestone #2 due Thursday No separate progress update this week Use

More information

10.1 Physical Design: Introduction. 10 Physical schema design. Physical Design: I/O cost. Physical Design: I/O cost.

10.1 Physical Design: Introduction. 10 Physical schema design. Physical Design: I/O cost. Physical Design: I/O cost. 10 Physical schema design 10.1 Introduction Motivation Disk technology RAID 10.2 Index structures in DBS Indexing concept Primary and Secondary indexes 10.3. ISAM and B + -Trees 10.4. SQL and indexes Criteria

More information

Lecture 12. Lecture 12: The IO Model & External Sorting

Lecture 12. Lecture 12: The IO Model & External Sorting Lecture 12 Lecture 12: The IO Model & External Sorting Lecture 12 Today s Lecture 1. The Buffer 2. External Merge Sort 2 Lecture 12 > Section 1 1. The Buffer 3 Lecture 12 > Section 1 Transition to Mechanisms

More information

Goals for Today. CS 133: Databases. Relational Model. Multi-Relation Queries. Reason about the conceptual evaluation of an SQL query

Goals for Today. CS 133: Databases. Relational Model. Multi-Relation Queries. Reason about the conceptual evaluation of an SQL query Goals for Today CS 133: Databases Fall 2018 Lec 02 09/06 Relational Model & Memory and Buffer Manager Prof. Beth Trushkowsky Reason about the conceptual evaluation of an SQL query Understand the storage

More information

RAID in Practice, Overview of Indexing

RAID in Practice, Overview of Indexing RAID in Practice, Overview of Indexing CS634 Lecture 4, Feb 04 2014 Slides based on Database Management Systems 3 rd ed, Ramakrishnan and Gehrke 1 Disks and Files: RAID in practice For a big enterprise

More information

Database System Concepts, 6 th Ed. Silberschatz, Korth and Sudarshan See for conditions on re-use

Database System Concepts, 6 th Ed. Silberschatz, Korth and Sudarshan See  for conditions on re-use Chapter 11: Indexing and Hashing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files Static

More information

Why Sort? Data requested in sorted order. Sor,ng is first step in bulk loading B+ tree index. e.g., find students in increasing GPA order

Why Sort? Data requested in sorted order. Sor,ng is first step in bulk loading B+ tree index. e.g., find students in increasing GPA order External Sor,ng Outline Exam will be graded a5er everyone takes it There are two,mes to be fair on an exam, when it s wriaen and when it s graded you only need to trust that I ll be fair at one of them.

More information

EXTERNAL SORTING. Sorting

EXTERNAL SORTING. Sorting EXTERNAL SORTING 1 Sorting A classic problem in computer science! Data requested in sorted order (sorted output) e.g., find students in increasing grade point average (gpa) order SELECT A, B, C FROM R

More information

Database Architecture 2 & Storage. Instructor: Matei Zaharia cs245.stanford.edu

Database Architecture 2 & Storage. Instructor: Matei Zaharia cs245.stanford.edu Database Architecture 2 & Storage Instructor: Matei Zaharia cs245.stanford.edu Summary from Last Time System R mostly matched the architecture of a modern RDBMS» SQL» Many storage & access methods» Cost-based

More information

Indexes. File Organizations and Indexing. First Question to Ask About Indexes. Index Breakdown. Alternatives for Data Entries (Contd.

Indexes. File Organizations and Indexing. First Question to Ask About Indexes. Index Breakdown. Alternatives for Data Entries (Contd. File Organizations and Indexing Lecture 4 R&G Chapter 8 "If you don't find it in the index, look very carefully through the entire catalogue." -- Sears, Roebuck, and Co., Consumer's Guide, 1897 Indexes

More information

CSC 261/461 Database Systems Lecture 17. Fall 2017

CSC 261/461 Database Systems Lecture 17. Fall 2017 CSC 261/461 Database Systems Lecture 17 Fall 2017 Announcement Quiz 6 Due: Tonight at 11:59 pm Project 1 Milepost 3 Due: Nov 10 Project 2 Part 2 (Optional) Due: Nov 15 The IO Model & External Sorting Today

More information

CS 4604: Introduc0on to Database Management Systems. B. Aditya Prakash Lecture #3: SQL and Rela2onal Algebra- - - Part 1

CS 4604: Introduc0on to Database Management Systems. B. Aditya Prakash Lecture #3: SQL and Rela2onal Algebra- - - Part 1 CS 4604: Introduc0on to Database Management Systems B. Aditya Prakash Lecture #3: SQL and Rela2onal Algebra- - - Part 1 Reminder: Rela0onal Algebra Rela2onal algebra is a nota2on for specifying queries

More information

Lecture 4. ISAM and B + -trees. Database Systems. Tree-Structured Indexing. Binary Search ISAM. B + -trees

Lecture 4. ISAM and B + -trees. Database Systems. Tree-Structured Indexing. Binary Search ISAM. B + -trees Lecture 4 and Database Systems Binary Multi-Level Efficiency Partitioned 1 Ordered Files and Binary How could we prepare for such queries and evaluate them efficiently? 1 SELECT * 2 FROM CUSTOMERS 3 WHERE

More information

CS542. Algorithms on Secondary Storage Sorting Chapter 13. Professor E. Rundensteiner. Worcester Polytechnic Institute

CS542. Algorithms on Secondary Storage Sorting Chapter 13. Professor E. Rundensteiner. Worcester Polytechnic Institute CS542 Algorithms on Secondary Storage Sorting Chapter 13. Professor E. Rundensteiner Lesson: Using secondary storage effectively Data too large to live in memory Regular algorithms on small scale only

More information

Chapter 12: Indexing and Hashing

Chapter 12: Indexing and Hashing Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B+-Tree Index Files B-Tree Index Files Static Hashing Dynamic Hashing Comparison of Ordered Indexing and Hashing Index Definition in SQL

More information