Database Design and Programming

Database Design and Programming Jan Baumbach jan.baumbach@imada.sdu.dk http://www.baumbachlab.net

JDBC
Java Database Connectivity (JDBC) is a library for accessing a DBMS with Java as the host language.
More than 200 drivers are available: PostgreSQL, MySQL, Oracle, ODBC, ...
PostgreSQL driver: http://jdbc.postgresql.org/

Making a Connection
    import java.sql.*;                        // the JDBC classes
    ...
    Class.forName("org.postgresql.Driver");   // the driver for PostgreSQL (others exist), loaded by forName
    Connection mycon = DriverManager.getConnection( ... );   // URL of the database, your name, and password go here
    ...

URL for PostgreSQL database
    getConnection("jdbc:postgresql://<host>[:<port>]/<database>?user=<user>&password=<password>");
Alternatively, use the getConnection variant that takes user and password as separate arguments:
    getConnection("jdbc:postgresql://<host>[:<port>]/<database>", <user>, <password>);
Example:
    DriverManager.getConnection("jdbc:postgresql://10.110.4.32:5434/postgres", "petersk", "geheim");

Statements
JDBC provides two classes:
1. Statement = an object that can accept a string that is a SQL statement and can execute such a string
2. PreparedStatement = an object that has an associated SQL statement ready to execute

Creating Statements
The Connection class has methods to create Statements and PreparedStatements:
    Statement stat1 = mycon.createStatement();
    PreparedStatement stat2 = mycon.prepareStatement(
        "SELECT beer, price FROM Sells " +
        "WHERE bar = 'C.Ch.'");
createStatement() takes no SQL string and returns a Statement; prepareStatement(sql) takes the SQL string and returns a PreparedStatement.

Executing SQL Statements
JDBC distinguishes queries from modifications, which it calls "updates".
Statement and PreparedStatement each have methods executeQuery and executeUpdate.
For Statements: one argument, the query or modification to be executed.
For PreparedStatements: no argument.

Example: Update
stat1 is a Statement. We can use it to insert a tuple as:
    stat1.executeUpdate(
        "INSERT INTO Sells " +
        "VALUES('C.Ch.', 'Eventyr', 30)");

Example: Query
stat2 is a PreparedStatement holding the query SELECT beer, price FROM Sells WHERE bar = 'C.Ch.'
executeQuery returns an object of class ResultSet (we'll examine it later).
The query:
    ResultSet menu = stat2.executeQuery();

Accessing the ResultSet
An object of type ResultSet is similar to a cursor.
Method next() advances the cursor to the next tuple.
The first time next() is applied, it gets the first tuple.
If there are no more tuples, next() returns the value false.

Accessing Components of Tuples
When a ResultSet is referring to a tuple, we can get the components of that tuple by applying certain methods to the ResultSet.
Method getX(i), where X is some type and i is the component number (counting from 1), returns the value of that component.
The value must have type X.

Example: Accessing Components
menu = ResultSet for the query SELECT beer, price FROM Sells WHERE bar = 'C.Ch.'
Access beer and price from each tuple by:
    while (menu.next()) {
        thebeer = menu.getString(1);
        theprice = menu.getFloat(2);
        /* something with thebeer and theprice */
    }

Important Details
Reusing a Statement object for another query closes the ResultSet of the previous one.
Always create new Statement objects using createStatement(), or explicitly close ResultSets using the close method.
For transactions on a Connection con, use con.setAutoCommit(false) and explicitly call con.commit() or con.rollback().
If AutoCommit is false and there is no commit, closing the connection amounts to a rollback (the JDBC documentation leaves this case implementation-defined, so commit or roll back explicitly).

Python and Databases
There are many different modules for accessing databases:
    commercial: mxODBC, ...
    open source: PyGreSQL, psycopg2, ...
We use psycopg2.
Install using easy_install psycopg2 (with current tooling: pip install psycopg2).
Import with import psycopg2.

Connection String
A database connection is described by a connection string.
Example:
    con_str = """
        host='10.110.4.32'
        port=5434
        dbname='postgres'
        user='peter'
        password='geheim'
    """

Making a Connection
With the DB library imported and the connection string con_str available:
    con = psycopg2.connect(con_str)
connect is the function defined by the DB API.
con is of class connection because it is returned by psycopg2.connect(...).

Cursors in Python
Queries are executed for a cursor.
A cursor is obtained from a connection.
Example: cursor = con.cursor()
Queries or modifications are executed using the execute(...) method.
Cursors can then be used in a for-loop.

Example: Executing a Query
Find all the bars that sell a beer given by the variable beer:
    beer = 'Od.Cl.'
    cursor = con.cursor()
    cursor.execute(
        "SELECT bar FROM Sells" +
        " WHERE beer = '%s';" % beer)
The %s placeholder is replaced by the value of the variable beer.

Example: Tuple Cursors
    bar = 'C.Ch.'
    cur = con.cursor()
    cur.execute("SELECT beer, price" +
                " FROM Sells" +
                " WHERE bar = '" + bar + "';")
    for row in cur:
        # row[1] is a number, so convert it before concatenating
        print(row[0] + " for " + str(row[1]))
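The cursor workflow can be tried without a PostgreSQL server. The following is a minimal sketch that uses the standard-library sqlite3 module as a stand-in for psycopg2 (both follow the Python DB API); the in-memory database, the sample rows, and the bar name 'C.Bio' are assumptions made for illustration. Note that sqlite3 writes placeholders as ?, whereas psycopg2 uses %s.

```python
import sqlite3

# sqlite3 is only a stand-in so the sketch runs without a server;
# with psycopg2 the connection would come from psycopg2.connect(con_str).
con = sqlite3.connect(":memory:")
cur = con.cursor()

# Hypothetical sample data for the Sells relation.
cur.execute("CREATE TABLE Sells (bar TEXT, beer TEXT, price REAL)")
cur.executemany(
    "INSERT INTO Sells VALUES (?, ?, ?)",
    [("C.Ch.", "Eventyr", 30.0), ("C.Ch.", "Od.Cl.", 20.0), ("C.Bio", "Od.Cl.", 25.0)],
)
con.commit()

bar = "C.Ch."
# The value of bar is passed separately instead of being pasted into the SQL text.
cur.execute("SELECT beer, price FROM Sells WHERE bar = ? ORDER BY beer", (bar,))

# A cursor can be iterated row by row, as in a for-loop.
menu = [(row[0], row[1]) for row in cur]
for beer, price in menu:
    print(beer + " for " + str(price))
```

Passing the value as a parameter rather than concatenating it is a design choice that also avoids quoting mistakes in the SQL text.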

Caution: SQL Injection
SQL queries are often constructed by programs.
These queries may take constants from user input.
Careless code can allow rather unexpected queries to be constructed and executed.

Example: SQL Injection
Relation Accounts(name, passwd, acct)
Web interface: get name and password from the user, store them in strings n and p, issue a query, display the account number:
    cur.execute(("SELECT acct FROM " +
                 "Accounts WHERE name = '%s' " +
                 "AND passwd = '%s';") % (n, p))
The parentheses around the concatenated string are needed because % binds more tightly than +.

User (Who Is Not Bill Gates) Types
Name: gates' --        (-- starts a comment in PostgreSQL)
Password: who cares?
Result: Your account number is 1234-567

The Query Executed
    SELECT acct FROM Accounts WHERE name = 'gates' --' AND passwd = 'who cares?';
Everything after -- is treated as a comment.
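The attack and its usual remedy can be reproduced in a few lines. This is a minimal sketch, again with the standard-library sqlite3 module standing in for psycopg2; the stored password 'secret' is a made-up value. The remedy shown, passing user input as query parameters, is a common technique and not something the slides themselves prescribe.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE Accounts (name TEXT, passwd TEXT, acct TEXT)")
# Hypothetical account; the real password is 'secret'.
cur.execute("INSERT INTO Accounts VALUES ('gates', 'secret', '1234-567')")

# What the attacker types into the web form.
n, p = "gates' --", "who cares?"

# Vulnerable: user input is pasted into the SQL text, so the quote in n
# closes the string literal and -- comments out the password check.
unsafe_sql = "SELECT acct FROM Accounts WHERE name = '%s' AND passwd = '%s'" % (n, p)
unsafe_rows = cur.execute(unsafe_sql).fetchall()

# Safer: the driver receives the values separately and treats them as data.
# (sqlite3 uses ? placeholders; psycopg2 uses %s with a parameter tuple.)
safe_rows = cur.execute(
    "SELECT acct FROM Accounts WHERE name = ? AND passwd = ?", (n, p)
).fetchall()

print(unsafe_rows)
print(safe_rows)
```

With parameters, the name is compared literally against the string gates' -- and no account matches.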

Summary 8
More things you should know:
    Stored Procedures, PL/pgSQL
    Declarations, Statements, Loops
    Cursors, Tuple Variables
    Three-Tier Approach, JDBC, psycopg2

Data Storage

Computer System
[Figure: CPU and RAM, connected via SATA to secondary and tertiary storage]

The Memory Hierarchy
    Level                           Cost        Latency
    Cache                           a lot/MB    0.3 ns
    RAM (primary)                   30/GB       1.5 ns
    Solid-State Disk (secondary)    8/GB        0.1 ms
    Harddisk (tertiary)             0.4/GB      7.5 ms

DBMS and Storage
Databases are typically too large to keep in primary storage.
Tables are typically kept in secondary storage.
Large amounts of data that are only accessed infrequently are stored in tertiary storage (or even on a tape robot).
Indexes and current tables are cached in primary storage.

Harddisk
N rotating magnetic platters
2xN heads for reading and writing
Terms: track, cylinder, sector, gap

Harddisk Access
access time: how long does it take to load a block from the harddisk?
seek time: how long does it take to move the heads to the right cylinder?
rotational delay: how long does it take until the head gets to the right sectors?
transfer time: how long does it take to read the block?
access = seek + rotational + transfer

Seek Time
average seek time = ½ of the time to move the head from the outermost to the innermost cylinder

Rotational Delay
average rotational delay = ½ rotation
[Figure: platter showing the current head position and the block to read]

Transfer Time
transfer time = 1/n rotation when there are n blocks on one track
[Figure: platter showing the head passing over the block from its start to its end]

Access Time
Typical harddisk:
    Maximal seek time: 10 ms
    Rotational speed: 7200 rpm
    Block size: 4096 bytes
    Sectors (512 bytes) per track: 1600 (average)
Average access time: 9.21 ms
    Average seek time: 5 ms
    Average rotational delay: 60/7200/2 s = 4.17 ms
    Average transfer time: 0.04 ms

Random vs Sequential Access
Random access of blocks: 1/0.00921 s * 4096 byte = 0.42 Mbyte/s
Sequential access of blocks: 120/s * 200 * 4096 byte = 94 Mbyte/s (120 rotations per second, 200 blocks per track)
Performance of the DBMS is dominated by the number of random accesses.
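The access-time and throughput figures follow from the disk parameters by simple arithmetic. The sketch below recomputes them; the variable names are chosen here for illustration, and it assumes that "Mbyte" in the slides means 2^20 bytes, which is the reading under which the rounded values match.

```python
# Disk parameters as stated in the slides.
max_seek_ms = 10.0
rpm = 7200
block_bytes = 4096
sector_bytes = 512
sectors_per_track = 1600

rotation_ms = 60_000 / rpm                                          # one full rotation
blocks_per_track = sectors_per_track * sector_bytes // block_bytes  # 8 sectors per block

avg_seek_ms = max_seek_ms / 2                   # half the maximal seek
avg_rot_ms = rotation_ms / 2                    # half a rotation on average
transfer_ms = rotation_ms / blocks_per_track    # 1/n of a rotation
access_ms = avg_seek_ms + avg_rot_ms + transfer_ms

MIB = 2 ** 20  # assumption: the slides' "Mbyte" is 2^20 bytes

# Random access: every block pays the full access time.
random_mib_s = block_bytes / (access_ms / 1000) / MIB
# Sequential access: one full track is read per rotation.
sequential_mib_s = (rpm / 60) * blocks_per_track * block_bytes / MIB

print(round(access_ms, 2), round(random_mib_s, 2), round(sequential_mib_s))
```

The gap of more than two orders of magnitude between the two throughput values is why the number of random accesses dominates DBMS performance.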

On Disk Cache
[Figure: CPU and RAM, connected via SATA to secondary and tertiary storage devices that each have their own cache]

Problems with Harddisks
Even with caches, the harddisk remains the bottleneck for DBMS performance.
Harddisks can fail:
    Intermittent failure
    Media decay
    Write failure
    Disk crash
Handle intermittent failures by rereading the block in question.

Detecting Read Failures
Use checksums to detect failures.
Simplest form is a parity bit:
    0 if the number of ones in the block is even
    1 if the number of ones in the block is odd
Detects all 1-bit failures.
Detects 50% of many-bit failures.
By using n bits, we can reduce the chance of missing an error to 1/2^n.
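A parity bit is easy to compute and shows both its strength and its blind spot. The following minimal sketch uses a made-up two-byte block; the function name parity_bit is chosen here for illustration.

```python
def parity_bit(block: bytes) -> int:
    """Return 0 if the number of one-bits in the block is even, 1 if it is odd."""
    ones = sum(bin(byte).count("1") for byte in block)
    return ones % 2

# Hypothetical block: 3 + 8 = 11 one-bits, so the stored parity bit is 1.
block = bytes([0b00101010, 0b11111111])
stored = parity_bit(block)

# Flip one bit: the parity changes, so the failure is detected.
one_bit_error = bytes([block[0] ^ 0b00000001, block[1]])
# Flip two bits: the parity is unchanged, so the failure goes unnoticed.
two_bit_error = bytes([block[0] ^ 0b00000011, block[1]])

print(parity_bit(one_bit_error) != stored)   # detected
print(parity_bit(two_bit_error) != stored)   # missed
```

Any even number of flipped bits leaves the parity unchanged, which is where the 50% figure for many-bit failures comes from.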

Disk Arrays
Use more than one disk for higher reliability and/or performance.
RAID (Redundant Arrays of Independent Disks)
[Figure: several physical disks presented as logically one disk]

RAID 0
Alternate blocks between two or more disks ("Striping")
Increases performance both for writing and reading
No increase in reliability
    Disk 1   Disk 2
      0        1
      2        3
      4        5
Storing blocks 0-5 in the first three blocks of disk 1 & 2

RAID 1
Duplicate blocks on two or more disks ("Mirroring")
Increases performance for reading
Increases reliability significantly
    Disk 1   Disk 2
      0        0
      1        1
      2        2
Storing blocks 0-2 in the first three blocks of disk 1 & 2

RAID 5
Stripe blocks on n+1 disks where, for each block, one disk stores parity information
More performant when writing than RAID 1
Increased reliability compared to RAID 0
[Figure: blocks 0-5 and three parity blocks P distributed over disks 1, 2, and 3]
Storing blocks 0-5 in the first three blocks of disk 1, 2 & 3
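Parity information in RAID 5 is conventionally the bitwise XOR of the data blocks in a stripe, which lets any single lost block be rebuilt from the others. The sketch below illustrates that idea for one stripe on three disks; the helper xor_blocks and the block contents are made up for the example, and real arrays work on full disk blocks rather than four-byte strings.

```python
def xor_blocks(*blocks: bytes) -> bytes:
    """Bitwise XOR of equally sized blocks."""
    out = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, byte in enumerate(blk):
            out[i] ^= byte
    return bytes(out)

# One stripe: two data blocks on two disks, parity on the third.
block0 = b"beer"
block1 = b"bars"
parity = xor_blocks(block0, block1)

# Suppose the disk holding block0 fails: XOR the survivors to rebuild it.
recovered = xor_blocks(block1, parity)
print(recovered)
```

Because a write must also update the parity block, small writes cost extra I/O compared to plain striping.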

RAID Capacity
Assume disks with capacity 1 TByte:
    RAID 0: N disks = N TByte
    RAID 1: N disks = 1 TByte
    RAID 5: N disks = (N-1) TByte
    RAID 6: N disks = (N-M) TByte
    ...

Storage of Values
Basic unit of storage: Byte (8 bits)
Integer: 4 bytes
    Example: 42 is 00000000 00000000 00000000 00101010
Characters: ASCII, UTF8, ...
Boolean: 00000000 and 11111111

Storage of Values
Dates:
    Days since January 1, 1900
    DDMMYYYY (not DDMMYY)
Time:
    Seconds since midnight
    HHMMSS
Strings:
    Null terminated: L a r s followed by a null byte
    Length given: 4 L a r s
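These encodings can be inspected directly with Python's struct module. The sketch below assumes big-endian byte order (most significant byte first, as the bit pattern for 42 is written in the slides) and a one-byte length prefix for the length-given string; both are illustrative choices.

```python
import struct

# 4-byte signed integer, big-endian (assumed byte order).
int_bytes = struct.pack(">i", 42)
bits = " ".join(format(byte, "08b") for byte in int_bytes)
print(bits)

name = "Lars".encode("ascii")

# Null-terminated: the end of the string is marked by a zero byte.
null_terminated = name + b"\x00"
# Length given: a one-byte length precedes the characters.
length_prefixed = bytes([len(name)]) + name

print(null_terminated, length_prefixed)
```

A length prefix allows skipping over the string without scanning it, while null termination needs no length field but forbids zero bytes inside the value.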

DBMS Storage Overview
Values, Records, Blocks, Files, Memory

Record
Collection of related data items (called fields)
Typically used to store one tuple
Example: a Sells record consisting of
    bar field
    beer field
    price field

Record Metadata
For fixed-length records, the schema contains the following information:
    Number of fields
    Type of each field
    Order in record
For variable-length records, every record contains this information in its header.

Record Header
Reserved part at the beginning of a record
Typically contains:
    Record type (which schema?)
    Record length (for skipping)
    Time stamp (last access)

Files
Files consist of blocks containing records.
How to place records into blocks?
    Assume fixed-length blocks
    Assume a single file

Files
Options for storing records in blocks:
1. Separating records
2. Spanned vs. unspanned
3. Sequencing
4. Indirection

1. Separating Records
Block: R1 R2 R3
a. No need to separate: fixed-size records
b. Special marker
c. Give record lengths (or offsets)
    i. within each record
    ii. in the block header
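Option c.ii, keeping record offsets in the block header, can be sketched as follows. The layout (a 2-byte record count followed by one 2-byte offset per record) and the function names are assumptions made for this example, and for brevity the block is not padded to a fixed size.

```python
import struct

def pack_block(records):
    """Build a block: header (count + one offset per record), then the records."""
    header_size = 2 + 2 * len(records)
    offsets, body, pos = [], b"", header_size
    for rec in records:
        offsets.append(pos)      # where this record starts within the block
        body += rec
        pos += len(rec)
    header = struct.pack(f">H{len(records)}H", len(records), *offsets)
    return header + body

def read_record(block, i):
    """Return record i using only the offsets stored in the block header."""
    (count,) = struct.unpack_from(">H", block, 0)
    offsets = struct.unpack_from(f">{count}H", block, 2)
    end = offsets[i + 1] if i + 1 < count else len(block)
    return block[offsets[i]:end]

block = pack_block([b"C.Ch.", b"Eventyr", b"30"])
print(read_record(block, 1))
```

With offsets in the header, variable-length records need no marker bytes and any record can be reached without scanning its predecessors.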

2. Spanned vs Unspanned
Unspanned: records must be in one block
    R1 R2 R3 R4 R5
Spanned: one record in two or more blocks
    R1 R2 R3(a) | R3(b) R4 R5 R6 R7(a)
Unspanned is much simpler, but wastes space.
Spanned is necessary if record size > block size.

3. Sequencing
Ordering records in a file (and in the blocks) by some key value
Can be used for binary search
Options:
a. Next record is physically contiguous: R1, Next(R1), ...
b. Records are linked: R1 points to Next(R1)

4. Indirection
How does one refer to records?
a. Physical address (disk id, cylinder, head, sector, offset in block)
b. Logical record ids and a mapping table
    Indirection map:
        Rec ID    Physical addr.
        17        2:34:5:742:2340
Tradeoff between flexibility and cost
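The flexibility gained through indirection shows up when a record moves. The sketch below models the mapping table as a dictionary; the function names and the new address are hypothetical, while record id 17 and its address follow the example in the slides.

```python
# Mapping table: logical record id -> physical address
# (disk id, cylinder, head, sector, offset in block).
indirection_map = {17: (2, 34, 5, 742, 2340)}

def resolve(rec_id):
    """One extra lookup per access: this is the cost of indirection."""
    return indirection_map[rec_id]

def move_record(rec_id, new_addr):
    """Only the map entry changes; references to rec_id stay valid."""
    indirection_map[rec_id] = new_addr

before = resolve(17)
move_record(17, (2, 35, 1, 10, 0))   # hypothetical new location
after = resolve(17)
print(before, after)
```

With physical addresses, every index entry or pointer referring to the record would have to be updated after a move.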

Modification of Records
How to handle the following operations on the record level?
1. Insertion
2. Deletion
3. Update

1. Insertion
Easy case: records not in sequence
    Insert new record at end of file
    If records are fixed-length, insert new record in a deleted slot
Difficult case: records are sorted
    Find position and slide following records
    If records are sequenced by linking, insert overflow blocks

2. Deletion
a. Immediately reclaim space by shifting other records or removing overflows
b. Mark deleted and list as free for re-use
Tradeoffs:
    How expensive is immediate reclaim?
    How much space is wasted?

3. Update
If records are fixed-length and the order is not affected:
    Fetch the record, modify it, write it back
Otherwise:
    Delete the old record
    Insert the new record, overwriting the tombstones from the deletion
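For unordered fixed-length records, the three operations can be sketched with a list of slots, a tombstone marker, and a free list. The class SlotFile and its method names are invented for this illustration and ignore blocks and disk I/O.

```python
TOMBSTONE = None  # marks a deleted slot

class SlotFile:
    """Unordered file of fixed-length record slots with a free list."""

    def __init__(self):
        self.slots = []
        self.free = []            # slots marked deleted and free for re-use

    def insert(self, record):
        if self.free:             # re-use a deleted slot if one exists
            slot = self.free.pop()
        else:                     # otherwise append at the end of the file
            self.slots.append(TOMBSTONE)
            slot = len(self.slots) - 1
        self.slots[slot] = record
        return slot

    def delete(self, slot):
        self.slots[slot] = TOMBSTONE   # mark deleted instead of shifting records
        self.free.append(slot)

    def update(self, slot, record):
        self.slots[slot] = record      # fixed length, order unaffected: overwrite

f = SlotFile()
a = f.insert(("C.Ch.", "Eventyr", 30))
b = f.insert(("C.Ch.", "Od.Cl.", 20))
f.delete(a)
c = f.insert(("C.Bio", "Od.Cl.", 25))   # lands in the slot freed by the deletion
f.update(b, ("C.Ch.", "Od.Cl.", 22))
print(c == a, len(f.slots))
```

Marking slots as deleted keeps deletion cheap at the price of temporarily wasted space, which is the tradeoff named for option b.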

Data Organization
There are millions of ways to organize the data on disk. They trade off:
    Flexibility
    Space utilization
    Complexity
    Performance

Summary 9
More things you should know:
    Memory Hierarchy
    Storage on harddisks
    Values, Records, Blocks, Files
    Storing and modifying records

Index Structures

Finding Records
How do we find the records for a query?
Example: SELECT * FROM Sells
    Need to examine every block in every file
    Group blocks into files by relation!
Example: SELECT * FROM Sells WHERE price = 20;
    Need to examine every block in the file

Finding Records
Indexes allow narrowing the search to (almost) only the relevant blocks.
[Figure: a value is looked up in the index, which leads to the blocks holding the matching records]
Indexes can be dense or sparse.

Dense Index
Dense index (one entry per record): 10 20 30 40 50 60 70 80 90 100 110 120
Sequential file: 10 20 30 40 50 60 70 80 90 100

Sparse Index
2nd level: 10 90 170 250 330 410 490 570
Sparse index (one entry per block): 10 30 50 70 90 110 130 150 170 190 210 230
Sequential file: 10 20 30 40 50 60 70 80 90 100
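A lookup through a sparse index finds the last index entry that is not larger than the search key and then inspects that one block. The sketch below uses the file from the figure with two records per block; the function name lookup and the in-memory lists are simplifications for illustration.

```python
from bisect import bisect_right

# Sequential file with two records per block, as in the figure.
blocks = [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100]]
# Sparse index: the first key of each block.
sparse_index = [blk[0] for blk in blocks]

def lookup(key):
    """Return the number of the block containing key, or None if it is absent."""
    pos = bisect_right(sparse_index, key) - 1   # last entry <= key
    if pos < 0:
        return None                             # key is smaller than every entry
    # Unlike a dense index, the block must be read to know whether key exists.
    return pos if key in blocks[pos] else None

print(lookup(40), lookup(45), lookup(5))
```

The same search applies one level up: a second-level sparse index holds the first key of each index block.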

Deletion from Sparse Index
Delete 40
Sparse index: 10 30 50 70 90 110 130 150
Sequential file: 10 20 30 40 50 60 70 80
40 is not the first key of its block, so the index stays unchanged.

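Deletion from Sparse Index
Delete 30
Sparse index: 10 30 50 70 90 110 130 150, with the entry 30 replaced by 40
Sequential file: 10 20 30 40 50 60 70 80, with 30 deleted and 40 moved up to the front of its block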

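Deletion from Sparse Index
Delete 30 & 40
Sparse index: 10 30 50 70 90 110 130 150, with the entry 30 removed and the entries 50 and 70 moved up
Sequential file: 10 20 30 40 50 60 70 80, with 30 and 40 deleted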

Insertion into Sparse Index
Insert 35
Sparse index: 10 30 50 70 90 110 130 150
Sequential file: 10 20 30 35 50 60 70 80 (35 goes into the free slot after 30)

Insertion into Sparse Index
Insert 25
Sparse index: 10 30 50 70 90 110 130 150
Sequential file: 10 20 30 35 50 60 70 80
Overflow block: 25

Sparse vs Dense
Sparse uses less index space per record (can keep more of the index in memory).
Sparse allows multi-level indexes.
Dense can tell if a record exists without accessing it.
Dense is needed for secondary indexes.
Primary index = order of records in storage
Secondary index = imposes a different order

Secondary Index
2nd level: 10 20 50
Secondary index: 10 10 20 20 20 30 40 50 50 60
Sequential file: 20 40 10 20 50 30 10 50 60 20
Careful when looking for 20!

Secondary Index
2nd level: 10 50
Secondary index: 10 20 30 40 50 60
Sequential file: 20 40 10 20 50 30 10 50 60 20

Combining Indexes
SELECT * FROM Sells WHERE beer = 'Od.Cl.' AND price = 20
[Figure: the beer index (entry Od.Cl.) and the price index (entry 20) both point into Sells; the matching record is for bar C.Ch.]
Just intersect buckets in memory!
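Intersecting buckets means intersecting the sets of record ids that each index returns for its condition, before any record is fetched. The following sketch builds two small in-memory indexes over made-up Sells tuples; dictionaries of sets stand in for the index buckets.

```python
# Hypothetical contents of Sells: (bar, beer, price); the list position is the record id.
sells = [
    ("C.Ch.", "Od.Cl.", 20),
    ("C.Ch.", "Eventyr", 30),
    ("C.Bio", "Od.Cl.", 25),
    ("C.Bio", "Eventyr", 20),
]

# One secondary index per attribute: value -> bucket of record ids.
beer_index, price_index = {}, {}
for rid, (bar, beer, price) in enumerate(sells):
    beer_index.setdefault(beer, set()).add(rid)
    price_index.setdefault(price, set()).add(rid)

# WHERE beer = 'Od.Cl.' AND price = 20: intersect the two buckets in memory,
# then fetch only the records whose ids survive.
matches = beer_index["Od.Cl."] & price_index[20]
result = [sells[rid] for rid in sorted(matches)]
print(result)
```

Only the records in the intersection have to be read from disk, which is the point of combining the indexes.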

Conventional Indexes
Sparse, Dense, Multi-level, ...
Advantages:
    Simple
    Sequential index is good for scans
Disadvantages:
    Inserts are expensive
    Lose sequentiality and balance

Example: Unbalanced Index
Sequential area: 10 20 30 33 40 50 60 70 80 90
Overflow area (not sequential): 39 31 35 36 32 38 34