EECS 647: Introduction to Database Systems

Similar documents
Queries for Today. CS186: Introduction to Database Systems. What? Why? Who? How? For instance? Joe Hellerstein and Christopher Olston.

CS 405G: Introduction to Database Systems. Lecture 1: Introduction

Fall Semester 2002 Prof. Michael Franklin

Queries for Today. CS186: Introduction to Database Systems. What? Why? Who? How? For instance? Minos Garofalakis and Joe Hellerstein.

EECS 647: Introduction to Database Systems

CAS CS 460/660 Introduction to Database Systems. Fall

Introduction and Overview

Outline. Database Management Systems (DBMS) Database Management and Organization. IT420: Database Management and Organization

Introduction to Database Systems CS432. CS432/433: Introduction to Database Systems. CS432/433: Introduction to Database Systems

Introduction and Overview

Outline. Databases and DBMS s. Recent Database Applications. Earlier Database Applications. CMPSCI445: Information Systems.

BBM371- Data Management. Lecture 1: Course policies, Introduction to DBMS

Introduction to Data Management. Lecture #1 (Course Trailer )

Introduction to Data Management. Lecture #2 (Big Picture, Cont.)

Data! CS 133: Databases. Goals for Today. So, what is a database? What is a database anyway? From the textbook:

Course Web Site. 445 Staff and Mailing Lists. Textbook. Databases and DBMS s. Outline. CMPSCI445: Information Systems. Yanlei Diao and Haopeng Zhang

Introduction to CS 4604

CMPSCI445: Information Systems

745: Advanced Database Systems

CS 405G: Introduction to Database Systems

Why are you here? Introduction. Course roadmap. Course goals. What do you want from a DBMS? What is a database system? Aren t databases just

Concepts Of Database Management 7th Edition Solution Manual

Database Management Systems MIT Introduction By S. Sabraz Nawaz

Introduction. Random things to do after this course. Course roadmap. CPS 116 Introduction to Database Systems

Conceptual Design. The Entity-Relationship (ER) Model

Concepts Of Database Management 7th Edition Pratt

Chapter 1 Introduction

CS430/630 Database Management Systems Spring, Betty O Neil University of Massachusetts at Boston

Introduction to Data Management. Lecture #1 (Course Trailer ) Instructor: Chen Li

Introduction to Data Management. Lecture #1 (Course Trailer )

9/8/2018. Prerequisites. Grading. People & Contact Information. Textbooks. Course Info. CS430/630 Database Management Systems Fall 2018

EECS 647: Introduction to Database Systems

Database Management System

CMPT 354 Database Systems I. Spring 2012 Instructor: Hassan Khosravi

Introduction to Database Systems. Chapter 1. Instructor: . Database Management Systems, R. Ramakrishnan and J. Gehrke 1

Database Applications (15-415)

Course and Contact Information. Course Description. Course Objectives

1. Data Model, Categories, Schemas and Instances. Outline

Introduction to Database S ystems Systems CSE 444 Lecture 1 Introduction CSE Summer

Quick Facts about the course. CS 2550 / Spring 2006 Principles of Database Systems. Administrative. What is a Database Management System?

CS425 Fall 2016 Boris Glavic Chapter 1: Introduction

COMP Instructor: Dimitris Papadias WWW page:

Course and Contact Information. Course Description. Course Objectives

CMPSCI 645 Database Design & Implementation

CMPT 354: Database System I. Lecture 1. Course Introduction

Database Management System Implementation. Who am I? Who is the teaching assistant? TR, 10:00am-11:20am NTRP B 140 Instructor: Dr.

The Relational Model. Database Management Systems

Introduction to Database Management Systems

Database Systems Management

CSE 344 JANUARY 3 RD - INTRODUCTION

CS634 Architecture of Database Systems Spring Elizabeth (Betty) O Neil University of Massachusetts at Boston

Introduction. Example Databases

Database Management Systems. Chapter 1

By Marina Barsky CSC 343. Introduction to databases. Summer

CSC 4710 / CSC 6710 Database Systems. Rao Casturi

CS 146 Database Systems

CS 4604: Introduction to Database Management Systems. B. Aditya Prakash Lecture #1: Introduction

Introduction to Data Management CSE 344. Lecture 1: Introduction

Your New App. Motivation. Data Management is Universal. Staff. Introduction to Data Management (Database Systems) CSE 414. Lecture 1: Introduction

Introduction to Database Systems CSE 444. Lecture 1 Introduction

EECS3421 Introduction to Database Management Systems. Thanks to John Mylopoulos and Ryan Johnson for material in these slides

Database Systems ( 資料庫系統 ) Practicum in Database Systems ( 資料庫系統實驗 ) 9/20 & 9/21, 2006 Lecture #1

Introduction to Data Management. Lecture #13 (Indexing)

Fundamentals of Information Systems, Seventh Edition

Outline. Quick Introduction to Database Systems. Data Manipulation Tasks. What do they all have in common? CSE142 Wi03 G-1

B.C.A DATA BASE MANAGEMENT SYSTEM MODULE SPECIFICATION SHEET. Course Outline

COMP.3090/3100 Database I & II. Textbook

Introduction to Data Management. Lecture #2 (Big Picture, Cont.) Instructor: Chen Li

Relational Database Systems Part 01. Karine Reis Ferreira

Database Concepts. CS 377: Database Systems

Introduction to Databases Fall-Winter 2009/10. Syllabus

Administrivia. The Relational Model. Review. Review. Review. Some useful terms

Overview of Data Management

CS 564: DATABASE MANAGEMENT SYSTEMS. Spring 2018

Storing Data: Disks and Files

Introduction. Who wants to study databases?

CMPUT 291 File and Database Management Systems

Who, where, when. Database Management Systems (LIX022B05) Literature. Evaluation. Lab Sessions. About this course. After this course...

Chapter 1: Introduction

Introduction to Database S ystems Systems CSE 444 Lecture 1 Introduction CSE Summer

Introduction to Data Management (Database Systems) CSE 414

Database Management Systems Chapter 1 Instructor: Oliver Schulte Database Management Systems 3ed, R. Ramakrishnan and J.

Database Systems. Lecture2:E-R model. Juan Huo( 霍娟 )

CS145: Intro to Databases. Lecture 1: Course Overview

CSE 544 Principles of Database Management Systems

CMPUT 391 Database Management Systems. Fall Semester 2006, Section A1, Dr. Jörg Sander. Introduction

Database. Università degli Studi di Roma Tor Vergata. ICT and Internet Engineering. Instructor: Andrea Giglio

Fundamentals of Databases

CS564: Database Management Systems. Lecture 1: Course Overview. Acks: Chris Ré 1

Overview of db design Requirement analysis Data to be stored Applications to be built Operations (most frequent) subject to performance requirement

Introduction to Data Management. Lecture 14 (Storage and Indexing)

CPSC 421 Database Management Systems. Lecture 11: Storage and File Organization

Introduction to Data Management. Lecture #2 Intro II & Data Models I

CS 445 Introduction to Database Systems

Databases and Database Management Systems

DATABASE MANAGEMENT SYSTEMS. UNIT I Introduction to Database Systems

CSE 3241: Database Systems I Databases Introduction (Ch. 1-2) Jeremy Morris

Page 1. Goals for Today" What is a Database " Key Concept: Structured Data" CS162 Operating Systems and Systems Programming Lecture 13.

COWLEY COLLEGE & Area Vocational Technical School

CSC 261/461 Database Systems. Fall 2017 MW 12:30 pm 1:45 pm CSB 601

Transcription:

EECS 647: Introduction to Database Systems Instructor: Luke Huan Spring 2009

Queries for Today What is a database? What is a database management system? Why take a database course? Who will teach? How to take the class? Preview of class contents 1/20/2009 Luke Huan Univ. of Kansas 2

What: Database Systems Then 1/20/2009 Luke Huan Univ. of Kansas 3

What: Database Systems Today 1/20/2009 Luke Huan Univ. of Kansas 4

What: Database Systems Today 1/20/2009 Luke Huan Univ. of Kansas 5

What: Database Systems Today 1/20/2009 Luke Huan Univ. of Kansas 6

So What is a Database? A database is a very large, integrated collection of data. Data is a group of facts that can be recorded and have an implicit meaning. Typically a database is used to model a real-world enterprise (or a miniworld) Entities (e.g., basketball teams, games) Relationships (e.g. KU s basketball team won <you name it> last week) Might surprise you how flexible this is Web search: Entities: words, documents Relationships: word in document, document links to document. P2P filesharing: Entities: words, filenames, hosts Relationships: word in filename, file available at host 1/20/2009 Luke Huan Univ. of Kansas 7

Databases make these folks happy... End users in many fields Business, education, science, DB application programmers Build data entry & analysis tools on top of DBMSs Build web services that run off DBMSs Database administrators (DBAs) Design logical/physical schemas Handle security and authorization Data availability, crash recovery Database tuning as needs evolve DBMS vendors, programmers Oracle, IBM, MS must understand how a DBMS works 1/20/2009 Luke Huan Univ. of Kansas 8

What is a Database Management System? A Database Management System (DBMS) is a collection of programs that enable uses to create and maintain databases store, manage, and access data in a databases. Typically this term is used narrowly Relational databases with transactions E.g. Oracle, DB2, SQL Server Mostly because they predate other large repositories Also because of technical richness When we say DBMS in this class we will usually follow this convention But keep an open mind about applying the ideas! 1/20/2009 Luke Huan Univ. of Kansas 9

What: Is the WWW a DBMS? Fairly sophisticated search available Crawler indexes pages on the web Keyword-based search for pages But, currently data is mostly unstructured and untyped search only: can t modify the data can t get summaries, complex combinations of data few guarantees provided for freshness of data, consistency across data items, fault tolerance, web sites typically have a (relational) DBMS in the background to provide these functions. 1/20/2009 Luke Huan Univ. of Kansas 10

What: Is a File System a DBMS? Thought Experiment 1: You and your project partner are editing the same file. Q: How do you write You both save it at the same time. programs over a Whose changes survive? A) Yours B) Partner s C) Both D) Neither E)??? subsystem when it promises you only???? Thought Experiment 2: You re updating a file. A: Very, very carefully!! The power goes out. Which changes survive? A) All B) None C) All Since Last Save D)??? 1/20/2009 Luke Huan Univ. of Kansas 11

Current Commercial Outlook A major part of the software industry: Oracle, IBM, Microsoft also Sybase, Informix (now IBM), Teradata smaller players: java-based dbms, devices, OO, Lots of related industries data warehouse, document management, storage, backup, reporting, business intelligence, ERP, CRM, app integration Traditional Relational DBMS products dominant and evolving adapted for extensibility (user-defined types), native XML support. Open Source coming on strong MySQL, PostgreSQL, Apache Derby, BerkeleyDB, Ingres, EigenBase And of course, the other database technologies Search engines, P2P, etc. 1/20/2009 Luke Huan Univ. of Kansas 12

What database systems will we cover? We will be try to be broad and touch upon Relational DBMS (e.g. Oracle, SQL Server, DB2, Postgres) Semi-structured DB systems (e.g. XML repositories like Xindice) Data mining: transfer data to knowledge! Starting point We assume you have used web search engines We assume you don t know relational databases Yet they pioneered many of the key ideas So focus will be on relational DBMSs With frequent side-notes on search engines, XML issues 1/20/2009 Luke Huan Univ. of Kansas 13

Why take this class? A. Database systems are at the core of CS B. They are incredibly important to society C. The topic is intellectually rich D. It isn t that much work E. Looks good on your resume Let s spend a little time on these 1/20/2009 Luke Huan Univ. of Kansas 14

Instructors Prof. Luke Huan, EECS jhuan@eecs.ku.edu Prof. Office Hours: Who? Eaton Hall 2034, M, W 4:15-5:15pm 1/20/2009 Luke Huan Univ. of Kansas 15

How? Workload Projects with a real world focus: Build a web-based application w/postgresql and your favorite internet programming language (PHP, JAVA, PERL, just naming a few) Self-initiated projects with similar scope Homework assignments and quizzes Exams 2 Midterms & 1 Final Projects to be done in groups of 2 Pick your partner ASAP The course is front-loaded most of the hard work is in the first half 1/20/2009 Luke Huan Univ. of Kansas 16

How? Administrative Text book: Database Management Systems, Raghu Ramakrishnan, Johannes Gehrke, 3rd ed. Available at KU Student Stores. Class website - people.eecs.ku.edu/~jhuan/eecs647_s09 read it regularly Please include eecs 647 in your mail to the instructor (for quick response) Grading, hand-in policies, etc. is on syllabus * The textbook will be available at the KU bookstore today or tomorrow 1/20/2009 Luke Huan Univ. of Kansas 17

Academic Misconduct We take academic misconduct very seriously. We encourage student to work together for doing homework and projects, but each student should write down their own solution. You are absolutely encouraged to discuss with your classmates about your homework assignments, programming exercises, and final projects You are responsible for all your works For homework assignments and projects resulted from discussion, please always acknowledge other people s contribution by including a sentence at the beginning of the hand-in saying I discussed the homework (project) with XXX (include more names if necessary). There is absolutely NO penalty for doing so. 1/20/2009 Luke Huan Univ. of Kansas 18

Agenda for the rest of today Today: a preview of the class contents: ER model Relational model Logical database design SQL File system Query processing Transaction Data mining XML Database in the future Next Monday the Entity-Relationship model Today s lecture is from Chapter 1 in R & G Read Chapter 2 for next class. 1/20/2009 Luke Huan Univ. of Kansas 19

Design a Database ER model Relational model Database logical design Query with SQL Agenda 1/20/2009 Luke Huan Univ. of Kansas 20

Data Models Describe Data A data model is a collection of concepts for describing data. Different levels of data models Conceptual (semantic) data model uses data (facts) to describe a physical world Representational model describes data in database management systems Physical data model describe how data is stored in files in computers. 1/20/2009 Luke Huan Univ. of Kansas 21

An ER Model for a mini-world name name ssn Salary number manager Employees Works_For Departments Entity: Real-world object distinguishable from other objects. An entity is described (in DB) using a set of attributes. Relationship: Association among two or more entities. E.g., John works in Pharmacy department.

Relational Model Describes Data Relational model is the most popular representational data model that are used by current commercial DBMS. Relational model describes data using tables SSN 123456789 Name John Smith Salary 30000 ssn name Salary 333444555 Franklin Wong 40000 Employees 453453453 Joyce English 25000 Cardinality = 3, degree = 3, all rows distinct Do all columns in a relation instance have to be distinct? 1/20/2009 Luke Huan Univ. of Kansas 23

Logical Database Design name name ssn Salary number manager Employees Works_For Departments SSN Name Salary DNumber DName Manager 123456789 333444555 453453453 John Smith Franklin Wong Joyce English 30000 40000 25000 5 SSN 123456789 333444555 453453453 research Dnum 5 5 5 123456789 1/20/2009 Luke Huan Univ. of Kansas 24

Logical Database Design (Cont.) name name ssn Salary number manager Employees Works_For Departments SSN Name Salary DNum DNumber DName Manager 123456789 John Smith 30000 5 5 research 123456789 333444555 Franklin Wong 40000 5 453453453 Joyce English 25000 5 1/20/2009 Luke Huan Univ. of Kansas 25

Logical Database Design (Cont.) name name ssn Salary number manager Employees Works_For Departments SSN Name Salary DNumber DName Manager 123456789 John Smith 30000 5 research 123456789 333444555 Franklin Wong 40000 5 research 123456789 453453453 Joyce English 25000 5 research 123456789 1/20/2009 Luke Huan Univ. of Kansas 26

SQL: Query Information from a Database Query 1: Retrieve the salary of Franklin Wong Q1:SELECT SALARY FROM EMPLOYEE WHERE NAME= Franklin Wong Answer: 40000 SSN 123456789 333444555 453453453 Name John Smith Franklin Wong Joyce English Salary 30000 40000 25000 1/20/2009 Luke Huan Univ. of Kansas 27

Agenda Design a Database Management System File systems Disk storage Indexing Query processing Transactions Concurrency Recovery 1/20/2009 Luke Huan Univ. of Kansas 28

Components of a Disk The platters spin (say, 90rps). The arm assembly is moved in or out to position a head on a desired track. Tracks under heads make a cylinder (imaginary!). Disk head Arm movement Spindle Tracks Sector Platters Arm assembly

Indexing is the Key to Speed up Disk Operations ``Find all employees whose salary > $25000 If data is in sorted file, do binary search to find first such employee, then scan to find others. Cost of binary search can be quite high. Simple idea: Create an `index file. k1 k2 kn Index File Page 1 Page 2 Page 3 Page N Data File Can do binary search on (smaller) index file!

Concurrent execution of user programs Why? Utilize CPU while waiting for disk I/O (database programs make heavy use of disk) Avoid short programs waiting behind long ones e.g. ATM withdrawal while bank manager sums balance across all accounts 1/20/2009 Luke Huan Univ. of Kansas 31

Key concept: Transaction an atomic sequence of database actions (reads/writes) takes DB from one consistent state to another transaction consistent state 1 consistent state 2 1/20/2009 Luke Huan Univ. of Kansas 32

checking: $200 savings: $1000 Example Transaction transfer $100 from Saving to checking checking: $300 savings: $900 Here, consistency is based on our knowledge of banking semantics In general, up to writer of transaction to ensure transaction preserves consistency DBMS provides (limited) automatic enforcement, via integrity constraints e.g., balances must be >= 0 1/20/2009 Luke Huan Univ. of Kansas 33

Advanced topics XML Data mining Association Classification Clustering Database in the future Agenda 1/20/2009 Luke Huan Univ. of Kansas 34

Acknowledgement We thank Dr. Jun Yang at the Duke University to provide his class slides of database management systems. Some slides used in this class are downloaded from UC Berkeley Introduction to Database Systems class at http://cs186.declarativity.net/ 1/20/2009 Luke Huan Univ. of Kansas 35