A Few Tips and Suggestions. Database System II Preliminaries. Applications (contd.) Applications


Database System II: Preliminaries
Instructor: Sharma Chakravarthy (sharma@cse.uta.edu)
The University of Texas at Arlington

A Few Tips and Suggestions
Purpose of doing an MS and its implications
Is your goal getting A's and a GPA of 4.0?
Try to match your career goals with what you learn
How the interview has changed over the years
What employers are looking for
How to succeed in your career!

Applications
Computation-intensive applications
    E.g., finite element analysis, computer-aided design, wind tunnel testing, simulations, numerical analysis
Data-intensive applications
    E.g., Library of Congress book management, Walmart point-of-sale database, Amazon inventory database, airline reservations, homeland security databases and applications
Computation- and data-intensive applications
    Data mining, drill-down and exploration of data warehouses, dimensional analysis, Google search, stream/sensor data processing, scientific computations
    Mining very large volumes of unstructured data: cloud computing, NoSQL processing, the Map/Reduce paradigm

Applications (contd.)
We are familiar with CPU-intensive applications.
    They keep most or all of their data in main memory for processing (Java projects, data structures projects). The files you used were very small!
    We know how to design these algorithms and applications (algorithms and software engineering).
Design and implementation of data-intensive applications is different from that of main-memory applications. We deal with this in CSE 5330.
What is the equivalent of UML for data modeling? In fact, EER came way earlier than UML!

Applications (contd.)
What applications are likely to be data intensive?
    Online shopping: Netflix user rentals, Amazon, eBay, ...
    Online information: Google, Yahoo, Wiki, ...
    Online services: credit cards, banking, ...
    Others: library book search, comparative shopping, EDGAR database, Facebook database, ...
    Enterprise databases
For these applications, we have to design not only the computational part but also the data to be represented and managed.

Applications (contd.)
In CPU-intensive applications, representation of data was assumed to be the simpler of the two.
The choice was in identifying one of the main-memory data structures (arrays, lists, trees, hash tables, ...).
The focus was on the algorithm, its correctness, and its complexity.

Applications (contd.)
How do we model a data-intensive application?
    Requirements analysis
    Use of the EER model to represent real-world objects (entities and relationships)
    Mapping the EER diagram to a set of relations
    Normalizing them and storing them in a database
    Developing applications/GUI on top of a DBMS

Applications (contd.)
In addition, we also need to model the processing of data. This is done by:
    Listing commonly used computations
        E.g., which book is borrowed by whom?
        E.g., generate a report of customers who have not paid the last credit card bill (or the last 2 credit card bills)
        E.g., track a UPS shipment to know its current status
        E.g., the 10 most popular movies (overall or in each genre)
        E.g., honors students in the COE
    Notion of transactions
The above has to be designed and developed.
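One of the "commonly used computations" listed above, the 10 most popular movies, can be sketched as a query over stored data. This is only an illustration: the `Movies` and `Rentals` tables, their columns, and the sample rows are assumptions made up for the example, and SQLite (via Python's standard `sqlite3` module) merely stands in for whatever DBMS an application would use.

```python
import sqlite3

# Hypothetical schema and data; none of these names come from the slides.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Movies  (mid INTEGER PRIMARY KEY, title TEXT, genre TEXT);
    CREATE TABLE Rentals (rid INTEGER PRIMARY KEY,
                          mid INTEGER REFERENCES Movies(mid));
""")
conn.executemany("INSERT INTO Movies VALUES (?, ?, ?)",
                 [(1, "Alpha", "drama"), (2, "Beta", "comedy"), (3, "Gamma", "drama")])
conn.executemany("INSERT INTO Rentals (mid) VALUES (?)",
                 [(1,), (2,), (2,), (3,), (3,), (3,)])

def most_popular(conn, limit=10):
    """Return (title, rental_count) pairs, most rented first."""
    return conn.execute("""
        SELECT m.title, COUNT(*) AS rentals
        FROM Movies m JOIN Rentals r ON r.mid = m.mid
        GROUP BY m.mid, m.title
        ORDER BY rentals DESC, m.title
        LIMIT ?
    """, (limit,)).fetchall()

top = most_popular(conn)
```

Adding `m.genre` to the `GROUP BY` and ranking within each genre would give the "in each genre" variant; the point is that the computation is designed alongside the data it needs.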

Applications (contd.)
We want to model the data component and at the same time keep the computation in mind.
For example:
    Payment of rent and its receipt are modeled as a data item that indicates what amount was paid on what date (actions correspond to data insertion!).
    The actual exchange of receipt or payment is not captured in that way!
    With the captured data, it should be possible to generate a receipt, or a non-payment notice after the due date.

Applications (contd.)
We cannot model actions in a database; instead, we capture the underlying data that corresponds to actions. Actions need to be modeled and implemented as computations (or code)!
For example:
    Although an invoice is used in real life, it is the contents of an invoice that are captured in the database, so that an invoice can be generated.
    If you capture all the items purchased, a receipt or invoice can be generated.
    Temporally changing data can be captured using date and timestamp as properties.

Applications (contd.)
Internet applications accept a login id, password, and security questions/answers.
The question is: how do we model this using EER?
    The key is to concentrate not on the process but on the underlying data needed.
    For the above, you need to store, for each user, a set of data: login id, password, security questions/answers.
    The interface that implements the login process can access this data and use it appropriately.

Applications (contd.)
There are a number of other issues:
    Interfaces/GUI
    Use of templates
    Physical database design
    Authorization and access to sensitive information (salary, SSN, ...)
    Continuous operation (24 x 7)
    Recovery/backups, mirroring
    Optimization of the system
    Scalability, distribution, etc.
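The rent-payment example from earlier in this part can be made concrete with a small sketch: the action "a tenant paid" is stored as a row, and the non-payment notice is a computation over the stored rows. The `Tenants` and `Payments` tables, the sample dates, and the `unpaid_tenants` helper are all hypothetical names chosen for illustration; SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Tenants  (tid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Payments (tid INTEGER REFERENCES Tenants(tid),
                           amount REAL,
                           paid_on TEXT);  -- ISO date string, e.g. 2013-09-01
""")
conn.executemany("INSERT INTO Tenants VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])

# The action "Ann paid her rent" is captured as a data insertion.
conn.execute("INSERT INTO Payments VALUES (1, 800.0, '2013-09-01')")

def unpaid_tenants(conn, period_start, due_date):
    """Names of tenants with no payment recorded in [period_start, due_date]."""
    rows = conn.execute("""
        SELECT t.name FROM Tenants t
        WHERE NOT EXISTS (SELECT 1 FROM Payments p
                          WHERE p.tid = t.tid
                            AND p.paid_on BETWEEN ? AND ?)
        ORDER BY t.name
    """, (period_start, due_date)).fetchall()
    return [name for (name,) in rows]

# Computation over captured data: who should receive a non-payment notice?
notices = unpaid_tenants(conn, "2013-09-01", "2013-09-05")
```

Storing the date as an ISO string keeps the comparison simple, since such strings sort in date order; a production schema would likely use the DBMS's own date type.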

In this course
We focus on DBMS architecture and its components:
    Storage and indexing
    Transaction management and recovery
    Query optimization
    Buffer management
Projects will be to implement some of the above components in a realistic setting.
We will also look at new approaches, such as:
    Map/Reduce
    Non-relational or NoSQL DBMSs

What Is a DBMS?
A database is a very large, integrated collection of data.
It models a real-world enterprise:
    Entities, e.g., students, courses, buildings, websites
    Relationships, e.g., Billy Joel performs at UTA, students take courses, lenders issue mortgages, an employee manages a department, students submit projects
A Database Management System (DBMS) is a software package designed to store and manage databases.

Why Use a DBMS?
Data independence and efficient access.
Reduced application development time.
Data integrity and security.
Uniform data administration.
Concurrent access, recovery from crashes.
Persistence, scalability, portability.

Why Study Databases?
Shift from computation to information management
    at the low end: ad hoc data, spreadsheets
    at the high end: scientific data management
Datasets are increasing in diversity and volume.
    Digital libraries, interactive video, Human Genome Project, EOS project, multimedia DBMSs, ...
    ... the need for DBMSs is exploding
DBMS study encompasses all aspects of CS: OS, languages, theory, AI, multimedia, logic.

Evolution of DBMSs
[Figure, adapted from Peter Lyngbaek, OOPSLA 1991: layer stacks by era, listed top to bottom]
    Pre-1970s: Application, Modeling, Semantics, Program, Tx Mgmt, File I/O
    Non-relational DBMS (1970): Application, Modeling, Semantics, Program, Tx Mgmt, File I/O
    Relational DBMS (1980): Application, Modeling, Semantics, Query Opt, Tx Mgmt, File I/O
    OODBMS (1990): Application, Modeling (static/dynamic), Query Opt, Tx Mgmt, File I/O

Moving On
[Figure: layer stacks by era, listed top to bottom]
    DBMS [2000]: Application, Mining, XML support, Workflow support, Modeling (static/dynamic), Query Opt, Tx Mgmt, File I/O
    DSMS [2009]: Application, Stream Processing, Mining, XML support, Workflow support, Modeling (static/dynamic), Query Opt, Tx Mgmt, File I/O
    Non-DBMS Applications [2013]

Impedance mismatch
If you are studying DBMSs, you have to understand this clearly.
The CPU is far faster than disk access; in fact, it is a million or more times faster.
    CPU accesses take nanoseconds, while disk accesses still take milliseconds!
Because of this speed difference, it takes much longer to bring data in from a disk.
So the CPU is waiting or idling most of the time for data (if the system is not architected properly)!
This is called impedance mismatch!

Performance Issues
There are 2 primary sources of performance problems:
    Improving an existing abstraction or adding new functionality
    Increase in data volume without any other changes
In the absence of these two, performance would keep improving; in practice, improvements are offset by the increase in data or by the additional software needed to support the abstraction.
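The nanosecond-versus-millisecond gap described under "Impedance mismatch" can be worked through as simple arithmetic. The latencies below are assumed round numbers taken from the slide's orders of magnitude, not measurements, and the utilization model (perfectly overlapping disk accesses) is a deliberate idealization.

```python
# Assumed order-of-magnitude latencies (illustrative round numbers, not measurements).
CPU_OP_SECONDS = 1e-9       # one CPU operation: about a nanosecond
DISK_ACCESS_SECONDS = 1e-3  # one disk access: about a millisecond

def cpu_ops_per_disk_access():
    """CPU operations that fit into the time of one disk access."""
    return round(DISK_ACCESS_SECONDS / CPU_OP_SECONDS)

def cpu_utilization(ops_per_access, concurrent_programs=1):
    """Fraction of time the CPU is busy when each program alternates between
    `ops_per_access` CPU operations and one disk access.

    Assumes the disk accesses of different programs overlap perfectly,
    which real systems only approximate."""
    busy = ops_per_access * CPU_OP_SECONDS
    cycle = busy + DISK_ACCESS_SECONDS
    return min(1.0, concurrent_programs * busy / cycle)

ratio = cpu_ops_per_disk_access()                         # the "million times" figure
alone = cpu_utilization(1000)                             # one program: CPU mostly idle
shared = cpu_utilization(1000, concurrent_programs=100)   # many programs interleaved
```

Under these assumptions a single program that does 1,000 operations per disk access keeps the CPU busy roughly 0.1% of the time, which is why a DBMS interleaves many programs and works hard to avoid disk accesses in the first place.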

Solutions
There are several solutions:
    Doubling of CPU speed roughly every 18 months (the popular reading of Moore's Law, which strictly concerns transistor counts doubling about every two years)
        Have you heard of neuromorphic computing?
    Increase in disk I/O throughput
    Use of multiple processors (Map/Reduce leverages this on a large scale)
The above CANNOT completely address the problems mentioned earlier.
New algorithms, techniques, and optimizations are critical (e.g., data structures, indexes, concurrent processing, ...).
The introduction of SSDs (solid-state disks) is warranting a re-examination of some of the above issues.
    No impedance mismatch!!

My Thoughts
Improving functionality/abstraction is a natural evolution (not only in databases but in all aspects of life).
However, this has a critical impact on performance.
Performance has to keep up with the increased burden in novel ways (e.g., scheduling and load shedding, as witnessed in stream processing).
There is an adjustment period for the performance to catch up, and it always does.

What can be done
We seem to bundle too much into our systems and make it an all-or-nothing situation!
We need to unbundle and provide mechanisms to configure systems based on application needs.
    For example, concurrency control (CC) and recovery are not needed for a number of newer applications (e.g., stream processing, mining).
    Similarly, novel storage representations with appropriate search and efficient access can be useful for other applications.
We cannot, and should not, wait for CPU and hardware speedups to address our performance issues.
This is happening now with Map/Reduce and NoSQL systems.

Data Models
A data model is a collection of concepts for describing data.
A schema is a description of a particular collection of data, using a given data model.
The relational model of data is the most widely used model today.
    Main concept: relation, basically a table with rows and columns.
    Every relation has a schema, which describes the columns, or fields.

Levels of Abstraction
Many views, a single conceptual (logical) schema, and a physical schema.
    Views describe how users see the data.
    The conceptual schema defines the logical structure.
    The physical schema describes the files and indexes used.
[Figure: View 1, View 2, and View 3 sit above the Conceptual Schema, which sits above the Physical Schema]
Schemas are defined using DDL; data is modified/queried using DML.

Example: University Database
Conceptual schema:
    Students(sid: string, name: string, login: string, age: integer, gpa: real)
    Courses(cid: string, cname: string, credits: integer)
    Enrolled(sid: string, cid: string, grade: string)
Physical schema:
    Relations stored as unordered files.
    Index on first column of Students.
External schema (view):
    Course_info(cid: string, enrollment: integer)

Data Independence
Applications are insulated from how data is structured and stored.
Logical data independence: protection from changes in the logical structure of data.
Physical data independence: protection from changes in the physical structure of data.
One of the most important benefits of using a DBMS!

DBMS Components
Architecture: typically client/server
DDL, DML parsing, processing
Physical DB design (files, indexing, access methods)
Query processing, optimization
Buffer management
Concurrency control
Recovery
Utilities: backup, archiving, exporting, OLAP, report generator, physical design wizards, access to multiple DBMSs, designer tools, tuning of parameters, etc.
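To make the University Database example and the idea of physical data independence concrete, here is a minimal sketch using SQLite through Python's `sqlite3` module. The three relations and the `Course_info` view follow the schemas on the slide; the index name, the course ids, and the definition of `enrollment` as a count of `Enrolled` rows are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Conceptual schema (from the slide)
    CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, age INTEGER, gpa REAL);
    CREATE TABLE Courses  (cid TEXT, cname TEXT, credits INTEGER);
    CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT);

    -- Physical schema: an index on the first column of Students
    CREATE INDEX students_by_sid ON Students(sid);

    -- External schema: a view (enrollment is assumed to be a row count)
    CREATE VIEW Course_info AS
        SELECT cid, COUNT(*) AS enrollment FROM Enrolled GROUP BY cid;
""")
conn.executemany("INSERT INTO Enrolled VALUES (?, ?, ?)",
                 [("53666", "CS101", "A"), ("53688", "CS101", "B"),
                  ("53666", "CS202", "A")])  # hypothetical course ids

def course_info(conn):
    """What a user of the view sees."""
    return conn.execute(
        "SELECT cid, enrollment FROM Course_info ORDER BY cid").fetchall()

before = course_info(conn)

# Change the physical schema; queries against the view are unaffected.
conn.execute("DROP INDEX students_by_sid")
after = course_info(conn)
```

Dropping the index may change how fast some queries run, but not what they return, which is the insulation that physical data independence promises.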

DBMS Architecture
[Figure: DBMS architecture, components listed top to bottom]
    SQL commands from applications
    Query processing: query optimization and execution (parser, optimizer, plan executor)
    Files and access methods
    Buffer management
    Disk space management
    DB: system catalog, data files, index files
    Alongside these layers: transaction manager, lock manager (concurrency control), recovery manager, log

Need an expressive query language over a simple data model
Non-procedural query language
Burden of optimization on the system (not on the user, as in hierarchical and network databases)
Query representation should not affect query optimization
Need for report generation

Concurrency Control
Concurrent execution of user programs is essential for good DBMS performance.
    Because disk accesses are frequent and relatively slow, it is important to keep the CPU humming by working on several user programs concurrently.
Interleaving actions of different user programs can lead to inconsistency: e.g., a check is cleared while the account balance is being computed.
The DBMS ensures such problems do not arise: users can pretend they are using a single-user system.

Transaction: An Execution of a DB Program
The key concept is the transaction, which is an atomic (either all or nothing) sequence of database actions (reads/writes).
Each transaction, executed completely, must leave the DB in a consistent state if the DB is consistent when the transaction begins.
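The all-or-nothing property of a transaction can be demonstrated with a small sketch. It uses SQLite through Python's `sqlite3` module, where `with conn:` commits on success and rolls back if an exception escapes; the `Accounts` table, the `CHECK` constraint, and the `transfer` helper are hypothetical and exist only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Accounts
                (name TEXT PRIMARY KEY,
                 balance INTEGER CHECK (balance >= 0))""")
conn.executemany("INSERT INTO Accounts VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction; return True on success."""
    try:
        with conn:  # commit if the block succeeds, roll back if it raises
            # Credit first, so a failed debit leaves a partial change to undo.
            conn.execute("UPDATE Accounts SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
            conn.execute("UPDATE Accounts SET balance = balance - ? WHERE name = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK constraint rejected an overdraft

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM Accounts"))

ok = transfer(conn, "A", "B", 30)        # succeeds: both updates are applied
rejected = transfer(conn, "A", "B", 500) # fails: the credit to B is rolled back
final = balances(conn)
```

The second transfer credits B before the debit fails, yet the final balances show no trace of that credit: the database moves from one consistent state to another or does not move at all.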

The Log
The following actions are recorded in the log:
    Ti writes an object: the old value and the new value.
        The log record must go to disk before the changed page!
    Ti commits/aborts: a log record indicating this action.
Log records are chained together by Xact id, so it's easy to undo a specific Xact (e.g., to resolve a deadlock).
The log is often archived on stable storage.
All log-related activities (and in fact, all CC-related activities such as lock/unlock, dealing with deadlocks, etc.) are handled transparently by the DBMS.

Databases make these folks happy...
End users and DBMS vendors
DB application programmers
    E.g., smart webmasters
Database administrator (DBA)
    Designs logical/physical schemas
    Handles security and authorization
    Data availability, crash recovery
    Database tuning as needs evolve
Must understand how a DBMS works!

Why Study the Relational Model?
Most widely used model.
    Vendors: IBM (Informix), Microsoft, Oracle, Sybase, etc.
Legacy systems in older models
    E.g., IBM's IMS
    Network models are not used today
Competitors
    Earlier competitor: object-oriented model
        ObjectStore, Versant, Ontos
    A synthesis: object-relational model
        Informix Universal Server, UniSQL, O2, Oracle
    More recently, NoSQL systems
        Based on different data models: key-value store, document store, graph
        Tradeoffs in functionality: ACID vs. CAP (or BASE)

More recent competitor: Map/Reduce, NoSQL
Works on unstructured data
Useful for one-time processing as opposed to data management
Overhead is low; however, the programming level is high
Less expensive, use of commodity machines!!
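Returning to the log slide at the start of this part: the rule "record the old and new value before changing the object" is what makes undoing a specific transaction possible. The toy in-memory store below sketches that idea only. `TinyLoggedStore` and its methods are invented for illustration, nothing is written to disk, and undo filters the log by transaction id rather than following a per-transaction chain as a real DBMS would.

```python
class TinyLoggedStore:
    """Toy key-value store with an undo log (illustrative, in memory only)."""

    def __init__(self):
        self.data = {}
        self.log = []  # records: (xact_id, kind, key, old_value, new_value)

    def write(self, xid, key, new):
        old = self.data.get(key)  # None means the key did not exist yet
        # Log record first, then the change (the ordering a real log enforces on disk).
        self.log.append((xid, "write", key, old, new))
        self.data[key] = new

    def commit(self, xid):
        self.log.append((xid, "commit", None, None, None))

    def abort(self, xid):
        # Undo this transaction's writes, newest first, using the old values.
        for rec_xid, kind, key, old, _new in reversed(self.log):
            if rec_xid == xid and kind == "write":
                if old is None:
                    self.data.pop(key, None)
                else:
                    self.data[key] = old
        self.log.append((xid, "abort", None, None, None))

store = TinyLoggedStore()
store.write("T1", "x", 1)
store.commit("T1")
store.write("T2", "x", 2)   # T2 overwrites x ...
store.write("T2", "y", 9)   # ... and creates y
store.abort("T2")           # both changes are undone; T1's write survives
```

Because `None` marks "no previous value", this sketch cannot store `None` itself; it also ignores concurrency and crash recovery, which the DBMS handles transparently.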

Relational Database: Definitions
Relational database: a set of relations.
Relation: made up of 2 parts:
    Instance: a table, with rows and columns. #rows = cardinality, #fields = degree / arity.
    Schema: specifies the name of the relation, plus the name and type of each column.
        E.g., Students(sid: string, name: string, login: string, age: integer, gpa: real)
Can think of a relation as a set of rows or tuples (i.e., all rows are distinct).

Example Instance of Students Relation
    sid    name   login       age  gpa
    53666  Jones  jones@cs    18   3.4
    53688  Smith  smith@eecs  18   3.2
    53650  Smith  smith@math  19   3.8
Cardinality = 3, degree = 5, all rows distinct.
Do all columns in a relation instance have to be distinct?

Creating Relations in SQL
Creates the Students relation. Observe that the type (domain) of each field is specified, and enforced by the DBMS whenever tuples are added or modified.
    CREATE TABLE Students
        (sid CHAR(20), name CHAR(20), login CHAR(10), age INTEGER, gpa REAL)
As another example, the Enrolled table holds information about courses that students take.
    CREATE TABLE Enrolled
        (sid CHAR(20), cid CHAR(20), grade CHAR(2))

Adding and Deleting Tuples
Can insert a single tuple using:
    INSERT INTO Students (sid, name, login, age, gpa)
    VALUES ('53688', 'Smith', 'smith@ee', 18, 3.2)
Can delete all tuples satisfying some condition (e.g., name = 'Smith'):
    DELETE FROM Students S
    WHERE S.name = 'Smith'
Powerful variants of these commands are available; more later!
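The definitions and commands above can be tried end to end with a short sketch. It runs the slide's `CREATE TABLE`, `INSERT`, and `DELETE` statements against SQLite through Python's `sqlite3` module, which is an assumption of convenience: SQLite accepts `CHAR(20)` but does not enforce the length or strict column types the way the slide describes, and the table alias is dropped from the `DELETE` for portability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Students
                (sid CHAR(20), name CHAR(20), login CHAR(10),
                 age INTEGER, gpa REAL)""")

# The example instance of the Students relation from the slide.
rows = [("53666", "Jones", "jones@cs", 18, 3.4),
        ("53688", "Smith", "smith@eecs", 18, 3.2),
        ("53650", "Smith", "smith@math", 19, 3.8)]
conn.executemany(
    "INSERT INTO Students (sid, name, login, age, gpa) VALUES (?, ?, ?, ?, ?)",
    rows)

cur = conn.execute("SELECT * FROM Students")
degree = len(cur.description)     # number of fields (arity)
cardinality = len(cur.fetchall()) # number of rows

# Delete all tuples satisfying a condition.
conn.execute("DELETE FROM Students WHERE name = 'Smith'")
remaining = conn.execute("SELECT sid FROM Students").fetchall()
```

Note that both Smith tuples are deleted by the single statement: `DELETE` operates on every tuple that satisfies the condition, not on one row at a time.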

Integrity Constraints (ICs)
IC: a condition that must be true for any instance of the database; e.g., domain constraints.
    ICs are specified when the schema is defined.
    ICs are checked when relations are modified.
A legal instance of a relation is one that satisfies all specified ICs.
    The DBMS should not allow illegal instances.
If the DBMS checks ICs, stored data is more faithful to real-world meaning.
    Avoids data entry errors, too!

Relational Query Languages
A major strength of the relational model: it supports simple, powerful querying of data.
Queries can be written intuitively, and the DBMS is responsible for efficient evaluation.
    The key: precise semantics for relational queries.
    This allows the optimizer to extensively re-order operations and still ensure that the answer does not change.

The SQL Query Language
The most widely used relational query language. The standard is revised periodically; SQL:2011 was the current revision when these slides were written.
To find all 18-year-old students, we can write:
    SELECT *
    FROM Students S
    WHERE S.age = 18
Result:
    sid    name   login     age  gpa
    53666  Jones  jones@cs  18   3.4
    53688  Smith  smith@ee  18   3.2
To find just names and logins, replace the first line:
    SELECT S.name, S.login

Relational Model: Summary
A tabular representation of data.
Simple and intuitive, currently the most widely used.
Integrity constraints can be specified by the DBA, based on application semantics. The DBMS checks for violations.
    Two important ICs: primary and foreign keys.
    In addition, we always have domain constraints.
Powerful and natural query languages exist.
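The two ideas in this part, integrity constraints checked on modification and declarative querying, can be sketched together. The example assumes SQLite through Python's `sqlite3` module; the primary-key and foreign-key declarations, the `violates` helper, and the course id are additions for illustration, and SQLite checks foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable FK checking
conn.executescript("""
    CREATE TABLE Students (sid TEXT PRIMARY KEY, name TEXT, login TEXT,
                           age INTEGER, gpa REAL);
    CREATE TABLE Enrolled (sid TEXT REFERENCES Students(sid),
                           cid TEXT, grade TEXT,
                           PRIMARY KEY (sid, cid));
    INSERT INTO Students VALUES ('53666', 'Jones', 'jones@cs',   18, 3.4);
    INSERT INTO Students VALUES ('53688', 'Smith', 'smith@ee',   18, 3.2);
    INSERT INTO Students VALUES ('53650', 'Smith', 'smith@math', 19, 3.8);
""")

def violates(conn, sql, params):
    """Return True if the DBMS rejects the statement as an IC violation."""
    try:
        with conn:
            conn.execute(sql, params)
        return False
    except sqlite3.IntegrityError:
        return True

# Primary key: a second student with an existing sid is an illegal instance.
duplicate_sid = violates(conn, "INSERT INTO Students VALUES (?, ?, ?, ?, ?)",
                         ("53666", "Other", "other@cs", 20, 3.0))
# Foreign key: an enrollment must refer to an existing student.
orphan_enrollment = violates(conn, "INSERT INTO Enrolled VALUES (?, ?, ?)",
                             ("99999", "CS101", "A"))  # hypothetical course id

# The query from the slide, restricted to names and logins.
eighteen = conn.execute("""SELECT S.name, S.login
                           FROM Students S
                           WHERE S.age = 18
                           ORDER BY S.sid""").fetchall()
```

The query states what is wanted (18-year-old students) and leaves the how to the DBMS; the `ORDER BY` is added only so the result order is predictable in this sketch.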

Relational Model: Summary (contd.)
ACID properties are important.
    Atomicity, consistency, isolation, and durability
    Concurrency control and recovery together guarantee the ACID properties.

Other data models
The older hierarchical and network models are not used much (although they are still in use).
However, several non-relational data models have been proposed and are currently used in specific contexts.
They are called NoSQL DBMSs:
    Key-value store
    Document store
    Graph

Thank You!
Sharma Chakravarthy