Cleveland State University

CIS 611/711 Enterprise Databases and Data Warehouse (3-0-3)

Prerequisites: CIS 430/CIS 530
Instructor: Dr. Sunnie S. Chung
Office Location: FH222
Phone: 216 687 4661
Email: sschung.cis@gmail.com, s.chung@csuohio.edu
Webpage: http://eecs.csuohio.edu/~sschung
Office Time: Mon 4:00-6:00 PM, Tues 2:00-4:00 PM (and by appointment)
Class Location: MC 0305

Catalog Description: Detailed study of modern enterprise-level database systems and their applications for decision support and data analytics systems. The course presents theoretical and practical approaches to logical and physical database system design with normalization theory, elimination of update anomalies, lossless joins, and dependency-preserving decompositions. The course focuses on query processing, query execution techniques, and query optimization strategies of modern relational database systems. It extends the study to the design and implementation of applications of modern enterprise database systems for data analytics with Parallel Data Warehouse (PDW) and OLAP. It continues with an exploration of integrated big data management systems with PDW and non-relational database systems in current enterprise database systems. Finally, the course presents current database research topics and selected papers.

Key Concepts: Modern enterprise databases, logical database design theory, normalization theory, functional dependencies, elimination of update anomalies, lossless joins and dependency-preserving decompositions, physical database system, database file system, index, query processing and execution techniques, analysis of query processing cost, query optimization and performance analysis, query rewrite optimization, data warehouse, OLAP, decision support system, data analytics system, architecture of Parallel Data Warehouse (PDW), integrated big data management systems.
Expected Outcomes: Upon successful completion of the course, the student will be able to:
- Create, or reengineer, well-designed relational databases that are mostly free from redundancy and update anomalies;
- Design a physical database with comprehensive knowledge of performance optimization and indexing;
- Develop a query optimizer with query processing strategies and execution techniques for practical database problems;
- Understand complex query optimization strategies of enterprise-level modern database systems and develop new optimization strategies to solve practical database problems;
- Analyze database system performance;
- Develop enterprise-level database applications with Parallel Data Warehouse and OLAP for practical data analytics problems;
- Discuss recent database research, in particular integrated big data management systems with Parallel Data Warehouse, columnar databases, and Hadoop-based NoSQL systems.

List of Required Materials:
1. Visual Studio 2013 or higher
2. SQL Server 2014 or higher
3. Microsoft SQL Server Data Tools (for Analysis Service) - Business Intelligence (SSDT BI) for Visual Studio 2014 or higher
4. Adventure Works (a Sample Data Warehouse) for SQL Server 2012/2014 or higher

The required materials are available through the Microsoft Academic Alliance program: http://e5.onthehub.com/webstore/productsbymajorversionlist.aspx?ws=31b9929b-c09b-e011-969d-0030487d8897

Text:
1. Selections from the database research literature
2. "Fundamentals of Database Systems". Elmasri / Navathe. 7th Edition. Addison/Wesley Pub Co. ISBN-13: 978-0133970777, ISBN-10: 0133970779
3. "Database Management Systems", 3rd Edition, by Raghu Ramakrishnan and Johannes Gehrke. McGraw-Hill. Available at http://pages.cs.wisc.edu/~dbbook/openaccess/thirdedition/supporting_material.htm
4. "Data Mining: Concepts and Techniques". Jiawei Han / Micheline Kamber. 3rd Edition. Morgan Kaufmann Publishers (2011). ISBN-13: 978-0123814791, ISBN-10: 0123814790

Supplement Text:
1. "The Theory of Relational Databases", D. Maier. Comp. Sc. Press, 1983. ISBN 0-914894-42-0. We have obtained the author's permission to use electronic copies of his book. A PDF version of the book is included in the ACM SIGMOD Anthology Vol. 4 Issue 1, Fall 2000 (http://www.sigmod.org/sigmod/dblp/db/about/cd4-1.html). The book is also available from Dr. J. Freytag's web site (Humboldt Universität, Berlin) at http://www.dbis.informatik.hu-berlin.de/~freytag/maier

Class Web Page: http://eecs.csuohio.edu/~sschung/cis611/cis611.html

Official Calendar: Please consult the university page at http://www.csuohio.edu/enrollmentservices/registrar/calendar/index.html

Final exam: Tues Dec 12, 4:00-6:00 PM.

Grading: The course grade is based on a student's overall performance through the entire semester. The final grade is distributed among the following components:
1. Exams (Midterm & Final): 40% (15% for Midterm, 25% for Final)
2. Computer Labs: 30% (about 3-4 assignments)
3. One Project: 20% (2-person group project). Detailed project specifications will be given in class.
4. Research Topic Presentation: 10%
I reserve the right to change the weighting and the number of assignments.
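The weighting above can be checked with a short calculation. The following is only an illustrative sketch, not an official grading tool: the names `WEIGHTS` and `weighted_total` and the sample scores are hypothetical, each component is assumed to be reported as a percentage from 0 to 100, and any curve or change of weights by the instructor is ignored.

```python
# Component weights copied from the grading breakdown in this syllabus.
# The instructor reserves the right to change them, so treat these as assumptions.
WEIGHTS = {
    "midterm": 0.15,
    "final": 0.25,
    "labs": 0.30,
    "project": 0.20,
    "presentation": 0.10,
}


def weighted_total(scores: dict) -> float:
    """Combine per-component percentages (0-100) into one course percentage."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    example = {
        "midterm": 80,
        "final": 90,
        "labs": 95,
        "project": 85,
        "presentation": 100,
    }
    print(f"{weighted_total(example):.1f}")  # prints 90.0
```

Because the weights sum to 100%, a student who scores the same percentage on every component receives that percentage as the course total.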
Additional Requirements for CIS 711 Doctoral Students:
- Doctoral students who take CIS 711 must select a project to work on.
- Doctoral students who take CIS 711 must work on the project individually (instead of in a 2-person group).
- The list of projects and research papers for doctoral students will be given separately in class. A tentative example of the selection of research projects and the paper list is given at the end of the course schedule here.
- In each exam, one additional problem is designed to be completed by doctoral students only.

The following grading scale will be used to calculate final grades (subject to curving if class grades on exams are substantially below expected):

Grade   Range          Description
A       93% +          Outstanding (student's performance is genuinely excellent)
A-      90% - 92.9%
B+      87% - 89.9%
B       80% - 86.9%    Student's performance satisfies every course requirement and is acceptable, but not necessarily distinguished
B-      78% - 79.9%
C       70% - 77.9%    Student's performance does not satisfy every course requirement and is not acceptable to pass
F       <70%           Failure

Grading may vary depending on the overall class average, with curve and weight. For exams, most of the questions will be problem-solving, analytical, and descriptive problems.

Student Conduct: Students are expected to do their own work. Academic misconduct, student misconduct, cheating, and plagiarism will not be tolerated. Violations will be subject to disciplinary action as specified in the CSU Student Conduct Code. A copy can be obtained on the web page at http://www.csuohio.edu/studentlife/studentcodeofconduct.pdf or by contacting Valerie Hinton Hannah, Judicial Affairs Officer in the Department of Student Life (MC 106, email v.hintonhannah@csuohio.edu). For more information, consult the CSU Judicial Affairs web page at http://www.csuohio.edu/studentlife/judicial-affairs

Tentative Course Schedule: The schedule of topics to be covered is given below. The schedule and topics are tentative and may vary depending upon the progress made.

Weeks 1-2
Topics:
- Architecture of Modern Enterprise Database Systems
- Advanced SQL
- Overview of SQL Query Processing. View Processing
- The relational model of data. Attributes and Atomic Domains, Key and Referential Integrity rules
- Relational Algebra, Relational Tuple Calculus
- Query Execution Steps
Reading: The Design and Implementation of INGRES. Stonebraker et al. ACM Transactions on Database Systems, Vol. 1, No. 3, September 1976, pages 189-222.

Weeks 3-4
Topics:
- Database Design: Normalization Theory
- Database and Database Schemes
- Functional Dependencies and Normalization for Relational Databases
- Redundancy and Abnormal Behavior
- Normal Forms: 1NF, 2NF, 3NF, BCNF
- Attribute Closure, XClosure Algorithm, Covers for Functional Dependencies, Non-redundant Covers
- Inference Rules
- Lossless-Join Decompositions
- Decompositions that Preserve Dependencies
Reading: Elmasri Ch. 1-9; Elmasri Ch. 15; Maier Ch. 4

Weeks 5-6
Topics:
- File Structure, Disk Storage
- Disk Access
- Fundamental Index: Primary, Secondary, Clustered, Multi-Level Index
- B/B+ Tree

Weeks 7-8
Topics:
- External Sorting
- External Hashing Techniques
- Access Path

Weeks 9-10
Topics:
- Query Processing Techniques
- Query Execution Steps
- Evaluation of Relational Operators: Projection, Join Types, Group-By, Aggregation
- Join Algorithms and Analysis
- SQL Query Processing Cost
- Query Optimization Concept and Techniques
Reading: Elmasri Ch. 15, 16; Maier Ch. 5, 6; Elmasri Ch. 17, 18; Ramakrishnan Ch. 4, 7; Ramakrishnan Ch. 11, 12, 14

Weeks 11-12
Topics: Advanced Query Optimization techniques for Complex Queries with:
- Advanced Join Types
- Correlated Subquery Processing with/without Aggregation
- Aggregation with Group By (Partitioned Group By)
- Query Rewrites
Reading: Elmasri Ch. 19, 20, 21; Ramakrishnan Ch. 11, 12; and the following papers:
- P. Seshadri, H. Pirahesh, and T. Leung. Complex Query Decorrelation. ICDE, pages 450-458, 1996.
- C. Zuzarte, H. Pirahesh, W. Ma, Q. Cheng, L. Liu, K. Wong. WinMagic: Subquery Elimination using Window Aggregation. SIGMOD, pages 652-656, 2003.
- C. Galindo-Legaria and M. Joshi. Orthogonal Optimization of Subqueries and Aggregation. SIGMOD, pages 571-581, 2001.
- R. Ahmed, A. Lee, A. Witkowski, D. Das, H. Su, M. Zait, and T. Cruanes. Cost-based Query Transformation in Oracle. VLDB, pages 1026-1036, 2006.
- S. Bellamkonda, R. Ahmed, A. Witkowski, A. Amor, M. Zait, and C. Lin. Enhanced Subquery Optimizations in Oracle. VLDB, pages 1366-1377, 2009.
- [CS94] S. Chaudhuri and K. Shim. Including Group-By in Query Optimization. Proceedings of the 20th VLDB Conference, Santiago, Chile, 1994.
- [CS96] S. Chaudhuri and K. Shim. Optimizing Queries with Aggregate Views. Proceedings of EDBT 1996.
- Design Note for Correlated Subqueries and Exists Predicate in DB2. IBM (DN-2208-02).

Week 13
Topics:
- Data Warehouse and On-Line Analytical Processing (OLAP)
- Multi-Dimensional Data Warehouse Design
- OLAP Aggregation Operators: Cube, Roll Up, Drill Down
- Implementation of Data Warehouse and OLAP
Reading: J. Han Ch. 2, and the listed papers:
- An Overview of Data Warehousing and OLAP Technology, by Surajit Chaudhuri (Microsoft) and Umeshwar Dayal (HP Labs), in the proceedings of IEEE 1995.
- Data Cube: A Relational Aggregation Operator Generalizing Group By, Cross Tab, and SubTotals, by Jim Gray (Microsoft) et al., in the proceedings of IEEE 1996.

Week 14
Topics:
- Data Mining with DW
- Building Data Analytics Applications using DW and OLAP: MDX, DMX
- Integrated Big Data Management Systems

Weeks 15-16
Topics: Presentation of Significant Database Industry Research Papers on Massively Parallel Processing (MPP) Systems, Big Data Processing with PDW, Columnar Databases, Cloud Computing, and more. The list of selected papers will be given in class.

Selected Papers: Tentative Technical Presentation Topics (may vary every year)
Select one and prepare a 25 min talk on the subject.
- Semistructured/Unstructured Data Processing using Structured Data Model
  o Facebook
  o eBay
- Parallel Computing for Big Data Processing: Parallel Data Warehouse (PDW)
- Columnar Databases: SAP HANA
- Hadoop-based NoSQL Systems: Apache Hadoop, Pig Latin, HBase, Hive, MongoDB
  o Lämmel, Ralf. Google's MapReduce Programming Model Revisited.
- Integrated Big Data Processing Systems with PDW and NoSQL Systems
- Information Retrieval: How Google Search Engine Works
- Cloud Computing
Examples of research topics and the paper list are given below. More new topic and paper selections will be given in class.

Tentative List of Research Papers and Projects for CIS 711 Doctoral Students: CIS 711 doctoral students should choose one of the following research topics, give a 30 min presentation on the papers (to be given in class), and complete a project related to the subject. The paper list and project specification for each research topic below will be given in class.

Examples of Selected Current Database Research Topics in Modern Enterprise Database Systems:
1. Integrated Big Data Processing Systems
   - Orca: A Modular Query Optimizer Architecture for Big Data
   - Petabyte Scale Databases and Storage Systems Deployed at Facebook
   - Integrating Hadoop and parallel DBMS
2. Big Data Processing System with Parallel Data Warehouse (PDW)
   - Query Optimization in Microsoft SQL Server Parallel Data Warehouse
3. Information Retrieval System: Semantic Content Based Approaches
   - http://eecs.csuohio.edu/~sschung/ist734/googlebigtable-osdi06.pdf
   - http://eecs.csuohio.edu/~sschung/ist734/mapreduce-osdi04.pdf
4. Enterprise Big Data Processing System with Cloud: Google Cloud, Amazon Cloud, Microsoft Azure
5. Enterprise Big Data Processing System with Parallel Data Warehouse (PDW) on Cloud
   - http://eecs.csuohio.edu/~sschung/ist734/cloudvista_huiqixu_vldb2012.pdf
   - http://eecs.csuohio.edu/~sschung/ist734/azuresqlmicrosoft_sigmod2010-campbell
   - http://eecs.csuohio.edu/~sschung/ist734/azure2_microsoft_ieee2011.pdf
6. Columnar Databases
   - A Storage Advisor for Hybrid Store Databases (SAP)
   - Efficient Transaction Processing in SAP HANA Database: The End of a Column Store Myth (SAP)

Course policy

(1) Exams
1. All exams are closed book and closed notes.
2. No makeup exams will be given!
3. Examination Policy: Students are allowed to bring to the Final a summary page (standard letter size) with their own notes. During the exams: (1) the use of books, cell phones, calculators, or any electronic devices is prohibited, and (2) students must not share any materials.

(2) Homework assignments
1. All homework assignments are due at the beginning of class on the specified date. An assignment turned in one day late will get a 10% penalty, two days late will get a 20% penalty, etc. Assignments turned in after the beginning of class on the due date will be counted as one day late and will receive a 10% penalty.
2. All homework assignments will be accepted with a 25% grade penalty for up to a week and then not accepted at all.
3. All assignments must be individually and independently completed. Should two or more students turn in substantially the same solution or program, in the judgment of the instructor, the solution will be considered a group effort. All involved in a group-effort homework will receive a zero grade for that assignment. A student turning in a group-effort assignment more than once will automatically receive an F grade for the course.
4. No late assignment will be accepted after the assignment is graded and returned.

ADA Adherence: If you need course adaptations or accommodations because of a disability, if you have emergency medical information to share with me, or if you need special arrangements in case the building must be evacuated, please make an appointment with me as soon as possible. My office location and hours are listed at the top of this syllabus. If you need further information, please contact the Office of Disability Services (Main Classroom 147), phone number 216.687.2015, on the web at http://www.csuohio.edu/offices/disability/