Cleveland State University

Similar documents
Cleveland State University

Cleveland State University

Cleveland State University

Big Trend in Business Intelligence: Data Mining over Big Data Web Transaction Data. Fall 2012

CIS 408 Internet Computing (3-0-3)

Cleveland State University

Cleveland State University

Course Syllabus - CNT 4703 Design and Implementation of Computer Communication Networks Fall 2011

Textbook(s) and other required material: Raghu Ramakrishnan & Johannes Gehrke, Database Management Systems, Third edition, McGraw Hill, 2003.

Course and Contact Information. Course Description. Course Objectives

CASPER COLLEGE COURSE SYLLABUS MSFT 1600 Managing Microsoft Exchange Server 2003 Semester/Year: Fall 2007

IS Spring 2018 Database Design, Management and Applications

Meetings This class meets on Mondays from 6:20 PM to 9:05 PM in CIS Room 1034 (in class delivery of instruction).

IS 331-Fall 2017 Database Design, Management and Applications

ITP489 In-Memory DBMS for Real Time Analytics

PELLISSIPPI STATE COMMUNITY COLLEGE MASTER SYLLABUS ADVANCED DATABASE MANAGEMENT SYSTEMS CSIT 2550

CIS 601 Graduate Seminar in Computer Science Sunnie S. Chung

CSC 261/461 Database Systems. Fall 2017 MW 12:30 pm 1:45 pm CSB 601

Course and Contact Information. Course Description. Course Objectives

When does RDBMS representation make sense When do other representations make sense. Prerequisites: CS 450/550 Database Concepts

Murach's HTML and CSS3 3 rd Edition By Boehm, Anne Fresno, Calif Publisher: Mike Murach & Associates, 2015 ISBN-13:

San José State University Computer Science Department CS157A: Introduction to Database Management Systems Sections 5 and 6, Fall 2015

BIG DATA COURSE CONTENT

Course and Contact Information. Catalog Description. Course Objectives

Fundamentals of Database Systems

Database Systems: Concepts, design, and implementation ISE 382 (3 Units)

INF 551: Overview of Data Informatics in Large Data Environments Section: 32405D Spring 2017 (4 units), 3:30 5:20 PM, MW, SOS B44

Advanced Relational Database Management MISM Course F A Fall 2017 Carnegie Mellon University

Syllabus Revised 08/21/17

San José State University College of Science / Department of Computer Science Introduction to Database Management Systems, CS157A-3-4, Fall 2017

Microsoft Big Data and Hadoop

Web Programming Fall 2011

CS157a Fall 2018 Sec3 Home Page/Syllabus

LSC 740 Database Management Fall 2008

Introduction to Data Management CSE 344. Lecture 1: Introduction

TEACHING & ASSESSMENT PLAN

Database Design and Management - BADM 352 Fall 2009 Syllabus and Schedule

02 Hr/week. Theory Marks. Internal assessment. Avg. of 2 Tests

Murach's HTML and CSS3 3 rd Edition By Boehm, Anne Fresno, Calif Publisher: Mike Murach & Associates, 2015 ISBN-13:

Database Management Systems CS Spring 2017

Fall Principles of Knowledge Discovery in Databases. University of Alberta

TBD TA Office hours: Will be posted on elearning. SLO3: Students will demonstrate competency in data modeling, including dimensional modeling.

15-415: Database Applications School of Computer Science Carnegie Mellon University, Qatar Fall 2016

Advanced Relational Database Management MISM Course S A3 Spring 2019 Carnegie Mellon University

IT 341 Fall 2017 Syllabus. Department of Information Sciences and Technology Volgenau School of Engineering George Mason University

SCHEME OF TEACHING AND EXAMINATION B.E. (ISE) VIII SEMESTER (ACADEMIC YEAR )

Specific Objectives Contents Teaching Hours 4 the basic concepts 1.1 Concepts of Relational Databases

Database Systems (INFR10070) Dr Paolo Guagliardo. University of Edinburgh. Fall 2016

CSC 407 Database System I COURSE PARTICULARS COURSE INSTRUCTORS COURSE DESCRIPTION

Avi Silberschatz, Henry F. Korth, S. Sudarshan, Database System Concept, McGraw- Hill, ISBN , 6th edition.

ITSC 1319 INTERNET/WEB PAGE DEVELOPMENT SYLLABUS

CMPUT 391 Database Management Systems. Fall Semester 2006, Section A1, Dr. Jörg Sander. Introduction

Mastering Data Warehouse Aggregates Solutions For Star Schema Performance

Your New App. Motivation. Data Management is Universal. Staff. Introduction to Data Management (Database Systems) CSE 414. Lecture 1: Introduction

CS 525 Advanced Database Organization - Spring 2017 Mon + Wed 1:50-3:05 PM, Room: Stuart Building 111

INF 315E Introduction to Databases School of Information Fall 2015

CPS352 Database Systems Syllabus Fall 2012

NEW YORK CITY COLLEGE OF TECHNOLOGY COMPUTER SYSTEMS TECHNOLOGY DEPARTMENT CST4714 DATABASE ADMINISTRATION (2 class hours, 2 lab hours, 3 credits)

Division of Engineering, Computer Programming, and Technology

Syllabus Revised 08/15/2018

CMPT 354: Database System I. Lecture 1. Course Introduction

Stages of Data Processing

CS 4604: Introduction to Database Management Systems. B. Aditya Prakash Lecture #1: Introduction

INST Database Design and Modeling - Section 0101 Spring Tentative Syllabus

TITLE OF COURSE SYLLABUS, SEMESTER, YEAR

Data Warehousing and Data Mining. Announcements (December 1) Data integration. CPS 116 Introduction to Database Systems

Philadelphia University Faculty of Information Technology Department of Computer Science --- Semester, 2007/2008. Course Syllabus

CMSC Introduction to Database Systems

Course specification

CISN 340 Data Communication and Networking Fundamentals Fall 2012 (Hybrid)

San Jose State University College of Science Department of Computer Science CS185C, Introduction to NoSQL databases, Spring 2017

San Jose State University College of Science Department of Computer Science CS185C, NoSQL Database Systems, Section 1, Spring 2018

New Approaches to Big Data Processing and Analytics

Gerlinde Brady Phone: Office Hours: see Web at:

Outline. Databases and DBMS s. Recent Database Applications. Earlier Database Applications. CMPSCI445: Information Systems.

Rochester Institute of Technology Golisano College of Computing and Information Sciences Department of Information Sciences and Technologies

Course: Database Management Systems. Lê Thị Bảo Thu

Arindrajit Roy; Office hours:

San José State University Science/Computer Science Database Management System I

CS 564: DATABASE MANAGEMENT SYSTEMS

CoSci 487 SYLLABUS Introduction to Networks

South Portland, Maine Computer Information Security

CASPER COLLEGE COURSE SYLLABUS BIOL 1000, Introduction to Biology I

GET 433 Course Syllabus Spring 2017

CMPSCI 645 Database Design & Implementation

CPS352 - DATABASE SYSTEMS. Professor: Russell C. Bjork Spring semester, Office: KOSC 242 x4377

DATABASE MANAGEMENT SYSTEMS

GUJARAT TECHNOLOGICAL UNIVERSITY MASTER OF COMPUTER APPLICATIONS (MCA) Semester: IV

CMPS 182: Introduction to Database Management Systems. Instructor: David Martin TA: Avi Kaushik. Syllabus

9/8/2018. Prerequisites. Grading. People & Contact Information. Textbooks. Course Info. CS430/630 Database Management Systems Fall 2018

Computer Science Department

CAS CS 460/660 Introduction to Database Systems. Fall

Fundamentals of Computer Science CSCI 136 Syllabus Fall 2018

Course specification STAFFING OTHER REQUISITES RATIONALE SYNOPSIS. The University of Southern Queensland

ITP454 Enterprise Resource Planning, Design, and Implementation

Cleveland State University Department of Electrical and Computer Engineering. CIS 408: Internet Computing

Introduction to Database S ystems Systems CSE 444 Lecture 1 Introduction CSE Summer

Dealing with Data Especially Big Data

CPSC 5157G Computer Networks

SQT03 Big Data and Hadoop with Azure HDInsight Andrew Brust. Senior Director, Technical Product Marketing and Evangelism

Transcription:

Cleveland State University CIS 612 Modern Database Programming & Big Data Processing (3-0-3) Fall 2014 Section 50 Class Nbr. 2670. Tues, Thur 4:00 5:15 PM Prerequisites: CIS 505 and CIS 530. CIS 611 Preferred. Instructor: Dr. Sunnie S. Chung Office Location: BU346 Phone: 216 687 4732 Email: sschung.cis@gmail.com Webpage: http://grail.csuohio.edu/~sschung Office Time: Tues, Thurs 1:30 3:30 PM and 5:15 5:45 PM (or by appointment) Class Location: BU 0207 Section 50 Tue & Thu 4:00-5:15 PM Catalog Description: Detailed study of modern database programming and non-relational database systems for big data processing. First, the course presents practical approaches to modern database programming and its applications. The course advances with study of semi-structured/non-structured databases, XML data processing, XPATH, and XQuery. The course continues study of advanced features of modern databases with data warehouse and OLAP, data mining algorithms and applications. The course extends study with architectures and features of modern parallel computing systems for big data processing Parallel Data Warehouse (PDW) with OLAP and Map Reduce with Hadoop. It continues an exploration of applications and tools of big data processing systems with HIVE, HBase, PigLatin, Mongo DB, VoltDB, and SQL/NoSQL. Finally, the course will explore the most significant database industry trends for big data processing. Key Concepts: Modern Database Programming, Web Data Processing, Semi-structured databases, XML data processing, XPath, XQuery, Unstructured data processing, Google s Big Table, Parallel Data Warehouse (PDW), OLAP Cube, Data Mining Algorithms, Business Analytics, Big Data Processing, Map Reduce, Hadoop, Hive, HBase, PigLatin, ACID, MongoDB, VoltDB, SQL vs NoSQL, and Cloud Computing. Expected Outcomes: Upon successful completion of the course, the student will be able to create modern database applications to process non-traditional data such as web data, semi-structured/unstructured data, and XML data. The student will be able to design data warehouse, create OLAP Cubes and write analytical OLAP queries. The student further advances skills for modern database applications by obtaining data mining concepts and techniques to create data mining applications with data warehouse and OLAP cubes. More importantly, the student will obtain comprehensive knowledge and analytical skills to build an infrastructure for a big data processing system and its applications by understandings the problems of big data processing and its well known approaches in the modern database systems. The student will further extend knowledge with the architectures and the features of modern parallel computing systems for big data processing PDW, Map Reduce and Hadoop. The student will be exposed to major tools to create Map Reduce applications for big data processing systems and further exposed to the most significant database industry research trends for big data processing. List of Required Materials: Any RDBMS: Oracle Database 11g or higher - This is available at http://www.oracle.com/us/downloads/index.html Microsoft SQL Server 2012, Microsoft Visual Studio 2012 or any higher Microsoft SQL Server Business Intelligence Data Analytic Tool 2012 OLAP Server and SQL Server Data Tool They are available at the Microsoft Academic Alliance program: http://e5.onthehub.com/webstore/productsbymajorversionlist.aspx?ws=31b9929b-c09b-e011-969d-0030487d8897

Adventure Works 2012 Data Warehouse Database for SQL Server 2012 Will be directed in class to get this Text Book: 1. Data Mining Concepts and Techniques Jiawei Han / Micheline Kamber, Morgan Kaufmann Publishers, (2001) ISBN 1-55860-489-8 2. List of Selected Database Industry Research Papers on Big Data Processing and Business Analytics will be given in class Supplement Text Book: 1. Data Mining with Microsoft SQL Server BI Data Tool (2008 or 2012), Jamie MacLennan, Bogdan Crivat, Publisher: Wiley; 1 Edition ISBN-10: 0470277742 ISBN-13: 978-0470277744 Edition: 1 2. Database Management Systems 3rd Edition by Raghu Ramakrishnan and Johannes Gehrke. Ed. McGraw-Hill. Available at http://pages.cs.wisc.edu/~dbbook/openaccess/thirdedition/supporting_ material.htm 3. "Fundamentals of Database Systems". Elmasri / Navathe. Ed. Addison/Wesley Pub Co. 6th Edition, (2007). ISBN-10: 0321369572 Official Calendar Please consult the page http://www.csuohio.edu/enrollmentservices/registrar/calendar/index.html Final exam: Tues, Dec 10 4:00-6:00 PM. Term begins Aug 26 First Weekday Class Aug 26 Last Day to Add Sep 1 Last Day to Drop Sep 6 Last Day to Withdraw Nov 1 Last Day of Classes Dec 6 Final Exams Dec 9-14 Commencement Dec 16 Fall Incomplete Deadline May 3 Labor Day (University Holiday) Sep 2 Columbus Day (University Holiday) Oct 14 Veterans Day (Tuesday no classes) Nov 12 Thanksgiving Recess Nov 28 Dec 1 Grading: The course grade is based on a student's overall performance through the entire Semester. The final grade is distributed among the following components: 1. Exams (Exam 1, 2 & Final) 55% (15% each Exam, 25% Final) 2. Computer Projects 35% (about 4-5 lab assignments) 3. Research Topic Presentation: 10% A 94% + A: Outstanding (student's performance is genuinely excellent) A- 90% - 93% B+ 87% - 89% B 80% - 86% B: Very Good (student's performance is clearly commendable but not necessarily outstanding) B- 70% - 79% C <70% C: Good (student's performance meets every course requirement and is acceptable; not distinguished) D F < 60% D: Below Average (student's performance fails to meet course objectives and standards) F: Failure (student's performance is unacceptable)

Examination Policy: Students are allowed to bring to the tests a summary page (standard letter size) with their own notes. During the exams: (1) the use of books, cell phones, calculators, or any electronic devices is prohibited, and (2) students must not share any materials. Make-Up Exam Policy: No makeup exams will be given unless notified and agreed to in advance. Requests will be considered only in case of exceptional demonstrated need. Homework Policy: The students are expected to attend all classes. The students are responsible for collecting the notes, handouts and any other course material distributed during the class period. All assignments must be individually and independently completed and must represent the effort of the student turning in the assignment. Should two or more students turn in substantially the same solution or output, in the judgment of the instructor, the solution will be considered group effort. All involved in group effort homework will receive a zero grade for that assignment. A student turning in a group effort assignment more than once will automatically receive an F grade for the course. Late Assignment: All lab assignments are due at the beginning of class on the date specified. Laboratory Assignments handed in after the class has begun will be accepted with a 25% grade penalty for up to a week and then not accepted at all. All laboratory assignments must be completed. Failure to do so will lower your course grade one additional letter grade. Student Conduct: Students are expected to do their own work. Academic misconduct, student misconduct, cheating and plagiarism will not be tolerated. Violations will be subject to disciplinary action as specified in the CSU Student Conduct Code. A copy can be obtained on the web page at: http://www.csuohio.edu/studentlife/studentcodeofconduct.pdf or by contacting Valerie Hinton Hannah, Judicial Affairs Officer in the Department of Student Life (MC 106 email v.hintonhannah@csuohio.edu ). For more information consult the following web page CSU Judicial Affairs available at http://www.csuohio.edu/studentlife/jaffairs/faq.html Course Schedule: The schedule of topics and their order of coverage is given below. The schedule and topics to be covered may vary depending upon the progress made. Week of Topic Reading 1,2 Review: Database Foundations DBMS Architecture, SQL Query Processing Complex Queries, Advanced Topics in Views Elmasri. Chp. 8,10,11 Lecture Notes 2, 3, 4 Modern Database Programming: Embedded SQL, Dynamic SQL, PHP Stored Procedure, User Defined Function (UDF), User Defined Type (UDT), User Defined Aggregate (UDA), Table Function, CLR, LINQ, Database Triggers 4, 5, 6 Modern Databases: Enhanced Data Models for Advanced Applications Semi Structured and Unstructured Databases: XML Data Processing: - XML Schema, Syntax/Semantics, Protocol - XPath - XQuery Elmasri, Lecture Notes Elmasri 12, 19, 20, 21 Listed papers. Lecture Notes

Introduction to Information Retrieval and Web Data Processing Data Models for Unstructured Data Processing Bigtable: A Distributed Storage System for Structured Data, by Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Google, Inc. in the Proceedings of OSDI 2006 7 Review of Query Processing Techniques and Operators: Join, Projection, Selection, Group By, Aggregations 8, 9 Data Warehouse and OLAP - Decision Support Technology - On Line Analytical Processing - Star Schema - OLAP Aggregation Operators: Data Cube, Roll Up, Drill Down - Building BI Data Analytics with DW and OLAP Ramakrishnan 11, 12 Lecture Notes. Selected Papers J. Han Chap 1,2,3. Listed Papers, Lecture Notes An Overview of Data Warehousing and OLAP Technology by Surajit Chaudhuri (Microsoft) and Umeshwar Dayal (HP Labs), in the proceedings of IEEE 1995 Data Cube: A Relational Aggregation Operator Generalizing Group By, Cross Tab, and SubTotals by Jim Gray (Microsoft), et al, in the proceedings of IEEE 1996 9, 10 Data Mining for Business Intelligence Analytics - Concept & Architecture - Support and Confidence - Association Rules - Data Mining Algorithms: o Associate Rule Mining o APRIORI Algorithm and Optimization o Frequent Pattern Tree o Decision Tree - Lift and Other Correlation Measures J Han Chap 6. Listed Papers, Lecture Notes 11, 12, 13, 14 - Integrating association rule mining with relational database systems: Alternatives and implications by S. Sarawagi, et al (IBM Almaden Research Center) in the proceedings of SIGMOD'98 Big Data Processing and Parallel Computing: Google Map Reduce Tutorial Apache Hadoop File System Pig Latin on Apache Hadoop by Yahoo and Apache Data Warehouse HIVE with Hadoop by Facebook HBase Key Value Stores Map Reduce Join Algorithms Parallel Data Warehouse with OLAP Query Processing: Microsoft Extended PDW with Map Reduce and Hadoop : Oracle, Teradata Columnar Databases : SAP HANA Databases Extended PDW with Columnar Data Processing :Teradata, Microsoft ACID MongoDB VoltDB SQL vs NoSQL Listed papers. Lecture Notes

- MapReduce: Simplified Data Processing on Large Clusters by Jeffrey Dean (Google) and Sanjay Ghemawat (Google) in the proceedings of OSDI 2004 - Apathy Hadoop in White Papers by Apache, Yahoo - Pig Latin: A Not-So-Foreign Language for Data Processing, Christopher Olston, et al. (Yahoo! Research) in the proceedings of SIGMOD 2008 - Data Warehousing and Analytics Infrastructure at Facebook. by Ashish Thusoo, et al. (Facebook) in the proceedings of SIGMOD 2010 - Petabyte Scale Databases and Storage Systems Deployed at Facebook, Dhruba Borthakur, et al. in the proceedings of SIGMOD 2010 Cloud Computing - SQL Azure as a Self-Managing Database Service: Lessons Learned and Challenges Ahead by Kunal Mukerjee, et al (Microsoft) in the proceedings of IEEE Computer Society Technical Committee on Data Engineering 2011 15, 16 Presentation of Significant Database Industry Research Papers on Big Data Processing: List of Selected Papers will be given in class. Selected Papers Technical Topics: Select one and prepare a 25 min talk on the subject. 1. Semistructured/Unstructured Data Processing using Structured Data Model a. FaceBook b. EBay 2. Data Warehousing and Analytics Infrastructure at Facebook 3. Parallel Computing for Big Data Processing: Google Map Reduce with Apache Hadoop Parallel Data Warehouse (PDW) 4. MapReduce: Simplified Data Processing on Large Clusters by Google 5. Lammal, Ralf. Google's MapReduce Programming Model Revisited. 6. Open Source Apache Hadoop 7. Pig Latin, Hbase, Hive 8. Map Reduce Join Algorithmes, 9. Data Partition Techniques 10. SQL vs NoSQL 11. Processing MR/Hadoop with PDW : Oracle, Teradata 12. Columnar Databases : SAP Hana 13. Information Retrieval: How Google Search Engine Works 14. Cloud Computing : Microsoft AZURE. More New Topics to come here. NOTE: The instructor reserves the right to retain, for pedagogical reasons, either the original or a copy of your work submitted either individually or as a group project for this class. Students' names will be deleted from any retained items.