Cleveland State University
CIS 612/CIS 712 Big Data & Parallel Database Processing Systems (3-0-3)

Prerequisites: CIS 530. CIS 611 preferred.

Instructor: Dr. Sunnie S. Chung
Office Location: FH 222
Phone: 216 687 4661
Email: s.chung@csuohio.edu, sschung.cis@gmail.com
Webpage: http://eecs.csuohio.edu/~sschung
Office Time: Tues, Thur 2:00-4:00 PM (email me for an appointment)

Catalog Description: Detailed study of modern database processing and parallel database systems for big data processing. The course first presents the concept of a transaction with ACID properties and concurrency control strategies in active database systems. It continues with semi-structured and unstructured data processing strategies for JSON/HTML/XML data, XPath, and XQuery. It then advances to big data processing strategies on the Hadoop file system with MapReduce and focuses on Massively Parallel Processing (MPP) systems for big data processing: NoSQL and NewSQL systems, and cloud computing platforms and infrastructures. The course covers the data model, indexing, querying techniques, data processing methods, and ACID issues on such systems with Google's Bigtable, Hive, HBase, Pig Latin, MongoDB, and VoltDB. Through projects that process real-time big data streams from popular social network sites, students will get hands-on experience with such big data processing systems. Finally, the course will explore the latest advances in industry research for big data processing and data analytics.

Key Concepts: Transaction, Concurrency Control, Modern Database Programming, Semi-structured database processing, JSON, XML data processing, XPath, XQuery, Web Data Processing, Unstructured data processing, Massively Parallel Processing (MPP) systems, MapReduce, Hadoop, Cloud Computing platforms and infrastructures, Parallel Data Warehouse (PDW), OLAP, Big Data Processing strategies, Google's Bigtable, Hive, HBase, Pig Latin, MongoDB, VoltDB, ACID in NoSQL, NewSQL, and Cloud Computing.

Expected Outcomes: Upon successful completion of the course, the student will be able to:
- Understand a well-defined transaction concept and concurrency control strategies in database processing systems;
- Create modern database applications that process non-traditional data: semi-structured data such as JSON or XML data, or unstructured data such as web logging data;
- Understand big data processing techniques and gain comprehensive knowledge of Massively Parallel Processing (MPP) systems, NoSQL/NewSQL systems, and cloud computing;
- Obtain hands-on experience with parallel data processing systems and tools, and with cloud computing platforms and infrastructures for big data processing;
- Build an infrastructure for big data processing systems;
- Be exposed to the latest advances in database industry research in big data processing.

List of Required Materials:
Any RDBMS:
- Oracle Database 11g or higher, available at http://www.oracle.com/us/downloads/index.html
- Microsoft SQL Server 2014
- Microsoft Visual Studio 2014 or any higher
- Microsoft SQL Server Data Analytic Tool 2014
The Microsoft products are available through the Microsoft Academic Alliance program: http://e5.onthehub.com/webstore/productsbymajorversionlist.aspx?ws=31b9929b-c09b-e011-969d-0030487d8897

Open Source Systems: Installation and setup details for these systems will be given in class.

- Hadoop/MapReduce and VM
- Hive
- HBase
- Pig Latin
- MongoDB
- VoltDB

Text Book:
1. "Fundamentals of Database Systems". Elmasri / Navathe. 7th Edition. Addison-Wesley Pub Co. ISBN-13: 978-0133970777, ISBN-10: 0133970779
2. will be available on the class webpage
3. List of Selected Database Research Papers on Big Data Processing Systems and Data Analytics will be given in class

Supplement Materials (will be given in class):
- Tutorials for Hadoop/MapReduce and VM
- Tutorials for NoSQL systems: Hive, HBase, Pig Latin, MongoDB
- Tutorials for the NewSQL system VoltDB

Official Calendar: Please consult the university web page at: http://www.csuohio.edu/enrollmentservices/registrar/calendar/index.html

Final exam: Mon May 8, 4:00-6:00 PM

Grading: The course grade is based on a student's overall performance through the entire semester. The final grade is distributed among the following components:
1. Exams: 35% (15% Midterm, 20% Final)
2. Computer Labs: 30% (about 4 lab assignments)
3. Project and Presentation on Big Data Processing: 25% (2-person group project)
4. Research Paper Presentation: 10%
I reserve the right to change the weighting and the number of assignments.

Additional Requirements for CIS 712 Doctoral Students:
- Doctoral students who take CIS 712 must select an in-depth project to work on. (Examples of the tentative topics of the projects are given in the course schedule section below.)
- Doctoral students who take CIS 712 must work on the project individually (instead of in a 2-person group).
- The list of projects and research papers for doctoral students may be given separately in class. It may vary every year. A tentative selection of projects and papers is given at the end of the course schedule.
In each exam, one additional problem might be designed to be completed by doctoral students only.

Grade  Range          Description
A      93% and above  Outstanding (student's performance is genuinely excellent)
A-     90% - 92.9%
B+     87% - 89.9%
B      80% - 86.9%    Student's performance satisfies every course requirement and is acceptable but not necessarily distinguished
B-     78% - 79.9%
C      70% - 77.9%    Student's performance does not satisfy every course requirement and is not acceptable to pass
F      below 70%      Failure

Examination Policy: Students are allowed to bring to the tests a summary page (standard letter size) with their own notes. During the exams: (1) the use of books, cell phones, calculators, or any electronic devices is prohibited, and (2) students must not share any materials.

Make-Up Exam Policy: No makeup exams will be given unless notified and agreed to in advance. Requests will be considered only in case of exceptional demonstrated need.

Homework Policy: Students are expected to attend all classes. Students are responsible for collecting the notes, handouts, and any other course material distributed during the class period. All assignments must be individually and independently completed and must represent the effort of the student turning in the assignment. Should two or more students turn in substantially the same solution or output, in the judgment of the instructor, the solution will be considered group effort. All involved in group effort homework will receive a zero grade for that assignment. A student turning in a group effort assignment more than once will automatically receive an F grade for the course.

Late Assignment: All lab assignments are due at the beginning of class on the date specified. Laboratory assignments handed in after the class has begun will be accepted with a 25% grade penalty for up to a week and then not accepted at all. All laboratory assignments must be completed. Failure to do so will lower your course grade one additional letter grade.

Student Conduct: Students are expected to do their own work. Academic misconduct, student misconduct, cheating, and plagiarism will not be tolerated. Violations will be subject to disciplinary action as specified in the CSU Student Conduct Code.
A copy can be obtained on the web page at: http://www.csuohio.edu/studentlife/studentcodeofconduct.pdf or by contacting Valerie Hinton Hannah, Judicial Affairs Officer in the Department of Student Life (MC 106, email v.hintonhannah@csuohio.edu). For more information consult the CSU Judicial Affairs web page available at http://www.csuohio.edu/studentlife/jaffairs/faq.html

Course Schedule: The schedule of topics and their order of coverage is given below. The schedule and topics to be covered may vary depending upon the progress made.

Weeks 1-2
Topics:
- DBMS Architecture, Complex Queries, Advanced Topics in Views
- Introduction to Big Data
- Transaction, ACID
- Concurrency Control

Weeks 3-4
Topics:
- Modern Database Programming: Database Triggers
- Stored Procedure, Embedded SQL, Dynamic SQL, JDBC/ODBC, PHP
- User Defined Function (UDF), User Defined Type (UDT), User Defined Aggregate (UDA), Table Function, CLR, LINQ, .NET
Reading:
- Database Programmability and Extensibility in Microsoft SQL Server, by José A. Blakeley, et al. (Microsoft Corporation), in the proceedings of SIGMOD 2008
- Elmasri, Chap. 21, 22,

Weeks 5-6
Topics:
- Modern Databases: Enhanced Data Models for Advanced Applications
- Semi-Structured and Unstructured Databases: XML Data Processing:
  - XML Schema, Syntax/Semantics, Protocol
  - XPath
  - XQuery
- Data Transformation from Semi-Structure to Relation
- JSON Data Processing
Reading:
- Elmasri, Chap. 12, 19, 20
- Listed papers

Weeks 7-8
Topics:
- Introduction to Information Retrieval and Web Data Processing
- Data Models for Unstructured Data Processing
Reading:
- Selected Papers
- Bigtable: A Distributed Storage System for Structured Data, by Fay Chang, Jeffrey Dean, Sanjay Ghemawat (Google, Inc.), in the proceedings of OSDI 2006

Weeks 9-10
Topics:
- Big Data Processing and Parallel Computing: Introduction of Big Data
- Google's MapReduce Paradigm
- Apache Hadoop File System for Parallel Processing
Reading:
- MapReduce: Simplified Data Processing on Large Clusters, by Jeffrey Dean (Google) and Sanjay Ghemawat (Google), in the proceedings of OSDI 2004
- Apache Hadoop white papers by Apache, Yahoo

Weeks 11-14
Topics:
- Big Data Processing and Massively Parallel Processing Systems
- NoSQL/NewSQL Systems
- Pig Latin on Apache Hadoop by Yahoo and Apache
- Data Warehouse Hive with Hadoop by Facebook
- HBase Key Value Stores
- MapReduce Join Algorithms
- Parallel Data Warehouse with OLAP Query Processing: Microsoft Extended PDW with MapReduce and Hadoop: Oracle, Teradata
- MongoDB
- VoltDB
- NoSQL vs NewSQL
- ACID
Reading:
- Tutorials
- Pig Latin: A Not-So-Foreign Language for Data Processing, Christopher Olston, et al. (Yahoo! Research), in the proceedings of SIGMOD 2008
- Data Warehousing and Analytics Infrastructure at Facebook, by Ashish Thusoo, et al. (Facebook), in the proceedings of SIGMOD 2010
- Petabyte Scale Databases and Storage Systems Deployed at Facebook, Dhruba Borthakur, et al., in the proceedings of SIGMOD 2014
- Fast Data in the Era of Big Data: Twitter's Real-Time Related Query Suggestion Architecture, Gilad Mishne, Jeff Dalton, Zhenghua Li, Aneesh Sharma, Jimmy Lin (Twitter, Inc.), SIGMOD 2014
- The Big Data Ecosystem at LinkedIn, Roshan Sumbaly, Jay Kreps, and Sam Shah (LinkedIn), SIGMOD 2015
- Avatara: OLAP for Webscale Analytics Products, Lili Wu, Roshan Sumbaly, Chris Riccomini, Gordon Koo, Hyung Jin Kim, Jay Kreps, Sam Shah (LinkedIn), SIGMOD 2014
- More papers will be given in class
- Cloud Computing: Microsoft Azure as a Self-Managing Database Service: Lessons Learned and Challenges Ahead, by Kunal Mukerjee, et al. (Microsoft), in the proceedings of IEEE Computer Society Technical Committee on Data Engineering 2014

Weeks 15-16
Topics:
- Presentation on Significant Database Research in Big Data Processing, Cloud Computing, and more: List of Selected Papers will be given in class.
Reading:
- Selected Papers

Tentative Technical Presentation Topics (may vary every year):
1. Semistructured/Unstructured Data Processing
2. Hadoop-based Data Warehousing and Analytics Infrastructure at Facebook
3. Parallel Computing for Big Data Processing: Google Cloud, Amazon Cloud, Hadoop-Based NoSQL Systems, NewSQL Systems
4. MapReduce: Simplified Data Processing on Large Clusters by Google
5. Lämmel, Ralf. Google's MapReduce Programming Model Revisited.
6. Stream Processing: Spark
7. NoSQL Systems: Pig Latin, HBase, Hive, MongoDB
8. MapReduce Join Algorithms
9. Data Partition Techniques
10. Performance Survey: SQL vs NoSQL
11. Processing MR/Hadoop with PDW: Oracle, Teradata
12. Information Retrieval: Google Search Engine
13. Big Data Integration Systems
14. Cloud Computing: Microsoft Azure, Amazon Cloud, Google Cloud
15. More on these

Tentative List of Research Papers and Projects for CIS 712 Doctoral Students: CIS 712 doctoral students should choose one of the research topics, give a 30-minute presentation on the papers in the topic, and complete a project related to the subject. The paper list and project specification for each research topic will be given in class.

ADA Adherence: If you need course adaptations or accommodations because of a disability, if you have emergency medical information to share with me, or if you need special arrangements in case the building must be evacuated, please make an appointment with me as soon as possible.
My office location and hours are listed at the top of this syllabus. If you need further information, please contact the Office of Disability Services (Main Classroom 147), phone number 216.687.2015, on the web at http://www.csuohio.edu/offices/disability/.