Semi-Structured Data Management (CSE 511)

Note: The outline below is subject to modifications and updates.

About this Course

Database systems are used to provide convenient access to disk-resident data through efficient query processing, indexing structures, concurrency control, and recovery. This course delves into new frameworks for processing and generating large-scale datasets with parallel and distributed algorithms, covering the design, deployment and use of state-of-the-art data processing systems, which provide scalable access to data.

Specific topics covered include:
• Efficient query processing
• Indexing structures
• Distributed database design
• Parallel query execution
• Concurrency control in distributed parallel database systems
• Data management in cloud computing environments
• Data management in Map/Reduce-based systems
• NoSQL database systems

Learning Outcomes

Learners completing this course will be able to:
• Differentiate among major data models such as relational, spatial, and NoSQL
• Perform queries (e.g., SQL) and analytics tasks in state-of-the-art database systems
• Apply leading-edge techniques to design/tune distributed and parallel database systems
• Utilize existing NoSQL database systems as appropriate for specified cases
• Perform database operations (e.g., selection, projection, join, and groupby) in state-of-the-art cluster computing systems such as Hadoop/Spark
• Perform scalable data processing operations (e.g., selection, projection, join, and groupby) in cloud computing environments, including Amazon AWS

Lead: Mohamed Sarwat, Ph.D. Updated 12/28/2017

Projects
• Project 1: Movie Recommendation Database
• Project 2: Distributed Movie Recommendation Database
• Project 3: Location-Aware Twitter Analytics
• Project 4: Spatial Data Processing using Apache Spark
• Project 5: SQL queries on Amazon EC2

Course Content

Instruction
• Video Lectures
• Other Videos
• Readings
• Interactive Learning Objects
• Live office hours
• Webinars

Assessments
• Practice activities and quizzes (auto-graded)
• Practice assignments (instructor- or peer-reviewed)
• Team and/or individual project(s) (instructor-graded)
• Final exam (graded)

Estimated Workload/Time Commitment Per Week

Approximately 9 hours per week

Required Prior Knowledge and Skills
• Basic statistics and computer science knowledge including computer organization and architecture, discrete mathematics, data structures, and algorithms
• Knowledge of high-level programming languages (e.g., C++, Java) and scripting language (e.g., Python)

Technology Requirements

Hardware
• Standard with major OS

Software and Other
• To complete course projects, some of the following software may be required: Amazon AWS cloud, Hadoop/Spark, GitHub, PostgreSQL, MongoDB, Neo4j.

Course Outline

Unit 1: Basic Data Processing Concepts
1.1: Explain Data Models and Data processing concepts
1.2: Utilize Relational Model and Relational Algebra
1.3: Utilize SQL query language

Module 1: Big Data and Data Processing
• Introduction to Data and Data Processing
• Database Management Systems
• Data Models

Module 2: Basic Data Concepts
• Database Systems - What and Why?
• Database Management Systems
• Data Model
• Database Design: Entity Relationship Model to Relational Model
• Entity Relational Model
• ER to Relational Model
• Assignment: Create a Movie Database
• Relational Model and Relational Algebra
• Relational Data Model
• Relational Algebra: Query Language
• Query Language: Union
• Query Language: Difference
• Query Language: Cartesian Product
• Query Language: Selection
• Query Language: Projection
• Query Language: Intersection
• Query Language: θ-Join
• SQL Query Language
• Part 1: SQL Query Language
• Part 2: SQL Query Language
• Assignment: SQL Query for Movie Recommendation
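As an informal illustration of the relational algebra operators listed in Unit 1 (selection, projection, and θ-join), the following is a minimal Python sketch rather than course material. The relations `movies` and `ratings`, their attribute names, and the helper functions `select`, `project`, and `theta_join` are all hypothetical and chosen only for this example.

```python
# Hypothetical sample relations; names and rows are illustrative only.
movies = [
    {"movie_id": 1, "title": "Alpha", "genre": "Drama"},
    {"movie_id": 2, "title": "Beta", "genre": "Comedy"},
]
ratings = [
    {"movie_id": 1, "user_id": 10, "stars": 5},
    {"movie_id": 1, "user_id": 11, "stars": 3},
    {"movie_id": 2, "user_id": 10, "stars": 4},
]


def select(relation, predicate):
    """Selection: keep only the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Projection: keep only the named attributes and drop duplicate rows."""
    seen = set()
    result = []
    for row in relation:
        key = tuple(row[a] for a in attributes)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result


def theta_join(left, right, condition):
    """Theta-join: Cartesian product filtered by an arbitrary condition."""
    return [{**l, **r} for l in left for r in right if condition(l, r)]


# Titles of movies that received at least one rating of 4 stars or more.
joined = theta_join(movies, ratings, lambda m, r: m["movie_id"] == r["movie_id"])
highly_rated = select(joined, lambda row: row["stars"] >= 4)
titles = project(highly_rated, ["title"])
print(titles)
```

In SQL, the same query would roughly correspond to a join between the two tables with a `WHERE` filter on the rating and `SELECT DISTINCT` on the title.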

Unit 2: Data Storage and Indexing
2.1 Recognize major data storage layouts
2.2 Identify major indexing schemes in Database Systems

Module 1: Major Storage Layouts
• Introduction to Data Storage
• Alternative File Organizations

Module 2: Major Indexing Schemes in Database Systems
• Hash-based Indexes
• Index Classification

Unit 3: Transactions and Recovery
3.1 Examine the ACID properties
3.2 Explain Transactions and Concurrency Control concepts
3.3 Describe how recovery from failures happens in database systems

Module 1: ACID Properties
• Principles of Transactions: ACID Properties

Module 2: Concurrency Control Concepts
• Concurrency Control

Module 3: Lock-based Concurrency Control and Recovery from Failures
• Lock-Based Concurrency Control
• Database Recovery

Unit 4: Principles of Distributed and Parallel Database Systems
4.1 Describe data fragmentation and replication models
4.2 Describe the components of a distributed database
4.3 Apply skills learned to complete an assignment using data partitioning

Module 1: Distributed Databases: Why, What?
• Why Distribution?

Module 2: Data Fragmentation and Replication Model
• Introduction to Fragmentation
• Introduction to Replication
• Assignment: Data Fragmentation

Module 3: Advanced Distributed Database Systems
• Query Processing and Optimization in Distributed Databases
• Distributed Query Processing
• Total Cost of Query Execution Plan
• Assignment: Query Processing

Module 4: Parallel Database Systems
• Parallel Data Architecture
• Introduction to Parallel DBMS
• The Different Types of DBMS Parallelism
• Parallel Sorting and Joins
• Assignment: Parallel Sort and Joins

Unit 5: NoSQL Database Systems

Module 1: NoSQL Database Systems
• Key-Value Stores
• Graph Databases
• Document Databases

Module 2: Big Data Analytics Systems
• Intro Map-Reduce / Spark
• Data Analytics in Map-Reduce / Spark
• Graph Processing Engines

Module 3: Data Processing on Modern HW
• PROJECT: Distributed Movie Recommendation Database

Unit 6: Big Data Tools
• PROJECT: Location-Aware Twitter Analytics
• PROJECT: Spatial Data Processing using Apache Spark
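To give a rough sense of the Map-Reduce style of analytics named in Unit 5, the sketch below computes a group-by average in plain, single-process Python. It is only an illustration of the map, shuffle, and reduce phases under simplifying assumptions, not how Hadoop or Spark is actually invoked. The sample `ratings` records and the function names `map_phase`, `shuffle`, and `reduce_phase` are hypothetical.

```python
from collections import defaultdict

# Hypothetical input records: (movie title, star rating).
ratings = [("Alpha", 5), ("Beta", 4), ("Alpha", 3)]


def map_phase(records):
    """Map: emit a (key, value) pair per record; the value carries (sum, count)."""
    for title, stars in records:
        yield title, (stars, 1)


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values for each key into one average rating."""
    result = {}
    for key, values in groups.items():
        total = sum(s for s, _ in values)
        count = sum(c for _, c in values)
        result[key] = total / count
    return result


averages = reduce_phase(shuffle(map_phase(ratings)))
print(averages)
```

In a real cluster framework the map and reduce steps would run in parallel on different machines and the shuffle would move data over the network. This sketch runs them sequentially only to make the data flow visible.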

Unit 7: Additional Tools Used for Data Visualization
7.1 Explain data processing in the cloud
7.2 Evaluate service models
7.3 Evaluate deployment models

Module 1: Introduction to Cloud Computing
• Introduction to Cloud Computing

Module 2: Service Models
• Service Models

Module 3: Deployment Models
• Deployment Models

Unit 8: Cloud-based Data Management
8.1 Explain AWS

Module 1: Amazon Web Services
• Introduction to Amazon Web Services
• AWS Computing
• AWS Storage
• AWS Queueing Services

Module 2: Build an Elastic Cloud Application
• AWS Interfaces
• Auto-Scaling

Module 3: Build a MapReduce Cloud Application
• AWS Security
• PROJECT: SQL queries on Amazon EC2

About ASU

Established in Tempe in 1885, Arizona State University (ASU) has developed a new model for the American Research University, creating an institution that is committed to access, excellence and impact. As the prototype for a New American University, ASU pursues research that contributes to the public good, and ASU assumes major responsibility for the economic, social and cultural vitality of the communities that surround it. Recognizing the university's groundbreaking initiatives, partnerships, programs and research, U.S. News and World Report has named ASU as the most innovative university all three years it has had the category. The innovation ranking is due at least in part to a more than 80 percent improvement in ASU's graduation rate in the past 15 years, the fact that ASU is the fastest-growing research university in the country and the emphasis on inclusion and student success that has led to more than 50 percent of the school's in-state freshmen coming from minority backgrounds.

About Ira A. Fulton Schools of Engineering

Structured around grand challenges and improving the quality of life on a global scale, the Ira A. Fulton Schools of Engineering at Arizona State University integrates traditionally separate disciplines and supports collaborative research in the multidisciplinary areas of biological and health systems; sustainable engineering and the built environment; matter, transport and energy; and computing and decision systems. In the largest engineering program in the United States, students can pursue their educational and career goals through 25 undergraduate degrees or 39 graduate programs and rich experiential education offerings. The Fulton Schools are dedicated to engineering programs that combine a strong core foundation with top faculty and a reputation for graduating students who are aggressively recruited by top companies or become superior candidates for graduate studies in medicine, law, engineering and science.

About the School of Computing, Informatics, & Decision Systems Engineering

The School of Computing, Informatics, and Decision Systems Engineering advances developments and innovation in artificial intelligence, big data, cybersecurity and digital forensics, and software engineering. Our faculty are winning prestigious honors in professional societies, resulting in leadership of renowned research centers in homeland security operational efficiency, data engineering, and cybersecurity and digital forensics. The school's rapid growth of student enrollment isn't limited to the number of students at ASU's Tempe and Polytechnic campuses as it continues to lead in online education. In addition to the Online Master of Computer Science, the school also offers an Online Bachelor of Science in Software Engineering, and the first four-year, completely online Bachelor of Science in Engineering program in engineering management.

Creators

Mohamed Sarwat is an Assistant Professor of Computer Science and the director of the Data Systems (DataSys) lab at Arizona State University (ASU). He is also an affiliate member of the Center for Assured and Scalable Data Engineering (CASCADE). Before joining ASU, Mohamed obtained his MSc and PhD degrees in computer science from the University of Minnesota. His research interest lies in the broad area of data management systems.

Ming Zhao is an associate professor of the ASU School of Computing, Informatics, and Decision Systems Engineering. Before joining ASU, he was an associate professor of the School of Computing and Information Sciences (SCIS) at Florida International University. He directs the Research Laboratory for Virtualized Infrastructure, Systems, and Applications (VISA). His research interests are in distributed/cloud computing, big data, high-performance computing, autonomic computing, virtualization, storage systems and operating systems.
