CSE 344 MAY 7 TH EXAM REVIEW

Similar documents
CSE 344 Final Review. August 16 th

Database Systems CSE 414

CSE 344 APRIL 16 TH SEMI-STRUCTURED DATA

CSE 344 JANUARY 26 TH DATALOG

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

Announcements. What is Datalog? Why Do We Learn Datalog? Database Systems CSE 414. Midterm. Datalog. Lecture 13: Datalog (Ch

University of Waterloo Midterm Examination Solution

Querying Data with Transact SQL

CSE 414 Final Examination. Name: Solution

CSE344 Midterm Exam Fall 2016

What happens. 376a. Database Design. Execution strategy. Query conversion. Next. Two types of techniques

CSE344 Midterm Exam Winter 2017

CSE 344 MAY 2 ND MAP/REDUCE

CS121 MIDTERM REVIEW. CS121: Relational Databases Fall 2017 Lecture 13

Querying Data with Transact-SQL

University of Waterloo Midterm Examination Sample Solution

Examples of Physical Query Plan Alternatives. Selected Material from Chapters 12, 14 and 15

Final Exam Review 2. Kathleen Durant CS 3200 Northeastern University Lecture 23

CSE 562 Final Exam Solutions

Course Modules for MCSA: SQL Server 2016 Database Development Training & Certification Course:

CSE 544, Winter 2009, Final Examination 11 March 2009

Introduction to Database Systems CSE 414. Lecture 16: Query Evaluation

CSE 344 Final Examination

Introduction to Data Management CSE 344

CSE 344 JANUARY 19 TH SUBQUERIES 2 AND RELATIONAL ALGEBRA

20761B: QUERYING DATA WITH TRANSACT-SQL

Querying Data with Transact-SQL (761)

Querying Data with Transact-SQL

Lassonde School of Engineering Winter 2016 Term Course No: 4411 Database Management Systems

Relational Databases

CSE 344 Midterm. Wednesday, Nov. 1st, 2017, 1:30-2:20. Question Points Score Total: 100

Announcements. Database Systems CSE 414. Why compute in parallel? Big Data 10/11/2017. Two Kinds of Parallel Data Processing

Polls on Piazza. Open for 2 days Outline today: Next time: "witnesses" (traditionally students find this topic the most difficult)

Something to think about. Problems. Purpose. Vocabulary. Query Evaluation Techniques for large DB. Part 1. Fact:

CSE 344 Midterm. Wednesday, Oct. 31st, 2018, 1:30-2:20. Question Points Score Total: 100

CSE 344 Midterm Exam

Querying Data with Transact-SQL

Database Systems CSE 414

20761 Querying Data with Transact SQL

CSE344 Final Exam Winter 2017

PARALLEL & DISTRIBUTED DATABASES CS561-SPRING 2012 WPI, MOHAMED ELTABAKH

Database Systems CSE 414

Chapter 17: Parallel Databases

Interview Questions on DBMS and SQL [Compiled by M V Kamal, Associate Professor, CSE Dept]

5/2/16. Announcements. NoSQL Motivation. The New Hipster: NoSQL. Serverless. What is the Problem? Database Systems CSE 414

Course No: 4411 Database Management Systems Fall 2008 Midterm exam

Course Outline. Querying Data with Transact-SQL Course 20761B: 5 days Instructor Led

Querying Data with Transact SQL Microsoft Official Curriculum (MOC 20761)

Lecture 1: Conjunctive Queries

Querying Microsoft SQL Server 2014

Querying Data with Transact-SQL

Database Applications (15-415)

Principles of Data Management. Lecture #9 (Query Processing Overview)

Parallel DBMS. Parallel Database Systems. PDBS vs Distributed DBS. Types of Parallelism. Goals and Metrics Speedup. Types of Parallelism

CompSci 516 Data Intensive Computing Systems. Lecture 11. Query Optimization. Instructor: Sudeepa Roy

Query Execution [15]

MySQL for Developers with Developer Techniques Accelerated

Introduction to Data Management CSE 344

Plan for today. Query Processing/Optimization. Parsing. A query s trip through the DBMS. Validation. Logical plan

RELATIONAL OPERATORS #1

CSE 344 JULY 9 TH NOSQL

MTA Database Administrator Fundamentals Course

Midterm Review. Winter Lecture 13

Querying Data with Transact-SQL

! Parallel machines are becoming quite common and affordable. ! Databases are growing increasingly large

Chapter 20: Parallel Databases

Chapter 20: Parallel Databases. Introduction

Chapter 19 Query Optimization

T-SQL Training: T-SQL for SQL Server for Developers

R & G Chapter 13. Implementation of single Relational Operations Choices depend on indexes, memory, stats, Joins Blocked nested loops:

Class Overview. Two Classes of Database Applications. NoSQL Motivation. RDBMS Review: Client-Server. RDBMS Review: Serverless

[AVNICF-MCSASQL2012]: NICF - Microsoft Certified Solutions Associate (MCSA): SQL Server 2012

CSE 344 Midterm Nov 1st, 2017, 1:30-2:20

CSCI-6421 Final Exam York University Fall Term 2004

Evaluation of Relational Operations: Other Techniques. Chapter 14 Sayyed Nezhadi

Implementing Relational Operators: Selection, Projection, Join. Database Management Systems, R. Ramakrishnan and J. Gehrke 1

MANAGING DATA(BASES) USING SQL (NON-PROCEDURAL SQL, X401.9)

AVANTUS TRAINING PTE LTD

Advanced Databases: Parallel Databases A.Poulovassilis

Parser: SQL parse tree

Querying Data with Transact-SQL

Announcement. Reading Material. Overview of Query Evaluation. Overview of Query Evaluation. Overview of Query Evaluation 9/26/17

Chapter 18: Parallel Databases

Chapter 18: Parallel Databases. Chapter 18: Parallel Databases. Parallelism in Databases. Introduction

CSE 190D Spring 2017 Final Exam

20461: Querying Microsoft SQL Server 2014 Databases

CS 4604: Introduction to Database Management Systems. B. Aditya Prakash Lecture #10: Query Processing

Query Processing. Introduction to Databases CompSci 316 Fall 2017

Introduction to Database Systems CSE 444

University of California, Berkeley. (2 points for each row; 1 point given if part of the change in the row was correct)

CompSci 516 Data Intensive Computing Systems

Querying Microsoft SQL Server 2008/2012

Parallel DBs. April 25, 2017

Querying Microsoft SQL Server

Announcements. Subqueries. Lecture Goals. 1. Subqueries in SELECT. Database Systems CSE 414. HW1 is due today 11pm. WQ1 is due tomorrow 11pm

Query Optimization. Introduction to Databases CompSci 316 Fall 2018

Announcements. Two Classes of Database Applications. Class Overview. NoSQL Motivation. RDBMS Review: Serverless

Rajiv GandhiCollegeof Engineering& Technology, Kirumampakkam.Page 1 of 10

CSE414 Midterm Exam Spring 2018

Duration Level Technology Delivery Method Training Credits. Classroom ILT 5 Days Intermediate SQL Server

Transcription:

CSE 344 MAY 7 TH EXAM REVIEW

EXAMINATION STATIONS Exam Wednesday 9:30-10:20 One sheet of notes, front and back Practice solutions out after class Good luck!

EXAM LENGTH Production v. Verification Practice exam Short answer Simplest answer possible Problems not necessarily in order of difficulty

GENERAL TOPICS Databases Motivations and definitions Relational Databases SQL Relational Algebra Datalog Semi-structured Data Motivations and definitions

GENERAL TOPICS Internals Indexes Physical plans/cost Estimation Disk I/o Parallel Shared Nothing Map Reduce

DATABASES Motivations Collections of related files Databases vs. DBMS What is stored? What is the DBMS responsibility?

DATABASES Motivations Collections of related files Databases vs. DBMS What is stored? What is the DBMS responsibility? Data storage and manipulation Black box thought Physical data independence

RELATIONAL DATABASES Motivations Breaking away from singular flat files Why/how do we break up data? Data model Schemas and keys Records and attributes Attribute types/typing

RELATIONAL DATABASES Primary keys What are the constraints? When do we select keys? Multiple keys Foreign keys Constraints vs. Joining Keys across different data

SQL STRUCTURE Flat tables First normal form Crosswalks and joins Breaking up data into multiple relations

SQL CODE Create statements Key declarations Type declarations Insert/Delete statements Update statements Drop table

SQL CODE Select From Where Group by Having Order by

SQL CODE Distinct (and relation to group by) Inner vs. Outer Joining Left/Right/Full Nested loop semantics Cross join with selection Self joins Produce companies that produce gadgets and cameras

SQL CODE Aggregation Count,sum,min,max,avg Null values IS NOT null Count(null) Where vs. Having

SQL CODE Constructing Queries FWGHOS Subqueries In Select (Single attribute projection) In From (subquery AS, WITH AS) In Where (EXISTS, IN, ANY) Correlated vs. Non-correlated Un-nesting Finding the Witness

SQL CODE Negation in subqueries Monotonicity Definitions Example Difficulties and necessity of subqueries

RELATIONAL ALGEBRA Set vs. Bag semantics Why bag? Query plans and RA expressions Operations (on relations, some with conditions) Union, difference Selection Projection Joins

RELATIONAL ALGEBRA Operations (on relations, some with conditions) Union, difference Selection Projection Joins Duplicate elimination Grouping Sorting

RELATIONAL ALGEBRA Operations (on relations, some with conditions) Union, difference Selection Projection Joins (remember your conditions) Duplicate elimination Grouping Sorting

RELATIONAL ALGEBRA How do we know SQL and RA are equally expressive? Translating one to the other Multiple RA expressions possible for same query DBMS optimization

RELATIONAL ALGEBRA Producing RA expressions/trees From queries Visa-versa Bag vs. Set RA Datalog is set semantic

DATALOG Queries which cannot be defined in RA Recursive queries Expressing RA expressions in datalog Set semantics (procedural) Simple, concise, elegant Fixed point semantics Recursion builds from basecase Left/right/non-linear

DATALOG Logical framework Explicitly defined intermediate results Terminology Facts and Rules Extensional vs. Intensional Predicates Head and body Head vs. Existential Variables Unsafe rules

DATALOG Writing Rules Safety Base cases Aggregation and negation Variable scope Simple recursive queries Converting from RA

SEMISTRUCTURED DATA Motivations Transactional vs. Analytical Data Data distribution Consistency Partition vs. Replication Key-value storage -> Document Storage

JSON Gives structure to data Objects and collections Self described Separate and less constrained than SQL++ Nested structure (non-first normal form)

ASTERIX DB Document-based NoSQL Semi-structured Over JSON objects Constraints (types, no duplicates) SQL++ Description vs. Manipulation

ASTERIX DB Dataverse Database set of data currently working with Types UUID auto generated Null vs. Missing Nested collections Open v. Closed Required v. Optional fields

ASTERIX DB Datasets Relations Defined over a type Must have a key Indexes Over particular attributes Speeds up 1-d selection (BTREE), 2-d selection (RTREE) and substring selection (KEYWORD)

ASTERIX DB SQL++ Heterogeneity Unnesting Nesting/Aggregation and non-first normal Multi-value join Supports one to many Can often be represented in SQL

SEMISTRUCTURED Distributed systems Short-term analysis Lower set-up costs Higher query costs (often)

INTERNALS Physical Plans Operators Pipelining (selection, projection) Joins Hash Merge Index Nested Loop

INTERNALS Physical Plans Operators Not discussed Grouping/aggregation

INTERNALS Physical Plans Indexes Clustered v. Unclustered Hash v. B-Tree Single v. Compound When to apply Benefit?

INTERNALS Physical Plans Cost estimation Disk I/Os Blocks and Tuples Formulae (good for your notesheet) Tuple estimation Selectivity factor Disk Scheduling Starvation Motivations

PARALLEL DB Motivations Can t store all on one DB High throughput Speedup v. Scaleup Definitions Replication and Partitioning Shared-memory, shared-disk, shared-nothing Inter-query, inter-operator, intra-operator Block, hash, range partitioning

PARALLEL DB Applications Startup costs Skew Distributed join v. Broadcast join Reshuffling Map/Reduce Framework/Model When to apply What is programmed v. handled by framework No code

QUESTIONS That s the material Things that will be on the exam Short answer SQL Subquery Datalog Relational Algebra Cost Estimation

QUESTIONS Smaller question material Parallel DB Semi-structured data SQL++ Disk I/O DB Design

ADVICE Look through the exam first Try and do easiest questions first Short answer questions are worth equal amounts, varying difficulty Long exam, get easy points first Always be sure you understand the question

ADVICE Go through previous exams Good judgement for questions Go through HW,OQ assignments If I ve asked you something before, I am certain that you should know how to do it Think about how null values/your assumptions impact the interpretation of the data