MIS2502: Review for Exam 2. Jing Gong

Similar documents
MIS2502: Review for Exam 2. JaeHwuen Jung

MIS2502: Review for Exam 2. Jing Gong

Exam #2 Review. Zuyin (Alvin) Zheng

MIS2502: Data Analytics Dimensional Data Modeling. Jing Gong

MIS2502: Data Analytics Dimensional Data Modeling. Jing Gong

MIS2502: Data Analytics The Information Architecture of an Organization. Jing Gong

MIS2502: Review for Exam 1. Jing Gong

MIS2502: Data Analytics SQL Getting Information Out of a Database. Jing Gong

MySQL. Prof.Sushila Aghav

CSE 781 Data Base Management Systems, Summer 09 ORACLE PROJECT

MIS2502: Data Analytics SQL Getting Information Out of a Database Part 1: Basic Queries

B.2 Measures of Central Tendency and Dispersion

L e a r n S q l select where

MIS2502: Data Analytics Relational Data Modeling. Jing Gong

MIS2502: Data Analytics Relational Data Modeling. Jing Gong

Advance SQL: SQL Performance Tuning. SQL Views

Exam #1 Review. Zuyin (Alvin) Zheng

Fall 2007, Final Exam, Data Structures and Algorithms

SQL Functionality SQL. Creating Relation Schemas. Creating Relation Schemas

MIS2502: Data Analytics MySQL and SQL Workbench. Jing Gong

Data Warehouses Chapter 12. Class 10: Data Warehouses 1

Jarek Szlichta

CHAPTER 8 DECISION SUPPORT V2 ADVANCED DATABASE SYSTEMS. Assist. Prof. Dr. Volkan TUNALI

SQL Server 2012 Development Course

The Lincoln National Life Insurance Company Universal Life Portfolio

How to use SQL to create a database

DATABASE MANAGEMENT SYSTEMS

Once you have defined a view, you can reference it like any other table in a database.

MERGING DATAFRAMES WITH PANDAS. Appending & concatenating Series

DEPARTMENT OF HOUSING AND URBAN DEVELOPMENT. [Docket No. FR-6090-N-01]

Complete. The. Reference. Christopher Adamson. Mc Grauu. LlLIJBB. New York Chicago. San Francisco Lisbon London Madrid Mexico City

Panelists. Patrick Michael. Darryl M. Bloodworth. Michael J. Zylstra. James C. Green

Chapter 3. The Multidimensional Model: Basic Concepts. Introduction. The multidimensional model. The multidimensional model

Microsoft Exam Querying Microsoft SQL Server 2012 Version: 13.0 [ Total Questions: 153 ]

Ocean Express Procedure: Quote and Bind Renewal Cargo

CHAPTER: 4 ADVANCE SQL: SQL PERFORMANCE TUNING (12 Marks)

user specifies what is wanted, not how to find it

DSC 201: Data Analysis & Visualization

Telecommunications and Internet Access By Schools & School Districts

Oracle 11g Partitioning new features and ILM

collection of data that is used primarily in organizational decision making.

MIS2502: Data Analytics Extract, Transform, Load. Jing Gong

A New Method of Using Polytomous Independent Variables with Many Levels for the Binary Outcome of Big Data Analysis

Ex: If you use a program to record sales, you will want to remember data:

Processing of Very Large Data

Figure 1 Map of US Coast Guard Districts... 2 Figure 2 CGD Zip File Size... 3 Figure 3 NOAA Zip File Size By State...

SQL Fundamentals. Chapter 3. Class 03: SQL Fundamentals 1

HKTA TANG HIN MEMORIAL SECONDARY SCHOOL SECONDARY 3 COMPUTER LITERACY. Name: ( ) Class: Date: Databases and Microsoft Access

CostQuest Associates, Inc.

How to use SQL to work with a MySQL database

Some Basic Aggregate Functions FUNCTION OUTPUT The number of rows containing non-null values The maximum attribute value encountered in a given column

Basics of Dimensional Modeling

Configuring Oracle GoldenGate OGG 11gR2 local integrated capture and using OGG for mapping and transformations

Accommodating Broadband Infrastructure on Highway Rights-of-Way. Broadband Technology Opportunities Program (BTOP)

Guide Users along Information Pathways and Surf through the Data

SQL is a standard language for accessing and manipulating databases.

IT Certification Exams Provider! Weofferfreeupdateserviceforoneyear! h ps://

Assignment Grading Rubric

SIT772 Database and Information Retrieval WEEK 6. RELATIONAL ALGEBRAS. The foundation of good database design

Data Analysis and Data Science

Oracle 1Z Oracle Database 12c SQL. Download Full Version :

2018 NSP Student Leader Contact Form

Database: Introduction

STOP DROWNING IN DATA. START MAKING SENSE! An Introduction To SQLite Databases. (Data for this tutorial at

Presentation to NANC. January 22, 2003

Best Practices in Data Modeling. Dan English

Proceedings of the IE 2014 International Conference AGILE DATA MODELS

BEGINNING T-SQL. Jen McCown MidnightSQL Consulting, LLC MinionWare, LLC

CS 327E Lecture 2. Shirley Cohen. January 27, 2016

MAKING MONEY FROM YOUR UN-USED CALLS. Connecting People Already on the Phone with Political Polls and Research Surveys. Scott Richards CEO

QUETZALANDIA.COM. 5. Data Manipulation Language

Chapter 3. Introduction to relational databases and MySQL. 2010, Mike Murach & Associates, Inc. Murach's PHP and MySQL, C3

MIS2502: Data Analytics Relational Data Modeling (2) Alvin Zuyin Zheng

INTERMEDIATE SQL GOING BEYOND THE SELECT. Created by Brian Duffey

CS 1655 / Spring 2013! Secure Data Management and Web Applications

MIS2502: Data Analytics Principles of Data Visualization. Alvin Zuyin Zheng

2. In Video #6, we used Power Query to append multiple Text Files into a single Proper Data Set:

CGS 3066: Spring 2017 SQL Reference

1Z Oracle Database 11g - SQL Fundamentals I Exam Summary Syllabus Questions

Data Warehousing and Data Mining. Announcements (December 1) Data integration. CPS 116 Introduction to Database Systems

Data Warehouse and Data Mining

Presented on July 24, 2018

Practical 8. SHANKER SINH VAGHELA BAPU INSTITUTE OF TECHNOLGY Subject: - DBMS Branch: - IT/CE Semester: - 3rd. 1. What does SQL stand for?

BUSINESS INTELLIGENCE. SSAS - SQL Server Analysis Services. Business Informatics Degree

Language. f SQL. Larry Rockoff COURSE TECHNOLOGY. Kingdom United States. Course Technology PTR. A part ofcenqaqe Learninq

CSC Web Programming. Introduction to SQL

Distracted Driving- A Review of Relevant Research and Latest Findings

Silicosis Prevalence Among Medicare Beneficiaries,

Exam Name: Querying Microsoft SQL Server 2012

Slice Intelligence!

HOW TO CREATE AND MAINTAIN DATABASES AND TABLES. By S. Sabraz Nawaz Senior Lecturer in MIT FMC, SEUSL

L Information Systems for Engineers. Final exam. ETH Zurich, Autumn Semester 2017 Friday

Database Design and Administration for OnBase WorkView Solutions. Mike Martel Senior Project Manager

FUN WITH ANALYTIC FUNCTIONS UTOUG TRAINING DAYS 2017

BCIS 4660 Homework #7 Hints

CHAPTER 16. More on SQL- Grouping Records and Table Joins

SIR MICHELANGELO REFALO

15CSL58: DATABASE MANAGEMENT LABORATORY

A Star Schema Has One To Many Relationship Between A Dimension And Fact Table

DATABASE MANAGERS. Basic database queries. Open the file Pfizer vs FDA.mdb, then double click to open the table Pfizer payments.

Transcription:

MIS2502: Review for Exam 2 Jing Gong gong@temple.edu http://community.mis.temple.edu/gong

Overview Date/Time: Thursday, March 24, in class (1 hour 20 minutes) Place: Regular classroom Please arrive 5 minutes early! Multiple-choice and short-answer questions Closed-book, closed-note No computer

Coverage Check the Exam 2 Study Guide Not every item on this list may be on the exam, and there may be items on the exam not on this list.

SQL Out: LIMIT Example: Which order has the highest order amount? (Display the order ID and amount, and assume there is no tie) MyDB.Order OrderID CustomerID OrderDate Amount 10308 2 1996-09-18 100 10309 1 1996-09-19 10 10310 1 1996-09-20 63 SELECT OrderID, Amount FROM MyDB.Order ORDER BY Amount DESC LIMIT 1; OrderID 10308 100 Amount

SQL Joins Used to combine two or more tables, based on the common fields between them. Suppose we have. MyDB.Order OrderID CustomerID OrderDate Amount 10308 2 1996-09-18 100 10309 1 1996-09-19 10 10310 1 1996-09-20 63 MyDB.Customer CustomerID LastName FirstName Country 1 Futterkiste Alfreds Germany 2 Trujillo Ana US 3 Moreno Antonio Mexico

A Correct, Simple Join SELECT * FROM MyDB.Order, MyDB.Customer WHERE Order.CustomerID=Customer.CustomerID; MyDB.Order OrderID CustomerID OrderDate Amount 10308 2 1996-09-18 100 10309 1 1996-09-19 10 10310 1 1996-09-20 63 MyDB.Customer CustomerID LastName FirstName Country 1 Futterkiste Alfreds Germany 2 Trujillo Ana US 3 Moreno Antonio Mexico OrderID CustomerID OrderDate Amount CustomerID LastName FirstName Country 10308 2 1996-09-18 100 2 Trujillo Ana US 10309 1 1996-09-19 10 1 Futterkiste Alfreds Germany 10310 1 1996-09-20 63 1 Futterkiste Alfreds Germany

What If We Don t Have the WHERE condition? SELECT * FROM MyDB.Order, MyDB.Customer WHERE Order.CustomerID=Customer.CustomerID; MyDB.Order OrderID CustomerID OrderDate Amount 10308 2 1996-09-18 100 10309 1 1996-09-19 10 10310 1 1996-09-20 63 MyDB.Customer 3 3 = 9 rows OrderID CustomerID OrderDate Amount CustomerID LastName FirstName Country 10308 2 1996-09-18 100 1 Futterkiste Alfreds Germany 10308 2 1996-09-18 100 2 Trujillo Ana US 10308 2 1996-09-18 100 3 Moreno Antonio Mexico 10309 1 1996-09-19 10 1 Futterkiste Alfreds Germany 10309 1 1996-09-19 10 2 Trujillo Ana US 10309 1 1996-09-19 10 3 Moreno Antonio Mexico 10310 1 1996-09-20 63 1 Futterkiste Alfreds Germany 10310 1 1996-09-20 63 2 Trujillo Ana US It will fetch every possible combination (pair) of records from the two tables CustomerID LastName FirstName Country 1 Futterkiste Alfreds Germany 2 Trujillo Ana US 3 Moreno Antonio Mexico 10310 1 1996-09-20 63 3 Moreno Antonio Mexico

More Variations to Join Q: What is the total order amount by country? Step 1: We start with a simple join SELECT * FROM MyDB.Order, MyDB.Customer WHERE Order.CustomerID=Customer.CustomerID; OrderID CustomerID OrderDate Amount CustomerID LastName FirstName Country 10308 2 1996-09-18 100 2 Trujillo Ana US 10309 1 1996-09-19 10 1 Futterkiste Alfreds Germany 10310 1 1996-09-20 63 1 Futterkiste Alfreds Germany Step 2: We then end up with - SELECT Customer.Country, SUM(Order.Amount) FROM MyDB.Order, MyDB.Customer WHERE Order.CustomerID=Customer.CustomerID GROUP BY Customer.Country; Country SUM(Amount) US 100 Germany 73

SQL (Subselects) Subselect query can return One single value (one column, one row) A temporary table (one or multiple columns, one or multiple rows)

One single value: Used With Comparison Operators SELECT column_name1 FROM schema_name.table_name1 WHERE column_name2 comparison_operator (SELECT column_name3 FROM schema_name.table_name2 WHERE condition); comparison_operator could be equality operators such as =, >, <, >=, <=. Treated as a single value

Subselect as One Single Value: Example 1 Q: What are the order IDs with order amount below average? MyDB.Order OrderID CustomerID OrderDate Amount 10308 2 1996-09-18 100 10309 1 1996-09-19 10 10310 1 1996-09-20 63 Step 1. We start by write the subselect query that returns the average order amount: Amount SELECT AVG(Amount) FROM MyDB.Order Step 2. We treat the subselect query as a single value: SELECT OrderID, Amount FROM MyDB.Order WHERE Amount < (SELECT AVG(Amount) FROM MyDB.Order); OrderID 10308 100 10310 63 Amount 57.6667

Subselect as One Single Value: Example 2 Q: Which German customer placed the order(s) with the highest order amount? MyDB.Order OrderID CustomerID OrderDate Amount 10308 2 1996-09-18 100 10309 1 1996-09-19 10 10310 1 1996-09-20 63 MyDB.Customer CustomerID LastName FirstName Country 1 Futterkiste Alfreds Germany 2 Trujillo Ana US 3 Moreno Antonio Mexico Recall the simple join query SELECT * FROM MyDB.Order, MyDB.Customer WHERE Order.OrderID=Customer.CustomerID; OrderID CustomerID OrderDate Amount CustomerID LastName FirstName Country 10308 2 1996-09-18 100 2 Trujillo Ana US 10309 1 1996-09-19 10 1 Futterkiste Alfreds Germany 10310 1 1996-09-20 63 1 Futterkiste Alfreds Germany

Subselect as One Single Value: Example 2 Q: Which German customer placed the order(s) with the highest order amount? Step 1. We start by write the subselect query that returns the highest order amount for German customers: SELECT MAX(Order.Amount) FROM MyDB.Order, MyDB.Customer WHERE Order.CustomerID=Customer.CustomerID AND Customer.Country= Germany ; Step 2. We treat the subselect query as a single value: SELECT Customer.LastName, Customer.FirstName FROM MyDB.Order, MyDB.Customer WHERE Order.OrderID=Customer.CustomerID AND Customer.Country= Germany AND Amount= (SELECT MAX(Order.Amount) FROM MyDB.Order, MyDB.Customer WHERE Order.CustomerID=Customer.CustomerID AND Customer.Country= Germany ;); Amount 63 LastName FirstName Amount Futterkiste Alfreds 63

As a temporary table SELECT column_name(s) FROM (SELECT column_name(s) FROM schema_name.table_name2 WHERE condition) WHERE condition; Treated as a table

SQL (CREATE TABLE) If there is no foreign key: CREATE TABLE schema_name.table_name ( columnname1 datatype [NULL][NOT NULL], columnname2 datatype [NULL][NOT NULL], PRIMARY KEY (KeyName); If there is foreign key: CREATE TABLE schema_name.table_name ( columnname1 datatype [NULL][NOT NULL], columnname2 datatype [NULL][NOT NULL], PRIMARY KEY (KeyName), FOREIGN KEY (ForeignKeyName) REFERENCES reference_table_name(referencekeyname)); When to use NOT NULL?

DROP TABLE, ALTER TABLE DROP TABLE schema_name.table_name; ALTER TABLE schema_name.table_name ADD COLUMN column_name datatype [NULL][NOT NULL]; or ALTER TABLE schema_name.table_name DROP COLUMN column_name; or ALTER TABLE schema_name.table_name CHANGE COLUMN old_column_name new_column_name datatype [NULL][NOT NULL]; Drops a table Adds a column to the table Removes a column from the table Changes a column in the table

INSERT INTO, UPDATE, DELETE FROM To insert a record into a table: INSERT INTO schema_name.table_name (columnname1, columnname2, columnname3) VALUES (value1, value2, value3); To change data in a row: UPDATE schema_name.table_name SET columnname1=value1, columnname2=value2 WHERE condition; To delete a row from a table: DELETE FROM schema_name.table_name WHERE condition; What to use in the WHERE condition? How about this WHERE condition?

Data Types Data type Description Examples INT Integer 3, -10 DECIMAL(p,s) VARCHAR(n) Decimal. Example: decimal(5,2) is a number that has 3 digits before decimal and 2 digits after decimal String (numbers and letters) with maximum length n 3.23, 3.14159 'Hello, I like pizza, MySQL!' DATETIME, DATE Date/Time, or just Date '2011-09-01 17:35:00', '2011-04-12' BOOLEAN Boolean value 0 or 1 Q: What does DECIMAL(4, 1) indicate? A: The value cannot go beyond -999.9~ 999.9

ETL and Assignment 4 Extract data from the operational data store Transform data into an analysisready format Load it into the analytical data store What is it? Why is it important? Data consistency Data quality Explain the purpose of each component (Extract, Transform, Load) Excel functions VLOOKUP and CONCATENATE

Dimensional Data Modeling Data warehouse vs data mart vs data cube Kimball s four step process for data mart design Data Cube Store Ardmore, PA Temple Main Cherry Hill, NJ King of Prussia, PA quantity & total price quantity & total price quantity & total price quantity & total price Product Diet Famous M&Ms Doritos Coke Amos quantity quantity quantity & total & total & total price price price quantity quantity quantity & total & total & total price price price quantity quantity quantity & total & total & total price price price Mar. 2012 quantity quantity quantity Feb. 2012 & total & total & total price price price Jan. 2012 1. Choose the business process 3. Decide on the level of granularity 2. Identify the fact 4. Identify the dimensions Star schema non-volatility of data cubes

Pivot Tables and Assignment 5 Given a question about a set of data, be able to identify the fields required to create a pivot table Identify which fields are assigned as VALUES and which ones are assigned as ROWS Identify the correct function for aggregation: i.e., SUM, COUNT, AVERAGE

Pivot Tables vs. data cubes Equivalent to dimensions in a data cube Equivalent to measured facts in a data cube

Data Visualization Data visualization principles. Tell a story Graphical integrity (lie factor) Minimize graphical complexity (data ink, chartjunk)

Chart #1: 150000 140000 130000 120000 110000 100000 90000 80000 70000 60000 50000 160000 140000 120000 100000 80000 60000 40000 20000 0 Bonus Assignment Monthly Sales Monthly Sales ($) Issues: Tell a Story The vertical axis isn t labeled. We don t know the unit. Graphical Integrity The vertical axis does not start from zero Graphical Complexity Horizontal and vertical lines are unnecessary (Chartjunk)

Chart #1: 150000 140000 130000 120000 110000 100000 90000 80000 70000 60000 50000 160000 140000 120000 100000 80000 60000 40000 20000 0 Bonus Assignment Monthly Sales Monthly Sales ($) Issues: Tell a Story The vertical axis isn t labeled. We don t know the unit. Graphical Integrity The vertical axis does not start from zero Graphical Complexity Horizontal and vertical lines are unnecessary (Chartjunk) This also works

Bonus Assignment Chart #2: Quantity Sold by Customer Type Issues: Silver 32% Gold 40% Graphical Integrity The 3D chart makes it difficult to compare the sizes Platinum 28% Quantity Sold by Customer Type Silver 32% Gold 40% Graphical Complexity The 3D chart requires more ink (Chartjunk) Platinum 28%

Chart #3: 200000 150000 100000 50000 0 33991 9986 184847 126167 67524 76853 46094 19389 30667 33014 20076 Bonus Assignment Total Sales by State 92287 54359 136338 107344 66318 2693532756 47377 53469 AZ CO GA LA MN NC OH RI TN VA Total Issues: Tell a Story Vertical axis isn t labeled. We don t know the units Because there are many states to compare, horizontal lines may be helpful 200000 180000 160000 140000 120000 100000 80000 60000 40000 20000 0 Total Sales by State ($) AZ CA CO FL GA HI LA MI MN MO NC NY OH PA RI SC TN UT VA WA Graphical Integrity The 3D chart makes it difficult to compare sizes The cone-shaped bars make it even harder to compare sizes Graphical Complexity The 3D chart requires more ink (Chartjunk) The number labels are unnecessary

Good Luck!