Introduction to SQL 4/24/2017. University of Iowa SAS Users Group. 1. Introduction and basic uses 2. Joins and Views 3. Reporting examples

Similar documents
Example1D.1.sas. * Procedures : ; * 1. print to show the dataset. ;

Quality Control of Clinical Data Listings with Proc Compare

Ready To Become Really Productive Using PROC SQL? Sunil K. Gupta, Gupta Programming, Simi Valley, CA

The Art of Defensive Programming: Coping with Unseen Data

Using a HASH Table to Reference Variables in an Array by Name. John Henry King, Hopper, Arkansas

ECLT 5810 SAS Programming - Introduction

Procedure for Stamping Source File Information on SAS Output Elizabeth Molloy & Breda O'Connor, ICON Clinical Research

INF 315E Introduction to Databases School of Information Fall 2015

Using Relational Databases and SQL for Social Scientific Research: Theory and Practice

INTRODUCTION TO PROC SQL JEFF SIMPSON SYSTEMS ENGINEER

CGS 3066: Spring 2017 SQL Reference

EE221 Databases Practicals Manual

SQL language. Jaroslav Porubän, Miroslav Biňas, Milan Nosáľ (c)

CMPT 354: Database System I. Lecture 1. Course Introduction

A SAS Macro for Producing Benchmarks for Interpreting School Effect Sizes

MySQL for Beginners Ed 3

1. Data Model, Categories, Schemas and Instances. Outline

Welcome! Power BI User Group (PUG) Copenhagen

MANAGING DATA(BASES) USING SQL (NON-PROCEDURAL SQL, X401.9)

Query Optimization. Chapter 13

Efficiently Join a SAS Data Set with External Database Tables

CSE 344 JANUARY 3 RD - INTRODUCTION

An Animated Guide: Proc Transpose

PRXChange: Accept No Substitutions Kenneth W. Borowiak, PPD, Inc.

CSE 544 Principles of Database Management Systems

Database. Quiz with Explainations. Hans-Petter Halvorsen, M.Sc.

SYLLABUS. Departmental Syllabus. Structured Query Language (SQL)

SQL. The Basics Advanced Manipulation Constraints Authorization 1. 1

Chapter 9: Working With Data

Data Models and SQL for GIS. John Porter Department of Environmental Sciences University of Virginia

CS 327E Lecture 5. Shirley Cohen. September 14, 2016

Chapter 6: Modifying and Combining Data Sets

Score. 1 (10) 2 (10) 3 (8) 4 (13) 5 (9) Total (50)

origin destination duration New York London 415 Shanghai Paris 760 Istanbul Tokyo 700 New York Paris 435 Moscow Paris 245 Lima New York 455

INTRODUCTION to SAS STATISTICAL PACKAGE LAB 3

CAS CS 460/660 Introduction to Database Systems. Fall

Oracle Compare Two Database Tables Sql Query Join

Query Processing & Optimization

DBMS Questions for IBPS Bank Exam

Chapter 11: Query Optimization

DATABASE SYSTEMS. Introduction to MySQL. Database System Course, 2016

745: Advanced Database Systems

Maximizing Statistical Interactions Part II: Database Issues Provided by: The Biostatistics Collaboration Center (BCC) at Northwestern University

B.H.GARDI COLLEGE OF MASTER OF COMPUTER APPLICATION. Ch. 1 :- Introduction Database Management System - 1

Shawn Dorward, MVP. Getting Started with Power Query

Proc SQL seminar

Shawn Dorward, MVP. Getting Started with Power Query

A Practical Introduction to SAS Data Integration Studio

Table of Contents. Table of Contents

Ministry of Higher Education and Scientific research

WebSphere Information Integrator

Today Learning outcomes LO2

Intro To SQL ODBC 2/3/2016

Informatics 1: Data & Analysis

Intro to SQL. Getting Your Data out of PowerSchool Lori Ivey-Dennings

Read this before starting!

CS145: Intro to Databases. Lecture 1: Course Overview

Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications. Today's Party. Example Database. Faloutsos/Pavlo CMU /615

Rajiv GandhiCollegeof Engineering& Technology, Kirumampakkam.Page 1 of 10

BASUG 2017 An Animated Guide: Learning The PDV via the SAS Debugger

INTRODUCTION TO SAS HOW SAS WORKS READING RAW DATA INTO SAS

ACS-3902 Fall Ron McFadyen 3D21 Slides are based on chapter 5 (7 th edition) (chapter 3 in 6 th edition)

Informatics 1: Data & Analysis

The Relational Data Model. Data Model

Chapter 13: Query Optimization. Chapter 13: Query Optimization

An Introduction to SAS University Edition

Sql 2008 Copy Table Structure And Database To

Modern Database Systems CS-E4610

SYLLABUS. Departmental Syllabus

The Relational Model

The SAS Platform. Georg Morsing

CS 327E Lecture 2. Shirley Cohen. January 27, 2016

DATABASE MANAGEMENT SYSTEM SUBJECT CODE: CE 305

Course Contents: 1 Datastage Online Training

The Relational Model. Why Study the Relational Model? Relational Database: Definitions

NCSS: Databases and SQL

proc print data=account; <insert statement here> run;

Getting Started with the XML Engine

The Associative Model of Data and Sentences. The Next Generation of Structured Data. Lazysoft. Copyright 2014 Lazysoft

Study (s) Degree Center Acad. Period

The Relational Model Constraints and SQL DDL

PROC SQL: Why and How

CS 4604: Introduc0on to Database Management Systems. B. Aditya Prakash Lecture #3: SQL and Rela2onal Algebra- - - Part 1

Institute of Aga. Network Database LECTURER NIYAZ M. SALIH

The Relational Model. Chapter 3

CS313D: ADVANCED PROGRAMMING LANGUAGE

Oracle Schema Create Date Index Trunc >>>CLICK HERE<<<

Principles of Data Management

Database Management Systems,

SAS Online Training: Course contents: Agenda:

Data, Databases, and DBMSs

LABORATORY. 16 Databases OBJECTIVE REFERENCES. Write simple SQL queries using the Simple SQL app.

PharmaSUG Paper TT11

Draft. Students Table. FName LName StudentID College Year. Justin Ennen Science Senior. Dan Bass Management Junior

Relational Model. Topics. Relational Model. Why Study the Relational Model? Linda Wu (CMPT )

Oracle Compare Two Database Tables Sql Query List All

SAS ODBC Driver. Overview: SAS ODBC Driver. What Is ODBC? CHAPTER 1

Relational Databases. Charles Severance

Microsoft Access - Using Relational Database Data Queries (Stored Procedures) Paul A. Harris, Ph.D. Director, GCRC Informatics.

Difference Between User And Schema In Sql. Server 2008 >>>CLICK HERE<<<

Transcription:

University of Iowa SAS Users Group 1. Introduction and basic uses 2. Joins and Views 3. Reporting examples 1

Patient Sex YOB Blood Type Visit # Weight Joe male 73 A+ 1 150 Joe male 73 A+ 2 153 Joe male 73 A+ 3 151 Sally female 88 O- 1 128 Sally female 88 O- 2 127 Sally female 88 O- 3 126 Tom male 71 B+ 1 163 Tom male 71 B+ 2 165 Old School Flat File Structure Patient Sex YOB Blood Type Joe male 73 A+ Sally female 88 O- Tom male 71 B+ Patient Visit # Weight Joe 1 150 Joe 2 153 Joe 3 151 Sally 1 128 Sally 2 127 Sally 3 126 Tom 1 163 Tom 2 165 Relational File Structure SQL Oracle Database MySQL Microsoft SQL Server Access IBM DB2 Informix SAP Sybase Teradata PostgreSQL SQLite 2

SQL Performs direct data access functions Retrieves and Updates tables Proc SQL; A Simple select select name, sex, age from class; ---------- proc print data=class noobs; var name sex age; run; Result: output Name Sex Age Alfred M 14 Alice F 13 Barbara F 13 Carol F 14 Henry M 14 James M 12 Jane F 12 Janet F 15 Jeffrey M 13 John M 12 Joyce F 11 Judy F 14 Louise F 12 Mary F 15 Philip M 16 Robert M 12 Ronald M 15 Thomas M 11 William M 15 3

A Simple select select * from class; ---------- proc print data=class noobs; run; Result: output select with where select * from class where Sex= M ; ---------- proc print data=class noobs; where Sex= M ; run; Result: output 4

select with where with nested select select * from class where height<(select median(height) from class); Result: output select with order by select * from class order by sex, age; ---------- proc sort data=class; by sex age; proc print data=class noobs; run; Result: output John M 12 59 Alfred M 14 69 Ronald M 15 67 Philip M 16 72 5

create table create table ClassCopy as select * from class; ---------- data ClassCopy; set class; run; Result: (no output) insert insert into class (Name, Sex, Age, Age Height) values ( Norm, M, 15, 54.2); ---------- data class; set class end=eof; output; if eof then do; name= Norm ; Sex= M ; age=15; height=54.2; output; end; run; Result: (no output) Norm M 15 54.2 6

Norm M 15 54.2 delete delete from class where name= Norm ; ---------- data class; set class; if name= Norm then delete; run; Result: (no output) update update class set height=69.3 where name= Alfred ; set height=57.9, age=19 where name= Alice ; ---------- data class; set class; if name= Alfred then height=69.3; if name= Alice then do; height=57.9; age=19; end; run; Result: (no output) Alfred M 14 69.3 Alice F 19 57.9 7

functions select avg(age) as MeanAge from class; ---------- proc means data=class; var age; run; Result: output 13.31579 create a macro variable select avg(age) into :mnage from class; %put Mean age is &mnage ; Result: log Mean age is 13.31579 8

Proc SQL Syntax More fun things we can do with Proc SQL 1. Create new variables 2. Group data 3. Subset grouped data 4. Conditional Processing 5. Joining Datasets (WITHOUT sorting!) Proc SQL Syntax: Creating new variables using as Dataset: Baseball_career Dataset: Baseball_select only variables you pick will be in output you can rename variables with the as statement 9

Proc SQL Syntax: Creating new variables using as Dataset: Baseball_career Dataset: newvar -or- Proc SQL Syntax: Grouping Data Dataset: Baseball_career Result: (no output) 10

Proc SQL Syntax: Grouping Data Dataset: Baseball_career Result: (no output) Proc SQL Syntax: Grouping Data Dataset: Baseball_career Result: (no output) 11

Proc SQL Syntax: Conditional Processing Dataset: Baseball_career Dataset: baseball_hr Proc SQL Syntax: Joining Datasets Dataset: Baseball_career Dataset: Baseball_1986 No need for sorting 12

Proc SQL Syntax: Joining Datasets Inner Joins Uses where keeps common observations based on keyword Dataset = innerball Proc SQL Syntax: Joining Datasets Outer Joins Uses on keeps all observations from both datasets Dataset = outerball 13

Proc SQL Syntax: Joining Datasets Left/Right Joins Uses on keeps observations in left dataset Dataset = leftball PROC SQL and the Cloud Canvas Data Star schema database in the Amazon Cloud Fact and dimension tables Page view table > 1 billion rows 14

Practical Need for PROC SQL Data Reports for Instructors What are the distributions of scores? Which students are participating in discussion forums? Using the Unizin Data Platform The Office of Teaching, Learning and Technology configured connections to UDP via SAS SAS provides robust data management tools SAS provides a variety of statistical analyses and tools for creating reports Very simple to use ODBC connection to connect to Redshift SQL in SAS can be read by DBAs or other application developers Some downsides of SAS Not able to process Requests table for many courses at once The Big Integer is Pythonic Not commonly used by developers 30 15

Overcoming some hurdles 31 Reporting the Outcomes of Mandatory Quiz The Department of Art and Art History requires students to take a mandatory quiz about safety. An Auditing and Compliance Department must collect identities of students who have passed the quiz. We connect to UDP and our SIS in same programming steps. 32 16

Efficiency of assignment_dim Ability to find assignments by their title Effective because submission_dim does not have course_id values PROC SQL; create table ELEMENTS_quiz as SELECT A.TITLE, A.ID AS ASSIGNMENT_ID, (dbsastype = (id='char(20)' course_id = 'char(20)')) A INNER JOIN WORK.ELEMENTS B ON A.COURSE_ID=B.ID WHERE title = 'SAAH Quiz'; QUIT; 33 Report for College of Medicine Goal: Report of scores on all major assignments for students in the medical school 1. Pull all courses for the Med School from Account_dim and inner join with Course_dim Where course_dim.workflow_state = available Enrollment_term_id is current semester 2. Pull all exams from assignment_dim table Account_dim Course_dim Parent_account_id Course_id Course_id Enrollment_term_id Workflow_state proc sql noprint; create table ASSIGNMENTS as select *, id as assignment_id label = "assignment_id, course_id from u.assignment_dim where course_id in (select course_id from MED_V2) and prxmatch('m/exam/oi', title) > 0 and workflow_state = 'published'; Assignment_dim Submission_dim Course_id Assignment_id Title Assignment_id User_id Grade Workflow_state 34 17

Report for College of Medicine Goal: Report of scores on all major assignments for students in the medical school 3. Inner join assignment_dim with submission_dim table Workflow_state = graded User_id ^= Remove % signs using TRANWRD function 4. Inner join with pseudonym_dim table to get SIS_user_id select TRANWRD(grade, %, ) Account_dim Course_dim Assignment_dim Parent_account_id Course_id Course_id Enrollment_term_id Workflow_state Course_id Assignment_id Title Submission_dim Assignment_id User_id Grade Workflow_state 35 Preparation of a Discussion Report Goal: Report of discussion posts, replies, and views for each student Getting Discussion Posts and Replies Pseudonym_dim 1. Match user_id to university_id using Pseudonym_dim table 2. Pull Discussion_entry_fact by Discussion_entry_fact course_id 3. Inner join Discussion_entry_fact with Discussion_entry_dim If depth > 1 output to Replies Discussion_entry_dim If depth = 1 output to Posts 4. Sort by User_id Count overall user posts & replies 5. Inner join with Discussion_topic_dim 6. Sort by User_id & Topic_id Count topic posts and replies by user User_id SIS_user_id Discussion_topic_dim Course_id Discussion_entry_id Topic_id User_id ID Depth ID Title 36 18

Preparation of a Discussion Report Goal: Report of discussion posts, replies, and views for each student Note: Views were defined as rows in the requests table related to discussions, as long as the subsequent rows were for different topic_ids or > 15 minutes apart 1. Pull from requests by course_id 2. Extract discussion_id from URL 3. Sort by user_id, topic_id_abbr, timestamp 4. Count views by user_id with conditional statements and lag function proc sql noprint; create table requests2_&course_number as select *, input(substr(url, 33, 6), 12.) as topic_id_abbr label = 'topic_id_abbr' from requests_&course_number; timestamp2 = timestamp - (lag(timestamp)); if first.sis_user_id then posts_read = 1; if sis_user_id = lag(sis_user_id) and topic_id_abbr = lag(topic_id_abbr) and timestamp2 < 900 then posts_read = posts_read + 0 ; 37 sis_user_id VIEWS REPLIES POSTED user_id STUDENT_FIRST_NAME STUDENT_LAST_NAME 00000001 48 21 14-2999999999999 Prince Hamlet 00000002 16 19 10 88888888888888 King Lear 00000003 23 10 10-777777777777 Lord Montague 00000004 37 13 12-5555555555555 Lord Capulet 00000005 68 26 15 4444444444444 Duke Albany 00000006 16 11 13-3333333333333 Duke Cornwall 38 19