An SQL Tutorial Some Random Tips


An SQL Tutorial Some Random Tips Presented by Jens Dahl Mikkelsen SAS Institute A/S Author: Paul Kent SAS Institute Inc, Cary, NC.

Short Stories
  Towards a Better UNION
  Outer Joins. More than two too.
  Logical Expressions
  Case Free SoRtiNg
  Data about Data

A Better UNION?

A Better UNION?
  Can SQL do this?
    DATA A;
      SET B C;
    RUN;

A Better UNION?
  Can SQL do this? Sure:
    create table A as
      select * from B
      union
      select * from C;

A Better UNION?
  Not quite!
  SQL set operators have strict mathematical semantics
  UNION is formed on a column-by-column basis, not matched by name
  UNION requires no duplicate rows in the result; this is an expensive operation
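The two properties above can be tried out in a small sketch. It uses Python's built-in sqlite3 module as a stand-in for PROC SQL (an assumption made so the example is runnable anywhere); the tables b and c and their contents are hypothetical.

```python
import sqlite3

# Hypothetical tables b and c; SQLite stands in for PROC SQL here.
con = sqlite3.connect(":memory:")
con.executescript("""
    create table b (id integer, val text);
    create table c (id integer, val text);
    insert into b values (1, 'a'), (2, 'b');
    insert into c values (2, 'b'), (3, 'c');
""")

# UNION removes the duplicate row (2, 'b'); UNION ALL keeps it.
union_rows = con.execute(
    "select * from b union select * from c order by 1, 2").fetchall()
union_all_rows = con.execute(
    "select * from b union all select * from c order by 1, 2").fetchall()

# Columns are paired by position, not by name: swapping the select
# list of the second query puts text under 'id' and numbers under 'val'.
swapped = con.execute(
    "select id, val from b union all select val, id from c").fetchall()

print(union_rows)      # three distinct rows
print(union_all_rows)  # four rows, duplicate kept
print(swapped)         # rows from c arrive with their columns swapped
```

The duplicate removal is what makes plain UNION expensive: the engine has to compare every row against the others before it can return a result.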

A Better UNION?
  SQL can do this...
    create table A as
      select * from B
      union ALL CORRESPONDING
      select * from C;

A Better UNION?
  SQL
    CORRESPONDING avoids the SORT
    QUERY is still interpreted
  DATASTEP
    Doesn't need sort to begin with
    Program is compiled into Native Machine Code
  Which Means...

A Better UNION?
  DATA STEP : 1
  SQL : 0
  Huh? Isn't this an SQL talk?

<not> A Better UNION?
  Choose your battles wisely. Do not abandon those DATA STEP skills.
  Might still choose SQL if:
    UNION is part of a larger query
    You expect to port the program to a NON SAS environment
    Performance is not your only metric

Outer Joins More than Two Too

Outer Join vs. inner join Data A Data B

Outer Join vs. inner join
  Inner Join
    select * from a, b
      where a.key = b.key;

Outer Join vs. inner join
  Left Join
    select * from a left join b
      on a.key = b.key;

Outer Join vs. inner join
  Full Join
    select * from a full join b
      on a.key = b.key;

Outer Joins
  Most people get their SQL joins *wrong*
    Non-matched records are dropped
      Information is lost from reports
    Duplicate matches seem to double up
      Totals are unpredictable

Outer Joins
  Consider these example datasets
    PEOPLE
    PAYROLL
    INVESTments
  All linked by a common PERSON

Outer Joins
    select *
      from people as peo, payroll as pay, invest as inv
      where peo.person = pay.person
        and peo.person = inv.person ;

Outer Joins
  This is usually *wrong*
  Even if all people are recorded in PEOPLE
    Some may not get paid
    Some may not have investments
  SQL default is to drop records with no match
    Where clause is not true.
    This combination of rows is not considered interesting

Outer Joins
    select *
      from people as peo
           left join payroll as pay on peo.person = pay.person
           left join invest as inv on peo.person = inv.person ;

Outer Joins
  Better...
  This query retains people who are
    not paid
    not invested
  SQL provides missing values
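The difference between the two queries can be checked on a toy example. This is a sketch using Python's sqlite3 module rather than PROC SQL; the rows and the salary and amount columns are made up for illustration, and SQL NULL (Python None) plays the role of the SAS missing value.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical data: linda has no investments, chris has neither record.
con.executescript("""
    create table people  (person text);
    create table payroll (person text, salary integer);
    create table invest  (person text, amount integer);
    insert into people  values ('paul'), ('linda'), ('chris');
    insert into payroll values ('paul', 100), ('linda', 200);
    insert into invest  values ('paul', 50);
""")

# Inner join via WHERE: rows without a match in every table are dropped.
inner_rows = con.execute("""
    select peo.person, pay.salary, inv.amount
      from people as peo, payroll as pay, invest as inv
     where peo.person = pay.person
       and peo.person = inv.person
     order by peo.person
""").fetchall()

# Left joins: every person survives; unmatched columns come back as NULL.
left_rows = con.execute("""
    select peo.person, pay.salary, inv.amount
      from people as peo
           left join payroll as pay on peo.person = pay.person
           left join invest  as inv on peo.person = inv.person
     order by peo.person
""").fetchall()

print(inner_rows)
print(left_rows)
```

With this data the inner join reports only paul, while the left joins keep all three people.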

Outer Joins
  {LEFT | RIGHT | FULL} JOIN is SQL syntax that supplements the comma (,) used in a FROM clause.
  ON {expression} is used instead of a WHERE clause
  May still have records in the result set for which the ON clause is not TRUE

Have we got it right yet?
  Select * has problems
  Includes the join-key person three times
    people.person
    payroll.person
    invest.person
  How to choose the correct one?

Outer Joins
    select peo.*,
           pay.var1, pay.var2, ...
           inv.var1, inv.var2, ...
      from ...

Outer Joins
  Are your data perfect?
  This example assumed:
    People is a complete set
    No payroll records exist without a corresponding people record
    No investment records exist without a corresponding people record

Outer Joins
  You may know the data are perfect
    RDBMS integrity constraints
    Application controls
    Your Boss told you so
  But what if it ain't so?

Outer Joins
    select COALESCE(peo.person, pay.person, inv.person) as person,
           peo.var1, peo.var2, ...
           pay.var1, pay.var2, ...
           inv.var1, inv.var2, ...
      from ...

Outer Joins
  COALESCE returns its first non-missing argument
    correctly selects the person key even if the corresponding record is not from the people dataset.
  Is it correct yet?
    What if we have a payroll record and an investment record for PAUL, but no people record?

Outer Joins
  From clause needs fixing too.
  Left Join is only appropriate in situations where you are 100% confident in the master-detail relationship.
  Full Join can handle the uncertainty of data coming from either table and not the other
  Full Joins are much harder to optimise. Indexes are not useful.

Outer Joins
    from peo
         full join pay on peo.person = pay.person
         full join inv on peo.person = inv.person ;

Is it correct yet?
  No
  What about PAUL, who has a payroll record as well as an investment record, but no people record...

Outer Joins
    from peo
         full join pay on peo.person = pay.person
         full join inv on peo.person = inv.person ;

Outer Joins
    select COALESCE(peo.person, pay.person, inv.person) as person,
           peo.var1, peo.var2, ...
           pay.var1, pay.var2, ...
           inv.var1, inv.var2, ...
      from peo
           full join pay on peo.person = pay.person
           full join inv on COALESCE(peo.person, pay.person) = inv.person ;

Outer Joins - SQL 3
    select *
      from people
           full natural join payroll on person
           full natural join invest on person;

Outer Joins - SQL 3
  Fixes all these problems
    Choosing the key once only on the select clause
    Coalescing the key properly
      in the select clause
      in the on clause

Logical Expressions
  It's only ONEs and ZEROs

Logical Expressions
  Boolean expressions in SAS *must* evaluate to one of {0,1}
  You may be able to exploit this:
    to construct a score for a matching scheme
    to tally the number of records for which the expression was true

Logical Expressions as Join Criteria
  A Match is a Catch if n or more of these conditions are true
    Age is within one year of matching
    First Name matches
    Last Name matches
    Initial matches

Logical Expressions as Join Criteria
    select *
      from A, B
      where (abs(a.dob - b.dob) < 365)
          + (a.lastname = b.lastname)
          + (a.initial = b.initial)
          + (a.fname = b.fname) >= 2;

Logical Expressions as Join Criteria
  Careful!
  Joins like this make up internal Cartesian products.
  It is very expensive to evaluate each possible combination of rows from the contributing tables.
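A minimal sketch of the scoring join, run through Python's sqlite3 module instead of PROC SQL. SQLite also evaluates comparisons to 0 or 1, so the same arithmetic applies. The names and the integer day-number dob values are invented for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical records; dob is stored as a day number, as a SAS date would be.
con.executescript("""
    create table a (id integer, fname text, lastname text, initial text, dob integer);
    create table b (id integer, fname text, lastname text, initial text, dob integer);
    insert into a values (1, 'Ann', 'Lee',  'K', 10000),
                         (2, 'Bob', 'Diaz', 'M', 20000);
    insert into b values (10, 'Ann', 'Lee',  'J', 10100),
                         (11, 'Tom', 'Lee',  'R', 30000),
                         (12, 'Bob', 'Dias', 'M', 20200);
""")

# Each comparison yields 0 or 1; their sum is the match score.
matches = con.execute("""
    select a_id, b_id, score
      from (select a.id as a_id, b.id as b_id,
                   (abs(a.dob - b.dob) < 365)
                 + (a.lastname = b.lastname)
                 + (a.initial = b.initial)
                 + (a.fname = b.fname) as score
              from a, b)
     where score >= 2
     order by a_id, b_id
""").fetchall()

# Without an equality condition, every row of a meets every row of b.
pairs_examined = con.execute("select count(*) from a, b").fetchone()[0]

print(matches)
print(pairs_examined)
```

Two rows against three rows already means six candidate pairs; the count grows as the product of the table sizes, which is the cost the slide warns about.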

Logical Expressions as Counters
  Exploit the identity that the SUM of a logical expression is equivalent to the number of rows for which that expression was true.
    False contributes 0 to the sum.
    True contributes 1.

Logical Expressions as Counters
    data pets;
      input person $ cats dogs @@;
      cards;
    paul 5 1 linda 0 2 chris 0 2
    pat 0 0 kelsey 1 0 thais 0 4
    run;

Logical Expressions as Counters
    select person,
           cats > 0 as cat_own,
           cats > 2 as cat_love,
           dogs > 0 as dog_own,
           dogs > 2 as dog_love
      from pets ;

Logical Expressions as Counters
    PERSON     CAT_OWN  CAT_LOVE   DOG_OWN  DOG_LOVE
    ------------------------------------------------
    paul             1         1         1         0
    linda            0         0         1         0
    chris            0         0         1         0
    pat              0         0         0         0
    kelsey           1         0         0         0
    thais            0         0         1         1

Logical Expressions as Counters
    select sum(cats > 0) as cat_own,
           sum(cats > 2) as cat_love,
           sum(dogs > 0) as dog_own,
           sum(dogs > 2) as dog_love
      from pets ;

Logical Expressions as Counters
     CAT_OWN  CAT_LOVE   DOG_OWN  DOG_LOVE
    --------------------------------------
           2         1         4         1
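The same tally can be reproduced outside SAS. The sketch below loads the pets data from the slides into an in-memory SQLite database through Python's sqlite3 module (a stand-in for PROC SQL); SQLite comparisons also evaluate to 0 or 1, so summing them counts the true rows.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table pets (person text, cats integer, dogs integer)")
# Same rows as the DATA step on the slide.
con.executemany(
    "insert into pets values (?, ?, ?)",
    [("paul", 5, 1), ("linda", 0, 2), ("chris", 0, 2),
     ("pat", 0, 0), ("kelsey", 1, 0), ("thais", 0, 4)],
)

# Summing a 0/1 comparison counts the rows where it is true.
counts = con.execute("""
    select sum(cats > 0) as cat_own,
           sum(cats > 2) as cat_love,
           sum(dogs > 0) as dog_own,
           sum(dogs > 2) as dog_love
      from pets
""").fetchone()

print(counts)  # (2, 1, 4, 1), matching the listing above
```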

Logical Expressions as Counters Useful? Perhaps..

Case Free Sorting

Case Free Sorting
  Tech Support gets requests for
    SORT while ignoring case
    SORT by formatted values
    SORT a special number to the beginning or end

Case Free Sorting
  PROC SQL allows:
    An expression most anywhere you could have a variable.
    A subquery most anywhere you could have an expression
  This is ANSI SQL2, but some RDBMS do not implement it yet.

Case Free Sorting
    ORDER BY upcase(name)
    ORDER BY put(variable, format.)
    ORDER BY CASE WHEN ACCOUNT = 999 THEN . ELSE ACCOUNT END
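Two of these ORDER BY expressions can be sketched with Python's sqlite3 module standing in for PROC SQL. The acct table and its rows are hypothetical, SQLite's upper() takes the place of upcase(), and NULL takes the place of the SAS missing value (both sort before any number in ascending order). The put(variable, format.) variant has no direct SQLite counterpart and is left out.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical accounts with mixed-case names; 999 is the "special" number.
con.executescript("""
    create table acct (name text, account integer);
    insert into acct values ('bob', 100), ('Alice', 999),
                            ('carol', 50), ('Dave', 200);
""")

# Default text ordering is case sensitive: capitals sort before lower case.
plain = [r[0] for r in con.execute(
    "select name from acct order by name")]

# Ordering by an expression ignores case.
no_case = [r[0] for r in con.execute(
    "select name from acct order by upper(name)")]

# Mapping the special account to NULL moves it to the front.
special_first = [r[0] for r in con.execute("""
    select account from acct
     order by case when account = 999 then null else account end
""")]

print(plain)
print(no_case)
print(special_first)
```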

Data about Data
  How to use DICTIONARY.TABLES to write programs that respond to the contents of libraries dynamically
  Suppose you have a SAS library with airline data
    you want to display column info for tables having the string 'flight' in the member label
    you want listings of those tables.

Data about Data
  Get a list of available tables from DICTIONARY.TABLES
  Get column info from DICTIONARY.COLUMNS
  Use some sneaky macro and SQL tricks!

Data about Data
  Get a list of available tables
    reset noprint;
    select quote(memname) into :members separated by ','
      from dictionary.tables
      where libname = 'AIRLINE'
        and upcase(memlabel) contains 'FLIGHT' ;

Data about Data
  Get the column information
    reset print flow= 15 20;
    select memname, name, label, type, length, format, idxusage
      from dictionary.columns
      where libname = 'AIRLINE'
        and memname in(&members);

Data about Data
    Member    Column    Column Label                Column  Column  Column   Column
    Name      Name                                  Type    Length  Format   Index Type
    ------------------------------------------------------------------------------------
    DELAY     FLIGHT    Flight number               char    3                COMPOSITE
    DELAY     DATE      Departure date              num     8       DATE7.   COMPOSITE
    DELAY     DELAY     Delay in minutes            num     8       5.
    FLINFO    FLIGHT    Flight Number               char    3       $3.      SIMPLE
    FLINFO    ORIG      Origin                      char    3       $3.
    FLINFO    DEST      Destination                 char    3       $3.
    FLINFO    MILES     Distance in Nautic Miles    num     8       5.
    MARCH     FLIGHT    Flight number               char    3       $3.      COMPOSITE
    MARCH     DATE      Departure date              num     8       DATE7.   COMPOSITE
    MARCH     DEPART    Departure (local time)      num     8       TIME8.
    MARCH     MAIL      Weight of mail (kg)         num     8       5.
    MARCH     FREIGHT   Weight of freight (kg)      num     8       5.
    MARCH     BOARDED   No. of boarded passengers   num     8       5.
    MARCH     TRANSFER  No. of transfer passengers  num     8       5.
    MARCH     NONREV    No. of non-revenue pass.    num     8       5.
    MARCH     DEPLANE   No. of disembarked pass.    num     8       5.
    MARCH     CAPACITY  Max. no of pass. in plane   num     8       5.
    SCHEDULE  FLIGHT    Flight number               char    3       $3.      COMPOSITE
    SCHEDULE  DATE      Date                        num     8       DATE7.   COMPOSITE
    SCHEDULE  IDNUM     Id of crew member           char    4       $4.      COMPOSITE

Data about Data
  Get the available table names into macro variables
    reset noprint;
    select memname into :mem1 thru :mem99
      from dictionary.tables
      where libname = 'AIRLINE'
        and upcase(memlabel) contains 'FLIGHT' ;
    %let n_mems = &sqlobs;

Data about Data
  Make listings of those tables (first 20 obs.)
    %macro prt_mems;
      reset print outobs=20 number;
      %do i = 1 %to &n_mems;
        title "Listing of first 20 rows of AIRLINE.&&mem&i";
        select * from airline.&&mem&i ;
      %end;
      title;
    %mend;
    %prt_mems;

Data about Data
  Partial listing

    Listing of first 20 rows of AIRLINE.DELAY

                                   Delay
            Flight   Departure        in
     Row    number   date        minutes
    ------------------------------------
       1    182      01MAR94           0
       2    114      01MAR94           8
       3    202      01MAR94          -5
       4    219      01MAR94          18
       5    439      01MAR94          -4
       6    387      01MAR94          -2
       7    290      01MAR94          -8
       8    523      01MAR94           4
       9    982      01MAR94           0
      10    622      01MAR94          -5
      11    821      01MAR94          16
      12    872      01MAR94           3
      13    416      01MAR94           4
      14    132      01MAR94          14
      15    829      01MAR94          -6
      16    183      01MAR94          -8
      17    271      01MAR94           5
      18    921      01MAR94          -5
      19    302      01MAR94          -2
      20    431      01MAR94          13

That's All Folks!

SAS and SAS/ACCESS are registered trademarks of SAS Institute Inc., Cary, NC, USA. Other brand names are trademarks or registered trademarks of their respective holders.