Reading data in SAS and Descriptive Statistics

Similar documents
STAT:5400 Computing in Statistics

Chapter 2: Getting Data Into SAS

EXST SAS Lab Lab #6: More DATA STEP tasks

You will learn: The structure of the Stata interface How to open files in Stata How to modify variable and value labels How to manipulate variables

Stat 302 Statistical Software and Its Applications SAS: Data I/O

STAT 503 Fall Introduction to SAS

SAS PROGRAMMING AND APPLICATIONS (STAT 5110/6110): FALL 2015 Module 2

Stat 302 Statistical Software and Its Applications SAS: Data I/O & Descriptive Statistics

ECLT 5810 SAS Programming - Introduction

Introductory Guide to SAS:

The Programmer's Solution to the Import/Export Wizard

SAS 101. Based on Learning SAS by Example: A Programmer s Guide Chapter 21, 22, & 23. By Tasha Chapman, Oregon Health Authority

DSCI 325: Handout 2 Getting Data into SAS Spring 2017

Writing Programs in SAS Data I/O in SAS

Introduction to SAS. Cristina Murray-Krezan Research Assistant Professor of Internal Medicine Biostatistician, CTSC

ST Lab 1 - The basics of SAS

25 Saving Setting Guide Import/Export Nodes and Symbols

22S:166. Checking Values of Numeric Variables

STAT 7000: Experimental Statistics I

From An Introduction to SAS University Edition. Full book available for purchase here.

EXAMPLE 2: INTRODUCTION TO SAS AND SOME NOTES ON HOUSEKEEPING PART II - MATCHING DATA FROM RESPONDENTS AT 2 WAVES INTO WIDE FORMAT

Using an ICPSR set-up file to create a SAS dataset

Exploring the Microsoft Access User Interface and Exploring Navicat and Sequel Pro, and refer to chapter 5 of The Data Journalist.

INTRODUCTION TO SAS HOW SAS WORKS READING RAW DATA INTO SAS

Base and Advance SAS

SAS/ACCESS 9.2. Interface to PC Files Reference. SAS Documentation

INTRODUCTION to. Program in Statistics and Methodology (PRISM) Daniel Blake & Benjamin Jones January 15, 2010

PHPM 672/677 Lab #2: Variables & Conditionals Due date: Submit by 11:59pm Monday 2/5 with Assignment 2

Getting Your Data into SAS The Basics. Math 3210 Dr. Zeng Department of Mathematics California State University, Bakersfield

Econ Stata Tutorial I: Reading, Organizing and Describing Data. Sanjaya DeSilva

An Introduction to SAS University Edition

ECONOMICS 351* -- Stata 10 Tutorial 1. Stata 10 Tutorial 1

Using SAS to Analyze CYP-C Data: Introduction to Procedures. Overview

SAS Training Spring 2006

EXST3201 Mousefeed01 Page 1

Dr. Barbara Morgan Quantitative Methods

Epidemiology Principles of Biostatistics Chapter 3. Introduction to SAS. John Koval

STAT:5400 Computing in Statistics. Other software packages. Microsoft Excel spreadsheet very convenient for entering data in flatfile

Lab #1: Introduction to Basic SAS Operations

SAS and Data Management Kim Magee. Department of Biostatistics College of Public Health

SAS/ACCESS Interface to PC Files for SAS Viya 3.2: Reference

5b. Descriptive Statistics - Part II

Other Data Sources SAS can read data from a variety of sources:

Download Instructions

Introduction to SAS. I. Understanding the basics In this section, we introduce a few basic but very helpful commands.

Exercise 1: Introduction to Stata

Moving Data and Results Between SAS and Excel. Harry Droogendyk Stratia Consulting Inc.

ssh tap sas913 sas

A SAS Macro for Producing Benchmarks for Interpreting School Effect Sizes

SAS/ACCESS Interface to PC Files for SAS Viya : Reference

Laboratory Topics 1 & 2

.txt - Exporting and Importing. Table of Contents

SAS CURRICULUM. BASE SAS Introduction

SAS and Data Management

Chapter 5: Compatibility of Data Files

After opening Stata for the first time: set scheme s1mono, permanently

Introduction to Stata - Session 2

File Importing - Text Files

The SAS interface is shown in the following screen shot:

INTRODUCING. Access. R. Kumar

Assignment 0. Nothing here to hand in

Chapter 1 The DATA Step

Biostatistics 600 SAS Lab Supplement 1 Fall 2012

Introduction to Stata

Revision of Stata basics in STATA 11:

A Short Guide to Stata 10 for Windows

A Step by Step Guide to Learning SAS

ECO375 Tutorial 1 Introduction to Stata

Overview of Data Management Tasks (command file=datamgt.sas)

Chapter 7 File Access. Chapter Table of Contents

Lab 1: Introduction to Data

BUSINESS ANALYTICS. 96 HOURS Practical Learning. DexLab Certified. Training Module. Gurgaon (Head Office)

Level I: Getting comfortable with my data in SAS. Descriptive Statistics

Using Dynamic Data Exchange

How to use Pivot table macro

DSCI 325: Handout 10 Summarizing Numerical and Categorical Data in SAS Spring 2017

Introduction to Stata: An In-class Tutorial

The INPUT Statement: Where It

28 Simply Confirming On-site Status

SPSS Statistics 21.0 GA Fix List. Release notes. Abstract

Basic Medical Statistics Course

Learning SAS by Example

StatLab Workshops 2008

CSV Import Guide. Public FINAL V

Stata: A Brief Introduction Biostatistics

Stata v 12 Illustration. First Session

* Sample SAS program * Data set is from Dean and Voss (1999) Design and Analysis of * Experiments. Problem 3, page 129.

Homework 1 Excel Basics

Chapter 6: Getting Data out of TntMPD

Lab 1. Introduction to R & SAS. R is free, open-source software. Get it here:

Accessing Data and Creating Data Structures. SAS Global Certification Webinar Series

An Introduction to Stata Part I: Data Management

IT 433 Final Exam. June 9, 2014

Depending on the computer you find yourself in front of, here s what you ll need to do to open SPSS.

Introduction to SAS Procedures SAS Basics III. Susan J. Slaughter, Avocet Solutions

17. Reading free-format data. GIORGIO RUSSOLILLO - Cours de prépara)on à la cer)fica)on SAS «Base Programming» 386

Department of Instructional Technology & Media Services Blackboard Grade Book

An Introduction to R- Programming

CLAREMONT MCKENNA COLLEGE. Fletcher Jones Student Peer to Peer Technology Training Program. Basic Statistics using Stata

Introduction to SAS Mike Zdeb ( , #1

Transcription:

P8130 Recitation 1: Reading data in SAS and Descriptive Statistics Zilan Chai Sep. 18 th /20 th 2017

Outline Intro to SAS (windows, basic rules) Getting Data into SAS Descriptive Statistics

SAS Windows Log Explorer Editor

Editor window a text editor, where you enter SAS commands and create SAS programs. Log window - after you submit a SAS program, any notes, errors, or warnings associated with your program as well as the program statements themselves will appear in the Log window. Output window - displays any printable results Explorer window gives easy access to your SAS files and libraries Results table of contents for the Output window (tree lists)

SAS Language Most programs have two major components: Data step(s) reads data, manipulates/combines it and print reports Procedure step(s) perform analysis on the data and generate output Every step ends with a statement Sequential execution of statements Every statement ends with a semicolon ; Statement can continue on the next line Statements can be on the same line as other statements NOT case sensitive! Exception quoted strings Comments: can be inserted in two ways *... ; OR /*...*/

Simple examples * The following data step inputs 6 variables for 4 individuals; data sbp; input pat_name $ pat_id gender $ year1 year2 year3; /*year1-year3*/ cards; John 1002 M 90 120 125 Alice 1003 F 140 148 116 Mike 1004 F 121 130 117 Barbara 1005 M 151 144 148 ; proc import datafile= 'path:\filename' out= dataname;

Rules for SAS names Follow these simple rules when making up names for variables and data set members: Names must be 32 characters or less Names can contain only letters, numbers, or underscores.(no %$!*&#@) Names must start with a letter or an underscore Names can contain upper- and lowercase letters.

Methods of getting your dada into SAS 1. Entering data directly into SAS data sets 2. Creating SAS data sets from raw data files 3. Reading Files with the IMPORT Procedure 4. Reading Files with the Import Wizard 5. Using existed SAS data sets

1. Entering data directly into SAS data sets (Data Step INPUT Statement) * The following data step inputs 6 variables for 4 individuals; data sbp; input pat_name $ pat_id gender $ year1 year2 year3; /*year1-year3*/ cards; John 1002 M 90 120 125 Alice 1003 F 140 148 116 Mike 1004 F 121 130 117 Barbara 1005 M 151 144 148 ; DATA statement names the dataset INPUT statement lists variable names, also specifys the data format CARDS statement lists data RUN statement tells SAS to execute the block of code of this step

Some modifiers for the INPUT statement & @ @n tells SAS to use two whitespaces characters to signal the end of a character variable holds the line to allow further input statements in this iteration of the data step on the same data placed before a variable name, it moves the pointer to column n +n placed before a variable name, it moves the pointer to the right n columns @@ holds the line to allow continued reading from the line on subsequent iterations of the data step (lets SAS know that there are multiple observations per line)

2. Creating SAS data sets from raw data files (Data Step INFILE Statement) Read external files containing data The INFILE statement must proceed the INPUT statement data iron; infile path\iron.dat'; *where to find data; input ID $ serferr ironaa tibcaa trnsfrec trnsfia; *variable names;.dat files are primarily associated with Data and have no specific structure (can be text, graphic, or general binary data)

INFILE Statement - Options Missover Lrecl=num Dlm= chars Dsd firstobs=num Obs=num set values to missing if an input statement would read more than one line treat the input as having a length of numcharacters. Required if input records are longer than 256 characters use the characters in charsinstead of blanks and tabs as separators in list (free-form) input read comma-separated data start reading data on the numline of the file stop reading after the numdata line (e.g., obs=5)

3. Reading Files with the IMPORT Procedure Easier for certain types of data files Scans the data file, automatically determines the variable types, assigns lengths to the character variables, and can recognize some date formats proc import datafile= 'path:\filename' out= dataname;

PROC IMPORT -Options DBMS= Replace Delimiter specify the file extension if there is none: CSV (commadelimited), TAB (tab-delimited), XLS (Excel), ACCESS, DTA (Stata), DLM (other), etc. overwrite an existing data set with the one specified in out= option specify what delimiter is used; default is a blank space (e.g., delimiter= / ) Getnames=NO do not get variable names from the first line of input file. Default is YES. In NO, the variables are named Var1, Var2,

4. IMPORT WIZARD SAS has a nice feature for importing data: 1. Go to File -> Import Data 2. Select the type of file you want to import (e.g.,.csv,.txt) 3. Locate the file 4. Choose destination (default WORK library) and name for the data set (member) Proc Import statements will automatically be generated

EXPORT WIZARD Similarly, you can export data files (e.g., Excel) 1. Go to File -> Export Data 2. Choose Library and Member 3. Select the type of data you wish to export 4. Choose destination folder and filename

5. Reading SAS dataset SET statement If the data already exits as a temporary/permanent SAS data set, we do not need to use INFILE/INPUT statements SET statement reads a temporary/permanent SAS data set into a new one, so that you can modify it (add new variables, subset, etc.) data new-data-set; set old-data-set; * If permanent data set; data new-data-set; set library.old-data-set; Permanent SAS data set and library next week.

Descriptive Statistics Data Description Summarize your data PROC MEANS Visualize data Histograms, bos plots, scatter plots

Data Description PROC CONTENTS The output displays info about the data set, such as number of variables and their types, number of observations, dates of creation, etc. PROC PRINT Used to list the data proc contents data=data-name options; proc print data = data-name;

PROC MEANS Primarily used for reporting sums and means of numeric variables proc means data = data_name;... <statements>; Default statistics: N (number of non-missing obs), Mean, StdDev, Min and Max Other Options: QRANGE MODE RANGE SUM MAXDEC = n number of decimal places to be displayed MISSING treats missing values as valid summary groups NMISS number of missing values MEDIAN, Q1 (25thpercentile), Q3 (75thpercentile)

PROC MEANS: VAR statement Without VAR statement, it will show the summary statistics for all numeric variables You can specify which variables to include in the report using VAR statement proc means data = data_name; var var1 var2 va3;

PROC UNIVARIATE Also produces simple summary statistics (mean, standard deviation) Generates histograms, boxplots, normal probability plots (QQ plots) Conducts tests for normality proc univariate data = data_name; <statements>;

PROC UNIVARIATE - Histogram Histogram and normality check proc univariate data=prob2; histogram time / normal(percents=20 40 60 80); inset n normal(ksdpval) / pos = ne; Percents option specifies the quantiles of the normal distribution normal(ksdpval) generates the Kolmogorov-Smirnov test of normality; Pos indicates the location (NE) of where to put this test result

PROC UNIVARIATE - Plots The PLOTS option in the PROC UNIVARIATE statement requests several basic summary plots. * PROC UNIVARIATE - boxplot and qqplot; proc univariate data=prob2 plot; var time;