Exam 1: SAS Big Data Preparation, Statistics, and Visual Exploration

Data Management - 50%

Navigate within the Data Management Studio Interface
Register a new QKB
Create and connect to a repository
Define a data connection
Specify Data Management Studio options
Access the QKB
Create a name value macro pair
Access the business rules manager
Access the appropriate monitoring report
Attach and detach primary tabs
Create, design and be able to explore data explorations and interpret results
Define and create data collections from exploration results
Create and explore a data profile
Create a data profile from different sources (text file, filtered table, SQL query)
Interpret results (frequency distribution & pattern)
Use collections from profile results
Design data standardization schemes
Build a scheme from profile results
Build a scheme manually
Update existing schemes
Create Data Jobs
Rename output fields
Add nodes and preview nodes
Run a data job
View a log and settings
Work with data job settings and data job displays

Best practices (how do you ensure that you are following a particular best practice): examples: insert notes, establish naming conventions
Work with branching
Join tables
Apply the Field layout node to control field order
Work with the Data Validation node:
  o Add it to the job flow
  o Specify properties/review properties
  o Edit settings for the Data Validation node
Work with data inputs
Work with data outputs
Profile data from within data jobs
Interact with the Repository from within Data Jobs
Determine how data is processed
Data job variables
Set Sorting properties for the Data Sorting node
  o Set appropriate advanced properties options for the Data Sorting Node
Apply a Standardization definition and scheme
Use a definition
Use a scheme
Be able to determine the differences between definition and scheme
Explain what happens when you use both a definition and scheme
Review and interpret standardization results
Be able to explain the different steps involved in the process of standardization
Apply Parsing definitions
Distinguish between different data types and their tokens
Review and interpret parsing results
Be able to explain the different steps involved in the process of parsing
Use parsing definition
Compare and contrast the differences between identification analysis and right fielding nodes
Review results
Explain the technique used for identification (process of the definition)
Apply the Gender Analysis node to determine gender
Use gender definition
Interpret results
Explain different techniques for accomplishing gender analysis

Create an Entity Resolution Job
Use a node in the data job that is the clustering node and explain why you would want to use it
Survivorship (surviving record identification)
  o Record rules
  o Field rules
  o Options for survivorship
Discuss and apply the Cluster Diff node
Apply Cross-field matching (new option)
Use the Match Codes Node to select match definitions for selected fields
  o Outline the various uses for match codes (join)
  o Use the definition
  o Interpret the results
  o Match versus match parsed
  o Explain the process for creating a match code
  o Select sensitivity for a selected match definition
  o Apply matching best practices
Define and create business rules
Use Business Rules Manager
Create a new business rule
  o Name/label rule
  o Specify type of rule
  o Define checks
  o Specify fields
Distinguish between different types of business rules
  o Row
  o Set
  o Group
Apply business rules
  o Profile
  o Execute business rule node
Use of Expression Builder
Apply best practices
Describe the organization, structure and basic navigation of the QKB
Identify and describe locale levels (global, language, country)
Navigate the QKB (tab structure, copy definitions, etc.)
Identify data types and tokens

Be able to articulate when to use the various components of the QKB
Components include:
  o Regular expressions
  o Schemes
  o Phonetics library
  o Vocabularies
  o Grammar
  o Chop Tables
Define the processing steps and components used in the different definition types
Identify/describe the different definition types
  o Parsing
  o Standardization
  o Match
  o Identification
  o Casing
  o Extraction
  o Locale guess
  o Gender
  o Patterns

ANOVA and Regression - 30%

Verify the assumptions of ANOVA
Explain the central limit theorem and when it must be applied
Examine the distribution of continuous variables (histogram, box-whisker, Q-Q plots)
Describe the effect of skewness on the normal distribution
Define H0, H1, Type I/II error, statistical power, p-value
Describe the effect of sample size on p-value and power
Interpret the results of hypothesis testing
Interpret histograms and normal probability charts
Draw conclusions about your data from histogram, box-whisker, and Q-Q plots
Identify the kinds of problems that may be present in the data: (biased sample, outliers, extreme values)
For a given experiment, verify that the observations are independent
For a given experiment, verify the errors are normally distributed
Use the UNIVARIATE procedure to examine residuals
For a given experiment, verify all groups have equal response variance
Use the HOVTEST option of the MEANS statement in PROC GLM to assess response variance
Analyze differences between population means using the GLM and TTEST procedures
Use the GLM Procedure to perform ANOVA
  o CLASS statement
  o MODEL statement
  o MEANS statement
  o OUTPUT statement
Evaluate the null hypothesis using the output of the GLM procedure
Interpret the statistical output of the GLM procedure (variance derived from MSE, F value, p-value, R-squared, Levene's test)
Interpret the graphical output of the GLM procedure
Use the TTEST Procedure to compare means
Perform ANOVA post hoc test to evaluate treatment effect
Use the LSMEANS statement in the GLM or PLM procedure to perform pairwise comparisons
Use the PDIFF option of the LSMEANS statement
Use the ADJUST option of the LSMEANS statement (TUKEY and DUNNETT)
Interpret diffograms to evaluate pairwise comparisons
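As a study aid for the objectives on evaluating the null hypothesis from the F value and the variance derived from MSE, the following is a minimal Python sketch (not SAS syntax) of how a one-way ANOVA F statistic is assembled from group data. The function name `one_way_anova_f` and the sample data are hypothetical and chosen only for illustration.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)

    # Between-group sum of squares: variation of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group (error) sum of squares: variation of observations around their group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)

    mse = ss_within / df_within  # mean squared error, the pooled variance estimate
    f_value = (ss_between / df_between) / mse
    return f_value, df_between, df_within


# Made-up data: three treatment groups with three observations each.
f_value, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f_value, df_b, df_w)
```

A large F relative to the F distribution with (df_between, df_within) degrees of freedom is evidence against H0 that all group means are equal.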

Interpret control plots to evaluate pairwise comparisons
Compare/Contrast use of pairwise T-Tests, Tukey and Dunnett comparison methods PLM
Detect and analyze interactions between factors
Use the GLM procedure to produce reports that will help determine the significance of the interaction between factors.
  o MODEL statement
  o LSMEANS with SLICE= option (also using PROC PLM)
  o ODS SELECT
Interpret the output of the GLM procedure to identify interaction between factors:
  o p-value
  o F Value
  o R Squared
  o TYPE I SS
  o TYPE III SS
Fit a multiple linear regression model using the REG and GLM procedures
Use the REG procedure to fit a multiple linear regression model
Use the GLM procedure to fit a multiple linear regression model
Analyze the output of the REG, PLM, and GLM procedures for multiple linear regression models
Interpret REG or GLM procedure output for a multiple linear regression model:
Convert models to algebraic expressions
Identify missing degrees of freedom
Identify variance due to model/error, and total variance
Calculate a missing F value
Identify variable with largest impact to model
For output from two models, identify which model is better
Identify how much of the variation in the dependent variable is explained by the model
Conclusions that can be drawn from REG, GLM, or PLM output: (about H0, model quality, graphics)
Use the REG or GLMSELECT procedure to perform model selection
Use the SELECTION option of the MODEL statement in the GLMSELECT procedure
Compare the different model selection methods (STEPWISE, FORWARD, BACKWARD)
Enable ODS graphics to display graphs from the REG or GLMSELECT procedure
Identify best models by examining the graphical output (fit criterion from the REG or GLMSELECT procedure)
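The objectives on identifying missing degrees of freedom, calculating a missing F value, and stating how much variation the model explains all rest on the same arithmetic of the analysis-of-variance table. The Python sketch below (not SAS syntax) shows that arithmetic; the function name `complete_anova_table` and the input numbers are hypothetical.

```python
def complete_anova_table(ss_model, ss_error, df_model, df_total):
    """Fill in the derived entries of a regression ANOVA table."""
    df_error = df_total - df_model      # missing degrees of freedom
    ss_total = ss_model + ss_error      # total variation in the response
    ms_model = ss_model / df_model      # mean square for the model
    ms_error = ss_error / df_error      # mean square error (MSE)
    f_value = ms_model / ms_error       # F = MS(model) / MS(error)
    r_square = ss_model / ss_total      # share of variation explained
    return {
        "df_error": df_error,
        "ss_total": ss_total,
        "ms_model": ms_model,
        "ms_error": ms_error,
        "f_value": f_value,
        "r_square": r_square,
    }


# Made-up table: 3 predictors, 24 observations (so total DF = 23).
table = complete_anova_table(ss_model=600.0, ss_error=400.0, df_model=3, df_total=23)
print(table)
```

With these made-up inputs the error DF is 20, the F value is 10, and the model explains 60% of the variation in the dependent variable.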

Assign names to models in the REG procedure (multiple MODEL statements)
Assess the validity of a given regression model through the use of diagnostic and residual analysis
Explain the assumptions for linear regression
From a set of residual plots, assess which assumption about the error terms has been violated
Use REG procedure MODEL statement options to identify influential observations (Student Residuals, Cook's D, DFFITS, DFBETAS)
Explain options for handling influential observations
Identify collinearity problems by examining REG procedure output
Use MODEL statement options to diagnose collinearity problems (VIF, COLLIN, COLLINOINT)
Perform logistic regression with the LOGISTIC procedure
Identify experiments that require analysis via logistic regression
Identify logistic regression assumptions
Logistic regression concepts (log odds, logit transformation, sigmoidal relationship between p and X)
Use the LOGISTIC procedure to fit a binary logistic regression model (MODEL and CLASS statements)
Optimize model performance through input selection
Use the LOGISTIC procedure to fit a multiple logistic regression model
LOGISTIC procedure SELECTION=SCORE option
Perform Model Selection (STEPWISE, FORWARD, BACKWARD) within the LOGISTIC procedure
Interpret the output of the LOGISTIC procedure
Interpret the output from the LOGISTIC procedure for binary logistic regression models:
  o Model Convergence section
  o Testing Global Null Hypothesis table
  o Type 3 Analysis of Effects table
  o Analysis of Maximum Likelihood Estimates table
  o Association of Predicted Probabilities and Observed Responses
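For the logistic regression concepts named above (log odds, logit transformation, sigmoidal relationship between p and X), a minimal Python sketch (not SAS syntax) may help fix the definitions. The helper names `logit`, `inverse_logit`, and `odds_ratio` are hypothetical and used only for illustration.

```python
import math


def logit(p):
    """Log odds of a probability p, for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def inverse_logit(x):
    """Map a log-odds value back to a probability (the sigmoid curve)."""
    return 1.0 / (1.0 + math.exp(-x))


def odds_ratio(beta):
    """Odds ratio implied by a logistic regression coefficient beta."""
    return math.exp(beta)


# A probability of 0.5 corresponds to odds of 1 and log odds of 0.
print(logit(0.5))
# The inverse logit undoes the logit transformation.
print(inverse_logit(logit(0.8)))
# A coefficient of 0 means the predictor does not change the odds.
print(odds_ratio(0.0))
```

The model is linear in the log odds, which is why the relationship between p and X is S-shaped rather than a straight line.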

Visual Data Exploration - 20%

Examine, modify, and create data items
Create and use parameterized data items
Examine data item properties and measure details
Change data item properties
Create custom sorts
Create distinct counts
Create aggregated measures
Create calculated items
Create hierarchies
Create custom categories
Select and work with data sources
Work with multiple data sources
Change data sources
Refresh data sources
Create, modify, and interpret automatic chart visualizations in Visual Analytics Explorer
Identify default visualizations
Identify the properties available in an automatic chart
Create, modify, and interpret graph and table visualizations in Visual Analytics Explorer
Work with list table visualizations
Work with crosstab visualizations
Work with bar chart visualizations
Work with line chart visualizations
Work with scatter plot visualizations
Work with bubble plot visualizations
Work with histogram visualizations
Work with box plot visualizations
Work with heat map visualizations
Work with geo map visualizations
Work with treemap visualizations
Work with correlation matrix visualizations

Enhance visualizations with analytics within Visual Analytics Explorer
Add fit lines to visualizations
Create forecasts
Interpret word clouds
Interact with visualizations and explorations within Visual Analytics Explorer
Control appearance of visualizations within explorations
Add comments to visualizations and explorations
Use filters on data source and visualizations
Share explorations
Share visualizations

Note: All 32 main objectives will be tested on every exam. The 210 expanded objectives are provided for additional explanation and define the entire domain that could be tested.