Data Visualisation with SAS/INSIGHT Software
Gerhard Held, SAS Institute


Summary

Recently, interactive data analysis packages have become popular. These packages attempt to visualise the structure in multivariate data and offer a more intuitive approach to data analysis. The software is usually offered as a stand-alone tool for data visualisation (or graphical data analysis). However, it is desirable to extend the data visualisation approach beyond simple but efficient techniques for describing data, such as identifying outliers or brushing points in a plot. In this paper we demonstrate how data visualisation techniques implemented in SAS/INSIGHT software can be applied to analyse linear relationships in data, including general linear models. We also discuss how intermediate results can be saved for further processing.

Introduction

Since the early 1970s we have seen a revival of the exploratory data analysis tradition in statistics (Tukey, 1970). The first attempts to implement this approach in computer programs were made in the early 1980s (Velleman, Hoaglin, 1981). One goal of exploratory data analysis implementations on a computer was to enable individual users to see structure in multivariate data. This gave rise to a new class of interactive data analysis packages such as MACSPIN (Donoho, Donoho, Gasko 1986) or Data Desk, pioneered by Paul Velleman, on "personal" platforms such as the Apple Macintosh. A major attraction of these packages was that they allowed the application of dynamic graphical methods to data, defined as the "direct manipulation of elements of a graph on a computer screen" (Cleveland, McGill 1988). As the aim of dynamic graphical methods is to visualise the structure of multivariate data, "data visualisation" is often used as a synonym.
One drawback of these early implementations was that they focused on the data visualisation aspect without integrating this functionality into a wider spectrum of data analysis techniques. As early as 1980 John Tukey argued that "we need both the exploratory and confirmatory (analysis)" (Tukey, 1980). Since then, more general data analysis packages, such as S-PLUS or the SAS System, have come to include data visualisation tools as a speciality.

In this paper we will: introduce SAS/INSIGHT software as an example of the data visualisation packages available today; discuss the relative merits of data visualisation as opposed to traditional data analysis methods; and point out some new functionality within SAS/INSIGHT software which enables tighter integration of data visualisation and traditional methods.

Graphical Exploration

SAS/INSIGHT software, an integrated component of the SAS System, is a dynamic tool for data exploration and analysis. With it you can explore data through interactive histograms, box plots, scatter plots, and 3D rotating plots. You can examine correlations and principal components to find the structure of your data. Finally, you can construct predictive models based on relationships in the data. All interactive graphs and analyses are linked across multiple windows. Any change in one window is immediately reflected in all windows related to the same data.

SAS/INSIGHT software is implemented on a large variety of hardware platforms: many popular UNIX workstations, Digital Equipment minicomputers and DECstations running VMS, IBM mainframes (running MVS, CMS, and VSE), and, with Release 6.08 of the SAS System, also workstations running Windows 3.1 and OS/2. A powerful workstation is ideal for graphical data visualisation. UNIX workstations, or 486-based PCs running OS/2 or Windows, are recommended. As SAS/INSIGHT software allows many views of the same data set, large monitors (15" or larger) are also recommended.

The discussions in this paper are based on a data set containing statistics about the 407 largest commercial companies (in terms of sales) in 1991. Variables include the Company Name, Nationality, Industry Type, Number of Employees, and Sales, Profits, Assets and Equity in millions of U.S. dollars.
As SAS/INSIGHT software is an integrated component, it can be invoked from within the SAS System by simply typing its name (INSIGHT) or by using pull-down menus. SAS/INSIGHT then prompts the user for the data set. We select the business data set (COMP91). The user then sees a data window, in which the data set is presented as a table (rows are companies and columns are variables, see Figure 1). In a data window you can sort, edit, and extract subsets of your data. You can also assign measurement levels and default roles that determine how your variables are used in graphs and analyses. As we would like to identify data points as companies, we assign the variable COMPANY to be a LABEL using the DATA menu. Click on COMPANY, select DATA, then PROPERTIES and finally LABEL (sequence: DATA : PROPERTIES : LABEL). Previous users of SAS/INSIGHT software will notice at this point that the pull-down menus have been regrouped in a more logical structure. All analysis and graphical functions have been combined in the ANALYSIS menu; a new DATA menu covers all data manipulation functions.

Figure 1: Data table of the COMP91 data set

We would like to explore measures of economic success for the selected companies (PROFITS, SALES) as well as find factors which determine economic success. Release 6.08 of SAS/INSIGHT software offers box plots as an additional way to explore distributions graphically. Selecting BOX PLOT, PROFITS as Y (graph variable), and INDUSTRY as group variable creates the side-by-side box plot shown in Figure 2. Note that group processing (available with all analyses and graphics) has also been added with this release. In Figure 2 we have already clicked on a few data points (companies) with extreme values for PROFITS. Notice that IBM is one of the least profitable companies in 1991 (a loss of 2.827 billion dollars), whereas in 1990 it was still one of the most profitable companies (a profit of 6.020 billion dollars)!
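The numbers behind a side-by-side box plot such as Figure 2 can be sketched outside SAS/INSIGHT as a grouped five-number summary. The following Python sketch is only an illustration: the function name `box_plot_summary`, the quartile rule (linear interpolation), and the records are assumptions, and the values are made up rather than taken from COMP91.

```python
from collections import defaultdict
from statistics import median, quantiles


def box_plot_summary(rows, group_key, value_key):
    """Return {group: (minimum, q1, median, q3, maximum)} for each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])

    summary = {}
    for name, values in groups.items():
        # Quartiles by linear interpolation; other quartile rules exist,
        # and each group needs at least two values for this call.
        q1, _, q3 = quantiles(values, n=4, method="inclusive")
        summary[name] = (min(values), q1, median(values), q3, max(values))
    return summary


# Made-up records for illustration only, not values from COMP91.
companies = [
    {"INDUSTRY": "Pharmaceuticals", "PROFITS": 410.0},
    {"INDUSTRY": "Pharmaceuticals", "PROFITS": 520.0},
    {"INDUSTRY": "Pharmaceuticals", "PROFITS": 630.0},
    {"INDUSTRY": "Computers", "PROFITS": -900.0},
    {"INDUSTRY": "Computers", "PROFITS": 150.0},
    {"INDUSTRY": "Computers", "PROFITS": 700.0},
]

for industry, stats in sorted(
    box_plot_summary(companies, "INDUSTRY", "PROFITS").items()
):
    print(industry, stats)
```

Each tuple corresponds to one box: the whisker ends in this simplified version are the minimum and maximum, whereas an interactive box plot would typically draw extreme points separately.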

Figure 2: Box plot of profits for industry groups

The labeled extreme values somewhat distort other relationships, so we click or drag with the mouse over the extreme values, thus creating a rectangular brush, and select EDIT : OBSERVATIONS (RECORDS) : HIDE IN GRAPHS. This removes the extreme values from the graph (not from any calculations!) and rescales the graph (Figure 3). In addition we click on INDUSTRY and activate the new MARKERS window (EDIT : WINDOWS : MARKERS). We could assign an individual marker to each INDUSTRY or click on the "multiple markers" button at the bottom of the MARKERS window. This automatically assigns a different marker to each value of INDUSTRY. In the same way, colours can also be assigned individually or automatically (a new feature in Release 6.08 of SAS/INSIGHT software). The markers are now an "observation state", i.e. companies retain their marker for any subsequent graphs unless it is changed. Figure 3 now shows more clearly that some industries are consistently profitable (e.g. the Pharmaceutical industry), others show a large internal variation (e.g. the Computer and Oil Refining industries), and still others feature quite a number of outliers (e.g. the Automobiles, Food, and Electronics industries).
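The idea of an "observation state" can be illustrated with a small Python sketch. It is not how SAS/INSIGHT is implemented; the class `Observation`, the marker names, and the helper functions are hypothetical, and the records are made up. The point it shows is that hiding an observation only changes what is plotted, while calculations still see every record.

```python
from dataclasses import dataclass

# Hypothetical marker names; any set of distinct symbols would do.
MARKERS = ["square", "plus", "circle", "diamond", "x", "triangle"]


@dataclass
class Observation:
    label: str
    industry: str
    profits: float
    show_in_graphs: bool = True  # observation state: drawn in plots?
    marker: str = "square"       # observation state: plotting symbol


def hide_in_graphs(observations, low, high):
    """Hide observations with profits outside [low, high] from plots only."""
    for obs in observations:
        if not (low <= obs.profits <= high):
            obs.show_in_graphs = False


def assign_markers_by_industry(observations):
    """Give each distinct industry its own marker, cycling if they run out."""
    industries = sorted({obs.industry for obs in observations})
    lookup = {name: MARKERS[i % len(MARKERS)] for i, name in enumerate(industries)}
    for obs in observations:
        obs.marker = lookup[obs.industry]


def plotted(observations):
    """Return only the observations that a graph would draw."""
    return [obs for obs in observations if obs.show_in_graphs]


# Made-up records for illustration only.
data = [
    Observation("Company A", "Oil Refining", 320.0),
    Observation("Company B", "Oil Refining", 450.0),
    Observation("Company C", "Computers", -2800.0),
    Observation("Company D", "Computers", 210.0),
]
hide_in_graphs(data, low=-1000.0, high=1000.0)
assign_markers_by_industry(data)
print([obs.label for obs in plotted(data)])
print(sum(obs.profits for obs in data) / len(data))  # mean still uses all records
```

Because the flags live on the observation rather than on a particular plot, every later graph built from `data` would pick up the same hidden points and markers.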

Figure 3: Box plot of profits for industry groups (outliers removed)

Figure 4: Scatter plot matrix of EMPLOYS, SALES, and ASSETS

To explore the other measure of economic success, SALES, we take a different approach. We suspect that the number of employees (EMPLOYS) and ASSETS may correlate with SALES. Therefore we click on all three variables in the data table and then select ANALYSE : SCATTER PLOT (X Y), which creates Figure 4. Obviously Figure 4 is again distorted by some very big companies (e.g. Toyota Motor high in SALES, General Electric high in ASSETS and EMPLOYS), and the variation also seems to increase with increasing values of SALES, ASSETS and EMPLOYS. As this is an overall pattern, we may decide on a data transformation rather than hiding extreme values again. To do that we simply click on each of the variables in the graph and select EDIT : VARIABLES : LOG(X). This calculates the logarithm of each variable and adjusts the scatter plot matrix accordingly. The new transformed variables are named L_SALES, L_ASSETS, and L_EMPLOY.

Figure 5: Scatter plot of L_EMPLOY by L_SALES

For Figure 5 we have selected the "MAGNIFYING GLASS" from the TOOLS window (EDIT : WINDOWS : TOOLS) and dragged over the L_SALES and L_EMPLOY portion of the scatter plot matrix to focus in on this part. It shows a clear linear relationship between L_SALES and L_EMPLOY and also reveals that oil companies (upward pointing triangles in Figure 5) consistently generate larger L_SALES than could have been expected based on their L_EMPLOY value.

If needed we could also explore the data in three dimensions using a rotating plot. One function of the rotating plot is to show multivariate outliers. As it would not be adequate to describe the insights gained from rotations by text alone, we will not try to illustrate this here.
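The effect of the LOG(X) transformation can be sketched in Python. This is only an illustration under stated assumptions: the natural logarithm is used (the text does not say which base LOG(X) applies), the rule of truncating the new name to eight characters is an assumption chosen because it reproduces the names quoted above (EMPLOYS becomes L_EMPLOY), and the records are made up.

```python
import math


def add_log_variable(rows, name):
    """Add the natural log of row[name] to every row and return the new name."""
    # Assumed naming rule: prefix "L_" and truncate to 8 characters,
    # which yields L_SALES, L_ASSETS and L_EMPLOY as quoted in the text.
    new_name = ("L_" + name)[:8]
    for row in rows:
        row[new_name] = math.log(row[name])
    return new_name


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up records for illustration only (not COMP91 values).
companies = [
    {"EMPLOYS": 2_000, "SALES": 900.0},
    {"EMPLOYS": 15_000, "SALES": 4_100.0},
    {"EMPLOYS": 60_000, "SALES": 9_500.0},
    {"EMPLOYS": 250_000, "SALES": 52_000.0},
    {"EMPLOYS": 700_000, "SALES": 98_000.0},
]

l_sales = add_log_variable(companies, "SALES")
l_employ = add_log_variable(companies, "EMPLOYS")
raw = pearson_r([c["EMPLOYS"] for c in companies], [c["SALES"] for c in companies])
logged = pearson_r([c[l_employ] for c in companies], [c[l_sales] for c in companies])
print(l_sales, l_employ, raw, logged)
```

On the log scale a few very large companies no longer dominate the picture, which is the reason given above for preferring a transformation to hiding extreme values.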

Previous results suggest that we might have a separate look at the influence of INDUSTRY type on this relationship. A simple graphical way to do this would be to generate a series of scatter plots of L_SALES by L_EMPLOY grouped by INDUSTRY. Figure 6 shows two of the industries as an example, Oil Refining and Pharmaceuticals. It is obvious that the slope of a regression line of L_SALES on L_EMPLOY would be very similar for both industries but the intercept for Pharmaceuticals would be negative, meaning that the Pharmaceutical industry would require many more employees to become profitable.

Figure 6: Scatter plot of L_EMPLOY by L_SALES for oil and pharmaceutical industries

Model Formulation

We now have enough evidence to formulate a model of sales as a measure of economic success. SAS/INSIGHT supports traditional parametric regression analysis, but also the general linear model as implemented in the GLM procedure of SAS/STAT software. SAS/INSIGHT now also offers an implementation of the GENERALISED linear model, supporting response distributions from the exponential family (normal, inverse Gaussian, gamma, Poisson, binomial) and corresponding canonical link functions (identity, logit, probit, complementary log-log, and power link function). Both the general and the generalized linear model have been introduced in Release 6.08 of SAS/INSIGHT software. In addition, SAS/INSIGHT covers residual plots (residual by predicted, residual normal QQ plot and partial leverage plots) as well as parametric and nonparametric fit curves (splines, kernel estimation).

As generalized linear models are not adequate for our data, we will confine our test to a linear model. The model we would like to test has L_SALES as the response variable and L_EMPLOY and INDUSTRY as factors. We also include the interaction of both factors in the model. Figure 7 shows part of the results. The R-Square indicates that 51.4% of the variation in L_SALES can be explained by the variables in the model. All variables in the model, including the interaction, are significant (Prob > F associated with the F test).

Figure 7: General Linear Model for L_SALES

Often it is required to save results for further processing. This can be easily done if the aim is to integrate graphics or reports in a text document, as for instance for this paper. A typical problem area, however, is saving statistical tables in a computerised form in order to apply them to new data or to reformat the output. For this purpose SAS/INSIGHT software supports the Output Delivery System (ODS) of the SAS System. This is another new feature in Release 6.08 of the SAS System. Procedures using the ODS produce their results in the form of output objects, data structures in machine precision that persist in memory. With output objects you can create data sets, produce listings, reorganize output, and create and save custom report formats.

The ODS is easily activated. For example, if you mark with the mouse the Summary of Fit, Analysis of Variance and Type III Tests tables of Figure 7 and then select FILE : SAVE : TABLES, the system responds with a Note window indicating that the tables were saved as output objects. Using the OUTPUT procedure of the SAS System, the saved tables can then be accessed and manipulated (SAS Institute, 1992, see Figure 8).

Figure 8: SAS/INSIGHT tables as output objects

Conclusion

We had three goals with this article. Concerning the first goal, it could be shown that SAS/INSIGHT software adequately covers dynamic graphical data analysis. All interactive graphs and analyses are linked across multiple windows, and changes in one window are immediately reflected in all windows related to the same data. SAS/INSIGHT offers typical tools for data visualisation, such as identification and labeling of points, use of colours and markers, brushing, scatter plot matrices, and 3D rotating plots.

The latest implementation of SAS/INSIGHT software includes additional functionality beyond the standards of data visualisation, such as: interactive box plots, group processing for graphical and statistical analyses, integration of general and generalized linear models, and saving of any results (data, graphics, or output) in a form ready for immediate further processing.

Dynamic graphical methods greatly facilitate exploratory data analysis, using state-of-the-art computer technology. This technology helps the analyst to concentrate on finding structures in multivariate data rather than deal with the software mechanics of how to code an analysis. Graphical data analysis is a great time saver. All steps discussed in this paper took roughly 15 minutes to accomplish. It is up to the reader to determine how long this would have taken using traditional methods and tools.

A word of caution: data visualisation gives new insights into data but needs traditional hypothesis testing methods as a complement. Therefore, users of these systems are advised to look for integrated software covering both approaches to data analysis.
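As a complement to the interactive steps, the linear model from the Model Formulation section (L_SALES explained by L_EMPLOY, INDUSTRY, and their interaction) can be sketched in plain Python. A model with a covariate, a factor, and their interaction gives the same fitted values as a separate straight line for each factor level, so the sketch fits one least-squares line per industry and reports the overall R-Square. The function names and the records are hypothetical, and the sketch does not reproduce the F tests shown in Figure 7.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_interaction_model(rows, response, covariate, factor):
    """Fit one line per factor level; return ({level: (intercept, slope)}, R-Square)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[factor]].append((row[covariate], row[response]))

    coefficients = {}
    for level, points in groups.items():
        xs, ys = zip(*points)
        coefficients[level] = fit_line(xs, ys)

    # R-Square = 1 - (residual sum of squares / total sum of squares).
    all_y = [row[response] for row in rows]
    grand_mean = sum(all_y) / len(all_y)
    sst = sum((y - grand_mean) ** 2 for y in all_y)
    sse = 0.0
    for row in rows:
        intercept, slope = coefficients[row[factor]]
        sse += (row[response] - (intercept + slope * row[covariate])) ** 2
    return coefficients, 1.0 - sse / sst


# Made-up log-scale records for illustration only.
records = [
    {"INDUSTRY": "Oil Refining", "L_EMPLOY": 9.0, "L_SALES": 9.4},
    {"INDUSTRY": "Oil Refining", "L_EMPLOY": 10.0, "L_SALES": 10.1},
    {"INDUSTRY": "Oil Refining", "L_EMPLOY": 11.0, "L_SALES": 11.2},
    {"INDUSTRY": "Pharmaceuticals", "L_EMPLOY": 9.5, "L_SALES": 7.9},
    {"INDUSTRY": "Pharmaceuticals", "L_EMPLOY": 10.5, "L_SALES": 9.1},
    {"INDUSTRY": "Pharmaceuticals", "L_EMPLOY": 11.5, "L_SALES": 9.8},
]
coefficients, r_square = fit_interaction_model(
    records, "L_SALES", "L_EMPLOY", "INDUSTRY"
)
print(coefficients, r_square)
```

Comparing the per-industry intercepts and slopes in `coefficients` is the numerical counterpart of comparing the two panels of Figure 6.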

References

Cleveland, W.S. & McGill, M.E. (Eds.) (1988). Dynamic Graphics for Statistics. Belmont, CA: Wadsworth Inc.
Donoho, A.W., Donoho, D.L. & Gasko, M. (1986). MACSPIN Graphical Data Analysis Software. Austin, Texas: D2 Software.
SAS Institute Inc. (1993). SAS/INSIGHT User's Guide, Version 6, Second Edition. Cary, NC: SAS Institute Inc.
Tukey, J.W. (1970). Exploratory Data Analysis. Vol. I. Reading, MA: Addison-Wesley.
Tukey, J.W. (1980). We need both exploratory and confirmatory. The American Statistician, 34.
Velleman, P.F. & Hoaglin, D.C. (1981). Applications, Basics, and Computing of Exploratory Data Analysis. Boston: Duxbury Press.


More information

PHARMACOKINETIC STATISTICAL ANALYSIS SYSTEM - - A SAS/AF AND SAS/FSP APPLICATION

PHARMACOKINETIC STATISTICAL ANALYSIS SYSTEM - - A SAS/AF AND SAS/FSP APPLICATION PHARMACOKINETIC STATISTICAL ANALYSIS SYSTEM - - A SAS/AF AND SAS/FSP APPLICATION Sharon M. Passe, Hoffmann-La Roche Inc. Andrea L Contino, Hoffmann-La Roche Inc. ABSTRACT The statistician responsible for

More information

Points Lines Connected points X-Y Scatter. X-Y Matrix Star Plot Histogram Box Plot. Bar Group Bar Stacked H-Bar Grouped H-Bar Stacked

Points Lines Connected points X-Y Scatter. X-Y Matrix Star Plot Histogram Box Plot. Bar Group Bar Stacked H-Bar Grouped H-Bar Stacked Plotting Menu: QCExpert Plotting Module graphs offers various tools for visualization of uni- and multivariate data. Settings and options in different types of graphs allow for modifications and customizations

More information

Effects of PROC EXPAND Data Interpolation on Time Series Modeling When the Data are Volatile or Complex

Effects of PROC EXPAND Data Interpolation on Time Series Modeling When the Data are Volatile or Complex Effects of PROC EXPAND Data Interpolation on Time Series Modeling When the Data are Volatile or Complex Keiko I. Powers, Ph.D., J. D. Power and Associates, Westlake Village, CA ABSTRACT Discrete time series

More information

Applied Regression Modeling: A Business Approach

Applied Regression Modeling: A Business Approach i Applied Regression Modeling: A Business Approach Computer software help: SAS code SAS (originally Statistical Analysis Software) is a commercial statistical software package based on a powerful programming

More information

Release notes for StatCrunch mid-march 2015 update

Release notes for StatCrunch mid-march 2015 update Release notes for StatCrunch mid-march 2015 update A major StatCrunch update was made on March 18, 2015. This document describes the content of the update including major additions to StatCrunch that were

More information

Introduction to Mplus

Introduction to Mplus Introduction to Mplus May 12, 2010 SPONSORED BY: Research Data Centre Population and Life Course Studies PLCS Interdisciplinary Development Initiative Piotr Wilk piotr.wilk@schulich.uwo.ca OVERVIEW Mplus

More information

Graphical Analysis of Data using Microsoft Excel [2016 Version]

Graphical Analysis of Data using Microsoft Excel [2016 Version] Graphical Analysis of Data using Microsoft Excel [2016 Version] Introduction In several upcoming labs, a primary goal will be to determine the mathematical relationship between two variable physical parameters.

More information

CS 229: Machine Learning Final Report Identifying Driving Behavior from Data

CS 229: Machine Learning Final Report Identifying Driving Behavior from Data CS 9: Machine Learning Final Report Identifying Driving Behavior from Data Robert F. Karol Project Suggester: Danny Goodman from MetroMile December 3th 3 Problem Description For my project, I am looking

More information

1 Introducing SAS and SAS/ASSIST Software

1 Introducing SAS and SAS/ASSIST Software 1 CHAPTER 1 Introducing SAS and SAS/ASSIST Software What Is SAS? 1 Data Access 2 Data Management 2 Data Analysis 2 Data Presentation 2 SAS/ASSIST Software 2 The SAS/ASSIST WorkPlace Environment 3 Buttons

More information

YEAR 12 Trial Exam Paper FURTHER MATHEMATICS. Written examination 1. Worked solutions

YEAR 12 Trial Exam Paper FURTHER MATHEMATICS. Written examination 1. Worked solutions YEAR 12 Trial Exam Paper 2016 FURTHER MATHEMATICS Written examination 1 s This book presents: worked solutions, giving you a series of points to show you how to work through the questions mark allocations

More information

How to use FSBForecast Excel add-in for regression analysis (July 2012 version)

How to use FSBForecast Excel add-in for regression analysis (July 2012 version) How to use FSBForecast Excel add-in for regression analysis (July 2012 version) FSBForecast is an Excel add-in for data analysis and regression that was developed at the Fuqua School of Business over the

More information

Data Visualization Techniques

Data Visualization Techniques Data Visualization Techniques From Basics to Big Data with SAS Visual Analytics WHITE PAPER SAS White Paper Table of Contents Introduction.... 1 Generating the Best Visualizations for Your Data... 2 The

More information

JMP Book Descriptions

JMP Book Descriptions JMP Book Descriptions The collection of JMP documentation is available in the JMP Help > Books menu. This document describes each title to help you decide which book to explore. Each book title is linked

More information

THE SWALLOW-TAIL PLOT: A SIMPLE GRAPH FOR VISUALIZING BIVARIATE DATA.

THE SWALLOW-TAIL PLOT: A SIMPLE GRAPH FOR VISUALIZING BIVARIATE DATA. STATISTICA, anno LXXIV, n. 2, 2014 THE SWALLOW-TAIL PLOT: A SIMPLE GRAPH FOR VISUALIZING BIVARIATE DATA. Maria Adele Milioli Dipartimento di Economia, Università di Parma, Parma, Italia Sergio Zani Dipartimento

More information

Quality Checking an fmri Group Result (art_groupcheck)

Quality Checking an fmri Group Result (art_groupcheck) Quality Checking an fmri Group Result (art_groupcheck) Paul Mazaika, Feb. 24, 2009 A statistical parameter map of fmri group analyses relies on the assumptions of the General Linear Model (GLM). The assumptions

More information

A Modified Approach for Detection of Outliers

A Modified Approach for Detection of Outliers A Modified Approach for Detection of Outliers Iftikhar Hussain Adil Department of Economics School of Social Sciences and Humanities National University of Sciences and Technology Islamabad Iftikhar.adil@s3h.nust.edu.pk

More information

Data Mining Chapter 3: Visualizing and Exploring Data Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University

Data Mining Chapter 3: Visualizing and Exploring Data Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Data Mining Chapter 3: Visualizing and Exploring Data Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Exploratory data analysis tasks Examine the data, in search of structures

More information

Infographics and Visualisation (or: Beyond the Pie Chart) LSS: ITNPBD4, 1 November 2016

Infographics and Visualisation (or: Beyond the Pie Chart) LSS: ITNPBD4, 1 November 2016 Infographics and Visualisation (or: Beyond the Pie Chart) LSS: ITNPBD4, 1 November 2016 Overview Overview (short: we covered most of this in the tutorial) Why infographics and visualisation What s the

More information

DI TRANSFORM. The regressive analyses. identify relationships

DI TRANSFORM. The regressive analyses. identify relationships July 2, 2015 DI TRANSFORM MVstats TM Algorithm Overview Summary The DI Transform Multivariate Statistics (MVstats TM ) package includes five algorithm options that operate on most types of geologic, geophysical,

More information

Linked Data Views. Introduction. Starting with Scatterplots. By Graham Wills

Linked Data Views. Introduction. Starting with Scatterplots. By Graham Wills Linked Data Views By Graham Wills gwills@research.bell-labs.com Introduction I think of a data view very generally as anything that gives the user a way of looking at data so as to gain insight and understanding.

More information

Forecasting Asia Pacific Mobile Market Trends Using Regression Analysis

Forecasting Asia Pacific Mobile Market Trends Using Regression Analysis Forecasting Asia Pacific Mobile Market Trends Using Regression Analysis Leijia Wu and Kumbesan Sandrasegaran University of Technology, Sydney lwu@eng.uts.edu.au, kumbes@eng.uts.edu.au Abstract This paper

More information

Chapter 1. Using the Cluster Analysis. Background Information

Chapter 1. Using the Cluster Analysis. Background Information Chapter 1 Using the Cluster Analysis Background Information Cluster analysis is the name of a multivariate technique used to identify similar characteristics in a group of observations. In cluster analysis,

More information

Mira Shapiro, Analytic Designers LLC, Bethesda, MD

Mira Shapiro, Analytic Designers LLC, Bethesda, MD Paper JMP04 Using JMP Partition to Grow Decision Trees in Base SAS Mira Shapiro, Analytic Designers LLC, Bethesda, MD ABSTRACT Decision Tree is a popular technique used in data mining and is often used

More information

VW 1LQH :HHNV 7KH VWXGHQW LV H[SHFWHG WR

VW 1LQH :HHNV 7KH VWXGHQW LV H[SHFWHG WR PreAP Pre Calculus solve problems from physical situations using trigonometry, including the use of Law of Sines, Law of Cosines, and area formulas and incorporate radian measure where needed.[3e] What

More information

Getting Started with JMP at ISU

Getting Started with JMP at ISU Getting Started with JMP at ISU 1 Introduction JMP (pronounced like jump ) is the new campus-wide standard statistical package for introductory statistics courses at Iowa State University. JMP is produced

More information

Enterprise Miner Tutorial Notes 2 1

Enterprise Miner Tutorial Notes 2 1 Enterprise Miner Tutorial Notes 2 1 ECT7110 E-Commerce Data Mining Techniques Tutorial 2 How to Join Table in Enterprise Miner e.g. we need to join the following two tables: Join1 Join 2 ID Name Gender

More information

Time Series Analysis by State Space Methods

Time Series Analysis by State Space Methods Time Series Analysis by State Space Methods Second Edition J. Durbin London School of Economics and Political Science and University College London S. J. Koopman Vrije Universiteit Amsterdam OXFORD UNIVERSITY

More information

CHAPTER 7 EXAMPLES: MIXTURE MODELING WITH CROSS- SECTIONAL DATA

CHAPTER 7 EXAMPLES: MIXTURE MODELING WITH CROSS- SECTIONAL DATA Examples: Mixture Modeling With Cross-Sectional Data CHAPTER 7 EXAMPLES: MIXTURE MODELING WITH CROSS- SECTIONAL DATA Mixture modeling refers to modeling with categorical latent variables that represent

More information

8. MINITAB COMMANDS WEEK-BY-WEEK

8. MINITAB COMMANDS WEEK-BY-WEEK 8. MINITAB COMMANDS WEEK-BY-WEEK In this section of the Study Guide, we give brief information about the Minitab commands that are needed to apply the statistical methods in each week s study. They are

More information

STAT 2607 REVIEW PROBLEMS Word problems must be answered in words of the problem.

STAT 2607 REVIEW PROBLEMS Word problems must be answered in words of the problem. STAT 2607 REVIEW PROBLEMS 1 REMINDER: On the final exam 1. Word problems must be answered in words of the problem. 2. "Test" means that you must carry out a formal hypothesis testing procedure with H0,

More information

Data analysis using Microsoft Excel

Data analysis using Microsoft Excel Introduction to Statistics Statistics may be defined as the science of collection, organization presentation analysis and interpretation of numerical data from the logical analysis. 1.Collection of Data

More information

Section 18-1: Graphical Representation of Linear Equations and Functions

Section 18-1: Graphical Representation of Linear Equations and Functions Section 18-1: Graphical Representation of Linear Equations and Functions Prepare a table of solutions and locate the solutions on a coordinate system: f(x) = 2x 5 Learning Outcome 2 Write x + 3 = 5 as

More information

Parametric. Practices. Patrick Cunningham. CAE Associates Inc. and ANSYS Inc. Proprietary 2012 CAE Associates Inc. and ANSYS Inc. All rights reserved.

Parametric. Practices. Patrick Cunningham. CAE Associates Inc. and ANSYS Inc. Proprietary 2012 CAE Associates Inc. and ANSYS Inc. All rights reserved. Parametric Modeling Best Practices Patrick Cunningham July, 2012 CAE Associates Inc. and ANSYS Inc. Proprietary 2012 CAE Associates Inc. and ANSYS Inc. All rights reserved. E-Learning Webinar Series This

More information

JMP 10 Student Edition Quick Guide

JMP 10 Student Edition Quick Guide JMP 10 Student Edition Quick Guide Instructions presume an open data table, default preference settings and appropriately typed, user-specified variables of interest. RMC = Click Right Mouse Button Graphing

More information

Introduction to Digital Image Processing

Introduction to Digital Image Processing Fall 2005 Image Enhancement in the Spatial Domain: Histograms, Arithmetic/Logic Operators, Basics of Spatial Filtering, Smoothing Spatial Filters Tuesday, February 7 2006, Overview (1): Before We Begin

More information

JMP 12.1 Quick Reference Windows and Macintosh Keyboard Shortcuts

JMP 12.1 Quick Reference Windows and Macintosh Keyboard Shortcuts Data Table Actions JMP 12.1 Quick Reference and Keyboard s Select the left or right cell. If a blinking cursor is inserted in a cell, move one character left or right through the cell contents. Select

More information

USING TEMATH S VISUALIZATION TOOLS IN CALCULUS 1

USING TEMATH S VISUALIZATION TOOLS IN CALCULUS 1 USING TEMATH S VISUALIZATION TOOLS IN CALCULUS 1 Robert E. Kowalczyk and Adam O. Hausknecht University of Massachusetts Dartmouth North Dartmouth, MA 02747 TEMATH (Tools for Exploring Mathematics) is a

More information

Lab Activity #2- Statistics and Graphing

Lab Activity #2- Statistics and Graphing Lab Activity #2- Statistics and Graphing Graphical Representation of Data and the Use of Google Sheets : Scientists answer posed questions by performing experiments which provide information about a given

More information

AN OVERVIEW AND EXPLORATION OF JMP A DATA DISCOVERY SYSTEM IN DAIRY SCIENCE

AN OVERVIEW AND EXPLORATION OF JMP A DATA DISCOVERY SYSTEM IN DAIRY SCIENCE AN OVERVIEW AND EXPLORATION OF JMP A DATA DISCOVERY SYSTEM IN DAIRY SCIENCE A.P. Ruhil and Tara Chand National Dairy Research Institute, Karnal-132001 JMP commonly pronounced as Jump is a statistical software

More information

Chapter 25 Editing Windows. Chapter Table of Contents

Chapter 25 Editing Windows. Chapter Table of Contents Chapter 25 Editing Windows Chapter Table of Contents ZOOMING WINDOWS...368 RENEWING WINDOWS...375 ADDING AND DELETING...378 MOVING AND SIZING...385 ALIGNING GRAPHS...391 365 Part 2. Introduction 366 Chapter

More information

STATISTICS (STAT) Statistics (STAT) 1

STATISTICS (STAT) Statistics (STAT) 1 Statistics (STAT) 1 STATISTICS (STAT) STAT 2013 Elementary Statistics (A) Prerequisites: MATH 1483 or MATH 1513, each with a grade of "C" or better; or an acceptable placement score (see placement.okstate.edu).

More information

CHAPTER 1 INTRODUCTION

CHAPTER 1 INTRODUCTION Introduction CHAPTER 1 INTRODUCTION Mplus is a statistical modeling program that provides researchers with a flexible tool to analyze their data. Mplus offers researchers a wide choice of models, estimators,

More information

Data Analysis and Solver Plugins for KSpread USER S MANUAL. Tomasz Maliszewski

Data Analysis and Solver Plugins for KSpread USER S MANUAL. Tomasz Maliszewski Data Analysis and Solver Plugins for KSpread USER S MANUAL Tomasz Maliszewski tmaliszewski@wp.pl Table of Content CHAPTER 1: INTRODUCTION... 3 1.1. ABOUT DATA ANALYSIS PLUGIN... 3 1.3. ABOUT SOLVER PLUGIN...

More information

Introducing Microsoft SQL Server 2016 R Services. Julian Lee Advanced Analytics Lead Global Black Belt Asia Timezone

Introducing Microsoft SQL Server 2016 R Services. Julian Lee Advanced Analytics Lead Global Black Belt Asia Timezone Introducing Microsoft SQL Server 2016 R Services Julian Lee Advanced Analytics Lead Global Black Belt Asia Timezone SQL Server 2016: Everything built-in built-in built-in built-in built-in built-in $2,230

More information

Using JMP Visualizations to Build a Statistical Model George J. Hurley, The Hershey Company, Hershey, PA

Using JMP Visualizations to Build a Statistical Model George J. Hurley, The Hershey Company, Hershey, PA Paper JP05-2011 Using JMP Visualizations to Build a Statistical Model George J. Hurley, The Hershey Company, Hershey, PA ABSTRACT JMP has long been used by engineers to build various types of models, including

More information