Data Visualisation with SAS/INSIGHT Software
Gerhard Held, SAS Institute
Summary
Interactive data analysis packages have recently become popular. These packages attempt to visualise the structure in multivariate data and offer a more intuitive approach to data analysis. They are usually offered as stand-alone tools for data visualisation (or graphical data analysis). However, it is desirable to extend the data visualisation approach beyond simple but efficient techniques for describing data, such as identifying outliers or brushing points in a plot. In this paper we demonstrate how the data visualisation techniques implemented in SAS/INSIGHT software can be applied to analyse linear relationships in data, including general linear models. We also discuss how intermediate results can be saved for further processing.
Introduction
Since the early 1970s we have seen a revival of the exploratory data analysis tradition in statistics (Tukey, 1970). The first attempts to implement this approach in computer programs were made in the early 1980s (Velleman, Hoaglin, 1981). One goal of implementing exploratory data analysis on a computer was to enable individual users to see structure in multivariate data. This gave rise to a new class of interactive data analysis packages, such as MACSPIN (Donoho, Donoho, Gasko, 1986) or Data Desk, pioneered by Paul Velleman, on "personal" platforms such as the Apple Macintosh. A major attraction of these packages was their application of dynamic graphical methods to data, defined as the "direct manipulation of elements of a graph on a computer screen" (Cleveland, McGill, 1988). As the aim of dynamic graphical methods is to visualise the structure of multivariate data, "data visualisation" is often used as a synonym.
One drawback of these early implementations was that they focused on data visualisation without integrating this functionality into a wider spectrum of data analysis techniques. As early as 1980, John Tukey insisted that "we need both the exploratory and confirmatory (analysis)" (Tukey, 1980). Since then, more general data analysis packages, such as S-PLUS or the SAS System, have included data visualisation tools as a speciality.
In this paper we will introduce SAS/INSIGHT software as an example of the data visualisation packages available today; discuss the relative merits of data visualisation as opposed to traditional data analysis methods; and point out some new functionality within SAS/INSIGHT software that enables tighter integration of data visualisation and traditional methods.
Graphical Exploration
SAS/INSIGHT software, an integrated component of the SAS System, is a dynamic tool for data exploration and analysis. With it you can explore data through interactive histograms, box plots, scatter plots, and 3D rotating plots. You can examine correlations and principal components to find the structure of your data. Finally, you can construct predictive models based on relationships in the data. All interactive graphs and analyses are linked across multiple windows: any change in one window is immediately reflected in all windows related to the same data.
SAS/INSIGHT software is implemented on a large variety of hardware platforms: many popular UNIX workstations, Digital Equipment minicomputers and DECstations running VMS, IBM mainframes (running MVS, CMS, and VSE), and, with Release 6.08 of the SAS System, also workstations running Windows 3.1 and OS/2. A powerful workstation is ideal for graphical data visualisation; UNIX workstations or 486-based PCs running OS/2 or Windows are recommended. As SAS/INSIGHT software allows many views of the same data set, large monitors (15" or larger) are also recommended.
The discussions in this paper are based on a data set containing 1991 statistics on 407 commercial companies that are large in terms of sales. Variables include the Company Name, Nationality, Industry Type, Number of Employees, and Sales, Profits, Assets, and Equity in millions of U.S. dollars.
As SAS/INSIGHT software is an integrated component, it can be invoked from within the SAS System simply by typing its name (INSIGHT) or by using pull-down menus. SAS/INSIGHT then prompts the user for a data set; we select the business data set (COMP91). The user then sees a data window, which presents the data set as a table (rows are companies and columns are variables; see Figure 1). In a data window you can sort, edit, and extract subsets of your data. You can also assign measurement levels and default roles that determine how your variables are used in graphs and analyses. As we would like to identify data points by company, we assign the variable COMPANY the LABEL role using the DATA menu: click on COMPANY, select DATA, then PROPERTIES, and finally LABEL (sequence: DATA : PROPERTIES : LABEL). Previous users of SAS/INSIGHT software will notice at this point that the pull-down menus have been regrouped in a more logical structure. All analysis and graphical functions have been combined in the ANALYSIS menu; a new DATA menu covers all data manipulation functions.
Figure 1: Data table of the COMP91 data set
We would like to explore measures of economic success for the selected companies (PROFITS, SALES) as well as find factors that determine economic success. Release 6.08 of SAS/INSIGHT software offers box plots as an additional way to explore distributions graphically. Selecting BOX PLOT, PROFITS as Y (graph variable), and INDUSTRY as the group variable creates the side-by-side box plot shown in Figure 2. Note that group processing (available with all analyses and graphics) has also been added with Release 6.08.
In Figure 2 we have already clicked on a few data points (companies) with extreme values of PROFITS. Notice that IBM is one of the least profitable companies in 1991 (a loss of 2.827 billion dollars), whereas in 1990 it was still one of the most profitable companies (a profit of 6.020 billion dollars)!
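Each box in a side-by-side box plot summarises one group. The Python sketch below computes the quartiles, whisker ends, and outliers for each value of a group variable. It assumes Tukey's convention (points more than 1.5 times the interquartile range beyond the quartiles are drawn individually as outliers) and the `method="inclusive"` quartile definition of Python's `statistics.quantiles`; SAS/INSIGHT's own quantile definition may differ. The function names and the toy values for two hypothetical industries are illustrative only, not the COMP91 data.

```python
from collections import defaultdict
from statistics import quantiles


def box_stats(values):
    """Quartiles, whisker ends and outliers for one group.

    Assumes Tukey's rule: points more than 1.5 * IQR beyond the quartiles
    are outliers; whiskers end at the most extreme non-outlying values.
    """
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


def grouped_box_stats(rows, y, group):
    """One box per value of the group variable (side-by-side box plot)."""
    by_group = defaultdict(list)
    for row in rows:
        by_group[row[group]].append(row[y])
    return {g: box_stats(vals) for g, vals in by_group.items()}


# Hypothetical toy profits for two industries (not the COMP91 data).
rows = [{"INDUSTRY": "A", "PROFITS": p} for p in (1, 2, 3, 4, 5, 6, 7, 8, 100)]
rows += [{"INDUSTRY": "B", "PROFITS": p} for p in (-5, 2, 3, 4)]

for industry, s in sorted(grouped_box_stats(rows, "PROFITS", "INDUSTRY").items()):
    print(industry, s)
```

Clicking on the individually drawn points, as done for Figure 2, corresponds to inspecting the `outliers` list of each group.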
Figure 2: Box plot of profits for industry groups
The labelled extreme values somewhat distort other relationships, so we click or drag with the mouse over the extreme values, thus creating a rectangular brush, and select EDIT : OBSERVATIONS (RECORDS) : HIDE IN GRAPHS. This removes the extreme values from the graph (but not from any calculations!) and rescales the graph (Figure 3). In addition, we click on INDUSTRY and open the new MARKERS window (EDIT : WINDOWS : MARKERS). We could assign an individual marker to each industry or click the "multiple markers" button at the bottom of the MARKERS window, which automatically assigns a different marker to each value of INDUSTRY. Colours can be assigned individually or automatically in the same way (a new feature in Release 6.08 of SAS/INSIGHT software). Markers are an "observation state", i.e. companies retain their markers in all subsequent graphs unless the markers are changed. Figure 3 now shows more clearly that some industries are consistently profitable (e.g. the Pharmaceutical industry), others show a large internal variation (e.g. the Computer and Oil Refining industries), and still others feature quite a number of outliers (e.g. the Automobile, Food, and Electronics industries).
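The distinction between hiding observations in graphs and excluding them from calculations, together with markers that persist as an observation state, can be illustrated with a small data model in which every observation carries its own display state. The Python sketch below is a hypothetical illustration of that idea, not SAS/INSIGHT's internal design; the class, its fields, and the marker names are all invented for the example.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical marker names; cycled if there are more groups than markers.
MARKERS = ["circle", "square", "triangle_up", "diamond", "plus", "x"]


@dataclass
class Observation:
    label: str                      # e.g. the company name (LABEL role)
    industry: str
    profits: float
    hidden_in_graphs: bool = False  # hypothetical per-observation state
    marker: str = "circle"          # persists across all later graphs


def hide_in_graphs(observations, predicate):
    """Mark matching observations as hidden; they stay in every calculation."""
    for obs in observations:
        if predicate(obs):
            obs.hidden_in_graphs = True


def assign_multiple_markers(observations, key):
    """Give each distinct value of key its own marker ('multiple markers')."""
    values = sorted({key(obs) for obs in observations})
    marker_for = {v: MARKERS[i % len(MARKERS)] for i, v in enumerate(values)}
    for obs in observations:
        obs.marker = marker_for[key(obs)]


def plotted_points(observations):
    """What a graph would draw: only observations not hidden in graphs."""
    return [(o.label, o.profits, o.marker) for o in observations if not o.hidden_in_graphs]


def mean_profits(observations):
    """A calculation: hidden observations are still included."""
    return mean(o.profits for o in observations)


# Hypothetical toy observations (not the COMP91 data).
obs = [
    Observation("Co1", "A", 10.0),
    Observation("Co2", "A", -50.0),
    Observation("Co3", "B", 4.0),
    Observation("Co4", "B", 6.0),
]
hide_in_graphs(obs, lambda o: abs(o.profits) > 20)  # "brush" the extreme value
assign_multiple_markers(obs, lambda o: o.industry)
print(plotted_points(obs))
print(mean_profits(obs))
```

Hiding the brushed value removes it from `plotted_points` but not from `mean_profits`, and the hidden observation keeps its marker for any later graph.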
Figure 3: Box plot of profits for industry groups (outliers removed)
Figure 4: Scatter plot matrix of EMPLOYS, SALES, and ASSETS
To explore the other measure of economic success, SALES, we take a different approach. We suspect that the number of employees (EMPLOYS) and ASSETS may correlate with SALES. We therefore click on all three variables in the data table and then select ANALYSE : SCATTER PLOT (X Y), which creates Figure 4. Figure 4 is clearly distorted again by some very big companies (e.g. Toyota Motor, high in SALES, and General Electric, high in ASSETS and EMPLOYS), and the variation also seems to increase with increasing values of SALES, ASSETS, and EMPLOYS. As this is an overall pattern, we may opt for a data transformation rather than hiding extreme values again. To do that, we simply click on each of the variables in the graph and select EDIT : VARIABLES : LOG(X). This calculates the logarithm of each variable and adjusts the scatter plot matrix accordingly. The new transformed variables are named L_SALES, L_ASSETS, and L_EMPLOY.
Figure 5: Scatter plot of L_EMPLOY by L_SALES
For Figure 5 we have selected the "MAGNIFYING GLASS" from the TOOLS window (EDIT : WINDOWS : TOOLS) and dragged it over the L_SALES and L_EMPLOY portion of the scatter plot matrix to focus on this part. It shows a clear linear relationship between L_SALES and L_EMPLOY and also reveals that oil companies (upward-pointing triangles in Figure 5) consistently generate larger L_SALES than could have been expected from their L_EMPLOY values. If needed, we could also explore the data in three dimensions using a rotating plot. One function of the rotating plot is to reveal multivariate outliers. As the insights gained from rotations cannot adequately be described by text alone, we will not try to illustrate them here.
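The LOG(X) step and the per-industry comparison that follows can also be checked numerically. The Python sketch below adds log-transformed copies of variables (assuming the natural logarithm; the base does not affect linearity), fits a separate least-squares line of L_SALES on L_EMPLOY for each industry, and reports the R-square of the combined fit. One line per group gives the same fitted values as a model with L_EMPLOY, INDUSTRY, and their interaction, which is the structure of the model formulated below. All function names and the toy rows are hypothetical; this is not how SAS/INSIGHT computes its results.

```python
import math
from collections import defaultdict


def log_transform(rows, names):
    """Add a log-transformed copy of each variable, e.g. SALES -> L_SALES.

    Assumes the natural logarithm. New names are cut to 8 characters, the
    SAS Version 6 limit (presumably why EMPLOYS became L_EMPLOY).
    """
    for row in rows:
        for name in names:
            value = row[name]
            if value <= 0:
                raise ValueError(f"log undefined for {name}={value}")
            row[("L_" + name)[:8]] = math.log(value)


def ols(xs, ys):
    """Least-squares intercept a and slope b for y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values per group")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def fit_by_group(rows, x, y, group):
    """Separate line per group plus R-square = 1 - SSE / SST of the combined fit."""
    points = defaultdict(list)
    for row in rows:
        points[row[group]].append((row[x], row[y]))
    fits, sse = {}, 0.0
    for g, pts in points.items():
        xs, ys = zip(*pts)
        a, b = ols(xs, ys)
        fits[g] = (a, b)
        sse += sum((yv - (a + b * xv)) ** 2 for xv, yv in pts)
    ys_all = [row[y] for row in rows]
    mean_y = sum(ys_all) / len(ys_all)
    sst = sum((v - mean_y) ** 2 for v in ys_all)
    return fits, 1.0 - sse / sst


# Hypothetical toy rows (not the COMP91 data): sales proportional to employees.
rows = [
    {"INDUSTRY": "A", "EMPLOYS": 100, "SALES": 1000},
    {"INDUSTRY": "A", "EMPLOYS": 1000, "SALES": 10000},
    {"INDUSTRY": "A", "EMPLOYS": 10000, "SALES": 100000},
    {"INDUSTRY": "B", "EMPLOYS": 100, "SALES": 10},
    {"INDUSTRY": "B", "EMPLOYS": 1000, "SALES": 100},
]
log_transform(rows, ["SALES", "EMPLOYS"])
fits, r_square = fit_by_group(rows, "L_EMPLOY", "L_SALES", "INDUSTRY")
for industry, (a, b) in sorted(fits.items()):
    print(f"{industry}: intercept={a:.3f} slope={b:.3f}")
print(f"R-square={r_square:.3f}")
```

In the toy data both industries share a slope of 1 but have intercepts of about +2.303 and -2.303 (plus and minus ln 10): a common slope with shifted intercepts, the kind of pattern Figure 6 suggests for Oil Refining and Pharmaceuticals.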
The previous results suggest taking a separate look at the influence of INDUSTRY type on this relationship. A simple graphical way to do this is to generate a series of scatter plots of L_SALES by L_EMPLOY grouped by INDUSTRY. Figure 6 shows two of the industries, Oil Refining and Pharmaceuticals, as an example. Clearly the slope of a regression line of L_SALES on L_EMPLOY would be very similar for both industries, but the intercept for Pharmaceuticals would be negative, meaning that the Pharmaceutical industry would require many more employees to generate the same level of sales.
Figure 6: Scatter plot of L_EMPLOY by L_SALES for the oil and pharmaceutical industries
Model Formulation
We now have enough evidence to formulate a model for sales as a measure of economic success. SAS/INSIGHT supports traditional parametric regression analysis, and also the general linear model as implemented in the GLM procedure of SAS/STAT software. SAS/INSIGHT now also offers an implementation of the generalised linear model, supporting response distributions from the exponential family (normal, inverse Gaussian, gamma, Poisson, binomial) and corresponding link functions (identity, logit, probit, complementary log-log, and power link functions). Both the general and the generalised linear model have been introduced in Release 6.08 of SAS/INSIGHT software. In addition, SAS/INSIGHT covers residual plots (residual by predicted, residual normal QQ plot, and partial leverage plots) as well as parametric and nonparametric fit curves (splines, kernel estimation).
As generalised linear models are not adequate for our data, we will confine our test to a linear model. The model we would like to test has L_SALES as the response variable and L_EMPLOY and INDUSTRY as explanatory effects. We also include the interaction of both effects in the model. Figure 7 shows part of the results. The R-Square value indicates that 51.4% of the variation in L_SALES can be explained by the variables in the model. All effects in the model, including the interaction, are significant (judged by Prob > F for the associated F-tests).
Figure 7: General Linear Model for L_SALES
Results often need to be saved for further processing. This is easy if the aim is to integrate graphics or reports in a text document, as for instance for this paper. A typical problem area, however, is saving statistical tables in machine-readable form in order to apply them to new data or to reformat the output. For this purpose SAS/INSIGHT software supports the Output Delivery System (ODS) of the SAS System, another new feature in Release 6.08 of the SAS System. Procedures using the ODS produce their results in the form of output objects: data structures in machine precision that persist in memory. With output objects you can create data sets, produce listings, reorganise output, and create and save custom report formats. The ODS is easily activated. For example, if you mark the Summary of Fit, Analysis of Variance, and Type III Tests tables of Figure 7 with the mouse and then select FILE : SAVE : TABLES, the system responds with a Note window indicating that the tables were saved as output objects. Using the OUTPUT procedure of the SAS System, the saved tables can then be accessed and manipulated (SAS Institute, 1992; see Figure 8).
Figure 8: SAS/INSIGHT tables as output objects
Conclusion
We had three goals with this article. Concerning the first goal, we have shown that SAS/INSIGHT software adequately covers dynamic graphical data analysis. All interactive graphs and analyses are linked across multiple windows, and changes in one window are immediately reflected in all windows related to the same data. SAS/INSIGHT offers the typical tools for data visualisation, such as identification and labelling of points, use of colours and markers, brushing, scatter plot matrices, and 3D rotating plots. The latest implementation of SAS/INSIGHT software includes additional functionality beyond the standards of data visualisation, such as interactive box plots, group processing for graphical and statistical analyses, integration of general and generalised linear models, and saving of any results (data, graphics, or output) in a form ready for immediate further processing.
Dynamic graphical methods greatly facilitate exploratory data analysis using state-of-the-art computer technology. This technology helps the analyst to concentrate on finding structure in multivariate data rather than dealing with the software mechanics of how to code an analysis. Graphical data analysis is a great time saver: all steps discussed in this paper took roughly 15 minutes to accomplish. It is up to the reader to judge how long this would have taken using traditional methods and tools.
A word of caution: data visualisation gives new insights into data but needs traditional hypothesis-testing methods as a complement. Users of these systems are therefore advised to look for integrated software covering both approaches to data analysis.
References
Cleveland, W.S. & McGill, M.E. (Eds.) (1988). Dynamic Graphics for Statistics. Belmont, CA: Wadsworth Inc.
Donoho, A.W., Donoho, D.L. & Gasko, M. (1986). MACSPIN Graphical Data Analysis Software. Austin, TX: D2 Software.
SAS Institute Inc. (1993). SAS/INSIGHT User's Guide, Version 6, Second Edition. Cary, NC: SAS Institute Inc.
Tukey, J.W. (1970). Exploratory Data Analysis, Vol. I. Reading, MA: Addison-Wesley.
Tukey, J.W. (1980). We need both exploratory and confirmatory. The American Statistician, 34.
Velleman, P.F. & Hoaglin, D.C. (1981). Applications, Basics, and Computing of Exploratory Data Analysis. Boston: Duxbury Press.
More informationJMP Book Descriptions
JMP Book Descriptions The collection of JMP documentation is available in the JMP Help > Books menu. This document describes each title to help you decide which book to explore. Each book title is linked
More informationTHE SWALLOW-TAIL PLOT: A SIMPLE GRAPH FOR VISUALIZING BIVARIATE DATA.
STATISTICA, anno LXXIV, n. 2, 2014 THE SWALLOW-TAIL PLOT: A SIMPLE GRAPH FOR VISUALIZING BIVARIATE DATA. Maria Adele Milioli Dipartimento di Economia, Università di Parma, Parma, Italia Sergio Zani Dipartimento
More informationQuality Checking an fmri Group Result (art_groupcheck)
Quality Checking an fmri Group Result (art_groupcheck) Paul Mazaika, Feb. 24, 2009 A statistical parameter map of fmri group analyses relies on the assumptions of the General Linear Model (GLM). The assumptions
More informationA Modified Approach for Detection of Outliers
A Modified Approach for Detection of Outliers Iftikhar Hussain Adil Department of Economics School of Social Sciences and Humanities National University of Sciences and Technology Islamabad Iftikhar.adil@s3h.nust.edu.pk
More informationData Mining Chapter 3: Visualizing and Exploring Data Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University
Data Mining Chapter 3: Visualizing and Exploring Data Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Exploratory data analysis tasks Examine the data, in search of structures
More informationInfographics and Visualisation (or: Beyond the Pie Chart) LSS: ITNPBD4, 1 November 2016
Infographics and Visualisation (or: Beyond the Pie Chart) LSS: ITNPBD4, 1 November 2016 Overview Overview (short: we covered most of this in the tutorial) Why infographics and visualisation What s the
More informationDI TRANSFORM. The regressive analyses. identify relationships
July 2, 2015 DI TRANSFORM MVstats TM Algorithm Overview Summary The DI Transform Multivariate Statistics (MVstats TM ) package includes five algorithm options that operate on most types of geologic, geophysical,
More informationLinked Data Views. Introduction. Starting with Scatterplots. By Graham Wills
Linked Data Views By Graham Wills gwills@research.bell-labs.com Introduction I think of a data view very generally as anything that gives the user a way of looking at data so as to gain insight and understanding.
More informationForecasting Asia Pacific Mobile Market Trends Using Regression Analysis
Forecasting Asia Pacific Mobile Market Trends Using Regression Analysis Leijia Wu and Kumbesan Sandrasegaran University of Technology, Sydney lwu@eng.uts.edu.au, kumbes@eng.uts.edu.au Abstract This paper
More informationChapter 1. Using the Cluster Analysis. Background Information
Chapter 1 Using the Cluster Analysis Background Information Cluster analysis is the name of a multivariate technique used to identify similar characteristics in a group of observations. In cluster analysis,
More informationMira Shapiro, Analytic Designers LLC, Bethesda, MD
Paper JMP04 Using JMP Partition to Grow Decision Trees in Base SAS Mira Shapiro, Analytic Designers LLC, Bethesda, MD ABSTRACT Decision Tree is a popular technique used in data mining and is often used
More informationVW 1LQH :HHNV 7KH VWXGHQW LV H[SHFWHG WR
PreAP Pre Calculus solve problems from physical situations using trigonometry, including the use of Law of Sines, Law of Cosines, and area formulas and incorporate radian measure where needed.[3e] What
More informationGetting Started with JMP at ISU
Getting Started with JMP at ISU 1 Introduction JMP (pronounced like jump ) is the new campus-wide standard statistical package for introductory statistics courses at Iowa State University. JMP is produced
More informationEnterprise Miner Tutorial Notes 2 1
Enterprise Miner Tutorial Notes 2 1 ECT7110 E-Commerce Data Mining Techniques Tutorial 2 How to Join Table in Enterprise Miner e.g. we need to join the following two tables: Join1 Join 2 ID Name Gender
More informationTime Series Analysis by State Space Methods
Time Series Analysis by State Space Methods Second Edition J. Durbin London School of Economics and Political Science and University College London S. J. Koopman Vrije Universiteit Amsterdam OXFORD UNIVERSITY
More informationCHAPTER 7 EXAMPLES: MIXTURE MODELING WITH CROSS- SECTIONAL DATA
Examples: Mixture Modeling With Cross-Sectional Data CHAPTER 7 EXAMPLES: MIXTURE MODELING WITH CROSS- SECTIONAL DATA Mixture modeling refers to modeling with categorical latent variables that represent
More information8. MINITAB COMMANDS WEEK-BY-WEEK
8. MINITAB COMMANDS WEEK-BY-WEEK In this section of the Study Guide, we give brief information about the Minitab commands that are needed to apply the statistical methods in each week s study. They are
More informationSTAT 2607 REVIEW PROBLEMS Word problems must be answered in words of the problem.
STAT 2607 REVIEW PROBLEMS 1 REMINDER: On the final exam 1. Word problems must be answered in words of the problem. 2. "Test" means that you must carry out a formal hypothesis testing procedure with H0,
More informationData analysis using Microsoft Excel
Introduction to Statistics Statistics may be defined as the science of collection, organization presentation analysis and interpretation of numerical data from the logical analysis. 1.Collection of Data
More informationSection 18-1: Graphical Representation of Linear Equations and Functions
Section 18-1: Graphical Representation of Linear Equations and Functions Prepare a table of solutions and locate the solutions on a coordinate system: f(x) = 2x 5 Learning Outcome 2 Write x + 3 = 5 as
More informationParametric. Practices. Patrick Cunningham. CAE Associates Inc. and ANSYS Inc. Proprietary 2012 CAE Associates Inc. and ANSYS Inc. All rights reserved.
Parametric Modeling Best Practices Patrick Cunningham July, 2012 CAE Associates Inc. and ANSYS Inc. Proprietary 2012 CAE Associates Inc. and ANSYS Inc. All rights reserved. E-Learning Webinar Series This
More informationJMP 10 Student Edition Quick Guide
JMP 10 Student Edition Quick Guide Instructions presume an open data table, default preference settings and appropriately typed, user-specified variables of interest. RMC = Click Right Mouse Button Graphing
More informationIntroduction to Digital Image Processing
Fall 2005 Image Enhancement in the Spatial Domain: Histograms, Arithmetic/Logic Operators, Basics of Spatial Filtering, Smoothing Spatial Filters Tuesday, February 7 2006, Overview (1): Before We Begin
More informationJMP 12.1 Quick Reference Windows and Macintosh Keyboard Shortcuts
Data Table Actions JMP 12.1 Quick Reference and Keyboard s Select the left or right cell. If a blinking cursor is inserted in a cell, move one character left or right through the cell contents. Select
More informationUSING TEMATH S VISUALIZATION TOOLS IN CALCULUS 1
USING TEMATH S VISUALIZATION TOOLS IN CALCULUS 1 Robert E. Kowalczyk and Adam O. Hausknecht University of Massachusetts Dartmouth North Dartmouth, MA 02747 TEMATH (Tools for Exploring Mathematics) is a
More informationLab Activity #2- Statistics and Graphing
Lab Activity #2- Statistics and Graphing Graphical Representation of Data and the Use of Google Sheets : Scientists answer posed questions by performing experiments which provide information about a given
More informationAN OVERVIEW AND EXPLORATION OF JMP A DATA DISCOVERY SYSTEM IN DAIRY SCIENCE
AN OVERVIEW AND EXPLORATION OF JMP A DATA DISCOVERY SYSTEM IN DAIRY SCIENCE A.P. Ruhil and Tara Chand National Dairy Research Institute, Karnal-132001 JMP commonly pronounced as Jump is a statistical software
More informationChapter 25 Editing Windows. Chapter Table of Contents
Chapter 25 Editing Windows Chapter Table of Contents ZOOMING WINDOWS...368 RENEWING WINDOWS...375 ADDING AND DELETING...378 MOVING AND SIZING...385 ALIGNING GRAPHS...391 365 Part 2. Introduction 366 Chapter
More informationSTATISTICS (STAT) Statistics (STAT) 1
Statistics (STAT) 1 STATISTICS (STAT) STAT 2013 Elementary Statistics (A) Prerequisites: MATH 1483 or MATH 1513, each with a grade of "C" or better; or an acceptable placement score (see placement.okstate.edu).
More informationCHAPTER 1 INTRODUCTION
Introduction CHAPTER 1 INTRODUCTION Mplus is a statistical modeling program that provides researchers with a flexible tool to analyze their data. Mplus offers researchers a wide choice of models, estimators,
More informationData Analysis and Solver Plugins for KSpread USER S MANUAL. Tomasz Maliszewski
Data Analysis and Solver Plugins for KSpread USER S MANUAL Tomasz Maliszewski tmaliszewski@wp.pl Table of Content CHAPTER 1: INTRODUCTION... 3 1.1. ABOUT DATA ANALYSIS PLUGIN... 3 1.3. ABOUT SOLVER PLUGIN...
More informationIntroducing Microsoft SQL Server 2016 R Services. Julian Lee Advanced Analytics Lead Global Black Belt Asia Timezone
Introducing Microsoft SQL Server 2016 R Services Julian Lee Advanced Analytics Lead Global Black Belt Asia Timezone SQL Server 2016: Everything built-in built-in built-in built-in built-in built-in $2,230
More informationUsing JMP Visualizations to Build a Statistical Model George J. Hurley, The Hershey Company, Hershey, PA
Paper JP05-2011 Using JMP Visualizations to Build a Statistical Model George J. Hurley, The Hershey Company, Hershey, PA ABSTRACT JMP has long been used by engineers to build various types of models, including
More information