LISA: Explore JMP Capabilities in Design of Experiments. Liaosa Xu June 21, 2012


Course Outline: Why We Need Custom Design; The General Approach; JMP Examples; Potential Collinearity Issues; Prior Design Evaluations; Augmented Design; Design from Candidate Set

Why Custom Design: Standard designs sometimes do not work, and computer-generated (custom, or optimal) designs are the alternative. Typical situations are: an irregular experimental region; a mix of categorical and continuous variables; a nonstandard model; unusual sample size requirements.

Why Custom Design: An irregular experimental region (Montgomery 2009). If the region of interest for the experiment is not a cube or a sphere, standard designs may not be possible. Example: an experimenter is investigating the properties of a particular adhesive, where x1 is the amount of adhesive and x2 is the cure temperature. The prior knowledge is: a) with too little adhesive and too low a cure temperature, the parts will not bond; b) with both factors at high levels, the parts will either be damaged by heat stress or bond inadequately.

Why Custom Design: Categorical variables. A custom design can support a model that includes categorical variables with multiple levels. Examples of categorical factors are machine, operator, solvent and catalyst.

Why Custom Design: A nonstandard model. Sometimes the experimenter has special knowledge or insight about the process being studied that suggests a nonstandard model (specific interaction terms and specific quadratic terms). For example, a model may be proposed from prior knowledge [equation not preserved in the transcription]. Note: this is not a full response surface model.

Why Custom Design: Unusual sample size requirements. Occasionally we need fewer runs than a standard design requires. For example, suppose we intend to fit a second-order model with four variables. The model has 15 terms to estimate, and a central composite design (CCD) requires 26 to 30 runs. If the runs are expensive or time-consuming and we can afford fewer than 20, a computer-generated design can reduce the number of runs.
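The arithmetic behind the 15 terms and the 26 to 30 CCD runs can be checked with a short sketch. It assumes a CCD with a full 2^k factorial core, 2k axial points, and between 2 and 6 center runs; the center-run range is an assumption chosen to match the quoted totals, and the function names are illustrative.

```python
from math import comb


def second_order_terms(k: int) -> int:
    """Number of parameters in a full second-order model with k factors."""
    intercept = 1
    main_effects = k
    two_factor_interactions = comb(k, 2)
    pure_quadratics = k
    return intercept + main_effects + two_factor_interactions + pure_quadratics


def ccd_runs(k: int, center_runs: int) -> int:
    """Runs in a CCD: 2^k factorial points + 2k axial points + center runs."""
    return 2 ** k + 2 * k + center_runs


terms = second_order_terms(4)
# Assumed range of center runs (2 to 6), not stated on the slide.
ccd_range = (ccd_runs(4, 2), ccd_runs(4, 6))
print(terms, ccd_range)
```

Any design for this model needs at least as many runs as there are terms, so a budget of fewer than 20 runs leaves only a few degrees of freedom for estimating error.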

The General Approach of Custom Design: The usual approach is: 1) Specify a model. 2) Determine the region of interest, including any linear constraints. 3) Select the number of runs to make. 4) Specify the optimality criterion. 5) Create the design, and consider adding some center-point runs.

Model Specification in Custom Design: All designs are model dependent. By default, JMP includes all main effects as the model terms. For prediction, consider using a response surface model with the I-optimal criterion. To identify the important factors, consider adding the two-way interactions between each pair of factors. Use your educated guess to specify the model terms.

Optimality Criterion in Custom Design: A custom design is also called an optimal design because it is the best with respect to some criterion. A popular choice is the D-optimal design, which gives the most precise joint estimates of the effects. D-optimal designs are most useful for determining the important factors in the model, which makes them most appropriate for screening experiments. They are not preferred when the primary goal is prediction.

Optimality Criterion in Custom Design Another choice in JMP is I-optimal design, which seeks to minimize the average prediction variance over the design space. When the prediction ability of the model is the major concern, the I-Optimal Design is preferred. JMP selects the I-Optimal Design by default for response surface designs.
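The difference between the two criteria can be illustrated with a minimal sketch. It assumes a one-factor quadratic model on [-1, 1] and two hypothetical 8-run designs (neither comes from the slides): one spreads runs evenly over the three support points, the other puts more runs at the center. The D-criterion is taken as det(X'X), and the I-criterion is approximated by averaging the relative prediction variance over an evenly spaced grid.

```python
def model_row(x):
    """Quadratic model in one factor: intercept, x, x^2."""
    return [1.0, x, x * x]


def information_matrix(design):
    """X'X for the model matrix built from the design points."""
    rows = [model_row(x) for x in design]
    p = len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]


def inverse_and_det(a):
    """Gauss-Jordan inverse and determinant of a small square matrix."""
    n = len(a)
    m = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular information matrix")
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        piv = m[col][col]
        det *= piv
        m[col] = [v / piv for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m], det


def d_criterion(design):
    """Larger det(X'X) means a smaller joint confidence region."""
    return inverse_and_det(information_matrix(design))[1]


def average_relative_variance(design, grid_size=201):
    """Grid approximation of the average of f(x)'(X'X)^-1 f(x) over [-1, 1]."""
    inv, _ = inverse_and_det(information_matrix(design))
    total = 0.0
    for k in range(grid_size):
        x = -1.0 + 2.0 * k / (grid_size - 1)
        f = model_row(x)
        p = len(f)
        total += sum(f[i] * inv[i][j] * f[j] for i in range(p) for j in range(p))
    return total / grid_size


# Hypothetical designs for illustration only.
d_design = [-1, -1, -1, 0, 0, 1, 1, 1]   # runs spread evenly over the support
i_design = [-1, -1, 0, 0, 0, 0, 1, 1]    # extra runs at the center

print(d_criterion(d_design), d_criterion(i_design))
print(average_relative_variance(d_design), average_relative_variance(i_design))
```

In this toy case the evenly spread design has the larger determinant, while the center-heavy design has the smaller average prediction variance, which mirrors the screening-versus-prediction trade-off described above.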

JMP Example of Custom Design: A three-factor design (two numerical factors plus one categorical factor) was used to determine the operating conditions for modeling the amount of extraction. The factors are $x_1$, the centrifuge inlet temperature, in [40, 80]; $x_2$, the extraction temperature, in [40, 60]; and $x_3$, the solvent (A, B or C). The centrifuge inlet temperature must not be lower than the extraction temperature, i.e. $x_1 - x_2 \ge 0$. Indicator variables $z_1$ and $z_2$ denote the discrete levels of $x_3$: $z_1 = 1$ if A is assigned and 0 otherwise; $z_2 = 1$ if B is assigned and 0 otherwise. The response surface model can be written as

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2 + \beta_{11} x_1^2 + \beta_{22} x_2^2 + \gamma_1 z_1 + \gamma_2 z_2 + \delta_{11} z_1 x_1 + \delta_{21} z_2 x_1 + \delta_{12} z_1 x_2 + \delta_{22} z_2 x_2 + \varepsilon$$
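As a rough illustration of how the indicator coding turns one run into a row of the model matrix, the sketch below expands a factor setting into the twelve model terms: the intercept, the second-order terms in x1 and x2, the two solvent indicators, and the indicator-by-temperature interactions. The function name and the column order are illustrative choices, and solvent C is the baseline level because both indicators are zero for it.

```python
def model_row(x1: float, x2: float, solvent: str) -> list:
    """Expand one run into the 12 model terms (column order is illustrative)."""
    z1 = 1.0 if solvent == "A" else 0.0   # indicator for solvent A
    z2 = 1.0 if solvent == "B" else 0.0   # indicator for solvent B
    return [
        1.0,                 # intercept
        x1, x2,              # main effects of the two temperatures
        x1 * x2,             # temperature interaction
        x1 ** 2, x2 ** 2,    # pure quadratic terms
        z1, z2,              # solvent main effects (C is the baseline)
        z1 * x1, z2 * x1,    # solvent by inlet-temperature interactions
        z1 * x2, z2 * x2,    # solvent by extraction-temperature interactions
    ]


row_a = model_row(60.0, 50.0, "A")
row_c = model_row(60.0, 50.0, "C")
print(len(row_a), row_a)
print(row_c)
```

With 12 parameters, any design for this model needs at least 12 runs, plus extra runs if an error estimate is wanted.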

JMP Example of Custom Design: [Figure: design space for x1 (horizontal axis, 40 to 80) and x2 (vertical axis, 40 to 60); the feasible region is the part of the rectangle where x1 - x2 ≥ 0.]
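The constrained region can be sketched by filtering a grid of factor settings. This assumes the constraint is x1 - x2 ≥ 0 (the inequality symbol is partly lost in the transcription) and uses a coarse 10-degree grid chosen only for illustration; JMP itself searches the continuous region rather than a fixed grid.

```python
from itertools import product


def is_feasible(x1: float, x2: float) -> bool:
    """Factor ranges from the example plus the assumed constraint x1 - x2 >= 0."""
    return 40 <= x1 <= 80 and 40 <= x2 <= 60 and x1 - x2 >= 0


x1_levels = range(40, 81, 10)   # 40, 50, 60, 70, 80 (illustrative grid)
x2_levels = range(40, 61, 10)   # 40, 50, 60 (illustrative grid)
solvents = ["A", "B", "C"]

candidates = [
    (x1, x2, s)
    for x1, x2, s in product(x1_levels, x2_levels, solvents)
    if is_feasible(x1, x2)
]
print(len(candidates))
```

The three temperature pairs cut off by the constraint are (40, 50), (40, 60) and (50, 60), which is the corner a standard rectangular design would otherwise place runs in.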

JMP Example of Custom Design: In JMP, choose DOE > Custom Design, then fill in the factors, the constraints, and the model specification.

JMP Example of Custom Design: Set Random Seed to 1000 (for illustration only; do not fix the seed in practice). Select Simulate Responses (the simulated responses are used later for collinearity detection). Under Optimality Criterion, change to D-optimality. Set Number of Starts to 1000. Specify the Number of Runs, then click Make Design to generate the design.

JMP Example of Custom Design: Design output. After you create the design, get a rough evaluation of it before you run it (the evaluation information is covered in more detail later). Randomize the run order when you make the table.

JMP Example of Custom Design: The table contains a simulated response column; this is where you can enter your own data.

Collinearity Problems: Because the custom designs considered here are not orthogonal, multicollinearity is possible. Multicollinearity occurs when two or more predictors in the model are correlated and provide redundant information about the response. It is a potential problem for three reasons: a) large variances and covariances in the estimated regression coefficients; b) unstable regression coefficients that may have the wrong sign, since a small perturbation of the response can produce a large change in the estimated effects or even reverse their signs; c) results that are often confusing and misleading.

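Collinearity Problems: Detecting multicollinearity. Calculate the variance inflation factor (VIF) for each predictor $x_j$: $\mathrm{VIF}_j = 1 / (1 - R_j^2)$, where $R_j^2$ is the coefficient of determination from regressing $x_j$ on the other predictors.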

Collinearity Problems: Detecting multicollinearity using JMP. Click the red triangle next to Model in the upper-left panel of the data table > Run Script. Click the Run Model button. Scroll to the Parameter Estimates section. Right-click on the table > Columns > VIF.

Collinearity Problems: Detecting multicollinearity using JMP. No VIF is larger than 10, so there is no severe multicollinearity in this design.
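Outside JMP, the same check can be sketched in a few lines. The version below relies on the standard identity that the VIF of predictor j equals the j-th diagonal element of the inverse of the predictors' correlation matrix, which is equivalent to 1 / (1 - R_j^2). The two small data sets are hypothetical and only show an orthogonal case and a nearly collinear case.

```python
from math import sqrt


def correlation_matrix(columns):
    """Pearson correlation matrix of the predictor columns."""
    n = len(columns[0])
    scaled = []
    for col in columns:
        mean = sum(col) / n
        dev = [v - mean for v in col]
        norm = sqrt(sum(d * d for d in dev))
        scaled.append([d / norm for d in dev])
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in scaled] for ci in scaled]


def invert(matrix):
    """Gauss-Jordan inverse of a small square matrix."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def variance_inflation_factors(columns):
    """VIF_j is the j-th diagonal element of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]


# Hypothetical data: an orthogonal 2^2 factorial and two nearly collinear columns.
orthogonal = [[-1, -1, 1, 1], [-1, 1, -1, 1]]
collinear = [[1, 2, 3, 4, 5, 6], [1.1, 1.9, 3.2, 3.8, 5.1, 6.2]]

print(variance_inflation_factors(orthogonal))
print(variance_inflation_factors(collinear))
```

An orthogonal design gives VIFs of exactly 1, and the rule of thumb used in these slides flags values above 10.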

Design Evaluations: The prediction variance at any factor setting is the product of the error variance and a quantity that depends only on the design and the factor setting. That quantity, the ratio of the prediction variance to the error variance, is called the relative variance of prediction and can be calculated before any data are acquired. Ideally the prediction variance is small throughout the allowable region of the factors.

Design Evaluations: Prediction Variance Profile. The prediction variance shown (0.5) is relative to the error variance. For example, if the prior estimate of the experimental error variance (MSE) is 10, then the prediction variance of y at the center value of x1 (= 0) is 10 × 0.5 = 5. Control-click on a factor to set a factor level of your choice, or drag the vertical trace lines to move the factor settings to different points.
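For reference, the relationship can be written out using the standard least-squares result. The symbols are not from the slides: $\mathbf{X}$ denotes the model matrix of the design, $\mathbf{f}(\mathbf{x})$ the vector of model terms at the factor setting $\mathbf{x}$, and $\sigma^2$ the error variance.

```latex
\operatorname{Var}\big(\hat{y}(\mathbf{x})\big)
  = \sigma^2 \, v(\mathbf{x}),
\qquad
v(\mathbf{x}) = \mathbf{f}(\mathbf{x})^{\top}
  \left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1} \mathbf{f}(\mathbf{x})
```

Here $v(\mathbf{x})$ is the relative variance of prediction. With $\sigma^2 = 10$ and $v(\mathbf{x}) = 0.5$, the prediction variance is $10 \times 0.5 = 5$, as in the example.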

Design Evaluations: Prediction Variance Profile. The Maximize Desirability command on the Prediction Variance Profile title bar identifies the maximum prediction variance (1.321) for the model. Viewing the prediction variance profilers of two designs side by side is one way to compare the designs.

Design Evaluations: Fraction of Design Space. The Fraction of Design Space (FDS) plot shows how much of the design space has a prediction variance above (or below) a given value. The X axis is the fraction of the design space, ranging from 0 to 100%, and the Y axis is the relative prediction variance. The 90th-percentile prediction variance is a useful summary in a variety of scenarios. Here, the crosshair tool shows that 90% of the possible factor settings have a relative prediction variance less than 0.91.
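The idea behind the FDS plot can be sketched with a toy case: evaluate the relative prediction variance at many points of the design space, sort the values, and read off a percentile. The example assumes a straight-line model in one factor with four runs split evenly between -1 and +1, for which the relative variance has the closed form (1 + x^2) / 4. It is not the design from these slides, so it does not reproduce the 0.91 value.

```python
def relative_variance(x: float, n_runs: int = 4) -> float:
    """Straight-line model, half the runs at -1 and half at +1: (1 + x^2) / n."""
    return (1.0 + x * x) / n_runs


def fds_curve(n_points: int = 10001) -> list:
    """Sorted relative variances over an even grid on [-1, 1] (the FDS curve)."""
    xs = [-1.0 + 2.0 * k / (n_points - 1) for k in range(n_points)]
    return sorted(relative_variance(x) for x in xs)


def fds_quantile(curve: list, fraction: float) -> float:
    """Relative variance not exceeded on the given fraction of the design space."""
    index = min(len(curve) - 1, int(fraction * (len(curve) - 1)))
    return curve[index]


curve = fds_curve()
q90 = fds_quantile(curve, 0.90)
print(curve[0], q90, curve[-1])
```

Plotting the sorted values against their rank fraction gives the FDS curve; a flatter and lower curve indicates a design that predicts well over more of the region.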

Design Evaluations: Signal-to-Noise Ratio. The signal-to-noise ratio (often abbreviated SNR or S/N) is a measure used in science and engineering to quantify how much a signal has been corrupted by noise. It is defined as the ratio of the signal power to the power of the noise corrupting the signal; a ratio higher than 1:1 indicates more signal than noise. In less technical terms, it compares the level of a desired signal (such as music) to the level of background noise, and the higher the ratio, the less obtrusive the background noise. In statistical analysis, the signal is the regression coefficient of a model term and the noise is the experimental (model) error expressed as a standard deviation.

Design Evaluations: Power. The Power column shows the power of the design, as specified, to detect effects of a given size (SNR) at a given significance level. Here, assume the model error standard deviation is σ = 2.5 and the true value of the coefficient β2 for X2 is 2.5. Then SNR = 2.5/2.5 = 1.0, and the probability (power) of detecting such an effect is 0.402 at significance level 0.05. In JMP, you can change the SNR setting (e.g. Signal to Noise = 1, i.e. β2/σ = 1) and the significance level (0.05).

Design Evaluations: Power of the Design. For example, if the coefficient of Extraction Temp is large relative to the random noise, say β2 = 5.0, twice as large as the noise σ = 2.5 (SNR = 2), we have more power to detect it (0.909 versus 0.402).

Design Evaluations: As the significance level increases, the power increases. As the signal-to-noise ratio increases, the power increases. Note: if your design turns out to have very low power even at large signal-to-noise ratio settings, you need to question whether the experiment is worth running.
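Both monotonic relationships can be seen in a simplified sketch. It uses a two-sided normal approximation rather than the exact calculation, so it does not reproduce the 0.402 and 0.909 values reported by JMP. The parameter `relative_std_error` (the standard error of the coefficient divided by σ, which depends on the design) and its value 0.5 are assumptions made for illustration.

```python
from statistics import NormalDist


def approximate_power(snr: float, relative_std_error: float, alpha: float = 0.05) -> float:
    """Two-sided power under a normal approximation (ignores error degrees of freedom)."""
    z = NormalDist()
    critical = z.inv_cdf(1 - alpha / 2)
    # Expected test statistic: (beta / sigma) / (se(beta_hat) / sigma)
    shift = snr / relative_std_error
    return (1 - z.cdf(critical - shift)) + z.cdf(-critical - shift)


# Hypothetical design quantity: se(beta_hat) / sigma = 0.5
power_snr1 = approximate_power(1.0, 0.5)
power_snr2 = approximate_power(2.0, 0.5)
power_alpha10 = approximate_power(1.0, 0.5, alpha=0.10)
print(round(power_snr1, 3), round(power_snr2, 3), round(power_alpha10, 3))
```

Doubling the SNR or relaxing the significance level from 0.05 to 0.10 both raise the approximate power, in line with the two statements above.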

Design Evaluations: Design Diagnostics. Each efficiency measure is a single number that attempts to quantify one design characteristic. While the maximum efficiency is 100% for any criterion, an efficiency of 100% is impossible for many design problems. It is best to use these measures to compare competing designs with the same model and number of runs rather than as an absolute measure of design quality. D: minimize the volume of the joint confidence region for the regression coefficients. G: minimize the maximum scaled prediction variance. A: minimize the sum of the variances of the regression coefficients.

Why Augmented Design? Experimentation is an iterative process; we cannot assume that one successful screening experiment has optimized our process. Four common reasons for an unsatisfactory experiment are: the specified model is inadequate; the results predicted from the experiment are not reproducible; many trials failed; important conditions, often an optimum, lie outside the experimental region.

Motivation for Augmented Design: Model inadequacy. Inadequacies of the model may be revealed during the analysis of the data, and the investigated relationship may be more complicated than expected. Addressing this may call for: inclusion of higher-order polynomial terms; transformations of the factors or the response; detection of lurking variables; correction of factor ranges that turned out to be wrong.

Motivation for Augmented Design: Prediction failure. When a model believed to be adequate fails to predict the results of new experiments correctly, the cause may be systematic differences between the new and the old observations. Possible sources are: a lurking variable; biases introduced during model selection; a model that is too parsimonious.

Motivation for Augmented Design: Failing trials. If many individual trials fail, there may not be sufficient data to estimate the parameters of the model. It is important to find out whether there was a technical mishap or whether something more fundamental is amiss.

Motivation for Augmented Design: Optimum issues. When the optimum of the response, or of some performance characteristic, appears to lie appreciably outside the present experimental region, experimental confirmation of this prediction is necessary.

Motivation for Augmented Design: We now consider the augmentation of a design by the addition of a specified number of new trials. Reasons for augmentation include the need for a higher-order model, a different design region, the introduction of a new factor, or the removal of unimportant factors. The new design will depend on the trials for which the response is known, although usually not on the values of the responses.

Examples of Augmented Design: A chemical engineer investigates the effects of six factors on the percent reaction yield of a chemical process. A 2^(6-2) fractional factorial design is run with 2 additional center runs. The data show significant curvature, but collinearity prevents us from identifying which factor or factors are responsible. The engineers would also like to know whether increasing the amount of catalyst from its current concentration of 0.2 M gives a higher yield. With 3 selected factors plus the catalyst concentration, an augmented design with 8 additional runs is used to fit the response surface model for the four factors.

Augmented Design in JMP: Open Augmented design 18 Runs.JMP in JMP. Right-click the red triangle next to Screening > Run Script.

Augmented Design in JMP: The screening analysis indicates a curvature effect for X3, but this effect is completely aliased with the other quadratic effects, such as X1² and X4².
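The aliasing of the quadratic effects follows from the structure of the design: in a two-level design with center runs, every squared column equals 1 on the factorial runs and 0 on the center runs, so the squared columns are identical. The sketch below assumes the common generators E = ABC and F = BCD for the 2^(6-2) design; the slides do not state which generators were used, and the conclusion does not depend on them.

```python
from itertools import product


def fractional_factorial_6_2():
    """2^(6-2) design: full factorial in A-D, with assumed generators E=ABC, F=BCD."""
    runs = []
    for a, b, c, d in product([-1, 1], repeat=4):
        runs.append((a, b, c, d, a * b * c, b * c * d))
    return runs


# 16 factorial runs plus 2 center runs gives the 18-run design.
design = fractional_factorial_6_2() + [(0,) * 6] * 2

# Squared column for each of the six factors.
quadratic_columns = [[run[j] ** 2 for run in design] for j in range(6)]
all_identical = all(col == quadratic_columns[0] for col in quadratic_columns)
print(len(design), all_identical)
```

Because the squared columns coincide, the data can show that curvature exists but cannot attribute it to a particular factor, which is what the extra runs of the augmented design are meant to resolve.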

Augmented Design in JMP: Right-click the red triangle next to Full Factorial Model with X1, X3 and X4 > Run Script. The lack-of-fit test indicates that the full factorial model is not adequate.

Augmented Design in JMP: To augment the design with a different model specification, choose DOE > Augment Design, then click OK.

Augmented Design in JMP: Under Augmentation Choices, click Augment. You can change the upper and lower limits for the new runs, which gives a different prediction region. Note that the catalyst now has an upper bound of 0.8. The additional runs are usually performed on a different day, so we may treat the day as a blocking effect in the model by checking the box that groups the new runs into a separate block.

Augmented Design in JMP: Specify the RSM model. Again, the design is model dependent, so specifying the correct model is crucial for a satisfactory experiment. In an augmented design you would usually change some model terms, such as interactions and quadratic effects.

Augmented Design in JMP: Choose I-optimal as the optimality criterion. Set Number of Starts to 1000 and the random seed to 1000. Specify the total number of runs, including the original runs (18 + 8 = 26). Click Make Design, then click Make Table.

Augmented Design in JMP New runs grouped in block 2.

Augmented Design in JMP: [Figure: additional runs and original design points.] For X1, X3 and X4, the original runs lie on corner and center points, and the added runs lie on axial and face points.

Augmented Design in JMP: Open Augmented design 26 Runs.JMP in JMP. Right-click the red triangle next to Model > Run Script. The lack-of-fit test is not significant for the RS model. From the effect tests, the quadratic effects X1² and X3² are significant, and the catalyst effect is significant at alpha = 0.1.

Augmented Design in JMP: Right-click the red triangle next to Reduced Model > Run Script. The lack-of-fit test is not significant for the RS model. All model effects are significant at alpha = 0.05.

Augmented Design in JMP: Right-click the red triangle next to Prediction Profile > Maximize Desirability. The predicted yield is maximized at X1 = 1, X3 = 1, X4 = -1 and Catalyst = 0.8, with a predicted value of 27.25. Confirm this with an additional run.

Design from a Candidate Set: Motivations for Designs Based on a Candidate Set; What to Consider When Using Candidate Set Design; Candidate Set Design in JMP

What is a Candidate Set? The candidate set of design points is the total group of possible data points from which the actual design points can be chosen. For example, to construct quantitative structure-activity relationship (QSAR) models, which summarize a supposed relationship between the chemical structures and the biological activity of chemicals, the chemist may search a chemical compound database for candidate compounds. A QSAR has the form: [equation not preserved in the transcription]

Why Design Based on a Candidate Set: We may not have full control of the experimental factors and may be limited to a choice among certain factor combinations. The design space may be complex, with irregular factor settings and complicated nonlinear factor constraints. As the term suggests, candidate set design helps us pick the best design points from the candidate set with respect to some criterion.
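On a very small scale, picking the best points from a candidate set can be sketched by exhaustive search: enumerate every subset of the required size and keep the one with the largest det(X'X). The candidate values and the one-factor quadratic model below are hypothetical. Enumeration is only practical for tiny problems, and design software typically relies on heuristic search instead.

```python
from itertools import combinations


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def information_matrix(points):
    """X'X for the quadratic model with terms 1, x, x^2."""
    rows = [[1.0, x, x * x] for x in points]
    return [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]


# Hypothetical candidate set: the only factor settings available to us.
candidate_set = [-1.0, -0.5, 0.0, 0.5, 1.0]
n_runs = 3

# combinations() never repeats an element, so each candidate is used at most once.
best_design = max(
    combinations(candidate_set, n_runs),
    key=lambda pts: det3(information_matrix(pts)),
)
best_det = det3(information_matrix(best_design))
print(best_design, best_det)
```

The search picks the two extremes and the center, the familiar D-optimal support for a quadratic in one factor.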

Design from a Candidate Set: Example (Mitchell 1974a). An animal scientist wants to compare wildlife densities in four different habitats over a year. However, due to the cost of experimentation, only 12 observations can be made. A model is postulated for the density in each habitat during each month [equation not preserved in the transcription]. This model includes the habitat as a classification variable, the effect of time as an overall linear drift term, and cyclic behavior in the form of a Fourier series. There is no intercept term in the model. Since there are 12 months and 4 habitats, we can create a candidate set with 48 points.

Design from a Candidate Set: Open Mitchell.csv in JMP. The data set contains the 48 candidate points and includes the four cosine variables (c1, c2, c3 and c4) and the three sine variables (s1, s2 and s3).
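A candidate set with this layout can be generated programmatically. The sketch below builds the 48 habitat-by-month combinations and attaches four cosine and three sine columns. The base angle π × Month / 4 is an assumption: the slides do not state the frequency, and this choice is used only because it makes a fourth sine term identically zero, which is consistent with having three sine variables. It is not guaranteed to reproduce Mitchell.csv.

```python
from math import cos, sin, pi


def build_candidate_set():
    """All 4 x 12 habitat-month combinations with Fourier columns attached."""
    rows = []
    for habitat in range(1, 5):
        for month in range(1, 13):
            theta = pi * month / 4.0   # assumed base angle, not given on the slide
            row = {"Habitat": habitat, "Month": month}
            for k in range(1, 5):
                row[f"c{k}"] = cos(k * theta)
            for k in range(1, 4):
                row[f"s{k}"] = sin(k * theta)
            rows.append(row)
    return rows


candidates = build_candidate_set()
print(len(candidates), sorted(candidates[0].keys()))
```

Each row can then be written to a CSV file and loaded into JMP as covariate factors.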

Design from a Candidate Set: JMP will not randomize a candidate set design, so randomize it manually. Because the design space is limited, we may end up with a design that has severe collinearity. Look at the variance inflation factors (VIF) to assess the lack of orthogonality.

Design from a Candidate Set: Be sure to assess the quality of the design (e.g. FDS plots, statistical power, relative variance of coefficients). Check your final design space to diagnose possible problems. Remember that each design point in the candidate set can be selected only once.

Candidate Set Design in JMP: Change the data type of some factors in the JMP file. Right-click on the Habitat column > Column Information > Data Type > Character > OK. You need to specify the correct data type at this step because it cannot be changed in Custom Design.

Candidate Set Design in JMP: Choose DOE > Custom Design > Add Factor > Covariate. In JMP 9, factors have to be selected one at a time. JMP treats Habitat with the categorical data type it has in the candidate set file.

Candidate Set Design in JMP Model Specification

Candidate Set Design in JMP: In Custom Design, under Optimality Criterion, choose Make D-Optimal Design. Set the seed to 193030034 and Number of Starts to 100. Specify 12 in Number of Runs. Click Make Design, then click Make Table.

Candidate Set Design in JMP: The candidate set design and its evaluations are shown non-randomized. Note that for this design we cannot actually randomize, because of the nature of the Month factor. Let us assume it can be randomized, for illustration purposes.

Candidate Set Design in JMP: To randomize and calculate the VIFs manually in JMP, you need response data. Right-click on the Y column > Formula. Scroll in the Functions box, choose Random > Random Uniform, then click the OK button. Right-click on the Y column > Sort. Click the red triangle next to Model in the upper-left panel of the data table > Run Script. Check No Intercept. Click the Run Model button. Scroll to the Parameter Estimates section. Right-click on the table > Columns > VIF.

Candidate Set Design in JMP: Run the experiments in this randomized order. All VIFs are below 10, so there is no severe multicollinearity problem.
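The manual randomization described above (attach a uniform random number to each run, then sort by it) can be mirrored in a few lines. The 12-run design below is a hypothetical list of (habitat, month) pairs used only to show the mechanics; the seed argument is optional and exists so that a run order can be reproduced.

```python
import random


def randomize_run_order(design, seed=None):
    """Attach a uniform random key to every run and sort by that key."""
    rng = random.Random(seed)
    keyed = [(rng.random(), run) for run in design]
    keyed.sort(key=lambda pair: pair[0])
    return [run for _, run in keyed]


# Hypothetical 12-run design: (habitat, month) pairs.
design = [(h, m) for h, m in zip([1, 2, 3, 4] * 3, range(1, 13))]
run_order = randomize_run_order(design, seed=2012)
for position, run in enumerate(run_order, start=1):
    print(position, run)
```

The randomized list contains exactly the same runs as the original design; only the execution order changes.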

More capabilities of JMP: 1. Split-Plot/Split-Split-Plot Design 2. Saturated and Supersaturated Design 3. Mixture Design 4. Choice Design 5. Nonlinear Design 6. Space-Filling Design / Design for Computer Experiments

Workshop Scenario 1: Suppose an experimenter is investigating the properties of a particular adhesive, where x1 is the amount of adhesive and x2 is the cure temperature. The prior knowledge is: with too little adhesive and too low a cure temperature, the parts will not bond; with both factors at high levels, the parts will either be damaged by heat stress or bond inadequately. The model of interest is a nonstandard model [equation not preserved in the transcription]. Also, due to the budget constraint, only 30 runs can be afforded.

Workshop Scenario 2: Meyer et al. (1996) demonstrate how to use the augment designer in JMP to resolve ambiguities left by a screening design. In this study, a chemical engineer investigates the effects of five factors on the percent reaction of a chemical process. To begin, open Reactor 8 Runs.jmp and augment this design with 8 additional runs to incorporate all two-factor interactions. Set the seed to 12834729. After you create the design, you can open Reactor Augment Data.jmp to do the variable selection using stepwise regression. Note: choose P-value Threshold from the Stopping Rule menu and Mixed from the Direction menu, and make sure Prob to Enter is 0.050 and Prob to Leave is 0.100. These are not the default values.

Workshop Scenario 3: An automotive engineer wants to fit a quadratic (response surface) model to fuel consumption data in order to find the values of the control variables that minimize fuel consumption (refer to Vance 1986). The three control variables, AFR (air fuel ratio), EGR (exhaust gas recirculation) and SA (spark advance), and their possible settings are shown in the following table:

Variable   Values
AFR        15, 16, 17, 18
EGR        0.020, 0.177, 0.377, 0.566, 0.921, 1.117
SA         10, 16, 22, 28, 34, 40, 46, 52

Rather than run all 192 (4 × 6 × 8) combinations of these factors (saved as candidate set workshop 3.csv), the engineer would like to see whether the total number of runs can be reduced to 50 in an optimal fashion.
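The full candidate set for this scenario is the Cartesian product of the listed settings, which a short sketch can enumerate. The variable names are illustrative, and the result is not guaranteed to match the row order of the workshop CSV file.

```python
from itertools import product

afr = [15, 16, 17, 18]
egr = [0.020, 0.177, 0.377, 0.566, 0.921, 1.117]
sa = [10, 16, 22, 28, 34, 40, 46, 52]

# Every combination of the three control variables: 4 x 6 x 8 rows.
candidate_set = list(product(afr, egr, sa))

# A full quadratic model in 3 factors has 1 + 3 + 3 + 3 = 10 terms,
# so a 50-run subset leaves ample degrees of freedom for error.
n_model_terms = 1 + 3 + 3 + 3
print(len(candidate_set), n_model_terms)
```

Each row of `candidate_set` is one (AFR, EGR, SA) combination from which the 50 design runs would be chosen.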