Process Capability Calculations with Extremely Non-Normal Data

Process Capability Calculations with Extremely Non-Normal Data copyright 2015 (all rights reserved), by: John N. Zorich Jr., MS, CQE Statistical Consultant & Trainer home office: Houston TX 408-203-8811 (cell) www.johnzorich.com JOHNZORICH@YAHOO.COM Reliability Plotting One of the best ways to determine process capability with extremely non-normal data is to use Reliability Plotting. "Reliability" is the % of the lot that meets specification. Reliability plotting involves analyzing the data graphically using the methods and formulas of linear regression. Sample Sizes The output of reliability plotting is a lower 1-sided confidence limit on the observed %-in-specification, at the confidence level used in the calculation. Any sample size is valid, because the smaller the sample size, the farther away from the observed %-inspec is the %-in-spec that you can claim. 90% 100% Confidence limit based upon small sample Confidence limit base upon large sample % reliability observed in sample Houston Regional Quality Conference, Nov. 13, 2015 Copyright 2015 by John Zorich 1

An incorrect distribution assumption can make huge differences in claimed % in-specification. What can be done if raw data cannot be transformed to Normality? QC Spec is 1.00 0 1.00 2.00 Charts copied from http://www.itl.nist.gov/div898/handbook/eda/section3/eda366.htm 10 devices put on-test: a geometric progression: 1, 2, 4, 8, 16, 32, 64, 128, 256, 512 30 devices put on-test: only 9 life-failures @ hours: 367, 422, 476, 508, 552, 589, 642, 683, 738 10 devices put on-test: tensile failures: 0.35, 0.35, 0.35, 0.35, 0.68, 1.22, 1.56, 1.91, 2.17, 2.18 60 sterile-barrier pouches put on-test: burst-pressure results show two distributions Reliability Plotting is one of the best ways to calculate process capability with data such as on previous slide. Reliability Plotting requires a transformation that gives a straight line (by Linear Regression) that can be extrapolated to the Specification value. % 1 Reliability Specification DATA Extrapolating a curved line is never discussed in reliability textbooks. 95% Confidence limit Transformed Specification Transformed Confidence limit on extrapolated point can be calculated by a standard linear regression formula. SEE NEXT SLIDE, FOR REFERENCE AFTER SEMINAR Houston Regional Quality Conference, Nov. 13, 2015 Copyright 2015 by John Zorich 2

Formula for calculating the 1-sided confidence limit on the mean Y-value at a single X-value on a linear regression line (i.e., the transformed Y-value for hollow triangle on previous slide) (note: The generalized linear regression equation is... Yei = a + b Xi ) = YSL +/ t x See x [(1 / N )+(( XSL Xavg )^2) / (Sum(( Xi Xavg)^2 ))]^0.5 where... YSL = transformed Y-axis value corresponding to the Specification Limit (i.e., transformed Y-value for solid triangle on the chart on previous slide) +/ = use + if out-of-specification is below the spec limit; otherwise use t = one-side, t-table value at alpha = 1 Confidence, and df = N 2 using Excel: t = TINV ( 2 * (1 Confidence), ( N 2 ) ) See = Std Error of Estimate = [ ( Sum ( ( Yei Yi ) ^2 ) ) / ( N 2 ) ] ^0.5 Yi = transformed plotted Y-axis value corresponding to a plotted Xi N = number of X,Y points plotted on the chart (not the same as sample size) XSL = transformed Specification Limit Xavg = average of the transformed X values of the plotted X,Y points (this is not the same as the average of the transformed raw data) (do not include the specification limit in this average) Xi = each of the transformed X values of the plotted X,Y points "Sum" here means to add up each (Xi Xavg)^2, from i = 1 thru i = N Decimal %Cumulative vs. Median Rank Decimal %cumulative = ( Rank ) / ( SampleSize ) where rank" of the lowest value in the data set = 1, next lowest value = 2, next lowest = 3, and so on, and Sample Size = total # of devices or parts being tested. Median Rank can be calculate by either... = ( Rank 0.3 ) / ( SampleSize + 0.4 ) = BETAINV ( 0.5, Rank, SampleSize Rank + 1 ) Those 2 median-rank formulas give the same result to MANY digits. On Y-axis, plot either... decimal %cumulative, or median rank. Reliability Plotting: EXACT vs. INTERVAL Rank 1 2 3 4 5 6 7 8 9 10 RAW DATA CUMULATIVE MEDIAN ( SORTED ) PERCENT DECIMAL RANK 78.2 10% 0.10 0.067 87.3 20% 0.20 0.163 89.9 30% 0.30 0.260 94.8 40% 0.40 0.356 99.7 50% 0.50 0.452 103.5 60% 0.60 0.548 108.9 70% 0.70 0.644 112.1 80% 0.80 0.740 118.6 90% 0.90 0.837 124.8 ( cannot plot 100%!! ) 0.933 Exact data are the individual measurement values obtained during a study. It is usually better to pool identical values into groups ("intervals"), the way it is done for a histogram. VERY IMPORTANT: Plot Median Rank on Y-axis for exact data. Plot decimal %cumulative for interval data. 26 Houston Regional Quality Conference, Nov. 13, 2015 Copyright 2015 by John Zorich 3

Examples of useful X-axis Transformations ( using Excel Formulas ) In the formulas below, A, B, C, D, and E are constants (negative or positive, whole numbers or fractional) chosen to help linearize the reliability plot. = ASINH ( SQRT ( X ) ) = SQRT ( X + A ) = ( ( X ^ B ) 1 ) / B (Box-Cox Transformation) = ASINH ( X / C ) (from Johnson SU Transformation) = LN ( X + D ) (from Johnson SL Transformation) = 1 / ( X + E ) Examples of useful Y-axis Transformations ( using Excel Formulas ) F = Median Rank (exact) or Decimal%Cum (interval) C = the user-chosen shape parameter constant ** = other parameters set to a value of 1.000 = NORMSINV ( F ) ( = Normal ) = NORMSINV [ 1 (1 F ) ^ ( 1 / C ) ] ** = EXP ( NORMSINV [ 1 (1 F ) ^ ( 1 / C ) ] ) ** = LN ( LN ( 1 / ( 1 F ) ) ) ( = SEV ) = LN ( 1 / ( 1 F ) ) ^ ( 1 / C ) ** ( = Weibull ) = LN ( F / ( 1 F ) ) ( = Logistic ) = LN ( 1 / ( LN ( 1 / F ) ) ) ( = LEV ) = TAN ( PI() * ( F 0.5 ) ) ( = Cauchy ) 20 10 devices put on-test; results have a geometric progression: 1, 2, 4, 8, 16, 32, 64, 128, 256, 512 30 units on-test; 9 life-failures occurred at hours: 367, 422, 476, 508, 552, 589, 642, 683, 738 Largest Extreme Value distribution yields 99.8% reliable at 95% confidence vs. spec of 0.100 16 Data is right censored ; that is, the 21 data points on the right side of the time line (i.e., greater than 738 hours) are not known. But sample size = 30 is used for calculation of y-axis median-rank values. Power- LogNormal distribution yields 99.8% reliable at 95% confidence vs. spec of 300 Houston Regional Quality Conference, Nov. 13, 2015 Copyright 2015 by John Zorich 4

10 devices put on-test; tensile failures were 0.35, 0.35, 0.35, 0.35, 0.68, 1.22, 1.56, 1.91, 2.17, 2.18 Leftmost dot represents the four 0.35 values (%cumulative = 40%) that are left-censored. Weibull distribution yields 62% reliable at 95% confidence vs. spec of 0.100 The 2.18 value is right-censored, i.e. not plotted, which makes it easier to find the transformation to linearity for this interval data. 16 60 sterile-barrier pouches put on-test; Failure (burst under pressure) occurred either at one of the 3 seams created by the pouch manufacturer, or at the seam created by purchaser s testing personnel. Sample shows 2 separate distributions: QC-spec here user s seal Mfg s seal 16 Burst strength (actual data) of very large, sterile-barrier pouches 60 pouches tested; the minimum spec was 0.40 psi. The (sorted) raw data was... 0.48 0.55 0.60 0.62 0.68 0.72 0.48 0.56 0.61 0.63 0.68 0.73 0.49 0.56 0.61 0.64 0.68 0.75 0.51 0.56 0.61 0.64 0.68 0.76 0.52 0.56 0.61 0.64 0.69 0.83 0.53 0.57 0.61 0.65 0.69 0.86 0.53 0.58 0.61 0.66 0.69 0.86 0.53 0.59 0.61 0.66 0.71 0.99 0.53 0.59 0.62 0.66 0.72 1.02 0.55 0.59 0.62 0.67 0.72 1.22 Burst strength Many replicate values, & therefore should (usually) plot as interval data rather than exact data. 0.48 0.55 0.60 0.62 0.68 0.72 0.48 0.56 0.61 0.63 0.68 0.73 0.49 0.56 0.61 0.64 0.68 0.75 0.51 0.56 0.61 0.64 0.68 0.76 0.52 0.56 0.61 0.64 0.69 0.83 0.53 0.57 0.61 0.65 0.69 0.86 0.53 0.58 0.61 0.66 0.69 0.86 0.53 0.59 0.61 0.66 0.71 0.99 0.53 0.59 0.62 0.66 0.72 1.02 0.55 0.59 0.62 0.67 0.72 1.22 Houston Regional Quality Conference, Nov. 13, 2015 Copyright 2015 by John Zorich 5

How to convert the ( n = 60 ) data, with its many exact but replicate values, into interval data: 0.48 0.55 0.60 0.48 0.56 0.61 0.49 0.56 0.61 0.51 0.56 0.61 0.52 0.56 0.61 0.53 0.57 0.61 0.53 0.58 0.61 0.53 0.59 0.61 0.53 0.59 0.62 0.55 0.59 0.62 Do NOT plot zero values; e.g., do not plot 54 -- 0 48 -- 2 = 2/60 = 0.033 49 -- 1 = 3/60 = 0.050 51 -- 1 = 4/60 = 0.067 52 -- 1 = 5/60 = 0.083 53 -- 4 = 9/60 = 0.150 55 -- 1 56 -- 4 57 -- 1 58 -- 1 59 -- 3 60 -- 1 61 -- 7 etc. Burst strength for n=60 pouches On a basic cumulative plot ( i.e., no transformations of either axis ) the data set comprises 2 different populations: Right-censoring is valid only if testing vs. a lower spec limit! Must "censor" these data, so that they do not interfere with identifying best transformation. Burst strength for n=60 pouches If convert from "exact" to "interval" and censor values above 0.8 psi, then results are... 99.9 %Reliability at 95% confidence using... Log(X) vs Z(%cum) ( i.e., LogNormal ) When analyzing reliability vs. upper specification limit, typically cannot use interval method because get nonsense results (> 100% reliability!), therefore must use exact method: Transformed Cumulative Decimal Fraction.. Transformed Cumulative Decimal Fraction.. 180% 160% 140% 120% 100% 80% 60% 40% 20% 0% 4 3 2 1 0-1 -2-3 0.2 0.25 0.3 0.35 0.4 0.45 0.5 0.55 Transformed Treatment Values 0.2 0.25 0.3 0.35 0.4 0.45 0.5 0.55 INTERVAL yields 140% reliability, which is nonsense. EXACT yields 99.9% reliability. Transformed Treatment Values Houston Regional Quality Conference, Nov. 13, 2015 Copyright 2015 by John Zorich 6

Reliability vs. 2-sided specification limit : calculate each one separately, using the same confidence level and transformation in each case, then pool the information like so: % out-ofspec vs. 100% lower spec % out-ofspec vs. upper spec For example: if reliability vs. lower spec is 97% and vs. upper is 96%, then overall reliability... = 100% (100% 97%) ( 100% 96%) = 100 3 4 = 93% reliability at 95% confidence Reliability Plotting vs. 2-sided specification limits : 98% reliable at Upper Spec Limit Therefore we are 95% confident that the population is 93% in-spec (100% 3% 4% = 93%) 97% reliable (= 3% out-of-spec) at upper 95% confidence limit 96% reliable (= 4% outof-spec) at lower 95% confidence limit 99% reliable at Lower Spec Limit Linearity of a Reliability Plot must be determined visually, NOT by trusting the correlation coefficient. 1000 0.955 800 0.955 600 0.955 400 200 Correlation Coefficients 0 0 5 10 15 20 Examples of Y-axis Reverse Transformations ( using Excel Formulas ) To determine the % Reliability, the y-axis confidence limit must be reverse transformed. For example... If Smallest Extreme Value (= SEV) is used, & the upper 95% confidence limit on an extrapolated lower specification is 2.820, then... 2.820 = LN ( LN ( 1 / ( 1 F ) ) ) F = 1 1 / EXP(EXP( 2.820)) = 0.05876 = 5.786% %Reliability = 100% 5.786% = 94.214% at 95% confidence 20 Houston Regional Quality Conference, Nov. 13, 2015 Copyright 2015 by John Zorich 7

In summary... Reliability Plotting requires the user to make decisions about... what transformations yield best straight line what data needs to be converted to intervals what data needs to be censored Spreadsheets can be used to perform reliability plotting; or high-end statistics programs. Biggest challenge is to explain the method to auditors (in the speaker s 15 years of experience using reliability plotting, such auditors are very receptive to this method, or very intimidated by it). There is no one textbook that discusses all of the Reliability Plotting topics that were demonstrated in this seminar, but the following textbooks discuss various individual aspects of it: Dovich: Quality Engineering Statistics Dovich: Reliability Statistics Juran: Juran s Quality Handbook Natrella: Experimental Statistics NIST Engineering Statistics Internet Handbook, found at http://www.itl.nist.gov/div898/handbook/index.htm Tobias & Trindade, Applied Reliability This has an entire chapter on reliability plotting. To receive some relevant computer files, including a spreadsheet that demonstrates Reliability Plotting, send an email request to... JOHNZORICH@YAHOO.COM or visit: www.johnzorich.com Houston Regional Quality Conference, Nov. 13, 2015 Copyright 2015 by John Zorich 8