Complexity Challenges to the Discovery of Relationships in Eddy Current Non-destructive Test Data


1 Complexity Challenges to the Discovery of Relationships in Eddy Current Non-destructive Test Data. CPT John R. Brence, United States Military Academy; Donald E. Brown, PhD, University of Virginia.

2 Outline: Background Information; Eddy Current Non-destructive Tests (NDT); Approach; Algorithms; Results; Interpretation of Results; Conclusions; Future Research; Questions.

3 Background 1 of 2: Many commercial and military aircraft have reached or exceeded their original design life. USAF aircraft ... years old; KC-135 (40 yrs. old) extended for 25 years. Civilian airlines: Boeing 727 family, introduced in the '60s. Corrosion is a serious threat, especially to older aircraft: significant increase in maintenance costs; increasing concern about structural integrity.

4 Background 2 of 2: Aloha Flight.

5 Relationship Hypothesis 1 of 2: Relationship between calibration specimen results and the classification of corrosion on KC-135 parts. Current artificial corrosion processes show characteristics similar to those found in naturally corroded lap joints.

6 Relationship Hypothesis 2 of 2. Figure panels: Natural Corrosion (KC-135 Specimen); Artificial Corrosion (Calibration Specimen).

7 Eddy Current Non-destructive Tests: Problems with the current visual representation. It requires considerable expertise to create and interpret. The need for visual clarity leads to data generalization, averaging, or overlooked points, which calls accuracy into question. Missed corrosion may cause a catastrophic accident.

8 Approach Outline: Data acquisition; Data transformation & consistency; Model development & feature selection; Model training & testing; Model evaluation; Model selection.

9 Approach Graphic. Flowchart with the labels: Data acquisition; Data transformation; Consistent data (No/Yes); Feature selection; Model development; Data mining; Iterate; Variable transform; Model testing & evaluation; Model selection.

10 Approach: Data Acquisition. Institute of Aerospace Research: calibration specimens; retired KC-135 specimens. Data (n, m, nm): induced voltage measurements from multi-frequency scans. Figure: calibration specimen E1 with scan direction.

11 Approach: Data Transformation & Consistency 1 of 5. Eddy current specimen E1 data: the 4 scan frequencies (5.5 kHz, 8 kHz, 17 kHz, and 30 kHz) are the 4 predictor variables. Merged the 4 scan frequency files into one file.
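
The merge step above can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes each frequency scan is a grid of voltages with the same shape, and the function name `merge_scans` and the toy values are hypothetical. The column labels k5, k8, k17, and k30 follow the labels used later in the deck.

```python
def merge_scans(scans):
    """Merge per-frequency voltage grids into one table of observations.

    scans: dict mapping a frequency label to a 2-D list (rows x cols) of voltages.
    Returns a list of dicts, one per scan position, with one column per frequency.
    """
    labels = list(scans)
    first = scans[labels[0]]
    n_rows, n_cols = len(first), len(first[0])

    # Consistency check: every frequency must cover the same grid (assumption).
    for label in labels:
        grid = scans[label]
        if len(grid) != n_rows or any(len(row) != n_cols for row in grid):
            raise ValueError(f"scan {label!r} has a different shape")

    merged = []
    for r in range(n_rows):
        for c in range(n_cols):
            obs = {"row": r, "col": c}
            for label in labels:
                obs[label] = scans[label][r][c]
            merged.append(obs)
    return merged


# Hypothetical 2 x 2 scans, one grid per frequency.
toy_scans = {
    "k5": [[0.10, 0.12], [0.11, 0.13]],
    "k8": [[0.20, 0.22], [0.21, 0.23]],
    "k17": [[0.30, 0.32], [0.31, 0.33]],
    "k30": [[0.40, 0.42], [0.41, 0.43]],
}
table = merge_scans(toy_scans)
```

Each row of `table` then holds the four predictor values for one scan position, which is the shape a regression or tree routine expects.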

12 Approach: Data Transformation & Consistency 2 of 5. Figure: scan images at 5.5 kHz, 8 kHz, 17 kHz, and 30 kHz.

13 Approach: Data Transformation & Consistency 3 of 5. Results from the PicView program. Starred areas show which picture is used to model a specific loss area. Figure labels: Image60, Image25, Image20, Finagle.

14 Approach: Data Transformation & Consistency 4 of 5. Figure panels: Specimen E1, EC bitmap data set; Specimen E1, EC data set format. Regions are labeled with percentages: 0%, 5%, 7.5%, 10%, 12.5%, 15%, 17.5%, 20%, 22.5%, 25%, 27.5%, 30%, 35%, 40%, 45%, 50%.

15 Approach: Data Transformation & Consistency 5 of 5. Eddy Current Sensitivity to Milled Thickness Loss; Specimen E1, Eddy Current Scan Data Mapping Validation. Figure: two plots of average voltage response (V) against percent material removed, with one curve per frequency: 5.5 kHz (k5), 8 kHz (k8), 17 kHz (k17), 30 kHz (k30). One plot is the graph from the original study; the other is the graph resulting from the data transformation.

16 Approach: Model Development & Feature Selection. Eddy current: four predictors and one response variable. Looked at histograms of the variables to categorize the observations' distribution. Used scaling and transformations of predictors. Feature selection (e.g., regression): stepwise, forward, and backward selection; maximum adjusted $R^2$; Mallows' $C_p$.
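
For reference, the two selection criteria named on this slide are commonly defined as below. The slides do not state which form the authors used, so these textbook definitions are an assumption. Here $n$ is the number of observations, $p$ the number of parameters in the candidate model (intercept included), $SSE_p$ its error sum of squares, $SSTO$ the total sum of squares, and $MSE_{full}$ the error mean square of the model containing all candidate terms.

```latex
R^2_{adj} = 1 - \frac{n-1}{n-p}\cdot\frac{SSE_p}{SSTO},
\qquad
C_p = \frac{SSE_p}{MSE_{full}} - (n - 2p)
```

A subset is usually preferred when its adjusted $R^2$ is large and its $C_p$ is small and close to $p$.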

17 Approach: Model Training & Testing. Calibration specimen data (eddy current specimen E1) used for training. Training and test data configuration (in general): 75% training (120,456 observations); 25% test (40,152 observations).
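
The 75/25 configuration above can be reproduced with a short sketch. The slides do not say how observations were assigned to each set, so the random shuffle, the seed, and the function name `train_test_split` are assumptions made for illustration.

```python
import random


def train_test_split(observations, train_fraction=0.75, seed=0):
    """Shuffle the observations and cut them into training and test lists."""
    shuffled = list(observations)
    random.Random(seed).shuffle(shuffled)  # seeded for repeatability
    n_train = int(round(train_fraction * len(shuffled)))
    return shuffled[:n_train], shuffled[n_train:]


# 120,456 + 40,152 = 160,608 observations in total, per the slide.
train, test = train_test_split(range(160_608))
```

With 160,608 observations the cut lands on the counts quoted on the slide.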

18 Approach: Model Training Evaluation. Akaike Information Criterion; Schwarz Criterion; coefficient of multiple determination ($R^2$); adjusted $R^2$; Mallows' $C_p$; Mean Absolute Error; Mean Squared Error.
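
Most of the criteria listed above can be computed from the fitted values alone. The sketch below uses one common least-squares form of AIC and of the Schwarz criterion, $n \ln(SSE/n) + 2p$ and $n \ln(SSE/n) + p \ln n$; the slides do not say which form was used, so that choice, the function name `training_criteria`, and the toy data are assumptions. Mallows' $C_p$ is left out because it also needs the error mean square of the full model, and MSE is taken here as $SSE/n$.

```python
import math


def training_criteria(y, y_hat, p):
    """Fit criteria for a model with p parameters (intercept included)."""
    n = len(y)
    residuals = [obs - fit for obs, fit in zip(y, y_hat)]
    sse = sum(e * e for e in residuals)
    y_bar = sum(y) / n
    ssto = sum((obs - y_bar) ** 2 for obs in y)
    return {
        "R2": 1 - sse / ssto,
        "R2_adj": 1 - (n - 1) / (n - p) * sse / ssto,
        "AIC": n * math.log(sse / n) + 2 * p,
        "SBC": n * math.log(sse / n) + p * math.log(n),
        "MAE": sum(abs(e) for e in residuals) / n,
        "MSE": sse / n,
    }


# Hypothetical responses and fitted values from a two-parameter model.
crit = training_criteria([1.0, 2.0, 3.0, 4.0], [1.5, 1.5, 3.5, 3.5], p=2)
```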

19 Approach: Model Selection. Selection of the best modeling methodology is based on a root mean squared error calculation on the test set: $MSE_{TESTSET} = \frac{1}{N} \sum_{j=1}^{N} (y_j - \hat{y}_j)^2$
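
The selection rule above amounts to scoring each candidate on the held-out data and keeping the one with the smallest error. A minimal sketch, in which the model names, the predictions, and the function names are hypothetical:

```python
import math


def rmse(y_true, y_pred):
    """Root of the test-set mean squared error."""
    n = len(y_true)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n)


def select_best_model(y_true, predictions_by_model):
    """Return the name of the lowest-RMSE model and all the scores."""
    scores = {
        name: rmse(y_true, pred) for name, pred in predictions_by_model.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical test responses (percent material loss) and two sets of predictions.
y_test = [10.0, 20.0, 30.0]
candidates = {
    "model_a": [12.0, 18.0, 30.0],
    "model_b": [10.0, 21.0, 29.0],
}
best_name, test_scores = select_best_model(y_test, candidates)
```

Because the square root is monotone, ranking by RMSE and ranking by MSE pick the same model.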

20 Algorithms 1 of 4: Multiple Regression. $Y_i = \beta_0 + \beta_1 X_{i,1} + \beta_2 X_{i,2} + \cdots + \beta_{p-1} X_{i,p-1} + \varepsilon_i$. Considered polynomial, interaction, and transformed terms.
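
Polynomial and interaction terms keep the model linear in the coefficients, so ordinary least squares still applies once the extra columns are built. The sketch below illustrates this with two predictors and second-order terms; the feature set, the toy coefficients, and the small normal-equations solver are assumptions made for illustration, not the authors' model (which the slides describe as 4th order).

```python
def expand_features(x1, x2):
    """One design-matrix row: intercept, linear, squared, interaction terms."""
    return [1.0, x1, x2, x1 * x1, x2 * x2, x1 * x2]


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_least_squares(rows, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Noise-free toy data generated from known (hypothetical) coefficients.
true_beta = [2.0, 1.0, -3.0, 0.5, 0.0, 4.0]
points = [(a, b) for a in (0.0, 1.0, 2.0, 3.0) for b in (0.0, 1.0, 2.0)]
rows = [expand_features(a, b) for a, b in points]
y = [sum(coef * v for coef, v in zip(true_beta, row)) for row in rows]
beta_hat = fit_least_squares(rows, y)
```

On noise-free data the fit recovers the generating coefficients up to rounding. For real data a QR-based solver from a numerical library is the safer choice, since the normal equations can be badly conditioned for high-order polynomials.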

21 Algorithms 2 of 4: Regression Trees. Least squares example. Figure: binary tree whose internal nodes ask a yes/no question about a predictor ("Is k ...") and whose nodes report the mean value and standard deviation of Y. Node mean values shown: 15, 12.5, 30, 15, 45, 35, 22.5, 7.5, 5.
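
A least squares regression tree grows by repeatedly choosing the yes/no question that most reduces the squared error around the child-node means. The sketch below finds one such split on a single predictor; the data, the function names, and the midpoint threshold rule are illustrative assumptions, and a full tree would apply the same search recursively to each child.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty node)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, cost, left_mean, right_mean) for the best split."""
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates identical predictor values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [yy for _, yy in pairs[:i]]
        right = [yy for _, yy in pairs[i:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[1]:
            best = (threshold, cost, sum(left) / len(left), sum(right) / len(right))
    return best


# Hypothetical predictor values and percent-loss responses.
x = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
y = [5.0, 5.0, 7.5, 30.0, 30.0, 35.0]
threshold, cost, left_mean, right_mean = best_split(x, y)
```

Here the question "Is x below 4.5?" separates the low-loss from the high-loss observations, and each child predicts its own mean.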

22 Algorithms 3 of 4: Polynomial Networks. Figure: network diagram with the labels Input A, Input B, Input C, Input D; Normalizers; 1st Layer; 2nd Layer; 3rd Layer; Single; Doublet; Triplet; Unitizers; Output.

23 Algorithms 4 of 4: Ordinal Logistic Regression. Figure: probability curves for categories j = 1, 2, 3, on a vertical axis from 0 to 1, plotted against the predictor value(s).
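
One widely used ordinal logistic model is the proportional-odds form, in which each cumulative probability $P(Y \le j)$ follows a logistic curve with its own cutpoint and a shared slope. The slides do not specify the parameterization, so the sketch below assumes that form; the cutpoints, the slope, and the function names are hypothetical.

```python
import math


def cumulative_probs(x, cutpoints, beta):
    """P(Y <= j | x) for each cutpoint, using logit P(Y <= j) = cut_j - beta * x."""
    return [1.0 / (1.0 + math.exp(-(cut - beta * x))) for cut in cutpoints]


def category_probs(x, cutpoints, beta):
    """Per-category probabilities obtained by differencing the cumulative ones."""
    cumulative = cumulative_probs(x, cutpoints, beta) + [1.0]
    probs = []
    previous = 0.0
    for value in cumulative:
        probs.append(value - previous)
        previous = value
    return probs


# Three increasing cutpoints define four ordered categories.
probs = category_probs(1.2, cutpoints=[-1.0, 0.5, 2.0], beta=0.8)
```

The cutpoints must be increasing for every category probability to be non-negative.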

24 Results: Multiple Regression 1 of 7. The more complex the model, the better it did on both the training and test datasets. The best model incorporated transformed 4th-order polynomial and interaction terms. Problem: heteroscedasticity (non-constant variance).

25 Results: Regression Trees 2 of 7. Program limitations on data size: 60,000-observation training dataset; 60,000-observation test dataset. Least squares tree: tested 2611 trees. Least absolute deviation: tested 172 trees.

26 Results: Regression Trees 3 of 7. The least absolute deviation regression tree was the best: fewer nodes (819 vs. ...); smaller complexity value (1.0 vs. 37.6); smaller root MSE for the test set.

27 Results: Regression Trees 4 of 7. Figure panels: Least Squares Regression Tree; Least Absolute Deviation Regression Tree.

28 Results: Polynomial Networks 5 of 7.

29 Results: Ordinal Logistic Regression 6 of 7. The more complex the model, the better it did on both the training and test datasets. The best model incorporated transformed 4th-order polynomial and interaction terms.

30 Results: Model Selection 7 of 7. Overall Model Comparison by Test Set. Table columns: Model; RT MSE; VAR. Table rows: Multiple Regression Model 8; Logistic Regression Model 8; Polynomial Network; LAD Regression Tree.

31 Interpretation of Results: More Complex = Better Model 1 of 3. Stitching effect: the dataset requires complex parametric models or non-parametric models. Figure: regions labeled Y=0, Y=1, Y=2, Y=4 in the plane of $x_1$ and $x_2$.

32 Interpretation of Results: More Complex = Better Model 2 of 3. Figure: stitching effect.

33 Interpretation of Results: Why LAD RT does well 3 of 3. The LAD regression tree does well because it is robust in the presence of heteroscedasticity, its partitioning provides improved accuracy, and it uses stitching to capture nonlinearities.
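
Part of the robustness claim above can be illustrated at the level of a single leaf. A least squares tree predicts the leaf mean, which minimizes squared error, while a least absolute deviation tree predicts the leaf median, which minimizes absolute error and moves far less when a few responses are extreme. The values and the function name below are hypothetical, and the example shows sensitivity to one large value rather than heteroscedasticity in general.

```python
from statistics import mean, median


def leaf_prediction(values, criterion):
    """Prediction made by a leaf under the given splitting criterion."""
    if criterion == "ls":
        return mean(values)  # minimizes the sum of squared deviations
    if criterion == "lad":
        return median(values)  # minimizes the sum of absolute deviations
    raise ValueError(f"unknown criterion: {criterion!r}")


# Hypothetical percent-loss responses in one leaf, without and with an extreme value.
clean = [10.0, 10.0, 12.5, 12.5, 15.0]
noisy = clean + [50.0]

ls_shift = leaf_prediction(noisy, "ls") - leaf_prediction(clean, "ls")
lad_shift = leaf_prediction(noisy, "lad") - leaf_prediction(clean, "lad")
```

In this toy leaf the mean moves by several percentage points when the extreme value is added, while the median does not move at all.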

34 Conclusions 1 of 2: Maintenance Operations. Showed that an algorithm can be developed to assist operators in maintenance decisions. Showed how to transform and clean the eddy current data for analysis. Provided a basis for choosing among competing algorithms for actual implementation.

35 Methodological Conclusions 2 of 2: Provided a formal approach for comparing different data mining techniques on real corrosion data. Showed that real data sets can produce highly complex relationships (contrast with Ockham's razor) and that models can be found to handle these complexities. Demonstrated the power of tree-based methods to treat nonlinearities in the data through stitching, which was formerly thought to be a disadvantage.

36 Future Research: classification algorithms; time series analysis, if time-lapse data are available; spatial models (correlation between corrosion areas); other non-parametric techniques; application of a known naturally corroded specimen as the test dataset.

37 Questions? (Image: NASA's Vomit Comet)
