Machine Learning: Overview & Applications to Test

1st Lt Takayuki Iguchi
1st Lt Megan E. Lewis
AFOTEC/Det 5/DTS
Release Date: 6 MAR 17
Approved for Public Release: Distribution Unlimited. AFOTEC Public Affairs Public Release Number 2017-01

Why use Machine Learning in test?
- It takes more time to analyze large, high-dimensional data than it does to collect it
  - Video
  - Audio
  - Bus data
- Machine learning is designed to work with large, high-dimensional data

Visualizing Large, High-Dimensional Data

What is Machine Learning?
- A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
- The field of study that gives computers the ability to learn without being explicitly programmed.

Types of Machine Learning
- Reinforcement Learning: learn to select actions that maximize the accumulated reward over time
- Unsupervised Learning: infer a function from unlabeled training data
- Supervised Learning: infer a function from labeled training data

Types of Machine Learning: Unsupervised Learning
- These things are similar. (van der Maaten [2008])
- These things will add up to something that will look like a 2. (Hinton [2013])

Types of Machine Learning: Supervised Learning
- This is the correct salary of a professor given the time since the highest degree earned. [Figure: simple linear regression] (Weisberg [1985])
- These are camels. Those are people. (ImageNet [2014])
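
To make "infer a function from labeled training data" concrete, here is a minimal sketch of a simple linear regression fitted by ordinary least squares. The data are synthetic and chosen only for illustration; they are not the Weisberg salary data, and the names `fit_line`, `years`, and `labels` are assumptions of this sketch.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations in x, and of cross deviations in x and y
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic labeled examples (input, correct output), generated from y = 2x + 1
years = [0.0, 1.0, 2.0, 3.0, 4.0]
labels = [1.0, 3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(years, labels)
prediction = slope * 5.0 + intercept  # apply the learned function to a new input
```

The labels play the role of the "correct answers" on the slide: the learner only sees input/label pairs and recovers the function that produced them.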

Unsupervised Learning Tasks
- Anomaly detection / outlier detection
- Dimensionality reduction
- Manifold learning
- Clustering

Anomaly Detection
- As instrumentation has improved, the limiting factor is often not that there is too little data but that there is too much
- In flight test there is often only a small time window between sorties, so a quick, cursory data analysis is desired but not currently possible
- Anomaly detection methods can help identify otherwise hidden issues (not detected by aircrew) before they grow into larger issues

Anomaly Detection
- Perform problem identification with logged and uncontrollable factors
- A variety of algorithms and methodologies is available
- The choice of algorithm and methodology depends on the application and the nature of the data
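
As one illustration among the many possible algorithms, the sketch below flags outliers in a single logged channel with a modified z-score based on the median and the median absolute deviation (MAD). This is not a method named in the briefing. The constant 0.6745 and the cutoff 3.5 are a common convention for this score, and the readings are invented for the example.

```python
import statistics


def flag_anomalies(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds the threshold."""
    median = statistics.median(values)
    # Median absolute deviation: a spread estimate that outliers barely affect
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        return []  # no spread to compare against, so nothing is flagged
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - median) / mad > threshold]


# Invented sensor readings with one obvious spike at index 6
readings = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 25.0, 10.1]
flagged = flag_anomalies(readings)
```

A median-based score is used here because a mean and standard deviation would themselves be pulled toward the spike; whether that trade-off suits a given data stream depends on the application, as the slide notes.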

Dimensionality Reduction
The goal: take a high-dimensional dataset and find a good representation in a lower dimension (e.g., 2-D).
- Signal decomposition methods:
  - Principal Component Analysis (PCA)
  - Kernel PCA
  - Factor analysis
  - Non-negative matrix factorization
- Manifold learning:
  - Isomap
  - Locally linear embedding (LLE)
  - Spectral embedding
  - Multi-dimensional scaling (MDS)
  - t-distributed Stochastic Neighbor Embedding (t-SNE)
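
A minimal sketch of the PCA idea: center the data, form the covariance matrix, find its dominant direction, and project onto it. To stay dependency-free it estimates that direction by power iteration rather than the eigendecomposition or SVD a library would normally use. The function names and the tiny 3-D dataset are assumptions made for illustration.

```python
import math


def first_principal_component(rows, iterations=200):
    """Estimate the top principal direction by power iteration on the covariance."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d)
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0] * d  # arbitrary starting vector
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return means, v


def project(rows, means, direction):
    """Reduce each row to its 1-D coordinate along the given direction."""
    return [sum((r[j] - means[j]) * direction[j] for j in range(len(means)))
            for r in rows]


# Invented 3-D points that vary almost entirely along the x = y diagonal
rows = [[1.0, 1.0, 0.0], [2.0, 2.0, 0.1], [3.0, 3.0, -0.1], [4.0, 4.0, 0.0]]
means, direction = first_principal_component(rows)
scores = project(rows, means, direction)
```

Here the three coordinates collapse to one score per point with little loss, because nearly all the variance lies along a single straight line.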

t-distributed Stochastic Neighbor Embedding (t-SNE)
- PCA is variance based
- If the structure in the high-dimensional space lies on a non-linear manifold, PCA will not work well
(Vanderplas, scikit-learn [2016])
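
For reference, the quantities t-SNE works with can be written as below, following the definitions in van der Maaten and Hinton (2008). Here x_i are the high-dimensional points, y_i are their low-dimensional map points (unrelated to the network output y used on later slides), sigma_i is a per-point bandwidth set from the chosen perplexity, and n is the number of points.

```latex
\begin{align}
p_{j|i} &= \frac{\exp\!\left(-\lVert x_i - x_j\rVert^2 / 2\sigma_i^2\right)}
               {\sum_{k \neq i} \exp\!\left(-\lVert x_i - x_k\rVert^2 / 2\sigma_i^2\right)},
\qquad
p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n} \\
q_{ij} &= \frac{\left(1 + \lVert y_i - y_j\rVert^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l\rVert^2\right)^{-1}} \\
C &= \mathrm{KL}(P \,\|\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}
\end{align}
```

The map points are found by gradient descent on C. Because the similarities are built from local neighborhoods rather than global variance, the method can preserve structure that lies on a non-linear manifold.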

t-distributed Stochastic Neighbor Embedding (van der Maaten [2016])

t-distributed Stochastic Neighbor Embedding
[Figure: 2-D embeddings of a dataset with classes 0 through 9; panels: t-SNE, Sammon mapping, Isomap, Locally Linear Embedding] (van der Maaten [2008])

Clustering
The goal: partition a dataset to maximize similarity within each partition.
- Connectivity-based / hierarchical clustering
  - Single Linkage Clustering (SLINK)
- Centroid-based clustering
  - k-means++
  - k-medians
- Density-based clustering
  - Density-based spatial clustering of applications with noise (DBSCAN)
- Distribution-based clustering
  - Gaussian Mixture Models

k-means
- Randomly draw cluster centroids
- Until clustering remains unchanged:
  - Assign points to nearest centroid
  - Calculate new centroids
- Output clustering
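
The k-means loop on this slide can be sketched in a few lines of plain Python. The function name, the fixed random seed, the iteration cap, and the six 2-D points are assumptions made for illustration. Initial centroids are drawn at random from the data points, which is one common way to "randomly draw" them.

```python
import random


def kmeans(points, k, seed=0, max_iter=100):
    """Plain k-means on a list of equal-length tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # randomly draw cluster centroids
    assignment = None
    for _ in range(max_iter):
        # Assign each point to its nearest centroid (squared Euclidean distance)
        new_assignment = [
            min(range(k),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(pt, centroids[c])))
            for pt in points
        ]
        if new_assignment == assignment:
            break  # clustering unchanged, so stop
        assignment = new_assignment
        # Calculate new centroids as the mean of each cluster's members
        for c in range(k):
            members = [pt for pt, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignment, centroids


# Invented data: two well-separated groups of three points each
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
assignment, centroids = kmeans(points, k=2)
```

Cluster labels are arbitrary (which group gets label 0 depends on the random draw), so only the grouping is meaningful, not the label values.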

Supervised Learning Tasks
- Classification: output is discrete
  - Speech recognition
  - Image classification
- Regression: output is continuous

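Neural Networks: A Neuron
Mathematical model for a neuron (Russell, Norvig [2010]):
\[ a_j = g\Big(\sum_{i} w_{ij}\, a_i\Big), \qquad a_0 = 1 \]
where a_i are the input activations, w_{ij} is the weight on the link from unit i to unit j, w_{0j} is the bias weight attached to the fixed input a_0 = 1, and g is the activation function.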

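Perceptrons
- All inputs are connected directly to the outputs
- [Figure: inputs x_1, x_2, x_3 with weights w_1, w_2, w_3 feeding a single output y]
- Error function, with t the target and y the unit's output:
\[ E = \tfrac{1}{2}\,(t - y)^2 \]
- Update weights with each training case (learning rate \varepsilon):
  - Output unit is a threshold unit, y = threshold(w · x):
\[ \Delta w_i = \varepsilon\,\bigl(t - \operatorname{threshold}(\mathbf{w}\cdot\mathbf{x})\bigr)\,x_i \]
  - Output unit is a logistic unit, y = \sigma(w · x):
\[ \Delta w_i = -\varepsilon\,\frac{\partial E}{\partial w_i} = \varepsilon\,\sigma(\mathbf{w}\cdot\mathbf{x})\,\bigl(t - \sigma(\mathbf{w}\cdot\mathbf{x})\bigr)\,\bigl(1 - \sigma(\mathbf{w}\cdot\mathbf{x})\bigr)\,x_i \]
(Russell, Norvig [2010])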
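
A minimal sketch of the threshold-unit update rule, trained on the logical AND function. In this sketch t denotes the target label and y the unit's output; the learning rate of 0.5, the epoch count, the convention that threshold(0) = 1, and the training set are all assumptions chosen for illustration.

```python
def threshold(z):
    """Hard threshold unit: fires when the weighted sum is non-negative."""
    return 1.0 if z >= 0 else 0.0


def train_perceptron(samples, epsilon=0.5, epochs=50):
    """Apply delta_w_i = epsilon * (t - y) * x_i after every training case."""
    n_inputs = len(samples[0][0])
    w = [0.0] * (n_inputs + 1)  # w[0] is the bias weight for the fixed input 1
    for _ in range(epochs):
        for x, t in samples:
            inputs = [1.0] + list(x)
            y = threshold(sum(wi * xi for wi, xi in zip(w, inputs)))
            w = [wi + epsilon * (t - y) * xi for wi, xi in zip(w, inputs)]
    return w


def predict(w, x):
    inputs = [1.0] + list(x)
    return threshold(sum(wi * xi for wi, xi in zip(w, inputs)))


# Logical AND is linearly separable, so a single threshold unit can learn it
and_samples = [((0.0, 0.0), 0.0), ((0.0, 1.0), 0.0),
               ((1.0, 0.0), 0.0), ((1.0, 1.0), 1.0)]
weights = train_perceptron(and_samples)
predictions = [predict(weights, x) for x, _ in and_samples]
```

The weights change only on misclassified cases, since (t - y) is zero whenever the output is already correct.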

Multi-layer Perceptron
- Performs better than a single-layer feed-forward neural network
- Trained with backpropagation
- [Figure: inputs x_1, x_2, x_3; hidden units h_1, h_2; output y; connection weights w_1 through w_8]
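
The sketch below shows one backpropagation step for a network of the same shape as the one pictured: three inputs, two hidden units, one output, eight weights in total. It assumes logistic units throughout, a squared-error loss E = 1/2 (t - y)^2, and no bias terms. The initial weights, the training case, and the learning rate are invented, and no particular mapping of w_1 through w_8 onto connections is implied.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, W_hidden, w_out):
    """Forward pass: inputs -> logistic hidden units -> logistic output."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W_hidden]
    y = sigmoid(sum(w * hi for w, hi in zip(w_out, h)))
    return h, y


def loss(x, t, W_hidden, w_out):
    _, y = forward(x, W_hidden, w_out)
    return 0.5 * (t - y) ** 2


def backprop_step(x, t, W_hidden, w_out, epsilon=0.5):
    """One gradient-descent update using gradients propagated backward."""
    h, y = forward(x, W_hidden, w_out)
    delta_out = (y - t) * y * (1.0 - y)  # dE/d(net input of the output unit)
    # Chain rule back through each hidden unit (uses the pre-update w_out)
    delta_hidden = [delta_out * w_out[j] * h[j] * (1.0 - h[j])
                    for j in range(len(h))]
    new_w_out = [w_out[j] - epsilon * delta_out * h[j] for j in range(len(h))]
    new_W_hidden = [[W_hidden[j][i] - epsilon * delta_hidden[j] * x[i]
                     for i in range(len(x))] for j in range(len(h))]
    return new_W_hidden, new_w_out


# Invented starting weights: six input-to-hidden, two hidden-to-output
W_hidden = [[0.1, -0.2, 0.3], [0.4, 0.1, -0.3]]
w_out = [0.2, -0.1]
x, t = [1.0, 0.5, -1.0], 1.0

loss_before = loss(x, t, W_hidden, w_out)
W_hidden_new, w_out_new = backprop_step(x, t, W_hidden, w_out)
loss_after = loss(x, t, W_hidden_new, w_out_new)
```

Training repeats this step over many cases; a single step on a single case is shown only to make the backward flow of the error signal visible.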

Image Text Recognition
- Over-the-shoulder videos are common data sources
  - Cheap to implement
  - Processing is time intensive
- ANNs can help
(Karpathy [2015]) (Shi et al. [2016])

Convolutional Neural Network
- Typically used for image classification
- RGB images can be thought of as a 3-D array (height by width by color channel)
- Fully connected hidden layers require too many weights
- The forward pass: pass a filter over the image
(Hinton [2013])
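
"Pass a filter over the image" can be sketched as a sliding-window sum of products. This minimal version handles a single channel with no padding and a stride of 1, and does not flip the filter, which is the usual convention in CNN layers. The 4x4 image and 2x2 filter are invented for illustration; an RGB input would additionally sum over the three color channels.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image and take a sum of products at each position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


# Invented 4x4 image: dark on the left, bright on the right
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
# Filter that responds to a left-to-right increase in brightness
kernel = [[-1, 1],
          [-1, 1]]

edges = convolve2d(image, kernel)
```

The same four filter weights are reused at every position, which is why a convolutional layer needs far fewer weights than a fully connected one on the same image.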

Convolutional Neural Network (Karpathy [2016])

Hyperspectral Classification
- Per-pixel classification from hyperspectral data
- Data from https://engineering.purdue.edu/biehl/multispec/hyperspectral.html

CNNs on MNIST
- Misclassifications of LeNet-5 (LeCun [1998])

Recurrent Neural Networks
- Directed cycles in their connection graph
- MLPs and CNNs require fixed-size input
- Used to model sequential data
- Hard to train
- [Figure: input, hidden, and output layers unrolled over time steps t_1 through t_6]
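
A minimal sketch of the recurrence, using a single tanh hidden unit with scalar weights. Real networks use weight matrices and many hidden units; the names `w_in`, `w_rec`, `w_out` and all numeric values here are assumptions made for illustration.

```python
import math


def rnn_forward(sequence, w_in, w_rec, w_out, h0=0.0):
    """Run a one-unit recurrent network over a sequence of any length."""
    h = h0
    outputs = []
    for x in sequence:
        # The hidden state feeds back into itself: this is the directed cycle
        h = math.tanh(w_in * x + w_rec * h)
        outputs.append(w_out * h)
    return outputs


# The same weights handle inputs of different lengths
short_out = rnn_forward([1.0, 0.5, -1.0], w_in=0.8, w_rec=0.5, w_out=1.0)
long_out = rnn_forward([1.0, 0.5, -1.0, 0.0, 2.0, 1.0], w_in=0.8, w_rec=0.5, w_out=1.0)

# The output at a step depends on earlier inputs, not just the current one
after_zero = rnn_forward([0.0, 1.0], w_in=0.8, w_rec=0.5, w_out=1.0)
after_one = rnn_forward([1.0, 1.0], w_in=0.8, w_rec=0.5, w_out=1.0)
```

Because the same weights are applied at every time step, gradients are multiplied through the recurrence many times during training, which is one reason these networks are hard to train.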

Recurrent Neural Networks (Karpathy [2015])

Image Text Recognition
- Different ways of thinking about the problem
- Long Short-Term Memory (LSTM) layer most recently used
(Karpathy [2015]) (Shi et al. [2016])

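Audio-speech Recognition: Traditional Speech Models
- Pipeline: speech waveform -> acoustic model -> phonemes -> pronunciation model -> words -> language model -> sentence
- Example: the phonemes (kaˈfā) map to the word (cafe)
\[ \arg\max_{W} P(W \mid X) = \arg\max_{W,L} P(X \mid L)\,P(L \mid W)\,P(W) \]
where X is the speech waveform, L is the phoneme sequence, and W is the word sequence. The acoustic model supplies P(X | L), the pronunciation model P(L | W), and the language model P(W).
(Beaufays [2016])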
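
A toy sketch of the decoding rule: score every (word, phoneme sequence) pair by the product of the acoustic, pronunciation, and language model probabilities, then take the best pair. Every probability below is an invented number for a single hypothetical waveform, not the output of any real model, and the phoneme strings are placeholder labels.

```python
# P(X | L): how well each phoneme sequence explains the (fixed) waveform X
acoustic = {"k a f e": 0.6, "k a f i": 0.4}

# P(L | W): how likely each word is to be pronounced as each phoneme sequence
pronunciation = {
    ("cafe", "k a f e"): 0.9,
    ("cafe", "k a f i"): 0.1,
    ("coffee", "k a f e"): 0.2,
    ("coffee", "k a f i"): 0.8,
}

# P(W): prior probability of each word from the language model
language = {"cafe": 0.3, "coffee": 0.7}


def decode(acoustic, pronunciation, language):
    """Return the (word, phonemes) pair maximizing P(X|L) * P(L|W) * P(W)."""
    return max(
        pronunciation,
        key=lambda wl: acoustic[wl[1]] * pronunciation[wl] * language[wl[0]],
    )


best_word, best_phonemes = decode(acoustic, pronunciation, language)
```

With these invented numbers the acoustic model alone prefers "k a f e", yet the decoder picks "coffee", showing how the language model prior can outweigh the acoustic evidence.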

Other Acoustic Models
Other DNN-based approaches to acoustic modeling (Beaufays [2016]):

Method                                         Year
DBN                                            2012
Long Short-Term Memory (LSTM)                  2013
Convolutional LSTM DNN                         2014
Connectionist Temporal Classification (CTC)    2015

Summary of Applications
- Audio to text
  - Transcribe in-flight audio/conversations
  - Transcribe survey conversations
  - Easily slew to audio of interest
- Image captioning
  - Write text in an image to a text file
  - In-flight data
- Object recognition in images
  - Help label truth data when testing sensors
- Video is just adding a time dimension to images
  - Techniques from images may be applied to video

Next Steps
Low-hanging fruit:
- Use existing open-source text recognition for images/video
  - OpenCV
- Use free audio transcription software
  - TensorFlow (Google)
  - SwiftScribe (Baidu)

Next Steps
Open areas for development:
- Transcribing acronyms
- Using machine learning on bus data to alert a maintainer to a specific risk
- ATC radar: more accurately narrowing down the location of aircraft in real time (Hrastovec et al. [2014])
- Identifying early indications of airframe stress and strain (Hickinbotham et al. [2000])

Acknowledgements
- Workshop organizers
- AFOTEC Det 5 leadership
- Mr. Jeff Wilson
- Capt Joshua Vaughan

References
Hinton, Geoffrey. Artificial Neural Networks. Coursera (2013).
ImageNet (2014). http://www.image-net.org/challenges/lsvrc/2014/ui/det.html
Karpathy, Andrej, et al. CS231n online course notes. http://cs231n.stanford.edu/
Karpathy, Andrej. RNN GitHub page (2015). http://karpathy.github.io/2015/05/21/rnn-effectiveness/
LeCun, Yann. Gradient-Based Learning Applied to Document Recognition (1998).
MATLAB documentation (2017). https://www.mathworks.com/discovery/support-vector-machine.html
van der Maaten. t-SNE GitHub page (2016). https://lvdmaaten.github.io/tsne/
van der Maaten, Hinton. Visualizing Data using t-SNE. JMLR (2008).
Russell, Norvig. Artificial Intelligence: A Modern Approach, 3rd ed. New Jersey: Pearson (2010).
scikit-learn documentation (2016). http://scikit-learn.org/stable/documentation.html
Weisberg, S. Applied Linear Regression, 2nd ed. New York: John Wiley and Sons (1985).
Wolberg, W.H., and Mangasarian, O.L. Multisurface method of pattern separation for medical diagnosis applied to breast cytology. Proceedings of the National Academy of Sciences, 87, 9193-9196 (1990).

References
Shi, Baoguang, Xiang Bai, and Cong Yao. "An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition." IEEE Transactions on Pattern Analysis and Machine Intelligence (2016).
Ghamisi, Pedram, et al. "Advanced Supervised Spectral Classifiers for Hyperspectral Images: A Review." IEEE Geoscience and Remote Sensing Magazine (GRSM) (2017).
Dahl, George E., Dong Yu, Li Deng, and Alex Acero. "Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition." IEEE Transactions on Audio, Speech, and Language Processing (2012).
Gupta, Manish, et al. "Outlier Detection for Temporal Data: A Survey." IEEE Transactions on Knowledge and Data Engineering (2014).
Beaufays, Francoise. Speech Recognition. Google I/O (2016).
Yoon, Seunghyun, et al. "Efficient Transfer Learning Schemes for Personalized Language Modeling using Recurrent Neural Network." arXiv preprint arXiv:1701.03578 (2017).

Questions?