Introduction to Massive Data Interpretation

Jerker Hammarberg, Jakob Fredslund
The Alexandra Institute, 2013

Contents

Introduction
Cases
  C1. Bird Vocalization Recognition
  C2. Body Movement Classification
  C3. Repeated Body Movement Recognition
  C4. Optical Surface Contamination Classification
  C5. Optical Friction Measurement
  C6. Anxiety Recognition
  C7. Soil Parameter Estimation
Signal Processing
  Linear Filters
  Noise Cancellation
  Fast Fourier Transform (FFT)
  Feature Extraction
Machine Learning
  Principal Component Analysis (PCA)
  Neural Networks
  Support Vector Machines (SVM)
  K-Means Clustering
  Hidden Markov Models (HMM)

Introduction

This document briefly presents the most commonly used techniques for signal processing and machine learning: in general, techniques for extracting information from and interpreting large amounts of data from sensors or databases. The acquisition of knowledge and experience with these techniques is part of the Alexandra Institute's commitment to provide services within this field in the context of pervasive computing. The document starts with a number of motivating cases, with forward references to the techniques that might be relevant for each case. Thereafter, the techniques are briefly described, in an intuitive and rather non-technical fashion, with regard to what they are used for and how they work.

Cases

This section presents seven real-world cases for massive data interpretation. They are all taken from projects involving the Alexandra Institute and Danish companies.

C1. Bird Vocalization Recognition

Description: A smartphone application that can recognize birds from their songs and calls. The application should be usable outdoors, where the user can record the bird vocalization and immediately obtain information about the recognized bird. While recording and feature extraction are performed on the mobile device, the actual classification takes place on a server, much like distributed speech recognition. The classifier can also be used for other applications, such as automatic monitoring of the presence of birds at a fixed site over long periods of time.

Techniques: Fast Fourier transform, Noise cancellation, Feature extraction, Principal component analysis, Support vector machines, Hidden Markov models.

C2. Body Movement Classification

Description: A garment that gives feedback based on body movements, to be used by children for training or rehabilitation. Such a garment can aid children in rehabilitation of, for example, paresis by providing motivation to use the impaired limbs in daily life, and not only during training sessions. For this reason, the garment needs to recognize appropriate and inappropriate body movements in real time from motion sensors sewn into the garment and provide immediate feedback to the child. The same idea also has potential for adults and for non-medical uses such as sports training.

Techniques: Linear filters, Feature extraction, Decision trees, Hidden Markov models, Kalman filtering.

C3. Repeated Body Movement Recognition

Description: A garment that recognizes repeated body movements, to be used by workers with repetitive tasks in order to make them aware of repeated movements and thus prevent injuries. The hardware is similar to that of case C2, but the software differs in that the movements to be recognized are not necessarily specified in advance. Instead, the system automatically recognizes repeated movements from the motion sensor data while the user is working.

Techniques: Linear filters, Feature extraction, K-means clustering, Kalman filtering.

C4. Optical Surface Contamination Classification

Description: A sensor system that detects the presence of snow, water, ice or other contaminants on a road solely by optical means. The system is contained in a small box mounted on a vehicle that drives on the road to be examined. The sensors measure the reflection of a number of laser beams at different wavelengths on the surface, and based on these measurements, the system classifies the surface contaminant. The system can be used both on ordinary roads, where it can alert the driver about ice and water while driving, and on runways and taxiways in airports, where it can provide decision support for field services personnel.

Techniques: Linear filters, Noise cancellation, Feature extraction, Principal component analysis, Decision trees, Linear discriminants, Neural networks.

C5. Optical Friction Measurement

Description: A sensor system that measures the friction coefficient of a road solely by optical means. The hardware is the same as that of case C4, but in this case the output is a numerical value rather than a class. This system is primarily intended for airport runways, where friction measurement is an important part of operation in winter conditions.

Techniques: Linear filters, Noise cancellation, Feature extraction, Neural networks, System identification, Kalman filtering, Extended Kalman filtering.

C6. Anxiety Recognition

Description: A biosensor-based system for anxiety patients, allowing them to monitor their current psychological state and predict panic attacks before they occur. Biosensors measuring heart activity, muscle tension and some other physiological parameters are connected to a mobile device, which is capable of predicting panic attacks. When the patient uses the system in daily life, he or she can be alerted about an upcoming panic attack and take the necessary precautions.

Techniques: Linear filters, Feature extraction, Principal component analysis, Decision trees, Linear discriminants.

C7. Soil Parameter Estimation

Description: A sensor system that measures soil parameters such as cloddiness, strength and water content and adjusts the tillage machinery accordingly in real time. The sensor system is mounted at the front of the tractor and could comprise laser range finders, cameras and other relevant hardware. From these sensors, relevant soil parameters are calculated in real time. These parameters are in turn used to control the tillage machinery attached to the tractor, so that the soil is tilled optimally.

Techniques: Linear filters, Neural networks, System identification, Kalman filtering, Extended Kalman filtering.

Signal Processing

A signal, in this context, generally refers to a variable that varies with time, and signal processing refers to extracting relevant information from the signal. Typically in our cases, the signal comes from sensors such as microphones, accelerometers, GPS receivers and cameras, and the objective of signal processing is to deduce relevant physical quantities, such as frequency spectra and positions, from these sensor signals. This section describes some commonly used signal processing techniques.

Linear Filters

Usage: For removing noise when the frequency spectrum of the noise is significantly different from that of the desired noise-free signal.

Description: Linear filtering is the classical method for removing noise in specific frequency bands. The most intuitive case is probably audio filtering, where frequencies above and below the frequencies of the expected sound are removed. However, filtering out specific frequency bands makes sense for almost all sensor data, since the measured physical quantities typically do not vary above or below certain rates. For example, the temperature in a room should not vary by more than a few degrees per minute, so all measured temperature variations faster than that are probably noise that should be removed. The most commonly used types of linear filters are:

- Low-pass filter: removes signal components above a certain frequency.
- High-pass filter: removes signal components below a certain frequency.
- Band-pass filter: removes signal components outside a certain frequency band.
- Band-stop filter: removes signal components within a certain frequency band.

Linear filters are very simple and can be implemented digitally with only a few lines of code, and they require very little computational power. The implementation requires a set of fixed parameters called filter coefficients (or filter constants), which are calculated at design time from the desired frequency limits according to some simple mathematical formulas. Matlab includes convenient tools for filter coefficient calculation. Unfortunately, implementations of linear filters are always non-ideal, in the sense that they dampen the desired frequencies and let through undesired frequencies to some degree. For example, a low-pass filter that should remove signal components above 10 Hz will still significantly dampen frequencies at 8-9 Hz and let through frequencies just above 10 Hz. The filter can be made sharper by increasing the filter order, which essentially means that the algorithm has to loop through more iterations for each sample. The drawback of increasing the filter order is that the signal is delayed, which can be a problem in real-time applications.

Finally, it should be noted that many pattern recognition algorithms ignore noise more or less by their nature, given that it is statistically independent of the desired information in the signal. Therefore, filtering the input to pattern recognition algorithms may actually worsen their performance, since filtering inevitably removes some of the desired information.

Tools: Matlab (Signal Processing Toolbox), Octave.
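As a minimal sketch of such a filter (in Python with NumPy and SciPy rather than the Matlab tools named above; the sample rate, cutoff and test signal are illustrative assumptions), the following designs and applies a low-pass Butterworth filter:

    import numpy as np
    from scipy.signal import butter, lfilter

    fs = 100.0      # sample rate in Hz (assumed)
    cutoff = 10.0   # low-pass cutoff frequency in Hz
    order = 4       # higher order = sharper roll-off, but more delay

    # The filter coefficients are computed once, at design time.
    b, a = butter(order, cutoff / (fs / 2))  # cutoff normalized to Nyquist

    # A slow 2 Hz "temperature-like" signal with fast noise on top.
    t = np.arange(0, 5, 1 / fs)
    signal = np.sin(2 * np.pi * 2 * t) + 0.3 * np.random.randn(t.size)

    # Applying the filter really is only a line of code.
    filtered = lfilter(b, a, signal)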

Noise Cancellation

Usage: For removing noise when the nature of the noise is known well enough that it can be modeled mathematically.

Description: Noise cancellation is a more sophisticated technique for removing noise than filtering. It can be used when more is known about the noise source, so that it is possible to build a mathematical model of the noise. With this model, the noise at a given point in time can be predicted from previously sampled information and then simply subtracted from the noisy signal, yielding a noise-free signal. For example, if the noise comes from 50 Hz interference from the electricity network, it might be possible to measure the amplitude and phase of the interference, and then model the noise as a sine wave that can be subtracted from the signal in real time.

Tools: Matlab (Signal Processing Toolbox).
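Continuing the 50 Hz interference example, here is a hedged sketch (Python with NumPy; the signal and sample rate are simulated assumptions). It estimates the amplitude and phase of the 50 Hz component by correlation and subtracts the predicted noise:

    import numpy as np

    fs = 1000.0                  # sample rate in Hz (assumed)
    f_noise = 50.0               # known mains interference frequency
    t = np.arange(0, 1, 1 / fs)

    # A slow "true" signal plus 50 Hz mains interference.
    clean = np.sin(2 * np.pi * 1.5 * t)
    noisy = clean + 0.5 * np.sin(2 * np.pi * f_noise * t + 0.7)

    # Model the noise as A*sin + B*cos at 50 Hz; estimating A and B by
    # correlation captures both amplitude and phase. This works when the
    # desired signal is roughly uncorrelated with 50 Hz over the window.
    s = np.sin(2 * np.pi * f_noise * t)
    c = np.cos(2 * np.pi * f_noise * t)
    A = 2 * np.mean(noisy * s)
    B = 2 * np.mean(noisy * c)

    denoised = noisy - (A * s + B * c)  # subtract the predicted noise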
Fast Fourier Transform (FFT)

Usage: For calculating the frequency spectrum of a sampled signal. The frequency spectrum can further be used for filtering, feature extraction, data compression and data analysis.

Description: The FFT is an algorithm that calculates the frequency spectrum of a sampled signal, that is, the magnitude of each frequency present in the signal; in other words, the signal is transformed from the time domain to the frequency domain. Again, audio is the most instructive case, since the spectrum of an audio signal clearly shows the tones it contains. For example, the FFT-calculated spectrum of a clean 440 Hz sine wave will have a high value at 440 Hz and a low value (or zero) at all other frequencies.

The FFT algorithm is not straightforward, but it can be found in many textbooks and in many places on the Internet. It takes an array of sampled values (over time) as input and outputs an array in which the value of each element is the magnitude of a specific frequency. It is typically computed repeatedly over a fixed time interval as new data arrive. For example, for audio, the spectrum could be computed repeatedly over the last 20 ms, corresponding to 960 samples when sampled at 48 kHz (a common sample rate for PC sound cards). Hence, over time, a sequence of spectra is produced. These spectra can be used for:

- Direct data analysis, for example tonal analysis of music.
- Data compression: most audio data formats, such as MP3, are based on the principle of discarding frequency components with relatively low magnitude.
- Filtering: unwanted frequencies can be zeroed, and the spectrum can then be transformed back to the time domain.
- Feature extraction for pattern recognition, see below.

The well-known Nyquist theorem states that the FFT can only produce a usable spectrum from 0 Hz up to half the sample frequency (the sample frequency being the inverse of the time between samples). For example, if a signal is sampled at 48 kHz, only frequencies up to 24 kHz can be reliably read out of the spectrum. This is a theoretical limit that no other algorithm can overcome.

Another limitation of the FFT is that it introduces errors known as spectral leakage into the spectrum, originating from the discontinuities at the start and end of the input array. One way to remedy this problem is to apply a window function to the input array before calculating the FFT.

Tools: Matlab, Octave.
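A minimal sketch of this pipeline (Python with NumPy, as an illustration): one 20 ms frame of a 440 Hz tone is windowed, transformed, and the dominant frequency is read out of the spectrum.

    import numpy as np

    fs = 48000                            # 48 kHz sample rate
    n = 960                               # one 20 ms frame at 48 kHz
    t = np.arange(n) / fs
    frame = np.sin(2 * np.pi * 440 * t)   # a clean 440 Hz sine

    # A Hann window reduces spectral leakage from the discontinuities
    # at the frame edges.
    windowed = frame * np.hanning(n)

    spectrum = np.abs(np.fft.rfft(windowed))   # magnitudes, 0 Hz .. fs/2
    freqs = np.fft.rfftfreq(n, d=1 / fs)       # frequency of each bin

    # Prints 450.0: the bin nearest 440 Hz, since the bin resolution is
    # fs/n = 50 Hz. No bin lies above 24 kHz (the Nyquist limit).
    print(freqs[np.argmax(spectrum)])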

Feature Extraction

Usage: For extracting the desired information from a signal for use with machine learning algorithms.

Description: Feature extraction generally refers to preprocessing raw input data to make it suitable for machine learning. In particular, redundant data should be removed, and the data should be converted into quantities that relate as closely as possible to the relevant characteristics, or features, one is really looking for. For example, to classify bird vocalizations, the raw audio data should be converted to the frequency domain using the FFT, since different birds are intuitively more easily distinguished by their spectra than by the raw audio data. As another example, to detect anxiety, the pulse in beats per minute should be extracted from the raw ECG sensor data.

The choice and design of the feature extraction algorithms is typically a combination of determining relevant features based on domain knowledge and mathematically analyzing the data to further remove redundancy. For the latter, a commonly used technique is principal component analysis (PCA), described below. For the former, the determination of relevant features, one must partly rely on intuition, experience and domain knowledge, as well as trial and error. Here are a few commonly used features and feature extraction techniques for different application domains:

- Audio: zero crossing rate, signal energy, FFT, spectral flux, mel-frequency cepstral coefficients (MFCC).
- Images and video: edge detection, blob detection, motion detection.
- Text: word frequencies.

In addition, it is common to use the means and variances of the features over time frames within which their values are not expected to change much, rather than using all the data at a higher sample rate. For example, for bird vocalization recognition, the audio features can be expected to be stationary within approximately 100 ms, so it suffices to use the mean and variance of each feature over each 100 ms time frame.

The output of the feature extraction, and the input to the machine learning algorithm, is a feature vector for each time frame: a vector of values that captures the essential information for the time frame. For bird vocalization recognition, the feature vector could be the means and variances of each of the zero crossing rate, the signal energy, the spectral flux and 13 MFC coefficients, a total of 32 values.

Tools: Matlab, Octave.
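As a hedged sketch (Python with NumPy; the frame length and the two features are the ones named above, the sample rate is an assumption), computing a per-frame zero crossing rate and signal energy:

    import numpy as np

    fs = 48000                    # sample rate (assumed)
    frame_len = int(0.1 * fs)     # 100 ms frames, as discussed above

    def extract_features(audio):
        """Return one feature vector (ZCR, energy) per 100 ms frame."""
        n_frames = len(audio) // frame_len
        features = []
        for i in range(n_frames):
            frame = audio[i * frame_len:(i + 1) * frame_len]
            # sign changes between neighbors, counted and averaged
            zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2
            energy = np.mean(frame ** 2)
            features.append([zcr, energy])
        return np.array(features)   # shape: (n_frames, 2)

    # One second of stand-in audio yields 10 feature vectors.
    audio = np.random.randn(fs)
    print(extract_features(audio).shape)   # (10, 2)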

Machine Learning

In many applications, one wants to categorize some input data into classes, or to estimate some quantity from it, but the relationship between the input and the class or quantity may be too complex to determine directly. On the other hand, it may still be possible to provide training data that contains known instances of the desired classification or estimation. For example, it may be possible to collect a number of sound files with bird vocalizations and have an expert label them with the correct bird species, or to measure the friction of a road mechanically while collecting optical sensor data from the same spot at the same time.

In these situations, a machine learning algorithm can be used to solve the classification or estimation problem. Machine learning refers to the use of systems that can learn from training data to perform a specific task. Machine learning algorithms are all fixed, generic algorithms governed by some parameters, and training refers to tuning these parameters using the training data, so that the algorithm subsequently performs the classification or estimation as well as possible.

It is important to understand that although machine learning is often associated with artificial intelligence, these algorithms are not intelligent in the sense that they can take any kind of input and associated desired output and find the best relationship with no further human guidance. The human system designer must still put in considerable intelligence, knowledge and experience by ensuring that the training data is sufficient (data collection), that the input is appropriately preprocessed (feature extraction), that the right algorithm is chosen for the application (model selection), and that the performance of the algorithm is properly evaluated once it has been implemented and trained (validation).

In this section, some of the most commonly used machine learning algorithms are presented, primarily with a focus on the kinds of applications they are best suited for. Wherever possible, the algorithms have been categorized according to the following distinctions:

- Supervised/Unsupervised: A supervised algorithm uses training data that is labeled with the class or quantity to predict. An unsupervised algorithm finds classes or numeric relationships from unlabeled data.
- Discriminative/Generative: A discriminative algorithm only provides the most probable class or quantity for a data vector. A generative algorithm provides probabilities (or some other kind of score) for all possible classes or values, allowing the application to assess the confidence of the prediction.
- Static/Dynamic: A static algorithm works with single data vectors. A dynamic algorithm also considers the sequential structure of data that arrive in a sequence, typically over time.
- Linear/Non-linear: A linear algorithm only finds linear relationships between the data vectors and the class or quantity to predict. A non-linear algorithm finds more general, for example curved, relationships.
- Classification/Regression: A classification algorithm is used to find or predict classes from data vectors. A regression algorithm is used to find numeric relationships.

However, it should be noted that many algorithms can be extended to provide more functionality than their category indicates.

Principal Component Analysis (PCA)

Usage: PCA has several common uses, including:

- Dimensionality reduction, that is, removing features that provide little or no extra information, so that machine learning becomes more efficient.
- Visualizing the distribution of a data set with many dimensions, for example to aid in selecting features or to assess the separation of the classes in the feature space.
- Unsupervised regression, finding linear functions to latent variables that explain the observed data.

Category: Unsupervised, discriminative, static, linear regression.

Description: Principal component analysis (PCA) transforms the individual components of data vectors into another set of components, called the principal components, which are linearly uncorrelated. This way, if some components were highly correlated in the original data, they are essentially transformed into one component that captures the information of all these original components together, plus some other components with very low information content.

In the anxiety recognition example, assume that the data vectors contain measurements of heart rate, muscle tension and perspiration. We expect these three components to be correlated, such that if one component were plotted against another (e.g. heart rate against muscle tension), the points would lie close to a diagonal line in the graph. After PCA, the first data component of each vector describes a combined heart rate - muscle tension - perspiration feature, and the other two components only exhibit the deviations from this relationship. When plotting the transformed data, the points would lie close to the X axis of the graph. In fact, what PCA does is rotate the feature space so that such diagonal correlation lines (or, in general, hyperplanes) are aligned with the base axes of the transformed space.

Concretely, given original data consisting of n data points with m components each, PCA typically yields the following mathematical objects:

- An n×m matrix with the transformed data. In a machine learning context, the transformed data is commonly used to visualize the data set in a two-dimensional graph by using only the first two principal components and ignoring the rest. This corresponds to projecting the higher-dimensional data onto the two-dimensional plane with the highest possible variance in the projected data set. If each data point is colored according to a class (which is known if it is training data for a classifier), one can get an indication of how the classes are distributed in the feature space.
- An m×m transformation matrix. The PCA transformation of a single original data vector can be computed by multiplying it with the transformation matrix. This can be used for dimensionality reduction: after PCA transformation, the last components of the vector are removed, since they presumably add little extra information. In the anxiety recognition example, the three-component original vector could be reduced to a single number, the first principal component. The transformation can also be seen as the result of unsupervised regression revealing latent variables: in the anxiety recognition example, the first principal component can be interpreted as stress, and the transformation matrix provides a linear function for computing stress directly from the data vectors.
- m eigenvalues. The eigenvalues correspond to the variances of the principal components and can be used to assess how many components can be removed in the dimensionality reduction. For example, the eigenvalues in the anxiety recognition example could be 1.2, 0.3 and 0.2, indicating that the first component has considerably higher variance (and hence information content) than the other two.

In practice, the original data vectors should be centered and scaled before performing PCA. This is typically done by subtracting the mean and then dividing by the standard deviation of each component.

Tools: Matlab (Statistics Toolbox), Octave, Weka, R, Alglib.
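A small sketch of this workflow (Python with NumPy and scikit-learn, as an illustration; the three "biosensor" features are simulated, driven by a latent stress variable as in the example above):

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)

    # Simulated anxiety data: a latent "stress" level drives heart rate,
    # muscle tension and perspiration, plus independent noise.
    stress = rng.normal(size=200)
    X = np.column_stack([
        70 + 10 * stress + rng.normal(scale=2.0, size=200),   # heart rate
        5 + 2 * stress + rng.normal(scale=0.5, size=200),     # muscle tension
        1 + 0.5 * stress + rng.normal(scale=0.2, size=200),   # perspiration
    ])

    # Center and scale each component, then fit the transformation.
    X_std = StandardScaler().fit_transform(X)
    pca = PCA(n_components=3).fit(X_std)

    print(pca.explained_variance_)   # the m eigenvalues
    print(pca.components_)           # rows of the transformation matrix

    # Dimensionality reduction: keep only the first ("stress") component.
    X_reduced = pca.transform(X_std)[:, :1]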

Neural Networks

Usage: For all kinds of classification and regression.

Category: All.

Description: An artificial neural network (or just neural network) is a simple mathematical model simulating the vast network of interconnected neurons in the human brain. It consists of several layers of processing units ("neurons") connected to each other, processing input data into some output. For example, in a system of three layers, the first layer holds the input neurons, which send data via connections to the second, hidden layer of neurons, which processes the data and sends the result through further connections to the third layer of output neurons. More complex systems have more layers of neurons.

Each neuron processes the input it receives, typically by applying some function to the (weighted) signal from each incoming connection, summing the results and then applying a so-called activation function to the sum. The activation function is often a sigmoid ("S-shaped") function, which for most inputs produces an essentially binary activation level as output: either the neuron fires or it does not. The weights on the connections are usually identified through a learning phase; various learning algorithms can be used, e.g. genetic programming. Thus, a neural network is defined by these three parameters:

1. The layer and interconnection structure
2. The connection weights and/or the learning algorithm for updating them
3. The activation function

Making these decisions requires some understanding of the underlying theory of neural networks, and often also a certain amount of experimentation. However, once a good structure and a good set of weights have been found, a neural network may prove to be quite robust.

Tools: Matlab (Neural Networks Toolbox), Weka, NEAT.
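A minimal sketch of the three-layer setup (Python with scikit-learn; the data, layer size and activation are illustrative assumptions, standing in for, e.g., the optical surface measurements of case C4):

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    # Toy data: 500 labeled feature vectors in 3 classes.
    X, y = make_classification(n_samples=500, n_features=8, n_classes=3,
                               n_informative=5, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Input layer, one hidden layer of 16 neurons with a sigmoid
    # ("logistic") activation, and an output layer; the connection
    # weights are identified in the learning phase by fit().
    net = MLPClassifier(hidden_layer_sizes=(16,), activation="logistic",
                        max_iter=2000, random_state=0)
    net.fit(X_train, y_train)
    print(net.score(X_test, y_test))   # accuracy on held-out data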

Support Vector Machines (SVM)

Usage: Primarily for discriminative classification, but can also be extended to generative classification and regression.

Category: Supervised, discriminative, static, non-linear classification.

Description: The support vector machine (SVM) has in a short time become one of the most widely used classifiers. It is easy to configure, fast to train, and it performs relatively well with little training data. SVMs are based on the maximal margin principle: they find a separator between the classes in the feature space such that the distance between the separator and the closest training data is maximized. The basic SVM simply finds a single linear hyperplane that separates two classes with maximal margin. However, for practical use, this basic SVM is extended as follows:

- A non-linear kernel function replaces the dot product used in the basic SVM. By some mathematical magic, this allows for finding a non-linear, curved separator, which is necessary in all non-trivial cases. There are several kernel functions to choose between; a commonly used one is the radial basis function, which is governed by a parameter often called gamma.
- For cases where the training data of different classes are mixed together such that no perfect separator can be found, slack variables are introduced to allow some data to lie on the wrong side of the separator. This slack is governed by a parameter often called C.
- In order to handle more than two classes, a decision scheme based on multiple binary (two-class) SVMs is used. A commonly used approach is one-versus-the-rest, where for each class a binary SVM is trained with data from that class versus data from all the other classes. When classifying an unseen test instance, the winning class is the one whose binary SVM most confidently indicates membership of the class. This is possible because a binary SVM actually returns a number whose sign (positive or negative) indicates the class, while its absolute value indicates the confidence. For example, for three classes A, B and C, three binary SVMs are trained: A versus B and C, B versus A and C, and C versus A and B. If for a test instance the first SVM returns 0.05, the second 0.8 and the third -0.25, then the instance is classified as B.

With the above extensions, the gamma and C parameters must be given to the SVM training algorithm. In practice, it is common to simply train the SVM multiple times with different values of gamma and C to see which values yield the best accuracy. Many software packages perform this search for gamma and C automatically, which means that the classifier can be trained with no configuration whatsoever. However, it is important to understand that these values affect how well the classifier will perform on unseen data in operation. In particular, too high a gamma value gives an overfit SVM that will not perform well on unseen data that differs only a little from the training data, and too high a C value makes the classifier sensitive to noise and errors in the training data.

Tools: Weka, Libsvm.
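The search over gamma and C described above might look like this minimal sketch (Python with scikit-learn; the data and the parameter grid are assumptions):

    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    # Toy data: 300 labeled feature vectors in 3 classes.
    X, y = make_classification(n_samples=300, n_features=6, n_classes=3,
                               n_informative=4, random_state=0)

    # An RBF-kernel SVM; scikit-learn handles more than two classes
    # internally with a scheme of multiple binary SVMs.
    grid = GridSearchCV(
        SVC(kernel="rbf"),
        param_grid={"gamma": [0.01, 0.1, 1, 10], "C": [0.1, 1, 10, 100]},
        cv=5,   # cross-validated accuracy for each (gamma, C) pair
    )
    grid.fit(X, y)
    print(grid.best_params_, grid.best_score_)

Note that this selects gamma and C by cross-validated accuracy, which guards against the overfitting that a too high gamma or C would otherwise cause.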

K-Means Clustering

Usage: For finding classes in a data set based on clustering in the feature space.

Category: Unsupervised, discriminative, static, non-linear classification.

Description: With k-means clustering, one attempts to divide n data points into k clusters such that each data point belongs to the cluster with the nearest mean. The data points can be general points in a multidimensional space. The problem is computationally difficult (NP-hard), but there are efficient heuristics for solving it. The most common is Lloyd's algorithm, devised by Stuart Lloyd in 1957:

1. Choose k initial cluster mean points, c_1 to c_k.
2. Create k clusters such that a data point x belongs to cluster i if c_i is the closest cluster mean point to x.
3. Calculate the new mean of each cluster.
4. Repeat steps 2-3 until the clusters no longer change.

There are several ways of performing step 1. One, the Forgy method, is simply to choose a random set of k points from the data and use those as the initial cluster means. Another, random partition, is to group the data points randomly into k clusters and proceed to step 3 above.

K-means clustering is fast to compute, but it can be a drawback that the number of clusters, k, must be given as input to the algorithm. Using a wrong number of clusters may lead to poor results, so one should do some diagnostic calculation first to estimate k.

Tools: Matlab, Weka, Spectral Python.
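A minimal sketch of Lloyd's algorithm with Forgy initialization (Python with NumPy; in practice a library implementation such as the tools above would be used, and this sketch does not guard against a cluster becoming empty):

    import numpy as np

    def kmeans(X, k, rng=np.random.default_rng(0)):
        """Lloyd's algorithm: X is (n_points, n_dims), k the cluster count."""
        # Step 1 (Forgy): pick k random data points as initial means.
        means = X[rng.choice(len(X), size=k, replace=False)]
        while True:
            # Step 2: assign each point to the nearest cluster mean.
            dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
            labels = dists.argmin(axis=1)
            # Step 3: recompute each cluster's mean.
            new_means = np.array([X[labels == i].mean(axis=0)
                                  for i in range(k)])
            # Step 4: stop when the clusters no longer change.
            if np.allclose(new_means, means):
                return labels, means
            means = new_means

    # Example: two well-separated clusters in two dimensions.
    X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
    labels, means = kmeans(X, k=2)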

Hidden Markov Models (HMM)

Usage: For recognition of sequential patterns.

Category: Unsupervised and supervised, generative, dynamic classification.

Description: The classification algorithms described so far classify individual data vectors. However, in many applications the data vectors arrive in sequences, typically over time, and it may be necessary to take the sequential development itself into account for accurate recognition. This is the case in speech recognition, where complete sequences of sound data must be considered to recognize different words. Similarly, the sequential development of a bird vocalization contains relevant information for classifying the bird species.

A hidden Markov model (HMM) is a probabilistic model of a sequential pattern that can be used to evaluate how well an unseen data sequence matches the pattern. For each pattern to recognize, an HMM is produced from training data. An unseen sequence is classified by evaluating it against all HMMs and choosing the one with the best match. For bird classification, there would be at least one HMM per bird species, possibly several if the same species has several different vocalizations.

HMMs assume that the sequence can be modeled in terms of states that implicitly account for the progress of the sequence. For example, in speech recognition the states typically correspond to the phonemes of the word, so the model for the spoken word "Alexandra" has ten states: A-L-E-K-S-A-N-D-R-A. Given the set of states for a sequence, the HMM concretely specifies:

- The transition probabilities between the states, as an n×n matrix, where n is the number of states. They express the tendency of the pattern to stay in a given state for a while or to move on to another one. For example, if the average "Alexandra" pattern stays at the first A for some samples before moving on to the L, the transition probability from the first state (A) to itself could be 0.95, while it is 0.05 to the second state (L) and 0 to all other states.
- The emission probabilities, which express the probability of data elements from the sequence in a given state. The mathematical form depends on the data elements, but for ordinary vectors of real numbers, the emission probabilities are typically expressed as a mean and variance for each component. For example, if the speech recognizer works with vectors of 15 sound features per sample, then the emission probabilities in state A would be expressed with 15 mean values, corresponding to the average "a" phoneme, and some variances expressing how far from the average each sound feature can be and still sound like an "a".

Unfortunately, several software packages (including Matlab) only allow the data elements of the sequence to be single integers in a finite interval, e.g. from 1 to 20. The emission probabilities are then represented by an n×m matrix, where n is the number of states and m is the length of the finite interval. In reality, however, the sequences to analyze typically consist of vectors of real numbers, not single finite integers. There are two ways to solve this problem:

- Perform vector quantization into classes. If the classes are known, an ordinary classification algorithm, e.g. SVM, can be used: for example, the speech data vectors can be classified into phonemes, so that the sequence of feature vectors is turned into a sequence of phoneme classes before it is analyzed with the HMM. If the classes are not known, a clustering algorithm such as k-means can be used to enforce a quantization.
- Choose another software package.

Training an HMM amounts to finding the transition and emission probabilities that match a given sequence or set of sequences. This can be done efficiently with, e.g., the Baum-Welch algorithm. However, it only finds local optima, and one must provide an initial guess of the probabilities from which the algorithm can start iterating. The best solution will only be found if the initial guess is close enough to it. If possible, one should try to construct an HMM manually based on what is known about the pattern and use this as the initial guess; otherwise, uniformly distributed values can be used.

Given an HMM and an unseen test sequence, the following can be evaluated efficiently:

- The probability of the test sequence given the model. This is typically computed with the forward algorithm and expresses how well the test sequence matches the modeled pattern.
- The most probable sequence of states generating the test sequence. This is typically computed with the Viterbi algorithm.

In general, HMMs are rather complicated to use, and simply training an HMM without proper configuration and data preparation will probably not yield useful results. Inexperienced users are advised to search the literature for the best way to use HMMs in their particular application. One should also consider alternative methods for handling sequential data, such as:

- Sliding windows, which essentially means collecting several data vectors from the sequence, e.g. 20 successive elements, into one large data vector and using this with a standard static classification algorithm.
- Feature extraction over sequences, where a tailor-made algorithm analyzes the raw data over a sequence, or part of a sequence, and extracts features that are then fed into a static classification algorithm.

Tools: Matlab (Statistics Toolbox), Hidden Markov Model Toolbox for Matlab, jahmm.
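A hedged sketch of the classify-by-best-match scheme (Python with the third-party hmmlearn package, which supports real-valued vector emissions; all data here is simulated stand-in material, not real bird or speech features):

    import numpy as np
    from hmmlearn.hmm import GaussianHMM

    rng = np.random.default_rng(0)

    # Simulated training sequences of 4-dimensional feature vectors for
    # two "species"; real data would come from feature extraction.
    train_a = rng.normal(loc=0.0, size=(200, 4))
    train_b = rng.normal(loc=2.0, size=(200, 4))

    # One HMM per pattern, with Gaussian (mean/variance) emissions;
    # fit() estimates transition and emission probabilities (Baum-Welch).
    hmm_a = GaussianHMM(n_components=3, n_iter=50).fit(train_a)
    hmm_b = GaussianHMM(n_components=3, n_iter=50).fit(train_b)

    # Classify an unseen sequence by the model with the best match, i.e.
    # the highest log-likelihood from the forward algorithm (score()).
    test = rng.normal(loc=2.0, size=(50, 4))
    scores = {"A": hmm_a.score(test), "B": hmm_b.score(test)}
    print(max(scores, key=scores.get))   # -> "B"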

This publication is funded by the performance contract "Massive Data processing, visualization and interpretation".

Detecting Harmful Hand Behaviors with Machine Learning from Wearable Motion Sensor Data Detecting Harmful Hand Behaviors with Machine Learning from Wearable Motion Sensor Data Lingfeng Zhang and Philip K. Chan Florida Institute of Technology, Melbourne, FL 32901 lingfeng2013@my.fit.edu, pkc@cs.fit.edu

More information

Unsupervised Learning

Unsupervised Learning Unsupervised Learning Learning without Class Labels (or correct outputs) Density Estimation Learn P(X) given training data for X Clustering Partition data into clusters Dimensionality Reduction Discover

More information

Unsupervised learning in Vision

Unsupervised learning in Vision Chapter 7 Unsupervised learning in Vision The fields of Computer Vision and Machine Learning complement each other in a very natural way: the aim of the former is to extract useful information from visual

More information

More Learning. Ensembles Bayes Rule Neural Nets K-means Clustering EM Clustering WEKA

More Learning. Ensembles Bayes Rule Neural Nets K-means Clustering EM Clustering WEKA More Learning Ensembles Bayes Rule Neural Nets K-means Clustering EM Clustering WEKA 1 Ensembles An ensemble is a set of classifiers whose combined results give the final decision. test feature vector

More information

Sparse and large-scale learning with heterogeneous data

Sparse and large-scale learning with heterogeneous data Sparse and large-scale learning with heterogeneous data February 15, 2007 Gert Lanckriet (gert@ece.ucsd.edu) IEEE-SDCIS In this talk Statistical machine learning Techniques: roots in classical statistics

More information

Online Pose Classification and Walking Speed Estimation using Handheld Devices

Online Pose Classification and Walking Speed Estimation using Handheld Devices Online Pose Classification and Walking Speed Estimation using Handheld Devices Jun-geun Park MIT CSAIL Joint work with: Ami Patel (MIT EECS), Jonathan Ledlie (Nokia Research), Dorothy Curtis (MIT CSAIL),

More information

TWO-STEP SEMI-SUPERVISED APPROACH FOR MUSIC STRUCTURAL CLASSIFICATION. Prateek Verma, Yang-Kai Lin, Li-Fan Yu. Stanford University

TWO-STEP SEMI-SUPERVISED APPROACH FOR MUSIC STRUCTURAL CLASSIFICATION. Prateek Verma, Yang-Kai Lin, Li-Fan Yu. Stanford University TWO-STEP SEMI-SUPERVISED APPROACH FOR MUSIC STRUCTURAL CLASSIFICATION Prateek Verma, Yang-Kai Lin, Li-Fan Yu Stanford University ABSTRACT Structural segmentation involves finding hoogeneous sections appearing

More information

The Curse of Dimensionality

The Curse of Dimensionality The Curse of Dimensionality ACAS 2002 p1/66 Curse of Dimensionality The basic idea of the curse of dimensionality is that high dimensional data is difficult to work with for several reasons: Adding more

More information

PARAMETER OPTIMIZATION FOR AUTOMATED SIGNAL ANALYSIS FOR CONDITION MONITORING OF AIRCRAFT SYSTEMS. Mike Gerdes 1, Dieter Scholz 1

PARAMETER OPTIMIZATION FOR AUTOMATED SIGNAL ANALYSIS FOR CONDITION MONITORING OF AIRCRAFT SYSTEMS. Mike Gerdes 1, Dieter Scholz 1 AST 2011 Workshop on Aviation System Technology PARAMETER OPTIMIZATION FOR AUTOMATED SIGNAL ANALYSIS FOR CONDITION MONITORING OF AIRCRAFT SYSTEMS Mike Gerdes 1, Dieter Scholz 1 1 Aero - Aircraft Design

More information

Programming-By-Example Gesture Recognition Kevin Gabayan, Steven Lansel December 15, 2006

Programming-By-Example Gesture Recognition Kevin Gabayan, Steven Lansel December 15, 2006 Programming-By-Example Gesture Recognition Kevin Gabayan, Steven Lansel December 15, 6 Abstract Machine learning and hardware improvements to a programming-by-example rapid prototyping system are proposed.

More information

CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION

CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION 6.1 INTRODUCTION Fuzzy logic based computational techniques are becoming increasingly important in the medical image analysis arena. The significant

More information

Data Mining. Neural Networks

Data Mining. Neural Networks Data Mining Neural Networks Goals for this Unit Basic understanding of Neural Networks and how they work Ability to use Neural Networks to solve real problems Understand when neural networks may be most

More information

GYROPHONE RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS. Yan Michalevsky (1), Gabi Nakibly (2) and Dan Boneh (1)

GYROPHONE RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS. Yan Michalevsky (1), Gabi Nakibly (2) and Dan Boneh (1) GYROPHONE RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS Yan Michalevsky (1), Gabi Nakibly (2) and Dan Boneh (1) (1) Stanford University (2) National Research and Simulation Center, Rafael Ltd. 0 MICROPHONE

More information

Classification Lecture Notes cse352. Neural Networks. Professor Anita Wasilewska

Classification Lecture Notes cse352. Neural Networks. Professor Anita Wasilewska Classification Lecture Notes cse352 Neural Networks Professor Anita Wasilewska Neural Networks Classification Introduction INPUT: classification data, i.e. it contains an classification (class) attribute

More information

Audio-Visual Speech Activity Detection

Audio-Visual Speech Activity Detection Institut für Technische Informatik und Kommunikationsnetze Semester Thesis at the Department of Information Technology and Electrical Engineering Audio-Visual Speech Activity Detection Salome Mannale Advisors:

More information

Repeating Segment Detection in Songs using Audio Fingerprint Matching

Repeating Segment Detection in Songs using Audio Fingerprint Matching Repeating Segment Detection in Songs using Audio Fingerprint Matching Regunathan Radhakrishnan and Wenyu Jiang Dolby Laboratories Inc, San Francisco, USA E-mail: regu.r@dolby.com Institute for Infocomm

More information

Network Traffic Measurements and Analysis

Network Traffic Measurements and Analysis DEIB - Politecnico di Milano Fall, 2017 Introduction Often, we have only a set of features x = x 1, x 2,, x n, but no associated response y. Therefore we are not interested in prediction nor classification,

More information

Machine Learning Classifiers and Boosting

Machine Learning Classifiers and Boosting Machine Learning Classifiers and Boosting Reading Ch 18.6-18.12, 20.1-20.3.2 Outline Different types of learning problems Different types of learning algorithms Supervised learning Decision trees Naïve

More information

Announcements. CS 188: Artificial Intelligence Spring Generative vs. Discriminative. Classification: Feature Vectors. Project 4: due Friday.

Announcements. CS 188: Artificial Intelligence Spring Generative vs. Discriminative. Classification: Feature Vectors. Project 4: due Friday. CS 188: Artificial Intelligence Spring 2011 Lecture 21: Perceptrons 4/13/2010 Announcements Project 4: due Friday. Final Contest: up and running! Project 5 out! Pieter Abbeel UC Berkeley Many slides adapted

More information

SOCIAL MEDIA MINING. Data Mining Essentials

SOCIAL MEDIA MINING. Data Mining Essentials SOCIAL MEDIA MINING Data Mining Essentials Dear instructors/users of these slides: Please feel free to include these slides in your own material, or modify them as you see fit. If you decide to incorporate

More information

3 Nonlinear Regression

3 Nonlinear Regression CSC 4 / CSC D / CSC C 3 Sometimes linear models are not sufficient to capture the real-world phenomena, and thus nonlinear models are necessary. In regression, all such models will have the same basic

More information

Distance Weighted Discrimination Method for Parkinson s for Automatic Classification of Rehabilitative Speech Treatment for Parkinson s Patients

Distance Weighted Discrimination Method for Parkinson s for Automatic Classification of Rehabilitative Speech Treatment for Parkinson s Patients Operations Research II Project Distance Weighted Discrimination Method for Parkinson s for Automatic Classification of Rehabilitative Speech Treatment for Parkinson s Patients Nicol Lo 1. Introduction

More information

Search Engines. Information Retrieval in Practice

Search Engines. Information Retrieval in Practice Search Engines Information Retrieval in Practice All slides Addison Wesley, 2008 Classification and Clustering Classification and clustering are classical pattern recognition / machine learning problems

More information

Vision Based Metal Spectral Analysis using

Vision Based Metal Spectral Analysis using 1/27 Vision Based Metal Spectral Analysis using Eranga Ukwatta Department of Electrical and Computer Engineering The University of Western Ontario May 25, 2009 2/27 Outline 1 Overview of Element Spectroscopy

More information

Lecture on Modeling Tools for Clustering & Regression

Lecture on Modeling Tools for Clustering & Regression Lecture on Modeling Tools for Clustering & Regression CS 590.21 Analysis and Modeling of Brain Networks Department of Computer Science University of Crete Data Clustering Overview Organizing data into

More information

COS 116 The Computational Universe Laboratory 4: Digital Sound and Music

COS 116 The Computational Universe Laboratory 4: Digital Sound and Music COS 116 The Computational Universe Laboratory 4: Digital Sound and Music In this lab you will learn about digital representations of sound and music, especially focusing on the role played by frequency

More information

Support Vector Machines

Support Vector Machines Support Vector Machines . Importance of SVM SVM is a discriminative method that brings together:. computational learning theory. previously known methods in linear discriminant functions 3. optimization

More information

Supervised vs. Unsupervised Learning. Supervised vs. Unsupervised Learning. Supervised vs. Unsupervised Learning. Supervised vs. Unsupervised Learning

Supervised vs. Unsupervised Learning. Supervised vs. Unsupervised Learning. Supervised vs. Unsupervised Learning. Supervised vs. Unsupervised Learning Overview T7 - SVM and s Christian Vögeli cvoegeli@inf.ethz.ch Supervised/ s Support Vector Machines Kernels Based on slides by P. Orbanz & J. Keuchel Task: Apply some machine learning method to data from

More information