Introduction to Massive Data Interpretation

Jerker Hammarberg, Jakob Fredslund
The Alexandra Institute, 2013

Contents

Introduction
Cases
  C1. Bird Vocalization Recognition
  C2. Body Movement Classification
  C3. Repeated Body Movement Recognition
  C4. Optical Surface Contamination Classification
  C5. Optical Friction Measurement
  C6. Anxiety Recognition
  C7. Soil Parameter Estimation
Signal Processing
  Linear Filters
  Noise Cancellation
  Fast Fourier Transform (FFT)
  Feature Extraction
Machine Learning
  Principal Component Analysis (PCA)
  Neural Networks
  Support Vector Machines (SVM)
  K-Means Clustering
  Hidden Markov Models (HMM)

Introduction

This document briefly presents the most commonly used techniques for signal processing and machine learning: in general, techniques for extracting information from and interpreting large amounts of data from sensors or databases. The acquisition of knowledge and experience with these techniques is part of the Alexandra Institute's commitment to provide services within this field in the context of pervasive computing. The document starts with a number of motivating cases, with forward references to the techniques that might be relevant for each case. Thereafter, the techniques are briefly described, in an intuitive and rather non-technical fashion, with regard to what they are used for and how they work.

Cases

This section presents seven real-world cases for massive data interpretation. They are all taken from projects involving the Alexandra Institute and Danish companies.

C1. Bird Vocalization Recognition

Description: A smartphone application that can recognize birds from their songs and calls. The application should be usable outdoors, where the user can record the bird vocalization and immediately obtain information about the recognized bird. While recording and feature extraction are performed on the mobile device, the actual classification takes place on a server, much like distributed speech recognition. The classifier can also be used for other applications, such as automatic monitoring of the presence of birds at a fixed site over long periods of time.

Techniques: Fast Fourier transform, Noise cancellation, Feature extraction, Principal component analysis, Support vector machines, Hidden Markov models.

C2. Body Movement Classification

Description: A garment that gives feedback based on body movements, to be used by children for training or rehabilitation. Such a garment can aid children in rehabilitation of, for example, paresis by providing motivation to use the impaired limbs in daily life, and not only during training sessions. For this reason, the garment needs to recognize appropriate and inappropriate body movements in real time from motion sensors sewn into the garment and provide immediate feedback to the child. The same idea also has potential for adults and for non-medical uses such as sports training.

Techniques: Linear filters, Feature extraction, Decision trees, Hidden Markov models, Kalman filtering.

C3. Repeated Body Movement Recognition

Description: A garment that recognizes repeated body movements, to be used by workers with repetitive tasks in order to make them aware of repeated movements and thus prevent injuries. The hardware is similar to that of case C2, but the software differs in that the movements to be recognized are not necessarily specified in advance. Instead, the system automatically recognizes repeated movements from the motion sensor data while the user is working.

Techniques: Linear filters, Feature extraction, K-means clustering, Kalman filtering.

C4. Optical Surface Contamination Classification

Description: A sensor system that detects the presence of snow, water, ice or other contaminants on a road solely by optical means. The system is contained in a small box mounted on a vehicle that drives on the road to be examined. The sensors measure the reflection of a number of laser beams at different wavelengths on the surface, and based on these measurements, the system classifies the surface contaminant. The system can be used both on ordinary roads, where it can alert the driver about ice and water while driving, and on runways and taxiways in airports, where it can provide decision support for field services personnel.

Techniques: Linear filters, Noise cancellation, Feature extraction, Principal component analysis, Decision trees, Linear discriminants, Neural networks.

C5. Optical Friction Measurement

Description: A sensor system that measures the friction coefficient of a road solely by optical means. The hardware is the same as that of case C4, but in this case the output is a numerical value rather than a class. This system is primarily intended for airport runways, where friction measurement is an important part of operation in winter conditions.

Techniques: Linear filters, Noise cancellation, Feature extraction, Neural networks, System identification, Kalman filtering, Extended Kalman filtering.

C6. Anxiety Recognition

Description: A biosensor-based system for anxiety patients, allowing them to monitor their current psychological state and predict panic attacks before they occur. Biosensors measuring heart activity, muscle tension and some other physiological parameters are connected to a mobile device, which is capable of predicting panic attacks. When the patient uses the system in daily life, he or she can be alerted about an upcoming panic attack and take the necessary precautions.

Techniques: Linear filters, Feature extraction, Principal component analysis, Decision trees, Linear discriminants.

C7. Soil Parameter Estimation

Description: A sensor system that measures soil parameters such as cloddiness, strength and water content and adjusts the tillage machinery accordingly in real time. The sensor system is mounted at the front of the tractor and could comprise laser range finders, cameras and other relevant hardware. From these sensors, relevant soil parameters are calculated in real time. These parameters are in turn used to control the tillage machinery attached to the tractor, so that the soil is tilled optimally.

Techniques: Linear filters, Neural networks, System identification, Kalman filtering, Extended Kalman filtering.

Signal Processing

A signal, in this context, generally refers to a variable that varies with time, and signal processing refers to extracting relevant information from the signal. Typically in our cases, the signal comes from sensors such as microphones, accelerometers, GPS receivers and cameras, and the objective of signal processing is to deduce relevant physical quantities, such as frequency spectra and positions, from these sensor signals. This section describes some commonly used signal processing techniques.

Linear Filters

Usage: For removing noise when the frequency spectrum of the noise is significantly different from that of the desired noise-free signal.

Description: Linear filtering is the classical method for removing noise in specific frequency bands. The most intuitive case is probably audio filtering, where frequencies above and below the frequencies of the expected sound are removed. However, filtering out specific frequency bands makes sense for almost all sensor data, since the measured physical quantities typically do not vary above or below certain rates. For example, the temperature in a room should not vary by more than a few degrees per minute, so all measured temperature variations faster than that are probably noise that should be removed. The most commonly used types of linear filters are:

- Low-pass filter: removes signal components above a certain frequency.
- High-pass filter: removes signal components below a certain frequency.
- Band-pass filter: removes signal components outside a certain frequency band.
- Band-stop filter: removes signal components within a certain frequency band.

Linear filters are very simple and can be implemented digitally with only a few lines of code, and they require very little computational power. The implementation requires a set of fixed parameters called filter coefficients (or filter constants), which are calculated at design time from the desired frequency limits according to some simple mathematical formulas. Matlab includes convenient tools for filter coefficient calculation. Unfortunately, implementations of linear filters are always non-ideal, in the sense that they dampen the desired frequencies and let through undesired frequencies to some degree. For example, a low-pass filter that should remove signal components above 10 Hz will still significantly dampen frequencies at 8-9 Hz and let through frequencies just above 10 Hz. The filter can be made sharper by increasing the filter order, which essentially means that the algorithm has to loop through more iterations for each sample. The drawback of increasing the filter order is that the signal is delayed, which can be a problem in real-time applications.

Finally, it should be noted that many pattern recognition algorithms ignore noise more or less by their nature, given that it is statistically independent of the desired information in the signal. Therefore, filtering the input to pattern recognition algorithms may actually worsen their performance, since filtering inevitably removes some of the desired information.

Tools: Matlab (Signal Processing Toolbox), Octave.
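As a minimal sketch of such a filter (in Python with NumPy and SciPy rather than the Matlab tools named above; the sample rate, cutoff and test signal are illustrative assumptions), the following designs and applies a low-pass Butterworth filter:

    import numpy as np
    from scipy.signal import butter, lfilter

    fs = 100.0      # sample rate in Hz (assumed)
    cutoff = 10.0   # low-pass cutoff frequency in Hz
    order = 4       # higher order = sharper roll-off, but more delay

    # The filter coefficients are computed once, at design time.
    b, a = butter(order, cutoff / (fs / 2))  # cutoff normalized to Nyquist

    # A slow 2 Hz "temperature-like" signal with fast noise on top.
    t = np.arange(0, 5, 1 / fs)
    signal = np.sin(2 * np.pi * 2 * t) + 0.3 * np.random.randn(t.size)

    # Applying the filter really is only a line of code.
    filtered = lfilter(b, a, signal)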

Noise Cancellation

Usage: For removing noise when the nature of the noise is known well enough that it can be modeled mathematically.

Description: Noise cancellation is a more sophisticated technique for removing noise than filtering. It can be used when more is known about the noise source, so that it is possible to build a mathematical model of the noise. With this model, the noise at a given point in time can be predicted from previously sampled information and then simply subtracted from the noisy signal, yielding a noise-free signal. For example, if the noise comes from 50 Hz interference from the electricity network, it might be possible to measure the amplitude and phase of the interference, and then model the noise as a sine wave that can be subtracted from the signal in real time.

Tools: Matlab (Signal Processing Toolbox).
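Continuing the 50 Hz interference example, here is a hedged sketch (Python with NumPy; the signal and sample rate are simulated assumptions). It estimates the amplitude and phase of the 50 Hz component by correlation and subtracts the predicted noise:

    import numpy as np

    fs = 1000.0                  # sample rate in Hz (assumed)
    f_noise = 50.0               # known mains interference frequency
    t = np.arange(0, 1, 1 / fs)

    # A slow "true" signal plus 50 Hz mains interference.
    clean = np.sin(2 * np.pi * 1.5 * t)
    noisy = clean + 0.5 * np.sin(2 * np.pi * f_noise * t + 0.7)

    # Model the noise as A*sin + B*cos at 50 Hz; estimating A and B by
    # correlation captures both amplitude and phase. This works when the
    # desired signal is roughly uncorrelated with 50 Hz over the window.
    s = np.sin(2 * np.pi * f_noise * t)
    c = np.cos(2 * np.pi * f_noise * t)
    A = 2 * np.mean(noisy * s)
    B = 2 * np.mean(noisy * c)

    denoised = noisy - (A * s + B * c)  # subtract the predicted noise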
Fast Fourier Transform (FFT)

Usage: For calculating the frequency spectrum of a sampled signal. The frequency spectrum can further be used for filtering, feature extraction, data compression and data analysis.

Description: The FFT is an algorithm that calculates the frequency spectrum of a sampled signal, that is, the magnitude of each frequency present in the signal; in other words, the signal is transformed from the time domain to the frequency domain. Again, audio is the most instructive case, since the spectrum of an audio signal clearly shows the tones it contains. For example, the FFT-calculated spectrum of a clean 440 Hz sine wave will have a high value at 440 Hz and a low value (or zero) at all other frequencies.

The FFT algorithm is not straightforward, but it can be found in many textbooks and in many places on the Internet. It takes an array of sampled values (over time) as input and outputs an array in which the value of each element is the magnitude of a specific frequency. It is typically computed repeatedly over a fixed time interval as new data arrive. For example, for audio, the spectrum could be computed repeatedly over the last 20 ms, corresponding to 960 samples when sampled at 48 kHz (a common sample rate for PC sound cards). Hence, over time, a sequence of spectra is produced. These spectra can be used for:

- Direct data analysis, for example tonal analysis of music.
- Data compression: most audio data formats, such as MP3, are based on the principle of discarding frequency components with relatively low magnitude.
- Filtering: unwanted frequencies can be zeroed, and the spectrum can then be transformed back to the time domain.
- Feature extraction for pattern recognition, see below.

The well-known Nyquist theorem states that the FFT can only produce a usable spectrum from 0 Hz up to half the sample frequency (the sample frequency being the inverse of the time between samples). For example, if a signal is sampled at 48 kHz, only frequencies up to 24 kHz can be reliably read out of the spectrum. This is a theoretical limit that no other algorithm can overcome.

Another limitation of the FFT is that it introduces errors known as spectral leakage into the spectrum, originating from the discontinuities at the start and end of the input array. One way to remedy this problem is to apply a window function to the input array before calculating the FFT.

Tools: Matlab, Octave.
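A minimal sketch of this pipeline (Python with NumPy, as an illustration): one 20 ms frame of a 440 Hz tone is windowed, transformed, and the dominant frequency is read out of the spectrum.

    import numpy as np

    fs = 48000                            # 48 kHz sample rate
    n = 960                               # one 20 ms frame at 48 kHz
    t = np.arange(n) / fs
    frame = np.sin(2 * np.pi * 440 * t)   # a clean 440 Hz sine

    # A Hann window reduces spectral leakage from the discontinuities
    # at the frame edges.
    windowed = frame * np.hanning(n)

    spectrum = np.abs(np.fft.rfft(windowed))   # magnitudes, 0 Hz .. fs/2
    freqs = np.fft.rfftfreq(n, d=1 / fs)       # frequency of each bin

    # Prints 450.0: the bin nearest 440 Hz, since the bin resolution is
    # fs/n = 50 Hz. No bin lies above 24 kHz (the Nyquist limit).
    print(freqs[np.argmax(spectrum)])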

Feature Extraction

Usage: For extracting the desired information from a signal for use with machine learning algorithms.

Description: Feature extraction generally refers to preprocessing raw input data to make it suitable for machine learning. In particular, redundant data should be removed, and the data should be converted into quantities that relate as closely as possible to the relevant characteristics, or features, one is really looking for. For example, to classify bird vocalizations, the raw audio data should be converted to the frequency domain using the FFT, since different birds are intuitively more easily distinguished by their spectra than by the raw audio data. As another example, to detect anxiety, the pulse in beats per minute should be extracted from the raw ECG sensor data.

The choice and design of the feature extraction algorithms is typically a combination of determining relevant features based on domain knowledge and mathematically analyzing the data to further remove redundancy. For the latter, a commonly used technique is principal component analysis (PCA), described below. For the former, the determination of relevant features, one must partly rely on intuition, experience and domain knowledge, as well as trial and error. Here are a few commonly used features and feature extraction techniques for different application domains:

- Audio: zero crossing rate, signal energy, FFT, spectral flux, mel-frequency cepstral coefficients (MFCC).
- Images and video: edge detection, blob detection, motion detection.
- Text: word frequencies.

In addition, it is common to use the means and variances of the features over time frames within which their values are not expected to change much, rather than using all the data at a higher sample rate. For example, for bird vocalization recognition, the audio features can be expected to be stationary within approximately 100 ms, so it suffices to use the mean and variance of each feature over each 100 ms time frame.

The output of the feature extraction, and the input to the machine learning algorithm, is a feature vector for each time frame: a vector of values that captures the essential information for the time frame. For bird vocalization recognition, the feature vector could be the means and variances of each of the zero crossing rate, the signal energy, the spectral flux and 13 MFC coefficients, a total of 32 values.

Tools: Matlab, Octave.
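As a hedged sketch (Python with NumPy; the frame length and the two features are the ones named above, the sample rate is an assumption), computing a per-frame zero crossing rate and signal energy:

    import numpy as np

    fs = 48000                    # sample rate (assumed)
    frame_len = int(0.1 * fs)     # 100 ms frames, as discussed above

    def extract_features(audio):
        """Return one feature vector (ZCR, energy) per 100 ms frame."""
        n_frames = len(audio) // frame_len
        features = []
        for i in range(n_frames):
            frame = audio[i * frame_len:(i + 1) * frame_len]
            # sign changes between neighbors, counted and averaged
            zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2
            energy = np.mean(frame ** 2)
            features.append([zcr, energy])
        return np.array(features)   # shape: (n_frames, 2)

    # One second of stand-in audio yields 10 feature vectors.
    audio = np.random.randn(fs)
    print(extract_features(audio).shape)   # (10, 2)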

Machine Learning

In many applications, one wants to categorize some input data into classes, or to estimate some quantity from it, but the relationship between the input and the class or quantity may be too complex to determine directly. On the other hand, it may still be possible to provide training data that contains known instances of the desired classification or estimation. For example, it may be possible to collect a number of sound files with bird vocalizations and have an expert label them with the correct bird species, or to measure the friction of a road mechanically while collecting optical sensor data from the same spot at the same time.

In these situations, a machine learning algorithm can be used to solve the classification or estimation problem. Machine learning refers to the use of systems that can learn from training data to perform a specific task. Machine learning algorithms are all fixed, generic algorithms governed by some parameters, and training refers to tuning these parameters using the training data, so that the algorithm subsequently performs the classification or estimation as well as possible.

It is important to understand that although machine learning is often associated with artificial intelligence, these algorithms are not intelligent in the sense that they can take any kind of input and associated desired output and find the best relationship with no further human guidance. The human system designer must still put in considerable intelligence, knowledge and experience by ensuring that the training data is sufficient (data collection), that the input is appropriately preprocessed (feature extraction), that the right algorithm is chosen for the application (model selection), and that the performance of the algorithm is properly evaluated once it has been implemented and trained (validation).

In this section, some of the most commonly used machine learning algorithms are presented, primarily with a focus on the kinds of applications they are best suited for. Wherever possible, the algorithms have been categorized according to the following distinctions:

- Supervised/Unsupervised: A supervised algorithm uses training data that is labeled with the class or quantity to predict. An unsupervised algorithm finds classes or numeric relationships from unlabeled data.
- Discriminative/Generative: A discriminative algorithm only provides the most probable class or quantity for a data vector. A generative algorithm provides probabilities (or some other kind of score) for all possible classes or values, allowing the application to assess the confidence of the prediction.
- Static/Dynamic: A static algorithm works with single data vectors. A dynamic algorithm also considers the sequential structure of data that arrive in a sequence, typically over time.
- Linear/Non-linear: A linear algorithm only finds linear relationships between the data vectors and the class or quantity to predict. A non-linear algorithm finds more general, for example curved, relationships.
- Classification/Regression: A classification algorithm is used to find or predict classes from data vectors. A regression algorithm is used to find numeric relationships.

However, it should be noted that many algorithms can be extended to provide more functionality than their category indicates.

Principal Component Analysis (PCA)

Usage: PCA has several common uses, including:

- Dimensionality reduction, that is, removing features that provide little or no extra information, so that machine learning becomes more efficient.
- Visualizing the distribution of a data set with many dimensions, for example to aid in selecting features or to assess the separation of the classes in the feature space.
- Unsupervised regression, finding linear functions to latent variables that explain the observed data.

Category: Unsupervised, discriminative, static, linear regression.

Description: Principal component analysis (PCA) transforms the individual components of data vectors into another set of components, called the principal components, which are linearly uncorrelated. This way, if some components were highly correlated in the original data, they are essentially transformed into one component that captures the information of all these original components together, plus some other components with very low information content.

In the anxiety recognition example, assume that the data vectors contain measurements of heart rate, muscle tension and perspiration. We expect these three components to be correlated, such that if one component were plotted against another (e.g. heart rate against muscle tension), the points would lie close to a diagonal line in the graph. After PCA, the first data component of each vector describes a combined heart rate - muscle tension - perspiration feature, and the other two components only exhibit the deviations from this relationship. When plotting the transformed data, the points would lie close to the X axis of the graph. In fact, what PCA does is rotate the feature space so that such diagonal correlation lines (or, in general, hyperplanes) are aligned with the base axes of the transformed space.

Concretely, given original data consisting of n data points with m components each, PCA typically yields the following mathematical objects:

- An n×m matrix with the transformed data. In a machine learning context, the transformed data is commonly used to visualize the data set in a two-dimensional graph by using only the first two principal components and ignoring the rest. This corresponds to projecting the higher-dimensional data onto the two-dimensional plane with the highest possible variance in the projected data set. If each data point is colored according to a class (which is known if it is training data for a classifier), one can get an indication of how the classes are distributed in the feature space.
- An m×m transformation matrix. The PCA transformation of a single original data vector can be computed by multiplying it with the transformation matrix. This can be used for dimensionality reduction: after PCA transformation, the last components of the vector are removed, since they presumably add little extra information. In the anxiety recognition example, the three-component original vector could be reduced to a single number, the first principal component. The transformation can also be seen as the result of unsupervised regression revealing latent variables: in the anxiety recognition example, the first principal component can be interpreted as stress, and the transformation matrix provides a linear function for computing stress directly from the data vectors.
- m eigenvalues. The eigenvalues correspond to the variances of the principal components and can be used to assess how many components can be removed in the dimensionality reduction. For example, the eigenvalues in the anxiety recognition example could be 1.2, 0.3 and 0.2, indicating that the first component has considerably higher variance (and hence information content) than the other two.

In practice, the original data vectors should be centered and scaled before performing PCA. This is typically done by subtracting the mean and then dividing by the standard deviation of each component.

Tools: Matlab (Statistics Toolbox), Octave, Weka, R, Alglib.
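A small sketch of this workflow (Python with NumPy and scikit-learn, as an illustration; the three "biosensor" features are simulated, driven by a latent stress variable as in the example above):

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)

    # Simulated anxiety data: a latent "stress" level drives heart rate,
    # muscle tension and perspiration, plus independent noise.
    stress = rng.normal(size=200)
    X = np.column_stack([
        70 + 10 * stress + rng.normal(scale=2.0, size=200),   # heart rate
        5 + 2 * stress + rng.normal(scale=0.5, size=200),     # muscle tension
        1 + 0.5 * stress + rng.normal(scale=0.2, size=200),   # perspiration
    ])

    # Center and scale each component, then fit the transformation.
    X_std = StandardScaler().fit_transform(X)
    pca = PCA(n_components=3).fit(X_std)

    print(pca.explained_variance_)   # the m eigenvalues
    print(pca.components_)           # rows of the transformation matrix

    # Dimensionality reduction: keep only the first ("stress") component.
    X_reduced = pca.transform(X_std)[:, :1]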

Neural Networks

Usage: For all kinds of classification and regression.

Category: All.

Description: An artificial neural network (or just neural network) is a simple mathematical model simulating the vast network of interconnected neurons in the human brain. It consists of several layers of processing units ("neurons") connected to each other, processing input data into some output. For example, in a system of three layers, the first layer holds the input neurons, which send data via connections to the second, hidden layer of neurons, which processes the data and sends the result through further connections to the third layer of output neurons. More complex systems have more layers of neurons.

Each neuron processes the input it receives, typically by applying some function to the (weighted) signal from each incoming connection, summing the results and then applying a so-called activation function to the sum. The activation function is often a sigmoid ("S-shaped") function, which for most inputs produces an essentially binary activation level as output: either the neuron fires or it does not. The weights on the connections are usually identified through a learning phase; various learning algorithms can be used, e.g. genetic programming. Thus, a neural network is defined by these three parameters:

1. The layer and interconnection structure
2. The connection weights and/or the learning algorithm for updating them
3. The activation function

Making these decisions requires some understanding of the underlying theory of neural networks, and often also a certain amount of experimentation. However, once a good structure and a good set of weights have been found, a neural network may prove to be quite robust.

Tools: Matlab (Neural Networks Toolbox), Weka, NEAT.
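A minimal sketch of the three-layer setup (Python with scikit-learn; the data, layer size and activation are illustrative assumptions, standing in for, e.g., the optical surface measurements of case C4):

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    # Toy data: 500 labeled feature vectors in 3 classes.
    X, y = make_classification(n_samples=500, n_features=8, n_classes=3,
                               n_informative=5, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Input layer, one hidden layer of 16 neurons with a sigmoid
    # ("logistic") activation, and an output layer; the connection
    # weights are identified in the learning phase by fit().
    net = MLPClassifier(hidden_layer_sizes=(16,), activation="logistic",
                        max_iter=2000, random_state=0)
    net.fit(X_train, y_train)
    print(net.score(X_test, y_test))   # accuracy on held-out data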

Support Vector Machines (SVM)

Usage: Primarily for discriminative classification, but can also be extended to generative classification and regression.

Category: Supervised, discriminative, static, non-linear classification.

Description: The support vector machine (SVM) has in a short time become one of the most widely used classifiers. It is easy to configure, fast to train, and it performs relatively well with little training data. SVMs are based on the maximal margin principle: they find a separator between the classes in the feature space such that the distance between the separator and the closest training data is maximized. The basic SVM simply finds a single linear hyperplane that separates two classes with maximal margin. However, for practical use, this basic SVM is extended as follows:

- A non-linear kernel function replaces the dot product used in the basic SVM. By some mathematical magic, this allows for finding a non-linear, curved separator, which is necessary in all non-trivial cases. There are several kernel functions to choose between; a commonly used one is the radial basis function, which is governed by a parameter often called gamma.
- For cases where the training data of different classes are mixed together such that no perfect separator can be found, slack variables are introduced to allow some data to lie on the wrong side of the separator. This slack is governed by a parameter often called C.
- In order to handle more than two classes, a decision scheme based on multiple binary (two-class) SVMs is used. A commonly used approach is one-versus-the-rest, where for each class a binary SVM is trained with data from that class versus data from all the other classes. When classifying an unseen test instance, the winning class is the one whose binary SVM most confidently indicates membership of the class. This is possible because a binary SVM actually returns a number whose sign (positive or negative) indicates the class, while its absolute value indicates the confidence. For example, for three classes A, B and C, three binary SVMs are trained: A versus B and C, B versus A and C, and C versus A and B. If for a test instance the first SVM returns 0.05, the second 0.8 and the third -0.25, then the instance is classified as B.

With the above extensions, the gamma and C parameters must be given to the SVM training algorithm. In practice, it is common to simply train the SVM multiple times with different values of gamma and C to see which values yield the best accuracy. Many software packages perform this search for gamma and C automatically, which means that the classifier can be trained with no configuration whatsoever. However, it is important to understand that these values affect how well the classifier will perform on unseen data in operation. In particular, too high a gamma value gives an overfit SVM that will not perform well on unseen data that differs only a little from the training data, and too high a C value makes the classifier sensitive to noise and errors in the training data.

Tools: Weka, Libsvm.
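The search over gamma and C described above might look like this minimal sketch (Python with scikit-learn; the data and the parameter grid are assumptions):

    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    # Toy data: 300 labeled feature vectors in 3 classes.
    X, y = make_classification(n_samples=300, n_features=6, n_classes=3,
                               n_informative=4, random_state=0)

    # An RBF-kernel SVM; scikit-learn handles more than two classes
    # internally with a scheme of multiple binary SVMs.
    grid = GridSearchCV(
        SVC(kernel="rbf"),
        param_grid={"gamma": [0.01, 0.1, 1, 10], "C": [0.1, 1, 10, 100]},
        cv=5,   # cross-validated accuracy for each (gamma, C) pair
    )
    grid.fit(X, y)
    print(grid.best_params_, grid.best_score_)

Note that this selects gamma and C by cross-validated accuracy, which guards against the overfitting that a too high gamma or C would otherwise cause.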

K-Means Clustering

Usage: For finding classes in a data set based on clustering in the feature space.

Category: Unsupervised, discriminative, static, non-linear classification.

Description: With k-means clustering, one attempts to divide n data points into k clusters such that each data point belongs to the cluster with the nearest mean. The data points can be general points in a multidimensional space. The problem is computationally difficult (NP-hard), but there are efficient heuristics for solving it. The most common is Lloyd's algorithm, devised by Stuart Lloyd in 1957:

1. Choose k initial cluster mean points, c_1 to c_k.
2. Create k clusters such that a data point x belongs to cluster i if c_i is the closest cluster mean point to x.
3. Calculate the new mean of each cluster.
4. Repeat steps 2-3 until the clusters no longer change.

There are several ways of performing step 1. One, the Forgy method, is simply to choose a random set of k points from the data and use those as the initial cluster means. Another, random partition, is to group the data points randomly into k clusters and proceed to step 3 above.

K-means clustering is fast to compute, but it can be a drawback that the number of clusters, k, must be given as input to the algorithm. Using a wrong number of clusters may lead to poor results, so one should do some diagnostic calculation first to estimate k.

Tools: Matlab, Weka, Spectral Python.
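A minimal sketch of Lloyd's algorithm with Forgy initialization (Python with NumPy; in practice a library implementation such as the tools above would be used, and this sketch does not guard against a cluster becoming empty):

    import numpy as np

    def kmeans(X, k, rng=np.random.default_rng(0)):
        """Lloyd's algorithm: X is (n_points, n_dims), k the cluster count."""
        # Step 1 (Forgy): pick k random data points as initial means.
        means = X[rng.choice(len(X), size=k, replace=False)]
        while True:
            # Step 2: assign each point to the nearest cluster mean.
            dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
            labels = dists.argmin(axis=1)
            # Step 3: recompute each cluster's mean.
            new_means = np.array([X[labels == i].mean(axis=0)
                                  for i in range(k)])
            # Step 4: stop when the clusters no longer change.
            if np.allclose(new_means, means):
                return labels, means
            means = new_means

    # Example: two well-separated clusters in two dimensions.
    X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
    labels, means = kmeans(X, k=2)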

Hidden Markov Models (HMM)

Usage: For recognition of sequential patterns.

Category: Unsupervised and supervised, generative, dynamic classification.

Description: The classification algorithms described so far classify individual data vectors. However, in many applications the data vectors arrive in sequences, typically over time, and it may be necessary to take the sequential development itself into account for accurate recognition. This is the case in speech recognition, where complete sequences of sound data must be considered to recognize different words. Similarly, the sequential development of a bird vocalization contains relevant information for classifying the bird species.

A hidden Markov model (HMM) is a probabilistic model of a sequential pattern that can be used to evaluate how well an unseen data sequence matches the pattern. For each pattern to recognize, an HMM is produced from training data. An unseen sequence is classified by evaluating it against all HMMs and choosing the one with the best match. For bird classification, there would be at least one HMM per bird species, possibly several if the same species has several different vocalizations.

HMMs assume that the sequence can be modeled in terms of states that implicitly account for the progress of the sequence. For example, in speech recognition the states typically correspond to the phonemes of the word, so the model for the spoken word "Alexandra" has ten states: A-L-E-K-S-A-N-D-R-A. Given the set of states for a sequence, the HMM concretely specifies:

- The transition probabilities between the states, as an n×n matrix, where n is the number of states. They express the tendency of the pattern to stay in a given state for a while or to move on to another one. For example, if the average "Alexandra" pattern stays at the first A for some samples before moving on to the L, the transition probability from the first state (A) to itself could be 0.95, while it is 0.05 to the second state (L) and 0 to all other states.
- The emission probabilities, which express the probability of data elements from the sequence in a given state. The mathematical form depends on the data elements, but for ordinary vectors of real numbers, the emission probabilities are typically expressed as a mean and variance for each component. For example, if the speech recognizer works with vectors of 15 sound features per sample, then the emission probabilities in state A would be expressed with 15 mean values, corresponding to the average "a" phoneme, and some variances expressing how far from the average each sound feature can be and still sound like an "a".

Unfortunately, several software packages (including Matlab) only allow the data elements of the sequence to be single integers in a finite interval, e.g. from 1 to 20. The emission probabilities are then represented by an n×m matrix, where n is the number of states and m is the length of the finite interval. In reality, however, the sequences to analyze typically consist of vectors of real numbers, not single finite integers. There are two ways to solve this problem:

- Perform vector quantization into classes. If the classes are known, an ordinary classification algorithm, e.g. SVM, can be used: for example, the speech data vectors can be classified into phonemes, so that the sequence of feature vectors is turned into a sequence of phoneme classes before it is analyzed with the HMM. If the classes are not known, a clustering algorithm such as k-means can be used to enforce a quantization.
- Choose another software package.

Training an HMM amounts to finding the transition and emission probabilities that match a given sequence or set of sequences. This can be done efficiently with, e.g., the Baum-Welch algorithm. However, it only finds local optima, and one must provide an initial guess of the probabilities from which the algorithm can start iterating. The best solution will only be found if the initial guess is close enough to it. If possible, one should try to construct an HMM manually based on what is known about the pattern and use this as the initial guess; otherwise, uniformly distributed values can be used.

Given an HMM and an unseen test sequence, the following can be evaluated efficiently:

- The probability of the test sequence given the model. This is typically computed with the forward algorithm and expresses how well the test sequence matches the modeled pattern.
- The most probable sequence of states generating the test sequence. This is typically computed with the Viterbi algorithm.

In general, HMMs are rather complicated to use, and simply training an HMM without proper configuration and data preparation will probably not yield useful results. Inexperienced users are advised to search the literature for the best way to use HMMs in their particular application. One should also consider alternative methods for handling sequential data, such as:

- Sliding windows, which essentially means collecting several data vectors from the sequence, e.g. 20 successive elements, into one large data vector and using this with a standard static classification algorithm.
- Feature extraction over sequences, where a tailor-made algorithm analyzes the raw data over a sequence, or part of a sequence, and extracts features that are then fed into a static classification algorithm.

Tools: Matlab (Statistics Toolbox), Hidden Markov Model Toolbox for Matlab, jahmm.
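A hedged sketch of the classify-by-best-match scheme (Python with the third-party hmmlearn package, which supports real-valued vector emissions; all data here is simulated stand-in material, not real bird or speech features):

    import numpy as np
    from hmmlearn.hmm import GaussianHMM

    rng = np.random.default_rng(0)

    # Simulated training sequences of 4-dimensional feature vectors for
    # two "species"; real data would come from feature extraction.
    train_a = rng.normal(loc=0.0, size=(200, 4))
    train_b = rng.normal(loc=2.0, size=(200, 4))

    # One HMM per pattern, with Gaussian (mean/variance) emissions;
    # fit() estimates transition and emission probabilities (Baum-Welch).
    hmm_a = GaussianHMM(n_components=3, n_iter=50).fit(train_a)
    hmm_b = GaussianHMM(n_components=3, n_iter=50).fit(train_b)

    # Classify an unseen sequence by the model with the best match, i.e.
    # the highest log-likelihood from the forward algorithm (score()).
    test = rng.normal(loc=2.0, size=(50, 4))
    scores = {"A": hmm_a.score(test), "B": hmm_b.score(test)}
    print(max(scores, key=scores.get))   # -> "B"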

This publication is funded by the performance contract "Massive Data processing, visualization and interpretation".

Detecting Harmful Hand Behaviors with Machine Learning from Wearable Motion Sensor Data Detecting Harmful Hand Behaviors with Machine Learning from Wearable Motion Sensor Data Lingfeng Zhang and Philip K. Chan Florida Institute of Technology, Melbourne, FL 32901 lingfeng2013@my.fit.edu, pkc@cs.fit.edu

More information

Unsupervised Learning

Unsupervised Learning Unsupervised Learning Learning without Class Labels (or correct outputs) Density Estimation Learn P(X) given training data for X Clustering Partition data into clusters Dimensionality Reduction Discover

More information

Unsupervised learning in Vision

Unsupervised learning in Vision Chapter 7 Unsupervised learning in Vision The fields of Computer Vision and Machine Learning complement each other in a very natural way: the aim of the former is to extract useful information from visual

More information

More Learning. Ensembles Bayes Rule Neural Nets K-means Clustering EM Clustering WEKA

More Learning. Ensembles Bayes Rule Neural Nets K-means Clustering EM Clustering WEKA More Learning Ensembles Bayes Rule Neural Nets K-means Clustering EM Clustering WEKA 1 Ensembles An ensemble is a set of classifiers whose combined results give the final decision. test feature vector

More information

Sparse and large-scale learning with heterogeneous data

Sparse and large-scale learning with heterogeneous data Sparse and large-scale learning with heterogeneous data February 15, 2007 Gert Lanckriet (gert@ece.ucsd.edu) IEEE-SDCIS In this talk Statistical machine learning Techniques: roots in classical statistics

More information

Online Pose Classification and Walking Speed Estimation using Handheld Devices

Online Pose Classification and Walking Speed Estimation using Handheld Devices Online Pose Classification and Walking Speed Estimation using Handheld Devices Jun-geun Park MIT CSAIL Joint work with: Ami Patel (MIT EECS), Jonathan Ledlie (Nokia Research), Dorothy Curtis (MIT CSAIL),

More information

TWO-STEP SEMI-SUPERVISED APPROACH FOR MUSIC STRUCTURAL CLASSIFICATION. Prateek Verma, Yang-Kai Lin, Li-Fan Yu. Stanford University

TWO-STEP SEMI-SUPERVISED APPROACH FOR MUSIC STRUCTURAL CLASSIFICATION. Prateek Verma, Yang-Kai Lin, Li-Fan Yu. Stanford University TWO-STEP SEMI-SUPERVISED APPROACH FOR MUSIC STRUCTURAL CLASSIFICATION Prateek Verma, Yang-Kai Lin, Li-Fan Yu Stanford University ABSTRACT Structural segmentation involves finding hoogeneous sections appearing

More information

The Curse of Dimensionality

The Curse of Dimensionality The Curse of Dimensionality ACAS 2002 p1/66 Curse of Dimensionality The basic idea of the curse of dimensionality is that high dimensional data is difficult to work with for several reasons: Adding more

More information

PARAMETER OPTIMIZATION FOR AUTOMATED SIGNAL ANALYSIS FOR CONDITION MONITORING OF AIRCRAFT SYSTEMS. Mike Gerdes 1, Dieter Scholz 1

PARAMETER OPTIMIZATION FOR AUTOMATED SIGNAL ANALYSIS FOR CONDITION MONITORING OF AIRCRAFT SYSTEMS. Mike Gerdes 1, Dieter Scholz 1 AST 2011 Workshop on Aviation System Technology PARAMETER OPTIMIZATION FOR AUTOMATED SIGNAL ANALYSIS FOR CONDITION MONITORING OF AIRCRAFT SYSTEMS Mike Gerdes 1, Dieter Scholz 1 1 Aero - Aircraft Design

More information

Programming-By-Example Gesture Recognition Kevin Gabayan, Steven Lansel December 15, 2006

Programming-By-Example Gesture Recognition Kevin Gabayan, Steven Lansel December 15, 2006 Programming-By-Example Gesture Recognition Kevin Gabayan, Steven Lansel December 15, 6 Abstract Machine learning and hardware improvements to a programming-by-example rapid prototyping system are proposed.

More information

CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION

CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION 6.1 INTRODUCTION Fuzzy logic based computational techniques are becoming increasingly important in the medical image analysis arena. The significant

More information

Data Mining. Neural Networks

Data Mining. Neural Networks Data Mining Neural Networks Goals for this Unit Basic understanding of Neural Networks and how they work Ability to use Neural Networks to solve real problems Understand when neural networks may be most

More information

GYROPHONE RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS. Yan Michalevsky (1), Gabi Nakibly (2) and Dan Boneh (1)

GYROPHONE RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS. Yan Michalevsky (1), Gabi Nakibly (2) and Dan Boneh (1) GYROPHONE RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS Yan Michalevsky (1), Gabi Nakibly (2) and Dan Boneh (1) (1) Stanford University (2) National Research and Simulation Center, Rafael Ltd. 0 MICROPHONE

More information

Classification Lecture Notes cse352. Neural Networks. Professor Anita Wasilewska

Classification Lecture Notes cse352. Neural Networks. Professor Anita Wasilewska Classification Lecture Notes cse352 Neural Networks Professor Anita Wasilewska Neural Networks Classification Introduction INPUT: classification data, i.e. it contains an classification (class) attribute

More information

Audio-Visual Speech Activity Detection

Audio-Visual Speech Activity Detection Institut für Technische Informatik und Kommunikationsnetze Semester Thesis at the Department of Information Technology and Electrical Engineering Audio-Visual Speech Activity Detection Salome Mannale Advisors:

More information

Repeating Segment Detection in Songs using Audio Fingerprint Matching

Repeating Segment Detection in Songs using Audio Fingerprint Matching Repeating Segment Detection in Songs using Audio Fingerprint Matching Regunathan Radhakrishnan and Wenyu Jiang Dolby Laboratories Inc, San Francisco, USA E-mail: regu.r@dolby.com Institute for Infocomm

More information

Network Traffic Measurements and Analysis

Network Traffic Measurements and Analysis DEIB - Politecnico di Milano Fall, 2017 Introduction Often, we have only a set of features x = x 1, x 2,, x n, but no associated response y. Therefore we are not interested in prediction nor classification,

More information

Machine Learning Classifiers and Boosting

Machine Learning Classifiers and Boosting Machine Learning Classifiers and Boosting Reading Ch 18.6-18.12, 20.1-20.3.2 Outline Different types of learning problems Different types of learning algorithms Supervised learning Decision trees Naïve

More information

Announcements. CS 188: Artificial Intelligence Spring Generative vs. Discriminative. Classification: Feature Vectors. Project 4: due Friday.

Announcements. CS 188: Artificial Intelligence Spring Generative vs. Discriminative. Classification: Feature Vectors. Project 4: due Friday. CS 188: Artificial Intelligence Spring 2011 Lecture 21: Perceptrons 4/13/2010 Announcements Project 4: due Friday. Final Contest: up and running! Project 5 out! Pieter Abbeel UC Berkeley Many slides adapted

More information

SOCIAL MEDIA MINING. Data Mining Essentials

SOCIAL MEDIA MINING. Data Mining Essentials SOCIAL MEDIA MINING Data Mining Essentials Dear instructors/users of these slides: Please feel free to include these slides in your own material, or modify them as you see fit. If you decide to incorporate

More information

3 Nonlinear Regression

3 Nonlinear Regression CSC 4 / CSC D / CSC C 3 Sometimes linear models are not sufficient to capture the real-world phenomena, and thus nonlinear models are necessary. In regression, all such models will have the same basic

More information

Distance Weighted Discrimination Method for Parkinson s for Automatic Classification of Rehabilitative Speech Treatment for Parkinson s Patients

Distance Weighted Discrimination Method for Parkinson s for Automatic Classification of Rehabilitative Speech Treatment for Parkinson s Patients Operations Research II Project Distance Weighted Discrimination Method for Parkinson s for Automatic Classification of Rehabilitative Speech Treatment for Parkinson s Patients Nicol Lo 1. Introduction

More information

Search Engines. Information Retrieval in Practice

Search Engines. Information Retrieval in Practice Search Engines Information Retrieval in Practice All slides Addison Wesley, 2008 Classification and Clustering Classification and clustering are classical pattern recognition / machine learning problems

More information

Vision Based Metal Spectral Analysis using

Vision Based Metal Spectral Analysis using 1/27 Vision Based Metal Spectral Analysis using Eranga Ukwatta Department of Electrical and Computer Engineering The University of Western Ontario May 25, 2009 2/27 Outline 1 Overview of Element Spectroscopy

More information

Lecture on Modeling Tools for Clustering & Regression

Lecture on Modeling Tools for Clustering & Regression Lecture on Modeling Tools for Clustering & Regression CS 590.21 Analysis and Modeling of Brain Networks Department of Computer Science University of Crete Data Clustering Overview Organizing data into

More information

COS 116 The Computational Universe Laboratory 4: Digital Sound and Music

COS 116 The Computational Universe Laboratory 4: Digital Sound and Music COS 116 The Computational Universe Laboratory 4: Digital Sound and Music In this lab you will learn about digital representations of sound and music, especially focusing on the role played by frequency

More information

Support Vector Machines

Support Vector Machines Support Vector Machines . Importance of SVM SVM is a discriminative method that brings together:. computational learning theory. previously known methods in linear discriminant functions 3. optimization

More information

Supervised vs. Unsupervised Learning. Supervised vs. Unsupervised Learning. Supervised vs. Unsupervised Learning. Supervised vs. Unsupervised Learning

Supervised vs. Unsupervised Learning. Supervised vs. Unsupervised Learning. Supervised vs. Unsupervised Learning. Supervised vs. Unsupervised Learning Overview T7 - SVM and s Christian Vögeli cvoegeli@inf.ethz.ch Supervised/ s Support Vector Machines Kernels Based on slides by P. Orbanz & J. Keuchel Task: Apply some machine learning method to data from

More information