Transductive Phoneme Classification Using Local Scaling And Confidence


2012 IEEE 27th Convention of Electrical and Electronics Engineers in Israel

Matan Orbach (Dept. of Electrical Engineering, Technion - Israel Institute of Technology, Haifa 32000, Israel; matanorb@tx.technion.ac.il) and Koby Crammer (Dept. of Electrical Engineering, Technion - Israel Institute of Technology, Haifa 32000, Israel; koby@ee.technion.ac.il)

Abstract — We apply a graph-based Transduction Algorithm with COnfidence, named TACO, to the task of phoneme classification. In recent work, TACO outperformed two state-of-the-art transductive learning algorithms on several natural language processing tasks. However, although TACO is a general-purpose algorithm, it has not yet been used for tasks in other domains, nor applied to graphs with millions of vertices. We show its effectiveness, as well as its scalability, by performing transductive phoneme classification on data from the TIMIT speech corpus. In addition, we experiment with two methods for graph construction, including local scaling, previously used for unsupervised clustering. Our results show that local scaling combined with TACO outperforms other combinations of graph construction methods and graph-based transductive algorithms.

I. INTRODUCTION

A key challenge in automatic speech recognition systems is transcribing acoustic signals with phonemes. For this task, much research has been dedicated to the design of supervised classifiers, taking as input a set of acoustic signals annotated with corresponding phonemes [1, 2]. However, such data may not always be available in the amounts required for achieving satisfying classification performance. Furthermore, speech signals alone can be easily obtained in a variety of languages and accents. For that reason, phoneme classification has recently attracted attention from researchers in the field of semi-supervised learning (SSL) and, specifically, graph-based methods. SSL algorithms take as input a set of labeled data and an additional, typically large, unlabeled dataset. In graph-based SSL, the learner assumes the existence of an undirected weighted graph consisting of both labeled and unlabeled examples. Each input example is associated with a vertex. An edge weight is a measure of similarity between the corresponding connected vertices. In the transductive setting, the goal of the learner is to label the unlabeled examples in the graph.

We focus on graph-based transduction for phoneme classification. Each example is a vector of features describing a frame of the acoustic signal in time. For measuring similarity, Euclidean distances can be used, then transformed into graph weights using a bandwidth-parametrized Gaussian kernel [3]. A similar approach [4] is to first decorrelate the input examples and then use a Gaussian kernel, without a bandwidth parameter, for forming graph weights. Such methods apply the same weight-generation method throughout the entire input graph. However, the density of input examples is likely to vary in input space. Therefore, we propose to form edge weights using a locally scaled Gaussian kernel, previously proposed for graph construction in unsupervised clustering [5].

Recently we introduced TACO, a graph-based transductive algorithm that was shown to perform well on several text classification tasks [6]. However, it has not yet been applied to tasks in other domains.
In addition to propagating labels, and unlike previous algorithms, TACO maintains additional confidence information, used for estimating the quality of each propagated label. This information is used to better propagate label information throughout the graph, discounting the effect of poorly estimated labels. In addition, TACO has been shown to adapt well to unbalanced data. This is an important property, since phonemes are inherently unbalanced. We use TACO as our transductive learning algorithm.

Previous work on transductive phoneme classification was typically performed in one of two possible settings. Alexandrescu & Kirchhoff [3, 7] first use a supervised classifier to output a soft phonetic labeling of the feature vectors. The graph is then constructed in label space, and a transductive graph-based algorithm is used as a second phase, smoothing the output labeling of the supervised classifier. In contrast, Subramanya & Bilmes [4] construct the graph directly from feature vectors representing the acoustic signals. Liu & Kirchhoff experiment in both settings [8], comparing phoneme classification performance using several graph-based learning algorithms and graph construction methods. We follow the latter setting, without performing a first phase using a supervised classifier.

II. GRAPH-BASED TRANSDUCTION

The input to a graph-based transductive learner consists of two sets. The first set $D_l = \{(x_i, y_i)\}_{i=1}^{n_l}$ contains examples $x_i$, each associated with a label from a given label set, $y_i \in L = \{1, \dots, m\}$. The second set $D_u = \{x_i\}_{i=n_l+1}^{n_l+n_u}$ contains additional unlabeled examples. We assume each example is embedded within a vector space, $x_i \in \mathbb{R}^d$. The goal of the learner is to assign a label $\hat{y}_i$ to each of the unlabeled examples in $D_u$. We denote the total number of input examples by $n = n_l + n_u$.

In phoneme classification, examples are feature vectors representing time frames of acoustic speech signals. The labeled input set $D_l$ contains the feature vectors of a set of labeled utterances $A_l = \{(a_s, p_s)\}$. Each utterance $a_s$ is a sequence of feature vectors, $a_s = (\dots, x_i, \dots)$, labeled with the corresponding phoneme sequence $p_s = (\dots, y_i, \dots)$. For simplicity, we assume the length of the acoustic signal $a_s$ is equal to the length of the corresponding phoneme sequence $p_s$. In practice, a phoneme sequence may contain consecutive occurrences of the same phoneme, representing a single spoken phoneme spanning more than one consecutive time frame. Similarly, the unlabeled input set $D_u$ contains the feature vectors of a set of unlabeled utterances $A_u = \{a_s\}$.

The first step in graph-based transduction is the construction of an undirected weighted graph $G = (V, E, W)$ from the input. Each input feature vector $x_i$ is associated with a vertex $v_i \in V$. The set of edges is $E = V \times V$. Edge weights are described by a symmetric matrix with non-negative elements, denoted $W \in \mathbb{R}^{n \times n}$. An edge weight $w_{i,j} \in W$ represents the strength of our belief that the predictions for vertices $v_i$ and $v_j$ should be similar. A large value of $w_{i,j}$ means these two predictions should be close. However, the opposite is not true: a small, or even zero, edge weight does not mean the predictions for $v_i$ and $v_j$ should be different. Rather, it states our lack of knowledge about the correct relationship between these two predictions. In practice, most edge weights are zero, and $W$ is sparse. We discuss several ways of setting the edge weights in Sec. III.

Prior knowledge about examples in the labeled input set is formulated by associating a prior labels vector $y_i \in \{0, 1\}^m$ with each vertex. For every vertex $v_i$ associated with a feature vector from the labeled input set $D_l$, our input contains the correct phoneme $p$, so we set $y_{i,p} = 1$ and all other entries to zero. For vertices $v_j$ associated with feature vectors from the unlabeled input set $D_u$, we set $y_j = \mathbf{0}$, the vector with all elements equal to zero. For simplicity, we assume the first $n_l$ vertices in $V$ are labeled vertices, associated with feature vectors from the labeled input set, and the last $n_u$ vertices in $V$ are unlabeled vertices, associated with feature vectors from the unlabeled input set. We denote by $\delta^l_i = \mathbb{1}[i \le n_l]$ the indicator of a vertex being labeled, that is, $\delta^l_i = 1$ iff the vertex $v_i$ is a labeled vertex.

III. GRAPH WEIGHTS

The choice of weights for the graph edges is of key importance to the overall performance of graph-based transductive algorithms. Typically, for phoneme classification, a distance measure $d(x_i, x_j)$ is used to calculate distances between pairs of feature vectors. The distances are then transformed to weights, representing similarity, using a Gaussian kernel

$$w_{i,j} = \exp\left(-\frac{[d(x_i, x_j)]^2}{a^2}\right) \quad (1)$$

where $a$ is a kernel bandwidth hyper-parameter. The quality of the generated weights is controlled by the choice of both the distance measure and the bandwidth hyper-parameter.
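As an illustration, the kernel of (1) takes only a few lines; the minimal sketch below (Python with NumPy, the language used for all examples here, with function names of our own choosing) assumes a precomputed pairwise-distance matrix:

```python
import numpy as np

def gaussian_weights(D, a):
    """Transform pairwise distances into similarities via Eq. (1):
    w_ij = exp(-[d(x_i, x_j)]^2 / a^2).

    D -- (n, n) matrix of pairwise distances d(x_i, x_j)
    a -- scalar kernel bandwidth hyper-parameter
    """
    return np.exp(-(D ** 2) / (a ** 2))
```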
Several methods have been previously proposed for setting the value of the bandwidth hyper-parameter $a$. In one approach [9], a gradient-descent-based method is used to select a per-dimension bandwidth parameter such that the output labeling has low entropy, and thus forms a confident labeling. Another approach [10] minimizes the leave-one-out prediction error on the labeled data points, also using a gradient-based algorithm. However, both gradient-based methods add considerable computational cost. A more computationally efficient approach [3, 7] utilizes a single bandwidth parameter. First, the average between-class distance $d_b$ and average within-class distance $d_w$ are computed:

$$d_b = \frac{1}{N_b} \sum_{y_i \neq y_j} d(x_i, x_j) \; ; \quad d_w = \frac{1}{N_w} \sum_{y_i = y_j} d(x_i, x_j)$$

where $N_b$ and $N_w$ are the respective counts of elements in each sum. Next, the bandwidth parameter is chosen such that two samples at distance $(d_b + d_w)/2$ have a similarity of $0.5$:

$$\exp\left(-\frac{[(d_b + d_w)/2]^2}{a^2}\right) = \frac{1}{2} \;\;\Rightarrow\;\; a = \frac{d_b + d_w}{2\sqrt{\ln 2}}. \quad (2)$$

The intuition behind this method is that two samples placed at the most ambiguous distance should also have an ambiguous similarity value. We refer to this method as global scaling.

Using a single bandwidth parameter, or even a set of bandwidth parameters, one per dimension, implies that the same notion of closeness is used throughout the entire graph. However, input data is likely to be denser in some areas than others, and also possibly denser for one or more specific labels. Therefore, we propose using local scaling [5]. For each vertex $v_i$ we maintain its own local bandwidth parameter $a_i$, and set its value according to the local neighbourhood of $v_i$. Using the local scaling parameters, we set the graph weights as

$$w_{i,j} = \exp\left(-\frac{[d(x_i, x_j)]^2}{a_i a_j}\right). \quad (3)$$

Various methods can be used for selecting the local scaling parameters. We follow Zelnik-Manor & Perona [5] and simply set

$$a_i = d(x_i, x_i^{(k)}), \quad (4)$$

where $x_i^{(k)}$ is the $k$th nearest neighbour of $x_i$.
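A minimal sketch of both scaling schemes, under the same assumptions as above (dense distance matrices; function names are ours): the global bandwidth follows (2), computed over labeled pairs, while the local weights follow (3)-(4).

```python
import numpy as np

def global_bandwidth(D_l, y):
    """Global-scaling bandwidth of Eq. (2).

    D_l -- (n_l, n_l) pairwise distances between labeled examples
    y   -- (n_l,) integer labels
    """
    same = y[:, None] == y[None, :]
    off_diag = ~np.eye(len(y), dtype=bool)
    d_w = D_l[same & off_diag].mean()   # average within-class distance
    d_b = D_l[~same].mean()             # average between-class distance
    return (d_b + d_w) / (2.0 * np.sqrt(np.log(2.0)))

def local_scaling_weights(D, k):
    """Locally scaled weights of Eqs. (3)-(4):
    a_i = d(x_i, x_i^(k)),  w_ij = exp(-d(x_i, x_j)^2 / (a_i * a_j))."""
    a = np.sort(D, axis=1)[:, k]        # column 0 holds the zero self-distance
    return np.exp(-(D ** 2) / np.outer(a, a))
```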

IV. TRANSDUCTION WITH CONFIDENCE

Recently we introduced TACO [6], a graph-based transductive algorithm, which we apply here to the task of phoneme classification. For completeness, we briefly describe TACO.

TACO maintains both first-order and second-order information for every vertex in the input graph. The first-order information is a vector of per-vertex label scores $\mu_i = [\mu_{i,1}, \dots, \mu_{i,m}] \in \mathbb{R}^m$. The larger the $r$th element $\mu_{i,r}$ is, the stronger the belief that the input $x_i$ associated with vertex $v_i$ belongs to class $r$. Prediction follows the common multiclass inference rule $\hat{y}_i = \arg\max_r \mu_{i,r}$. Typically, graph-based transductive algorithms maintain only first-order information [4, 9, 11]. TACO, however, maintains additional second-order confidence information, a per-vertex diagonal non-negative matrix $\Sigma_i \in \mathbb{R}^{m \times m}$, whose $r$th diagonal element is denoted $\sigma_{i,r}$. Each parameter $\sigma_{i,r}$ is associated with the uncertainty in the corresponding score parameter $\mu_{i,r}$: the lower the value of $\sigma_{i,r}$, the higher the confidence in the score value $\mu_{i,r}$.

TACO casts learning as minimizing the following unconstrained convex objective in the parameters $\{\mu_i, \Sigma_i\}_{i=1}^n$:

$$C = \frac{1}{4} \sum_{i,j=1}^{n} w_{i,j}\, (\mu_i - \mu_j)^\top \left(\Sigma_i + \Sigma_j\right)^{-1} (\mu_i - \mu_j) \quad (5)$$
$$\;+\; \alpha \sum_{i=1}^{n_l} (\mu_i - y_i)^\top \left(\Sigma_i + \gamma I\right)^{-1} (\mu_i - y_i) \quad (6)$$
$$\;+\; \beta \sum_{i=1}^{n} \left[\operatorname{Tr}(\Sigma_i) - \log\det \Sigma_i\right], \quad (7)$$

where $\alpha$, $\beta$ and $\gamma$ are hyper-parameters. The objective consists of three terms. The manifold term (5) promotes smoothness of the output labeling, requiring the scores of close vertices (large $w_{i,j}$) to be similar, unless the uncertainty in either vertex's predicted scores is high. The second term (6) requires the scores of labeled vertices to be close to their corresponding prior labels vector, again unless the uncertainty in the score parameters is high. The last term (7) regularizes the uncertainty parameters to be far from infinity and not close to zero.

An efficient iterative algorithm for minimizing the above objective was derived by Orbach & Crammer [6]. Let $\mu_{i,r}^{(t)}$ and $\sigma_{i,r}^{(t)}$ denote the score and uncertainty parameters maintained by the iterative algorithm at iteration $t$ for vertex $v_i$ and label $r$. Iterations are based on two update equations. First, a score value for a specific vertex and label, $\mu_{i,r}^{(t)}$, is updated using neighbouring score and uncertainty parameters from the previous iteration:

$$\mu_{i,r}^{(t)} = \frac{\sum_{j \neq i} \bar{w}_{i,j}^{(t)}\, \mu_{j,r}^{(t-1)} + c_i^{(t)}\, y_{i,r}}{\sum_{j \neq i} \bar{w}_{i,j}^{(t)} + c_i^{(t)}}, \quad (8)$$

where

$$\bar{w}_{i,j}^{(t)} = \frac{w_{i,j}}{\sigma_{i,r}^{(t-1)} + \sigma_{j,r}^{(t-1)}} \; ; \quad c_i^{(t)} = \frac{2\alpha\, \delta^l_i}{\sigma_{i,r}^{(t-1)} + \gamma}.$$

This update sets the score for label $r$ and vertex $v_i$ to be a weighted average of the neighbouring scores for label $r$ from the previous iteration. The weights $\bar{w}_{i,j}^{(t)}$ in (8) are based on the static graph weights $w_{i,j}$ and the dynamic uncertainty parameters. The second update step updates the uncertainty value for a particular vertex and label, $\sigma_{i,r}^{(t)}$, using the scores of neighbouring vertices from the previous iteration:

$$\sigma_{i,r}^{(t)} = \frac{\beta}{2} + 2\alpha\, g_{i,r}^{(t)}, \quad (9)$$

where

$$g_{i,r}^{(t)} = \sum_{j=1}^{n} w_{i,j} \left(\mu_{j,r}^{(t-1)} - \mu_{i,r}^{(t-1)}\right)^2 + \delta^l_i \left(\mu_{i,r}^{(t-1)} - y_{i,r}\right)^2.$$

Here, the uncertainty for label $r$ and vertex $v_i$ is monotonic in a quadratic measure of divergence between the previous score $\mu_{i,r}^{(t-1)}$ and the previous neighbouring scores $\{\mu_{j,r}^{(t-1)}\}$. The complete pseudocode for TACO is given in Fig. 1.

Fig. 1. The TACO algorithm for graph-based transduction.
Parameters: $\alpha > 0$, $\beta > 0$, $\gamma > 0$
Input: graph $G = (V, E, W)$ and, for every $v_i \in V$, a prior labeling $y_i$
Initialize: $t = 1$, $\mu_i^{(0)} = \mathbf{0}$ and $\Sigma_i^{(0)} = I$ for all $v_i \in V$
Repeat:
  For $v_i \in V$: compute $\mu_i^{(t)}$ from $\mu_j^{(t-1)}$ and $\Sigma_j^{(t-1)}$ using (8); compute $\Sigma_i^{(t)}$ from $\mu_j^{(t-1)}$ using (9)
  $t \leftarrow t + 1$
Until convergence
Output: score vectors $\mu_i^{(t)}$ and confidence matrices $\Sigma_i^{(t)}$.
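The following sketch runs the iterations of Fig. 1 with the updates (8)-(9) as reconstructed above. It is a dense, fixed-iteration illustration under our own conventions (all names ours), not the authors' reference implementation, which would exploit the sparsity of $W$ and test convergence instead of iterating a fixed number of times.

```python
import numpy as np

def taco(W, Y, labeled, alpha, beta, gamma, iters=30):
    """Sketch of TACO: jointly iterate score updates (8) and
    uncertainty updates (9) for every vertex and label.

    W       -- (n, n) symmetric non-negative weight matrix (zero diagonal)
    Y       -- (n, m) prior label vectors: one-hot rows for labeled vertices
    labeled -- (n,) boolean mask, the indicator delta_i^l
    """
    n, m = Y.shape
    mu = np.zeros((n, m))    # score vectors mu_i, initialized to 0
    sig = np.ones((n, m))    # diagonals of Sigma_i, initialized to I
    for _ in range(iters):
        mu_prev, sig_prev = mu.copy(), sig.copy()
        for r in range(m):
            s = sig_prev[:, r]
            # confidence-modulated weights: w_bar_ij = w_ij / (s_i + s_j)
            wbar = W / (s[:, None] + s[None, :])
            c = 2.0 * alpha * labeled / (s + gamma)
            mu[:, r] = (wbar @ mu_prev[:, r] + c * Y[:, r]) \
                       / np.maximum(wbar.sum(axis=1) + c, 1e-12)
            # quadratic divergence from neighbours and from the prior label
            diff = mu_prev[:, r][None, :] - mu_prev[:, r][:, None]
            g = (W * diff ** 2).sum(axis=1) \
                + labeled * (mu_prev[:, r] - Y[:, r]) ** 2
            sig[:, r] = beta / 2.0 + 2.0 * alpha * g   # Eq. (9)
    # predictions via the inference rule y_hat_i = arg max_r mu_{i,r}
    return mu.argmax(axis=1), mu, sig
```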
V. EXPERIMENTS

We evaluate the performance of TACO on the task of phoneme classification, along with two other state-of-the-art graph-based transductive algorithms: Modified Adsorption (MAD) [11] and Measure Propagation (MP) [4, 12].

Data: The TIMIT corpus contains speech signals manually annotated with frame-based phonetic transcriptions [13]. We use pre-processed data [14] partitioned into a training set of 3,696 utterances, a development set of 400 utterances and a test set of 192 utterances. We use a standard mapping of the 61 phonemes in TIMIT to a subset of 39 classes [15]. The data contains feature vectors consisting of 13 Mel-frequency cepstral coefficients along with their first and second derivatives (39 values). Structural information is incorporated by appending to each feature vector its three immediate predecessors and successors, so the final dimension of the input examples is 39 × 7 = 273.
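For concreteness, here is one way to build the 273-dimensional inputs from per-frame 39-dimensional vectors; the edge-padding convention (repeating the first and last frames) is our assumption, as the paper does not specify how utterance boundaries are handled.

```python
import numpy as np

def stack_context(frames, context=3):
    """Append each frame's 3 predecessors and 3 successors,
    yielding 39 * 7 = 273 dimensions per example.

    frames -- (T, 39) per-frame features of one utterance
    """
    T = len(frames)
    # repeat edge frames so boundary examples also get a full window
    padded = np.concatenate([np.repeat(frames[:1], context, axis=0),
                             frames,
                             np.repeat(frames[-1:], context, axis=0)])
    return np.concatenate([padded[t:t + T] for t in range(2 * context + 1)],
                          axis=1)
```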

Graph construction: From the input partition we construct two graphs. First, a development graph, including examples from the training and development sets, for a total of 4,096 utterances and roughly 1.2 million vertices. Second, a test graph, with examples from the training and test sets, containing 3,888 utterances and around 1.1 million vertices. For measuring distances in input space we use the Euclidean distance, $[d(x_i, x_j)]^2 = \|x_i - x_j\|_2^2$. We prune each graph by keeping, for each vertex, its $k$ nearest neighbours ($k$-NN), yielding a directed graph. Edge directions are then removed, resulting in an undirected graph in which a vertex's degree may be larger than $k$. We fix $k = 10$, as previously used by Subramanya & Bilmes [4]. We transform distances to similarities using two graph construction methods. For global scaling, we calculate the global bandwidth parameter using (1) and (2); the between- and within-class average distances are estimated by random sampling [3]. For local scaling, we select the local bandwidth parameters according to (4) and form edge weights using (3). The same value of $k$ used for nearest-neighbour graph construction is also used for selecting the local bandwidth parameters, so there is no additional computational cost. To conclude, we have four input graphs: (a) containing the training data together with either the development or the test data; (b) with weights formed using either global or local scaling.
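The pruning and symmetrization steps, combined with local scaling, can be sketched as below (dense and brute-force for clarity, all names ours; graphs with millions of vertices as used here would require an approximate nearest-neighbour index instead of a full distance matrix).

```python
import numpy as np

def knn_local_scaling_graph(X, k=10):
    """Build a sparse symmetric weight matrix: keep each vertex's
    k nearest neighbours, drop edge directions, and weight the
    surviving edges with Eqs. (3)-(4).

    X -- (n, d) input feature vectors
    """
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # Euclidean
    order = np.argsort(D, axis=1)             # column 0 is the vertex itself
    a = D[np.arange(n), order[:, k]]          # local bandwidths, Eq. (4)
    W = np.exp(-(D ** 2) / np.outer(a, a))    # locally scaled kernel, Eq. (3)
    keep = np.zeros((n, n), dtype=bool)
    rows = np.repeat(np.arange(n), k)
    keep[rows, order[:, 1:k + 1].ravel()] = True   # directed k-NN edges
    keep |= keep.T                      # remove direction: undirected graph
    return np.where(keep, W, 0.0)
```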
Setting: We select utterances for the labeled utterance set $A_l$ by randomly sampling utterances from the training set. The labeled input set $D_l$ contains the feature vectors composing the sampled utterances. This is a more realistic scenario than simply randomly sampling feature vectors without regard to their source. Utterances are sampled until a fraction $f$ of the feature vectors in the training set is labeled, under the constraint that each phoneme class is selected at least once. We use $f \in \{0.01, 0.05, 0.1, 0.2, 0.3, 0.5\}$. On the sampled labeled information we perform class-prior normalization, for both TACO and MAD [6].

The development graph is used for hyper-parameter tuning. We tune by performing a grid search over a predefined range of values for each algorithm. For TACO, $\alpha \in \{10^{-4}, 10^{-2}, 1, 10^2, 10^4\}$, $\beta \in \{10^{-4}, 10^{-2}, 1, 10^2\}$ and $\gamma \in \{1, 100\}$. For MP, $\nu \in \{10^{-8}, 10^{-6}, 10^{-4}, 0.01, 0.1\}$ and $\mu \in \{10^{-8}, 10^{-4}, 0.01, 0.1, 1, 10, 100\}$, fixing $\alpha = 1$; this is a superset of the range used before [4]. For MAD, $\mu_1 = 1$ and $\mu_2, \mu_3 \in \{10^{-8}, 10^{-4}, 0.01, 1, 10, 100, 1000\}$, following Talukdar & Crammer [11]. Performance is evaluated on the vertices that belong to the development set, and the optimal hyper-parameter combination is selected. Final evaluation is performed on the test graph: we repeat the described labeled-sampling procedure and set the hyper-parameters to the optimal values selected on the development graph. Performance is evaluated on the vertices belonging to the test set.

Results: We use two metrics to evaluate performance [4]: phone accuracy, computed using the Levenshtein distance, and frame accuracy, the percentage of frames classified correctly. For all results, the reported evaluation metric is the same as the metric used for hyper-parameter tuning.

Fig. 2. A comparison of phone accuracy for different amounts of supervision: (a) 56,692 test-set vertices from the test graph; (b) 20,448 development-set vertices from the development graph.

A comparison of phone accuracy on the test graph, for all evaluated combinations of algorithms and graph construction methods, is given in Fig. 2a. Local scaling for graph construction combined with TACO as the transductive algorithm outperforms all other combinations for all values of $f$. Results on the development graph in Fig. 2b are similar, with slightly higher absolute values. In Fig. 3 we use frame accuracy as the evaluation metric, and the results are similar. Comparing graph construction methods, both TACO and MP perform better on graphs constructed with local scaling. MAD performs better with local scaling when relatively small amounts of labeled training data are available.

Fig. 3. A comparison of frame accuracy for different amounts of supervision, on test-set vertices from the test graph.

Fig. 4. Change in phone accuracy when comparing local and global scaling (TACO, MP, MAD). Results are on test-set vertices from the test graph. Positive values indicate an increase in phone accuracy gained by using local scaling.

The performance gain attained by using local over global scaling is further illustrated in Fig. 4. For all algorithms, the most significant performance boost occurs when only 1% of the training data is labeled. The largest gain is for MP, improving by roughly 4.5% phone accuracy at 1% labeled data. As more data is labeled, the performance gap favouring local scaling decreases. For TACO, the gain decreases monotonically, from over 4% at 1% labeled data to just above 0.5% at 50% labeled data. A similar trend appears for MAD, which gains from local scaling only until a fraction of 20% of the training set is labeled; from that point on, local scaling has a negative effect on performance, and global scaling is better. This implies that local scaling is more beneficial when small amounts of labeled utterances are available.

VI. CONCLUSION

We have demonstrated the effectiveness and scalability of TACO, a recently introduced graph-based transductive algorithm, on the task of phoneme classification. TACO outperforms two other state-of-the-art algorithms, MAD and MP. In addition, we introduced local scaling as a graph construction method for transductive phoneme classification. Local scaling improves the input graph, thereby improving the phoneme classification accuracy of TACO. In future work we plan to modify current transduction algorithms to better use the sequential nature of acoustic utterances. We believe the use of such structured information may contribute an additional performance boost. We also plan to perform induction, allowing the labeling of previously unseen unlabeled utterances.

REFERENCES

[1] K. Crammer and D. D. Lee, "Online discriminative learning of phoneme recognition via collections of generalized linear models," ICASSP, 2012.
[2] A. Gunawardana, M. Mahajan, A. Acero, and J. C. Platt, "Hidden conditional random fields for phone classification," INTERSPEECH, 2005.
[3] A. Alexandrescu and K. Kirchhoff, "Graph-based learning for phonetic classification," ASRU, 2007.
[4] A. Subramanya and J. Bilmes, "Semi-supervised learning with measure propagation," JMLR, 2011.
[5] L. Zelnik-Manor and P. Perona, "Self-tuning spectral clustering," in NIPS, 2004.
[6] M. Orbach and K. Crammer, "Graph-based transduction with confidence," in ECML, 2012.
[7] K. Kirchhoff and A. Alexandrescu, "Phonetic classification using controlled random walks," INTERSPEECH, 2011.
[8] Y. Liu and K. Kirchhoff, "A comparison of graph construction and learning algorithms for graph-based phonetic classification," UWEE Technical Report, 2012.
[9] X. Zhu, Z. Ghahramani, and J. Lafferty, "Semi-supervised learning using gaussian fields and harmonic functions," in ICML, 2003.
[10] X. Zhang and W. S. Lee, "Hyperparameter learning for graph based semi-supervised learning algorithms," in NIPS, 2006.
[11] P. P. Talukdar and K. Crammer, "New regularized algorithms for transductive learning," in ECML, 2009.
[12] A. Subramanya and J. Bilmes, "Soft-supervised learning for text classification," in EMNLP, 2008.
[13] L. F. Lamel, R. H. Kassel, and S. Seneff, "Speech database development: design and analysis of the acoustic-phonetic corpus," Proceedings of the DARPA Speech Recognition Workshop, 1986.
[14] C. C. Cheng, F. Sha, and L. K. Saul, "A fast online algorithm for large margin training of continuous density hidden markov models," INTERSPEECH, 2009.
[15] K.-F. Lee and H.-W. Hon, "Speaker-independent phone recognition using hidden markov models," IEEE Transactions on Acoustics, Speech and Signal Processing, 1989.
