Transductive Phoneme Classification Using Local Scaling And Confidence
2012 IEEE 27th Convention of Electrical and Electronics Engineers in Israel

Matan Orbach
Dept. of Electrical Engineering, Technion - Israel Institute of Technology, Haifa 32000, Israel
matanorb@tx.technion.ac.il

Koby Crammer
Dept. of Electrical Engineering, Technion - Israel Institute of Technology, Haifa 32000, Israel
koby@ee.technion.ac.il

Abstract

We apply a graph-based Transduction Algorithm with COnfidence named TACO to the task of phoneme classification. In recent work, TACO outperformed two state-of-the-art transductive learning algorithms on several natural language processing tasks. However, although TACO is a general-purpose algorithm, it has not yet been used for tasks in other domains, nor applied to graphs with millions of vertices. We show its effectiveness, as well as its scalability, by performing transductive phoneme classification on data from the TIMIT speech corpus. In addition, we experiment with two methods for graph construction, including local scaling, previously used for unsupervised clustering. Our results show that local scaling combined with TACO outperforms other combinations of graph construction methods and graph-based transductive algorithms.

I. INTRODUCTION

A key challenge in automatic speech recognition systems is transcribing acoustic signals with phonemes. For this task, much research has been dedicated to the design of supervised classifiers, taking as input a set of acoustic signals annotated with corresponding phonemes [1,2]. However, such data may not always be available in the amounts required for achieving satisfactory classification performance. Furthermore, speech signals alone can be easily obtained in a variety of languages and accents. For that reason, phoneme classification has recently attracted attention from researchers in the field of semi-supervised learning (SSL) and, specifically, graph-based methods.
SSL algorithms take as input a set of labeled data and an additional, typically large, unlabeled dataset. In graph-based SSL, the learner assumes the existence of an undirected weighted graph consisting of both labeled and unlabeled examples. Each input example is associated with a vertex. An edge weight is a measure of similarity between the corresponding connected vertices. In the transductive setting, the goal of the learner is to label the unlabeled examples in the graph.

We focus on graph-based transduction for phoneme classification. Each example is a vector of features describing a frame of the acoustic signal in time. For measuring similarity, Euclidean distances can be used, then transformed into graph weights using a bandwidth-parametrized Gaussian kernel [3]. A similar approach [4] is to first decorrelate input examples and then use a Gaussian kernel, without a bandwidth parameter, for forming graph weights. Such methods apply the same weight generation method throughout the entire input graph. However, the density of input examples is likely to vary in input space. Therefore, we propose to form edge weights using a locally scaled Gaussian kernel, previously proposed for graph construction in unsupervised clustering [5].

Recently we introduced TACO, a graph-based transductive algorithm that was shown to perform well on several text classification tasks [6]. However, it has not yet been applied to tasks in other domains. In addition to propagating labels, and unlike previous algorithms, TACO maintains additional confidence information, used for estimating the quality of each propagated label. This information is used to better propagate label information throughout the graph, reducing the effect of poorly estimated labels. In addition, TACO has been shown to adapt well to unbalanced data. This is an important property since phonemes are inherently unbalanced. We use TACO as our transductive learning algorithm.
Previous work on transductive phoneme classification was typically performed in one of two possible settings. Alexandrescu & Kirchhoff [3,7] first use a supervised classifier to output a soft phonetic labeling of feature vectors. Then, the graph is constructed in the label space, and a transductive graph-based algorithm is used as a second phase, smoothing the output labeling of the supervised classifier. In contrast, Subramanya & Bilmes [4] construct the graph directly from feature vectors representing the acoustic signals. Liu & Kirchhoff experiment in both settings [8], comparing phoneme classification performance using several graph-based learning algorithms and graph construction methods. We follow the second setting, constructing the graph directly from feature vectors, without performing a first phase using a supervised classifier.

II. GRAPH-BASED TRANSDUCTION

The input to a graph-based transductive learner consists of two sets. The first set $D_l = \{(x_i, y_i)\}_{i=1}^{n_l}$ contains examples $x_i$, each associated with a label $y_i \in L = \{1, \ldots, m\}$ from a given label set. The second set $D_u = \{x_i\}_{i=n_l+1}^{n_l+n_u}$ contains additional unlabeled examples. We assume each example is embedded in a vector space, $x_i \in \mathbb{R}^d$. The goal of the learner is to assign a label $\hat{y}_i$ to each of the unlabeled
examples in $D_u$. We denote the total number of input examples by $n = n_l + n_u$.

In phoneme classification, examples are feature vectors representing time frames of acoustic speech signals. The labeled input set $D_l$ contains feature vectors representing a set of labeled utterances $A_l = \{(a_s, p_s)\}$. Each utterance $a_s$ is a sequence of feature vectors, $a_s = (\ldots, x_i, \ldots)$, labeled with the corresponding phoneme sequence $p_s = (\ldots, y_i, \ldots)$. For simplicity, we assume the length of the acoustic signal $a_s$ is equal to the length of the corresponding phoneme sequence $p_s$. In practice, a phoneme sequence may contain consecutive occurrences of the same phoneme to represent a single spoken phoneme spanning more than one consecutive time frame. Similarly, the unlabeled input set $D_u$ contains feature vectors corresponding to a set of unlabeled utterances $A_u = \{a_s\}$.

The first step in graph-based transduction is the construction of an undirected weighted graph $G = (V, E, W)$ from the input. Each input feature vector $x_i$ is associated with a vertex $v_i \in V$. The set of edges is $E = V \times V$. Edge weights are described by a symmetric matrix with non-negative elements, denoted $W \in \mathbb{R}^{n \times n}$. An edge weight $w_{i,j}$ represents the strength of our belief that predictions for vertices $v_i$ and $v_j$ should be similar. A large value of $w_{i,j}$ means these two predictions should be close. However, the converse is not true. Specifically, a small edge weight, or even zero, does not mean predictions for $v_i$ and $v_j$ should be different. Rather, it states our lack of knowledge about the correct relationship between these two predictions. In practice, most edge weights are zero, and $W$ is sparse. We discuss several ways of setting edge weights in Sec. III.

Prior knowledge about examples in the labeled input set is formulated by associating a prior label vector $y_i \in \{0, 1\}^m$ with each vertex.
For every vertex $v_i$ associated with a feature vector from the labeled input set $D_l$, our input contains the correct phoneme $p$, so we set $y_{i,p} = 1$ and all other entries to zero. For vertices $v_j$ associated with feature vectors from the unlabeled input set $D_u$, we set $y_j = 0$, the vector with all elements equal to zero. For simplicity, we assume the first $n_l$ vertices in $V$ are labeled vertices, associated with feature vectors from the labeled input set, and the last $n_u$ vertices in $V$ are unlabeled vertices, associated with feature vectors from the unlabeled input set. We denote by $\delta_i^l = [i \le n_l]$ the indicator of a vertex being labeled, that is, $\delta_i^l = 1$ iff the vertex $v_i$ is a labeled vertex.

III. GRAPH WEIGHTS

The choice of weights for the graph edges is of key importance to the overall performance of graph-based transductive algorithms. Typically, for phoneme classification, a distance measure $d(x_i, x_j)$ is used to calculate distances between pairs of feature vectors. Then, the distances are transformed into weights, representing similarity, using a Gaussian kernel

$$w_{i,j} = \exp\left(-\frac{[d(x_i, x_j)]^2}{a^2}\right),$$

where $a$ is a kernel bandwidth hyper-parameter. The quality of the generated weights is controlled by the choice of both the distance measure and the bandwidth hyper-parameter.

Several methods have been previously proposed for setting the value of the bandwidth hyper-parameter $a$. In one approach [9], a gradient-descent-based method is used to select a per-dimension bandwidth parameter, such that the output labeling has low entropy, and thus forms a confident labeling. Another approach [10] minimizes the leave-one-out prediction error on labeled data points, also using a gradient-based algorithm. However, both gradient-based methods add considerable computational cost. A more computationally efficient approach [3,7] utilizes a single bandwidth parameter.
First, the average between-class distance $d_b$ and the average within-class distance $d_w$ are computed:

$$d_b = \frac{1}{N_b} \sum_{y_i \neq y_j} d(x_i, x_j); \qquad d_w = \frac{1}{N_w} \sum_{y_i = y_j} d(x_i, x_j), \tag{1}$$

where $N_b$ and $N_w$ are the respective counts of elements in each sum. Next, the bandwidth parameter is chosen such that two samples at distance $(d_b + d_w)/2$ have a similarity of 0.5:

$$\exp\left(-\frac{[(d_b + d_w)/2]^2}{a^2}\right) = \frac{1}{2} \;\Rightarrow\; a = \frac{d_b + d_w}{2\sqrt{\ln 2}}. \tag{2}$$

The intuition behind this method is that two samples placed at the most ambiguous distance should also have an ambiguous similarity value. We refer to this method as global scaling.

Using a single bandwidth parameter, or even a set of bandwidth parameters, one per dimension, implies that the same notion of closeness is used throughout the entire graph. However, input data is likely to be denser in some areas than in others, and also possibly denser for one or more specific labels. Therefore, we propose using local scaling [5]. For each vertex $v_i$ we maintain its own local bandwidth parameter $a_i$, and set its value according to the local neighbourhood of $v_i$. Using the local scaling parameters we set the graph weights as

$$w_{i,j} = \exp\left(-\frac{[d(x_i, x_j)]^2}{a_i a_j}\right). \tag{3}$$

Various methods can be used for selecting the local scaling parameters. We follow Zelnik-Manor & Perona [5] and simply set

$$a_i = d(x_i, x_{k_i}), \tag{4}$$

where $x_{k_i}$ is the $k$th nearest neighbour of $x_i$.
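The two weighting schemes of Eqs. (1)-(4) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, it works on a dense distance matrix over a handful of points (the experiments later compute weights only for k-nearest-neighbour edges), and it averages over all labeled pairs exactly rather than by random sampling.

```python
import numpy as np

def pairwise_euclidean(X):
    """Dense matrix of Euclidean distances between the rows of X."""
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)

def global_bandwidth(D, y):
    """Eqs. (1)-(2): a = (d_b + d_w) / (2 sqrt(ln 2)), from labeled pairs."""
    iu = np.triu_indices(len(y), k=1)          # each unordered pair once
    same = (y[:, None] == y[None, :])[iu]
    d = D[iu]
    d_w = d[same].mean()                       # average within-class distance
    d_b = d[~same].mean()                      # average between-class distance
    return (d_b + d_w) / (2.0 * np.sqrt(np.log(2.0)))

def global_weights(D, a):
    """Gaussian kernel with one bandwidth a for the whole graph."""
    return np.exp(-(D ** 2) / a ** 2)

def local_weights(D, k):
    """Eqs. (3)-(4): a_i is the distance from x_i to its k-th nearest neighbour."""
    # Column 0 of each sorted row is the point itself (distance 0).
    # Assumption: no k+1 identical points, otherwise a_i would be 0.
    a = np.sort(D, axis=1)[:, k]
    return np.exp(-(D ** 2) / np.outer(a, a))

# Toy data: three points on a line, the first two sharing a label.
X = np.array([[0.0], [1.0], [3.0]])
y = np.array([0, 0, 1])
D = pairwise_euclidean(X)
a = global_bandwidth(D, y)
W_global = global_weights(D, a)
W_local = local_weights(D, k=1)
```

In the toy data the isolated third point gets a larger local bandwidth than the two close points, so its edges are not penalised as heavily as they would be under a single global bandwidth.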
IV. TRANSDUCTION WITH CONFIDENCE

Recently we introduced TACO [6], a graph-based transductive algorithm. We apply TACO to our task of phoneme classification. For completeness, we briefly describe TACO.

TACO maintains both first-order and second-order information for every vertex in the input graph. The first-order information consists of per-vertex label scores $\mu_i = [\mu_{i,1}, \ldots, \mu_{i,m}] \in \mathbb{R}^m$. The larger the $r$th element $\mu_{i,r}$ is, the stronger the belief that the input $x_i$ associated with vertex $v_i$ belongs to class $r$. Prediction is given according to the common multiclass inference rule $\hat{y}_i = \arg\max_r \mu_{i,r}$.

Typically, graph-based transductive algorithms maintain only first-order information [4,9,11]. However, TACO maintains additional second-order confidence information, a per-vertex diagonal non-negative matrix $\Sigma_i \in \mathbb{R}^{m \times m}$, where the $r$th diagonal element of $\Sigma_i$ is denoted by $\sigma_{i,r}$. Each parameter $\sigma_{i,r}$ is associated with the uncertainty in the corresponding score parameter $\mu_{i,r}$. The lower the value of $\sigma_{i,r}$ is, the higher the confidence in the score value $\mu_{i,r}$.

TACO casts learning as minimizing the following unconstrained convex objective in the parameters $\{\mu_i, \Sigma_i\}_{i=1}^{n}$:

$$C = \frac{1}{4} \sum_{i,j=1}^{n} w_{i,j} (\mu_i - \mu_j)^\top \left(\Sigma_i^{-1} + \Sigma_j^{-1}\right) (\mu_i - \mu_j) \tag{5}$$

$$\quad + \frac{1}{2} \sum_{i=1}^{n_l} (\mu_i - y_i)^\top \left(\Sigma_i^{-1} + \frac{1}{\gamma} I\right) (\mu_i - y_i) \tag{6}$$

$$\quad + \alpha \sum_{i=1}^{n} \mathrm{Tr}\,\Sigma_i - \beta \sum_{i=1}^{n} \log \det \Sigma_i, \tag{7}$$

where $\alpha$, $\beta$ and $\gamma$ are hyper-parameters. The objective consists of three terms. The manifold term (5) promotes smoothness of the output labeling, requiring scores for close vertices (large $w_{i,j}$) to be similar, unless uncertainty is high in either of the predicted scores. The second term (6) requires the scores for labeled vertices to be close to their corresponding prior label vectors, again unless the uncertainty in the score parameters is high. The last term (7) regularizes the uncertainty parameters to be far from infinity and not close to zero. An efficient iterative algorithm for minimizing the above objective was derived by Orbach & Crammer [6].
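To make the objective concrete, the sketch below evaluates $C$ for diagonal $\Sigma_i$ and performs one simultaneous sweep of closed-form updates obtained by setting the partial derivatives of $C$ with respect to $\mu_{i,r}$ and $\sigma_{i,r}$ to zero. It is an illustration under stated assumptions, not the reference implementation of [6]: the function names are hypothetical, `W` is assumed dense, symmetric and zero on the diagonal, every vertex is assumed to have at least one neighbour, and the $(n, n, m)$ intermediate arrays make it suitable only for toy graphs.

```python
import numpy as np

def taco_objective(W, Y, labeled, mu, sigma, alpha, beta, gamma):
    """Value of C when every Sigma_i is diagonal, stored row-wise in sigma (n, m)."""
    inv = 1.0 / sigma
    diff2 = (mu[:, None, :] - mu[None, :, :]) ** 2       # (n, n, m) squared score gaps
    pair_inv = inv[:, None, :] + inv[None, :, :]         # 1/sigma_ir + 1/sigma_jr
    manifold = 0.25 * np.sum(W[:, :, None] * diff2 * pair_inv)
    fit = 0.5 * np.sum(labeled[:, None] * (mu - Y) ** 2 * (inv + 1.0 / gamma))
    reg = alpha * sigma.sum() - beta * np.log(sigma).sum()
    return manifold + fit + reg

def taco_sweep(W, Y, labeled, mu, sigma, alpha, beta, gamma):
    """One simultaneous update of all scores and uncertainties from the previous iterate."""
    inv = 1.0 / sigma
    deg = W.sum(axis=1, keepdims=True)
    # Score step: weighted average of neighbouring scores and, for labeled
    # vertices, the prior label, with weights w_ij * (1/sigma_ir + 1/sigma_jr).
    c = labeled[:, None] * (inv + 1.0 / gamma)
    num = inv * (W @ mu) + W @ (mu * inv) + c * Y
    den = inv * deg + W @ inv + c
    new_mu = num / den
    # Uncertainty step: positive root of alpha*s^2 - beta*s - g/2 = 0,
    # where g measures disagreement with neighbours and with the prior label.
    diff2 = (mu[:, None, :] - mu[None, :, :]) ** 2
    g = np.sum(W[:, :, None] * diff2, axis=1) + labeled[:, None] * (mu - Y) ** 2
    new_sigma = (beta + np.sqrt(beta ** 2 + 2.0 * alpha * g)) / (2.0 * alpha)
    return new_mu, new_sigma

# Toy chain graph: vertex 0 labeled class 0, vertex 2 labeled class 1, vertex 1 unlabeled.
W = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
Y = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]])
labeled = np.array([1.0, 0.0, 1.0])
mu, sigma = np.zeros((3, 2)), np.ones((3, 2))
for _ in range(20):
    mu, sigma = taco_sweep(W, Y, labeled, mu, sigma, alpha=1.0, beta=1.0, gamma=1.0)
pred = mu.argmax(axis=1)   # multiclass inference rule: argmax over label scores
```

A practical implementation would exploit the sparsity of $W$ and iterate over edges instead of forming dense pairwise arrays.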
Let $\mu_{i,r}^{(t)}$ and $\sigma_{i,r}^{(t)}$ denote the score and uncertainty parameters maintained by the iterative algorithm at iteration $t$ for vertex $v_i$ and label $r$. Iterations are based on two update equations. The first updates the score value for a specific vertex and label, $\mu_{i,r}^{(t)}$, using neighbouring score and uncertainty parameters from the previous iteration, and is given by

$$\mu_{i,r}^{(t)} = \frac{\sum_{j=1}^{n} \tilde{w}_{i,j}^{(t-1)} \mu_{j,r}^{(t-1)} + c_{i,r}^{(t-1)} y_{i,r}}{\sum_{j=1}^{n} \tilde{w}_{i,j}^{(t-1)} + c_{i,r}^{(t-1)}}, \tag{8}$$

where

$$\tilde{w}_{i,j}^{(t-1)} = w_{i,j} \left( \frac{1}{\sigma_{i,r}^{(t-1)}} + \frac{1}{\sigma_{j,r}^{(t-1)}} \right); \qquad c_{i,r}^{(t-1)} = \delta_i^l \left( \frac{1}{\sigma_{i,r}^{(t-1)}} + \frac{1}{\gamma} \right).$$

This update sets the score for label $r$ and vertex $v_i$ to be a weighted average of neighbouring scores for label $r$ from the previous iteration. The weights $\tilde{w}_{i,j}^{(t-1)}$ in (8) are based on static graph weights $w_{i,j}$ and dynamic uncertainty parameters.

The second update step concerns updating the uncertainty value for a particular vertex and label, $\sigma_{i,r}^{(t)}$, using scores of neighbouring vertices from the previous iteration:

$$\sigma_{i,r}^{(t)} = \frac{\beta}{2\alpha} + \frac{1}{2\alpha} \sqrt{\beta^2 + 2\alpha\, g_{i,r}^{(t-1)}}, \tag{9}$$

where

$$g_{i,r}^{(t-1)} = \sum_{j=1}^{n} w_{i,j} \left( \mu_{j,r}^{(t-1)} - \mu_{i,r}^{(t-1)} \right)^2 + \delta_i^l \left( \mu_{i,r}^{(t-1)} - y_{i,r} \right)^2.$$

Here, the uncertainty for label $r$ and vertex $v_i$ is monotonic in a quadratic measure of divergence between the previous score $\mu_{i,r}^{(t-1)}$ and the previous neighbouring scores $\{\mu_{j,r}^{(t-1)}\}$. The complete pseudocode for TACO is given in Fig. 1.

    Parameters: α > 0, β > 0, γ > 0
    Input: Graph G = (V, E, W) and, for every v_i ∈ V, a prior labeling y_i
    Initialize: t = 1, µ_i^(0) = 0 and Σ_i^(0) = I for all v_i ∈ V
    Repeat
        For v_i ∈ V:
            Compute µ_i^(t) from µ_j^(t-1) and Σ_j^(t-1) using (8)
            Compute Σ_i^(t) from µ_j^(t-1) using (9)
        t ← t + 1
    Until convergence
    Output: Score vectors µ_i^(t) and confidence matrices Σ_i^(t).

Fig. 1. The TACO algorithm for graph-based transduction.

V. EXPERIMENTS

We evaluate the performance of TACO on the task of phoneme classification, along with two other state-of-the-art graph-based transductive algorithms: Modified Adsorption (MAD) [11] and Measure Propagation (MP) [4,12].

Data: The TIMIT corpus contains speech signals manually annotated with frame-based phonetic transcriptions [13].
We use pre-processed data [14] partitioned into a training set of 3,696 utterances, a development set of 400 utterances and a test set of 192 utterances. We use a standard mapping of the 61 phonemes in TIMIT to a subset of 39 classes [15]. The data contains feature vectors consisting of 13 Mel-frequency coefficients along with first and second derivatives (39 values). Structural information is incorporated by adding to each feature vector its immediate three predecessors and successors, such that the final dimension of input examples is 39 × 7 = 273.
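The context stacking that yields 273-dimensional examples can be sketched as below. This is only an illustration: the function name is hypothetical, and both the ordering of the stacked frames and the handling of utterance boundaries (here, repeating the first and last frame) are assumptions, since the text does not specify them.

```python
import numpy as np

def stack_context(frames, left=3, right=3):
    """Concatenate each frame with its `left` predecessors and `right` successors."""
    T, d = frames.shape
    # Assumption: utterance edges are padded by repeating the first/last frame.
    padded = np.pad(frames, ((left, right), (0, 0)), mode="edge")
    # Window o holds, for every time t, the frame at relative offset o - left.
    windows = [padded[o:o + T] for o in range(left + right + 1)]
    return np.concatenate(windows, axis=1)

# Five dummy frames of 39 values each (13 coefficients plus two derivative sets).
frames = np.arange(5 * 39, dtype=float).reshape(5, 39)
stacked = stack_context(frames)   # shape (5, 273)
```

Stacking is applied per utterance, so that context windows never mix frames from different utterances.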
Graph construction: From the input partition we construct two graphs. First, a development graph, including examples from the training and development sets, for a total of 4,096 utterances and roughly 1.2 million vertices. Second, a test graph, with examples from the training and test sets, containing 3,888 utterances and around 1.1 million vertices. For measuring distances in input space we use the Euclidean distance, $[d(x_i, x_j)]^2 = \|x_i - x_j\|_2^2$. We prune each graph by keeping for each vertex its $k$ nearest neighbours (k-NN), yielding a directed graph. Then, the direction of edges is removed, resulting in an undirected graph in which vertex degree may be larger than $k$. We fix $k = 10$, as previously used by Subramanya & Bilmes [4].

We transform distances to similarities using two graph construction methods. For global scaling we calculate the global bandwidth parameter using (1) and (2). The averages in (1) are calculated by applying random sampling [3]. For local scaling, we select local bandwidth parameters according to (4) and form edge weights using (3). The same value of $k$ used for nearest-neighbour graph construction is also used for local bandwidth parameter selection, so there is no additional computational cost. To conclude, we have four input graphs, determined by two choices: (a) containing development or test data along with the training data; (b) weights formed using global or local scaling.

Setting: We select utterances for the labeled utterance set $A_l$ by randomly sampling utterances from the training set. The labeled input set $D_l$ contains the feature vectors composing the sampled utterances. This is a more realistic scenario than simply randomly sampling feature vectors without regard to their source. Utterances are sampled until a fraction $f$ of the feature vectors in the training set is labeled, under the constraint that each phoneme class is selected at least once. We use $f \in \{0.01, 0.05, 0.1, 0.2, 0.3, 0.5\}$. On the sampled labeled information we perform class prior normalization, for both TACO and MAD [6].
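The utterance-level sampling described in the Setting paragraph could look like the sketch below. The function name and arguments are hypothetical, and the way the class-coverage constraint is enforced (drawing a fresh random order until every class appears) is an assumption; the text only states that the constraint holds.

```python
import random

def sample_labeled_utterances(utt_labels, f, num_classes, seed=0, max_tries=1000):
    """Pick whole utterances until a fraction f of all frames is labeled.

    utt_labels: list with one list of frame labels per training utterance.
    Returns the sorted indices of the sampled utterances.
    """
    total = sum(len(u) for u in utt_labels)
    rng = random.Random(seed)
    for _ in range(max_tries):
        order = list(range(len(utt_labels)))
        rng.shuffle(order)
        chosen, n_frames, seen = [], 0, set()
        for idx in order:
            if n_frames >= f * total:
                break
            chosen.append(idx)
            n_frames += len(utt_labels[idx])
            seen.update(utt_labels[idx])
        # Assumption: reject and redraw when some class is missing.
        if len(seen) == num_classes:
            return sorted(chosen)
    raise RuntimeError("no sample covered every class")

# Toy training set: four utterances over three phoneme classes.
utts = [[0, 0, 1], [1, 2], [2, 2, 0], [0, 1, 2, 2]]
chosen = sample_labeled_utterances(utts, f=0.5, num_classes=3, seed=0)
```

Because whole utterances are added, the labeled fraction may slightly exceed $f$.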
The development graph is used for hyper-parameter tuning. We tune by performing a grid search over a predefined range of values for each algorithm. The range for each of the hyper-parameters in the three algorithms is as follows. For TACO, $\alpha \in \{10^{-4}, 10^{-2}, 1, 10^{2}, 10^{4}\}$, $\beta \in \{10^{-4}, 10^{-2}, 1, 10^{2}\}$ and $\gamma \in \{1, 100\}$. For MP, $\nu \in \{10^{-8}, 10^{-6}, 10^{-4}, 0.01, 0.1\}$ and $\mu \in \{10^{-8}, 10^{-4}, 0.01, 0.1, 1, 10, 100\}$, with $\alpha$ held fixed. This is a superset of the range used before [4]. For MAD, $\mu_1 = 1$ and $\mu_2, \mu_3 \in \{10^{-8}, 10^{-4}, 0.01, 1, 10, 100, 1000\}$, following Talukdar & Crammer [11]. Performance is evaluated on vertices that belong to the development set, and the optimal hyper-parameter combination is selected.

Final evaluation is performed on the test graph. We repeat the described labeled sampling procedure, and set the values of the hyper-parameters to the optimal values selected on the development graph. Performance is evaluated on vertices belonging to the test set.

Results: We use two metrics to evaluate performance [4]: phone accuracy, computed using the Levenshtein distance, and frame accuracy, the percentage of frames classified correctly. For all results the reported evaluation metric is the same as the metric used for hyper-parameter tuning.

Fig. 2. A comparison of phone accuracy for different amounts of supervision. Results on (a) 56,692 test set vertices from the test graph; (b) 120,448 development set vertices from the development graph.

A comparison of phone accuracy on the test graph, for all evaluated combinations of algorithms and graph construction methods, is given in Fig. 2a. Local scaling for graph construction and TACO as the transductive algorithm outperform all other combinations for all values of $f$. Results on the development graph in Fig. 2b are similar, with slightly higher absolute values. In Fig. 3, we use frame accuracy as the evaluation metric, and results are similar.
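The two metrics can be sketched as follows. The precise definition of phone accuracy is not spelled out here, so this is one plausible reading and should be treated as an assumption: runs of identical frame labels are collapsed into phones, and accuracy is one minus the Levenshtein distance divided by the reference length. All function names are hypothetical.

```python
def frame_accuracy(ref, hyp):
    """Fraction of frames whose predicted label matches the reference."""
    return sum(r == h for r, h in zip(ref, hyp)) / len(ref)

def collapse(frames):
    """Merge runs of identical consecutive frame labels into a phone sequence."""
    return [p for i, p in enumerate(frames) if i == 0 or p != frames[i - 1]]

def levenshtein(a, b):
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or match
        prev = cur
    return prev[-1]

def phone_accuracy(ref_frames, hyp_frames):
    """Assumed definition: 1 - edit distance / number of reference phones."""
    ref, hyp = collapse(ref_frames), collapse(hyp_frames)
    return 1.0 - levenshtein(ref, hyp) / len(ref)

ref = ["a", "a", "b", "b", "c"]
hyp = ["a", "a", "b", "c", "c"]
```

In this example one frame is misclassified, so frame accuracy is below one, while the collapsed phone sequences are identical; this illustrates why the two metrics can rank systems differently.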
Comparing graph construction methods, both TACO and MP perform better on graphs constructed with local scaling. MAD performs better using local scaling when relatively small amounts of labeled training data are available. The performance gain attained by using local over global scaling
is further illustrated in Fig. 4. For all algorithms, the most significant performance boost occurs when only 1% of the training data is labeled. The largest gain is for MP, improving by roughly 4.5% of phone accuracy, for 1% of labeled data. As more data is labeled, the performance gap favouring local scaling decreases. For TACO, the performance gain decreases monotonically, from over 4% for 1% of labeled data to just above 0.5% as more data is labeled. A similar trend appears for MAD, which gains from local scaling only until a fraction of 20% of the training set is labeled. From this point on, local scaling has a negative effect on performance, and global scaling is better. This implies local scaling is more beneficial when small amounts of labeled utterances are available.

Fig. 3. A comparison of frame accuracy for different amounts of supervision on test set vertices from the test graph.

Fig. 4. Change in phone accuracy comparing local and global scaling, for TACO, MP and MAD. Results are on test set vertices from the test graph. Positive values indicate an increase in phoneme accuracy gained by using local scaling.

VI. CONCLUSION

We have demonstrated the effectiveness and scalability of TACO, a recently introduced graph-based transductive algorithm, on the task of phoneme classification. TACO outperforms two other state-of-the-art algorithms, MAD and MP. In addition, we introduced local scaling as a graph construction method for transductive phoneme classification. Local scaling improves the input graph, improving the phoneme classification accuracy of TACO.

In future work we plan to modify current transduction algorithms to better use the sequential nature of acoustic utterances. We believe the use of such structured information may contribute an additional performance boost to current results. We also plan to perform induction, allowing the labeling of previously unseen unlabeled utterances.

REFERENCES

[1] K. Crammer and D. D. Lee, Online discriminative learning of phoneme recognition via collections of generalized linear models, ICASSP, 2012.
[2] A. Gunawardana, M. Mahajan, A. Acero, and J. C. Platt, Hidden conditional random fields for phone classification, INTERSPEECH.
[3] A. Alexandrescu and K. Kirchhoff, Graph-based learning for phonetic classification, ASRU.
[4] A. Subramanya and J. Bilmes, Semi-supervised learning with measure propagation, JMLR, 2011.
[5] L. Zelnik-Manor and P. Perona, Self-tuning spectral clustering, in NIPS.
[6] M. Orbach and K. Crammer, Graph-based transduction with confidence, in ECML, 2012.
[7] K. Kirchhoff and A. Alexandrescu, Phonetic classification using controlled random walks, Interspeech, 2011.
[8] Y. Liu and K. Kirchhoff, A comparison of graph construction and learning algorithms for graph-based phonetic classification, UWEE Technical Report, 2012.
[9] X. Zhu, Z. Ghahramani, and J. Lafferty, Semi-supervised learning using Gaussian fields and harmonic functions, in ICML.
[10] X. Zhang and W. S. Lee, Hyperparameter learning for graph based semi-supervised learning algorithms, in NIPS.
[11] P. P. Talukdar and K. Crammer, New regularized algorithms for transductive learning, in ECML.
[12] A. Subramanya and J. Bilmes, Soft-supervised learning for text classification, in EMNLP.
[13] L. F. Lamel, R. H. Kassel, and S. Seneff, Speech database development: design and analysis of the acoustic-phonetic corpus, Proceedings of the DARPA Speech Recognition Workshop, 1986.
[14] C. C. Cheng, F. Sha, and L. K. Saul, A fast online algorithm for large margin training of continuous density hidden Markov models, INTERSPEECH.
[15] L. Kai-fu and H. Hsiao-Wuen, Speaker-independent phone recognition using hidden Markov models, IEEE Transactions on Acoustics, Speech and Signal Processing, 1989.
More informationSemi-supervised learning SSL (on graphs)
Semi-supervised learning SSL (on graphs) 1 Announcement No office hour for William after class today! 2 Semi-supervised learning Given: A pool of labeled examples L A (usually larger) pool of unlabeled
More informationConfidence in Structured-Prediction using Confidence-Weighted Models
Confidence in Structured-Prediction using Confidence-Weighted Models Avihai Mejer Department of Computer Science Technion-Israel Institute of Technology Haifa 32, Israel amejer@tx.technion.ac.il Koby Crammer
More informationSemiBoost: Boosting for Semi-supervised Learning
To appear in the IEEE Transactions on Pattern Analysis and Machine Intelligence. SemiBoost: Boosting for Semi-supervised Learning Pavan Kumar Mallapragada, Student Member, IEEE, Rong Jin, Member, IEEE,
More informationS. Yamuna M.Tech Student Dept of CSE, Jawaharlal Nehru Technological University, Anantapur
Ensemble Prototype Vector Machines based on Semi Supervised Classifiers S. Yamuna M.Tech Student Dept of CSE, Jawaharlal Nehru Technological University, Anantapur Abstract: -- In large number of real world
More informationInstance-level Semi-supervised Multiple Instance Learning
Instance-level Semi-supervised Multiple Instance Learning Yangqing Jia and Changshui Zhang State Key Laboratory on Intelligent Technology and Systems Tsinghua National Laboratory for Information Science
More informationComputer Vision Group Prof. Daniel Cremers. 4a. Inference in Graphical Models
Group Prof. Daniel Cremers 4a. Inference in Graphical Models Inference on a Chain (Rep.) The first values of µ α and µ β are: The partition function can be computed at any node: Overall, we have O(NK 2
More informationRandom projection for non-gaussian mixture models
Random projection for non-gaussian mixture models Győző Gidófalvi Department of Computer Science and Engineering University of California, San Diego La Jolla, CA 92037 gyozo@cs.ucsd.edu Abstract Recently,
More informationTRANSDUCTIVE LINK SPAM DETECTION
TRANSDUCTIVE LINK SPAM DETECTION Denny Zhou Microsoft Research http://research.microsoft.com/~denzho Joint work with Chris Burges and Tao Tao Presenter: Krysta Svore Link spam detection problem Classification
More information1 Training/Validation/Testing
CPSC 340 Final (Fall 2015) Name: Student Number: Please enter your information above, turn off cellphones, space yourselves out throughout the room, and wait until the official start of the exam to begin.
More informationSEMI-SUPERVISED LEARNING (SSL) for classification
IEEE SIGNAL PROCESSING LETTERS, VOL. 22, NO. 12, DECEMBER 2015 2411 Bilinear Embedding Label Propagation: Towards Scalable Prediction of Image Labels Yuchen Liang, Zhao Zhang, Member, IEEE, Weiming Jiang,
More informationOpinion Mining by Transformation-Based Domain Adaptation
Opinion Mining by Transformation-Based Domain Adaptation Róbert Ormándi, István Hegedűs, and Richárd Farkas University of Szeged, Hungary {ormandi,ihegedus,rfarkas}@inf.u-szeged.hu Abstract. Here we propose
More information10-701/15-781, Fall 2006, Final
-7/-78, Fall 6, Final Dec, :pm-8:pm There are 9 questions in this exam ( pages including this cover sheet). If you need more room to work out your answer to a question, use the back of the page and clearly
More informationOn Information-Maximization Clustering: Tuning Parameter Selection and Analytic Solution
ICML2011 Jun. 28-Jul. 2, 2011 On Information-Maximization Clustering: Tuning Parameter Selection and Analytic Solution Masashi Sugiyama, Makoto Yamada, Manabu Kimura, and Hirotaka Hachiya Department of
More informationPair-wise Distance Metric Learning of Neural Network Model for Spoken Language Identification
INTERSPEECH 2016 September 8 12, 2016, San Francisco, USA Pair-wise Distance Metric Learning of Neural Network Model for Spoken Language Identification 2 1 Xugang Lu 1, Peng Shen 1, Yu Tsao 2, Hisashi
More informationVisual Representations for Machine Learning
Visual Representations for Machine Learning Spectral Clustering and Channel Representations Lecture 1 Spectral Clustering: introduction and confusion Michael Felsberg Klas Nordberg The Spectral Clustering
More informationApplication of Support Vector Machine Algorithm in Spam Filtering
Application of Support Vector Machine Algorithm in E-Mail Spam Filtering Julia Bluszcz, Daria Fitisova, Alexander Hamann, Alexey Trifonov, Advisor: Patrick Jähnichen Abstract The problem of spam classification
More informationGRAPH-BASED SEMI-SUPERVISED HYPERSPECTRAL IMAGE CLASSIFICATION USING SPATIAL INFORMATION
GRAPH-BASED SEMI-SUPERVISED HYPERSPECTRAL IMAGE CLASSIFICATION USING SPATIAL INFORMATION Nasehe Jamshidpour a, Saeid Homayouni b, Abdolreza Safari a a Dept. of Geomatics Engineering, College of Engineering,
More informationGraph Laplacian Kernels for Object Classification from a Single Example
Graph Laplacian Kernels for Object Classification from a Single Example Hong Chang & Dit-Yan Yeung Department of Computer Science, Hong Kong University of Science and Technology {hongch,dyyeung}@cs.ust.hk
More informationSpeech Recognition Lecture 8: Acoustic Models. Eugene Weinstein Google, NYU Courant Institute Slide Credit: Mehryar Mohri
Speech Recognition Lecture 8: Acoustic Models. Eugene Weinstein Google, NYU Courant Institute eugenew@cs.nyu.edu Slide Credit: Mehryar Mohri Speech Recognition Components Acoustic and pronunciation model:
More informationConditional Random Fields : Theory and Application
Conditional Random Fields : Theory and Application Matt Seigel (mss46@cam.ac.uk) 3 June 2010 Cambridge University Engineering Department Outline The Sequence Classification Problem Linear Chain CRFs CRF
More informationA Survey on Postive and Unlabelled Learning
A Survey on Postive and Unlabelled Learning Gang Li Computer & Information Sciences University of Delaware ligang@udel.edu Abstract In this paper we survey the main algorithms used in positive and unlabeled
More informationBagging and Boosting Algorithms for Support Vector Machine Classifiers
Bagging and Boosting Algorithms for Support Vector Machine Classifiers Noritaka SHIGEI and Hiromi MIYAJIMA Dept. of Electrical and Electronics Engineering, Kagoshima University 1-21-40, Korimoto, Kagoshima
More informationNetwork Traffic Measurements and Analysis
DEIB - Politecnico di Milano Fall, 2017 Sources Hastie, Tibshirani, Friedman: The Elements of Statistical Learning James, Witten, Hastie, Tibshirani: An Introduction to Statistical Learning Andrew Ng:
More informationAcoustic to Articulatory Mapping using Memory Based Regression and Trajectory Smoothing
Acoustic to Articulatory Mapping using Memory Based Regression and Trajectory Smoothing Samer Al Moubayed Center for Speech Technology, Department of Speech, Music, and Hearing, KTH, Sweden. sameram@kth.se
More informationIMAGE DENOISING USING NL-MEANS VIA SMOOTH PATCH ORDERING
IMAGE DENOISING USING NL-MEANS VIA SMOOTH PATCH ORDERING Idan Ram, Michael Elad and Israel Cohen Department of Electrical Engineering Department of Computer Science Technion - Israel Institute of Technology
More informationGenerating the Reduced Set by Systematic Sampling
Generating the Reduced Set by Systematic Sampling Chien-Chung Chang and Yuh-Jye Lee Email: {D9115009, yuh-jye}@mail.ntust.edu.tw Department of Computer Science and Information Engineering National Taiwan
More informationCS294-1 Assignment 2 Report
CS294-1 Assignment 2 Report Keling Chen and Huasha Zhao February 24, 2012 1 Introduction The goal of this homework is to predict a users numeric rating for a book from the text of the user s review. The
More informationBagging for One-Class Learning
Bagging for One-Class Learning David Kamm December 13, 2008 1 Introduction Consider the following outlier detection problem: suppose you are given an unlabeled data set and make the assumptions that one
More informationUNSUPERVISED SUBMODULAR SUBSET SELECTION FOR SPEECH DATA :EXTENDED VERSION. Kai Wei Yuzong Liu Katrin Kirchhoff Jeff Bilmes
UNSUPERVISED SUBMODULAR SUBSET SELECTION FOR SPEECH DATA :EXTENDED VERSION Kai Wei Yuzong Liu Katrin Kirchhoff Jeff Bilmes Department of Electrical Engineering, University of Washington Seattle, WA 98195,
More informationNeural Networks. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani
Neural Networks CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Biological and artificial neural networks Feed-forward neural networks Single layer
More informationUnsupervised Learning
Networks for Pattern Recognition, 2014 Networks for Single Linkage K-Means Soft DBSCAN PCA Networks for Kohonen Maps Linear Vector Quantization Networks for Problems/Approaches in Machine Learning Supervised
More information27: Hybrid Graphical Models and Neural Networks
10-708: Probabilistic Graphical Models 10-708 Spring 2016 27: Hybrid Graphical Models and Neural Networks Lecturer: Matt Gormley Scribes: Jakob Bauer Otilia Stretcu Rohan Varma 1 Motivation We first look
More informationDay 3 Lecture 1. Unsupervised Learning
Day 3 Lecture 1 Unsupervised Learning Semi-supervised and transfer learning Myth: you can t do deep learning unless you have a million labelled examples for your problem. Reality You can learn useful representations
More informationMachine Learning in Biology
Università degli studi di Padova Machine Learning in Biology Luca Silvestrin (Dottorando, XXIII ciclo) Supervised learning Contents Class-conditional probability density Linear and quadratic discriminant
More informationMining di Dati Web. Lezione 3 - Clustering and Classification
Mining di Dati Web Lezione 3 - Clustering and Classification Introduction Clustering and classification are both learning techniques They learn functions describing data Clustering is also known as Unsupervised
More informationComputer Vision Group Prof. Daniel Cremers. 4. Probabilistic Graphical Models Directed Models
Prof. Daniel Cremers 4. Probabilistic Graphical Models Directed Models The Bayes Filter (Rep.) (Bayes) (Markov) (Tot. prob.) (Markov) (Markov) 2 Graphical Representation (Rep.) We can describe the overall
More informationA Topography-Preserving Latent Variable Model with Learning Metrics
A Topography-Preserving Latent Variable Model with Learning Metrics Samuel Kaski and Janne Sinkkonen Helsinki University of Technology Neural Networks Research Centre P.O. Box 5400, FIN-02015 HUT, Finland
More informationMachine Learning Classifiers and Boosting
Machine Learning Classifiers and Boosting Reading Ch 18.6-18.12, 20.1-20.3.2 Outline Different types of learning problems Different types of learning algorithms Supervised learning Decision trees Naïve
More informationObject Segmentation and Tracking in 3D Video With Sparse Depth Information Using a Fully Connected CRF Model
Object Segmentation and Tracking in 3D Video With Sparse Depth Information Using a Fully Connected CRF Model Ido Ofir Computer Science Department Stanford University December 17, 2011 Abstract This project
More informationECG782: Multidimensional Digital Signal Processing
ECG782: Multidimensional Digital Signal Processing Object Recognition http://www.ee.unlv.edu/~b1morris/ecg782/ 2 Outline Knowledge Representation Statistical Pattern Recognition Neural Networks Boosting
More informationOnline Pattern Recognition in Multivariate Data Streams using Unsupervised Learning
Online Pattern Recognition in Multivariate Data Streams using Unsupervised Learning Devina Desai ddevina1@csee.umbc.edu Tim Oates oates@csee.umbc.edu Vishal Shanbhag vshan1@csee.umbc.edu Machine Learning
More informationTRADITIONAL machine learning techniques use only
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL., NO., NOVEMBER 779 Semisupervised Classification With Cluster Regularization Rodrigo G. F. Soares, Huanhuan Chen, Member, IEEE, andxinyao,fellow,
More information.. Spring 2017 CSC 566 Advanced Data Mining Alexander Dekhtyar..
.. Spring 2017 CSC 566 Advanced Data Mining Alexander Dekhtyar.. Machine Learning: Support Vector Machines: Linear Kernel Support Vector Machines Extending Perceptron Classifiers. There are two ways to
More informationSVMs and Data Dependent Distance Metric
SVMs and Data Dependent Distance Metric N. Zaidi, D. Squire Clayton School of Information Technology, Monash University, Clayton, VIC 38, Australia Email: {nayyar.zaidi,david.squire}@monash.edu Abstract
More information1 Introduction. 3 Data Preprocessing. 2 Literature Review
Rock or not? This sure does. [Category] Audio & Music CS 229 Project Report Anand Venkatesan(anand95), Arjun Parthipan(arjun777), Lakshmi Manoharan(mlakshmi) 1 Introduction Music Genre Classification continues
More informationStructured Learning. Jun Zhu
Structured Learning Jun Zhu Supervised learning Given a set of I.I.D. training samples Learn a prediction function b r a c e Supervised learning (cont d) Many different choices Logistic Regression Maximum
More informationCS 543: Final Project Report Texture Classification using 2-D Noncausal HMMs
CS 543: Final Project Report Texture Classification using 2-D Noncausal HMMs Felix Wang fywang2 John Wieting wieting2 Introduction We implement a texture classification algorithm using 2-D Noncausal Hidden
More informationA Unified Framework to Integrate Supervision and Metric Learning into Clustering
A Unified Framework to Integrate Supervision and Metric Learning into Clustering Xin Li and Dan Roth Department of Computer Science University of Illinois, Urbana, IL 61801 (xli1,danr)@uiuc.edu December
More informationClustering: Classic Methods and Modern Views
Clustering: Classic Methods and Modern Views Marina Meilă University of Washington mmp@stat.washington.edu June 22, 2015 Lorentz Center Workshop on Clusters, Games and Axioms Outline Paradigms for clustering
More informationSupplementary Material: The Emergence of. Organizing Structure in Conceptual Representation
Supplementary Material: The Emergence of Organizing Structure in Conceptual Representation Brenden M. Lake, 1,2 Neil D. Lawrence, 3 Joshua B. Tenenbaum, 4,5 1 Center for Data Science, New York University
More informationAllstate Insurance Claims Severity: A Machine Learning Approach
Allstate Insurance Claims Severity: A Machine Learning Approach Rajeeva Gaur SUNet ID: rajeevag Jeff Pickelman SUNet ID: pattern Hongyi Wang SUNet ID: hongyiw I. INTRODUCTION The insurance industry has
More informationKernel Methods & Support Vector Machines
& Support Vector Machines & Support Vector Machines Arvind Visvanathan CSCE 970 Pattern Recognition 1 & Support Vector Machines Question? Draw a single line to separate two classes? 2 & Support Vector
More informationMetric Learning for Large-Scale Image Classification:
Metric Learning for Large-Scale Image Classification: Generalizing to New Classes at Near-Zero Cost Florent Perronnin 1 work published at ECCV 2012 with: Thomas Mensink 1,2 Jakob Verbeek 2 Gabriela Csurka
More informationLecture 21 : A Hybrid: Deep Learning and Graphical Models
10-708: Probabilistic Graphical Models, Spring 2018 Lecture 21 : A Hybrid: Deep Learning and Graphical Models Lecturer: Kayhan Batmanghelich Scribes: Paul Liang, Anirudha Rayasam 1 Introduction and Motivation
More informationD-Separation. b) the arrows meet head-to-head at the node, and neither the node, nor any of its descendants, are in the set C.
D-Separation Say: A, B, and C are non-intersecting subsets of nodes in a directed graph. A path from A to B is blocked by C if it contains a node such that either a) the arrows on the path meet either
More informationSemi-supervised Learning
Semi-supervised Learning Piyush Rai CS5350/6350: Machine Learning November 8, 2011 Semi-supervised Learning Supervised Learning models require labeled data Learning a reliable model usually requires plenty
More information1 Case study of SVM (Rob)
DRAFT a final version will be posted shortly COS 424: Interacting with Data Lecturer: Rob Schapire and David Blei Lecture # 8 Scribe: Indraneel Mukherjee March 1, 2007 In the previous lecture we saw how
More informationIntroduction to Artificial Intelligence
Introduction to Artificial Intelligence COMP307 Machine Learning 2: 3-K Techniques Yi Mei yi.mei@ecs.vuw.ac.nz 1 Outline K-Nearest Neighbour method Classification (Supervised learning) Basic NN (1-NN)
More informationLarge Scale Manifold Transduction
Large Scale Manifold Transduction Michael Karlen, Jason Weston, Ayse Erkan & Ronan Collobert NEC Labs America, Princeton, USA Ećole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland New York University,
More informationSemantic Segmentation. Zhongang Qi
Semantic Segmentation Zhongang Qi qiz@oregonstate.edu Semantic Segmentation "Two men riding on a bike in front of a building on the road. And there is a car." Idea: recognizing, understanding what's in
More informationMarkov Random Fields and Segmentation with Graph Cuts
Markov Random Fields and Segmentation with Graph Cuts Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem Administrative stuffs Final project Proposal due Oct 27 (Thursday) HW 4 is out
More information