Link prediction in graph construction for supervised and semi-supervised learning
Lilian Berton, Jorge Valverde-Rebaza and Alneu de Andrade Lopes
Laboratory of Computational Intelligence (LABIC), University of São Paulo (USP), Brazil
July 2015

Outline
1 Introduction
2 Proposal
3 Experiments
4 Conclusion
Jorge Valverde-Rebaza, Link prediction in graph construction

Motivation
Networks (graphs) are a powerful relational representation that has been employed in different machine learning tasks.
Link prediction is an important problem in network analysis that has attracted increasing attention in recent years.
Many social, biological and information systems can be naturally described as networks, while other data are available only in flat (tabular) form.
To apply graph-based methods to flat data, it is necessary to convert the data into a network. Converting flat data to relational data can also help to improve classification accuracy.
Although many methods for graph construction have been proposed, it is still an open problem.

Objective and hypothesis
Objective: propose a new method for graph construction using the link prediction intuition.
If a network is very sparse, for example when a minimum spanning tree is used, it lacks structural information needed by the inference algorithms.
If a network is very dense, for example when kNN with k > 10 is used, the excess edges become noise in the graph.
Starting from a basic graph structure, it is possible to add predicted edges, generating a new (balanced) graph structure.
This can improve the quality of the graphs, leading to better classification accuracy in supervised and semi-supervised learning (SSL) domains.

Graph Construction
Many data sets are available in tabular (flat) format. It is necessary to convert the data into a network in order to apply a graph-based algorithm.
We apply k-nearest neighbors (kNN), mutual kNN (MkNN) and minimum/maximum spanning tree (MinST/MaxST) to generate an initial graph.
Figure: Graph construction methods. (a) 3NN (b) M3NN (c) MinST (d) MaxST
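
The construction methods named on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes Euclidean distance between points given as tuples, and the function names `knn_graph` and `spanning_tree` are hypothetical. Graphs are returned as sets of undirected edges (i, j) with i < j.

```python
import math
from itertools import combinations

def knn_graph(points, k, mutual=False):
    """Return the undirected (mutual) kNN graph as a set of edges (i, j), i < j."""
    n = len(points)
    nearest = []
    for i in range(n):
        # Other vertices sorted by Euclidean distance to vertex i (an assumption).
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        nearest.append(set(others[:k]))
    edges = set()
    for i, j in combinations(range(n), 2):
        i_picks_j, j_picks_i = j in nearest[i], i in nearest[j]
        # kNN keeps an edge if either endpoint picks the other; mutual kNN needs both.
        if (i_picks_j and j_picks_i) if mutual else (i_picks_j or j_picks_i):
            edges.add((i, j))
    return edges

def spanning_tree(points, maximum=False):
    """Return a minimum (or maximum) spanning tree using a simple Prim's algorithm."""
    n = len(points)
    sign = -1.0 if maximum else 1.0
    in_tree = {0}
    edges = set()
    while len(in_tree) < n:
        # Best edge that leaves the current tree (cheapest, or costliest for MaxST).
        i, j = min(((i, j) for i in in_tree for j in range(n) if j not in in_tree),
                   key=lambda e: sign * math.dist(points[e[0]], points[e[1]]))
        edges.add((min(i, j), max(i, j)))
        in_tree.add(j)
    return edges
```

For example, `knn_graph(points, 3)` and `knn_graph(points, 3, mutual=True)` would correspond to the 3NN and M3NN panels. The mutual variant is never denser than plain kNN, and a spanning tree always has exactly n - 1 edges, which is the sparsity contrast the hypothesis relies on.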

Link Prediction (LP)
Link prediction (LP) addresses the problem of predicting the existence of missing relations or of new ones.
Common Neighbors (c): $s^{c}_{v_i,v_j} = |\Gamma(v_i) \cap \Gamma(v_j)|$
Weighted CN (w): $s^{w}_{v_i,v_j} = \sum_{v_k \in \Gamma(v_i) \cap \Gamma(v_j)} \left[ w(v_i, v_k) + w(v_k, v_j) \right]$
Katz (k): $s^{k}_{v_i,v_j} = \sum_{l=1}^{\infty} \beta^{l} \, |\mathrm{paths}^{\langle l \rangle}_{v_i,v_j}| = \beta A_{v_i,v_j} + \beta^{2} (A^{2})_{v_i,v_j} + \dots$
Figure: Link prediction process.
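
The three scores above can be computed directly from adjacency sets. The sketch below is illustrative only. The helper names are hypothetical, and the Katz sum is truncated at `max_len` terms, with `beta` and `max_len` chosen arbitrarily (the slides do not state the values used). The powers of A count walks, which is the usual way the Katz series is evaluated.

```python
def neighbors(edges, n):
    """Adjacency sets Gamma(v) for an undirected graph given as (i, j) pairs."""
    gamma = [set() for _ in range(n)]
    for i, j in edges:
        gamma[i].add(j)
        gamma[j].add(i)
    return gamma

def common_neighbors(gamma, i, j):
    """Number of vertices adjacent to both i and j."""
    return len(gamma[i] & gamma[j])

def weighted_common_neighbors(gamma, weight, i, j):
    """Sum of w(i, k) + w(k, j) over common neighbors k.

    weight maps frozenset({u, v}) to the weight of edge u-v.
    """
    return sum(weight[frozenset((i, k))] + weight[frozenset((k, j))]
               for k in gamma[i] & gamma[j])

def katz(gamma, i, j, beta=0.05, max_len=4):
    """Truncated Katz score: sum over l = 1..max_len of beta**l * (A**l)[i][j]."""
    n = len(gamma)
    walks = [1.0 if v == i else 0.0 for v in range(n)]  # row i of A**0
    score = 0.0
    for l in range(1, max_len + 1):
        # Row i of A**l: number of length-l walks from i ending at each vertex.
        walks = [sum(walks[u] for u in gamma[v]) for v in range(n)]
        score += beta ** l * walks[j]
    return score
```

On a 4-cycle 0-1-2-3-0, vertices 0 and 2 are not adjacent but share two neighbors, so `common_neighbors` returns 2 and the first nonzero Katz term is the one for l = 2.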

Proposal
To predict new links, a score $s_{v_i,v_j}$ is assigned to each pair of disconnected vertices $v_i$ and $v_j$.
All non-observed links are ranked according to their scores; links connecting more similar vertices are assumed to be more likely to exist.
A percentage of the top-ranked links can then be added to the initial graph.
Figure: LP construction steps. (a) Dataset (b) MinST (c) MinST+LP (Katz, 30%)
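
The ranking step described above might look like the following sketch. It makes two assumptions that the slides do not settle: the percentage is taken over the list of non-observed vertex pairs, and the cut-off is rounded to the nearest integer. The names `add_predicted_links` and `common_neighbors_score` are hypothetical, and any LP measure could be passed as `score`.

```python
from itertools import combinations

def add_predicted_links(edges, n, score, top_fraction):
    """Add the top-scoring fraction of non-observed links to an initial graph."""
    observed = {(min(i, j), max(i, j)) for i, j in edges}
    # Every disconnected vertex pair is a candidate link.
    candidates = [p for p in combinations(range(n), 2) if p not in observed]
    ranked = sorted(candidates, key=lambda p: score(*p), reverse=True)
    keep = round(top_fraction * len(ranked))  # assumed rounding rule
    return observed | set(ranked[:keep])

def common_neighbors_score(edges, n):
    """Build a common-neighbors scoring function for the given graph."""
    gamma = [set() for _ in range(n)]
    for i, j in edges:
        gamma[i].add(j)
        gamma[j].add(i)
    return lambda i, j: len(gamma[i] & gamma[j])
```

A call such as `add_predicted_links(tree_edges, n, common_neighbors_score(tree_edges, n), 0.30)` would correspond to a "+LP, 30%" variant under these assumptions. Scores are computed once on the initial graph and are not updated as edges are added.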

Datasets
Table: Data set descriptions for SSL classification. Columns: Data set, # Instances, # Attributes, # Classes (numeric values not preserved). Data sets: g241c, g241n, Digit, USPS, COIL.
Table: Data set descriptions for supervised classification. Columns: Data set, # Instances, # Attributes, # Classes (numeric values not preserved). Data sets: Wine, Ecoli, Customers, Cancer, Blood, Gaussians, Gaussians.

SSL experimental setup
PCA was applied to reduce the data to 50 dimensions, since in high-dimensional data the distance to the nearest neighbor approaches the distance to the farthest neighbor, which degrades the quality of the graph.
10 and 100 labeled vertices were randomly selected.
We apply MinST, MaxST, kNN and MkNN with 1 ≤ k ≤ 20, and the LP graphs (our proposal), which combine the same methods with an LP measure: MinST+LP, MaxST+LP, kNN+LP and MkNN+LP with 1 ≤ k ≤ 5.
The weight matrix W uses binary weighting.
Label inference was performed with the Local and Global Consistency (LGC) algorithm.
The average accuracy over 30 runs was used for evaluation.
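
As a rough illustration of the label inference step, the sketch below runs the standard iterative form of LGC, F <- alpha * S * F + (1 - alpha) * Y with S = D^(-1/2) W D^(-1/2), on a binary-weighted graph. The function name `lgc` and the values of `alpha` and `n_iter` are assumptions. The slides do not report the settings that were actually used.

```python
import math

def lgc(edges, n, labels, n_classes, alpha=0.9, n_iter=200):
    """Local and Global Consistency on a binary-weighted undirected graph.

    labels maps each labeled vertex to its class index; all others are unlabeled.
    Returns the predicted class index of every vertex.
    """
    gamma = [set() for _ in range(n)]
    for i, j in edges:
        gamma[i].add(j)
        gamma[j].add(i)
    deg = [len(g) for g in gamma]
    # Y has a 1 in the column of the known class and zeros elsewhere.
    Y = [[1.0 if labels.get(v) == c else 0.0 for c in range(n_classes)]
         for v in range(n)]
    F = [row[:] for row in Y]
    for _ in range(n_iter):
        # One propagation step; S[v][u] = 1 / sqrt(deg[v] * deg[u]) for each edge.
        F = [[alpha * sum(F[u][c] / math.sqrt(deg[u] * deg[v]) for u in gamma[v])
              + (1 - alpha) * Y[v][c]
              for c in range(n_classes)]
             for v in range(n)]
    return [max(range(n_classes), key=lambda c: F[v][c]) for v in range(n)]
```

On a path of six vertices with the two endpoints labeled with different classes, each unlabeled vertex takes the class of the nearer endpoint. This locality is what makes the result sensitive to how sparse or dense the constructed graph is.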

Supervised experimental setup
For the Cancer dataset, the instances with missing values were also removed.
We apply MinST, MaxST, kNN and MkNN with 1 ≤ k ≤ 20, and the LP graphs (our proposal), which combine the same methods with an LP measure: MinST+LP, MaxST+LP, kNN+LP and MkNN+LP with 1 ≤ k ≤ 3.
The weighted graph W uses the opposite of the Euclidean distance.
The relational algorithms used for classification were: nobayes, nolb-lr-binary, nolb-lr-count, nolb-lr-mode, prn.
The accuracy of 10-fold cross-validation was used for evaluation.

Results
Figure: Nemenyi post-hoc test for semi-supervised classification. Critical difference (CD) diagram over kNN+LP, MkNN, kNN, MinST+LP, MaxST, MinST, MaxST+LP, MkNN+LP.
Figure: Nemenyi post-hoc test for supervised classification. Critical difference (CD) diagram over kNN+LP, kNN, MkNN, MinST+LP, MaxST, MkNN+LP, MinST, MaxST+LP.

Parameter analysis
Figure: Distribution of the parameter k and of the top percentage of links used by the graph construction methods in supervised classification.

Average degree
Figure: Average degree for kNN, MkNN, MST and their LP versions (kNN+LP, MkNN+LP, MST+LP) applied to the Gaussians3 data set. x-axis: k or % of links * 10; y-axis: average degree. LP versions use k = 3 and the common neighbors measure.
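
For reference, the quantity plotted here is the average degree of an undirected graph, 2|E| / |V|. A minimal sketch follows. The function name `average_degree` is hypothetical and edges are assumed to be given as vertex pairs.

```python
def average_degree(edges, n):
    """Average degree of an undirected graph with n vertices: 2|E| / |V|."""
    # frozenset removes duplicates such as (i, j) and (j, i).
    unique = {frozenset(e) for e in edges}
    return 2 * len(unique) / n
```

A spanning tree on n vertices has n - 1 edges, so its average degree is 2(n - 1)/n, which is always below 2. This makes it a convenient sparse baseline against which the effect of added LP edges can be measured.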

Conclusions
Link prediction (LP) has been used in many fields of science, such as online social networks, where links can be recommended as promising friendships.
Here LP was used for graph construction: starting from an initial graph structure, edges are predicted, generating a new balanced graph.
The proposed graphs were evaluated in supervised and semi-supervised classification and provided improvements in accuracy.
The graphs are sparse and represent the neighborhood of a point well.
In future work, other baseline methods could be tested, as well as other LP measures.
Our approach could also be applied in other domains of machine learning that use graph-based methods.

Thank you
Jorge Valverde-Rebaza
More informationPrototype Selection for Handwritten Connected Digits Classification
2009 0th International Conference on Document Analysis and Recognition Prototype Selection for Handwritten Connected Digits Classification Cristiano de Santana Pereira and George D. C. Cavalcanti 2 Federal
More informationMore Efficient Classification of Web Content Using Graph Sampling
More Efficient Classification of Web Content Using Graph Sampling Chris Bennett Department of Computer Science University of Georgia Athens, Georgia, USA 30602 bennett@cs.uga.edu Abstract In mining information
More informationExploring the Structure of Data at Scale. Rudy Agovic, PhD CEO & Chief Data Scientist at Reliancy January 16, 2019
Exploring the Structure of Data at Scale Rudy Agovic, PhD CEO & Chief Data Scientist at Reliancy January 16, 2019 Outline Why exploration of large datasets matters Challenges in working with large data
More informationStructured prediction using the network perceptron
Structured prediction using the network perceptron Ta-tsen Soong Joint work with Stuart Andrews and Prof. Tony Jebara Motivation A lot of network-structured data Social networks Citation networks Biological
More informationClustering and Visualisation of Data
Clustering and Visualisation of Data Hiroshi Shimodaira January-March 28 Cluster analysis aims to partition a data set into meaningful or useful groups, based on distances between data points. In some
More informationAdaptive Gesture Recognition System Integrating Multiple Inputs
Adaptive Gesture Recognition System Integrating Multiple Inputs Master Thesis - Colloquium Tobias Staron University of Hamburg Faculty of Mathematics, Informatics and Natural Sciences Technical Aspects
More informationCS145: INTRODUCTION TO DATA MINING
CS145: INTRODUCTION TO DATA MINING 08: Classification Evaluation and Practical Issues Instructor: Yizhou Sun yzsun@cs.ucla.edu October 24, 2017 Learnt Prediction and Classification Methods Vector Data
More informationThe Comparative Study of Machine Learning Algorithms in Text Data Classification*
The Comparative Study of Machine Learning Algorithms in Text Data Classification* Wang Xin School of Science, Beijing Information Science and Technology University Beijing, China Abstract Classification
More informationCS6375: Machine Learning Gautam Kunapuli. Mid-Term Review
Gautam Kunapuli Machine Learning Data is identically and independently distributed Goal is to learn a function that maps to Data is generated using an unknown function Learn a hypothesis that minimizes
More information