From Word Embeddings To Document Distances. Matt J. Kusner Yu Sun Nicholas I. Kolkin Kilian Q. Weinberger
1 From Word Embeddings To Document Distances Matt J. Kusner Yu Sun Nicholas I. Kolkin Kilian Q. Weinberger
2 Goal: a distance between two documents?
3 Applications: document classification, multi-lingual document matching, song identification
4 Word Embedding: word2vec [Mikolov et al., 2013], different from [Collobert & Weston, 2008] and [Mnih & Hinton, 2009]; word2vec is not deep! Trained on 100 billion words; 3 million different words embedded in $\mathbb{R}^d$
5 Word Embedding: word2vec [Mikolov et al., 2013]. $X \in \mathbb{R}^{d \times n}$ holds the embeddings $x_i$ of $n$ words; the distance between words $i$ and $j$, $\|x_i - x_j\|_2$, is roughly their dissimilarity
6 Word Embedding: word2vec [Mikolov et al., 2013] [figure: Man, King, Woman, Queen plotted in $\mathbb{R}^d$]
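A minimal sketch of the word-to-word distance on this slide, assuming a tiny made-up embedding matrix rather than real word2vec vectors. The vocabulary, the values in `X`, and the helper name `word_distance` are illustrative assumptions.

```python
import numpy as np

# Toy embedding matrix X in R^{d x n}: one column per word.
# The numbers are invented for illustration; they are not word2vec values.
vocab = ["king", "queen", "banana"]
X = np.array([
    [0.9, 0.8, 0.1],
    [0.7, 0.9, 0.0],
    [0.1, 0.2, 0.9],
])  # shape (d=3, n=3)

def word_distance(X, i, j):
    """Euclidean distance ||x_i - x_j||_2 between columns i and j of X."""
    return float(np.linalg.norm(X[:, i] - X[:, j]))

d_king_queen = word_distance(X, 0, 1)
d_king_banana = word_distance(X, 0, 2)
print(d_king_queen, d_king_banana)
```

With these toy vectors the related pair ends up closer than the unrelated pair, which is the property the rest of the deck relies on.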
7 How can we leverage these high quality word embeddings to compute document distances?
8 Word Mover's Distance [figure: scattered words: cat, run, tree, dog, steam, ocean, car, fly, race, frog, pattern, oat, pickle, rock, up, rocket, win, bunny, baby]
9 Goal?
10-11 Word Mover's Distance [figure: documents $d$ and $d'$ shown as words (e.g. "media") in the $\mathbb{R}^d$ word embedding]
12 Word Mover's Distance: the word mover's distance is the minimum word distance needed to transform the mass of $d$ into $d'$ [figure: $d$ and $d'$ in the $\mathbb{R}^d$ word embedding]
13-15 Word Mover's Distance [figure: $d$ and $d'$ in the $\mathbb{R}^d$ word embedding]
16 Word Mover's Distance [figure: flow $T_{ij}$ from word $i$ in $d$ to word $j$ in $d'$, in the $\mathbb{R}^d$ word embedding]
17-19 Word Mover's Distance [figure: $d$ and $d'$ in the $\mathbb{R}^d$ word embedding, with a word mass of 1/2]
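20 Word Mover's Distance: $\mathrm{WMD}(d, d') \triangleq \min_{T \ge 0} \sum_{i,j=1}^{n} T_{ij}\,\|x_i - x_j\|_2$ [figure: words $x_i$, $x_j$, $x_k$ with masses of 1/2]
21 Word Mover's Distance: $\mathrm{WMD}(d, d') \triangleq \min_{T \ge 0} \sum_{i,j=1}^{n} T_{ij}\,\|x_i - x_j\|_2$ s.t. $\sum_{j=1}^{n} T_{ij} = d_i\ \forall i$, $\sum_{i=1}^{n} T_{ij} = d'_j\ \forall j$
22 Remarks: $\mathrm{WMD}(d, d') \triangleq \min_{T \ge 0} \sum_{i,j=1}^{n} T_{ij}\,\|x_i - x_j\|_2$ s.t. $\sum_{j=1}^{n} T_{ij} = d_i\ \forall i$, $\sum_{i=1}^{n} T_{ij} = d'_j\ \forall j$. In CV this is the Earth Mover's Distance (EMD) [Rubner et al., 1998]; it is an old optimal transport problem [Monge, 1781]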
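The transport problem above can be sketched as a small linear program. This is an illustrative version that hands the problem to SciPy's general-purpose `linprog` (HiGHS backend), not the specialised EMD solver behind the complexity quoted later in the deck. The function name `wmd` and the convention of passing only the words that occur in each document are assumptions made for the example.

```python
import numpy as np
from scipy.optimize import linprog

def wmd(X1, d1, X2, d2):
    """Word Mover's Distance between two documents, solved as a plain LP.

    X1: (n1, dim) embeddings of the words in document 1
    d1: (n1,) normalised word weights of document 1 (sums to 1)
    X2, d2: the same for document 2
    Returns (distance, flow matrix T of shape (n1, n2)).
    """
    n1, n2 = len(d1), len(d2)
    # Cost matrix C[i, j] = ||x_i - x_j||_2
    C = np.linalg.norm(X1[:, None, :] - X2[None, :, :], axis=2)
    # T is flattened row-major, so entry (i, j) sits at index i * n2 + j.
    A_rows = np.kron(np.eye(n1), np.ones((1, n2)))  # sum_j T_ij = d1_i
    A_cols = np.kron(np.ones((1, n1)), np.eye(n2))  # sum_i T_ij = d2_j
    res = linprog(
        C.ravel(),
        A_eq=np.vstack([A_rows, A_cols]),
        b_eq=np.concatenate([d1, d2]),
        bounds=(0, None),  # T >= 0
        method="highs",
    )
    if not res.success:
        raise RuntimeError(res.message)
    return float(res.fun), res.x.reshape(n1, n2)

# Toy 2-D "embeddings": two words per document, each with mass 1/2.
X1 = np.array([[0.0, 0.0], [1.0, 0.0]])
X2 = np.array([[0.0, 1.0], [1.0, 1.0]])
half = np.array([0.5, 0.5])
dist, T = wmd(X1, half, X2, half)
print(dist)
```

In the toy example each word moves its mass of 1/2 straight to the word directly above it, so the total cost is 1/2 + 1/2.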
23 How well does WMD perform on document classification via k-nearest neighbors (k-NN)?
24 Classic Approaches: bag-of-words; TF-IDF [Salton & Buckley, 1988]; LSI [Deerwester et al., 1990]; LDA [Blei et al., 2003] [figure: example words (campaign, speech, Washington) and topic distributions over topics such as Civil War, music, politics, sports, with topic words guitar, soccer, Vicksburg, Madonna, football, Washington, speech]
25 Results: k-NN test error (%) [table: rows for train inputs, BOW dim and k-nearest neighbor error; numeric values lost]. Datasets: bbcsport, twitter, recipe, ohsumed, classic, reuters, amazon, 20news. Methods: Okapi BM25 [Robertson & Walker, 1994]; TF-IDF [Jones, 1972]; BOW [Frakes & Baeza-Yates, 1992]; Componential Counting Grid [Perina et al., 2013]; mSDA [Chen et al., 2012]; LDA [Blei et al., 2003]; LSI [Deerwester et al., 1990]; Word Mover's Distance. All hyper-parameters set with bayesopt.m [Gardner et al., 2014]
26 Results: k-NN average error w.r.t. BOW [bar chart over Okapi BM25, TF-IDF, BOW, CCG, mSDA, LDA, LSI, WMD; one surviving value label: .72]
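For readers unfamiliar with the evaluation protocol, here is a hedged sketch of k-NN classification driven purely by document distances. The distances and labels are invented toy values, and `knn_predict` is a hypothetical helper; any document distance, WMD included, could fill the distance vector.

```python
import numpy as np
from collections import Counter

def knn_predict(dist_to_train, train_labels, k=3):
    """Majority vote among the k training documents closest to a test document."""
    nearest = np.argsort(dist_to_train)[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy distances from one test document to four training documents.
dist_to_train = np.array([0.2, 0.9, 0.3, 0.8])
train_labels = ["sports", "politics", "sports", "politics"]
prediction = knn_predict(dist_to_train, train_labels, k=3)
print(prediction)
```

The classifier never looks at the documents themselves, only at distances, which is why the quality of the distance determines the test error.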
27 Computational Complexity: $\mathrm{WMD}(d, d') \triangleq \min_{T \ge 0} \sum_{i,j=1}^{n} T_{ij}\,\|x_i - x_j\|_2$ s.t. $\sum_{j=1}^{n} T_{ij} = d_i\ \forall i$, $\sum_{i=1}^{n} T_{ij} = d'_j\ \forall j$. An LP with $2n$ constraints: $O(n^3 \log n)$ [Pele & Werman, 2009]
28 Computational Complexity: the same LP; approximations: [Rubner et al., 1998]; [Levina & Bickel, 2001]; [Grauman & Darrell, 2004]; [Shirdhonkar & Jacobs, 2008]
29-30 Approximation 1 [Rubner et al., 1998] [figure: $d$ and $d'$ in the $\mathbb{R}^d$ word embedding]
31 Approximation 1: Word Centroid Distance, $\mathrm{WCD}(d, d') \triangleq \|Xd - Xd'\|_2$, computable in $O(nd)$ [Rubner et al., 1998] [figure: centroids $Xd$ and $Xd'$ in the $\mathbb{R}^d$ word embedding]
32 Faster Approximations [figure: for a random test input, distance vs. training input index on twitter and amazon; curves: WCD, RWMD, WMD]
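The centroid distance is short enough to write out directly. A minimal sketch, assuming `X` stores one word embedding per column and the documents are normalised bag-of-words vectors over the same vocabulary; the function name `wcd` and the toy values are illustrative.

```python
import numpy as np

def wcd(X, d1, d2):
    """Word Centroid Distance ||X d1 - X d2||_2.

    X: (dim, n) embedding matrix, one column per vocabulary word
    d1, d2: (n,) normalised bag-of-words vectors
    """
    return float(np.linalg.norm(X @ d1 - X @ d2))

# Toy vocabulary of four words embedded in 2-D.
X = np.array([
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
])
d1 = np.array([0.5, 0.5, 0.0, 0.0])  # centroid (0.5, 0.0)
d2 = np.array([0.0, 0.0, 0.5, 0.5])  # centroid (0.5, 1.0)
print(wcd(X, d1, d2))
```

Each document collapses to a single weighted-average point, so the cost is two matrix-vector products and one norm, matching the $O(nd)$ on the slide.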
33 Approximation 2: $\min_{T \ge 0} \sum_{i,j=1}^{n} T_{ij}\,\|x_i - x_j\|_2$ s.t. $\sum_{j=1}^{n} T_{ij} = d_i\ \forall i$, $\sum_{i=1}^{n} T_{ij} = d'_j\ \forall j$
34-35 Approximation 2: $D_1 = \min_{T \ge 0} \sum_{i,j=1}^{n} T_{ij}\,\|x_i - x_j\|_2$ s.t. $\sum_{j=1}^{n} T_{ij} = d_i\ \forall i$ (the constraint on $d'_j$ is dropped): just a nearest-neighbor search!
36 Approximation 2: $D_2 = \min_{T \ge 0} \sum_{i,j=1}^{n} T_{ij}\,\|x_i - x_j\|_2$ s.t. $\sum_{i=1}^{n} T_{ij} = d'_j\ \forall j$ (the constraint on $d_i$ is dropped): just a nearest-neighbor search!
37 Approximation 2: $D_1 = \min_{T \ge 0} \sum_{i,j=1}^{n} T_{ij}\,\|x_i - x_j\|_2$ s.t. $\sum_{j=1}^{n} T_{ij} = d_i\ \forall i$; $D_2 = \min_{T \ge 0} \sum_{i,j=1}^{n} T_{ij}\,\|x_i - x_j\|_2$ s.t. $\sum_{i=1}^{n} T_{ij} = d'_j\ \forall j$. Relaxed Word Mover's Distance: $\mathrm{RWMD}(d, d') \triangleq \max(D_1, D_2)$, computable in $O(n^2 d)$
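With one constraint removed, nothing stops each word from sending all of its mass to its single nearest word in the other document, which is why the relaxed problems reduce to nearest-neighbor searches. A sketch under the same assumptions as before (only the words present in each document are passed in; `rwmd` is an illustrative name):

```python
import numpy as np

def rwmd(X1, d1, X2, d2):
    """Relaxed WMD: max of the two one-sided nearest-neighbor relaxations.

    X1: (n1, dim) embeddings of the words in document 1, d1: (n1,) weights
    X2: (n2, dim) embeddings of the words in document 2, d2: (n2,) weights
    """
    # Cost matrix C[i, j] = ||x_i - x_j||_2
    C = np.linalg.norm(X1[:, None, :] - X2[None, :, :], axis=2)
    D1 = float(d1 @ C.min(axis=1))  # each word of doc 1 moves to its nearest word in doc 2
    D2 = float(d2 @ C.min(axis=0))  # each word of doc 2 moves to its nearest word in doc 1
    return max(D1, D2)

# Toy example: one word at the origin versus two words further along the x-axis.
X1 = np.array([[0.0, 0.0]])
d1 = np.array([1.0])
X2 = np.array([[1.0, 0.0], [3.0, 0.0]])
d2 = np.array([0.5, 0.5])
print(rwmd(X1, d1, X2, d2))
```

In the toy example the two relaxations give 1.0 and 2.0. Taking the maximum keeps the tighter of the two, since a relaxed minimisation can only underestimate the full transport cost.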
38-39 Faster Approximations [figure: for a random test input, distance vs. training input index on twitter and amazon; curves: WCD, RWMD, WMD]
40 Faster Approximations [figure: average k-NN error w.r.t. BOW for WCD, RWMD (including the one-sided variants c1 and c2, i.e. $D_1$ and $D_2$) and WMD]
41 Other Embeddings
42 Conclusion: Word Mover's Distance: document distances from word embeddings. Very accurate, as it leverages the high-quality word2vec embedding [bar chart: average error w.r.t. BOW over Okapi BM25, TF-IDF, BOW, CCG, mSDA, LDA, LSI, WMD; one surviving value label: .72]. Fast through approximations: WMD $O(n^3 \log n)$, WCD $O(nd)$, RWMD $O(n^2 d)$
43 Code: Thank you. Questions?