Focused Crawling with Scalable Ordinal Regression Solvers
1 Focused Crawling with Scalable Ordinal Regression Solvers
Rashmin Babaria, J Saketha Nath, Krishnan S, KR Sivaramakrishnan, Chiranjib Bhattacharyya, M N Murty
Department of Computer Science and Automation, Indian Institute of Science, India
2 Focused Crawling & Large scale OR
Focused Crawling
- Given a topic (seed pages), find relevant pages from the web
- Pose as a large scale ordinal regression (OR) problem
Ordinal Regression
- Fast OR training algorithm scales to millions of datapoints
- Fast algorithm to solve an SOCP with one SOC constraint
- Low prediction time
3 Baseline OR Formulation [Chu & Keerthi, 2005]
8 Clustering based scalable OR Formulation
- Describe data using clusters instead of data points
- Class conditional distributions: mixture models with spherical covariance
- Using second order moments (µ, σ²I), classify clusters
- Proposed formulation will have constraints per cluster
- Size of optimization problem is O(clusters) rather than O(datapoints)
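To make the per-cluster description concrete, here is a minimal sketch of how the second order moments (µ, σ²I) of each cluster could be computed once cluster labels are known. The function name `cluster_moments`, the toy points and the labels are assumptions for illustration only; the slides do not show the authors' implementation.

```python
from collections import defaultdict

def cluster_moments(points, labels):
    """Return {cluster_id: (mean_vector, spherical_variance)}.

    The spherical variance is the mean squared distance to the cluster
    mean divided by the dimension, so the covariance is modelled as
    sigma^2 * I.
    """
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)

    moments = {}
    for label, members in groups.items():
        n = len(members)
        dim = len(members[0])
        mean = [sum(p[d] for p in members) / n for d in range(dim)]
        sq_dist = sum((p[d] - mean[d]) ** 2 for p in members for d in range(dim))
        moments[label] = (mean, sq_dist / (n * dim))
    return moments

# Hypothetical toy data: two clusters in two dimensions.
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 10.0), (10.0, 12.0)]
labels = [0, 0, 1, 1]
moments = cluster_moments(points, labels)
```

Only the pairs in `moments` would then enter the optimization problem, which is why its size depends on the number of clusters rather than the number of data points.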
9 Proposed OR formulation's solution
12 Proposed OR formulation
Features:
- SOCP problem with one SOC constraint
- T_train = T_clust + T_SOCP = O(n)
- Cluster moments estimated using BIRCH [Zhang et al., 1996]: T_clust = O(n)
- SOCP solved using SeDuMi: T_SOCP is independent of n
- Can be kernelized using input space cluster moments
- Number of support vectors is at most k, hence low prediction time
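The low prediction time follows from the fact that at most k cluster centers act as support vectors. The sketch below illustrates this with a standard threshold-based ordinal regression predictor. The RBF kernel, the names `rbf_kernel` and `predict_rank`, and all numeric values are assumptions for illustration, not the trained model from the slides.

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian kernel between two vectors (assumed kernel choice)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def predict_rank(x, centers, coeffs, thresholds, kernel=rbf_kernel):
    """Predict an ordinal rank in 1..r from at most k cluster centers.

    score(x) = sum_k coeffs[k] * K(centers[k], x); the rank is one plus
    the number of thresholds the score exceeds, so the cost per
    prediction is O(k) kernel evaluations.
    """
    score = sum(c * kernel(mu, x) for mu, c in zip(centers, coeffs))
    return 1 + sum(1 for b in thresholds if score > b)

# Hypothetical model with k = 2 centers and r = 2 ranks (one threshold).
centers = [(0.0, 0.0), (4.0, 0.0)]
coeffs = [-1.0, 1.0]
thresholds = [0.0]

rank_near_first = predict_rank((0.0, 0.0), centers, coeffs, thresholds)
rank_near_second = predict_rank((4.0, 0.0), centers, coeffs, thresholds)
```

The loop runs over the cluster centers only, so prediction cost does not grow with the number of training points.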
13 Clustering + SOCP gives speedup
Table: Training times (sec) with SeDuMi and SMO-OR [Chu & Keerthi, 2005] on synthetic dataset.
S-Rate S-Size SMO-OR SeDuMi
[table values lost in transcription]
Table: Training times (sec), test error rate with SeDuMi and SMO-OR [Chu & Keerthi, 2005] on CS-Census dataset.
S-Size SMO-OR SeDuMi
sec (err) sec
5, (.128) 20.4 (.109)
11, (.107) (.112)
15, (.107) (.108)
22, (.119)
14 Large number of clusters is still challenging
Table: Training times (sec), test error rate with SeDuMi and SMO-OR [Chu & Keerthi, 2005] on CH-California Housing dataset.
S-Size SMO-OR SeDuMi
sec (err) sec
10, (.619) 112 (.623)
13, (.616) (.634)
15, (.617)
17, (.617)
20, (.62)
15 CB-OR Solver
Key Idea: Exploit special SOCP form
- SOCP problem with one SOC constraint
- Erdoğan et al., 2006: specialized solvers scale better
- Fast algorithm similar in spirit to Platt's SMO for QP
Features:
- More scalable than generic solvers
- Easy to implement, uses no optimization tools
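16 CB-OR Solver
Rewrite the dual as follows:
$$\min_{\alpha,\alpha^*} \; W\sqrt{(\alpha^*-\alpha)^\top K\,(\alpha^*-\alpha)} \;-\; d^\top(\alpha+\alpha^*)$$
$$\text{s.t.}\quad 0 \le \alpha \le 1,\quad 0 \le \alpha^* \le 1,\quad s_i^* \le s_i,\; i = 1,\dots,r-2,\quad s_{r-1}^* = s_{r-1}$$
K is the Gram matrix for cluster centers, $s_i = \sum_{k=1}^{i}\sum_{j=1}^{n_k} \alpha_j^k$ and $s_i^* = \sum_{k=2}^{i+1}\sum_{j=1}^{n_k} \alpha_j^{*k}$.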
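17 CB-OR Solver
Minimization with respect to two multipliers:
$$\min_{\Delta\alpha}\; \sqrt{a(\Delta\alpha)^2 + 2b(\Delta\alpha) + c} \;-\; e\,\Delta\alpha \quad \text{s.t.}\quad lb \le \Delta\alpha \le ub$$
Has a closed form solution:
$$\Delta\alpha = \begin{cases}
\left[\dfrac{e}{a}\sqrt{\dfrac{ac-b^2}{a-e^2}} - \dfrac{b}{a}\right]_{lb}^{ub} & \text{if } ac-b^2 > 0,\; a-e^2 > 0 \\
\left[-\dfrac{b}{a}\right]_{lb}^{ub} & \text{if } ac-b^2 = 0,\; a-e^2 > 0 \\
ub & \text{if } e-\sqrt{a} \ge 0 \\
lb & \text{if } e+\sqrt{a} \le 0
\end{cases}$$
where $[\cdot]_{lb}^{ub}$ denotes clipping to the interval $[lb, ub]$.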
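The one-dimensional subproblem, minimizing sqrt(a·t² + 2b·t + c) − e·t over lb ≤ t ≤ ub, can be sketched in a few lines. The code below is an illustrative reading of the closed form on this slide, checked against a brute-force grid search; the function names and the numeric inputs are assumptions, and it presumes a ≥ 0 and ac − b² ≥ 0 so that the square root is defined.

```python
import math

def solve_pair_step(a, b, c, e, lb, ub):
    """Minimize sqrt(a*t^2 + 2*b*t + c) - e*t subject to lb <= t <= ub."""
    def clip(t):
        return max(lb, min(ub, t))

    root_a = math.sqrt(a)
    if e - root_a >= 0:       # objective keeps decreasing as t grows
        return ub
    if e + root_a <= 0:       # objective keeps decreasing as t shrinks
        return lb
    # From here on |e| < sqrt(a), so a > 0 and a - e^2 > 0.
    disc = a * c - b * b
    if disc > 0:
        return clip((e / a) * math.sqrt(disc / (a - e * e)) - b / a)
    return clip(-b / a)       # degenerate case ac - b^2 = 0

def objective(t, a, b, c, e):
    return math.sqrt(max(a * t * t + 2 * b * t + c, 0.0)) - e * t

# Hypothetical coefficients for a sanity check against grid search.
a, b, c, e, lb, ub = 2.0, 0.5, 1.0, 1.0, -5.0, 5.0
t_star = solve_pair_step(a, b, c, e, lb, ub)
grid = [lb + i * (ub - lb) / 10000 for i in range(10001)]
t_grid = min(grid, key=lambda t: objective(t, a, b, c, e))
```

Because the objective is convex, the grid minimizer `t_grid` should land within one grid step of `t_star`, which gives a cheap way to test an implementation before plugging it into the solver loop.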
18 CB-OR Solver
CB-OR Algorithm
Step 1: Pick the two worst KKT violators.
Step 2: Solve the 1-d minimization problem.
Step 3: Update unknowns.
Step 4: Check for KKT violators. If none remain, terminate; otherwise go to Step 1.
19 CB-OR Evaluation
Figure: Training time in seconds versus number of clusters on a synthetic dataset. The dashed line represents training time with SeDuMi and the continuous line that with CB-OR.
20 CB-OR Evaluation
Table: Comparison of training times (in sec) with CB-OR, SMO-OR and SeDuMi on benchmark datasets. The test set error rate is given in brackets. (CH: California Housing, CS: Census datasets.)
S-Size CB-OR SMO-OR SeDuMi
sec (err) sec (err) sec
CH
10,320 .5 (.623) (.619)
, (.634) (.616)
15, (.618) 1142 (.617)
17, (.621) 1410 (.617)
20, (.62) (.62)
CS
5,690 .3 (.109) 893 (.128)
,393 .7 (.112) (.107)
15,191 1 (.108) (.107)
, (.119)
21 Focused Crawling
- Given a topic (seed pages), find relevant pages from the web.
- S. Chakrabarti et al. (1999, 2002), C. Aggarwal et al. (2001), M. Diligenti et al. (2000)
- Requires low bandwidth and low disk space.
- Short update cycle.
22 Baseline Focused Crawler [Chakrabarti et.al., 1999]
23 Topic Taxonomy
31 Exploit link structure
- Grangier and Bengio observe that hyperlinked documents are semantically closer.
- Pages one link away from the seed pages are more similar to them than pages two links away.
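One way to turn this observation into ordinal training labels is to rank each page by its link distance from the seed pages. The sketch below does this with a breadth-first search over a toy adjacency map. The function name `link_distance_ranks`, the cap `max_rank`, the graph and the choice of "rank = distance + 1" are all assumptions for illustration; the slides do not spell out the exact labeling rule.

```python
from collections import deque

def link_distance_ranks(links, seeds, max_rank):
    """Assign rank 1 to seed pages and rank d + 1 to pages d links away.

    `links` maps a page to the pages one hyperlink away from it.
    Pages further than max_rank - 1 links from every seed are left out.
    """
    rank = {page: 1 for page in seeds}
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        if rank[page] >= max_rank:
            continue  # do not expand beyond the last rank
        for neighbor in links.get(page, ()):
            if neighbor not in rank:
                rank[neighbor] = rank[page] + 1
                queue.append(neighbor)
    return rank

# Hypothetical link graph.
links = {"seed": ["a", "b"], "a": ["c"], "b": [], "c": ["d"]}
ranks = link_distance_ranks(links, ["seed"], max_rank=3)
```

With these labels, pages of rank 2 are expected to be more similar to the seeds than pages of rank 3, which is the ordering an ordinal regression model can be trained to reproduce.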
32 Link structure in web
35 Focused Crawling as OR problem: exploit link structure
39 Baseline architecture
40 Proposed architecture
41 Crawling Experiments
Table: Category and Seed. Categories: NASCAR, Soccer, Cancer, Mutual Funds [seed entries lost in transcription]
42 NASCAR harvest rate
43 Cancer harvest rate
44 Mutual Funds harvest rate
45 Harvest rate comparison
Table: Dataset, Baseline, OR. Datasets: NASCAR, Cancer, Mutual Fund, Soccer [values lost in transcription]
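The slides do not define harvest rate; in the focused crawling literature it is commonly taken as the fraction of fetched pages that are relevant to the topic. Under that assumption, a running harvest rate over a crawl can be sketched as follows. The function name `running_harvest_rate` and the sample relevance flags are hypothetical.

```python
def running_harvest_rate(relevant_flags):
    """Return the harvest rate after each fetched page.

    relevant_flags[i] is True when the i-th fetched page was judged
    relevant; the rate is relevant pages so far / pages fetched so far.
    """
    rates = []
    relevant_so_far = 0
    for fetched, is_relevant in enumerate(relevant_flags, start=1):
        if is_relevant:
            relevant_so_far += 1
        rates.append(relevant_so_far / fetched)
    return rates

# Hypothetical relevance judgements for eight fetched pages.
flags = [True, True, False, True, False, False, True, True]
rates = running_harvest_rate(flags)
```

Plotting such a sequence against the number of fetched pages gives a curve of the kind summarized on the harvest rate slides, and the final value is what a comparison table would report per topic.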
46 Conclusions
- Proposed a scalable clustering based OR formulation
- Training time O(datapoints)
- Support vectors O(clusters)
- Exploited special structure of the formulation to develop a fast solver, CB-OR
- Scalable to tens of thousands of clusters
- We formulated focused crawling as large scale ordinal regression
- No need for negative class definition
- Independent of topic taxonomy
- OR captures link structure of web graph
47 Focused crawler code available at
48 Acknowledgments
This project is partially supported by AOL India Pvt Ltd and DST, Government of India (DST/ECA/CB/660)
49 Questions?