PASCAL. A Parallel Algorithmic SCALable Framework for N-body Problems. Laleh Aghababaie Beni, Aparna Chandramowlishwaran. Euro-Par 2017.
|
|
- Phoebe Lane
- 6 years ago
- Views:
Transcription
1 PASCAL A Parallel Algorithmic SCALable Framework for N-body Problems Laleh Aghababaie Beni, Aparna Chandramowlishwaran Euro-Par 2017
2 Outline Introduction PASCAL Framework Space Partitioning Trees Tree Traversal Prune/Approximate Generators Experiments & Results Conclusions & Future Work Optimizations & Parallelization
3 Outline Introduction PASCAL Framework Space Partitioning Trees Tree Traversal Prune/Approximate Generators Experiments & Results Conclusions & Future Work Optimizations & Parallelization
4 N-body calculations q : F (q) = r ( {q}) C r q r q 3 Force computation q : AllNN(q) =argmin r R d(q, r) q : KDE(q) = 1 R r R K(q, r) Nearest neighbors Kernel density estimation q : Range(q) = r R I(dist(q, r)) h) Range count
5 N-body calculations What do these have in common? q : F (q) = r ( {q}) C r q r q 3 Force computation q : AllNN(q) =argmin r R d(q, r) q : KDE(q) = 1 R r R K(q, r) Nearest neighbors Kernel density estimation q : Range(q) = I(dist(q, r)) h) Range count r R
6 N-body calculations What do these have in common? q : F (q) = r ( {q}) C r q r q 3 Force computation q : AllNN(q) =argmin r R d(q, r) q : KDE(q) = 1 R r R K(q, r) Nearest neighbors Kernel density estimation q : Range(q) = I(dist(q, r)) h) Range count r R Consider pairs of points naïvely O(N 2 )
7 Commonality: Optimal approximation algorithms q : F (q) = r ( {q}) C r q r q 3 Force computation Hierarchical tree-based approximation algorithms for force computations, e.g., Barnes-Hut or FMM Evaluate interactions Tree traversals Store aggregate data at nodes, e.g., bounding box, mass
8 N-body problems in other domains Problem All Nearest Neighbors All Range Search All Range Count Naive Bayes Classifier Mixture Model E-step K-means E-step Mixture Model Log-likelihood Kernel Density Estimation Kernel Density Bayes Classifier 2-point (cross-)correlation Nadaraya-Watson Regression Thermodynamic Average Largest-span set Closest Pair Minimum Spanning Tree Coulombic Interaction Average Density Wave function Hausdorff Distance Intrinsic (fractal) Dimension Operators 8, arg min 8, [ arg 8, 8, arg max 8, 8 8, arg min X, log X 8, 8, arg max, 8,, max,..., max arg min, arg min 8, arg min 8,, 8, max, min, Kernel Function xq xr I(hmin < xq xr < hmax ) I(hmin < xq xr < hmax ) p 1 T 1 (1/ 2 k )e 2 (xi µk ) k (xi µk ) P (Ck ) p 1 T 1 (1/ 2 k )e 2 (xi µk ) k (xi µk ) xq p (1/ 2 k )e ( ( xr xq xq h xr ) h xr )P (Ck ) xr < h) I( xq yr µk )T k 1 (xi µk ) 1 2 (xi ( xq ( xq ( xq xq xq h xr ) xr ) xr ) xr xr q r xq xr I( xq xr < h) ( xq xr ) xq I( xq xr xr < h) Each problem has a set of operators and a kernel function
9 Why N-body methods? One of the original seven dwarfs or motifs FMM listed among the top 10 algorithms having the greatest influence in 20 th century EM is one of the top 10 algorithms having the highest impact in data mining Applications Machine learning Computer vision Computational geometry Scientific computing
10 Key Ideas and Findings An algorithmic framework for N-body problems Automatically generates prune & approximation conditions Results in O(N log N) and O(N) algorithms Domain-specific optimizations and parallelization Show x speedup on 6 different algorithms compared to state-of-art libraries/softwares Out-of-the-box new optimal algorithms O(N log N) EM algorithm for GMM s O(N) algorithm for Hausdorff distance
11 Outline Introduction PASCAL Framework Space Partitioning Trees Tree Traversal Prune/Approximate Generators Experiments & Results Conclusions & Future Work Optimizations & Parallelization
12 PASCAL Framework Datasets Space-partitioning Trees Kd-tree Tree Traversal Schemes Multi tree traversals BaseCase Prune/Approximate ComputeApproximate N-body spec.: Operators & Kernel function Prune/Approximate condition generator Domain-Specific Optimizations Parallelization Optimized code
13 Tree Construction Recursively divide space until each box has at most q points.
14 Tree Construction Recursively divide space until each box has at most q points.
15 Tree Construction Recursively divide space until each box has at most q points.
16 Tree Traversal R
17 Tree Traversal Prune/Approx()? R
18 Tree Traversal Prune/Approx()? NO R
19 Tree Traversal R
20 Tree Traversal Prune/Approx()? R
21 Tree Traversal Prune/Approx()? NO R
22 Tree Traversal R
23 Tree Traversal R BaseCase() Direct L RL O(q 2 )
24 Tree Traversal R
25 Tree Traversal R Prune/Approx()?
26 Tree Traversal R Prune/Approx()? YES
27 Tree Traversal R If Prune/Approx() is true, discard the entire subtree for pruning problems
28 Tree Traversal BaseCase() R
29 Tree Traversal R
30 Tree Traversal R Prune/Approx()? YES
31 Tree Traversal R If Prune/Approx() is true, replace the subtree with the centroid for approximation problems
32 Tree Traversal R ApproxCompute()
33 Tree Traversal BaseCase() R
34 Tree Traversal R
35 Prune/Approximate Condition Generator Prune e.g., Hausdorff Distance
36 Hausdorff Distance
37 Hausdorff Distance
38 Hausdorff Distance R
39 Hausdorff Distance R
40 Hausdorff Distance R
41 Hausdorff Distance R
42 Hausdorff Distance R
43 Hausdorff Distance R
44 Prune/Approximate Condition Generator Prune e.g., Hausdorff Distance Approximation e.g., Expectation Maximization (EM) E-step M-step Log-likelihood
45 Approximate Condition for EM
46 Approximate Condition for EM R
47 Approximate Condition for EM R
48 Approximate Condition for EM Kmin R
49 Approximate Condition for EM R
50 Approximate Condition for EM Kmax R
51 Approximate Condition for EM R
52 Approximate Condition for EM center R center
53 Approximate Condition for EM center K center R center
54 Approximate Condition for EM Kmax -Kmin < X K center center K center R center
55 Approximate Condition for EM Kmax -Kmin < user-controlled accuracy X K center center K center R center
56 Approximate Condition for EM Kmax -Kmin < user-controlled accuracy X K center center K center R center log liklihood: E-step: (r i,max r i,min ) < r i,mean (i =1,...,K)
57 Prune Condition for Hausdorff distance:
58 Prune Condition for Hausdorff distance: R
59 Prune Condition for Hausdorff distance: R
60 Prune Condition for Hausdorff distance: border point R
61 Prune Condition for Hausdorff distance: border point R op 1 ( 1, K(x q,x r ) op 2 ( 2, K(x q,x r ))) s.t. 8x q 2 Nq border, 8x r 2 Nr border
62 Outline Introduction PASCAL Framework Space Partitioning Trees Tree Traversal Prune/Approximate Generators Experiments & Results Conclusions & Future Work Optimizations & Parallelization
63 Optimizations Incremental bounding box calculation
64 Optimizations Incremental bounding box calculation
65 Optimizations Incremental bounding box calculation
66 Optimizations Incremental bounding box calculation Only update the dimension that is split at each node
67 Optimizations Incremental bounding box calculation Only update the dimension that is split at each node
68 Optimizations Incremental bounding box calculation Optimal Metric Calculation Reduced distance e.g., squared Euclidean distance Eliminates expensive sqrt instruction with long latencies Partial distance Big payoff for large dimensional datasets
69 Optimizations Incremental bounding box calculation Optimal Metric Calculation Reduced distance e.g., squared Euclidean distance Eliminates expensive sqrt instruction with long latencies Partial distance Big payoff for large dimensional datasets Incremental distance calculation Node-to-node distance computed incrementally from parent s distance in constant time
70 Parallelization
71 Parallelization cilk_spawn
72 Parallelization cilk_spawn
73 Parallelization cilk_spawn Stop spawning when #cilk threads = #physical threads t0 t1 t2 t3
74 Parallelization Task parallelism cilk_spawn Stop spawning when #cilk threads = #physical threads t0 t1 t2 t3 Data parallelism
75 Parallelization Task parallelism cilk_spawn Stop spawning when #cilk threads = #physical threads t0 t1 t2 t3 Data parallelism Pruning/Approximation causes load imbalance
76 Outline Introduction PASCAL Framework Space Partitioning Trees Tree Traversal Prune/Approximate Generators Experiments & Results Conclusions & Future Work Optimizations & Parallelization
77 Experimental Setup Architecture Dual-socket Intel Xeon E v3 processor (Haswell-EP) Each socket has 8 cores Theoretical peak performance of GFlops Compiler Intel C++ complier (icpc v15.0.2) Python v2.7.6 (Scikit-learn) Java v1.8.0 (Weka)
78 Case Studies (Direct) Nearest Neighbors Range-Search I ( x q x r apple h) Kernel Density Estimation Hausdorff Distance
79 Case Studies (Iterative) Expectation Maximization (EM) E-step M-step Log-likelihood Euclidean Minimum Spanning Tree
80 Library Comparison Weka: 6,677,053 downloads, written in Java Scikit-learn: 121,841 downloads, written in Python MATLAB: over 1,000,000 licensed users, uses C in backend MLPACK: exploits C++ language features to provide maximum performance Speedup EM 6.2 Base Base MATLAB WEKA MLPACK Scikit PASCAL Base Base 12.3 Yahoo! HIGGS Census KDD IHEPC Base 160 Speedup knn Base Base Base Base Yahoo! HIGGS Census KDD IHEPC Base
81 Speedup Breakdown 1.6, 3.2, and 53.7 respectively for the same dataset. KNN EM KDE HD RS EMST Alg +Opt +Par Alg +Opt +Par Alg +Opt +Par Alg +Opt +Par Alg +Opt +Par Alg +Opt +Par Yahoo! HIGGS Census KDD IHEPC
82 Scalability
83 Summary and Status First generalized algorithmic framework for N-body problems Out-of-the-box new optimal algorithms O(N log N) EM algorithm O(N) Hausdorff distance algorithm x speedup from optimal tree algorithm, domain- Generalizes to more than two operators specific optimizations and parallelization optimizations and parallelization Short-term: DSL + code generator for base-case, Long-term: Extend to GPUs and distributed memory systems
Optimizing Out-of-Core Nearest Neighbor Problems on Multi-GPU Systems Using NVLink
Optimizing Out-of-Core Nearest Neighbor Problems on Multi-GPU Systems Using NVLink Rajesh Bordawekar IBM T. J. Watson Research Center bordaw@us.ibm.com Pidad D Souza IBM Systems pidsouza@in.ibm.com 1 Outline
More informationData Mining. 3.5 Lazy Learners (Instance-Based Learners) Fall Instructor: Dr. Masoud Yaghini. Lazy Learners
Data Mining 3.5 (Instance-Based Learners) Fall 2008 Instructor: Dr. Masoud Yaghini Outline Introduction k-nearest-neighbor Classifiers References Introduction Introduction Lazy vs. eager learning Eager
More informationAccelerated Machine Learning Algorithms in Python
Accelerated Machine Learning Algorithms in Python Patrick Reilly, Leiming Yu, David Kaeli reilly.pa@husky.neu.edu Northeastern University Computer Architecture Research Lab Outline Motivation and Goals
More informationSUPERVISED LEARNING METHODS. Stanley Liang, PhD Candidate, Lassonde School of Engineering, York University Helix Science Engagement Programs 2018
SUPERVISED LEARNING METHODS Stanley Liang, PhD Candidate, Lassonde School of Engineering, York University Helix Science Engagement Programs 2018 2 CHOICE OF ML You cannot know which algorithm will work
More informationUniversity of Florida CISE department Gator Engineering. Clustering Part 2
Clustering Part 2 Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville Partitional Clustering Original Points A Partitional Clustering Hierarchical
More informationData Mining and Machine Learning: Techniques and Algorithms
Instance based classification Data Mining and Machine Learning: Techniques and Algorithms Eneldo Loza Mencía eneldo@ke.tu-darmstadt.de Knowledge Engineering Group, TU Darmstadt International Week 2019,
More informationTreelogy: A Benchmark Suite for Tree Traversals
Purdue University Programming Languages Group Treelogy: A Benchmark Suite for Tree Traversals Nikhil Hegde, Jianqiao Liu, Kirshanthan Sundararajah, and Milind Kulkarni School of Electrical and Computer
More informationData Mining Practical Machine Learning Tools and Techniques. Slides for Chapter 6 of Data Mining by I. H. Witten and E. Frank
Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 6 of Data Mining by I. H. Witten and E. Frank Implementation: Real machine learning schemes Decision trees Classification
More informationClustering Part 4 DBSCAN
Clustering Part 4 Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville DBSCAN DBSCAN is a density based clustering algorithm Density = number of
More informationECG782: Multidimensional Digital Signal Processing
ECG782: Multidimensional Digital Signal Processing Object Recognition http://www.ee.unlv.edu/~b1morris/ecg782/ 2 Outline Knowledge Representation Statistical Pattern Recognition Neural Networks Boosting
More informationADVANCED CLASSIFICATION TECHNIQUES
Admin ML lab next Monday Project proposals: Sunday at 11:59pm ADVANCED CLASSIFICATION TECHNIQUES David Kauchak CS 159 Fall 2014 Project proposal presentations Machine Learning: A Geometric View 1 Apples
More informationBumptrees for Efficient Function, Constraint, and Classification Learning
umptrees for Efficient Function, Constraint, and Classification Learning Stephen M. Omohundro International Computer Science Institute 1947 Center Street, Suite 600 erkeley, California 94704 Abstract A
More informationUnsupervised Learning: Clustering
Unsupervised Learning: Clustering Vibhav Gogate The University of Texas at Dallas Slides adapted from Carlos Guestrin, Dan Klein & Luke Zettlemoyer Machine Learning Supervised Learning Unsupervised Learning
More informationUniversity of Florida CISE department Gator Engineering. Clustering Part 4
Clustering Part 4 Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville DBSCAN DBSCAN is a density based clustering algorithm Density = number of
More informationNotes and Announcements
Notes and Announcements Midterm exam: Oct 20, Wednesday, In Class Late Homeworks Turn in hardcopies to Michelle. DO NOT ask Michelle for extensions. Note down the date and time of submission. If submitting
More informationIntro to Artificial Intelligence
Intro to Artificial Intelligence Ahmed Sallam { Lecture 5: Machine Learning ://. } ://.. 2 Review Probabilistic inference Enumeration Approximate inference 3 Today What is machine learning? Supervised
More informationAn Introduction to PDF Estimation and Clustering
Sigmedia, Electronic Engineering Dept., Trinity College, Dublin. 1 An Introduction to PDF Estimation and Clustering David Corrigan corrigad@tcd.ie Electrical and Electronic Engineering Dept., University
More informationTowards the world s fastest k-means algorithm
Greg Hamerly Associate Professor Computer Science Department Baylor University Joint work with Jonathan Drake May 15, 2014 Objective function and optimization Lloyd s algorithm 1 The k-means clustering
More informationSOCIAL MEDIA MINING. Data Mining Essentials
SOCIAL MEDIA MINING Data Mining Essentials Dear instructors/users of these slides: Please feel free to include these slides in your own material, or modify them as you see fit. If you decide to incorporate
More informationGeometric data structures:
Geometric data structures: Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade Sham Kakade 2017 1 Announcements: HW3 posted Today: Review: LSH for Euclidean distance Other
More informationMachine Learning and Pervasive Computing
Stephan Sigg Georg-August-University Goettingen, Computer Networks 17.12.2014 Overview and Structure 22.10.2014 Organisation 22.10.3014 Introduction (Def.: Machine learning, Supervised/Unsupervised, Examples)
More informationPython With Data Science
Course Overview This course covers theoretical and technical aspects of using Python in Applied Data Science projects and Data Logistics use cases. Who Should Attend Data Scientists, Software Developers,
More informationBased on Raymond J. Mooney s slides
Instance Based Learning Based on Raymond J. Mooney s slides University of Texas at Austin 1 Example 2 Instance-Based Learning Unlike other learning algorithms, does not involve construction of an explicit
More information10/14/2017. Dejan Sarka. Anomaly Detection. Sponsors
Dejan Sarka Anomaly Detection Sponsors About me SQL Server MVP (17 years) and MCT (20 years) 25 years working with SQL Server Authoring 16 th book Authoring many courses, articles Agenda Introduction Simple
More informationMachine Learning. Unsupervised Learning. Manfred Huber
Machine Learning Unsupervised Learning Manfred Huber 2015 1 Unsupervised Learning In supervised learning the training data provides desired target output for learning In unsupervised learning the training
More informationPARALLEL ALGORITHMS FOR NEAREST NEIGHBOR SEARCH PROBLEMS IN HIGH DIMENSIONS.
PARALLEL ALGORITHMS FOR NEAREST NEIGHBOR SEARCH PROBLEMS IN HIGH DIMENSIONS. BO XIAO AND GEORGE BIROS Abstract. The nearest neighbor search problem in general dimensions finds application in computational
More informationNearest Neighbor Search by Branch and Bound
Nearest Neighbor Search by Branch and Bound Algorithmic Problems Around the Web #2 Yury Lifshits http://yury.name CalTech, Fall 07, CS101.2, http://yury.name/algoweb.html 1 / 30 Outline 1 Short Intro to
More informationGPU ACCELERATED SELF-JOIN FOR THE DISTANCE SIMILARITY METRIC
GPU ACCELERATED SELF-JOIN FOR THE DISTANCE SIMILARITY METRIC MIKE GOWANLOCK NORTHERN ARIZONA UNIVERSITY SCHOOL OF INFORMATICS, COMPUTING & CYBER SYSTEMS BEN KARSIN UNIVERSITY OF HAWAII AT MANOA DEPARTMENT
More information4. Ad-hoc I: Hierarchical clustering
4. Ad-hoc I: Hierarchical clustering Hierarchical versus Flat Flat methods generate a single partition into k clusters. The number k of clusters has to be determined by the user ahead of time. Hierarchical
More informationCS 229 Midterm Review
CS 229 Midterm Review Course Staff Fall 2018 11/2/2018 Outline Today: SVMs Kernels Tree Ensembles EM Algorithm / Mixture Models [ Focus on building intuition, less so on solving specific problems. Ask
More informationMachine Learning: Think Big and Parallel
Day 1 Inderjit S. Dhillon Dept of Computer Science UT Austin CS395T: Topics in Multicore Programming Oct 1, 2013 Outline Scikit-learn: Machine Learning in Python Supervised Learning day1 Regression: Least
More informationCOMP 551 Applied Machine Learning Lecture 13: Unsupervised learning
COMP 551 Applied Machine Learning Lecture 13: Unsupervised learning Associate Instructor: Herke van Hoof (herke.vanhoof@mail.mcgill.ca) Slides mostly by: (jpineau@cs.mcgill.ca) Class web page: www.cs.mcgill.ca/~jpineau/comp551
More informationdoc. RNDr. Tomáš Skopal, Ph.D. Department of Software Engineering, Faculty of Information Technology, Czech Technical University in Prague
Praha & EU: Investujeme do vaší budoucnosti Evropský sociální fond course: Searching the Web and Multimedia Databases (BI-VWM) Tomáš Skopal, 2011 SS2010/11 doc. RNDr. Tomáš Skopal, Ph.D. Department of
More informationSupervised vs unsupervised clustering
Classification Supervised vs unsupervised clustering Cluster analysis: Classes are not known a- priori. Classification: Classes are defined a-priori Sometimes called supervised clustering Extract useful
More informationFaster Cover Trees. Mike Izbicki and Christian R. Shelton UC Riverside. Izbicki and Shelton (UC Riverside) Faster Cover Trees July 7, / 21
Faster Cover Trees Mike Izbicki and Christian R. Shelton UC Riverside Izbicki and Shelton (UC Riverside) Faster Cover Trees July 7, 2015 1 / 21 Outline Why care about faster cover trees? Making cover trees
More informationPattern Recognition. Kjell Elenius. Speech, Music and Hearing KTH. March 29, 2007 Speech recognition
Pattern Recognition Kjell Elenius Speech, Music and Hearing KTH March 29, 2007 Speech recognition 2007 1 Ch 4. Pattern Recognition 1(3) Bayes Decision Theory Minimum-Error-Rate Decision Rules Discriminant
More informationChapter 4: Algorithms CS 795
Chapter 4: Algorithms CS 795 Inferring Rudimentary Rules 1R Single rule one level decision tree Pick each attribute and form a single level tree without overfitting and with minimal branches Pick that
More informationClustering. Chapter 10 in Introduction to statistical learning
Clustering Chapter 10 in Introduction to statistical learning 16 14 12 10 8 6 4 2 0 2 4 6 8 10 12 14 1 Clustering ² Clustering is the art of finding groups in data (Kaufman and Rousseeuw, 1990). ² What
More information9. Conclusions. 9.1 Definition KDD
9. Conclusions Contents of this Chapter 9.1 Course review 9.2 State-of-the-art in KDD 9.3 KDD challenges SFU, CMPT 740, 03-3, Martin Ester 419 9.1 Definition KDD [Fayyad, Piatetsky-Shapiro & Smyth 96]
More informationDATA MINING LECTURE 7. Hierarchical Clustering, DBSCAN The EM Algorithm
DATA MINING LECTURE 7 Hierarchical Clustering, DBSCAN The EM Algorithm CLUSTERING What is a Clustering? In general a grouping of objects such that the objects in a group (cluster) are similar (or related)
More informationIBL and clustering. Relationship of IBL with CBR
IBL and clustering Distance based methods IBL and knn Clustering Distance based and hierarchical Probability-based Expectation Maximization (EM) Relationship of IBL with CBR + uses previously processed
More informationSupervised Learning Classification Algorithms Comparison
Supervised Learning Classification Algorithms Comparison Aditya Singh Rathore B.Tech, J.K. Lakshmipat University -------------------------------------------------------------***---------------------------------------------------------
More informationKartik Lakhotia, Rajgopal Kannan, Viktor Prasanna USENIX ATC 18
Accelerating PageRank using Partition-Centric Processing Kartik Lakhotia, Rajgopal Kannan, Viktor Prasanna USENIX ATC 18 Outline Introduction Partition-centric Processing Methodology Analytical Evaluation
More informationECE 5424: Introduction to Machine Learning
ECE 5424: Introduction to Machine Learning Topics: Unsupervised Learning: Kmeans, GMM, EM Readings: Barber 20.1-20.3 Stefan Lee Virginia Tech Tasks Supervised Learning x Classification y Discrete x Regression
More informationA Mixed Hierarchical Algorithm for Nearest Neighbor Search
A Mixed Hierarchical Algorithm for Nearest Neighbor Search Carlo del Mundo Virginia Tech 222 Kraft Dr. Knowledge Works II Building Blacksburg, VA cdel@vt.edu ABSTRACT The k nearest neighbor (knn) search
More informationOverview Citation. ML Introduction. Overview Schedule. ML Intro Dataset. Introduction to Semi-Supervised Learning Review 10/4/2010
INFORMATICS SEMINAR SEPT. 27 & OCT. 4, 2010 Introduction to Semi-Supervised Learning Review 2 Overview Citation X. Zhu and A.B. Goldberg, Introduction to Semi- Supervised Learning, Morgan & Claypool Publishers,
More informationProblem 1: Complexity of Update Rules for Logistic Regression
Case Study 1: Estimating Click Probabilities Tackling an Unknown Number of Features with Sketching Machine Learning for Big Data CSE547/STAT548, University of Washington Emily Fox January 16 th, 2014 1
More informationContents. Preface to the Second Edition
Preface to the Second Edition v 1 Introduction 1 1.1 What Is Data Mining?....................... 4 1.2 Motivating Challenges....................... 5 1.3 The Origins of Data Mining....................
More informationCSE 573: Artificial Intelligence Autumn 2010
CSE 573: Artificial Intelligence Autumn 2010 Lecture 16: Machine Learning Topics 12/7/2010 Luke Zettlemoyer Most slides over the course adapted from Dan Klein. 1 Announcements Syllabus revised Machine
More informationClustering K-means. Machine Learning CSEP546 Carlos Guestrin University of Washington February 18, Carlos Guestrin
Clustering K-means Machine Learning CSEP546 Carlos Guestrin University of Washington February 18, 2014 Carlos Guestrin 2005-2014 1 Clustering images Set of Images [Goldberger et al.] Carlos Guestrin 2005-2014
More informationInternational Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X
Analysis about Classification Techniques on Categorical Data in Data Mining Assistant Professor P. Meena Department of Computer Science Adhiyaman Arts and Science College for Women Uthangarai, Krishnagiri,
More informationVECTOR SPACE CLASSIFICATION
VECTOR SPACE CLASSIFICATION Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press. Chapter 14 Wei Wei wwei@idi.ntnu.no Lecture
More informationUnsupervised Learning
Unsupervised Learning Unsupervised learning Until now, we have assumed our training samples are labeled by their category membership. Methods that use labeled samples are said to be supervised. However,
More informationPerformance Evaluation of Various Classification Algorithms
Performance Evaluation of Various Classification Algorithms Shafali Deora Amritsar College of Engineering & Technology, Punjab Technical University -----------------------------------------------------------***----------------------------------------------------------
More informationIntroduction to Artificial Intelligence
Introduction to Artificial Intelligence COMP307 Machine Learning 2: 3-K Techniques Yi Mei yi.mei@ecs.vuw.ac.nz 1 Outline K-Nearest Neighbour method Classification (Supervised learning) Basic NN (1-NN)
More informationPerformance impact of dynamic parallelism on different clustering algorithms
Performance impact of dynamic parallelism on different clustering algorithms Jeffrey DiMarco and Michela Taufer Computer and Information Sciences, University of Delaware E-mail: jdimarco@udel.edu, taufer@udel.edu
More informationHierarchical Decompositions of Kernel Matrices
Hierarchical Decompositions of Kernel Matrices Bill March UT Austin On the job market! Dec. 12, 2015 Joint work with Bo Xiao, Chenhan Yu, Sameer Tharakan, and George Biros Kernel Matrix Approximation Inputs:
More informationCS 1675 Introduction to Machine Learning Lecture 18. Clustering. Clustering. Groups together similar instances in the data sample
CS 1675 Introduction to Machine Learning Lecture 18 Clustering Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square Clustering Groups together similar instances in the data sample Basic clustering problem:
More informationMore Learning. Ensembles Bayes Rule Neural Nets K-means Clustering EM Clustering WEKA
More Learning Ensembles Bayes Rule Neural Nets K-means Clustering EM Clustering WEKA 1 Ensembles An ensemble is a set of classifiers whose combined results give the final decision. test feature vector
More informationCS 2750 Machine Learning. Lecture 19. Clustering. CS 2750 Machine Learning. Clustering. Groups together similar instances in the data sample
Lecture 9 Clustering Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square Clustering Groups together similar instances in the data sample Basic clustering problem: distribute data into k different groups
More informationTour-Based Mode Choice Modeling: Using An Ensemble of (Un-) Conditional Data-Mining Classifiers
Tour-Based Mode Choice Modeling: Using An Ensemble of (Un-) Conditional Data-Mining Classifiers James P. Biagioni Piotr M. Szczurek Peter C. Nelson, Ph.D. Abolfazl Mohammadian, Ph.D. Agenda Background
More informationIEE 520 Data Mining. Project Report. Shilpa Madhavan Shinde
IEE 520 Data Mining Project Report Shilpa Madhavan Shinde Contents I. Dataset Description... 3 II. Data Classification... 3 III. Class Imbalance... 5 IV. Classification after Sampling... 5 V. Final Model...
More informationData Clustering Hierarchical Clustering, Density based clustering Grid based clustering
Data Clustering Hierarchical Clustering, Density based clustering Grid based clustering Team 2 Prof. Anita Wasilewska CSE 634 Data Mining All Sources Used for the Presentation Olson CF. Parallel algorithms
More informationClustering & Classification (chapter 15)
Clustering & Classification (chapter 5) Kai Goebel Bill Cheetham RPI/GE Global Research goebel@cs.rpi.edu cheetham@cs.rpi.edu Outline k-means Fuzzy c-means Mountain Clustering knn Fuzzy knn Hierarchical
More informationWhat to come. There will be a few more topics we will cover on supervised learning
Summary so far Supervised learning learn to predict Continuous target regression; Categorical target classification Linear Regression Classification Discriminative models Perceptron (linear) Logistic regression
More informationBranch and Bound. Algorithms for Nearest Neighbor Search: Lecture 1. Yury Lifshits
Branch and Bound Algorithms for Nearest Neighbor Search: Lecture 1 Yury Lifshits http://yury.name Steklov Institute of Mathematics at St.Petersburg California Institute of Technology 1 / 36 Outline 1 Welcome
More informationOn the limits of (and opportunities for?) GPU acceleration
On the limits of (and opportunities for?) GPU acceleration Aparna Chandramowlishwaran, Jee Choi, Kenneth Czechowski, Murat (Efe) Guney, Logan Moon, Aashay Shringarpure, Richard (Rich) Vuduc HotPar 10,
More informationHierarchical Clustering 4/5/17
Hierarchical Clustering 4/5/17 Hypothesis Space Continuous inputs Output is a binary tree with data points as leaves. Useful for explaining the training data. Not useful for making new predictions. Direction
More informationVoronoi Region. K-means method for Signal Compression: Vector Quantization. Compression Formula 11/20/2013
Voronoi Region K-means method for Signal Compression: Vector Quantization Blocks of signals: A sequence of audio. A block of image pixels. Formally: vector example: (0.2, 0.3, 0.5, 0.1) A vector quantizer
More informationOutline. Multivariate analysis: Least-squares linear regression Curve fitting
DATA ANALYSIS Outline Multivariate analysis: principal component analysis (PCA) visualization of high-dimensional data clustering Least-squares linear regression Curve fitting e.g. for time-course data
More informationNaïve Bayes for text classification
Road Map Basic concepts Decision tree induction Evaluation of classifiers Rule induction Classification using association rules Naïve Bayesian classification Naïve Bayes for text classification Support
More informationUnsupervised Learning
Unsupervised Learning Learning without Class Labels (or correct outputs) Density Estimation Learn P(X) given training data for X Clustering Partition data into clusters Dimensionality Reduction Discover
More informationClustering part II 1
Clustering part II 1 Clustering What is Cluster Analysis? Types of Data in Cluster Analysis A Categorization of Major Clustering Methods Partitioning Methods Hierarchical Methods 2 Partitioning Algorithms:
More informationHarp-DAAL for High Performance Big Data Computing
Harp-DAAL for High Performance Big Data Computing Large-scale data analytics is revolutionizing many business and scientific domains. Easy-touse scalable parallel techniques are necessary to process big
More informationDensity estimation. In density estimation problems, we are given a random from an unknown density. Our objective is to estimate
Density estimation In density estimation problems, we are given a random sample from an unknown density Our objective is to estimate? Applications Classification If we estimate the density for each class,
More informationPredicting GPU Performance from CPU Runs Using Machine Learning
Predicting GPU Performance from CPU Runs Using Machine Learning Ioana Baldini Stephen Fink Erik Altman IBM T. J. Watson Research Center Yorktown Heights, NY USA 1 To exploit GPGPU acceleration need to
More informationClassification. Vladimir Curic. Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University
Classification Vladimir Curic Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University Outline An overview on classification Basics of classification How to choose appropriate
More informationData Mining. Covering algorithms. Covering approach At each stage you identify a rule that covers some of instances. Fig. 4.
Data Mining Chapter 4. Algorithms: The Basic Methods (Covering algorithm, Association rule, Linear models, Instance-based learning, Clustering) 1 Covering approach At each stage you identify a rule that
More informationDistribution-free Predictive Approaches
Distribution-free Predictive Approaches The methods discussed in the previous sections are essentially model-based. Model-free approaches such as tree-based classification also exist and are popular for
More informationUnsupervised Learning : Clustering
Unsupervised Learning : Clustering Things to be Addressed Traditional Learning Models. Cluster Analysis K-means Clustering Algorithm Drawbacks of traditional clustering algorithms. Clustering as a complex
More informationGenerative and discriminative classification techniques
Generative and discriminative classification techniques Machine Learning and Category Representation 013-014 Jakob Verbeek, December 13+0, 013 Course website: http://lear.inrialpes.fr/~verbeek/mlcr.13.14
More informationIntroduction to Mobile Robotics
Introduction to Mobile Robotics Clustering Wolfram Burgard Cyrill Stachniss Giorgio Grisetti Maren Bennewitz Christian Plagemann Clustering (1) Common technique for statistical data analysis (machine learning,
More informationDATA MINING LECTURE 10B. Classification k-nearest neighbor classifier Naïve Bayes Logistic Regression Support Vector Machines
DATA MINING LECTURE 10B Classification k-nearest neighbor classifier Naïve Bayes Logistic Regression Support Vector Machines NEAREST NEIGHBOR CLASSIFICATION 10 10 Illustrating Classification Task Tid Attrib1
More informationClassifiers and Detection. D.A. Forsyth
Classifiers and Detection D.A. Forsyth Classifiers Take a measurement x, predict a bit (yes/no; 1/-1; 1/0; etc) Detection with a classifier Search all windows at relevant scales Prepare features Classify
More informationVisual Representations for Machine Learning
Visual Representations for Machine Learning Spectral Clustering and Channel Representations Lecture 1 Spectral Clustering: introduction and confusion Michael Felsberg Klas Nordberg The Spectral Clustering
More informationA comparative study of k-nearest neighbour techniques in crowd simulation
A comparative study of k-nearest neighbour techniques in crowd simulation Jordi Vermeulen Arne Hillebrand Roland Geraerts Department of Information and Computing Sciences Utrecht University, The Netherlands
More informationUnderstanding Clustering Supervising the unsupervised
Understanding Clustering Supervising the unsupervised Janu Verma IBM T.J. Watson Research Center, New York http://jverma.github.io/ jverma@us.ibm.com @januverma Clustering Grouping together similar data
More informationCase-Based Reasoning. CS 188: Artificial Intelligence Fall Nearest-Neighbor Classification. Parametric / Non-parametric.
CS 188: Artificial Intelligence Fall 2008 Lecture 25: Kernels and Clustering 12/2/2008 Dan Klein UC Berkeley Case-Based Reasoning Similarity for classification Case-based reasoning Predict an instance
More informationCS 188: Artificial Intelligence Fall 2008
CS 188: Artificial Intelligence Fall 2008 Lecture 25: Kernels and Clustering 12/2/2008 Dan Klein UC Berkeley 1 1 Case-Based Reasoning Similarity for classification Case-based reasoning Predict an instance
More informationParametrizing the easiness of machine learning problems. Sanjoy Dasgupta, UC San Diego
Parametrizing the easiness of machine learning problems Sanjoy Dasgupta, UC San Diego Outline Linear separators Mixture models Nonparametric clustering Nonparametric classification and regression Nearest
More informationBeyond Online Aggregation: Parallel and Incremental Data Mining with MapReduce Joos-Hendrik Böse*, Artur Andrzejak, Mikael Högqvist
Beyond Online Aggregation: Parallel and Incremental Data Mining with MapReduce Joos-Hendrik Böse*, Artur Andrzejak, Mikael Högqvist *ICSI Berkeley Zuse Institut Berlin 4/26/2010 Joos-Hendrik Boese Slide
More informationData Mining Chapter 9: Descriptive Modeling Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University
Data Mining Chapter 9: Descriptive Modeling Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Descriptive model A descriptive model presents the main features of the data
More informationCS145: INTRODUCTION TO DATA MINING
CS145: INTRODUCTION TO DATA MINING Clustering Evaluation and Practical Issues Instructor: Yizhou Sun yzsun@cs.ucla.edu November 7, 2017 Learnt Clustering Methods Vector Data Set Data Sequence Data Text
More informationApplying Supervised Learning
Applying Supervised Learning When to Consider Supervised Learning A supervised learning algorithm takes a known set of input data (the training set) and known responses to the data (output), and trains
More informationSupervised and Unsupervised Learning (II)
Supervised and Unsupervised Learning (II) Yong Zheng Center for Web Intelligence DePaul University, Chicago IPD 346 - Data Science for Business Program DePaul University, Chicago, USA Intro: Supervised
More informationInformation Filtering for arxiv.org
Information Filtering for arxiv.org Bandits, Exploration vs. Exploitation, & the Cold Start Problem Peter Frazier School of Operations Research & Information Engineering Cornell University with Xiaoting
More informationUnsupervised learning on Color Images
Unsupervised learning on Color Images Sindhuja Vakkalagadda 1, Prasanthi Dhavala 2 1 Computer Science and Systems Engineering, Andhra University, AP, India 2 Computer Science and Systems Engineering, Andhra
More informationCSC 411: Lecture 05: Nearest Neighbors
CSC 411: Lecture 05: Nearest Neighbors Raquel Urtasun & Rich Zemel University of Toronto Sep 28, 2015 Urtasun & Zemel (UofT) CSC 411: 05-Nearest Neighbors Sep 28, 2015 1 / 13 Today Non-parametric models
More informationCSE 5243 INTRO. TO DATA MINING
CSE 5243 INTRO. TO DATA MINING Cluster Analysis: Basic Concepts and Methods Huan Sun, CSE@The Ohio State University 09/25/2017 Slides adapted from UIUC CS412, Fall 2017, by Prof. Jiawei Han 2 Chapter 10.
More informationOlmo S. Zavala Romero. Clustering Hierarchical Distance Group Dist. K-means. Center of Atmospheric Sciences, UNAM.
Center of Atmospheric Sciences, UNAM November 16, 2016 Cluster Analisis Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster)
More information