Introduction to Machine Learning Lecture 4. Mehryar Mohri Courant Institute and Google Research
mohri@cims.nyu.edu
Nearest-Neighbor Algorithms
Nearest Neighbor Algorithms
Definition: fix $k \geq 1$. Given a labeled sample $S = ((x_1, y_1), \ldots, (x_m, y_m)) \in (X \times \{0, 1\})^m$, the k-NN algorithm returns the hypothesis $h_S$ defined by
$$\forall x \in X, \quad h_S(x) = 1_{\sum_{i : y_i = 1} w_i > \sum_{i : y_i = 0} w_i},$$
where the weights $w_1, \ldots, w_m$ are chosen such that $w_i = \frac{1}{k}$ if $x_i$ is among the $k$ nearest neighbors of $x$, and $w_i = 0$ otherwise.
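The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's code: the function name `knn_predict`, the Euclidean distance, and the toy sample `S` are all assumptions made for the example.

```python
import math
from collections import Counter

def knn_predict(sample, x, k=1):
    """Return the k-NN label for x: a vote with uniform weights 1/k."""
    # Sort training points by Euclidean distance to x and keep the k closest.
    neighbors = sorted(sample, key=lambda pair: math.dist(pair[0], x))[:k]
    # Each of the k nearest neighbors carries weight 1/k; all others weight 0.
    weight_of_label = Counter()
    for _, label in neighbors:
        weight_of_label[label] += 1.0 / k
    # h_S(x) = 1 iff the total weight of label 1 exceeds that of label 0.
    return 1 if weight_of_label[1] > weight_of_label[0] else 0

# Hypothetical toy sample: two points labeled 0, three labeled 1.
S = [((0.0, 0.0), 0), ((0.1, 0.2), 0),
     ((1.0, 1.0), 1), ((0.9, 1.1), 1), ((1.2, 0.8), 1)]

near_origin = knn_predict(S, (0.05, 0.1), k=3)
near_ones = knn_predict(S, (1.0, 0.9), k=3)
```

Sorting the whole sample costs $O(m \log m)$ per query; it is used here only for brevity.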
Voronoi Diagram [figure]
Questions
Performance: does it work?
Choice of the weights: are there better choices than uniform? In particular, the weights can take into account the distance to each nearest neighbor.
Choice of the distance metric: can a useful metric be defined (or even learned) for a particular problem?
Computation in high dimension: data structures and algorithms to improve upon the naive algorithm.
Bayes Classifier
Definition: the Bayes error is defined by
$$R^* = \inf_{h \text{ measurable}} \Pr_{(x, y) \sim D}[h(x) \neq y].$$
The Bayes classifier is a measurable hypothesis achieving that error.
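For intuition, the Bayes classifier and Bayes error can be computed exactly when the input space is finite and the joint distribution is known. The sketch below assumes such a setting; the function name and the toy probabilities in `joint` are hypothetical.

```python
def bayes_classifier_and_error(joint):
    """joint[x][y] = Pr[x, y] over a finite input space.

    Returns (h_star, r_star): the Bayes classifier as a dict x -> label,
    and the Bayes error R*.
    """
    h_star = {}
    r_star = 0.0
    for x, probs in joint.items():
        # The Bayes classifier predicts the most probable label at x.
        best = max(probs, key=probs.get)
        h_star[x] = best
        # It errs exactly on the probability mass of the other labels.
        r_star += sum(p for y, p in probs.items() if y != best)
    return h_star, r_star

# Hypothetical joint distribution over inputs {"a", "b"} and labels {0, 1}.
joint = {"a": {0: 0.40, 1: 0.10},
         "b": {0: 0.15, 1: 0.35}}
h_star, r_star = bayes_classifier_and_error(joint)  # r_star = 0.10 + 0.15
```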
Set-up
Sample drawn i.i.d. according to some distribution $D$: $S = ((x_1, y_1), \ldots, (x_m, y_m)) \in (X \times \{0, 1\})^m$.
Nearest neighbor of $x \in X$: $NN(S, x) = \operatorname{argmin}_{x' \in S} d(x, x')$.
Error of the hypothesis returned on point $x \in X$: $R(h_S, x) = 1_{y(h_S(x)) \neq y(x)}$, where $y(u)$ is the label of point $u$ (a random variable).
Convergence of NN Algorithm
Lemma: for any $x$ in the support of $D$, $NN(S, x) \to x$ with probability one when $|S| \to +\infty$.
Proof: Let $x$ be in the support of the distribution. Then, for any $\epsilon > 0$, $\Pr[B(x, \epsilon)] > 0$. Thus,
$$\Pr\big[d(NN(S, x), x) > \epsilon\big] = \big(1 - \Pr[B(x, \epsilon)]\big)^{|S|} \to 0.$$
Since $d(NN(S, x), x)$ is decreasing with $|S|$, this also implies convergence with probability one.
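The identity used in the proof can be checked numerically. The sketch below assumes a specific setting that is not in the slides: points uniform on $[0, 1]$, query $x = 0.5$, and $\epsilon = 0.05$, so the ball $B(x, \epsilon)$ has mass $0.1$. The function names are hypothetical.

```python
import random

def prob_nn_farther_than(ball_mass, m):
    """Closed form from the proof: (1 - Pr[B(x, eps)])^m."""
    return (1.0 - ball_mass) ** m

def empirical_prob(m, x=0.5, eps=0.05, trials=20000, seed=0):
    """Monte Carlo estimate of Pr[d(NN(S, x), x) > eps] for uniform points."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(trials):
        # Distance from x to its nearest neighbor in a fresh sample of size m.
        nearest = min(abs(rng.random() - x) for _ in range(m))
        if nearest > eps:
            misses += 1
    return misses / trials

# The two numbers should be close, and the closed form vanishes as m grows.
exact_20 = prob_nn_farther_than(0.1, 20)
estimate_20 = empirical_prob(20)
exact_200 = prob_nn_farther_than(0.1, 200)
```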
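NN Algorithm - Limit Guarantee
Theorem: let $h_S$ be the hypothesis returned by the nearest neighbor algorithm. Then,
$$\lim_{|S| \to \infty} E_{S \sim D^m}[R(h_S)] \leq 2R^* \Big(1 - \frac{|Y|}{2(|Y| - 1)} R^*\Big).$$
Proof:
$$\begin{aligned}
E_{S \sim D^m}[R(h_S, x)] &= \Pr_{S \sim D^m}[y(NN(S, x)) \neq y(x)] \\
&= \sum_{x'} \Pr[y(x') \neq y(x) \mid NN(S, x) = x'] \, \Pr_{S \sim D^m}[NN(S, x) = x'] \\
&= \sum_{x'} \big(1 - \Pr[y(x') = y(x) \mid NN(S, x) = x']\big) \, \Pr_{S \sim D^m}[NN(S, x) = x'] \\
&= \sum_{x'} \Big(1 - \sum_{y \in Y} \Pr[y \mid x] \Pr[y \mid x']\Big) \, \Pr_{S \sim D^m}[NN(S, x) = x'].
\end{aligned}$$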
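NN Algorithm - Limit Guarantee (continued)
In view of the lemma, $NN(S, x) \to x$ with probability one when $|S| \to +\infty$. Thus,
$$\lim_{|S| \to +\infty} E_{S \sim D^m}[R(h_S, x)] = 1 - \sum_{y \in Y} \Pr[y \mid x]^2.$$
From this it can be concluded that
$$\lim_{|S| \to +\infty} E_{S \sim D^m}[R(h_S)] = E_{x \sim D}\Big[1 - \sum_{y \in Y} \Pr[y \mid x]^2\Big].$$
Let $y^* = \operatorname{argmax}_y \Pr[y \mid x]$. Then
$$1 - \sum_{y \in Y} \Pr[y \mid x]^2 = 1 - \Pr[y^* \mid x]^2 - \sum_{y \neq y^*} \Pr[y \mid x]^2.$$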
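NN Algorithm - Limit Guarantee (continued)
Now, since the variance is non-negative,
$$\frac{1}{|Y| - 1} \sum_{y \neq y^*} \Pr[y \mid x]^2 - \Big(\frac{1}{|Y| - 1} \sum_{y \neq y^*} \Pr[y \mid x]\Big)^2 \geq 0.$$
Thus, in view of $\sum_{y \neq y^*} \Pr[y \mid x] = 1 - \Pr[y^* \mid x]$, and writing $R^*(x) = 1 - \Pr[y^* \mid x]$ for the Bayes error at $x$,
$$\begin{aligned}
E_{x \sim D}\Big[1 - \sum_{y \in Y} \Pr[y \mid x]^2\Big]
&\leq E_{x \sim D}\Big[1 - \Pr[y^* \mid x]^2 - \frac{(1 - \Pr[y^* \mid x])^2}{|Y| - 1}\Big] \\
&= E_{x \sim D}\Big[1 - (1 - R^*(x))^2 - \frac{R^*(x)^2}{|Y| - 1}\Big] \\
&= E_{x \sim D}\Big[2R^*(x) - \frac{|Y|}{|Y| - 1} R^*(x)^2\Big] \\
&\leq 2R^* - \frac{|Y|}{|Y| - 1} (R^*)^2,
\end{aligned}$$
using $E[R^*(x)^2] \geq E[R^*(x)]^2$ in the last step.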
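The pointwise inequality behind the theorem can be checked on concrete conditional distributions. In the sketch below, `p` is a hypothetical vector of conditional probabilities $\Pr[y \mid x]$ at a single point $x$, and the function names are assumptions made for the example.

```python
def asymptotic_nn_error(p):
    """Limit of the NN error at x: 1 - sum_y Pr[y|x]^2."""
    return 1.0 - sum(q * q for q in p)

def pointwise_bound(p):
    """Upper bound 2 R*(x) - |Y| / (|Y| - 1) * R*(x)^2, with R*(x) = 1 - max_y Pr[y|x]."""
    k = len(p)
    r = 1.0 - max(p)
    return 2.0 * r - k / (k - 1) * r * r

# Three classes, non-maximal classes unequal: the bound is strict.
skewed = [0.7, 0.2, 0.1]
# Three classes, non-maximal classes equal: the variance term is zero, so equality holds.
balanced = [0.7, 0.15, 0.15]
# Two classes: only one non-maximal class, so equality always holds.
binary = [0.8, 0.2]

gaps = [pointwise_bound(p) - asymptotic_nn_error(p) for p in (skewed, balanced, binary)]
```

The gap is exactly the variance term of the derivation, scaled by $|Y| - 1$, which is why it vanishes when the non-maximal classes are equally likely.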
Notes
Similar results hold for the k-NN algorithm, as $m = |S| \to \infty$, or with $(k \to \infty) \wedge (k/m \to 0)$.
Guarantees hold only for an infinite amount of data, whereas machine learning deals with finite samples.
The convergence rate can be arbitrarily slow.
NN Problem
Problem: given a sample $S = ((x_1, y_1), \ldots, (x_m, y_m))$, find the nearest neighbor of a test point $x$.
General problem extensively studied in computer science.
Exact vs. approximate algorithms.
Dimensionality $N$ is crucial.
Better algorithms exist for small intrinsic dimension (e.g., limited doubling dimension).
NN Problem - Case N = 2
Algorithm:
compute the Voronoi diagram in $O(m \log m)$.
use a point location data structure to determine the NN.
complexity: $O(m)$ space, $O(\log m)$ time.
NN Problem - Case N > 2
Voronoi diagram: size in $O(m^{\lceil N/2 \rceil})$.
Linear algorithm (no pre-processing):
compute the distance $\|x - x_i\|$ for all $i \in [1, m]$.
complexity of distance computation: $\Omega(Nm)$.
no additional space needed.
Tree-based data structures: pre-processing.
often used in applications: k-d trees (k-dimensional trees).
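The linear algorithm amounts to a single pass over the sample. A minimal sketch, assuming Euclidean distance and points stored as tuples (the function name is hypothetical):

```python
import math

def nearest_neighbor_linear(points, x):
    """Return (index, distance) of the point closest to x by scanning all m points.

    Each distance costs O(N), so a query costs O(N m) with no pre-processing
    and no extra space beyond the running best.
    """
    best_index, best_dist = -1, math.inf
    for i, p in enumerate(points):
        d = math.dist(p, x)
        if d < best_dist:
            best_index, best_dist = i, d
    return best_index, best_dist

points = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
index, dist = nearest_neighbor_linear(points, (0.9, 1.2))
```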
k-d Trees (Bentley, 1975)
Binary space partitioning trees.
Prominent tree-based data structure.
Works for low or medium dimensionality.
NN search: $O(\log m)$ for randomly distributed points; $O(N m^{1 - 1/N})$ in the worst case (Lee and Wong, 1977).
Can be extended to k-NN search.
High dimension: typically inefficient, hence approximate NN methods.
k-d Trees - Illustration [figure; node labels shown: (3, 5),Y; (4, 2),X; (5, 9),X; points (1, 1), (8, 4), (2, 9.5), (7, 5.5)]
k-d Trees - Construction
Algorithm: for each non-leaf node,
choose dimension (e.g., longest of hyperrectangle).
choose pivot (median).
split node according to (pivot, dimension).
Result: balanced tree, binary space partitioning.
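The construction steps can be sketched as follows. This is one possible reading, with assumptions stated plainly: the "longest" dimension is taken to be the widest side of the bounding box of the node's points, the median point itself is stored at the node, and the names `Node`, `build_kdtree`, and `height` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Point = Tuple[float, ...]

@dataclass
class Node:
    point: Point                    # pivot (median point along dim)
    dim: int                        # splitting dimension
    left: Optional["Node"] = None   # points below the pivot along dim
    right: Optional["Node"] = None  # points above the pivot along dim

def build_kdtree(points: Sequence[Point]) -> Optional[Node]:
    if not points:
        return None
    n_dims = len(points[0])
    # Choose the dimension along which the points' bounding box is widest.
    dim = max(range(n_dims),
              key=lambda j: max(p[j] for p in points) - min(p[j] for p in points))
    # Choose the median along that dimension as the pivot.
    ordered = sorted(points, key=lambda p: p[dim])
    mid = len(ordered) // 2
    # Split the node according to (pivot, dimension) and recurse.
    return Node(ordered[mid], dim,
                build_kdtree(ordered[:mid]),
                build_kdtree(ordered[mid + 1:]))

def height(node: Optional[Node]) -> int:
    return 0 if node is None else 1 + max(height(node.left), height(node.right))

# The seven points listed on the illustration slide.
pts = [(3, 5), (4, 2), (5, 9), (1, 1), (8, 4), (2, 9.5), (7, 5.5)]
root = build_kdtree(pts)
```

On these seven points the sketch picks (3, 5) split on Y at the root, then (4, 2) and (5, 9) split on X, matching the labels on the illustration slide. Re-sorting at every level costs $O(m \log^2 m)$ overall; a linear-time median selection would bring this down.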
k-d Trees - NN Search
Algorithm:
find the region containing $x$ (starting from the root node, move to a child node based on the node test).
save the region point $x_0$ as the current best.
move up the tree and recursively search regions intersecting the hypersphere $S(x, \|x - x_0\|)$:
update the current best if the current point is closer.
restart the search with each intersecting sub-tree.
move up the tree when there is no more intersecting sub-tree.
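The search can be sketched recursively. To keep the block self-contained it builds a small tree as nested tuples and cycles through the dimensions by depth, which is a simplification rather than the dimension choice of the construction slide; `build` and `nearest` are hypothetical names.

```python
import math

def build(points, depth=0):
    """Tiny k-d tree as nested tuples (point, dim, left, right)."""
    if not points:
        return None
    dim = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[dim])
    mid = len(pts) // 2
    return (pts[mid], dim, build(pts[:mid], depth + 1), build(pts[mid + 1:], depth + 1))

def nearest(node, x, best=None):
    """Return (distance, point) for the stored point nearest to x."""
    if node is None:
        return best
    point, dim, left, right = node
    # Descend first into the side of the split that contains x.
    near, far = (left, right) if x[dim] < point[dim] else (right, left)
    best = nearest(near, x, best)
    # Moving back up: update the current best if this node's point is closer.
    d = math.dist(point, x)
    if best is None or d < best[0]:
        best = (d, point)
    # Visit the other side only if the hypersphere around x with the current
    # best radius crosses the splitting hyperplane.
    if abs(x[dim] - point[dim]) < best[0]:
        best = nearest(far, x, best)
    return best

points = [(3, 5), (4, 2), (5, 9), (1, 1), (8, 4), (2, 9.5), (7, 5.5)]
tree = build(points)
dist, point = nearest(tree, (6, 5))
```

The pruning test is what keeps typical queries fast: a sub-tree is skipped whenever its half-space lies entirely outside the current best hypersphere.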
References
Jon Louis Bentley. Multidimensional binary search trees used for associative searching. Communications of the ACM, Vol. 18, No. 9, 1975.
D. T. Lee and C. K. Wong. Worst-case analysis for region and partial region searches in multidimensional binary search trees and balanced quad trees. Acta Informatica, Vol. 9, Issue 1. Springer, NY, 1977.