arxiv: v2 [cs.lg] 13 Jun 2016
|
|
- Meredith Powell
- 5 years ago
- Views:
Transcription
1 Eicient Estimation o or the Nearest Neighbors Class o Methods - unpublished wor rom 20 - Alesander Lodwich, Faisal Shaait and Thomas Breuel arxiv: v2 [cs.lg] 3 Jun 206 contact ino: alesander[at]lodwich.net Abstract. The Nearest Neighbors (NN) method has received much attention in the past decades, where some theoretical bounds on its perormance were identiied and where practical optimizations were proposed or maing it wor airly well in high dimensional spaces and on large datasets. From countless experiments o the past it became widely accepted that the value o has a signiicant impact on the perormance o this method. However, the eicient optimization o this parameter has not received so much attention in literature. Today, the most common approach is to cross-validate or bootstrap this value or all values in question. This approach orces distances to be recomputed many times, even i eicient methods are used. Hence, estimating the optimal can become expensive even on modern systems. Frequently, this circumstance leads to a sparse manual search o. In this paper we want to point out that a systematic and thorough estimation o the parameter can be perormed eiciently. The discussed approach relies on large matrices, but we want to argue, that in practice a higher space complexity is oten much less o a problem than repetitive distance computations. Keywords: nearest neighbor, optimization, benchmaring Introduction Since the introduction o the Nearest Neighbor (NN) method by Fix and Hodges in 95 [] a lot o dierent variants o it have appeared in order to mae it suitable to dierent scenarios. The most notable improvements were done in terms o adaptive distance metrics [2][3][4], ast access via space partitioning (paced R* trees [5], d-trees[6], X-trees[7], SPY-TEC [8]), nowledge base (prototype) pruning ([9],[0]) or classiication based on sensitive distributed data ([3],[4],[5]). An overview over the state-o-the-art in nearest neighbor techniques is given in []. The continuing richness o investigation wor into nearest neighbor can be explained with the omnipresence o CBR (Case Based Reasoning) type o problems [2] or just rom the practical point o view o its massive parallelizability or simply populartiy. Ater all nearest neighbor is conceptually easy to understand - many students get to learn the nearest neighbor as the irst classiier.
2 All o the beorehand mentioned advances evolve around the question o how to optimize the NN s distance measure or retrieving. The method s apparent laziness might be the reason why a ast preoptimization o has not been paid a lot attention to. The proper choice o is an important actor or achieving maximum perormance o a NN. However, as we will show, conventional optimization via cross-validation or bootstrapping is slow and promises potential or being sped up. Thereore, we will devote this paper to the concept o ast optimization. There is wor addressing this issue in an alternative way by introducing incremental NN classiiers based on dierent types o trees. This class o nearest neighbors attempts to eliminate the inluence o by choosing the right ad hoc. The rationale behind this method is that or classiication tass the exact majority o a speciic class label is not necessarily interesting. It is only interesting when nearest neighbor is used as a density estimator. In other cases it is generally enough to eel sae about which class label rules the nearest set. This class o nearest neighbor starts by polling a minimal amount o nearest neighbors. Then it analyzes the labels and i the retrieved collection o labels is indecisive it will poll more nearest neighbors until it considers the collection decisive enough. Naturally, the method does not scale to small values o, because e.g. a = will never be indecisive. This is a problem because we now rom experiments that small are oten optimal. Incremental nearest neighbors have their strength in very large databases where typical queries do not need to compute all relative distances. During a lietime o a large database some distances might not be computed at all. The lazy distance computation mae incremental nearest neighbor methods ideal candidates or real time tass operating on large volatile data. However, in a cross validation setup which needs to compute all distances incremental NNs cannot play out their strengths. Because o the restrictions and intended use we exempted incremental methods rom investigation in this paper. We organize this paper in three parts. In the irst part we will study the options to mae the estimation o as ast as possible, in the second part we will experimentally compare the result with the conventional approach and in the last part we will draw a conclusion. 2 Stating the Problem The NN is commonly considered a classiier rom the area o supervised learning theory. In this theory there exists a training unction T that delivers a model m based on a set o options o, a matrix o example values V and a vector o labels l o respective size (m = T (o, V, l)). In turn, the model m is used in a classiication unction C that is presented with a matrix o new examples V and its tas is to deliver a vector o new labels l (l = C(m, V )). T and C must be related but there is no restriction on what the model can be. It can be a set o complex items, a matrix, a vector or just a single value, indeed.
3 From the perspective o a human the purpose o a model is to mae predictions about the uture. In order to be able to do this the human brain requires a simpliication o the world. Only with the simpliication o the world to a reduced number o variables and rules it can compute uture states aster than they occur. In case o NN the model m consists o the data and o the smoothing parameter. This s, that the unction T is an identity unction between model and parameters. This conlicts with the common notion o a model because the production o models commonly implies reduction. However, the collected data already are a reduction o the world! From a practical point o view they can be considered as representative model states o the world and the data in the model is used to predict the class variable o a new vector beore it is actually recorded. This its perectly well with the original notion o models. However, the model is only ixed, when is ixed. According to the ramewor, ixing the model (getting the right value or ) is the job o the unction T. Many training techniques in machine learning use optimization strategies developed or real numbers and open parameter spaces. This is not suitable or as it is an integer value and has nown let and right limits: an ideal candidate or ull search. Most ramewors or pattern recognition oer macro optimization or the remaing model parameters that express themsel in the initial training options o. The structural compatibility o the NN with the pattern recognition ramewors macro optimization unctions seduces the users very oten to macro optimize. The consequence o this is that NN must compute distances repetitively as it cannot assume that speciic vectors will simply exchange their role between training and testing in the uture. In case o NN this is exceptionally regrettable. What does macro optimization or the computational complexity? Here we assume - but without loose o generality - that the dataset V is o size n (nrows in matrix V ) and can be exactly divided into equally sized partitions ready to be rearranged into dierent train and test setups. Since everything is being recomputed the computational complexity or this ind o cross-validation o or a brute NN is O (ˆ n ). 2 ˆ is the size o the tested range o and since we are considering ull search we accept that ˆ depends on the size n o the dataset and the number o olds. This s that the ˆ = n. The scan o nearest sets yields a partial complexity o O( n 2 ). Hence, or ull ( ( ) 2 search the total complexity is O n 3 + n ). 2 In order to reduce this high complexity it is necessary to optimize NN within the train unction T as it has all necessary inormation about the relationships among the examples and the labels. This s that T (o = {}, V, l) should become T (o = {i}, V, l) where i is a vector o partition indices o the ind (0, 0, 0,...,,,,...,,,...) T. Now, the train unction T can utilize the act that no new data will arrive during the training and all possible distance requests can be computed in advance. Distances within the same partition need not be computed as they will
4 be never requested. These ields can be set to ininity (alternatively they can be illed up with the largest value ound in the matrix + ). The lower triangle is symmetrical to the upper triangle o the matrix because vectors between two points have the same norm. The distance matrix D has a structure as shown in igure. The size o D is n n 2. By D(column, row) we the component distance and by D(column, row) 2 we the associated label. Alone this redesign causes ) the complexity o the distance computations to get reduced to O. However this beneit is achieved at the expense o ( n2 2 higher memory use. The brute NN has a space complexity o O(n), now the space complexity has risen to O(n 2 ). The next step is to sort the vectors horizontally according to their distance. Although collecting the best solutions would be aster or a single run it s or a range o that you eectively obtain the insertion sort. Since there exist aster sorting algorithms we choose to sort but by using a dierent algorithm. The astest algorithm or doing so is the quic sort. Its average complexity is O (n log(n)). In the worst case scenario the sorting complexity o this method is O ( n 2). In that case n rows will be sorted with O ( n 2). This s that the worst time complexity so ar is O ( ) O n2 2 + n2 log(n). ( n2 2 + n3 ) and average case is Distance Matrix n test train set set Fold ininite values necessary computations Fold 2 Fold 3 Fold 4 n mirrored values Fold 5 { {{ { { Data Segments (a) Distance matrix and its computation requiring segments or the cross validation o (b) Sorted distance matrix. Values are being sorted big values to the right Fig. : The distance matrix consists o tuples o data and label ((d, l)). It can be sorted horizontally without loosing the relationship between vector pair and ground truth.
5 In order to obtain nearest neighbors or each vector indexed by the row a counting matrix M := N n s is initialized with zeros. s is the number o symbols or classes. For each row in D and M and or the columns =.. n in D the counters or the speciic class label is increased. More precisely, or every row r =..n and or every tested the counters ( M (D(, r) 2, r) are updated by. The complexity o this operation is O n ). 2 For the overall method ( 3 this adds up to O 2 n ). 2 In parallel the level o correct classiication must be computed because ater every modiication o M the state or the smaller is lost. Thereore a matrix A := N n or recording the number o correct classications is required. How is this number computed? At every round o the nearest neighbor candidate computations M contains in each o its rows a vector that tells how many labels o speciic ind are in the nearest neighbors set. The classiication label is l r = argmax M r. The complexity or this operation is O(n 2 s). s This simple method is ambiguous by nature, as there can be many labels that are represented by the same amount o vectors in a nearby set. Computer implementations preer to return the symbol with the smalest coding. However, it is possible to have a shadow matrix S := R n s that is the sum o the distances observed or each class label in the set so ar. The rule or computing S is the same as or M with the dierence that instead o adding ones to the matrix you add distances. When symbol requency is ambiguous (argmax returns more than one value) it is possible to use S to ind which samples are closer overall. Because o the speciic interest into ast optimization the simple argmax processing is used. Now, every l r is compared or equality with l r (ground truth) and the binary result is added to A(, i r ). The argmax A will return best. is obtained by averaging. ( Considering all parts o the ) algorithm together the overall complexity is O n 2 + n 2 log(n) + n 2 s O ( n 2 log(n) ) s Time required or validating the optimal time 000s 00s New KD Brute BD 0s number o cross validation olds Fig. 2: Experimentally established relationship between the number o olds and the algorithm speed.
6 3 Experiments The method or ast computation (AutoNN) was tested against three other algorithms rom the ANN library.[6]: brute, d-tree and bd-tree NN with deault settings. The AutoNN and its competitors perormed a complete cross validation run on the ad, diabetes, gene, glass, heart, heartc, horse, ionosphere, iris, mushrooms, soybean, STATLOG australian, STATLOG german, STATLOG heart, STATLOG SAT, STATLOG segment STATLOG shuttle, STATLOG vehicle, thyroid, waveorm and wine datasets with 3, 5, 0 and 20 olds. The goal o the experiment was the measurement o the time required to complete the ull course o testing dierent. The data was separated into stratiied partitions which were used in dierent conigurations in order to obtain a training and a testing set. The AutoKNN computes the classiication results or all while the other algorithms are bound to use a logarithmic search. By logarithmic search the ollowing schema is t: {, 2, 3, 4, 5, 6, 7, 8, 0, 00, 200,..., 000,..., n}. This schema is practically motivated and rational under the assumption that the inluence o additional labels on the result diminishes with higher values o. Practical consideration is primarily test time. Example: while AutoNN required 5,35s or a complete scan based on the ad dataset, on same data exact brute NN needed 353,7s in logarithmic mode and 9595,7s in ull mode. The use o the logarithmic mode maes results with exactly the same values impossible. However, the dierences in resulting and thus in accuracy were absolutely negligible so that the results are directly comparable nonetheless. We added the experiment times or all databases up to a total or each cross validation size. The results are shown in the igure and the table under 2. Time measurements were perormed on a AMD Phenom II 965 with 8GB o RAM with a Linux ernel. The algorithms are implemented in C/C++ and were compiled with gcc with O3 option. Only core algorithm operation was measured and all time or additional I/O was ignored. For best comparability, ANN library sources were statically included. 4 Discussion and Conclusion The Nearest Neighbor approach is considered user riendly and is requently used or data mining, classiication and regression tass. It is embedded into many automatic environments that mae use o NN s lexibility. Although NN has been used, analyzed and advanced or almost six decades a repeating question can not be answered by current literature: What is the astest way to estimate the right value or and what are the expenses or doing so. The approach chosen here is to move the esimation away rom the meta ramewor right into the training unction T. The advantage o this is that additional inormation about the data can be made. This additional inormation allows to precompute the distances among all vectors without waste and to reuse them numerous times. From this design change which is nown to practitioners
7 but not( discussed in literature a reduction in time complexity can be observed ( ) ) 2 ( ) rom O n 3 + n 2 3 to O 2 n 2 + n 2 log(n) + n 2 s in average case. The experiments show, that this has signiicant impact on the speed o the estimation tas. The comparison between d-tree NN and the proposed approach proves moreover that having a better time complexity saves practically more time than an eicient distance measure or this tas. The cost o this improvement is a higher space complexity (now O(n 2 )). In order to esimtate the practical impact o this complexity exchange we studied the contents o the UCI repository [7]. The UCI should be a reasonable crossover o the problems people ace in real lie. Out o 62 datasets we ound that 90% o them have less than 50K examples, 80% o them have less than 0K examples and hal o the UCI s datasets has less than 000 examples (or exact distribution see Fig. 3). These sizes can be easily handled on higher class commodity computers. This leads to the conclusion that turning in space complexity or time complexity is a good choice most o the time. Future implementations should oer an integrated searching. The results also show that the so ound values or can be transered not only to other exact NN but also to approximate NN woring on d-tree and bd-tree models. Sample Sizes o Datasets in the UCI Repository No. o Examples,0 0 8,0 0 7,0 0 6,0 0 5,0 0 4,0 0 3,0 0 2,0 0 0,0% 20,0% 40,0% 60,0% 80,0% 00,0% (62 on Oct., 5th [85 Entries]) Fraction o Datasets in the UCI Repository Fig. 3: Dataset sizes in the UCI repository
8 5 Acnowledgments This wor has been made possible through the unding o the PaREn (Pattern Recognition and Engineering [8]) project by the BMBF (Federal Ministry o Education and Research, Germany). Reerences. Fix, E., Hodges, J.L. Discriminatory analysis, nonparametric discrimination: Consistency properties. Technical Report 4, USAF School o Aviation Medicine, Randolph Field, Texas, Carlotta Domeniconi, Dimitrios Gunopulos, Jing Peng. Adaptive Metric nearest Neighbor Classiication. CVPR, Vol., pp.57 (2000) 3. Jing Peng, Douglas R. Heisteramp, H. K. Dai. Adaptive Kernel Metric Nearest Neighbor Classiication. ICPR, Vol. 3, pp (2002) 4. T. Deselaers, R. Paredes, E. Vidal, H. Ney. Learning Weighted Distances or Relevance Feedbac in Image Retrieval. ICPR 2008, Tampa, Florida, USA (08/2/2008) 5. I. Kamel, C. Faloutsos. On Pacing R-trees. Proceedings o the 2nd Conerence on Inormation and Knowledge Management (CIKM), Washington DC (993) 6. Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest. Introduction to Algorithms. ch. 0, MIT Press and McGraw-Hill (2009) 7. S. Berchtold, D. Keim, H.-P. Kriegel. The X-tree: An index structure or highdimensional data. In Proceedings o 22th International Conerence on Very Large Databases (VLDB96), pp 2839, Morgan Kaumann (996) 8. Dong-Ho Lee, Hyoung-Joo Kim. An Eicient Technique or Nearest-Neighbor Query Processing on the SPY-TEC. IEEE Transactions on Knowledge and Data Engineering, Vol. 5, No. 6, pp , IEEE Educational Activities Department,Piscataway, NJ, USA (2003) 9. Jose Salvador-Sanchez, Filiberto Pla, Francesco J. Ferri. Prototype Selection or the Nearest-Neighbor Rule Through Proximity Graphs. PRL, Vol. 8, No. 6, pp (997) 0. U. Lipowezy. Selection o the optimal prototype subset or -NN classiication PRL, Vol. 9, No. 0, pp (998). Keith Price Bibliography - Nearest Neighbor Literature Overview 2. David W. Aha. The Omnipresence o Case-Based Reasoning in Science and Application. Journal on Knowledge-Based Systems, Vol, pp (998) 3. Young, B., Bhatnagar, R.: Secure K-NN Algorithm or Distributed Databases. University o Cincinnati (2006) 4. Shanec, M., Yongdae, K., Kumar, V.: Privacy Preserving Nearest Neighbor Search. Dept. o Computer Science Minneapolis (2006) 5. Kantarcoglu, M., Cliton, C.: Privately Computing a Distributed -nn Classiier. LNCS, Vol. 3202, (2004) 6. Mount, D. M.: Approximate Nearest Neighbor Library., Department o Computer Science and Institute or Advanced Computer Studies, University o Maryland (200) 7. UCI Machine Learning Repository: 8. Pattern Recognition and Engineering (PaREn), DFKI, BMBF Project: ( )
9 9. Janic V. Frasch and Alesander Lodwich and Faisal Shaait and Thomas M. Breuel A Bayes-true data generator or evaluation o supervised and unsupervised learning methods. Pattern Recognition Letters, volume 32:, p , 20, doi: 6/j.patrec APPENDIX A Inluence o on the NN Perormance The ollowing diagrams are results rom NN based on natural and synthetic datasets. Synthetic datasets were obtained using WGKS [9] The standard deviation was estimated based on a 0x cross-validation. The diagrams are non linear. Sections o little change are compressed, hence x-axis are discontinuous.
10 NN Perormance Heart Database (0x-validation) NNPerormance Glass Database (0x-validation) (acc) (acc) NN Perormance Wine Dataset (0x-validation) NN Perormance Waveorm Dataset (0x-validation) (acc)
11 .00 NN Perormance Thyroid Database (0x-validation) NN Perormance Statlog's Vehicle Database (0x-validation) NN Perormance Statlog's Shuttle Database (0x-validation).20 NN Perormance Statlog's Segment Database (0x-validation)
12 .00 NN Perormance Statlog's SAT Database (0x-validation) NN Perormance Statlog's German Finance Database (0x-validation) NN Perormance Statlog's Australian Financial Dataset (0x-validation) NN Perormance Soybean Dataset (0x-validation)
13 NN Perormance NN Perormance Mushrooms Database (0x-validation) Random 6.67% o MNIST60 (0x-validation) NN Perormance Iris Dataset (0x-validation) NN Perormance MNIST0K (0x-validation)
14 .00 NN Perormance Ionosphere Dataset (0x-validation) NN Perormance Horse Dataset (0x-validation) NN Perormance NN Perormance Heart C Dataset (0x-validation) Gene Dataset (0x-validation)
15 NN Perormance Diabetes Dataset (0x-validation) NN Perormance Ad Dataset (0x-validation) NN Perormance WGKS 2D 20% (0x-validation) NN Perormance WGKS 2D 0% (0x-validation)
16 NN Perormance NN Perormance WGKS 4D 30% (0x-validation) WGKS 4D 20% (0x-validation) NN Perormance WGKS 4D 0% (0x-validation) NN Perormance WGKS 2D 30% (0x-validation)
Individualized Error Estimation for Classification and Regression Models
Individualized Error Estimation for Classification and Regression Models Krisztian Buza, Alexandros Nanopoulos, Lars Schmidt-Thieme Abstract Estimating the error of classification and regression models
More informationComputer Data Analysis and Use of Excel
Computer Data Analysis and Use o Excel I. Theory In this lab we will use Microsot EXCEL to do our calculations and error analysis. This program was written primarily or use by the business community, so
More informationAn Analytic Model for Embedded Machine Vision: Architecture and Performance Exploration
419 An Analytic Model or Embedded Machine Vision: Architecture and Perormance Exploration Chan Kit Wai, Prahlad Vadakkepat, Tan Kok Kiong Department o Electrical and Computer Engineering, 4 Engineering
More informationDecision Support Systems for E-Purchasing using Case-Based Reasoning and Rating Based Collaborative Filtering
Decision Support Systems or E-Purchasing using Case-Based Reasoning and Rating Based Collaborative Filtering ABSTRACT The amount o trade conducted electronically has grown dramatically since the wide introduction
More informationScheduling Unsplittable Flows Using Parallel Switches
Scheduling Unsplittable Flows Using Parallel Switches Saad Mneimneh, Kai-Yeung Siu Massachusetts Institute of Technology 77 Massachusetts Avenue Room -07, Cambridge, MA 039 Abstract We address the problem
More informationA Proposed Approach for Solving Rough Bi-Level. Programming Problems by Genetic Algorithm
Int J Contemp Math Sciences, Vol 6, 0, no 0, 45 465 A Proposed Approach or Solving Rough Bi-Level Programming Problems by Genetic Algorithm M S Osman Department o Basic Science, Higher Technological Institute
More informationTask Description: Finding Similar Documents. Document Retrieval. Case Study 2: Document Retrieval
Case Study 2: Document Retrieval Task Description: Finding Similar Documents Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade April 11, 2017 Sham Kakade 2017 1 Document
More informationPiecewise polynomial interpolation
Chapter 2 Piecewise polynomial interpolation In ection.6., and in Lab, we learned that it is not a good idea to interpolate unctions by a highorder polynomials at equally spaced points. However, it transpires
More informationAdaptive Metric Nearest Neighbor Classification
Adaptive Metric Nearest Neighbor Classification Carlotta Domeniconi Jing Peng Dimitrios Gunopulos Computer Science Department Computer Science Department Computer Science Department University of California
More informationITU - Telecommunication Standardization Sector. G.fast: Far-end crosstalk in twisted pair cabling; measurements and modelling ABSTRACT
ITU - Telecommunication Standardization Sector STUDY GROUP 15 Temporary Document 11RV-22 Original: English Richmond, VA. - 3-1 Nov. 211 Question: 4/15 SOURCE 1 : TNO TITLE: G.ast: Far-end crosstalk in
More informationTHE FINANCIAL CALCULATOR
Starter Kit CHAPTER 3 Stalla Seminars THE FINANCIAL CALCULATOR In accordance with the AIMR calculator policy in eect at the time o this writing, CFA candidates are permitted to use one o two approved calculators
More informationMeasuring the Vulnerability of Interconnection. Networks in Embedded Systems. University of Massachusetts, Amherst, MA 01003
Measuring the Vulnerability o Interconnection Networks in Embedded Systems V. Lakamraju, Z. Koren, I. Koren, and C. M. Krishna Department o Electrical and Computer Engineering University o Massachusetts,
More informationROBUST FACE DETECTION UNDER CHALLENGES OF ROTATION, POSE AND OCCLUSION
ROBUST FACE DETECTION UNDER CHALLENGES OF ROTATION, POSE AND OCCLUSION Phuong-Trinh Pham-Ngoc, Quang-Linh Huynh Department o Biomedical Engineering, Faculty o Applied Science, Hochiminh University o Technology,
More informationIntelligent knowledge-based system for the automated screwing process control
Intelligent knowledge-based system or the automated screwing process control YULIYA LEBEDYNSKA yuliya.lebedynska@tu-cottbus.de ULRICH BERGER Chair o automation Brandenburg University o Technology Cottbus
More informationAn Empirical Study of Hoeffding Racing for Model Selection in k-nearest Neighbor Classification
An Empirical Study of Hoeffding Racing for Model Selection in k-nearest Neighbor Classification Flora Yu-Hui Yeh and Marcus Gallagher School of Information Technology and Electrical Engineering University
More informationSSV Criterion Based Discretization for Naive Bayes Classifiers
SSV Criterion Based Discretization for Naive Bayes Classifiers Krzysztof Grąbczewski kgrabcze@phys.uni.torun.pl Department of Informatics, Nicolaus Copernicus University, ul. Grudziądzka 5, 87-100 Toruń,
More informationPATH PLANNING OF UNMANNED AERIAL VEHICLE USING DUBINS GEOMETRY WITH AN OBSTACLE
Proceeding o International Conerence On Research, Implementation And Education O Mathematics And Sciences 2015, Yogyakarta State University, 17-19 May 2015 PATH PLANNING OF UNMANNED AERIAL VEHICLE USING
More informationGeometric data structures:
Geometric data structures: Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade Sham Kakade 2017 1 Announcements: HW3 posted Today: Review: LSH for Euclidean distance Other
More informationBinary recursion. Unate functions. If a cover C(f) is unate in xj, x, then f is unate in xj. x
Binary recursion Unate unctions! Theorem I a cover C() is unate in,, then is unate in.! Theorem I is unate in,, then every prime implicant o is unate in. Why are unate unctions so special?! Special Boolean
More informationClassifier Evasion: Models and Open Problems
Classiier Evasion: Models and Open Problems Blaine Nelson 1, Benjamin I. P. Rubinstein 2, Ling Huang 3, Anthony D. Joseph 1,3, and J. D. Tygar 1 1 UC Berkeley 2 Microsot Research 3 Intel Labs Berkeley
More informationMachine Learning. Nonparametric methods for Classification. Eric Xing , Fall Lecture 2, September 12, 2016
Machine Learning 10-701, Fall 2016 Nonparametric methods for Classification Eric Xing Lecture 2, September 12, 2016 Reading: 1 Classification Representing data: Hypothesis (classifier) 2 Clustering 3 Supervised
More informationProblem 1: Complexity of Update Rules for Logistic Regression
Case Study 1: Estimating Click Probabilities Tackling an Unknown Number of Features with Sketching Machine Learning for Big Data CSE547/STAT548, University of Washington Emily Fox January 16 th, 2014 1
More informationComputer Data Analysis and Plotting
Phys 122 February 6, 2006 quark%//~bland/docs/manuals/ph122/pcintro/pcintro.doc Computer Data Analysis and Plotting In this lab we will use Microsot EXCEL to do our calculations. This program has been
More informationShadow Casting with Stencil Buffer for Real-Time Rendering
102 The International Arab Journal o Inormation Technology, Vol. 5, No. 4, October Shadow Casting with Stencil Buer or Real-Time Rendering Lee Weng, Daut Daman, and Mohd Rahim Faculty o Computer Science
More informationIntelligent Optimization Methods for High-Dimensional Data Classification for Support Vector Machines
Intelligent Inormation Management, 200, 2, ***-*** doi:0.4236/iim.200.26043 Published Online June 200 (http://www.scirp.org/journal/iim) Intelligent Optimization Methods or High-Dimensional Data Classiication
More informationDesign and Realization of user Behaviors Recommendation System Based on Association rules under Cloud Environment
Research Journal o Applied Sciences, Engineering and Technology 6(9): 1669-1673, 2013 ISSN: 2040-7459; e-issn: 2040-7467 Maxwell Scientiic Organization, 2013 Submitted: January 19, 2013 Accepted: March
More informationClassification Method for Colored Natural Textures Using Gabor Filtering
Classiication Method or Colored Natural Textures Using Gabor Filtering Leena Lepistö 1, Iivari Kunttu 1, Jorma Autio 2, and Ari Visa 1, 1 Tampere University o Technology Institute o Signal Processing P.
More informationUsing a genetic algorithm for editing k-nearest neighbor classifiers
Using a genetic algorithm for editing k-nearest neighbor classifiers R. Gil-Pita 1 and X. Yao 23 1 Teoría de la Señal y Comunicaciones, Universidad de Alcalá, Madrid (SPAIN) 2 Computer Sciences Department,
More informationA Novel Accurate Genetic Algorithm for Multivariable Systems
World Applied Sciences Journal 5 (): 137-14, 008 ISSN 1818-495 IDOSI Publications, 008 A Novel Accurate Genetic Algorithm or Multivariable Systems Abdorreza Alavi Gharahbagh and Vahid Abolghasemi Department
More informationOutlier Detection Using Unsupervised and Semi-Supervised Technique on High Dimensional Data
Outlier Detection Using Unsupervised and Semi-Supervised Technique on High Dimensional Data Ms. Gayatri Attarde 1, Prof. Aarti Deshpande 2 M. E Student, Department of Computer Engineering, GHRCCEM, University
More information2. Design Planning with the Quartus II Software
November 2013 QII51016-13.1.0 2. Design Planning with the Quartus II Sotware QII51016-13.1.0 This chapter discusses key FPGA design planning considerations, provides recommendations, and describes various
More informationA Classification System and Analysis for Aspect-Oriented Programs
A Classiication System and Analysis or Aspect-Oriented Programs Martin Rinard, Alexandru Sălcianu, and Suhabe Bugrara Massachusetts Institute o Technology Cambridge, MA 02139 ABSTRACT We present a new
More information9.8 Graphing Rational Functions
9. Graphing Rational Functions Lets begin with a deinition. Deinition: Rational Function A rational unction is a unction o the orm P where P and Q are polynomials. Q An eample o a simple rational unction
More informationSUPER RESOLUTION IMAGE BY EDGE-CONSTRAINED CURVE FITTING IN THE THRESHOLD DECOMPOSITION DOMAIN
SUPER RESOLUTION IMAGE BY EDGE-CONSTRAINED CURVE FITTING IN THE THRESHOLD DECOMPOSITION DOMAIN Tsz Chun Ho and Bing Zeng Department o Electronic and Computer Engineering The Hong Kong University o Science
More informationMATRIX ALGORITHM OF SOLVING GRAPH CUTTING PROBLEM
UDC 681.3.06 MATRIX ALGORITHM OF SOLVING GRAPH CUTTING PROBLEM V.K. Pogrebnoy TPU Institute «Cybernetic centre» E-mail: vk@ad.cctpu.edu.ru Matrix algorithm o solving graph cutting problem has been suggested.
More informationAutomated Modelization of Dynamic Systems
Automated Modelization o Dynamic Systems Ivan Perl, Aleksandr Penskoi ITMO University Saint-Petersburg, Russia ivan.perl, aleksandr.penskoi@corp.imo.ru Abstract Nowadays, dierent kinds o modelling settled
More informationMonocular 3D human pose estimation with a semi-supervised graph-based method
Monocular 3D human pose estimation with a semi-supervised graph-based method Mahdieh Abbasi Computer Vision and Systems Laboratory Department o Electrical Engineering and Computer Engineering Université
More informationWEIGHTED K NEAREST NEIGHBOR CLASSIFICATION ON FEATURE PROJECTIONS 1
WEIGHTED K NEAREST NEIGHBOR CLASSIFICATION ON FEATURE PROJECTIONS 1 H. Altay Güvenir and Aynur Akkuş Department of Computer Engineering and Information Science Bilkent University, 06533, Ankara, Turkey
More informationThe Role of Biomedical Dataset in Classification
The Role of Biomedical Dataset in Classification Ajay Kumar Tanwani and Muddassar Farooq Next Generation Intelligent Networks Research Center (nexgin RC) National University of Computer & Emerging Sciences
More informationAbalone Age Prediction using Artificial Neural Network
IOSR Journal o Computer Engineering (IOSR-JCE) e-issn: 2278-066,p-ISSN: 2278-8727, Volume 8, Issue 5, Ver. II (Sept - Oct. 206), PP 34-38 www.iosrjournals.org Abalone Age Prediction using Artiicial Neural
More informationPrivacy Preserving Data Mining. Danushka Bollegala COMP 527
Privacy Preserving ata Mining anushka Bollegala COMP 527 Privacy Issues ata mining attempts to ind mine) interesting patterns rom large datasets However, some o those patterns might reveal inormation that
More informationMAPI Computer Vision. Multiple View Geometry
MAPI Computer Vision Multiple View Geometry Geometry o Multiple Views 2- and 3- view geometry p p Kpˆ [ K R t]p Geometry o Multiple Views 2- and 3- view geometry Epipolar Geometry The epipolar geometry
More informationBinary Morphological Model in Refining Local Fitting Active Contour in Segmenting Weak/Missing Edges
0 International Conerence on Advanced Computer Science Applications and Technologies Binary Morphological Model in Reining Local Fitting Active Contour in Segmenting Weak/Missing Edges Norshaliza Kamaruddin,
More informationResearch Progress for the Interest Control Protocol of Content-centric Networking
Sensors & Transducers 2013 by IFSA http://www.sensorsportal.com Research Progress or the Interest Control Protocol o Content-centric Networing GAO QIANG, WANG WEIMING, ZHOU JINGJING Zhejiang Gongshang
More informationGenerating the Reduced Set by Systematic Sampling
Generating the Reduced Set by Systematic Sampling Chien-Chung Chang and Yuh-Jye Lee Email: {D9115009, yuh-jye}@mail.ntust.edu.tw Department of Computer Science and Information Engineering National Taiwan
More informationRelative Constraints as Features
Relative Constraints as Features Piotr Lasek 1 and Krzysztof Lasek 2 1 Chair of Computer Science, University of Rzeszow, ul. Prof. Pigonia 1, 35-510 Rzeszow, Poland, lasek@ur.edu.pl 2 Institute of Computer
More informationMobile Robot Static Path Planning Based on Genetic Simulated Annealing Algorithm
Mobile Robot Static Path Planning Based on Genetic Simulated Annealing Algorithm Wang Yan-ping 1, Wubing 2 1. School o Electric and Electronic Engineering, Shandong University o Technology, Zibo 255049,
More informationNearest Neighbor Predictors
Nearest Neighbor Predictors September 2, 2018 Perhaps the simplest machine learning prediction method, from a conceptual point of view, and perhaps also the most unusual, is the nearest-neighbor method,
More informationWAIRS: Improving Classification Accuracy by Weighting Attributes in the AIRS Classifier
WAIR: Improving Classiication Accuracy by Weighting Attributes in the AIR Classiier Andrew ecker, Alex A. Freitas Abstract AIR (Artiicial Immune Recognition ystem) has shown itsel to be a competitive classiier.
More informationNearest neighbors. Focus on tree-based methods. Clément Jamin, GUDHI project, Inria March 2017
Nearest neighbors Focus on tree-based methods Clément Jamin, GUDHI project, Inria March 2017 Introduction Exact and approximate nearest neighbor search Essential tool for many applications Huge bibliography
More informationLeveraging Set Relations in Exact Set Similarity Join
Leveraging Set Relations in Exact Set Similarity Join Xubo Wang, Lu Qin, Xuemin Lin, Ying Zhang, and Lijun Chang University of New South Wales, Australia University of Technology Sydney, Australia {xwang,lxue,ljchang}@cse.unsw.edu.au,
More informationImproving the Random Forest Algorithm by Randomly Varying the Size of the Bootstrap Samples for Low Dimensional Data Sets
Improving the Random Forest Algorithm by Randomly Varying the Size of the Bootstrap Samples for Low Dimensional Data Sets Md Nasim Adnan and Md Zahidul Islam Centre for Research in Complex Systems (CRiCS)
More information3-D TERRAIN RECONSTRUCTION WITH AERIAL PHOTOGRAPHY
3-D TERRAIN RECONSTRUCTION WITH AERIAL PHOTOGRAPHY Bin-Yih Juang ( 莊斌鎰 ) 1, and Chiou-Shann Fuh ( 傅楸善 ) 3 1 Ph. D candidate o Dept. o Mechanical Engineering National Taiwan University, Taipei, Taiwan Instructor
More informationCombined Weak Classifiers
Combined Weak Classifiers Chuanyi Ji and Sheng Ma Department of Electrical, Computer and System Engineering Rensselaer Polytechnic Institute, Troy, NY 12180 chuanyi@ecse.rpi.edu, shengm@ecse.rpi.edu Abstract
More informationPenalizied Logistic Regression for Classification
Penalizied Logistic Regression for Classification Gennady G. Pekhimenko Department of Computer Science University of Toronto Toronto, ON M5S3L1 pgen@cs.toronto.edu Abstract Investigation for using different
More informationConcept Tree Based Clustering Visualization with Shaded Similarity Matrices
Syracuse University SURFACE School of Information Studies: Faculty Scholarship School of Information Studies (ischool) 12-2002 Concept Tree Based Clustering Visualization with Shaded Similarity Matrices
More information7. High-Speed Differential Interfaces in the Cyclone III Device Family
December 2011 CIII51008-4.0 7. High-Speed Dierential Interaces in the Cyclone III Device Family CIII51008-4.0 This chapter describes the high-speed dierential I/O eatures and resources in the Cyclone III
More informationOptics Quiz #2 April 30, 2014
.71 - Optics Quiz # April 3, 14 Problem 1. Billet s Split Lens Setup I the ield L(x) is placed against a lens with ocal length and pupil unction P(x), the ield (X) on the X-axis placed a distance behind
More informationAN 608: HST Jitter and BER Estimator Tool for Stratix IV GX and GT Devices
AN 608: HST Jitter and BER Estimator Tool or Stratix IV GX and GT Devices July 2010 AN-608-1.0 The high-speed communication link design toolkit (HST) jitter and bit error rate (BER) estimator tool is a
More informationEfficient Pairwise Classification
Efficient Pairwise Classification Sang-Hyeun Park and Johannes Fürnkranz TU Darmstadt, Knowledge Engineering Group, D-64289 Darmstadt, Germany Abstract. Pairwise classification is a class binarization
More informationAn Intelligent Clustering Algorithm for High Dimensional and Highly Overlapped Photo-Thermal Infrared Imaging Data
An Intelligent Clustering Algorithm for High Dimensional and Highly Overlapped Photo-Thermal Infrared Imaging Data Nian Zhang and Lara Thompson Department of Electrical and Computer Engineering, University
More informationCHAPTER 6 IDENTIFICATION OF CLUSTERS USING VISUAL VALIDATION VAT ALGORITHM
96 CHAPTER 6 IDENTIFICATION OF CLUSTERS USING VISUAL VALIDATION VAT ALGORITHM Clustering is the process of combining a set of relevant information in the same group. In this process KM algorithm plays
More informationAUTOMATING THE DESIGN OF SOUND SYNTHESIS TECHNIQUES USING EVOLUTIONARY METHODS. Ricardo A. Garcia *
AUTOMATING THE DESIGN OF SOUND SYNTHESIS TECHNIQUES USING EVOLUTIONARY METHODS Ricardo A. Garcia * MIT Media Lab Machine Listening Group 20 Ames St., E5-49, Cambridge, MA 0239 rago@media.mit.edu ABSTRACT
More informationNearest Neighbor with KD Trees
Case Study 2: Document Retrieval Finding Similar Documents Using Nearest Neighbors Machine Learning/Statistics for Big Data CSE599C1/STAT592, University of Washington Emily Fox January 22 nd, 2013 1 Nearest
More informationA Lazy Approach for Machine Learning Algorithms
A Lazy Approach for Machine Learning Algorithms Inés M. Galván, José M. Valls, Nicolas Lecomte and Pedro Isasi Abstract Most machine learning algorithms are eager methods in the sense that a model is generated
More informationMeta-Clustering. Parasaran Raman PhD Candidate School of Computing
Meta-Clustering Parasaran Raman PhD Candidate School of Computing What is Clustering? Goal: Group similar items together Unsupervised No labeling effort Popular choice for large-scale exploratory data
More informationMINING ASSOCIATION RULE FOR HORIZONTALLY PARTITIONED DATABASES USING CK SECURE SUM TECHNIQUE
MINING ASSOCIATION RULE FOR HORIZONTALLY PARTITIONED DATABASES USING CK SECURE SUM TECHNIQUE Jayanti Danasana 1, Raghvendra Kumar 1 and Debadutta Dey 1 1 School of Computer Engineering, KIIT University,
More informationArtificial Neural Network-Based Prediction of Human Posture
Artificial Neural Network-Based Prediction of Human Posture Abstract The use of an artificial neural network (ANN) in many practical complicated problems encourages its implementation in the digital human
More informationNeighbourhood Operations
Neighbourhood Operations Neighbourhood operations simply operate on a larger neighbourhood o piels than point operations Origin Neighbourhoods are mostly a rectangle around a central piel Any size rectangle
More informationLarger K-maps. So far we have only discussed 2 and 3-variable K-maps. We can now create a 4-variable map in the
EET 3 Chapter 3 7/3/2 PAGE - 23 Larger K-maps The -variable K-map So ar we have only discussed 2 and 3-variable K-maps. We can now create a -variable map in the same way that we created the 3-variable
More informationA new approach for ranking trapezoidal vague numbers by using SAW method
Science Road Publishing Corporation Trends in Advanced Science and Engineering ISSN: 225-6557 TASE 2() 57-64, 20 Journal homepage: http://www.sciroad.com/ntase.html A new approach or ranking trapezoidal
More informationChapter 8 The C 4.5*stat algorithm
109 The C 4.5*stat algorithm This chapter explains a new algorithm namely C 4.5*stat for numeric data sets. It is a variant of the C 4.5 algorithm and it uses variance instead of information gain for the
More informationSkill Sets Chapter 5 Functions
Skill Sets Chapter 5 Functions No. Skills Examples o questions involving the skills. Sketch the graph o the (Lecture Notes Example (b)) unction according to the g : x x x, domain. x, x - Students tend
More informationInternational Journal of Research in Advent Technology, Vol.7, No.3, March 2019 E-ISSN: Available online at
Performance Evaluation of Ensemble Method Based Outlier Detection Algorithm Priya. M 1, M. Karthikeyan 2 Department of Computer and Information Science, Annamalai University, Annamalai Nagar, Tamil Nadu,
More informationLorentzian Distance Classifier for Multiple Features
Yerzhan Kerimbekov 1 and Hasan Şakir Bilge 2 1 Department of Computer Engineering, Ahmet Yesevi University, Ankara, Turkey 2 Department of Electrical-Electronics Engineering, Gazi University, Ankara, Turkey
More informationAlgorithms for Nearest Neighbors
Algorithms for Nearest Neighbors Classic Ideas, New Ideas Yury Lifshits Steklov Institute of Mathematics at St.Petersburg http://logic.pdmi.ras.ru/~yura University of Toronto, July 2007 1 / 39 Outline
More informationEvaluating Classifiers
Evaluating Classifiers Charles Elkan elkan@cs.ucsd.edu January 18, 2011 In a real-world application of supervised learning, we have a training set of examples with labels, and a test set of examples with
More informationTO meet the increasing bandwidth demands and reduce
Assembling TCP/IP Packets in Optical Burst Switched Networks Xiaojun Cao Jikai Li Yang Chen Chunming Qiao Department o Computer Science and Engineering State University o New York at Bualo Bualo, NY -
More informationInternational Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X
Analysis about Classification Techniques on Categorical Data in Data Mining Assistant Professor P. Meena Department of Computer Science Adhiyaman Arts and Science College for Women Uthangarai, Krishnagiri,
More informationCompressed Sensing Image Reconstruction Based on Discrete Shearlet Transform
Sensors & Transducers 04 by IFSA Publishing, S. L. http://www.sensorsportal.com Compressed Sensing Image Reconstruction Based on Discrete Shearlet Transorm Shanshan Peng School o Inormation Science and
More informationA MULTI-LEVEL IMAGE DESCRIPTION MODEL SCHEME BASED ON DIGITAL TOPOLOGY
In: Stilla U et al (Eds) PIA7. International Archives o Photogrammetry, Remote Sensing and Spatial Inormation Sciences, 36 (3/W49B) A MULTI-LEVEL IMAGE DESCRIPTION MODEL SCHEME BASED ON DIGITAL TOPOLOG
More informationAn Approach for Performance Evaluation of Batch-sequential and Parallel Architectural Styles
An Approach or Perormance Evaluation o Batch-sequential and Parallel Architectural Styles Golnaz Aghaee Ghazvini MSc student, Young Research Club, Njaabad Branch, Esahan, Iran Aghaee.golnaz@sco.iaun.ac.ir
More informationClassifier Evasion: Models and Open Problems
. In Privacy and Security Issues in Data Mining and Machine Learning, eds. C. Dimitrakakis, et al. Springer, July 2011, pp. 92-98. Classiier Evasion: Models and Open Problems Blaine Nelson 1, Benjamin
More informationEM algorithm with GMM and Naive Bayesian to Implement Missing Values
, pp.1-5 http://dx.doi.org/10.14257/astl.2014.46.01 EM algorithm with GMM and aive Bayesian to Implement Missing Values Xi-Yu Zhou 1, Joon S. Lim 2 1 I.T. College Gachon University Seongnam, South Korea,
More informationNearest Neighbor with KD Trees
Case Study 2: Document Retrieval Finding Similar Documents Using Nearest Neighbors Machine Learning/Statistics for Big Data CSE599C1/STAT592, University of Washington Emily Fox January 22 nd, 2013 1 Nearest
More informationECE 6560 Multirate Signal Processing Chapter 8
Multirate Signal Processing Chapter 8 Dr. Bradley J. Bazuin Western Michigan University College o Engineering and Applied Sciences Department o Electrical and Computer Engineering 903 W. Michigan Ave.
More informationLeave-One-Out Support Vector Machines
Leave-One-Out Support Vector Machines Jason Weston Department of Computer Science Royal Holloway, University of London, Egham Hill, Egham, Surrey, TW20 OEX, UK. Abstract We present a new learning algorithm
More informationOnline Document Clustering Using the GPU
Online Document Clustering Using the GPU Benjamin E. Teitler, Jagan Sankaranarayanan, Hanan Samet Center for Automation Research Institute for Advanced Computer Studies Department of Computer Science University
More informationMethod estimating reflection coefficients of adaptive lattice filter and its application to system identification
Acoust. Sci. & Tech. 28, 2 (27) PAPER #27 The Acoustical Society o Japan Method estimating relection coeicients o adaptive lattice ilter and its application to system identiication Kensaku Fujii 1;, Masaaki
More informationLearning Implied Global Constraints
Learning Implied Global Constraints Christian Bessiere LIRMM-CNRS U. Montpellier, France bessiere@lirmm.r Remi Coletta LRD Montpellier, France coletta@l-rd.r Thierry Petit LINA-CNRS École des Mines de
More informationCS485/685 Computer Vision Spring 2012 Dr. George Bebis Programming Assignment 2 Due Date: 3/27/2012
CS8/68 Computer Vision Spring 0 Dr. George Bebis Programming Assignment Due Date: /7/0 In this assignment, you will implement an algorithm or normalizing ace image using SVD. Face normalization is a required
More informationRecognizing hand-drawn images using shape context
Recognizing hand-drawn images using shape context Gyozo Gidofalvi Department of Computer Science and Engineering University of California, San Diego La Jolla, CA 92037 gyozo@cs.ucsd.edu Abstract The objective
More informationA STUDY ON COEXISTENCE OF WLAN AND WPAN USING A PAN COORDINATOR WITH AN ARRAY ANTENNA
A STUDY ON COEXISTENCE OF WLAN AND WPAN USING A PAN COORDINATOR WITH AN ARRAY ANTENNA Yuta NAKAO(Graduate School o Engineering, Division o Physics, Electrical and Computer Engineering, Yokohama National
More informationA neural-networks associative classification method for association rule mining
Data Mining VII: Data, Text and Web Mining and their Business Applications 93 A neural-networks associative classification method for association rule mining P. Sermswatsri & C. Srisa-an Faculty of Information
More informationCS 340 Lec. 4: K-Nearest Neighbors
CS 340 Lec. 4: K-Nearest Neighbors AD January 2011 AD () CS 340 Lec. 4: K-Nearest Neighbors January 2011 1 / 23 K-Nearest Neighbors Introduction Choice of Metric Overfitting and Underfitting Selection
More informationPerformance Analysis of Data Mining Classification Techniques
Performance Analysis of Data Mining Classification Techniques Tejas Mehta 1, Dr. Dhaval Kathiriya 2 Ph.D. Student, School of Computer Science, Dr. Babasaheb Ambedkar Open University, Gujarat, India 1 Principal
More informationLocality- Sensitive Hashing Random Projections for NN Search
Case Study 2: Document Retrieval Locality- Sensitive Hashing Random Projections for NN Search Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade April 18, 2017 Sham Kakade
More informationvector space retrieval many slides courtesy James Amherst
vector space retrieval many slides courtesy James Allan@umass Amherst 1 what is a retrieval model? Model is an idealization or abstraction of an actual process Mathematical models are used to study the
More informationAutomatic Video Segmentation for Czech TV Broadcast Transcription
Automatic Video Segmentation or Czech TV Broadcast Transcription Jose Chaloupka Laboratory o Computer Speech Processing, Institute o Inormation Technology and Electronics Technical University o Liberec
More informationUse of Multi-category Proximal SVM for Data Set Reduction
Use of Multi-category Proximal SVM for Data Set Reduction S.V.N Vishwanathan and M Narasimha Murty Department of Computer Science and Automation, Indian Institute of Science, Bangalore 560 012, India Abstract.
More information