Lecture Notes for Chapter 4 Part III. Introduction to Data Mining
- Sherman Watkins
Transcription
1 Data Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation. Lecture Notes for Chapter 4 Part III, Introduction to Data Mining by Tan, Steinbach, Kumar. Adapted by Qiang Yang (2010). Tan, Steinbach, Kumar, Introduction to Data Mining, 4/18/2004
2 Practical Issues of Classification
Underfitting and Overfitting
Missing Values
Costs of Classification
3 Underfitting and Overfitting (Example)
500 circular and 500 triangular data points.
Circular points: $0.5 \le \sqrt{x_1^2 + x_2^2} \le 1$
Triangular points: $\sqrt{x_1^2 + x_2^2} < 0.5$ or $\sqrt{x_1^2 + x_2^2} > 1$
4 Underfitting and Overfitting
(Figure label: Overfitting)
Underfitting: when the model is too simple, both training and test errors are large.
5 Overfitting due to Noise
The decision boundary is distorted by a noise point.
6 Notes on Overfitting
Overfitting results in decision trees that are more complex than necessary.
Training error no longer provides a good estimate of how well the tree will perform on previously unseen records.
New ways of estimating errors are needed.
7 Estimating Generalization Errors
Re-substitution errors: error on training ($\sum e(t)$)
Generalization errors: error on testing ($\sum e'(t)$)
Methods for estimating generalization errors:
  Optimistic approach: $e'(t) = e(t)$
  Pessimistic approach:
    For each leaf node: $e'(t) = e(t) + 0.5$
    Total errors: $e'(T) = e(T) + N \times 0.5$ (N: number of leaf nodes)
    For a tree with 30 leaf nodes and 10 errors on training (out of 1000 instances):
      Training error = 10/1000 = 1%
      Generalization error = (10 + 30 × 0.5)/1000 = 2.5%
  Reduced error pruning (REP): uses a validation data set to estimate generalization error
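
The pessimistic estimate above can be checked with a few lines of Python. This is only a sketch of the arithmetic on the slide; the function name and the `penalty` parameter are illustrative choices, not part of the original notes.

```python
def pessimistic_error(training_errors, num_leaves, num_instances, penalty=0.5):
    """Pessimistic error rate: (e(T) + N * penalty) / number of instances."""
    return (training_errors + num_leaves * penalty) / num_instances


# Numbers from the slide: 30 leaf nodes, 10 training errors, 1000 instances.
training_rate = 10 / 1000
pessimistic_rate = pessimistic_error(training_errors=10, num_leaves=30, num_instances=1000)

print(f"training error       = {training_rate:.1%}")
print(f"pessimistic estimate = {pessimistic_rate:.1%}")
```

The penalty of 0.5 per leaf is what makes a larger tree look worse than its raw training error suggests.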
8 Occam's Razor
Given two models of similar generalization errors, one should prefer the simpler model over the more complex model.
For complex models, there is a greater chance that the model was fitted accidentally by errors in the data.
Therefore, one should include model complexity when evaluating a model.
9 Minimum Description Length (MDL)
(Figure: a table of records X_1 ... X_n with known labels y, a decision tree with tests on A, B, and C, and the same records with unknown labels.)
Cost(Model, Data) = Cost(Data | Model) + Cost(Model)
Cost is the number of bits needed for encoding.
We should search for the least costly model.
Cost(Data | Model) encodes the errors on training data.
Cost(Model) estimates model complexity, or future error.
10 How to Address Overfitting in Decision Trees
Pre-Pruning (Early Stopping Rule)
Stop the algorithm before it becomes a fully grown tree.
Typical stopping conditions for a node:
  Stop if all instances belong to the same class
  Stop if all the attribute values are the same
More restrictive conditions:
  Stop if the number of instances is less than some user-specified threshold
  Stop if the class distribution of instances is independent of the available features (e.g., using a χ² test)
  Stop if expanding the current node does not improve impurity measures (e.g., Gini or information gain)
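
As a rough illustration, the two typical stopping conditions and the instance-count threshold could be checked as below. This is a minimal sketch: the function name `should_stop`, the `min_instances` parameter, and the representation of records as tuples are assumptions made for the example, and the χ² and impurity-based conditions are left out.

```python
def should_stop(records, labels, min_instances=5):
    """Return True if a node should become a leaf under simple pre-pruning rules.

    records: list of attribute tuples, labels: list of class labels.
    min_instances is a hypothetical user-specified threshold.
    """
    if len(set(labels)) <= 1:
        return True  # all instances belong to the same class
    if len({tuple(r) for r in records}) <= 1:
        return True  # all attribute values are the same
    if len(labels) < min_instances:
        return True  # too few instances to justify a split
    return False


pure = should_stop([(1, "a"), (2, "b")], ["+", "+"], min_instances=1)
identical = should_stop([(1, "a"), (1, "a")], ["+", "-"], min_instances=1)
splittable = should_stop([(1, "a"), (2, "b"), (3, "c")], ["+", "-", "+"], min_instances=2)
too_small = should_stop([(1, "a"), (2, "b"), (3, "c")], ["+", "-", "+"], min_instances=5)
print(pure, identical, splittable, too_small)
```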
11 How to Address Overfitting
Post-pruning
Grow the decision tree to its entirety.
Trim the nodes of the decision tree in a bottom-up fashion.
If generalization error improves after trimming, replace the sub-tree by a leaf node.
The class label of the leaf node is determined from the majority class of instances in the sub-tree.
Heuristic: generalization error count = error count + 0.5 × N, where N is the number of leaf nodes. This heuristic is used in some algorithms, but there are other ways that use statistics.
12 Post-Pruning Based on Leaves
Before splitting on A: Class = Yes 20, Class = No 10
  Training error (before splitting) = 10/30
  Pessimistic error (before splitting) = (10 + 1 × 0.5)/30 = 10.5/30
After splitting on A:
        A1   A2   A3   A4
  Yes    8    3    4    5
  No     4    4    1    1
  Training error (after splitting) = 9/30
  Pessimistic error (after splitting) = (9 + 4 × 0.5)/30 = 11/30
Post-pruning decision: PRUNE!
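
The pruning decision on this slide can be reproduced with a short sketch. The helper names are illustrative; the class counts are the ones given for the node and its four children, and the rule applied is the 0.5-per-leaf heuristic from the notes.

```python
def leaf_errors(class_counts):
    """Errors at a leaf that predicts its majority class."""
    return sum(class_counts) - max(class_counts)


def pessimistic_count(leaves, penalty=0.5):
    """Pessimistic error count: training errors plus a penalty per leaf."""
    return sum(leaf_errors(counts) for counts in leaves) + penalty * len(leaves)


parent = [(20, 10)]                           # (Yes, No) before splitting
children = [(8, 4), (3, 4), (4, 1), (5, 1)]   # A1..A4 after splitting

before = pessimistic_count(parent)    # single leaf
after = pessimistic_count(children)   # four leaves

# Prune when collapsing the sub-tree gives a lower pessimistic error.
prune = before < after
print(f"before = {before}/30, after = {after}/30, prune = {prune}")
```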
13 Examples of Post-pruning
Case 1: child leaves (C0: 11, C1: 3) and (C0: 2, C1: 4)
Case 2: child leaves (C0: 14, C1: 3) and (C0: 2, C1: 2)
Optimistic error? Don't prune in either case.
Pessimistic error? Don't prune case 1, prune case 2.
14 Data Fragmentation
The number of instances gets smaller as you traverse down the tree.
The number of instances at the leaf nodes could be too small to make any statistically significant decision.
Solution: require the number of instances per leaf node to be >= a user-given value n.
15 Decision Trees: Feature Construction
(Figure: a single test condition x + y < 1 separating Class = + from Class = -.)
A test condition may involve multiple attributes, but this is hard to automate!
Finding better node test features is a difficult research issue.
16 Model Evaluation
Metrics for Performance Evaluation: How to evaluate the performance of a model?
Methods for Performance Evaluation: How to obtain reliable estimates?
Methods for Model Comparison: How to compare the relative performance among competing models?
18 Metrics for Performance Evaluation
Focus on the predictive capability of a model, rather than how fast it classifies or builds models, scalability, etc.
Confusion matrix (count or percentage):
                     PREDICTED CLASS
                     Class=Yes   Class=No
ACTUAL   Class=Yes   a           b
CLASS    Class=No    c           d
a: TP (true positive)
b: FN (false negative)
c: FP (false positive)
d: TN (true negative)
19 Metrics for Performance Evaluation
                     PREDICTED CLASS
                     Class=Yes   Class=No
ACTUAL   Class=Yes   a (TP)      b (FN)
CLASS    Class=No    c (FP)      d (TN)
Most widely used metric:
$\text{Accuracy} = \frac{a + d}{a + b + c + d} = \frac{TP + TN}{TP + TN + FP + FN}$
20 Limitation of Accuracy
Consider a 2-class problem:
  Number of Class 0 examples = 9990
  Number of Class 1 examples = 10
If the model predicts everything to be class 0, accuracy is 9990/10000 = 99.9%.
Accuracy is misleading because the model does not detect any class 1 example.
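
The following sketch works through this example with the confusion-matrix cells a, b, c, d (TP, FN, FP, TN). Treating class 1 as the positive class is an assumption made here for illustration; the recall helper is included only to show what accuracy hides.

```python
def accuracy(tp, fn, fp, tn):
    """Accuracy = (TP + TN) / (TP + TN + FP + FN)."""
    return (tp + tn) / (tp + fn + fp + tn)


def recall(tp, fn):
    """Fraction of actual positives that the model detects."""
    return tp / (tp + fn)


# A model that predicts class 0 for all 10000 examples
# (class 1 taken as positive): no true positives, no false positives.
tp, fn, fp, tn = 0, 10, 0, 9990

acc = accuracy(tp, fn, fp, tn)
rec = recall(tp, fn)
print(f"accuracy = {acc:.1%}, recall on class 1 = {rec:.1%}")
```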
21 Cost Matrix
                     PREDICTED CLASS
C(i|j)               Class=Yes    Class=No
ACTUAL   Class=Yes   C(Yes|Yes)   C(No|Yes)
CLASS    Class=No    C(Yes|No)    C(No|No)
C(i|j): cost of misclassifying a class j example as class i
Example applications: medical diagnosis, customer segmentation
22 Computing Cost of Classification
The total cost of a model is the sum, over the cells of its confusion matrix, of the count in the cell times the corresponding entry C(i|j) of the cost matrix.
Model M1: Accuracy = 80%, Cost = 3910
Model M2: Accuracy = 90%, Cost = 4255
M2 is more accurate than M1 but incurs the higher cost.
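
A sketch of the cost computation follows. The cost-matrix and confusion-matrix entries below are assumptions: they are not taken from the transcript, and were chosen only because they reproduce the accuracy and cost figures stated for M1 and M2. Keys are (actual, predicted) pairs.

```python
def total_cost(confusion, cost):
    """Sum of count * cost over all (actual, predicted) cells."""
    return sum(confusion[cell] * cost[cell] for cell in confusion)


def accuracy(confusion):
    """Fraction of examples on the diagonal of the confusion matrix."""
    correct = sum(n for (actual, predicted), n in confusion.items() if actual == predicted)
    return correct / sum(confusion.values())


# Assumed cost matrix (hypothetical values consistent with the stated results).
cost = {("+", "+"): -1, ("+", "-"): 100, ("-", "+"): 1, ("-", "-"): 0}

# Assumed confusion matrices for the two models (also hypothetical).
m1 = {("+", "+"): 150, ("+", "-"): 40, ("-", "+"): 60, ("-", "-"): 250}
m2 = {("+", "+"): 250, ("+", "-"): 45, ("-", "+"): 5, ("-", "-"): 200}

for name, model in (("M1", m1), ("M2", m2)):
    print(f"{name}: accuracy = {accuracy(model):.0%}, cost = {total_cost(model, cost)}")
```

With a large penalty on missed positives, the more accurate model can still be the more expensive one.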
23 Information Retrieval Measures
                     PREDICTED CLASS
                     Class=Yes   Class=No
ACTUAL   Class=Yes   a           b
CLASS    Class=No    c           d
Precision: $p = \frac{a}{a + c}$
Recall: $r = \frac{a}{a + b}$
F-measure: $F = \frac{2rp}{r + p} = \frac{2a}{2a + b + c}$
Let C be cost (it can be a count in our example).
Precision is biased towards C(Yes|Yes) and C(Yes|No).
Recall is biased towards C(Yes|Yes) and C(No|Yes).
F-measure is biased towards all except C(No|No).
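
The three measures can be computed directly from the cells a, b, c. This is a minimal sketch; the counts in the example are made up for illustration, and the function assumes a > 0 so that no denominator is zero.

```python
def precision_recall_f(a, b, c):
    """Precision, recall and F-measure from TP (a), FN (b), FP (c).

    Assumes a > 0; with a == 0 the ratios are undefined.
    """
    p = a / (a + c)
    r = a / (a + b)
    f = 2 * r * p / (r + p)
    return p, r, f


# Hypothetical counts: 40 true positives, 10 false negatives, 20 false positives.
a, b, c = 40, 10, 20
p, r, f = precision_recall_f(a, b, c)
print(f"precision = {p:.3f}, recall = {r:.3f}, F = {f:.3f}")
```

Note that d (true negatives) does not appear anywhere, which is why none of these measures reward correct "No" predictions.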
24 Model Evaluation
Metrics for Performance Evaluation: How to evaluate the performance of a model?
Methods for Performance Evaluation: How to obtain reliable estimates?
Methods for Model Comparison: How to compare the relative performance among competing models?
25 Methods of Estimation
Holdout: reserve 2/3 for training and 1/3 for testing
Cross validation: partition data into k disjoint subsets
  k-fold: train on k-1 partitions, test on the remaining one
  Leave-one-out: k = n
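
Both estimation schemes amount to partitioning the record indices. The sketch below uses only the standard library; the function names and the fixed `seed` are illustrative assumptions, and a real project would more likely use a library routine.

```python
import random


def holdout_split(n, train_fraction=2 / 3, seed=0):
    """Shuffle indices 0..n-1 and reserve train_fraction of them for training."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    cut = round(n * train_fraction)
    return indices[:cut], indices[cut:]


def k_fold_splits(n, k, seed=0):
    """Yield (train, test) index lists; each fold is the test set exactly once."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint subsets
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


train, test = holdout_split(30)
print(len(train), len(test))

for fold_train, fold_test in k_fold_splits(10, k=5):
    print(sorted(fold_test))

# Leave-one-out is the special case k = n.
leave_one_out = list(k_fold_splits(7, k=7))
```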
26 Test of Significance (Sections 4.5, 4.6 of TSK Book)
Given two models:
  Model M1: accuracy = 85%, tested on 30 instances
  Model M2: accuracy = 75%, tested on 5000 instances
Can we say M1 is better than M2?
How much confidence can we place on the accuracy of M1 and M2?
Can the difference in performance measure be explained as a result of random fluctuations in the test set?
27 Confidence Interval for Accuracy
Prediction can be regarded as a Bernoulli trial.
  A Bernoulli trial has 2 possible outcomes.
  Possible outcomes for prediction: correct or wrong.
  A collection of Bernoulli trials has a Binomial distribution: x ~ Bin(N, p), where x is the number of correct predictions.
  E.g., toss a fair coin 50 times: how many heads would turn up? Expected number of heads = N × p = 50 × 0.5 = 25
Given x (number of correct predictions), or equivalently acc = x/N, and N (number of test instances), can we predict p (the true accuracy of the model)?
28 Confidence Interval for Accuracy
For large N, acc has a normal distribution with mean p and variance p(1-p)/N:
$P\left( Z_{\alpha/2} < \frac{acc - p}{\sqrt{p(1-p)/N}} < Z_{1-\alpha/2} \right) = 1 - \alpha$
where 1 - α is the confidence level (the area under the normal curve between $Z_{\alpha/2}$ and $Z_{1-\alpha/2}$).
Confidence interval for p:
$p = \frac{2 N \cdot acc + Z_{\alpha/2}^2 \pm Z_{\alpha/2} \sqrt{Z_{\alpha/2}^2 + 4 N \cdot acc - 4 N \cdot acc^2}}{2 (N + Z_{\alpha/2}^2)}$
29 Confidence Interval for Accuracy
Consider a model that produces an accuracy of 80% when evaluated on 100 test instances:
  N = 100, acc = 0.8
  Let 1 - α = 0.95 (95% confidence)
  From the probability table, $Z_{\alpha/2} = 1.96$
(Tables: Z for several confidence levels 1 - α; p(lower) and p(upper) for several test-set sizes N.)
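
The interval for this example can be computed with the confidence-interval formula for p from the previous slide. The sketch below is one way to evaluate it; the function name is illustrative, and the list of N values in the loop is an assumption chosen to show how the interval narrows, not a reproduction of the slide's table.

```python
import math


def accuracy_confidence_interval(acc, n, z=1.96):
    """Lower and upper bounds on true accuracy p for observed acc on n instances."""
    z2 = z * z
    center = 2 * n * acc + z2
    spread = z * math.sqrt(z2 + 4 * n * acc - 4 * n * acc * acc)
    denom = 2 * (n + z2)
    return (center - spread) / denom, (center + spread) / denom


# The slide's example: acc = 0.8 on N = 100 instances at 95% confidence.
lower, upper = accuracy_confidence_interval(acc=0.8, n=100)
print(f"N=100: p in [{lower:.3f}, {upper:.3f}]")

# Illustrative test-set sizes: the interval tightens as N grows.
for n in (50, 100, 500, 1000, 5000):
    lo, hi = accuracy_confidence_interval(0.8, n)
    print(f"N={n}: width = {hi - lo:.3f}")
```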
30 ROC (Receiver Operating Characteristic)
Page 298 of TSK book.
Many applications care about ranking (give a queue from the most likely to the least likely).
Examples: which ranking order is better?
ROC was developed in the 1950s for signal detection theory to analyze noisy signals.
It characterizes the trade-off between positive hits and false alarms.
The ROC curve plots TP rate (on the y-axis) against FP rate (on the x-axis).
The performance of each classifier is represented as a point on the ROC curve.
Changing the threshold of the algorithm, the sample distribution, or the cost matrix changes the location of the point.
31 How to Construct an ROC Curve
(Table columns: Instance; P(+|A), predicted by the classifier; True Class, the ground truth.)
Use a classifier that produces a posterior probability P(+|A) for each test instance A.
Sort the instances according to P(+|A) in decreasing order.
Apply a threshold at each unique value of P(+|A).
Count the number of TP, FP, TN, FN at each threshold.
TP rate, TPR = TP/(TP + FN)
FP rate, FPR = FP/(FP + TN)
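
The construction steps can be sketched as follows. The scores and labels in the example are hypothetical stand-ins for P(+|A) and the true class, and the function assumes at least one positive and one negative instance.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points, one per unique threshold, starting at (0, 0).

    scores: classifier's P(+|A) per instance; labels: 1 for +, 0 for -.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    # Thresholds in decreasing order; predict + when score >= threshold.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


# Hypothetical posterior probabilities and ground-truth classes.
scores = [0.9, 0.8, 0.7, 0.6, 0.5]
labels = [1, 1, 0, 1, 0]

points = roc_points(scores, labels)
for fpr, tpr in points:
    print(f"FPR = {fpr:.2f}, TPR = {tpr:.2f}")
```

TN and FN are implicit here: TN = negatives - FP and FN = positives - TP at each threshold.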
32 How to Construct an ROC Curve
(Table: for each threshold, the class and the counts TP, FP, TN, FN with the resulting TPR and FPR. Figure: the resulting ROC curve.)
33 Using ROC for Model Comparison
No model consistently outperforms the other:
  M1 is better for small FPR
  M2 is better for large FPR
Area Under the ROC curve (AUC):
  Ideal: Area = 1
  Random guess: Area = 0.5
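
The two reference areas can be verified with a trapezoidal-rule sketch. The function name is illustrative, and the input is assumed to be a list of (FPR, TPR) points that includes the endpoints (0, 0) and (1, 1).

```python
def auc(points):
    """Area under a piecewise-linear ROC curve given as (FPR, TPR) points."""
    pts = sorted(points)  # order by FPR, then TPR
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2  # trapezoid between neighbours
    return area


ideal = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]  # reaches TPR = 1 at FPR = 0
random_guess = [(0.0, 0.0), (1.0, 1.0)]       # the diagonal

print(auc(ideal), auc(random_guess))
```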
34 ROC Curve
Points (TP rate, FP rate):
  (0,0): declare everything to be negative class
  (1,1): declare everything to be positive class
  (1,0): ideal
Diagonal line: random guessing
Below diagonal line: prediction is opposite of the true class
More informationEvaluating Machine Learning Methods: Part 1
Evaluating Machine Learning Methods: Part 1 CS 760@UW-Madison Goals for the lecture you should understand the following concepts bias of an estimator learning curves stratified sampling cross validation
More informationFiltering. Yao Wang Polytechnic University, Brooklyn, NY 11201
Spatia Domain Linear Fitering Yao Wang Poytechnic University Brookyn NY With contribution rom Zhu Liu Onur Gueryuz and Gonzaez/Woods Digita Image Processing ed Introduction Outine Noise remova using ow-pass
More informationHiding secrete data in compressed images using histogram analysis
University of Woongong Research Onine University of Woongong in Dubai - Papers University of Woongong in Dubai 2 iding secrete data in compressed images using histogram anaysis Farhad Keissarian University
More informationPath-Based Protection for Surviving Double-Link Failures in Mesh-Restorable Optical Networks
Path-Based Protection for Surviving Doube-Link Faiures in Mesh-Restorabe Optica Networks Wensheng He and Arun K. Somani Dependabe Computing and Networking Laboratory Department of Eectrica and Computer
More informationNeural Networks. Aarti Singh. Machine Learning Nov 3, Slides Courtesy: Tom Mitchell
Neura Networks Aarti Singh Machine Learning 10-601 Nov 3, 2011 Sides Courtesy: Tom Mitche 1 Logis0c Regression Assumes the foowing func1ona form for P(Y X): Logis1c func1on appied to a inear func1on of
More informationStatistics 202: Statistical Aspects of Data Mining
Statistics 202: Statistical Aspects of Data Mining Professor Rajan Patel Lecture 9 = More of Chapter 5 Agenda: 1) Lecture over more of Chapter 5 1 Introduction to Data Mining by Tan, Steinbach, Kumar Chapter
More informationOn-Chip CNN Accelerator for Image Super-Resolution
On-Chip CNN Acceerator for Image Super-Resoution Jung-Woo Chang and Suk-Ju Kang Dept. of Eectronic Engineering, Sogang University, Seou, South Korea {zwzang91, sjkang}@sogang.ac.kr ABSTRACT To impement
More informationIdentifying and Tracking Pedestrians Based on Sensor Fusion and Motion Stability Predictions
Sensors 2010, 10, 8028-8053; doi:10.3390/s100908028 OPEN ACCESS sensors ISSN 1424-8220 www.mdpi.com/journa/sensors Artice Identifying and Tracking Pedestrians Based on Sensor Fusion and Motion Stabiity
More informationAutomatic Hidden Web Database Classification
Automatic idden Web atabase Cassification Zhiguo Gong, Jingbai Zhang, and Qian Liu Facuty of Science and Technoogy niversity of Macau Macao, PRC {fstzgg,ma46597,ma46620}@umac.mo Abstract. In this paper,
More informationBacking-up Fuzzy Control of a Truck-trailer Equipped with a Kingpin Sliding Mechanism
Backing-up Fuzzy Contro of a Truck-traier Equipped with a Kingpin Siding Mechanism G. Siamantas and S. Manesis Eectrica & Computer Engineering Dept., University of Patras, Patras, Greece gsiama@upatras.gr;stam.manesis@ece.upatras.gr
More informationNeural Networks. Aarti Singh & Barnabas Poczos. Machine Learning / Apr 24, Slides Courtesy: Tom Mitchell
Neura Networks Aarti Singh & Barnabas Poczos Machine Learning 10-701/15-781 Apr 24, 2014 Sides Courtesy: Tom Mitche 1 Logis0c Regression Assumes the foowing func1ona form for P(Y X): Logis1c func1on appied
More informationfile://j:\macmillancomputerpublishing\chapters\in073.html 3/22/01
Page 1 of 15 Chapter 9 Chapter 9: Deveoping the Logica Data Mode The information requirements and business rues provide the information to produce the entities, attributes, and reationships in ogica mode.
More informationSpecial Edition Using Microsoft Excel Selecting and Naming Cells and Ranges
Specia Edition Using Microsoft Exce 2000 - Lesson 3 - Seecting and Naming Ces and.. Page 1 of 8 [Figures are not incuded in this sampe chapter] Specia Edition Using Microsoft Exce 2000-3 - Seecting and
More informationUtility-based Camera Assignment in a Video Network: A Game Theoretic Framework
This artice has been accepted for pubication in a future issue of this journa, but has not been fuy edited. Content may change prior to fina pubication. Y.LI AND B.BHANU CAMERA ASSIGNMENT: A GAME-THEORETIC
More informationFastest-Path Computation
Fastest-Path Computation DONGHUI ZHANG Coege of Computer & Information Science Northeastern University Synonyms fastest route; driving direction Definition In the United states, ony 9.% of the househods
More informationAd Hoc Networks 11 (2013) Contents lists available at SciVerse ScienceDirect. Ad Hoc Networks
Ad Hoc Networks (3) 683 698 Contents ists avaiabe at SciVerse ScienceDirect Ad Hoc Networks journa homepage: www.esevier.com/ocate/adhoc Dynamic agent-based hierarchica muticast for wireess mesh networks
More informationOpen Access CS-1-SVM: Improved One-class SVM for Detecting API Abuse on Open Network Service
Send Orders for Reprints to reprints@benthamscience.ae The Open Automation and Contro Systems Journa, 2015, 7, 1293-1300 1293 Open Access CS-1-SVM: Improved One-cass SVM for Detecting API Abuse on Open
More informationA METHOD FOR GRIDLESS ROUTING OF PRINTED CIRCUIT BOARDS. A. C. Finch, K. J. Mackenzie, G. J. Balsdon, G. Symonds
A METHOD FOR GRIDLESS ROUTING OF PRINTED CIRCUIT BOARDS A C Finch K J Mackenzie G J Basdon G Symonds Raca-Redac Ltd Newtown Tewkesbury Gos Engand ABSTRACT The introduction of fine-ine technoogies to printed
More informationFuzzy Equivalence Relation Based Clustering and Its Use to Restructuring Websites Hyperlinks and Web Pages
Fuzzy Equivaence Reation Based Custering and Its Use to Restructuring Websites Hyperinks and Web Pages Dimitris K. Kardaras,*, Xenia J. Mamakou, and Bi Karakostas 2 Business Informatics Laboratory, Dept.
More informationAUTOMATIC gender classification based on facial images
SUBMITTED TO IEEE TRANSACTIONS ON NEURAL NETWORKS 1 Gender Cassification Using a Min-Max Moduar Support Vector Machine with Incorporating Prior Knowedge Hui-Cheng Lian and Bao-Liang Lu, Senior Member,
More informationAn Indian Journal FULL PAPER ABSTRACT KEYWORDS. Trade Science Inc.
[Type text] [Type text] [Type text] ISSN : 0974-7435 Voume 10 Issue 16 BioTechnoogy 014 An Indian Journa FULL PAPER BTAIJ, 10(16), 014 [999-9307] Study on prediction of type- fuzzy ogic power system based
More informationProbabilistic Classifiers DWML, /27
Probabilistic Classifiers DWML, 2007 1/27 Probabilistic Classifiers Conditional class probabilities Id. Savings Assets Income Credit risk 1 Medium High 75 Good 2 Low Low 50 Bad 3 High Medium 25 Bad 4 Medium
More informationCS570: Introduction to Data Mining
CS570: Introduction to Data Mining Classification Advanced Reading: Chapter 8.4 & 8.5 Han, Chapters 4.5 & 4.6 Tan Anca Doloc-Mihu, Ph.D. Slides courtesy of Li Xiong, Ph.D., 2011 Han, Kamber & Pei. Data
More information