Data Linkages - Effect of Data Quality on Linkage Outcomes
1 Data Linkages - Effect of Data Quality on Linkage Outcomes
Anders Alexandersson
July 27, 2016
2 Introduction
Data linkage synonyms = record linkage, record matching, re-identification, entity heterogeneity, and merge/purge.
Aim = Determine the true match status of each comparison pair: a match if the records belong to the same individual and a non-match if they belong to different individuals.
Use linkage criteria to assign a link status to each comparison pair: a link if the records are classified as belonging to the same individual and a non-link if they are classified as belonging to different individuals.
3 The Problem
Ideally, all matches are classified as links, and all non-matches are classified as non-links. This presentation will demonstrate how data quality affects linkage outcomes.
There are two types of possible errors:
Type 1: False matches = linked non-matches ("false positives")
Type 2: Missed matches = non-linked matches ("false negatives")
4 The Table of Confusion
The four outcomes can be displayed in a 2 x 2 "table of confusion" or "error matrix":
Figure 1: Table of Confusion
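The four outcomes can be tallied directly once each comparison pair carries both a true match status and an assigned link status. The following is a minimal Python sketch, not part of the original talk; the function names and the sample pairs are hypothetical and chosen only for illustration.

```python
from collections import Counter

def outcome(is_match: bool, is_link: bool) -> str:
    """Name the confusion-table cell for one comparison pair."""
    if is_link:
        # Linked pairs are either true matches or false matches.
        return "TP" if is_match else "FP"
    # Non-linked pairs are either missed matches or true non-matches.
    return "FN" if is_match else "TN"

def confusion_table(pairs):
    """Count the four outcomes over (is_match, is_link) tuples."""
    counts = Counter(outcome(m, l) for m, l in pairs)
    return {cell: counts.get(cell, 0) for cell in ("TP", "FP", "FN", "TN")}

# Hypothetical comparison pairs: (true match status, assigned link status).
pairs = [
    (True, True), (True, True),   # matches classified as links
    (True, False),                # missed match
    (False, True),                # false match
    (False, False), (False, False), (False, False),
]
table = confusion_table(pairs)
print(table)
```

Match status is rarely known in practice, which is why the conclusion of the talk treats match-status data as the best basis for evaluation.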
5 Linkage Quality Measures
Match status errors:
True positive rate (TPR), matching rate, sensitivity, power = TP / Matches
False negative rate (FNR), miss rate, beta (alpha in R) error = FN / Matches
False positive rate (FPR), false match rate, alpha (beta in R) error = FP / Non-matches
True negative rate (TNR) or specificity = TN / Non-matches
Linkage errors:
Positive predictive value (PPV) or precision = TP / Links
False discovery rate, false match rate (again!) = FP / Links
False omission rate = FN / Non-links
Negative predictive value (NPV) = TN / Non-links
Record pairs quality measures:
Accuracy = (TP + TN) / Record pairs
Prevalence = Links / Record pairs
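The match-status and linkage-error measures above follow mechanically from the four cell counts. Here is a short Python sketch under that assumption; the function name, the dictionary keys, and the example counts are hypothetical and do not come from the talk.

```python
def linkage_measures(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute linkage quality measures from confusion-table counts."""
    matches = tp + fn        # pairs that truly belong to the same individual
    non_matches = fp + tn    # pairs that truly belong to different individuals
    links = tp + fp          # pairs classified as links
    non_links = fn + tn      # pairs classified as non-links
    record_pairs = matches + non_matches

    def ratio(numerator: int, denominator: int) -> float:
        # Return NaN instead of raising when a margin is empty.
        return numerator / denominator if denominator else float("nan")

    return {
        "TPR": ratio(tp, matches),
        "FNR": ratio(fn, matches),
        "FPR": ratio(fp, non_matches),
        "TNR": ratio(tn, non_matches),
        "PPV": ratio(tp, links),
        "FDR": ratio(fp, links),
        "FOR": ratio(fn, non_links),
        "NPV": ratio(tn, non_links),
        "accuracy": ratio(tp + tn, record_pairs),
    }

# Hypothetical counts for 1,000 comparison pairs.
measures = linkage_measures(tp=90, fp=10, fn=5, tn=895)
for name, value in measures.items():
    print(f"{name}: {value:.4f}")
```

Because non-matches usually dominate the comparison space, accuracy and TNR tend to look high even for a poor linkage, so PPV and TPR are often the more informative pair.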
6 The Solution: Probabilistic Record Linkage
Probabilistic record linkage weighs the evidence for and against a match using probabilities. This improves on traditional, simple rule-based, deterministic record linkage. The standard reference is Fellegi and Sunter (1969).
FPR = FP / Non-matches = u-probability
TPR = TP / Matches = m-probability
In practice, the process involves three key steps:
1. Preprocessing
2. Linking
3. Clerical review
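In the textbook Fellegi-Sunter formulation, each field contributes an agreement weight log2(m/u) or a disagreement weight log2((1-m)/(1-u)), and the weights are summed over fields. The Python sketch below illustrates that formulation only; it is not the EM procedure used by any particular package, and the field names and the m- and u-probabilities are hypothetical values.

```python
import math

def field_weights(m: float, u: float) -> tuple:
    """Return (agreement weight, disagreement weight) for one field.

    m = P(field agrees | pair is a match)
    u = P(field agrees | pair is a non-match)
    """
    agree = math.log2(m / u)
    disagree = math.log2((1 - m) / (1 - u))
    return agree, disagree

def pair_weight(pattern: dict, m_probs: dict, u_probs: dict) -> float:
    """Sum field weights over a comparison pattern {field: agrees?}."""
    total = 0.0
    for field, agrees in pattern.items():
        agree_w, disagree_w = field_weights(m_probs[field], u_probs[field])
        total += agree_w if agrees else disagree_w
    return total

# Hypothetical probabilities, for illustration only.
m_probs = {"lname": 0.95, "sex": 0.98, "dob": 0.90}
u_probs = {"lname": 0.01, "sex": 0.50, "dob": 0.005}

all_agree = pair_weight({"lname": True, "sex": True, "dob": True}, m_probs, u_probs)
dob_differs = pair_weight({"lname": True, "sex": True, "dob": False}, m_probs, u_probs)
print(round(all_agree, 2), round(dob_differs, 2))
```

A field such as sex, where u is large, adds little weight when it agrees; a field such as date of birth, where u is small, adds a lot. This is one way data quality enters the linkage: errors in highly discriminating identifiers cost the most weight.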
7 Step 1: Preprocessing
Typically, preprocessing consists of two substeps:
1. Parse a field (variable, column) into the relevant subcomponents.
2. Standardize common character strings.
Several data linkage software packages do not have features for preprocessing. Examples are BigMatch and the R package RecordLinkage. For preprocessing, any good statistical software will work. We use the NYSIIS phonetic code to handle spelling mistakes in names.
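The two substeps can be sketched in a few lines of Python. This is an illustration only: it assumes names arrive as "LAST, FIRST MI", the nickname table is a hypothetical stand-in for a real standardization lookup, and it does not implement the NYSIIS phonetic code mentioned above.

```python
import re

# Hypothetical lookup of common name variants; a real table would be far larger.
NICKNAMES = {"BOB": "ROBERT", "BILL": "WILLIAM"}

def standardize(text: str) -> str:
    """Uppercase, drop punctuation and digits, and collapse whitespace."""
    text = text.upper().replace("-", " ")
    text = re.sub(r"[^A-Z ]", "", text)
    return re.sub(r"\s+", " ", text).strip()

def parse_name(full_name: str) -> dict:
    """Split an assumed 'LAST, FIRST MI' string into subcomponents."""
    last, _, rest = full_name.partition(",")
    tokens = standardize(rest).split()
    first = tokens[0] if tokens else ""
    middle_initial = tokens[1][0] if len(tokens) > 1 else ""
    return {
        "first": NICKNAMES.get(first, first),
        "middle_initial": middle_initial,
        "last": standardize(last),
    }

parsed = parse_name("O'Brien,  bob j.")
print(parsed)
```

Parsing first and standardizing second keeps the comma available as a delimiter before punctuation is stripped.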
8 Example code
Here is example code in Stata. The original data are in R.

    . R: load("rldata500.rda")
    . R: load.data(rldata500)
    . decode fname_c1, gen(fname_c1s)
    . nysiis fname_c1s, gen(nysf)
    . list fname_c1 nysf in 1/

         fname_c1   nysf
      1. CARSTEN    carstan
      2. GERD       gad
      3. ROBERT     rabad
9 Step 2: Linking
At FCDS, we use the user-written R package RecordLinkage for the linking. We used to use the software AutoMatch.
Example code in R:

    rpairs <- compare.linkage(rmort1, rpatient1, blockfld = c("ssn", "sex"),
                              strcmp = 4:7,
                              exclude = c("pid", "address", "st", "county", "zip", "mi"))
    rpairs$pairs[c(1:5), ]                 # (list obs 1-5, comparison pattern only)
    rpairs <- emWeights(rpairs)            # (calculate EM weights)
    summary(rpairs)                        # (show weight distribution ### pairs)
    tail(getPairs(rpairs, 40, 30))         # review obs to determine thresholds
    result <- emClassify(rpairs, 40, 30)   # classification
    summary(result)
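The two thresholds in the classification call split the weight scale into three regions. The Python sketch below shows that idea with the slide's values of 40 and 30 as defaults; the function name is hypothetical, and how a weight exactly on a threshold is handled is an assumption here, not a statement about the package.

```python
def classify(weight: float, upper: float = 40.0, lower: float = 30.0) -> str:
    """Assign a link status from a pair weight and two thresholds."""
    if upper < lower:
        raise ValueError("upper threshold must not be below lower threshold")
    if weight >= upper:
        return "link"
    if weight >= lower:
        # The band between the thresholds goes to clerical review.
        return "possible link"
    return "non-link"

# Hypothetical pair weights.
weights = [52.3, 41.0, 35.5, 30.0, 12.8]
statuses = [classify(w) for w in weights]
print(list(zip(weights, statuses)))
```

Raising the upper threshold trades false matches for more clerical review; lowering the lower threshold trades missed matches for the same cost.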
10 Example output in R
Figure 2: Linkage result in R
11 Step 3: Clerical Review
At FCDS, we use the user-written Stata command clrevmatch for the clerical review.
Example code in Stata:

    clrevmatch using cler_reviewed_14jul2016, idm(mort_id) idu(pat_id) ///
        varm(pass mort_id id1 fname_1 lname_1 ssn_1 dob_1 sex_1 race_1) ///
        varu(pass pat_id id2 fname_2 lname_2 ssn_2 dob_2 sex_2 race_2) ///
        clrev_result(crev) clrev_label(0 "not match" 1 "match") ///
        clrev_note(crnote) ///
        rlscoremin(30) rlscoremax(45) reclinkscore(weight) ///
        nobssave(1) replace saveold
12 Data Linkage Requirements
1. At a minimum, the following information is required to link records with FCDS: First name, Last name, Sex, and Date of Birth and/or Social Security Number.
2. Additional information such as Middle Initial, Alias Name, Maiden Name, Race, Street Address, City, State, Zip Code and Birthplace improves linkage outcomes.
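The minimum requirement can be checked before a file is submitted for linkage. The Python sketch below encodes the rule as stated: first name, last name and sex must be present, plus at least one of date of birth or Social Security Number. The field names and sample records are hypothetical and do not reflect any FCDS file layout.

```python
REQUIRED_FIELDS = ("first_name", "last_name", "sex")
EITHER_FIELDS = ("dob", "ssn")  # at least one of these must be present

def is_present(record: dict, field: str) -> bool:
    """Treat missing keys, None, and blank strings as absent."""
    value = record.get(field)
    return value is not None and str(value).strip() != ""

def is_linkable(record: dict) -> bool:
    """Check the minimum identifiers needed for linkage."""
    has_required = all(is_present(record, f) for f in REQUIRED_FIELDS)
    has_either = any(is_present(record, f) for f in EITHER_FIELDS)
    return has_required and has_either

# Hypothetical records.
records = [
    {"first_name": "CARSTEN", "last_name": "MEIER", "sex": "M", "dob": "1949-07-22"},
    {"first_name": "GERD", "last_name": "BAUER", "sex": "M", "ssn": " "},
    {"first_name": "ROBERT", "last_name": "", "sex": "M", "dob": "1930-04-14"},
]
flags = [is_linkable(r) for r in records]
print(flags)
```

Records that fail the check are better corrected at the source than passed through, since missing identifiers remove agreement weight that no linkage method can recover.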
13 Conclusion
Data quality is central to data linkage outcomes!
1. Quality of identifiers: Most important.
2. Quality of linkage methods: Probabilistic linkage is recommended but has limitations.
3. Quality of evaluation: A clerical review note is better than usual. Match-status data would be best.
Future work:
1. Improve existing code template. For example, Stata users can use more efficient code with command Rcall than with rsource.
2. Learn more R to better understand the package RecordLinkage. For example, it is possible but very challenging to create match-status data. R users can use Stata code with the package RStata.
3. Stay on top of methods. Examples are machine learning and literate programming.
4. Stay on top of software developments. For instance, a new version of LinkPlus is expected this year.