
Introduction: Entity Match Service

To incorporate as much institutional data as possible into our central alumni and donor database (hereafter CADS), we've developed a comprehensive suite of automated entity match services. CADS contains roughly 1.4 million entity records. For each set of input data, the Entity Match Service Suite determines whether any CADS entity record corresponds to the person represented by the input and, if so, which record.

Looking only for exact matches on attributes such as Name, Address, and Telephone would miss many true matches and could cause duplicate records to be created in CADS. An exact match can fail for many reasons: ambiguous data (Thomas Smith and Tom Smith may be the same person); unformatted data (the same address may be written in several ways); missing data elements; out-of-date information; or unrecognized partial matches. Beyond these difficulties, the sheer volume of data in CADS requires that the set of candidate records be narrowed before matching.

To address these challenges, the Entity Match Web Service is divided into four general steps:

1. Receiving and accepting the data
2. Deciding which CADS entities to match against (Blocking)
3. Matching the data against the chosen CADS entities (Preliminary Match)
4. Using the Name data to refine and confirm the match results (Secondary Match)

Step-by-Step Description

In Step 1, the service receives any or all of the following input: First Name, Middle Name, Last Name, Address, Phone, and Email. If the input fails to meet the minimum requirements, or if the service is unavailable, an error is generated. The minimum requirements for the Entity Match Service Suite are:

During Step 2, the service determines which CADS entities will be considered as potential matches. Instead of trying to match every input field against all 1.4 million CADS entities, the service uses CADS data to choose a small subset to pass on to Step 3. The three criteria used to make this choice are:

In Step 3, each piece of input data is matched against its counterpart on each CADS entity identified during Step 2. An exact match is attempted on all names, addresses, phones, and emails on the CADS record. Each time an attribute matches, the CADS entity's match score is increased; each time an attribute does not match, the score is decreased. For a detailed look at how the match score is calculated, please see the Appendix, Weights and Matching.
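Blocking is what keeps Step 3 from comparing the input against every one of the 1.4 million entities. The actual three blocking criteria are not given here, so the Python sketch below assumes hypothetical ones: a CADS entity becomes a candidate if it shares a normalized phone number, email address, or street address with the input. The normalization rules, record layout, and function names are all illustrative assumptions. An inverted index built once over CADS turns each lookup into a few dictionary probes instead of a full scan.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, List, Set, Tuple

# HYPOTHETICAL normalizers that reduce formatting differences before keying.
def normalize_phone(value: str) -> str:
    digits = re.sub(r"\D", "", value)
    return digits[-10:]  # keep the last 10 digits (assumes North American numbers)

def normalize_email(value: str) -> str:
    return value.strip().lower()

def normalize_address(value: str) -> str:
    cleaned = value.upper().replace(".", "").replace(",", " ")
    return " ".join(cleaned.split())  # collapse runs of whitespace

# HYPOTHETICAL blocking keys: attribute name -> normalizer.
BLOCKING_KEYS: Dict[str, Callable[[str], str]] = {
    "phone": normalize_phone,
    "email": normalize_email,
    "address": normalize_address,
}

Index = Dict[Tuple[str, str], Set[str]]

def build_index(entities: Dict[str, Dict[str, List[str]]]) -> Index:
    """Map each (attribute, normalized value) to the CADS entity IDs carrying it."""
    index: Index = defaultdict(set)
    for entity_id, record in entities.items():
        for attr, normalize in BLOCKING_KEYS.items():
            for value in record.get(attr, []):
                key = normalize(value)
                if key:  # never index empty keys
                    index[(attr, key)].add(entity_id)
    return index

def block(index: Index, input_rec: Dict[str, str]) -> Set[str]:
    """Return the entity IDs that share at least one blocking key with the input."""
    candidates: Set[str] = set()
    for attr, normalize in BLOCKING_KEYS.items():
        value = input_rec.get(attr)
        if value:
            # .get() does not insert into the defaultdict on a miss
            candidates |= index.get((attr, normalize(value)), set())
    return candidates
```

For example, an input phone of "555.123.4567" finds an entity stored with "(555) 123-4567", because both normalize to the same ten digits. Only the returned candidate set would then be scored in Step 3.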

By the end of Step 3, every potential match is assigned a match confidence value based on the aggregate score calculated during the match process. The values and their score thresholds are:

Low: match score less than or equal to 0
Medium: match score greater than 0 and less than or equal to 8.8
High: match score greater than 8.8

Based on the highest confidence value among the CADS entities in the pool of potential matches, the service takes the following action (highest confidence value: action performed on the result set):

No results: Pass the input data to the Entity Create Service, which automatically creates a new CADS entity.
Low: Pass the input data to the Entity Create Service, which automatically creates a new CADS entity.
Medium: Pass the Medium result(s) to the Exception Interface.
High: Pass the High result(s) to Step 4.

Step 4 loops back and takes a final look at the name input, comparing it to the name data on each matched CADS record. Because Step 2 uses attributes other than name to choose the entities passed to Step 3, two members of the same household can both be assigned a High match confidence value. The name input might also contain a misspelling or use a nickname not recorded on a CADS entity's record. To compensate for these possibilities, the service runs all High confidence results through the following decision tree:

[Figure: Step 4 name-check decision tree]

The first name check compares the Oracle SoundEx value calculated for the input name with the known SoundEx values of all names found on the matched entity's record. This screens out members of the same household who have different first names.
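The confidence bands and the routing table above can be sketched in Python. The thresholds (0 and 8.8) and the four outcomes come from this section; the action labels such as "entity_create", and the use of entity IDs as dictionary keys, are hypothetical and not the service's actual interface.

```python
from enum import IntEnum
from typing import Dict, List, Tuple

class Confidence(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3

def confidence_for(score: float) -> Confidence:
    """Map an aggregate match score to a confidence value (thresholds from Step 3)."""
    if score <= 0:
        return Confidence.LOW
    if score <= 8.8:
        return Confidence.MEDIUM
    return Confidence.HIGH

def route(scores: Dict[str, float]) -> Tuple[str, List[str]]:
    """Pick the action for the highest confidence value among the candidates.

    `scores` maps a CADS entity ID to its match score. Returns a
    (hypothetical) action label and the entity IDs handed to that action.
    """
    if not scores:
        return ("entity_create", [])  # no results: create a new CADS entity
    levels = {entity_id: confidence_for(s) for entity_id, s in scores.items()}
    best = max(levels.values())
    if best == Confidence.LOW:
        return ("entity_create", [])  # only Low results: create a new CADS entity
    # Only the results at the highest level are passed on.
    selected = sorted(e for e, level in levels.items() if level == best)
    if best == Confidence.MEDIUM:
        return ("exception_interface", selected)
    return ("step_4", selected)  # High results go on to the name checks
```

Note that a score of exactly 8.8 is still Medium, and that when any candidate is High, lower-scoring candidates are not forwarded with it.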

The second name check uses the Jaro-Winkler similarity metric to measure how similar the input First Name is to each First Name on the matched entity's record. Because this comparison works on the characters of the names rather than on how they sound, a misspelling that changes a name's SoundEx value does not by itself exclude an entity that is otherwise a correct match.

The third name check looks up the input First Name in a custom-built synonym table and, if any synonyms are found, compares them to all First Names on the matched entity's record.

Any single High confidence match that passes any one of the name checks is considered to represent the same individual as the input data. The input data is passed to the Entity Update Service, which automatically applies any new or different input data to the applicable CADS entity record.

Any High confidence match that fails all three name checks is assigned a new confidence value of High*. This conditional High value indicates that although the CADS entity was a numerically strong candidate, there is insufficient name evidence for the Entity Match Service to conclude that the CADS entity and the entity represented by the input data are the same individual. These records are passed to the Exception Interface, where an expert makes the final determination.

[Figure: Example of Match Process]
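A minimal Python sketch of the three Step 4 name checks follows, assuming they are applied to first names in the order described. The soundex function implements standard American Soundex, which approximates Oracle's SOUNDEX function (edge cases may differ); jaro_winkler uses the common defaults of a 0.1 prefix scale and a four-character prefix. The synonym table, the 0.9 similarity cut-off, and all function names are hypothetical, since the real table and threshold are not given.

```python
from typing import List, Optional, Set

_SOUNDEX_CODES = {
    letter: digit
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6"))
    for letter in letters
}

def soundex(name: str) -> str:
    """American Soundex: first letter plus three digits."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = [letters[0]]
    prev = _SOUNDEX_CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "HW":
            continue  # H and W do not separate letters with the same code
        code = _SOUNDEX_CODES.get(c, "")  # vowels and Y map to "" and reset prev
        if code and code != prev:
            result.append(code)
        prev = code
    return ("".join(result) + "000")[:4]

def jaro(s1: str, s2: str) -> float:
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    used1, used2 = [False] * len(s1), [False] * len(s2)
    matches = 0
    for i, c in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len(s2))):
            if not used2[j] and s2[j] == c:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order count as half-transpositions.
    half, k = 0, 0
    for i, c in enumerate(s1):
        if used1[i]:
            while not used2[k]:
                k += 1
            if c != s2[k]:
                half += 1
            k += 1
    t = half / 2
    return (matches / len(s1) + matches / len(s2) + (matches - t) / matches) / 3

def jaro_winkler(s1: str, s2: str, p: float = 0.1) -> float:
    """Boost the Jaro score for a common prefix of up to four characters."""
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return sim + prefix * p * (1 - sim)

# HYPOTHETICAL stand-ins for the custom synonym table and the similarity cut-off.
SYNONYMS = {"THOMAS": {"TOM", "TOMMY"}, "ROBERT": {"BOB", "ROB"}, "WILLIAM": {"BILL", "WILL"}}
JW_THRESHOLD = 0.9

def synonyms_of(name: str) -> Set[str]:
    found = set(SYNONYMS.get(name, set()))
    for formal, nicknames in SYNONYMS.items():
        if name in nicknames:
            found |= {formal} | nicknames
    found.discard(name)
    return found

def name_check(input_first: str, record_firsts: List[str]) -> Optional[str]:
    """Return the first check that passes, or None (the match becomes High*)."""
    first = input_first.strip().upper()
    names = [n.strip().upper() for n in record_firsts]
    if any(soundex(first) == soundex(n) for n in names):
        return "soundex"
    if any(jaro_winkler(first, n) >= JW_THRESHOLD for n in names):
        return "jaro_winkler"
    if synonyms_of(first) & set(names):
        return "synonym"
    return None
```

Under these assumptions, "Robert" against a record first name of "Rupert" passes on Soundex (both R163); "Katherine" against "Catherine" fails Soundex (K365 vs C365) but passes Jaro-Winkler at about 0.926; "Tom" against "Thomas" passes only through the synonym table; and "Martin" against "Robert" fails all three, so that High match would become High*.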

Appendix: Weights and Matching

During the match process, each attribute comparison contributes a positive or negative match score, also known as a weight. The weights were precalculated during development of the Entity Match Service Suite, and the full set of positive and negative weights is specially calibrated for the CADS dataset.

[Table: Attribute weights. Columns: Attribute; Weight (Positive Match); Weight (Data Not Present in Input OR Not Present in CADS); Weight (Negative Match). Rows: First Name, Middle Name, Last Name, Address, Phone, Email; followed by Highest Possible Score and Lowest Possible Score.]

Results Skewed Toward Address, Phone, and Email

Because far fewer people share an Address, Phone, and/or Email than share a name, these three attributes have a high positive weight. In other words, two sets of entity information that share an Address, Phone, and/or Email are statistically much more likely to represent the same person. This does not mean, however, that two sets of entity information that do not share an Address, Phone, and/or Email are strongly likely to represent different people. The power of an Address, Phone, and Email match is one of positive correlation.

[Table: Example matches skewed toward Address, Phone, and Email. Columns: Match, First Name, Middle Name, Last Name, Address, Phone, Email, Score, Status. Statuses of the seven example matches: High, High, High, High, Medium, Low, Low.]

Results Skewed Toward Names

Unlike Address, Phone, and Email, the power of a First Name and Last Name match is one of negative correlation. Because many different people share the same first and/or last name, a positive match on those attributes isn't very powerful. A negative match on those attributes is powerful, because it is much more likely to indicate that the two sets of entity information being compared do not represent the same person. For example, it is much more likely that many different people are named John Smith than that one person is named both Robert Johnson and Martin Bridges.
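To make the asymmetry concrete, here is a small Python sketch of the Step 3 scoring. Every number in WEIGHTS is hypothetical, chosen only so that a Match 4-style comparison scores High and a Match 11-style comparison scores Low under the Step 3 thresholds; the calibrated CADS weights are not reproduced. The attribute names, the record layout, and the case-insensitive exact comparison are also assumptions.

```python
from enum import Enum
from typing import Dict, List, Optional, Tuple

class Outcome(Enum):
    MATCH = 0      # input value equals a value on the CADS record
    ABSENT = 1     # value missing from the input or from the CADS record
    MISMATCH = 2   # both present, no exact match

# HYPOTHETICAL weights: (positive match, data not present, negative match).
# Names: weak positive, strong negative. Contact data: strong positive, weak negative.
WEIGHTS: Dict[str, Tuple[float, float, float]] = {
    "first_name": (1.0, 0.0, -4.0),
    "middle_name": (0.5, 0.0, -2.0),
    "last_name": (1.5, 0.0, -5.0),
    "address": (7.0, 0.0, -1.1),
    "phone": (7.0, 0.0, -1.1),
    "email": (8.0, 0.0, -1.1),
}

def compare(value: Optional[str], record_values: List[str]) -> Outcome:
    """Exact comparison (ignoring case and surrounding spaces) against every
    value of this attribute on the CADS record."""
    if not value or not record_values:
        return Outcome.ABSENT
    v = value.strip().lower()
    if any(v == r.strip().lower() for r in record_values):
        return Outcome.MATCH
    return Outcome.MISMATCH

def match_score(input_rec: Dict[str, str], cads_rec: Dict[str, List[str]]) -> float:
    """Sum the weight selected by each attribute's comparison outcome."""
    score = 0.0
    for attr, weights in WEIGHTS.items():
        outcome = compare(input_rec.get(attr), cads_rec.get(attr, []))
        score += weights[outcome.value]  # tuple order matches Outcome values
    return score

if __name__ == "__main__":
    cads = {"first_name": ["Robert"], "middle_name": ["James"], "last_name": ["Johnson"],
            "address": ["12 Main St"], "phone": ["5551234567"], "email": ["rj@example.org"]}
    # Like Match 4: Address, Phone, and Email match; all three names differ.
    match4 = {"first_name": "Martin", "middle_name": "Paul", "last_name": "Bridges",
              "address": "12 Main St", "phone": "5551234567", "email": "rj@example.org"}
    # Like Match 11: all three names match; Address, Phone, and Email differ.
    match11 = {"first_name": "Robert", "middle_name": "James", "last_name": "Johnson",
               "address": "9 Elm Ave", "phone": "5559990000", "email": "bob@example.com"}
    print(match_score(match4, cads))             # 11.0 -> High (> 8.8)
    print(round(match_score(match11, cads), 2))  # -0.3 -> Low (<= 0)
```

With these made-up weights, three contact-data matches outweigh three name mismatches, while three name matches are cancelled out by three contact-data mismatches, which is the pattern the summary below describes.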

[Table: Example matches skewed toward names. Columns: Match, First Name, Middle Name, Last Name, Address, Phone, Email, Score, Status. Statuses of the seven example matches: High, High, Medium, Low, Low, Low, Low.]

Taking a Closer Look at 50/50 Matches

Match 11 shows both categories at their weakest: negative matches for Address, Phone, and Email, and positive matches for First Name, Middle Name, and Last Name. With a total score near zero, the positive and negative values for the two categories nearly cancel each other out. Match 4, on the other hand, shows both categories at their most powerful: positive matches for Address, Phone, and Email, and negative matches for First Name, Middle Name, and Last Name. Because Address, Phone, and Email are so heavily weighted, the result is still a High status match.

[Table: Match 4 and Match 11 compared. Columns: Match, First Name, Middle Name, Last Name, Address, Phone, Email, Score, Status. Statuses: High, Low.]

Summary: Weights and Matching

In summary, when determining whether two sets of entity information represent the same person:

Differing names are more meaningful than matching names.
Matching Address, Phone, and/or Email values are more meaningful than differing ones.
Matching Address, Phone, and/or Email values are more meaningful than differing names.

UNIBALANCE Users Manual. Marcin Macutkiewicz and Roger M. Cooke UNIBALANCE Users Manual Marcin Macutkiewicz and Roger M. Cooke Deflt 2006 1 1. Installation The application is delivered in the form of executable installation file. In order to install it you have to

More information

Hot Fuzz: using SAS fuzzy data matching techniques to identify duplicate and related customer records. Stuart Edwards, Collection House Group

Hot Fuzz: using SAS fuzzy data matching techniques to identify duplicate and related customer records. Stuart Edwards, Collection House Group Hot Fuzz: using SAS fuzzy data matching techniques to identify duplicate and related customer records. Stuart Edwards, Collection House Group Collection House Group; what do we do? Debt Collection; purchased

More information

How Is the CPA Exam Scored? Prepared by the American Institute of Certified Public Accountants

How Is the CPA Exam Scored? Prepared by the American Institute of Certified Public Accountants How Is the CPA Exam Scored? Prepared by the American Institute of Certified Public Accountants Questions pertaining to this decision paper should be directed to Carie Chester, Office Administrator, Exams

More information

EXECUTIVE REPORT ADOBE SYSTEMS, INC. COLDFUSION SECURITY ASSESSMENT

EXECUTIVE REPORT ADOBE SYSTEMS, INC. COLDFUSION SECURITY ASSESSMENT EXECUTIVE REPORT ADOBE SYSTEMS, INC. COLDFUSION SECURITY ASSESSMENT FEBRUARY 18, 2016 This engagement was performed in accordance with the Statement of Work, and the procedures were limited to those described

More information

Presented by: Dimitri Galmanovich. Petros Venetis, Alon Halevy, Jayant Madhavan, Marius Paşca, Warren Shen, Gengxin Miao, Chung Wu

Presented by: Dimitri Galmanovich. Petros Venetis, Alon Halevy, Jayant Madhavan, Marius Paşca, Warren Shen, Gengxin Miao, Chung Wu Presented by: Dimitri Galmanovich Petros Venetis, Alon Halevy, Jayant Madhavan, Marius Paşca, Warren Shen, Gengxin Miao, Chung Wu 1 When looking for Unstructured data 2 Millions of such queries every day

More information

Semantic Web. Ontology Alignment. Morteza Amini. Sharif University of Technology Fall 95-96

Semantic Web. Ontology Alignment. Morteza Amini. Sharif University of Technology Fall 95-96 ه عا ی Semantic Web Ontology Alignment Morteza Amini Sharif University of Technology Fall 95-96 Outline The Problem of Ontologies Ontology Heterogeneity Ontology Alignment Overall Process Similarity (Matching)

More information

Data Structures. Notes for Lecture 14 Techniques of Data Mining By Samaher Hussein Ali Association Rules: Basic Concepts and Application

Data Structures. Notes for Lecture 14 Techniques of Data Mining By Samaher Hussein Ali Association Rules: Basic Concepts and Application Data Structures Notes for Lecture 14 Techniques of Data Mining By Samaher Hussein Ali 2009-2010 Association Rules: Basic Concepts and Application 1. Association rules: Given a set of transactions, find

More information

BETA DEMO SCENARIO - ATTRITION IBM Corporation

BETA DEMO SCENARIO - ATTRITION IBM Corporation BETA DEMO SCENARIO - ATTRITION 1 Please Note: IBM s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM s sole discretion. Information regarding

More information

PREPARATION GUIDELINES FOR SUSPICIOUS ACTIVITY REPORT FORM (SAR) August 2001

PREPARATION GUIDELINES FOR SUSPICIOUS ACTIVITY REPORT FORM (SAR) August 2001 Banking Commission P.O. Box D Majuro, Marshall Islands 96960 Phone: (692) 625-6310 Fax: (692) 625-6309 e-mail: bankcom@ntamar.com PREPARATION GUIDELINES FOR SUSPICIOUS ACTIVITY REPORT FORM (SAR) August

More information

I. Contact Information: Lynn Herrick Director, Technology Integration and Project Management Wayne County Department of Technology

I. Contact Information: Lynn Herrick Director, Technology Integration and Project Management Wayne County Department of Technology CySAFE Security Assessment Tool Wayne County, Michigan P a g e 1 I. Contact Information: Lynn Herrick Director, Technology Integration and Project Management Wayne County Department of Technology 313.224.6006

More information

2. Smoothing Binning

2. Smoothing Binning Macro %shtscore is primarily designed for score building when the dependent variable is binary. There are several components in %shtscore: 1. Variable pre-scanning; 2. Smoothing binning; 3. Information

More information

Data: a collection of numbers or facts that require further processing before they are meaningful

Data: a collection of numbers or facts that require further processing before they are meaningful Digital Image Classification Data vs. Information Data: a collection of numbers or facts that require further processing before they are meaningful Information: Derived knowledge from raw data. Something

More information

Introduction. Aleksandar Rakić Contents

Introduction. Aleksandar Rakić Contents Beograd ETF Fuzzy logic Introduction Aleksandar Rakić rakic@etf.rs Contents Definitions Bit of History Fuzzy Applications Fuzzy Sets Fuzzy Boundaries Fuzzy Representation Linguistic Variables and Hedges

More information

dtalink Faster probabilistic record linking and deduplication methods in Stata for large data files Keith Kranker

dtalink Faster probabilistic record linking and deduplication methods in Stata for large data files Keith Kranker dtalink Faster probabilistic record linking and deduplication methods in Stata for large data files Presentation at the 2018 Stata Conference Columbus, Ohio July 20, 2018 Keith Kranker Abstract Stata users

More information

Professional Evaluation and Certification Board Frequently Asked Questions

Professional Evaluation and Certification Board Frequently Asked Questions Professional Evaluation and Certification Board Frequently Asked Questions 1. About PECB... 2 2. General... 2 3. PECB Official Training Courses... 4 4. Course Registration... 5 5. Certification... 5 6.

More information

arxiv: v2 [cs.lg] 11 Sep 2015

arxiv: v2 [cs.lg] 11 Sep 2015 A DEEP analysis of the META-DES framework for dynamic selection of ensemble of classifiers Rafael M. O. Cruz a,, Robert Sabourin a, George D. C. Cavalcanti b a LIVIA, École de Technologie Supérieure, University

More information

Computing Classic Closeness Centrality, at Scale

Computing Classic Closeness Centrality, at Scale Computing Classic Closeness Centrality, at Scale Edith Cohen Joint with: Thomas Pajor, Daniel Delling, Renato Werneck Very Large Graphs Model relations and interactions (edges) between entities (nodes)

More information

Running SNAP. The SNAP Team October 2012

Running SNAP. The SNAP Team October 2012 Running SNAP The SNAP Team October 2012 1 Introduction SNAP is a tool that is intended to serve as the read aligner in a gene sequencing pipeline. Its theory of operation is described in Faster and More

More information

Political Organization Filing and Disclosure. Search Process User Guide

Political Organization Filing and Disclosure. Search Process User Guide Political Organization Filing and Disclosure Search Process User Guide Table of Contents 1.0 INTRODUCTION...4 1.1 Purpose... 4 1.2 How to Use this Guide... 4 1.3 Political Organization Disclosure... 4

More information

The Results of Falcon-AO in the OAEI 2006 Campaign

The Results of Falcon-AO in the OAEI 2006 Campaign The Results of Falcon-AO in the OAEI 2006 Campaign Wei Hu, Gong Cheng, Dongdong Zheng, Xinyu Zhong, and Yuzhong Qu School of Computer Science and Engineering, Southeast University, Nanjing 210096, P. R.

More information

2.19 Software Release Document Addendum

2.19 Software Release Document Addendum 2.19 Software Release Document Addendum Guidance for WIC Users with: Role 10 - LSA Implementation Date: 2/22/2014 2/22/2014 Release 2.19 1 Table of Contents System Administration: Duplicate Participant

More information

DATABASE DEVELOPMENT (H4)

DATABASE DEVELOPMENT (H4) IMIS HIGHER DIPLOMA QUALIFICATIONS DATABASE DEVELOPMENT (H4) Friday 3 rd June 2016 10:00hrs 13:00hrs DURATION: 3 HOURS Candidates should answer ALL the questions in Part A and THREE of the five questions

More information

Administrative Guidance for Internally Assessed Units

Administrative Guidance for Internally Assessed Units Administrative Guidance for Internally Assessed Units CiDA Certificate in Digital Application This document covers the requirements for the following units: Unit 2 Creative Multimedia (DA202) Unit 3 Artwork

More information

User Manual. Last updated 1/19/2012

User Manual. Last updated 1/19/2012 User Manual Last updated 1/19/2012 1 Table of Contents Introduction About VoteCast 4 About Practical Political Consulting 4 Contact Us 5 Signing In 6 Main Menu 7 8 Voter Lists Voter Selection (Create New

More information

AGIIS Duplicate Prevention

AGIIS Duplicate Prevention Objective The purpose of this document is to provide an understanding and background of what is considered a duplicate entity within the Ag Industry Identification System (AGIIS) and what processes and

More information

ARELLO.COM Licensee Verification Web Service v2.0 (LVWS v2) Documentation. Revision: 8/22/2018

ARELLO.COM Licensee Verification Web Service v2.0 (LVWS v2) Documentation. Revision: 8/22/2018 ARELLO.COM Licensee Verification Web Service v2.0 (LVWS v2) Documentation Revision: 8/22/2018 Table of Contents Revision: 8/22/2018... 1 Introduction... 3 Subscription... 3 Interface... 3 Formatting the

More information

Data.com Record Matching in Salesforce

Data.com Record Matching in Salesforce Data.com Record Matching in Salesforce Salesforce, Winter 16 @salesforcedocs Last updated: October 1, 2015 Copyright 2000 2015, inc. All rights reserved. Salesforce is a registered trademark of, inc.,

More information

Evaluation of Seed Selection Strategies for Vehicle to Vehicle Epidemic Information Dissemination

Evaluation of Seed Selection Strategies for Vehicle to Vehicle Epidemic Information Dissemination Evaluation of Seed Selection Strategies for Vehicle to Vehicle Epidemic Information Dissemination Richard Kershaw and Bhaskar Krishnamachari Ming Hsieh Department of Electrical Engineering, Viterbi School

More information

U1. Data Base Management System (DBMS) Unit -1. MCA 203, Data Base Management System

U1. Data Base Management System (DBMS) Unit -1. MCA 203, Data Base Management System Data Base Management System (DBMS) Unit -1 New Delhi-63,By Vaibhav Singhal, Asst. Professor U2.1 1 Data Base Management System Data: Data is the basic raw,fact and figures Ex: a name, a digit, a picture

More information