Record Linkage with SAS and Link King. Dinu Corbu, Queensland Health, Health Statistics Centre, Integration and Linkage Unit
1 Record Linkage with SAS and Link King
Dinu Corbu, Queensland Health, Health Statistics Centre, Integration and Linkage Unit
Presented at Queensland Users Exploring SAS Technology (QUEST), 4 June 2009
2 Basics
Record linkage: searching systematically for records that refer to the same entity in one or more datasets.
Purposes: building longitudinal datasets for analysis, reporting, planning, operational decision making, etc.
3 Typically, a data linkage project follows these steps:
Blocking records to reduce the decision space, which has quadratic size: n(n-1)/2 pairs for n records
- compare Johnson with Johnston, not Johnson with Lee
Comparing names, surnames, DoBs, etc. using field comparison functions:
- phonetic similarity: Soundex, Metaphone, NYSIIS
- pattern matching: edit distance, Jaro, Winkler
- 1 for a perfect match, 0 for a non-match, in between for a partial match
Classifying the pairs of records to assess their matching/non-matching status
- deterministic and/or probabilistic algorithms
- biggest challenge: the Grey Area (uncertain pairs)
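The blocking and comparison steps above can be sketched in a few lines of Python. This is only an illustration, not how Link King or Febrl implement them: the blocking key (first four letters of the surname) and the function names are assumptions made for the example, and plain edit distance stands in for the wider family of comparison functions.

```python
from collections import defaultdict
from itertools import combinations


def blocking_key(surname):
    # Hypothetical blocking key: first four letters of the surname, upper-cased.
    return surname.strip().upper()[:4]


def candidate_pairs(surnames):
    # Group records by blocking key and only pair records within a block.
    blocks = defaultdict(list)
    for surname in surnames:
        blocks[blocking_key(surname)].append(surname)
    pairs = []
    for members in blocks.values():
        pairs.extend(combinations(members, 2))
    return pairs


def edit_distance(a, b):
    # Classic dynamic-programming Levenshtein distance, one row at a time.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def similarity(a, b):
    # Scale edit distance to [0, 1]: 1 = perfect match, 0 = nothing in common.
    a, b = a.strip().upper(), b.strip().upper()
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


surnames = ["Johnson", "Johnston", "Lee", "Leigh"]
all_pairs = len(surnames) * (len(surnames) - 1) // 2  # n(n-1)/2 = 6
pairs = candidate_pairs(surnames)                     # only Johnson/Johnston
for a, b in pairs:
    print(a, b, similarity(a, b))
```

With this key, Johnson is compared with Johnston but never with Lee. The price of a crude key is visible too: Lee and Leigh land in different blocks, which is why phonetic keys are often preferred in practice.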
4 Grey Area
Using record-pair comparison functions, a score is assigned to each pair.
Grey Area: where the score histograms for matches and non-matches overlap.
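One common way to turn pair scores into a decision is a pair of cut-offs, with everything between them left for review. The sketch below assumes scores in the range 0 to 1; the threshold values are hypothetical and are not taken from Link King.

```python
def classify(score, lower=0.6, upper=0.9):
    # lower and upper are hypothetical cut-offs chosen for illustration only.
    if score >= upper:
        return "match"
    if score < lower:
        return "non-match"
    return "grey area"


scores = {"pair A": 0.95, "pair B": 0.75, "pair C": 0.20}
labels = {pair: classify(score) for pair, score in scores.items()}
print(labels)
```

Moving the two cut-offs closer together shrinks the Grey Area but increases the risk of false matches and missed matches.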
5 My involvement in record linkage
In New Zealand, working for a government organisation that runs a range of data matches (listed on )
- tools: a big package of tailored SAS code, batch processed
- developments: new pieces of SAS code to answer new requirements
Griffith University, the research team Justice Griffith
- the package Febrl was traditionally used there: very elegant and sophisticated (highly configurable, lots of blocking methods and comparison functions), but sometimes failing when processing big datasets
- had to find a rescuer: Link King, which runs on the SAS platform
Qld Health's Health Statistics Centre, Integration and Linkage Unit
- SAS, Link King, and Febrl: the core of a system we are currently building
- aim: linking tens of millions of records from the health system and beyond
6 Looking for powerful record linkage software
Tried Wikipedia first
- the top of the specialised page:
- and (Voilà!) at the bottom of the page: Link King is a free SAS/AF application that must be run on Foundation SAS
Could not find anything better (and free) when searching elsewhere
7 Starting Link King
8 Main interface
9 Importing data
Link King can import data in different formats: SAS, CSV, SPSS.
10 Control panel for processing data
11 Classify and Classification Report
This report appears after pressing the corresponding button, and it summarises:
12 Cross-referencing of the Deterministic and Probabilistic Classification System
13 Manual review interface
It allows accepting or rejecting links, and setting up and applying filters, working either pair by pair or on subsets of the Grey Area.
The Grey Area is split into 14 subsets, depending on which variables do not match perfectly.
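The idea of splitting uncertain pairs by which variables fail to match perfectly can be illustrated as below. This does not reproduce Link King's 14 subsets, whose definitions are not given here; the field names and the grouping rule are assumptions for the example.

```python
from collections import defaultdict


def mismatch_pattern(field_scores):
    # The subset key is the sorted tuple of fields scoring below a perfect 1.0.
    return tuple(sorted(f for f, s in field_scores.items() if s < 1.0))


def group_by_pattern(pairs):
    # pairs: iterable of (pair_id, {field: similarity score}) tuples.
    groups = defaultdict(list)
    for pair_id, field_scores in pairs:
        groups[mismatch_pattern(field_scores)].append(pair_id)
    return dict(groups)


grey_pairs = [
    (1, {"surname": 0.9, "dob": 1.0}),
    (2, {"surname": 1.0, "dob": 0.5}),
    (3, {"surname": 0.8, "dob": 1.0}),
]
subsets = group_by_pattern(grey_pairs)
print(subsets)
```

Reviewing one subset at a time means a reviewer sees the same kind of disagreement repeatedly, which makes filters easier to define.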
14 Mapping the Final Results
This is based on the cross-referenced deterministic and probabilistic classification system, after manually reviewing the Grey Area.
15 Reviewing the Final Results
This allows sorting, searching, breaking links, and regrouping.
16 Saving Results
Either the Final Master Linking File only, or all final files that Link King produces, can be saved in SAS and CSV formats.
To open them later in SAS, you first have to run: options nofmterr;
17 Linking QHAPDC (Queensland Hospital Admitted Patient Data Collection) with Death Data (from Births, Deaths and Marriages)
We build a process that uses SAS, Link King and Febrl (due to its flexibility) to reduce or even eliminate the manual reviewing in Link King.
1. Extract QHAPDC with a SAS macro from several Oracle tables, and:
- split it into appropriate subsets, clean, rearrange names
- sort with the nodupkey option and produce the input file for Link King
2. Run part of the Link King process and refine the Grey Area in Febrl:
- interrupt the Link King process, and extract the Grey Area with a SAS macro
- refine the Grey Area with Febrl, and export it back using SAS
- continue the process in Link King
3. Repeat the procedure for linking the resulting Master File with Death Data
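For readers less familiar with SAS, the effect of sorting with the nodupkey option in step 1 is roughly "sort by the key variables and keep one record per key". A minimal Python sketch of that behaviour follows; the field names are hypothetical and the real job is done in SAS, not Python.

```python
def sort_nodupkey(records, key_fields):
    # Sort by the key fields, then keep the first record seen for each key.
    def key(record):
        return tuple(record[f] for f in key_fields)

    kept, seen = [], set()
    for record in sorted(records, key=key):  # sorted() is stable
        k = key(record)
        if k not in seen:
            seen.add(k)
            kept.append(record)
    return kept


# Hypothetical field names, for illustration only.
admissions = [
    {"surname": "SMITH", "dob": "1970-01-01", "ward": "A"},
    {"surname": "JONES", "dob": "1980-05-05", "ward": "B"},
    {"surname": "SMITH", "dob": "1970-01-01", "ward": "C"},
]
unique = sort_nodupkey(admissions, ["surname", "dob"])
print(unique)
```

Because only the key fields decide what counts as a duplicate, the two SMITH records collapse into one even though their other fields differ.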
18 Appendix
A few of the things we do with SAS to prepare input data for Link King:
Extracting one year of data from four Oracle tables
19 Appendix
Cleaning the data of records about babies who do not yet have a name and records with too many missing values
Rearranging surnames and alternate surnames when some conditions are met
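The cleaning rule on this slide can be sketched as a simple filter. Everything specific below is an assumption for illustration: the placeholder names used for unnamed babies, the field names, and the limit on missing values are not taken from the actual SAS code.

```python
# Hypothetical placeholders that might stand in for a baby's missing first name.
PLACEHOLDER_NAMES = {"", "BABY", "BABY OF", "UNNAMED"}


def is_unnamed(record):
    # Treat a blank or placeholder first name as "not yet named".
    return (record.get("first_name") or "").strip().upper() in PLACEHOLDER_NAMES


def missing_count(record, fields):
    # Count fields that are absent, None, or blank.
    return sum(1 for f in fields if not (record.get(f) or "").strip())


def clean(records, fields, max_missing=1):
    # max_missing is an assumed threshold for "too many missing values".
    return [r for r in records
            if not is_unnamed(r) and missing_count(r, fields) <= max_missing]


fields = ["first_name", "surname", "dob", "sex"]
raw = [
    {"first_name": "Anna", "surname": "Lee", "dob": "1990-02-03", "sex": "F"},
    {"first_name": "Baby", "surname": "Lee", "dob": "2009-06-01", "sex": "M"},
    {"first_name": "Tom", "surname": "", "dob": None, "sex": "M"},
]
cleaned = clean(raw, fields)
print(cleaned)
```

Only the first record survives: the second is an unnamed baby and the third has two missing values, one more than the assumed limit.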
20 Thank you and launching a challenge
For refining the Grey Area:
- We chose to use Febrl because it is more flexible, and it provides a wide range of comparison functions
- We would welcome the creation of an application on the SAS platform allowing the Grey Area to be processed automatically using more comparison functions than Link King has
Any questions and suggestions are welcome.
Contact details:
- work: dinu_corbu@health.qld.gov.au
- home: dinu.corbu@yahoo.com.au