Techniques for Large Scale Data Linking in SAS. By Damien John Melksham

1 Techniques for Large Scale Data Linking in SAS By Damien John Melksham

2 What is Data Linking?
Called everything imaginable: data linking, record linkage, merge/purge, entity resolution, deduplication, fuzzy matching, etc.
What we mean: finding the same entity when no explicit identifier is able to do it sufficiently well (often because of data quality or real-world issues).
Contrast with simple equality merges or joins.
Data linking has applications in administration, health, science, tax, fraud, business, and security.
It sits at a bit of an intersection between statistics, computer science, math, and algorithms (and maybe even philosophy).

3 Pretend Example
Linking death registries to an administrative data set.
Such a technique might be used for insurance or health studies to allow investigations into previously hidden causes, data, and comorbidities.
Death registrations tend to carry variables describing names, age, sex, geography, cause of death, etc.
Please note: All details that follow are completely fictional. Any resemblance to persons living or dead (especially actual death registry entries) is purely coincidental.

4 Pretend Data
d_fname d_sname d_dob d_mob d_yob d_sex d_state
John Smith M NSW
Jane Smoth F VIC
Bob O'toole 34 M QLD

a_fname a_sname a_dob a_mob a_yob a_sex a_state
John Smith M NSW
Jane F VIC
Jeremiah Davies M VIC

5 Techniques Covered
The General Philosophy of Data Linking
SAS Views and Blocking
Efficient Pre-Processing, Macros, and Doing Work Up-Front
Custom PROC FCMP Functions: Distance and String Metrics for SAS
Efficient ordering of statements to match your data
I can't nearly cover everything, so I've tried to focus on topics that might inspire you, that you can apply to other work as well, or that you can take away and use after this presentation (e.g. some free function implementations you can download).

6 Data Linking in a Nutshell
Variables on two data sets provide evidence that records represent the same entity.
Records on two data sets having the same value for the same variable provide evidence that those records represent the same entity.
Of course, there are other comparisons and relationships available apart from equality.
Different variables provide different amounts of evidence, e.g. sex vs first name. It is more significant to find two Damien Melkshams than it is to find two males.
Total evidence combined from many variables helps you find the same entities across multiple data sets.
You might see terms such as deterministic/probabilistic data linking in the literature for specific techniques, but language and deeper math aside, the basic ideas are pretty intuitive.

7 Pretend Example
d_fname d_sname d_dob d_mob d_yob d_sex d_state
John Smith M NSW
a_fname a_sname a_dob a_mob a_yob a_sex a_state
John Smith M NSW
Total = 9 points
Looking like a match!

8 Pretend Example cont.
d_fname d_sname d_dob d_mob d_yob d_sex d_state
John Smith M NSW
a_fname a_sname a_dob a_mob a_yob a_sex a_state
Jane Smoth F VIC
Total = -9 points
Not looking like a match!
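The point totals in the two pretend examples can be sketched in a data step. This is a minimal illustration, not the author's scoring scheme: the per-variable weights (3, 3, 1, 2) and the threshold are assumptions, chosen only so that the totals coincide with the 9 and -9 shown on the slides, and the date-of-birth variables are left out because their values are not given.

```sas
/* Two hypothetical record pairs, already joined side by side. */
data work.pairs;
   length d_fname d_sname a_fname a_sname $20 d_sex a_sex $1 d_state a_state $3;
   input d_fname d_sname d_sex d_state a_fname a_sname a_sex a_state;
   datalines;
John Smith M NSW John Smith M NSW
John Smith M NSW Jane Smoth F VIC
;
run;

data work.scored;
   set work.pairs;
   length decision $16;
   /* Assumed weights: agreement adds the weight, disagreement subtracts it.
      Names carry more evidence than sex, as in the "nutshell" slide. */
   points = ifn(d_fname = a_fname, 3, -3)
          + ifn(d_sname = a_sname, 3, -3)
          + ifn(d_sex   = a_sex,   1, -1)
          + ifn(d_state = a_state, 2, -2);
   /* Hypothetical threshold for calling a pair a likely match. */
   if points >= 5 then decision = 'likely match';
   else decision = 'non-match';
run;
```

With these assumed weights the first pair scores 3 + 3 + 1 + 2 = 9 and the second scores -9.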

9 SAS SQL and Data Step
The SQL and data step languages are actually very well suited to data linking.
SQL lets you easily bring records together on common variables via efficient natural/inner joins.
The data step lets you implement custom rules to tailor a data linking exercise to your particular needs.
SAS functions are brilliant for data cleaning and standardising before bringing records together for comparisons.

10 Computational Feasibility
Data linking might not be so hard if it were computationally easy.
It scales poorly: 20,000,000 records compared to 20,000,000 records = 400,000,000,000,000 possible record combinations.
Computers don't have enough memory or disk space to hold data that big.
People don't have enough time to wait for CPUs to run calculations that big (they would probably appear in the death records before getting their answers).

11 So we cut down on comparisons
Data linkers call this blocking.
Use a cheap method to limit record comparisons before partaking in more expensive comparisons.
In practice this is often an equality join between data sets on high quality variables, because other ways are comparatively complicated and infeasible. (This is a relative strength of SQL.)
Due to errors on data sets, sometimes you have to use a few different block and comparison runs to avoid blocking on missing data, and to maximise total good comparisons.
But be careful using SQL: its hash joins are efficient when the join condition uses only equality clauses combined with logical AND. Combining several groups of equality clauses with logical OR is not efficient (and might crash or hang your computer at the time of writing).

12 SAS Views and Why You Use Them
For large data sets, you probably don't have the disk space to store all comparisons, even from a restrictive equality join.
But with a SAS view, you can work with data that doesn't have to be written to disk. Instead, views supply record pairs one at a time for further processing, but they can be treated like a data set.
A view can implement your blocking strategy, and a further data step (or SQL) can implement your more involved and expensive comparisons, which can then be output to disk if record comparisons result in points that cross a certain threshold.
But remember the restrictions of views (don't sort, no random access).

13 SAS Views: Quick Syntax Examples
DATA STEP VIEW:
data work.block / view=work.block; /* statements that build the view */ run;
PROC SQL VIEW:
PROC SQL; CREATE VIEW work.block AS /* SELECT query that defines the view */ ; QUIT;
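As a rough sketch of how a blocking view and a later comparison step might fit together: the data set name work.admin, the choice of state and sex as blocking variables, the weights, and the cut-off are all assumptions made for illustration, not details from the presentation.

```sas
/* Tiny stand-in data sets; work.admin is a hypothetical name. */
data work.deaths;
   length d_fname d_sname $20 d_sex $1 d_state $3;
   input d_fname d_sname d_sex d_state;
   datalines;
John Smith M NSW
Jane Smoth F VIC
;
run;

data work.admin;
   length a_fname a_sname $20 a_sex $1 a_state $3;
   input a_fname a_sname a_sex a_state;
   datalines;
John Smith M NSW
Jeremiah Davies M VIC
;
run;

/* Blocking: a cheap equality join using only AND, stored as a view
   so that the candidate pairs are never written to disk. */
proc sql;
   create view work.block as
   select d.*, a.*
   from work.deaths as d
        inner join work.admin as a
        on d.d_state = a.a_state and d.d_sex = a.a_sex;
quit;

/* Comparison: pairs stream out of the view one at a time, and only
   pairs that reach the (hypothetical) cut-off are kept. */
data work.links;
   set work.block;
   points = ifn(d_fname = a_fname, 3, -3)
          + ifn(d_sname = a_sname, 3, -3);
   if points >= 3;
run;
```

Only work.links is written to disk; the view itself holds no rows.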

14 Drop lots, or keep very little
There's a good chance that you don't need all variables on both data sets that you're joining for the comparison stage, so drop what you can as early as possible.
Data set options, available wherever you reference a data set, help you do this with the KEEP= and DROP= options.
The two complement each other well and can lead to efficient operations.
Example 1: work.deaths(keep = d_fname d_dob)
Example 2: work.deaths(drop = d_sex)

15 And while we're on data set options
There's a good chance you'll want to use these data set options:
DROP
KEEP
RENAME
WHERE
I always had trouble remembering how these interacted with each other until someone told me these four are applied in alphabetical order.
Example: work.deaths(keep = d_fname d_dob rename = (d_fname = fname) where = (fname = "John"))

16 Bring calculations forward using macro language if you can
In data linking, much of your time is going to be spent waiting for the compiled code to run, so anything you can do to speed that up is usually beneficial.
Use macros to bring appropriate calculations forward into macro pre-processing, so that the most efficient run-time code possible is written.
SAS is actually quite good at this, due to the historical focus on macro operations relative to functions.
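One way to picture this is a macro that resolves a list of variables and weights before the data step compiles, so the run-time code is plain straight-line assignments with no looping or list handling per record. The macro name score_vars, its parameters, and the weights are hypothetical; this is a sketch of the idea rather than anything taken from the author's ICARUS macros.

```sas
/* Hypothetical macro: the %DO loop runs once, at macro time, and
   generates one scoring statement per variable. */
%macro score_vars(vars=, weights=);
   %local i var w;
   points = 0;
   %do i = 1 %to %sysfunc(countw(&vars));
      %let var = %scan(&vars, &i);
      %let w   = %scan(&weights, &i);
      points = points + ifn(d_&var = a_&var, &w, -&w);
   %end;
%mend score_vars;

/* Hypothetical record pairs to score. */
data work.pairs;
   length d_fname d_sname a_fname a_sname $20 d_sex a_sex $1 d_state a_state $3;
   input d_fname d_sname d_sex d_state a_fname a_sname a_sex a_state;
   datalines;
John Smith M NSW John Smith M NSW
John Smith M NSW Jane Smoth F VIC
;
run;

data work.scored;
   set work.pairs;
   /* Expands to four assignment statements before the step compiles. */
   %score_vars(vars=fname sname sex state, weights=3 3 1 2)
run;
```

Running with OPTIONS MPRINT shows the generated statements in the log, which is a convenient way to check what the data step actually executes.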

17 Recode during data cleaning, not comparison
Some functions are quite expensive and lend themselves more to cleaning data than to being used during comparisons.
Phonetic algorithms are a good example. They map several names to similar encodings and deal with typos, but are expensive. Say 10 CPU cycles per encoding, e.g. James -> JMS, Jomes -> JMS.
An equality operation is cheap. Say 1 CPU cycle.
If you have 10 records on each of two data sets that need comparing to each other, it's cheaper to encode the data first and then compare encodings via equality, rather than calculate encodings for each comparison and then compare via equality.
Encode and then compare: (2 * 10 * 10) + (10 * 10 * 1) = 300 CPU cycles (20 encodings at 10 cycles each, plus 100 equality comparisons)
Encode during comparison: (10 * 10 * 2 * 10) + (10 * 10 * 1) = 2,100 CPU cycles (2 encodings at 10 cycles each for every one of the 100 comparisons, plus 100 equality comparisons)

18 Efficient conditionals and code written for the data
Lastly, if you know your data is not distributed uniformly, you might be able to increase run-time efficiency by ordering the conditional work in your code according to the probability of each value, and relative to the amount of work each value entails.
In English: if your data is 80% male, 15% female, and 5% missing, and if all three states require you to do the same amount of work, consider performing actions in the order of: if male, else if female, else.
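A small sketch of that ordering in a data step, assuming 'M' is the most common value in the data. The data set, the point values, and the rule of giving zero points when sex is missing are all assumptions for illustration.

```sas
/* Hypothetical record pairs; a single period reads as a missing value. */
data work.sex_pairs;
   length d_sex a_sex $1;
   input d_sex a_sex;
   datalines;
M M
M F
F F
. M
;
run;

data work.sex_scored;
   set work.sex_pairs;
   /* Most frequent value first: most rows leave the chain after one test. */
   if d_sex = 'M' then sex_point = ifn(a_sex = 'M', 1, -1);
   else if d_sex = 'F' then sex_point = ifn(a_sex = 'F', 1, -1);
   /* Assumed rule: a missing value is evidence neither for nor against. */
   else sex_point = 0;
run;
```

If one branch were much more expensive than the others, its cost as well as its frequency would need to be weighed when choosing the order.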

19 SAS ships with several functions suited to data linking
SOUNDEX
SPEDIS
COMPLEV
COMPGED
But there are a few more in the literature, and I've made my basic FCMP implementations available, which you're free to use if so inclined.

20 PROC FCMP SAS Functions
Caverphone 2.0
Double Metaphone
Jaro/Winkler
NYSIIS
Chebyshev/Cityblock/Euclidean/Hamming/Minkowski Distances
Fuzzy number matching
Expected-unique persons
You can obtain the PROC FCMP source code for these functions from:
You will have to change the libraries in the PROC FCMP call and let SAS know where to find the custom functions. See the PROC FCMP documentation.
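For readers who have not used PROC FCMP, the mechanics of "changing the libraries" and "letting SAS know where to find the custom functions" look roughly like this. The function below is a deliberately simple Hamming distance written for this illustration; it is not the author's implementation, and the names hamming_sketch, work.funcs, and linking are assumptions.

```sas
/* Compile a custom function into a function library (OUTLIB=). */
proc fcmp outlib=work.funcs.linking;
   function hamming_sketch(s1 $, s2 $);
      /* Hamming distance is only defined for equal-length strings. */
      if length(s1) ne length(s2) then return(.);
      d = 0;
      do i = 1 to length(s1);
         if substr(s1, i, 1) ne substr(s2, i, 1) then d = d + 1;
      end;
      return(d);
   endsub;
run;

/* Tell SAS where to look for compiled functions. */
options cmplib=work.funcs;

data _null_;
   /* The two names differ in one position (i versus o). */
   d = hamming_sketch('Smith', 'Smoth');
   put d=;
run;
```

Storing the library somewhere permanent instead of WORK means the functions survive between sessions, with CMPLIB= pointed at that location.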

21 Phonetic Algorithms
Caverphone 2.0
Double Metaphone
NYSIIS
Soundex (SAS native function)
These algorithms all try to map several names/words to simple codes.
Each has its own strengths and weaknesses. Don't think they're perfect.
They're also typically pretty expensive/slow in terms of CPU.
E.g.
d_fname_code = soundex(d_fname);
a_fname_code = soundex(a_fname);
Then compare d_fname_code and a_fname_code in the resulting joined data set with an equality operator, and assign points as appropriate.
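Put together with the earlier advice to recode during cleaning, the pattern might look like the following sketch, where each record is encoded once and the join only compares the stored codes. The data set names, the blocking on state, and the point values are assumptions for illustration.

```sas
/* Cleaning stage: each record is encoded exactly once. */
data work.deaths_clean;
   length d_fname $20 d_state $3;
   input d_fname d_state;
   d_fname_code = soundex(d_fname);
   datalines;
James NSW
Jane VIC
;
run;

data work.admin_clean;
   length a_fname $20 a_state $3;
   input a_fname a_state;
   a_fname_code = soundex(a_fname);
   datalines;
Jomes NSW
Jeremiah VIC
;
run;

/* Comparison stage: only cheap equality tests on the stored codes. */
proc sql;
   create table work.compared as
   select d.*, a.*,
          case when d.d_fname_code = a.a_fname_code then 1 else -1 end
             as fname_point
   from work.deaths_clean as d
        inner join work.admin_clean as a
        on d.d_state = a.a_state;
quit;
```

The same layout would apply to any of the other phonetic encodings once they are available as functions.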

22 Jaro/Winkler string similarity functions
Designed to be relatively cheap, so they can be used in comparisons.
Still far more expensive than equality or most numeric operations.
To be used for short strings like first names, surnames, proper names, and street names.
My free implementation only works up to 52 characters in length, which is not really a bad thing.
Reliability drops as strings get comparatively long or extremely short.
Reports similarity between two strings as a number between 1 (representing equality) and 0 (representing absolute disagreement).
You'll typically use these with conditions or cut-offs to assign points that deal with some fuzziness or typographical error, e.g.
if jaro(d_fname, a_fname) >= 0.9 then name_point = 1; else name_point = -1;

23 There's a lot more as well
You can go about as deep into data linking as you're prepared to.
There are a lot more techniques I haven't managed to go into today.
I wrote my own system of data linking macros (ICARUS) to specialise in the topic and automate as many of these issues as possible.
Source is not currently publicly available, but a copy of the documentation can be found at
Documentation for the new functions mentioned in this presentation can be found in the ICARUS user manual at the above link.

24 Questions/Contacts/Further Information: Further questions and comments, please direct communications to: or


More information

Making ERAS work for you

Making ERAS work for you Making ERAS work for you (Beyond the basics) If you ve used ERAS for your application season, you ve probably mastered the basics; collecting mail from ERAS Post Office, checking boxes in the status listing,

More information

UV Mapping to avoid texture flaws and enable proper shading

UV Mapping to avoid texture flaws and enable proper shading UV Mapping to avoid texture flaws and enable proper shading Foreword: Throughout this tutorial I am going to be using Maya s built in UV Mapping utility, which I am going to base my projections on individual

More information

How Your First Program Works

How Your First Program Works How Your First Program Works Section 2: How Your First Program Works How Programs Are Structured...19 Method Main ( )...21 How Programs Are Structured In Section 1, you typed in and ran your first program

More information

Data organization. So what kind of data did we collect?

Data organization. So what kind of data did we collect? Data organization Suppose we go out and collect some data. What do we do with it? First we need to figure out what kind of data we have. To illustrate, let s do a simple experiment and collect the height

More information

Advanced Database Systems

Advanced Database Systems Lecture IV Query Processing Kyumars Sheykh Esmaili Basic Steps in Query Processing 2 Query Optimization Many equivalent execution plans Choosing the best one Based on Heuristics, Cost Will be discussed

More information

New Vs. Old Under the Hood with Procs CONTENTS and COMPARE Patricia Hettinger, SAS Professional, Oakbrook Terrace, IL

New Vs. Old Under the Hood with Procs CONTENTS and COMPARE Patricia Hettinger, SAS Professional, Oakbrook Terrace, IL Paper SS-03 New Vs. Old Under the Hood with Procs CONTENTS and COMPARE Patricia Hettinger, SAS Professional, Oakbrook Terrace, IL ABSTRACT There s SuperCE for comparing text files on the mainframe. Diff

More information

Hot Fuzz: using SAS fuzzy data matching techniques to identify duplicate and related customer records. Stuart Edwards, Collection House Group

Hot Fuzz: using SAS fuzzy data matching techniques to identify duplicate and related customer records. Stuart Edwards, Collection House Group Hot Fuzz: using SAS fuzzy data matching techniques to identify duplicate and related customer records. Stuart Edwards, Collection House Group Collection House Group; what do we do? Debt Collection; purchased

More information

Native POSIX Thread Library (NPTL) CSE 506 Don Porter

Native POSIX Thread Library (NPTL) CSE 506 Don Porter Native POSIX Thread Library (NPTL) CSE 506 Don Porter Logical Diagram Binary Memory Threads Formats Allocators Today s Lecture Scheduling System Calls threads RCU File System Networking Sync User Kernel

More information

The Benefits of SMS as a Marketing and Communications Channel From The Chat Bubble written by Michael

The Benefits of SMS as a Marketing and Communications Channel From The Chat Bubble written by Michael The Benefits of SMS as a Marketing and Communications Channel 1 Why companies and organizations should do SMS. We re going to talk through from an organization or marketers point of view, what SMS is good

More information

practical Perl tools: let me help you get regular

practical Perl tools: let me help you get regular David N. Blank-Edelman practical Perl tools: let me help you get regular David N. Blank-Edelman is the director of technology at the Northeastern University College of Computer and Information Science

More information

In the previous lecture we went over the process of building a search. We identified the major concepts of a topic. We used Boolean to define the

In the previous lecture we went over the process of building a search. We identified the major concepts of a topic. We used Boolean to define the In the previous lecture we went over the process of building a search. We identified the major concepts of a topic. We used Boolean to define the relationships between concepts. And we discussed common

More information

Chamberlin and Boyce - SEQUEL: A Structured English Query Language

Chamberlin and Boyce - SEQUEL: A Structured English Query Language Programming Languages (CS302 2007S) Chamberlin and Boyce - SEQUEL: A Structured English Query Language Comments on: Chamberlin, D. D. and Boyce, R. F. (1974). SEQUEL: A Structured English Query Language.

More information

Multi-Level Feedback Queues

Multi-Level Feedback Queues CS 326: Operating Systems Multi-Level Feedback Queues Lecture 8 Today s Schedule Building an Ideal Scheduler Priority-Based Scheduling Multi-Level Queues Multi-Level Feedback Queues Scheduling Domains

More information

4.1 Review - the DPLL procedure

4.1 Review - the DPLL procedure Applied Logic Lecture 4: Efficient SAT solving CS 4860 Spring 2009 Thursday, January 29, 2009 The main purpose of these notes is to help me organize the material that I used to teach today s lecture. They

More information

Google Drive: Access and organize your files

Google Drive: Access and organize your files Google Drive: Access and organize your files Use Google Drive to store and access your files, folders, and Google Docs anywhere. Change a file on the web, your computer, or your mobile device, and it updates

More information

Background. Let s see what we prescribed.

Background. Let s see what we prescribed. Background Patient B s custom application had slowed down as their data grew. They d tried several different relief efforts over time, but performance issues kept popping up especially deadlocks. They

More information

Choosing the Right Technique to Merge Large Data Sets Efficiently Qingfeng Liang, Community Care Behavioral Health Organization, Pittsburgh, PA

Choosing the Right Technique to Merge Large Data Sets Efficiently Qingfeng Liang, Community Care Behavioral Health Organization, Pittsburgh, PA Choosing the Right Technique to Merge Large Data Sets Efficiently Qingfeng Liang, Community Care Behavioral Health Organization, Pittsburgh, PA ABSTRACT This paper outlines different SAS merging techniques

More information

Git. all meaningful operations can be expressed in terms of the rebase command. -Linus Torvalds, 2015

Git. all meaningful operations can be expressed in terms of the rebase command. -Linus Torvalds, 2015 Git all meaningful operations can be expressed in terms of the rebase command -Linus Torvalds, 2015 a talk by alum Ross Schlaikjer for the GNU/Linux Users Group Sound familiar? add commit diff init clone

More information

SQL Server Whitepaper

SQL Server Whitepaper SQL Server Whitepaper INITIALIZING REPLICATION FROM BACKUP BY KENNETH FISHER and ROBERT L DAVIS Applies to: SQL Server 2008, SQL Server 2008 R2, SQL Server 2012, SQL Server 2014, SQL Server 2016 SUMMARY

More information

Embedded Linux Day 2

Embedded Linux Day 2 Embedded Linux Day 2 Stuffs HW1 posted today Shooting for 1-2 hours. Review scheduling stuff & licensing. HW0 in lab Sign up for group meetings for next Thursday posted today. Review I got a number of

More information

EXAMPLE 3: MATCHING DATA FROM RESPONDENTS AT 2 OR MORE WAVES (LONG FORMAT)

EXAMPLE 3: MATCHING DATA FROM RESPONDENTS AT 2 OR MORE WAVES (LONG FORMAT) EXAMPLE 3: MATCHING DATA FROM RESPONDENTS AT 2 OR MORE WAVES (LONG FORMAT) DESCRIPTION: This example shows how to combine the data on respondents from the first two waves of Understanding Society into

More information

VoIP INTERNET-BASED PHONE SYSTEMS CHOCK FULL OF FEATURES

VoIP INTERNET-BASED PHONE SYSTEMS CHOCK FULL OF FEATURES VoIP INTERNET-BASED PHONE SYSTEMS CHOCK FULL OF FEATURES VoIP Internet-based phone systems chock full of features TABLE OF CONTENTS What is VoIP? Switching to VoIP is easy Business Telecom Features Improved

More information

CS61A Notes Week 6: Scheme1, Data Directed Programming You Are Scheme and don t let anyone tell you otherwise

CS61A Notes Week 6: Scheme1, Data Directed Programming You Are Scheme and don t let anyone tell you otherwise CS61A Notes Week 6: Scheme1, Data Directed Programming You Are Scheme and don t let anyone tell you otherwise If you re not already crazy about Scheme (and I m sure you are), then here s something to get

More information

CTI-TC Weekly Working Sessions

CTI-TC Weekly Working Sessions CTI-TC Weekly Working Sessions Meeting Date: October 4, 2016 Time: 15:00:00 UTC Purpose: Weekly CTI-TC Joint Working Session Attendees: Agenda: Jordan Trey Darley Wunder Ivan Kirillov Stephen Banghart

More information

Spectroscopic Analysis: Peak Detector

Spectroscopic Analysis: Peak Detector Electronics and Instrumentation Laboratory Sacramento State Physics Department Spectroscopic Analysis: Peak Detector Purpose: The purpose of this experiment is a common sort of experiment in spectroscopy.

More information

Testing is a very big and important topic when it comes to software development. Testing has a number of aspects that need to be considered.

Testing is a very big and important topic when it comes to software development. Testing has a number of aspects that need to be considered. Testing Testing is a very big and important topic when it comes to software development. Testing has a number of aspects that need to be considered. System stability is the system going to crash or not?

More information

Surfing the SAS cache

Surfing the SAS cache Surfing the SAS cache to improve optimisation Michael Thompson Department of Employment / Quantam Solutions Background Did first basic SAS course in 1989 Didn t get it at all Actively avoided SAS programing

More information

Virtual Memory. ICS332 Operating Systems

Virtual Memory. ICS332 Operating Systems Virtual Memory ICS332 Operating Systems Virtual Memory Allow a process to execute while not completely in memory Part of the address space is kept on disk So far, we have assumed that the full address

More information

XP: Backup Your Important Files for Safety

XP: Backup Your Important Files for Safety XP: Backup Your Important Files for Safety X 380 / 1 Protect Your Personal Files Against Accidental Loss with XP s Backup Wizard Your computer contains a great many important files, but when it comes to

More information

Getting started with simulating data in R: some helpful functions and how to use them Ariel Muldoon August 28, 2018

Getting started with simulating data in R: some helpful functions and how to use them Ariel Muldoon August 28, 2018 Getting started with simulating data in R: some helpful functions and how to use them Ariel Muldoon August 28, 2018 Contents Overview 2 Generating random numbers 2 rnorm() to generate random numbers from

More information

dtalink Faster probabilistic record linking and deduplication methods in Stata for large data files Keith Kranker

dtalink Faster probabilistic record linking and deduplication methods in Stata for large data files Keith Kranker dtalink Faster probabilistic record linking and deduplication methods in Stata for large data files Presentation at the 2018 Stata Conference Columbus, Ohio July 20, 2018 Keith Kranker Abstract Stata users

More information

CSC209. Software Tools and Systems Programming. https://mcs.utm.utoronto.ca/~209

CSC209. Software Tools and Systems Programming. https://mcs.utm.utoronto.ca/~209 CSC209 Software Tools and Systems Programming https://mcs.utm.utoronto.ca/~209 What is this Course About? Software Tools Using them Building them Systems Programming Quirks of C The file system System

More information

COSC 2P91. Introduction Part Deux. Week 1b. Brock University. Brock University (Week 1b) Introduction Part Deux 1 / 14

COSC 2P91. Introduction Part Deux. Week 1b. Brock University. Brock University (Week 1b) Introduction Part Deux 1 / 14 COSC 2P91 Introduction Part Deux Week 1b Brock University Brock University (Week 1b) Introduction Part Deux 1 / 14 Source Files Like most other compiled languages, we ll be dealing with a few different

More information

Read & Download (PDF Kindle) Data Structures And Other Objects Using Java (4th Edition)

Read & Download (PDF Kindle) Data Structures And Other Objects Using Java (4th Edition) Read & Download (PDF Kindle) Data Structures And Other Objects Using Java (4th Edition) Data Structures and Other Objects Using Java is a gradual, "just-in-time" introduction to Data Structures for a CS2

More information

ML from Large Datasets

ML from Large Datasets 10-605 ML from Large Datasets 1 Announcements HW1b is going out today You should now be on autolab have a an account on stoat a locally-administered Hadoop cluster shortly receive a coupon for Amazon Web

More information

Mount Points Mount Points is a super simple tool for connecting objects together and managing those relationships.

Mount Points Mount Points is a super simple tool for connecting objects together and managing those relationships. Mount Points Mount Points is a super simple tool for connecting objects together and managing those relationships. With Mount Points, you can simply drag two objects together and when their mount points

More information