Techniques for Large Scale Data Linking in SAS. By Damien John Melksham
2 What is Data Linking?
- Called everything imaginable: data linking, record linkage, merge/purge, entity resolution, deduplication, fuzzy matching, etc.
- What we mean: finding the same entity when no explicit identifier can do it sufficiently well (often because of data quality or real-world issues). Contrast this with simple equality merges or joins.
- Data linking has applications in administration, health, science, tax, fraud, business and security.
- It sits at an intersection of statistics, computer science, mathematics and algorithms (and maybe even philosophy).
3 Pretend Example: Linking death registries to an administrative data set
- Such a technique might be used in insurance or health studies to allow investigation of previously hidden causes, data and comorbidities.
- Death registrations tend to carry variables describing names, age, sex, geography, cause of death, etc.
- Please note: all details that follow are completely fictional. Any resemblance to persons living or dead (especially actual death registry entries) is purely coincidental.
4 Pretend Data
Death registry (d_fname, d_sname, d_dob, d_mob, d_yob, d_sex, d_state):
- John Smith, M, NSW
- Jane Smoth, F, VIC
- Bob O'toole, 34, M, QLD
Administrative data (a_fname, a_sname, a_dob, a_mob, a_yob, a_sex, a_state):
- John Smith, M, NSW
- Jane, F, VIC
- Jeremiah Davies, M, VIC
5 Techniques Covered
- The General Philosophy of Data Linking
- SAS Views and Blocking
- Efficient Pre-Processing, Macros, and Doing Work Up-Front
- Custom PROC FCMP Functions: Distance and String Metrics for SAS
- Efficient ordering of statements to match your data
I can't come close to covering everything, so I've tried to focus on topics that might inspire you, that you can apply to other work, or that you can take away and use after this presentation (e.g. some free function implementations you can download).
6 Data Linking in a Nutshell
- Variables on two data sets provide evidence that records represent the same entity.
- Records on two data sets having the same value for the same variable provide evidence that those records represent the same entity. Of course, there are other comparisons and relationships available apart from equality.
- Different variables provide different amounts of evidence, e.g. sex vs. first name: it is more significant to find two Damien Melkshams than it is to find two males.
- The total evidence combined from many variables helps you find the same entities across multiple data sets.
- You might see terms such as deterministic/probabilistic data linking in the literature for specific techniques, but language and deeper math aside, the basic ideas are pretty intuitive.
7 Pretend Example
Death registry record (d_ variables): John Smith, M, NSW
Administrative record (a_ variables): John Smith, M, NSW
Total = 9 points. Looking like a match!
8 Pretend Example (cont.)
Death registry record (d_ variables): John Smith, M, NSW
Administrative record (a_ variables): Jane Smoth, F, VIC
Total = -9 points. Not looking like a match!
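The points idea behind the two pretend examples can be sketched as a DATA step. This is a minimal illustration, not the scoring used in the presentation: the data set work.pairs, the weights (plus or minus 4 for names, 2 for state, 1 for sex) and the rule that a missing value adds no evidence are all assumptions, chosen only to show rarer agreements carrying more weight.

```sas
/* Hypothetical candidate pairs: one true match, one non-match */
data work.pairs;
  infile datalines dlm=',' truncover;
  length d_fname d_sname a_fname a_sname $20 d_sex a_sex $1 d_state a_state $3;
  input d_fname d_sname d_sex d_state a_fname a_sname a_sex a_state;
  datalines;
John,Smith,M,NSW,John,Smith,M,NSW
John,Smith,M,NSW,Jane,Smoth,F,VIC
;
run;

data work.scored;
  set work.pairs;
  points = 0;  /* plain assignment, so the total is reset for every pair */
  /* CMISS(...) = 0 means both values are present; a missing value adds no evidence */
  if cmiss(d_fname, a_fname) = 0 then points = points + ifn(d_fname = a_fname, 4, -4);
  if cmiss(d_sname, a_sname) = 0 then points = points + ifn(d_sname = a_sname, 4, -4);
  if cmiss(d_state, a_state) = 0 then points = points + ifn(d_state = a_state, 2, -2);
  if cmiss(d_sex,   a_sex)   = 0 then points = points + ifn(d_sex   = a_sex,   1, -1);
  likely_match = (points > 0);
run;
```

With these hypothetical weights the first pair scores 11 and the second -11. Note that a sum statement such as `points + 4;` would implicitly RETAIN the total across pairs, which is why explicit assignments are used.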
9 SAS SQL and Data Step
- The SQL and DATA step languages are actually very well suited to data linking.
- SQL lets you easily bring records together on common variables via efficient natural/inner joins.
- The DATA step lets you implement custom rules to tailor a data linking exercise to your particular needs.
- SAS functions are brilliant for cleaning and standardising data before bringing records together for comparison.
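A small sketch of standardising names with SAS functions before any comparison takes place. The data set names and the choice to keep letters only (which also removes apostrophes and the hyphen in a double-barrelled surname) are illustrative assumptions; real cleaning rules depend on your data.

```sas
/* Hypothetical raw names with inconsistent case and punctuation */
data work.raw;
  infile datalines dlm=',' truncover;
  length fname sname $20;
  input fname sname;
  datalines;
john,O'toole
JANE,Smoth-Jones
;
run;

data work.clean;
  set work.raw;
  length fname_std sname_std $20;
  /* Upper-case first, then keep ('k') alphabetic ('a') characters only */
  fname_std = compress(upcase(fname), , 'ka');
  sname_std = compress(upcase(sname), , 'ka');
run;
```

After this step, "O'toole" becomes OTOOLE and "Smoth-Jones" becomes SMOTHJONES, so later equality joins are not defeated by case or punctuation.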
10 Computational Feasibility
- Data linking might not be so hard if it were computationally easy. It scales poorly: 20,000,000 records compared against 20,000,000 records gives 400,000,000,000,000 possible record combinations.
- Computers don't have enough memory or disk space to hold data that big.
- People don't have enough time to wait for CPUs to run calculations that big (they would probably appear in the death records before getting their answers).
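The pair count is simple multiplication: every record on one file meets every record on the other. The sketch below reproduces it and, purely as an assumption for illustration, shows what happens if both files split into 8 equally sized blocks (for example by state): the number of comparisons is divided by 8.

```sas
data work.pair_counts;
  n_deaths = 20000000;
  n_admin  = 20000000;
  all_pairs = n_deaths * n_admin;                      /* 4E14 comparisons */
  k = 8;                                               /* hypothetical number of equal blocks */
  blocked_pairs = k * (n_deaths / k) * (n_admin / k);  /* pairs only within each block */
  put all_pairs= comma24. blocked_pairs= comma24.;
run;
```

Real blocks are never equal in size, and the largest block tends to dominate the cost, but the order-of-magnitude saving is the point.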
11 So we cut down on comparisons
- Data linkers call this blocking: use a cheap method to limit record comparisons before undertaking more expensive comparisons.
- In practice this is often an equality join between data sets on high-quality variables, because other ways are comparatively complicated and infeasible. (This is a relative strength of SQL.)
- Due to errors on data sets, you sometimes have to use a few different blocking and comparison runs, to avoid blocking on missing data and to maximise the total number of good comparisons.
- But be careful using SQL: its hash joins on equality conditions combined with logical AND are efficient. Combining several such equality-and-AND clauses with logical OR is not (and, at the time of writing, might crash or hang your computer).
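A sketch of two blocking passes in PROC SQL, assuming hypothetical data sets work.deaths and work.admin with ID variables d_id and a_id. Each pass joins only on equalities combined with AND; instead of one join with OR between the passes, the candidate pairs are combined afterwards with UNION. WHERE= data set options drop records whose blocking variable is missing, so blanks never "agree" with blanks.

```sas
data work.deaths;
  infile datalines dlm=',' truncover;
  length d_fname d_sname $20 d_sex $1 d_state $3;
  input d_id d_fname d_sname d_sex d_state;
  datalines;
1,John,Smith,M,NSW
2,Jane,Smoth,F,VIC
3,Bob,O'toole,M,QLD
;
run;

data work.admin;
  infile datalines dlm=',' truncover;
  length a_fname a_sname $20 a_sex $1 a_state $3;
  input a_id a_fname a_sname a_sex a_state;   /* a single period reads as missing */
  datalines;
101,John,Smith,M,NSW
102,Jane,.,F,VIC
103,Jeremiah,Davies,M,VIC
;
run;

proc sql;
  /* Pass 1: block on surname AND state */
  create table work.cand1 as
    select d.d_id, a.a_id
    from work.deaths(where=(not missing(d_sname))) as d
         inner join work.admin(where=(not missing(a_sname))) as a
      on d.d_sname = a.a_sname and d.d_state = a.a_state;

  /* Pass 2: block on first name AND sex, catching pairs pass 1 cannot see */
  create table work.cand2 as
    select d.d_id, a.a_id
    from work.deaths(where=(not missing(d_fname))) as d
         inner join work.admin(where=(not missing(a_fname))) as a
      on d.d_fname = a.a_fname and d.d_sex = a.a_sex;

  /* UNION (not UNION ALL) removes pairs found by both passes */
  create table work.candidates as
    select * from work.cand1
    union
    select * from work.cand2;
quit;
```

Pass 1 finds only the John Smith pair (Jane's administrative surname is missing); pass 2 finds both John and Jane, and the union leaves two distinct candidate pairs.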
12 SAS Views and Why You Use Them
- For large data sets, you probably don't have the disk space to store all comparisons, even from a restrictive equality join.
- But with a SAS view, you can work with data that doesn't have to be written to disk. Instead, a view supplies record pairs one at a time for further processing, yet it can be treated like a data set.
- A view can implement your blocking strategy, and a further DATA step (or SQL) can implement your more involved and expensive comparisons, writing a record pair to disk only if its points cross a certain threshold.
- But remember the restrictions of views (no sorting, no random access).
13 SAS Views: Quick Syntax Examples
DATA step view:
    data work.block / view=work.block;
       ... ;
    run;
PROC SQL view:
    proc sql;
       create view work.block as
       select ... ;
    quit;
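Putting the view syntax to work: a PROC SQL view does the blocking, and a DATA step reads pairs from it one at a time, scores them and writes out only those at or above a threshold. The data, the plus or minus 4 name weights and the threshold of 4 points are hypothetical; only work.links is written to disk.

```sas
data work.deaths;
  infile datalines dlm=',' truncover;
  length d_fname d_sname $20 d_sex $1 d_state $3;
  input d_id d_fname d_sname d_sex d_state;
  datalines;
1,John,Smith,M,NSW
2,Jane,Smoth,F,VIC
3,Bob,O'toole,M,QLD
;
run;

data work.admin;
  infile datalines dlm=',' truncover;
  length a_fname a_sname $20 a_sex $1 a_state $3;
  input a_id a_fname a_sname a_sex a_state;
  datalines;
101,John,Smith,M,NSW
102,Jane,.,F,VIC
103,Jeremiah,Davies,M,VIC
;
run;

/* Blocking: the view stores only the query, not the joined rows */
proc sql;
  create view work.block as
    select d.*, a.*
    from work.deaths as d inner join work.admin as a
      on d.d_state = a.a_state and d.d_sex = a.a_sex;
quit;

/* Comparison: executes the view and receives pairs one at a time */
data work.links;
  set work.block;
  points = 0;
  if cmiss(d_fname, a_fname) = 0 then points = points + ifn(d_fname = a_fname, 4, -4);
  if cmiss(d_sname, a_sname) = 0 then points = points + ifn(d_sname = a_sname, 4, -4);
  if points >= 4 then output;   /* hypothetical threshold */
  keep d_id a_id points;
run;
```

Here the John Smith pair scores 8 and the Jane pair scores 4 (her missing administrative surname contributes nothing), so both are kept.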
14 Drop lots, or keep very little
- There's a good chance that you don't need all the variables on both data sets you're joining for the comparison stage, so drop what you can as early as possible.
- The KEEP= and DROP= data set options, usable wherever you reference a data set, help you do this. The two complement each other well and can lead to efficient operations.
- Example 1: work.deaths(keep=d_fname d_dob)
- Example 2: work.deaths(drop=d_sex)
15 And while we're on data set options
- There's a good chance you'll want to use these data set options: DROP=, KEEP=, RENAME=, WHERE=.
- I always had trouble remembering how these interact until someone told me that these four are applied in alphabetical order: DROP= and KEEP= refer to the original variable names, RENAME= is applied next, and WHERE= therefore refers to the new names.
- Example: work.deaths(keep=d_fname d_dob rename=(d_fname=fname) where=(fname='John'))
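A runnable version of the example above, with a tiny hypothetical work.deaths (the d_dob values are made up). KEEP= names the original variables, RENAME= is applied next, and WHERE= therefore tests the new name fname.

```sas
data work.deaths;
  infile datalines dlm=',' truncover;
  length d_fname $20 d_sex $1;
  input d_fname d_dob d_sex;
  datalines;
John,12,M
Jane,3,F
;
run;

data work.johns;
  /* d_sex is never read; d_fname arrives as fname; only John survives WHERE= */
  set work.deaths(keep=d_fname d_dob rename=(d_fname=fname) where=(fname='John'));
run;
```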
16 Bring calculations forward using macro language if you can
- In data linking, much of your time is going to be spent waiting for the compiled code to run, so anything you can do to speed that up is usually beneficial.
- Use macros to bring appropriate calculations forward into macro pre-processing, so that the run-time code is as efficient as possible.
- SAS is actually quite good at this, due to its historical focus on macro operations relative to functions.
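One way to bring a calculation forward: compute log-ratio agreement and disagreement weights once, with %SYSFUNC and %SYSEVALF during macro processing, so the DATA step is compiled with plain numeric constants instead of calling LOG2 for every pair. The m/u probabilities (0.95 and 0.01) and this weighting scheme are hypothetical, in the style of probabilistic linking.

```sas
/* Hypothetical probabilities of first-name agreement among true matches (m) and non-matches (u) */
%let m_fname = 0.95;
%let u_fname = 0.01;

/* Evaluated once by the macro processor, before the DATA step is compiled */
%let w_agree    = %sysfunc(log2(%sysevalf(&m_fname / &u_fname)));
%let w_disagree = %sysfunc(log2(%sysevalf((1 - &m_fname) / (1 - &u_fname))));
%put NOTE: agree weight=&w_agree disagree weight=&w_disagree;

data work.pairs;
  infile datalines dlm=',' truncover;
  length d_fname a_fname $20;
  input d_fname a_fname;
  datalines;
JOHN,JOHN
JANE,JEREMIAH
;
run;

data work.scored;
  set work.pairs;
  /* The compiled code holds only constants; no LOG2 call per pair */
  if d_fname = a_fname then w = &w_agree;
  else w = &w_disagree;
run;
```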
17 Recode during data cleaning, not comparison
- Some functions are quite expensive and lend themselves more to cleaning data than to use during comparisons.
- Phonetic algorithms are a good example. They map several names to similar encodings and deal with typos, but are expensive: say 10 CPU cycles per encoding, e.g. James -> JMS, Jomes -> JMS.
- An equality operation is cheap: say 1 CPU cycle.
- If you have 10 records on each of two data sets that need comparing to each other, it's cheaper to encode the data once and then compare encodings via equality, rather than calculate encodings for each comparison and then compare via equality:
  Encode and then compare: (2 * 10 * 10) + (10 * 10 * 1) = 300 CPU cycles
  Encode during comparison: (10 * 10 * 2 * 10) + (10 * 10 * 1) = 2,100 CPU cycles
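The cost model can be checked with a few lines of arithmetic. The costs (10 cycles per encoding, 1 per equality test) are the slide's illustrative figures, and every record on one file is assumed to be compared with every record on the other.

```sas
data work.costs;
  n_d = 10;        /* records on the death registry */
  n_a = 10;        /* records on the administrative data */
  c_encode = 10;   /* illustrative cost of one phonetic encoding */
  c_equal  = 1;    /* illustrative cost of one equality test */
  pairs = n_d * n_a;
  /* Each record encoded once, then every pair compared */
  encode_first  = (n_d + n_a) * c_encode + pairs * c_equal;
  /* Both names re-encoded for every pair, then compared */
  encode_inline = pairs * 2 * c_encode + pairs * c_equal;
run;
```

The gap widens with scale: encoding up front grows with the number of records, while encoding inline grows with the number of pairs.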
18 Efficient conditionals and code written for the data
- Lastly, if you know your data is not uniformly distributed, you might be able to increase run-time efficiency by ordering conditional work in your code by the probability of each value, relative to the amount of work each value entails.
- In English: if your data is 80% male, 15% female and 5% missing, and all three cases require the same amount of work, test them in the order: if male, else if female, else.
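A sketch of ordering conditionals by frequency. PROC FREQ with ORDER=FREQ shows which value is most common; the IF/ELSE IF chain then tests that value first, so most rows stop after one comparison. The data and sex codes are hypothetical; a SELECT group also evaluates its WHEN clauses in order, so the same advice applies there.

```sas
data work.people;
  input sex $ @@;   /* a single period reads as a missing character value */
  datalines;
M M F M . M M F M M
;
run;

/* Check the distribution first, most frequent value listed first */
proc freq data=work.people order=freq;
  tables sex / missing;
run;

data work.coded;
  set work.people;
  if sex = 'M' then sex_code = 1;        /* most common: tested first */
  else if sex = 'F' then sex_code = 2;
  else sex_code = .;                     /* rarest case falls through */
run;
```

The saving per row is tiny, so this matters mainly inside comparison code that runs once per candidate pair.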
19 SAS ships with several functions suited to data linking
- SOUNDEX, SPEDIS, COMPLEV, COMPGED
- But there are a few more in the literature, and I've made my basic FCMP implementations available for you to use if so inclined.
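A quick look at the native functions on two pretend name pairs. COMPLEV returns a Levenshtein edit distance, COMPGED a generalized edit distance whose operation costs can be customised, SPEDIS an asymmetric spelling distance, and SOUNDEX a phonetic code. STRIP is used defensively so the trailing blanks of the padded $20 variables play no part in the distances.

```sas
data work.fn_demo;
  length a b $20 sdx_a sdx_b $8;
  input a $ b $;
  sdx_a = soundex(a);                 /* phonetic code, e.g. S53 for Smith */
  sdx_b = soundex(b);
  lev = complev(strip(a), strip(b));  /* number of single-character edits */
  ged = compged(strip(a), strip(b));  /* cost-weighted edit distance */
  spd = spedis(strip(a), strip(b));   /* spelling distance, not symmetric */
  datalines;
Smith Smoth
Jane Jayne
;
run;

proc print data=work.fn_demo noobs;
run;
```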
20 PROC FCMP SAS Functions
- Caverphone 2.0
- Double Metaphone
- Jaro/Winkler
- NYSIIS
- Chebyshev/Cityblock/Euclidean/Hamming/Minkowski distances
- Fuzzy number matching
- Expected-unique persons
You can obtain the PROC FCMP source code for these functions from:
You will have to change the libraries in the PROC FCMP call and let SAS know where to find the custom functions (via the CMPLIB= system option). See the PROC FCMP documentation.
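To make custom functions available, compile them with PROC FCMP into a library package (OUTLIB=) and point the CMPLIB= system option at that library. The function below, a Hamming distance named hamming_sketch, is a hypothetical illustration of the mechanism and not the presenter's implementation; work.funcs.linking is an assumed location.

```sas
/* OUTLIB= takes library.dataset.package */
proc fcmp outlib=work.funcs.linking;
  function hamming_sketch(a $, b $);
    la = lengthn(a);
    lb = lengthn(b);
    if la ne lb then return(.);   /* Hamming distance needs equal lengths */
    d = 0;
    do i = 1 to la;
      if substr(a, i, 1) ne substr(b, i, 1) then d = d + 1;
    end;
    return(d);
  endsub;
run;

/* Tell SAS where to search for compiled functions (library.dataset) */
options cmplib=work.funcs;

data work.hd_demo;
  d1 = hamming_sketch('SMITH', 'SMOTH');    /* one differing position */
  d2 = hamming_sketch('SMITH', 'SMYTHE');   /* unequal lengths: missing */
run;
```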
21 Phonetic Algorithms
- Caverphone 2.0, Double Metaphone, NYSIIS, Soundex (SAS native function)
- These algorithms all try to map several names/words to simple codes. Each has its own strengths and weaknesses. Don't think they're perfect. They're also typically pretty expensive/slow in terms of CPU.
- For example:
    d_fname_code = soundex(d_fname);
    a_fname_code = soundex(a_fname);
  Then compare d_fname_code and a_fname_code in the resulting joined data set with an equality operator, and assign points as appropriate.
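Following the advice to recode during cleaning, a sketch that computes SOUNDEX codes once per record and then joins on the codes with a plain equality. Apart from d_fname_code and a_fname_code from the slide, the data set and variable names are hypothetical.

```sas
data work.deaths;
  infile datalines dlm=',' truncover;
  length d_fname $20;
  input d_id d_fname;
  datalines;
1,James
2,Jane
;
run;

data work.admin;
  infile datalines dlm=',' truncover;
  length a_fname $20;
  input a_id a_fname;
  datalines;
101,Jomes
102,Jayne
103,Bob
;
run;

/* Expensive step: once per record, not once per pair */
data work.deaths_enc;
  set work.deaths;
  length d_fname_code $8;
  d_fname_code = soundex(d_fname);
run;

data work.admin_enc;
  set work.admin;
  length a_fname_code $8;
  a_fname_code = soundex(a_fname);
run;

/* Cheap step: equality on the stored codes */
proc sql;
  create table work.name_pairs as
    select d.d_id, a.a_id, d.d_fname, a.a_fname
    from work.deaths_enc as d inner join work.admin_enc as a
      on d.d_fname_code = a.a_fname_code;
quit;
```

James and Jomes share one code, Jane and Jayne another, so two pairs are produced and Bob is never compared by name.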
22 Jaro/Winkler string similarity functions
- Designed to be relatively cheap, so they can be used in comparisons, though they are still far more expensive than equality or most numeric operations.
- Intended for short strings like first names, surnames, proper names and street names. My free implementation only works up to 52 characters in length, which is not really a bad thing: reliability drops as strings get comparatively long or extremely short.
- Reports the similarity between two strings as a number between 1 (representing equality) and 0 (representing absolute disagreement).
- You'll typically use these with conditions or cut-offs to assign points that deal with some fuzziness or typographical error, e.g.:
    if jaro(d_fname, a_fname) >= 0.9 then name_point = 1;
    else name_point = -1;
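The presenter's Jaro/Winkler implementation is not reproduced here; what follows is an independent PROC FCMP sketch of the standard Jaro similarity and the Winkler prefix adjustment (common prefix of up to 4 characters, scaling factor 0.1). The names jaro_sketch and jaro_winkler_sketch, the library work.funcs.linking and the 52-character cap are assumptions for illustration, and the code is not tuned for speed.

```sas
proc fcmp outlib=work.funcs.linking;
  function jaro_sketch(s1 $, s2 $);
    array f1[52] / nosymbols;   /* match flags for s1 */
    array f2[52] / nosymbols;   /* match flags for s2 */
    l1 = lengthn(s1);
    l2 = lengthn(s2);
    if l1 = 0 and l2 = 0 then return(1);
    if l1 = 0 or l2 = 0 then return(0);
    if l1 > 52 or l2 > 52 then return(.);   /* this sketch's own length cap */
    do i = 1 to 52;
      f1[i] = 0;
      f2[i] = 0;
    end;
    /* Characters match only within this distance of each other */
    win = max(floor(max(l1, l2) / 2) - 1, 0);
    m = 0;
    do i = 1 to l1;
      lo = max(1, i - win);
      hi = min(l2, i + win);
      do j = lo to hi;
        if f1[i] = 0 and f2[j] = 0 then do;
          if substr(s1, i, 1) = substr(s2, j, 1) then do;
            f1[i] = 1;
            f2[j] = 1;
            m = m + 1;
          end;
        end;
      end;
    end;
    if m = 0 then return(0);
    /* Transpositions: matched characters that appear in a different order */
    k = 1;
    t = 0;
    do i = 1 to l1;
      if f1[i] = 1 then do;
        do while (f2[k] = 0);
          k = k + 1;
        end;
        if substr(s1, i, 1) ne substr(s2, k, 1) then t = t + 1;
        k = k + 1;
      end;
    end;
    t = t / 2;
    return((m / l1 + m / l2 + (m - t) / m) / 3);
  endsub;

  function jaro_winkler_sketch(s1 $, s2 $);
    jr = jaro_sketch(s1, s2);
    if missing(jr) then return(.);
    /* Length of the common prefix, capped at 4 */
    p = 0;
    do i = 1 to min(4, lengthn(s1), lengthn(s2));
      if p = i - 1 and substr(s1, i, 1) = substr(s2, i, 1) then p = i;
    end;
    return(jr + p * 0.1 * (1 - jr));
  endsub;
run;

options cmplib=work.funcs;

data work.jw_demo;
  j  = jaro_sketch('MARTHA', 'MARHTA');
  jw = jaro_winkler_sketch('MARTHA', 'MARHTA');
run;
```

For the textbook pair MARTHA/MARHTA this gives about 0.944 (Jaro) and 0.961 (Jaro-Winkler). It can be used with a cut-off just like the slide's example, e.g. `if jaro_winkler_sketch(d_fname, a_fname) >= 0.9 then name_point = 1; else name_point = -1;`.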
23 There's a lot more as well
- You can go about as deep into data linking as you're prepared to. There are a lot more techniques I haven't managed to go into today.
- I wrote my own system of data linking macros (ICARUS) to specialise in the topic and automate as many of these issues as possible.
- The source is not currently publicly available, but a copy of the documentation can be found at:
- Documentation for the new functions mentioned in this presentation can be found in the ICARUS user manual at the above link.
24 Questions/Contacts/Further Information: Further questions and comments, please direct communications to: or
More informationGit. all meaningful operations can be expressed in terms of the rebase command. -Linus Torvalds, 2015
Git all meaningful operations can be expressed in terms of the rebase command -Linus Torvalds, 2015 a talk by alum Ross Schlaikjer for the GNU/Linux Users Group Sound familiar? add commit diff init clone
More informationSQL Server Whitepaper
SQL Server Whitepaper INITIALIZING REPLICATION FROM BACKUP BY KENNETH FISHER and ROBERT L DAVIS Applies to: SQL Server 2008, SQL Server 2008 R2, SQL Server 2012, SQL Server 2014, SQL Server 2016 SUMMARY
More informationEmbedded Linux Day 2
Embedded Linux Day 2 Stuffs HW1 posted today Shooting for 1-2 hours. Review scheduling stuff & licensing. HW0 in lab Sign up for group meetings for next Thursday posted today. Review I got a number of
More informationEXAMPLE 3: MATCHING DATA FROM RESPONDENTS AT 2 OR MORE WAVES (LONG FORMAT)
EXAMPLE 3: MATCHING DATA FROM RESPONDENTS AT 2 OR MORE WAVES (LONG FORMAT) DESCRIPTION: This example shows how to combine the data on respondents from the first two waves of Understanding Society into
More informationVoIP INTERNET-BASED PHONE SYSTEMS CHOCK FULL OF FEATURES
VoIP INTERNET-BASED PHONE SYSTEMS CHOCK FULL OF FEATURES VoIP Internet-based phone systems chock full of features TABLE OF CONTENTS What is VoIP? Switching to VoIP is easy Business Telecom Features Improved
More informationCS61A Notes Week 6: Scheme1, Data Directed Programming You Are Scheme and don t let anyone tell you otherwise
CS61A Notes Week 6: Scheme1, Data Directed Programming You Are Scheme and don t let anyone tell you otherwise If you re not already crazy about Scheme (and I m sure you are), then here s something to get
More informationCTI-TC Weekly Working Sessions
CTI-TC Weekly Working Sessions Meeting Date: October 4, 2016 Time: 15:00:00 UTC Purpose: Weekly CTI-TC Joint Working Session Attendees: Agenda: Jordan Trey Darley Wunder Ivan Kirillov Stephen Banghart
More informationSpectroscopic Analysis: Peak Detector
Electronics and Instrumentation Laboratory Sacramento State Physics Department Spectroscopic Analysis: Peak Detector Purpose: The purpose of this experiment is a common sort of experiment in spectroscopy.
More informationTesting is a very big and important topic when it comes to software development. Testing has a number of aspects that need to be considered.
Testing Testing is a very big and important topic when it comes to software development. Testing has a number of aspects that need to be considered. System stability is the system going to crash or not?
More informationSurfing the SAS cache
Surfing the SAS cache to improve optimisation Michael Thompson Department of Employment / Quantam Solutions Background Did first basic SAS course in 1989 Didn t get it at all Actively avoided SAS programing
More informationVirtual Memory. ICS332 Operating Systems
Virtual Memory ICS332 Operating Systems Virtual Memory Allow a process to execute while not completely in memory Part of the address space is kept on disk So far, we have assumed that the full address
More informationXP: Backup Your Important Files for Safety
XP: Backup Your Important Files for Safety X 380 / 1 Protect Your Personal Files Against Accidental Loss with XP s Backup Wizard Your computer contains a great many important files, but when it comes to
More informationGetting started with simulating data in R: some helpful functions and how to use them Ariel Muldoon August 28, 2018
Getting started with simulating data in R: some helpful functions and how to use them Ariel Muldoon August 28, 2018 Contents Overview 2 Generating random numbers 2 rnorm() to generate random numbers from
More informationdtalink Faster probabilistic record linking and deduplication methods in Stata for large data files Keith Kranker
dtalink Faster probabilistic record linking and deduplication methods in Stata for large data files Presentation at the 2018 Stata Conference Columbus, Ohio July 20, 2018 Keith Kranker Abstract Stata users
More informationCSC209. Software Tools and Systems Programming. https://mcs.utm.utoronto.ca/~209
CSC209 Software Tools and Systems Programming https://mcs.utm.utoronto.ca/~209 What is this Course About? Software Tools Using them Building them Systems Programming Quirks of C The file system System
More informationCOSC 2P91. Introduction Part Deux. Week 1b. Brock University. Brock University (Week 1b) Introduction Part Deux 1 / 14
COSC 2P91 Introduction Part Deux Week 1b Brock University Brock University (Week 1b) Introduction Part Deux 1 / 14 Source Files Like most other compiled languages, we ll be dealing with a few different
More informationRead & Download (PDF Kindle) Data Structures And Other Objects Using Java (4th Edition)
Read & Download (PDF Kindle) Data Structures And Other Objects Using Java (4th Edition) Data Structures and Other Objects Using Java is a gradual, "just-in-time" introduction to Data Structures for a CS2
More informationML from Large Datasets
10-605 ML from Large Datasets 1 Announcements HW1b is going out today You should now be on autolab have a an account on stoat a locally-administered Hadoop cluster shortly receive a coupon for Amazon Web
More informationMount Points Mount Points is a super simple tool for connecting objects together and managing those relationships.
Mount Points Mount Points is a super simple tool for connecting objects together and managing those relationships. With Mount Points, you can simply drag two objects together and when their mount points
More information