A Legislative Bill Text Retrieval and Distribution System Using SAS, PROC SQL, and SAS/Access to DB2

John Turman and Kathe Richards
Technical Support, Application Systems Division
Texas Comptroller of Public Accounts

One of the signs of the approach of winter in even-numbered years around state agencies is the flurry of activity associated with getting ready for the Legislature to convene in the spring to determine our fates. Last year, as the session loomed, the committees examining our processes and procedures for analyzing proposed legislation produced a wish list of supporting automation. Prominent on this list was the ability to store, retrieve, read, and print proposed bills electronically.

The process of reviewing and analyzing proposed legislation in our agency typically involves many people, particularly when the legislation involves taxes, fees, funds, and state administrative policy, and a good bit of it does. Distributing this text on-line would save us the expense and delay of copying the bills and distributing them to the subject matter experts responsible for determining the administrative and fiscal impact each bill would have on the agency and the state.

Initially, we looked for a classic document-based implementation. We thought that passing the bills around as word processing documents on a local area network might be the best approach. This theory ran into two snags. First, the subject matter experts were scattered all over the agency, in several different physical locations, with access to an incredible mix of hardware, software, and network configurations, including none. The lowest common denominator was clearly the mainframe. Second, the system being developed to track bills and collect analysis on them was a mainframe CICS system using DB2.

It became clear that it would be most useful to have the text available, if not directly through the tracking system, then at least closely associated with it, since the people doing the analysis needed the text while they worked. Unfortunately, this did not become clear until considerable time had been spent exploring the other possibilities, and by then there was not much implementation time left.

We turned once again to the tool that has regularly bailed us out in the past: SAS. SAS, running on our mainframe, clearly fit the lowest common denominator from the equipment mix standpoint. We had staff and expertise available, and we could be assured that the development effort would be quick.

It was decided that the text would be stored in DB2 tables and made available for viewing on-line through the CICS system. We get the text from one of the legislative services on a tape in the wee hours of the morning. It is read and reformatted using SAS before being loaded into the DB2 tables.

Originally, the plan was for users to print bills after reviewing them on-line by executing a CICS transaction that submitted a print job to the internal reader. Because of the early indecision about how to handle the text distribution, however, this piece was not included in the early design of the on-line system, and we discovered that some people using the system were getting their print by using CICS screen prints. This was clearly unsatisfactory. Some divisions extracted their text and downloaded it to their local area networks, where LAN users could import it into word processing packages for printing. Other users routed the print from the batch SAS jobs to their local VTAM printers.

Formatting and printing the bills themselves was fairly straightforward. The SAS jobs that format the print select the bill to be printed on the basis of the bill number and session number stored with the lines of text. Each line of text is a separate row in the DB2 table. We used PROC SQL to select the text for specific bills and a DATA _NULL_ step with PUT statements to print the text in the format everyone was used to seeing. The initial code looked something like this:

    PROC SQL;
      /* Select every line of text for one bill, identified by literal key values */
      CREATE VIEW BILLS AS
        SELECT *
        FROM DB2FILE.BILLTEXT
        WHERE SESSION = '7300'
          AND BILLTYPE = 'HB'
          AND BILLNUM = 1
          AND VERSION = 1;

    DATA _NULL_;
      SET BILLS;
      FILE PRINT;
      /* Line number in columns 1-4, bill text starting in column 10 */
      PUT @1 LINENUM 4. @10 TEXTLINE $70.;
    RUN;

To take care of the people who were doing screen prints, and to make life easier for the folks who were routing print to mainframe printers, we built an interactive front-end procedure that allows the user to enter a list of bills. This procedure builds a file of keys for the requested bills and passes it to the batch SAS job, which selects, formats, and prints them. The PROC SQL statement that supported this now became more complicated, because the input to the SELECT statement was now a set of key values held in a SAS data set instead of literal values.

    /* Keys of the requested bills, one line per bill, read by column position */
    DATA BILLS;
      INPUT @1 SESSION $4. @5 BILLTYPE $3. @8 BILLNUM 4. @12 VERSION 2.;
      CARDS;
    7300HB    1 1
    7300HB    2 1
    ;

    PROC SQL;
      /* Match the DB2 text rows against the SAS data set of keys */
      CREATE VIEW BVTX AS
        SELECT A.*
        FROM DB2FILE.BILLTEXT A, BILLS B
        WHERE A.SESSION = B.SESSION
          AND A.BILLTYPE = B.BILLTYPE
          AND A.BILLNUM = B.BILLNUM
          AND A.VERSION = B.VERSION;

Several similar jobs were built to take care of the special needs of different divisions. After we worked out the minor problems associated with print routing and unusual configurations, this technique worked pretty well. In fact, there were virtually no problems with it until late in the session, when one of the divisions began reporting missing print. When we looked at the output files from the problem jobs, we noticed that these little print jobs, which had run in under a second early in the session, were now taking minutes of CPU time. The jobs with missing print were running out of time and ABENDing.

Clearly, the amount of data in the table holding the bill text was growing. With text from every bill proposed during the legislative session being added to the table, we expected to have quite a bit of data by the end of the session. What we did not expect was the retrieval time in the PROC SQL step. We determined pretty quickly that, because SAS has no way of telling DB2 that the selection criteria being passed are keys, DB2 does a table space scan to satisfy the request. When several bills are selected, the cumulative effect of the table space scans soon amounts to quite a bit of processing.

PROC SQL operates within SAS. Its only contact with DB2 comes when it uses an access descriptor and a view descriptor to get to a DB2 table. Even if our group of keys had been in another DB2 table, PROC SQL would have requested the complete set of data from each table involved in the selection, pulled both into the SAS work space, and performed the actual selection within SAS. The only way to take advantage of the power of DB2 to make the match would have been to use the SQL Pass-Through feature of PROC SQL. This was not helpful in our case, however, since the procedure that builds the key file was not building it as a DB2 table. (Making it a "temporary" DB2 table was considered, but the administrative problems associated with this, such as getting the table defined and setting up the security, proved daunting.)
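The cost difference described above can be illustrated with a small sketch. It is written in Python against an in-memory SQLite database, which is only a stand-in chosen so the example can be run anywhere; it is not DB2 and not the SAS/ACCESS interface. The table layout, the index name `billkey`, and the row counts are assumptions made for illustration. The sketch compares pulling every row to the client and matching keys there with sending one keyed query per requested bill.

```python
import sqlite3

# Hypothetical stand-in for the DB2 bill-text table: one row per line of text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE billtext (session TEXT, billtype TEXT, billnum INTEGER, "
    "version INTEGER, linenum INTEGER, textline TEXT)"
)
# Assumed index on the four key columns, so keyed lookups can avoid a full scan.
conn.execute(
    "CREATE INDEX billkey ON billtext (session, billtype, billnum, version)"
)
rows = [
    ("7300", "HB", num, 1, line, f"text of HB {num}, line {line}")
    for num in range(1, 201)   # 200 bills
    for line in range(1, 51)   # 50 lines each
]
conn.executemany("INSERT INTO billtext VALUES (?, ?, ?, ?, ?, ?)", rows)

wanted = [("7300", "HB", 1, 1), ("7300", "HB", 2, 1)]

# Approach 1: pull every row to the client and match the keys there.
all_rows = conn.execute("SELECT * FROM billtext").fetchall()
wanted_set = set(wanted)
client_side = [r for r in all_rows if r[:4] in wanted_set]

# Approach 2: one query per bill, with the key values supplied in the WHERE clause.
pushed_down = []
rows_transferred = 0
for key in wanted:
    fetched = conn.execute(
        "SELECT * FROM billtext WHERE session = ? AND billtype = ? "
        "AND billnum = ? AND version = ?",
        key,
    ).fetchall()
    rows_transferred += len(fetched)
    pushed_down.extend(fetched)

print("rows pulled for client-side match:", len(all_rows))
print("rows pulled with keyed queries:   ", rows_transferred)
print("same result:", sorted(client_side) == sorted(pushed_down))
```

Both approaches return the same lines of text, but the first moves the whole table to the client before discarding most of it, which is the behavior that grows with the table as the session goes on.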

When we presented the match values in the SELECT statement as literals (numbers and text strings), it was clear that DB2 would use the data values to do a keyed match on the table. The problem was that we had a variable number of keys.

We thought first about building a macro that would generate the code for each SELECT statement and execute it. This turned out to be much more complicated than it looks. We could fairly easily build and execute one SELECT statement, but doing a sequence of them proved problematic. The situation grew very complex and the effort was abandoned. (There is a sample of how to do this in the SAS Sample Library, but I found it to be far from straightforward.)

Material from a previous SUGI conference suggested that a side effect of the MODIFY statement might work for us here:

    DATA DB2FILE.BILLTEXT BVTX;
      MODIFY DB2FILE.BILLTEXT BILLS;
      BY SESSION BILLTYPE BILLNUM VERSION;
      /* _IORC_ = 0 means a matching row was found; copy it to BVTX */
      IF _IORC_ = 0 THEN OUTPUT BVTX;
      _ERROR_ = 0;

We were warned, in that material's notation, that this approach only works if KEY does not repeat in A, that it sends repeated WHERE clauses to the DBMS, and that it does not scale well for many values of KEY. We managed to make this work, but we could not be assured that there would be no repeated keys. It also required a full key to work with, so we would have had to add logic to be sure we were trapping all the qualifying rows. Too messy.

The ultimate and most elegant solution to this problem was to build a temporary file of SELECT statements, one for each requested bill, using DATA _NULL_ and PUT statements. Then we executed them using %INCLUDE. (Because we wanted a separate print file for each bill, we actually built a complete PROC SQL step for each and followed it with the print formatting step.)

    FILENAME TEMPSQL '&&TMPSQL';
    OPTIONS $DB2DBUG;  /* Allows you to see what SQL calls SAS is generating */

    /* DATA step that generates the SAS data set of keys, called BILLS, goes here */

    /* Write one PROC SQL step and one print step per requested bill */
    DATA _NULL_;
      FILE TEMPSQL;
      SET BILLS;
      PUT "PROC SQL;";
      PUT "CREATE VIEW BVTX AS";
      PUT "SELECT * FROM DB2FILE.BILLTEXT "
          "WHERE SESSION = '" SESSION "' AND BILLTYPE = '" BILLTYPE
          "' AND BILLNUM = " BILLNUM " AND VERSION = " VERSION ";";
      PUT "PROC PRINT DATA=BVTX;";
    RUN;

    /* Execute the generated steps */
    %INCLUDE TEMPSQL;

From the standpoint of CPU utilization and ease of maintenance, this turned out to be quite satisfactory. The SAS log can get quite large, since each execution is faithfully reported, but that proved to be a minor concern because the users do not see it. We were able to cut the run time for an average selection of bills from close to 5 minutes of CPU time to a couple of seconds (and make our computer performance guys really happy).
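For readers who want to experiment with the generation step outside SAS, the following is a minimal sketch in Python, not the authors' implementation. It takes a list of bill keys and emits the text of one PROC SQL step and one print step per bill, with the key values written as literals. The names `BillKey`, `sql_literal`, and `generate_steps` are hypothetical helpers introduced for illustration; the table, view, and column names are taken from the paper.

```python
from typing import Iterable, NamedTuple


class BillKey(NamedTuple):
    """Key of one requested bill (hypothetical helper type)."""
    session: str
    billtype: str
    billnum: int
    version: int


def sql_literal(text: str) -> str:
    """Quote a character value as a SQL string literal, doubling single quotes."""
    return "'" + text.replace("'", "''") + "'"


def generate_steps(keys: Iterable[BillKey]) -> str:
    """Return SAS source text: one PROC SQL step and one print step per bill."""
    lines = []
    for key in keys:
        lines += [
            "PROC SQL;",
            "  CREATE VIEW BVTX AS",
            "  SELECT * FROM DB2FILE.BILLTEXT",
            f"  WHERE SESSION = {sql_literal(key.session)}",
            f"    AND BILLTYPE = {sql_literal(key.billtype)}",
            f"    AND BILLNUM = {int(key.billnum)}",
            f"    AND VERSION = {int(key.version)};",
            "PROC PRINT DATA=BVTX;",
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    requested = [BillKey("7300", "HB", 1, 1), BillKey("7300", "HB", 2, 1)]
    print(generate_steps(requested), end="")
```

Because every key value reaches the database as a literal in the WHERE clause, each generated query is the same kind of statement as the single-bill SELECT that performed well. Quoting character values and forcing numeric values through `int` guards against malformed keys ending up in the generated source.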