CS2304 Spring 2014 Project 3


Goal

The Bureau of Labor Statistics (BLS) maintains data sets on many different things, from workplace injuries to consumer spending habits, but what you most frequently hear about is employment. Conveniently, much of BLS's data is available online and can be accessed using HTTP requests (GET/POST). For this project we're going to write a program that will allow us to access some of those data sets. During this project you'll obtain some experience with Python dictionaries and work with a few Python standard library packages.

Program Interface and Output

The program takes one command line parameter: an input file name. The input file starts with 2 lines of header text that can be discarded, followed by an arbitrary number of data lines. Each line in the file represents a data series that may or may not exist and that can potentially be fetched. Here's a short sample input file with three series:

    Industry        SA   Data Type                                Start   End
    ------------------------------------------------
    Total nonfarm   U    ALL EMPLOYEES, THOUSANDS                 1995    1995
    Aircraft        U    WOMEN WORKERS, THOUSANDS                 2003    2007
    Millwork        S    AVERAGE HOURLY EARNINGS, 1982 DOLLARS    2003    2007

While it's not clear from this example, the columns in the file are tab separated, so you can split each line on tab characters ('\t'). You may assume the input file is correctly structured.

Each column contains a relevant piece of information needed to fetch a series. The Industry column provides the name of the industry we are examining, while the SA column tells whether we are requesting a data series that is Seasonally Adjusted (S) or Unadjusted (U). There are a few types of available information, which are specified by the Data Type. Finally, Start and End provide the starting and ending year for the data we are requesting. The Industry and Data Type information come from two additional input files (described later) that are always opened and processed.
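As a rough illustration of the input format described above, the following sketch skips the two header lines and splits each data line on tabs. The names `SeriesRequest`, `parse_request_line`, and `read_requests` are hypothetical and not part of the assignment; the sample lines are taken from the sample input file.

```python
from collections import namedtuple

# Hypothetical record type for one data line of the input file.
SeriesRequest = namedtuple(
    "SeriesRequest", ["industry", "seasonal", "data_type", "start", "end"])


def parse_request_line(line):
    """Split one tab-separated data line into its five columns."""
    industry, seasonal, data_type, start, end = line.rstrip("\n").split("\t")
    return SeriesRequest(industry.strip(), seasonal.strip(),
                         data_type.strip(), int(start), int(end))


def read_requests(lines):
    """Discard the 2 header lines, then parse every non-blank data line."""
    return [parse_request_line(line)
            for line in list(lines)[2:] if line.strip()]


# In the real program the lines would come from open(sys.argv[1]).
sample = [
    "Industry\tSA\tData Type\tStart\tEnd\n",
    "------------------------------------------------\n",
    "Total nonfarm\tU\tALL EMPLOYEES, THOUSANDS\t1995\t1995\n",
    "Aircraft\tU\tWOMEN WORKERS, THOUSANDS\t2003\t2007\n",
]
parsed = read_requests(sample)
print(parsed[0].industry, parsed[0].start, parsed[0].end)
```

Splitting on tabs rather than on whitespace matters here because names such as "ALL EMPLOYEES, THOUSANDS" contain spaces.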
The Industry and Data Type input files map human-readable names like "Total nonfarm" to numerical codes that can be used to create a series ID number.

To invoke the program:

    [cmdprompt$] python3 blsrequest.py input.txt

The first and second series exist, so we print out the available data for each month of each year, starting with the most recent month/year. The third series doesn't exist or doesn't have data for those years, and we print a message letting the user know. So running our program with the input file above should produce the following output, printing each series ID, the human-readable information, and any data found:

    Series EEU00000001
    Total nonfarm, Unadjusted, ALL EMPLOYEES, THOUSANDS, 1995-1995
    Data found:
    December 1995 118918
    November 1995 118917
    October 1995 118665
    September 1995 118083
    August 1995 117180
    July 1995 116926
    June 1995 118138
    May 1995 117409
    April 1995 116674
    March 1995 115849
    February 1995 115093
    January 1995 114435
    Series EEU31372102
    Aircraft, Unadjusted, WOMEN WORKERS, THOUSANDS, 2003-2007
    All of the requested years are not available.
    Data Found:
    February 2003 41.4
    January 2003 43.2
    Series EES31243149
    Millwork, Adjusted, AVERAGE HOURLY EARNINGS, 1982 DOLLARS, 2003-2007
    The series doesn't exist or have data for given years.

BLS Information

The available data is broken down into series on the BLS website, and each series has a code (a series ID) that you will need to create before trying to make a request. While there are many types of data sets available, we are only going to focus on the National Employment, Hours, and Earnings (SIC basis) data series. For our purposes, each series ID looks like: EES10140001. Each piece of the ID has a different meaning, shown in the table below from the BLS website:

    Series ID    EES10140001

    Positions    Value     Field Name
    1-2          EE        Prefix
    3            S         Seasonal Adjustment Code
    4-9          101400    Industry Code
    10-11        01        Data Type Code
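The table above translates directly into string concatenation and slicing. Here is a minimal sketch, assuming the codes are already available as strings; the function names `build_series_id` and `split_series_id` are illustrative only.

```python
def build_series_id(seasonal, industry_code, datatype_code):
    """Concatenate the four fields from the table into an 11-character ID."""
    if seasonal not in ("S", "U"):
        raise ValueError("seasonal adjustment code must be 'S' or 'U'")
    if len(industry_code) != 6 or len(datatype_code) != 2:
        raise ValueError("expected a 6-digit industry code and a 2-digit data type code")
    return "EE" + seasonal + industry_code + datatype_code


def split_series_id(series_id):
    """Slice an ID back into its fields (the table's positions are 1-based)."""
    return {
        "prefix": series_id[0:2],           # positions 1-2
        "seasonal": series_id[2],           # position 3
        "industry_code": series_id[3:9],    # positions 4-9
        "datatype_code": series_id[9:11],   # positions 10-11
    }


print(build_series_id("S", "101400", "01"))  # EES10140001
print(split_series_id("EEU31372102"))
```

Keeping the codes as strings (not integers) preserves leading zeros such as the "000000" industry code in EEU00000001.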

So, every series ID we'll create starts with EE, is either seasonally adjusted (S, in this case) or unadjusted (U), has a 6 digit Industry Code, and a 2 digit Data Type Code. The Industry codes and corresponding human-readable names can be found in ee.industry.txt, while the Data Type codes and human-readable names can be found in ee.datatype.txt. Both files can be found on the course website, or on the BLS website here:

    http://download.bls.gov/pub/time.series/ee/ee.industry
    http://download.bls.gov/pub/time.series/ee/ee.datatype

Like the input file, these are tab separated. You may assume these files always exist and will be present in the same directory as your Python code. There are only a couple of columns in each that you need to worry about. Looking at the table above and those files, I can say EES10140001 is seasonally adjusted, the Industry is Nonmetallic minerals, except fuels, and the Data Type is ALL EMPLOYEES, THOUSANDS. Conversely, I could use the human-readable names in the files above (like in the input file) to derive the series ID. For more information, you may use these links as reference:

    http://www.bls.gov/help/hlpforma.htmee
    http://www.bls.gov/developers/api_signature.htm
    http://www.bls.gov/developers/api_faqs.htm

JSON

Once we have created the series IDs, we need to send them to BLS. JSON is the data format we'll be using to request and receive series. The series IDs (and other info) must be marshaled into a dictionary, then converted into a JSON string prior to making a request. What's JSON? From json.org:

    JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans to read and write. It is easy for machines to parse and generate. It is based on a subset of the JavaScript Programming Language, Standard ECMA-262 3rd Edition - December 1999.

So, JSON is an easy, language-independent way to store complex structures/values in strings and then transfer them between computers, where the information can be unpacked and used. Here's some more from json.org:

    JSON is built on two structures:
    - A collection of name/value pairs. In various languages, this is realized as an object, record, struct, dictionary, hash table, keyed list, or associative array.
    - An ordered list of values. In most languages, this is realized as an array, vector, list, or sequence.
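To make the two structures concrete, the short sketch below parses a made-up JSON string (not BLS data) with Python's standard `json` module and shows which Python types come back. Note in particular that JSON `null` becomes Python `None`.

```python
import json

# A made-up JSON document: one object holding a name/value pair and an array.
text = '{"name": "value", "items": [1, 2.5, "three", null]}'

value = json.loads(text)

print(type(value).__name__)           # dict  (JSON object)
print(type(value["items"]).__name__)  # list  (JSON array)
print(value["items"][3] is None)      # True  (JSON null)
```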

Here is an example of the type of Python dictionary we need to convert to JSON before transmitting the request to BLS.

    bls = {"seriesid": ["EEU10140001", "EES10140002"],
           "startyear": "2010",
           "endyear": "2012"}

So we have a dictionary with 3 name and value pairs. The value of "seriesid" is a list of strings (the series IDs), while "startyear" has the value "2010" and "endyear" has the value "2012". We'll convert this to a JSON string, which can be used to fetch the two series EEU10140001 and EES10140002 for the years 2010-2012, assuming the data exists. Conveniently, Python provides functions to convert JSON to and from the string representation:

    import json

    # bls is a dictionary, which contains a list and the dates.
    # bls["seriesid"] is the list, so bls["seriesid"][0] is "EEU10140001"
    # and bls["startyear"] is "2010".

    # When you need to make an HTTP request you can turn the Python
    # objects into a JSON string.
    jsonstr = json.dumps(bls)

    # So that dictionary is represented as a string in jsonstr.
    print(jsonstr)
    # {"seriesid": ["EEU10140001", "EES10140002"], "startyear": "2010", "endyear": "2012"}

    # And we can turn the string back into Python data types.
    # This will be useful when receiving series data.
    result = json.loads(jsonstr)

Making HTTP Requests

Now that we know what JSON is, we can look at using JSON to pass information back and forth between our computer and a web server. Remember, BLS's servers have all of the information we want, and we are going to use HTTP requests (GET/POST) to access the appropriate data. Like before, Python has a package, urllib, to help us out. I'd suggest creating a Request object and using urlopen to transmit and receive the required information. An example is shown below with carefully chosen parameters.

    import urllib.request
    import json

    payload = json.dumps(bls)

    # You shouldn't need to change anything here.
    # Just change the value of the payload.
    r = urllib.request.Request(
        "http://api.bls.gov/publicapi/v1/timeseries/data/",
        payload.encode("utf-8"),
        {"Content-Type": "application/json"})

    # Get the response.
    result = urllib.request.urlopen(r)

    # Get the data returned by the server.
    # This is a JSON string with our series information.
    resultstr = result.read().decode("utf-8")

The resultstr will be in JSON format, and you'll be able to use json.loads() to turn the information from the server into Python data types. Once you've used json.loads() you'll have a large nested dictionary and list structure, and you'll have to experiment some to get to the right values. Take a look at the BLS links above for examples of the exact format.

Note: BLS limits you to a 10 year span in one HTTP request, so for time spans longer than 10 years (e.g., 1960-1985), you'll need to make more than one JSON string and HTTP request.

Summary and Hints

This project has a lot of small pieces and some new topics to understand, but it's actually not that many lines of code if you use the suggestions above. Below, I've also broken down what you need to do into different steps:

- Be able to read the information from ee.industry.txt and ee.datatype.txt. I'd put the codes and human-readable names in dictionaries. Then you can convert the human-readable names to the appropriate code and vice versa.
- Once you've done that, I'd use those dictionaries to build the required series IDs. You'll want to put those into a list and add the list to a dictionary.
- Once you've created the dictionary with series ID(s) and a start and end year, you can convert that into a JSON string.
- Make the HTTP request using the JSON string. I'd suggest making one request per series for simplicity, but you can combine up to 25 series in a single request, provided they all share the same start and end date.
- Once you've received a response, you can convert the JSON payload into Python data structures and iterate through them to print out the required information. You may need to make more than one HTTP request to get all of the data.
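The last step, walking the nested structure returned by json.loads(), might look roughly like the sketch below. The key names used here ("Results", "series", "seriesID", "data", "year", "periodName", "value") are an assumption about the response layout and are not given in this handout, so check them against the BLS links before relying on them. The sample response is hand-written for illustration and the function name `extract_rows` is hypothetical.

```python
import json

# Hand-written stand-in for resultstr; the key names are ASSUMED, not confirmed.
sample_response = json.dumps({
    "status": "REQUEST_SUCCEEDED",
    "Results": {"series": [{
        "seriesID": "EEU00000001",
        "data": [
            {"year": "1995", "periodName": "December", "value": "118918"},
            {"year": "1995", "periodName": "November", "value": "118917"},
        ],
    }]},
})


def extract_rows(resultstr):
    """Return {series_id: [(month, year, value), ...]}, or None on failure."""
    response = json.loads(resultstr)
    results = response.get("Results")
    if results is None:
        # Also try a lowercase key in case the server spells it that way.
        results = response.get("results")
    if not results:
        return None
    rows = {}
    for series in results.get("series", []):
        rows[series["seriesID"]] = [
            (d["periodName"], d["year"], d["value"]) for d in series["data"]]
    return rows


for series_id, data in extract_rows(sample_response).items():
    print("Series", series_id)
    for month, year, value in data:
        print(month, year, value)
```

Returning None when the results are missing gives the caller one place to decide what to print when a request fails.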

Transient Errors

While testing I occasionally received a response that looked like this:

    {"status":"request_failed","responsetime":0,"message":["your request has failed, please check your input parameters and try your request again."],"results":null}

Note that Results is null (or None once we convert it to Python data structures) rather than a list of dictionaries with empty data lists. The error seems to be transient: the same code could work one minute and give me the error later on. I've tried the code on a few computers and encountered the same error using Curl, so I am fairly certain it's not a local issue or a code issue. If I had caught the error earlier on, I might have changed the project substantially.

Given this situation, since we'll be testing the project live on the Curator and we don't control BLS's resources, we need to have a backup plan. If you receive the error above, or if the BLS servers become otherwise unreachable, you should print the series ID and information like before, and then print the dictionaries you converted to JSON and tried to send to the BLS website.

    Series EEU00000001
    Total nonfarm, Unadjusted, ALL EMPLOYEES, THOUSANDS, 1995-1995
    {'seriesid': ['EEU00000001'], 'endyear': '1995', 'startyear': '1995'}
    Series EEU31372102
    Aircraft, Unadjusted, WOMEN WORKERS, THOUSANDS, 2003-2007
    {'seriesid': ['EEU31372102'], 'endyear': '2007', 'startyear': '2003'}
    Series EES31243149
    Millwork, Adjusted, AVERAGE HOURLY EARNINGS, 1982 DOLLARS, 2003-2007
    {'seriesid': ['EES31243149'], 'endyear': '2007', 'startyear': '2003'}

Here is an example with a timespan greater than 10 years being broken into multiple requests:

    Series EEU00000001
    Total nonfarm, Unadjusted, ALL EMPLOYEES, THOUSANDS, 1985-2010
    {'seriesid': ['EEU00000001'], 'endyear': '2010', 'startyear': '2000'}
    {'seriesid': ['EEU00000001'], 'endyear': '1999', 'startyear': '1989'}
    {'seriesid': ['EEU00000001'], 'endyear': '1988', 'startyear': '1985'}
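One way to reproduce the year ranges in the 1985-2010 example is to work backwards from the end year, taking at most a 10 year span each time. This is only a sketch; the names `split_years` and `build_payloads` are hypothetical, and the chunk boundaries are inferred from the example above.

```python
def split_years(start_year, end_year, max_span=10):
    """Yield (start, end) year pairs, newest first, with end - start <= max_span."""
    end = end_year
    while end >= start_year:
        start = max(start_year, end - max_span)
        yield start, end
        end = start - 1  # the next chunk ends just before this one starts


def build_payloads(series_id, start_year, end_year):
    """Build one request dictionary per chunk, ready for json.dumps()."""
    return [{"seriesid": [series_id], "startyear": str(s), "endyear": str(e)}
            for s, e in split_years(start_year, end_year)]


print(list(split_years(1985, 2010)))
# [(2000, 2010), (1989, 1999), (1985, 1988)]

for payload in build_payloads("EEU00000001", 1985, 2010):
    print(payload)
```

Generating the chunks newest first also matches the required output order, which starts with the most recent month/year.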

Submitting Your Work

You will submit a single .py file, containing nothing but the implementation described above. Be sure to conform to any specified function interfaces. Your submission will be executed with a test driver and graded according to how many cases your solution handles correctly.

This assignment will be graded automatically. You will be allowed up to ten submissions for this assignment. Test your function thoroughly before submitting it. Make sure that your function produces correct results for every test case you can think of. The course policy is that the submission that yields the highest score will be checked. If several submissions are tied for the highest score, the latest of those will be checked.

The link to the submit page is located here: http://curator.cs.vt.edu:8080/2014springs04/index.jsp

Pledge:

Each of your program submissions must be pledged to conform to the Honor Code requirements for this course. Specifically, you must include the following pledge statement in the submitted file:

    On my honor:

    - I have not discussed the Python code in my program with anyone other than my instructor or the teaching assistants assigned to this course.
    - I have not used Python code obtained from another student, or any other unauthorized source, either modified or unmodified.
    - If any Python code or documentation used in my program was obtained from another source, such as a text book or course notes, that has been clearly noted with a proper citation in the comments of my program.
    - I have not designed this program in such a way as to defeat or interfere with the normal operation of the Curator System.

    <Student Name>

Failure to include this pledge in a submission is a violation of the Honor Code.