Programming Assignment Comma Separated Values Reader Page 1


Assignment

1. Write a CSVReader that can read a file or URL that contains data in CSV format. CSVReader provides an Iterator for reading CSV data one line at a time.
2. The reader should provide the constructors and methods described below.
3. Use the package ku.util for this class.

What to Submit

Commit your project to Github Classroom, using the repository that Github creates for you when you accept this assignment (invitation sent by email). The repository name is csvreader-your_github_id. The repository is private.

Description

Comma Separated Values (CSV) is a standard format for data exchange. Yahoo and Google Mail address books, Excel worksheets, and almost any database can import or export data in CSV format. Each line of text in a CSV file contains the data for one record or "row" from the application.

Yahoo and Google Mail use CSV to import and export contacts. When you export your Yahoo Address Book it creates a file like this (a comma separates fields; the first line contains the names of the fields):

    "First","Middle","Last","Nickname","Email","Messenger ID","Phone"
    "Taksin","","Shinawat","Pres.","taksin@ais.co.th","taksin","1175"
    "Tsuyoshi","","Abe","Pedro","pedro@asn.co.jp","pedro","+02-4854-2222"
    "Donald","John","Trump","Pres.","president@whitehouse.gov","@Trump",

Fields are separated by a comma and each field value is placed in quote marks, although quotes are not required for CSV unless the field value contains the separator character (comma). Fields with no data are output as empty quotes. The first line contains the field names, and the remaining lines are your contacts (of course). The field names on the first line make the data easy to import. For example, Gmail will figure out what data is in the file using the field names on the first line.

Microsoft Excel or LibreOffice can read a worksheet as CSV or save it as CSV. When you open a CSV file, a dialog will ask you what the separator char is, whether there are quotes, and other details. If you save a worksheet as CSV, the format looks similar to this:

    StudentID,Firstname,Lastname,Github Username
    5910545639,Kanchanok,Kannee,mailtoy
    5910545647,Kwankaew,Uttama,ploykwan
    5910545655,Jiranan,Patrathamakul,jiranan-pat
    5910545663,Charin,Tantrakul,kaizofaria
    5910545671,Chawakorn,Suphepre,winChawakorn
    5910545680,Triwith,Mutitakul,famefryer

Each line contains one row from the spreadsheet. By default, it doesn't put quotes around fields, but you can specify quotes as an option.

Requirements for CSVReader

1. CSVReader reads data from an InputStream, a file, or a URL. It splits each line into fields using a delimiter character (the default is comma) and returns the values as an array of Strings. It has 2 constructors as described below.
2. CSVReader implements Iterator. The next( ) method returns one line of data as an array of String.
3. Each line of input data may contain a different number of fields, and some fields may be empty! CSVReader next( ) returns an array exactly matching the number of fields on the current source line, including empty fields. For example:

    first,second,third,fourth      (4 fields)
    this,line,has,,,6 fields       (6 fields, but 2 fields are empty)

4. If a field is blank then return an empty String for that field, not a null. For example, this line:

    Santa,Claus,,"I love Christmas"

   would be parsed as ["Santa", "Claus", "", "I love Christmas"]
5. Remove any whitespace (space or tab characters) at the start or end of field data. If a field begins with a double quote char (") then also remove the quote character from the beginning and end of the field. Quotes around field values are optional (as in the example above). Don't remove whitespace that is inside quote characters. For example, in "inner ", "space" the first value should be "inner " (the trailing space inside the quotes is kept).
6. Skip blank lines and comment lines. Skip any line containing only whitespace. Also skip lines where the first non-space character is #. Lines beginning with # are comment lines (by convention).
7. If the application calls next() when there is no more input data, the next() method should throw a NoSuchElementException.
8. CSVReader can read the input source only once and must not attempt to store the entire input in memory! Only buffer and process one line at a time, and only do it when necessary. The input may contain millions of lines.
9. CSVReader should not print anything on System.out. Not even error messages!
10. Do not use Scanner. You can use InputStreamReader, BufferedReader, StringTokenizer, or other classes from the Java API.
11. The default delimiter between fields is comma (',') but the user can change this at any time by calling setDelimiter(char).

Example: This input is a file named "sample.csv"

    FIRST,LAST,EMAIL,TELEPHONE
    "Harry", "Potter", harry@wizards.com,086-9999999
    Magic, Owl,, 12345678,

In BlueJ Codepad we would see

    > CSVReader csv = new CSVReader("sample.csv");
    > csv.hasNext( )
    true
    > csv.next( )
    ["FIRST", "LAST", "EMAIL", "TELEPHONE"]    // array of Strings
    > csv.next( )
    ["Harry", "Potter", "harry@wizards.com", "086-9999999"]
    // the line has 4 fields. Note that it removed extra space.
    // the quote marks in the CSV data are NOT part of the Strings!
    > csv.hasNext( )
    true
    > csv.next( )
    ["Magic", "Owl", "", "12345678", ""]
    // Because of the trailing "," this line has 5 fields but the
    // last field is empty.
    > csv.hasNext( )
    false
    > csv.next( )
    java.util.NoSuchElementException at CSVReader.next(...)
    // throws exception because there is no more data.
    // This is part of the specification of Iterator.next()
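The rules for skipping lines and cleaning field values can be sketched as two small helpers. This is only a minimal sketch, not the required design: the class name FieldCleaner and the method names isSkippable and clean are hypothetical, and it assumes a field counts as quoted only when it both begins and ends with a double quote.

```java
/** Hypothetical helper class; the assignment does not require it. */
final class FieldCleaner {

    /** True for lines to skip: only whitespace, or first non-space char is '#'. */
    static boolean isSkippable(String line) {
        String trimmed = line.trim();
        return trimmed.isEmpty() || trimmed.startsWith("#");
    }

    /**
     * Trim whitespace around the raw field, then remove one pair of
     * surrounding double quotes. Whitespace inside the quotes is kept.
     * Note: String.trim() also strips control characters below the space
     * character, which is slightly more than "space or tab".
     */
    static String clean(String rawField) {
        String field = rawField.trim();
        // Assumption: treat the field as quoted only if both ends have a quote.
        if (field.length() >= 2 && field.startsWith("\"") && field.endsWith("\"")) {
            field = field.substring(1, field.length() - 1);
        }
        return field;   // an empty field yields "", never null
    }
}
```

Trimming before removing the quotes is what lets a value such as "inner " keep its trailing space: the outer whitespace is gone by the time the quotes are stripped, and nothing is trimmed afterwards.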

CSVReader
    CSVReader( instream: InputStream )
    CSVReader( name: String )
    close( ): void
    getDelimiter( ): char
    hasNext( ): boolean
    next( ): String[ ]
    setDelimiter( delim: char ): void

<<interface>> java.util.Iterator, with T :: String[ ]
    hasNext( ): boolean
    next( ): T
    remove( ): void

Method Descriptions

CSVReader( InputStream )
    Constructor for a CSVReader that reads from an InputStream. The parameter value must be a valid InputStream (not null).

CSVReader( String name )
    Constructor with a filename or URL to read data from. If the string matches the syntax for a URL (see below), then the URL is opened for input. Otherwise it is treated as the name of a local file and opened for input. A URL should begin with a protocol name (at least 2 letters to distinguish it from a Windows disk drive letter), followed by "//" and a path: protocol://path
    For example: "http://se.cpe.ku.ac.th/students.csv" or "ftp://somehost/file.txt". You can even specify a local file as a URL: "file:///c:/temp/filename.txt".
    Exceptions: May throw FileNotFoundException, SecurityException, or MalformedURLException. See the constructors FileInputStream(String) and URL(String) for an explanation of when and why these exceptions are thrown.

void close( )
    Close the input source (InputStream, file, or URL).

setDelimiter(char c)
    Set the delimiter character for separating the input line into fields. The default delimiter is a comma character.

char getDelimiter( )
    Return the current delimiter character.

boolean hasNext( )
    Return true if there is more data in the input, otherwise return false.

String[] next( )
    Return an array of Strings containing the next line of CSV data from the input, separated into fields.
    1. In the input, fields are separated by a delimiter character (default is comma) and may be surrounded by quotation marks, as in the example above.
    2. If any fields are empty, the array returned by next should have a zero length String for that element of the array (not a null).
    3. If any fields begin/end with quotation marks such as "Red Dog" then remove the quotation marks from beginning and end of the field data. Also remove space from the beginning and end of the field.
    4. The length of the array returned by next should exactly match the number of fields (including empty fields) in the data itself. Don't make the array larger than the data.
    Throws NoSuchElementException if no data.

void remove( )
    Does nothing. Leave it empty.

Exceptions Thrown

The next( ) method may throw NoSuchElementException if there is no data to return. The constructors may throw several kinds of checked exceptions. The methods never throw checked exceptions, but may throw a RuntimeException if there are any errors. See below for how to "wrap" an I/O exception in a RuntimeException.

Programming Hints

1. For the constructor with String parameter, you need to distinguish between a local file and the name of a URL. Open URLs using the URL class. For local files, use ClassLoader.getResourceAsStream. If the file name is given as an absolute path you can also open it using FileInputStream. To recognize URLs, use a pattern that the URL string must match: "protocol://location/path". The String class has a matches method for testing if a String matches a pattern written as a regular expression. Regular expressions have a common syntax that is used in all modern programming languages. For example:

    // Match 2 word characters (\w\w) or more (+) followed by colon and then //,
    // followed by anything except whitespace (\S+).
    // Must use capital "\S" to mean "not whitespace".
    String URLPATTERN = "^\\w\\w+://\\S+";
    String name = "http://somewhere.com/filename";
    name.matches(URLPATTERN)    // true
    name = "file:///c:/temp/filename.csv";
    name.matches(URLPATTERN)    // true
    name = "C:/temp/filename.csv";
    name.matches(URLPATTERN)    // false

2. The Java URL class makes it easy to read from a URL. You create a new URL object and then call openStream() to create an InputStream connected to the URL.
3. Test your code early and often. Write the constructors and see if you can simply read one line and print it! Try all cases: InputStream, local file, and URL. Then write hasNext and next to read the file line-by-line and return the entire line using next. When that works, try splitting the line into fields as required.
4. Don't use Scanner to split an input line into fields. Scanner is slow. Better ways are:
   (a) Read lines of input using a BufferedReader and use a StringTokenizer to split the String

   (b) Read lines of input using a BufferedReader and use String.split( )
   (c) Read the input line into an array of characters and split it yourself using a loop. Process each character in the array looking for the delimiter character. Create Strings from characters using a StringBuilder. If you do it this way you can handle fields that contain the delimiter character inside of quotation marks "like,this" (the way Excel and CSV toolkits do).
5. Don't read any input in the constructor. Reading data should be performed by hasNext (that's the only way to check if there really is a next line).
6. The methods of CSVReader should not throw checked exceptions. We don't want to require users of CSVReader to write try - catch around all their code, so the CSVReader class will catch exceptions and "rethrow" them as a RuntimeException, which is unchecked (the user does not have to use try - catch). Many application frameworks do this. Here is how to catch an IOException and "wrap" it in a RuntimeException.

    /** Read one line from the input and return as a String. */
    public String readOneLine(BufferedReader reader) {
        try {
            String line = reader.readLine();
            return line;
        } catch ( IOException ex ) {
            // this code "wraps" the IOException in a RuntimeException
            throw new RuntimeException( ex.getMessage(), ex );
        }
    }

7. As usual, write good Javadoc. You should document all exceptions thrown by methods and constructors. The Javadoc syntax for this is:

    /**
     * Initialize a new CSVReader for reading from a file or URL.
     * @param filename blah, blah, blah
     * @throws FileNotFoundException if the file does not exist,
     *     is not a regular file, or cannot be opened
     * @throws IOException (write it yourself. See FileInputStream for example)
     * @throws MalformedURLException if string matches a URL but is not valid
     */

8. Write the code yourself. Don't use a CSV library. Don't use your friend's code.
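The character-by-character approach from the hints (walk the line yourself and build each field in a StringBuilder) might look like the sketch below. It is one possible shape under stated assumptions, not the expected solution: the class name LineSplitter and its methods are hypothetical, it uses charAt instead of copying the line into a char array, and it does not handle doubled quotes ("") inside a quoted field.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical splitter; names are illustrative only. */
final class LineSplitter {

    /** Split one line into cleaned fields; delimiters inside quotes do not split. */
    static String[] split(String line, char delimiter) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean insideQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                insideQuotes = !insideQuotes;   // entering or leaving a quoted part
            }
            if (c == delimiter && !insideQuotes) {
                fields.add(clean(current.toString()));
                current.setLength(0);           // start the next field
            } else {
                current.append(c);              // quotes are kept until clean()
            }
        }
        fields.add(clean(current.toString()));  // last field, possibly empty
        return fields.toArray(new String[0]);
    }

    /** Trim outer whitespace, then strip one pair of surrounding quotes. */
    private static String clean(String raw) {
        String field = raw.trim();
        if (field.length() >= 2 && field.startsWith("\"") && field.endsWith("\"")) {
            field = field.substring(1, field.length() - 1);
        }
        return field;
    }
}
```

Because the final field is always added after the loop, a line with a trailing delimiter yields an extra empty String at the end, which matches the sample line "Magic, Owl,, 12345678," having 5 fields.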
Sample Data and Tests

Will be posted online.
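As a closing illustration of two of the programming hints (do not read in the constructor; wrap IOException in a RuntimeException), the sketch below shows a look-ahead iterator over raw lines. It is an assumption-laden sketch rather than the CSVReader itself: the class name LazyLineIterator is hypothetical, it returns whole lines instead of String arrays, and it takes a Reader instead of an InputStream or file name.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical look-ahead iterator over non-blank, non-comment lines. */
final class LazyLineIterator implements Iterator<String> {
    private final BufferedReader reader;
    private String nextLine;    // one buffered line, or null if none is buffered
    private boolean finished;   // true once end of input has been seen

    LazyLineIterator(Reader source) {
        this.reader = new BufferedReader(source);   // nothing is read here
    }

    @Override
    public boolean hasNext() {
        if (nextLine != null) return true;   // already looked ahead
        if (finished) return false;
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                if (trimmed.isEmpty() || trimmed.startsWith("#")) continue;  // skip
                nextLine = line;
                return true;
            }
        } catch (IOException ex) {
            // wrap the checked exception so callers need no try - catch
            throw new RuntimeException(ex.getMessage(), ex);
        }
        finished = true;
        return false;
    }

    @Override
    public String next() {
        if (!hasNext()) throw new NoSuchElementException();
        String line = nextLine;
        nextLine = null;        // consume the buffered line
        return line;
    }
}
```

Only one line is ever held in memory, and calling hasNext several times in a row does not consume input, since the buffered line is reused until next takes it.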