File Operations. Working with files in Python. Files are persistent data storage. File Extensions. CS111 Computer Programming

Similar documents
File Operations. Working with files in Python. Files are persistent data storage. File Extensions. CS111 Computer Programming

This document provides a concise, introductory lesson in HTML formatting.

CSCA20 Worksheet Working with Files

Starting chapter 5. l First open file, and say purpose read or write inputfile = open('mydata.txt', 'r') outputfile = open('myresults.

CIS192 Python Programming

CIS192 Python Programming

What we already know. more of what we know. results, searching for "This" 6/21/2017. chapter 14

Chapter 6: Files and Exceptions. COSC 1436, Spring 2017 Hong Sun 3/6/2017

CIS192 Python Programming

Chapter 6: Files and Exceptions. COSC 1436, Summer 2016 Dr. Ling Zhang 06/23/2016

What is XHTML? XHTML is the language used to create and organize a web page:

CMSC 201 Fall 2016 Lab 09 Advanced Debugging

Text University of Bolton.

DATA STRUCTURE AND ALGORITHM USING PYTHON

CS 1803 Pair Homework 4 Greedy Scheduler (Part I) Due: Wednesday, September 29th, before 6 PM Out of 100 points

Textbook. Topic 8: Files and Exceptions. Files. Types of Files

CS 2316 Individual Homework 4 Greedy Scheduler (Part I) Due: Wednesday, September 18th, before 11:55 PM Out of 100 points

CMSC201 Computer Science I for Majors

CS Programming Languages: Python

Introduction to Computation for the Humanities and Social Sciences. CS 3 Chris Tanner

Consolidation and Review

Exceptions and File I/O

Loop structures and booleans

Chapter 9: Dealing with Errors

Python Tutorial. Day 2

Intro to Programming. Unit 7. What is Programming? What is Programming? Intro to Programming

COM110: Lab 2 Numeric expressions, strings, and some file processing (chapters 3 and 5)

Files on disk are organized hierarchically in directories (folders). We will first review some basics about working with them.

Introduction to Computer Programming for Non-Majors

CS Homework 4 Employee Ranker. Due: Wednesday, February 8th, before 11:55 PM Out of 100 points. Files to submit: 1. HW4.py.

File Input/Output. Learning Outcomes 10/8/2012. CMSC 201 Fall 2012 Instructor: John Park Lecture Section 01. Discussion Sections 02-08, 16, 17

Introduction to C Programming

#11: File manipulation Reading: Chapter 7

Creating Word Outlines from Compendium on a Mac

Python Essential Reference, Second Edition - Chapter 5: Control Flow Page 1 of 8

CME 193: Introduction to Scientific Python Lecture 4: Strings and File I/O

Relational Model, Key Constraints

List Comprehensions, Data Analysis & Sorting

Introduction to python

Intro. Scheme Basics. scm> 5 5. scm>

MULTIPLE CHOICE. Chapter Seven

LECTURE 4 Python Basics Part 3

CS Homework 4 Lifeguard Employee Ranker. Due: Tuesday, June 3rd, before 11:55 PM Out of 100 points. Files to submit: 1. HW4.py.

CS1210 Lecture 28 Mar. 27, 2019

Seema Sirpal Delhi University Computer Centre

Previously. Iteration. Date and time structures. Modularisation.

Introduction to Python programming, II

Python review. 1 Python basics. References. CS 234 Naomi Nishimura

Iterators & Generators

Introduction to Web Development

CS 115 Exam 2 (Section 1) Spring 2017 Thu. 03/31/2017

Part III Appendices 165

GIS 4653/5653: Spatial Programming and GIS. More Python: Statements, Types, Functions, Modules, Classes

CS2304: Python for Java Programmers. CS2304: Sequences and Collections

Understanding the Web Design Environment. Principles of Web Design, Third Edition

Python I. Some material adapted from Upenn cmpe391 slides and other sources

CS116 - Module 10 - File Input/Output

Task 1. Set up Coursework/Examination Weights

comma separated values .csv extension. "save as" CSV (Comma Delimited)

Software Design and Analysis for Engineers

Interactive use. $ python. >>> print 'Hello, world!' Hello, world! >>> 3 $ Ctrl-D

Announcements. 1. Class webpage: Have you been reading the announcements? Lecture slides and coding examples will be posted

Pace University. Fundamental Concepts of CS121 1

Introduction to Python Code Quality

1 Strings (Review) CS151: Problem Solving and Programming

List Processing Patterns and List Comprehension

a b c d a b c d e 5 e 7

Announcements. 1. Class webpage: Have you been reading the announcements? Lecture slides and coding examples will be posted

Week One: Introduction A SHORT INTRODUCTION TO HARDWARE, SOFTWARE, AND ALGORITHM DEVELOPMENT

examples from first year calculus (continued), file I/O, Benford s Law

Professor Hugh C. Lauer CS-1004 Introduction to Programming for Non-Majors

Year 8 Computing Science End of Term 3 Revision Guide

Previously. Iteration. Date and time structures. Modularisation.

Alastair Burt Andreas Eisele Christian Federmann Torsten Marek Ulrich Schäfer. October 6th, Universität des Saarlandes. Introduction to Python

Exceptions & a Taste of Declarative Programming in SQL

CS 31: Intro to Systems Virtual Memory. Kevin Webb Swarthmore College November 15, 2018

Python Programming: An Introduction to Computer Science

CS 115 Exam 2 (Section 1) Fall 2017 Thu. 11/09/2017

Interactive use. $ python. >>> print 'Hello, world!' Hello, world! >>> 3 $ Ctrl-D

C H A P T E R 1. Introduction to Computers and Programming

Programming with Python

CMSC 201 Spring 2016 Lab 08 Strings and File I/O

ENGR 102 Engineering Lab I - Computation

Dictionaries. Looking up English words in the dictionary. Python sequences and collections. Properties of sequences and collections

National 5 Computing Science Software Design & Development

List Processing Patterns and List Comprehension

The Big Python Guide

Slide Set 15 (Complete)

Grading for Assignment #1

Internet: An international network of connected computers. The purpose of connecting computers together, of course, is to share information.

CIS192 Python Programming

Human-Computer Interaction Design

Lecture #12: Quick: Exceptions and SQL

TOPIC 2 INTRODUCTION TO JAVA AND DR JAVA

Lecture. Loops && Booleans. Richard E Sarkis CSC 161: The Art of Programming

File I/O, Benford s Law, and sets

Exercise 1 Using Boolean variables, incorporating JavaScript code into your HTML webpage and using the document object

CSE 115. Introduction to Computer Science I

for loops Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas

[301] JSON. Tyler Caraza-Harter

Transcription:

File Operations Files are persistent data storage titanicdata.txt in PS06 Persistent vs. volatile memory. The bit as the unit of information. Persistent = data that is not dependent on a program (exists outside it). CS111 Computer Programming Department of Computer Science Wellesley College Measuring information The unit of information is the bit. Since one character can be represented in 8 bits (or one byte), byte has become the unit for measuring information stored in a file. 1 B (byte) = 8 bits 1 kb = 1000 B 1 MB = 1.000.000 B 1 GB = 1.000.000.000 B A computer has two kinds of storage: volatile memory (i.e., the RAM) and persistent memory (i.e, the hard disk). The RAM memory is faster, but more expensive, so current computers have 8-16 GB of it, while a hard disk is slower, but cheaper, so current computers have up to 1-2 TB. Files 17-2 File Extensions Working with files in Python The built-in function open to create file objects. By convention, file extensions (e.g.,.pdf) indicate to people the content of a file. Computer programs use other means to detect file content. Here is a list of files we will work with today and in the future: txt plain text file, readable with a text editor. Lines separated by '\n'. csv comma separated values. Simple text, but can be read by spreadsheet applications as well, e.g., Microsoft Excel. json JavaScript object notation: data structured in lists, dictionaries, or a combination of thereof, to avoid text processing. Can be viewed as text, and loaded directly into Python with the module json. html hypertext markup language. A text file, but interpretable by a browser to display content as specified by markup tags. css cascading style sheets. A text file containing rules in a special language, used to style web pages. Also interpretable by a browser. Files 17-3 Until now, our data have been stored in variables. However, the universal way of storing data is in a file, which is persistent storage and can be used across applications. In Python, open creates and returns a file object In [ ]: myfile = open('thesis.txt', 'r') In [ ]: print myfile <open file 'thesis.txt', mode 'r' at 0x10bc075d0> In [ ]: type(myfile) Out[ ]: file A file can be opened for reading ('r'), writing ('w'), or appending ('a'). We cannot write or append in a file opened for reading. The most important methods for a file object are: read, readline, readlines, and write, that we ll cover today. Files 17-4

Preferred syntax: with as It s easy to forget to close a file. This usually isn t too bad when reading a file, but can be disastrous when writing a file (the contents may not actually be written until the file is closed!) Python s with as notation for files implicitly closes a file, even if an error occurs within the file operations. 'r' read mode; 'w' write mode f = open(filename, 'r') f.close() f = open(filename, 'w') f.close() How to open files with the notation with as with open(filename, 'r') as f: # f implicitly closed # when with is done. with open(filename, 'w') as f: # f implicitly closed # when with is done. Files 17-5 Reading one line with readline # reads one line at a time with open('', 'r') as inputfile: oneline = inputfile.readline() In []: oneline Out[]: '\n' In []: another = inputfile.readline() ValueError: I/O operation on closed file Interpret the first line as saying: open the file '' for reading and assign the variable inputfile to it, so that we can access its content. The indented statement is calling the method readline on the file object inputfile to read only the first line and save it in the variable oneline. Afterwards, the file is closed, which we can notice if we try to use readline again on the inputfile. Files 17-6 Reading all lines with readlines # reads all lines at once with open('', 'r') as inputfile: alllines = inputfile.readlines() In []: alllines Out[]: ['\n', '\n', '\n', '\n'] The method readlines returns a list of all lines in a file. Each line is a string terminated by the newline character '\n'. These newline characters are visible in the Out[] cell, but not if we print each line. They are also not visible in the text file as well (see green box). Important: Use readlines only when a file is not too large. This is because it will read all content and store it in the RAM memory (which, as you read in slide 17-2 is limited). Files 17-7 Reading all text with read # reads all lines at once with open('', 'r') as inputfile: alltext = inputfile.read() In []: alltext Out[]: '\n\n\n\n' In []: moretext = inputfile.read() ValueError: I/O operation on closed file The method read returns all the content of the file as a single string. Notice how the returned string contains the newline characters. Those are the ones that allow the two other methods readline and readlines to know how to recognize lines. In this case too, if we try to access inputfile outside the with as block, we ll get an error about operations with a closed file. Files 17-8

Reading with a list comprehension Function linesfromfile in readtitatic.py (PS06) def linesfromfile(filename): '''Returns a list of all lines in the given file. In each line, the terminating newline has been removed. ''' with open(filename, 'r') as inputfile: # open the file strippedlines = [line.strip() for line in inputfile] return strippedlines Within a list comprehension that uses a for loop to read the content of a file, we don t need to call explicitly any of the three methods that we saw. A file object is an iterator, it knows how to iterate over its elements, which are the lines denoted by the newline character. This is our preferred method for reading from a file. The string method strip removes the newline character and all white space around the line. Within a for loop, there is no need to explicitly call the three read methods. No explicit method with the file object. Files 17-9 Writing Files A file can be created (or opened) for writing by providing the argument 'w', signifying write mode. When writing files, the syntax with as is very important, because forgetting to close a file has consequences. with open('memories.txt', 'w') as memfilew: memfilew.write('buy milk\n') memfilew.write('do CS111 homework\n') memfilew.write('watch Sherlock\n') At this point, the file named memories.txt is stored persistently in the file system with the following contents: buy milk do CS111 homework watch Sherlock Writing to a file that is opened for writing. The second argument 'w' is what opens the file in writing mode. The strings to write in the file contain the newline character '\n' as their last character to denote that a new line should start in the file. The file memfilew is closed automatically. Files 17-10 Appending to files How do we add lines to an existing file? We can t open the file in write mode (with a 'w'), because that erases all previous contents and starts with an empty file. Instead, we open the file in append mode (with an 'a'). Any subsequent writes are made after the existing contents: with open('memories.txt', 'a') as memfilea: memfilea.write('win Nobel prize\n') memfilea.write('eat big sundae\n') Now the file memories.txt has the contents: buy milk do CS111 homework watch Sherlock win Nobel prize eat big sundae If a file does not already exist, opening it in append mode creates an empty file. Exception Handling with try/except Misspelled filename: An exception is an error detected during execution of a program. New Python keywords: try and except to catch exceptions. memories = linesfromfile('memory.txt') : : '. ' When part of a program raises an exception (e.g., code for reading a file), it is often better to catch and handle the exception rather than have the program terminate abruptly with an error message (e.g., if a file doesn t exist). In Python, exceptions can be caught and handled with try/except statements. # Lines of code that may raise an exception except ErrorType: # Lines to execute when exception # with type ErrorType is raised Files 17-11 Files 17-12

Exception Handling Examples memories = linesfromfile('memory.txt') except IOError: memories = [] # Use empty list in this case To remember We can use try/except whenever we anticipate errors coming from sources that are related to human input (humans often make mistakes). a = 0 x = 8/a print x except ZeroDivisionError: print 'Do not divide by zero.' while True: i = int(raw_input('please enter an integer: ')) print 'Good, you entered', i break # Python keyword to exit a loop except ValueError: print 'Not a valid integer. Try again... ' Files 17-13 Reading tuples from CSV files def tuplesfromfile(filename): '''Read each line from opened file, strip white space, split at commas, convert as tuple and return a list of tuples. ''' with open(filename, 'r') as inputfile: thetuples = [tuple(line.strip().split(',')) for line in inputfile] return thetuples In the file students.csv, the first element of each line is a name, the second is a year, the third is a house name, and the third a section number. students.csv A CSV file, all values are separated by commas. It s a common way to store table information. Harry Potter,2020,,1 Hermione Granger,2020,,2 Luna Lovegood,2021,,3 Fred Weasley,2018,,1 Cho Chang,2019,,2 Cedric Diggory,2018,,3 Draco Malfoy,2020,,1 Files 17-14 Making dictionaries out of tuples # read a list of tuples from the CSV file allstudents = tuplesfromfile('students.csv') # Create a new list by mapping each student tuple # to a student dictionary. studentdictlist = [ {'name': st[0], 'year': st[1], 'dorm': st[2], 'section': st[3]} for st in allstudents ] In []: studentdictlist[0] Out[]: {'dorm': '', 'name': 'Harry Potter', 'section': '1', 'year': '2020'} Storing each student as a dict improves the readability of code by using key names instead of indices. List comprehensions become easier to write, for example: [ sd['dorm'] for sd in studentdictlist ] [ sd['name'] for sd in studentdictlist if sd['dorm'] == ''] List comprehension to create a list of dictionaries. Files 17-15 How to store lists & dictionaries into a file? Suppose that we updated the dictionaries from 17-15 to contain a list of pset grades for each student as shown below. Can we store this data into a file as it is, preserving the structure of the list of dictionaries? In []: studentdictlist Out[]: [{'dorm': '', 'name': 'Hermione Granger', 'psets': [100, 100, 100, 100], 'section': 2, 'year': 2020}, {'dorm': '', 'name': 'Harry Potter', 'psets': [87, 92, 95, 80], 'section': 1, 'year': 2020}, {'dorm': '','name': 'Luna Lovegood', 'psets': [75, 82, 86, 93], 'section': 3, 'year': 2021},...] One option is to write the entries to a CSV-like file, but: need to handle nested pset scores must worry about any strings containing commas must not only define the file-writing function, but must also write the corresponding file-reading function. Sounds to complicated! Is there a better way? Files 17-16

JSON to the rescue! JSON (JavaScript Object Notation) is a standard format for encoding as a string (possibly nested) lists and dictionaries whose leaves are numbers/strings/booleans. In []: import json In []: with open(hogwartsstudents.json', 'w') as fw:...: json.dump(studentdictlist, fw) In []: with open('hogwartsstudents.json', 'r') as fr:...: hogwartsstudents = json.load(fr) In []: hogwartsstudents == studentdictlist Out[]: True The module json has two functions to deal with files: dump and load. dump is used to write data into a file, and load is used to read data from a file. They both take as an argument a file object created with the builtin function open. Important When storing data into a file, JSON automatically converts them into a format known as unicode, denoted by the letter u in front of a string. Don t worry when you see this letter, Python knows how to deal with it. No need for you to handle it with split. Files 17-17 Test your knowledge 1. We mentioned several file endings in 17-3. What are some others that you know? Have you tried to open those files with applications different from the ones which created them (or that you usually open them)? What have you seen? 2. When developing Python programs, why would we need to read files? Why would we need to write files? 3. What would be some good uses for the file methods readline, readlines, and read? 4. What do the three letters 'r', 'w', 'a'mean? Do you think one of them can be omitted when opening a file? 5. What would happen if we try to write into a file open for reading? 6. How does the method readlines recognize lines, so that it can return a list of all lines? 7. What is the advantage of a JSON file over a CSV or TXT file? 8. Challenge: suppose we have a file open for reading, inputfile. Can you write one line of code to print the total number of words in the file? Files 17-18