University of Texas at Arlington, TX, USA


Dept. of Computer Science and Engineering University of Texas at Arlington, TX, USA

A file is a collection of data that is stored on secondary storage like a disk or a thumb drive. Accessing a file means establishing a connection between the file and the program and moving data between the two (like a pipe!).

Files come in two general types:
Text files: files where control characters such as \n are translated. These are generally human readable.
Binary files: all the information is taken directly, without translation. These are not human readable and may contain non-printable data.

When opening a file, you create a file object or file stream that is a connection between the file information (on disk) and the program. The stream contains a buffer of the information from the file, and provides the information to the program.

Reading from a disk is very slow. To compensate, the computer reads more data from the file than was asked for: if you need that data later, it is already buffered in the file object. This means that the file object contains a copy of information from the file, called a cache (pronounced "cash").

    myfile = open("myfile.txt", "r")

myfile is the file object. It contains the buffer of information. The open function creates the connection between the disk file and the file object. The first quoted string is the file name on disk, the second is the mode to open it (here, "r" means to read).

When a file is opened, its name can come in one of two forms:
"file.txt" assumes the file name is file.txt and that it is located in the current working directory of the program.
"c:\cse1310\file.txt" is the fully qualified file name and includes the directory information.

"r" is to read as a text file. The file must exist.
"w" is to write as a text file. Wipes the contents of the file if there are any, creates the file otherwise.
"a" is append. Adds to the end of an existing file, and creates the file if it does not exist.
"b" is a modifier, indicating a binary file. No character translation is done.
"+" is a modifier, indicating both read and write. With "r+", the file must exist. With "a+", everything written is appended to the end of the file.
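A minimal sketch of how these modes behave, written in Python 3 syntax. The scratch directory and the file name demo.txt are placeholders chosen for illustration, and the with statement (not covered in the text above) simply closes each file when its block ends.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()             # scratch directory for the demo
path = os.path.join(workdir, "demo.txt") # hypothetical file name

with open(path, "w") as f:               # "w": create the file (or wipe it)
    f.write("one\n")

with open(path, "a") as f:               # "a": add to the end, keep old data
    f.write("two\n")

with open(path, "r") as f:               # "r": read as text
    after_append = f.read()

with open(path, "r+") as f:              # "r+": read and write, file must exist
    f.write("ONE")                       # overwrites the first three characters
    f.seek(0)                            # go back to the start before reading
    after_update = f.read()

with open(path, "w") as f:               # "w" again: the old contents are gone
    pass

with open(path, "r") as f:
    after_wipe = f.read()

print(repr(after_append), repr(after_update), repr(after_wipe))
```

The last read returns an empty string, which is the data loss that the "w" mode can cause on an existing file.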

Be careful if you open a file with the "w" mode. It sets an existing file's contents to be empty, destroying any existing data. The "a" mode is nicer, allowing you to write to the end of an existing file without changing the existing contents.

If you are interacting with text files (which is all we will do for this semester), remember that everything is a string:
everything read is a string
if you write to a file, you can only write a string
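As an illustration of the "everything is a string" rule, the sketch below (Python 3 syntax) converts numbers to strings before writing and back to integers after reading. The file name scores.txt and the sample values are assumptions made for the example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "scores.txt")  # placeholder file name
scores = [75, 100, 50]

outfile = open(path, "w")
for score in scores:
    outfile.write(str(score) + "\n")   # write() accepts only strings
outfile.close()

infile = open(path, "r")
total = 0
for line in infile:
    total += int(line)                 # each line arrives as a string, e.g. "75\n"
infile.close()

print(total)
```

Calling outfile.write(75) instead would raise a TypeError, because 75 is not a string.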

Once you have a file object:
fileobject.read() reads the entire contents of the file as a string and returns it. It can take an optional integer argument to limit the read to n bytes, that is, fileobject.read(n).
fileobject.readline() reads the next line and advances the file position to the line after it.
for line in fileobject: is an iterator that goes through the lines of a file.
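The three ways of reading can be combined, since each one continues from wherever the previous one stopped. A small sketch in Python 3 syntax, using a made-up file lines.txt:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "lines.txt")  # placeholder file name
setup = open(path, "w")
setup.write("alpha\nbeta\ngamma\n")
setup.close()

f = open(path, "r")
first_four = f.read(4)         # at most 4 characters from the start
rest_of_line = f.readline()    # the remainder of the current line
remaining = []
for line in f:                 # iteration continues from the current position
    remaining.append(line)
everything_left = f.read()     # nothing is left to read at this point
f.close()

print(repr(first_four), repr(rest_of_line), remaining, repr(everything_left))
```

Note that every line keeps its trailing newline character, and that reading past the end of the file returns an empty string rather than raising an error.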

When done, you close the file. Closing is important because the information in the fileobject buffer is flushed out of the buffer and into the file on disk, making sure that no information is lost.

    fileobject.close()
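The sketch below (Python 3 syntax) shows an explicit close() next to the with statement, which is not part of the text above but is a common way to make sure close() is called. The file name log.txt is a placeholder.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "log.txt")  # placeholder file name

# Explicit close: the buffered text reaches the disk no later than close().
out = open(path, "w")
out.write("saved\n")
out.close()

# The with statement calls close() for you when the block ends,
# even if an error occurs inside the block.
with open(path, "a") as out2:
    out2.write("also saved\n")

closed_flags = (out.closed, out2.closed)   # both files are closed now

check = open(path, "r")
contents = check.read()
check.close()

print(closed_flags, repr(contents))
```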

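    for line in file("filetoread.txt"):
        print line

The file is opened automatically (by file(...), a Python 2 built-in that behaves like open). The file is closed once the loop ends and the file object is discarded; this relies on the interpreter cleaning up the unused object, so call close() explicitly when it matters. Defaults are read and text.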

Once opened, you can write to a file (if the mode is appropriate):
fileobject.write(s) writes the string s to the file.
fileobject.writelines(list) writes a list of strings (one at a time) to the file. No newlines are added between them.
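A short sketch of both methods in Python 3 syntax. The file name out.csv and the sample rows are assumptions for illustration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "out.csv")  # placeholder file name
lines = ["Bill,75\n", "Fred,50\n"]

out = open(path, "w")
out.write("Name,Exam1\n")       # one string, newline included by hand
out.writelines(lines)           # each string is written as-is
out.writelines(["a", "b"])      # no newline is added, so this writes "ab"
out.close()

check = open(path, "r")
contents = check.read()
check.close()

print(repr(contents))
```

Despite its name, writelines does not add line endings, so each string has to carry its own "\n".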

    # File reading and writing
    infile = open("input.txt", "r")
    outfile = open("output.txt", "w")
    oneline = infile.readline()
    print oneline  # this gets printed in the Python shell
    for line in infile:
        outfile.write(line)  # this gets written to the output file
    infile.close()
    outfile.close()

Each operating system (Windows, OS X, Linux) developed certain standards for representing text. In particular, they chose different ways to represent the end of a file, the end of a line, etc. This can confuse our text readers!

To get around this, Python provides a special file option to deal with variations of OS text encoding. The "U" option (universal newlines) means that Python deals with the problem so you don't have to!

    fileobj = open("myfile.txt", "rU")
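The effect of newline translation can be seen by writing raw bytes with different line endings and reading them back as text. This sketch is written for Python 3, where text mode applies the translation by default, so no "U" flag appears in the call. The file name mixed.txt is a placeholder.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "mixed.txt")  # placeholder file name

# Write raw bytes with three different line endings:
# "\r\n" (Windows), "\r" (old Mac OS) and "\n" (Unix, Linux, OS X).
raw = open(path, "wb")
raw.write(b"one\r\ntwo\rthree\n")
raw.close()

# Reading in text mode translates every line ending to "\n".
text = open(path, "r")
lines = text.readlines()
text.close()

print(lines)
```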

Every file maintains a current file position. It is the current position in the file and indicates what the file will read next.

When the disk file is opened, the contents of the file are copied into the buffer of the file object. Think of the file object as a very big list where every index is one of the pieces of information of the file. The current position is the present index in that list.

The tell() method tells you the current file position. The positions are in bytes (think characters for ASCII) from the beginning of the file:

    fd.tell() => 42L

The Practice of Computing Using Python, Punch, Enbody, 2011 Pearson Addison Wesley. All rights reserved

The seek() method updates the current file position to where you like (in bytes offset from the beginning of the file):

    fd.seek(0)    # to the beginning of the file
    fd.seek(100)  # 100 bytes from beginning

Counting bytes is a pain. seek has an additional (optional) second argument that sets the reference point:
0: count from the beginning (the default)
1: count from the current file position
2: count from the end (backwards)
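A sketch of tell() and the three reference points of seek(), in Python 3 syntax. The file is opened in binary mode because Python 3 text files only allow seeking relative to the beginning; the file name digits.bin and its ten-byte content are assumptions for the example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "digits.bin")  # placeholder file name
setup = open(path, "wb")
setup.write(b"0123456789")      # ten bytes, so offsets are easy to check
setup.close()

fd = open(path, "rb")           # binary mode: positions are plain byte offsets
start = fd.tell()               # position right after opening
fd.read(3)
after_read = fd.tell()          # reading three bytes moved the position
fd.seek(0)                      # back to the beginning (second argument defaults to 0)
fd.seek(2, 1)                   # 2 bytes forward from the current position
from_current = fd.tell()
fd.seek(-4, 2)                  # 4 bytes back from the end
from_end = fd.tell()
tail = fd.read()                # everything from there to the end
fd.close()

print(start, after_read, from_current, from_end, tail)
```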

The spreadsheet is a very popular, and powerful, application for manipulating data. Its popularity means there are many companies that provide their own version of the spreadsheet. It would be nice if those different versions could share their data.

A basic approach to sharing data is the comma separated value (CSV) format:
it is a text format, accessible to all apps
each line (even if blank) is a row
in each row, each value is separated from the others by a comma (even if it is blank)
it cannot capture complex things like formulas

    Name,Exam1,Exam2,Final Exam,Overall Grade
    Bill,75.00,100.00,50.00,75.00
    Fred,50.00,50.00,50.00,50.00
    Irving,0.00,0.00,0.00,0.00
    Monty,100.00,100.00,100.00,100.00
    Average,,,,56.25

As simple as that sounds, even the CSV format is not completely universal: different apps have small variations. Python provides a module to deal with these variations, called the csv module. This module allows you to read spreadsheet info into your program.

Import the csv module.
Open the file normally, creating a file object.
Create an instance of a csv reader, used to iterate through the file just opened.
Iterating with the reader object yields a row as a list of strings.

    import csv
    fobj = open('workbook1.csv', 'rU')
    csvreader = csv.reader(fobj)  # instance of csv reader
    for row in csvreader:
        print row
    fobj.close()

    >>>
    ['Name', 'Exam1', 'Exam2', 'Final Exam', 'Overall Grade']
    ['Bill', '75.00', '100.00', '50.00', '75.00']
    ['Fred', '50.00', '50.00', '50.00', '50.00']
    ['Irving', '0.00', '0.00', '0.00', '0.00']
    ['Monty', '100.00', '100.00', '100.00', '100.00']
    []
    ['Average', '', '', '', '56.25']
    >>>
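Because the reader yields every value as a string, numeric columns have to be converted before they can be used in arithmetic. The sketch below (Python 3 syntax) recomputes the average of the Overall Grade column from the student rows shown earlier. io.StringIO stands in for an opened file so that the example does not depend on workbook1.csv being present.

```python
import csv
import io

# The student rows of the sample spreadsheet, held in memory for the demo.
data = """Name,Exam1,Exam2,Final Exam,Overall Grade
Bill,75.00,100.00,50.00,75.00
Fred,50.00,50.00,50.00,50.00
Irving,0.00,0.00,0.00,0.00
Monty,100.00,100.00,100.00,100.00
"""

fobj = io.StringIO(data)                  # behaves like an opened text file
csvreader = csv.reader(fobj)
header = next(csvreader)                  # first row holds the column names
column = header.index("Overall Grade")

grades = []
for row in csvreader:
    if row:                               # blank lines come back as []
        grades.append(float(row[column])) # convert the string to a number

average = sum(grades) / len(grades)
print(average)
```

The result matches the 56.25 stored in the Average row of the sample file.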

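Writing a CSV file is much the same as reading one, except:
the file must be opened with write ("w") enabled.
you create a csv writer (csv.writer) instead of a reader.
the method is writerow, and it takes a list of strings to be written as a row.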
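A minimal sketch of writing rows and reading them back, in Python 3 syntax. The file name grades.csv and the rows are made up for the example; newline="" is the setting the Python 3 csv documentation recommends when opening files for the csv module.

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "grades.csv")  # placeholder file name
rows = [
    ["Name", "Exam1", "Exam2"],
    ["Bill", "75.00", "100.00"],
    ["Average", "", ""],              # blank values are written as empty fields
]

fobj = open(path, "w", newline="")    # the file must be opened for writing
csvwriter = csv.writer(fobj)          # instance of csv writer
for row in rows:
    csvwriter.writerow(row)           # one list of strings per row
fobj.close()

fobj = open(path, "r", newline="")
read_back = [row for row in csv.reader(fobj)]
fobj.close()

print(read_back)
```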

The os module in Python is an interface between the operating system and the Python language. As such, it has many sub-functionalities dealing with various aspects. We will look mostly at the file-related stuff.

Whether it is Windows, Linux or OS X, every OS maintains a directory structure. A directory is a container of files or of other directories. These directories are arranged in a hierarchy or tree.

Odd trees in CS! It has a root node with branch nodes, ending in leaf nodes. It is upside down. The directory structure is a tree.

[Figure: directory tree with the nodes /, /bill, /fred, /code and /python, labeled Root, Branches and Leaves]

Directories can be organized into a hierarchy with the root directory and subsequent branch and leaf directories. Each directory can hold files and point to parent and children directories.

A path to a file is a path through the hierarchy to the node that contains the file, for example /CSE1310/python/code/myCode.py. This path goes from the root node /, to the CSE1310 directory, to the python directory, to the code directory, where the file myCode.py resides.

Think of / as an operator, indicating that something is a directory. Follow the path: the leaf is either a directory or a file.

[Figure: the same directory tree, with the nodes /, /bill, /fred, /code and /python]

A valid path string for Python is a string which indicates a valid path in the directory structure. Thus /Users/CSE1310/python/code.py is a valid path string.

It turns out that each OS has its own way of specifying a path:
C:\CSE1310\python\myFile.py
/Users/CSE1310/python/myFile.py
Nicely, Python knows that and translates to the appropriate OS.
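One way to stay independent of the path style is to let the os.path functions build and take apart path strings. The following sketch (Python 3 syntax) is not from the slides; the directory and file names are reused from the example paths purely for illustration.

```python
import os

# os.path.join glues the parts together with the separator of the OS
# the program is running on ("\\" on Windows, "/" on OS X and Linux).
path = os.path.join("CSE1310", "python", "myFile.py")

# os.path.split separates the last component from the directories before it.
directory, filename = os.path.split(path)

print(os.sep)
print(path)
print(directory, filename)
```

The printed path differs between operating systems, but the program text stays the same.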

The directory name "." is a shortcut for the name of the current directory you are in as you traverse the directory tree. The directory name ".." is a shortcut for the name of the parent directory of the current directory you are in.
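The two shortcuts can be expanded into full paths with os.path.abspath, as in this small sketch (Python 3 syntax). What it prints depends on the directory the program is started from.

```python
import os

here = os.path.abspath(".")      # full path of the current directory
parent = os.path.abspath("..")   # full path of its parent directory

# The os module also provides the two shortcut names as constants.
shortcuts = (os.curdir, os.pardir)

print(here)
print(parent)
print(shortcuts)
```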

os.getcwd(): Returns the full path of the current working directory.
os.chdir(pathstring): Changes the current directory to the path provided.
os.listdir(pathstring): Returns a list of the names of the files and directories in the path (the path may be "."). The special entries "." and ".." are not included in the list.
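A sketch of the three calls working together, in Python 3 syntax. It runs inside a scratch directory created with tempfile so that nothing in your own directories is touched; the names notes.txt and sub are placeholders.

```python
import os
import tempfile

original = os.getcwd()                            # remember where we started

# Scratch directory; realpath resolves symbolic links in the temp location.
workdir = os.path.realpath(tempfile.mkdtemp())
open(os.path.join(workdir, "notes.txt"), "w").close()   # an empty file
os.mkdir(os.path.join(workdir, "sub"))                   # a subdirectory

os.chdir(workdir)                                 # move into the scratch directory
now_in = os.getcwd()
entries = sorted(os.listdir("."))                 # names only, in sorted order

os.chdir(original)                                # go back to where we started

print(now_in)
print(entries)
```

os.listdir returns the names in arbitrary order, which is why the sketch sorts them before printing.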

os.rename(sourcepathstr, destpathstr): Renames a file or directory.
os.mkdir(pathstr): Makes a new directory. So os.mkdir("/Users/bill/python/new") creates the directory new under the directory python.
os.remove(pathstr): Removes the file.
os.rmdir(pathstr): Removes the directory, but the directory must be empty.
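The following sketch (Python 3 syntax) exercises all four functions inside a scratch directory. The names new, draft.txt and final.txt are placeholders, and the failed rmdir call is included on purpose to show the "must be empty" rule.

```python
import os
import tempfile

base = tempfile.mkdtemp()                 # scratch directory for the demo
newdir = os.path.join(base, "new")
os.mkdir(newdir)                          # make a new directory

oldname = os.path.join(newdir, "draft.txt")
newname = os.path.join(newdir, "final.txt")
open(oldname, "w").close()                # create an empty file
os.rename(oldname, newname)               # rename it
after_rename = os.listdir(newdir)

try:
    os.rmdir(newdir)                      # fails: the directory is not empty
    rmdir_failed = False
except OSError:
    rmdir_failed = True

os.remove(newname)                        # remove the file first ...
os.rmdir(newdir)                          # ... then the empty directory
still_exists = os.path.exists(newdir)

print(after_rename, rmdir_failed, still_exists)
```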