Examples from first year calculus (continued), file I/O, Benford's Law

Examples from first year calculus (continued), file I/O, Benford's Law. Matt Valeriote, 5 February 2018

Grid and Bisection methods to find a root

Assume that f(x) is a continuous function on the real numbers. Suppose that a < b and that either f(a) < 0 and f(b) > 0, or f(b) < 0 and f(a) > 0. By the Intermediate Value Theorem, there is some number c between a and b with f(c) = 0. The Grid and Bisection methods can be used to approximate such a number c. (Note that there may be more than one root of f in the interval between a and b.)

Grid Method: Break the interval [a, b] into n subintervals of equal size, with endpoints x_0 = a, x_1, ..., x_(n-1), x_n = b. Compute f(x_0), f(x_1), ..., f(x_n). Find the index i such that f(x_i) is closest to 0 and use x_i to approximate a root of f in the interval [a, b].

Project: Create a function grid_search(f, a, b, n) that implements the grid method.
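One possible sketch of the grid method follows. It is only an illustration of the description above, not the course's reference solution; the test function x**2 - 2 and the choice n = 1000 are assumptions made for the demonstration.

```python
def grid_search(f, a, b, n):
    """Approximate a root of f on [a, b] using n equal subintervals."""
    if n < 1:
        raise ValueError("n must be at least 1")
    step = (b - a) / n
    # Grid points x_0 = a, x_1, ..., x_n = b
    points = [a + i * step for i in range(n + 1)]
    # Return the grid point whose function value is closest to 0
    return min(points, key=lambda x: abs(f(x)))


# Illustrative use: approximate the square root of 2 on [0, 2]
approx = grid_search(lambda x: x**2 - 2, 0, 2, 1000)
print(approx)
```

The accuracy is limited by the grid spacing (b - a)/n, so halving the error requires roughly doubling the number of function evaluations.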

Bisection Method: Bisect the interval [a, b] into two equal subintervals [a, m] and [m, b], where m = (a + b)/2. If f(a) and f(m) have opposite signs, then there will be a root in [a, m]. Otherwise, there will be a root in [m, b]. Bisect this subinterval ([a, m] in the former case, [m, b] in the latter), and continue bisecting until the subinterval is small. A root of f will be located in this small subinterval.

Project: Create a function bisect(f, a, b, tol) that approximates a root of f in the interval [a, b] with an error of at most tol.
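A minimal sketch of the bisection method, assuming that f(a) and f(b) have opposite signs and that tol is positive. The stopping rule (halve until half the interval length is at most tol, then return the midpoint) is one reasonable reading of "an error of at most tol", not necessarily the intended one.

```python
def bisect(f, a, b, tol):
    """Approximate a root of f in [a, b] to within tol by bisection."""
    if tol <= 0:
        raise ValueError("tol must be positive")
    fa, fb = f(a), f(b)
    if fa == 0:
        return a
    if fb == 0:
        return b
    if fa * fb > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    # The midpoint of [a, b] is within (b - a)/2 of every point in it,
    # so stop once half the interval length is at most tol.
    while (b - a) / 2 > tol:
        m = (a + b) / 2
        fm = f(m)
        if fm == 0:
            return m
        if fa * fm < 0:
            b = m            # sign change in [a, m]
        else:
            a, fa = m, fm    # sign change in [m, b]
    return (a + b) / 2


# Illustrative use: approximate the square root of 2 on [0, 2]
root = bisect(lambda x: x**2 - 2, 0, 2, 1e-8)
print(root)
```

Each pass through the loop halves the interval, so the number of function evaluations grows only with the logarithm of (b - a)/tol.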

Benford's law

Benford's law describes the (surprising) distribution of first digits of many different sets of numbers. Read about it on Wikipedia or MathWorld.

Benford's law states that in listings, tables of statistics, etc., the digit 1 tends to occur with probability ~30%, much greater than the expected 11.1% (i.e., one digit out of 9). Benford's law can be observed, for instance, by examining tables of logarithms and noting that the first pages are much more worn and smudged than later pages (Newcomb 1881).

We'll write a Python function benford_count that tabulates the occurrence of digits from a set of numbers. But where do we get our numbers from?
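The ~30% figure agrees with the usual quantitative statement of Benford's law, which is not spelled out in these slides: the leading digit d occurs with probability log10(1 + 1/d). The short sketch below tabulates those expected frequencies; the name benford_expected is made up for this illustration.

```python
import math

# Expected frequency of each leading digit d = 1, ..., 9
# under Benford's law: P(d) = log10(1 + 1/d)
benford_expected = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

for d, p in benford_expected.items():
    print(d, f"{p:.1%}")
```

For d = 1 this gives log10(2), about 0.301, and the nine probabilities sum to 1. These values are what the counts produced by benford_count can later be compared against.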

File I/O

A bunch of ways to read a file. Reference:

    f = open(filename, mode)

(mode = 'r' for reading, 'w' for writing, 'a' for appending; text by default; returns a file object)

    f = open("test.txt", 'r')
    print(f)
    print(f.closed)
    f.close()
    print(f.closed)

To free up system resources, files should be closed when they are no longer needed.
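The slides close files by hand with f.close(). A common alternative, shown here only as a sketch, is Python's with statement, which closes the file automatically when the block ends. The temporary directory and the sample text are assumptions that keep the example self-contained.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.txt")

    # Write a small sample file
    with open(path, "w") as f:
        f.write("first line\nsecond line\n")

    # Read it back; the file is open only inside the with block
    with open(path, "r") as f:
        inside = f.closed
        contents = f.read()
    after = f.closed

print(inside)   # state of f.closed inside the block
print(after)    # state of f.closed after the block
```

The file is also closed if an error occurs inside the block, which is harder to guarantee with an explicit f.close().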

If a file is located in another directory, its path must be provided in the open statement.

    f = open('/users/matt/documents/temp/temp.txt', 'r')
    f.close()

Once a file has been opened, its entire contents can be read in as a string using the read() method on the associated file object.

    f = open('test.txt', 'r')
    contents = f.read()
    print(contents)
    print(repr(contents))
    f.close()

Note that each line of the file ends in the newline character \n.

The read method can also be used to read a specified number of characters from the file: f.read(10) will read the next 10 characters. If the end of the file has been reached, it returns '', the empty string. The file object keeps track of how much has been read.

    f = open('test.txt', 'r')
    print('first 10 characters of the file:')
    print(f.read(10))
    print('next 6 characters (this includes the end-of-line character `\\n`):')
    print(f.read(6))
    print("this is the rest of the file:")
    contents = f.read()
    print(contents)
    f.close()
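Because read(n) returns the empty string at the end of the file, it can drive a loop that processes a file in fixed-size pieces. The sketch below illustrates this; the function name count_characters, the chunk size, and the sample text are all made up for the example.

```python
import os
import tempfile


def count_characters(filename, chunk_size=10):
    """Count the characters in a text file by reading it in chunks."""
    total = 0
    f = open(filename, 'r')
    chunk = f.read(chunk_size)
    while chunk != '':           # '' signals the end of the file
        total += len(chunk)
        chunk = f.read(chunk_size)
    f.close()
    return total


# Illustrative use on a small temporary file
sample_text = "line one\nline two\nline three\n"
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.txt")
    f = open(path, 'w')
    f.write(sample_text)
    f.close()
    n_chars = count_characters(path)

print(n_chars)
```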

The lines of the file can be stored in a list of strings using the readlines() method.

    f = open('test.txt')
    lines = f.readlines()
    # Print the list of lines.
    # Note that each line ends with a newline character \n.
    print(lines)
    print(lines[2])
    f.close()

A common way to process a file is with a for loop that reads its lines:

    f = open("test.txt")
    for line in f:
        print(line)
        line2 = line.upper()
        print(line2)
    f.close()

More I/O details

To get rid of pesky newlines and leading and trailing whitespace, use the strip() method:

    f = open('test.txt')
    for line in f:
        print(repr(line))
        print(line.strip())
    f.close()

To break a string (or line) into words, use the split() method for strings:

    f = open("test.txt")
    line = f.readline()   # read one line
    words = line.strip().split(" ")
    print(words)
    f.close()
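One detail worth knowing when splitting lines: split(" ") splits at every single space, so runs of spaces produce empty strings, while split() with no argument splits on any run of whitespace. The line below is made-up sample data, not taken from a real data set.

```python
# A made-up line with uneven spacing and a trailing newline
line = "  Sampletown   Ontario Canada 12345\n"

by_single_space = line.strip().split(" ")
by_whitespace = line.split()

print(by_single_space)  # ['Sampletown', '', '', 'Ontario', 'Canada', '12345']
print(by_whitespace)    # ['Sampletown', 'Ontario', 'Canada', '12345']
```

With split() there is no need to call strip() first, and the last word is always by_whitespace[-1] regardless of the spacing.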

And even more

- import os in order to find out the working directory (os.getcwd()) or to set it (os.chdir(newpath)); use a full path, or use '..' to go up one level.
- Opening a URL from the web: import urllib.request as ur; ur.urlopen(url)

    # open up a webpage using the urllib package
    import urllib.request as ur

    # open the 1mp3 webpage
    web_page = ur.urlopen('https://ms.mcmaster.ca/~matt/1mp3.html')
    lines = web_page.readlines()
    print(lines[25])

Benford's Law

Project: Create a Python function benford_count that tabulates the occurrence of leading digits from a set of numbers. The set of numbers should be read in from a file. The filename should be an argument to the function. The function should return a tuple digits_count of length 10 (one entry for each digit) such that digits_count[i] is equal to the number of occurrences of the digit i as the leading digit of some number from the file, for i = 0, 1, ..., 9. So digits_count[2] is the number of times the digit 2 occurred as the leading digit of some number from the file.

Note that the digit 0 doesn't occur as the leading digit of a number, except for the number 0, so we really don't need to count the number of 0's that occur. A better way to handle this is with a Python dictionary object.

File Format

In order to properly process a data file, we need to make some assumptions about its format. We will assume that each line contains a number of words, for example, the name of a town, followed by its province/state and country, etc., and that the last word will be a number that represents the size of the quantities being counted in the file.

Steps:
1. Initialize a digits_count list of length 10.
2. Open the file.
3. For each line in the file:
   3.1. Retrieve the last word from the line.
   3.2. If it is a string of digits that doesn't start with 0:
        3.2.1. Get the leading digit and update digits_count.
4. Return tuple(digits_count).
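The steps above can be sketched as follows. This is one possible implementation under the stated file-format assumption, not the official solution to the project; the sample lines written to the temporary file are invented purely to exercise the function.

```python
import os
import tempfile


def benford_count(filename):
    """Return a tuple of leading-digit counts for the numbers in a file."""
    digits_count = [0] * 10                      # step 1
    with open(filename) as f:                    # step 2
        for line in f:                           # step 3
            words = line.split()
            if not words:                        # skip blank lines
                continue
            last = words[-1]                     # step 3.1
            # step 3.2: all digits, and the first one is not 0
            if last.isdigit() and last[0] in "123456789":
                digits_count[int(last[0])] += 1  # step 3.2.1
    return tuple(digits_count)                   # step 4


# Illustrative use with invented data
sample = (
    "Alphaville ON Canada 1200\n"
    "Betatown QC Canada 305\n"
    "Gammaburg BC Canada 17\n"
    "\n"
    "Deltaport NS Canada unknown\n"
    "Epsilon AB Canada 0\n"
)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "towns.txt")
    with open(path, "w") as f:
        f.write(sample)
    counts = benford_count(path)

print(counts)
```

Lines whose last word is not a plain string of digits (here "unknown") and the number 0 are skipped, so entry 0 of the tuple always stays at zero.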

Other considerations

- A source for some interesting data sets to use is: http://www.testingbenfordslaw.com/.
- Modify the code so that it will process a list of files.
- Improve the code so that it can correctly deal with a general number format of the form: 123,456,789.123456...
- Enable the program to directly access suitable files from the internet.
- The file that we created contains several function definitions. This file could be imported by other files in order to make use of the functions. To prevent the code at the bottom of the file from being executed when the file is imported, first test whether the interpreter's name for the file, __name__, is "__main__". This will be the case when the file itself is being executed and not imported.
- Use a Python dictionary instead of a tuple or list to keep track of the digit counts.
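A hedged sketch of how several of these considerations might fit together: a helper that extracts the leading digit from numbers written with commas and a decimal point, a dictionary-based counter that accepts a list of files, and a __name__ test at the bottom. The names leading_digit and benford_count_dict and the regular expression are assumptions made for this illustration; the pattern is deliberately lenient about where commas appear.

```python
import re

# Optional sign, digits with optional commas, optional decimal part
NUMBER = re.compile(r"[+-]?[0-9][0-9,]*(\.[0-9]+)?")


def leading_digit(word):
    """Return the first nonzero digit of a number string, or None."""
    if NUMBER.fullmatch(word) is None:
        return None               # not a number in the assumed format
    for ch in word:
        if ch in "123456789":
            return int(ch)
    return None                   # the number is 0


def benford_count_dict(filenames):
    """Count leading digits 1-9 of the last word on each line of each file."""
    counts = {d: 0 for d in range(1, 10)}
    for filename in filenames:
        with open(filename) as f:
            for line in f:
                words = line.split()
                if words:
                    d = leading_digit(words[-1])
                    if d is not None:
                        counts[d] += 1
    return counts


if __name__ == "__main__":
    # Runs only when this file is executed directly, not when imported
    print(leading_digit("123,456,789.123456"))
```

Using a dictionary keyed by the digits 1 to 9 avoids the unused entry for 0 that the tuple version carries.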