CS 111: Program Design I Lecture 15: Objects, Pandas, Modules. Robert H. Sloan & Richard Warner University of Illinois at Chicago October 13, 2016

CS 111: Program Design I Lecture 15: Objects, Pandas, Modules. Robert H. Sloan & Richard Warner, University of Illinois at Chicago, October 13, 2016

OBJECTS AND DOT NOTATION

Objects
(Implicit in Chapter 2, Variables, and 9.5, String Methods, of the book, but not explicit anywhere, so pay attention!)
Everything in Python is an object.
An object combines data (e.g., number, string, list) with methods that can act on that object.

Methods
Methods are like (a special case of) functions, but not globally accessible.
You cannot call a method just by giving its name, the way we call print(), open(), abs(), type(), range(), etc.
Method: a function that can only be accessed through an object, using dot notation.

Dot notation
To call a method, use dot notation: object_name.method()
String example:
>>> test = "This is my test string"
>>> test.upper()
'THIS IS MY TEST STRING'

If o is an object of a type having method do_it, where do_it needs an input in addition to o, and x is defined, what is the proper way to call do_it?
A. do_it(x)
B. do_it(o, x)
C. o.do_it(x)
D. o.do_it(o, x)

Methods continued
>>> test.find("my")
8
>>> 42.upper()
SyntaxError: invalid syntax
upper(test) also fails (barf), because upper is a method and not a globally accessible function.
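The string calls above can be tried as a short script. This is a minimal sketch using only the `test` string from the slide; the variable names `upper_text` and `position` are introduced here for illustration.

```python
# Methods are reached through an object with dot notation.
test = "This is my test string"

upper_text = test.upper()    # call the str method upper() on the object test
position = test.find("my")   # index at which "my" starts

print(upper_text)
print(position)

# upper is not a global function, so calling it by its bare name fails.
try:
    upper(test)
except NameError as err:
    print("NameError:", err)

# Integers have no upper() method. The parentheses around 42 are needed
# because a bare 42.upper() does not even parse.
try:
    (42).upper()
except AttributeError as err:
    print("AttributeError:", err)
```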

Methods depend on type of object
scdb.head() prints out 5 rows because head() is a method of objects of type Pandas dataframe, which is the type of the scdb object.
"test string".head() gives back an error because head is not a method of strings.
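A small sketch of the same point, assuming pandas is installed. The dataframe `frame` is a hypothetical stand-in for scdb, built from made-up values.

```python
import pandas as pd

# Hypothetical stand-in for the scdb dataframe (made-up values).
frame = pd.DataFrame({"chief": ["Warren"] * 8, "term": list(range(1960, 1968))})

top = frame.head()   # DataFrame objects have a head() method: first 5 rows
print(len(top))

# Strings do not have a head() method.
has_head = hasattr("test string", "head")
print(has_head)

try:
    "test string".head()
except AttributeError as err:
    print("AttributeError:", err)
```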

Methods' importance
Understanding key data types depends on understanding their methods.
We saw many methods for strings.
We have used the append method for lists, and will come back to more list methods.
File reference methods: write(), read(), readline(), readlines().
Pandas dataframe methods: head(), tail(), etc.

When you get to CS 341 & 342
Or if you know Java or C++ now: methods are an Object Oriented (OO) concept.
In our CS 111:
We do need to know the basics of dot notation and methods.
We will otherwise be ignoring OO, and taking a primarily procedural approach.

PANDAS (FROM ANOTHER ANGLE)

Pandas: What and Why
High-performance way to work with large dataframes.
Dataframe: the 2-d data structure most familiar from Excel spreadsheets, often with a header row.
Pandas is built to play nicely with matplotlib for plotting (and incidentally NumPy and Scikit-Learn for machine learning).

Why Pandas and not Excel
Excel is not designed for working with large datasets.
Chicago Crimes 2008 to present file: 1.04 million rows, 18 columns.
Open file in Python: instantaneous.
pandas.read_csv(): 8 secs (Sloan's 2013 laptop).
Open file in Excel: several minutes.
Just resize one column for better viewing: 5-30 sec.

Why Pandas and not Excel (2)
Excel allows you to say/do/compute whatever is built into Excel.
Python is a general-purpose programming language: you can say/do/compute anything you want, not limited to the functions Microsoft provides in Excel.
Geeky fine point: "anything" here means anything that can be done with a computer. There are uncomputable problems (theory of computation, CS 301; maybe a special lecture in this class if there is time at the end). Not really an issue in data analytics.

Pandas data types
Most important: dataframe, which we are getting from pandas.read_csv(). It is a 2-d array, with column headers.
Series: 1-d array, e.g., one column of a dataframe; it is the second most important.

Dataframe indexing
frame[columnname] returns the series from the column with name columnname.
Giving the []s a list of names selects those columns in list order. E.g., scdb[["justicename","chief","docketid"]]
Other indexing: .iloc, .loc (also others we won't cover).
Special case: a slice index applied to the whole frame slices by rows. This is for convenience, because it's a common operation, but it is inconsistent with overall Pandas syntax.
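The three kinds of [] indexing described above can be sketched as follows, assuming pandas is installed. `scdb_like` is a hypothetical miniature of scdb with made-up rows; only the column names come from the slide.

```python
import pandas as pd

# Hypothetical miniature of scdb (made-up rows).
scdb_like = pd.DataFrame({
    "justicename": ["J1", "J2", "J3", "J4"],
    "chief": ["Warren", "Warren", "Warren", "Burger"],
    "docketid": ["d1", "d2", "d3", "d4"],
    "term": [1960, 1961, 1962, 1970],
})

one_column = scdb_like["chief"]               # a single name gives a Series
some_columns = scdb_like[["term", "chief"]]   # a list gives a DataFrame, in list order
first_rows = scdb_like[:2]                    # a slice on the whole frame slices rows

print(type(one_column).__name__)
print(list(some_columns.columns))
print(len(first_rows))
```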

Dataframe positional slicing: .iloc
.iloc is for 100% positional indexing and slicing with the usual Python 0 to length-1 numbering (stands for "integer location").
Arguments for both dimensions separated by comma, [rows, cols]:
frame.iloc[:3, :4]   upper left 3 rows/4 cols
frame.iloc[:, :3]   all rows, first 3 cols
One argument: rows
frame.iloc[3:6]   second 3 rows
frame.iloc[42]   the row at position 42 (counting from 0)
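A runnable sketch of the .iloc calls above, assuming pandas is installed. The 50-row, 5-column `frame` is a hypothetical example built so that each value reveals its own position.

```python
import pandas as pd

# Hypothetical frame: column "a" holds 0..49, "b" holds 100..149, and so on.
frame = pd.DataFrame(
    {name: list(range(i * 100, i * 100 + 50)) for i, name in enumerate("abcde")}
)

upper_left = frame.iloc[:3, :4]   # first 3 rows, first 4 columns
first_cols = frame.iloc[:, :3]    # all rows, first 3 columns
rows_3_to_5 = frame.iloc[3:6]     # positions 3, 4, 5 (end is excluded)
row_42 = frame.iloc[42]           # a single row comes back as a Series

print(upper_left.shape)
print(first_cols.shape)
print(list(rows_3_to_5.index))
print(row_42["a"])
```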

Dataframe label indexing: .loc
Use .loc to access by labels, or mix.
A selection list will put columns in the list's order; a selection set in {}s keeps the original dataframe order. (Recent pandas releases no longer accept a set as an indexer; use a list there.)
scdb.loc[3:6, {'docketid', 'chief', 'justicename'}]   Rows 3 through 6 inclusive, columns in scdb's order
scdb.loc[3:6, ['docketid', 'chief', 'justicename']]   Rows 3 through 6 inclusive, columns in order ['docketid', 'chief', 'justicename']
Notice .loc uses slices inclusive of both ends, unlike all the rest of Python & Pandas.
.loc with integer slices for both rows and columns: error when the columns are labeled by name (e.g., foo.loc[3:6, 2:4])
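A sketch of the inclusive-slice behavior of .loc next to .iloc, assuming pandas is installed. `scdb_like` is a hypothetical 8-row stand-in for scdb with made-up values and the default integer row labels.

```python
import pandas as pd

# Hypothetical stand-in for scdb (made-up values, row labels 0..7).
scdb_like = pd.DataFrame({
    "justicename": ["J%d" % i for i in range(8)],
    "chief": ["Warren"] * 8,
    "docketid": ["d%d" % i for i in range(8)],
})

# Label-based: rows labeled 3 through 6 INCLUSIVE, columns in the list's order.
by_label = scdb_like.loc[3:6, ["docketid", "chief", "justicename"]]

# Position-based: positions 3, 4, 5 only (the end is excluded).
by_position = scdb_like.iloc[3:6]

print(list(by_label.index), list(by_label.columns))
print(list(by_position.index))
```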

Dataframe and series methods
head(): returns a sub-dataframe (top rows) or, for a series, the first entries.
tail(): same, bottom rows.
With no argument they default to 5 rows; can give a positive integer argument for the number of rows.
count(): for a series, returns the number of values (excluding missing, NaN, etc.). For a dataframe, returns a series with the count of each column, labeled by column.

Dataframe and series methods (cont.)
abs, max, mean, median, min, mode, sum
All behave like count, except they will give errors if the data types don't support the operation.
E.g., a series of strings does return a good answer with the .max() method (based on alphabetical order), but cannot take .median().
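The methods from these two slides can be tried on a tiny example, assuming pandas is installed. The data below is made up for illustration, and the exact exception type raised by .median() on strings is an assumption that may differ between pandas versions, so the sketch catches both likely types.

```python
import pandas as pd

# Made-up data with one missing score.
frame = pd.DataFrame({
    "name": ["Ann", "Bob", "Cat", "Dee", "Eli", "Fay", "Gus"],
    "score": [90.0, 85.0, 70.0, None, 60.0, 75.0, 80.0],
})

top = frame.head()                       # default: first 5 rows
bottom = frame.tail(2)                   # last 2 rows
score_count = frame["score"].count()     # missing value is not counted
column_counts = frame.count()            # Series of counts, labeled by column
median_score = frame["score"].median()   # missing value is skipped

names = pd.Series(["pear", "apple", "fig"])
largest_name = names.max()               # alphabetical order

try:
    names.median()
    median_failed = False
except (TypeError, ValueError):
    median_failed = True

print(score_count, median_score, largest_name, median_failed)
```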

Plotting
Both DataFrame and Series have a plot() method (as do many other Pandas types).
Must have loaded Python's plotting module, because Pandas is making use of it: import matplotlib.pyplot as plt
Default: a Series makes a line graph; a DataFrame makes one line graph per column, and labels each line by column label.

100% Optional: Aside for graph geeks
Optional, for fun. To change the style of your plot:
import matplotlib
matplotlib.style.use('fivethirtyeight')
# OR
matplotlib.style.use('ggplot')  # R style
Out of the box, it's Matlab style, which some folks like a lot.

.plot() method
Needs no arguments.
Has optional arguments such as kind: .plot(kind='bar') for bar graphs.
Many others, including:
'hist' for histogram
'box' for box with whiskers
'area' for stacked area plots
'scatter' for scatter plots
'pie' for pie charts

.plot() x and y arguments
If you have a dataframe but want one column as x values and one as y values, can use optional argument(s).
df.plot(x='Year')
Plots all columns but 'Year' as line graphs against x being the Year column.
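A hedged sketch of the plot() options above, assuming pandas and matplotlib are installed. The numbers are made up (not real Chicago data), the column names other than 'Year' are hypothetical, and the "Agg" backend is an assumption chosen so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: draw off-screen so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

# Made-up numbers purely for illustration.
df = pd.DataFrame({
    "Year": [2009, 2010, 2011, 2012],
    "Murders": [10, 12, 9, 11],
    "Robberies": [30, 28, 33, 31],
})

# One line per column other than 'Year', plotted against the Year column.
line_axes = df.plot(x="Year")

# Same data as a bar graph, using the optional kind argument.
bar_axes = df.plot(x="Year", kind="bar")

line_count = len(line_axes.get_lines())
bar_count = len(bar_axes.patches)
print(line_count, bar_count)

plt.close("all")
```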

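Brief demo: Chi murder rate by year
import matplotlib.pyplot as plt
from matplotlib import style
import pandas
# Open the csv file and read it into a dataframe
f = open("chicago murders to 2012HeaderRows.csv", "r")
df = pandas.read_csv(f)
# Turn on interactive mode so plots show up right away
plt.ion()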

groupby(label) method
Idea: split the dataframe into groups that all have the same value in the column named label. E.g., grouped = scdb.groupby("justicename")
grouped has many of the same methods and indexing options as a dataframe.
grouped.count() → dataframe with 60 columns (all but justice name) and 1 row per justicename
grouped["docketid"] selects out that column
We plotted grouped["docketid"].count(); it has a plot() method.

A series and series groupby method
nunique() is a method of a true series, where it gives the number of distinct values in the series (a number).
nunique() is a method of a series-like groupby object, where it gives a true series: how many distinct values were in each group.
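The groupby steps from the last two slides can be sketched on a tiny frame, assuming pandas is installed. `scdb_like` is a hypothetical stand-in for scdb with made-up rows, and the sketch assumes the distinct-value method discussed is pandas' nunique().

```python
import pandas as pd

# Hypothetical stand-in for scdb (made-up rows).
scdb_like = pd.DataFrame({
    "justicename": ["A", "A", "A", "B", "B", "C"],
    "docketid": ["d1", "d2", "d2", "d1", "d3", "d4"],
    "chief": ["Warren"] * 6,
})

grouped = scdb_like.groupby("justicename")

counts = grouped.count()                       # DataFrame: 1 row per justicename
docket_counts = grouped["docketid"].count()    # Series: rows per group

distinct_overall = scdb_like["docketid"].nunique()    # a single number
distinct_per_group = grouped["docketid"].nunique()    # Series: one number per group

print(list(counts.columns))
print(docket_counts.to_dict())
print(distinct_overall)
print(distinct_per_group.to_dict())
```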

PYTHON STANDARD LIBRARY & BEYOND: MODULES

Extending Python
Every modern programming language has a way to extend the basic functions of the language with new ones.
Python: importing a module.
module: Python file with new capabilities defined in it.
Once you import a module, it's as if you typed it in: you get all functions, objects, variables defined in it immediately.

Python Standard Library
Python always comes with a big set of modules.
List at https://docs.python.org/3/py-modindex.html
Examples:
csv        Read/write csv files
datetime   Basic date & time types
math       Math stuff (e.g., sin(), cos())
os         E.g., list files in your operating system
random     Random number generation
urllib     Open URLs, parse URLs
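A few of the modules listed above in action. This is a minimal sketch using only the standard library; the csv text is made up and read from memory (via io.StringIO, which is not on the slide) so that no file is needed.

```python
import csv
import datetime
import io
import math

# datetime: basic date type, here the date of this lecture.
lecture_day = datetime.date(2016, 10, 13)
weekday_number = lecture_day.weekday()   # Monday is 0

# math: trig functions.
cosine_of_zero = math.cos(0)

# csv: parse made-up csv text held in memory instead of a file.
csv_text = "year,count\n2011,5\n2012,7\n"
rows = list(csv.reader(io.StringIO(csv_text)))

print(weekday_number)
print(cosine_of_zero)
print(rows)
```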

Using Modules
Use import <module_name> to make a module's functions available.
Style: put all import statements at top of file.
After import module_name, access its functions (and variables, etc.) through module_name.function_name
If module_name is long, can abbreviate in the import with as:
import module_name as m
m.function_name

If you prefer to save typing (I mostly do not do this)
To access function_name without having to type the module_name prefix, use:
from module_name import function_name
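The three import forms can be compared side by side. This is a small sketch using standard library modules; the abbreviation `rnd` and the seed value are arbitrary choices made for the example.

```python
import math                 # plain import: use the math. prefix
import random as rnd        # abbreviated import: use the rnd. prefix
from math import sqrt       # from-import: no prefix needed

area = math.pi * 2 ** 2     # module_name.variable_name
rnd.seed(111)               # arbitrary seed so the run is repeatable
roll = rnd.randint(1, 6)    # m.function_name
root = sqrt(16)             # function_name on its own

print(area)
print(roll)
print(root)
```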

Common but not standard
matplotlib and pandas are not part of the set of modules that must come with every Python 3.
matplotlib is very, very widely used, and pandas is widely used.
Both are among the many modules that come with the Anaconda distribution of Python 3.