CS 111: Program Design I
Lecture 15: Objects, Pandas, Modules
Robert H. Sloan & Richard Warner
University of Illinois at Chicago
October 13, 2016
OBJECTS AND DOT NOTATION
Objects
(Implicit in Chapter 2, Variables, & 9.5, String Methods, of the book, but not explicit anywhere, so pay attention!)
Everything in Python is an object.
An object combines data (e.g., number, string, list) with methods that can act on that object.
Methods
Methods: like (a special case of) functions, but not globally accessible.
Cannot call a method just by giving its name, the way we call print(), open(), abs(), type(), range(), etc.
Method: a function that can only be accessed through an object, using dot notation.
Dot notation
To call a method, use dot notation: object_name.method()
String example:
>>> test = "This is my test string"
>>> test.upper()
'THIS IS MY TEST STRING'
If o is an object of a type having method do_it, where do_it needs an input in addition to o, and x is defined, what is the proper way to call do_it?
A. do_it(x)
B. do_it(o, x)
C. o.do_it(x)
D. o.do_it(o, x)
Methods continued
>>> test.find("my")
8
>>> 42.upper()
SyntaxError: invalid syntax
upper(test) also fails: upper is a method, not a global function, so Python raises a NameError.
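A minimal sketch of the difference between calling a method and calling a global function, reusing the test string from the slides. The try/except and the variable names shouted, position, and error_name are only there for illustration, so that the example runs to the end.

```python
test = "This is my test string"

# Methods are reached through the object with dot notation.
shouted = test.upper()
position = test.find("my")   # index where "my" starts in the string

# upper is not a global function, so calling it by name alone fails.
try:
    upper(test)
    error_name = None
except NameError as err:
    error_name = type(err).__name__

print(shouted)     # THIS IS MY TEST STRING
print(position)    # 8
print(error_name)  # NameError
```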
Methods depend on type of object
scdb.head() prints out 5 rows because head() is a method of objects of type Pandas dataframe, which is the type of the scdb object.
"test string".head() gives back an error because head is not a method of strings.
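The same idea can be tried without any dataframe. The sketch below uses a list and a string (both made up for illustration) and the built-in hasattr() to ask whether an object has a given method; it is one way to see the point, not the only one.

```python
text = "test string"
items = [3, 1, 2]

# append is a list method, so it works on a list...
items.append(4)

# ...but strings have no append method, so this raises AttributeError.
try:
    text.append("!")
    message = "no error"
except AttributeError as err:
    message = str(err)

# hasattr() reports whether an object has a method with the given name.
list_has_append = hasattr(items, "append")
str_has_head = hasattr(text, "head")

print(message)
print(list_has_append, str_has_head)  # True False
```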
Methods' importance
Understanding key data types depends on understanding their methods.
We saw many methods for strings.
We have used the append method for lists, and will come back to more list methods.
File reference methods: write(), read(), readline(), readlines()
Pandas dataframe methods: head(), tail(), etc.
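As a reminder of how the file methods listed above behave, here is a small sketch. The file name demo.txt and its contents are made up, and the file is written to a temporary directory so nothing else is touched. The with statement is just one convenient way to make sure the file gets closed.

```python
import os
import tempfile

# Hypothetical scratch file in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

with open(path, "w") as f:
    f.write("first line\nsecond line\nthird line\n")

with open(path, "r") as f:
    first = f.readline()    # one line, including its newline
    rest = f.readlines()    # the remaining lines, as a list of strings

with open(path, "r") as f:
    everything = f.read()   # the whole file as one string

print(repr(first))
print(rest)
```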
When you get to CS 341 & 342
Or if you know Java or C++ now: methods are an Object Oriented (OO) concept.
In our CS 111:
We do need to know the basics of dot notation and methods.
We will otherwise be ignoring OO, and taking a primarily procedural approach.
PANDAS (FROM ANOTHER ANGLE)
Pandas: What and Why
High performance way to work with large dataframes.
Dataframe: the 2-d data structure most familiar from Excel spreadsheets, often with a header row.
Pandas is built to play nicely with matplotlib for plotting (and incidentally NumPy, and scikit-learn for machine learning).
Why Pandas and not Excel
Excel is not designed for working with large datasets.
Chicago Crimes 2008 to present file: 1.04 million rows, 18 columns
Open file in Python: instantaneous
pandas.read_csv(): 8 secs (Sloan's 2013 laptop)
Open file in Excel: several minutes
Just resize one column for better viewing: 5-30 sec
Why Pandas and not Excel (2)
Excel allows you to say/do/compute whatever is built into Excel.
Python is a general purpose programming language: can say/do/compute anything you want, not limited to the functions Microsoft provides in Excel.
Geeky fine point: anything that can be done with a computer. There are uncomputable problems (theory of computation, CS 301; maybe a special lecture in this class if there is time at the end). Not really an issue in data analytics.
Pandas data types
Most important: dataframe, which we are getting from pandas.read_csv(). A 2-d array, with column headers.
Series: 1-d array, e.g., one column of a dataframe, is second most important.
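A small sketch of the two types. The three-row CSV text here is made up and read from a string rather than a file, purely so the example is self-contained; the column names are borrowed from the scdb examples in these slides.

```python
import io
import pandas

# Hypothetical CSV content standing in for a real file on disk.
csv_text = "justicename,chief,docketid\nA,Burger,1\nB,Burger,2\nC,Roberts,3\n"
df = pandas.read_csv(io.StringIO(csv_text))

column = df["chief"]   # one column of the dataframe

print(type(df).__name__)      # DataFrame
print(type(column).__name__)  # Series
print(df.shape)               # (3, 3): 3 rows, 3 columns
```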
Dataframe indexing
frame[columnname] returns the series from the column with name columnname.
Giving the []s a list of names selects those columns in list order. E.g., scdb[["justicename", "chief", "docketid"]]
Other indexing: .iloc, .loc (also others we won't cover)
Special case: a slice index applied to the whole frame slices by rows. This is for convenience because it's a common operation, but it is inconsistent with overall Pandas syntax.
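The three forms of [] indexing described above, sketched on a tiny made-up frame (the data values are hypothetical; only the column names come from the slides):

```python
import pandas

frame = pandas.DataFrame({
    "justicename": ["A", "B", "C", "D"],
    "chief": ["Burger", "Burger", "Roberts", "Roberts"],
    "docketid": [10, 11, 12, 13],
})

one_column = frame["chief"]                     # a single name gives a Series
reordered = frame[["docketid", "justicename"]]  # a list gives a DataFrame, in list order
first_two_rows = frame[:2]                      # a slice on the whole frame selects rows

print(list(reordered.columns))  # ['docketid', 'justicename']
print(len(first_two_rows))      # 2
```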
Dataframe positional slicing: iloc
.iloc is for 100% positional indexing and slicing with the usual Python 0 to length-1 numbering (stands for "integer location").
Arguments for both dimensions separated by comma [rows, cols]:
frame.iloc[:3, :4]   upper left 3 rows/4 cols
frame.iloc[:, :3]   all rows, first 3 cols
One argument: rows
frame.iloc[3:6]   second 3 rows
frame.iloc[42]   row at position 42 (the 43rd row, since numbering starts at 0)
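A quick way to check these on a frame small enough to read by eye. The 6-by-5 frame below is made up so that each value encodes its own position (value = 10 * row + column).

```python
import pandas

# Hypothetical frame: the entry in row r, column c is 10*r + c.
frame = pandas.DataFrame(
    [[10 * r + c for c in range(5)] for r in range(6)],
    columns=["a", "b", "c", "d", "e"],
)

upper_left = frame.iloc[:3, :4]   # first 3 rows, first 4 columns
first_cols = frame.iloc[:, :3]    # all rows, first 3 columns
rows_3_to_5 = frame.iloc[3:6]     # rows at positions 3, 4, 5
row_2 = frame.iloc[2]             # a single row comes back as a Series

print(upper_left.shape)           # (3, 4)
print(list(rows_3_to_5.index))    # [3, 4, 5]
print(row_2["a"])                 # 20
```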
Dataframe label indexing: .loc
Use .loc to access by labels, or mix.
A selection list will put columns in the list's order; a selection set in {}s gives the original dataframe order. (Newer Pandas versions reject a set here and require a list.)
scdb.loc[3:6, {'docketid', 'chief', 'justicename'}]   Rows 3 through 6 inclusive, columns in scdb's order
scdb.loc[3:6, ['docketid', 'chief', 'justicename']]   Rows 3 through 6 inclusive, columns in order ['docketid', 'chief', 'justicename']
Notice .loc uses slices inclusive of both ends, unlike all the rest of Python & Pandas.
.loc with only integer slices: error when the column labels are names rather than integers (e.g., foo.loc[3:6, 2:4])
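The inclusive-slice behavior is easy to miss, so here is a small comparison. The eight-row frame is made up, with the default row labels 0 through 7; only the list form of column selection is shown.

```python
import pandas

frame = pandas.DataFrame({
    "justicename": list("ABCDEFGH"),
    "chief": ["Burger"] * 4 + ["Roberts"] * 4,
    "docketid": range(100, 108),
})

# .loc slices by label and includes BOTH ends: labels 3, 4, 5, 6.
by_label = frame.loc[3:6, ["docketid", "chief"]]

# .iloc slices by position with the usual Python rule: positions 3, 4, 5.
by_position = frame.iloc[3:6]

print(len(by_label))             # 4
print(len(by_position))          # 3
print(list(by_label.columns))    # ['docketid', 'chief']
```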
Dataframe and series methods
head(): returns sub-dataframe (top rows) or, for series, first entries
tail(): same, bottom rows
With no argument they default to 5 rows; can give a positive integer argument for number of rows.
count(): For series, returns number of values (excluding missing, NaN, etc.). For dataframe, returns series, with count of each column, labeled by column.
Dataframe and series methods (cont.)
abs, max, mean, median, min, mode, sum
All behave like count, except they will give errors if the data types don't support the operation.
E.g., a series of strings does return a good answer with the .max() method (based on alphabetical order), but cannot take .median()
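A sketch of these methods on a made-up seven-row frame with two missing scores. The names and numbers are hypothetical; the point is how the missing values and the string column are treated.

```python
import pandas

frame = pandas.DataFrame({
    "name": ["Ann", "Bob", "Cy", "Dee", "Eve", "Flo", "Gus"],
    "score": [3.0, None, 5.0, 1.0, None, 4.0, 2.0],
})

top = frame.head()         # no argument: first 5 rows
bottom = frame.tail(2)     # last 2 rows

n_scores = frame["score"].count()        # missing values are not counted
counts = frame.count()                   # one count per column, as a Series
last_name = frame["name"].max()          # strings compare alphabetically
median_score = frame["score"].median()   # median of 3, 5, 1, 4, 2

print(len(top), len(bottom))    # 5 2
print(n_scores)                 # 5
print(last_name, median_score)  # Gus 3.0
```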
Plotting
Both DataFrame and Series have a plot() method (as do many other Pandas types).
Must have loaded Python's plotting module, because Pandas is making use of it:
import matplotlib.pyplot as plt
Default: Series makes a line graph; DataFrame makes one line graph per column, and labels each line by column labels.
100% Optional: Aside for graph geeks
Optional for fun: to change the style of your plot:
import matplotlib
matplotlib.style.use('fivethirtyeight')
# OR
matplotlib.style.use('ggplot')  # R style
Out of the box, it's Matlab style, which some folks like a lot.
.plot() method
Needs no arguments.
Has optional arguments such as kind: .plot(kind='bar') for bar graphs
Many others including:
'hist' for histogram
'box' for box with whiskers
'area' for stacked area plots
'scatter' for scatter plots
'pie' for pie charts
.plot() x and y arguments
If you have a dataframe but want one column as x values and one as y values, can use optional argument(s):
df.plot(x='year')
Plots all columns but 'year' as line graphs, with x being the 'year' column. The name must match the column label exactly, including capitalization.
Brief demo: Chi murder rate by year
import matplotlib.pyplot as plt
from matplotlib import style
import pandas
f = open("chicago murders to 2012HeaderRows.csv", "r")
df = pandas.read_csv(f)
plt.ion()
groupby(label) method
Idea: split the dataframe into groups that all have the same value in the column named label. E.g.,
grouped = scdb.groupby("justicename")
grouped has many of the same methods and indexing options as a dataframe.
grouped.count() → dataframe with 60 columns (all but justice name) and 1 row per justicename
grouped["docketid"] selects out that column
We plotted grouped["docketid"].count(); it has a plot() method.
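The split-then-count idea on a frame small enough to check by hand. The votes frame below is made up (three justices, six rows); the real scdb data has many more columns.

```python
import pandas

votes = pandas.DataFrame({
    "justicename": ["A", "B", "A", "C", "B", "A"],
    "docketid": [1, 1, 2, 2, 3, 3],
    "chief": ["Burger"] * 6,
})

grouped = votes.groupby("justicename")

# One row per justice, one column for every column except the grouping one.
per_justice = grouped.count()

# Selecting a column first gives a Series: one count per justice.
dockets_per_justice = grouped["docketid"].count()

print(list(per_justice.index))    # ['A', 'B', 'C']
print(list(per_justice.columns))  # ['docketid', 'chief']
print(dockets_per_justice["A"])   # 3
```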
A series and series groupby method
nunique() is a method of a true series, where it gives the number of distinct values in the series (a number).
nunique() is also a method of a series-like groupby object, where it gives a true series: how many distinct values were in each group.
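A sketch of both uses, assuming the method meant on this slide is Pandas' nunique() (the one that returns a count of distinct values). The five-row frame is made up.

```python
import pandas

votes = pandas.DataFrame({
    "justicename": ["A", "A", "A", "B", "B"],
    "docketid": [1, 1, 2, 3, 3],
})

# On a plain Series: a single number.
distinct_overall = votes["docketid"].nunique()

# On a grouped column: a Series with one number per group.
distinct_per_group = votes.groupby("justicename")["docketid"].nunique()

print(distinct_overall)          # 3
print(distinct_per_group["A"])   # 2
print(distinct_per_group["B"])   # 1
```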
PYTHON STANDARD LIBRARY & BEYOND: MODULES
Extending Python
Every modern programming language has a way to extend the basic functions of the language with new ones.
Python: importing a module
module: Python file with new capabilities defined in it
Once you import a module, it's as if you typed it in: you get all functions, objects, variables defined in it immediately.
Python Standard Library
Python always comes with a big set of modules.
List at https://docs.python.org/3/py-modindex.html
Examples:
csv        Read/write csv files
datetime   Basic date & time types
math       Math stuff (e.g., sin(), cos())
os         E.g., list files in your operating system
random     Random number generation
urllib     Open URLs, parse URLs
Using Modules
Use import <module_name> to make the module's functions available.
Style: put all import statements at top of file.
After import module_name, access its functions (and variables, etc.) through module_name.function_name
If module_name is long, can abbreviate in import with as:
import module_name as m
m.function_name
If you prefer to save typing (I mostly do not do this)
To access function_name without having to type the module_name prefix, use:
from module_name import function_name
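The three import styles side by side, using two standard library modules. The choice of math and datetime, and the abbreviation dt, are just for illustration.

```python
import math              # plain import: use the module_name prefix
import datetime as dt    # abbreviated name with "as"
from math import sqrt    # bring one function in without the prefix

root_a = math.sqrt(16)          # accessed through the module name
root_b = sqrt(16)               # same function, no prefix needed
day = dt.date(2016, 10, 13)     # accessed through the abbreviation

print(root_a, root_b)    # 4.0 4.0
print(day.isoformat())   # 2016-10-13
```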
Common but not standard
matplotlib and pandas are not part of the set of modules that must come with every Python 3.
matplotlib is very, very widely used, and pandas is widely used.
Both are among the many modules that come with the Anaconda distribution of Python 3.