DSC 201: Data Analysis & Visualization

Similar documents
DSC 201: Data Analysis & Visualization

DSC 201: Data Analysis & Visualization

DSC 201: Data Analysis & Visualization

Exercise: Introduction to NumPy arrays

Chapter 1 Summary. Chapter 2 Summary. end of a string, in which case the string can span multiple lines.

\n is used in a string to indicate the newline character. An expression produces data. The simplest expression

PYTHON NUMPY TUTORIAL CIS 581

DSC 201: Data Analysis & Visualization

DSC 201: Data Analysis & Visualization

Flow Control: Branches and loops

Problem Based Learning 2018

Numpy fast array interface

COMP1730/COMP6730 Programming for Scientists. Sequence types, part 2

NumPy. Daniël de Kok. May 4, 2017

Part I. Wei Tianwen. A Brief Introduction to Python. Part I. Wei Tianwen. Basics. Object Oriented Programming

PYTHON CONTENT NOTE: Almost every task is explained with an example

Python Tutorial. CS/CME/BioE/Biophys/BMI 279 Oct. 17, 2017 Rishi Bedi

Simple Java Programming Constructs 4

CSC Advanced Scientific Computing, Fall Numpy

NumPy Primer. An introduction to numeric computing in Python

PREPARING FOR THE FINAL EXAM

Overview of OOP. Dr. Zhang COSC 1436 Summer, /18/2017

Introduction to Programming Using Java (98-388)

Announcements. Lab Friday, 1-2:30 and 3-4:30 in Boot your laptop and start Forte, if you brought your laptop

Data Structures (list, dictionary, tuples, sets, strings)

Python I. Some material adapted from Upenn cmpe391 slides and other sources

PYTHON FOR MEDICAL PHYSICISTS. Radiation Oncology Medical Physics Cancer Care Services, Royal Brisbane & Women s Hospital

Problem One: Loops and ASCII Graphics

Abstract Data Types. CS 234, Fall Types, Data Types Abstraction Abstract Data Types Preconditions, Postconditions ADT Examples

Last Name: First: Netid: Section. CS 1110 Final, December 17th, 2014

CS-141 Exam 2 Review November 10, 2017 Presented by the RIT Computer Science Community

Data Structures I: Linked Lists

Data Structures and Algorithms. Chapter 1

Python for Scientists

CS171 Midterm Exam. October 29, Name:

Name CPTR246 Spring '17 (100 total points) Exam 3

Chapter 5 : Informatics Practices. Class XII ( As per CBSE Board) Numpy - Array. New Syllabus Visit : python.mykvs.in for regular updates

About Python. Python Duration. Training Objectives. Training Pre - Requisites & Who Should Learn Python

Programming for Data Science Syllabus

Programming for Engineers in Python

About the Final. Saturday, 7-10pm in Science Center 101. Closed book, closed notes. Not on the final: graphics, file I/O, vim, unix

There are four numeric types: 1. Integers, represented as a 32 bit (or longer) quantity. Digits sequences (possibly) signed are integer literals:

NumPy. Arno Proeme, ARCHER CSE Team Attributed to Jussi Enkovaara & Martti Louhivuori, CSC Helsinki

NumPy quick reference

PREPARING FOR PRELIM 1

HW0 v3. October 2, CSE 252A Computer Vision I Fall Assignment 0

Recall. Key terms. Review. Encapsulation (by getters, setters, properties) OOP Features. CSC148 Intro. to Computer Science

Table of Contents. Preface... xxi

Short Introduction to Python Machine Learning Course Laboratory

At full speed with Python

Swift. Introducing swift. Thomas Woodfin

CS 1110 Final, December 8th, Question Points Score Total: 100

Worksheet 6: Basic Methods Methods The Format Method Formatting Floats Formatting Different Types Formatting Keywords

Text Input and Conditionals

Fundamentals of Programming (Python) Object-Oriented Programming. Ali Taheri Sharif University of Technology Spring 2018

infix expressions (review)

DSC 201: Data Analysis & Visualization

VLC : Language Reference Manual

Python review. 1 Python basics. References. CS 234 Naomi Nishimura

GIS 4653/5653: Spatial Programming and GIS. More Python: Statements, Types, Functions, Modules, Classes

List comprehensions (and other shortcuts) UW CSE 160 Spring 2015

CS 1110 Final, December 8th, Question Points Score Total: 100

CS 1110 Final, December 17th, Question Points Score Total: 100

ENGR 102 Engineering Lab I - Computation

Be careful when deciding whether to represent data as integers or floats, and be sure that you consider all possible behaviors in computation.

Sequence types. str and bytes are sequence types Sequence types have several operations defined for them. Sequence Types. Python

Advanced Algorithms and Computational Models (module A)

Preface... (vii) CHAPTER 1 INTRODUCTION TO COMPUTERS

Data Handing in Python

(DRAFT) PYTHON FUNDAMENTALS II: NUMPY & MATPLOTLIB

SPARK-PL: Introduction

Computer Programming II C++ (830)

Programming in Haskell Aug-Nov 2015

Handling arrays in Python (numpy)

cs1114 REVIEW of details test closed laptop period

Python memento TI-Smart Grids

Lecture 38: Python. CS 51G Spring 2018 Kim Bruce

CMSC201 Computer Science I for Majors

Python Crash Course Numpy, Scipy, Matplotlib

Computer Programming II Python

CircuitPython 101: Working with Lists, Iterators and Generators

Absolute C++ Walter Savitch

CIS192: Python Programming Data Types & Comprehensions Harry Smith University of Pennsylvania September 6, 2017 Harry Smith (University of Pennsylvani

Introduction to NumPy

[CHAPTER] 1 INTRODUCTION 1

CME 193: Introduction to Scientific Python Lecture 5: Object Oriented Programming

Implement NN using NumPy

The Pyth Language. Administrivia

Weiss Chapter 1 terminology (parenthesized numbers are page numbers)

Introducing Python Pandas

Al al-bayt University Prince Hussein Bin Abdullah College for Information Technology Computer Science Department

Data Science with Python Course Catalog

Getting Started. Office Hours. CSE 231, Rich Enbody. After class By appointment send an . Michigan State University CSE 231, Fall 2013

CS Summer 2013

Python for Finance. Control Flow, data structures and first application (part 2) Andras Niedermayer

age = 23 age = age + 1 data types Integers Floating-point numbers Strings Booleans loosely typed age = In my 20s

LECTURE 3 Python Basics Part 2

SCoLang - Language Reference Manual

CSCE 110 Programming I Basics of Python: Variables, Expressions, Input/Output

PREPARING FOR THE FINAL EXAM

Transcription:

DSC 201: Data Analysis & Visualization Arrays Dr. David Koop

Class Example class Rectangle: def init (self, x, y, w, h): self.x = x self.y = y self.w = w self.h = h def set_corner(self, x, y): self.x = x self.y = y def set_width(self, w): self.w = w def set_height(self, h): self.h = h def area(self): return self.w * self.h 2

Inheritance Parentheses after the class name indicate superclass class Rectangle: def init (self, x, y, w, h): self.x = x; self.y = y self.w = w; self.h = h class Square(Rectangle): def init (self, x, y, s): self.x = x; self.y = y self.w = s; self.h = s super() can be used to call the superclass method: class Square(Rectangle): def init (self, x, y, s): super(). init (x,y,s,s) Python allows multiple inheritance (multiple classes separated by commas in the parentheses) 3

Overriding and Overloading Methods class Square(Rectangle): def init (self, x, y, s): super(). init (x,y,s,s) def set_width(self, w): super().set_width(w) super().set_height(w) def set_height(self, h): super().set_width(h) super().set_height(h) Overriding: use the same name No overloading, but with keyword parameters (or by checking types) can accomplish similar results if needed 4

Exercise: Queue Class Write a class to encapsulate queue behavior. It should have five methods: - constructor: should allow a list of initial elements (in order) - size: should return the number of elements - is_empty: returns True if the queue is empty, false otherwise - enqueue: adds an item to the queue - dequeue: removes an item from the queue and returns it 5

Exercise: Stack Class How do we modify this for a stack? - constructor: should allow a list of initial elements (in order) - size: should return the number of elements - is_empty: returns True if the queue is empty, False otherwise - push instead of enqueue: adds an item to the stack - pop instead of dequeue: removes an item from the stack Could we use inheritance? 6

Iterators Remember range, values, keys, items? They return iterators: objects that traverse containers, only need to know how to get the next element Given iterator it, next(it) gives the next element StopIteration exception if there isn't another element Generally, we don't worry about this as the for loop handles everything automatically but you cannot index or slice an iterator d.values()[0] and range(100)[-1] will not work! If you need to index or slice, construct a list from an iterator list(d.values())[0] or list(range(100))[-1] In general, this is slower code so we try to avoid creating lists 7

List Comprehensions Shorthand for transformative or filtering for loops squares = [] for i in range(10): squares.append(i**2) squares = [i**2 for i in range(10)] Equivalent code, just moved the loop inside of list definition Advantages: concise, readable Filtering: squares = [] for i in range(10): if i % 3!= 1: squares.append(i ** 2) squares = [i**2 for i in range(10) if i % 3!= 1] if clause follows the for clause 8

Dictionary Comprehensions Similar idea, but allow dictionary construction dict([(k, v) for k,v in if ]) {k: v for k, v in if } Could do this with a for loop as well 9

Exceptions errors but potentially something that can be addressed try-except: except allows specifications of exactly the error(s) you wish to address finally: always runs (even if the program is about to crash) can also raise exceptions using the raise keyword and define your own 10

Arrays What is the difference between an array and a list (or a tuple)? 11

Arrays Usually a fixed size lists are meant to change size Are mutable tuples are not Store only one type of data lists and tuples can store anything Are faster to access and manipulate than lists or tuples Can be multidimensional: - Can have list of lists or tuple of tuples but no guarantee on shape - Multidimensional arrays are rectangles, cubes, etc. 12

Why NumPy? Fast vectorized array operations for data munging and cleaning, subsetting and filtering, transformation, and any other kinds of computations Common array algorithms like sorting, unique, and set operations Efficient descriptive statistics and aggregating/summarizing data Data alignment and relational data manipulations for merging and joining together heterogeneous data sets Expressing conditional logic as array expressions instead of loops with if-elif-else branches Group-wise data manipulations (aggregation, transformation, function application). [W. McKinney, Python for Data Analysis] 13

import numpy as np 14

Notebook https://github.com/wesm/pydata-book/ ch04.ipynb Click the raw button and save that file to disk 15

Creating arrays data1 = [6, 7.5, 8, 0, 1] arr1 = np.array(data1) data2 = [[1,2,3,4],[5,6,7,8]] arr2 = np.array(data2) Number of dimensions: arr2.ndim Shape: arr2.shape Types: arr1.dtype, arr2.dtype, can specify explicitly (np.float64) Zeros: np.zeros(10) Ones: np.ones((4,5)) Empty: np.empty((2,2)) _like versions: pass an existing array and matches shape with specified contents Range: np.arange(15) 16

Types "But I thought Python wasn't stingy about types " numpy aims for speed Able to do array arithmetic int16, int32, int64, float32, float64, bool, object astype method allows you to convert between different types of arrays: arr = np.array([1, 2, 3, 4, 5]) arr.dtype float_arr = arr.astype(np.float64) 17

Operations (Array, Array) Operations (elementwise) - Addition, Subtraction, Multiplication (Scalar, Array) Operations: - Addition, Subtraction, Multiplication, Division, Exponentiation Slicing: - 1D: Just like with lists except data is not copied! a[2:5] = 3 works with arrays a.copy() or a[2:5].copy() will copy - 2D+: comma separated indices as shorthand: a[1][2] or a[1,2] a[1] gives a row a[:,1] gives a column 18

2D Array Slicing [W. McKinney, Python for Data Analysis] 19

Boolean Indexing names == 'Bob' gives back booleans that represent the elementwise comparison with the array names Boolean arrays can be used to index into another array: - data[names == 'Bob'] Can even mix and match with integer slicing Can do boolean operations (&, ) between arrays (just like addition, subtraction) - data[(names == 'Bob') (names == 'Will')] Note: or and and do not work with arrays We can set values too! - data[data < 0] = 0 20