Intermediate/Advanced Python. Michael Weinstein (Day 2)

Similar documents
Intermediate/Advanced Python. Michael Weinstein (Day 1)

Fast numerics in Python - NumPy and PyPy

Ch.1 Introduction. Why Machine Learning (ML)? manual designing of rules requires knowing how humans do it.

Ch.1 Introduction. Why Machine Learning (ML)?

Matplotlib Python Plotting

LECTURE 7: STUDENT REQUESTED TOPICS

CME 193: Introduction to Scientific Python Lecture 1: Introduction

ARTIFICIAL INTELLIGENCE AND PYTHON

HANDS ON DATA MINING. By Amit Somech. Workshop in Data-science, March 2016

DATA STRUCTURES USING C

debugging, hexadecimals, tuples

Introduction to Internet of Things Prof. Sudip Misra Department of Computer Science & Engineering Indian Institute of Technology, Kharagpur

Problem Based Learning 2018

Effective Programming Practices for Economists. 10. Some scientific tools for Python

Episode 8 Matplotlib, SciPy, and Pandas. We will start with Matplotlib. The following code makes a sample plot.

Welcome to Bootcamp2015 s documentation!

: Intro Programming for Scientists and Engineers Final Exam

ENGR 102 Engineering Lab I - Computation

pandas: Rich Data Analysis Tools for Quant Finance

Arrays: Higher Dimensional Arrays. CS0007: Introduction to Computer Programming

CPSC 67 Lab #5: Clustering Due Thursday, March 19 (8:00 a.m.)

Day 15: Science Code in Python

Installation Guide for Python

Introduction to MATLAB. Computational Probability and Statistics CIS 2033 Section 003

Numerical Methods. Centre for Mathematical Sciences Lund University. Spring 2015

Warmup: On paper, write a C++ function that takes a single int argument (n) and returns the product of all the integers between 1 and n.

Outline. Announcements. Homework 2. Boolean expressions 10/12/2007. Announcements Homework 2 questions. Boolean expression

Programming for Engineers in Python

COP 3014: Fall Final Study Guide. December 5, You will have an opportunity to earn 15 extra credit points.

SQL Server Machine Learning Marek Chmel & Vladimir Muzny

Certified Data Science with Python Professional VS-1442

Euler s Method with Python

Skills Quiz - Python Edition Solutions

CSC-140 Assignment 5

Course May 18, Advanced Computational Physics. Course Hartmut Ruhl, LMU, Munich. People involved. SP in Python: 3 basic points

Python Advance Course via Astronomy street. Sérgio Sousa (CAUP) ExoEarths Team (

research assistant at VSE/LEE course site: janvavra.github.io consultations by appointment

System Process Document Define Condition Processes. Department Responsibility/Role File Name. Define Condition Processes Trigger:

CS/ENGRD 2110 Object-Oriented Programming and Data Structures Spring 2012 Thorsten Joachims. Lecture 10: Asymptotic Complexity and

Command Line and Python Introduction. Jennifer Helsby, Eric Potash Computation for Public Policy Lecture 2: January 7, 2016

BMEGUI Installation Manual BMEGUI2.1.1 Update 1 (Last Edited on: 2009/02/25)

XXXX - AUTOMATING PROCESSES USING ACTIONS 1 N/08/08

1 Introduction: Using a Python script to compile and plot data from Fortran modular program for Newton s method

ENGR 1181 MATLAB 09: For Loops 2

Latent Semantic Analysis. sci-kit learn. Vectorizing text. Document-term matrix

San José State University Computer Science CS 122 Advanced Python Programming Spring 2018

c12011sep914.notebook September 09, 2014

Python for Earth Scientists

11 Data Structures Foundations of Computer Science Cengage Learning

Making Games With Python & Pygame By Al Sweigart READ ONLINE

Pandas UDF Scalable Analysis with Python and PySpark. Li Jin, Two Sigma Investments

Deep Learning for Visual Computing Prof. Debdoot Sheet Department of Electrical Engineering Indian Institute of Technology, Kharagpur

NOTES ON RUNNING PYTHON CODE

Instructor (Mehran Sahami): All righty. Welcome back to yet another fun-filled, exciting day of CS106A.

Lotus IT Hub. Module-1: Python Foundation (Mandatory)

CS 301: Recursion. The Art of Self Reference. Tyler Caraza-Harter

FREE and Open Source Software Tools for WATer Resource Management

XXXX - AUTOMATING THE MARQUEE 1 N/08/08

System Process Document Define Requirement Usages. Department Responsibility/Role File Name. Define Requirement Usages Trigger: Additional Information

IAP Python - Lecture 4

Python for Data Analysis

Basic Beginners Introduction to plotting in Python

NumPy quick reference

Converting categorical data into numbers with Pandas and Scikit-learn -...

Module File Name on "Add or Remove Programs" 1 Python 2.5 python-2.5.mci Python MCR MCRInstaller.exe

While Loops A while loop executes a statement as long as a condition is true while condition: statement(s) Statement may be simple or compound Typical

Data Analytics Job Guarantee Program

AMS209 Final Project: Linear Equations System Solver

Introduction to Python

OREKIT IN PYTHON ACCESS THE PYTHON SCIENTIFIC ECOSYSTEM. Petrus Hyvönen

Introduction to Data Science. Introduction to Data Science with Python. Python Basics: Basic Syntax, Data Structures. Python Concepts (Core)

Python Tutorial for CSE 446

LECTURE 19. Numerical and Scientific Packages

Data Analyst Nanodegree Syllabus

Orientation Assignment for Statistics Software (nothing to hand in) Mary Parker,

COSC 490 Computational Topology

University of Wisconsin-Madison Spring 2018 BMI/CS 776: Advanced Bioinformatics Homework #2

Scientific Python. 1 of 10 23/11/ :00

Frameworks in Python for Numeric Computation / ML

Introduction to Python. Dmytro Karpenko Research Infrastructure Services Group, Department for Research Computing, USIT, UiO

On your way in. Pick-Up: 1. Nothing!

Data Acquisition and Processing

Introduction to Computer Science Midterm 3 Fall, Points

SCIENCE. An Introduction to Python Brief History Why Python Where to use

Intro. Scheme Basics. scm> 5 5. scm>

SWITCHING FROM GRASSHOPPER TO VECTORWORKS

DECISION STRUCTURES: USING IF STATEMENTS IN JAVA

Python With Data Science

System Process Document Test Transfer Rules

Scientific Computing using Python

A linked list grows as data is added to it. In a linked list each item is packaged into a node.

Python Tutorial for CSE 446

Homework 02. A, 8 letters, each letter is 7 bit, which makes 8*7=56bit, there will be 2^56= different combination of keys.

LECTURE 22. Numerical and Scientific Packages

Week Two. Arrays, packages, and writing programs

Engineering Innovation Center MATLAB Basics

Pandas and Friends. Austin Godber Mail: Source:

Blurring the Line Between Developer and Data Scientist

Parallel Systems. Project topics

CSE 153 Design of Operating Systems

Transcription:

Intermediate/Advanced Python Michael Weinstein (Day 2)

Topics Review of basic data structures Accessing and working with objects in python Numpy How python actually stores data in memory Why numpy can help Dot product example Extending our SAMLine object from yesterday Making and analyzing our quality score matrix Pandas Matplotlib Making a histogram Scipy

A true array (from C++ or similar languages) A true array is stored sequentially in a fixed space in the computer s memory A 10 integer array Possibly other stuff 0 1 2 3 4 5 6 7 8 9 Maybe passwords or instructions?

Seriously, that other stuff isn t a joke

Seriously, that other stuff isn t a joke

Seriously, that other stuff isn t a joke

A python list uses pointers for flexibility A pointer is a value that points to a specific location in the computer s memory A 10 integer list 0 4 6 8 9 5 2 1 3 7

A python list uses pointers for flexibility This method of storing data increases flexibility Deleting a number from a 10 integer list 0 4 6 8 9 5 2 1 3 7

A python list uses pointers for flexibility This method of storing data increases flexibility Deleting a number from a 10 integer list 0 4 6 8 9 5 1 3 7

A python list uses pointers for flexibility This method of storing data increases flexibility Deleting a number from a 10 integer list 0 4 6 8 9 5 1 3 7

A python list uses pointers for flexibility This method of storing data increases flexibility Adding a number to a 10 integer list 0 4 6 8 9 5 2 1 3 7

A python list uses pointers for flexibility This method of storing data increases flexibility Adding a number to a 10 integer list 0 4 6 8 9 5 2 1 3 7

A python list uses pointers for flexibility This method of storing data increases flexibility Replacing a number in a 10 integer list with a string 0 4 6 8 9 5 2 1 3 7

A python list uses pointers for flexibility This method of storing data increases flexibility Replacing a number in a 10 integer list with a string 0 4 6 8 9 I am to misbehave 5 2 1 3 7

A python list uses pointers for flexibility This method of storing data increases flexibility Replacing a number in a 10 integer list with a string 0 4 6 8 9 I am to misbehave 5 2 1 3 7

A python list uses pointers for flexibility This method of storing data increases flexibility Replacing a number in a 10 integer list with a string 0 4 6 8 9 I am to misbehave 5 1 3 7

Numpy gives us access to true arrays Your numpy objects will have limited flexibility Data in your numpy objects will have to be homogeneous There will be some overhead in terms of computing time to turn your python object into a numpy object Why on Earth would we want to do this?

A true array (from C++ or similar languages) Possibly other stuff 0 1 2 3 4 5 6 7 8 9 Maybe passwords or instructions? Send me the first value in the array at 0x08C4

A true array (from C++ or similar languages) Possibly other stuff 0 1 2 3 4 5 6 7 8 9 Maybe passwords or instructions?

A true array (from C++ or similar languages) Possibly other stuff 0 1 2 3 4 5 6 7 8 9 Maybe passwords or instructions? 1 Prefetching (big speedup) 0

Let s prove this (and learn some useful functions)

And the commands

Results Python is already pretty fast for this operation Going back and forth between numpy and standard python makes it much less fast Running as much of this in numpy as possible gave about a 10x speed increase

Topics Review of basic data structures Accessing and working with objects in python Numpy How python actually stores data in memory Why numpy can help Dot product example Extending our SAMLine object from yesterday Making and analyzing our quality score matrix Pandas Matplotlib Making a histogram Scipy

Quality Scores

Time to code a little more Create a new file in your working folder called qualitystringhandler.py Make a copy of your working samreader.py called samreader2.py

How to convert ASCII values

Our new class for quality strings

Modify SAMLine to handle the new data

Result

Goal: Generate a histogram to look at the average quality score distribution in the first 50 bases of any read of length 100 or more from the sample data What we know We can already break down this data very well We will have to filter on read length (easy, since we already store it) We will have to iterate over lines and build up a matrix We will have to take an average across each row (read) in the matrix We will have to generate a histogram of these values New file: histogrammaker.py

How to get the list of lists

How to turn a list of lists into a numpy matrix

And how to generate a matrix of just means

A function to get a matrix of qualities

A function to make a histogram

Result?