Bi 1x Spring 2014: Plotting and linear regression


In this tutorial, we will learn some basics of how to plot experimental data. We will also learn how to perform linear regressions to get parameter estimates. In doing so, we will get an introduction to NumPy's random number generation module, numpy.random, which we will use later on in the course. Note that for the purposes of this simple tutorial, we are not going to consider error bars on experimental measurements, nor error estimates on computed regression parameters.

1 Generating fake data

For the purposes of this tutorial, we will generate some fake data to use for plotting. I chose to do this instead of giving you data because I want to also introduce you to random number generation in Python. We will create a module to generate fake data. Store it in a file fake_data.py.

"""
Module to generate random data
"""

import numpy.random as np_rand


# ########################
def generate_fake_data(x, f, args=(), noise_factor=0.1):
    """
    Generates fake data that follows a function f(x, *args)
    with random noise.

    The data are generated at points x.  The amplitude of the
    noise from the function scales like

        noise_factor * mean(abs(f(x))).

    args is a tuple of other parameters that are passed into f.
    """
    # Generate base curve
    base_curve = f(x, *args)

    # Generate random noise (sample random numbers on interval [-1, 1))
    noise = 2.0 * np_rand.rand(len(x)) - 1.0

    # Add noise
    y = base_curve + noise_factor * abs(base_curve).mean() * noise

    return y


# ######################
def linear_function(x, m, b):
    """
    Returns m * x + b.
    """
    return m * x + b

The function numpy.random.rand generates uniformly distributed random numbers on the interval [0, 1). The argument to the function says how many numbers to generate. For example, rand(100) returns an np.ndarray of 100 random numbers between zero and one. To get uniformly distributed random numbers on the interval [-1, 1), we linearly transform the numbers, as in the line noise = 2.0 * np_rand.rand(len(x)) - 1.0 above. For today, we will generate fake data that fall along a line. We therefore also include a simple linear function in our fake_data module.

Note that in the generate_fake_data function, we have used a function call f(x, *args). In Python, we can take a tuple and pass it as separate arguments into a function by using the * operator. We can test it out.

In [1]: import fake_data

In [2]: x = 1.0

In [3]: m, b = 4.0, 5.0

In [4]: args = (4.0, 5.0)

In [5]: fake_data.linear_function(x, m, b)

In [6]: fake_data.linear_function(x, *args)

In [7]: fake_data.linear_function(x, args)

The last function call will give an error, because without the *, args is just a single argument passed into the function. To generate the fake data for use within our Python window in Canopy, we simply use these functions.

In [8]: import numpy as np

In [9]: x = np.linspace(0.0, 10.0, 20)  # 20 evenly spaced pts from 0 to 10

In [10]: y = fake_data.generate_fake_data(x, fake_data.linear_function, args=args)

In [11]: x, y

We now have our fake data, and we can begin plotting.

2 Plotting experimental data

2.1 Plotting data points

The plt.plot function is the main utility for plotting data. You have seen the plt.fill_between function in the image processing tutorial, which was useful for viewing histograms, but that is in a way a fancy plotting function. You have also used skimage.io.imshow repeatedly, which is plotting data, as we have discussed. plt.plot is the workhorse of plotting. So, let's start by naively just plotting our data.

In [12]: import matplotlib.pyplot as plt

In [13]: plt.plot(x, y)

In [14]: plt.draw()

In [15]: plt.show()

First off, note that the function calls to plt.draw and plt.show are often unnecessary when operating in the Canopy Python window. They are necessary, however, to pull up windows with plots when you are running scripts. Now, when we look at our plot, we see that the default is to connect points with straight lines. This is useful when plotting theoretical curves: we sample the curves at dense points (e.g., x = np.linspace(0, 1, 200)), and then plot the function as a line. However, for experimental data, do not plot your data as lines unless they are very highly sampled, like in an electrocardiogram. Plot your data as individual points. To do this, we can make use of plt.plot's many keyword arguments.

In [16]: plt.clf()  # This clears the figure window

In [17]: plt.plot(x, y, marker='o', linestyle='none')

In [18]: plt.draw(); plt.show()

Now, we have a series of dots. There are many keyword arguments that give you lots of control over how the data are presented.

In [19]: plt.plot?

Finally, we most often plot our data as black dots. A shortcut to get this kind of plot is

In [20]: plt.clf()

In [21]: plt.plot(x, y, 'ko')

In [22]: plt.draw(); plt.show()

2.2 Labeling axes

Now that we have a plot to work with, we can label our axes. Always label your axes. As a reminder, always label your axes. I would say it a third time, but that would be obnoxious. Let us pretend for a moment that our x-axis is time in units of years and the y-axis is the average height of trees in my yard. Then, we would label our axes as

In [23]: plt.xlabel('time [years]', fontsize=18)

In [24]: plt.ylabel('average height [feet]', fontsize=18)

In [25]: plt.draw(); plt.show()

Note that I have used the fontsize keyword argument to control the font size. Always make your fonts large enough to be easily legible. You can play around with the font size to make it look right.
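If you find yourself passing fontsize to every call, matplotlib also lets you set a default once per session through its rcParams dictionary. This is not in the handout; a minimal sketch (the value 18 is just an example):

```python
import matplotlib
import matplotlib.pyplot as plt

# Set a default font size for all text in subsequent plots
matplotlib.rcParams['font.size'] = 18

# Labels now pick up the default without a fontsize keyword
plt.xlabel('time [years]')
plt.ylabel('average height [feet]')
```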

2.3 Legends

If you have multiple plots, it is often useful to have a legend. Just for demonstration purposes, we'll make a legend for our single plot,

In [26]: plt.legend(('tree height',), loc='upper left', numpoints=1, fontsize=16)

In [27]: plt.draw(); plt.show()

Notice that the first argument is a tuple containing the labels for your curves. The ordering of the tuple corresponds to the order in which the curves were put in the figure using plt.plot. For multiple curves, it would have more than one entry. I also like to use the numpoints keyword argument to include only one marker in the legend. The default is to include two, which I think is ugly. Note that the text may seem off-center in the legend. This is usually corrected when you save the figure (see below).

2.4 Saving your plot

You can save your figure. It is best to save it as vector graphics, such as PDF or SVG. PDF is usually preferred.

In [28]: plt.tight_layout()

In [29]: plt.draw(); plt.show()

In [30]: plt.savefig('tree_height.pdf')

The plt.tight_layout function is convenient to make sure all axis labels, etc., will appear properly in your saved figure.

3 Performing a linear regression

To perform a linear regression, we try to find the values of m and b for the line y = m*x + b that best describe the data. To do so, we minimize the sum of the squares of the residuals. A residual is the difference between the line you are fitting and the data point itself at a given point x. For example, let's say experimental data point i, (x_i, y_i), should fall on the line y = m*x + b. The residual for point i is

    r_i = y_i - y = y_i - (m*x_i + b).    (1)

To get an idea of what the residuals are, we can draw a line through our data and plot the residuals in red.

In [31]: y_theor = 4.0 * x + 5.0

In [32]: plt.plot(x, y_theor, linestyle='-', color='gray')

In [33]: for i in xrange(len(x)):
    ...:     plt.plot((x[i], x[i]), (y[i], y_theor[i]), 'r-')

In [34]: plt.draw(); plt.show()

(xrange is Python 2; in Python 3, use range instead.)
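The sum of squared residuals can also be computed directly with NumPy. A minimal sketch, with stand-in arrays for the x, y, and y_theor of the session above so the snippet is self-contained; the trial values m = 4, b = 5 are guesses, not fit results:

```python
import numpy as np

# Stand-in data: a line with small uniform noise on [-0.5, 0.5)
x = np.linspace(0.0, 10.0, 20)
y = 4.0 * x + 5.0 + 0.5 * (2.0 * np.random.rand(20) - 1.0)

# Trial line with guessed parameters m = 4, b = 5
y_theor = 4.0 * x + 5.0

# Residuals: vertical distances from the data to the trial line
r = y - y_theor

# Sum of squared residuals -- the objective a least-squares fit minimizes
ssr = np.sum(r**2)
print(ssr)
```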

We can see from the plot that if we minimize the sum of the squares of the residuals, we can get the line that is closest to all the points taken together. (Note: This is a very deep topic, and we are only scratching the very surface.) So, the linear regression problem can be stated as

    m, b = argmin_{m, b} sum_i [y_i - (m*x_i + b)]^2,    (2)

where the term in brackets is the residual for data point i. This optimization problem is called a least squares problem. For the case where the function we are fitting our data with is a polynomial (such as a line), the problem has a unique solution that can be found by matrix operations. The details can be found in most introductory linear algebra textbooks. We will instead use the curve_fit function in the scipy.optimize module to perform the curve fit. I chose to do this instead of working through the linear algebra because this function can also be used to perform nonlinear regression. It is well worth reading the doc string for this function.

In [35]: from scipy.optimize import curve_fit

In [36]: curve_fit?

The first argument is the function we are fitting the data to. It must be of the form f(x, *args), which is conveniently what we already specified. The keyword argument p0 is the guess at the best parameters as an np.ndarray. For fitting a linear function, it is not important to specify this, but it is a very good idea to do so, as it can be very important for nonlinear regression (which we will do later in Bi 1x). The function curve_fit will assume that p0 is all ones otherwise. curve_fit returns two np.ndarrays. The first contains the best-fit parameters, given in the order in which they are inputted into the fit function f. The second is the covariance matrix, the diagonal of which is supposed to give the variances of the fit parameters. Warning: in curve_fit's implementation, the variances will not be correctly reported unless you include error bars with your data, which we will not be doing in Bi 1x.
Therefore, you should ignore the covariance matrix returned by curve_fit for the purposes of Bi 1x. Note also that if you want more control over your curve fitting routines and want to do more sophisticated error analysis, you can directly use scipy.optimize.leastsq, which is what curve_fit uses under the hood. Without further ado, let's fit our data with a line.

In [37]: popt, pcov = curve_fit(fake_data.linear_function, x, y, p0=(4.0, 5.0))

In [38]: m, b = popt

In [39]: m, b

We can make a plot of our data with the curve fit.

In [40]: x_theor = np.linspace(x[0], x[-1], 200)

In [41]: y_theor = fake_data.linear_function(x_theor, *popt)

In [42]: plt.clf()

In [43]: plt.plot(x, y, 'ko')

In [44]: plt.plot(x_theor, y_theor, linestyle='-', color='gray')

In [45]: plt.xlabel('time [years]', fontsize=18)

In [46]: plt.ylabel('average height [feet]', fontsize=18)

In [47]: plt.draw(); plt.show()

Now that you know how to do a linear regression, I'd like you to think about this: If you know that your data had to pass through zero, i.e., that you only had to fit the slope and not the slope plus intercept, how would you do it?
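One possible approach to the closing question, sketched here under stated assumptions rather than as the handout's answer: since curve_fit accepts any function of the form f(x, *params), you can define a one-parameter line constrained through the origin and fit only the slope. The function name linear_through_origin and the fake data below are made up for this illustration:

```python
import numpy as np
from scipy.optimize import curve_fit


def linear_through_origin(x, m):
    """Line constrained to pass through (0, 0): only the slope is free."""
    return m * x


# Fake noiseless data on a line through the origin (for illustration)
x = np.linspace(0.0, 10.0, 20)
y = 3.0 * x

# Only one parameter to fit, so p0 has a single entry
popt, pcov = curve_fit(linear_through_origin, x, y, p0=(1.0,))
print(popt[0])  # best-fit slope; recovers 3.0 on this noiseless data
```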