Thomas Vincent Head of Data Science, Getty Images

Similar documents
INTRODUCTION TO DATA VISUALIZATION WITH PYTHON. Visualizing time series

MANIPULATING TIME SERIES DATA IN PYTHON. Compare Time Series Growth Rates

Pandas 4: Time Series

DSC 201: Data Analysis & Visualization

CLEANING DATA IN PYTHON. Data types

Python for Data Analysis. Prof.Sushila Aghav-Palwe Assistant Professor MIT

Pandas IV: Time Series

What is Data Science?

Introduction to Python: The Multi-Purpose Programming Language. Robert M. Porsch June 14, 2017

Data Preprocessing in Python. Prof.Sushila Aghav

INTRODUCTION TO DATABASES IN PYTHON. Filtering and Targeting Data

Quick introduction to descriptive statistics and graphs in. R Commander. Written by: Robin Beaumont

Part I. Graphical exploratory data analysis. Graphical summaries of data. Graphical summaries of data

LAB 1 INSTRUCTIONS DESCRIBING AND DISPLAYING DATA

Python for Data Analysis

Statistics Lecture 6. Looking at data one variable

windrose Documentation Lionel Roubeyrie & Sebastien Celles

PYTHON DATA VISUALIZATIONS

1.1 Graphing Quadratic Functions (p. 2) Definitions Standard form of quad. function Steps for graphing Minimums and maximums

Chapter 5. Understanding and Comparing Distributions. Copyright 2012, 2008, 2005 Pearson Education, Inc.

MANIPULATING TIME SERIES DATA IN PYTHON. How to use Dates & Times with pandas

Problem Based Learning 2018

Data Analyst Nanodegree Syllabus

Chapter 1 : Informatics Practices. Class XII ( As per CBSE Board) Advance operations on dataframes (pivoting, sorting & aggregation/descriptive

DATA STRUCTURE AND ALGORITHM USING PYTHON

Table of Contents (As covered from textbook)

The SciPy Stack. Jay Summet

DSC 201: Data Analysis & Visualization

Data Analyst Nanodegree Syllabus

CITS4009 Introduc0on to Data Science

DSC 201: Data Analysis & Visualization

Data Science and Machine Learning Essentials

Scientific Programming. Lecture A07 Pandas

CS1100: Computer Science and Its Applications. Creating Graphs and Charts in Excel

IMPORTING & MANAGING FINANCIAL DATA IN PYTHON. The DataReader: Access financial data online

Practical 2: Using Minitab (not assessed, for practice only!)

Data Mining: Exploring Data. Lecture Notes for Chapter 3

Chapter 5. Understanding and Comparing Distributions. Copyright 2010, 2007, 2004 Pearson Education, Inc.

Making plots in R [things I wish someone told me when I started grad school]

ENGG1811: Data Analysis using Spreadsheets Part 1 1

Pandas and Friends. Austin Godber Mail: Source:

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining

Data Mining: Exploring Data. Lecture Notes for Data Exploration Chapter. Introduction to Data Mining

3190 Week W 3 Graphing Data

(2) Hypothesis Testing

Choosing the right graph in Excel

ARTIFICIAL INTELLIGENCE AND PYTHON

Chapter 2 Organizing and Graphing Data. 2.1 Organizing and Graphing Qualitative Data

MANIPULATING DATAFRAMES WITH PANDAS. Categoricals and groupby

Visualizing the World

Topic (3) SUMMARIZING DATA - TABLES AND GRAPHICS

ACHIEVEMENTS FROM TRAINING

Data preprocessing Functional Programming and Intelligent Algorithms

IMPORTING & MANAGING FINANCIAL DATA IN PYTHON. Read, inspect, & clean data from csv files

CSC Advanced Scientific Computing, Fall Numpy

STA 490H1S Initial Examination of Data

AP Statistics. Study Guide

Warm-Up Exercises. Find the x-intercept and y-intercept 1. 3x 5y = 15 ANSWER 5; y = 2x + 7 ANSWER ; 7

Q1. Write code to Import an entire module named as Calculator.py in your program.*1+

Basic Statistical Graphics in R. Stem and leaf plots 100,100,100,99,98,97,96,94,94,87,83,82,77,75,75,73,71,66,63,55,55,55,51,19

Certified Data Science with Python Professional VS-1442

An Introduction to Data Analysis, Statistics, and Graphing

BIO 360: Vertebrate Physiology Lab 9: Graphing in Excel. Lab 9: Graphing: how, why, when, and what does it mean? Due 3/26

MATH11400 Statistics Homepage

Section 2-2. Histograms, frequency polygons and ogives. Friday, January 25, 13

INTRODUCTION TO DATA VISUALIZATION WITH PYTHON. Working with 2D arrays

Combo Charts. Chapter 145. Introduction. Data Structure. Procedure Options

Data 100. Lecture 5: Data Cleaning & Exploratory Data Analysis

Introduction to Minitab 1

Density Curve (p52) Density curve is a curve that - is always on or above the horizontal axis.

file:///users/williams03/a/workshops/2015.march/final/intro_to_r.html

DSC 201: Data Analysis & Visualization

Data Science Essentials

lab 04 December 3, 2015

Old Faithful Chris Parrish

Orange Juice data. Emanuele Taufer. 4/12/2018 Orange Juice data (1)

STATISTICAL THINKING IN PYTHON I. Introduction to Exploratory Data Analysis

STA490HY1Y. Initial Examination of Data

An Introduction to Minitab Statistics 529

An introduction to ggplot: An implementation of the grammar of graphics in R

Section 9: One Variable Statistics

CSC 1315! Data Science

Python Pandas- II Dataframes and Other Operations

Programming for Engineers in Python

MatPlotTheme Documentation

The SWMPrats website and the Widgets

More Summer Program t-shirts

INTRODUCTION TO R. Basic Graphics

Your Name: Section: INTRODUCTION TO STATISTICAL REASONING Computer Lab #4 Scatterplots and Regression

Case study: accessing financial data

lab_03 September 20, 2016

Understanding and Comparing Distributions. Chapter 4

3 Graphical Displays of Data

Lesson 18-1 Lesson Lesson 18-1 Lesson Lesson 18-2 Lesson 18-2

Math 167 Pre-Statistics. Chapter 4 Summarizing Data Numerically Section 3 Boxplots

DSC 201: Data Analysis & Visualization

Introduction to Graphics with ggplot2

An Introduction to R- Programming

The foundations of building Tableau visualizations and Dashboards

DataView Features. Input Data Formats. Current Release

Transcription:

VISUALIZING TIME SERIES DATA IN PYTHON Clean your time series data Thomas Vincent Head of Data Science, Getty Images

The CO2 level time series A snippet of the weekly measurements of CO2 levels at the Mauna Loa Observatory, Hawaii. datestamp co2 datestamp 1958-03-29 1958-03-29 316.1 1958-04-05 1958-04-05 317.3 1958-04-12 1958-04-12 317.6...... 2001-12-15 2001-12-15 371.2 2001-12-22 2001-12-22 371.3 2001-12-29 2001-12-29 371.5

Finding missing values in a DataFrame In [1]: print(df.isnull()) co2 datestamp 1958-03-29 False 1958-04-05 False 1958-04-12 False... In [2]: print(df.notnull()) co2 datestamp 1958-03-29 True 1958-04-05 True 1958-04-12 True...

Counting missing values in a DataFrame In [1]: print(df.isnull().sum()) datestamp 0 co2 59 dtype: int64

Replacing missing values in a DataFrame In [1]: print(df)... 5 1958-05-03 316.9 6 1958-05-10 NaN 7 1958-05-17 317.5... In [2]: df = df.fillna(method='bfill') In [3]: print(df)... 5 1958-05-03 316.9 6 1958-05-10 317.5 7 1958-05-17 317.5...

VISUALIZING TIME SERIES DATA IN PYTHON Let's practice!

VISUALIZING TIME SERIES DATA IN PYTHON Plot aggregates of your data Thomas Vincent Head of Data Science, Getty Images

Moving averages In the field of time series analysis, a moving average can be used for many different purposes: smoothing out short-term fluctuations removing outliers highlighting long-term trends or cycles.

The moving average model In [1]: co2_levels_mean = co2_levels.rolling(window=52).mean() In [2]: ax = co2_levels_mean.plot() In [3]: ax.set_xlabel("date") In [4]: ax.set_ylabel("the values of my Y axis") In [5]: ax.set_title("52 weeks rolling mean of my time series") In [6]: plt.show()

A plot of the moving average for the CO2 data

Computing aggregate values of your time series In [1]: co2_levels.index DatetimeIndex(['1958-03-29', '1958-04-05',...], dtype='datetime64[ns]', name='datestamp', length=2284, freq=none) In [2]: print(co2_levels.index.month) array([ 3, 4, 4,..., 12, 12, 12], dtype=int32) In [3]: print(co2_levels.index.year) array([1958, 1958, 1958,..., 2001, 2001, 2001], dtype=int32)

Plotting aggregate values of your time series In [1]: index_month = co2_levels.index.month In [2]: co2_levels_by_month = co2_levels.groupby(index_month).m In [3]: co2_levels_by_month.plot() In [4]: plt.show()

Plotting aggregate values of your time series

VISUALIZING TIME SERIES DATA IN PYTHON Let's practice!

VISUALIZING TIME SERIES DATA IN PYTHON Thomas Vincent Head of Data Science, Getty Images Summarizing the values in your time series data

Obtaining numerical summaries of your data What is the average value of this data? What is the maximum value observed in this time series?

Obtaining numerical summaries of your data The.describe() method automatically computes key statistics of all numeric columns in your DataFrame In [1]: print(df.describe()) co2 count 2284.000000 mean 339.657750 std 17.100899 min 313.000000 25% 323.975000 50% 337.700000 75% 354.500000 max 373.900000

Summarizing your data with boxplots In [1]: ax1 = df.boxplot() In [2]: ax1.set_xlabel('your first boxplot') In [3]: ax1.set_ylabel('values of your data') In [4]: ax1.set_title('boxplot values of your data') In [5]: plt.show()

A boxplot of the values in the CO2 data

Summarizing your data with histograms In [1]: ax2 = df.plot(kind='hist', bins=100) In [2]: ax2.set_xlabel('your first histogram') In [3]: ax2.set_ylabel('frequency of values in your data') In [4]: ax2.set_title('histogram of your data with 100 bins') In [5]: plt.show()

A histogram plot of the values in the CO2 data

Summarizing your data with density plots In [1]: ax3 = df.plot(kind='density', linewidth=2) In [2]: ax3.set_xlabel('your first density plot') In [3]: ax3.set_ylabel('density values of your data') In [4]: ax3.set_title('density plot of your data') In [5]: plt.show()

A density plot of the values in the CO2 data

VISUALIZING TIME SERIES DATA IN PYTHON Let's practice!