ENGG1811: Data Analysis using Spreadsheets Part 1


ENGG1811 Computing for Engineers
Data Analysis using Spreadsheets I

- Data Analysis
- Pivot Tables
- Simple Statistics
- Histogram
- Correlation
- Fitting Equations to Data
- Presenting Charts
- Solving Single-Variable Equations
- Goal Seek examples: black holes and ballistics

ENGG1811 UNSW, CRICOS Provider No: 00098G

Data Analysis

Data analysis techniques allow professionals such as engineers, social scientists and economists to extract meaningful information from a typically vast amount of data. Spreadsheets are widely available and provide useful features for data analysis. Some features are integrated with charts.

This week:
- Pivot tables
- Simple statistics
- Histogram
- Correlation
- Curve fitting and regression analysis
- Goal Seek (solves single-variable equations)

Next week:
- Solver (for more general optimisation problems)
- Matrix calculations (very briefly; other tools are more appropriate)
- Data tables (ideal for financial modelling)
- Value lookup operations

Summarising Data: Pivot Tables

A pivot table is an interactive table that can be used to summarise large amounts of data quickly. You can move data sets (named by their column header) between rows, columns and the table body, and use multivalued filters to see different summaries of the source data.

- To create a pivot table, select the data, including header rows, and choose Data > Pivot Table > Create.
- A dialogue box appears, as shown overleaf.
- To make changes, right-click on the table and select Edit Layout.

Summarising Data: Pivot Tables (layout dialogue)

- Page fields: overall single-valued filter
- Row and column fields: filterable data
- Data fields: function (sum, count, max etc) applied to values at the intersecting row/column

Pivot Table (annotated example)

- Filters apply to the whole table (single-value drop-downs or standard filter).
- Row and column filters are fully optioned (like Excel's filters).
- There are multiple rows in the body if multiple fields are selected.
- The data function is usually sum, but can also be count (= number of orders here).
- Row and column totals are shown.

Simple Statistics

- Spreadsheets provide many predefined statistical functions to calculate useful information such as mean, max, min, median and standard deviation. These can be applied to columns or rows of data.
- Excel provides a tool called Descriptive Statistics that calculates such commonly used statistical functions for a given data set and produces a useful report. OpenOffice Calc doesn't have this, but there are several user-contributed statistics packages that do.
- More advanced statistics functions are available (χ², t-test, various distributions, confidence intervals etc), but serious analysis usually requires specialised software such as SPSS, SAS or R, and the knowledge of how to use it.
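The pivot table idea described above (a row field, a column field, and a data function such as sum or count applied at each intersection) can be sketched outside a spreadsheet. This is only an illustration in Python, not how Calc or Excel implement it; the order records and the names ORDERS and pivot are hypothetical and are not the lecture's data.

```python
from collections import defaultdict

# Hypothetical order records: (region, product, amount). Not the lecture data.
ORDERS = [
    ("North", "Widget", 120.0),
    ("North", "Gadget", 80.0),
    ("South", "Widget", 200.0),
    ("South", "Widget", 50.0),
    ("South", "Gadget", 30.0),
]


def pivot(records, aggregate=sum):
    """Group values by (row field, column field) and aggregate each cell."""
    cells = defaultdict(list)
    for row_key, col_key, value in records:
        cells[(row_key, col_key)].append(value)
    return {key: aggregate(values) for key, values in cells.items()}


sums = pivot(ORDERS)                    # data field: sum of amounts
counts = pivot(ORDERS, aggregate=len)   # data field: number of orders

# Row totals, as shown at the edge of a pivot table.
row_totals = defaultdict(float)
for (row_key, _col_key), total in sums.items():
    row_totals[row_key] += total

for key in sorted(sums):
    print(key, sums[key], counts[key])
```

Swapping the aggregate argument corresponds to changing the data field function in the layout dialogue.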

Histogram

- A histogram is a graphical representation of a frequency distribution of a single variable, using a column chart.
- The columns represent contiguous ranges in the variable, called bins or classes, usually equally spaced, and the column height shows how many variable values lie in that range.
- Frequencies are stored in a table: the first column holds the bin boundary values, and the second is filled in by the software.
- Calc and Excel consider bin thresholds to be upper bounds, so there's an extra unlabelled bin after the last.
- Some built-in histogram generators such as Matlab's display the midrange on the chart rather than the boundaries, but you can always add a column next to the bin for your own labels.

Histogram: Frequency table

- Find a spot on the sheet for the frequency table; 10 to 20 bins is about right.
- Estimate the variable range by plotting or simple stats.
- Fill in the bin values, equally spaced and rounded unless the categories are going to be labelled.
- The first value should be less than the minimum and the last value greater than the maximum, to make sure the full range is covered.

Histogram: completion

- Click in the first frequency cell and select the Function Wizard (important because this is an array operation).
- Find FREQUENCY in the function list and double-click.
- classes: the bin threshold range, without the header.
- The end frequencies are 0, as expected.
- Chart the result (see lecture, extended in lab03), adding titles and labels in case the bin values are misleading.

Correlation

- Correlation is a statistical measure of the strength of the linear relationship between two or more dependent variables that have the same independent variable (time, position etc).
- It can vary from -1 (perfect negative correlation) through 0 (no correlation) to +1 (perfect positive correlation).
- Negative correlation means that larger values in one set are associated with smaller values in the other; positive goes the other way.
- Uncorrelated data simply shows no significant relationship.

Correlation using Spreadsheets

- Calc provides the CORREL function. It accepts two ranges and returns the correlation coefficient.
- Alternatively, you can plot a chart for two or more variables and try to visually identify possible correlations between variables.
- For several variables, you can produce a table by using references carefully to allow fill operations across and down.
- Excel provides a Correlation tool that calculates correlations between two or more variables. It constructs the table for you.

Correlation Table

Formula: =CORREL($D$8:$D$32; D$8:D$32)
- The first range uses absolute addresses.
- The second range is mixed to allow fill right.

For each row after the first:
- copy the formula for the new cell on the diagonal from the cell above it
- change the column letter on the first range only
- beware of Calc's formula prompts!
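As a cross-check on a correlation table built with CORREL and fill operations, the same table can be computed directly. The sketch below implements the standard Pearson correlation coefficient in plain Python; the column names and values are hypothetical stand-ins for the sensor data on the sheet, chosen so the expected coefficients are obvious.

```python
import math


def correl(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical columns, not the lecture's sensor readings.
columns = {
    "Sensor1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Sensor2": [2.0, 4.0, 6.0, 8.0, 10.0],  # rises with Sensor1
    "Sensor3": [5.0, 4.0, 3.0, 2.0, 1.0],   # falls as Sensor1 rises
}

# Full table: one coefficient for every pair of columns.
names = list(columns)
table = {(a, b): correl(columns[a], columns[b]) for a in names for b in names}

for a in names:
    print(a, [round(table[(a, b)], 3) for b in names])
```

The table is symmetric with 1 on the diagonal, which is why the spreadsheet version only needs the formula adjusted once per row.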

Correlation Charting

- You can visually inspect the relationship between variables on a chart; this is also an exercise in secondary axes.
- Create a line chart for Precision, move it aside, double-click to edit, then choose Format > Data ranges > Data Series tab.
- Add a new series (the name is Sensor2 (G7), the y-values are in G8:G32).
- Oops, the scales are badly mismatched!

Secondary Axis

- When two variables with quite different Y value ranges are to be plotted together, one goes on a secondary axis on the right.
- Edit the chart, select one of the data series, say Sensor2, and either press Format Selection on the toolbar or right-click and pick Format Data Series. Axis selection is under Options.
- Don't forget to tidy the line format, add titles, adjust fonts etc to achieve a professional look.

Correlation is Not Necessarily Causation

- If two data sets are correlated, it doesn't mean that the processes behind one caused the other. They could be influenced by some (often complex) third process such as:
  - climate change
  - socio-economic factors
  - geographic influences
  - solar activity
- There have been celebrated cases of correlations for which no credible explanation is likely, like this one: http://pubs.acs.org/doi/pdf/10.1021/ci700332k
- A classic is overleaf, revealed at the lecture.

Fitting Equations to Data

- Given an X-Y plot of values derived from a physical system with likely uncertainty in measurements, many lines or curves can be drawn to approximate the data.
- The method of least squares chooses the parameters for a regression line or curve of a given type specified by the user.
- The best-fit curve is the one that minimises the sum of the squares of the residuals (differences between the data and the predicted values).
- Trend lines apply regression to data on a line chart. They provide the regression equation and R² value, and show the regression line superimposed on the data values.
- The R² value (varies from 0 to 1) indicates how well the model fits the data. R² = 1 indicates that the regression line (curve) is a perfect fit to the data.

Regression and Spreadsheets

- Calc can apply linear, exponential, logarithmic and power regression trend lines to a chart, but not polynomials. Excel can also do polynomials.
- Both can display the equation and R² value on the chart.
- Both do so with a poor choice of typeface and significance, but you can roll your own with drawing elements.
- Both can extrapolate the trend (use the trend line as a prediction of behaviour outside the range); in Calc you just change the X axis limits.
- If you just want the numbers, you can use the linest, logest and growth functions; trend calculates points on the linear trend line.
- Apart from simple analyses, regression is often done with Matlab or with statistical packages.

Fitting Equations to Data (chart example)

- Use a scatter plot with small unfilled symbol markers and no lines.
- There are options to show the regression on the chart.
- When the chart is in edit mode, Insert > Trend Lines activates the dialogue box.
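To make the least-squares description concrete, here is a minimal sketch of a straight-line fit with its R² value, using the usual closed-form formulas for simple linear regression. It is not taken from the lecture and is not a description of how Calc or Excel compute trend lines internally; the data points are hypothetical, chosen to lie close to y = 2x + 1.

```python
def linear_fit(xs, ys):
    """Least-squares line y = slope * x + intercept, plus the R^2 of the fit."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residuals are the differences between the data and the predicted values.
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


# Hypothetical noisy measurements, not the lecture's data set.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]

slope, intercept, r_squared = linear_fit(xs, ys)
print(f"y = {slope:.3f} x + {intercept:.3f}, R^2 = {r_squared:.4f}")
```

An R² close to 1 here reflects how little the points scatter about the line; it says nothing about whether a straight line is the right model outside the measured range.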

Content vs Presentation

The result shows that the trend is appropriate, but the chart is not yet ready to be included in a professionally presented document. Defaults are rarely good enough.

- The Y-axis number format is inappropriate.
- There may be a need to point out features.
- Auto-range implies there's a zero day.
- The equation format is very poor (font, exp, precision and spacing are all wrong).
- Grids aren't needed, as the chart is there to show a trend, not data.
- The axis number font size is too large; the numbers aren't as important as the shape of the plot.

Fixing the Presentation

Remember to double-click the chart to enter edit mode.

Feature to improve: how to make changes
- Axis attributes: right-click on the axis, select Format Axis.
- Remove grid: Format > Grid > All Grids from the main menu.
- Edit equation format: not supported in Calc, try the next.
- Add a text box (anywhere, not just on a chart): pick text from the drawing toolbar, drag (off chart), type the initial text.
- Format text in box (double-click to edit): general font changes from the main toolbar; to apply sub/superscript, select the characters and then Format > Character.
- Lines, arrows, circles: pick from the drawing toolbar.
- Format drawing elements: right-click, select Line/Area/Text; resize by dragging handles.

Final Product

There's still room for personal taste: you might not like to use fill and outline on the regression box, or prefer different relative font sizes. Always ask yourself: does the chart (etc) convey the message embedded in the data effectively?

Other Presentation Issues

- Any labels you add overlay the chart; they're not part of it.
  - Select them in turn with Shift-click, or drag a marquee around them.
  - Format > Group allows selected objects to be part of a single item.
- The chart can be copied intact to another document such as OpenOffice Writer or Microsoft Word.
  - Calc to Writer is reliable, Excel to Word isn't always.
- Other copy options if the other application can't import:
  - Click on the group, choose File > Export as PDF, use Range Selection, and export graphic images using lossless compression (PNG).
  - If you really have to take a screen dump, expand the view first and always save as PNG, never JPG.

[Figure: the same chart exported as PDF; PNG, 150 ppi; JPEG, 150 ppi; PNG, 75 ppi]

Trends in Global CO2 Concentrations

- For the final example, data for atmospheric concentrations of CO2 taken at the Mauna Loa Observatory (Hawaii) each month since 1958 is readily available on the web.
- Plot just the last 10 years of the data using thin lines: the result shows the detail known as the Keeling Curve, after Mauna Loa CO2 researcher Charles David Keeling.
- The annual pattern is clearly evident. Can you explain these features?
  - the overall shape, with a peak in May each year and a trough in September
  - the kink around January
- Apply a trend line. It looks pretty clear and ominous, doesn't it?
  - On this model (that is, business as usual), after what date will we never again see a value below 400 ppm*?
- Apply a trend line to the whole data set. Is it still linear?

* For interesting observations about this milestone, see climate.nasa.gov/400ppmquotes/

Solving Equations in One Variable

Finding the roots of an equation can sometimes, but not always, be done analytically. If not, we need other approaches.

Method 1: Graphical estimate
- Assign a range of values to a variable (say x), calculate the corresponding function values (say f(x)), and plot a graph of f(x) vs x.
- Now try to see where the chart line intersects the x-axis.
- This approach is useful provided we can guess finite intervals within which to search for possible root values.
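The graphical estimate amounts to tabulating f(x) over a guessed interval and looking for where the values change sign. A minimal sketch of that tabulation follows; the function f here is a hypothetical example chosen for illustration, not the cubic used in the lecture, and the helper name sign_change_intervals is likewise made up.

```python
def f(x):
    # Hypothetical example function, not the cubic from the slides.
    return x**3 - 2 * x - 5


def sign_change_intervals(func, start, stop, steps):
    """Tabulate func on an evenly spaced grid and return the sub-intervals
    across which its value changes sign (each contains at least one root
    if func is continuous). Grid points where func is exactly zero are
    not treated specially in this sketch."""
    width = (stop - start) / steps
    xs = [start + i * width for i in range(steps + 1)]
    ys = [func(x) for x in xs]
    brackets = []
    for i in range(steps):
        if ys[i] * ys[i + 1] < 0:
            brackets.append((xs[i], xs[i + 1]))
    return brackets


# Search the guessed interval [-3, 3] in steps of 0.5.
brackets = sign_change_intervals(f, -3.0, 3.0, 12)
print(brackets)
```

A finer grid narrows each bracket, just as zooming the chart's x-axis does; a root is missed entirely if the guessed interval does not contain it.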

Solving Single Equations

Example function: f(x), a cubic in x with terms 4x^3, 12x^2, 64x and 16 (the operators between the terms are missing from this transcription).

Solving Single Equations

Method 2: Goal Seek (Tools > Goal Seek) works with three inputs:
- a formula cell
- a target value that you would like the cell to calculate
- a variable cell that is used, even indirectly, by the formula cell

Goal Seek tries to find a value for the variable cell that results in the target value in the formula cell.
- It uses iterative refinement, beginning with the current value.
- If the initial guess is not close enough, Goal Seek may not be able to find a solution.
- Goal Seek tries a fixed number of iterations (attempts at getting closer to the goal) and stops after that, even if the equation is not solved.
- The Solver tool (next week) is more powerful than Goal Seek.
- The tools are very similar in Calc and Excel.

Example 1: Cygnus X-1 Mass Calculation

- The mass of a black hole in a binary system can be calculated from a formula that is cubic in the unknown mass (see the CygnusX1 sheet).
- This cubic has one positive real root.

Example 2: Projectile

- The model used for the Trajectory example last week can be inverted to select one parameter so that the trajectory passes through a designated point (the target).
- Assume V0 is fixed and the angle can vary.
- x and y are now the target location.
- Solve for t in terms of y (a quadratic).

Example 2: Seeking the Target

- One of the solutions for t (typically the larger) is used to recalculate y (which must match because that's the equation we rearranged).
- The same t calculates x, but we may miss the target horizontally.
- Use Goal Seek to change the angle (which changes t, and hence x and y) and set x to the target x value.

Summary of Learning Outcomes

With practice you should be able to do the following:
- Create and manipulate a pivot table from multivariate data.
- Construct histograms to analyse large data sets.
- Identify correlation between a small number of variables, using the correl function and correctly interpreting its results.
- Use scatter charts and trend lines to identify the parameters of processes underlying noisy data, and to extrapolate trends.
- Present charts and other visual content professionally.
- Apply graphical methods to solving equations in one variable.
- Set up a single-variable equation model and use Goal Seek to identify the value of an input parameter that converges the result to a particular target.
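The Goal Seek behaviour described earlier (iterative refinement from the current value, a fixed iteration limit, possible failure from a poor starting guess) can be imitated with a secant iteration. This is a sketch under stated assumptions: the slides do not say which algorithm Calc or Excel use, the projectile model below assumes no air resistance (the lecture's Trajectory model may differ), and the launch speed, target position and starting angle are hypothetical values.

```python
import math

G = 9.81                            # gravitational acceleration, m/s^2
V0 = 30.0                           # hypothetical fixed launch speed, m/s
TARGET_X, TARGET_Y = 60.0, 10.0     # hypothetical target location, m


def horizontal_distance(angle):
    """x reached when the projectile passes height TARGET_Y on the way down.
    Drag-free model (an assumption); angle is in radians."""
    vy = V0 * math.sin(angle)
    # Larger root of the quadratic TARGET_Y = vy * t - G * t^2 / 2.
    t = (vy + math.sqrt(vy**2 - 2 * G * TARGET_Y)) / G
    return V0 * math.cos(angle) * t


def goal_seek(formula, target, guess, max_iterations=100, tolerance=1e-9):
    """Secant iteration: adjust the input until formula(input) is within
    tolerance of target. Returns (input value, converged flag)."""
    x0 = guess
    x1 = guess + 1e-3 if guess == 0 else guess * (1 + 1e-3)
    e0 = formula(x0) - target
    e1 = formula(x1) - target
    for _ in range(max_iterations):
        if abs(e1) < tolerance:
            return x1, True
        if e1 == e0:                # flat secant line: no step is possible
            return x1, False
        x0, x1 = x1, x1 - e1 * (x1 - x0) / (e1 - e0)
        e0, e1 = e1, formula(x1) - target
    return x1, abs(e1) < tolerance


# Variable cell: the angle. Formula cell: x. Target value: TARGET_X.
angle, converged = goal_seek(horizontal_distance, TARGET_X, math.radians(60.0))
print(converged, round(math.degrees(angle), 3))
```

As with the spreadsheet tool, the starting guess matters: an angle too shallow to reach the target height makes the square root undefined in this model, and the iteration fails rather than finding a solution.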