Introduction to CS databases and statistics in Excel Jacek Wiślicki, Laurent Babout,


One of the applications of MS Excel is data processing and statistical analysis. The following exercises demonstrate some of these functions. The base files for the exercises are included in http://lbabout.iis.p.lodz.pl/teaching_and_student_projects_files/files/us/lab_04b.zip. Download the archive and extract its content to your local drive.

Exercise 1
Open lab_04b.xls. The first worksheet (database) contains a dataset with a large number of employee names and some personal data. Using the Autofilter function available from the Data tab, display: all the people living in Canada earning less than 15 zł per hour; all the people whose family name starts with cal; all the people working more than 35 hours a week and employed between 1997 and 1999.

Exercise 2
Remove the autofilter (this is not strictly necessary, but it will not be used anymore). Using the Sort function from the Data tab, sort the table by country ascending, family name descending and employment year descending.

Exercise 3
Import the data from the lab_04b.csv file into an empty worksheet. This file is an example of CSV (comma-separated values), a format that reflects the column layout used by spreadsheets but is a plain text file (open it with Notepad to see the structure). The CSV format is very simple and useful for interchanging data between different systems.
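For readers who want to cross-check the filtering and sorting results of Exercises 1 and 2 outside Excel, the following is a minimal Python sketch. The rows, field names and values are hypothetical (the actual columns of lab_04b.xls are not listed in this handout), and "between 1997 and 1999" is assumed to be inclusive.

```python
# Hypothetical rows; the field names and values are assumptions for illustration,
# not the real content of lab_04b.xls.
employees = [
    {"family_name": "Calder", "country": "Canada", "rate": 14.0, "hours": 40, "year": 1998},
    {"family_name": "Nowak", "country": "Poland", "rate": 20.0, "hours": 30, "year": 2001},
    {"family_name": "Callahan", "country": "Canada", "rate": 18.5, "hours": 36, "year": 1996},
    {"family_name": "Adams", "country": "Canada", "rate": 12.0, "hours": 20, "year": 1999},
    {"family_name": "Zielinski", "country": "Poland", "rate": 16.0, "hours": 38, "year": 1997},
]


def names(rows):
    """Return only the family names, to keep the printed results short."""
    return [row["family_name"] for row in rows]


# Exercise 1 style filters (each one corresponds to one autofilter setup).
canada_low_pay = [e for e in employees if e["country"] == "Canada" and e["rate"] < 15]
starts_with_cal = [e for e in employees if e["family_name"].lower().startswith("cal")]
# Assumption: the year range is inclusive on both ends.
long_hours = [e for e in employees if e["hours"] > 35 and 1997 <= e["year"] <= 1999]

# Exercise 2 style sort: country ascending, family name descending, year descending.
# Python's sort is stable, so the keys are applied from least to most significant.
ordered = sorted(employees, key=lambda e: e["year"], reverse=True)
ordered = sorted(ordered, key=lambda e: e["family_name"], reverse=True)
ordered = sorted(ordered, key=lambda e: e["country"])

print(names(canada_low_pay))
print(names(starts_with_cal))
print(names(long_hours))
print(names(ordered))
```

Applying the sort keys in reverse order of importance mirrors how the levels in the Excel Sort dialog combine: the last sort applied (country) dominates, and earlier sorts only decide ties.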

Hint:
1. Click the From Text button in the Get External Data group on the Data tab.
2. Choose the text files filter and point to the source file.
3. When the import dialog opens, you can set all import parameters: the column format, the file encoding and the row to start the import from; the dialog also shows a preview of the text file. Set the file encoding to Central European (ISO) so that the Polish diacritic characters are displayed correctly.
4. Since the file is semicolon delimited (not fixed width), press the Next button.
5. Choose the column delimiter character (in this case the semicolon) and no text qualifier. The file preview now shows the columns separated.
6. Press the Next button.
7. Finally, you can adjust the column types. In this case all can be left as General (MS Excel 2010 will recognize numbers). Press the Finish button and point to the cell where the import should start.

After the import, each value should appear in its own column of the worksheet.
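The import settings in the hint (semicolon delimiter, no text qualifier, Central European (ISO) encoding, which corresponds to ISO-8859-2) can be illustrated with a short Python sketch. The file content below is hypothetical, including the column names and the use of a decimal comma; the real lab_04b.csv is not reproduced in this handout.

```python
import csv
import io

# Hypothetical file content encoded as ISO-8859-2; the real lab_04b.csv
# columns and values are assumptions here.
raw_bytes = "Nazwisko;Imię;Średnia\nŻak;Łukasz;4,5\nNowak;Anna;3,75\n".encode("iso-8859-2")

# Step 3 of the hint: decode with the Central European (ISO) encoding
# so that the Polish diacritics survive.
text = raw_bytes.decode("iso-8859-2")

# Steps 4-5 of the hint: semicolon as the delimiter, no text qualifier.
reader = csv.reader(io.StringIO(text), delimiter=";", quoting=csv.QUOTE_NONE)
rows = list(reader)
header, records = rows[0], rows[1:]

# Assumption: the grades use a decimal comma, so it is replaced before conversion.
grades = [float(record[2].replace(",", ".")) for record in records]

print(header)
print(records)
print(grades)
```

Decoding the same bytes with a different encoding would not fail outright in most cases, but the Polish letters would come out as wrong characters, which is the symptom the encoding setting in step 3 is meant to prevent.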

Exercise 4
Using formulae, calculate the average, maximum, minimum, median and standard deviation of the average grades. Then create a graph illustrating the distribution of the average grades, having sorted the students by their marks (if needed, reverse the Y-axis categories). Format the plot as in the example using appropriate options and functions.

Exercise 5
The second worksheet (sales) in lab_04b.xls contains data on the quarterly sales of a product. The sales differ among the quarters, which is quite common for seasonal products such as ice cream. The quarter to which a given row refers is denoted with 1, the others with 0. Columns n and time mean the same. First, create a line plot of sales with respect to time and add a linear trend line. As you can see, the sales are generally growing, but they reflect some seasonal fluctuations. Your task is to determine the estimated sales at any time (quarter) in the future, respecting the overall trend and the fluctuations.
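Going back to Exercise 4, the summary statistics can be cross-checked with a few lines of Python. This is only a sketch: the grades are hypothetical, and since the handout does not say whether the sample or the population standard deviation is meant, both are computed.

```python
import statistics

# Hypothetical average grades; the real values come from the imported CSV file.
grades = [3.0, 3.5, 4.0, 4.5, 5.0, 4.0]

summary = {
    "average": statistics.mean(grades),
    "maximum": max(grades),
    "minimum": min(grades),
    "median": statistics.median(grades),
    # Sample standard deviation (divides by n - 1).
    "stdev_sample": statistics.stdev(grades),
    # Population standard deviation (divides by n).
    "stdev_population": statistics.pstdev(grades),
}

for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

The two standard deviations differ noticeably for small datasets, so it is worth checking which variant your Excel formula computes before comparing results.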

Edit the trend line and, in the Options tab, check the options that display the equation and the R-squared value. The equation is the trend line function in the form y = ax + b, while R^2 indicates how accurately the trend line approximates the data. The maximum value of R^2 is 1, but this happens only if the data fit the trend equation exactly. In realistic situations you can regard the trend line as quite good if R^2 is greater than 0.8.

Choose Data Analysis in the Analysis group on the Data tab (provided it is installed; if not, see the hint below). Select Regression from the list and press OK.

How to install the add-in: if this add-in is not installed, proceed as follows:
1. Click the File tab, then click Options.
2. Click Add-Ins, and then in the Manage box, select Excel Add-ins.
3. Click Go.
4. In the Add-Ins available box, select the Analysis ToolPak check box, and then click OK.
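To make the meaning of the trend line equation y = ax + b and of R^2 more concrete, here is a minimal Python sketch of an ordinary least-squares fit of a straight line. The function name linear_trend and the sales figures are hypothetical; the sketch is not a description of how Excel computes its trend line internally.

```python
def linear_trend(xs, ys):
    """Fit y = a*x + b by ordinary least squares and return (a, b, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return a, b, 1 - ss_res / ss_tot


# Hypothetical quarterly sales with a growing trend and seasonal fluctuations.
time = [1, 2, 3, 4, 5, 6, 7, 8]
sales = [12, 15, 11, 14, 16, 19, 15, 18]

a, b, r_squared = linear_trend(time, sales)
print(f"y = {a:.3f}x + {b:.3f}, R^2 = {r_squared:.3f}")
```

With seasonal data like this, the straight line captures the growth but not the fluctuations, so R^2 stays well below 1. That gap is what the multiple regression with quarter columns is meant to close.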

Enter the X and Y data ranges (Y is sales, X is time and the quarters) and point to the output range (any cell outside the data table). To make the regression parameters easier to identify, select the data ranges together with their column labels (titles) and, in that case, check the Labels option. Press the OK button. The regression analysis results are placed in your worksheet; the figure labels the relevant cells as intersection [i], [tr], [q1r], [q2r]. The only cells required for the prognosis are marked with a yellow background.

Multiple regression has the form

y = b_0 + a_1 x_1 + a_2 x_2 + a_3 x_3 + ... + a_n x_n + [random component]

Here y is sales, x_1 is time and the remaining x_2, ..., x_n are ones and zeros (the quarters); b_0 is the intersection (intercept), a_1 is the time coefficient, and so on. The estimate contains no random component, which is the difference between the real data and the estimated data.

In the column to the right of the table enter the regression formula:

[i]+[tr]*[t]+[q1r]*[q1]+[q2r]*[q2]+[q3r]*[q3]+[q4r]*[q4]

where:
[i] is an absolute reference to the cell with the intersection (intercept) in the regression table,
[t] is a relative reference to the cell in the time column,
[q1], [q2], [q3], [q4] are relative references to the cells with the quarters (zeros and ones),
[tr] is an absolute reference to the cell with the time coefficient in the regression table,
[q1r], [q2r], [q3r], [q4r] are absolute references to the cells with the quarter coefficients in the regression table.

The result should be a column of estimated values, with the following formula in cell H2:

=$B$40+$B$41*C2+$B$42*D2+$B$43*E2+$B$44*F2+$B$45*G2

As you can see in the regression results, R^2 is about 0.969, which is fairly close to one. This means that the regression estimation of the trend is very good. Illustrate this by drawing a plot containing the real sales data and the estimated ones.
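The worksheet formula above can be mirrored in a few lines of Python, which may help in checking individual cells by hand. The coefficient values below are hypothetical placeholders, not the ones Excel produces for lab_04b.xls, and the function name estimate_sales is an assumption made for this sketch.

```python
def estimate_sales(coefficients, time, quarter):
    """Mirror of [i] + [tr]*[t] + [q1r]*[q1] + ... + [q4r]*[q4]."""
    # One-hot encoding of the quarter: 1 for the row's quarter, 0 for the others.
    dummies = [1 if quarter == q else 0 for q in (1, 2, 3, 4)]
    seasonal = sum(c * d for c, d in zip(coefficients["quarters"], dummies))
    return coefficients["intercept"] + coefficients["time"] * time + seasonal


# Hypothetical coefficients standing in for the cells $B$40..$B$45.
coefficients = {
    "intercept": 100.0,                     # [i]
    "time": 2.5,                            # [tr]
    "quarters": [10.0, -5.0, 20.0, 0.0],    # [q1r], [q2r], [q3r], [q4r]
}

# Estimate for time index 20, which is assumed here to fall in the third quarter.
print(estimate_sales(coefficients, time=20, quarter=3))
```

The coefficients play the role of the absolute references (they stay fixed for every row), while time and quarter play the role of the relative references that change from row to row.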

With a correct regression estimation, you can calculate the sales for any quarter in the future (and in the past, back to some zero moment), assuming that the trend remains constant. It is enough to apply the regression formula to any data row containing the time index and a 1 in the quarter for which the estimation is performed.

Exercise 6
source: http://www-zo.iinf.polsl.gliwice.pl/~kadam/pimfet_std/excel/excel.htm

The last worksheet in lab_04b.xls (babies) contains some data on newborns. The exercise demonstrates techniques of statistical data analysis such as histograms. An important thing that will simplify the work is to name the data ranges used for calculations. We will use the babies' weights and heights:
1. Select the cell, or range of cells, that you want to name.
2. Click the Name box at the left end of the formula bar; the name is entered there.
3. Type the name that you want to use to refer to your selection (e.g. weight). Names can be up to 255 characters in length.

Remark: alternatively, you can use the Name Manager box that you will find in the Defined Names group on the Formulas tab.

In the same way, name the heights range as height. Then calculate the maximum and the minimum weight and height.

Knowing the upper and lower bounds of the values in our distribution, we will create histograms (e.g. for weight the ranges will be [1800; 2000[, [2000; 2200[, etc.). To do this, create a table containing the data for the graph. Use the FREQUENCY function (CZĘSTOŚĆ in the Polish version of Excel), whose arguments are the weight named range and the ranges in the distribution table. The formula you are entering is called a matrix formula because its values fill a range of cells. Follow the next steps carefully to avoid mistakes.

Select the whole column in the distribution table and enter the FREQUENCY function:

=FREQUENCY(weight;K11:K25)

where weight is the named range and K11:K25 contains the histogram thresholds (aka bins). Accept the formula by pressing Ctrl+Shift+Enter simultaneously. This is the only way to enter a matrix formula. The formula will appear in braces:

{=FREQUENCY(weight;K11:K25)}

and the distribution table will be filled with the data.

Remarks:
1. If you don't use the matrix formula (so, basically, you omit pressing Ctrl+Shift+Enter), the cumulative frequency is displayed (e.g. for the bin 2600, the corresponding frequency will actually count the babies with a weight lower than 2600). You can still obtain the frequency distribution or histogram by displaying in a new column the difference between adjacent cells of the cumulative frequency column (e.g. for bin 2600 in cell A10, the cumulative frequency is 10 in B10, but the frequency is C10 = B10 - B9 = 10 - 4 = 6). What about the first bin, i.e. 2000?
2. Alternatively, you can choose the Histogram module from the Data Analysis dialog box (Data Analysis in the Analysis group) to directly create, as for the regression, an output table which displays the histogram.
3. You can also display a relative or normalised histogram. You simply have to divide each frequency value by the number of observations (i.e. the number of babies in the statistic).

Now you can create the weight histogram.
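The relationship between per-bin frequencies, cumulative frequencies and the relative histogram described in the remarks can be sketched in Python. The function below imitates the counting rule of FREQUENCY, where each value is counted in the first bin whose threshold is greater than or equal to it, plus one extra overflow bin. The weights and thresholds are hypothetical, not the data from the babies worksheet.

```python
from bisect import bisect_left
from itertools import accumulate


def frequency(values, bins):
    """Count values per bin: <= bins[0], then (bins[i-1], bins[i]], then > bins[-1].

    The bins must be sorted in ascending order.
    """
    counts = [0] * (len(bins) + 1)
    for value in values:
        # Index of the first threshold that is >= value (or the overflow slot).
        counts[bisect_left(bins, value)] += 1
    return counts


# Hypothetical weights in grams and histogram thresholds.
weights = [1900, 2000, 2150, 2200, 2450, 2600, 2601, 3100]
bins = [2000, 2200, 2400, 2600]

counts = frequency(weights, bins)

# Cumulative frequency: running total of the per-bin counts.
cumulative = list(accumulate(counts))

# Remark 1: recover per-bin counts by subtracting adjacent cumulative values;
# the first bin has nothing before it, so its cumulative value is used as is.
recovered = [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]

# Remark 3: relative (normalised) histogram.
relative = [count / len(weights) for count in counts]

print(counts)
print(cumulative)
print(recovered)
print(relative)
```

Because the relative frequencies are the counts divided by the total number of observations, they always sum to 1, which is a quick sanity check for the distribution table.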

Repeat the above steps and prepare the height histogram.

Finally, create the plot illustrating the dependence of height on weight. Adjust axis ranges and try applying the linear trend line.