
Does Pivot Tables and More
Jim Holtman
jholtman@gmail.com

There were several papers at CMG2008, and at previous conferences, that got me thinking about other ways that R can help with the analysis and visualization of performance data. A couple of sessions made use of pivot tables in Excel to help analyze data, and another paper referenced sparklines as a method of visualizing data. This paper will show how R can be used to do these, and other, procedures that will enhance your ability to analyze performance data.

1 Overview

At CMG many of the papers describe how various performance metrics about a system can be analyzed. This data is collected in a number of different ways (proprietary vendor code, open source tools, user-written scripts, etc.). Once the data is collected, there is again a variety of vendor, open source and user-written procedures to process it. Many of these are very flexible in letting a user customize the subset of data to be analyzed, the algorithms applied to the data and the format in which the results are presented. I have used many of these tools in the past and still rely on them.

Like most practitioners of computer performance analysis, I have my own tool chest of things that make my life easier. These include Perl for preprocessing and formatting unstructured data from log files, standard text editors for examining and changing data, Excel for quick looks at the data and for communicating results to others who are used to working with Excel, and of course R, which is my favorite because of its versatility for analysis and graphical presentation of results.

R is an open source language and environment for statistical processing. It is based on the S language originally developed at Bell Labs by John Chambers, who won the 1998 ACM Software System Award for the language.
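Before going further, here is a tiny, self-contained taste of working in R. The file name, column name and values below are invented purely for illustration; they are not from any data set in this paper:

```r
# Write a tiny CSV of made-up response times, read it back with read.csv,
# compute the average, and draw a histogram to a PNG file.
writeLines(c("resp_time", "0.12", "0.35", "0.28", "1.40", "0.22"),
           "resp.csv")

resp <- read.csv("resp.csv")      # the header line supplies the column name
avg  <- mean(resp$resp_time)      # average response time
cat("average response time:", avg, "seconds\n")

png("resp-hist.png")              # plot to a file so this runs anywhere
hist(resp$resp_time, main = "Response Times", xlab = "seconds")
dev.off()
```

The same three steps (read, summarize, plot) scale unchanged to files with millions of records.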
It easily handles data files with millions of records (e.g., transaction response times) and can, for example, compute the average response time and create a histogram of the response times in a couple of seconds. The graphics available in R for data visualization are very rich and flexible. Being able to slice and dice your data and then visualize it in various ways allows you to quickly see patterns in your data that numbers in a table alone will not reveal. R is very well supported through an active users' group, and there are over 85 books available covering the areas in which R has been used. I have used it for the last 25 years for doing computer performance analysis.

To quickly find R on the internet, just type R into Google and it will be the first hit. The links will provide an overview of R. There is a learning curve, but it is well worth the effort if you are serious about performance analysis. The presentation slides include a 10-minute R workshop that provides an overview of R.

2 Pivot Tables

John Van Wagenen's paper "Pivot Tables/Charts: Magic Beans Without Living in a Fairy Tale" at CMG 2008 gave a very good overview of how pivot tables can help in analyzing and visualizing the data that a performance analyst typically works with. Pivot tables allow an analyst to slice and dice the data in various ways, and to create aggregations of the data by various classifications. Pivot tables are typically associated with Excel, but the same information can be constructed by a variety of packages. For example, SQL statements can be used to group the data by various criteria and then summarize the results. Most of the vendor-supplied packages have similar capabilities. John gave his permission to use the data from his paper so that I can illustrate that the results are similar when using R. The spreadsheet that he shared with me had some different data, but it did have the pivot tables generated from this data.

2.1 Pivot Tables

The first example is from 15-minute data that was collected on system utilization. Figure 1 is a sample of the first entries in the Excel spreadsheet.

Figure 1 - Sample 15 Minute Data From Excel

To read this data into R, I converted the spreadsheet to a CSV file. R can directly read from Excel spreadsheets, but it is easier to illustrate the processing if we assume the data is in a file, since that is probably where most data is located. The resulting CSV is shown in Figure 2.

DAY,HOUR,MIN,SEC,MACHINE,LPAR,PHY_TOT,MIPS,CPS,CPU_HOUR,TYPE
6/2/2008,0,30,1,713,*PHYSI,3.26,176.5907684,13,0.4238,PROD
6/2/2008,0,30,1,713,AAMTBC,0,0,13,0,TEST
6/2/2008,0,30,1,713,BBMTBC,0,0,13,0,TEST
6/2/2008,0,30,1,713,GGMTBC,0,0,13,0,TEST
6/2/2008,0,30,1,713,QA,0.63,34.12643684,13,0.0819,TEST
6/2/2008,0,30,1,713,QB,1.32,71.50301053,13,0.1716,TEST
6/2/2008,0,30,1,713,QD,0.44,23.83433684,13,0.0572,TEST
6/2/2008,0,30,1,713,SOLAR1,0.33,17.87575263,13,0.0429,PROD

Figure 2 - CSV File for Input to R

In Excel a pivot table was created summarizing the CPU_HOUR over each DAY, HOUR and MIN, and generating totals on each of the breaks. The Excel pivot table is shown in Figure 3. You can read John's paper to see how to set up the pivot table from the given input.

Figure 3 - Pivot Table From Excel

To create a similar output in R, the script is shown in Figure 4. The first statement (read.csv) calls a function that reads in a CSV (comma-separated values) file. The default parameters are that the separator is a comma and that there is a header line in the file that defines the names of the columns when the data is read in. If your data file does not have a header line, the parameter header=FALSE tells the function to start reading the data at line 1; you can then assign names to the columns as you desire. If you have another separator, like a tab or semicolon, it can be specified.

The data is read into an object (cpu.15), which is a dataframe. In R, a dataframe is very similar to an Excel spreadsheet in that it looks like a table where each of the columns can have a different attribute (e.g., character, numeric, etc.), and it is easy to reference the data items individually or as a vector representing the entire column. Part of the power of R comes from the vectorized operations that make it easy to define transformations on the data. The contents of the dataframe are shown in Figure 5; notice that it looks very similar to the Excel spreadsheet in Figure 1.

As in any programming environment, there are a number of ways of getting similar results. In R there are a number of functions (apply, aggregate, tapply, etc.) that can summarize data in a pivot-table-like format. R also has a number of packages (similar to modules in Perl, classes in Java, or libraries in C/C++) which encapsulate useful functions that minimize the amount of code that has to be written. Several of these packages make it easy to transform data, aggregate it and then summarize the results. One that I have found very useful is the reshape package, which lets you restructure and aggregate your data with just two functions: melt and cast. melt puts the data into a format that can be used by cast to then create new aggregations of the data. Documentation is provided with the package that gives plenty of examples of how to use it.

So in the script, I indicate that I want to use the package [require(reshape)], and then I melt the dataframe that was read in, specifying that I intend to use three of the columns (DAY, HOUR, MIN) to aggregate the data and that the value I want to aggregate is CPU_HOUR. Now that the data has been melted, it can be cast into some output. The cast function has as its first parameter the object (cpu.melt) from the melt, and then a formula specifying how the data is to be aggregated. The formula DAY + HOUR ~ MIN indicates that the rows will contain DAY and HOUR, and that the columns will contain the MIN. The data will be aggregated over these variables, and the sum will be computed and stored in the resulting dataframe. There is also a parameter to indicate that margins are to be created. Margins will produce row totals and column totals on the control breaks, which in this case is DAY.

Figure 4 - R Commands to Create the Pivot Table

Figure 5 - Dataframe in R Created from the CSV File (Looks Like an Excel Spreadsheet)

The output of the first 25 lines is shown in Figure 4. Comparing this output with Figure 3 shows the results are the same; only the layout of the data is different. The last command just creates a pivot table summarizing the CPU_HOUR per day. The data file had over 10,000 lines of data. It took 1 second to read the data in and create the two pivot table outputs. The script can be reused to read in any number of data files.

2.2 Pivot Charts

Another use of the output from a pivot table is to generate a chart. John had a data file about batch jobs being run. A sample of the contents of the Excel spreadsheet is shown in Figure 6.

Figure 6 - Batch Data From Excel

Figure 7 - Pie Chart of Shift Usage

Figure 8 - Pivot Table of Shift Usage
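The shift summary behind a pie chart like this can be sketched in a few lines of base R with tapply. The rows below are invented for illustration, not John's batch data; only the shift names (PRIME, PERIOD2, PERIOD3, WEEKEND) come from the paper:

```r
# Invented rows shaped like the batch data: a SHIFT column and CPU hours.
batch <- data.frame(
  SHIFT    = c("PRIME", "PRIME", "PERIOD2", "PERIOD3", "WEEKEND"),
  CPU_HOUR = c(5.2, 3.1, 2.0, 1.4, 0.8)
)

# tapply sums CPU_HOUR within each shift -- a one-line pivot table.
by.shift <- tapply(batch$CPU_HOUR, batch$SHIFT, sum)
print(by.shift)

# The same summary drives the pie chart.
png("shift-pie.png")
pie(by.shift, main = "Breakdown by Shifts")
dev.off()
```

tapply returns a named vector of per-shift sums, which pie() can plot directly.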
This data was summarized by shift, and the pie chart in Figure 7 was created; the pivot table for this chart is Figure 8. Figure 9 shows the R script used to read in the CSV file created from the Excel spreadsheet, summarize the CPU hours by shift and then create the pie chart in Figure 11. This used another R function (tapply) to create the aggregation by shift. As I mentioned previously, there are a number of ways of doing things in R. I did notice one difference in the data: John's pivot table filtered out HOLIDAY, since there was such a small usage. I chose to leave it in, but could have easily removed it from the data. This file had about 24,000 data lines. It took 0.5 seconds to read in the data, aggregate it and generate the pie chart.

Figure 9 - R Script for Shift Usage

Figure 11 - Pie Chart from R

The final example makes use of some implied information in the data. In the spreadsheet, the column DB2 had a name such that if the 3rd character was a P, then it was production (PROD); otherwise it was development (DEV). So when the data was read in, a new column was added with this indication so that the pivot table could be generated. Figure 10, Figure 12 and Figure 13 are the data in the Excel spreadsheet and the pivot table and chart created from that data. Figure 14 is the R script to read in the data, create a new column with the workload, create the pivot table and then generate the chart in Figure 15. This data only had 96 rows, and it took 0.2 seconds to read in the data, do the transformations, generate the pivot table and the chart.

Figure 10 - Excel Data for Prod/Dev Pivot Table

Figure 12 - Excel Pivot Table from Data

Figure 13 - Chart from Excel Pivot Table

Figure 14 - R Script to Create Pivot Table and Chart

Figure 15 - Chart Generated from R

3 Sparklines

In Ron Kaminski's paper on "Automating Process Pathology Detection Rule Engine Design Hints" he described sparklines as one way of presenting a lot of data in a small amount of space. Basically, sparklines are graphs without the axes that would otherwise clutter up the presentation of the information. Sparklines were invented by Edward Tufte, who is a well-known expert on data visualization.

Figure 16 is an example of sparklines showing the price of 4 stocks over a 5-year period. You can see that they have roughly the same shape, even though the y-axis has different ranges. Numbers provide the extent of these ranges and identify other important points.

Figure 16 - Example of Sparklines

I have used multiple graphs on a page to show the relationships between various measurements, but typically I was limited to displaying around 15 charts, with all the extra space being taken up by the labeling of the axes. Figure 25 is included just to show the amount of space that is taken up by labeling the axes and such. It also makes it hard to compare different graphs to look for patterns.

With R it is easy to generate sparklines because you have complete control over how graphics are created. R has some very sophisticated graphics, but I will use just the basic graphics to show how sparklines can be created. The only difference between creating a set of charts like Figure 25 and sparklines is telling the system not to create the axes and to plot the data in a smaller window.

The charts in Figure 25 were created from running the vmstat command on a UNIX system. vmstat will record about 20 different measurements, including CPU utilization, memory and the number of running processes. Similar data will be used to demonstrate sparklines. One of the scripts that I have running on the systems that I monitor writes the vmstat data to a file with a timestamp. This data is then read by the analysis programs, and reports and charts are created. An example of the log file is shown in Figure 17.

Figure 17 - Example of vmstat Log File

This data is read in and results in a matrix with each row being a sample and the columns being the data for that sample. Figure 18 shows the small amount of R code that was written to create a plot of sparklines with nr rows and nc columns on a single page. Figure 26 shows the sparklines that were generated. This represents one day of system operation (00:00-24:00). On the left side of each sparkline is the name of the measurement being plotted. This is followed by its average value over the day.
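A single sparkline of the kind just described can be produced with a handful of base-graphics settings: a small plot window, no margins, no axes and no annotation. The sketch below uses a synthetic day of samples, not the vmstat log, and adds the mean reference line and first-max/first-min dots used in these charts:

```r
# One day of fake 5-minute samples: a busy period early on, quiet later.
set.seed(42)
cpu <- pmin(100, pmax(0, 60 * exp(-((1:288) - 60)^2 / 5000) + rnorm(288, 10, 3)))

png("sparkline.png", width = 300, height = 40)    # plot into a small window
par(mar = c(0, 0, 0, 0))                          # no margins
plot(cpu, type = "l", axes = FALSE, ann = FALSE)  # no axes, no labels
abline(h = mean(cpu), col = "gray")               # reference line at the mean
points(which.max(cpu), max(cpu), col = "red",   pch = 20)  # first maximum
points(which.min(cpu), min(cpu), col = "green", pch = 20)  # first minimum
dev.off()
```

Stacking one such panel per measurement, as the function in Figure 18 does, fills a page with dozens of comparable traces.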
The average value is represented by the horizontal gray line, which can be used as a reference for judging the variation of the data. The red number on the left above the gray line is the maximum value; the green number on the right below the gray line is the minimum value for the day. This allows you to quickly see some of the relationships. There is also a red dot to mark the first maximum and a green dot to mark the first minimum of the sample.

The easiest relationship to point out is in the last two lines on the chart: the idle time and the user + system time. As you can see, these are mirror images of each other, which is what you would expect from the data. Even without the time being explicit, since we know that this represents a 24-hour day, we can see that the first third of the day appears to be the busiest, with the overall activity in the rest of the day being low. For this system, that is what happens; it processes the performance data from a number of systems by downloading load files and then processing the data so that it is ready by 07:00 for review, to see how the system performed the previous day.

Figure 27 is from a CMG2004 paper I wrote and is a levelplot of the system utilization for a month. It uses color to show what would be the z-axis value (utilization) if this were a 3D graph. The data used to create the sparklines is from 5/16/05, so you should be able to compare the utilization (user + sys) of the sparkline with the levelplot. I also added to the plot the set of sparklines for the same period. Do they both convey the same information to you?

In Figure 28 I just took the month's worth of sparklines and replicated them 12 times to show what a year's worth of utilization might look like. Wouldn't it be nice to have a page like this for each of your systems so that you could look for patterns? You could also line up the plot so that each day of the week was a row, so you could see the pattern for that day across the months of the year.

If you really like 3D plots, R can generate those also. The rgl package will create a 3D plot that you can rotate with a mouse to see different views. Figure 29, Figure 30 and Figure 31 show the interactive 3D graphs that can be created with R.

Figure 18 - R Function to Plot Each Column as a Sparkline With nr Rows and nc Columns

4 Transaction Data

I want to use some transaction data to show another way of visualizing the data from a pivot table. I originally had a transaction log of 79,000 transactions: 159 transaction types across 300 users. To make the data easier to present, I created 10 transaction types by splitting the transactions based on their response times (Trans.01 has the shortest average response time and Trans.10 has the longest). The users were just split into 10 groups randomly. The log file has the user, transaction, start and end times. The file was read in with an R script, and the pivot table in Figure 19 was created.

Figure 19 - Transaction Count for User/Tran

If you look at the data, User.06 has the smallest transaction count and User.08 the largest. One way of visualizing this information is a stacked bar chart, as shown in Figure 20. Here it is easy to see that User.08 entered the most transactions and User.06 the least. But it is hard to determine, for each user, the ratios between the individual transactions for that user.

Figure 20 - Stacked Bar Chart of the Transaction Count

This is where a mosaic plot helps to visualize the relationship. In a mosaic plot, the values are plotted as rectangles, and the area of each rectangle is proportional to the count. The vertical axis is the same for all variables so that you can see the relationships of the transaction counts for a user. Figure 21 is the mosaic plot of the pivot table data. You can see on the chart that User.08 has the widest vertical area, indicating that this user has the highest total transaction count; User.06 has the least area, indicating the lowest transaction count. In this view of the data, you can also see that User.06 has a higher percentage of transactions Trans.06, Trans.09 and Trans.10 than User.08. This might indicate that these two users have different roles and therefore execute different transaction mixes. A mosaic plot can help identify this condition.

Figure 24 shows the relationship of the ratios of the average response times of the transactions for each user. Here you can see that Trans.10 appears to have an average response time that is almost equal to the sum of the response times of the other 9 transactions.
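The stacked-bar versus mosaic comparison can be sketched with base R's barplot and mosaicplot. The 3-by-3 count table below is fabricated for illustration; the table behind Figures 20 and 21 is 10 users by 10 transaction types:

```r
# Fabricated user-by-transaction counts.
counts <- matrix(c(40, 10,  5,
                   20, 30, 25,
                   15, 15, 60),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(User = c("User.01", "User.02", "User.03"),
                                 Tran = c("Trans.01", "Trans.02", "Trans.03")))

png("tran-plots.png", width = 800, height = 400)
par(mfrow = c(1, 2))
# Stacked bars: total volume per user is easy to compare.
barplot(t(counts), legend.text = TRUE, main = "Transaction Count by User")
# Mosaic: each rectangle's area is proportional to its count, so the
# transaction mix within each user is easy to compare.
mosaicplot(counts, color = TRUE, main = "Mosaic of Transaction Counts")
dev.off()
```

Both plots are drawn from the same pivot table; they simply emphasize totals versus within-user ratios.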
Again, based on how I partitioned the transactions, Trans.10 should have the longest response time, but even across some of the users there is quite a bit of variation. Remember that this chart does not show the value of the average response time of a transaction for a user, just the ratio of its response time compared to the other transactions executed by that user.

Figure 21 - Mosaic Plot of Transaction Counts for a User

Figure 22 shows the average transaction response time for each user. Even though Trans.10 has the longest response time, it is executed relatively less frequently than most of the other transactions, as you can see in Figure 21.

Figure 22 - Average Response Time of Transactions for Each User

Figure 23 is a graph of the sparklines of the distribution of the average response times of the transactions for a given user. Think of this as a histogram drawn with a smooth line. The x-axis is 0-3 seconds for the response times. In the data, there was a maximum of 879 seconds for one transaction (I am not sure the user really waited for a response in that case); the 95th percentile was 1.7 seconds, so I chose 3 seconds for the chart since this encompassed over 95% of all the transactions. In most cases, SLAs (service level agreements) are based on XX% of the response times being less than a given number; systems I have worked on in the past had this number at 90% or 95%. Looking at the data, it appears that User.06 and User.07 have larger tails on the right side, indicating that they are experiencing longer average response times. The pivot table in Figure 22 shows that these users do have the longest average response times across all their transactions. These users might have different roles, and therefore execute a different mix of transactions, some of which have longer response times. It is this type of analysis that leads to a better understanding of your environment.

Figure 23 - Sparklines of the Density (Histogram) Plot of Response Times for a User

Figure 24 - Ratios of Average Response Time of Transactions for a User

5 Wrap-Up

Hopefully I have given you some examples of other things that R can do, and hopefully they will whet your appetite to learn more about R. R should be considered one of the tools that you have in your toolkit. In my current engagement, I use R for most of the analysis that I do, but I still make extensive use of Excel. Excel happens to be the preferred way of interchanging data among the other people on the projects. They will give me data in an Excel spreadsheet that I can use as input. When I generate output, in many cases I will transfer the results to an Excel spreadsheet (R can write Excel workbooks with multiple sheets), since that allows the recipient to do further manipulations of the data, or to include the data in Word documents or PowerPoint presentations. The R scripts and data used in this paper are available if you send me an email requesting them.

6 References

[1] J. Van Wagenen, "Pivot Tables/Charts: Magic Beans Without Living in a Fairy Tale", CMG 2008
[2] R. Kaminski, "Automating Process Pathology Detection Rule Engine Design Hints", CMG 2008
[3] R Development Core Team, "R: A Language and Environment for Statistical Computing", ISBN 3-900051-07-0, http://www.r-project.org
[4] J. Holtman, "Using R for System Performance Analysis", CMG 2004
[5] J. Holtman, "Visualization Techniques for Analyzing Patterns in System Performance Data", CMG 2005
[6] N. J. Gunther, "Guerrilla Capacity Planning", Springer-Verlag, Heidelberg, Germany, 2007
[7] H. Wickham, "Reshaping data with the reshape package", Journal of Statistical Software, 21(12), 2007
[8] W. N. Venables and B. D. Ripley, "Modern Applied Statistics with S", Fourth Edition, Springer, 2002, ISBN 0-387-95458-0
[9] E. Tufte, "Beautiful Evidence", Graphics Press, 2006
[10] P. Spector, "Data Manipulation with R (Use R)", Springer, 2009, ISBN 978-0387747309

Figure 25 - Typical Multiplots Per Page - Data from 5/16/05

Figure 26 - Sparklines Created from 'vmstat' Log File: 19 Different Measurements for 5/16/05 (red is max; green is min)

Figure 27 - Levelplot (3D on 2D Surface) of System Utilization for a Month + Equivalent Sparklines

Figure 28 - What One Year of System Utilization Might Look Like in Sparklines

Figure 29 - 3D Chart of the Utilization Data

Figure 30 - Another View of the Same Data

Figure 31 - Yet Another View from Underneath