Biology 345: Biometry
Fall 2005
SONOMA STATE UNIVERSITY

Lab Exercise 2
Working with data in Excel and exporting to JMP

Introduction

In this exercise, we will learn how to reorganize and reformat a data set in Excel and prepare it for export to JMP. First, we will reformat the measurement data to make records consistent and to proofread it. Second, we will format the data so that it can be exported to JMP easily. As we will see, data can be manipulated and explored easily in JMP, but some operations are easier to perform in Excel. When analyzing data, I often perform some tasks in one program, then export the data to the other. Sometimes I move data back and forth between programs several times before my analysis is complete. Thus, a data set should be formatted properly for both programs and be easy to export.

Proofreading is an essential step in data analysis. I always find mistakes in data entry when I proofread my data or that of my students. A single mistake can completely alter the results of a statistical analysis and lead a scientist astray. It is easier to catch mistakes in raw data than in a final version of a data set.

My data sets conform to the following general principles:

- Labels (and filenames) should be long enough that a naïve person can decipher them. For example, if you sampled populations in Big Pine Creek, use the full name in the data file, rather than abbreviating it as BPC.
- No spaces or strange characters (such as -, &, %, or /) should be included in a label.
- The left hand side of the data set should contain columns that group observations into categories (nominal variables), while the right side should contain columns with the variables that have been measured (continuous variables).
- No rows or columns should be left empty, i.e., the data should form a full rectangle with no gaps inside it.
- The data file should be as small as possible, and it should be organized in a sensible way that is easy to read.
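The label and full-rectangle principles above can also be checked by script. The following is a minimal Python sketch, separate from the Excel workflow used in this lab. The function names, the rule that only letters, digits, and underscores count as acceptable label characters, and the sample rows are all assumptions made for illustration.

```python
import re

# Assumption for this sketch: anything other than letters, digits, and
# underscores counts as a "strange" character in a label.
BAD_LABEL = re.compile(r"[^A-Za-z0-9_]")


def check_labels(labels):
    """Return the labels that contain spaces or other disallowed characters."""
    return [label for label in labels if BAD_LABEL.search(label)]


def find_empty_cells(rows):
    """Return (row_index, column_index) pairs for blank or missing cells.

    A data set that forms a full rectangle yields an empty list.
    """
    width = max(len(row) for row in rows)
    empty = []
    for r, row in enumerate(rows):
        for c in range(width):
            if c >= len(row) or str(row[c]).strip() == "":
                empty.append((r, c))
    return empty


# Made-up header and rows, not the class data.
header = ["GroupName", "Subject Name", "Age-months"]
rows = [
    ["BigPineCreek", "Anna", "240"],
    ["BigPineCreek", "", "251"],
]

print(check_labels(header))    # ['Subject Name', 'Age-months']
print(find_empty_cells(rows))  # [(1, 1)]
```

A check like this only flags problems; deciding how to fix a flagged label or fill a blank cell still requires looking at the raw data.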
I kept these principles in mind when I asked you to enter your measurement data last week.

Objectives

- Learn how to sort data in Excel, learn to use the Replace option, and move cells to integrate differently entered data into a single data set.
- Learn to use simple formulas in Excel to create a new column of data.
- Develop skills at proofreading data and removing data entry errors.
- Format an Excel file for export to another statistical program.
- Import data to JMP and perform simple descriptive statistics on it in JMP.
Bio 345 Week 2- Formatting data in Excel and exporting to JMP

Exercise 1- Formatting and proofreading

Copying files to your computer
Obtain a floppy disk from your instructor and copy the files onto the hard disk of your laptop.

Opening the data file
Double click on the file NNNNBiometryF05measures.xls to load it into Excel. Save it, at once, into your Biometry folder as NNNNBio345f03_measures.xls, where NNNN is the first and last initial of your name and that of your lab partner.

Revising the GroupName column
You need to delete the characters and symbols that separate the first names in some group names. Specifically, you should delete empty spaces, the underline characters, and the & sign that appears in some places. Scroll down your dataset and look for places where you see a space or punctuation in the GroupName column. Write down the GroupNames with those characters in them at the end of the lab assignment.

Using the Replace tool to find and replace spaces in the GroupName and SubjectName columns
Select Edit and Replace. Type one space in the Find what: box and type nothing into the Replace with: box. Click Replace All once. Scroll down your dataset and check to see whether the spaces have been removed. If something looks funny, select Undo and check with your instructor before proceeding.

Using the Replace tool to find and replace the & sign in the GroupName and SubjectName columns
Select Edit and Replace again. Type & in the Find what: box and type nothing into the Replace with: box. Click Replace All once. Scroll down your dataset and check to see whether the & signs have been removed.

Sorting observations
Sorting is a powerful tool, but also a dangerous one, because it sometimes cannot be undone. I generally add a column to the very left side of the data set, which allows me to return to the original order of observations if my sorting operation creates a mishmash.
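Returning briefly to the Replace steps earlier in this exercise: the same clean-up can be sketched in Python for anyone who wants to see the logic spelled out. This is an illustration under stated assumptions, not a substitute for Edit and Replace in Excel. The function name clean_name and the sample group names are hypothetical.

```python
def clean_name(name, unwanted=(" ", "_", "&")):
    """Remove spaces, underline characters, and & signs from a name."""
    for ch in unwanted:
        name = name.replace(ch, "")
    return name


# Made-up group names showing the three kinds of separators.
group_names = ["Ann & Bob", "Ann_Bob", "AnnBob"]
cleaned = [clean_name(n) for n in group_names]

# Keep a before/after record of every name that changed, which is the
# list the lab asks you to write down.
changed = [(before, after)
           for before, after in zip(group_names, cleaned)
           if before != after]

print(cleaned)  # ['AnnBob', 'AnnBob', 'AnnBob']
print(changed)  # [('Ann & Bob', 'AnnBob'), ('Ann_Bob', 'AnnBob')]
```

Recording the changed names before overwriting them mirrors the advice to scroll through the data and note the odd entries first.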
Adding the rec column
Select Column A by pointing at the letter A and clicking once. Select Insert and Columns. Then type rec in cell A1 and enter 1 in cell A2. Point at cell A2 and hold your mouse button down to scroll down to cell A211. When the cells are selected, select Edit, Fill, Series. You will see a box with several options. Click OK, and you should now see that you have a column entitled rec with a sequence of numbers ending in row 211.
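The reason for the rec column is easier to see in a small example. The sketch below, in Python rather than Excel, numbers a few made-up rows, sorts them by SubjectName, and then sorts by rec to recover the original order. The sample rows are invented for illustration.

```python
# Made-up rows in their original entry order.
rows = [
    {"SubjectName": "Carla", "Gender": "Female"},
    {"SubjectName": "Abe", "Gender": "Male"},
    {"SubjectName": "Bea", "Gender": "Female"},
]

# Add a record number to each row, starting at 1, in the original order.
for rec, row in enumerate(rows, start=1):
    row["rec"] = rec

# Sort by SubjectName, as in a proofreading pass.
by_name = sorted(rows, key=lambda row: row["SubjectName"])

# Sorting by rec returns the rows to their original order.
restored = sorted(by_name, key=lambda row: row["rec"])

print([row["SubjectName"] for row in by_name])   # ['Abe', 'Bea', 'Carla']
print([row["SubjectName"] for row in restored])  # ['Carla', 'Abe', 'Bea']
```

Without the rec values there would be no way to tell, after the first sort, which order the rows were entered in.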
Making sure Subject names are consistent
Scroll up to the top of the spreadsheet and click your mouse at cell A1. Holding your mouse button down, scroll down slowly until the last cell (G262) is selected. Select Data and Sort. You should see rec in the box beneath Sort by. (Sometimes you need to check whether the My list has header row option at the bottom of the sort box is selected. If it isn't, click the radio button so that it is.) Click the arrows next to this box and scroll down to SubjectName. Select rec in the box beneath the text Then by. Now press OK.

Scroll down the data and proofread for differences in SubjectName, Gender, and Age among the rows for each subject. Write any differences you find in the task section below. If you see any differences in SubjectName, Gender, and Age, select Edit and Copy from the correct version of the subject's name, and paste it into the rest of the cells. Repeat the sorting operation until the SubjectName and Gender are identical in every row for a given subject.

To make sure that the age is consistently recorded, sort by Agemonths, scroll down the data set, and write down the SubjectNames of any subjects where more than one age is recorded in the task section below. Copy and paste the most common age into the rows that show a rarer value for the age. Then repeat the sorting operation until the age is identical in every row for a given subject.

Standardizing the Gender column
Select Data and Sort and select Gender in the box. Using the sorting and editing procedures described above, replace variants of Female or Male with these expressions (with the first letter capitalized).

Using Excel formulas to make a column of mean measures
Type MeanValue in cell H1. Click on cell H2 and type an = sign and a ( sign. Point at the F2 cell. You should see F2 in a colored font. Type a + and then point at the G2 cell. Type a ) symbol, then a slash (/) and a 2. Press Enter. The finished formula reads =(F2+G2)/2.
You should see the average of the two values in H2. Now copy this formula down to cell H211.

Final data formatting and preparation for export
Sort your data in the order Measurement, GroupName, and SubjectName. Send your Excel file to me by taking the computer out to the hall and depositing the file into the Biometryf03 student data folder. (Alternatively, if you still need to work on it at home, email it to me.) Remember that the filename has the first initials of you and your partner.

Save your file into your Biometry folder as NNNNBio345f03_measures.txt, where NNNN is the first and last initial of your name and that of your lab partner. Make sure to change the format of the file to Text (tab delimited) before saving. Excel will warn you that only the current sheet will be saved. Press OK. Then it will ask you whether you want to lose Excel features by converting it to text. Click Yes here. You really do want a text file.

Exercise 2- Importing your data to JMP

Launch JMP. You will see a menu of options and should select Open Data Table. You need to select your Biometry folder. JMP should show you a list of files, but the list might be blank. To the right of 'Show:', select 'Text Documents' (see footnote 1). Beneath this window, you will see 'Open As:' and you must select 'Text Data with Preview.' You should see the text file that resulted from the Excel file you exported. Select it and click Open. You will see a window asking you how to import the file, and you can just press RETURN here. Now you will see a window with the predicted format of the columns. Make sure that this window shows 'Tab' as the end of the field, and that it indicates that the table includes column headers.

Once the file is open, you should see a list of variables on the left hand side with a C (continuous) or an N (nominal) next to them. Check to make sure that the variables that you think should be continuous are labeled that way, and that the categories are labeled N. On the lower left, you will see a description of the number of rows in the data. Sometimes, on import, JMP mistakenly adds rows of missing data to the data set. If you notice that substantially more than 264 rows are in the data, scroll down to the bottom and delete the extra rows using the Row and Delete function.

Save your file as a JMP file by clicking Save as, using the same filename as before, but replacing the '.txt' suffix with a '.jmp' suffix. Email your Excel file and your JMP file to yourself for further exploration and analysis before the next laboratory session.
If you have difficulties sending attachments, let me know and we will figure out a way to get the file to you. Follow the directions in task 3 below to obtain a summary of your data in JMP. Use this summary to check for errors that remain after your proofreading in Excel, and re-export the data, if necessary, after any errors have been corrected.

Tasks

1. Compare the data in the JMP file Bio345measuresF05.jmp to the data in the file that you started working with at the beginning of lab, NNNNBio345f03_measures.xls. Write down the GroupNames that started out with unusual characters in them and show how you changed them through the Edit/Replace function.
2. Write down any differences in the SubjectNames and show how you changed them.
3. Import your file into JMP, and use the Tables and Group/Summary command to create a class mean for each measurement. You will put the variables Measurement and Gender into the Grouping box at the top right, and select GroupMean and Mean to obtain the mean values. Export this file to a text file, and email it to me after making sure that proofreading your Excel file eliminated any mistakes. If mistakes are present, proofread the Excel file again and repeat the Export and the Tables and Group/Summary commands until you have eliminated the mistakes.
4. Master the techniques illustrated in Chapters 1 and 2 of the JMP Start Statistics book (i.e., opening data tables, looking at the data, and launching an analysis from a data table).

Footnote 1: In the Windows version, under Files of type, click the arrow and select Text import files.
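The proofreading behind task 2 amounts to asking whether any subject has more than one recorded value for Gender or Agemonths. The following Python sketch shows that check on made-up rows. The function name find_inconsistent and the sample values are assumptions for illustration; in the lab itself this check is done by sorting and reading the data in Excel.

```python
from collections import defaultdict


def find_inconsistent(rows, key, field):
    """Return {key value: sorted field values} for keys with more than one value."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[key]].add(row[field])
    return {k: sorted(v) for k, v in seen.items() if len(v) > 1}


# Made-up rows with one Gender variant and one mistyped age.
rows = [
    {"SubjectName": "Abe", "Gender": "Male", "Agemonths": 240},
    {"SubjectName": "Abe", "Gender": "male", "Agemonths": 240},
    {"SubjectName": "Bea", "Gender": "Female", "Agemonths": 251},
    {"SubjectName": "Bea", "Gender": "Female", "Agemonths": 215},
]

print(find_inconsistent(rows, "SubjectName", "Gender"))
# {'Abe': ['Male', 'male']}
print(find_inconsistent(rows, "SubjectName", "Agemonths"))
# {'Bea': [215, 251]}
```

The check reports which subjects disagree with themselves, but not which value is correct; that still has to be decided from the original records.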
Introductory review of JMP

The following provides some general guidelines for using JMP. During the next week, you need to master the techniques illustrated in Chapters 1 and 2 of the JMP Start Statistics book (i.e., opening data tables, looking at the data, and launching an analysis from a data table).

Almost everything we do this semester will involve importing data files into JMP. To import a data file, use the File and Open command and choose the appropriate format, usually a .jmp file or a text file containing data. The Open command displays a specialized open file dialog that lets you locate the file you want to open and tell JMP the format of the incoming file. The Open command then reads the file into a JMP data table. JMP reads JMP data tables, JMP journal files, JMP script files, SAS data sets, SAS transport files, text files with any column delimiter, and Excel files. You indicate the kind of file with a Files of Type selection. For most uses in this class, the file will already be a .jmp data file or you will need to choose text import files.

One of the best features of JMP is that the analysis platform is dynamic. This means that you can change your mind, back up, and redo things in different ways. The title button of each analysis box opens or closes the box. As you choose additional analyses or launch different platforms, additional analysis boxes are made available to you.

You can access help from anywhere in JMP. In JMP 4.0, the help keys on each individual analysis item have been removed, and the program now uses a standard contents/index/search help function off the menu bar at the top of the program. When in doubt, launch the help program.

Creating new columns and using formulas in JMP
One important command is Columns and New Column. This command creates a column and defines its characteristics. This is useful for hand-entering new data, and for entering formulas in JMP.
You frequently will want to calculate new variables from existing variables. In the New Column popup window, choose New Property and Formula to launch the calculator window. Chapter 3 in your manual gives detailed instructions on using the calculator.

Creating graphs in JMP
You can use the Graph menu to take a look at your data. This is a quick way to scan for data entry errors. All analysis platforms include a plot, so you won't need to use this function often.

Using the Tables menu in JMP
You can use the commands under the Tables menu to create subsets, sort, copy, and join data tables. The Tables and Subset command will create a duplicate copy of the data table if you don't select any rows or columns. This is useful if you want to experiment with data sorting or manipulations and still keep a safe copy of the original table on the desktop.

The most important command in the Tables menu for running different analyses is the Group/Summary command. Group/Summary creates a JMP window that contains a summary table. This feature uses grouping variables to create a single row for each level of a grouping variable you choose. When there are several grouping variables, the summary table has a row for each combination of grouping variables. The columns of the summary table are summary statistics that you request.
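To make the shape of a summary table concrete, here is a minimal Python sketch of the same idea: one output row per combination of grouping variables, with a count and a mean as the requested statistics. It illustrates the concept only and does not reproduce how JMP implements Group/Summary. The function name group_summary, the measurement names, and the values are made up, and using MeanValue as the summarized column is an assumption.

```python
from collections import defaultdict
from statistics import mean


def group_summary(rows, group_by, value):
    """Return {group combination: {"N": count, "Mean": mean of value}}."""
    groups = defaultdict(list)
    for row in rows:
        combination = tuple(row[g] for g in group_by)
        groups[combination].append(row[value])
    return {combination: {"N": len(values), "Mean": mean(values)}
            for combination, values in sorted(groups.items())}


# Made-up rows; the measurement names and values are hypothetical.
rows = [
    {"Measurement": "HandLength", "Gender": "Female", "MeanValue": 17.0},
    {"Measurement": "HandLength", "Gender": "Female", "MeanValue": 18.0},
    {"Measurement": "HandLength", "Gender": "Male", "MeanValue": 19.5},
    {"Measurement": "HeadWidth", "Gender": "Male", "MeanValue": 15.0},
]

summary = group_summary(rows, ["Measurement", "Gender"], "MeanValue")
for combination, stats in summary.items():
    print(combination, stats)
# ('HandLength', 'Female') {'N': 2, 'Mean': 17.5}
# ('HandLength', 'Male') {'N': 1, 'Mean': 19.5}
# ('HeadWidth', 'Male') {'N': 1, 'Mean': 15.0}
```

A group with an unexpectedly small N, or a group label that appears only once, is often the first sign of a data entry error such as a misspelled category.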
Excluding/Including data in analyses
Use Rows and Row Selection and Select Where to select the rows to exclude. Use Rows and Exclude to leave out the selected rows, so you can analyze what's left. Use Rows and Include to return the excluded rows to the analysis. Use Rows and Row Selection and Invert Selection to flip the selections.
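The logic of selecting, inverting, and excluding rows can be sketched in a few lines of Python. This is an analogy under stated assumptions, not JMP's own mechanism: here "excluding" returns a filtered copy and leaves the original list untouched, so "including" the rows again simply means going back to the original list. The function names and sample rows are hypothetical.

```python
def select_where(rows, predicate):
    """Return the set of row indices for which the predicate is true."""
    return {i for i, row in enumerate(rows) if predicate(row)}


def invert_selection(rows, selected):
    """Return the indices of all rows that are not currently selected."""
    return set(range(len(rows))) - selected


def exclude(rows, selected):
    """Return the rows that remain when the selected rows are left out."""
    return [row for i, row in enumerate(rows) if i not in selected]


# Made-up rows.
rows = [
    {"SubjectName": "Abe", "Gender": "Male"},
    {"SubjectName": "Bea", "Gender": "Female"},
    {"SubjectName": "Cy", "Gender": "Male"},
]

selected = select_where(rows, lambda row: row["Gender"] == "Male")
remaining = exclude(rows, selected)
flipped = invert_selection(rows, selected)

print(sorted(selected))                         # [0, 2]
print([row["SubjectName"] for row in remaining])  # ['Bea']
print(sorted(flipped))                          # [1]
```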