Department of Language and Linguistics
LG400 Computer Induction for Linguists
Class Handouts Week 7

Working with data on MS Excel
by Mutsumi Ogawa
mogawa@essex.ac.uk

Session description
Do you work with quantitative data in your study? In this session, we learn the basics of data management in MS Excel. The session focuses on how to sort data by using formulas, how to make graphs and charts, and how to convert text data into a spreadsheet. It also covers how to make Excel data available in SPSS.

***NOTE: This is not a statistics lesson.
LG400 IT Induction

1. Introduction
This session focuses on the following topics.
  Formulas
  Graphs and Charts
  Text to Columns
  SPSS

2. Formulas
2.1 Conditional (AND, OR, NOT, IF)
Testing whether conditions are true or false and making logical comparisons between expressions are common to many tasks. You can use the AND, OR, NOT, and IF functions to create conditional formulas.

Create a conditional formula that results in a logical value (TRUE or FALSE)
To do this task, use the AND, OR, and NOT functions and operators as shown in the following example.
<Example>
        A
  1     Data
  2     15
  3     9
  4     8
  5     Sprockets
  6     Widgets

Formula and Description (Result)
  =AND(A2>A3, A2<A4)
      Determines if the value in cell A2 is greater than the value in A3 and also if the value in A2 is less than the value in A4. (FALSE)
  =OR(A2>A3, A2<A4)
      Determines if the value in cell A2 is greater than the value in A3 or if the value in A2 is less than the value in A4. (TRUE)
  =NOT(A2+A3=24)
      Determines if the sum of the values in cells A2 and A3 is not equal to 24. (FALSE)
  =NOT(A5="Sprockets")
      Determines if the value in cell A5 is not equal to "Sprockets". (FALSE)
  =OR(A5<>"Sprockets", A6="Widgets")
      Determines if the value in cell A5 is not equal to "Sprockets" or if the value in A6 is equal to "Widgets". (TRUE)

Create a conditional formula that results in another calculation or in values other than TRUE or FALSE
To do this task, use a formula with the IF function.

Syntax
  =IF(logical_test, value_if_true, value_if_false)
  logical_test: The condition that you want to check.
  value_if_true: The value to return if the condition is True.
  value_if_false: The value to return if the condition is False.
<Example>
        A
  1     Data
  2     15
  3     9
  4     8
  5     Sprockets
  6     Widgets

Formula and Description (Result)
  =IF(A2=15, "OK", "Not OK")
      If the value in cell A2 equals 15, return "OK". Otherwise, return "Not OK". (OK)
  =IF(A2<>15, "OK", "Not OK")
      If the value in cell A2 is not equal to 15, return "OK". Otherwise, return "Not OK". (Not OK)
  =IF(NOT(A2<=15), "OK", "Not OK")
      If the value in cell A2 is not less than or equal to 15, return "OK". Otherwise, return "Not OK". (Not OK)
  =IF(A5<>"SPROCKETS", "OK", "Not OK")
      If the value in cell A5 is not equal to "SPROCKETS", return "OK". Otherwise, return "Not OK". Text comparisons ignore case, so "Sprockets" counts as equal. (Not OK)
  =IF(AND(A2>A3, A2<A4), "OK", "Not OK")
      If the value in cell A2 is greater than the value in A3 and the value in A2 is also less than the value in A4, return "OK". Otherwise, return "Not OK". (Not OK)
  =IF(AND(A2<>A3, A2<>A4), "OK", "Not OK")
      If the value in cell A2 is not equal to the value in A3 and the value in A2 is also not equal to the value in A4, return "OK". Otherwise, return "Not OK". (OK)
  =IF(OR(A2>A3, A2<A4), "OK", "Not OK")
      If the value in cell A2 is greater than the value in A3 or the value in A2 is less than the value in A4, return "OK". Otherwise, return "Not OK". (OK)
  =IF(OR(A5<>"Sprockets", A6<>"Widgets"), "OK", "Not OK")
      If the value in cell A5 is not equal to "Sprockets" or the value in A6 is not equal to "Widgets", return "OK". Otherwise, return "Not OK". (Not OK)
  =IF(OR(A2<>A3, A2<>A4), "OK", "Not OK")
      If the value in cell A2 is not equal to the value in A3 or the value in A2 is not equal to the value in A4, return "OK". Otherwise, return "Not OK". (OK)
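If you also handle your data in a script, the logic of the IF examples can be sketched in Python. This is only an illustration of how the conditions evaluate, not part of Excel: the helper names excel_if and text_ne are made up for this handout, and text_ne assumes that Excel's <> comparison on text ignores case.

```python
# Cell values from the example table (A2 to A6).
a2, a3, a4 = 15, 9, 8
a5, a6 = "Sprockets", "Widgets"


def excel_if(logical_test, value_if_true, value_if_false):
    """Hypothetical stand-in for =IF(logical_test, value_if_true, value_if_false)."""
    return value_if_true if logical_test else value_if_false


def text_ne(left, right):
    """Hypothetical stand-in for <> on text, compared without regard to case."""
    return left.lower() != right.lower()


results = [
    excel_if(a2 == 15, "OK", "Not OK"),                  # =IF(A2=15, ...)
    excel_if(a2 > a3 and a2 < a4, "OK", "Not OK"),       # =IF(AND(A2>A3, A2<A4), ...)
    excel_if(a2 > a3 or a2 < a4, "OK", "Not OK"),        # =IF(OR(A2>A3, A2<A4), ...)
    excel_if(text_ne(a5, "SPROCKETS"), "OK", "Not OK"),  # =IF(A5<>"SPROCKETS", ...)
]
print(results)
```

The four results correspond, in order, to the first, fifth, seventh, and fourth formulas in the example above.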
Lookup (VLOOKUP, HLOOKUP)
Let's say that you want to look up a participant's test score by using their ID. Looking up data lets you find specific values in a list quickly and check automatically that you are using the correct data. After you look up the data, you can perform calculations or display results with the values returned. There are several ways to look up values in a list of data and to display the results.

Syntax
  =VLOOKUP(lookup_value, table_array, col_index_num, [range_lookup])

lookup_value (Required)
  The value to search for in the first column of the table or range. The lookup_value argument can be a value or a reference. If the value you supply for the lookup_value argument is smaller than the smallest value in the first column of the table_array argument, VLOOKUP returns the #N/A error value.

table_array (Required)
  The range of cells that contains the data. You can use a reference to a range (for example, A2:D8), or a range name. The values in the first column of table_array are the values searched by lookup_value. These values can be text, numbers, or logical values. Uppercase and lowercase text are equivalent.

col_index_num (Required)
  The column number in the table_array argument from which the matching value must be returned. A col_index_num argument of 1 returns the value in the first column in table_array; a col_index_num of 2 returns the value in the second column in table_array, and so on.

range_lookup (Optional)
  A logical value that specifies whether you want VLOOKUP to find an exact match or an approximate match:
  If range_lookup is either TRUE or is omitted, an exact or approximate match is returned. If an exact match is not found, the next largest value that is less than lookup_value is returned.
  Important: If range_lookup is either TRUE or is omitted, the values in the first column of table_array must be placed in ascending sort order; otherwise, VLOOKUP might not return the correct value.
  If range_lookup is FALSE, the values in the first column of table_array do not need to be sorted, and VLOOKUP will find only an exact match. If there are two or more values in the first column of table_array that match the lookup_value, the first value found is used. If an exact match is not found, the error value #N/A is returned.
Look up values vertically in a list by using an exact match
<Example>
In this example, you know the badge number and want to look up the phone extension.

        A               B            C            D
  1     Badge Number    Last Name    First Name   Extension
  2     ID-34567        Davolio      Nancy        5467
  3     ID-16782        Fuller       Andrew       3457
  4     ID-4537         Leverling    Janet        3355
  5     ID-1873         Peacock      Margaret     5176
  6     ID-3456         Buchanan     Steven       3453
  7     ID-5678         Suyama       Michael      428

Formula and Description (Result)
  =VLOOKUP("ID-4537", A1:D7, 4, FALSE)
      Looks up the badge number, ID-4537, in the first column and returns the matching value in the same row of the fourth column. (3355)

Look up values vertically in a list by using an approximate match
<Example>
In this example, you know the frequency and want to look up the associated color.

        A            B
  1     Frequency    Color
  2     4.14         red
  3     4.19         orange
  4     5.17         yellow
  5     5.77         green
  6     6.39         blue

Formula and Description (Result)
  =VLOOKUP(5.93, A1:B6, 2, TRUE)
      Looks up 5.93 in column A, finds the next largest value that is less than 5.93, which is 5.77, and then returns the value from column B that's in the same row as 5.77. (green)
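The difference between the exact and the approximate match may be easier to see in a short Python sketch. The function below is a hypothetical stand-in written for this handout, not Excel's implementation: it returns None where Excel would show #N/A, it leaves out the header row, and its exact match is case-sensitive, unlike Excel's.

```python
from bisect import bisect_right


def vlookup(lookup_value, table, col_index_num, range_lookup=True):
    """Rough stand-in for VLOOKUP over a list of rows (header row left out)."""
    keys = [row[0] for row in table]
    if range_lookup:
        # Approximate match: the first column must be sorted in ascending order.
        # Take the last row whose key is less than or equal to lookup_value.
        position = bisect_right(keys, lookup_value)
        return table[position - 1][col_index_num - 1] if position else None
    # Exact match: the first matching row wins; no sorting is needed.
    for row in table:
        if row[0] == lookup_value:
            return row[col_index_num - 1]
    return None


staff = [
    ("ID-34567", "Davolio", "Nancy", 5467),
    ("ID-16782", "Fuller", "Andrew", 3457),
    ("ID-4537", "Leverling", "Janet", 3355),
    ("ID-1873", "Peacock", "Margaret", 5176),
    ("ID-3456", "Buchanan", "Steven", 3453),
    ("ID-5678", "Suyama", "Michael", 428),
]
colors = [(4.14, "red"), (4.19, "orange"), (5.17, "yellow"), (5.77, "green"), (6.39, "blue")]

extension = vlookup("ID-4537", staff, 4, range_lookup=False)
color = vlookup(5.93, colors, 2)      # approximate match
too_small = vlookup(4.0, colors, 2)   # smaller than every key, like #N/A
print(extension, color, too_small)
```

The approximate branch relies on the ascending sort order, which is why the Important note under range_lookup matters: on unsorted keys the position found by the search is meaningless.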
Syntax
  =HLOOKUP(lookup_value, table_array, row_index_num, range_lookup)

lookup_value
  The value to be found in the first row of the table. lookup_value can be a value, a reference, or a text string.

table_array
  A table of information in which data is looked up. Use a reference to a range or a range name.
  The values in the first row of table_array can be text, numbers, or logical values.
  If range_lookup is TRUE, the values in the first row of table_array must be placed in ascending order: ..., -2, -1, 0, 1, 2, ..., A-Z, FALSE, TRUE; otherwise, HLOOKUP may not give the correct value. If range_lookup is FALSE, table_array does not need to be sorted.
  Uppercase and lowercase text are equivalent.
  Sort the values in ascending order, left to right. For more information, see Sort data.

row_index_num
  The row number in table_array from which the matching value will be returned. A row_index_num of 1 returns the first row value in table_array, a row_index_num of 2 returns the second row value in table_array, and so on. If row_index_num is less than 1, HLOOKUP returns the #VALUE! error value; if row_index_num is greater than the number of rows in table_array, HLOOKUP returns the #REF! error value.

range_lookup
  A logical value that specifies whether you want HLOOKUP to find an exact match or an approximate match. If TRUE or omitted, an approximate match is returned. In other words, if an exact match is not found, the next largest value that is less than lookup_value is returned. If FALSE, HLOOKUP will find an exact match. If one is not found, the error value #N/A is returned.
Look up values horizontally in a list by using an exact match
<Example>
In this example, you want to look up how many bolts are on order.

        A             B        C
  1     Status        Axles    Bolts
  2     In stock      4        9
  3     On order      5        10
  4     Back order    6        11

Formula and Description (Result)
  =HLOOKUP("Bolts", A1:C4, 3, FALSE)
      Looks up Bolts in row 1, and returns the value from row 3 that's in the same column. (10)

Look up values horizontally in a list by using an approximate match
<Example>
In this example, you want to look up the rate for the sales volume closest to $78,658.

        A        B        C         D
  1     10000    50000    100000    Sales Volume
  2     .05      .20      .30       Rate

Formula and Description (Result)
  =HLOOKUP(78658, A1:D4, 2, TRUE)
      Looks up $78,658 in row 1, finds the next largest value that is less than $78,658, which is $50,000, and then returns the value from row 2 that's in the same column as $50,000. (20%)

Notes
  You can display the rate and return number as a percentage. Select the cell, and then on the Home tab, in the Number group, click Percent Style.
  You can display the Sales Volume number as dollars. Select the cell, and then on the Home tab, in the Number group, click Accounting Number Format.
2.2 Counting (COUNT, COUNTA, COUNTIF, FREQUENCY)
Syntax
  Count the number of cells that contain numbers
      =COUNT(value1, [value2], ...)
  Count the number of cells that are not empty
      =COUNTA(value1, [value2], ...)
  Count the number of cells within a range that meet a single criterion that you specify
      =COUNTIF(range, criteria)

Common COUNTIF formulas
<Example>
        A          B
  1     Data       Data
  2     apples     32
  3     oranges    54
  4     peaches    75
  5     apples     86

Formula and Description (Result)
  =COUNTIF(A2:A5,"apples")
      Number of cells with apples in cells A2 through A5. (2)
  =COUNTIF(A2:A5,A4)
      Number of cells with peaches in cells A2 through A5. (1)
  =COUNTIF(A2:A5,A3)+COUNTIF(A2:A5,A2)
      Number of cells with oranges and apples in cells A2 through A5. (3)
  =COUNTIF(B2:B5,">55")
      Number of cells with a value greater than 55 in cells B2 through B5. (2)
  =COUNTIF(B2:B5,"<>"&B4)
      Number of cells with a value not equal to 75 in cells B2 through B5. (3)
COUNTIF formulas using wildcard characters and handling blank values
<Example>
        A          B
  1     Data       Data
  2     apples     Yes
  3
  4     oranges    NO
  5     peaches    No
  6
  7     apples     yes

Formula and Description (Result)
  =COUNTIF(A2:A7,"*es")
      Number of cells ending with the letters "es" in cells A2 through A7. (4)
  =COUNTIF(A2:A7,"?????es")
      Number of cells ending with the letters "es" and having exactly 7 letters in cells A2 through A7. (2)
  =COUNTIF(A2:A7,"*")
      Number of cells containing any text in cells A2 through A7. (4)
  =COUNTIF(A2:A7,"<>"&"*")
      Number of cells not containing text in cells A2 through A7. (2)
  =COUNTIF(B2:B7,"No") / ROWS(B2:B7)
      The proportion of No votes (including blank cells) in cells B2 through B7. (0.333333333)
  =COUNTIF(B2:B7,"Yes") / (ROWS(B2:B7) - COUNTIF(B2:B7,"<>"&"*"))
      The proportion of Yes votes (excluding blank cells) in cells B2 through B7. (0.5)
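The wildcard criteria can be imitated in Python with the standard fnmatch module, which also uses * for any run of characters and ? for a single character. The sketch below is an approximation under two assumptions: blank cells are represented as None, and matching ignores case as COUNTIF does. The function name countif_text is hypothetical, and Excel's ~ escape character is not handled.

```python
from fnmatch import fnmatchcase


def countif_text(cells, pattern):
    """Count text cells matching a wildcard pattern (* and ?), ignoring case."""
    return sum(
        1
        for cell in cells
        if isinstance(cell, str) and fnmatchcase(cell.lower(), pattern.lower())
    )


# Columns A2:A7 and B2:B7 from the example; None marks a blank cell.
column_a = ["apples", None, "oranges", "peaches", None, "apples"]
column_b = ["Yes", None, "NO", "No", None, "yes"]

ends_in_es = countif_text(column_a, "*es")
seven_letters = countif_text(column_a, "?????es")
any_text = countif_text(column_a, "*")
blanks = len(column_a) - any_text

# Proportion of No votes over all rows, and of Yes votes over non-blank rows.
no_share = countif_text(column_b, "No") / len(column_b)
yes_share = countif_text(column_b, "Yes") / countif_text(column_b, "*")
print(ends_in_es, seven_letters, any_text, blanks, no_share, yes_share)
```

Dividing by the number of non-blank cells rather than by all rows is the same choice the last formula in the example makes with ROWS minus the count of non-text cells.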
Syntax
  Calculate how often values occur within a range of values, and then return a vertical array of numbers.
      =FREQUENCY(data_array, bins_array)

data_array
  An array of or reference to a set of values for which you want to count frequencies. If data_array contains no values, FREQUENCY returns an array of zeros.

bins_array
  An array of or reference to intervals into which you want to group the values in data_array. If bins_array contains no values, FREQUENCY returns the number of elements in data_array.

<Example>
        A         B
  1     Scores    Bins
  2     79        70
  3     85        79
  4     78        89
  5     85
  6     50
  7     81
  8     95
  9     88
  10    97

Formula and Description (Result)
  =FREQUENCY(A2:A10,B2:B4)
      Number of scores less than or equal to 70 (1)
      Number of scores in the bin 71-79 (2)
      Number of scores in the bin 80-89 (4)
      Number of scores greater than or equal to 90 (2)

Note: The formula in the example must be entered as an array formula. After copying the example to a blank worksheet, select the range A12:A15, press F2, and then press CTRL+SHIFT+ENTER. If the formula is not entered as an array formula, there will be only one result, in cell A12 (1).
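The way FREQUENCY assigns scores to bins can be sketched in Python as follows. This is a minimal illustration, assuming each bin value is an inclusive upper limit and that one extra count collects everything above the last bin; the function name frequency is chosen here only to echo the Excel function.

```python
from bisect import bisect_left


def frequency(data, bins):
    """Return len(bins) + 1 counts; each bin value is an inclusive upper limit."""
    edges = sorted(bins)
    counts = [0] * (len(edges) + 1)
    for value in data:
        # Index of the first edge that is greater than or equal to the value;
        # values above every edge fall into the final, open-ended count.
        counts[bisect_left(edges, value)] += 1
    return counts


scores = [79, 85, 78, 85, 50, 81, 95, 88, 97]
bins = [70, 79, 89]
counts = frequency(scores, bins)
print(counts)
```

There is always one more count than there are bins, which is why the example selects four cells (A12:A15) for three bin values.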
2.3 Statistical
Average
  =AVERAGE(number1, [number2], ...)
  The arithmetic mean, calculated by adding a group of numbers and then dividing by the count of those numbers. For example, the average of 2, 3, 3, 5, 7, and 10 is 30 divided by 6, which is 5.

Median
  =MEDIAN(number1, [number2], ...)
  The middle number of a group of numbers; that is, half the numbers have values that are greater than the median, and half the numbers have values that are less than the median. For example, the median of 2, 3, 3, 5, 7, and 10 is 4.

Mode
  =MODE(number1, [number2], ...)
  The most frequently occurring number in a group of numbers. For example, the mode of 2, 3, 3, 5, 7, and 10 is 3, because 3 occurs twice.

2.4 Text (LEFT, MID, RIGHT, SEARCH, LEN)
Text functions are useful for manipulating strings in your data, for example, distributing the first, middle, and last names from a cell into three separate columns. This section demonstrates how to use combinations of the following text functions to extract and copy name components into separate cells.

  Function    Syntax
  LEFT        LEFT(text, num_chars)
  MID         MID(text, start_num, num_chars)
  RIGHT       RIGHT(text, num_chars)
  SEARCH      SEARCH(find_text, within_text, start_num)
  LEN         LEN(text)

<Example: Jeff Smith>
In this example, there are only two components: first name and last name. A single space separates the two name components.

        A             B                              C
  1     Full name     First name                     Last name
  2     Jeff Smith    =LEFT(A2, SEARCH(" ",A2,1))    =RIGHT(A2,LEN(A2)-SEARCH(" ",A2,1))
First name
The first name starts with the first character in the string (J) and ends at the fifth character (the space). The formula returns five characters in A2, starting from the left.
Use the SEARCH function to find the value for num_chars:
  Search for the numeric position of the space in A2, starting from the left. (5)

Last name
The last name starts at the space, five characters from the right, and ends at the last character on the right (h). The formula extracts five characters in A2, starting from the right.
Use the SEARCH and LEN functions to find the value for num_chars:
  1. Search for the numeric position of the space in A2, starting from the left. (5)
  2. Count the total length of the text string, and then subtract the number of characters from the left to the first space, as found in step 1. (10 - 5 = 5)

For further styles of names including middle names, titles, etc., see Split text among columns by using functions.
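The arithmetic behind the two formulas can be traced step by step in Python. This is a sketch of the same calculation, not of Excel itself; note that Python counts positions from 0, so 1 is added to match the 1-based position that SEARCH returns.

```python
full_name = "Jeff Smith"

# SEARCH(" ", A2, 1): 1-based position of the first space.
space_pos = full_name.find(" ") + 1

# LEFT(A2, space_pos): the first five characters, including the trailing space.
first_name = full_name[:space_pos]

# RIGHT(A2, LEN(A2) - space_pos): the characters after the space.
num_chars = len(full_name) - space_pos
last_name = full_name[-num_chars:]

print(space_pos, repr(first_name), repr(last_name))
```

Because LEFT is given the position of the space itself, the first name comes back with a trailing space. Subtracting 1 from the SEARCH result in the formula would drop it.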
3. Graphs and Charts
Microsoft Office Excel 2007 no longer provides the chart wizard. Instead, you can create a basic chart by clicking the chart type that you want on the Microsoft Office Fluent user interface Ribbon. To create a chart that displays the details that you want, you can then continue with the next steps of the following step-by-step process.

Getting to know the elements of a chart
A chart has many elements. Some of these elements are displayed by default; others can be added as needed. You can change the display of the chart elements by moving them to other locations in the chart, resizing them, or changing the format. You can also remove chart elements that you do not want to display.
  1. The chart area of the chart.
  2. The plot area of the chart.
  3. The data points of the data series that are plotted in the chart.
  4. The horizontal (category) and vertical (value) axis along which the data is plotted in the chart.
  5. The legend of the chart.
  6. A chart and axis title that you can use in the chart.
  7. A data label that you can use to identify the details of a data point in a data series.

Charts vary in type and purpose. Please refer to Create a chart to find the type that meets your needs. There is a very clear demo on the MS website.
4. Text to Columns
Use the Convert Text to Columns Wizard to separate simple cell content, such as first names and last names, into different columns.

  Full name         First name    Last name
  Syed Abbas        Syed          Abbas
  Molly Dempsey     Molly         Dempsey
  Lola Jacobsen     Lola          Jacobsen
  Diane Margheim    Diane         Margheim

Depending on your data, you can split the cell content based on a delimiter, such as a space or a comma, or based on a specific column break location within your data.

Split content based on a delimiter
Use this method if your names have a delimited format, such as "First_name Last_name" (where the space between First_name and Last_name is the delimiter) or "Last_name, First_name" (where the comma is the delimiter).

Split space-delimited or comma-delimited content
        A                 B
  1     Syed Abbas        Abercrombie, Kim
  2     Molly Dempsey     Cavaglieri, Giorgio
  3     Lola Jacobsen     Ito, Shu
  4     Diane Margheim    Philips, Carol

  1. Select the range of data that you want to convert.
  2. On the Data tab, in the Data Tools group, click Text to Columns.
  3. In Step 1 of the Convert Text to Columns Wizard, click Delimited, and then click Next.
  4. In Step 2, select the Space or Comma check box, and then clear the other check boxes under Delimiters. The Data preview box shows the first and last names in two separate columns.
  5. Click Next.
  6. In Step 3, click a column in the Data preview box, and then click Text under Column data format. Repeat this step for each column in the Data preview box.
  7. If you want to insert the separated content into the columns next to the full name, click the icon to the right of the Destination box, and then select the cell next to the first name in the list (B2, in this example).
     Important: If you do not specify a new destination for the new columns, the split data will replace the original data.
  8. Click the icon to the right of the Convert Text to Columns Wizard.
  9. Click Finish.
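The effect of splitting on a delimiter can be sketched in a few lines of Python. This only illustrates the idea behind the wizard: the function name text_to_columns is hypothetical, and stripping the spaces around each part is a convenience added here, so the result for comma-delimited names may differ slightly from what the wizard produces.

```python
def text_to_columns(cells, delimiter):
    """Split each cell on the delimiter and trim spaces around the parts."""
    return [[part.strip() for part in cell.split(delimiter)] for cell in cells]


column_a = ["Syed Abbas", "Molly Dempsey", "Lola Jacobsen", "Diane Margheim"]
column_b = ["Abercrombie, Kim", "Cavaglieri, Giorgio", "Ito, Shu", "Philips, Carol"]

space_split = text_to_columns(column_a, " ")   # first name, last name
comma_split = text_to_columns(column_b, ",")   # last name, first name
print(space_split[0], comma_split[0])
```

As with the wizard, the result is written to new lists here rather than over the original data, which corresponds to choosing a separate Destination in step 7.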
Split cell content based on a column break
You can also customize how you want your data to be separated by specifying a fixed column break location.
  1. Select the cell or range of cells.
  2. On the Data tab, in the Data Tools group, click Text to Columns.
  3. In Step 1 of the Convert Text to Columns Wizard, click Fixed Width, and then click Next.
  4. In the Data preview box, drag a line to indicate where you want the content to be divided.
     Tip: To delete a line, double-click it.
  5. Click Next.
  6. In Step 3, select a column in the Data preview box, and then click a format option under Column data format. Repeat this step for each column in the Data preview box.
  7. If you want to show the split content in the columns next to the full name, click the icon to the right of the Destination box, and then click the cell next to the first name in the list.
     Important: If you do not specify a new destination for the new columns, the divided data will replace the original data.
  8. Click the icon to the right of the Convert Text to Columns Wizard.
  9. Click Finish.

5. SPSS
You can open Excel data or text files in SPSS, a statistics package.

Excel (.xls, .xlsx, .xlsm) to SPSS
  1. Put variable names in the first row of your Excel data.
  2. SPSS: File > Read text data > Select the file that you want to open in SPSS. If the file does not appear in the dialog, change the Files of type setting.
  3. Select a worksheet.

Text (.txt, .dat) to SPSS
  1. Save the data in text format using Notepad.
  2. SPSS: File > Read text data > Select the file that you want to open in SPSS. If the file does not appear in the dialog, change the Files of type setting.
  3. Follow the Text Import Wizard, Steps 1-6.
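If your data come from a script rather than from Notepad, a text file of the kind described above can be prepared in Python. The sketch below builds tab-delimited text with the variable names in the first row. The variable names and values are invented purely for illustration, and the tab delimiter is an assumption; you would then choose the matching delimiter in the Text Import Wizard.

```python
import csv
import io

# Hypothetical data: variable names in the first row, one participant per row.
rows = [
    ["id", "group", "score"],
    ["P01", "L1", 78],
    ["P02", "L2", 85],
]

# Build tab-delimited text in memory; each row becomes one line.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerows(rows)
text = buffer.getvalue()
print(text)
```

To save the result, write the text to a .txt or .dat file and open that file in SPSS as described in the steps above.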