ENGR 102 Engineering Lab I - Computation
Week 07: Arrays and Lists of Data

Introduction to Arrays

In last week's lecture, [1] we were introduced to the mathematical concept of an array through the equation

$$x = [x_0, x_1, x_2, \ldots, x_n], \tag{1}$$

and we saw that many operations we may want to perform on an array involve iteration, such as for the mean

$$\bar{x} = \frac{1}{n+1} \sum_{i=0}^{n} x_i, \tag{2}$$

where the divisor is n + 1 because the array in (1) holds n + 1 elements. We also learned that we could store this type of data in a built-in Python data type called a list. Lists are a key element of the for loop iteration structure, [2] and we can create lists using a command such as

    grades = [98., 87., 92.5, 89., 92.]

[1] These notes are based closely on a PowerPoint presentation prepared by Dr. John Keyser for the Pilot Course version of ENGR 102.
[2] More generally, for loops operate on any iterable object.

In this lecture, we will learn more about how lists work, and we will introduce three other ways to store array-type data in Python. These are the data types of tuples, dictionaries, and a special type of list we will import from the numpy package, called arrays. Tuples are like lists, but their values cannot be changed once you create them. Dictionaries are like lists and tuples, except that you get to name the index; that index is a keyword, hence the term dictionary. Numpy arrays are special lists that can be used in mathematical operations. Together, these Python data types give us a lot of flexibility to store arrays of data and manipulate them to extract meaningful, quantitative metrics that describe them.

Data storage in list, tuple, dict, and array

Each of the array-type data types in Python is similar in that it can store multiple values in a variable of a single name. We create these data types using different syntax, and each data type behaves in a slightly different way. It is the differences among these data types that help us decide the best data type to use to store a given dataset.
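As a minimal sketch of the syntax differences just described, the block below stores the same five exam grades in each of the four data types. The variable names and the dictionary keys (exam1 through exam5) are assumptions made only for this illustration; they do not come from the lecture.

```python
import numpy as np

# The same five exam grades stored four different ways.
grades_list = [98., 87., 92.5, 89., 92.]              # list: square brackets
grades_tuple = (98., 87., 92.5, 89., 92.)             # tuple: parentheses
grades_array = np.array([98., 87., 92.5, 89., 92.])   # numpy array: built from a list

# Hypothetical keys (exam labels) chosen only for this illustration.
grades_dict = {'exam1': 98., 'exam2': 87., 'exam3': 92.5,
               'exam4': 89., 'exam5': 92.}            # dict: curly braces

# Each variable holds five values under a single name.
for data in (grades_list, grades_tuple, grades_dict, grades_array):
    print(type(data).__name__, len(data))
```

Each line of output shows the name of the data type followed by the number of values it holds.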
Basic behavior of the list data type

Lists are actually much more flexible than the single example above that stores exam grades. Lists can contain non-numeric values and can contain lists of lists, as in [3]

    names = ['John', 'Kathy', 'Elsa', 'David', [47, 36, 27, 19], [True, False, False, True]]

This list has six elements: four are strings and two are additional lists. The nested lists contain a list of integers and a list of Booleans. On the other hand, because lists can contain just about anything, they do not operate in mathematical statements like arrays or vectors. If you try to add two lists, they concatenate like strings: [4]

    grades = grades + [87., 54.]
    print(grades)
    [98.0, 87.0, 92.5, 89.0, 92.0, 87.0, 54.0]

[3] It would be a good idea to type all of the example code from this lecture into an interactive Python window to see what happens.
[4] Notice that we have to add two lists; you cannot add a number to a list as in grades = grades + 87. To do that, we have to put the value in a list like [87.], or we can use the list method append, as in grades.append(87.).

Another important attribute of lists is that you can change their values after you create them. As an example:

    grades[2] = 97.

Remember from last week that we access individual elements of a list using an index counter that starts counting from zero. Thus, the above statement changes the 92.5 in the third element [5] to be 97.

[5] Element number 2 when counting 0, 1, 2.

Basic behavior of the tuple data type

There is a similar kind of list, called a tuple, that does not allow you to change its values after you create it. You create tuples the same way you create lists, but using () instead of [], as in:

    grades = (98., 87., 92.5, 89., 92.)

Tuples are accessed identically to lists, but are unchangeable. [6] In this way,

    grades[2]

now returns the value 92.5, but

    grades[2] = 97.

throws an error.

[6] The word we use for unchangeable in object-oriented programming is immutable. We say in Python that tuple objects are immutable.

We can also change lists into tuples and tuples into lists:

    grades = [98., 87., 92.5, 89., 92.]  # This is a list
    exams = tuple(grades)                # This is a tuple
    scores = list(exams)                 # This is a list
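The tuple behavior described above can be checked with a short sketch. It uses the grades values from the lecture; the try/except wrapper is an addition made here so that the script keeps running after the failed assignment.

```python
grades = (98., 87., 92.5, 89., 92.)   # a tuple, so its values cannot be changed

try:
    grades[2] = 97.                   # item assignment is not allowed for a tuple
except TypeError as err:
    print('TypeError:', err)

scores = list(grades)                 # convert to a list to get a changeable copy
scores[2] = 97.                       # item assignment works for a list
print(scores)
print(grades)                         # the original tuple still holds 92.5
```

Converting to a list first is one way to edit data that arrived as a tuple; the original tuple is left untouched.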
Basic behavior of the dict data type

The last way we can store data using built-in data types is in dictionaries. Lists and tuples are both indexed by a counter that starts at zero, as in

    grades[2]

In a dictionary, we get to choose what to call the index. We create dictionaries using {}. So, if we want our index to be a person's name and the data stored in the dictionary to be their age, we could write:

    ages = {'John': 47, 'Kathy': 36, 'Elsa': 27, 'David': 19}

Then, to find out Kathy's age, we could write

    ages['Kathy']

and the value 36 is returned. We call each index to the dictionary a key, and if you try to use a key that does not yet exist, you get an error:

    ages['Bill']
    KeyError: 'Bill'

An alternative way to add data to a dictionary is to index a new key name and use the assignment operator. Hence, [7]

    ages['Bill'] = 29

is a valid way to add the data for Bill to the variable ages.

[7] Notice that even though we use [] on the right-hand side of an assignment statement to create lists, () to create tuples, and {} to create dictionaries, we always use [] on the left-hand side of an assignment to surround the index or key name.

Basic behavior of the array data type in numpy

The data types we have seen so far have parallels in lower-level programming languages, like C and Fortran. To do mathematics with them, we generally have to do everything by hand. [8] A common example is in converting sensor data to meaningful measurements. Temperature sensors are generally linear over a certain range, where the temperature can be computed from

$$T = aV + b \tag{3}$$

where a and b are calibration constants and V is the voltage read by the sensor. If you have an array of voltage data, you need to compute

$$T_i = a V_i + b \tag{4}$$

for each element i to find all the temperatures.

[8] In other words, we write operations that will be done on each element of the array, and we embed these operations in an iteration structure, such as a for loop.
You cannot multiply a list by a constant to compute an equation with that list; instead, like strings, multiplication for lists is defined only for integers and results in the list being repeated that many times.
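A minimal sketch of this behavior, using two of the voltage readings and the calibration constant a from this section. The names repeated and failed are introduced here only for the illustration.

```python
V = [3.57, 3.46]    # two voltage readings stored in a plain list

# Multiplying a list by an integer repeats the list; it does not scale the values.
repeated = V * 2
print(repeated)     # [3.57, 3.46, 3.57, 3.46]

# Multiplying a list by a float is not defined, so Python raises a TypeError.
a = 15.27
try:
    T = a * V
    failed = False
except TypeError as err:
    failed = True
    print('TypeError:', err)
```

Neither result is the element-by-element product that equation (4) calls for.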
You can solve the above problem with lists using a for loop:

    # Let V contain the voltage measured by an analog temperature probe
    V = [3.57, 3.46, 3.67, 3.44, 3.55]
    T = []  # Initialize empty list to hold computed temperatures

    # Compute the temperature in deg F from the calibration curve given by
    # T = 15.27 * V + 28.6, where V is the voltage reading of the sensor
    a = 15.27
    b = 28.6
    for i in range(len(V)):
        T.append(a * V[i] + b)

    # Print the results
    print('The measured temperatures in deg F are:')
    print(T)

This works fine, [9] but it takes a lot of lines of code, and it is normally not as fast to execute iterations of a for loop in Python as it would be in C or Fortran. Instead, it would be nice to replace the for loop with the single command

    T = a * V + b

and have that create a new array T that contains the desired result. We can do this with numpy arrays.

[9] When we run this code, the output is:
    The measured temperatures in deg F are:
    [83.1139, 81.4342, 84.6409, 81.1288, 82.8085]

As the name implies, the numpy package provides numerical data types to Python. The most basic numpy data type is an array. By convention, we import the numpy package using the following command:

    import numpy as np

This allows us to use functions in numpy through calls of the form np.<fun>(). The simplest way to create a numpy array is to send a list as the input to the np.array() function:

    V = np.array([3.57, 3.46, 3.67, 3.44, 3.55])
    print(V)
    [3.57 3.46 3.67 3.44 3.55]

This looks like a regular list, but when we print the array the commas are missing, and, importantly, we can use it to do math without a for loop. Our program from above using numpy now becomes:

    import numpy as np

    # Let V contain the voltage measured by an analog temperature probe
    V = np.array([3.57, 3.46, 3.67, 3.44, 3.55])

    # Compute the temperature in deg F from the calibration curve given by
    # T = 15.27 * V + 28.6, where V is the voltage reading of the sensor
    a = 15.27
    b = 28.6
    T = a * V + b

    # Print the results
    print('The measured temperatures in deg F are:')
    print(T)

This program has the same output as before, but without the commas when displaying the printed T data. Numpy arrays also have built-in methods to compute statistics. Hence, we could continue our code above to print out the mean and variance of the temperature as [10]

    print('The mean temperature is ' + str(T.mean()) + ' (deg F)')
    print('and the temperature variance is ' + str(T.var()) + ' (deg F)^2')

[10] Try it in Python.

Clearly, the numerical power of numpy to operate on our data is great, and we will want to take advantage of numpy to manipulate numerical data as much as possible.

A Closer Look at the list data type

The previous section introduced four different ways to store array-type data. [11] In this section, we will look at the list data type more carefully. Many of the operations we describe below can be applied to tuples and numpy arrays, except that tuples cannot be changed. [12] We can always find the length of a list using the built-in function len() as in

    print(len(grades))
    5

[11] These are using lists, tuples, dictionaries, and numpy arrays.
[12] We cannot change a tuple value in an assignment statement, nor can we append anything to a tuple. Once a tuple is created, it is unchangeable, or immutable.

When we create a list, Python allocates a block of memory on our computer, just as for any other variable. Because we may store a lot of data in a list, Python does something we have not yet seen if we create a list from an existing list. Consider these lines of code:

    my_grades = grades
    my_grades[2] = 0.
    print(grades[2])

What do you think the print function should print? We expect it to print 92.5 since that value is stored in the third element of grades (i.e., index number two). Instead, the outcome is:

    0.0

What happened?!
The line my_grades = grades does not create a new block of memory to store a new dataset. Instead, it creates a second
variable name, in this case my_grades, that points at the same block of memory as the variable name grades. So, when we change a value using the variable name my_grades, it actually changes the value for both grades and my_grades, since they both point at the same physical block of bytes in the computer's memory. Hence, we must be very careful when we use variable names that contain lists on the right-hand side of the assignment operator!

Last week, we also saw that a for loop will access each element of a list one at a time. What do you think this code will do:

    for grade in grades:
        grade = 0.
    print(grades)

After the previous example, you may be tempted to think that grades now contains a list of five zeros. Instead, the print statement prints

    [98.0, 87.0, 92.5, 89.0, 92.0]

What happened here?! The for loop assigns each element of grades, one at a time, to the variable name grade. The statement grade = 0. then points the name grade at a new value; it never writes anything back into the list, so the values in grades are not changed inside the loop. If you wanted to change all the grades to zero, you would have to write

    for i in range(len(grades)):
        grades[i] = 0.
    print(grades)

Here, grades[i] is an index to the real memory location of the ith element of grades, and the print statement does print

    [0.0, 0.0, 0.0, 0.0, 0.0]

Adding data to a list

As we saw above in the basic introduction to lists, we can add elements to a list using the .append() list method or by concatenation using +. If we want to add just one element to a list by concatenation, we add it as a list:

    grades = []                # create an empty list
    grades = grades + [98.]
    grades.append(87.)
    print(grades)
    [98.0, 87.0]

Indexing and slicing lists

We can index lists in more flexible ways than just with an integer value of the element number. For all of the following examples, assume we have created the list

    grades = [98., 87., 92.5, 89., 92.]
If we want the last element, we can use -1 to get it:

    print(grades[-1])
    92.0

We can also extract slices, or spans, of the list using the colon operator. The colon operator works as in:

    start : stop : step

where start is the element number of the first element to include in the slice, stop is the element number of the first element not to include in the slice, and step is the step size of the slice counter. Hence, we can have

    print(grades[1:3])
    [87.0, 92.5]
    print(grades[0:4])
    [98.0, 87.0, 92.5, 89.0]
    print(grades[:4])
    [98.0, 87.0, 92.5, 89.0]
    print(grades[:-4:-1])
    [92.0, 89.0, 92.5]
    print(grades[3:])
    [89.0, 92.0]

There are lots of ways to use slices! If we do not specify a step, we get the default step size, which is one. If we omit start, Python starts at zero (or at the last element when the step is negative); if we omit stop, Python sets stop to include the last data point in the direction of the step. If we use negative values, Python just counts backward from the end of the list: the -2 element of grades is 89. We can slice numpy arrays in much the same way, and the slice operation will return a numpy array.

When you have a list inside a list, you can use successive [][] indices to drill down into a list. Consider

    points = [[2, 4], [3, 5], [6, 4]]

To return the 6 in the third list above, we could use

    points[2][0]

Numpy also supports multidimensional arrays, and their indexing is the same. Try np.array(points) and see how you can slice it.

Are strings lists?

If you worked some of the challenge problems in the Lab assignments, you may have seen that we can get one character of a string using the indexing notation, for example

    name = 'Scott A. Socolofsky'
    print(name[6])
    A
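The same index and slice notation shown for lists also works on strings. The sketch below places a string slice next to a list slice for comparison; the particular slices are chosen here for illustration, and the grades list is assumed to hold the five exam scores used throughout these notes.

```python
name = 'Scott A. Socolofsky'
grades = [98., 87., 92.5, 89., 92.]

# Single elements: one character of the string, one value of the list.
print(name[6])        # A
print(grades[-1])     # 92.0

# Slices use the same start:stop:step rules for both data types.
print(name[:5])       # Scott
print(name[-10:])     # Socolofsky
print(grades[1:3])    # [87.0, 92.5]

# A slice of a string is a string; a slice of a list is a list.
print(type(name[:5]).__name__)
print(type(grades[1:3]).__name__)
```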
Strings act like lists, but they are their own data type. They can only store characters. Hence, this command gives an error

    name = name + [7]

since [7] is a list holding an integer and only a string can be concatenated to a string. Also unlike lists, strings are immutable. As a result, this command also gives an error

    name[6] = 'a'

Hence, in this regard, one might think of strings as special kinds of tuples.

Conclusions

The fastest way to understand array-type data is to start programming with it, and we will do that this week in the Lab assignments. We will start by primarily using the list data type, which will require you to iterate through the dataset by hand to make calculations. In this way, you will gain an understanding of what is happening to your data element by element. Soon, we will avoid writing a lot of loops to do dataset mathematics whenever we can replace the operation by a numpy array operation. These will generally be faster since the numpy routines are coded in C and compiled to create the numpy package. We conclude with a very loose rule of thumb for when to use what data type:

Use a list for non-numeric datasets and for datasets that may be formatted and sorted but that will not be operated on using math formulas.

Use a tuple like a list, but select a tuple when you want to ensure that the data values in the dataset cannot be changed.

Use a dict when the keyword that is associated with each data point is something other than simply an integer subscript, for instance, when you want to associate a data value with a keyword name as in a database.

Use a numpy array whenever you have numerical data on which you want to make numerical calculations or for which you want to compute statistics. This will be the most common Python array-like data type used in engineering.
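To make the contrast between iterating by hand and using numpy concrete, here is a minimal sketch that computes the mean of the exam grades both ways. The variable names total, mean_by_loop, and mean_by_numpy are assumptions introduced for this illustration.

```python
import numpy as np

grades = [98., 87., 92.5, 89., 92.]

# By hand with a list: accumulate the sum in a for loop, then divide by the count.
total = 0.
for grade in grades:
    total = total + grade
mean_by_loop = total / len(grades)

# With numpy: convert the list to an array and call its mean method.
mean_by_numpy = np.array(grades).mean()

print(mean_by_loop)
print(mean_by_numpy)
```

Both approaches should agree to within floating-point rounding. The loop shows what happens to the data element by element, while the numpy call replaces the loop with a single operation.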