Lecture 6: Data Structures

By the end of this lecture, you will be able to

  • build dictionaries to store data that can be looked up with a key,
  • access values in a dictionary with a key,
  • create and access values stored in tuples,
  • modify mutable data structures, such as lists and dictionaries,
  • identify when Python creates a new box for a variable, and when it stores a reference,
  • apply the map, filter and reduce patterns to data.


We're really building up our programming skills now. After covering lists last week, you can probably start to see how large data will get stored in the computer, and how you might design a program to process a large chunk of data stored in a list. However, lists only get us so far. If we want to retrieve a specific value in a list, we need to know the index of that value in the list. But what if we don't have indices? For example, consider retrieving Boo's door in the big warehouse of doors in Monster's Inc. Maybe we store all the doors in a list of doors in which each door has a label corresponding to who the door belongs to. Now, if we want to retrieve Boo's door without knowing the index of her door, then we have to loop through every door in the list, and check if each door label matches "Boo". As you can see, if there are many doors, this could be really inefficient, especially if Boo's door is stored at the end of the list.


dictionaries

Instead of looping through all the items in a list, we can store a dictionary in which each item is associated with a key. This is in contrast to a list, in which each item is associated with an index. This means that we can directly associate someone's name (a key) with their door! Your keys can be anything, and the specific problem you are solving will lend to more natural choices for your keys. In our Monster's Inc. example, we want to find doors by someone's name (a string), so it makes sense to use strings as the keys. We can then use these keys to retrieve a value (a door). This is the central concept when using dictionaries: storing key-value pairs. Let's now see how to build dictionaries, add key-value pairs and retrieve a value from a dictionary using a key.


The most frequent operations you will do with dictionaries are: create a dictionary, access the value at a key, and add a key-value pair, and iterate through key-value pairs:

All of these operations are summarized in the trinket below. You'll notice that we can also retrieve the number of key-value pairs stored in a dictionary d using the len function (just like we did with lists and strings).

You'll notice yet another way to create a dictionary: using zip. To use zip you need to pass in two lists (one list for the keys, and another list for the values). Each item in the keys is then associated with an item in the values (essentially "zipping" up items in these two lists). Wrap the call to zip inside a call to dict and you have yourself a new dictionary!

Finally, if you want to retrieve all the keys and values of a dictionary, you can use the keys() and values() functions defined for dictionaries. For example, for a dictionary d, you would type d.keys() and d.values().


Say you're programming a cash register and you want to be able to pull up the price of a particular item given the string representing that item. This is where a dictionary will be useful! In the example below, we first create a dictionary in which the keys are the food names, and the values are the price for that food. Notice that some foods might have the same price. Given a list of grocery items (maybe coming off the conveyor belt), try to complete the getGroceryCost function to deterine the cost of all the items in the cart. Try to check if an item is not in the store!

Note that in a convential cash register, you would probably associate an integer (as the key) with a particular price (a float), since foods are usually labelled with PLUs (integers). This could also be achieved with dictionaries! Remember, the keys could be anything, and the values could be anything too.


Given a word (a string), how frequently does each character show up? Let's write a function to find out!

The solution to this problem (other solutions are possible too), is to create a dictionary that keeps track of the "count" of every character. In other words, we create a dictionary with keys for every letter we find, initialize the counter (the value for the letter key) to 1 the first time we encounter this character, and then increase this counter whenever we find the letter again.


One more example! Remember when we looked at Fibonacci numbers? The equation we had to compute the $n$-th Fibonacci number is $F(n) = F(n-1) + F(n-2)$ with $F(0) = 0$ and $F(1) = 1$. The problem we initially had was that directly applying this formula recursively meant that we recomputed a lot of Fibonacci numbers. But what if we could remember them the first time we compute them? This is a technique known as memoization. In our Fibonacci example, we will store known Fibonacci numbers in a dictionary, in which the keys are integers (corresponding to $n$) and the values are the $F(n)$. This makes for a solution that is both readable and fast!


Remember how we were able to assign a value at a particular index in a list? Something like x[2] = 'banana' is valid, assuming x is a list. Also, remember that we can assign a value for a particular key in a dictionary? So something like d['banana'] = 0.25 is valid, assuming d is a dictionary. But remember that if s = 'piper' (a string), then we can retrieve the value at index 2 with s[2], but we cannot assign the value of s[2]. Try it out (maybe in the interpreter below): Python will be angry.

The reason for this behavior is because some data structures are mutable whereas others are immutable. This is a confusing term (I actually don't like it, but that's what it's called). When we say that something is mutable, it means that it is "mutate"-able, i.e. we can change what it stores. Based on this definition, think back to all the data structures we have seen so far: numbers (ints, floats), strings, lists and dictionaries. Which ones are mutable? Which are immutable?

Lists and dictionaries are mutable. Numbers and strings are immutable.

references and values

Okay, I guess it's time for me to come clean. You know how I've been saying that variables are like little boxes where we store values? This is a really good model, but it isn't perfect (it was good enough when we were starting out). The reason it isn't perfect is because, under the hood, Python actually stores references and values. The reference is a unique identifier that is assigned to every variable. If you like, you can think about this as the "address" of this variable in your computer's memory. Python has a useful function for determining the unique identifier assigned to a variable: it's called the id function. So if you say x = 2 and then print(id(x)), you'll see some integer that is the "id" of x.

Now comes the confusing part (which breaks our model). When we write y = x, Python doesn't create a new box in memory. It actually copies the reference it had for x and assigns it to y. This means both x and y "point" to the same value. Try it out: type y = x and then print(id(y)): you should see the same "id" for x and y!

But mutability comes into play here. If we re-assign the value in y, then Python creates a new box for y. Why? Because y (and also x) are numbers, which are immutable! As soon as you type y = 5, Python says "oh, numbers are immutable, so I need a new box." Now, with mutable types, like lists and dictionaries, modifying one variable will modify the other - because they "point" to the same thing in memory! The same is true if you pass a mutable data type into a function. You can actually modify the contents of the mutable type inside the function and you will see the changes outside the function too. This is a challenging concept to grasp, so I encourage you to play with the following trinket to help solidify this concept.


Now that we know about immutability, I want to tell you about one more data structure: tuples. Some people pronounce this "tuh-ple" and some pronounce it as "too-ple". Honestly, I haven't heard anyone say one way is right and one way is wrong, so do whatever you like :) I personally pronounce it as "too-ple". Tuples aren't that special, but they're good to know about. They're basically the immutable twin of lists. In other words, they do the same thing as lists: they store ordered values. So things like access (by index), len are the same as with lists. However, whereas lists are mutable, tuples are immutable. Tuples are also delimited by parentheses () in contrast to lists which are delimited by square brackets []. The immutability thing is important: if you create a tuple x = (1,2,3), you will not be able to say x[1] = 4 - Python will be unhappy.

So what are tuples useful for anyway? They're useful if you want to return multiple values from a function. Consider the following trinket in which we want to return both the minimum and maximum value in a list.


Now that we've seen several data structures, it's important to consider the different operations (or patterns) we can perform with these data structures. There are three patterns we care about:

As an exercise, try to compute the sum of the items in the following list: x = [1,5,3,2,9,7,3]. Try using a loop - notice that you won't be able to do this (or any reduce pattern for that matter) using a list comprehension, because list comprehensions always create another list.


© Philip Claude Caplan, 2021