Week 09 - Files & Image Processing

Lecture 9: Files & Image Processing

learning objectives (video)

By the end of this lecture, you will be able to

read and write files in Python,
represent images as a grid of pixels,
perform transformations of images.

Today, we'll jump back into Python. In fact, we've covered most of the Python syntax you will learn in this course (except for objects...coming soon), so the second half of the semester will feel more like "applications." This means you'll practice a lot with the techniques we have covered so far.

Remember lab 6 and homework 6 in which I provided a variable called mystery or lyrics, which stored the lyrics to "Imagine" or "Here Comes the Sun"? It's inconvenient to manually define a big string variable like that at the beginning of your program. It would be much better if we could just keep the lyrics in a separate file and then load them when we need them. We'll first get into some file I/O (i.e. input & output) which will allow you to read and write your data to files. Then we'll look at images, which are stored in special kinds of files (like JPG, PNG, etc.).

A picture of Ada Lovelace (the first computer programmer): hover the mouse over the image to zoom in and see the pixels.

files (video)

Whether we are reading or writing a file, we need a way to open files. This is done with a built-in function in Python, called the open function. The open function takes in two parameters (both are strings):

filename: the name of the file we would like to open. You can either specify the full path (i.e. the full location of the file on your computer) or a relative path, which is interpreted relative to the folder your program is run from (usually the folder where your current Python script lives). An example of a full path might be /Users/mwazowsk/Documents/cs145/homeworks/homework6/lyrics.txt on OS X, or C:\mwazowsk\Documents\cs145\homeworks\homework6\lyrics.txt on Windows. Assuming we are writing a Python script in the same "homework6" folder, then a relative path would simply be lyrics.txt (this is more common).
mode: specifies how we will communicate with the file. In other words, are we reading contents from a file or are we writing to it? Furthermore, in what format should the data be treated? There are two options for the format: we either work with data in text format (by specifying a 't' in the mode string) or in binary format (by specifying a 'b' in the mode string). We will only work with the text format in our course. You can leave out the 't' because it is the default value. To specify the direction of communication, use an 'r' for reading from the file and a 'w' for writing to it. Be careful: opening an existing file with 'w' erases its previous contents.

For example, to read a file called 'lyrics.txt' you would use open('lyrics.txt','r'). To write a file called 'frequency.txt', you would use open('frequency.txt','w'). Now, what does the open function give us in return? It gives us a file object, specifically an _io.TextIOWrapper object (a built-in type). Don't worry too much about this: just think of it as a file object. We need this file object to either read from, or write to, the file.

A file object is iterable, just like strings, lists, dictionaries and ranges are iterable. Remember, in the case of strings, the iteration variable is a character; with lists, the iteration variable is an item; with dictionaries, the iteration variable is a key; with range, the iteration variable is a number. With a file object, the iteration variable is a single line in the file. So typing for L in file: will iterate through all lines in file. On each iteration, L receives the next line, which is a string (including the newline character at the end of the line). Have a look at the trinket below which reads in the lyrics to "Here Comes The Sun", and then writes word frequency data to a new file called frequency.txt.
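To make the line-by-line iteration concrete, here is a minimal sketch. The file name sample_lyrics.txt and its two lines are made up for illustration, and the file is created in a temporary folder first only so that the example runs on its own.

```python
import os
import tempfile

# Create a small sample file so this sketch is self-contained.
# (The file name and its contents are made up for illustration.)
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'sample_lyrics.txt')
file = open(path, 'w')
file.write('little darling\nhere comes the sun\n')
file.close()

# Open the file for reading in text mode ('r' is the same as 'rt').
file = open(path, 'r')
lines = []
for L in file:
    # L is a string holding one line, including the '\n' at its end,
    # so we strip the surrounding whitespace before saving it.
    lines.append(L.strip())
file.close()

print(lines)
```

Each pass through the loop hands us one line of the file as a string, so all of the string techniques from earlier in the course (such as strip and split) apply directly to L.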

When you're done with your file, you should close it! This can be done with the command file.close() (assuming file is the file object you received from the open function). To write contents to a file, we can use the write function defined for file objects. This is very similar to the print function we have been using to print information to the console. However, there are a few subtle differences. First, we can only pass one parameter (a string) to the write function (in contrast to how you can pass in multiple things to print separated by commas). This means you will need to create a single string (via concatenation) to write a line to a file. Second, whereas print implicitly creates a new line after printing, you need to specify that you want a new line when using write. Remember the '\n' (newline) characters in homework 6? You can use these to create a new line in the file you are writing to.
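The differences between write and print can be seen in a short sketch like the one below. The list of words is made up, and the output goes to a temporary folder only to keep the example self-contained; in your own program you would likely just use a relative file name such as 'frequency.txt'.

```python
import os
import tempfile

# A made-up list of words standing in for the words of a song.
words = ['sun', 'here', 'sun', 'comes', 'sun', 'here']

# Count how many times each word appears.
frequency = {}
for word in words:
    if word in frequency:
        frequency[word] += 1
    else:
        frequency[word] = 1

# Write to a temporary folder so the sketch does not clutter your files.
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'frequency.txt')

file = open(path, 'w')
for word in frequency:
    # write accepts exactly one string, so we build it by concatenation:
    # the count must be converted with str, and '\n' ends the line.
    file.write(word + ' ' + str(frequency[word]) + '\n')
file.close()

# Read the whole file back as one string to check what was written.
file = open(path, 'r')
contents = file.read()
file.close()
print(contents)
```

Leaving out the '\n' would place every word and count on a single long line, which is the most common surprise when switching from print to write.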

images (video)

Now for a fun application of the things we have learned so far! We will do some image processing, which will give us some more practice with iteration. Specifically, we will now have nested loops to iterate along both the width and height of an image.

An image is a rectangular grid of pixels (short for "picture element"). The number of pixels in the horizontal direction is the width of the image, whereas the number of pixels in the vertical direction is called the height of the image. At each pixel in an image, we store a color. In our course, we will represent color as an RGB triplet. That is, we specify "how much RED do we have at the pixel?", "how much GREEN do we have?" and "how much BLUE do we have?" Each of these R-G-B values will be an integer between 0 and 255 (i.e. each value fits in 1 byte, which can represent $2^8 = 256$ different values). The higher the value (closer to 255), the more of that specific color channel we will have. For example, a red value of 255 means we have a lot of red at the pixel, but a red value of 0 means we don't have any red at that pixel. These RGB triplets will be stored as 3-element tuples. In some cases, however, such as with greyscale images, we might only want one channel, so we only store a single value (not a 3-item tuple).
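As a small illustration of RGB triplets stored as tuples, consider the sketch below. The helper to_grey is hypothetical, and averaging the three channels is just one simple way (an assumption here, not necessarily what we will use in the course) to reduce a color to a single greyscale value.

```python
# Colors as (red, green, blue) tuples, each value between 0 and 255.
red = (255, 0, 0)
green = (0, 255, 0)
blue = (0, 0, 255)
white = (255, 255, 255)
black = (0, 0, 0)
orange = (255, 165, 0)

def to_grey(color):
    # Hypothetical helper: average the three channels to get one value.
    # Integer division keeps the result a whole number between 0 and 255.
    r, g, b = color
    return (r + g + b) // 3

print(to_grey(white))   # 255: as bright as possible
print(to_grey(black))   # 0: as dark as possible
print(to_grey(orange))  # 140: (255 + 165 + 0) // 3
```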

storing and manipulating images

You might be used to a coordinate system in which the origin is at the bottom-left corner. Images use a different convention: the origin of the image is at the top-left corner. This means that looping through the pixels along the width of the image proceeds from left to right, as you would expect, but looping through the pixels along the height starts at the top and proceeds downwards. Have a look at the smiley image below - the image on the right is a zoomed-in version of the orange rectangle in the left image. This diagram shows that the top-left corner has "pixel coordinates" of $(0,0)$. In this example, we are using a single channel to represent the color at each pixel, meaning it is a greyscale image.

If an image has a width of w and a height of h, what are the row and column indices of the bottom-right pixel?

solution

The bottom-right pixel has indices of $(h-1,w-1)$. The $-1$ is because the first pixel (at the top-left) has coordinates of $(0,0)$. This will be convenient when we loop through the rows and columns of the image in our applications (think about range when we talked about loops).
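The indexing convention can be checked with nested loops and range. In this sketch, a list of lists stands in for a greyscale image (an assumption for illustration; our actual images will use a different type), and the sizes w = 4 and h = 3 are arbitrary.

```python
w = 4  # width: number of columns
h = 3  # height: number of rows

# Build an h-by-w grid of black (0) greyscale pixels, one row at a time.
grid = []
for r in range(h):
    row = []
    for c in range(w):
        row.append(0)
    grid.append(row)

# Visit every pixel: the outer loop moves down the rows (top to bottom),
# and the inner loop moves across the columns (left to right).
visited = []
for r in range(h):
    for c in range(w):
        visited.append((r, c))

# Color the bottom-right pixel white using its (h-1, w-1) indices.
grid[h - 1][w - 1] = 255

print(visited[0])   # (0, 0): the top-left pixel
print(visited[-1])  # (2, 3): the bottom-right pixel, i.e. (h-1, w-1)
```

Because range(h) stops at h - 1 and range(w) stops at w - 1, the loops never step outside the image.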

MiddImage (video)

We will do all of our image processing applications using an in-house middimage module, written by Prof. Andrews. This module provides an interface to a few functions for manipulating images:

open(filename): img = middimage.open(filename) opens an image stored in a file called filename. This returns a MiddImage object (we'll talk in more detail about objects next week).
save(filename): img.save(filename) saves a MiddImage object to a file called filename.
new(width,height,channels): creates a new MiddImage object. You can provide width and height parameters to initialize a grid of width x height pixels. You can also provide a channels parameter to specify how many channels (1 for grayscale, 3 for RGB) your image will have at each pixel.
properties of images: retrieve the width/height of a MiddImage object (perhaps saved in a variable called img), using img.width or img.height. To retrieve the number of channels, you can use img.channels. In general, you will only have 1 (greyscale) or 3 (RGB) channels.
pixel access: individual pixels can be accessed using square brackets (similar to lists). The difference, however, is that we should provide two indices to access a particular pixel - in fact, these two indices form a tuple (with two elements). The first one is the row index, and the second one is the column index. For example, to access the color at a row r and a column c for some MiddImage object img we would type img[r,c]. This can be used to retrieve and assign the color at a pixel (i.e. MiddImage objects are mutable).
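The two-index notation may look unusual, but it is ordinary tuple indexing: writing r,c inside the square brackets builds the tuple (r,c). The sketch below demonstrates this with a plain dictionary as a stand-in. It does not use middimage at all and is not how MiddImage works internally; it only shows how the img[r,c] syntax is read by Python.

```python
width = 3
height = 2

# Stand-in "image": a dictionary whose keys are (row, column) tuples.
img = {}
for r in range(height):
    for c in range(width):
        img[r, c] = (0, 0, 0)  # exactly the same as img[(r, c)]

# Assign a color at row 1, column 2, then retrieve it again.
img[1, 2] = (255, 0, 0)
color = img[1, 2]

print(color)                 # the red we just assigned
print(img[(1, 2)] == color)  # True: both forms use the same tuple key
```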

As with modules we have been using so far (such as math, turtle, random), we need to import middimage before using it. However, since middimage is not a built-in module in Python, you will need to include it in the folder you are working in (I will provide the middimage module in your repls). Some of these operations will make more sense with some examples, so let's get into some image processing! Here are some questions to ask yourself as you work on an image processing problem:

what is the size of the original image? what is the size of the resulting image?
how many channels (either 1 or 3) will be in the resulting image?
how should you loop through the rows/columns to process pixels in the original image?
how does an iteration through the pixel rows/columns in the original image map to a row/column in the resulting image?
how should we compute the color for the pixels in the resulting image?

example 1: upside down (video)

Let's read an image and flip it upside down. In response to the questions above, the size and number of channels in the resulting image will be the same as the original image. We will loop through all pixels in the width and height of the original image. The map from the original image to the resulting image is that the horizontal pixel locations are the same, but the vertical pixel locations become height - i - 1 where i is the current pixel row being processed. The -1 is there so we don't go out of bounds: when i is 0 (the top row), the destination row is height - 1, which is the last valid row index. The resulting pixel colors (after the map) will be the same as the original pixel colors.
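The row mapping can be sketched without any image files by letting a list of lists stand in for the image. This is an assumption made so the example runs anywhere; with middimage you would read and assign pixels with img[r,c] and get the size from img.width and img.height. The function name flip_upside_down is made up for this sketch.

```python
def flip_upside_down(pixels):
    # pixels is a list of rows, and each row is a list of pixel values.
    height = len(pixels)
    width = len(pixels[0])

    # Create a result of the same size, filled with placeholder zeros.
    result = []
    for i in range(height):
        result.append([0] * width)

    # Row i of the original lands in row height - i - 1 of the result;
    # the column j stays the same.
    for i in range(height):
        for j in range(width):
            result[height - i - 1][j] = pixels[i][j]
    return result

original = [[1, 2, 3],
            [4, 5, 6]]
flipped = flip_upside_down(original)
print(flipped)  # [[4, 5, 6], [1, 2, 3]]
```

Writing into a separate result (instead of overwriting the original) matters here: flipping in place would overwrite rows that have not been copied yet.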

example 2: abstract art (video)

Let's do another example. This time, we will do two things: (1) we will swap the left and right portions of the image (down the vertical midline) and (2) we will only have blue components of the color in the left portion while darkening all components of the color in the right side. Please note that this is an artificial example and not something you would generally be doing (unless you're creating abstract art?), however, it will give you all the tools you need to work on the lab and homework this week.

Again, the resulting image is going to have the same width, height and number of channels as the original image. Also, we will loop through all pixels in the vertical direction (height number of pixels), however, we only need to loop through half the pixels in the horizontal direction. We loop through half the width because we can make two assignments for the left and right portions of the resulting image. We first retrieve the color of the pixel on the right side of the original image and store this in the variable c. To assign the new pixel color on the left side, we just zero out the red and green components (a 0 in the first two items in the tuple). We then retrieve the appropriate pixel color to assign in the right and darken it by multiplying the components by some factors. I picked attenuation factors of $0.8$, $0.5$ and $0.2$ on the red, green and blue components, respectively, but these could have been anything (or maybe something specific to your application).
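Here is a rough sketch of the steps just described, again with a list of lists of RGB tuples standing in for the image so that it runs without middimage. The function name abstract_art is made up, the width is assumed to be even, and the scaled values are converted back to whole numbers with int, which is one reasonable choice among several.

```python
def abstract_art(pixels):
    # pixels is a list of rows; each row is a list of (r, g, b) tuples.
    height = len(pixels)
    width = len(pixels[0])
    half = width // 2  # assumes an even width

    # Create a result of the same size, filled with black pixels.
    result = []
    for i in range(height):
        result.append([(0, 0, 0)] * width)

    for i in range(height):
        for j in range(half):  # only half the columns are needed
            # Left side of the result: take the color from the right side
            # of the original and keep only its blue component.
            c = pixels[i][j + half]
            result[i][j] = (0, 0, c[2])

            # Right side of the result: take the color from the left side
            # of the original and darken each component.
            c = pixels[i][j]
            result[i][j + half] = (int(0.8 * c[0]),
                                   int(0.5 * c[1]),
                                   int(0.2 * c[2]))
    return result

original = [[(100, 100, 100), (200, 200, 200)]]
art = abstract_art(original)
print(art)  # [[(0, 0, 200), (80, 50, 20)]]
```

Each pass through the inner loop makes two assignments, one on each side of the midline, which is why looping over half the width is enough to fill the whole result.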