Lecture 9: Files & Image Processing
Today, we'll jump back into
Remember lab 6 and homework 6 in which I provided a variable called |
A picture of Ada Lovelace (the first computer programmer).
Whether we are reading or writing a file, we need a way to open files.
This is done with a built-in Python function called open.
The open function takes two parameters (both are strings):
- filename: the name of the file we would like to open. You can either specify the full path (i.e. the full location of the file on your computer) or a relative path. A relative path is interpreted relative to the current working directory, which is typically the folder where your current Python script lives. An example of a full path might be /Users/mwazowsk/Documents/cs145/homeworks/homework6/lyrics.txt on OS X, or C:\mwazowsk\Documents\cs145\homeworks\homework6\lyrics.txt on Windows. Assuming we are writing a Python script in the same "homework6" folder, a relative path would simply be lyrics.txt (this is more common).
- mode: specifies how we will communicate with the file. In other words, are we reading contents from a file or are we writing to it? Furthermore, how should we read data from the file? There are two options for the "how": we either read data in text format (by specifying a 't' in the mode string) or in binary format (by specifying a 'b' in the mode string). We will only work with the text format in our course. You can leave out the 't' because it is the default value. To specify the direction of communication, use 'r' for reading and 'w' for writing to the file. Note that opening an existing file with 'w' erases its previous contents.
For example, to read a file called 'lyrics.txt' you would use open('lyrics.txt','r').
To write a file called 'frequency.txt', you would use open('frequency.txt','w').
Now, what does the open function give us in return?
It gives us a file object, specifically an _io.TextIOWrapper object (a built-in type).
Don't worry too much about this: just think of it as a file object.
We need this file object to either read from, or write to the file.
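Here is a minimal sketch of the two modes in action. The temporary folder and the one-line file contents are assumptions made so the example can run anywhere; in our course you would simply use a relative name such as lyrics.txt. It also uses write and close, which are described further below.

```python
import os
import tempfile

# Assumption for this sketch: work in a temporary folder so no real files are touched.
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'lyrics.txt')

# Mode 'w': open for writing (text format, since 't' is the default).
outfile = open(path, 'w')
outfile.write('Here comes the sun\n')
outfile.close()

# Mode 'r': open the same file for reading ('rt' would mean the same thing).
infile = open(path, 'r')
kind = type(infile).__name__      # name of the built-in file object type
first_line = infile.readline()    # read one line, including its '\n'
infile.close()

print(kind)
print(first_line)
```

Printing kind is just a way to peek at the type of object that open returns; you will rarely need to do this in practice.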
A file object is iterable, just like strings, lists, dictionaries and ranges are iterable.
Remember, in the case of strings, the iteration variable is a character; with lists, the iteration variable is an item; with dictionaries, the iteration variable is a key; with range, the iteration variable is a number.
With a file object, the iteration variable is a single line in the file.
So typing for L in file: will iterate through all lines in file.
On each iteration, L receives the next line, which is a string.
Have a look at the trinket below which reads in the lyrics to "Here Comes The Sun", and then writes word frequency data to a new file called frequency.txt.
When you're done with your file, you should close it!
This can be done with the command file.close() (assuming file is the file object you received from the open function).
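To see the line-by-line iteration and the closing step together, here is a small sketch. The three-line file it creates is an assumption so that the example is self-contained; the loop itself is the for L in file: pattern described above.

```python
import os
import tempfile

# Assumption for this sketch: create a small three-line file to read back.
path = os.path.join(tempfile.mkdtemp(), 'lyrics.txt')
file = open(path, 'w')
file.write('Here comes the sun\nHere comes the sun\nAnd I say\n')
file.close()

file = open(path, 'r')
line_count = 0
words_per_line = []
for L in file:                  # L is one line of the file, as a string
    line_count += 1
    words = L.split()           # split() also discards the trailing '\n'
    words_per_line.append(len(words))
file.close()                    # close the file when you are done with it

print(line_count, words_per_line)
```

Keep in mind that each line L still ends with its newline character, which is why splitting (or stripping) the line is usually the first thing we do with it.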
To write contents to a file, we can use the write function defined for file objects.
This is very similar to the print function we have been using to print information to the console.
However, there are a few subtle differences.
First, we can only pass one parameter (a string) to the write function (in contrast to how you can pass in multiple things to print separated by commas).
This means you will need to create a single string (via concatenation) to write a line to a file.
Second, whereas print implicitly creates a new line after printing, you need to specify that you want a new line when using write.
Remember the '\n' (newline) characters in homework 6?
You can use these to create a new line in the file you are writing to.
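Putting reading and writing together, the following is one possible sketch of a word frequency program. It is not necessarily how the course example is written: the two-line lyrics file, the temporary folder and the variable names are all assumptions for illustration.

```python
import os
import tempfile

# Assumption for this sketch: a temporary folder and a tiny two-line lyrics file.
folder = tempfile.mkdtemp()
lyrics_path = os.path.join(folder, 'lyrics.txt')
freq_path = os.path.join(folder, 'frequency.txt')

file = open(lyrics_path, 'w')
file.write('here comes the sun\nhere comes the sun\n')
file.close()

# Count how often each word appears, one line at a time.
counts = {}
file = open(lyrics_path, 'r')
for L in file:
    for word in L.split():
        counts[word] = counts.get(word, 0) + 1
file.close()

# write takes a single string, so we build each line by concatenation
# and add the '\n' ourselves (print would have done that for us).
out = open(freq_path, 'w')
for word in counts:
    out.write(word + ' ' + str(counts[word]) + '\n')
out.close()

# Read the new file back to check what was written.
check = open(freq_path, 'r')
result = check.read()
check.close()
print(result)
```

Notice that str(counts[word]) is needed because write only accepts strings, whereas print would happily accept the number directly.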
Now for a fun application of the things we have learned so far! We will do some image processing, which will give us some more practice with iteration. Specifically, we will now have nested loops to iterate along both the width and height of an image.
An image is a rectangular grid of pixels (short for "picture element"). The number of pixels in the horizontal direction is the width of the image, whereas the number of pixels in the vertical direction is called the height of the image. At each pixel in an image, we store a color. In our course, we will represent color as an RGB triplet. That is, we specify "how much RED do we have at the pixel?", "how much GREEN do we have?" and "how much BLUE do we have?" Each of these R-G-B values will be an integer between 0 and 255 (i.e. 1 byte, which can take $2^8 = 256$ different values). The higher the value (closer to 255), the more of that specific color channel we will have. For example, a red value of 255 means we have a lot of red at the pixel, but a red value of 0 means we don't have any red at that pixel. These RGB triplets will be stored as 3-element tuples. In some cases, however, such as with greyscale images, we might only want one channel, so we only store a single value (not a 3-item tuple).
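As a quick illustration of this representation, here are a few colors written as plain Python tuples. The specific colors and variable names are just examples chosen for this sketch.

```python
# Each channel is stored in 1 byte, so it can take 2**8 = 256 values: 0 through 255.
num_values = 2 ** 8
max_value = num_values - 1

# RGB triplets are 3-element tuples in the order (red, green, blue).
red = (255, 0, 0)
white = (255, 255, 255)
black = (0, 0, 0)
purple = (128, 0, 128)

# Tuples can be unpacked or indexed to get at the individual channels.
r, g, b = purple
blue_of_white = white[2]

# A greyscale pixel needs only a single value instead of a tuple.
grey = 128

print(max_value, r, g, b, blue_of_white, grey)
```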
storing and manipulating images
You might be used to a coordinate system in which the origin is at the bottom-left corner. Images use a different convention: the origin of the image is at the top-left corner. This means that looping through the pixels along the width of the image proceeds from left to right as you would expect, whereas looping through the pixels along the height starts at the top and proceeds downwards. Have a look at the smiley image below - the image on the right is a zoomed-in version of the orange rectangle in the left image. This diagram shows that the top-left corner has "pixel coordinates" of $(0,0)$. In this example, we are using a single channel to represent the color at each pixel, meaning it is a greyscale image.
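If an image has a width of w and a height of h, what are the row and column indices of the bottom-right pixel?
The bottom-right pixel is at row h - 1 and column w - 1, because indices start at 0 (similar to what we saw with range when we talked about loops).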
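The coordinate convention can be tried out without any image library. The sketch below uses a nested list as a stand-in for a small greyscale image; this is an assumption for illustration only and says nothing about how an image module stores its pixels. Each "pixel" value encodes its own row and column so we can check where we are.

```python
# Assumed stand-in: a greyscale image stored as a list of rows.
w = 4   # width: number of columns
h = 3   # height: number of rows

image = []
for r in range(h):              # rows go from the top (0) down to h - 1
    row = []
    for c in range(w):          # columns go from the left (0) to w - 1
        row.append(10 * r + c)  # e.g. row 2, column 3 stores the value 23
    image.append(row)

top_left = image[0][0]
bottom_right = image[h - 1][w - 1]

print(top_left, bottom_right)
```

Trying image[h][w] instead would raise an IndexError, for the same reason that range(h) stops at h - 1.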
We will do all of our image processing applications using an in-house middimage module, written by Prof. Andrews.
This module provides an interface to a few functions for manipulating images:
- open(filename): img = middimage.open(filename) opens an image stored in a file called filename. This returns a MiddImage object (we'll talk in more detail about objects next week).
- save(filename): img.save(filename) saves a MiddImage object to a file called filename.
- new(width,height,channels): creates a new MiddImage object. You can provide width and height parameters to initialize a grid of width x height pixels. You can also provide a channels parameter to specify how many channels (1 for greyscale, 3 for RGB) your image will have at each pixel.
- properties of images: retrieve the width/height of a MiddImage object (perhaps saved in a variable called img), using img.width or img.height. To retrieve the number of channels, you can use img.channels. In general, you will only have 1 (greyscale) or 3 (RGB) channels.
- pixel access: individual pixels can be accessed using square brackets (similar to lists). The difference, however, is that we should provide two indices to access a particular pixel - in fact, these two indices form a tuple (with two elements). The first one is the row index, and the second one is the column index. For example, to access the color at a row r and a column c for some MiddImage object img we would type img[r,c]. This can be used to retrieve and assign the color at a pixel (i.e. MiddImage objects are mutable).
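If you are curious how square brackets with two indices can work at all, here is a small sketch. TinyImage is a hypothetical class written only for this illustration; it is not the middimage module and makes no claim about how MiddImage is implemented. The class syntax relies on objects, which we cover later, so focus on the last few lines.

```python
class TinyImage:
    """Hypothetical stand-in for illustration only; this is NOT middimage."""

    def __init__(self, width, height, channels):
        self.width = width
        self.height = height
        self.channels = channels
        # Start with every pixel black: 0 for 1 channel, (0, 0, 0) for 3 channels.
        black = 0 if channels == 1 else (0, 0, 0)
        self.pixels = [[black for c in range(width)] for r in range(height)]

    def __getitem__(self, index):
        r, c = index                # img[r, c] passes the tuple (r, c) as index
        return self.pixels[r][c]

    def __setitem__(self, index, color):
        r, c = index
        self.pixels[r][c] = color


img = TinyImage(4, 3, 3)            # 4 pixels wide, 3 pixels high, RGB
before = img[0, 0]                  # retrieve the color at row 0, column 0
img[2, 3] = (255, 0, 0)             # assign red to the bottom-right pixel
after = img[2, 3]
same = img[(2, 3)]                  # identical: the two indices really are a tuple

print(before, after, same)
```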
As with modules we have been using so far (such as math, turtle, random), we need to import middimage before using it.
However, since middimage is not a built-in module in Python, you will need to include it in the folder you are working in (I will provide the middimage module in your repls).
Some of these operations will make more sense with some examples, so let's get into some image processing!
Here are some questions to ask yourself as you work on an image processing problem:
- what is the size of the original image? what is the size of the resulting image?
- how many channels (either 1 or 3) will be in the resulting image?
- how should you loop through the rows/columns to process pixels in the original image?
- how does an iteration through the pixel rows/columns in the original image map to a row/column in the resulting image?
- how should we compute the color for the pixels in the resulting image?
Let's read an image and flip it upside down.
In response to the questions above, the size and number of channels in the resulting image will be the same as the original image.
We will loop through all pixels in the width and height of the original image.
The map from the original image to the resulting image is that the horizontal pixel locations are the same, but the vertical pixel locations become height - i - 1 where i is the current pixel row being processed.
The -1 is there so we don't go out of bounds: the top row (i = 0) maps to row height - 1, which is the last valid row.
The resulting pixel colors (after the map) will be the same as the original pixel colors.
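The mapping can be sketched with a nested list standing in for the image. This stand-in, the tiny 3 x 2 example image and the variable names are assumptions for illustration; with an actual image object you would read and assign pixels through its own interface instead.

```python
# Assumed stand-in: a 3-row, 2-column RGB image as a list of rows of tuples.
original = [
    [(255, 0, 0), (255, 0, 0)],   # top row: red
    [(0, 255, 0), (0, 255, 0)],   # middle row: green
    [(0, 0, 255), (0, 0, 255)],   # bottom row: blue
]
height = len(original)
width = len(original[0])

# The result has the same size as the original; start with all-black pixels.
flipped = []
for i in range(height):
    flipped.append([(0, 0, 0)] * width)

# Row i of the original lands in row height - i - 1 of the result;
# the column j stays the same, and so does the color.
for i in range(height):
    for j in range(width):
        flipped[height - i - 1][j] = original[i][j]

print(flipped[0], flipped[height - 1])
```

After the loops, the blue row is on top and the red row is at the bottom, while the middle row stays where it was.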
Let's do another example. This time, we will do two things: (1) we will swap the left and right portions of the image (down the vertical midline) and (2) we will only have blue components of the color in the left portion while darkening all components of the color in the right side. Please note that this is an artificial example and not something you would generally be doing (unless you're creating abstract art?), however, it will give you all the tools you need to work on the lab and homework this week.
Again, the resulting image is going to have the same width, height and number of channels as the original image.
Also, we will loop through all pixels in the vertical direction (height number of pixels), however, we only need to loop through half the pixels in the horizontal direction.
We loop through half the width because we can make two assignments for the left and right portions of the resulting image.
We first retrieve the color of the pixel on the right side of the original image and store this in the variable c.
To assign the new pixel color on the left side, we just zero out the red and green components (a 0 in the first two items in the tuple).
We then retrieve the color of the corresponding pixel on the left side of the original image, darken it by multiplying each component by some factor, and assign the result to the right side.
I picked attenuation factors of $0.8$, $0.5$ and $0.2$ on the red, green and blue components, respectively, but these could have been anything (or maybe something specific to your application).
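Here is one way these steps could look, again using a nested list as a stand-in for the image. The stand-in, the small example image, the assumption that the width is even, and the use of int to keep the darkened components as whole numbers are all choices made for this sketch.

```python
# Assumed stand-in: a 2-row, 4-column RGB image as a list of rows of tuples.
original = [
    [(100, 100, 100), (200, 200, 200), (10, 20, 30), (50, 60, 70)],
    [(0, 0, 0), (255, 255, 255), (100, 50, 10), (40, 80, 120)],
]
height = len(original)
width = len(original[0])
half = width // 2                   # this sketch assumes an even width

# Same size as the original, starting with all-black pixels.
result = []
for i in range(height):
    result.append([(0, 0, 0)] * width)

for i in range(height):
    for j in range(half):           # only half the width: two assignments per step
        # Right side of the original -> left side of the result, blue only.
        c = original[i][j + half]
        result[i][j] = (0, 0, c[2])

        # Left side of the original -> right side of the result, darkened.
        c = original[i][j]
        result[i][j + half] = (int(0.8 * c[0]), int(0.5 * c[1]), int(0.2 * c[2]))

print(result[0])
```

In the first row, the left half of the result keeps only the blue values 30 and 70 from the original right half, and the right half holds darkened copies of the original left half.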
© Philip Claude Caplan, 2021