Lecture 13: Complexity

By the end of this lecture, you will be able to

  • define the computational complexity of algorithms in terms of the best, worst and average cases,
  • identify a divide-and-conquer algorithm for solving a problem,
  • analyze the complexity of (1) linear search and (2) binary search algorithms,
  • analyze the complexity of (1) selection sort and (2) merge sort algorithms.

So far, we've written solutions to computational problems, but we haven't studied how efficient these solutions are. As we've seen several times, there is usually more than one way to solve a problem. The main question we want to think about today is: how many computational resources will we need for a particular solution to a problem? These resources include computational time and memory. Specifically, we want to know "how long will it take for a program to run (or how much memory will we need) when the inputs to our program get really big?" This is the study of computational complexity.

There are other aspects of computational complexity that we won't learn about in detail, but I want to mention them briefly. For example, we may want to understand whether the running time of an algorithm grows as $\log(n)$, $n^2$, $\exp(n)$ or $n!$, where $n$ is the size of the input. The picture on the right is a solution to the traveling salesperson problem (TSP), which asks: "given a list of cities and distances between cities, what is the shortest possible path that visits all cities exactly once and returns to the starting city?" This is a very hard problem and, in fact, an exact brute-force solution runs in $\mathcal{O}(n!)$ time, which is not very efficient! The study of computational complexity also asks which problems can be solved with a computer at all - the halting problem is a famous example of one that cannot.



Two possible solutions to the traveling salesperson problem. (source)

big-oh notation

In order to study the complexity of an algorithm, we need a way of measuring the performance (whether that be run time or memory). Further, we want to estimate the performance of an algorithm when the inputs get really really really BIG. This is done using big-oh notation, which is represented by a "big O": $\mathcal{O}$. Big-oh notation is a useful way to characterize how a function grows without caring too much about the details of the function. We want to know: does an algorithm run in linear time (denoted by $\mathcal{O}(n)$)? or quadratic time (denoted by $\mathcal{O}(n^2)$)? or factorial time (denoted by $\mathcal{O}(n!)$)? We don't really care about whether it runs in $\mathcal{O}(2n)$ or $\mathcal{O}(3n)$ - it's still linear time, $\mathcal{O}(n)$ (we don't care about the $2$ or the $3$ because these are constant factors on the function inside the $\mathcal{O}$). Ultimately, we're interested in placing an upper bound on the performance, so that we know our algorithm will always perform no worse than this bound. When designing an algorithm, we would like the stuff inside the $\mathcal{O}$ to grow as slowly as possible. So we would prefer an algorithm that runs in $\mathcal{O}(n)$ time instead of one that runs in $\mathcal{O}(n^2)$ time; similarly, we would prefer an algorithm that runs in $\mathcal{O}(n\log n)$ instead of one that runs in $\mathcal{O}(n^2)$.
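We can see numerically why constant factors don't matter: even a generous factor like $3$ on a linear-time algorithm is quickly dwarfed by quadratic growth. A quick sketch (not part of the formal definition, just an illustration):

```python
import math

# Compare growth classes as n gets big: a constant factor (3n vs n)
# barely matters, but n^2 pulls away from n*log2(n), which pulls away from n.
for n in [10, 100, 1000]:
    print(f"n={n:>5}  3n={3*n:>6}  n*log2(n)={n * math.log2(n):>10.0f}  n^2={n**2:>8}")
```

Running this shows that by $n = 1000$, $n^2$ is already hundreds of times larger than $3n$, which is why we only keep the fastest-growing term inside the $\mathcal{O}$.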


best, worst and average case analysis

So how do we go from an algorithm to estimating the performance? Well, maybe we can write up the algorithm (into actual code) and then just run some experiments to estimate how long it takes to run for various input sizes. This isn't the most practical approach because you might end up waiting a long time for your program to run. Instead, we can look at the pseudocode to try to count the number of operations performed. In Week 8, we talked about how computers perform operations at a particular GHz, i.e. a certain number of operations per second. The total time spent on an algorithm, therefore, depends on how many operations are performed. In the examples on searching and sorting below, we will focus on counting the number of comparisons performed, using comparison operators like == or <.

When counting operations, we might want to consider how many operations are performed in the best case, worst case or average case. Analyzing the average case requires some probability theory, so we won't do that in our course. Just as it sounds, a best case analysis is one that treats the best possible case (for the inputs) to our algorithm. For example, a sorting algorithm might perform really well if the input list is already sorted. A worst case analysis involves being really pessimistic about the inputs to your algorithm (hence, the worst case). In other words, you want to assume that any comparison that might be made will be made. For example, if we want to sort a list from smallest to biggest, but the input list is sorted from biggest to smallest, then this might be the worst case for the algorithm (it depends on the algorithm).


divide-and-conquer algorithms

Before we see some specific algorithms for searching and sorting, I want to introduce the idea of a divide-and-conquer algorithm. These are algorithms that break up the current problem into smaller pieces (the divide stage), assuming that we can eventually reach some base case which we know how to solve. Then we can conquer our original problem by using the solutions to the smaller problems. If this sounds recursive-y, it's because divide-and-conquer algorithms are often implemented using recursion!


searching algorithms

In a searching algorithm, the goal is to identify whether a list contains a specific value. There are two inputs: the value we are looking for and the list. The size of the list is the "$n$" we care about when analyzing the performance of a searching algorithm.


A naive approach would be to go through every item in the list, starting from the beginning, and check, using ==, whether the item in the list matches the value we are looking for. In the best case, the item we are looking for is the first item in the list, so only 1 comparison is performed: $\mathcal{O}(1)$. In the worst case, the item we are looking for is at the end of the list (or not in the list at all), so $n$ comparisons are performed: $\mathcal{O}(n)$.
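This naive approach is usually called linear search. A minimal sketch in Python (the function name `linear_search` is my own choice):

```python
def linear_search(items, target):
    """Return the index of target in items, or -1 if it is not present.

    Best case: 1 comparison (target is the first item) -> O(1).
    Worst case: n comparisons (target is last, or absent) -> O(n).
    """
    for i, item in enumerate(items):
        if item == target:  # this == is the comparison we are counting
            return i
    return -1
```

For example, `linear_search([5, 3, 8], 8)` checks all three items before finding the target at index 2, while `linear_search([5, 3, 8], 5)` stops after one comparison.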


We can actually do much better in our searching algorithm by employing a divide-and-conquer approach. In this approach, we will assume the input list is sorted and then run our algorithm recursively on half of the input list. Because the list is sorted, checking the middle of the list tells us which half of the list we can ignore. This saves a lot of computation! In the best case, we might find the value on the very first check (the middle of the list): $\mathcal{O}(1)$. In the worst case, we will need to perform $\mathcal{O}(\log n)$ operations. Don't worry about how to derive this for now - but take CS 200 if you're interested in learning more. Ultimately, the $\log n$ comes from asking "how many times do we need to break a list in half until we get to a sublist of length 1 (the base case)?"
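This divide-and-conquer search is binary search. Here is one way it might look in Python, written recursively to match the description above (the function name and the `lo`/`hi` parameters are my own choices):

```python
def binary_search(items, target, lo=0, hi=None):
    """Search a SORTED list for target; return its index or -1 if absent.

    Each recursive call discards half of the remaining range,
    so the worst case performs O(log n) comparisons.
    """
    if hi is None:
        hi = len(items) - 1
    if lo > hi:                # base case: empty range, target not present
        return -1
    mid = (lo + hi) // 2
    if items[mid] == target:   # found it on this check
        return mid
    elif items[mid] < target:  # target can only be in the right half
        return binary_search(items, target, mid + 1, hi)
    else:                      # target can only be in the left half
        return binary_search(items, target, lo, mid - 1)
```

Note that the precondition matters: calling this on an unsorted list can silently return -1 even when the target is present, which is exactly why the sorting algorithms below are so important.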


sorting algorithms

In binary search, we made an assumption that the input list was already sorted - but how do we sort the items in a list? In sorting algorithms, our only input is a list, and the "$n$" is the length of the list (the number of items). Our goal is to sort the items in the list from smallest to largest.


A naive approach to sorting items in a list consists of imagining a scanline that moves from left to right (or top to bottom in the diagram on the right) through the items in the list. The first time this scanline passes through the list, it starts at the first item and traverses the entire list to search for the smallest item. When it finds it, this smallest item is placed at the beginning of the list (index 0). Index 0 now contains the smallest item, so it doesn't make sense for our scanline to start there - we will now start scanning at index 1 and search for the second smallest item, which is then placed at index 1 of the list. Scanning then starts at index 2 and scans for the third smallest item, and so on, iteratively looking for the smallest remaining item in the list.

Suppose we start scanning at index $j$; then the total number of comparisons (<) we need to perform for this scan is $n-j-1$ (in the worst case). How many scans do we need to do? We start at index $j = 0$ and are done at $j = n-1$, which gives $n$ total scans. To count the total number of operations in the worst case, we can imagine each scan corresponding to a row in a triangle (see the lecture video). The horizontal and vertical sides of the triangle have a length of (roughly) $n$. The area of the triangle is the total number of comparisons, so $n^2/2$ (base times height, divided by 2). But, as we said earlier, we don't care about the $/2$ (it is a constant factor), so in the worst case, this algorithm runs in $\mathcal{O}(n^2)$ time. In the best case, the input list is already sorted, but we still need to check all the items during a particular scan, so the algorithm is still $\mathcal{O}(n^2)$ in the best case. The code for selection sort is provided below.
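One way to write selection sort in Python, following the scanline description above (the function name `selection_sort` and variable names are my own choices):

```python
def selection_sort(items):
    """Sort items in place, smallest to largest, in O(n^2) time.

    The outer loop is the "scanline" starting position j; the inner
    loop scans the unsorted tail for the smallest remaining item.
    """
    n = len(items)
    for j in range(n):                        # one scan per position j
        min_index = j
        for k in range(j + 1, n):
            if items[k] < items[min_index]:   # the < comparison we count
                min_index = k
        # place the smallest remaining item at index j (the "arrow" swap)
        items[j], items[min_index] = items[min_index], items[j]
    return items
```

Note that the inner loop always runs to the end of the list, even if the input is already sorted - this is why the best case is still $\mathcal{O}(n^2)$.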


(source)

An illustration of the selection sort algorithm is shown on the right. Yellow items are already sorted. Red items show the current minimum in the scan through the remaining list. Items currently being scanned are shown in blue. When an arrow appears, the two items on either side of the arrow are swapped.

Another sorting algorithm which also runs in $\mathcal{O}(n^2)$ time is called insertion sort. The details are a bit more complicated with the advantage that the best case runtime is reduced to $\mathcal{O}(n)$ if the input list is already sorted.


Can we do better than $\mathcal{O}(n^2)$ for sorting a list of numbers? Yes, in fact, we can use a divide-and-conquer approach to recursively sort two halves of the input list (dividing our original problem into two smaller subproblems) and then merge the two sorted sublists. This example is a little more advanced, so don't worry too much about the details. All you need to know is that the run time of merge sort is $\mathcal{O}(n\log n)$. To derive this result, please take CS 200 :) However, I still want to illustrate the basic idea of merge sort:

  1. check if the input list has a length of 1 - if so, a list with 1 item is already sorted!
  2. otherwise, split the input list into two halves: a left sublist and a right sublist,
    • sort the left and right sublists recursively (i.e. by calling merge_sort on these two sublists),
    • merge the resulting sorted sublists into a sorted list of length $n$.


Illustration of the merge sort algorithm. Items in red are those that are currently being compared during the merge stage. (source).

In merge_sort, all comparisons (<) are done in the "merging" stage. It can be shown that, in the worst case, merging two sorted sublists of length $n/2$ requires $n-1$ comparisons. Since we need to divide our original list $\log(n)$ times to get to our base case, this leads to the result that merge sort requires, in the worst case, $\mathcal{O}(n\log n)$ operations to sort a list of length $n$. Again, don't worry too much about the details, but remember the result. The algorithm is written below in Python if you are curious.
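A Python sketch of merge sort, following the three steps above (the helper function `merge` is my own name for the merging stage):

```python
def merge_sort(items):
    """Return a new sorted list; O(n log n) comparisons in the worst case."""
    if len(items) <= 1:                 # base case: 0 or 1 items are sorted
        return items
    mid = len(items) // 2
    left = merge_sort(items[:mid])      # divide: sort each half recursively
    right = merge_sort(items[mid:])
    return merge(left, right)           # conquer: merge the sorted halves

def merge(left, right):
    """Merge two sorted lists into one sorted list (the merging stage)."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] < right[j]:          # all < comparisons happen here
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])             # one sublist may have leftovers
    merged.extend(right[j:])
    return merged
```

For example, `merge_sort([5, 2, 4, 1, 3])` splits down to single-item lists and then merges them back up into `[1, 2, 3, 4, 5]`.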


© Philip Claude Caplan, 2021