Big O Notation

Faster Programs

How can you make a program run faster?

  1. Buy a faster computer.
  2. Give it something simpler to do.
  3. Write a better program.

As a programmer, you'll want to choose the 3rd option - a better program.
There are many ways to do this, but usually it requires a better algorithm.

Comparing Algorithms

There are fast (efficient) and slow (less efficient) algorithms.

Consider the problem of finding a word in an alphabetical list of words,
for example 100 words.  Here are 3 possible algorithms:
 

  1. Sequential Search - Check the first word, then the second, then the third, etc.
    If the target word is found, stop searching, and return the result FOUND.
    If you reach the end of the list without finding it, return the result NOT FOUND.
     
  2. Binary Search - Start in the middle of the list.  If the target word is smaller
    than the middle word, throw away the bigger half of the list (after the middle),
    then start over in the middle of the remaining half.  If the target word is larger
    than the middle word, throw away the lower half of the list and start over again.
    If the middle word EQUALS the target word, stop and return FOUND.
    If the length of the remaining list ever gets down to 1 word, and it isn't correct,
    then quit and return NOT FOUND.
       
  3. Dictionary Search - This is the way people search in a dictionary.  Look at the
    first letter of the word, and estimate the approximate position of the word in the
    list - e.g. if it starts with F, go to position 25, about 1/4 of the way through the list.
    Look at the word in this position, and then search sequentially forward or backward,
    until the target word is found, or until the search becomes "hopeless" - e.g. if you
    are searching for "Funny", and you get to "Stupid", you have gone too far in the list
    and it is time to give up.

The Dictionary Search is very attractive, because we are familiar with the concept.  It also seems fast, because it stands a good chance of getting quite close to the target word in one step.  But is it actually the most efficient method?

Sequential Search

Big O is a theoretical measurement of the approximate number of operations required to complete an algorithm.  For the Sequential Search, this is pretty easy to calculate.  If the word is not found, it will take 100 steps, or N steps for a list of length N.  You might get lucky and find it on the first try - e.g. "aardvark".  On average, you can expect to search through half the list, so 50 operations, or N/2 for a list of length N.  We can state this as follows:

    The efficiency of a Sequential Search algorithm is O(N/2).

This would be programmed using a single loop, counting from 1 to 100 (or N).
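
Here is a minimal sketch of this loop in Python (the function and variable names are our own, not part of the original algorithm):

   def sequential_search(words, target):
       # Check the first word, then the second, then the third, etc.
       for word in words:
           if word == target:
               return "FOUND"
       # Reached the end of the list without stopping early.
       return "NOT FOUND"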

Binary Search

The binary search is a bit more confusing.  At each step, the list is cut in half, like this:

Start with   100  items
Step 1 -->  50 items left
Step 2 -->  25 items left
Step 3 -->  12 items left (the middle item is thrown out if it is not correct)
Step 4 -->   6 items left
Step 5 -->   3 items
Step 6 -->   1 item
Step 7 -->  Check the 1 item, and return FOUND or NOT FOUND

So the maximum possible number of steps is 7.  This is equal to the number of times you must divide 100 by 2 until you reach 1.  Don't worry too much about whether the last step counts (e.g. is it 6 or 7).  This can also be calculated by answering the question:  2 raised to what power is bigger than 100?  The answer is 7, since 2^7 = 128.

The name for this calculation is the logarithm base 2, written as log2(100), which is approximately 7.

The exact logarithm can be calculated as log(100) / log(2) = 6.64.  But when an algorithm executes, it won't need a fraction of a step, so we round this up to 7.

What about a very long list of words - for example 80 million names for all the people in Germany?  How many steps will this take?  Here is the calculation the long way:

80 million
40 million
20 million
10 million
 5 million
.....

Well, that won't really take so long, but it is a lot quicker with a calculator:

log2 (80,000,000) =  log(80,000,000) / log(2) = 26.25

so it will take at most 27 steps to find any name in the list of 80 million.
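
This arithmetic can be checked with a short Python calculation (a sketch using the standard math module):

   import math

   print(math.log2(80_000_000))             # about 26.25
   print(math.ceil(math.log2(80_000_000)))  # 27 whole steps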

In a list of N names, it will take log2 (N) steps, so we write:

     The efficiency of a Binary Search is  O( log2 N )

In Computer Science, logarithms usually assume base 2, so we can write simply  log N.
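
As a sketch, the Binary Search described above might be programmed like this in Python (the names are our own):

   def binary_search(words, target):
       low, high = 0, len(words) - 1
       while low <= high:
           middle = (low + high) // 2
           if words[middle] == target:
               return "FOUND"
           elif target < words[middle]:
               high = middle - 1   # throw away the bigger half
           else:
               low = middle + 1    # throw away the lower half
       return "NOT FOUND"

Each pass through the loop cuts the remaining list in half, so the loop body runs roughly log2(N) times.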
 

Dictionary Search

The dictionary search seems very promising, with the hope of "getting lucky" and starting very close to the right place.  But this is only useful in a short list, like 100 names.  Think about the 80 million German names, searching for Fritz.  If the 80 million names are evenly distributed across the alphabet (in reality they are not), then each one-letter section will contain approximately 3 million names.  If we start searching in the 6th block, where we expect the F's to be, we might need to search as many as 3 million names.  That doesn't sound so good.  But the search through the 3 million names is actually a sequential search, so it only needs to check half of them on average.  How can this be stated in Big O notation?  The section to be searched will contain approximately N/26 names, so we would state:

     The efficiency of a Dictionary Search is  O( N / 26 / 2 ) = O( N / 52 )
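
A rough Python sketch of the Dictionary Search might look like this (it assumes lowercase alphabetical words, evenly spread over the 26 letters - which, as noted above, real lists are not):

   def dictionary_search(words, target):
       n = len(words)
       # Estimate a starting position from the first letter.
       guess = (ord(target[0]) - ord("a")) * n // 26
       guess = min(max(guess, 0), n - 1)
       # Search sequentially forward or backward from there.
       step = 1 if words[guess] < target else -1
       i = guess
       while 0 <= i < n:
           if words[i] == target:
               return "FOUND"
           # Give up once we have gone past where the word should be.
           if (step == 1 and words[i] > target) or (step == -1 and words[i] < target):
               return "NOT FOUND"
           i += step
       return "NOT FOUND"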

More about Big O

The O(N/52) looks pretty good - 26 times faster than the normal Sequential Search.  But in theoretical terms, there is no difference between the Dictionary Search and the Sequential Search.  This is because Big O is not a measure of actual time required, but an attempt to see how the number of iterations changes as the problem becomes larger.

Graph 1 : Small Lists

Look at the graphs of  y = N/52, y = N/2, and y = log2 (N)

[Image]

The x-axis counts the size of the list (N).  The y-axis counts the number of operations, i.e. time.  If one graph is above another, that algorithm takes longer.  It appears that the algorithms are ranked in this order:
    Slowest:  N/2  =  Sequential 
    Middle:   Log N = Binary
    Fastest:   N/52  =  Dictionary
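
Graphs like these can be reproduced with matplotlib (a sketch, not the original figures); enlarging the range of N produces the later graphs as well:

   import numpy as np
   import matplotlib.pyplot as plt

   N = np.linspace(1, 7, 200)   # try 100, 450, or millions for the later graphs
   plt.plot(N, N / 2, label="Sequential  O(N/2)")
   plt.plot(N, np.log2(N), label="Binary  O(log N)")
   plt.plot(N, N / 52, label="Dictionary  O(N/52)")
   plt.xlabel("N (size of list)")
   plt.ylabel("operations")
   plt.legend()
   plt.show()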

Graph 2 : Bigger Lists

But this is only for a very short list of 7 items.  What happens when the list gets longer?

[Image]

With 100 names in the list, it looks bad for the Sequential Search O(N/2).  The Binary Search O(log N) has flattened out quite a bit, but it is still slower than the Dictionary Search O(N/52).

Graph 3 : BIG Lists

But things look quite different when the list gets longer.

[Image]

The graph shows that around 450 names, the Dictionary Search takes about the same time as the Binary Search.  In this graph, the y-axis has been expanded - otherwise it is very difficult to see the lower lines.

This looks like there isn't really much difference between the two good algorithms, but when the list gets REALLY long....

Graph 4 : HUGE LISTS!

[Image]

The steady GROWTH (linear growth) of the Dictionary Search means that it must eventually get slower than the Binary Search, whose growth rate becomes less and less as it flattens out.

More Theory

It is not really correct to write O(N/2).  Big O notation attempts to measure the growth rate of an algorithm's time requirements.  Since all straight lines grow steadily, one straight line will never overtake another (if they both start at zero, that is).  But all straight lines will eventually overtake a log N function.  Thus, all linear algorithms are basically worse than a log N algorithm.  In this sense, all linear algorithms are equally bad, even if one of them is 26 times faster than the other.

In Big O notation, we simply dispose of coefficients, constant multipliers, and lower degree terms.  In each row below, the different formulae are actually equivalent:

Formula          Another Formula     Simplification
O(N/2)           O(N/52)             O(N)
O(N^2 + 6N)      O(5 N^2)            O(N^2)
O(5 log N)       O(log N / 100)      O(log N)

Other Algorithms

Here are some more examples of algorithms, with an analysis of their Big O efficiency.

Bubble Sort

A bubble sort makes N passes through the array.  In each pass, it compares N-1 pairs of neighbors.  This is programmed with nested loops:

   def bubble_sort(item):
       n = len(item)
       for _ in range(n):               # N passes through the array
           for x in range(n - 1):       # N-1 neighbor comparisons per pass
               if item[x] > item[x + 1]:
                   item[x], item[x + 1] = item[x + 1], item[x]   # swap neighbors
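
For example:

   items = [5, 3, 8, 1]
   bubble_sort(items)
   print(items)   # [1, 3, 5, 8]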

This takes N * (N-1) iterations.

    The efficiency of Bubble Sort is  O( N * (N-1) ) = O( N^2 - N) = O(N^2)

(The lower order term N is not significant when N is large, so it is thrown away.)

Travelling Salesman

Imagine a salesman must start in Paris, and fly to 4 cities (London, Berlin, Rome, and Athens).  What is the shortest route he can take?  This is a famous problem which cannot be solved efficiently.  The straightforward method is to examine all possible routes, calculate the length of each trip, and choose the smallest.  Here is the list of all 24 possible trips (he always starts and finishes in Paris):

P-L-B-R-A-P
P-L-B-A-R-P
P-L-R-A-B-P
P-L-R-B-A-P
P-L-A-R-B-P
P-L-A-B-R-P
P-B-L-R-A-P
P-B-L-A-R-P
P-B-A-L-R-P
P-B-A-R-L-P
P-B-R-L-A-P
P-B-R-A-L-P
P-R-A-L-B-P
P-R-A-B-L-P
P-R-B-L-A-P
P-R-B-A-L-P
P-R-L-B-A-P
P-R-L-A-B-P
P-A-B-L-R-P
P-A-B-R-L-P
P-A-R-L-B-P
P-A-R-B-L-P
P-A-L-R-B-P
P-A-L-B-R-P

Why are there 24 possibilities?  There are 4 choices for the first city, then 3 choices for the next, then 2 choices for the next, then 1 choice left for the last city.  So the number of possible trips is 4*3*2*1 = 4! = 24 (4 factorial).

If the salesman must visit 10 cities, there are 10! = 3,628,800 trips to examine - over 3 million.

The number of trips grows very fast as the number of cities grows.  

    The efficiency of the Travelling Salesman algorithm is  O( N ! )
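
The straightforward examine-every-trip method can be sketched in Python with itertools.permutations (the dist dictionary of distances between pairs of cities is hypothetical, and must contain an entry for every ordered pair):

   from itertools import permutations

   def shortest_trip(start, cities, dist):
       # Try every possible ordering of the cities: N! trips in total.
       best_route, best_length = None, float("inf")
       for order in permutations(cities):
           route = (start,) + order + (start,)
           length = sum(dist[a, b] for a, b in zip(route, route[1:]))
           if length < best_length:
               best_route, best_length = route, length
       return best_route, best_length

For the 4 cities above, this examines exactly the 24 trips in the list.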

The Travelling Salesman problem is an example of a problem desperately in need of a more efficient algorithm.  Although some clever programs have been written for this problem, they only guarantee a good result, not necessarily the best result.  Finding an efficient exact algorithm for it remains an unsolved problem in Computer Science.

What do we learn from Big O?

Nested loops are inherently inefficient, as one must multiply the sizes of the loops to obtain the required execution time.  Two nested loops produce O(N^2) if both of them grow when the size of the problem increases.  Three nested loops require O(N^3) operations, which is a lot worse than N^2.  The Travelling Salesman problem is especially frightening, as a new loop must be added for each city!  Avoid nested loops if you want to write fast programs.

Good algorithms, like a Binary Search, are worth spending time to write.  It may be a lot more difficult to program a binary search than a sequential search, but the potential time savings in large lists are enormous.