Recent excerpts from my blog:

Monday, December 19, 2016

Auger - Automatic Unit Test Generation for Python

Unit tests are crucial to software development. They verify whether a given component actually implements what it promises. They are also important for long-living code, where future maintenance can be much more expensive without the proper level of unit test coverage. Not surprisingly, the lack of unit tests is a show-stopper for submitting production code at many companies.

You may wonder, if unit tests are so great, why do engineers hate to write them? More than once have I heard fellow engineers say "If you approve my code now, I will write the unit tests in a future change list." All engineers I know actually like writing code. It is creative. Code means impact. However, in general, engineers do not like writing unit tests at all. Why is that?

Why is Unit Testing Hard?

Unit testing is hard for various reasons:
  1. One reason would be that a unit testing framework is exactly that. A framework. When I worked on the IBM J9 and Eclipse team with Dave Thomas, he used to say "Everybody likes to write frameworks, but nobody likes to actually use them". In practice, unit test frameworks add yet another complexity to learn and master, especially when unit test frameworks have little in common across programming languages or organizations.
  2. All unit testing frameworks start off modestly. Erich Gamma once confided in me how he and Kent Beck wrote the original version of JUnit when they got bored on a transcontinental flight. He added how those few hours of work had been by far the best investment of his technical career ever. Today, however, unit testing frameworks are far from trivial and require a considerable learning investment in understanding their power and intricacies.
  3. Software systems themselves are also becoming increasingly complex. Even seemingly standalone components are heavily embedded in a context of complex runtimes. To isolate those dependencies and sculpt them with the proper "mocks" is an art. It is not uncommen to spend hours trying to find out how to write the mocks for one line of test code. Dependency injection and fancy mock syntax conspire to make the life of unit test authors challenging.
  4. Code under development constantly changes. Often, coding can be a discovery game. A design doc can outline architectural decisions on what database to use and how the UI is created. However, it tends to leave the actual coding as an exercise to the reader. During coding, discoveries are made. Code is constantly refactored. Code is found obsolete and is deleted even before it is ever committed to version control. This is particularly the case when a new domain or technology is being explored. We still write unit tests, but we write them later, when the dust settles.

OK. Unit tests are hard. But, what is the alternative?

Let's agree, unit test are a chore and tough to write. This does not mean engineers are not testing their code, even if they do not write unit tests. Many engineers, like myself, write code in a highly iterative fashion. For instance, say I am writing an AppEngine app to return a web page. I would first add a route, write a Handler, and simply return "Hello World". To test, I would spin up a local instance, point my browser at localhost:8080 and see if it shows the string I expected. I iterate that step hundreds of times.

In this iterative development mode, small, incremental steps are made towards the end goal and progress is constantly validated. Each time, some more functionality is added and tested on the code in progress. At some point, we are going to be "done". At his final point in time, hundreds or thousands of "test" runs have been exercised on the code under development. Each component has had some inputs and produced some expected outputs. If only we could remember what those inputs and outputs were and our tests would be so much easier to write. Cue: Project Auger.

What is Project Auger?

Project Auger (Automated Unittest Generator) watches your Python code while you write it and automatically generates all unit tests for your code, including all the mocks. Little or no work is required by the developer.

How does Auger Work?

Auger works like a smart Python debugger that sets breakpoints for each component you are interested in. Auger tracks two kinds of function calls related to the module under test:

  • For each function defined in the module, Auger records both the values of the arguments and the returned results. After recording enough execution traces, unit tests can be generated with the meaningful placeholder argument values and assertions. 
  • For each call made from a given component to dependent libraries or other components, we record the return value, so that this call can be automatically mocked out with known return values.

Auger tracks all possible functions, including instance methods, class methods, and static functions.

Consider the following example,, that provides a Pet with a name, age, and a species:

from sample.animal import Animal

class Pet(Animal):
    def __init__(self, name, *args):
        Animal.__init__(self, *args)
        self._name = name

    def get_name(self):
        return self._name

    def lower(s):
        return s.lower()

    def __str__(self):
        return '%s is a %s aged %d' % (
            Pet.lower(self.get_species()), self.get_age()

def create_pet(name, species, age=0):
    return Pet(name, species, age)

if __name__ == '__main__':
    print(Pet('Polly', 'Parrot'))
    print(create_pet('Clifford', 'Dog', 32))

This class has a few different entry points we would need to unit test:
  • The class Pet itself which has:
    • a static method, lower
    • two instance methods, get_name and __str__
    • a constructor, __init__, which is really a very special instance method
  • A static function that creates a Pet and returns it
The Pet class is a subclass of Animal, which we know nothing about, so we will need to mock that entire class. We do know that the class is used in the Pet constructor and inherited methods are from Pet called as well. This means that the implementation for self.get_species() and self.get_age() are unknown, as we cannot look at the implementation of Animal, when unit testing Pet. Therefore, those two inherited methods will be mocked out.

Unit Tests Generation with Auger

The above class definition combined with an execution run is enough for Auger to automatically create the following fully functional unit test:

from mock import patch
from sample.animal import Animal
from import Pet
import unittest

class PetTest(unittest.TestCase):
    @patch.object(Animal, 'get_species')
    @patch.object(Animal, 'get_age')
    def test___str__(self, mock_get_age, mock_get_species):
        mock_get_age.return_value = 12
        mock_get_species.return_value = 'Dog'
        pet_instance = Pet('Clifford', 'Dog', 12)
        self.assertEquals(pet_instance.__str__(), 'Clifford is a dog aged 12')

    def test_create_pet(self):
        self.assertIsInstance(,species='Dog',name='Clifford'), Pet)

    def test_get_name(self):
        pet_instance = Pet('Clifford', 'Dog', 12)
        self.assertEquals(pet_instance.get_name(), 'Clifford')

    def test_lower(self):
        self.assertEquals(Pet.lower(s='Dog'), 'dog')

if __name__ == "__main__":

No changes to the original code are needed to teach Auger anything. All that is required is for the developer to write their original code and exercise it somehow. In the above case, we simply ran python to produce two scenarios in which Pet instances were created and manipulated. From those two sample, a single test was extracted.

Of course Auger is limited in the sense that it cannot guess what scenario is being tested. Rather than generates multiple, focused tests per module, it will generate one big test that covers the entire module. The value of Auger is more in generating all the boiler plate code, imports and mocks, ensuring proper coverage, and to generate a template for manual refinement.

To generate a set of unit tests, Auger magic is invoked:

import auger
if __name__ == "__main__":
    with auger.magic(pet):

In this case, one module is passed, pet, but multiple modules can be passed as well. Each one will be traced and unit tested.

When a unit test is produced, it is written out to the local file system under the corresponding tests folder.

IDE Integration

Auger does not have direct IDE integration per se, but works really well with PyCharm. This integration comes for free, because the IDE watches the underlying file system and will automatically discover when new files are created by Auger in the local file system. These tests can then be executed easily as well:

Adding the generated tests to Git and commit/push them to a repository means just a few clicks from that point onwards in an IDE such as PyCharm.

Future Work

  • Incremental test generation. Collect multiple execution runs, persist the invocations, and merge multiple runs into one test case.
  • Preserve manual edits performed by users on generated test cases when a test is regenerated.
  • Support other unit test frameworks, such as pytest, nose, cucumber, etc.
  • Figure out how to run Auger on itself. This is non-trivial :-)

Check out Project Auger and let me know what you think of it. Contributions are welcomed.

Sunday, December 18, 2016


QuickSort is a divide-and-conquer algorithm that splits up an array into two halves, and recursively sorts each half. In other words, to sort an array, we pick a random pivot, split the array in three section, being smaller, equal, or larger than the pivot, and recursively sort the array:

  def qsort(array):
      if len(array) < 2: return array
      pivot = array[0]
      less = filter(lambda n: n<pivot, array)
      equal = filter(lambda n: n==pivot, array)
      larger = filter(lambda n: n>pivot, array)
      return qsort(less) + equal + qsort(larger)

In the original implementation of QuickSort, the first element in the array was picked as pivot, as is done on line 3 above. Below, the red bar indicates the choice of pivot. At each step of the algorithm, it is at the left of the array being sorted:

Choosing the first element as pivot can generate extra work when the array is already (partially) sorted. Rather than picking the first element, the middle element can be chosen, behaving better on already sorted arrays:

def qsort(array):
    if len(array) < 2: return array
    pivot = array[len(array)/2]
    less = filter(lambda n: n<pivot, array)
    equal = filter(lambda n: n==pivot, array)
    larger = filter(lambda n: n>pivot, array)
    return qsort(less) + equal + qsort(larger)

When choosing the value in the middle of the array as pivot, yields the following result, with again the red bar indicating the pivot:

Rather than pick the middle element as pivot, a random value can be picked from the array:

def qsort(array):
    if len(array) < 2: return array
    pivot = array[random.randint(0, len(array)-1)]
    less = filter(lambda n: n<pivot, array)
    equal = filter(lambda n: n==pivot, array)
    larger = filter(lambda n: n>pivot, array)
    return qsort(less) + equal + qsort(larger)

The pivot is now picked randomly as shown in the visualization below:

The final approach to pivot selection we will look at was invented by Robert Sedgewick and involves picking the median of three values:

def qsort(array):
    if len(array) < 2: return array
    pivot = (array[0] + array[len(array)/2] + array[-1])/3
    less = filter(lambda n: n<pivot, array)
    equal = filter(lambda n: n==pivot, array)
    larger = filter(lambda n: n>pivot, array)
    return qsort(less) + equal + qsort(larger)

The pivot is now picked by looking at three different values as shown in the visualization below:

Aside from how we pick the pivot, all versions of QuickSort shown above have the same thing in common that they use recursion for implementing the divide and conquer tactic. Recursion has overhead in that new stackframes need to be constructed for each call. Recursion also limits the maximum size of the array we can sort, but that is more of a theoretical challenge.

QuickSort can also be implement without using recursion, by using a user-defined stack to keep track of the work to be done. A possible implementation:

def qsort(array):
    work = [array]
    result = []
    while work:
        array = work.pop(0)
        if len(array) < 2:
            pivot = array[0]
            work = [
                filter(lambda n: n<pivot, array),
                filter(lambda n: n==pivot, array),
                filter(lambda n: n>pivot, array),
            ] + work
    return result

And the execution (this time using the first element as pivot again):

In general, QuickSort is not a stable sort. Elements of equal value are not guaranteed to remain in their original sort order for most implementations. However, the implementations above keep the elements that are equal to the pivot in the original order as they were found. That makes the above implementation stable.

Check out more algorithm visualizations at PyAlgoViz.

Friday, December 16, 2016


MergeSort is a divide-and-conquer algorithm that recursively sorts two halves of an array and merges them to form a fully sorted end result. In other words, to sort a sequence, we find the middle point, sort the first half, sort the second half, and finally merge both halves:

def mergeSort(array, start, end):
    if end - start > 1:
        middle = (start + end) / 2
        mergeSort(array, start, middle)
        mergeSort(array, middle, end)
        merge(array, start, middle, middle, end)

The merging of the two halves is done by inserting elements of the second half into the first half. The second half keeps shrinking in size until we are done:

def merge(array, left, leftEnd, right, rightEnd):
    while left<leftEnd and right<rightEnd:
        if array[left] > array[right]:
            array.insert(left, array.pop(right))
            right += 1
            leftEnd += 1
            left += 1

Executing the above implementation on 40 random numbers produces the following result:

This implementation is a stable sort. Elements of equal value remain in their original sort order.

Check out more algorithm visualizations at PyAlgoViz.

Thursday, December 15, 2016

Generating Prime Numbers

In this blog post, we will generate prime numbers and explore brute force, memoization, and dynamic programming.

Prime number are numbers greater than one that are only divisible by themselves and one. In other words, the following Python function returns True if n is greater than one and none of the numbers in between 2 and n is a factor of n:

Of course, this implementation is rather naive and takes a lot longer than is necessary:

Our first optimization would be to realize that we need not check all factors in between 2 and n. Namely, we only need to look at the factors between 2 and n/2, as any number higher than that could never be a factor of n.

Our code now looks like this:

<== doing half the work

With this optimization in place, the algorithm runs about twice as fast already:

Still, we can do better. We can actually already stop considering factors when we reach sqrt(n):

A further optimization would be to realize that we only need to consider the factors 2, 3, 5, 7, 9, etc. as 4,6,8,etc. could not be factors as they are already divisible by 2. But, we'll leave that refinement to the reader.

Let's try our sqrt(n) optimization in the next iteration of our algorithm:

By now, our algorithm should be more than twice as fast.

Of course, we can still do better. For instance, say we are checking to see if 41 is a prime number. To validate 41, our algorithm will try the factors between 2 and sqrt(41), which is 6:
  • 41 % 2 = 1, not a factor
  • 41 % 3 = 2, not a factor
  • 41 % 4 = 1, not a factor
  • 41 % 5 = 1, not a factor
  • 41 % 6 = 5, not a factor
When we check for factors 4 and 6, we are doing double work. Namely, if 41 would be divisible by 4, it would already be divisible by 2. The same argument goes for 6, which is already divisible by 2 and 3 itself. 

In other words, to check if a given number is prime, we need to only verify the prime numbers between 2 and sqrt(n) as divisible factors. This greatly reduces the search space. All we need to do is remember the current list of prime numbers and use those as factors. This approach is generally referred to as a Prime Sieve as opposed to the trial division approach we used so far:

Note that the first line of our isPrime function now has a linear search because we used a list to hold the current primes. Ideally we would have used an OrderedSet instead of a list, to make membership checks O(1) and speed up that part of the algorithm even more. For simplicity, we use a list now.

Using our newly minted insight on using primes as factors, we produce the following execution:

That is confusing. This is slower. How can that be?

One reason is that at this point, our code is a lot more complex than before. Namely, in order to answer the question if a given number is a prime, in the worst case, we need to discover all the primes before that number. There is also some bookkeeping overhead to caching each of the primes we found so far. We don't get much apparent speedup due to all the extra work we need to do.

However, once we actually warmed up the cache, by running our loop once ahead of time, and then measuring a second iteration of our loop, we see an interesting speedup again:

Essentially, we are demonstrating the power of memoization here. The progression we have seen so far is from brute force, to a bit smarter, to using another smart insight, to remembering intermediate steps. By remembering the intermediate results, we essentially are performing a space-time tradeoff. By adding a bit more memory, we can avoid performing repetitive operations.

The next big leap in an algorithm such as finding prime numbers is whether we can make any assumptions on exactly in what order the API is accessed. For instance, are we more interested in being able to figure out if any given random number is a prime number, or are we more interested in producing a prime number generator, one that produces increasingly larger primes? If the latter is the case, we enter the domain of Dynamic Programming.

By assuming we produce increasingly larger primes, the Pythonic instrument we love to go for is a generator function. It can keep local state, return one result at a time, and resume execution at the place where it yielded the most recent result:

The use of co-routines in this manner allows us to combine the concept of memoization and cleanly wrap it in a nice abstraction. The function can be used like any regular Python function as follows:

The final iteration of our algorithm now produces:

This demonstrates a nice progression from brute force to memoization to DP.

That said, if you made it this far, you probably want to take a look at Atkin's Sieve, that generates primes using an O(n) approach.

For more algorithms and their visualizations, check out PyAlgoViz.

Tuesday, December 13, 2016

What I Learned While Fighting Fake News

In a couple of upcoming blog entries I will share what I learned while building Realnews/Fakenews, a Chrome extension that adds visual indicators to sites such as Facebook and Twitter to warn readers of possible Fake News links: Fake News

I will talk about the extension itself, the architecture and implementation of the server, choices in hosting plans and pricing for the server, and how to use machine learning to predict fake news from just looking at the URL. 

Fake News is serious. It can lead to vigilantes storming pizza parlors and foreign parties influencing US elections or upcoming UK elections. It is hard to root out fake news as it feeds on a powerful psychological trait called confirmation bias. What that means is that fake news is often designed to cater to a specific target audience. Fake news authors write specific content their targets like to hear, whether it is factual, hyperbolic, or not. Goal is to increase the likeliness of them sharing, thereby strengthening the original cause, or more likely, generating advertisement income.

What can we do about fake news? In a simple diagram, this is how fake news spreads:

There is not much we can do to stop fake news publishers from actually creating content. What the industry can do, social media outlets in particular, is to stop giving fake news publishers a free distribution channel. That's a good step in the right direction.

Where the Realnews/Fakenews Chrome extension helps is at the reading stage. When a link is presented on Facebook or Twitter to a third party site as a shared post, a small button is added to the top right to indicate the category of the site:

In this case, the site this tweet links to is marked as real news. Of course, the subject itself is fake news, but that's not the point. When the green button is pressed, a category selector shows up as follows:

When a different category is chosen by the reader, it counts as one vote. A consensus model is used on the server so that future readers will be presented with the popular vote. Voting is non-moderated. 

Once enough pages have been voted on, features can be extracted such as the hosting domain of the link, words used in the URL, or the occurrence of specific content, such as lots of ads. Machine learning is then used to classify unknown sites and seed the prediction with a meaningful starting point. 

Like I said at the start, more details on the implementation will come in future blog entries. For now, try it out!

Monday, December 5, 2016

PyAlgoViz on Github

PyAlgoViz on Github

Long overdue, I finally published my PyAlgoViz source code on github. The Python source shows you how to create a sandbox on AppEngine to run Python code, step-wise debug the code, run visualization logic at each step, record the visualization, and return the result back to the browser to visualize it there using HTML5 Canvas. 

You may also like the accompanying video and slide deck linked from

Happy visualizing!