Programming

Table of Contents

1. Mission statement

  • What are unit tests good at?
  • What are unit tests not good at?
  • What does maintainability gain us?
  • How does encapsulation fit in?

2. What is maintainability?

Maintainability is, ideally, exactly what it sounds like. Programs decay: requirements age, the logic ages, the code itself ages, and the author ages as well! This produces problems around keeping a program working, and working properly. The expected behavior of a program can also change over time; stakeholders might want it to do something different today than what it was doing yesterday.

The author might not recall exactly what a program does, why it does what it does, or whether there are any surprises or limitations in the way it behaves. Lots of implementation details are fresh in the author’s mind when the program is first minted; these will inevitably fade.

As a result, a lot of very smart people have spent time “going back” and changing code. Those folks collectively have opinions about what kind of code is harder to change and, conversely, what kind of code is easier to change.

One might object at this point. This hypothetical interlocutor might point out that maybe there’s a program that gets written one time and never changes ever again! This is true, and it highlights one of the reasons that it’s helpful to architect a program. A very good question to pose when starting to write something is: which parts of this will change? This, of course, has some follow-up details that are also impactful. A non-exhaustive checklist one might go through follows:

  • Define the logical system
  • Identify aspects that are
    • expected to change
    • likely to change
    • possible to change
    • definitely not going to change
  • For those aspects, define the shape and scope of those changes

Looking into these questions further, we can get to their root and discover that we have some goals. Some goals might be:

  • We want the program to be delivered quickly.
  • We want the system to be flexible, and to be able to change, ideally on the axis that we desire, possibly on other axes as well.
  • We also want to be able to make modifications quickly, or as quickly as possible.
  • We want to be able to fix things if they break, again ideally quickly.
  • We want to be able to confirm that the program works, both “without error” and “as expected”.

Some of these goals can run up against each other. It’s often easier and faster to write a program that is never meant to change. However, programs that are written that way are most likely harder to change. The “ask” might be for “throw away code” that is only going to be run one time. It could also be for something that we intend to keep for a long time, with expected and even scheduled changes. Often the currency here is “developer hours”, where doing something “right” generally takes longer.

This is one of many reasons why it’s important to identify what your goals and expectations are up front. This allows one to allocate developer hours in alignment with what the stakeholders find important.

For the purposes of this essay we will be ignoring “throw away code” and looking more deeply at how to write code that is commonly called “maintainable” or “clean”. We will also be imagining a large single file program “The Program” that runs sequentially as something to compare or contrast these ideas against.

3. What is DRY?

DRY is an old programming acronym (one of several such axioms) that stands for “Don’t Repeat Yourself”. A good rule of thumb while writing The Program is to ask “Am I copying a bunch of code from one place to another with few if any modifications?” If we’re looking at The Program after it has already been written, the question is more like “Did someone copy a bunch of code from one place to another with few if any changes?”

If the answer is “yes”, then you are very likely violating DRY, or working with code that does so.

3.1. Why do I care?

The overarching reasoning behind why DRY is considered best practice, in no particular order:

3.1.1. Less code is better

Everything else being equal, more lines of code is worse than fewer. There are trade offs and caveats of course (so “everything else being equal” is doing some heavy lifting here), but in general, as the amount of code increases:

  • The number of locations where the program can break goes up.
  • The temporal cost to understand the code goes up.
  • The cognitive load on the developer reading the code goes up.
  • The cognitive load on the developer reviewing the code goes up, and as that load increases the quality of the code review decreases.
  • The overhead to make a change and confirm it works goes up. This is particularly pronounced in applications where manual testing is slow, complicated, confusing, expensive, or (as is often the case) all of the above.

3.1.2. We want to minimize the number of locations where code needs to change as the result of a logical change.

A trivial example of this might be a change in how an average is calculated. Let’s assume that we’re writing a program that has a bunch of different users, and further assume that for some reason we need to calculate the average age of our users. Maybe we don’t know going into this what the method is, or perhaps we are expecting it to get really complicated. In order to prepare for either of these situations we make a function that takes a list of ages and returns the average. I have also added some typing here since we know those expectations as well.

from statistics import mean
from typing import List

def get_user_age_avg(ages: List[int]) -> float:
    return mean(ages)

This is usable right now, allows the developer to continue building the system, and in the future if the method for getting the average of ages changes, we only have one location in The Program where we’re doing that. One logical change, “Please use median instead of mean”, equates to one physical change, “update get_user_age_avg”. Even more important, we can test the function by itself so we can ensure that it continues to work as expected.
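
To make the “one logical change, one physical change” claim concrete, here is a minimal sketch of what the median request might look like. The `report` function is a hypothetical caller that I have invented for illustration; the point is only that it does not change when the averaging method does.

```python
from statistics import median
from typing import List


def get_user_age_avg(ages: List[int]) -> float:
    # The only line that changes: mean(ages) became median(ages).
    return median(ages)


def report(ages: List[int]) -> str:
    # Hypothetical caller; it neither knows nor cares which average is used.
    return f"Typical user age: {get_user_age_avg(ages)}"
```

With ages of 20, 30 and 100, the mean would be 50 while the median is 30, so a single outlier no longer drags the figure upward, and no call site had to be touched.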

Encapsulating correctly, as this should illustrate, is much easier to do if you know what changes are expected; however, it’s also possible to structure code reasonably well without knowing what changes are scheduled. We sometimes call this process “abstraction” and will talk about “abstracting the code”. This can be thought of as an iterative process with several steps, addressed later.

3.2. How do I avoid repeating myself?

There are many methods for reducing repetition (or DRYing out your code). A very straightforward one, and likely the one that the reader is most used to seeing, is functions.

A trivial example of a function is moving a calculation that is performed over and over throughout the program into one place: a function. Unit conversion is a concrete example of this. Imagine that we have a program that ingests data measured in one unit (meters), and then needs to convert it to a different unit (feet). Applying our litmus test here, the question might be “Am I doing this calculation in multiple places in The Program?”. If performed retroactively, one might search the program for `feet = int(meters / 0.3048)` and count the number of implementations.

3.2.1. Example: meters to feet

width_feet = int(width_meters / 0.3048)
height_feet = int(height_meters / 0.3048)
depth_feet = int(depth_meters / 0.3048)

This can be “abstracted” to move the calculation into a function:

def meters_to_feet(meters):
    return int(meters / 0.3048)

width_feet = meters_to_feet(width_meters)
height_feet = meters_to_feet(height_meters)
depth_feet = meters_to_feet(depth_meters)

One might immediately notice that we have increased the number of lines of code, which all things being equal is a bad thing; however, things are not all equal here. We have moved the “meters to feet” calculation into one place, which means that if we ever want to change it, or if there’s a bug to fix, there’s just one place to fix it. Of course, it’s terribly unlikely that the math for this calculation is ever going to change, so in reality we’re likely going to be attempting to abstract more complexity than just one division operation.
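
A quick sketch of the payoff: once the conversion lives in one function, a single test pins its behavior down. The function is called `meters_to_feet` here because it converts meters into feet; the constant name `METERS_PER_FOOT` and the test name are my own illustrative choices, not part of The Program.

```python
METERS_PER_FOOT = 0.3048  # illustrative constant name


def meters_to_feet(meters):
    # int() truncates toward zero, matching the inline version above.
    return int(meters / METERS_PER_FOOT)


def test_meters_to_feet():
    assert meters_to_feet(0) == 0
    assert meters_to_feet(1) == 3      # 3.28... feet truncates to 3
    assert meters_to_feet(100) == 328  # 328.08... feet truncates to 328


test_meters_to_feet()
```

The test also documents a decision that was invisible when the math was inline: the result is truncated, not rounded. If that ever turns out to be wrong, there is exactly one line to change.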

3.2.2. Example: circular region

A more complex example might be calculating a GIS region. One might imagine that a program is written and that part of the expected behavior from the program includes calculating a physical region from some incoming data. Let’s further assume that the stakeholders don’t have hard details about the shape and size of the region, but that it’s going to be contiguous and mutable: it will be one discrete region and have the ability to change over time.

This is an interesting problem because it has a very simple pseudo solution, but a much more complex rigorous solution (particularly when you add in projection).

3.2.3. Euclidean square: inline

Any “blob” shape can be closely estimated by a polygon, with the accuracy of the shape increasing as we add points to the shape. A very simple polygon differs from a very complex polygon in the number of individual points that make up the polygon. Imagine a very large, very complicated GIS region (such as a storm or cloud). Such a region can ultimately be represented, and stored, as a list of points (latitude, longitude) at the edges of the region. We don’t need to start with super complex though. We just need a polygon.

Perhaps the simplest solution is calculating a square, and the most straightforward operation is likely to calculate the square inline.

So, let’s calculate a square with sides of 5 from the euclidean origin:

# Comment: calculate a square
SQUARE = [(0, 0), (0, 5), (5, 5), (5, 0)]

>>> print(SQUARE)
[(0, 0), (0, 5), (5, 5), (5, 0)]

This is a very simple shape, but it’s a valid polygon, and placing it at the origin gives us the ability to heuristically define a point that should be inside the region: (2,2). We could now, armed with this expectation, write a unit test for the logic to determine that a point is in a region:

# Comment: stub for the point-in-region check
def is_point_inside(region, point):
    # TODO: implement me
    return True

>>> is_point_inside(SQUARE, (2, 2))
True

There’s a glaring omission here: the very broad “TODO: implement me”. This is very intentional. What we have done here is to move the “is a given arbitrary point inside a region” calculation (which is non-trivial) into its own function. This means that:

  • We can reuse the function everywhere we want to perform this calculation.
  • We can “check to see” if the function works manually.
  • We can write an automated test to ensure that the function continues to work.

This is what a unit test for that calculation might look like. Because we’re passing in simple regions and points we can actually write the tests before we write the code. The second argument to assert is an optional failure message.

# Comment: unit test for is_point_inside
from main_program import is_point_inside
from fixtures import SQUARE #5x5 at origin

def test_is_point_inside():
    # Confirm that known interior points are "inside"
    assert is_point_inside(SQUARE, (2,2)), "Failed in the middle"
    assert is_point_inside(SQUARE, (5,0)), "Failed on a corner"

    # Confirm that known exterior points are "outside"
    assert not is_point_inside(SQUARE, (6,0)), "Failed outside on one axis"
    assert not is_point_inside(SQUARE, (6,6)), "Failed outside on both axes"
    assert not is_point_inside(SQUARE, (-1,2)), "Failed negative axis"

Because we’re using a simple square, we can put together this test (this is a unit test), and define how the real program should behave. This method is called “test driven development”, or TDD. To continue on the TDD trajectory we would then run the test with the expectation that it will fail (because we hardcoded it to return True, and there are some test cases where it should return False). However, we don’t have to. If we want to work on logic that adds a new point to a region, we could hardcode is_point_inside to return False. For now let’s just add a doc block (and some typing).

# Comment: the stub again, with a doc block and a return type
def is_point_inside(region, point) -> bool:
    """
    Determine if the given point is inside the given region.
    TODO: look for an external library that does this well
    Returns a boolean
    """
    return True
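
For the curious, here is one way the stub could eventually be filled in. This is a sketch using the well-known ray casting technique, not necessarily what The Program would ship with (the TODO above suggests looking for a library first). The helper `_on_segment` is my own addition so that points on an edge or corner count as “inside”, which is what the test above expects.

```python
SQUARE = [(0, 0), (0, 5), (5, 5), (5, 0)]


def _on_segment(p, a, b):
    """True if point p lies on the straight edge from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    cross = (bx - ax) * (py - ay) - (by - ay) * (px - ax)
    if cross != 0:  # exact comparison: fine for integers, fragile for floats
        return False
    return min(ax, bx) <= px <= max(ax, bx) and min(ay, by) <= py <= max(ay, by)


def is_point_inside(region, point) -> bool:
    """Ray casting: count the edges crossed by a ray heading right from point."""
    px, py = point
    inside = False
    for i in range(len(region)):
        a, b = region[i], region[(i + 1) % len(region)]
        if _on_segment(point, a, b):
            return True  # boundary points count as inside
        (ax, ay), (bx, by) = a, b
        if (ay > py) != (by > py):  # the edge straddles the ray's height
            x_cross = ax + (py - ay) * (bx - ax) / (by - ay)
            if x_cross > px:
                inside = not inside  # each crossing flips inside/outside
    return inside


assert is_point_inside(SQUARE, (2, 2))
assert is_point_inside(SQUARE, (5, 0))
assert not is_point_inside(SQUARE, (6, 0))
assert not is_point_inside(SQUARE, (-1, 2))
```

Note that none of the callers would need to change when this replaces the hardcoded `return True`; that is the whole point of having moved the calculation into its own function.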

Let’s put that complicated logic down for now and return to our region, and encapsulation.

A square doesn’t really approximate a circle; it has corners that take up more area than we want. At this point we can further refine the shape: we can add more sides!

3.2.4. Euclidean hexagon: inline

Calculate a hexagon with sides of length l around the euclidean origin:

# Comment: calculate a hexawrong
import math
pi = math.pi

hexagon = [(math.cos(2*pi/6*x)*5,math.sin(2*pi/6*x)*5) for x in range(0,6+1)]

print(hexagon)
[(5.0, 0.0), (2.5000000000000004, 4.330127018922193), (-2.499999999999999, 4.330127018922194), (-5.0, 6.123233995736766e-16), (-2.500000000000002, -4.330127018922192), (2.4999999999999964, -4.330127018922195), (5.0, -1.2246467991473533e-15)]

Now one might imagine that there are several places where we are calculating a hexagon from a point. This implementation is fairly brittle, hard to change, and hard to confirm that it’s working or working as expected.

Now let’s assume that we’re calculating multiple regions of different sizes:

# Comment: calculate a hexawrong
import math
pi = math.pi

l = 15
large_hexagon = [(math.cos(2*pi/6*x)*l,math.sin(2*pi/6+x)*l) for x in range(0,6+1)]

l = 6
tiny_hex = [(math.cos(2*pi/6*x)*l,math.sin(2*pi/6*x)*l) for x in range(0,6+1)]

print([(round(x, 2), round(y, 2)) for x, y in tiny_hex])
[(6.0, 0.0), (3.0, 5.2), (-3.0, 5.2), (-6.0, 0.0), (-3.0, -5.2), (3.0, -5.2), (6.0, -0.0)]

Did you notice that the math is wrong for the large hexagon above? I introduced a typo; some rhetorical questions about this typo:

  • Was it immediately obvious?
  • Can you find it?
  • How long did it take to find?

This is the sort of error that is unlikely to be identified by a peer review: the math still produces a list of points, and one would likely need to plot the shape on a map to detect the issue.
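
Short of plotting, a programmatic check can also catch it. Every corner of a regular polygon centered on the origin should sit the same distance from the origin, so a sketch like the following would flag the typo. The helper name `max_radius_error` is hypothetical, and the snippet spells out which line carries the typo, so skip it if you are still hunting.

```python
import math

pi = math.pi


def max_radius_error(points, length):
    # Largest deviation of any vertex from the expected distance to the origin.
    return max(abs(math.hypot(x, y) - length) for x, y in points)


l = 15
good = [(math.cos(2*pi/6*x)*l, math.sin(2*pi/6*x)*l) for x in range(0, 6+1)]
typo = [(math.cos(2*pi/6*x)*l, math.sin(2*pi/6+x)*l) for x in range(0, 6+1)]  # "+x" instead of "*x"

assert max_radius_error(good, l) < 1e-9
assert max_radius_error(typo, l) > 1  # several units off: clearly not a hexagon
```

The catch is that this check has to be copied next to every inline hexagon, which is itself a DRY violation.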

3.2.5. Euclidean hexagon: function

One technique to combat this is to move the calculation into a function, so we only have to get hexagon creation right one time, and can reuse that known good code:

def hexagon(length):
    return [(math.cos(2*pi/6*x)*length,math.sin(2*pi/6*x)*length) for x in range(0,6+1)]

a_hex = hexagon(5)
b_hex = hexagon(6)

Of note: this code is now testable; we can write a “unit test” that confirms that the function always returns a good value:

assert hexagon(5) == KNOWN_GOOD_HEXAGON_SIZE_FIVE

Where the constant is the known correct value for a hexagon of size five.

We can see this in action here:

>>> print(hexagon(5))
[(5.0, 0.0), (2.5000000000000004, 4.330127018922193), (-2.499999999999999, 4.330127018922194), (-5.0, 6.123233995736766e-16), (-2.500000000000002, -4.330127018922192), (2.4999999999999964, -4.330127018922195), (5.0, -1.2246467991473533e-15)]
>>>
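
The output above hints at a wrinkle: values like 6.123233995736766e-16 are floating point noise standing in for zero, so an exact `==` against hand-written values would be brittle. A sketch of a tolerance-based comparison follows; `points_close` and `EXPECTED` are names I made up for illustration.

```python
import math


def hexagon(length):
    return [(math.cos(2*math.pi/6*x)*length, math.sin(2*math.pi/6*x)*length)
            for x in range(0, 6+1)]


def points_close(actual, expected, tol=1e-9):
    # Compare two point lists coordinate by coordinate, within a tolerance.
    return len(actual) == len(expected) and all(
        math.isclose(a, e, abs_tol=tol)
        for pa, pe in zip(actual, expected)
        for a, e in zip(pa, pe)
    )


h = 5 * math.sqrt(3) / 2  # height of the upper corners, roughly 4.33
EXPECTED = [(5, 0), (2.5, h), (-2.5, h), (-5, 0), (-2.5, -h), (2.5, -h), (5, 0)]

assert points_close(hexagon(5), EXPECTED)
```

This way the expected values can be written down from geometry rather than copied from a previous run of the code under test.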

You might be familiar with variables and suggest “Perhaps we want to make other shapes”. We can achieve that by moving the number of sides into an argument for the function, of course renaming it along the way to represent what it does:

def polygon(l, n):
    return [(math.cos(2*pi/n*x)*l,math.sin(2*pi/n*x)*l) for x in range(0,n+1)]

Which will produce the same output when n is 6, but could be used to make regions that are octagons, or any regular polygon. Now let’s return to the stakeholders, who are not aware of or interested in hexagons; they want something closer to a circle.

3.2.6. Euclidean pseudo circle

We can add more sides to the polygon to more closely approximate a circle:

def polygon(l, n):
    return [(math.cos(2*pi/n*x)*l,math.sin(2*pi/n*x)*l) for x in range(0,n+1)]

def circle(l):
    """Return a hundred sided polygon."""
    return polygon(l, 100)

Of course this is all just theoretical until we try to use said circle. As per the stakeholders we’re calculating the “circle” at a point (lat/long), so we actually want to provide a point for the origin. We might see the program start to take shape like this:

def polygon(l, n):
    return [(math.cos(2*pi/n*x)*l,math.sin(2*pi/n*x)*l) for x in range(0,n+1)]

def position_polygon(polygon, point):
    """
    Given a poly and a point, center the polygon on the point
    returns the resulting polygon
    """
    #Assume implementation is here
    return polygon
    
def circle(p, l):
    """Return a hundred sided polygon centered on the point p."""
    return position_polygon(polygon(l, 100), p)

I’ve not included the implementation of moving a polygon to a euclidean point. This is, I believe, a good example of “postponing complexity”. The program operates on polygons, it does not care where the polygon is, so we can get everything working end to end, but without the unimplemented part above. Manual testing of this program could occur, and the tester would just need to know “Look for the polygon at null island”.
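
When the time does come, the Euclidean version of that missing piece is small. A minimal sketch, assuming the incoming polygon is centered on the origin (as the output of `polygon` is) and ignoring projection entirely:

```python
def position_polygon(polygon, point):
    """Shift every vertex so an origin-centered polygon is centered on point."""
    px, py = point
    return [(x + px, y + py) for x, y in polygon]


moved = position_polygon([(0, 0), (0, 5), (5, 5), (5, 0)], (10, 20))
assert moved == [(10, 20), (10, 25), (15, 25), (15, 20)]
```

Because it returns a new list of points, the rest of the program still just sees “a polygon”, exactly as it did when the function returned its input untouched.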

3.2.7. Euclidean region

Now let’s further assume that the initial request from the stakeholders was not a circle, but was in fact any arbitrary region (which is a superset that contains a circle and all the pseudo circles we created above), we could further refine the ask to be:

  • Given a point, a width w, and a shape, calculate a shape of width w around the point

Of course, size can be a property of a well defined shape, so you could further collapse the ask into:

  • Given a point and an arbitrary shape, calculate and return a region centered at said point.

3.2.8. Projection region

Of course, in the real world, you cannot just map a euclidean shape directly onto a lat/long: that will produce an oval, the size of which varies depending on where it is located on the globe. So there is a further step that needs to be taken to get the circle to the right place and shape: projection.
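
To give a feel for the distortion, here is a rough sketch that turns an offset in meters into a lat/long near a center point. It assumes a spherical earth and uses roughly 111,320 meters per degree of latitude; both are simplifications of mine, fine for illustration over small distances but not a substitute for a real projection library. The function name is hypothetical.

```python
import math

METERS_PER_DEGREE = 111_320  # rough figure for one degree of latitude (assumption)


def offset_to_lat_long(center, east_m, north_m):
    """Convert a small offset in meters into a (lat, long) near center."""
    lat, lon = center
    dlat = north_m / METERS_PER_DEGREE
    # Meridians converge toward the poles, so a degree of longitude
    # shrinks by cos(latitude).
    dlon = east_m / (METERS_PER_DEGREE * math.cos(math.radians(lat)))
    return (lat + dlat, lon + dlon)


at_equator = offset_to_lat_long((0, 0), 111_320, 0)
at_sixty_north = offset_to_lat_long((60, 0), 111_320, 0)
assert at_equator == (0.0, 1.0)
assert math.isclose(at_sixty_north[1], 2.0)
```

The same eastward step covers about one degree of longitude at the equator and about two at 60 degrees north, which is why a shape drawn naively in degrees gets stretched as it moves away from the equator.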

3.3. Abstracting region complexity

There are multiple aspects of the region creation here, but what I want to point out is that for most use cases, a square might work just fine, and furthermore in some cases the square might work without a projection. In that instance we might take the approach that:

  • A true circle is not needed
  • A projected circle is not needed
  • The circle can be at the origin (null island)

If those things are true for an initial implementation, then we can decide to encapsulate that complexity and skip it for the time being.

def circle(p, l, n):
    """
    Given a point, length and number of sides, calculate a region around the
    lat/long point that has a radius of l

    Returns an n sided polygon.
    """

    return [(0, 0), (0, 5), (5, 5), (5, 0)]

As long as this is returning a well formed polygon, the rest of the program does not care that it’s actually a square, or that it’s in the middle of the ocean, and returning it allows the end-to-end manual test to finish.

Once the manual test of a rectangle being created on null island has been proved out, the program could be altered to return an entirely different polygon from the circle function, and the majority of the program would remain the same, meaning that there are fewer vectors for bugs.

So one approach to developing this functionality is to write it at its full eventual complexity on the first pass. It’s really hard to get everything right the first time though, so depending on the requirements and the developer time available, starting with a simpler implementation and then making it progressively more complex might make the development process easier.

3.4. What’s the point?

After using the program for a bit, doing a few demos and gathering feedback, it comes to light that the region needs to be a different shape than a circle. It is actually a shape that grows as additional lat/long points are added to it. It’s not a circle at all, but an arbitrary region. So in this imaginary scenario the stakeholders say “oh, let’s just change this from a circle to a blob”, which is a very simple logical change.

At this juncture though, how complex or difficult it is to implement the change from “circle” to “blob” is going to be dependent on how The Program was implemented. If it is calculating circles all over the place, this change might require an entire rewrite of the whole program. Alternatively though if the whole program is written such that it’s using arbitrary regions (one of which happens to be a circle), then the only part that changes is the region calculation. Even better would be if the region is represented by a class! This is a perfect application for a model pattern:

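class Region():
    """Represents a geographical region made of multiple points."""

    def __init__(self):
        # Instance-level list, so regions never share points by accident
        self.points = []

    @classmethod
    def create(cls, center, radius):
        # Creates a new region at center; not implemented yet
        return cls()

    def add_point(self, point):
        self.points.append(point)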

Now we have a class that represents a region, and we can utilize that class to pass around region objects. We can pass Regions around, make modifications, add points, do lots of high level logic, and not have to worry about the details just yet.

3.5. Abstracting Code

  1. Identify a location of complexity or confusion.
  2. Identify the scope or bounds of said complexity.
  3. Move the complexity out of “The Program” and into a structure that encapsulates the complexity.

4. What is encapsulation?

Encapsulation is generally considered part of Object Oriented Programming (OOP), but that’s really just one implementation of encapsulation. It can be achieved with files and functions, only importing as needed.

The important part is separation of concerns, and hiding complexity. To continue with the region example from above, The Program almost certainly does not care if the region is a circle, square, pseudo circle, or any other shape. Furthermore, determining the correct shape might be a very complex and intensive process.

Now let’s imagine that the “thing” we’re going to be doing with the region is determining if an entirely new point is inside the region or outside of the region.

This is very testable if we have modified our Program to use Region objects.

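class Region():
    """Represents a geographical region made of multiple points."""

    def __init__(self):
        # Instance-level list, so regions never share points by accident
        self.points = []

    @classmethod
    def create(cls, points):
        # Creates a new region from given points
        r = cls()
        r.points = list(points)
        return r

    def add_point(self, point):
        self.points.append(point)

    def is_point_inside(self, point):
        # Assume some sort of manual laborious process here
        is_inside = True  # placeholder until the real calculation exists
        return bool(is_inside)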

Because we’re using a Region class, we can write a unit test for the new method. Because we’ve abstracted the region complexity, we can test the is_point_inside method using a square and, perhaps more interestingly, we can write the test such that it’s self-evident that test_region below should pass:

def test_region():
    square = [(0, 0), (0, 5), (5, 5), (5, 0)]
    r = Region.create(square)
    middle = (2.5, 2.5)
    assert r.is_point_inside(middle) == True, "Failure: 2.5,2.5 is not inside 5,5"

5. What are “patterns”?

Patterns, in the software world, are very similar to the same concept in sewing or other flat manufacturing processes. Very simply, a pattern is a general shape that has worked well in the past for a given application. In software, “patterns” refers to which objects and what shapes are used for a project. Over the course of several decades of software creation, there have emerged some “good” patterns, where “good” usually means “abstracted complexity and remained flexible and maintainable over time”.

5.1. What are some common patterns?

5.1.1. Model

The model pattern might be the simplest, and perhaps the most elegant. It’s the M in the much vaunted “MVC” pattern. A model represents a discrete “thing” (I know that’s not helpful yet), which can be a real world “thing” like a user, a company, a home, or as in the example above: a geographic region.

class Region():
    """Represents a geographical region made of multiple points."""
    points = []

This pattern is also sometimes known as “Active Model”, generally when the model performs some validation. Here’s an explicit validation method:

class Region():
    """Represents a geographical region made of multiple points."""
    points = []

    def is_valid(self):
        """Confirm that the region is a valid shape."""
        assert isinstance(self.points, list)
        assert len(self.points) > 2
        return True

This allows us to hide complexity in the Region object. Once we have done that, then operations that occur on the Region level become possible, and can expose the business logic while hiding the implementation details. Imagine now that we’re asked to combine regions. If The Program were littered with `square = [(0, 0), (0, l), (l, l), (l, 0)]` snippets all over the place, we’d likely need to tear the whole thing apart! However, if we have a Region class, then we could write a Region.combine method like the following:

class Region():
    def combine(self, new: "Region") -> "Region":
        # Build a fresh region so that neither input is modified
        r = Region.create(list(self.points))
        for p in new.points:
            r.add_point(p)

        return r

This means that our “The Program” code can look like this:

def main(existing_region, new_region, new_point):
    if existing_region.is_point_inside(new_point):
        # point is in existing region return that region
        return existing_region

    if new_region.is_point_inside(new_point):
        # point is in expanded region return that region
        return existing_region.combine(new_region)

    return False

The above has hidden the complexity involved with shapes, regions, projections and is really just the business logic. One can certainly comment on this, but it should be clear from the above code that it:

  • Returns the existing region if the new point is in the existing region
  • Returns the combined regions if the new point is in the new region
  • Returns false if the point is in neither region

This logic does not need to “know” about the details of the shape creation, it just “knows” that it’s manipulating regions.
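
To show that claim holding up, here is a runnable sketch in which the Region is a deliberately crude toy: “inside” just means inside the bounding box of its points. That toy implementation is my assumption for illustration only; the `main` function is unchanged from above and has no idea how crude the geometry is.

```python
class Region:
    """Toy region: 'inside' means inside the bounding box of the points."""

    def __init__(self, points):
        self.points = list(points)

    def is_point_inside(self, point):
        xs = [x for x, _ in self.points]
        ys = [y for _, y in self.points]
        return min(xs) <= point[0] <= max(xs) and min(ys) <= point[1] <= max(ys)

    def combine(self, new):
        return Region(self.points + new.points)


def main(existing_region, new_region, new_point):
    if existing_region.is_point_inside(new_point):
        return existing_region
    if new_region.is_point_inside(new_point):
        return existing_region.combine(new_region)
    return False


existing = Region([(0, 0), (0, 5), (5, 5), (5, 0)])
new = Region([(5, 0), (5, 5), (10, 5), (10, 0)])

assert main(existing, new, (2, 2)) is existing          # first branch
assert len(main(existing, new, (8, 2)).points) == 8     # combined region
assert main(existing, new, (20, 20)) is False           # neither region
```

Swapping the bounding box check for a real point-in-polygon calculation later would change the Region class and nothing else.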

Models are sometimes referred to as “Active Record” when the model represents a row in a database. In that case you might have some methods in the model that act as a facade for the database operations. An example here is CRUD operations: the business logic very likely does not care if “saving” a region results in an insert into the database or an update of an existing row. The business logic often only wants to ensure that the region is saved. A trivial abstraction of that logic would look like this:

class Region():
    """Represents a geographical region made of multiple points."""
    points = []

    def save(self):
        if self.is_new():
            return self.insert()
        else:
            return self.update()

5.1.2. View

The view pattern, the V in MVC, generally speaks to separation of the display logic from everything else. An example of this might be an API that returns JSON today, and we might imagine that the ask is to add the ability to return YAML.

The business logic won’t change at all, just the manner in which the client views the response. If the application was written in such a way that the response is format agnostic up until the last second, then it would be trivial to return YAML instead of JSON, but if The Program is littered with JSON specific behavior, then The Program might need to be pretty much torn apart.
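
A small sketch of “format agnostic up until the last second”. To keep it free of third party dependencies, the second format is a hand-written “key: value” renderer standing in for a real YAML library; all of the function names here are hypothetical.

```python
import json


def render_json(data):
    return json.dumps(data, sort_keys=True)


def render_plain(data):
    # Hand-written "key: value" lines; a stand-in for a real YAML library.
    return "\n".join(f"{key}: {data[key]}" for key in sorted(data))


RENDERERS = {"json": render_json, "plain": render_plain}


def respond(data, fmt="json"):
    # The business logic hands over plain data; the format is chosen last.
    return RENDERERS[fmt](data)


payload = {"name": "storm", "points": 3}
assert respond(payload) == '{"name": "storm", "points": 3}'
assert respond(payload, "plain") == "name: storm\npoints: 3"
```

Adding another output format means adding one function and one dictionary entry; nothing that produces `payload` has to know about it.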

5.1.3. Controller

Controller is the traditional “catch-all” for business logic and it’s where you might look for “The Program”. That might look something like this:

class WeatherController():
    def main(self):
        pass

That’s an unnecessarily simple example; something that shows off some business logic might look more like this:

from repos import RegionRepository

class WeatherController():
    def grow_active(self):
        for active_region in self.active_regions:
            active_region.expand_by(meters=3)
            active_region.save()

    def main(self):
        if self.is_growing:
            self.grow_active()

Then the entrypoint might look like this:

from controllers import WeatherController

if __name__ == "__main__":
    WeatherController().main()

The interesting part here is that we don’t care where the region is stored, and we don’t care what its shape or structure is; all that complexity is hidden elsewhere.

Now we have moved “The Program” from our main.py file and into an object called “Controller”. One might be tempted to ask: what does that get me? This structure gives the developer more control over the inputs and outputs of the controller, in the form of unit tests and mocks, which we will cover later on in the unit test section.

5.1.4. Repository

Repository patterns are for getting objects. You might often see these when working with a database or other data store. The business logic does not need to be burdened with where the objects are coming from, or how they are stored! To continue with our Region example, we might want to fetch all the regions that are “active” with the following:

class RegionRepository():
    def fetch_active(self):
        # stub implementation, return a single region in a list
        return [Region()]

Additionally, let’s imagine that there are a dozen or so places where we “expand” a region, so let’s give the region a method to encapsulate that behavior:

class Region():
    def expand_by(self, meters):
        # TODO: for every given point in the shape, move the point x meters
        # away from the epicenter of the region
        return True
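
One possible reading of that TODO, sketched under some loud assumptions: the “epicenter” is taken to be the plain average of the vertices, and coordinates are treated as meters on a flat plane (no projection). Neither is specified above, so treat this as an illustration rather than the intended implementation.

```python
import math


class Region:
    def __init__(self, points):
        self.points = list(points)

    def expand_by(self, meters):
        # Assumption: the epicenter is the average of the vertices.
        cx = sum(x for x, _ in self.points) / len(self.points)
        cy = sum(y for _, y in self.points) / len(self.points)
        moved = []
        for x, y in self.points:
            dist = math.hypot(x - cx, y - cy)
            if dist == 0:
                moved.append((x, y))  # a vertex on the epicenter has no direction
                continue
            scale = (dist + meters) / dist  # push the vertex outward along its ray
            moved.append((cx + (x - cx) * scale, cy + (y - cy) * scale))
        self.points = moved
        return True


diamond = Region([(1, 0), (0, 1), (-1, 0), (0, -1)])
diamond.expand_by(1)
assert diamond.points == [(2, 0), (0, 2), (-2, 0), (0, -2)]
```

Whatever the real rule ends up being, the dozen or so callers keep calling `expand_by(meters=3)` and never find out.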

Let’s also combine the controller and repository pattern, by making the repository into a member of the controller:

class WeatherController():
    def __init__(self):
        self.region_repo = RegionRepository()

    def grow_active(self):
        for active_region in self.region_repo.fetch_active():
            active_region.expand_by(meters=3)
            active_region.save()

    def main(self):
        if self.is_growing():
            self.grow_active()

Now this is getting pretty close to pseudo code. Ideally, I want the main part of a controller to be something that a non-developer can follow.

This very clearly fetches the active regions. We don’t need to know how, or even from where: maybe it’s Postgres, maybe it’s Dynamo, maybe it’s S3! Those are all implementation details and don’t need to be exposed in the controller. For each of the active regions it then expands the region and saves the region back to wherever it is/was. We don’t need to care about how it expands the region; perhaps that will change, or isn’t even defined right now (which is the case with our example).
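
Here is the whole arrangement wired together so that it actually runs. The Region and repository below are throwaway stand-ins of my own (a `radius` number instead of real geometry, a list instead of a data store, and a `growing` flag behind `is_growing`); only the controller’s two methods are meant to mirror the code above. The repository method is called `fetch_active` to match the repository class.

```python
class Region:
    def __init__(self):
        self.radius = 10  # hypothetical stand-in for real geometry
        self.saved = False

    def expand_by(self, meters):
        self.radius += meters

    def save(self):
        self.saved = True


class RegionRepository:
    def __init__(self, regions):
        self._regions = regions  # a list standing in for a data store

    def fetch_active(self):
        return list(self._regions)


class WeatherController:
    def __init__(self, region_repo, growing=True):
        self.region_repo = region_repo
        self.growing = growing

    def is_growing(self):
        return self.growing

    def grow_active(self):
        for active_region in self.region_repo.fetch_active():
            active_region.expand_by(meters=3)
            active_region.save()

    def main(self):
        if self.is_growing():
            self.grow_active()


regions = [Region(), Region()]
WeatherController(RegionRepository(regions)).main()
assert [r.radius for r in regions] == [13, 13]
assert all(r.saved for r in regions)
```

Passing the repository into the constructor, rather than creating it inside, is a small departure from the version above that makes it easy to hand the controller a different repository in a test.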

6. Unit testing

Unit tests are a tool that allows the developer (and stakeholders) to ensure that a number of pieces are working properly.

6.1. What are unit tests good at?

Unit tests are really good at making sure that there are no unintended changes in a changeset. Everyone is likely familiar with explicit change requests, some examples being:

  • Please make the regions grow by 10 meters.
  • Please make the region expansion logarithmic, not linear.

However, there are almost always implicit requirements:

  • Don’t break anything
  • Don’t make any unintended changes

These implicit requirements are often much harder to actively confirm. How do you prove that everything is working as it was? That’s a lot of manual testing, especially for “unhappy paths”. A good example is: Does it still handle a database timeout gracefully? When the “database” is behind a Repository pattern, we can instruct the Repository object to behave as if it has timed out, and then see what the controller does about it. A common manner by which to achieve this is with Mock Objects.

6.2. What are Mock Objects?

Mock Objects, Stubs, or just Mocks are all variants of the same idea, namely: replace part of the code with a known entity, and observe how the other “Real” code behaves. This is an example that very explicitly mocks out the RegionRepository (the mocking process is often abstracted away, but we’re not going to do that here so we can see all the parts):

class TimeoutRegionRepository(RegionRepository):
    # Only saving is overridden; everything else is inherited from the real repo
    def save_region(self, region: Region):
        raise TimeoutException()

The class above defines a “Mock” repository that raises a TimeoutException any time someone tries to save a region with it. Because TimeoutRegionRepository extends the regular repository, it otherwise behaves exactly like the real one. This means that we can write a test to ensure that the controller behaves correctly when there’s a timeout and, more importantly, that unit test ensures it stays that way. Let’s write the test first:

def test_db_timeout():
    # Instantiate the controller we're going to test
    controller = WeatherController()

    # Replace the regular repo with the timeout repo
    controller.region_repo = TimeoutRegionRepository()
    controller.main()

    assert controller.error == "ERROR: Database timeout"

Now let’s implement that behavior in the controller (this is, again, trivial behavior). In the following, we set an error message when there’s a database timeout. This will allow the controller to do something appropriate; perhaps that’s notifying the client, or maybe not. We don’t need to care right now.

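# Assumption: TimeoutException is defined alongside RegionRepository
from repos import RegionRepository, TimeoutException

class WeatherController():
    def __init__(self):
        self.region_repo = RegionRepository()
        self.error = None

    def record_timeout(self):
        self.error = "ERROR: Database timeout"

    def grow_active(self):
        # Fetch through the repo so that a test can swap it out
        for active_region in self.region_repo.get_active():
            active_region.expand_by(meters=3)
            try:
                # Save through the repo; this is the call the timeout repo overrides
                self.region_repo.save_region(active_region)
            except TimeoutException:
                self.record_timeout()

    def main(self):
        if self.is_growing():
            self.grow_active()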
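
For readers who want to run the hand-rolled mock end to end, the following is a minimal, self-contained sketch. The bodies of Region, RegionRepository, and is_growing are stand-ins invented here so that the example executes; only the shape of the controller and of the timeout repository follows the text.

```python
class TimeoutException(Exception):
    """Hypothetical error raised when the data store does not answer in time."""


class Region:
    # Hypothetical minimal region: just a size in meters
    def __init__(self, size=0):
        self.size = size

    def expand_by(self, meters):
        self.size += meters


class RegionRepository:
    # Stand-in for the real repository; keeps everything in lists
    def __init__(self):
        self.regions = [Region(size=1)]
        self.saved = []

    def get_active(self):
        return self.regions

    def save_region(self, region):
        self.saved.append(region)


class TimeoutRegionRepository(RegionRepository):
    # Only saving is overridden; get_active() is inherited unchanged
    def save_region(self, region):
        raise TimeoutException()


class WeatherController:
    def __init__(self):
        self.region_repo = RegionRepository()
        self.error = None

    def is_growing(self):
        return True  # assumption: always growing in this sketch

    def grow_active(self):
        for active_region in self.region_repo.get_active():
            active_region.expand_by(meters=3)
            try:
                self.region_repo.save_region(active_region)
            except TimeoutException:
                self.error = "ERROR: Database timeout"

    def main(self):
        if self.is_growing():
            self.grow_active()


def test_db_timeout():
    controller = WeatherController()
    controller.region_repo = TimeoutRegionRepository()
    controller.main()
    assert controller.error == "ERROR: Database timeout"


def test_happy_path():
    # The same controller with the regular repo records no error
    controller = WeatherController()
    controller.main()
    assert controller.error is None
    assert controller.region_repo.saved[0].size == 4


test_db_timeout()
test_happy_path()
```

The happy-path test is included to show that the mock changes only the one behavior under test: with the regular repository, the same controller saves the region and records no error.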

If you are familiar with Mocks, then you might complain that we’re manually implementing an expectation here, and you would be right. This behavior is reused so often that we have a lot of tools to do these things in a more consistent manner. Mock object expectations are one such tool. A test that leverages the patch decorator to replace the repo, together with expectations, might look like this:

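from unittest.mock import MagicMock, patch

# Assumption: TimeoutException is defined alongside RegionRepository
from repos import TimeoutException
from src.controllers.weather_controller import WeatherController

# Patch the RegionRepository name as it was imported by the controller module
@patch("src.controllers.weather_controller.RegionRepository")
def test_db_timeout(mock_repo_class):
    # The controller will receive this mock when it instantiates its repo
    mock_region_repo = mock_repo_class.return_value
    mock_region_repo.get_active.return_value = [MagicMock()]
    mock_region_repo.save_region.side_effect = TimeoutException()

    # Instantiate the controller we're going to test
    controller = WeatherController()
    controller.main()

    # Expectation: the controller tried to save exactly once
    mock_region_repo.save_region.assert_called_once()
    assert controller.error == "ERROR: Database timeout"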
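
As a runnable illustration of mock expectations, the sketch below uses `patch.object`, `side_effect`, and `assert_called_once_with` from the standard library's `unittest.mock`. The controller is a trimmed stand-in defined inline (taking the repo as a constructor argument is an assumption made so the example needs no importable module path), and `save_region` is the assumed name of the repo's save method.

```python
from unittest.mock import MagicMock, patch


class TimeoutException(Exception):
    """Hypothetical timeout error, defined here so the sketch is self-contained."""


class WeatherController:
    def __init__(self, region_repo):
        self.region_repo = region_repo
        self.error = None

    def grow_active(self):
        for active_region in self.region_repo.get_active():
            active_region.expand_by(meters=3)
            try:
                self.region_repo.save_region(active_region)
            except TimeoutException:
                self.error = "ERROR: Database timeout"


def test_db_timeout_with_expectations():
    region = MagicMock()
    controller = WeatherController(region_repo=object())

    # patch.object swaps the attribute for a MagicMock inside the with block
    with patch.object(controller, "region_repo") as mock_repo:
        mock_repo.get_active.return_value = [region]
        mock_repo.save_region.side_effect = TimeoutException()
        controller.grow_active()

        # Expectations: which calls were made, and with which arguments
        region.expand_by.assert_called_once_with(meters=3)
        mock_repo.save_region.assert_called_once_with(region)

    assert controller.error == "ERROR: Database timeout"


test_db_timeout_with_expectations()
```

The expectations check how the controller talked to its collaborators, not only the final state, which is the part the hand-written TimeoutRegionRepository could not verify on its own.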

Ideally, a unit test will point at one specific part of The Program and the test will act as a jigsaw puzzle piece, where the results of the unit test must match the expected results, confirming that the program behaves the same way as it used to in that one very specific area.

Going back to the Region example, let’s assume that there is logic in the combine method that removes any internal points (because the shape of the region can be expressed by only the outermost points). We can write a test for this behavior, without even knowing what the implementation of the internal point removal looks like.

def test_region_combination():
    # Assumption: a Region can be built from a list of (x, y) points
    square_a = Region([(0, 0), (0, 5), (5, 5), (5, 0)])
    square_b = Region([(0, 0), (0, 9), (9, 9), (9, 0)])

    # Combine the small and large squares
    square_c = square_a.combine(square_b)

    # Because square_a is entirely contained in square_b, the result should be identical to square_b
    assert square_c == square_b

This gives us confidence in the combination logic, while at the same time allowing us to abstract the details of that away from the main program.
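
To show that the test really is independent of the implementation, here is one possible Region that would satisfy it. This is only a sketch under assumptions: it treats "removing internal points" as computing a convex hull (Andrew's monotone chain), and the constructor, the points attribute, and the equality rule are all invented for illustration. Any other implementation that keeps only the outermost points would pass the same test.

```python
class Region:
    # Hypothetical Region that stores only its outermost points
    def __init__(self, points):
        self.points = self._outer_points(points)

    @staticmethod
    def _cross(o, a, b):
        # Positive when o -> a -> b turns counter-clockwise
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    @classmethod
    def _outer_points(cls, points):
        pts = sorted(set(points))
        if len(pts) <= 2:
            return pts
        lower, upper = [], []
        for p in pts:
            # Drop points that are inside or collinear with the lower boundary
            while len(lower) >= 2 and cls._cross(lower[-2], lower[-1], p) <= 0:
                lower.pop()
            lower.append(p)
        for p in reversed(pts):
            # Same for the upper boundary, walking right to left
            while len(upper) >= 2 and cls._cross(upper[-2], upper[-1], p) <= 0:
                upper.pop()
            upper.append(p)
        return lower[:-1] + upper[:-1]

    def combine(self, other):
        # Internal points are removed when the new Region is constructed
        return Region(self.points + other.points)

    def __eq__(self, other):
        # Two regions are equal when they share the same outer points
        return isinstance(other, Region) and set(self.points) == set(other.points)


def test_region_combination():
    square_a = Region([(0, 0), (0, 5), (5, 5), (5, 0)])
    square_b = Region([(0, 0), (0, 9), (9, 9), (9, 0)])
    square_c = square_a.combine(square_b)
    assert square_c == square_b


test_region_combination()
```

If the point removal were later swapped for a different algorithm, the test would not need to change, as long as the outer points stay the same.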

Author: Ashton Honnecke

Created: 2023-08-22 Tue 09:43