Automatically Adding Tests (e.g., Unit, Integration) to a Legacy Code Base

Many of us have written code that lacks tests to ensure code correctness and protect against human error. Most of us have worked with legacy code bases that have lacked these tests. Oftentimes the code has been written in a way that makes such testing quite difficult to add after the fact.

I’m curious whether there are currently tools available that automatically add tests to legacy code bases. I understand that in many cases the code alone may not provide enough context for the automated creation of foolproof tests. For example:

a = 10
b = 23
while z < 100:
  t = (b + a) * 2

The variable names tell us nothing about what the values represent: are they ages? monetary values? temperatures? And z appears out of nowhere, potentially a value which is mutated numerous times before being used by this code. And so on…

While we might not be able to provide mathematically provable tests for this code, could we not provide “good enough” tests in many instances? For example, if I wanted to ensure I wasn’t accidentally altering my code’s behavior, I could create a test with a limited (but sufficient) number of inputs to create a baseline.

In the example above, this might mean running the code with a series of z values such as 1; -10; 1.22; 5.88; 100; 100,000,000; 99; 99.887492421; and so on. The outputs would then serve as a baseline. Let’s say that with these values they came out looking something like 10; 100; 10; 50; 10,000,000,000; 990; 990; and so on.

As I made modifications to the code, the test could be rerun, and it would fail if the same value for z produced a different output: for instance, if inputting -10 resulted in -100 instead of the expected baseline of 100.

This still leaves a human to make the final determination on whether the changed result is actually correct, but it at least makes the developer aware that a change has occurred.
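The baseline idea above can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions: `legacy_compute` is a hypothetical stand-in for the legacy code (its return values are made up and do not match the example numbers above), and `record_baseline` and `check_against_baseline` are names invented here, not part of any existing tool.

```python
import json
from pathlib import Path


def legacy_compute(z):
    """Hypothetical stand-in for the legacy code under test."""
    a, b = 10, 23
    return (b + a) * 2 if z < 100 else 0


def record_baseline(func, inputs, path):
    """Run func on each input and save the observed outputs as the baseline."""
    baseline = {repr(x): func(x) for x in inputs}
    Path(path).write_text(json.dumps(baseline, indent=2))
    return baseline


def check_against_baseline(func, inputs, path):
    """Return (input, expected, actual) for every output that differs from the baseline."""
    baseline = json.loads(Path(path).read_text())
    changes = []
    for x in inputs:
        expected = baseline[repr(x)]
        actual = func(x)
        if actual != expected:
            changes.append((x, expected, actual))
    return changes
```

One would call `record_baseline` once against the untouched code, then call `check_against_baseline` after each modification. An empty list means nothing changed for the sampled inputs; a non-empty list tells the developer which inputs to look at, without claiming which result is the correct one.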

Beyond these more difficult cases, there will be code segments that are quite easy to test because the available inputs and expected outputs are known quantities. For example, this should be easier:

int myAge = 10;
int magicAge = 23;
int randomFactor = generateRandomNumber();
while (randomFactor < 100) {
  int theNewAge = (myAge + magicAge) * 2;
}

In this case we have clearly limited input and output values. We know that the potential range of numbers is limited to those supported by the integer data type. Assuming generateRandomNumber() is fairly self-contained, we should also have a good idea of what values it will return, and in the worst case we still know that a valid value cannot fall outside the range of an integer.
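Because the input range is bounded, a tool (or a person) could pick test inputs from the edges of that range and around the comparison threshold. Here is a minimal Python sketch, assuming a 32-bit signed integer. `boundary_inputs` and `new_age` are hypothetical names, and `new_age` is a simplified stand-in that takes the random value as a parameter (and uses an `if` in place of the loop) so that it can be controlled in a test.

```python
# Assumption: the integer type is 32-bit signed.
INT_MIN, INT_MAX = -2**31, 2**31 - 1


def boundary_inputs(threshold=100):
    """Values at the edges of the integer range and around the threshold."""
    candidates = {
        INT_MIN, INT_MIN + 1, -1, 0, 1,
        threshold - 1, threshold, threshold + 1,
        INT_MAX - 1, INT_MAX,
    }
    return sorted(candidates)


def new_age(random_factor, my_age=10, magic_age=23):
    """Hypothetical testable stand-in: the random value is passed in, not generated inside."""
    if random_factor < 100:
        return (my_age + magic_age) * 2
    return None


# Observed outputs for each boundary input, usable as a baseline.
observed = {value: new_age(value) for value in boundary_inputs()}
```

Passing the random value in, rather than calling generateRandomNumber() inside the code, is a design choice made here only so the test can be deterministic.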

Okay, a bit of a long-winded question. To sum up: “Is there an automated way to add tests to legacy code bases?” with the caveat that, “Some tests need not be mathematically provable, only pragmatically useful.”
