Saving Time Running Subsets of Tests with Tagging

Posted on Tue 01 May 2018 in TDDA

It is common, when working with tests for analytical processes, for test suites to take non-trivial amount of time to run. It is often helpful to have a convenient way to execute a subset of tests, or even a single test.

We have added a simple mechanism for allowing this to unittest-based tests in the ReferenceTest functionality of the tdda Python library. It is based on tagging tests.

The quick summary is:

A decorator called tag can be imported and used to decorate individual tests or whole test classes (by preceding the test function or class with @tag).
When a script calling ReferenceTest.main() is run, if the flag --tagged (or –1, the digit one) is used on the command line, only tagged tests and tests from tagged test classes will be run.
There is a second new option, --istagged (or –0, the digit zero). When this is used, the software will report only which test classes are tagged, or contain tests that are tagged, and will not actually run any tests. This is helpful if you have a lot of test classes, spread across different files, and want to change the set of tagged tests.

Benefits

The situations where we find this particularly helpful are:

Fixing a broken test or working on a new feature or dataset. We often find ourselves with a small subset of tests failing (perhaps, a single test case), either because we're adding a new feature, or because something has changed or we are working with data that has slightly different characteristics. If the tests of interest run in a few seconds, but the whole test suite takes minutes or hours to run, we can iterate dramatically faster if we have an easy way to run only the subset of tests currently failing.
Re-writing test output. The tdda library provides the ability to re-write the expected ("reference") output from tests with whatever the code is currently generating, using the --write-all command-line flag. If it's only a subset of the tests that have failed, there is real benefit in re-writing only the output for the previously failing tests, rather than for all tests. This is particularly true if the reference outputs contain some differences each time (version numbers, dates etc.) that are being ignored using the ignore-lines or ignore-patterns options provided by the library. If we re-write all the tests, and then look at which files have changed, we might see differences in all reference files, whereas if we only regenerate the tests with meaningful changes, we will avoid committing changes that were not required.

Example

We'll illustrate this with a simple example. The code below implements four trivial tests across two classes.

Note the import of the tag decorator function near the top, and that the two of the tests—testTwo and testThree in the class Tests—are decorated with the @tag decorator, as is the entire test class MoreTests.

# tests.py
from tdda.referencetest import ReferenceTestCase, tag

class Tests(ReferenceTestCase):
    def testOne(self):
        self.assertEqual(1, 1)

    @tag
    def testTwo(self):
        self.assertEqual(2, 2)

    @tag
    def testThree(self):
        self.assertEqual(3, 3)

    def testFour(self):
        self.assertEqual(4, 4)


@tag
class MoreTests(ReferenceTestCase):
    def testFive(self):
        self.assertEqual(5, 5)

    def testSix(self):
        self.assertEqual(6, 6)


if __name__ == '__main__':
     ReferenceTestCase.main()

If we run this as normal, all six tests run and pass:

$ python tests.py
......
----------------------------------------------------------------------
Ran 6 tests in 0.000s

OK

But if we add the –1 flag, only four tests run:

$ python tests.py -1
....
----------------------------------------------------------------------
Ran 4 tests in 0.000s

OK

Adding the –v (verbose) flag confirms that these four are the tagged tests, as expected:

$ python tests.py -1 -v
testFive (__main__.MoreTests) ... ok
testSix (__main__.MoreTests) ... ok
testThree (__main__.Tests) ... ok
testTwo (__main__.Tests) ... ok

----------------------------------------------------------------------
Ran 4 tests in 0.000s

OK

Finally, if we want to find out which classes include tagged tests, we can use the –0 flag:

$ python tests.py -0
__main__.MoreTests
__main__.Tests

----------------------------------------------------------------------
Ran 0 tests in 0.000s

OK

This is particularly helpful when our tests are spread across multiple files, as the filenames are then shown as well as the class names.

Installation

Information about installing the library is available in this post.

Other Features

Other features of the ReferenceTest capabilities of the tdda library are described in this post. Its capabilities in the area of constraint discovery and verification are discussed in this post, and this post.