or Why Tests Spontaneously Fail
You might think that if you write a program, and don't change
anything, then come back a day later (or a decade later) and run it
with the same inputs, it would produce the same output. At their core,
reference tests exist because this isn't true, and it's useful to find
out if code you wrote in the past no longer does the same thing it
used to. This post collects together some of the reasons the behaviour of
code changes over time.
The Environment Has Changed
E1 You updated your compiler/interpreter (Python/R etc.)
E2 You updated libraries used in your code (e.g. from PyPI/CRAN).
E3 You updated the operating system of the machine you're running on.
E4 Someone else updated the operating system or library/compiler etc.
E5 Your code uses some other software on your machine (or another
machine) that has been updated (e.g. a database).
E6 Your code uses an external service whose behaviour has changed
(e.g. calling a web service to get/do something).
E7 You have updated/replaced your hardware.
E8 You run it on different hardware (another machine or OS or OS version
or under a different compiler or...)
E9 You move the code to a different location in the file system.
E10 You have changed something in the file system that messes up the code
e.g.
- deleting a file the code uses
- renaming a file the code uses
- editing a file the code uses
- removing or renaming a directory the code uses
- changing permissions on a file or directory the code uses
- creating a file or directory that the code expects to create
and is now unable to, e.g. because of permissions.
E11 You run as a different user.
E12 You run from a different directory while leaving the code in the
same place.
E13 You run the code in a different way (e.g. from a script instead
of interactively, or in a scheduler).
E14 A disk fills or some other resource becomes full or depleted.
E15 The load on the machine is higher, and the code runs out of
memory or disk or some other resource; or has a subtle timing
dependency or assumption that fails under load.
E15a The load on the machine is lower, meaning part of your code
runs faster, causing a race condition to behave differently.
[Added 2022-02-17]
E16 The hardware has developed a fault.
E17 A systems manager has changed some limits e.g. disk quotas,
allowed nice levels, a directory service, some permissions or groups...
E18 A shell variable changed, or was created or destroyed.
E19 The locale in which the machine is running changed.
E20 You changed your PYTHONPATH or equivalent.
E21 A new library that you don't use (or didn't think you used)
has appeared in a site-packages or similar location, and was picked
up by your code or something else your code uses.
E22 You updated your editor/IDE and now whenever you load a file it
gets changed in some subtle way that matters (e.g. line endings, blank
lines at the end of files, encoding, tabs vs. spaces).
E23 The physical environment has changed in some way that affects
the machine you are running on (e.g. causing it to slow down).
E24 A file has been touched and the software determines order
of processing by last update date.
E25 The code uses a password or key that is changed, expires or
is revoked.
E26 The code requires network access and the network is unavailable, slow, or unreliable at the time the test is run.
E27 Almost any of the above (or below), but for a dependency of your code rather than your code itself, e.g. something in a data centre or library.
E28 Your PATH (the list of locations checked for executables) has
changed, or an alias has changed so that the executable you run is
different from before. [Added 2022-02-11]
E29 A different disk or share is mounted, so that even though you
specify the same path, some file that you are using is different from
before. [Added 2022-02-11]
E30 You run the code under a different shell or changed something in a shell startup file. [Added 2022-02-17]
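Several of the items above (E1, E11, E12, E18-E20, E28) can be spotted by recording a snapshot of the environment alongside the test results and comparing it with a later snapshot. The following is a minimal sketch using only the standard library. The function names environment_snapshot and snapshot_diff, and the choice of what to record, are illustrative assumptions rather than part of any particular tool.

```python
import os
import platform
import sys


def environment_snapshot():
    """Collect a few facts about the environment the code is running in."""
    return {
        'python_version': platform.python_version(),
        'executable': sys.executable,
        'platform': platform.platform(),
        'cwd': os.getcwd(),
        # USER is usual on Unix-like systems, USERNAME on Windows
        'user': os.environ.get('USER') or os.environ.get('USERNAME'),
        'LANG': os.environ.get('LANG'),
        'LC_ALL': os.environ.get('LC_ALL'),
        'PATH': os.environ.get('PATH'),
        'PYTHONPATH': os.environ.get('PYTHONPATH'),
        'sys_path': list(sys.path),
    }


def snapshot_diff(old, new):
    """Return {key: (old_value, new_value)} for every key whose value differs."""
    keys = sorted(set(old) | set(new))
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}
```

Saving the snapshot (for example as JSON) next to the reference outputs gives you something concrete to compare when a test that used to pass starts failing. It will not catch everything on the list, but it narrows the search.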
Many of these are illuminated by one of my favourite quotes from
Beth Andres-Beck:
Mocking in unit tests makes the tests more stable because they
don’t break when your code breaks.
— @bethcodes, 2020-12-29T01:26:00Z
https://twitter.com/bethcodes/status/1343730015851069440
The Code Has, in Fact, Changed
C1 You think you didn't change the code, but actually you did.
C2 You did change the code, but only in a way that couldn't
possibly change the behaviour in the case you're testing.
C3 You didn't change the code, you fixed a bug.
C4 You didn't change the code, but someone else did.
C5 You didn't change the code, but disk corruption did.
C6 You didn't change the code, but you did update some data it uses.
C7 You pulled the code again from a source-code repository but
- someone else had pushed a change
- you checked out a different branch
- you pulled from the wrong repository.
C8 You're on the wrong branch.
C9 The system was restored from backup and you lost changes.
C10 You used a hard link to a file and didn't change the file here but
did change it in one of the other linked locations.
C11 You used symbolic links and though your symbolic link
didn't change, the code (or other file or files) it symbolically linked did.
C12 You used a diff tool to compare files, but a difference that
does matter to your code was not detected by the diff tool (e.g. line
endings or capitalization or whitespace).
C13 You are in fact running more tests than previously, or different tests
from the ones you ran previously, without realising it.
C14 You reformatted your code thinking that you were only making
changes to appearance.
C15 You ran a code formatter/beautifier/coding standard enforcement
tool that had a bug in it and changed the meaning.
C16 You believe nothing has changed because git status tells you nothing
has changed, but you are using files that aren't tracked or are ignored.
C17 You think a file hasn't changed because of its timestamp, but the
timestamp is wrong or doesn't mean what you think it means.
C18 A hidden file changed (e.g. a dotfile).
C19 A file that doesn't match a glob pattern you use changed.
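For cases like C12 and C16-C19, where a diff tool, git status, a timestamp or a glob pattern fails to show a change, one independent check is to hash the contents of every file under a directory, hidden and untracked files included, and compare the hashes between runs. This is a minimal sketch; tree_digests and changed_files are hypothetical helper names, and it assumes every file is readable and small enough to read into memory.

```python
import hashlib
import os


def tree_digests(root):
    """Map each file's path (relative to root) to the SHA-256 of its contents.

    Walks everything under root, including dotfiles and files that a
    version-control system would ignore. Symbolic links to directories
    are not followed (the os.walk default).
    """
    digests = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, 'rb') as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            digests[os.path.relpath(path, root)] = digest
    return digests


def changed_files(before, after):
    """Return the sorted relative paths that were added, removed or modified."""
    return sorted(p for p in set(before) | set(after)
                  if before.get(p) != after.get(p))
```

Because the comparison is on raw bytes, a change of line endings or trailing whitespace shows up as a modification, which is the point here. In a real repository you would probably want to skip directories such as .git.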
Also from Beth Andres-Beck:
If you have 100% test coverage and your tests use mocks, no you don’t.
— @bethcodes, 2020-12-29T01:51:00Z
https://twitter.com/bethcodes/status/1343736477839020032
You Aren't Running the Code You Think You Are
There is another set of problems that aren't strictly causes of code
rusting, but which help to explain a set of related situations every
developer has probably experienced, which all fall under the general
heading of you aren't running the code you think you are.
M1 The code you're running is not the version you think it is
(e.g. you're in the wrong directory).
M2 You are running the code on a different server from the one you think
you are (e.g. you haven't realised you're ssh'd in to a different
machine or editing a file over a network).
M3 You're editing the code in one place but running it in another.
M4 You have cross-mounted a file system and it's the wrong file
system or you think you are/aren't using it when you actually
aren't/are (respectively).
M5 Something (e.g. a browser) is caching your code (or some CSS
or an image or something).
M6 The code has in fact run correctly (tests have passed)
but you're looking at the wrong output (wrong directory, wrong tab,
wrong URL, wrong window, wrong machine...)
M7 Your compiled code is out-of-sync with your source code, so you're
not running what you think you are.
M8 You're running (or not running) a virtual environment when you
think you are not (or are), respectively.
M9 You're running a virtual environment and not understanding how
it's doing its magic, with the result that you're not using the libraries/code
you think you are.
M10 You use a package manager that's installed the right libraries
into a different Python (or whatever) from the one you think it has.
M11 You think you haven't changed the code/libraries/Python you're
using, but in fact you did when you updated (what you thought was)
a different virtual (or non-virtual) environment.
M12 You have a conflict between different import directories (e.g.
a local site-packages and a system site-packages), with different
versions of the same library, and aren't importing the one you think you are.
M13 You think the code hasn't changed because you recorded the
version number, but there was a code change that didn't cause the
version number to be changed, or the code has multiple version
numbers, or the code is reporting its version number wrongly, or the
version number actually refers to a number of slightly different
builds that are supposed to have the same behaviour, but don't.
M14 You have defined the same class or method or function or variable
more than once in a language that doesn't mind such things, and are looking
at (and possibly editing) a copy of the relevant function/callable/object
that is masked by the later definition. [Added 2022-09-14]
M15 A web server or application server has your code in memory and
changing or recompiling your code won't have any effect until you restart
that web server or application server. This is really a variation of M5,
but is subtly different because you wouldn't normally think of this as
caching. [Added 2024-03-30]
These are the ones that make you question your sanity.
TIP If what's happening can't be happening, try introducing
a clear syntax error or debug statement or some other change you
should be able to see. Then check that it shows
up as expected when you're running your code.
Almost every time I think I'm losing my mind when coding, it's
because I'm editing and running different code
(or viewing results from different code).
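In Python, a complementary check for cases like M8-M12 is to ask the running interpreter which executable it is and which file a module was actually imported from. This is a minimal sketch; describe_module is a hypothetical helper name, and the virtual-environment test assumes an environment created by venv or a recent virtualenv, where sys.prefix differs from sys.base_prefix.

```python
import importlib
import sys


def describe_module(name):
    """Import a module by name and report where it was actually loaded from."""
    module = importlib.import_module(name)
    return {
        'interpreter': sys.executable,
        'prefix': sys.prefix,
        # Inside a venv, sys.prefix points at the environment and
        # sys.base_prefix at the Python it was created from
        'in_virtualenv': sys.prefix != sys.base_prefix,
        # Built-in modules such as sys have no __file__
        'file': getattr(module, '__file__', None),
        # Many, but not all, packages expose __version__
        'version': getattr(module, '__version__', None),
    }
```

Calling describe_module('json'), or the name of whichever library is behaving strangely, from inside the failing test run shows whether the file and interpreter are the ones you expected.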
Time Has Moved On
T1 Your code has a (usually implicit) date/time dependence in it, e.g.
- it uses 2-digit dates
- it assumes it's running in 2022
- it assumes it's not 29th February, or 1st January, or isn't a weekend...
- it assumes something else that's not true about (computer) time (no
leap seconds, no daylight savings times, no time-zones, no half-hour-aligned
timezones...)
- it uses 2-digit dates with a pivot year and time (or some computed time
the code uses) moves past the pivot year.
T2 Time is 'bigger' in some material way that causes a problem, e.g.
- Y2K
- Unix 2038 (when the number of seconds from 1 Jan 1970 overflows
32-bit integers)
- Number of days since the code was written needs more digits (10, 100, 1000).
T3 While the code is running, daylight savings time starts or stops,
and a measured (local) time interval goes negative.
T4 Your code uses Excel to interpret data and today's a special date
that Excel doesn't (or more likely does) recognize.
T5 The system clock is wrong (perhaps badly wrong); or the system
clock was wrong when you ran it before and is now right.
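To make the pivot-year case in T1 concrete, here is a minimal sketch of a sliding-window expansion of two-digit years. The function name expand_two_digit_year, the 50-year default window and the today parameter are all assumptions made for illustration, not taken from any particular library.

```python
import datetime


def expand_two_digit_year(yy, today=None, window=50):
    """Expand a two-digit year (0-99) to four digits using a sliding pivot.

    The result is the year ending in yy that lies between
    today.year - window and today.year - window + 99 inclusive,
    so the answer depends on when the code is run.
    """
    if today is None:
        today = datetime.date.today()   # the implicit time dependence
    low = today.year - window           # earliest year we will return
    candidate = low - low % 100 + yy    # year ending in yy, same century as low
    if candidate < low:
        candidate += 100
    return candidate


# Same input, different answers as time moves on:
print(expand_two_digit_year(70, today=datetime.date(2020, 1, 1)))  # 1970
print(expand_two_digit_year(70, today=datetime.date(2022, 1, 1)))  # 2070
```

Passing today in explicitly, rather than reading the clock inside the function, is what allows a test to pin the date and keep producing the same answer.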
Resources Used by the Code Have Changed
R1 A resource your code uses (a database, a reference file, a page
on the internet, a web service) returns different data
from the data it always previously returned.
R2 A resource your code uses returns data in a different format
e.g. a different text encoding, different precision, different line endings
(Unix vs. PC vs. Mac), presence or absence of a byte-order marker (BOM)
in UTF-8, presence of new characters in Unicode, different normalization
of Unicode, indented or unindented JSON/XML, different sort order etc.
R3 A resource you depend on returns “the same” data as expected but
something about the interaction is different, e.g. a different status
code or some extra data you can ignore, or some redundant data you
use has been removed.
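Some of the format differences in R2 can be smoothed over before comparison, if (and only if) they genuinely don't matter to your code. This is a minimal sketch for UTF-8 text; canonical_text is a hypothetical helper name, and the choice of NFC normalization and Unix line endings is an assumption.

```python
import unicodedata


def canonical_text(raw_bytes):
    """Decode UTF-8 bytes and remove some format-only differences.

    - decoding with 'utf-8-sig' drops a leading byte-order marker, if present
    - NFC normalization makes composed and decomposed forms compare equal
    - PC and old Mac line endings are converted to Unix line endings
    """
    text = raw_bytes.decode('utf-8-sig')
    text = unicodedata.normalize('NFC', text)
    return text.replace('\r\n', '\n').replace('\r', '\n')


# Two byte strings that differ but represent "the same" text
plain = 'caf\u00e9\n'.encode('utf-8')                             # composed é
fancy = b'\xef\xbb\xbf' + 'cafe\u0301\r\n'.encode('utf-8')        # BOM, e + accent, CRLF
print(plain == fancy)                                  # False
print(canonical_text(plain) == canonical_text(fancy))  # True
```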
Stochastic and Indeterminate Effects
S1 Your code uses random numbers and doesn't fix the seed.
S2 Your code uses random numbers and does fix the main seed
but not other seeds that get used (e.g. the seed for numpy is
different from Python's main seed).
S3 A cosmic ray hits the machine and causes a bit flip.
S4 The code is running on a GPU (or even CPU) that does not,
in fact, always produce the same answer.
S5 The code is running on a parallel, distributed, or multi-threaded
system and there is indeterminacy, a race condition, possible deadlock
or livelock, or any number of other things that might cause indeterminate
behaviour.
S6 Your code assumes something is deterministic or has specified
behaviour that is in fact not deterministic or specified, especially
if that result is the same most but not all of the time, e.g. tie-breaking
in sorts, order of extraction from sets or (unordered) dictionaries,
or the order in which results arrive from asynchronous calls.
S7 Your code relies on something likely but not certain, e.g. that
two randomly-generated, fairly long IDs will be different from each other.
S8 Your code uses random numbers and does fix the main seed, but
the sequence of random numbers has changed. This has happened with
NumPy, where they realised that one of the sampling functions was
drawing unnecessary samples from the PRNG. In making the sampler more
efficient, they changed the samples that were returned for the same
PRNG seed. [Contributed by Rob Moss
(@rob_models and
@rob_models@mas.to), who "had a quick
search for the relevant issue/changelog item, but it was a long time
ago (~NumPy 1.7, maybe)." He "couldn't find the original NumPy issue,
but here's a similar one: https://github.com/numpy/numpy/issues/14522".
Thanks, Rob!]
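Two of the more tractable cases, S1/S2 (seeds) and S6 (set ordering), can be guarded against along the following lines. This is a minimal sketch using only Python's standard random module; sample_ids is a hypothetical function, and it does nothing to protect against S8, where the algorithm behind a sampler changes between library versions.

```python
import random


def sample_ids(population, k, seed):
    """Draw k items reproducibly from a possibly unordered collection.

    Sorting first removes any dependence on iteration order (S6);
    a private Random instance means the result neither depends on
    nor disturbs the state of the global generator (S1, S2).
    """
    rng = random.Random(seed)
    return rng.sample(sorted(population), k)
```

Within one version of Python, sample_ids({'b', 'a', 'd', 'c', 'e'}, 3, seed=42) returns the same three items however the set happens to be ordered and whatever other code has done to the global random state.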
It Never Worked (or didn't work when you thought it did)
[Added 2024-07-19]
I realised there's another whole class of errors of process/errors
of interpretation that could lead us to think that code has “rusted”
despite not having been changed. These are all broadly the same as one
of the explanations offered before, but now for the original run
when you thought it worked, rather than for the current or new run,
when it fails.
N1 You thought you ran the code before, and that it worked correctly,
but you are mistaken: you didn't run it at all, or it in fact failed
but you did not notice.
N2 You did run the code before, but picked up the output from
a previous state, before you broke it, when it did work.
N3 You did run the code before, and it did produce the
wrong output then as now, but you used a defective procedure
or tool to examine the output then, and failed to realise
it was wrong/failing.
N4 You did run the code before, and it did pass, but you
passed the wrong parameters/inputs/whatever and are now passing
the correct (or different) parameters/inputs/whatever so it now
fails as it would have done then if you had done the same.