TDDA Book Online Serialization

Posted on Wed 27 May 2026 in TDDA

The cover of the book Test-Driven Data Analysis by Nicholas J. Radcliffe, published by Chapman and Hall, part of CRC Press, from the Taylor & Francis Group, as part of the DATA SCIENCE SERIES. The cover is black with mostly white text and a white graphic: a 3-row by 4-column grid of squares, each containing dots laid out on a regular 32×32 grid. The top-left square is full, with 1,024 dots. Working along each row in turn, the number of dots roughly halves each time, in an apparently random pattern (actually a pseudo-random one). The squares in the last row have six, two, two, and one dot.

As announced a few days ago, my book, Test-Driven Data Analysis, is now available for sale from all good booksellers and all sellers of good books, around the world.

The book is aimed at analysts, data scientists, engineers, researchers, and anyone else interested in making analytical processes more reliable, testable, and reproducible.

My main goal in writing it has always been to encourage wider adoption of the ideas of test-driven data analysis. With that in mind, I am delighted to announce that the full content of the book will be available online.

Channelling my inner Charles Dickens, I am releasing one chapter per week. All the auxiliary material is already available, together with Chapter 1, at the TDDA Book's site.

A new chapter will appear each week until mid-September 2026. You can sign up, using this link, to be notified as each chapter is released.

If you would like a physical copy, the publisher is offering a 20% discount with code 26SMA1 (until 30th June 2026) if you order directly from its site.

The book is structured around the analytical lifecycle, its common failure modes, and remedies for them:

The main part of the diagram consists of six circles, running from left to right. Under each of the first five circles is a failure mode and, below that, its class of error:
1. CHOOSE APPROACH. Failure: 'Fail to understand data, problem domain, or methods'. ERROR OF INTERPRETATION (error of formulation). Chapter 13.
2. DEVELOP ANALYTICAL PROCESS. Failure: 'Mistakes during coding'. ERROR OF IMPLEMENTATION (bug). Chapters 9–12.
3. RUN ANALYTICAL PROCESS. Failure: 'Use the software incorrectly'. ERROR OF PROCESS (operator error). Chapter 16.
4. PRODUCE ANALYTICAL RESULTS. Failure: 'Mismatch between development data or assumptions and deployment data'. ERROR OF APPLICABILITY (category error). Chapters 1–7 and 17.
5. INTERPRET ANALYTICAL RESULTS. Failure: 'Misinterpret the results'. ERROR OF INTERPRETATION (communication error). Chapters 14 and 15.
6. ‘First, Do No Harm’. ERROR OF JUDGEMENT. Chapter 17.
Arrows lead to FAILURE and SUCCESS boxes. Remedies and the corresponding book chapters sit underneath the main diagram.

The analytical lifecycle, common failure modes, and the remedies discussed in the book.

In addition to the book itself, the site has links to:

Top line: three machines illustrating:
1. constraint discovery and data validation: the input hopper takes training data and produces constraints at the output chute, or takes training data plus constraints and produces data validations;
2. Rexpy, which takes strings in its input hopper and produces regular expressions at the output chute;
3. tdda gentest, which takes code in the input hopper and produces a Python reference-test script as output.
Bottom line:
4. tdda diff, which compares data in flat files and Parquet files to detect (semantic) differences;
5. tdda.serial, a format for describing flat-file formats, together with a suite of tools for working with tdda.serial, CSVW, and Frictionless;
6. tdda.referencetest, for semantic testing of complex analytical results.

Some of the principal TDDA tools and capabilities covered in the book.