TDDA: The Book, the 3.0 Library, and the PyData London 2026 Tutorial
Posted on Tue 19 May 2026 in TDDA
This blog has been quite quiet, but there is a great deal of news and it may be less quiet for a while.
The Book
Today, 19th May 2026, sees the world-wide release of Test-Driven Data Analysis, from CRC Press.
It is available from all good booksellers and all sellers of good books, and until 30th June 2026 the code 26SMA1 will give a 20% discount from the publisher's site.
The book covers:
- the TDDA methodology
  - including areas not obviously amenable to software support, such as errors of interpretation, errors of applicability, errors of process, and errors of judgement
- the TDDA command-line tools for
  - data validation,
  - reference-test generation with Gentest (tests for code in any language),
  - a diff tool for on-disk data frames (as parquet files and flat files),
  - tools for working with the tdda.serial format and also with CSVW (CSV on the Web) and Frictionless.
- Reference testing with tdda.referencetest under unittest or pytest
- Test-Driven Document Development (TDDD)
- APIs for all functionality
Resources from the book are available at book.tdda.info, including
- 22 Checklists
- All figures
- Glossary
- Data Profiles
- Data Dictionaries
- TDDD tests for the book.
Examples from the book are available from the tdda library by using the tdda command:
tdda examples book
The whole of TDDA is really built around the encapsulation of the data-analysis cycle shown below, and the diagram shows how the book covers these ideas.
The TDDA Library, Version 3.0
Version 3.0 of the library and command-line tools is a major upgrade.
All the main features have upgrades:
- Data validation using constraints, which can be generated from training data.
- Inference of regular expressions from example strings.
- Automatic generation of tests for almost any non-GUI code in any language (Gentest). "Gentest writes tests so you don't have to."™
- Enhanced test support for complex results in both Python's unittest and in pytest with reference testing.
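For readers new to the first item in this list, here is a minimal sketch of the idea behind generating constraints from training data and then using them to validate new data. It uses only the Python standard library and is deliberately not the tdda API: the function names discover_constraints and verify, the constraint keys, and the sample records are all hypothetical, chosen purely for illustration.

```python
# Illustrative sketch only: this is NOT the tdda API.
# discover_constraints, verify, and the constraint keys are hypothetical.

def is_number(value):
    """True for ints and floats, but not for booleans."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def discover_constraints(rows):
    """Infer simple per-field constraints from training rows (a list of dicts)."""
    constraints = {}
    fields = {name for row in rows for name in row}
    for name in sorted(fields):
        values = [row.get(name) for row in rows]
        present = [v for v in values if v is not None]
        c = {"allow_nulls": len(present) < len(values)}
        types = {type(v).__name__ for v in present}
        if len(types) == 1:
            c["type"] = types.pop()
        if present and all(is_number(v) for v in present):
            c["min"] = min(present)
            c["max"] = max(present)
        constraints[name] = c
    return constraints


def verify(rows, constraints):
    """Return (row_index, field, problem) for every failed constraint."""
    failures = []
    for i, row in enumerate(rows):
        for name, c in constraints.items():
            v = row.get(name)
            if v is None:
                if not c["allow_nulls"]:
                    failures.append((i, name, "null not allowed"))
                continue
            if "type" in c and type(v).__name__ != c["type"]:
                failures.append((i, name, "wrong type"))
                continue
            if "min" in c:
                if not is_number(v):
                    failures.append((i, name, "not numeric"))
                elif v < c["min"]:
                    failures.append((i, name, "below min"))
                elif v > c["max"]:
                    failures.append((i, name, "above max"))
    return failures


# Made-up sample data, for illustration only.
training = [
    {"age": 31, "name": "Ada"},
    {"age": 47, "name": "Grace"},
    {"age": 25, "name": "Alan"},
]
new_data = [
    {"age": 39, "name": "Edsger"},
    {"age": 112, "name": "Barbara"},
    {"age": None, "name": "Donald"},
]

constraints = discover_constraints(training)
failures = verify(new_data, constraints)
for failure in failures:
    print(failure)
```

In this toy version the constraints are just the observed type, range, and nullability of each field, so anything in the new data outside what the training data showed is reported as a failure. Real constraint discovery covers far more than this, and the library's own documentation is the place to look for its actual interface.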
New features include:
- Support for Pandas 3.0, including all three backends (original, numpy_nullable, and pyarrow).
- Support for Polars DataFrames in most areas of the library.
- Comprehensive Parquet support, replacing feather format.
- tdda diff: find and visualize differences between datasets in flat files (like CSV files) and parquet files, with control over specificity and scope.
- Flat-file metadata support: the new tdda.serial format allows the format of CSV and other flat files to be described for accurate reading across libraries. This includes inference of flat-file formats, Python code generation, helper functions for reading and writing flat files with metadata, and conversion between tdda.serial, CSVW (CSV on the Web), and Frictionless.
- Text utilities for Unicode, including glyph counting and extended normalization forms beyond canonical composition and decomposition (NFC, NFD), and kompatibility normalization (NFKC and NFKD). Form NFTK performs further kompatibility normalization including accent stripping.
- Man pages for all commands.
- Upgraded documentation for command line tools and the API.
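As background to the Unicode item above, the four standard normalization forms can be tried out with Python's built-in unicodedata module. The sketch below does not use the tdda text utilities and does not implement NFTK. The strip_accents helper is a hypothetical illustration of the general idea of accent stripping (decompose, then drop combining marks), and may well differ from what the library does.

```python
import unicodedata


def strip_accents(text):
    """Hypothetical helper: compatibility-decompose, then drop combining marks."""
    decomposed_text = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed_text if not unicodedata.combining(ch))


composed = "caf\u00e9"     # e-acute as a single code point
decomposed = "cafe\u0301"  # 'e' followed by a combining acute accent
ligature = "\ufb01le"      # the 'fi' ligature followed by 'le'

# The two spellings render the same way but are different strings.
assert composed != decomposed

# Canonical composition (NFC) and decomposition (NFD) convert between them.
nfc_equal = unicodedata.normalize("NFC", decomposed) == composed
nfd_equal = unicodedata.normalize("NFD", composed) == decomposed

# Compatibility normalization (NFKC) also replaces the ligature with 'f' + 'i';
# canonical normalization (NFC) leaves the ligature alone.
nfkc_ligature = unicodedata.normalize("NFKC", ligature)
nfc_ligature = unicodedata.normalize("NFC", ligature)

stripped = strip_accents(composed)

print(nfc_equal, nfd_equal, nfkc_ligature, stripped)
```

The point of the example is that strings which look identical on screen can differ in their code points, which matters whenever text values are compared, joined on, or validated.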
PyData London TDDA Tutorial, 5th June 2026, 14:10
I'll be giving a 90-minute hands-on tutorial on TDDA on 5th June 2026 at PyData London. Do come along if you can. PyData is always great, for experts and novices and all levels of technical interest and proficiency. It would be great to see you there.
Get tickets from PyData.
And if you have something to share, prepare a 5-minute Lightning Talk. They are always a highlight of the conference.