PyData London 2026 TDDA Tutorial
Posted on Fri 19 June 2026 in TDDA, tutorial
I gave a tutorial on TDDA at PyData London 2026 earlier this month. The video is available on YouTube and below:
The slides are also available here.
Broadly, the talk included updated coverage of the traditional TDDA staples:
- Motivation for TDDA and the Python tdda library and command-line tools;
- Validation of data with automatically generated and hand-refined constraints (tdda discover, tdda verify, tdda detect);
- Reference testing (semantic regression tests for analytical pipelines), motivated by the Parables of Anne and Bess. (Anne, in the parables, represents analytical virtue personified, while Bess is a data scientist who clearly hasn't taken the analytical Hippocratic oath.)
- Automatic generation of reference tests for code in any language with tdda gentest.
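For readers new to the constraint-based validation mentioned above, the idea can be sketched in a few lines of plain Python. This is only an illustration of the concept, not the tdda library's implementation or API: the function names discover and verify are hypothetical, chosen to echo the tdda discover and tdda verify commands, and the constraint kinds (type, nulls, min, max) are a small assumed subset.

```python
from typing import Any

Row = dict[str, Any]


def discover(rows: list[Row]) -> dict[str, dict[str, Any]]:
    """Infer simple per-field constraints from example rows (hypothetical helper)."""
    constraints: dict[str, dict[str, Any]] = {}
    fields = sorted({name for row in rows for name in row})
    for name in fields:
        values = [row.get(name) for row in rows]
        present = [v for v in values if v is not None]
        # The type is taken from the first non-null value: a simplification.
        c: dict[str, Any] = {
            "type": type(present[0]).__name__ if present else None,
            "allow_nulls": len(present) < len(values),
        }
        if present and all(isinstance(v, (int, float)) for v in present):
            c["min"] = min(present)
            c["max"] = max(present)
        constraints[name] = c
    return constraints


def verify(rows: list[Row], constraints: dict[str, dict[str, Any]]) -> list[str]:
    """Return a list of human-readable constraint failures (empty if all pass)."""
    failures: list[str] = []
    for name, c in constraints.items():
        values = [row.get(name) for row in rows]
        present = [v for v in values if v is not None]
        if not c["allow_nulls"] and len(present) < len(values):
            failures.append(f"{name}: unexpected nulls")
        if c["type"] is not None and any(
            type(v).__name__ != c["type"] for v in present
        ):
            failures.append(f"{name}: type mismatch")
        # Only compare numeric values, so a type mismatch cannot raise here.
        numeric = [v for v in present if isinstance(v, (int, float))]
        if "min" in c and numeric and min(numeric) < c["min"]:
            failures.append(f"{name}: value below minimum {c['min']}")
        if "max" in c and numeric and max(numeric) > c["max"]:
            failures.append(f"{name}: value above maximum {c['max']}")
    return failures


# Made-up example data, purely for illustration.
training = [{"id": 1, "height": 1.6}, {"id": 2, "height": 1.8}]
constraints = discover(training)

# Hand-refinement: an upper bound on an ever-growing id makes no sense.
del constraints["id"]["max"]

good = [{"id": 3, "height": 1.7}]
bad = [{"id": 4, "height": 2.5}, {"id": 5, "height": None}]

good_failures = verify(good, constraints)
bad_failures = verify(bad, constraints)
```

The deletion of the max constraint on id stands in for the hand-refinement step: automatically discovered constraints describe the example data exactly, so some usually need loosening or removing before they are useful on new data.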
New this year were:
- tdda diff: a diff tool for datasets in Parquet files and flat files (CSVs);
- The tdda serial tool and the tdda.serial metadata format for flat files, which make working with CSV files and other flat files safer;
- Command-line utilities for flat files and Parquet files (tdda cat, tdda head, tdda tail, tdda sample, and tdda ls);
- Test-Driven Document Development (TDDD): the application of ideas from TDDA to computational documents to counter co-rusting, the phenomenon in computational notebooks whereby code and results drift away from validated correct results without obviously breaking;
- All of the functionality that was previously limited to Pandas is now available in Polars too. Additionally, Pandas 2 and 3 are supported with all three backends: original, numpy_nullable, and Apache pyarrow.
I might also have mentioned my TDDA Book, available from all good publishers and all publishers of good books, and being released to read free online, a chapter a week.