PyData London 2026 TDDA Tutorial

Posted on Fri 19 June 2026 in TDDA, tutorial

I gave a tutorial on TDDA at PyData London 2026 earlier this month. The video is available on YouTube.

The slides are also available here.

Broadly, the talk included updated coverage of the traditional TDDA staples:

  • Motivation for TDDA and the Python tdda library and command-line tools;
  • Validation of data with automatically generated and hand-refined constraints (tdda discover, tdda verify, tdda detect);
  • Reference testing (semantic regression tests for analytical pipelines), motivated by the Parables of Anne and Bess. (Anne, in the parables, is analytical virtue personified, while Bess is a data scientist who clearly hasn't taken the analytical Hippocratic oath.);
  • Automatic generation of reference tests for code in any language with tdda gentest.

New this year were:

  • tdda diff: a diff tool for datasets in Parquet files and flat files (CSVs);
  • The tdda serial tool and the tdda.serial metadata format for flat files, which together make working with CSV and other flat files safer;
  • Command-line utilities for flat files and Parquet files (tdda cat, tdda head, tdda tail, tdda sample, and tdda ls);
  • Test-Driven Document Development (TDDD): the application of ideas from TDDA to computational documents to counter co-rusting, the phenomenon in computational notebooks whereby code and results drift away from validated, correct results without anything obviously breaking;
  • All functionality that was previously limited to Pandas is now also available for Polars. Additionally, Pandas 2 and 3 are supported with all three backends: original, numpy_nullable, and Apache pyarrow.

I might also have mentioned my TDDA Book, available from all good publishers and all publishers of good books, and also being released free to read online, a chapter a week.