Test-Driven Data Analysis

Jupyter Notebooks Considered Harmful: The Parables of Anne and Beth

Posted on Thu 14 November 2024 in TDDA • Tagged with TDDA, reproducibility, process

I have long considered writing a post about the various problems I see with computational notebooks such as Jupyter Notebooks. As part of a book I am writing on TDDA, I created four parables about good and bad development practices for analytical workflows. They were not intended to form this …

PyData London 2024 TDDA Tutorial

Posted on Sun 21 July 2024 in TDDA • Tagged with TDDA, tutorial

PyData London had its tenth conference in 2024, and it was excellent.

I gave a tutorial on TDDA, and the video is available on YouTube.

The slides are also available here.

Learning the Hard Way: Regression to the Mean

Posted on Thu 20 June 2024 in TDDA • Tagged with TDDA, reproducibility, errors, interpretation

I was at the tenth PyData London Conference last weekend, which was excellent, as always. One of the keynote speakers was Rebecca Bilbro, who gave a rather brilliant (and cleverly titled) talk called Mistakes Were Made: Data Science 10 Years In.

The title is, of course, a reference to the …

Name Styles

Posted on Mon 04 March 2024 in TDDA • Tagged with TDDA, names

This is just a bit of fun, but I've always been interested in the different kinds of names allowed, encouraged, and used in different areas of computing and data.

A few years ago, I tweeted some well-known naming styles and a collection of lesser-known naming styles. I was playing about …

TOMLParams: TOML-based parameter files made better

Posted on Sun 16 July 2023 in TDDA • Tagged with TDDA, reproducibility

TOMLParams is a new open-source library that helps Python developers to externalize parameters in TOML files. This post will explain why storing parameters in non-code files is beneficial (including for reproducibility), why TOML was chosen, and some of the useful features of the library, which include structured sets of parameters …
