Rexpy for Generating Regular Expressions: Postcodes

Posted on Wed 20 February 2019 in TDDA • Tagged with regular expressions, rexpy, tdda

Rexpy is a powerful tool we created that generates regular expressions from examples. It's available online at https://rexpy.herokuapp.com and forms part of our open-source TDDA library.

Miró users can use the built-in rex command.

This post illustrates using Rexpy to find regular expressions for UK postcodes.

A …
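This sketch shows roughly what kind of pattern the post is after. It uses a hand-written, simplified UK postcode regex with Python's standard `re` module. The pattern is our own illustration, not a regex produced by Rexpy. It also ignores special cases, such as GIR 0AA and the rules on which letters may appear in each position.

```python
import re

# Simplified UK postcode pattern (hand-written for illustration, NOT Rexpy output):
#   outward code: 1-2 letters, a digit, then an optional letter or digit
#   a single space
#   inward code:  a digit followed by two letters
POSTCODE_RE = re.compile(r"[A-Z]{1,2}[0-9][A-Z0-9]? [0-9][A-Z]{2}")


def looks_like_postcode(s: str) -> bool:
    """Return True if the whole string matches the simplified pattern."""
    return POSTCODE_RE.fullmatch(s) is not None


if __name__ == "__main__":
    for s in ["EH1 2AB", "W1A 1AA", "SW1A 2AA", "M1 1AE", "B33 8TH", "not a postcode"]:
        print(f"{s!r}: {looks_like_postcode(s)}")
```

Rexpy infers patterns like this automatically from example postcodes. Nobody has to write them by hand.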

Tagging PyTest Tests

Posted on Tue 22 May 2018 in TDDA • Tagged with tests, tagging

A recent post described the new ability to run a subset of ReferenceTest tests from the tdda library by tagging tests or test classes with the @tag decorator. Initially, this ability was only available for unittest-based tests. From version 1.0 of the tdda library, now available, we have …
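This is a minimal sketch of how tag-based test selection can work in principle. It uses plain `unittest`: a hypothetical `tag` decorator marks individual test methods or whole test classes, and a suite is then built from only the marked tests. The names `tag`, `tagged_suite` and `_tagged` are illustrative assumptions. This is not the tdda library's implementation or command-line interface.

```python
import unittest


def tag(test):
    """Hypothetical decorator: mark a test method or a whole TestCase class.

    Illustrative only; not the tdda library's implementation.
    """
    test._tagged = True
    return test


def tagged_suite(*case_classes):
    """Build a suite holding only tagged tests.

    A test is included if its method is tagged or if its whole class is tagged.
    """
    loader = unittest.TestLoader()
    suite = unittest.TestSuite()
    for cls in case_classes:
        class_tagged = getattr(cls, "_tagged", False)
        for name in loader.getTestCaseNames(cls):
            method = getattr(cls, name)
            if class_tagged or getattr(method, "_tagged", False):
                suite.addTest(cls(name))
    return suite


class SlowTests(unittest.TestCase):
    def test_slow(self):  # untagged: skipped by tagged_suite
        self.assertTrue(True)

    @tag
    def test_focus(self):  # tagged method: included
        self.assertEqual(1 + 1, 2)


@tag
class QuickTests(unittest.TestCase):  # tagged class: all its tests included
    def test_a(self):
        self.assertTrue(True)

    def test_b(self):
        self.assertTrue(True)


if __name__ == "__main__":
    suite = tagged_suite(SlowTests, QuickTests)
    print(suite.countTestCases())  # 3: test_focus, test_a, test_b
    unittest.TextTestRunner(verbosity=2).run(suite)
```

pytest also has its own, separate marker mechanism: `@pytest.mark.<name>`, selected with `pytest -m <name>`.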

Detecting Bad Data and Anomalies with the TDDA Library (Part I)

Posted on Fri 04 May 2018 in TDDA • Tagged with tests, anomaly detection, bad data

The test-driven data analysis library, tdda, has two main kinds of functionality:

  • support for testing complex analytical processes with unittest or pytest
  • support for verifying data against constraints, and optionally for discovering such constraints from example data.

Until now, however, the verification process has only reported which constraints failed to …
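This is a small, hypothetical sketch in plain Python, not the tdda API, of the two steps listed above. It discovers simple per-column constraints from example data, then verifies new data against them. The report covers both which constraints failed and which rows failed them. The function names, the column names and the constraint set (minimum, maximum, nulls allowed) are assumptions made for illustration.

```python
from typing import Any

Row = dict[str, Any]


def discover(rows: list[Row]) -> dict[str, dict[str, Any]]:
    """Infer min, max and null-allowed constraints per column from example rows.

    Hypothetical sketch: assumes a non-empty list of rows sharing the same
    columns, with comparable (e.g. numeric) non-null values.
    """
    constraints: dict[str, dict[str, Any]] = {}
    for col in rows[0]:
        values = [r.get(col) for r in rows]
        present = [v for v in values if v is not None]
        constraints[col] = {
            "min": min(present) if present else None,
            "max": max(present) if present else None,
            "allow_null": len(present) < len(values),
        }
    return constraints


def verify(rows: list[Row], constraints: dict[str, dict[str, Any]]) -> dict[tuple[str, str], list[int]]:
    """Map each failed (column, constraint) pair to the indices of offending rows."""
    failures: dict[tuple[str, str], list[int]] = {}
    for col, c in constraints.items():
        for i, row in enumerate(rows):
            v = row.get(col)
            if v is None:
                failed = None if c["allow_null"] else "allow_null"
            elif c["min"] is not None and v < c["min"]:
                failed = "min"
            elif c["max"] is not None and v > c["max"]:
                failed = "max"
            else:
                failed = None
            if failed:
                failures.setdefault((col, failed), []).append(i)
    return failures


if __name__ == "__main__":
    training = [{"age": 25, "score": 3.5}, {"age": 40, "score": None}, {"age": 33, "score": 4.0}]
    new_data = [{"age": 30, "score": 3.9}, {"age": None, "score": 5.0}, {"age": 12, "score": 3.6}]
    constraints = discover(training)
    print(verify(new_data, constraints))
    # {('age', 'allow_null'): [1], ('age', 'min'): [2], ('score', 'max'): [1]}
```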

Saving Time Running Subsets of Tests with Tagging

Posted on Tue 01 May 2018 in TDDA • Tagged with tests, tagging

When working with tests for analytical processes, it is common for test suites to take a non-trivial amount of time to run. A convenient way to execute a subset of tests, or even a single test, is often helpful.

We have added a simple mechanism for allowing this …

Our Approach to Data Provenance

Posted on Tue 12 December 2017 in TDDA • Tagged with data lineage, data provenance, data governance, tdda, constraints, miro

NEW DATA GOVERNANCE RULES: — We need to track data provenance. — No problem! We do that already! — We do? — We do! — (thinks) Results2017_final_FINAL3-revised.xlsx

Our previous post introduced the idea of data provenance (a.k.a. data lineage), which has been discussed on a couple of podcasts recently. This is an issue that is close to our hearts at Stochastic Solutions. Here, we'll talk about how we handle this issue, both methodologically and in …
