TOMLParams: TOML-based parameter files made better

Posted on Sun 16 July 2023 in TDDA

TOMLParams is a new open-source library that helps Python developers to externalize parameters in TOML files. This post will explain why storing parameters in non-code files is beneficial (including for reproducibility), why TOML was chosen, and some of the useful features of the library, which include structured sets of parameters using TOML tables, hierarchical inclusion with overriding, default values, parameter (key) checking, optional type checking, and features to help use across programs, including built-in support for setting parameters using environment variables.

The Benefits of Externalizing Parameters

Almost all software can do more than one thing, and has various parameters that are used to control exactly what it does. Some of these parameters are set once and never changed (typically configuration parameters, that each user or installation chooses), while others may be changed more often, perhaps from run to run. Command-line tools often accept some parameters on the command line itself, most obviously input and output files and core parameters such as search terms for search commands. On Unix and Linux systems, it's also common to use command line "switches" (also called flags or options) to refine behaviour. So for example, the Unix/Linux grep tool might be used in any of the following ways:

grep time            # find all lines including 'time' on stdin
grep time p*.txt     # ... on .txt files starting with 'p'
grep -i time p*.txt  # ... ignoring capitalization (-i switch)

All of time, p*.txt and -i are examples of command-line parameters.

Many tools also use configuration files to control the behaviour of the software. On Unix and Linux, these are typically stored in the user's home directory, often in 'dot' files, such as ~/.bashrc for the Bash shell, ~/.emacs for the Emacs editor and ~/.zshrc for the Z Shell. These files are sometimes in propriery formats, but increasly often are in "standard" formats such as .ini (MS-DOS initialization files), JSON (JavaScript Object Notation), YAML (YAML Ain't Markup Language), or TOML (Tom's Obvious Minimal Language).

When writing code, it's often tempting just to set parameters in code.

  • The quickest, dirtiest practice is just to have parameter values "hard-wired" as literals in code where they are used, possibly with the same values being repeated in different places. This is generally frowned upon for a few reasons:

    • Hard-wired parameters are hard to find, and therefore hard to update and inspect;
    • There may be repetition, violating the don't repeat yourself (DRY) principle, and leading to possible inconsistency (though see also the counterveiling rule of three (ROT) and the write everything in threes (WET) principle.
    • If you want to change parameters, you have to go to many places;
    • There may be no name associated with the parameter, making it hard to know what it means;

    So implementing an energy calculation using Einstein's formula E = mc², we would probably prefer either:

def energy(mass):
    """Return energy in joules from mass in kilograms"""
    return mass * constants.speed_of_light * constants.speed_of_light

or

def energy(m):
    """Return energy in joules from mass m in kilograms"""
    c = constants.c  # the speed of light
    return m * c ** 2

(for a physicist) to

def energy(m):
    return m * 3e8 * 3e8

or (worse!)

def energy(m):
    return m * 9e16
  • For something like the speed of light (which is after all, a universal constant) in code is probably appropriate, so we might have a file called constants.py containing something like:
# constants.py: physical constants

speed_of_light = 299_792_458   # metres per second, in vacuuo. (c. 300 000 km/s)
h_cross = 1.054_571_817e-34    # reduced Planck constant in joule seconds (= h/2π)
  • For parameters that change in different situations, or for different users, it's usually much better practice to store the parameters in a separate file, ideally in a common format for which is good library support. Some of the advantages of this include:
    • Changing parameters then does not require changing code. Code may be unavilable to the user—e.g. write locked, compiled or running on a remote server through an API. Moreover, editing code often requires more expertise and confidence than editing a parameter file.
    • The same code can be run multiple times using different parameters at the same time; this can be more challenging if the parameters are "baked into" the code.
    • It becomes easy to maintain multiple sets of parameters for different situations, allowing a user simply to choose a named set on each occasion (usually through the parameter file's name). Of course, the code itself can have different named sets of parameters, but this is less common, and usually less transparent to the user, and harder to maintain.

Why TOML?

There is nothing particularly special about TOML, but it is well suited as a format for parameter files, with a sane and pragmatic set of features, good library support in most modern programming languages and rapidly increasing adoption.

Here's a simple example of a TOML file:

# A TOML file (This is a comment)

start_date = 2024-01-01   # This will produce a datetime.date in Python
run_days = 366            # an integer (int)
tolerance = 0.0001        # a floating-point vlaue (float)
locale = "en_GB"          # a string value
title = "2024"            # also a string valuue
critical_event_time = 2024-07-31T03:22:22+03:00  # a datetime.datetime
fermi_approximate = true  # a boolean value (bool)


[logging]   # A TOML "table", which acts like a section, or group

format = ".csv"           # Yet another string
events = [                # An array (list)
    "financial",
    "telecoms",
]
title = "2024 events"     # The same key as before, but this one is
                          # in the logging table, so does not conflict
                          # with the previous title

Strengths of TOML as a parameter/configuration format

  • Ease of reading, writing and editing ("obviousness").
  • Lack of surprises: there are few if any weirdnesses in TOML (which is less true of the other main competing formats).
  • Good coverage of the main expected types: TOML supports (and differentiates between) booleans, integers, floating point values, strings, ISO8601-formatted dates and timestamps (with and without timezones, and mapping to datetime.date and datetime.datetime in Python), arrays (lists), key-value pairs (dictionaries), and hierarchical sections.
  • It supports comments (which begin with a hash mark #)
  • Support in most modern langauges, including C#, C++, Clojure, Dart, Elixir, Erlang, Go, Haskell, Java, Javascript, Lua, Objective-C, Perl, PHP, Python, Ruby, Rust, Swift, Scala
  • Flexible quoting with support for multiline strings, and no use of bare (unquoted) strings (avoiding ambiguity)
  • Well-defined semantics without becoming fussy and awkward to edit (e.g., being unfussy about trailing commas in arrays).
  • TOML is specifically designed to be a configuration file format. Quoting the front page of the website

TOML aims to be a minimal configuration file format that: is easy to read due to obvious semantics maps unambiguously to a hash table is easy to parse into data structures in a wide variety of languages"

  • In the context of Python, TOML is already being adopted widely, most notably in the increasingly ubiquitous pyproject.toml files. There are good libraries available for reading (tomli) and writing (tomli_w) TOML files, although these are not part of the standard library. (It appears that Python 3.11 does include tomllib, for reading TOML files, in the standard library.)

Weaknesses of TOML as a parameter/configuration format

We don't think TOML has any major weaknesses for this purpose, but points that might count against it for some include:

  • TOML does not have a null (NULL/nil/None/undefined) value. (TOMLParams could address this, but has no current plans to do so.)
  • Hierarchical sections ('tables') in TOML are not nested. So if you want So if you want a section/table called behaviours and subsections/subtables called personal and business, in TOML this might be represented by something like the excerpt below (possibly with a [behaviours] table with its own parameters as well).

Some people, however, really don't like TOML.

[behaviours.personal]

frequency = 3

[behaviours.business]

frequency = 3

Why not JSON, YAML, .ini...

We chose TOML because we think, overall, it is better than each of the obvious competing formats, but the purpose of this post isn't to do down those formats, which all have their established places. But we will comment briefly on the most obvious alternatives.

JSON, while good as a transfer format and very popular in web services, is tricky for humans to write correctly, even with editor support, because it requires correct nesting and refuses to accept trailing commas in lists and dictionaries. The lack of support for dates and timestamps is also a frequent source of frustration, with quoted strings typically being used instead, with all the concomitant problems of that approach.

At first glance, YAML appears more suitable as a configuration/parameter-file format, but is the opposite of obvious and often produces unexpected results (in practice). Sources of frustration include no requirement to quote strings, "magic values" like yes (which maps to true) and no which maps to false (much to the annoyance of Norwegians and anyone using the NO country code), inadvent coercison of numbers with leading zeroes to octal (in YAML 1.1), whitespace sensitivity, and issues around multi-line strings.1 2 3 4

.ini files look a lot like TOML (I would guess they a major inspiration for it) but are much simpler and less rich, have less well-defined syntax, have fewer types and don't require quoting of strings, leading to ambiguity.

What are the Key Features of TOMLParams?

Simple externalization of parameters in one or more TOML files

You write your parameters in a TOML file, pass the name of the TOML file an instance of the TOMLParams class and it reads the parameters from the file and makes them available as attributes on the object, but also makes them available using dictionary-style look-up, i.e. if you TOMLParams instance is p and you have a parameter startdate you can access it as p.startdate or p['startdate'] (and, more pertinently, also as p[k] if k is set to 'startdate').

If you use tables, you can "dot" you way throught to the parameters (p.behaviours.personal.frequency) or use repeated dictionary lookups (p['behaviours']['personal']['frequency']).

Loading, saving, default values, parameter name checking and optional parameter type checking

The parameters that exist are defined by default value with TOMLParams. You can either store you defaults in a TOML file (e.g. defaults.toml) or pass them to the TOMParams initializer as a dictionary.

If you choose a different TOML file, all the parameter values are first set to their default values, and then any parameters set in the file you specify override those. New parameters (i.e. any not listed in defaults) raise an error.

If you wish turn on type checking, the library will check that the all parameter values provided match those of the defaults, and you choose whether these cases result in a warning (which doesn't raise an exception) or an error (which does).

Hierarchical file inclusion with overriding

Perhaps the most powerful feature of TOMLParams is the ability for one parameters file to 'include' one or more others. If you use TOMLParams specifying the parameters file name as base, and the first line of base.toml is

include = 'uk'

then parameters from uk.toml will be read and processed before those from base.toml. So all parameters will first be set to default values, then anything in uk.toml will override those values, and finally any values in base.toml will override those.

The include statement can also be a list

include = ['uk', 'metric', '2023']

Inclusions are processed left to right (i.e. uk.toml is processed, before metric.toml, followed by 2023.toml), followed by the parameters in the including file itself. So in this case, if defaults are in defaults.toml and the TOML file specified is base.toml, the full sequence is

  • defaults.toml
  • uk.toml
  • metric.toml
  • 2023.toml
  • base.toml

Of course, the included files can themselves include others, but the library keeps track of this and each file is only processed once (the first time it is encountered), which prevents infinite recursion and repeated setting.

This makes it very easy to maintain different kinds and groups of parameters in different files, and to create a variation of a set of parameters simply by making the first line include whatever TOML file specifies the parameters you want to start from, and then override the specific parameter or parameters you want to be different in your new file.

Support for writing consolidated parameters as TOML after hierarchical inclusion and resolution

The library can also write the final parameters used out as a single, consolidated TOML file, which is useful when hierarchical inclusion and overriding are used, and keeps a record of the final values of all parameters. This helps with reproducibility and logging.

Support for using environment variables to select parameter set (as well as API)

You can choose how to specify the name of the parameters file to be used, and the name two which it should default. If you create a TOMLParams instance with:

params = TOMLParams(
    defaults='defaults',
    name='newparams',
    standard_params_dir='.'
)

default parameters will be read from ./defaults.toml and then ./newparams.toml will be processed, overriding default values.

If you want to specify the name of the parameters file to use on the command line, the usual pattern would be something like:

import sys
from tomlparams import TOMLParams

class Simulate:
    def __init__(params, ...)


if __name__ == '__main__:
    name = sys.argv[1] if len(sys.argv) > 1 else 'base'
    params = TOMLParams('defaults.toml', name)
    sim = Simulate(params, ...)

Sometimes, however, it's convenient to use an environment variable to set the name of the parameters file, particularly if you want to use the same parameters in multiple programs, or run from a shell script or a Makefile. You can specify an environment variable to use for this and TOMLParams will inspect that environment variable if no name is passed. If you choose SIMPARAMS for this and say

params = TOMLParams('defaults.toml', env_var='SIMPARAMS')

it will look for a name in SIMPARAMS in the environment which you can set with

SIMPARAMS="foo" python pythonfile.py

or

export SIMPARAMS="foo"
python pythonfile.py

If it's not set, it will use base.toml as the file name, or something else you choose with the base_params_stem argument to TOMLParams.

Check it out

You can install TOMLParams from PyPI in the usual way, e.g.

python -m pip -U tomlparams

The source is available from Github (under an MIT license), at github.com/smartdatafoundry/tomlparams

There's a README.md and documentation is available on ReAd the Docs at tomlparams.readthedocs.io.

You can get help within Python on the TOMLParams class with

>>> import tomlparams
>>> help(tomlparams)

After installing tomlparams, you will find you have a tomlparams command, which you can use to copy example code from the README

$ tomlparams examples
Examples copied to ./tomlparams_examples.

You can also get help from with tomlparams help:

$ tomlparams help
TOMLParams

USAGE:
    tomlparams help      — show this message
    tomlparams version   — report version number
    tomlparams examples  — copy the examples to ./tomlparams_examples
    tomlparams test      — run the tomlparams tests

Documentation: https://tomlparams.readthedocs.io/
Source code:   https://github.com/smartdatafoundry.com/tomlparams
Website:       https://tomlparams.com

Installation:

    python -m pip install -U tomlparams