Best Practices for Notebook Users

Posted on Tue 17 December 2024 in misc

In a previous post, I discussed some of the challenges, dangers and weaknesses of Jupyter Notebooks, JupyterLab and their ilk. I used The Parables of Anne and Beth as a device to illustrate what I think of as good and bad practices for data science. A reasonable criticism of this was that it did not really offer anything to help people who wish to continue using computational notebooks, but to work in such a way as to limit the harms identified.

Although it probably rings slightly hollow, my goal is absolutely to improve the quality of data science, data analysis, data engineering, and really all data work, and I very much see the attractions and strengths of Jupyter, despite being critical of certain dark patterns I see around its use.

Here are some suggested best practices for notebook users that I hope might be helpful and constructive. It's true that if you adopt all of them, I might have succeeded in prising your Notebook from your hands, but if you adopt any of them as you use notebooks, I think you will be safer and more successful. I'm very much in favour of half a loaf.

Subject to the vagaries of the web, the online checkboxes should be clickable, should you find it useful as an actual checklist, and there's a PDF version available too.

Notebook Best Practices

Jupyter logo with checkmark

  • NBP1 Ensure that your Notebook runs correctly after completion
    • Develop the Notebook
    • Take a temporary copy of the Notebook
    • Clear the Notebook
    • Run and confirm the results match the temporary copy
      • Fix (if this is not the case)
    • Clear the new Notebook again
    • Commit the new Notebook to version control
    • Rerun (so the Notebook contains the results)
    • Delete the temporary copy
  • NBP2 Store the (cleared) Notebook in version control (see NBP1)
  • NBP3 Parameterize inputs and outputs at the top of the Notebook
    • e.g. Set and use variables such as INPATH and OUTPATH
    • Write the most important outputs to file (if not already done)
  • NBP4 Replace some individual cells or groups of cells with functions
    • Move them into an importable file, import and use
    • Prioritize potentially re-usable code and code requiring testing
  • NBP5 Write some tests
    • Create a reference (regression) test for the whole process
    • Create unit tests for the individual functions
  • NBP6 Consider restructuring/extracting the code as a standalone script
  • NBP7 Allow the parameters to be set from the command line
    • Alternatively, read from a configuration file (e.g. .json or .toml)
  • NBP8 Consider using safer alternatives like Marimo and Quarto
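Much of NBP1 and NBP2 can be scripted from the command line. Below is a minimal sketch of one way to do it, assuming a notebook called analysis.ipynb (the name is just an example), using jupyter nbconvert and, for the comparison step, nbdiff from the optional nbdime package; adapt it to your own layout and tooling.

NB=analysis.ipynb                                            # hypothetical notebook name

cp "$NB" /tmp/reference.ipynb                                # temporary copy holding the current results
jupyter nbconvert --clear-output --inplace "$NB"             # clear the Notebook
jupyter nbconvert --to notebook --execute --inplace "$NB"    # run it from start to finish
nbdiff /tmp/reference.ipynb "$NB"                            # confirm the results match; fix and repeat if not

jupyter nbconvert --clear-output --inplace "$NB"             # clear the Notebook again
git add "$NB" && git commit -m "Cleared, reproducible Notebook"
jupyter nbconvert --to notebook --execute --inplace "$NB"    # rerun so the Notebook contains the results
rm /tmp/reference.ipynb                                      # delete the temporary copy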

NBP Version 1.0.

A printable PDF copy is available.


Log Graphs and Grokkability

Posted on Thu 12 December 2024 in misc

In his novel Stranger in a Strange Land, Robert Heinlein1 introduced the word grok. It is used all the time in the computing sphere, but rarely, as far as I know, outside it. The definition that seems to me most closely to match its usage is:

grok (transitive verb).

  1. To understand profoundly through intuition or empathy.

grok (verb).

  1. To have or to have acquired an intuitive understanding of; to know (something) without having to think (such as knowing the number of objects in a collection without needing to count them: see subitize).

  2. To fully and completely understand something in all its details and intricacies.

  3. To get the meaning of something.

The American Heritage Dictionary of the English Language, 5th edition.2

Graphs and other visualizations are among the most powerful tools we have for identifying and elucidating patterns in data—for helping us to grok data. A good graph can be extremely dense in information and easy to understand. By the same token, a bad graph can—deliberately or inadvertently—be a powerful tool for misleading and spreading confusion and misunderstanding. Data professionals who subscribe to the Hippocratic Oath, “First, do no harm”, should avoid poor graphing practices as a matter of high priority.

Log Scales

As I was writing a chapter on graphs in a book on TDDA, I happened upon the graph below from Our World in Data (OWID).

A graph showing the incomes in the table on a y-axis with a log scale and tick lines at $100, $200, $500, $1k, $2k, $5k and $10k. The seven countries are displayed on the x-axis as an orange dot for the 10th percentile income and a blue dot for the 90th percentile income. There is a vertical line between these two values for each country, annotated with the ratio of the two incomes.

Graph from Our World in Data on Bluesky.

It uses data from the Luxembourg Income Study and shows 14 base numbers—the post-tax incomes of the 10th percentile and 90th percentile citizens of seven countries (poorer and richer groups, respectively). The most common way of measuring income inequality uses the Gini coefficient,3 which is a single number that is powerful but rather abstract. The OWID graph is much more intuitive, focusing as it does on the ratio of incomes between a richer and poorer group, each defined by its position in the income distribution. The raw post-tax incomes and the derived multiples (ratios) are shown in this table:

The Data Shown in the OWID Graph.4

Country                  South Africa   Brazil   China   Uruguay   UK      US      Norway
Year                     2017           2022     2018    2022      2021    2022    2021
90th percentile ($USD)   2,480          1,650    1,900   2,220     4,100   6,830   5,130
10th percentile ($USD)   110            195      250     395       1,080   1,170   1,670
Multiple                 22⨉            8.4⨉     7.8⨉    5.6⨉      3.8⨉    6⨉      3.1⨉

Before discussing weaknesses with the graph, consider a few of its exemplary features. The graph minimizes chart junk5 while clearly explaining the findings with various direct annotations. Labels are close to the data, the numbers are easy to read, and different weights and colours (shades) of text are used to emphasize and de-emphasize information. The seven numbers they most wish to focus on are the income ratios for each country, which are shown clearly. The notes at the bottom both specify the source of the data and provide useful guidance about interpreting the numbers. The use of colour is effective, and the colours used appear to have been chosen to work for readers with colour blindness as well as everyone else. I think the graph draws the reader in, and is much more likely to cause someone to stop and study it than the dry table of numbers alone would be, particularly on social media (where I saw this).

Despite these many merits, I think the graph only partially succeeds as a graph. My own experience was that I did get a feel for the information (I believe) the graph was trying to communicate by reading it (which did not take too long—it is, after all, only 21 numbers), but to gain that understanding I did have to read all 21 numbers—the 7 multipliers from the labels, and the 14 incomes by looking across at the y-axis. Even when I had done this, I still did not have the same intuitive, almost visceral understanding that comes from a really effective graph. The immediate reason for this is the log scale: humans simply do not have the same effortless ability to grok information presented using log-scaled lengths as we do with proportionately scaled lengths.

Top: the previous graph redrawn in much the same manner but using a linear scale for the y-axis. Bottom: the ratios from the original graph plotted in a similar style, but with the poor group in each country at value 1 (1⨉), and the rich group at whatever the income ratio is for that country.

My first instinct was that the log scale was unnecessary: the range of values is not too great to show comfortably on a linear scale. So I plotted the top graph of the pair above, which comfortably displays all the incomes, albeit with the qualification that differences in the incomes of the poorer groups in the first four countries are harder to differentiate. (Smaller markers would help here, but I wanted to change as little as possible from the OWID graph, where the marker size is not a problem.) I contend that the redrawn graph makes it easier to compare incomes across the seven different countries visually. Unfortunately, however, it is hopeless for conveying the income ratios, which are surely the quantities OWID most wanted to communicate. In the redrawn graph, a larger separation between the rich and poor groups does not mean that the income ratio is larger, still less is the separation proportional to the ratio: the absolute difference between incomes in the US is more than twice that in South Africa, because incomes are so much higher in the US, so its multiplier of 6⨉ is represented by a much longer line than is the 22⨉ multiplier for South Africa. The log plot cleverly resolves this because addition of logs is equivalent to multiplication of the original quantities. As a result, differences on a log plot are the logs of the ratios, and do rank the ratios correctly.6

The cleverness of the presentation chosen by OWID is that it allows a single graph, with a single scale, to show both the relative incomes among countries and the ratios between incomes of the richer and poorer groups in each country. But the cost of this approach is that not only the income scale itself, but also the multipliers are shown on log scales which—to labour the point—are hard to grok.

The bottom graph in the redrawn figure shows the multipliers themselves on a linear scale. I think this is a significantly better (more grokkable) way to visualize them. There is, however, a final subtlety. In general, ratios of positive quantities can have any positive value. But in this case, the multiplier constructed is the ratio of a larger income (the 90th percentile) and a smaller one (the 10th percentile). Manifestly, this cannot be smaller than 1—the value it would take if everyone in the country had the same post-tax income. It is for this reason that I have drawn the connecting lines on the last plot from 1⨉ to the multiplier, rather than from 0, and have chosen not to present this as a conventional bar graph. One is the effective zero for these particular multipliers.7 A country with a perfectly even income distribution would have no distance between the 10th and 90th percentiles—once more emphasizing how clever the device of using a log scale for this data is, even if, as I contend, it was ultimately a poor choice for communication.
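For anyone who would like to experiment with this style of presentation, here is a minimal matplotlib sketch of the multiplier plot described above, using the ratios from the table earlier; it illustrates the idea (lines drawn from 1⨉, not 0) rather than reproducing the redrawn figure exactly.

import matplotlib.pyplot as plt

# Income ratios (90th / 10th percentile) reconstructed from the OWID graph.
countries = ["South Africa", "Brazil", "China", "Uruguay", "UK", "US", "Norway"]
multiples = [22, 8.4, 7.8, 5.6, 3.8, 6, 3.1]

fig, ax = plt.subplots(figsize=(8, 4))
for i, m in enumerate(multiples):
    ax.plot([i, i], [1, m], color="grey", zorder=1)   # line from 1x (the effective zero) to the ratio
    ax.scatter([i], [m], color="tab:blue", zorder=2)
    ax.annotate(f"{m}⨉", (i, m), textcoords="offset points", xytext=(0, 6), ha="center")

ax.axhline(1, color="black", linewidth=0.8)           # baseline at 1x, not 0
ax.set_xticks(range(len(countries)))
ax.set_xticklabels(countries, rotation=30, ha="right")
ax.set_ylabel("90th / 10th percentile income ratio")
plt.tight_layout()
plt.show()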

Another way of saying this is that OWID (I believe) was trying to show two different things, lying on naturally different scales, on a single plot. OWID found a clever technical solution that allowed them to do this, but at the cost of grokkability. As I deconstructed it, I realised I needed two plots to show the data in ways that I think are much easier to understand. You, of course, must form your own judgement.


  1. Heinlein, Robert A. (1961). Stranger in a Strange Land. G. P. Putnam’s Sons. 

  2. The American Heritage Dictionary of the English Language (2022). 5th ed. Random House Inc. 

  3. Hasell, Joe (2023). Measuring inequality: what is the Gini coefficient? In: Our World in Data. https://ourworldindata.org/what-is-the-gini-coefficient. 

  4. The data was reconstructed from the OWID graph, so there will be minor deviations from the original Luxembourg Income Study data. 

  5. Tufte, Edward R. (1984). The Visual Display of Quantitative Information. Graphics Press. 

  6. Since logarithms are monotonically increasing functions. 

  7. because log 1 = 0 


Jupyter Notebooks Considered Harmful: The Parables of Anne and Beth

Posted on Thu 14 November 2024 in TDDA • Tagged with TDDA, reproducibility, process

I have long considered writing a post about the various problems I see with computational notebooks such as Jupyter Notebooks. As part of a book I am writing on TDDA, I created four parables about good and bad development practices for analytical workflows. They were not intended to form this post, but the way they turned out fits the theme quite well.

Situation Report

Anne and Beth are data scientists, working in parallel roles in different organizations. Each was previously tasked with analysing data up to the end of the second quarter of 2024. Their analyses were successful and popular. Even though there had never been any suggestion that the analysis would need to be updated, on the last Friday of October Anne and Beth each receive an urgent request to “rerun the numbers using data up to the end of Q3”. The following parables show four different ways this might play out for our two protagonists from this common initial situation.

Parable #1: The Parable of Parameterization

Beth

Beth locates the Jupyter Notebook she used to run the numbers previously and copies it with a new name ending Q3. She changes the copy to use the new data and tries to run it but discovers that the Notebook does not work if run from start to finish. (During development, Beth jumped about in the Notebook, changing steps and rerunning cells out of order as she worked until the answers looked plausible.)

Beth spends the rest of Friday trying to make the analysis work on the new data, cursing her former self and trying to remember exactly what she did previously.

At 16:30, Beth texts her partner saying she might be a little late home.


Anne

Anne begins by typing

make test

to run the tests she created based on her Q2 analysis. They pass. She then puts the data in the data subdirectory, calling it data-2024Q3, and types

make analysis-2024Q3.pdf

Her Makefile’s pattern rule matches the Q3 data to the target output and runs her parameterized script, which performs the analysis using the new data. It produces the PDF, and issues a message confirming that consistency checks on the outputs passed. After checking the document, Anne issues it.

At 09:30, Anne starts to plan the rest of her Friday.

Computational notebooks, such as Jupyter Notebooks,1 have taken the data science community by storm, to the point that it is now often assumed that analyses will be performed in a Notebook. Although I almost never use them myself, I do use a web interface (Salvador) to my own Miró that from a distance looks like a version of Jupyter from a different solar system. Notebooks are excellent tools for ad hoc analysis, particularly data exploration, and offer clear benefits including the ability to embed graphical output, easy shareability, web approaches to handling wide tables, and facilitation of annotation of analysis in a spirit somewhat akin to literate programming. I do not wish to take away anyone's Notebooks, notwithstanding the title of this post. I do, however, see several key problems with the way Notebooks are used and abused. Briefly, these are:

  • Lack of Parameterization. I see Notebook users constantly copying Notebooks and editing them to work with new data, instead of writing parameterized code that handles different inputs. Anne's process uses the same program to process the Q2 and Q3 data. Beth's process uses a modified copy, which is significantly less manageable and offers more scope for error (particularly errors of process).

  • Lack of Automated Testing. While it is possible to write tests for Notebooks, and some tools and guides exist, e.g. Remedios 2021,2 in my experience it is rare for this to be done even by the standards of data science, where testing is less common than I would like it to be.

  • Out-of-order execution. In Notebooks, individual cells can be executed in any order, and state is maintained between execution steps. Cells may fail to work as intended (or at all) if the state has not been set up correctly before they are run. When this happens, other cells can be executed to patch up the state and then the failing cell can be run again. Not only can critical setup code end up lower down a Notebook than code that uses it, causing a problem if the Notebook is cleared and re-run; the key setup code can also be altered or deleted after it has been used to set the state. This is my most fundamental reservation about Notebooks, and it is not merely a theoretical concern. I have known many analysts who routinely leave Notebooks in inconsistent states that prevent them from running straight through to produce the results. Notebooks are fragile.

  • Interaction with Source Control Systems. Notebooks can be stored in source control systems like Git, but some care is needed. Again, in my experience, Notebooks tend not to be kept under version control, with the copy-paste-edit pattern (for whole Notebooks) being more common.

In my view, Notebooks should be treated as prototypes to be converted to parameterized, tested scripts immediately after development. This will often involve converting the code in cells (or groups of cells) into parameterized functions, something else that Notebooks seem to discourage. This is probably because cells provide a subset of the benefits of a callable function by visually grouping a block of code and allowing it to be executed in isolation. Cells do not, however, provide other key benefits of functions and classes, such as separate scopes, parameters, enhanced reusability, enhanced testability, and abstraction.
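As a sketch of what that conversion can look like (the module name helpers.py, the column names and the calculation are all invented for the example, not taken from any particular Notebook):

# helpers.py -- a hypothetical module extracted from a Notebook cell.
import pandas as pd

def add_price_per_unit(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a PricePerUnit column added.

    As a bare cell, this code mutated a global DataFrame; as a function it
    has its own scope and parameters, can be imported into other analyses,
    and can be unit tested.
    """
    out = df.copy()
    out["PricePerUnit"] = out["TotalPrice"] / out["Quantity"]
    return out

The Notebook cell then shrinks to from helpers import add_price_per_unit followed by df = add_price_per_unit(df).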

Anne's process has four key features that differ from Beth's.

  • Anne runs her code from a command line using make. If you are not familiar with the make utility, it is well worth learning about, but the critical point here is that Anne's setup allows her to use her process on new data without editing any code: in this case her make command is (intended to be) expanded to python analyse.py 2024Q3, which uses the parameter 2024Q3 both to locate the input data and to name the report it generates (see the Makefile sketch after this list).

  • Anne also benefits from tests she previously wrote, so has confidence that her code is behaving as expected on known data. This is the essence of what we mean by reference testing. While you might think that if Anne has not changed anything since she last ran the tests, they are bound to pass (as they do in this parable), this is not necessarily the case.

  • Anne's code also includes computational checks on the outputs. Of course, such checks can be included in a Notebook just as easily as they can be in scripts. The reason they are not is entirely because I am making one analyst a paragon of good practice and the other a caricature of sloppiness.

  • Finally, unlike Beth, Anne takes the time to check her outputs before sending them on. Once again, this is because Anne cares about getting correct results, and wants to find any problems herself, not because she does not use a Notebook.
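A minimal sketch of the kind of Makefile Anne might be using follows; the script name analyse.py, pytest as the test runner, and the data layout are inferred from the parable rather than taken from a real project, and recipe lines must be indented with tabs.

test:
	pytest

# make analysis-2024Q3.pdf matches this pattern rule with stem 2024Q3 ($*),
# so it runs python analyse.py 2024Q3, which is expected to read
# data/data-2024Q3 and write analysis-2024Q3.pdf.
analysis-%.pdf: data/data-% analyse.py
	python analyse.py $*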

Parable #2: The Parable of Testing

Beth

Beth copies, renames and edits her previous Notebook and is pleased to see it runs without error on the Q3 data. She issues the results and plans the rest of her Friday.

The following week, Beth's inbox is flooded with people saying her results are “obviously wrong”. Beth is surprised since she merely copied the Q2 analysis, updated the input and output file paths, and ran it. She opens her old Q2 Notebook and reruns all cells. She is dismayed to see all the values and graphs in the second half of the Notebook change.

Beth has some remedial work to do.


Anne

Anne runs her tests, but one fails. On examining the failing test, Anne realises that a change she made to one of her helper libraries means that a transformation she had previously applied in the main analysis is now done by the library, so should be removed from her analysis code.

After making this change, the tests (which were all based on the Q2 data) pass. Anne commits the change to source control before typing

make analysis-2024Q3.pdf

to analyse the new data. After sense checking the results, Anne issues them.

In the first parable, Beth's code would not run from start to finish; this time, it runs but produces different answers from when she ran it before using the Q2 data. This could be because she had failed to clear and re-run the Notebook to generate her final Q2 results, but here I am assuming that her results changed for the same reason as Anne's: they had both updated a helper library that their code used. Whereas Anne's tests detected the fact that her previous results had changed, Beth only discovered this when other people noticed her Q3 results did not look right (though had she checked her results, she might have noticed that something looked wrong). Anne is in a slightly better position than Beth to diagnose what went wrong, because her "correct" (previous) results are stored as part of her tests. Now that Beth has updated her Notebook, it may be harder for her to recover the old results. Even if she has access to the old and new results, Beth is probably in a worse position than Anne, because Anne has at least one test highlighting how the result has changed. This should allow her to make faster progress and gain confidence that her fix is correct more easily.
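A minimal sketch of the kind of reference (regression) test that puts Anne in this position, written for pytest and assuming the analysis can be run as a script that writes a CSV of key results; the script name analyse.py, the --out option and the stored expected-results path are all invented for the illustration.

# test_reference.py -- hypothetical reference test for the Q2 analysis.
import subprocess
import pandas as pd

def test_q2_results_unchanged(tmp_path):
    # Re-run the analysis on the frozen Q2 data, writing into a temporary directory.
    out = tmp_path / "results-2024Q2.csv"
    subprocess.run(["python", "analyse.py", "2024Q2", "--out", str(out)], check=True)

    # Compare against the results accepted when the Q2 analysis was originally issued.
    expected = pd.read_csv("tests/expected/results-2024Q2.csv")
    actual = pd.read_csv(out)
    pd.testing.assert_frame_equal(actual, expected)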

Parable #3: The Parable of Units

Beth

Again, Beth copies, renames and updates her previous Notebook and is happy to see it runs straight through on the Q3 data. She issues the results and looks forward to a low-stress day.

Around 16:00, Beth's phone rings and a stressed executive tells her the answers “can't be right” and need to be fixed quickly. Beth is puzzled. She opens her Q2 Notebook, re-runs it and the output is stable. That, at least, is good.

Beth now compares the Q2 and Q3 datasets and notices that the values in the PurchasePrice column are some three orders of magnitude larger in Q3 than in Q2, as if the data is in different units. She checks with her data supplier to confirm that this is the case, then sends some rebarbative emails, with the subject Garbage In, Garbage Out! Beth grumpily adds a cell to her Q3 notebook dividing the relevant column by 1,000. She then adds _fixed to the Q3 notebook's name to encourage her to copy that one next time. She wonders why everyone else is so lazy and incompetent.


Anne

As usual, Anne first runs her tests, which pass. She then runs the analysis on the Q3 data by issuing the command

make analysis-2024Q3.pdf

The code stops with an error:

Input Data Check failed for: PurchasePrice_kGBP
Max expected: GBP 10.0k
Max found: GBP 7,843.21k

(Anne's code creates a renamed copy of the column after load because she had noticed, while analysing Q2, that the prices were in thousands of pounds.)

Anne checks with her data supplier, who confirms a change of units, which will continue going forward. Anne persuades her data provider to change the field name for clarity, and to reissue the data.

Anne adds different code paths based on the supplied column names and adds tests for the new case. Once they pass, and she has received the updated data, Anne commits the change, runs and checks the analysis and issues the results.

This time, Anne is saved by virtue of having added checks to the input data, which Beth clearly had not (though, again, such checks could easily be included in a Notebook). This builds directly on the ideas of other articles in this blog, whether implemented through TDDA-style constraints or more directly as explicit checks on input values in the code.
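A sketch of the kind of explicit input check that produces an error like the one above, written as plain Python rather than via TDDA constraints; the column name and the 10.0 threshold come from the parable, and the rest is invented for the illustration.

import pandas as pd

MAX_EXPECTED_PRICE_KGBP = 10.0   # prices are expected to be in thousands of pounds

def check_purchase_price(df: pd.DataFrame) -> None:
    # Stop the analysis if PurchasePrice_kGBP looks as if it is in the wrong units.
    max_found = df["PurchasePrice_kGBP"].max()
    if max_found > MAX_EXPECTED_PRICE_KGBP:
        raise ValueError(
            "Input Data Check failed for: PurchasePrice_kGBP\n"
            f"Max expected: GBP {MAX_EXPECTED_PRICE_KGBP}k\n"
            f"Max found: GBP {max_found:,.2f}k"
        )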

Anne (being the personification of good practice) also noticed the ambiguity in the PurchasePrice variable and created a renamed copy of it for clarity. Note, however, that her check would have worked even if she had not created a renamed variable.

A third difference is that Anne has effected a systematic improvement in her data feed by getting the supplier to rename the field. This reduces the likelihood that the unit will be changed without flagging it, decreases the chances of its being misinterpreted, and allows Anne to have two paths through her single script, coping with data in either format safely. By re-sourcing the updated data, Anne also confirms that the promised change has actually been made, and that the new data looks correct.

Finally, Beth now has different copies of her code and has to be careful to copy the right one next time (hence _fixed). Anne's old code only exists in the version control system, and crucially, her new code safely handles both cases.

Parable #4: The Parable of Applicability

Beth

Beth dusts off her Jupyter Notebook and, as usual, copies it with a new name ending Q3. She makes the necessary changes to use the new data but it crashes with the error:

ZeroDivisionError: division by zero

After a few hours tracing through the code, Beth eventually realises that there is no data in Q3 for a category that had been quite numerous in the Q2 data. Her Notebook indexes other categories against the missing category by calculating their ratios. On checking with her data provider, Beth confirms that the data is correct, so adds extra code to the Q3 version of the Notebook to handle the case. She also makes a mental note to try to remember to copy the Q3 notebook in future.


Anne

Anne runs her tests by typing

make test

and enjoys watching the progress of the “green line of goodness”. She then runs the analysis on the Q3 data by typing

make analysis-2024Q3.pdf

but it stops with an error message: There is no data for Reference Store 001. If this is right, you need to choose a different Reference Store. After establishing that the data is indeed correct, Anne updates the code to handle this situation, checks that the existing tests pass, and adds a couple of regression tests to make sure that it copes not only with the default reference store having no data, but also with alternative reference stores. When all tests pass, she runs the analysis in the usual way, checks it, commits her updated code and issues the results.

As in the previous parable, the most important difference between Beth's approach and Anne's is that Beth's fix for their common problem is ad hoc and leads to further code proliferation3 with its concomitant risks if the analysis is run again. In contrast, Anne's code becomes more general and robust as it handles the new case along with the old, and she adds extra tests (regression tests) to try to ensure that nothing breaks the handling of this case in future. The “green line of goodness” mentioned is the name some testers use for the line of dots (sometimes green) many test frameworks issue each time a test passes.
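A sketch of the kind of guard and regression test this describes; the column names, the store identifiers and the indexing calculation are invented for the illustration, though the error message follows the parable.

import pandas as pd
import pytest

def index_against_reference(df: pd.DataFrame, reference_store: str = "001") -> pd.DataFrame:
    # Index each row's Sales against the total Sales of the reference store.
    ref_total = df.loc[df["Store"] == reference_store, "Sales"].sum()
    if ref_total == 0:
        raise ValueError(
            f"There is no data for Reference Store {reference_store}. "
            "If this is right, you need to choose a different Reference Store."
        )
    out = df.copy()
    out["Index"] = out["Sales"] / ref_total
    return out

def test_missing_reference_store_is_reported_clearly():
    df = pd.DataFrame({"Store": ["002", "003"], "Sales": [10.0, 20.0]})
    with pytest.raises(ValueError, match="Reference Store 001"):
        index_against_reference(df, reference_store="001")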


So there we have it. On a more constructive note, I have been following the progress of Quarto with interest. Quarto is a development of RMarkdown, a different style of computational document popular in the R and Reproducible Research communities.4 To my mind it has fewer of the problems highlighted here. It also supports Python and much of the Python data stack as first-class citizens, and in fact integrates closely with Jupyter, which it uses behind the scenes for many Python-based workflows. I have been using it over the last couple of days, and though it is still distinctly rough around the edges, I think it offers a very promising way forward, with excellent output options that include PDF (via LaTeX), HTML, Word documents and many other formats. It's both an interesting alternative to Notebooks and (perhaps more realistically) a reasonable target for migrating code from Notebook prototypes. I use most of the cells to call functions imported at the start, promoting code re-use and parameterization, which avoids another of the pitfalls of Notebooks (in practice) discussed above.


  1. I am using the term Jupyter Notebook to cover both what are now called Jupyter Notebooks (“The Classic Notebook Interface”) and JupyterLab (the “Next-Generation Notebook Interface”). This is both because most people I know continue to call them Jupyter Notebooks, even when using JupyterLab, and because “Notebooks” reads better in the text. I will capitalize Notebook when it refers to a computational notebook, as opposed to a paper notebook. 

  2. Remedios 2021, How to Test Jupyter Notebooks with Pytest and Nbmake, 2021-12-14. 

  3. analysis_fixed_fixed2_final.ipynb 

  4. Reproducible research is very much a sister movement to TDDA—a much larger sister movement. Its goals are similar and it is wholly congruent with all the ideas of TDDA. To the extent that there is divergence, some of it simply arises from separate evolution, and some from the fact that the focus of reproducible research is more on allowing other people to access your code and data, to run it themselves and verify the outputs, or to write their own analysis to verify your results even more strongly, or to use your code on their data as a different sort of validation. I sometimes call TDDA “reproducible research for solipsists”, because of its greater focus on testing, and on helping to discover and eliminate problems even if no second person is involved. Another related area I have recently become aware of is Veridical Data Science, as developed by Bin Yu and Rebecca Barter. The link is to their book of that name. 


An Adware Malware Story Featuring Safari, Notification Centre, and Box Plots

Posted on Sun 22 September 2024 in misc

This is not, primarily, an article about TDDA, but I thought it was worth publishing here anyway. Itʼs a story about a kind of adware/malware incident I had this morning—with digressions about box plots.

Digression

I was doing some research for a book (on TDDA), looking up information on box plots, also known as box-and-whisker diagrams. When I first came across box plots, I assumed the “box” in the name was a reference to the literal “box” part of a traditional box plot. If you are not familiar with box plots, they typically look like the ones shown in Wikipedia:1

Box plots for the Michelson-Morley experiment

There are variations, but typically the central line represents the median, the “box” delineates the interquartile range, the whiskers extend to either the minimum and maximum or, sometimes, other percentiles, such as 1 and 99. When the minimum and maximum are not used, outliers beyond those extents can be shown as individual points, as seen here for experiments 1 and 3.
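If you want to draw a comparable plot yourself, a minimal matplotlib sketch (with made-up data rather than the Michelson-Morley measurements) looks like this:

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
data = [rng.normal(loc, 1.0, size=100) for loc in (8.5, 8.3, 8.4)]   # three made-up samples

fig, ax = plt.subplots()
ax.boxplot(data)   # box = interquartile range, central line = median,
                   # whiskers and outlier points per matplotlib's defaults
ax.set_xticks([1, 2, 3])
ax.set_xticklabels(["Sample 1", "Sample 2", "Sample 3"])
plt.show()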

At some point after learning about box plots, I became aware of the statistician George Box (he of “All models are wrong, but some models are useful”2 fame) and ended up believing that box plots had in fact been invented by him (and should, therefore, arguably be called “Box plots” rather than “box plots”). Whether someone misinformed me or my brain simply put 2 and 2 together to make about 15, Tufte5 (who advocates his own “reduced box plot”, in line with his principle of maximizing data ink and minimizing chart junk) states definitively that the box plot was in fact a refinement by John Tukey3 of Mary Eleanor Spearʼs “range bars”.4 So I was wrong.

Back to the malware

Anyway, back to the malware. I was clicking about on image search results for searches like box plot "George Box" and hit a site that gave one of the ubiquitous “Are you a human?” prompts that sometimes, but not always, act as a gateway to solving CAPTCHAs to train AI models. But this one didnʼt seem to work. I closed the tab and moved on, but soon after started getting highly suspicious-looking pop-up notifications like these:

Malware/ad-ware notifications from ask you

These are comically obviously not legitimate warnings from anything that I would knowingly allow on a computer, which made me less alarmed than I might otherwise have been. But clearly something had happened as a result of clicking an image search result and an “I am not a robot” dialogue.

I wonʼt bore you with a blow-by-blow account of what I did, but the key points are that

  • Killing that tab did not stop the notifications.
  • Nor did stopping and restarting Safari and bringing back all the old windows.
  • Nor did stopping and restarting Safari without bringing back any tabs or windows.
  • Nor did deleting todayʼs history from Safari.

So I did some web searches, almost all of the results of which advocated downloading sketchy “anti-malware” products to clean my Mac, which I was never going to do. Eventually, I came across the suggestion that it might be a site that had requested and been given permission to put notifications in Notification Centre. I think I was only half-aware that this was a possible behaviour, but it turns out that (on macOS Ventura 13.6.9, with Safari 17.6) Safari → Settings → Websites has a Notifications section on the left that looks like this:

Safari's Websites → Notifications settings, showing which sites are allowed or denied permission to send notifications

I must have been aware of this at various points, because I had set the four websites at the bottom of the list to Deny, but I had not noticed the Allow websites to ask for permission to send notifications checkbox, which was enabled (but is now disabled). The top one—looks suspicious, dunnit?—was set to Allow when I went in. I have a strong suspicion that the site tricked me into giving it permission by getting me to click something that did not appear to be asking for such permission. I suspect it hides its URL by using a newline or a lot of whitespace, which is why it does not show up in the screenshot above.

Setting that top (blank-looking) site to Deny and (as a belt-and-braces preventative measure) unchecking the checkbox so that sites are not even allowed to ask for permission to put popups in Notification Centre had the desired effect of making the popups stop. I believe this constitutes a full fix and that no data was being exfiltrated from the Mac, despite the malicious notification. I will probably also Remove at least that top site (with the Remove button) in the future, but will leave it there for now in case Apple (or anyone else) can tell me how to find out what site it comes from.

I also found (but cannot now find again) an option to reset the notifications from those sites. This was the extremely confusing dialogue for the site in question.

Reset notifications dialogue for

I think whatʼs going on here is that the text the site uses to identify itself to Safari when asking for permission consists of the following, including the new lines:

ask you

Confirm that you're not a robot, you need to click Allow

This makes reading the dialogue quite hard and confusing. Looking more carefully at Notification Centre, I also see this:

Notification Centre entry from “ask you” containing the permission-request text

I don't quite understand whether this is an image forming a notification, or an image included in some other notification, but offset, or something else. Whatever it is, it consists of (or includes) white text saying

ask you

Confirm that you're not a
robot, you need to click Allow

with a little bit of light grey background around the letters.

I donʼt entirely understand why the site would have used barely readable white text on a light grey background like this, but I presume somehow this text was involved in getting me to click the “I am not a robot” dialogue (which I believe to be the only click I performed on the site).

Anyway, the long and the short of it is that if anyone else runs into this, my recommendations (which do not come from a security expert, so use your own judgement) are:

  1. Donʼt download a random binary from the internet to remove spyware.
  2. Try to find the Safari Preference for Notifications under Websites and see if you have a sketchy-looking entry like mine. If so, set that to Deny.
  3. Probably also remove that site with the Remove button.
  4. Consider turning off the ability for sites to request permission to put notifications in Notification Centre if this is not something you want, or that no site you care about needs.

  1. Taken from Wikipedia entry on box plots, own work by Wikipediaʼs User:Schutz (public domain). 

  2. George Box (1919-2013): a wit, a kind man and a statistician. Obituary by Julian Champkin, 4 April 2013. https://significancemagazine.com/george-box-1919-2013-a-wit-a-kind-man-and-a-statistician-2/ 

  3. Exploratory data analysis, John Tukey, Reading/Addison-Wesley, 1977. 

  4. Charting Statistics, Mary Eleanor Spear, McGraw Hill, 1952. 

  5. The Visual Display of Quantitative Information, Edward R. Tufte, Graphics Press, 1984. 


PyData London 2024 TDDA Tutorial

Posted on Sun 21 July 2024 in TDDA • Tagged with TDDA, tutorial

PyData London had its tenth conference in 2024, and it was excellent.

I gave a tutorial on TDDA, and the video is available on YouTube and below:

The slides are also available here.