I’ve recently been contemplating a recurring pattern that I’ve observed in several teams I’ve worked on – the ‘Load-Bearing Script.’ The outline of this pattern goes like this: A team member writes a portion of a system as a shell script for a quick prototype. That shell script, initially quite simple, takes on more responsibilities over time. Eventually, its complexity becomes unmanageable. At that point, it needs to be rewritten in a more maintainable and testable language.

In my experience, this usually manifests itself as a bash script, though any untested/untestable “script” can exhibit this pattern.


In one case, we were building a system that needed to execute in a CI builder environment. We wanted to do some basic CI/CD work, and so the script was initially a simple wrapper around git and Kubernetes commands. Eventually, much of our system’s core business logic found its way into the script (metrics collection, a basic killswitch system, retry logic, etc.). This system was particularly challenging to manage because the script wasn’t even static. Our backend used Go templates to assemble the script dynamically and send it to the CI environment. Our only testing was sanity checks that our templater produced sensible output, and limited end-to-end testing.

In another case, my company had a requirement to run certain workloads (again, CI/CD type actions) in a specific compute environment. This compute environment made it very easy to execute bash scripts, and added friction to running team-built binaries. My team did ship our own binaries to this environment, but for reasons that aren’t defensible in retrospect, we still allowed business logic to creep into the script portion.

In both cases, we did a (fairly risky) rewrite. In both cases, the rewrite resulted in moderate-severity incidents, despite our best efforts to roll it out safely.

Why does this happen?

I can think of a number of reasons: Shell scripts are easy to prototype with. They’re an attractive option when you require ‘just a small amount of logic’ and wish to avoid the complexities of a build system, types, or tests. Software developers enjoy avoiding over-engineering almost as much as they enjoy over-engineering.

Why is this bad?

The primary reason I distrust load-bearing scripts is that they make systems unstable. The instability most often comes from the inability (or difficulty in) adding sufficient test coverage. Yes, there are frameworks for bash script testing! I’ve rarely seen them effectively used. Usually, a load-bearing script comes into existence because the work it is doing is difficult to test (for example, wrapping multiple dependent CLI tools in a CI environment). The load-bearing script becomes problematic because it becomes difficult to change. The script’s complexity surpasses a point where manual testing or limited end-to-end tests can prevent issues – and so, breakages will happen.

The secondary reason load-bearing scripts are nefarious is that you will eventually have to do a rewrite. It becomes inevitable. Either you accept permanent instability or you do the rewrite. The longer you delay the rewrite, the more painful it is. There will be pushback against the rewrite: the rewritten script needs to be feature-compatible with the old system; the rewritten script needs to be released safely; rewriting the script will consume valuable developer time that could be spent working on Shiny New Features. But eventually, the scales tip towards the rewrite.

Advice to myself

If your script becomes larger than what’d be appropriate to store in a single reasonably sized function, it should no longer be a script. Prefer to bail early on the shell script and eat the cost of a simple rewrite, rather than let technical debt continue to accrue.