The rise of coding agents has made it easy for a single engineer to spend thousands of dollars a day on LLM tokens. This is a new class of expense, and it will change the future cost structure of software engineering. Software engineering today sits between stable equilibria: the old one, where humans drove every code change, and a yet-to-be-established new one, where AI agents write most code.
Taking as a premise that AI agents will write a large fraction of code in the new equilibrium, we will need to rethink the resulting cost structure for engineering orgs. In the long term, token spend will become a large portion of enterprise OpEx, split into two classes: spend attributable to a human worker (e.g. Claude Code, internal AI tooling like call center assistants) and spend attributable to automated systems (e.g. agents which respond autonomously to customers by fielding calls/emails, agents which monitor business systems).
Automated token spend is analogous to cloud cost: at equilibrium, it “should” roughly scale linearly with product usage/revenue. Human-generated token spend seems like a new class of expense: a resource that does not scale linearly with headcount and has no natural ceiling. The per-person absorption rate is whatever an engineer’s workflow can productively use, and that ceiling is itself rising as new harnesses, workflows, and practices are developed. Human-attributable token spend is therefore the more interesting type of token usage from a decision-making perspective: it’s effectively an expensive “more productivity” button. Rationally, you should keep pushing the button until the marginal return matches the marginal cost of pushing it again.
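The stopping rule can be sketched with a toy model (all numbers here are illustrative assumptions, not real data): suppose an engineer’s daily output gain from token spend has diminishing returns, say a log curve. Setting the marginal return equal to the marginal cost gives the rational stopping point:

```python
import math

def productivity_gain(tokens_usd: float, scale: float = 500.0) -> float:
    """Dollar value produced by spending `tokens_usd` on tokens.
    Toy model: diminishing returns via a log curve; `scale` is made up."""
    return scale * math.log1p(tokens_usd)

def optimal_spend(scale: float, marginal_cost: float) -> float:
    """Push the button until marginal return equals marginal cost:
    d/dt [scale * ln(1 + t)] = scale / (1 + t) = marginal_cost
      =>  t* = scale / marginal_cost - 1  (floored at zero)."""
    return max(scale / marginal_cost - 1.0, 0.0)

# At retail prices ($1 of cost per $1 of tokens), stop around $499/day.
print(optimal_spend(scale=500.0, marginal_cost=1.0))
# With near-marginal-cost access (say $0.10 per $1 of list price),
# the balance point moves an order of magnitude further out.
print(optimal_spend(scale=500.0, marginal_cost=0.1))
```

The specific curve doesn’t matter; the point is that the optimum is finite under diminishing returns, and that a lower marginal cost pushes the rational stopping point much further out.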
This framing also explains why the “Tokenmaxxing” meme started. Tokenmaxxing is the recent idea that consuming more tokens makes you more “AI native” and therefore more productive. AI adoption is path dependent; engineers are hesitant to change their patterns. Leaders noticed that engineers were opting to push the “more productivity” button at an irrationally low rate. The short-term “fix” is to directly incentivize pushing the button – with the clear risk that you overshoot into people Goodharting on “push the button as much as I can” vs. “increase my productivity to the point of diminishing returns”. And, predictably, “oops, we overshot” becomes a narrative – at least for the organizations that aren’t able to keep finding additional efficient uses of AI that expand their production frontier.
A corollary: if your marginal token cost is negligible, then your usage of AI should go way, way up. Right now, Anthropic and OpenAI are clearly in the lead for coding model capabilities. They have lots of compute. The amount of compute needed to completely saturate their engineers’ token demand is likely a small fraction of what they use on their marginal research projects. If you have the hardware and marginal-cost access to frontier models, you should probably actually be Tokenmaxxing, since the balance point of “marginal return = marginal cost” sits much further out when marginal cost is so much lower.1
Whether you should Tokenmaxx in the short term depends on where you sit on the marginal cost curve. If your marginal token cost is high (e.g. paying Anthropic for Opus tokens at retail prices), the just-spend-more heuristic will overshoot. If it’s low, you probably should push harder.
In the long(er) term, the cost curve is shaped by how effectively an organization can absorb additional tokens. Companies will invest in getting more productive use per token, through both technical improvements (e.g. connectivity between AI tools and internal knowledge silos) and operational work (e.g. workflow redesign, changing norms of when it’s appropriate to substitute AI artifacts for human-generated ones). Token spend is more elastic than headcount, and the space is new enough that the efficiency frontier is still being discovered and reshaped.
My bet is that most efficiency gains won’t become competitive moats, because development techniques diffuse too quickly and the underlying models are available to everyone. They will, however, become a low-water mark. Companies that fall below it will be outcompeted by firms that don’t. In this way, AI adoption is a Red Queen race, requiring increasing efficiency/usage just to not become irrelevant.
None of this is stable yet. We’re currently between stable equilibria in many areas in software development. As such, it’s a particularly exciting time to be working in this space.
-
To some extent, frontier labs also have to consider the opportunity cost of using compute for engineering instead of research or inference, i.e. the marginal token can either go to the marginal research project, the marginal internal engineer, or the marginal external customer inference token. That opportunity cost is not zero, but it’s a different set of tradeoffs than external companies buying retail tokens. Frontier lab investment in R&D via “engineer tokens” also increases returns to marginal research compute, and “engineer tokens” are likely a quite small fraction of either research or inference compute.