Over the holidays, I’ve been thinking about what 2025’s progress in AI coding tools will mean for how software gets designed, built, and operated in 2026.
The primary impact of LLM tooling, so far, is that the marginal cost (in both time and dollars) of producing high-quality code has gone down significantly. Of course, producing code is only part of the full job of software engineering, so the bottlenecks for engineering time will shift elsewhere.
To start, what exactly are we trying to do here, as software engineers? As a vague but hopefully somewhat useful definition: building, evolving, and operating distributed software systems that provide some concrete business utility. The “building” component has noticeably become cheaper with LLMs, and “evolving” systems has also become easier. “Operating” systems, from what I’ve seen, has for now been least impacted by LLMs.
The “business utility” goal will also change company-by-company, and engineering org-by-org. The most obvious split for this is infrastructure versus product orgs, where I’d expect product orgs to get more of an uplift from LLM coding than infrastructure – LLMs seem to grok frontend particularly well, and there tends to be more greenfield product work than in infrastructure.
The market will expect SWEs to extract productivity gains from LLMs. The field broadly seems poised to become more mechanized, but more productive as a result. There’s a re-skilling and mindset shift that’s been accelerating for a few months, but most of the effects of this have yet to be fully realized.
Here are some shifts I expect to accelerate in 2026:
Infrastructure Abstractions
Returns to good infrastructure abstractions compound faster. Can you roll out binaries fast (and roll back with similar speed)? Do you have out-of-the-box ways to quickly spin up new compute / backends for the things you’re serving?
All the core infra pieces remain important: metrics, logging, incident management, feature flags, releases, autoscaling, orchestration, workflow engines, configuration, caching, networking, etc. Companies will be well-served by making these pieces of core infra easy to use for both humans and LLMs. Infrastructure should be made as self-service as possible, with friendly CLIs or MCP-ready APIs, and with minimal infra-engineer-in-the-loop required to unblock human and AI users.
CI Infrastructure
Quality, fidelity, and speed of CI infrastructure become even more important as AI agents write more of the code. Perhaps we need to rethink the unit test and invest more in things like property testing and formal verification for the lower-level pieces of the stack.
Humans tend not to like writing tests – they’re not fun to write, they’re mechanical, and they generally feel like a tax on the effort that could otherwise be spent writing flashy implementation code. LLMs have no such qualms. We have no excuse for not having near-exhaustive test scenario coverage.
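To make the property-testing idea concrete, here is a minimal sketch in Python. It is hand-rolled on top of the standard library's `random` module, so it is only an illustration; dedicated property-testing libraries add input shrinking and smarter generation. The run-length codec and every function name here are hypothetical stand-ins for whatever low-level component is under test.

```python
import random


def run_length_encode(text: str) -> list[tuple[str, int]]:
    """Collapse runs of repeated characters into (char, count) pairs."""
    encoded: list[tuple[str, int]] = []
    for ch in text:
        if encoded and encoded[-1][0] == ch:
            # Extend the current run.
            encoded[-1] = (ch, encoded[-1][1] + 1)
        else:
            # Start a new run.
            encoded.append((ch, 1))
    return encoded


def run_length_decode(pairs: list[tuple[str, int]]) -> str:
    """Expand (char, count) pairs back into the original string."""
    return "".join(ch * count for ch, count in pairs)


def check_round_trip(trials: int = 500, seed: int = 0) -> int:
    """Check properties on many random inputs instead of a few fixed cases."""
    rng = random.Random(seed)  # fixed seed keeps CI runs reproducible
    for _ in range(trials):
        length = rng.randint(0, 30)
        # A tiny alphabet makes long runs (the interesting case) likely.
        text = "".join(rng.choice("ab c") for _ in range(length))
        encoded = run_length_encode(text)
        # Property 1: decoding an encoding returns the input unchanged.
        assert run_length_decode(encoded) == text
        # Property 2: adjacent runs never share a character.
        assert all(a[0] != b[0] for a, b in zip(encoded, encoded[1:]))
    return trials


if __name__ == "__main__":
    print(f"{check_round_trip()} random cases passed")
```

The point of the sketch is the shape of the test: the properties (round-trip, no mergeable neighbors) are stated once, and the machine supplies the cases, which is the kind of mechanical coverage that is cheap to ask an LLM agent to write and maintain.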
Human-guided Abstractions
Crisp human-guided abstractions become all the more important. LLMs, without strong guidelines, will slop-fill greedy solutions to make CI checks pass, increasing spaghettification over time. Well-informed intuition, a well-developed “systems taste,” is still required upfront. Module boundaries, library interfaces, and contracts between the infrastructure and product layers become increasingly high-leverage tools for maintaining long-term code quality. Systems that lack these crisp boundaries will accumulate technical debt faster.
LLM-generated code is not guaranteed to be high quality. While quality has increased significantly over the past year, it’s still quite easy to drown oneself in technical debt with a few poorly constructed PRs.
Human Code Review
Human code review increasingly becomes an important bottleneck. A new “review taste” needs to be developed. As much as possible, stylistic concerns should be pushed into automated lints that run pre-merge and, ideally, are run by the LLM agents pre-commit. Human code review should differentially focus on decisions that cannot easily be codegen’d away later – things like interface changes, sensitive code involving data persistence, and performance-critical code still need high scrutiny. This creates a paradox for junior engineers: they need to develop “review taste” earlier, but are doing less of the “writing” that builds that intuition.
We’ll need to ask, collectively: What things, though suboptimal, are stylistically permissible to be checked in? What things must never be checked in? What things are the new slippery slope code smells? How much of code review itself can be automated?
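One way to picture the split between automated checks and human scrutiny is a small triage step that flags a change for careful review only when it touches the categories named above. This is a rough sketch, not a recommendation of any particular tool: the path patterns, category labels, and function names are all assumptions made up for illustration, and a real repository would define its own.

```python
from fnmatch import fnmatchcase

# Hypothetical path patterns; every repository would choose its own.
HIGH_SCRUTINY_PATTERNS: dict[str, list[str]] = {
    "interface change": ["api/*", "*.proto"],
    "data persistence": ["migrations/*", "storage/*"],
    "performance critical": ["hotpath/*"],
}


def review_reasons(changed_files: list[str]) -> dict[str, list[str]]:
    """Map each high-scrutiny category to the changed files that triggered it."""
    reasons: dict[str, list[str]] = {}
    for path in changed_files:
        for reason, patterns in HIGH_SCRUTINY_PATTERNS.items():
            # fnmatchcase's "*" also matches "/" so "api/*" covers nested paths.
            if any(fnmatchcase(path, pattern) for pattern in patterns):
                reasons.setdefault(reason, []).append(path)
    return reasons


def needs_human_review(changed_files: list[str]) -> bool:
    """True when at least one changed file falls in a high-scrutiny category."""
    return bool(review_reasons(changed_files))


if __name__ == "__main__":
    change = ["api/v1/users.py", "docs/readme.md"]
    print(review_reasons(change))
    print(needs_human_review(["docs/readme.md"]))
```

Path matching is a crude proxy for "decisions that are hard to undo later", so a sketch like this can only route attention; it cannot replace the judgment about what the change actually does.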
Project Timeline Estimates Increase in Variance
I expect variance on project estimates to go up significantly. The extent to which a task can be LLM-ified increasingly influences its wall-time cost. This adds pressure for high-value projects to be nudged in ways that make them more LLM-amenable, but this often isn’t possible. The highest-value projects that most need de-risking are often the ones least amenable to LLM assistance, because they require deep context, involve low-level systems, or have high blast radius.
Some tasks that previously would have been a long haul are now easier (e.g. code-centric migrations or inter-language/system translations). Some tasks remain relatively stable in their difficulty (e.g. networking).
Impact of AI on “Build vs. Buy” Decisions
Does the falling price of code influence the “build vs. buy” distinction for SaaS in a meaningful way? My guess is that on the margin, “yes”, but in big ways, “no”. For commodity SaaS that’s mostly a thin UI over CRUD, the build-vs-buy calculus will shift toward building, at least for medium-to-large tech companies with a competent IT arm. For infrastructure-as-a-service or compliance-as-a-service, the calculus won’t shift much because operating costs for an in-house system haven’t fallen the way development costs have.
Open Questions:
- Do we still require human review of every line of code? How load-bearing is that? For what systems is a fine-tooth comb required and what is truly vibe-code-able?
- What is the best way to “add bits to beat slop” for software engineers?
- What things change with 100x or 1000x faster, cheaper models?
- One lightbulb moment: It will become cheap enough to run an LLM on every emitted service log. Right now, this seems nonsensical/pointless. But one could imagine some utility there, for example in helping debug incidents. I’ve already started to see promising demos for targeted, automated LLM copilots for incident debugging.