Over the holidays, I’ve been thinking about what 2025’s progress in AI coding tools will mean for how software gets designed, built, and operated in 2026.
The primary impact of LLM tooling, so far, is that the marginal cost (in both time and dollars) of producing high-quality code has gone down significantly. Of course, producing code is only part of the full job of software engineering, so the bottlenecks for engineering time will shift elsewhere.
To start, what exactly are we trying to do here, as software engineers? As a vague but hopefully somewhat useful definition: building, evolving, and operating distributed software systems that provide some concrete business utility. The “building” component has become noticeably cheaper with LLMs, and “evolving” systems has also become easier. “Operating” systems, from what I’ve seen, has so far been the least affected by LLMs.
The “business utility” goal will also vary company by company, and engineering org by org. The most obvious split here is infrastructure versus product orgs, where I’d expect product orgs to get more of an uplift from LLM coding than infrastructure orgs – LLMs seem to grok frontend particularly well, and there tends to be more greenfield product work than greenfield infrastructure work.
The market will expect SWEs to extract productivity gains from LLMs. The field broadly seems poised to become more mechanized, but more productive as a result. There’s a re-skilling and mindset shift that has been accelerating for a few months, but most of its effects have yet to be fully realized.
Here are some shifts I expect to accelerate in 2026:
Infrastructure Abstractions
Returns to good infrastructure abstractions compound faster. Can you roll out binaries fast (and roll back with similar speed)? Do you have out-of-the-box ways to quickly spin up new compute / backends for the things you’re serving?
All the core infra pieces remain important: metrics, logging, incident management, feature flags, releases, autoscaling, orchestration, workflow engines, configuration, caching, networking, etc. Companies will be well-served by making these pieces of core infra easy to use for both humans and LLMs. Infrastructure should be made as self-service as possible, with friendly CLIs or MCP-ready APIs, and with minimal infra-engineer-in-the-loop required to unblock human and AI users.
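As a toy illustration of what “self-service” can mean in practice, here is a minimal sketch of a single CLI entrypoint that a human or an LLM agent could drive for routine deploy actions. The `rollout`/`rollback` functions are hypothetical placeholders for whatever deploy system you actually run; the structured JSON output is the point, since it is equally consumable by people, scripts, and agents.

```python
# Minimal sketch of a self-service deploy CLI. The backend calls are
# hypothetical placeholders; the shape (one entrypoint, structured output)
# is what matters for humans and LLM agents alike.
import argparse
import json
import sys


def rollout(service: str, version: str) -> dict:
    # Placeholder for a call into your deploy system (hypothetical).
    return {"action": "rollout", "service": service, "version": version, "status": "started"}


def rollback(service: str) -> dict:
    # Placeholder for a call into your deploy system (hypothetical).
    return {"action": "rollback", "service": service, "status": "started"}


def main() -> None:
    parser = argparse.ArgumentParser(prog="infra", description="Self-service deploy actions")
    sub = parser.add_subparsers(dest="command", required=True)

    p_out = sub.add_parser("rollout", help="Roll out a new version of a service")
    p_out.add_argument("service")
    p_out.add_argument("version")

    p_back = sub.add_parser("rollback", help="Roll back a service to its previous version")
    p_back.add_argument("service")

    args = parser.parse_args()
    result = rollout(args.service, args.version) if args.command == "rollout" else rollback(args.service)

    # Structured output: equally consumable by a human, a script, or an agent.
    json.dump(result, sys.stdout, indent=2)


if __name__ == "__main__":
    main()
```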
CI Infrastructure
The quality, fidelity, and speed of CI infrastructure become even more important as AI agents write more of the code. Perhaps we need to rethink the unit test and invest more in things like property testing and formal verification for the lower-level pieces of the stack.
Humans tend not to like writing tests – they’re not fun to write, they’re mechanical, and they generally feel like a tax on effort that could otherwise be spent writing flashy implementation code. LLMs have no such qualms. We have no excuse for not having near-exhaustive test-scenario coverage.
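As a sketch of what leaning on property testing can look like (using Python’s Hypothesis library; the function under test is a stand-in for real business logic), the test below asserts invariants over arbitrary generated inputs rather than a handful of hand-picked cases:

```python
# Property-based test sketch using Hypothesis: assert invariants that must
# hold for *any* input, and let the framework search for counterexamples.
from hypothesis import given, strategies as st


def dedupe_preserving_order(items: list[int]) -> list[int]:
    # Stand-in for real logic under test.
    seen: set[int] = set()
    out: list[int] = []
    for x in items:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out


@given(st.lists(st.integers()))
def test_dedupe_properties(items: list[int]) -> None:
    result = dedupe_preserving_order(items)
    # No duplicates survive.
    assert len(result) == len(set(result))
    # Nothing is invented or lost (as a set).
    assert set(result) == set(items)
    # First-occurrence order is preserved.
    assert all(items.index(a) < items.index(b) for a, b in zip(result, result[1:]))
```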
Human-guided Abstractions
Crisp, human-guided abstractions become all the more important. LLMs, without strong guidelines, will slop-fill greedy solutions to make CI checks pass, increasing spaghettification over time. Well-informed intuition and well-developed “systems taste” are still required upfront. Things like module boundaries, library interfaces, and contracts between the infrastructure and product layers become an increasingly high-leverage set of levers for maintaining long-term code quality. Systems that lack these crisp boundaries will accumulate technical debt faster.
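One way to keep such a boundary crisp – a sketch with hypothetical names, not a prescription – is to pin the contract down as an explicit interface that product code depends on, independent of whatever infra implementation sits behind it:

```python
# Illustrative contract between the infrastructure and product layers.
# Product code depends only on the Protocol; the backend can be swapped
# without touching product code. All names here are hypothetical.
from typing import Protocol


class DocumentStore(Protocol):
    def put(self, key: str, value: bytes) -> None: ...
    def get(self, key: str) -> bytes | None: ...


class InMemoryStore:
    """A trivial infra-layer implementation that satisfies the contract."""

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> bytes | None:
        return self._data.get(key)


def save_profile(store: DocumentStore, user_id: str, profile: bytes) -> None:
    # Product-layer code sees only the contract, never the backend.
    store.put(f"profile/{user_id}", profile)
```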
LLM-generated code is not guaranteed to be high quality. While quality has increased significantly over the past year, it’s still quite easy to drown oneself in technical debt with a few poorly constructed PRs.
Human Code Review
Human code review increasingly becomes an important bottleneck. A new “review taste” needs to be developed. As much as possible, stylistic concerns should be pushed into automated lints that run pre-merge and, ideally, are run by LLM agents pre-commit. Human code review should differentially focus on decisions that cannot easily be codegen’d away later – things like interface changes, sensitive code involving data persistence, and performance-critical code still need high scrutiny. This creates a paradox for junior engineers: they need to develop “review taste” earlier, but are doing less of the “writing” that builds that intuition.
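A sketch of what such a pre-merge gate might look like – the specific tools (ruff, mypy) are just examples of checks a repo could standardize on, not a recommendation:

```python
# Pre-merge gate sketch: run the repo's formatters, linters, and type checks,
# and fail the check if any of them do. Tool choices here are examples.
import subprocess
import sys

CHECKS = [
    ["ruff", "check", "."],              # style and common bug patterns
    ["ruff", "format", "--check", "."],  # formatting drift
    ["mypy", "."],                       # type/interface drift
]


def main() -> int:
    failed = False
    for cmd in CHECKS:
        print(f"$ {' '.join(cmd)}")
        if subprocess.run(cmd).returncode != 0:
            failed = True
    return 1 if failed else 0


if __name__ == "__main__":
    sys.exit(main())
```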
We’ll need to ask, collectively: What things, though suboptimal, are stylistically permissible to be checked in? What things must never be checked in? What things are the new slippery slope code smells? How much of code review itself can be automated?
Project Timeline Estimates Increase in Variance
I expect variance on project estimates to go up significantly. The extent to which a task can be LLM-ified increasingly influences its wall-clock cost. This adds pressure to nudge high-value projects in ways that make them more LLM-amenable, but that often isn’t possible. The highest-value projects that most need de-risking are often the ones least amenable to LLM assistance, because they require deep context, involve low-level systems, or have a high blast radius.
Some tasks that previously would have been a long haul are now easier (e.g. code-centric migrations or inter-language/system translations). Other tasks remain relatively stable in their difficulty (e.g. networking).
Impact of AI on “Build vs. Buy” Decisions
Does the falling price of code influence the “build vs. buy” distinction for SaaS in a meaningful way? My guess is that on the margin, “yes”, but in big ways, “no”. For commodity SaaS that’s mostly a thin UI over CRUD, the build-vs-buy calculus will shift toward building, at least for medium-to-large tech companies with a competent IT arm. For infrastructure-as-a-service or compliance-as-a-service, the calculus won’t shift much, because the operating costs of an in-house system haven’t fallen the way development costs have.
Open Questions:
- Do we still require human review of every line of code? How load-bearing is that? For what systems is a fine-tooth comb required, and what is truly vibe-code-able?
- What is the best way to “Add bits to beat slop” for software engineers?
- What things change with 100x or 1000x faster, cheaper models?
- One lightbulb moment: it will become cheap enough to run an LLM over every emitted service log. Right now, this seems nonsensical/pointless, but one could imagine some utility there, for example in helping debug incidents. I’ve already started to see promising demos of targeted, automated LLM copilots for incident debugging; a rough sketch of what a log-scanning pass might look like follows below.
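A rough sketch of what such a log-scanning pass could look like, with a hypothetical `call_llm` standing in for a real model client and batching as a crude cost control:

```python
# Sketch: run an LLM pass over emitted service logs to flag lines that look
# relevant to an incident. `call_llm` is a hypothetical stand-in for a real
# model client; the cost at full log volume is what makes this impractical today.
from collections.abc import Iterable


def call_llm(prompt: str) -> str:
    """Hypothetical model call; replace with a real client."""
    raise NotImplementedError


def flag_anomalous_lines(log_lines: Iterable[str], batch_size: int = 50) -> list[str]:
    flagged: list[str] = []
    batch: list[str] = []
    for line in log_lines:
        batch.append(line)
        if len(batch) == batch_size:
            flagged.extend(_scan_batch(batch))
            batch = []
    if batch:
        flagged.extend(_scan_batch(batch))
    return flagged


def _scan_batch(batch: list[str]) -> list[str]:
    prompt = (
        "You are helping debug a production incident. From the log lines below, "
        "return only the lines that look anomalous or relevant to a failure.\n\n"
        + "\n".join(batch)
    )
    return [line for line in call_llm(prompt).splitlines() if line.strip()]
```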