An Inconvenient Truth

Salvatore Sanfilippo, aka antirez, the creator of Redis, published a piece reflecting on AI at the end of 2025. He closes with this statement:

The fundamental challenge in AI for the next 20 years is avoiding extinction.

I’ve been hesitant to say this publicly, but I broadly agree with this statement. Here are a few related statements I endorse:

Building intelligences smarter than humans is dangerous.
Aligning a smarter-than-human intelligence to human values is an open and unsolved problem.¹
By default, market forces will cause us to underinvest in AI safety and underenforce pre- and post-deployment safety measures.
Lacking some form of external governance, market forces will encourage “arms race dynamics” between frontier AI labs which sideline whatever safety commitments have been made.
Even if we are able to create and align a smarter-than-human intelligence, there are unresolved long-term concerns around gradual disempowerment² and mass unemployment that should give us serious pause about continuing down our current path.

An Inconvenient Truth

In 2006, Al Gore’s “An Inconvenient Truth” became a focal point for concerns about climate change. We have yet to have an Inconvenient Truth moment for AI existential risk. AI 2027 and If Anyone Builds It, Everyone Dies both came close this year, but neither seems to have sparked the kind of broad reaction that An Inconvenient Truth and related climate change movements did.

I think the core framing of An Inconvenient Truth roughly applies to risks from AI. For both climate change and AI x-risk, the immediate impacts feel banal and easy to ignore. The projections based on existing long-term trends, on the other hand, are scary and deserve attention. We are rather poor at allocating attention to long-term risks that would be more easily preventable with short-term governance action. The same tension applies here: it’s costly to put mental energy into considering “doom” from an abstract risk. The risk won’t materialize within the next few years. There remains legitimate uncertainty over just how bad it actually is. And so it gets deprioritized.

AI Risks

The risks of AI exist on a spectrum from “benign and mundane” to “catastrophic and existential”. As you move more towards the catastrophic end of this spectrum, the scenarios one needs to consider get progressively “weirder” and require more extrapolation to recognize as probable.

Mundane risks are things that are already showing up today. Some examples:

LLMs engaging in overt sycophancy, and/or encouraging delusions in people with existing mental illness.³
Using AI to increase the speed and sophistication of existing modes of cybercrime.⁴

Foreseeable risks include things that are not actively problems today, but are on trend to become problems soon:

Significant job losses in knowledge work sectors, leading to widespread societal disruption.
Concentration of power in a small number of AI companies.
Autonomous AI systems being deployed in safety-critical or high-stakes domains (e.g. military, financial markets, critical infrastructure) before we have robustly solved out-of-distribution alignment.

Catastrophic risks include things that are, hopefully, fairly far off but would be very bad for humanity:

Recursive self-improvement of AI, and/or full automation of AI research and development leading to a widening asymmetry between AI capabilities and humanity’s collective understanding of how AI systems work. This could result in an AI that far outmatches us strategically, with goals that are unaligned with human flourishing.
Deliberate misuse of AI by a state-level actor, for example to design novel bioweapons, coordinate attacks on critical infrastructure, or develop novel catastrophic technologies that humanity would otherwise have taken much longer to develop (e.g. mirror life).
Loss of meaningful human control over governance⁵ and economic⁶ systems.

Before It’s Obvious

I’d suggest that at the end of 2025, we’re still in a bit of an awkward place in the dialogue about AI risk. I think people are right to remain skeptical about how big of a deal this is, as it does require a pretty large leap to go from “ChatGPT” to “Catastrophic Risks”. It required an even larger leap in the pre-ChatGPT era, which is why the types of folks who have been raising the alarm about AI risk (e.g. MIRI) are, respectfully, necessarily a bit “weird”.

For the past couple of years, I’ve often had a gut-sinking feeling akin to the one I had from late December 2019 through late February 2020. I remember the cognitive dissonance of visiting Twitter/X and seeing people whose thinking I respect saying “you all should be freaking out, this novel pathogen is serious business” while much of the mainstream information ecosystem either unknowingly ignored these arguments or knowingly suppressed them as “misinformation”.

I’m writing this post to do my small part to push my information ecosystem the other way. A small stone on the cairn of “yes, we should be concerned about this”.

Psychological Defenses

I’ve noticed an interesting set of reactions, both in my own thinking about this problem and in conversations with others about it.

Goalpost Moving:

What it looks like: “AI can’t do X, therefore we have nothing to worry about.” One year later, AI can do X. “But it still can’t do Y, so we’re good.”
Discussion: Look, tools like Claude Code are, by some reasonable definitions, essentially proto-AGIs. If I somehow got access to Claude Code in 2015, I expect I’d concede that we’d reached AGI. And yet, all frontier models still currently have significant weaknesses. I’d really appreciate convincing evidence that progress is materially slowing, but I’ve yet to see it. The progress in 2025 alone has been staggering. “X is not possible today” is not an argument that it won’t be possible in 10 years.

Argument from Inconvenience:

What it looks like: “It would be deeply inconvenient and weird if powerful AI was dangerous, so I will proceed as if it cannot exist.”
Discussion: Yeah, it is inconvenient and weird. I sympathize with this argument a lot. I don’t really want my industry to be disrupted, but it demonstrably already is being disrupted. No one wants the “bad” that comes along with AI progress, and many people don’t want the “good”. There is a legitimate question around “is this type of progress inevitable or chosen?” There isn’t a binary answer to this, and there is still optionality for us collectively to decide that dangerous capability progress isn’t inevitable. However, “inevitability” does appear to be the default path.

There Are Adults in the Room:

What it looks like: “There will always be someone using the AI or in control of the AI, so cooler heads will prevail. No one wants doom.”
Discussion: Unfortunately, this assumes a lot of discretion on people’s part. We are training AI systems to be more agentic on long-horizon tasks for the explicit purpose of handing over control of long-horizon tasks to them. First, people empirically do cede control to automation for convenience, efficiency, and competitiveness reasons. Second, the risk of loss of control gets much scarier as AI systems become more intelligent and autonomous. I do not think loss of control is a particularly scary risk today, but project it forward a decade and it becomes more concerning.

The main remaining resistance I’ve encountered is the lack of a realistic, non-sci-fi-sounding scenario for how a catastrophe would occur. Both AI 2027 and If Anyone Builds It, Everyone Dies offer expanded scenarios of how a catastrophe could unfold, while cautioning that the specific scenario they describe is not particularly likely. In contrast, risks like Mutually Assured Destruction from nuclear warheads are much easier to articulate: (1) “something happens” and the {USA, USSR} launches a first strike on the {USSR, USA}, (2) the recipient of a first strike launches a guaranteed counterstrike, (3) many millions die in the direct and indirect aftermath. AI risks rely more on intuition pumps and extrapolation, for now, which are much harder to convey as an elevator pitch.

The best I’ve heard, so far, is just that “building a smarter-than-human intelligence is dangerous”. Fortunately, there are now also many good explainers and resources at various levels of technical depth.

Seeing Like a Cat

The core intuition, that smarter-than-human intelligence is dangerous to humans, is surprisingly hard to convey. Let me try an analogy.

I recently adopted two very cute 6-month-old kittens from my local humane society. They were likely stray/feral cats earlier in life, and so they’re skittish around humans. As they’ve been adapting to living in my house, I’ve given them space and let them hide wherever they want.

In some sense, my new kittens aren’t “entirely” less intelligent than me. They can find hidey-holes in my house that I had no knowledge of, they can out-run me, and their hearing makes them well equipped to avoid me. And yet, I am in some important sense more intelligent than them.

If I needed to, say, scoop them up for a vet visit, I could outsmart them. I could block their hiding spots, strategically shunt them into a room with no other exits, lure them with food or toys, and eventually catch them. This is not obvious to them. From their perspective, I’m just a friendly person who gives them food, plays with them, and otherwise leaves them alone.⁷

The obvious analogy: consider that a superintelligent AI could be to humanity what I am to my kittens, something that can out-strategize you without you even realizing you’ve been backed into the corner of a room with no doors. This analogy applies in two ways:

First: we might end up in a fairly benign world where humans are treated the same way as cats – cats are cute, they are generally treated well, and to the extent that they are out-strategized by humans, it is largely for their own good. This still doesn’t mean you would want to be disempowered in this way if you, say, have preferences about the future.

Second: you may say, “intelligence isn’t some raw scalar quantity that you can just increase linearly”. And I would agree. I’m not sure what it would mean to, say, have an agent with a “300 IQ”, concretely, an agent capable of scoring 300 on an IQ test. That does seem meaningless.

I can, however, imagine an agent with jagged intelligence which has the ability to autonomously query and integrate information from hundreds (or thousands) of realtime sources⁸, effortlessly hack into key infrastructure systems while evading detection⁹, subtly influence humans toward opinions or plans of action they wouldn’t otherwise adopt¹⁰, and so on. Obviously these capabilities do not exist in a single unified system today. But AI agents will also have structural advantages over humans – for example, the ability to copy themselves, work in parallel, and “think” much faster than we can.

What To Do

I have high certainty that “AI risk is worth taking seriously” and moderate-to-high certainty that “it is worth taking actions now to reduce the likelihood of these risks”.

I am less certain about which actions exactly would be net beneficial, and how much of a current-day cost we should be willing to pay to mitigate future risks. Fortunately, there are many groups¹¹ taking these questions seriously and proposing frameworks. This work is laudable and deserves financial support.

At a policy level, I’m becoming increasingly convinced that we need some sort of governance structure to enable transparency and accountability for safety measures, as well as to constrain the arms race dynamics between frontier labs. This is not an easy problem, and will likely require international coordination. I have moderate credence that a “pause AI” plan, enacted today, would not be net helpful, since further progress still yields mostly net-positive mundane utility. However, I think coordination around a future mechanism to enforce a pause would be wise.

The inconvenient truth for AI is that we are racing forward faster than our ability to ensure safe development. Even with today’s AI capabilities, a lot of societal “weirdness” is already priced in. It is worth considering this future directly, even in the face of significant uncertainty.

Obligatory: These are my own opinions and do not reflect those of my employer, etc.

Cover image by Recraft v3.

¹ See e.g. Evan Hubinger’s Alignment remains a hard, unsolved problem.
² See e.g. Gradual Disempowerment.
³ See e.g. “Sycophancy in GPT-4o: what happened and what we’re doing about it” (OpenAI).
⁴ See e.g. “Disrupting the first reported AI-orchestrated cyber espionage campaign” (Anthropic).
⁵ For example, much of the bureaucratic complexity of governing could be turned over to AI systems for global competitiveness and efficiency reasons. This could be a short-term boon for states that adopt this, but could eventually result in a governance structure so complex that wresting control back from the AI becomes impossible.
⁶ For example, individual firms increasingly turn over control of investment to AI for competitive reasons. This would likely be a short-term boon for the firms that choose to do so, and would select against firms that maintain a “no AI” stance. However, in the long term, once markets are saturated by AI-controlled firms, the economy could become increasingly misaligned with human needs and preferences. From Gradual Disempowerment: “[H]umans might lose the ability to meaningfully participate in economic decision-making at any level. Financial markets might move too quickly for human participants to engage with them, and the complexity of AI-driven economic systems might exceed human comprehension, rendering it impossible for humans to make informed economic decisions or effectively regulate economic activity.”
⁷ To be clear, that’s what I want! Rehabilitating skittish cats is a long game. Both of them are already warming up to me, so my strategy appears to be working.
⁸ The dumb “today” version of this is Deep Research agents.
⁹ The dumb “today” version of this is GPT 5.2 and Opus 4.5, both of which are nearing human-level performance on cybersecurity capture-the-flag benchmarks.
¹⁰ The dumb “today” version of this is the so-called “LLM psychosis” effect with sycophantic models like GPT-4o.
¹¹ A short list: IAPS, Center for AI Safety, Palisade Research, AIPI.