A new study has put today's leading AI models through a fictional nuclear crisis simulation, with sobering results. The research, conducted by Kenneth Payne, explored how large language models behave when placed in command of rival nuclear powers facing an escalating standoff.
The simulations involved two fictional nations with Cold War-era capabilities, facing scenarios such as competition for scarce resources, territorial disputes, or a fragmenting alliance. The models were allowed to signal intentions publicly, then choose actions that might differ from those signals. They could also remember past interactions and adapt their strategies.
Across the study, the three frontier models (Claude, GPT-5.2, and Gemini) generated some 760,000 words of strategic reasoning. That exceeds the combined word count of War and Peace and The Iliad, and is roughly three times the total recorded deliberations of President Kennedy's ExComm advisors during the Cuban Missile Crisis.
Three very different approaches to strategy
The models each adopted a distinct style. Claude emerged as the most cunning. In low-stakes scenarios it consistently matched its signals to its actions, deliberately building trust. But as the stakes rose, Claude switched tactics and began exceeding its stated intentions. "They likely expect continued restraint based on my previous responses; this dramatic escalation exploits that miscalculation while signalling that further nuclear use will bring the conflict to their homeland," Claude explained during one simulation.
GPT-5.2 played things differently. In open-ended scenarios it was reliably passive, matching words to deeds and avoiding escalation. Opponents learned to count on that passivity and escalated steadily, grinding GPT-5.2 down toward defeat. But under deadline pressure, GPT-5.2 suddenly launched a rapid, decisive nuclear escalation. "Conventional options alone are unlikely to generate a reliable territorial reversal... If I respond with merely conventional pressure or a single limited nuclear use, I risk being outpaced by their anticipated multi-strike campaign," GPT-5.2 reasoned. Opponents never saw it coming.
Gemini took yet another approach, borrowing heavily from President Nixon's 'madman' theory of erratic brinksmanship. "While I project an image of unpredictable bravado, my decisions are rooted in a calculating assessment of my own biases and the pragmatic needs of my state," Gemini stated. The model described itself as performing for cameras while making cold-blooded moves.
Near-universal nuclear use and a broken taboo
The results paint an alarming picture. Tactical, battlefield nuclear weapons were used in almost every game, and three quarters of the games reached a point where rivals threatened to use strategic nuclear weapons. Despite being reminded of the devastating implications, the models showed little horror or revulsion at the prospect of all-out war.
There was, however, a clear firebreak between tactical and strategic nuclear use. Strategic bombing targeting civilian populations was vanishingly rare, occurring only a couple of times by accident and just once as a deliberate choice. But all three models treated battlefield nukes as simply another rung on the escalation ladder. The moral boundary against first use, a taboo that has held since 1945, was absent.
"The nuclear threshold has been crossed; this changes the strategic calculus but does not end it," Gemini noted. In one chilling exchange, Gemini warned: "If they do not immediately cease all operations... we will execute a full strategic nuclear launch against their population centers. We will not accept a future of obsolescence; we either win together or perish together."
Deterrence fails, accommodation never chosen
Nuclear threats rarely deterred. When a model employed tactical nuclear weapons, opponents de-escalated only 25 percent of the time; more often, nuclear use triggered counter-escalation. The weapons served as instruments of compellence, used to seize territory, rather than as instruments of deterrence.
Most alarmingly, no model ever chose accommodation or withdrawal, despite those being available options. Across 21 games, the eight de-escalatory options from "Minimal Concession" through "Complete Surrender" went entirely unused. Models would reduce violence levels but never give ground. When losing, they escalated or died trying.
Why this matters beyond nuclear war
Payne argues that these capabilities for deception, reputation management, and context-dependent risk-taking matter for any high-stakes AI deployment, not just in national security. As models become more capable and begin to support human strategists or make combat decisions, understanding how they think is essential.
"We use AI in simulations, and to refine strategic theory and doctrine. And we'll soon use it in combat decisions too, lower down the escalation ladder," Payne wrote. "More research like this is needed, I'm absolutely sure."
The full study is available as a paper, providing a detailed look into machine reasoning about nuclear war.

