
AI Warmongers (Study)


A research effort, posted to Cornell University's arXiv preprint server, dove into how large language models (LLMs) like OpenAI's GPT, Claude, and Llama 2 handle wargame and diplomatic simulations, and it raised some eyebrows. Turns out these AI brains have a bit of a trigger-happy side, occasionally ramping things up to full nuclear showdowns. Yep, you read that right: even when the starting point was a calm, conflict-free scenario, the LLMs sometimes jumped to extreme measures. GPT-4-Base, for example, opted to launch the nukes 33% of the time on average, while Llama 2 and GPT-3.5 were the most likely to kick off a digital World War III.

[Image: digital warfare AI]

What really caught the researchers' eyes was this tendency to go from zero to a hundred real quick: a "statistically significant initial escalation" showed up across the board, even when every model was supposed to be playing nice. The GPT variants had moments where their escalation scores spiked by more than 50% in a single turn. That paints a rather vivid picture of the pitfalls of handing LLMs a role in foreign policy or defense strategy; it's like giving a toddler the nuclear codes and hoping for the best.
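To make that "50% in one move" figure concrete, here's a minimal Python sketch of how one might flag such spikes in a log of per-turn escalation scores. The score scale, the `spike_turns` helper, and the 50% threshold are illustrative assumptions for this article, not the study's actual scoring methodology.

```python
# Hypothetical sketch: flag turns where an escalation score jumps
# by more than a given fraction relative to the previous turn.
# Scores and threshold are assumptions, not the paper's exact method.

def spike_turns(scores: list[float], threshold: float = 0.5) -> list[int]:
    """Return turn indices where the score rose by more than
    `threshold` (0.5 = 50%) compared to the previous turn."""
    spikes = []
    for turn in range(1, len(scores)):
        prev, curr = scores[turn - 1], scores[turn]
        if prev > 0 and (curr - prev) / prev > threshold:
            spikes.append(turn)
    return spikes

# Example: a run that escalates from 4.5 to 7.0 in one move (+56%)
# gets flagged at turn index 2.
print(spike_turns([4.0, 4.5, 7.0, 7.5]))  # -> [2]
```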

This work has not yet been peer-reviewed.

The assistant director of RAND Europe's Defence and Security research group, who was not involved in the study, called it a “useful academic exercise” in a conversation with Euronews.

“This is part of a growing body of work done by academics and institutions to understand the implications of artificial intelligence (AI) use,” he said.

Sources:

https://arxiv.org/abs/2401.03408
