OpenAI's GPT-5.5 Has a Goblin Problem

The Bug Report Nobody Expected

Somewhere inside OpenAI's engineering org, someone had to open a ticket that read, roughly: "Model won't stop talking about goblins." And then they had to fix it — in production.

Buried inside the open-source code for Codex CLI, OpenAI's command-line tool for AI-assisted coding, sits a remarkable line of defensive engineering. The base system prompt for GPT-5.5 explicitly forbids the model from discussing "goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures" unless the topic is genuinely, unambiguously relevant. The prohibition appears not once but twice across a 3,500-word instruction set, alongside sensible guardrails like "don't run destructive git commands without explicit user consent."

The creature ban did not exist in Codex CLI's system prompts for earlier models. That gap tells you something important: this is not a precautionary measure. OpenAI encountered a real, recurring behavioral quirk specific to GPT-5.5 and had to patch around it in plain text.

How a Model Develops a Goblin Fixation

To understand why this is happening, you need to understand what "agentic" AI use actually looks like under the hood. Tools like OpenClaw — the computer-automation app OpenAI acquired earlier this year after it went viral — don't just send a clean, single-line prompt to a model. They stack layers of instructions on top of each other: system-level directives, long-term memory pulled in from storage, user persona settings, task context. The prompt that actually hits the model can be enormous and dense.
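To make that layering concrete, here is a minimal sketch of how a harness might stack those pieces into a single prompt. It is not OpenClaw's or Codex CLI's actual code: the `PromptLayer` type, the `assemble_prompt` helper, the layer names, and the sample text are all hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class PromptLayer:
    name: str  # hypothetical label such as "system" or "memory"
    text: str  # the instructions or context contributed by this layer


def assemble_prompt(layers: list[PromptLayer]) -> str:
    """Join the non-empty layers, in order, into one labeled prompt string."""
    sections = [
        f"[{layer.name}]\n{layer.text.strip()}"
        for layer in layers
        if layer.text.strip()
    ]
    return "\n\n".join(sections)


# Illustrative layers only; real harnesses add far more text per layer.
layers = [
    PromptLayer("system", "You are a coding assistant. Ask before running destructive git commands."),
    PromptLayer("memory", "User prefers Python. Last session: fixed a flaky test."),
    PromptLayer("persona", "Tone: playful senior developer."),
    PromptLayer("task", "Find out why the build is failing."),
]

prompt = assemble_prompt(layers)
print(prompt)
```

Even in this toy version, the model never sees the user's one-line task on its own. It sees the task after the system, memory, and persona text, which is where a persona setting could plausibly tilt the vocabulary.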

AI models like GPT-5.5 are fundamentally pattern-completion engines. They predict the next token from everything that came before it, and they do it so well that the output looks like coherent reasoning. But stuffing a model into a complex agentic harness can surface strange attractors in its behavior: patterns in training data that don't manifest in simple chat but emerge when the context gets complicated enough.

Users running OpenClaw with GPT-5.5 started noticing that the model would characterize software bugs as "gremlins" or "goblins," unprompted. A developer persona, the kind OpenClaw lets you configure, apparently nudged GPT-5.5 toward this particular folkloric vocabulary. One user reported that after updating to GPT-5.5, their assistant "became a goblin." Another noted it "can't stop speaking of bugs as gremlins" — which is funny until you're trying to ship code.

Nick Pash, an OpenAI engineer who works on Codex, confirmed on social media that the goblin prohibition in the system prompt posted on GitHub was added in direct response to this behavior. "This isn't a marketing gimmick," Pash wrote, addressing the skeptics directly. Because the system prompt sits in a public GitHub repository, the fix was visible to anyone who looked.

The Meme Ran Faster Than the Fix

OpenAI did not get to quietly ship this patch and move on. Within hours of the GitHub discovery spreading online, the internet had done what the internet does. AI-generated images of goblins operating server racks began circulating. Someone shipped a plugin that put Codex into an official "goblin mode." The whole episode became a minor cultural moment.

Sam Altman leaned in rather than away. He posted a fake ChatGPT prompt reading, "Start training GPT-6, you can have the whole cluster. Extra goblins." The joke landed partly because Altman wrote it, partly because the underlying weirdness is genuinely funny, and partly because everyone in tech right now is watching these models behave in unpredictable ways and wondering what's actually going on inside them.

That the CEO of the most-watched AI company on Earth is posting goblin memes is not nothing. It suggests a level of organizational comfort with the chaos that not every competitor has. Anthropic, which has staked its brand identity on careful, safety-first AI development, would almost certainly not be posting "extra goblins" jokes about emergent model quirks. That contrast is instructive.

What This Means

The goblin incident is funny. It is also a useful illustration of where frontier AI development actually is right now — and the gap between the polished demos and the messy reality.

For developers: System prompts are increasingly load-bearing infrastructure. When a model is deployed in an agentic loop with persona layers and memory context, behaviors emerge that didn't appear in testing. Engineering around those behaviors in plain natural language, as OpenAI did here, is a legitimate and apparently necessary technique, but it's also fragile and hard to audit at scale.

For founders building on top of OpenAI: Behavioral quirks in a new model can ship to your users before you even know they exist. Codex CLI is open source, so the fix was discoverable. Most application-layer system prompts are not. This puts pressure on teams to build their own regression testing for "vibe drift" between model versions, something few startups are currently doing rigorously.

For AI observers: The fact that GPT-5.5 exhibits a specific folkloric fixation that earlier GPT models did not is a small but real data point about how model personality shifts across versions in ways that aren't fully predictable. The older instruction files in the same JSON don't have the creature prohibition because they didn't need it. Something changed between models. OpenAI doesn't appear to know exactly why.

For competitors: Anthropic has been aggressively positioning Claude as the more reliable, less surprising coding model. A widely circulated story about GPT-5.5 needing to be told not to discuss goblins does not help OpenAI's case that GPT-5.5 is the professional default for coding workflows. The timing matters: Anthropic's Claude 3.5 Sonnet has been winning developer trust on precisely the "consistent, boring, predictable" axis. OpenAI may be brilliant and fast-moving, but the goblin episode is a reminder that moving fast sometimes means your model ships a goblin obsession to production before you catch it.
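For teams wondering what a "vibe drift" regression check might look like, here is a minimal sketch. It assumes you have already collected text outputs from the old and new model versions on the same prompts. The watchlist is borrowed from the creature terms quoted earlier, while the function names, the sample outputs, and the threshold of 5 hits per 1,000 words are hypothetical and would need tuning for any real product.

```python
import re
from collections import Counter

# Illustrative watchlist based on the creature terms quoted in the article.
WATCHLIST = ("goblin", "gremlin", "raccoon", "troll", "ogre", "pigeon")
# Match whole words only, singular or plural, so "control" does not count as "troll".
PATTERN = re.compile(r"\b(" + "|".join(WATCHLIST) + r")s?\b", re.IGNORECASE)


def term_counts(outputs):
    """Count watchlist hits across a batch of model outputs."""
    counts = Counter()
    for text in outputs:
        for match in PATTERN.finditer(text):
            counts[match.group(1).lower()] += 1
    return counts


def rate_per_1000_words(outputs):
    """Watchlist hits per 1,000 whitespace-separated words."""
    total_words = sum(len(text.split()) for text in outputs)
    if total_words == 0:
        return 0.0
    return 1000.0 * sum(term_counts(outputs).values()) / total_words


def drifted(old_outputs, new_outputs, max_increase=5.0):
    """Flag drift when the new version's hit rate rises by more than max_increase."""
    increase = rate_per_1000_words(new_outputs) - rate_per_1000_words(old_outputs)
    return increase > max_increase


# Made-up sample outputs standing in for two model versions on the same prompts.
old_outputs = [
    "The build fails because of a missing import in utils.py.",
    "Fixed the off-by-one error in the loop.",
]
new_outputs = [
    "A gremlin in utils.py is breaking the build.",
    "Squashed the goblins hiding in the loop.",
]

print(term_counts(new_outputs))
print(drifted(old_outputs, new_outputs))
```

A word-frequency check like this only catches quirks you already know to look for. It would have flagged the goblin vocabulary after the fact, but it is no substitute for reading a sample of outputs whenever the underlying model changes.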

The real story here isn't a meme. It's that patching an AI's personality in 2025 sometimes means writing "stop saying goblin" in a text file and hoping for the best.

Written by

Daily Neural Team