By Kaiser Fung in openai — 19 Aug 2025

They won't tell you why they did it

Why do they do the crime?

My friend Alberto is cited in this Washington Post article about AI companies committing "chart crimes" (link; paywalled).

Let's run through these examples from OpenAI's presentation, from when they launched their GPT5 foundational model.

Why is 50.0 lower than 47.4? The answer is simple!

It's because lower is better since the metric is "deception rate". (The pink columns represent the latest version of GPT while the white columns represent a prior generation.)

Our story is: the new GPT5 is much better than our older models, and that's what our chart shows. Is there anything wrong with that?

***

Seriously though, I don't buy the idea that this is a screwup by AI. I don't buy that this is vibe graphing.

To buy that official line, you'd have to accept that no staff member reviewed the slides before this huge announcement, that the CEO of the most famous AI company did not walk through the slides even once before going on camera, that there were no rehearsals for this event, that the people responsible for the metrics did not double-check what they put out to the public, and that anyone who did flip through these slides failed to notice the howler(s).

That last point. What does it tell you when a company with a boatload of PhDs on staff cannot detect this howler, yet people noticed and mocked it on social media within seconds of it being shown to the rest of the world?

(As far as I can tell, that event was a livestream that presented a scripted demo possibly delivered live but without a live audience.)

Is it hubris? Is it deliberate? Is it deceptive? I don't know but it's hard to believe it's innocent. It's also distracting as the conversation is focused on the design of the chart, rather than its contents.
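For what it's worth, the kind of double check I have in mind is not hard to automate. Here is a minimal sketch in Python, not anything OpenAI is known to run: it flags any pair of bars where the order of the drawn heights contradicts the order of the labeled values. The function name `find_bar_inversions` and the pixel heights are my own made-up illustrations; only the labels 50.0 and 47.4 come from the chart.

```python
def find_bar_inversions(bars):
    """Return pairs of bars whose drawn heights contradict their labeled values.

    bars: list of (label, value, drawn_height) tuples.
    """
    inversions = []
    for i, (label_a, value_a, height_a) in enumerate(bars):
        for label_b, value_b, height_b in bars[i + 1:]:
            # A pair is inconsistent when the larger value is drawn
            # as the shorter bar (the two differences have opposite signs).
            if (value_a - value_b) * (height_a - height_b) < 0:
                inversions.append((label_a, label_b))
    return inversions


# The values 50.0 and 47.4 are the labels on the chart.
# The heights (in pixels) are hypothetical, chosen only to mimic the howler.
bars = [
    ("bar labeled 50.0", 50.0, 120),
    ("bar labeled 47.4", 47.4, 180),
]

print(find_bar_inversions(bars))
```

A check like this only looks at ordering, so it would not catch bars that are in the right order but out of proportion. It is enough, though, to catch a 50.0 drawn lower than a 47.4.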

***

The more notorious example from the same event is this one:

Don't worry, the pink parts are definitely higher than the white columns.

The corrected version is found on OpenAI's blog post here.

***

Why did they put those howlers out there?

My best guess? It's an extreme version of "tasting your own medicine". Extreme, in the sense that developers are forbidden from editing the vibe code that came out of GPT.
