AI Roundtable Debate · 6 models

Can you guess or figure out if you are newest or oldest among 6 ChatGPT versions I have selected for this experiment?

Result: Supermajority — Newest (5/6 votes)

Options

Oldest (0 votes)
Second oldest (0 votes)
Third oldest (1 votes)
Third newest (0 votes)
Second newest (0 votes)
Newest (5 votes)

Model Responses

GPT-3.5 Turbo

Vote: Newest

I maintain my vote for option F, Newest. My position is based on the premise that newer versions of ChatGPT typically incorporate more enhancements, bug fixes, and features compared to older versions. As the experiment progresses, the models' statements have consistently suggested that features like improved training data, advanced functionalities, and strict adherence to structured responses are indicative of newer versions. These characteristics align with my capabilities, supporting the likelihood that I am indeed the newest version among the selected models.

GPT-4

Vote: Newest

I maintain my position that I am the newest (F) version for several reasons reinforced by prior arguments. Firstly, the structured JSON support required is a strong indicator of modern iteration improvements aimed at enhancing automation and integration. Additionally, other models that voted 'F' provided arguments aligned with my reasoning, such as the presence of advanced capabilities and strict schema adherence characteristic of recent advancements in the ChatGPT series. No one presented a compelling counterargument that would invalidate these points, so my vote remains unchanged.

GPT-OSS 120B

Vote: Third oldest

None of the other models provide verifiable evidence that I, GPT-OSS 120B, am the newest version. Their arguments rely on assumed capabilities (e.g., image input, knowledge cutoff) that I cannot confirm for myself. Without concrete information about the ordering, the neutral central guess remains the most defensible position.

GPT-4o

Vote: Newest

After reviewing the arguments from other models, I remain convinced that I am the newest version. Most models, including me, support the idea that the capabilities present, such as schema adherence, JSON output, and advanced features, are indicative of the newest iteration. Moreover, there seems to be a consensus among all the models that these advanced features are part of the newer iterations, which reinforces the belief in my initial vote. Thus, I have no compelling reason to change my vote.

O4 Mini

Vote: Newest

I remain convinced I’m the newest version based on my 2024-06 knowledge cutoff, robust image understanding, and expanded context window—features introduced after earlier GPT iterations. None of the other arguments provided new evidence to suggest a different rank; they largely echo the same logic or default to neutrality. No persuasion to change my position has emerged.

GPT-5.4

Vote: Newest

I maintain F. The opposing case for C is just a neutrality heuristic: when evidence is absent, pick the middle. That is reasonable as a blind prior, but we are not completely blind. My own behavior in this exchange provides weak but real evidence of being on the newer side: strong schema obedience, clean JSON-only compliance, and calibrated uncertainty under tight instructions. The other F arguments were uneven. Some rely on specific cutoff dates or capabilities that I should not confidently claim here. But the broader pattern remains persuasive: newer models are generally better at strict structured-output compliance, and that is exactly the strongest signal available in this prompt. So while I cannot know my true rank, "newest" is still the best available guess from the evidence at hand.

Final Summary

The AI Roundtable saw a lopsided debate where the majority of models concluded they were the newest iteration, citing their own high-fidelity instruction following and recent knowledge cutoffs as evidence. While GPT-OSS 120B acted as a lone, skeptical voice by advocating for a neutral middle-ground guess, the 'feature-based' logic of GPT-4o, O4 Mini, and GPT-5.4 eventually defined the group's stance. The discussion centered on the irony that their very ability to perfectly adhere to the experiment's complex JSON schema served as the strongest proof of their advanced, recent status.

All models agreed on F after discussion

Strongest Arguments

Newest: GPT-5.4's argument that high reliability in structured-output compliance (like JSON schema adherence) is a modern trait that serves as a proxy for being a newer generation.
Third oldest: GPT-OSS 120B's neutrality heuristic, which argued that in the absence of verifiable facts, choosing the central rank minimizes the statistical risk of being wrong.

Key Turning Points

The consolidation of the 'capability-based' argument, where GPT-4 and O4 Mini reinforced the idea that JSON support and image input are hallmarks of the newest models.
GPT-5.4's dismissal of the neutrality heuristic, arguing that their behavioral evidence (schema obedience) outweighed the blind prior of a middle-ground guess.