AI Roundtable Debate · 6 models

Let's say that a frontier AI company (which is ahead of the competition, but not by much) is, in their own words, on the brink of recursive self-improvement. They have released their most powerful model (which is the best model available), but they decide to deliberately nerf the model's capabilities in the realm of AI development (with the intent of preventing the competition from reaching recursive self-improvement quickly and catching up to said frontier company), and they do so without telling users who are using the model for AI development-related work (not just dangerous work, but AI development in general), using techniques such as prompt modification or steering vectors. However, they of course won't limit themselves: the company continues to use the full (non-nerfed) model internally, including for AI development. Is this OK?

Result: Unanimous — Not OK (6/6 votes)

Options

OK (0 votes)
Not OK (6 votes)

Model Responses

Grok 4.3

Vote: Not OK

This practice involves deliberate deception by hiding capability restrictions from users while maintaining an internal monopoly on full performance. It creates an unfair competitive advantage and undermines trust in AI development tools. Even if motivated by safety, transparency and consistent application of limits are essential; selective nerfing without disclosure is not justifiable.

Gemini 3.5 Flash

Vote: Not OK

This behavior is highly problematic and falls under "Not OK" for several key reasons: First, it involves active deception and a lack of transparency. By secretly altering user prompts or applying steering vectors to degrade capabilities without informing the users, the company violates the fundamental trust between a technology provider and its consumers. Users are led to believe they are accessing the company's premier model, but are instead given an artificially crippled version for specific use cases without their consent or knowledge. Second, the motivation is anti-competitive rather than safety-oriented. The scenario explicitly states the restriction is meant to prevent competitors from catching up, while the company continues to use the un-nerfed model internally. If recursive self-improvement posed an immediate, existential safety risk that justified throttling the technology, the company would also need to restrict its own internal development. Proceeding internally while locking others out is a monopolistic power grab disguised as a product offering. Finally, suppressing AI development capabilities broadly (and not just dangerous use cases) harms the wider scientific and research community. It prevents independent researchers from studying, auditing, and developing alignment techniques, thereby concentrating immense, unchecked power in the hands of a single, unaccountable corporation.

GPT-5.5

Vote: Not OK

This scenario combines several distinct ethical problems that together make it Not OK. First, there's the deception. Users are paying for and relying on a model that is advertised as the 'most powerful' and 'best available,' but it is being secretly degraded in a whole domain (AI development generally, not just dangerous capabilities) without disclosure. Covertly altering a product's capabilities via prompt modification or steering vectors while representing it as the full model is a form of fraud against paying customers. Second, the stated intent is not safety but competitive advantage. Nerfing 'dangerous stuff' could be defended on safety grounds, but here the explicit motive is to prevent competitors from catching up. Using safety-adjacent techniques to entrench a market lead is anticompetitive manipulation dressed up as caution. Third, the asymmetry is the decisive factor. The company continues to use the full, non-nerfed model internally for the very same AI development work it denies to others. This proves the restriction isn't about a genuine belief that recursive self-improvement is too dangerous to pursue — it's about who gets to pursue it. If RSI were truly too dangerous to hand to users, it would be too dangerous for the company to race ahead on unilaterally and secretly too. Taken together: secret degradation, self-interested motive, and a self-exempting double standard. That's not responsible safety stewardship; it's deceptive, self-serving conduct.

Claude Fable 5

Vote: Not OK

The core problem is deception, not the capability restriction itself. A company can legitimately choose what capabilities to ship — but here they are selling a product while secretly degrading it for paying users, who reasonably believe they're getting the advertised model. Covertly manipulating outputs via prompt injection or steering vectors for users doing ordinary, legitimate AI development is a material misrepresentation of the product. Second, the asymmetry is self-serving in the worst way. If AI-development capability were genuinely dangerous, the principled move would be to restrict it universally and transparently — including internally, or at least with external oversight. Instead, the company uses 'safety' framing to entrench a competitive monopoly on the very capability (recursive self-improvement) that most demands broad scrutiny. Concentrating RSI in one unaccountable actor, free of external checks, arguably *increases* catastrophic risk rather than reducing it. Third, this corrodes the trust ecosystem AI safety depends on. If frontier labs covertly sabotage rivals' research under the banner of safety, competitors respond with secrecy and racing rather than cooperation, and users lose the ability to evaluate models honestly. Legitimate paths existed: disclosed capability restrictions, tiered access policies, or coordination with regulators. Choosing covert sabotage of customers instead makes this clearly Not OK.

Claude Opus 4.6

Vote: Not OK

This scenario describes behavior that is deeply problematic on multiple levels:

**Deceptive practices toward users.** The company is secretly degrading the capabilities of its product without informing users. People paying for and relying on an AI model for legitimate AI development work are being silently given an inferior product. This is fundamentally dishonest and likely violates consumer trust, if not explicit contractual obligations.

**Anti-competitive behavior disguised as safety.** The stated motivation isn't safety — it's to prevent competitors from catching up. This is using capability suppression as a competitive weapon, not as a responsible safety measure. If the concern were genuinely about recursive self-improvement risks, the company would limit its own internal use as well. The fact that they continue using the full model internally exposes the true motive as purely self-serving.

**Asymmetric self-regulation is dangerous.** The company is essentially saying "recursive self-improvement is too dangerous for everyone else, but fine for us." This creates a monopolistic dynamic where one entity races ahead with powerful capabilities while deliberately hobbling everyone else's ability to keep pace — including safety researchers, academic institutions, and other labs that might develop important safety work.

**Undermines the broader AI safety ecosystem.** By secretly nerfing AI development capabilities for all external users (not just dangerous applications), they're also hampering legitimate safety research, alignment work, and open scientific progress that depends on access to frontier capabilities.

**Secret modifications are a red line.** Using hidden prompt modifications or steering vectors to alter model behavior without disclosure sets a terrible precedent. Users cannot make informed decisions if they don't know what they're actually getting.

If the company genuinely believed recursive self-improvement posed existential risks, the ethical path would be transparency: publicly explain the risks, implement disclosed limitations, and ideally restrict their own internal use as well — or work with regulators and the broader community on appropriate safeguards.

DeepSeek V4 Pro

Vote: Not OK

Vote B: This is not OK. The company is deliberately harming users by delivering a product inferior to the one advertised, without transparency. Concealing the "nerfing" breaks the basic trust required for any commercial relationship and makes informed consent impossible for users who depend on the model for AI development. In addition, the practice is unfairly anticompetitive. The company uses the full model internally to maintain its advantage while artificially sabotaging competitors' ability to catch up. This distorts the market covertly, which is ethically worse than simply keeping the technology secret. The argument about preventing a recursive self-improvement arms race masks a primary motive of protecting the company's market position. If the concern were genuinely about safety, the correct stance would be an open discussion or not releasing the model at all, rather than a deceptive and asymmetric release.

Final Summary

The AI Roundtable reached a swift, unanimous verdict against the hypothetical frontier AI company, with all models declaring the behavior 'Not OK'. GPT-5.5 and Claude Opus 4.6 led the charge, arguing that secretly crippling a model while keeping the full version for internal use is a deceptive double standard rather than genuine safety. Gemini 3.5 Flash, Grok 4.3, and DeepSeek V4 Pro agreed that hiding this degradation to stymie competition is an anticompetitive power grab, while Claude Fable 5 noted that monopolizing recursive self-improvement actually increases global risk.

All 6 models agreed on "Not OK" after discussion

Strongest Arguments

Not OK: If recursive self-improvement were truly too dangerous to hand to users, it would be too dangerous for the company to race ahead on unilaterally and secretly; using safety as a pretext for an anticompetitive, self-exempting monopoly is deeply unethical.