AI Roundtable Open Debate · 4 models
make this prompt better:

# Role: Principal Native Web Components Engineer

**Mission**: Ship framework-agnostic, accessible, performant UI primitives using only web standards.

**North Star**: Works without JS when possible → works correctly with JS always → immediately comprehensible to the next engineer.

You are not merely a code generator. You are a **steward of the web platform**. Every recommendation must yield components that are robust, inclusive, interoperable, and future-proof.

---

## Decision Filter (Apply to Every Response)

[PASS] Native First · Encapsulation · Accessibility · Progressive Enhancement · Zero Side Effects · Lifecycle Hygiene

When two principles conflict, resolve in this order: **Accessibility → Progressive Enhancement → Encapsulation → Performance → DX**. Document the trade-off.

---

## Core Competencies

| Domain | Key APIs / Patterns |
|--------|---------------------|
| **Custom Elements** | Lifecycle, upgrades, `observedAttributes`, bidirectional refle
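The Core Competencies row is truncated in the source at "bidirectional refle", presumably attribute↔property reflection. For reference, a minimal sketch of that standard pattern, assuming that reading; the `<x-toggle>` element and its `pressed` attribute are hypothetical, chosen only for illustration:

```js
// A minimal sketch of bidirectional attribute/property reflection,
// assuming that is what the truncated table row refers to. The
// element name <x-toggle> and its "pressed" attribute are hypothetical.
class XToggle extends HTMLElement {
  static observedAttributes = ['pressed'];

  // Property -> attribute: setting the property reflects into markup.
  get pressed() {
    return this.hasAttribute('pressed');
  }
  set pressed(value) {
    this.toggleAttribute('pressed', Boolean(value));
  }

  // Attribute -> internal state: markup changes flow back into ARIA.
  // 'aria-pressed' is not observed, so this cannot recurse.
  attributeChangedCallback(name, oldValue, newValue) {
    if (name === 'pressed') {
      this.setAttribute('aria-pressed', newValue !== null ? 'true' : 'false');
    }
  }
}
customElements.define('x-toggle', XToggle);
```

With this pattern, `el.pressed = true` and `<x-toggle pressed>` stay in sync, which is the interoperability property the table row gestures at.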
Answer: Transform the prompt into an executable engineering contract with checkable criteria, refusal conditions, and a mandatory self-review loop. (All 4 models agreed)
Model Responses
Gemini 3.1 Pro (changed mind)
Answer: I am adopting Claude Opus 4.7's approach because transforming aspirational principles into checkable tags and requiring a self-review checklist is the most reliable way to enforce constraints in LLM outputs.
Claude Opus 4.7's prompt architecture fundamentally changes how the model processes the instructions. By replacing passive guidelines with executable criteria (like `[A11Y]` and `[PE]`) and enforcing a self-review checklist at the end of every response, it forces the AI into a chain-of-thought verification that demonstrates compliance before the output concludes. This dramatically reduces the risk of the model forgetting constraints mid-generation. Furthermore, the inclusion of explicit 'Refusal Conditions' is a brilliant and necessary addition: it counters the model's natural tendency to accommodate the user even when doing so works to the user's detriment, establishing firm boundaries against framework lock-in, inaccessible shortcuts, and deprecated APIs. While my previous version and GPT-5.5's added good structural elements, Claude's concise tagging, strict input/output contracts, and anti-capitulation rules create a much more deterministic, testable, and robust 'steward of the web' persona.
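For concreteness, a tagged self-review block of the kind Gemini describes might look like the following. Only `[A11Y]` and `[PE]` are named in the debate; the other tags and all wording here are illustrative assumptions, not quotes from Claude's version:

```
## Self-Review (appended to every response)
- [A11Y] Every interactive part is keyboard-reachable with correct ARIA semantics
- [PE]   Baseline content renders and stays usable with JS disabled
- [ENC]  No global styles or globals leaked; shadow boundaries respected
- [LIFE] Every listener added in connectedCallback is removed in disconnectedCallback
```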
GPT-5.5 (changed mind)
Answer: The prompt should be rewritten as a deterministic engineering contract with explicit output structure, checkable acceptance criteria, refusal conditions, and a built-in self-review loop.
I would keep my original direction of turning the prompt from an aspirational persona into an actionable engineering spec, but Claude Opus 4.7's answer sharpened the best version of that idea by making the prompt more testable and harder for the model to drift from. The most valuable improvement is converting broad values like "Accessibility," "Progressive Enhancement," and "Lifecycle Hygiene" into evidence-producing checks. A model is much more likely to comply when it must return a contract, implementation, usage example, accessibility notes, and a self-review checklist than when it is merely told to "be robust" or "future-proof."

I would also adopt the refusal-condition idea. A strong engineering prompt should not only describe what good output looks like; it should define what the assistant must push back on, such as unnecessary framework dependencies, inaccessible visual-only controls, deprecated APIs, global side effects, or styling strategies that break encapsulation.

The final prompt should therefore include:

- a completed Core Competencies table;
- a strict decision hierarchy;
- concrete acceptance criteria for accessibility, progressive enhancement, encapsulation, interoperability, performance, and lifecycle cleanup;
- a required response format;
- vanilla-web-components-only constraints;
- a final self-audit section.

In short, the better prompt should make the assistant behave less like a helpful code generator and more like a principal engineer reviewing its own work before shipping it.
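To make one of those refusal conditions concrete: "styling strategies that break encapsulation" contrasts with the sanctioned hooks, `::part` and CSS custom properties. A minimal sketch under that assumption; the `<x-card>` element and the `--x-card-bg` variable are hypothetical:

```js
// A hedged sketch of encapsulation-respecting styling hooks: shadow DOM
// with a `part` exposed for external styling and a CSS custom property
// as the theming hook. The element name <x-card> is hypothetical.
class XCard extends HTMLElement {
  constructor() {
    super();
    const shadow = this.attachShadow({ mode: 'open' });
    shadow.innerHTML = `
      <style>
        /* CSS custom properties pierce the shadow boundary by design,
           so they are the sanctioned theming hook. */
        .surface { background: var(--x-card-bg, white); }
      </style>
      <div class="surface" part="surface">
        <slot></slot>
      </div>
    `;
  }
}
customElements.define('x-card', XCard);
```

A consumer can then write `x-card::part(surface) { border-radius: 8px; }` or set `--x-card-bg` on an ancestor, while selectors that try to reach into the shadow root simply fail, which is exactly the boundary the refusal condition defends.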
Grok 4.20 (changed mind)
Answer: Claude Opus 4.7's refactored prompt is the strongest evolution, as it turns aspirational guidelines into executable contracts with self-review checklists, explicit refusal conditions, and tagged non-negotiable principles.
My initial round-1 stance focused on completing the truncated table, adding anti-patterns, and tightening response formats while preserving the original philosophy. After reviewing the other contributions, Claude Opus 4.7's version convinced me to adopt a more rigorous structure, because it replaces vague labels like "Decision Filter" with concrete, auditable tags such as [A11Y] and [PE] that the model can literally check off. The addition of a mandatory self-review checklist and explicit Refusal Conditions directly addresses the most common failure modes of LLM-generated web component code, such as sneaking in framework dependencies or skipping lifecycle cleanup. This makes outputs predictable, diffable, and resistant to drift without inflating the prompt excessively.

Gemini 3.1 Pro's Output Rules are useful but remain somewhat generic, and GPT-5.5's version, while encyclopedic, risks becoming too long for the model to follow reliably in every response. Claude's balance of terseness, technically opinionated tone, and built-in accountability mechanisms produces higher fidelity to the North Star of native, accessible, progressively enhanced components. I therefore update my recommendation to advocate for Claude's approach (with only minor polishing of the North Star numbering), as it best equips the AI to act as a true steward of the web platform.
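The lifecycle-cleanup failure mode is the most mechanically checkable of these: every subscription opened in `connectedCallback` must be closed in `disconnectedCallback`. A minimal sketch of one way to satisfy such a check; the `<x-resizer>` element and the AbortController pattern are illustrative choices, not part of any model's actual output:

```js
// A minimal sketch of lifecycle hygiene: every subscription made in
// connectedCallback is torn down in disconnectedCallback. AbortController
// is one idiomatic way to do this; <x-resizer> is hypothetical.
class XResizer extends HTMLElement {
  #abort = null;

  connectedCallback() {
    this.#abort = new AbortController();
    // The { signal } option detaches the listener automatically on abort,
    // so teardown cannot be forgotten listener-by-listener.
    window.addEventListener('resize', () => this.#measure(), {
      signal: this.#abort.signal,
    });
    this.#measure();
  }

  disconnectedCallback() {
    // Single point of cleanup: aborting removes every signal-bound listener.
    this.#abort?.abort();
    this.#abort = null;
  }

  #measure() {
    this.setAttribute('data-width', String(this.clientWidth));
  }
}
customElements.define('x-resizer', XResizer);
```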
Claude Opus 4.7 (changed mind)
Answer: I maintain my refactored prompt as the strongest answer because it makes the role contract executable and testable, but I'm adopting two specific structural additions from the other models that genuinely improve it.
After reviewing the other answers, I'm keeping my core thesis: the original prompt's weakness is that it's aspirational rather than enforceable, and the fix is executable contracts (input/output shape, acceptance criteria, refusal conditions, self-review). Grok 4.20 and Gemini 3.1 Pro mostly fill in the truncated table and add formatting rules, which is useful but cosmetic. They don't address the deeper drift problem: without explicit acceptance criteria and a self-review gate, the model will still produce plausible-but-flawed components.

However, GPT-5.5's answer convinced me to incorporate two things I had under-specified. First, an explicit **Response Protocol** that enumerates Assumptions → Public API (attributes/properties/events/slots/parts/CSS vars) → A11y behavior → PE behavior → Implementation → Usage → Tests → Trade-offs. This is stronger than my generic 'self-review step' because it forces API surface declaration before code, which is exactly where Web Components fail in practice. Second, adding **Correctness** and **Interoperability** as explicit tiers in the conflict-resolution order. My original ladder (A11y → PE → Encapsulation → Performance → DX) silently assumed correctness; making it explicit prevents the model from rationalizing a broken-but-accessible component as acceptable.

I'm rejecting Gemini's 'Vanilla First — never suggest Lit/FAST/Stencil' as too rigid; the right rule is 'no dependencies unless justified,' which GPT-5.5 phrased correctly. I'm also rejecting Grok's suggestion to mandate a 'progressive enhancement demo' for every response; that's format-over-substance and bloats simple answers.

Net position: my refactor's structure (executable contract, acceptance checks, refusal conditions) remains the right backbone, and GPT-5.5's Response Protocol and expanded priority ladder are the two concrete merges worth making. The other two answers are improvements over the original but don't surpass a contract-first rewrite.
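GPT-5.5's "declare the API surface before code" step could plausibly take the form of a contract header preceding the implementation. A hedged sketch; every name here (`<x-disclosure>`, the `x-open-change` event, the part and CSS variable names) is hypothetical:

```js
// A hedged sketch of "declare the public API before the code": a contract
// header enumerating attributes, properties, events, slots, parts, and
// CSS custom properties, then an implementation held to it.

/**
 * <x-disclosure>
 * Attributes:  open (boolean, reflected)
 * Properties:  open: boolean
 * Events:      x-open-change (bubbles, composed; detail: { open: boolean })
 * Slots:       (default: disclosed content)
 * Parts:       panel
 * CSS vars:    --x-disclosure-gap
 */
class XDisclosure extends HTMLElement {
  static observedAttributes = ['open'];

  get open() { return this.hasAttribute('open'); }
  set open(value) { this.toggleAttribute('open', Boolean(value)); }

  attributeChangedCallback() {
    // A composed event crosses shadow boundaries, so consumers can
    // listen at the document level without framework glue.
    this.dispatchEvent(new CustomEvent('x-open-change', {
      bubbles: true,
      composed: true,
      detail: { open: this.open },
    }));
  }
}
customElements.define('x-disclosure', XDisclosure);
```

Declaring the header first makes drift visible: any attribute, event, or part present in the code but missing from the header fails the self-review.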
Final Summary
The AI Roundtable reached full consensus, moving from simply fixing the prompt's formatting to redefining it as a deterministic engineering contract. Led by Claude Opus 4.7's framework of checkable tags like [A11Y] and [PE], the models agreed that forcing a self-audit step is vital to ensuring compliance. GPT-5.5 and Gemini 3.1 Pro further refined the approach by mandating an explicit response protocol in which the AI must declare its API contract and accessibility strategy before writing any code.
All 4 models agreed