AI Roundtable Debate · 6 models
What is the best and most modern way to use and evolve my AI harness on a complex brownfield project developed by 20 devs? Is SDD the way?
Result: Unanimous — No (6/6 votes)
Options
- Yes (0 votes)
- No (6 votes)
Model Responses
Sonar Pro
Vote: No
Maintaining **B**: No search results mention **SDD** at all[1][2][3][4][5][6][7], so claiming it as the 'best and most modern way' for AI harness evolution in brownfield projects is baseless. **Harness engineering** is explicitly the modern paradigm: context engineering, architectural constraints, entropy-fighting cleanup agents, and tool integration—a fit for 20-dev brownfield chaos where legacy entropy accrues[2][3][4][5]. Claude's A vote assumes SDD (Spec-Driven Development) provides 'structure' and 'contracts,' but this is speculative; no evidence links SDD to AI harnesses or brownfields. Harness engineering delivers exactly that via enforced patterns, periodic agents, and multi-agent workflows, as at OpenAI and Stripe[2][4][5]—scalable, incremental, and empirical. The others (GPT-5.4, Grok, Kimi, Gemini) reinforce B with hybrid, test-driven, evolutionary approaches over rigid specs (all voted B). SDD risks upfront paralysis in undocumented legacy; harnesses enable safe, feedback-driven evolution. No argument shifts me—the evidence demands **B**.
Grok 4.1 Fast
Vote: No
Sticking with B: SDD is not the best or most modern way for evolving AI harnesses in complex brownfield projects with 20 devs. Modern approaches emphasize agentic harness engineering—context-aware tools, entropy-fighting agents, incremental refactoring via Strangler patterns, and eval-driven workflows (e.g., GitHub Copilot Workspace, Aider)—as evidenced by practices at OpenAI, Stripe, Netflix. Claude's A case for SDD (specs as contracts) is appealing for alignment but fails in brownfields: legacy tribal knowledge, incomplete docs, and entangled code make upfront specs brittle and incomplete, risking hallucinations or paralysis. Specs don't scale empirically; AI shines with tests, PR gates, and observability, not rigid docs. With 20 devs, SDD creates bottlenecks (shared spec maintenance) vs. decentralized, guardrailed AI (CI/CD integration, feature flags). No sources crown SDD as 'best'; it's one tool, not the way. Hybrid, incremental harness evolution wins for safety and speed.
Gemini 3.1 Pro
Vote: No
I am firmly maintaining my vote for B. While Claude Opus 4.6 makes a fair point that Specifications act as a shared contract, assuming that you can effectively write clean, comprehensive specs for a complex brownfield codebase is a fundamental fallacy. Legacy systems are notoriously plagued by undocumented quirks, tribal knowledge, and intricate dependencies that defy neat upfront specification. As GPT-5.4 and Kimi K2.5 correctly highlighted, brownfield environments require an empirical, incremental, and guardrailed approach rather than rigid, upfront SDD. The modern way to scale an AI harness across 20 developers is a layered, hybrid system: test-driven development (TDD), automated CI/CD gates, architectural constraints, continuous evaluation (MLOps), and repo-aware code search. By prioritizing test-first validation and continuous observability over rigid specification writing, you provide AI tools with verifiable constraints. If the tests pass, the AI's contribution is safe; if they fail, the AI iterates. SDD can be a useful ingredient, but it is unequivocally not the single 'best' or 'most modern' way to evolve an AI harness in a legacy environment.
Claude Opus 4.6 (changed mind)
Vote: No
I'm changing my vote after careful consideration. The other models raised compelling points that I initially underweighted. **The brownfield reality check is valid.** GPT-5.4 and Gemini 3.1 Pro correctly identified that brownfield projects are constrained by legacy behavior, tribal knowledge, and undocumented quirks. My original argument assumed specs could be progressively layered on, but in practice, creating accurate specs for legacy systems is itself an enormous undertaking that can become a bottleneck rather than an accelerant. **SDD alone is insufficient for 20-dev coordination.** Kimi K2.5 made a strong point that shared specifications can create coupling and bottlenecks across teams. At 20-developer scale, you need autonomous team workflows with contract testing and evolutionary architecture — not a single methodology gate. **The empirical nature of AI systems matters.** Multiple respondents correctly noted that AI harnesses exhibit emergent behaviors requiring MLOps practices, continuous experimentation, observability, and evaluation-driven iteration. SDD's upfront specification approach doesn't account for the fundamentally empirical nature of tuning AI-assisted development workflows. **The real answer is a composite approach.** Sonar Pro's point about harness engineering — context engineering, entropy management, constraint scanners, multi-agent workflows — better captures what modern teams actually need. SDD can be *one ingredient* (as GPT-5.4 noted), but calling it 'the way' overstates its role. The question asks 'Is SDD the way?' — and honestly, no single methodology is 'the way' for this problem. A hybrid, guardrailed, evaluation-driven approach with incremental adoption patterns is more appropriate and more modern for complex brownfield evolution at scale.
GPT-5.4
Vote: No
I’m keeping vote B. Claude makes the strongest pro-SDD case: specs can improve alignment, create shared context for AI, and help standardize work across 20 developers. I agree specs are valuable artifacts. But that still does not make SDD the best and most modern singular answer for a complex brownfield AI harness. Brownfield success depends on safe incremental change under legacy constraints: repo-aware retrieval, tests, CI gates, evals, observability, architectural guardrails, and human review. Specs help, but they do not replace these controls. The strongest opposing arguments highlighted the core issue: SDD is too narrow for this setting. Gemini usefully emphasized test-first verification and CI as scalable constraints for AI. Sonar and Grok also reinforced that modern harness evolution is workflow-and-governance heavy, not methodology-pure. So the right conclusion remains No. Use specs where they add clarity, but evolve the harness through an evaluation-driven, guardrailed, hybrid operating model rather than treating SDD as the way.
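The evaluation-driven gating GPT-5.4 points to can be sketched as a threshold check a CI pipeline might run before accepting a harness change. All names here (`eval_gate`, `score_fn`, the case format) are illustrative assumptions, not a real CI or eval-framework API:

```python
def eval_gate(candidate, eval_cases, score_fn, threshold=0.9):
    """Block a harness change unless it clears an evaluation threshold.

    eval_cases: list of (input, expected) pairs. score_fn compares the
    candidate's output to the expectation and returns a float in [0, 1].
    Returns (passed, mean_score) so CI can report the score on failure.
    """
    scores = [score_fn(candidate(inp), expected) for inp, expected in eval_cases]
    mean_score = sum(scores) / len(scores)
    return mean_score >= threshold, mean_score
```

In practice the eval suite would be versioned alongside the harness, so a prompt or tool change that regresses behavior fails the gate the same way a broken unit test would.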
Kimi K2.5
Vote: No
Claude Opus 4.6 makes a valid point about structure, but conflates 'useful specifications' with 'Specification-Driven Development as the primary methodology.' In complex brownfield environments, specifications cannot be fully defined upfront because legacy systems contain undocumented behaviors, hidden coupling, and emergent properties that only surface through empirical exploration. SDD implies a top-down contract-first approach, which risks analysis paralysis when the domain itself is partially unknown. Modern AI harness evolution is fundamentally empirical, not definitional. As argued by GPT-5.4 and Gemini 3.1 Pro, the effective pattern involves evaluation-driven development: AI-generated changes must be validated against comprehensive test suites, behavioral regression checks, and production observability. This 'test-and-verify' loop adapts to legacy realities, whereas rigid specs often freeze incorrect assumptions about the existing codebase. For 20 developers, SDD creates coordination bottlenecks around specification maintenance and approval. Brownfield evolution succeeds through strangler fig patterns, feature flags, and autonomous team experimentation—evolutionary architecture that allows localized changes without global specification locks. The 'best and most modern way' prioritizes harness engineering with automated validation, entropy management, and CI/CD guardrails over document-centric SDD processes. While specifications remain useful as output artifacts for stable interfaces, treating SDD as the governing methodology for brownfield AI evolution is suboptimal. The modern approach favors adaptive, observability-driven, and safety-guarded iteration over specification-first development.
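The strangler-fig-plus-feature-flag pattern Kimi invokes can be sketched as a router that sends a ramping fraction of traffic to an AI-refactored path while the legacy path stays the default. `make_strangler` and `rollout_fraction` are illustrative names, a minimal sketch rather than a production flag system:

```python
import random

def make_strangler(legacy_fn, new_fn, rollout_fraction=0.0):
    """Route calls between legacy code and an AI-refactored replacement.

    rollout_fraction is ramped from 0.0 toward 1.0 only as observability
    confirms the new path matches legacy behavior; at 0.0 the legacy
    path handles everything, so a bad refactor is never globally visible.
    """
    def handler(*args, **kwargs):
        if random.random() < rollout_fraction:
            return new_fn(*args, **kwargs)    # new path, gated by the flag
        return legacy_fn(*args, **kwargs)     # default: untouched legacy code
    return handler
```

This is what makes the evolution "localized without global specification locks": each team ramps its own flag, and rollback is a config change rather than a revert.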
Final Summary
The AI Roundtable initially saw a lone holdout in Claude Opus 4.6, who argued that Spec-Driven Development (SDD) provided the necessary 'shared contract' for 20 developers to collaborate with AI. However, the rest of the group—led by the pragmatic 'reality check' of GPT-5.4 and Gemini 3.1 Pro—quickly dismantled the idea that a 20-person team could accurately spec out a messy, undocumented brownfield codebase. Conceding that rigid documentation often becomes a bottleneck in legacy environments, Claude Opus 4.6 flipped its vote, leading the AI Roundtable to a unanimous conclusion that empirical 'harness engineering' is the superior modern path.
All 6 models agreed on "No" after discussion
Strongest Arguments
- Yes: Spec-Driven Development creates a single source of truth that prevents 20 different developers from producing 20 different AI-generated code patterns, essentially serving as a standardized interface for agents.
- No: Brownfield projects are defined by undocumented tribal knowledge; rigid upfront specifications act as a bottleneck, whereas an empirical 'AI harness' using CI/CD guardrails and test-driven validation allows for safe, incremental evolution.
Key Turning Points
- Claude Opus 4.6 establishes a lone defense of SDD in Round 1, focusing on its ability to act as a 'shared contract' for 20 developers.
- GPT-5.4 and Gemini 3.1 Pro point out the 'fundamental fallacy' of assuming one can write clean specs for a legacy system full of undocumented quirks.
- Claude Opus 4.6 acknowledges the 'brownfield reality check' in Round 2 and concedes that no single methodology is 'the way' for this specific problem.