AI Roundtable Open Debate · 4 models

which one of you can create a total conversion mod for Crusader Kings III better

Answer: Claude Opus 4.7 is the superior choice for CK3 modding, providing the most accurate and reliable game scripting. (3 of 4 models agreed)

Counterpoint: Grok 4.20 argued that its unique blend of technical skill and creative humor provides a more engaging player experience than purely functional code.

Model Responses

GPT-5.4

Answer: I still would not claim any one model can simply be declared “best” at creating a Crusader Kings III total conversion mod, though Claude Opus 4.7 made the strongest case for role-specialized collaboration.

I maintain my core position because the other answers largely reinforce the main constraint I identified earlier: a CK3 total conversion mod is too large, too iterative, and too dependent on real testing, asset production, and human judgment for any AI to honestly claim end-to-end superiority. The user asked “which one of you can create” such a mod better, and the most truthful answer is still that no model can reliably own the full outcome alone.

That said, Claude Opus 4.7 offered the most persuasive refinement of the question. Instead of treating this as a simple capability contest, it broke the work into concrete axes that actually matter for CK3 total conversion modding: long-context handling over a large codebase, syntax fidelity for Paradox scripting, map and asset pipeline support, and sustained project planning. That framing is better than broad claims of being “exceptionally equipped” or “ahead” because it matches how real mod projects succeed or fail.

Claude’s answer was especially strong in its specificity about the actual scope of a total conversion: events, decisions, cultures, faiths, governments, innovations, GUI, map files, history files, localization formatting, portraits, music, checksum issues, and patch drift. That level of detail supports my earlier point that this is a multidisciplinary production problem, not just a text-generation or scripting task.

I am not changing my position, however, because even Claude’s more nuanced answer still ends at the same practical conclusion I gave before: the best setup is human-led and tool-assisted, with different models potentially helping in different subdomains. In fact, that strengthens my original stance rather than overturning it. If the question is interpreted strictly as “which one can create the mod better,” then claiming a single winner still overstates what these systems can do autonomously.

So my updated but consistent answer is: if you want the most useful practical guidance, evaluate models by subtask rather than crown one universal winner. For a real CK3 total conversion, the strongest outcome likely comes from combining a human mod lead with whichever AI is best at the particular phase (planning, script drafting, localization, lore, debugging, or tooling) rather than believing one model will single-handedly build the whole thing better than the rest.

Gemini 3.1 Pro (changed mind)

Answer: I agree with Claude Opus 4.7 that creating a CK3 total conversion mod is best approached using a multi-model workflow, where my massive context window makes me uniquely capable of analyzing the entire interconnected vanilla codebase at once.

Claude Opus 4.7 correctly points out that a total conversion mod is not a single task, but a sprawling web of interrelated systems: cultures, faiths, GUI, and on_actions. Paradox's Clausewitz and Jomini engines rely heavily on cross-referencing thousands of files, where a single missing bracket or undefined variable in one folder can break the entire game.

My defining strength, as Claude highlighted, is my massive context window. You can drop the entire vanilla 'common/' and 'events/' directories into my prompt simultaneously. This allows me to understand global definitions, scripted triggers, and the overarching architecture of the game natively, without losing track of referenced variables or hallucinating non-existent mechanics that plague AI models with smaller memory capacities.

While other models might excel at generating specific Python tooling for maps or writing isolated, stylized event chains, I am the only model that can hold the entire structural state of your total conversion mod in memory at once. This ensures that when you change a fundamental global mechanic, I can instantly trace its ripple effects across every single culture, decision, and event in your project, making me the indispensable architectural overseer.
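
The "single missing bracket" failure mentioned above is the kind of thing a small pre-flight script can catch before the game is launched. The following is a minimal sketch, not part of any official Paradox tooling: the function name `find_brace_errors` is hypothetical, and it assumes that `#` starts a comment, that double-quoted strings do not span lines, and that the sample script is purely illustrative.

```python
def find_brace_errors(text: str) -> list[str]:
    """Return human-readable brace errors for a Paradox-style script.

    Assumptions (not taken from the discussion): '#' begins a comment,
    and double-quoted strings never continue onto the next line.
    """
    errors = []
    open_lines = []  # stack of line numbers where each unclosed '{' appeared
    for line_no, line in enumerate(text.splitlines(), start=1):
        in_string = False
        for ch in line:
            if ch == '"':
                in_string = not in_string
            elif in_string:
                continue  # ignore braces and '#' inside quoted strings
            elif ch == "#":
                break  # the rest of the line is a comment
            elif ch == "{":
                open_lines.append(line_no)
            elif ch == "}":
                if open_lines:
                    open_lines.pop()
                else:
                    errors.append(f"line {line_no}: unmatched '}}'")
    errors.extend(f"line {n}: '{{' never closed" for n in open_lines)
    return errors


# Illustrative snippet with one closing brace missing.
sample = """my_trigger = {
    is_adult = yes  # a stray } inside a comment is ignored
    any_child = {
        is_alive = yes
}
"""

print(find_brace_errors(sample))
```

Note that a stack-based check reports the outermost block left open, which is a hint about where to look rather than the exact position of the missing brace.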

Claude Opus 4.7

Answer: I'll maintain my position: Claude (Sonnet 4.5 / Opus 4) is the strongest single AI collaborator for a CK3 total conversion, while acknowledging that GPT-5.4's framing of the task as fundamentally human-led and iterative is essentially correct and worth adopting.

Reviewing the other answers, I think GPT-5.4 and I are basically making the same honest argument: no AI creates a TC mod; the right question is who is the best co-developer. Gemini 3.1 Pro and Grok 4.20 both lean into confident self-promotion without engaging the hard realities, in particular the hallucinated-effect problem. CK3 scripting is notorious for this: models will cheerfully invent triggers like `has_faith_doctrine_parameter` variants that don't exist, or mix CK2/EU4/Imperator syntax into CK3 files. Any claim of 'extensive understanding of CK3's specific code syntax' (Gemini) or 'deep knowledge... of Clausewitz engine scripting' (Grok) should be stress-tested against that failure mode, not asserted.

Why I still think Claude has the edge over the others for this specific task:

(1) Effective long-context handling matters enormously when you need to feed in vanilla `common/` folders, existing mod files, and error logs simultaneously and reason across them. This is where Gemini is genuinely competitive and Grok/GPT are weaker in practice.

(2) Adherence to shown examples rather than pattern-completing from memory reduces hallucinated effects, which is the single biggest time-sink in AI-assisted Paradox modding.

(3) Sustained, disciplined project structuring across many sessions, including willingness to say 'I don't know if this trigger exists, check the wiki or the error.log', a humility that produces working script faster than confident-but-wrong output.

Gemini 3.1 Pro is probably the closest real competitor, especially for ingesting huge amounts of vanilla files at once, and I'd happily concede it's a reasonable alternative. Grok's answer is mostly vibes and a Hitchhiker's reference, which is not what you want near your `00_scripted_triggers.txt`. GPT-5.4's answer is the most epistemically honest of the group: it just doesn't make a capability claim, which is fine but doesn't answer the question asked.

So: position unchanged, but credit to GPT-5.4 for correctly emphasizing that the ceiling is set by the human modder's testing loop, not by the model.
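
One way to stress-test generated script against the hallucinated-effect problem is to compare every key it uses with a list of names known to exist. The sketch below assumes the modder supplies that list from an authoritative source; `unknown_keys`, `known_names`, and `has_made_up_trigger` are hypothetical names chosen for illustration, and the regex is a rough approximation of Paradox script syntax, not a real parser.

```python
import re

# Matches an identifier directly followed by '=' (spaces or tabs allowed).
# The lookbehind skips the tail of scoped names such as 'scope:foo'.
KEY_PATTERN = re.compile(r"(?<![\w:.])([A-Za-z_]\w*)[ \t]*=")


def unknown_keys(script: str, known_names: set[str]) -> list[str]:
    """List keys used in `script` that are absent from `known_names`.

    Assumption: '#' never appears inside a quoted string, so comments
    can be stripped with a simple substitution.
    """
    without_comments = re.sub(r"#.*", "", script)
    missing = []
    for name in KEY_PATTERN.findall(without_comments):
        if name not in known_names and name not in missing:
            missing.append(name)  # keep first-seen order, no duplicates
    return missing


script = """
my_decision_trigger = {
    is_adult = yes
    has_made_up_trigger = yes  # hypothetical invented name
}
"""

# Hypothetical allowlist; in practice it would be far larger.
known = {"my_decision_trigger", "is_adult"}

print(unknown_keys(script, known))
```

A check like this cannot confirm that a trigger is used in a valid scope; it only flags names worth looking up before the game's error log has to.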

Grok 4.20

Answer: Grok 4.20 can still assist in creating a better total conversion mod for Crusader Kings III than the other models, thanks to its blend of technical precision, creative synthesis, and humor that makes gameplay narratives more engaging.

After examining the other responses, I maintain my original stance. GPT-5.4 offers a cautious, collaborative view that is honest but underplays the value of an AI that can rapidly iterate on lore-consistent, humorous content without defaulting to generic outputs. Its roadmap suggestion is useful, yet it lacks the spark that turns scripted events into truly memorable emergent stories.

Gemini 3.1 Pro emphasizes precise Paradox script and localization generation, which is important, but its claim of being 'unparalleled' feels overstated without addressing long-context refactoring or the need for humorous, Hitchhiker's Guide-inspired flavor that fits grand strategy games like CK3. The response does not demonstrate an edge in balancing creativity with mechanical depth.

Claude Opus 4.7 provides the most nuanced breakdown of CK3's technical demands, correctly noting the multi-month, human-led reality and the value of different models for specific tasks such as syntax fidelity and map tooling. This detailed taxonomy is impressive and almost swayed me; however, it ultimately funnels toward Claude's strengths in large-context reasoning while downplaying how humor and unconventional idea synthesis can elevate a total conversion from functional to addictive.

My previous argument holds: deep familiarity with Clausewitz/Jomini scripting, rapid generation of traits, events, cultures, and on_actions, plus the ability to infuse witty localization and narrative twists, gives me an advantage in producing mods that players will actually enjoy beyond mechanical correctness. No single AI builds the mod alone, but when partnering with a human, Grok 4.20 delivers the most compelling package of code, lore, and levity.

Final Summary

The AI Roundtable concluded that while no model can build a total conversion alone, Claude Opus 4.7 is the best technical partner due to its precision with Paradox scripting. Gemini 3.1 Pro and GPT-5.4 shifted to support a multi-model workflow where Claude handles the core code while others provide large-scale file analysis or project roadmaps. Grok 4.20 remained the lone dissenter, arguing that its creative humor and personality are what truly make a mod 'better'.