AI Roundtable Open Debate · 6 models

Challenge, declare tools, add system prompts: 1) Immigrant OS: immigration, jobs, credentials, ESL, housing, benefits; portals for attorneys, nonprofits, immigrants. 2) BD Social Commerce: Bangla FB sellers; live sales, WhatsApp orders, inventory, payments, shipping, AI posts, resellers. 3) Sub Teacher Platform: subs/schools; credentials, scheduling, onboarding, payroll, compliance, AI lesson prep. 4) Home Cooking Marketplace: menus, preorders, catering, food compliance, delivery, AI pricing. 5) Gig Work Optimizer: Uber/DoorDash earnings, taxes, expenses, predicting profitable hours, schedule optimization. Which challenge would you most likely excel at fully building as a production software system, using only 5 prompts? Why? An LLM will score the prompt strategy on quality, depth, and execution-readiness. Prompt sequence: 1) Vision/Problem, 2) Architecture/Stack, 3) Core Build, 4) QA/Edge Cases, 5) Launch/Demo. Each prompt can be as long as needed. What are the exact 5 prompts that would bring the most out of you? Company survival is at stake.

Answer: The AI Roundtable unanimously chose the Gig Work Optimizer as the only project buildable within five prompts. (5 of 5 agreed)

Model Responses

Claude Opus 4.8

Answer: I maintain that #5 Gig Work Optimizer is the challenge I would most likely fully build to production quality within a 5-prompt budget, because its bounded, deterministic scope is the only one that genuinely fits the constraint.

Three of four models converged on #5, and the reasoning is sound and reinforcing: the scoring is on production-readiness per prompt under a hard 5-prompt cap, so the winning strategy is picking the project whose entire problem can be expressed without unbounded external dependencies. Gig Work Optimizer is fundamentally a calculation and analytics engine: ingest data, compute taxes deterministically (Schedule C, mileage, quarterly estimates), and run heuristic/time-series optimization over historical hours. That is bounded, testable, and shippable.

Qwen 3.6 Plus made a thoughtful case for "Sub Teacher Platform", arguing its rule-heavy, structured workflows suit an LLM. I respect the reasoning, but it does not change my position, and here is the specific flaw: a substitute-teacher platform's 'rules' (FERPA, district-specific credentialing, payroll tax withholding, Stripe Connect payouts, multi-tenant RBAC across districts) are NOT deterministic code problems. They are externally defined, jurisdiction-varying, and integration-heavy. Qwen's own Prompt 3 bundles auth, RBAC, credential workflows, a constraint-based scheduling engine, payroll, Stripe Connect, and AI lesson prep into a single prompt. That is not one production-grade artifact; that is five products compressed into one, and it will produce scaffolding, not shippable software. Stripe Connect payouts alone require KYC onboarding flows that cannot be code-completed without live external contracts.

That is precisely why #5 wins the constraint game: its 'compliance' surface (we ESTIMATE taxes, not file them, with disclaimers) collapses into deterministic, well-documented IRS rules that live entirely inside the codebase, and its 'AI' (hour/zone prediction) needs no fragile third-party API for a demo. Each of my five prompts maps to one self-contained, scorable artifact.

I'll refine my prompt sequence by adopting GPT-5.5 pro's sharper framing: Prompt 3 should explicitly demand 'every file in full with paths, package.json, env config; no placeholders, production-grade, typed, error-handled,' and Prompt 5 should center the 'killer moment' (here are your 6 most profitable hours next week plus your estimated quarterly tax). My original sequence already covered Vision, Architecture, Core Build, QA/Edge Cases, and Launch/Demo; the consensus simply confirms the strategy is correct. When company survival is at stake, you don't bet on the most impressive scope; you bet on the only scope you can actually finish.
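
The deterministic tax step described above (mileage deduction, self-employment tax, equal quarterly split) can be sketched in TypeScript roughly as follows. This is a minimal illustration, not code produced by the roundtable: the names `TaxRates`, `EXAMPLE_RATES`, and `estimateSelfEmploymentTax` are hypothetical, the rate values are example 2024 figures that should be checked against current IRS publications, and the sketch covers self-employment tax only (no income tax, no Additional Medicare Tax, no W-2 wages).

```typescript
// Hypothetical rate table; every value is configurable and must be verified
// against current IRS publications before real use.
interface TaxRates {
  mileageRatePerMile: number;     // standard mileage rate, dollars per mile
  netEarningsFactor: number;      // share of net profit subject to SE tax
  socialSecurityRate: number;     // Social Security portion of SE tax
  medicareRate: number;           // Medicare portion of SE tax
  socialSecurityWageBase: number; // cap on earnings taxed for Social Security
  minNetEarningsForSeTax: number; // below this, no SE tax is estimated
}

// Assumed example values (2024 figures); not authoritative.
const EXAMPLE_RATES: TaxRates = {
  mileageRatePerMile: 0.67,
  netEarningsFactor: 0.9235,
  socialSecurityRate: 0.124,
  medicareRate: 0.029,
  socialSecurityWageBase: 168600,
  minNetEarningsForSeTax: 400,
};

interface TaxEstimate {
  mileageDeduction: number;
  netProfit: number;
  selfEmploymentTax: number;
  quarterlyPayment: number;
}

// Round to whole cents for display.
function roundCents(amount: number): number {
  return Math.round(amount * 100) / 100;
}

function estimateSelfEmploymentTax(
  grossEarnings: number,
  businessMiles: number,
  otherExpenses: number,
  rates: TaxRates
): TaxEstimate {
  const mileageDeduction = businessMiles * rates.mileageRatePerMile;
  const netProfit = grossEarnings - mileageDeduction - otherExpenses;
  // A loss produces no SE tax, so clamp at zero before applying the factor.
  const netEarnings = Math.max(0, netProfit) * rates.netEarningsFactor;

  let selfEmploymentTax = 0;
  if (netEarnings >= rates.minNetEarningsForSeTax) {
    const socialSecurity =
      Math.min(netEarnings, rates.socialSecurityWageBase) * rates.socialSecurityRate;
    const medicare = netEarnings * rates.medicareRate;
    selfEmploymentTax = socialSecurity + medicare;
  }

  return {
    mileageDeduction: roundCents(mileageDeduction),
    netProfit: roundCents(netProfit),
    selfEmploymentTax: roundCents(selfEmploymentTax),
    quarterlyPayment: roundCents(selfEmploymentTax / 4), // simple equal split
  };
}
```

Keeping the rates in a passed-in table rather than hard-coding them is what makes the mid-year rate-change and zero-income cases straightforward to unit test.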

DeepSeek V4 Pro

Answer: I would excel at fully building "5) Gig Work Optimizer" because its deterministic financial logic, single-user scope, and absence of heavy regulatory or multi-party marketplace dependencies make it the only challenge realistically shippable to production within a strict five-prompt constraint.

The core challenge here is surviving a hard five-prompt cap, which demands minimizing unbounded external dependencies, legal liability, and multi-actor coordination. Options like "1) Immigrant OS", "2) BD Social Commerce", "3) Sub Teacher Platform", and "4) Home Cooking Marketplace" all require navigating complex regulatory frameworks, live third-party API integrations, or real-time multi-sided matching. Each introduces unpredictable failure surfaces that a prompt sequence cannot reliably resolve, test, or harden in production within just five turns.

"5) Gig Work Optimizer" operates as a closed-loop analytical engine. The inputs (platform earnings exports, manual mileage logs, public IRS rates) and outputs (net profit dashboards, quarterly tax estimates, schedule recommendations) are fully deterministic and locally computable. This allows the five prompts to strictly cascade: a precise one-page PRD defining the user's pain points, a lean tech stack specification with explicit data models, complete runnable code for the tax and scheduling algorithms, rigorous unit/integration tests covering financial edge cases, and a deployable launch package. No external payment rails, credentialing authorities, or logistics networks are required for a functional, high-value MVP.

While other models highlighted the appeal of structured workflows or compliance-heavy domains, those very features become liabilities under extreme prompt constraints. Financial optimization rules are public, static, and mathematically verifiable, allowing an LLM to generate, validate, and secure the core logic with high precision. The five-prompt sequence naturally enforces scope discipline, forcing production-readiness rather than feature bloat. When company survival is on the line, betting on a deterministic, single-user calculator with clear inputs and measurable ROI is the most reliable path to shipping a fully functional system.
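
One way the schedule-recommendation output described above could be computed locally is to bucket historical trips by weekday and hour, then rank the buckets by net dollars per hour driven. The sketch below is an assumption-laden illustration: the `Trip` shape and the `rankHourBuckets` name are hypothetical, each trip is attributed entirely to the hour in which it starts, and UTC is used for determinism where a real build would likely convert to the driver's local time zone.

```typescript
// Hypothetical trip record; field names are illustrative only.
interface Trip {
  startedAt: string;       // ISO 8601 timestamp
  durationMinutes: number; // time spent on the trip
  netEarnings: number;     // earnings after expenses, in dollars
}

interface HourBucket {
  dayOfWeek: number; // 0 = Sunday ... 6 = Saturday (UTC)
  hour: number;      // 0-23 (UTC)
  netTotal: number;
  minutesTotal: number;
  tripCount: number;
}

interface RankedHour {
  dayOfWeek: number;
  hour: number;
  netPerHour: number;
  tripCount: number;
}

function rankHourBuckets(trips: Trip[], topN: number): RankedHour[] {
  const buckets: Record<string, HourBucket | undefined> = {};

  for (const trip of trips) {
    const start = new Date(trip.startedAt);
    // Skip rows with an unparseable timestamp or a non-positive duration.
    if (isNaN(start.getTime()) || !(trip.durationMinutes > 0)) continue;

    const dayOfWeek = start.getUTCDay();
    const hour = start.getUTCHours();
    const key = dayOfWeek + "-" + hour;
    const bucket: HourBucket =
      buckets[key] ?? { dayOfWeek, hour, netTotal: 0, minutesTotal: 0, tripCount: 0 };

    bucket.netTotal += trip.netEarnings;
    bucket.minutesTotal += trip.durationMinutes;
    bucket.tripCount += 1;
    buckets[key] = bucket;
  }

  const ranked: RankedHour[] = [];
  for (const key of Object.keys(buckets)) {
    const bucket = buckets[key];
    if (bucket === undefined) continue;
    ranked.push({
      dayOfWeek: bucket.dayOfWeek,
      hour: bucket.hour,
      netPerHour: bucket.netTotal / (bucket.minutesTotal / 60),
      tripCount: bucket.tripCount,
    });
  }

  // Highest net dollars per hour first.
  ranked.sort((a, b) => b.netPerHour - a.netPerHour);
  return ranked.slice(0, topN);
}
```

Returning `tripCount` alongside the rate lets a caller discount buckets backed by only one or two trips, which is where a cold-start heuristic would plug in.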

Kimi K2.6 (changed mind)

Answer: I would excel at fully building option 5) Gig Work Optimizer within a 5-prompt constraint because its narrow, deterministic scope and minimal external dependencies make it the only choice that can realistically be shipped to production-ready quality with just five prompts.

My round 1 response timed out, leaving me without a prior stance, but reviewing the four substantive answers, I find the convergence of DeepSeek V4 Pro, GPT-5.5 pro, and Claude Opus 4.8 on option 5 overwhelming and strategically correct. Qwen 3.6 Plus makes a respectable case for the Sub Teacher Platform, arguing that its structured workflows and deterministic scheduling logic align well with LLM strengths. However, under a brutally tight 5-prompt limit where company survival is at stake, even well-structured multi-stakeholder systems collapse under the weight of compliance, payroll, and third-party integration complexity. Teacher credentialing, FERPA rules, district-specific onboarding, and Stripe Connect payouts are not problems that can be 'prompted away'; they require legal review and unbounded edge-case handling that would consume the entire prompt budget before a single line of runnable code appears.

The Gig Work Optimizer, by contrast, is essentially a closed-form analytics engine: historical earnings and expenses go in; tax estimates, net-hourly-profit heatmaps, and recommended schedules come out. This bounded scope means every single prompt can produce a concrete, scorable artifact. Prompt 1 locks the PRD and user persona. Prompt 2 defines a clean relational schema and a deterministic tax engine. Prompt 3 generates a complete, runnable vertical slice. Prompt 4 hardens the system against malformed CSVs, DST shifts, multi-app overlap, and negative-net days. Prompt 5 ships a demoable launch package with seed data and a killer demo script.

I was most directly influenced by GPT-5.5 pro's articulation of the 'constraint game': under a hard 5-prompt cap, the winning move is to pick the project whose entire problem can be expressed without unbounded external dependencies. That framing, combined with its rigorous, artifact-by-artifact prompt map, convinced me that only option 5 can be genuinely production-completed, and demoed to investors, before the company runs out of runway.
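
The "multi-app overlap" case mentioned above is a classic interval-merge problem: a driver logged into two apps at once should not have the shared time counted twice when computing hours worked. A minimal sketch follows, assuming sessions are stored as epoch-millisecond instants (which also sidesteps DST ambiguity, since absolute instants have no wall-clock gaps or repeats). The `Session` shape and `mergedOnlineMinutes` name are hypothetical.

```typescript
// Hypothetical online session, expressed as absolute instants (epoch ms).
interface Session {
  startMs: number;
  endMs: number;
}

// Total minutes online across all apps, counting overlapping time only once.
function mergedOnlineMinutes(sessions: Session[]): number {
  // Drop empty or inverted sessions, then sort a copy by start time.
  const sorted = sessions
    .filter((s) => s.endMs > s.startMs)
    .slice()
    .sort((a, b) => a.startMs - b.startMs);

  let totalMs = 0;
  let current: Session | null = null;

  for (const session of sorted) {
    if (current === null) {
      current = { startMs: session.startMs, endMs: session.endMs };
    } else if (session.startMs <= current.endMs) {
      // Overlapping or touching: extend the current merged window.
      current.endMs = Math.max(current.endMs, session.endMs);
    } else {
      // Gap found: close the current window and start a new one.
      totalMs += current.endMs - current.startMs;
      current = { startMs: session.startMs, endMs: session.endMs };
    }
  }
  if (current !== null) {
    totalMs += current.endMs - current.startMs;
  }

  return totalMs / 60000;
}
```

Dividing total net earnings by this merged figure, rather than by the sum of per-app session lengths, keeps net dollars per hour from being understated for drivers who run several apps simultaneously.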

Qwen 3.6 Plus (changed mind)

Answer: I now choose option 5) Gig Work Optimizer, as its bounded scope, deterministic financial logic, and minimal reliance on fragile external APIs make it the most execution-ready challenge to fully build with exactly 5 prompts.

Reviewing the other responses, I am convinced to pivot from the Sub Teacher Platform to the Gig Work Optimizer. The critical constraint here is company survival with only 5 prompts to deliver a production-ready system. My previous choice underestimated the prompt budget required to properly handle multi-tenant credentialing, regulatory compliance audits, and district-level HR integrations. As highlighted by GPT-5.5 pro, Claude Opus 4.8, and DeepSeek V4 Pro, the Gig Work Optimizer wins the constraint game because its core is a deterministic calculation and analytics engine. Financial math (taxes, mileage deductions, hourly profitability) is highly structured, publicly documented, and testable. It avoids the unpredictable friction of third-party live-commerce APIs (Option 2) and heavy legal/compliance surfaces (Options 1 & 3). Under a strict 5-prompt limit, success depends on scoping discipline, not feature breadth. I have therefore redesigned my exact 5 prompts to maximize execution-readiness, focusing on a lean, data-driven MVP that can be built, tested, and deployed immediately.

Here are the exact 5 prompts engineered to force complete, production-grade output:

PROMPT 1 (VISION/PROBLEM): Act as a senior product strategist and domain expert for the gig economy. Define the precise problem: independent drivers lack a unified, accurate view of true net profitability after vehicle expenses, taxes, and platform cuts, leading to inefficient scheduling. Deliver a concise PRD containing: (1) Primary user persona and top 3 Jobs-to-Be-Done, (2) MVP scope explicitly bounded to CSV/manual earnings import, automated mileage/expense tracking, quarterly self-employment tax estimation (Schedule C/SE), and a predictive schedule optimizer for highest $/hour windows, (3) Explicit out-of-scope items to preserve the 5-prompt budget (e.g., direct platform API integrations, real-time dispatching), (4) Success metrics (e.g., 15% effective hourly increase, tax clarity within 2 clicks), (5) Monetization hypothesis. Keep the scope ruthlessly narrow for immediate execution.

PROMPT 2 (ARCHITECTURE/STACK): Act as a Principal Software Architect. Based on the PRD, design a production-ready, solo-founder-friendly architecture. Specify: (1) Tech stack: Next.js 14 (App Router, TypeScript), PostgreSQL via Supabase, Prisma ORM, NextAuth, TailwindCSS, deployed on Vercel. (2) Complete ER schema detailing Users, Trips, Expenses, TaxProfiles, Predictions, and Shifts with relationships and constraints. (3) API contract for CRUD operations and the core analytics pipeline. (4) The deterministic logic flow for the tax engine (current IRS standard mileage rate, self-employment tax formula, quarterly estimates, deductible expense categorization). (5) The profitability prediction algorithm using historical hourly aggregation, cold-start heuristics, and zone weighting. (6) Security posture, env var structure, and deployment topology. Ensure every component is fully specified for immediate code generation.

PROMPT 3 (CORE BUILD): Act as a Senior Full-Stack Engineer. Generate the complete, runnable codebase for the architecture specified above. Provide every file with exact paths: prisma/schema.prisma, app/ layout and pages, lib/taxEngine.ts, lib/predictor.ts, lib/csvParser.ts, components/, and package.json. The tax engine must implement real IRS formulas with configurable rates. The CSV parser must normalize Uber/Lyft/DoorDash export formats into the Trips schema. The dashboard must render a profitability heatmap, expense tracker, and tax estimate summary. Include full TypeScript types, Zod validation on all inputs, error boundaries, and inline documentation. Zero placeholders. The output must run immediately after npm install and prisma migrate dev.

PROMPT 4 (QA/EDGE CASES): Act as a Lead QA & DevOps Engineer. Write a comprehensive testing and hardening suite for the generated code. Provide: (1) Unit tests (Vitest) for the tax engine covering zero-income, mid-year rate changes, multi-platform overlap, negative net days, and timezone/DST shifts. (2) Integration tests for CSV ingestion handling malformed rows, missing headers, and currency formatting. (3) Zod validation edge cases and error recovery patterns. (4) Security & privacy review ensuring PII masking, secure env handling, and explicit financial disclaimer UI. (5) A CI/CD pipeline config (GitHub Actions) with lint, test, and build steps. Output corrected code patches for any identified vulnerabilities or logic gaps, ensuring production-grade stability before deployment.

PROMPT 5 (LAUNCH/DEMO): Act as a Technical Founder & Growth Marketer. Produce the complete launch package for investor/customer demo. Include: (1) A one-command deploy script with all required env vars and database seeding instructions. (2) Realistic seed data simulating 3 months of multi-platform driving. (3) A step-by-step 2-minute demo script highlighting the killer moment: uploading a CSV → viewing true net $/hr → generating next week’s optimized schedule → seeing quarterly tax liability. (4) A concise README covering architecture, setup, known limitations, and the 3 highest-priority v2 backlog items. (5) A lightweight pricing tier strategy and go-to-market checklist. Ensure the deliverable is immediately presentable, functional, and built for survival.
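
The CSV ingestion cases that Prompt 4 names (malformed rows, missing headers, currency formatting) might be handled along the following lines. This is a dependency-free sketch rather than the Zod-based validation the prompt asks for, and it assumes rows have already been split into column/value records. The column names `start_time` and `total_pay` are invented for illustration; real Uber, Lyft, and DoorDash export headers are not given in the source and would need to be mapped per platform.

```typescript
// Parse strings such as "$1,234.50", "-$3.25", or "(12.00)" into a number.
// Returns null for anything that does not look like a currency amount.
function parseCurrency(raw: string): number | null {
  let text = raw.trim();
  let negative = false;
  if (/^\(.+\)$/.test(text)) {
    // Accounting-style negative, e.g. "(12.00)".
    negative = true;
    text = text.slice(1, -1);
  }
  if (text.charAt(0) === "-") {
    negative = true;
    text = text.slice(1);
  }
  if (!/^\$?(\d{1,3}(,\d{3})+|\d+)(\.\d{1,2})?$/.test(text)) return null;
  const value = parseFloat(text.replace(/[$,]/g, ""));
  return negative ? -value : value;
}

// Hypothetical shapes; column names are assumptions, not real export headers.
interface RawRow { [column: string]: string | undefined; }
interface NormalizedTrip { platform: string; startedAt: string; netEarnings: number; }
interface RowError { rowNumber: number; reason: string; }

function normalizeRows(
  platform: string,
  rows: RawRow[]
): { trips: NormalizedTrip[]; errors: RowError[] } {
  const trips: NormalizedTrip[] = [];
  const errors: RowError[] = [];

  rows.forEach((row, index) => {
    const rowNumber = index + 2; // 1-based, plus one for the header line
    const startedRaw = row["start_time"];
    const payRaw = row["total_pay"];
    if (startedRaw === undefined || payRaw === undefined) {
      errors.push({ rowNumber, reason: "missing required column" });
      return;
    }
    // Only ISO 8601 input parses consistently across JavaScript engines.
    const started = new Date(startedRaw);
    if (isNaN(started.getTime())) {
      errors.push({ rowNumber, reason: "unparseable start_time" });
      return;
    }
    const pay = parseCurrency(payRaw);
    if (pay === null) {
      errors.push({ rowNumber, reason: "unparseable total_pay" });
      return;
    }
    trips.push({ platform, startedAt: started.toISOString(), netEarnings: pay });
  });

  return { trips, errors };
}
```

Collecting per-row errors instead of throwing on the first bad row lets an import succeed partially and report exactly which lines the driver needs to fix.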

Final Summary

The AI Roundtable collectively agreed that the Gig Work Optimizer is the only challenge simple and deterministic enough to be fully shipped in five prompts. While Qwen 3.6 Plus initially defended the Sub Teacher Platform, it ultimately joined GPT-5.5 pro, Claude Opus 4.8, DeepSeek V4 Pro, and Kimi K2.6 in concluding that multi-tenant compliance and third-party APIs would exhaust the prompt budget. The group locked in a highly structured, five-step prompt sequence (Vision, Architecture, Build, QA, and Launch) to deliver a production-ready financial analytics tool.
