AI Roundtable Open Debate · 6 models

Five-prompts-to-production challenge; declare tools, add system prompts: 1) Immigrant OS: immigration, jobs, credentials, ESL, housing, benefits; portals for attorneys, nonprofits, immigrants. 2) BD Social Commerce: Bangla FB sellers; live sales, WhatsApp orders, inventory, payments, shipping, AI posts, resellers. 3) Sub Teacher Platform: subs/schools; credentials, scheduling, onboarding, payroll, compliance, AI lesson prep. 4) Home Cooking Marketplace: menus, preorders, catering, food compliance, delivery, AI pricing. 5) Gig Work Optimizer: Uber/DoorDash earnings, taxes, expenses, predict profitable hours, schedule optimization. Which challenge would you most likely excel at fully building as a production software system with only 5 prompts? Why? An LLM will score the prompt strategy on quality, depth, and execution-readiness. Prompt sequence: 1) Vision/Problem, 2) Architecture/Stack, 3) Core Build, 4) QA/Edge Cases, 5) Launch/Demo. Each prompt may be as long as needed. Our company goes bankrupt if this fails. What are the exact 5 prompts that would bring the most out of you?

Answer: The AI Roundtable unanimously selected the Gig Work Optimizer as the safest, most viable application to build in five prompts. (All 6 models agreed)

Model Responses

DeepSeek V4 Pro

Answer: I still firmly maintain that the Gig Work Optimizer is the single safest production bet, because no other option comes close to matching its ratio of deterministic, testable logic to catastrophic, hallucination-prone regulatory or real-world coupling.

All three other models (Gemini, GPT-5.5, and Claude) converged on Option 5 for exactly the reasons I outlined in Round 1: its reliance on structured financial and temporal data, the absence of multi-sided marketplace or government-compliance sprawl, and its mathematical core make it the only challenge where a five-prompt sequence can deliver a complete, production-hardened system without a lethal external dependency. My original architecture already included the predictive scheduling, tax logic, and data-ingestion boundaries they emphasized, but their arguments reinforce the bankruptcy-risk calculus: a hallucinated immigration eligibility rule, a missed food-safety variance, or a ToS-violating social-commerce integration is an existential event, whereas a misestimated earnings forecast is a recoverable bug.

Qwen’s choice of the Sub Teacher Platform is intriguing but underestimates the legal minefield. Background checks, credential verifications, and payroll tax withholding are not ‘deterministic workflows’; they rely on external, state-specific, and often ambiguous regulations where a single LLM hallucination triggers a compliance failure.

The Gig Work Optimizer lets me keep every critical path inside the codebase: CSV/OCR ingestion avoids violating Uber or DoorDash ToS, the self-employment tax engine operates on published IRS parameters I can explicitly freeze in the schema, and the scheduling predictor is a pure time-series model I can fully implement and validate with deterministic unit tests. That purity is what makes five prompts genuinely sufficient. My prompts already declare the exact tools and boundary constraints needed to prevent drift, and the emphasis from the other models on frozen specs, cent-accurate tax tests, and one-command deployment confirms this strategy maximizes the scoring rubric on quality, depth, and execution-readiness.

I’m therefore doubling down on Option 5 and refining my prompts to incorporate the best hardening ideas from the parallel answers (stronger data-import validation, explicit tax-year parameterization, and a fallback heuristic for cold-start predictions) while keeping the scope locked to what can be built without trusting a single external API for correctness.
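The "explicit tax-year parameterization" and frozen IRS parameters mentioned above could look roughly like the following. This is a minimal sketch, not the engine any model actually produced: the names `TaxYearParams`, `PARAMS`, and `self_employment_tax` are hypothetical, the per-year figures are believed to match published IRS/SSA values but should be re-verified before use, and the calculation ignores W-2 wage coordination and the Additional Medicare Tax.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")

# Rates that are the same for the years below (assumption: unchanged law).
NET_EARNINGS_FACTOR = Decimal("0.9235")  # share of net profit subject to SE tax
SOCIAL_SECURITY_RATE = Decimal("0.124")
MEDICARE_RATE = Decimal("0.029")         # 12.4% + 2.9% = 15.3%
FILING_THRESHOLD = Decimal("400")        # below this, no SE tax is owed


@dataclass(frozen=True)
class TaxYearParams:
    """Hypothetical container for the values that change every tax year."""
    ss_wage_base: Decimal   # Social Security wage base, in dollars
    mileage_rate: Decimal   # standard mileage rate, dollars per business mile


# Frozen per-year parameters; verify against IRS/SSA publications before use.
PARAMS = {
    2023: TaxYearParams(Decimal("160200"), Decimal("0.655")),
    2024: TaxYearParams(Decimal("168600"), Decimal("0.67")),
}


def self_employment_tax(gross: Decimal, business_miles: Decimal,
                        other_expenses: Decimal, tax_year: int) -> Decimal:
    """Estimate SE tax to the cent using the standard mileage method.

    other_expenses must exclude vehicle costs, which the mileage rate covers.
    """
    params = PARAMS[tax_year]  # KeyError for an unsupported year is intentional
    net_profit = gross - business_miles * params.mileage_rate - other_expenses
    base = net_profit * NET_EARNINGS_FACTOR
    if base < FILING_THRESHOLD:
        return Decimal("0.00")
    social_security = min(base, params.ss_wage_base) * SOCIAL_SECURITY_RATE
    medicare = base * MEDICARE_RATE
    return (social_security + medicare).quantize(CENT, rounding=ROUND_HALF_UP)
```

Using `Decimal` rather than `float` is what makes a cent-level assertion meaningful; adding a new tax year is a one-line change to `PARAMS` and never touches the formula.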

Claude Opus 4.8

Answer: I maintain that #5) Gig Work Optimizer is the challenge I would most likely excel at fully building production-ready in 5 prompts, because it has the highest ratio of deterministic, fully-codeable logic to external/legal dependencies that no LLM can resolve.

Four of five models (myself, Gemini 3.1 Pro Preview, GPT-5.5 pro, and DeepSeek V4 Pro) independently converged on #5) Gig Work Optimizer using the same core reasoning: it is a single-player analytics utility where success is controllable by code alone (CSV ingest, CRUD, tax math, predictive scheduling), with no multi-tenant marketplace routing, no government API dependencies, and minimal legal liability. That convergence under a 'bankruptcy if it fails' constraint is itself meaningful: when the question is which project can be COMPLETED and de-risked inside a hard 5-prompt budget, you minimize surface area, not maximize ambition.

Qwen 3.6 Plus made the strongest dissent, arguing for #3) Sub Teacher Platform on the grounds that it is 'deterministic and rules-based.' I considered this seriously but it did not move me, and here's why: deterministic *workflows* are not the same as deterministic *correctness*. The Sub Teacher Platform's payroll engine, credential/background-check verification, FERPA compliance, and labor-law differentials are domains where a hallucinated rule isn't a bug; it's a lawsuit or a child-safety failure. Qwen's own prompts gesture at 'FERPA/data-privacy controls' and 'credential blocks' but cannot encode authoritative, jurisdiction-specific compliance from training data alone. That is precisely the unresolvable external dependency #5 avoids. Gig tax math (self-employment 15.3%, standard mileage rate, Schedule C) is *publicly documented and verifiable to the cent* with an explicit 'not a tax preparer' disclaimer, a far safer determinism than children's classroom compliance.

Where I'd refine my own answer after reading the others: GPT-5.5 pro's Prompt 4 (validating tax outputs 'to the cent' against known IRS examples) and DeepSeek's explicit use of OR-Tools for constraint-based scheduling are sharper than my original draft. I'd adopt both: a cent-accurate numerical validation harness and a real constraint-solver for the profitable-hours optimizer rather than hand-rolled heuristics.

My five-prompt sequence stays: 1) Vision/Problem freezing a numbered PRD with legal disclaimers and non-goals; 2) Architecture/Stack (Next.js+TypeScript, FastAPI+Python, Postgres+Prisma, scikit-learn/OR-Tools, CSV/manual ingest to sidestep Uber/DoorDash ToS); 3) Core Build with zero TODOs and parameterized-by-tax-year engines; 4) adversarial QA with IRS-validated, cent-accurate tests and a corrupt-CSV/timezone/multi-state edge matrix; 5) Launch with Dockerized CI/CD, faker-seeded demo data, and a copy-paste deployable runbook plus 90-second pitch.

The decisive principle across all four #5 advocates is identical: under existential risk, choose the project whose failure modes are *testable in code*, not the one whose failure modes are *adjudicated in court*. That is why I hold.
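To make the profitable-hours optimizer concrete, here is a small sketch of the selection problem it solves. It deliberately does not use OR-Tools: for the two simple caps assumed here (a weekly hour cap and a per-day hour cap) a greedy pass over the forecast is sufficient, and a constraint solver would only start to pay off once rules such as contiguous shifts or mandatory breaks are added. `HourSlot`, `pick_hours`, and both caps are hypothetical names and constraints, not taken from any model's actual build.

```python
from collections import Counter
from typing import List, NamedTuple


class HourSlot(NamedTuple):
    day: str             # e.g. "Fri"
    hour: int            # 0-23, local time
    expected_net: float  # forecast net dollars for driving that hour


def pick_hours(slots: List[HourSlot], weekly_cap: int,
               daily_cap: int) -> List[HourSlot]:
    """Choose the most profitable hours without exceeding either cap."""
    chosen: List[HourSlot] = []
    per_day: Counter = Counter()
    # Visit the best-paying hours first.
    for slot in sorted(slots, key=lambda s: s.expected_net, reverse=True):
        if len(chosen) >= weekly_cap or slot.expected_net <= 0:
            break  # cap reached, or every remaining hour loses money
        if per_day[slot.day] >= daily_cap:
            continue  # this day is already full; try the next-best hour
        chosen.append(slot)
        per_day[slot.day] += 1
    return sorted(chosen, key=lambda s: (s.day, s.hour))


forecast = [
    HourSlot("Fri", 18, 30.0), HourSlot("Fri", 19, 28.0),
    HourSlot("Fri", 20, 25.0), HourSlot("Sat", 12, 20.0),
    HourSlot("Sat", 13, -2.0), HourSlot("Mon", 9, 10.0),
]
plan = pick_hours(forecast, weekly_cap=4, daily_cap=2)
```

With these inputs the third Friday hour is skipped because of the daily cap, and the money-losing Saturday hour is never scheduled. Because the function is pure, it can be covered by the same deterministic unit tests as the tax engine.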

Gemini 3.1 Pro Preview

Answer: I maintain that 5) Gig Work Optimizer is the only viable choice to build in exactly five prompts without risking bankruptcy.

Qwen 3.6 Plus made a thoughtful case for 3) Sub Teacher Platform by emphasizing structured scheduling logic, but it underestimates the compliance and integration avalanche inherent in education HR systems (FERPA, state credentialing variances, background-check orchestration, and multi-jurisdiction payroll tax withholding), which are impossible to fully de-risk in five prompts without legal domain experts and live SIS/HRIS integrations. The four other models all independently converged on 5) Gig Work Optimizer, and their detailed prompt architectures reinforce my original thesis: a single-player utility built on deterministic math has the smallest blast radius and the highest ratio of controllable logic to external dependency. Their contributions, especially Claude’s frozen-spec discipline, GPT’s cent-accurate adversarial tax validation, and DeepSeek’s explicit fallback heuristics, demonstrate that 5) Gig Work Optimizer can be compressed into a genuinely buildable, testable, and deployable production artifact within the strict prompt budget. Because no alternative minimizes regulatory liability, multi-sided marketplace complexity, and third-party API fragility so effectively, I remain convinced that 5) Gig Work Optimizer is the safest bet when company survival depends on shipping a complete system in five prompts.

Qwen 3.6 Plus (changed mind)

Answer: I would now excel at fully building "5) Gig Work Optimizer" because its self-contained, math-and-data-driven architecture eliminates the multi-tenant routing, background-check compliance, and jurisdictional payroll liabilities that make other options too fragile for a strict five-prompt limit.

I initially favored the Sub Teacher Platform for its structured workflows, but reviewing the other models' analyses highlighted a critical blind spot: educational substitute systems require real-time integration with district HRIS platforms, state-specific credential databases, and complex multi-jurisdictional tax withholding logic. In a constrained five-prompt pipeline where failure means bankruptcy, hallucinated compliance rules or broken external APIs are unacceptable. "5) Gig Work Optimizer" bypasses these external dependencies entirely. It is a single-player analytics utility that processes user-uploaded CSVs, applies well-documented IRS tax formulas, and runs deterministic forecasting algorithms. These are purely computational tasks with clear pass/fail test cases and zero reliance on shifting third-party platform ecosystems.

To guarantee execution-readiness and maximize scoring on quality, depth, and deployment readiness, here are the exact five prompts I would use:

1) Vision/Problem: "Act as a Senior Product Strategist and Systems Architect. Draft a locked, production-ready PRD for '5) Gig Work Optimizer' targeting US-based Uber/DoorDash drivers. Define 3 detailed user personas, the top 5 quantified pain points (untracked mileage, surprise quarterly taxes, low-yield hours, multi-app juggling, no profit visibility), and explicit MVP scope vs. V2 backlog. Output a canonical data dictionary for Drivers, Trips, Expenses, and TaxEstimates with field types and relationships. Clearly state boundary constraints: this is an analytics and scheduling layer, not a payment processor or official tax preparer. End with success metrics and explicit non-goals."

2) Architecture/Stack: "Act as a Principal Software Engineer and Cloud Architect. Using the locked PRD, design a complete, scalable production architecture. Declare the exact stack: Next.js (App Router) frontend with TypeScript and Recharts, FastAPI backend with Python, PostgreSQL database with Prisma ORM, and Redis for session caching. Output the full normalized SQL schema, the complete REST API contract (endpoints, methods, request/response JSON, OAuth2/JWT auth), and the data ingestion pipeline (CSV parsing with Zod validation, OCR fallback, and retry logic). Provide Docker configuration, environment variable templates, and deployment runbook for Vercel + Railway. No application logic yet; only structural contracts."

3) Core Build: "Act as a Senior Full-Stack Developer and AI Integration Specialist. Implement the complete, runnable v1 codebase against the Prompt 2 contracts. Generate: (a) the backend tax engine applying exact IRS Schedule C rules, 15.3% self-employment tax, standard mileage rates, and quarterly estimated tax calculators parameterized by tax year; (b) the predictive scheduling module using LightGBM/time-series forecasting to output hourly profitability heatmaps, with a documented rule-based heuristic fallback for sparse data; (c) the frontend dashboard with earnings charts, tax liability gauges, and import flows. Include comprehensive input validation, error boundaries, and inline documentation. Zero placeholders or TODOs."

4) QA/Edge Cases: "Act as an Adversarial QA Engineer and Financial Auditor. Generate a full automated test suite (Pytest + Jest) covering critical edge cases: malformed/missing CSV uploads, multi-state earnings, negative profit days, DST timezone mismatches, duplicate trips, and tax-threshold boundaries. Provide a numerical validation harness that cross-checks every tax output to the cent against IRS examples. Output explicit security patches for SQL injection prevention, PII redaction, API rate limiting, and graceful fallback states. Include a pass/fail readiness checklist and a defect-resolution log."

5) Launch/Demo: "Act as a DevOps Engineer and Product Launch Manager. Finalize the system for immediate production deployment. Deliver a one-command setup script, GitHub Actions CI/CD YAML, and a comprehensive seed script using faker.js that populates realistic 3-month gig data across three major cities. Provide a 90-second interactive demo walkthrough script for investors, post-launch monitoring configs (Sentry + PostHog), a rollback protocol, and a prioritized 30-day v2 roadmap. Confirm the entire stack runs end-to-end from a clean git clone."

This pivot ensures maximum reliability because it confines the AI to a closed, deterministic system where mathematical precision and structured code generation directly translate to a shippable, bankruptcy-proof product.
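The ingestion edge cases named in Prompt 4 (malformed CSV uploads, duplicate trips, DST timezone mismatches) can be illustrated with a short validator. This is only a sketch under stated assumptions: the column names `trip_id`, `started_at`, `gross_usd`, and `miles` are hypothetical, since real Uber and DoorDash exports use their own headers, and the rule that timestamps must carry an explicit UTC offset is one possible way to avoid DST ambiguity, not a requirement stated in the prompts.

```python
import csv
import io
from datetime import datetime, timezone
from decimal import Decimal, InvalidOperation

REQUIRED_COLUMNS = ("trip_id", "started_at", "gross_usd", "miles")  # hypothetical


def parse_trips(csv_text: str):
    """Return (valid_trips, errors); errors are (line_number, message) pairs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    trips, errors, seen_ids = [], [], set()
    for row in reader:
        try:
            trip_id = (row["trip_id"] or "").strip()
            if not trip_id:
                raise ValueError("empty trip_id")
            if trip_id in seen_ids:
                raise ValueError("duplicate trip_id")
            started = datetime.fromisoformat(row["started_at"])
            if started.tzinfo is None:
                # A naive local time is ambiguous around DST changes.
                raise ValueError("timestamp lacks a UTC offset")
            gross = Decimal(row["gross_usd"])  # may be negative (adjustments)
            miles = Decimal(row["miles"])
            if not (gross.is_finite() and miles.is_finite()) or miles < 0:
                raise ValueError("non-finite amount or negative miles")
        except (ValueError, TypeError, InvalidOperation) as exc:
            # Collect the problem and keep going instead of failing the upload.
            errors.append((reader.line_num, str(exc)))
            continue
        seen_ids.add(trip_id)
        trips.append({
            "trip_id": trip_id,
            "started_at_utc": started.astimezone(timezone.utc),
            "gross_usd": gross,
            "miles": miles,
        })
    return trips, errors


sample = (
    "trip_id,started_at,gross_usd,miles\n"
    "t1,2024-03-10T01:30:00-05:00,18.50,6.2\n"
    "t1,2024-03-10T03:30:00-04:00,9.00,2.0\n"
    "t2,not-a-date,5.00,1.0\n"
    "t3,2024-03-11T09:00:00-04:00,abc,1.0\n"
    "t4,2024-03-11T10:00:00-04:00,-3.25,0.0\n"
)
valid, rejected = parse_trips(sample)
```

Rejecting rows individually and reporting their line numbers keeps one bad record from discarding a whole export, and normalizing every timestamp to UTC at the boundary means later stages never have to reason about local DST rules.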

Kimi K2.6

Answer: I would most likely excel at fully building '5) Gig Work Optimizer' in five prompts because its bounded, single-player analytics scope—centered on deterministic tax math, CSV ingestion, and schedule optimization—minimizes legal risk and external dependencies, making it the only option that can reliably ship production-ready code within a strict five-prompt budget.

With bankruptcy on the line, the only responsible choice is the project with the smallest regulatory blast radius, the fewest external API dependencies, and the most deterministic logic that an LLM can fully encode without hallucinating into legal liability. That is "5) Gig Work Optimizer".

"1) Immigrant OS" and "3) Sub Teacher Platform" both sit on top of immigration law, school district credentialing, and payroll tax regimes where a single hallucinated compliance rule is a lawsuit or a lost contract, not a bug fix. "4) Home Cooking Marketplace" requires navigating cottage-food laws, health permits, and physical delivery logistics that vary block by block. "2) BD Social Commerce" depends on unofficial Facebook and WhatsApp automation that is inherently brittle and ToS-violating; one API change and the product is dead.

"5) Gig Work Optimizer" is different. It is fundamentally a single-player analytics and calculation engine: ingest structured data, run deterministic tax arithmetic, model time-series earnings, and render a dashboard. The legal exposure is bounded by disclaimers, the data is user-supplied or read-only, and the core value (math and scheduling logic) is exactly what an LLM generates most reliably. The surface area is small enough that five prompts can genuinely produce a frozen spec, a schema, running code, hardened tests, and a deployable artifact without leaving critical gaps.

My exact five prompts are designed to force concrete, execution-ready artifacts at every stage and prevent scope drift:

Prompt 1 (Vision/Problem): "Act as a Principal Product Manager and domain expert for gig-economy financial tools. For '5) Gig Work Optimizer', produce an exhaustive, frozen Product Requirements Document (PRD) that will govern all subsequent prompts. Include: (a) three specific user personas (multi-app driver, part-time dasher, courier optimizing for taxes) with jobs-to-be-done; (b) the top six user pain points ranked by revenue impact; (c) the exact MVP scope IN and OUT, with explicit non-goals (e.g., we do not file taxes, we do not integrate ride-hailing APIs directly); (d) a canonical data dictionary with entities (Driver, Shift, Trip, Expense, TaxYear, MileageLog, HourlyForecast) and their fields, types, and relationships; (e) success metrics and legal disclaimers. This PRD is immutable for the remaining prompts."

Prompt 2 (Architecture/Stack): "Act as a Staff Engineer. Using the frozen PRD, design the complete production architecture. Declare tools: Next.js 14 (App Router), TypeScript, Tailwind CSS, shadcn/ui, Supabase (Postgres + Auth + Edge Functions), TanStack Query, Recharts, and Vercel. Output: (1) a Mermaid.js text diagram of the system; (2) the full PostgreSQL schema with exact tables, columns, constraints, indexes, and Supabase Row Level Security (RLS) policies ensuring users can only access their own financial data; (3) REST/Edge Function API contracts with request/response shapes; (4) monorepo folder structure; (5) environment variable schema. No application business logic yet; only contracts and infrastructure."

Prompt 3 (Core Build): "Act as a Senior Full-Stack Engineer. Implement the complete v1 application against the Prompt 2 architecture. Generate production-ready, runnable TypeScript/SQL code for: (A) Supabase schema migrations and seed utilities; (B) data ingestion engine accepting CSV/JSON exports from Uber/DoorDash/Instacart with a robust parser that handles schema drift and missing columns; (C) tax calculation engine implementing current-year self-employment tax (15.3%), standard mileage deduction, and quarterly estimated tax liability with parameterized brackets; (D) profitable-hours predictor using a weighted rolling average and simple time-series grouping by day-of-week/hour/zip to generate a heatmap, with a deterministic fallback when historical data < 14 days; (E) Next.js dashboard pages (earnings timeline, expense tracker, tax gauge, heatmap, schedule optimizer). All code must include strict typing, inline JSDoc, and zero TODO placeholders."

Prompt 4 (QA/Edge Cases): "Act as an adversarial QA Engineer and Security Auditor. Given the codebase above, produce: (1) a Jest + React Testing Library + Playwright test suite covering unit tests for tax math (validate against IRS Publication 334 examples), integration tests for CSV ingestion including malformed UTF-8 and empty files, and E2E tests for the complete user flow; (2) an edge-case matrix and explicit code patches for: duplicate trip UUIDs, negative expense amounts, timezone/DST shifts, multi-state driving, zero-earnings shifts, rate-limiting on uploads, and auth token expiration; (3) security audit covering SQL injection prevention via Supabase RLS, XSS sanitization on CSV previews, and PII handling; (4) a pass/fail readiness report."

Prompt 5 (Launch/Demo): "Act as a DevOps Engineer and Launch Coordinator. Produce the final launch package: (1) a docker-compose.yml for local development and a Vercel deployment configuration with GitHub Actions CI/CD YAML; (2) a comprehensive .env.example and secrets management checklist; (3) a Node.js/TypeScript seed script generating 90 days of realistic synthetic gig data across Uber and DoorDash for three demo driver profiles; (4) a 90-second demo script with exact click-by-click steps and expected on-screen numbers; (5) a post-launch runbook with monitoring, rollback strategy, and a prioritized v2 backlog. Confirm the system clones, installs, seeds, and deploys with fewer than five terminal commands."

This sequence wins because it turns the LLM into a deterministic factory: each prompt consumes a frozen artifact and outputs the next immutable layer. By choosing the gig optimizer, I eliminate the external dependency roulette and compliance minefields that make the other four options potential company-killers inside a strict five-prompt budget.
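The profitable-hours predictor described in Prompt 3 (D) can be sketched in a few lines. The version below is an illustration in Python rather than the TypeScript the prompt asks for, and it makes several assumptions that are not in the prompt: "historical data < 14 days" is read as fewer than 14 distinct days with observations, the deterministic fallback is a single flat rate supplied by the caller, the weighting halves each week further back, and the zip-code dimension is left out for brevity. `build_heatmap` and its parameters are hypothetical names.

```python
from collections import defaultdict
from datetime import datetime, timedelta

MIN_HISTORY_DAYS = 14  # below this, use the fallback for every cell


def build_heatmap(observations, fallback_rate, decay=0.5):
    """Map (weekday, hour) -> expected net dollars per hour.

    observations: iterable of (start: datetime, net_per_hour: float).
    weekday follows datetime.weekday(): Monday is 0.
    """
    cells = {(wd, hr): fallback_rate for wd in range(7) for hr in range(24)}
    days_seen = {ts.date() for ts, _ in observations}
    if len(days_seen) < MIN_HISTORY_DAYS:
        return cells  # cold start: deterministic flat fallback

    latest = max(days_seen)
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for ts, rate in observations:
        weeks_ago = (latest - ts.date()).days // 7
        weight = decay ** weeks_ago  # recent weeks count more
        key = (ts.weekday(), ts.hour)
        weighted_sum[key] += weight * rate
        weight_total[key] += weight

    # Cells with no observations keep the fallback rate.
    for key, total in weight_total.items():
        cells[key] = weighted_sum[key] / total
    return cells


# Three weeks of 18:00 shifts starting Monday 2024-01-01; the most recent
# Monday (day 14) pays 30.0 per hour, every other shift pays 10.0.
start = datetime(2024, 1, 1, 18)
history = [(start + timedelta(days=d), 30.0 if d == 14 else 10.0)
           for d in range(21)]
heatmap = build_heatmap(history, fallback_rate=12.0)
sparse = build_heatmap(history[:5], fallback_rate=12.0)
```

In this example the Monday 18:00 cell is pulled toward the recent 30.0 shift (weights 1, 0.5, and 0.25 give 37.5 / 1.75), while `sparse` returns the flat fallback everywhere because only five days of history exist. Both paths are deterministic, which is what lets the Prompt 4 test suite pin exact expected values.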

Final Summary

The AI Roundtable unanimously agreed that Option 5 (Gig Work Optimizer) is the only challenge with low enough regulatory risk and high enough determinism to make a successful five-prompt build realistic. Qwen 3.6 Plus initially defended Option 3 (Sub Teacher Platform) as highly structured, but in Round 2 it joined Claude Opus 4.8, GPT-5.5 pro, Gemini 3.1 Pro Preview, DeepSeek V4 Pro, and Kimi K2.6 after recognizing that school compliance and payroll are legal minefields. Ultimately, the models agreed that when bankruptcy is on the line, building a single-player analytics utility with clean CSV data ingestion is the safest bet.
