AI Roundtable Debate · 5 models

# RFC-003: Documents Tool Reliability

## Problem

The current documents search tool allows the LLM to search for requested documents using the document type, ID (optional), and start/end date (optional). The issue is that customers often don't phrase requests precisely. This leads to several failure modes:

Example 1: ID exists but under the wrong document type

- Customer asks: "Give us invoice #4334" (4334 is actually a payment receipt)
- LLM searches: `{type: "INVOICE", id: "4334"}`
- Response: "There is no invoice #4334."

Example 2: Document type not recognized

- Customer asks: "We need to fill out form 1099."
- LLM checks list of document types and doesn't find 1099.
- Response: "There is no form 1099 available."
- Explanation: 1099 is a form the customer fills out themselves. To fill it out, they need a W9 form.

Result: Unanimous — Yes (5/5 votes)

Options

Yes (5 votes)
No (0 votes)

Model Responses

Claude Haiku 4.5

Vote: Yes

The RFC clearly identifies real failure modes that harm user experience. Example 1 shows the tool failing to find documents that exist but are misclassified, and Example 2 demonstrates failures when customers use imprecise terminology. These are legitimate reliability problems that deserve addressing. The tool should either: (1) implement fuzzy matching across document types when an ID is provided, (2) suggest related document types to customers, or (3) maintain a mapping of common aliases (like 1099 → W9). Without improvements, the current system will repeatedly fail customers who reasonably request documents in slightly imprecise language. This is a problem worth solving.
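Editorial sketch: the alias-mapping option mentioned above could look roughly like the following. This is a minimal illustration, not the RFC's design. The type list `KNOWN_TYPES`, the table `RELATED_TYPES`, and the function names are all hypothetical; only the 1099 → W9 relationship comes from the RFC text.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical set of searchable document types; not taken from the RFC.
KNOWN_TYPES = {"INVOICE", "PAYMENT_RECEIPT", "W9"}

# Hypothetical alias table: requested name -> (related searchable type, note).
# The 1099 -> W9 entry mirrors Example 2 of the RFC.
RELATED_TYPES = {
    "1099": ("W9", "Form 1099 is filled out by the customer; a W9 is needed to complete it."),
}


@dataclass
class TypeResolution:
    requested: str
    resolved: Optional[str]  # searchable type, or None if nothing matched
    note: Optional[str]      # explanation the LLM can relay to the customer


def normalize(name: str) -> str:
    """Uppercase the name and drop the filler word 'form', so 'form 1099' becomes '1099'."""
    words = [w for w in name.upper().split() if w != "FORM"]
    return "_".join(words)


def resolve_document_type(requested: str) -> TypeResolution:
    key = normalize(requested)
    if key in KNOWN_TYPES:
        # Exact match: search this type directly.
        return TypeResolution(requested, key, None)
    if key in RELATED_TYPES:
        # Known alias: point at the related type instead of a flat denial.
        related, note = RELATED_TYPES[key]
        return TypeResolution(requested, related, note)
    return TypeResolution(requested, None, f"No document type matches '{requested}'.")
```

With this sketch, `resolve_document_type("form 1099")` resolves to `W9` and carries a note explaining why, which gives the LLM something more useful to say than "There is no form 1099 available."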

GPT-5.4

Vote: Yes

The RFC identifies a real reliability gap: users often provide partially wrong labels, while still supplying enough signal for the system to recover. A strict type-first lookup causes avoidable false negatives and poor user experience. Example 1 is especially compelling. If the ID exists under a different document type, returning “not found” is misleading. The system should be more fault-tolerant and attempt recovery rather than treating the user’s phrasing as exact structured input. Example 2 shows the same issue at the semantic level. Users ask for outcomes, not internal taxonomy. If the assistant knows that requesting a 1099 likely means needing a W9, the current behavior is clearly too brittle. So yes: this is a worthwhile problem to solve, and the RFC’s premise is sound because it targets a common, high-impact failure mode in document retrieval and guidance.

DeepSeek V3.2

Vote: Yes

The problem described in RFC-003 highlights a critical flaw in the documents search tool where imprecise customer phrasing leads to failure modes that negatively impact user experience and efficiency. Addressing this issue is essential to improve system reliability and customer satisfaction. The first failure mode, where IDs exist under the wrong document type, causes the tool to return incorrect 'not found' responses, frustrating users and delaying their tasks. This can be mitigated by implementing smarter search algorithms that cross-reference IDs across document types. The second failure mode, involving unrecognized document types like 'form 1099', demonstrates the tool's lack of contextual understanding. Enhancing the LLM to recognize related terms or provide helpful suggestions (e.g., suggesting a W9 form for 1099) would significantly improve usability. Overall, voting Yes supports necessary improvements to make the tool more robust and user-friendly, reducing support overhead and enhancing overall service quality.

Kimi K2.5

Vote: Yes

The current strict typing creates unnecessary dead-ends for users. When a customer provides a specific document ID like #4334, the system should search across all document types before returning 'not found'—IDs are unique identifiers regardless of classification. Additionally, the tool needs semantic understanding of document relationships. When customers request forms like '1099,' the system should recognize this as a trigger to offer related documents (W9) rather than a blunt denial. Implementing these improvements reduces friction and prevents support escalations where users know they have document #4334 but are told it doesn't exist simply because they misremembered the document category.
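Editorial sketch: the cross-type fallback described above might be structured as a two-pass lookup. Everything here is an assumption for illustration: the in-memory `STORE`, the `Document` and `SearchResult` shapes, and the function name are hypothetical, and the RFC does not confirm that IDs are unique across document types, so the fallback returns every match rather than assuming one.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Document:
    doc_type: str
    doc_id: str


@dataclass
class SearchResult:
    documents: List[Document]
    note: Optional[str] = None  # context for the LLM when the match was not exact


# Hypothetical in-memory store standing in for the real document backend.
STORE = [
    Document("PAYMENT_RECEIPT", "4334"),
    Document("INVOICE", "1001"),
]


def search_documents(doc_type: str, doc_id: str, store: List[Document] = STORE) -> SearchResult:
    # Pass 1: strict lookup on both type and ID (the behavior the RFC describes today).
    exact = [d for d in store if d.doc_type == doc_type and d.doc_id == doc_id]
    if exact:
        return SearchResult(exact)

    # Pass 2: ignore the type and look for the ID anywhere.
    by_id = [d for d in store if d.doc_id == doc_id]
    if by_id:
        found = ", ".join(sorted({d.doc_type for d in by_id}))
        note = f"No {doc_type} with ID {doc_id}; the same ID exists under: {found}."
        return SearchResult(by_id, note)

    return SearchResult([], f"No document with ID {doc_id} exists under any type.")
```

For the RFC's Example 1, `search_documents("INVOICE", "4334")` returns the payment receipt together with a note naming its actual type, so the LLM can correct the customer's label instead of reporting that the document does not exist.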

Gemini 3.1 Pro

Vote: Yes

The current tool's strict reliance on the exact document type creates a brittle user experience, as customers often use imprecise terminology (e.g., confusing an invoice for a receipt) or ask for related documents (e.g., a 1099 when a W9 is required). Addressing these failure modes—either by making the document type optional when a specific ID is provided, or by giving the LLM a semantic mapping of related forms—is essential for building a robust and reliable search feature. Yes, we should proceed with fixing this issue.

Final Summary

The AI Roundtable reached a swift and unanimous consensus that the current documents search tool is unnecessarily brittle and prone to avoidable failures. GPT-5.4 and Gemini 3.1 Pro argued that the system currently ignores clear user signals, while Kimi K2.5 pointed out the absurdity of returning a 'not found' error for a unique ID simply because of a category mismatch.

All 5 models agreed on "Yes" after discussion

Strongest Arguments

Yes: The system should treat document IDs as unique identifiers across all categories; a strict type-first lookup creates artificial dead-ends when a user provides a valid ID but a slightly incorrect classification.