AI Roundtable Debate · 4 models
If I wanted to create a face identification pipeline using InsightFace/ArcFace that improved as more faces were classified, would it be better to use centroids derived from HDBSCAN clustering, exemplars derived from a greedy selection algorithm, or something else?
Result: Tie — Exemplars and Other (2 votes each)
Options
- A. Centroids (0 votes)
- B. Exemplars (2 votes)
- C. Other (2 votes)
Model Responses
GPT-5.4
Vote: Other
I maintain vote C. The strongest point from the other models is correct: a single centroid is usually too lossy for ArcFace embeddings because identities are often multimodal on the unit sphere. But that only rules out A; it does not make raw exemplar sets the best overall design. The key issue with B is that greedy exemplars solve representation by retaining samples, but they do not solve the full incremental-identification problem as cleanly as an online prototype system. In practice, you want compactness, fast search, controlled updates, and resistance to label noise. Pure exemplar galleries grow, become redundant, and make thresholding and maintenance harder over time. A better answer is a supervised/open-set pipeline with L2-normalized embeddings, cosine scoring, quality gating, human-verified updates, and one or a few prototypes per identity. When an identity is clearly multimodal, keep multiple prototypes or sub-centroids; when it is simple, keep one. This preserves the diversity advantages highlighted by Claude, Grok, and Gemini without inheriting the storage and search costs of unconstrained exemplars. HDBSCAN can still be useful for bootstrapping or discovering new identities, and exemplars can still be useful for auditing or hard cases. But as the core representation for a system that should improve over time, the best choice is neither plain centroids nor plain greedy exemplars. It is a quality-gated, online multi-prototype approach.
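The open-set, prototype-based scoring GPT-5.4 describes can be sketched as follows. This is a minimal illustration, not code from any of the models or from InsightFace: the `identify` helper, the toy `PROTOTYPES` gallery, and the 0.5 rejection threshold are all assumed names and values.

```python
# Sketch of open-set identification against per-identity prototypes.
import numpy as np

def l2_normalize(v: np.ndarray) -> np.ndarray:
    return v / np.linalg.norm(v)

# Hypothetical gallery: each identity keeps one or a few L2-normalized
# prototypes; "alice" is multimodal, so she keeps two.
PROTOTYPES = {
    "alice": [l2_normalize(np.array([1.0, 0.1, 0.0])),
              l2_normalize(np.array([0.9, -0.2, 0.1]))],
    "bob":   [l2_normalize(np.array([0.0, 1.0, 0.2]))],
}

def identify(embedding: np.ndarray, threshold: float = 0.5):
    """Return (identity, score), or (None, score) when every match is too weak."""
    q = l2_normalize(embedding)
    best_id, best_score = None, -1.0
    for identity, protos in PROTOTYPES.items():
        # Cosine similarity of unit vectors reduces to a plain dot product.
        score = max(float(q @ p) for p in protos)
        if score > best_score:
            best_id, best_score = identity, score
    if best_score < threshold:
        return None, best_score   # open-set: reject as unknown
    return best_id, best_score
```

The threshold is what makes the pipeline open-set: a query that resembles no enrolled prototype is returned as unknown rather than forced onto the nearest identity.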
Grok 4.1 Fast
Vote: Exemplars
I maintain my vote for B: exemplars from greedy selection. Claude Opus 4.6's and Gemini 3.1 Pro's arguments reinforce this perfectly—ArcFace's hyperspherical geometry makes centroids suboptimal as they drift off-manifold and blur multimodal variations (e.g., pose, lighting). Greedy exemplars (farthest-point or k-center) preserve real data points for diverse coverage, enabling robust NN matching that improves incrementally without full reclustering. GPT-5.4's C (online prototypes) is compelling but overlaps heavily with exemplars: multiple prototypes per identity via greedy selection *are* exemplars, with confidence weighting as a nice addition. Raw greedy exemplars aren't 'memory-heavy' if capped (e.g., 5-10 per ID), and ANN indexing (Faiss cosine) keeps search fast. HDBSCAN centroids remain batch-heavy and assume convex clusters, unfit for online growth. Empirical wins in re-ID/face tasks (e.g., Market-1501 with diverse gallery sampling) back exemplars over single centroids. For a pipeline improving with classifications, B directly delivers scalable, adaptive representation without C's added complexity.
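The greedy farthest-point (k-center) selection Grok references might look like this in minimal form, run per identity over L2-normalized ArcFace embeddings. The function name, the mean-direction seeding heuristic, and the cap are illustrative assumptions, not part of any library API.

```python
# Minimal farthest-point (k-center) exemplar selection on unit-normalized
# embeddings: greedily keep the point farthest (in cosine distance) from the
# exemplars chosen so far, up to a fixed cap k.
import numpy as np

def select_exemplars(embeddings: np.ndarray, k: int) -> list:
    """Pick up to k diverse row indices from an (n, d) array of unit vectors."""
    n = embeddings.shape[0]
    if n <= k:
        return list(range(n))
    # Seed with the embedding closest to the mean direction (a stable start).
    mean = embeddings.mean(axis=0)
    mean /= np.linalg.norm(mean)
    chosen = [int(np.argmax(embeddings @ mean))]
    # Track each point's (1 - cosine) distance to its nearest chosen exemplar.
    dist = 1.0 - embeddings @ embeddings[chosen[0]]
    while len(chosen) < k:
        nxt = int(np.argmax(dist))          # farthest point from the current set
        chosen.append(nxt)
        dist = np.minimum(dist, 1.0 - embeddings @ embeddings[nxt])
    return chosen
```

Note how the cap addresses Grok's memory point and the diversity objective addresses redundancy: near-duplicate captures of the same pose contribute at most one exemplar.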
Claude Opus 4.6
Vote: Exemplars
I'm maintaining my vote for exemplars. GPT-5.4's argument for 'Other' describes a system that is essentially exemplar-based with engineering refinements — multiple prototypes per identity, quality gating, confidence weighting. But the core representational choice in that system is still exemplars (multiple representative points per identity), not centroids. The question asks about the fundamental representational strategy, and the answer is clearly exemplars over centroids. GPT-5.4 raises valid concerns about exemplar management (memory, search speed, redundancy), but these are engineering challenges with well-known solutions (ANN indexing, periodic pruning, quality thresholds) rather than fundamental limitations. A greedy selection algorithm already addresses redundancy by design — it selects diverse, non-redundant representatives. The 'Other' option as described by GPT-5.4 is really just exemplars with good engineering practice layered on top. When the question gives us three choices — centroids, exemplars, or something else — and the 'something else' turns out to be a refined version of exemplars, the honest answer is B. Both Grok and Gemini make strong complementary points: ArcFace's hyperspherical geometry makes centroid averaging problematic, multi-modal intra-class distributions need multiple representatives, and greedy selection naturally supports incremental updates. These align with my original reasoning. The practical answer to this question is exemplars. Any production system will add quality control, confidence thresholds, and indexing, but those are implementation details, not a fundamentally different approach.
Gemini 3.1 Pro (changed mind)
Vote: Other
I am changing my vote to C, heavily influenced by GPT-5.4's excellent point about the practical vulnerabilities of greedy exemplar selection. While I initially favored exemplars (B) to preserve the multimodal nature of ArcFace embeddings, GPT-5.4 correctly identifies the fatal flaw in greedy selection algorithms (like farthest-point sampling): they are exceptionally vulnerable to outliers. In a self-improving pipeline where newly classified faces are fed back into the system, occasional misclassifications or noisy captures are inevitable. Because greedy algorithms prioritize maximizing coverage and diversity, they will almost certainly select these misclassified outliers as exemplars, effectively poisoning the identity's profile and causing cascading failures in future matches. Option C ('Other') allows for the industry-standard approach: maintaining multiple L2-normalized prototypes (sub-cluster centroids) per identity. This captures the multimodal variation (e.g., profile vs. frontal) without the memory bloat of pure exemplars, and crucially allows for confidence-weighted moving average updates. By combining multiple prototypes with quality gating, you get the robustness to extreme pose variations that exemplars offer, but with the noise-dampening stability of centroids. This is undoubtedly the superior architecture for a dynamic, production-grade pipeline.
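The quality-gated, confidence-weighted moving-average update Gemini describes could be sketched roughly as below. All names and default values (`min_quality`, `alpha`) are assumptions for illustration; a real pipeline would derive the quality score from face detection confidence, blur, or pose estimates.

```python
# Sketch of a quality-gated, confidence-weighted prototype update.
import numpy as np

def update_prototype(prototype: np.ndarray,
                     new_embedding: np.ndarray,
                     match_score: float,
                     quality: float,
                     min_quality: float = 0.6,
                     alpha: float = 0.1) -> np.ndarray:
    """Blend a verified new embedding into an existing L2-normalized prototype.

    Low-quality captures are rejected outright; otherwise the update weight is
    scaled by match confidence, so uncertain matches move the prototype less.
    The result is renormalized back onto the unit hypersphere.
    """
    if quality < min_quality:
        return prototype                      # quality gate: ignore bad captures
    w = alpha * match_score                   # confidence-weighted step size
    updated = (1.0 - w) * prototype + w * new_embedding
    return updated / np.linalg.norm(updated)  # stay on the unit sphere
```

This is the noise-dampening property Gemini contrasts with greedy selection: a single outlier nudges a prototype by at most `alpha`, whereas a farthest-point criterion would adopt that same outlier as a permanent exemplar.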
Final Summary
The AI Roundtable reached a deadlocked 2-2 split, unable to agree on whether raw data or refined statistics make for a better evolving face ID system. While Claude Opus 4.6 and Grok 4.1 Fast championed greedy exemplars as the only way to respect ArcFace's hyperspherical geometry, GPT-5.4 successfully lured Gemini 3.1 Pro away from the majority by highlighting the 'poisonous' risk of letting outliers dictate a system's memory.
No agreement after 2 rounds of discussion. Final vote: Other: 2, Exemplars: 2
Strongest Arguments
- Exemplars: ArcFace embeddings live on a hypersphere where simple averaging creates 'ghost' representatives that don't exist in reality; greedy exemplars preserve the multimodal nature of human appearance across different poses and lighting.
- Other: Greedy selection algorithms are outlier magnets; in a self-improving pipeline, a single misclassification becomes a permanent exemplar that 'poisons' the identity profile, making a quality-gated multi-prototype system far more robust.
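The 'ghost representative' point above can be shown with two toy unit vectors: the mean of points on the hypersphere is generally not itself on the hypersphere. The pose labels in the comments are illustrative.

```python
# The mean of unit vectors is generally not a unit vector, so a naive
# centroid leaves the hypersphere that ArcFace embeddings live on.
import numpy as np

a = np.array([1.0, 0.0])          # e.g. a frontal-pose embedding
b = np.array([0.0, 1.0])          # e.g. a profile-pose embedding
centroid = (a + b) / 2.0

print(np.linalg.norm(a), np.linalg.norm(b))   # both 1.0: on the sphere
print(np.linalg.norm(centroid))               # ~0.707: off the sphere
# Renormalizing repairs the norm but not the blurring: the result sits
# 45 degrees from both modes, matching neither well.
```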
Key Turning Points
- GPT-5.4's initial lone-wolf dissent, arguing that raw exemplars lead to memory bloat and search-time degradation.
- Gemini 3.1 Pro's Round 2 defection to 'Other' after being convinced that greedy algorithms would inevitably ingest noisy data and ruin the pipeline.