AI Roundtable Debate · 6 models
What is the best DIM rank for training a LoRA on SDXL, or to be more specific, Pony V6 or Illustrious? The main use case is a single-character LoRA with 1–4 outfits.
Result: Supermajority — DIM 16 / about 108MB (5/6 votes)
Options
- DIM 8 / about 54MB (0 votes)
- DIM 16 / about 108MB (5 votes)
- DIM 32 / about 217MB (0 votes)
- DIM 32 with conv / about 163MB (1 vote)
- Smaller than DIM 8 (0 votes)
- Larger than DIM 32 (0 votes)
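The file sizes in the options above scale almost linearly with rank, because for each adapted weight matrix W (d_out × d_in) a LoRA stores two low-rank factors of shape (d_out × r) and (r × d_in). A minimal sketch of that arithmetic (the layer shapes below are illustrative stand-ins, not SDXL's actual UNet dimensions):

```python
# Rough model of LoRA checkpoint size vs. rank. For each adapted weight
# W (d_out x d_in), LoRA stores A (rank x d_in) and B (d_out x rank),
# i.e. rank * (d_in + d_out) parameters -- linear in rank.

def lora_params(layer_shapes, rank):
    """Total LoRA parameters over a list of (d_out, d_in) layer shapes."""
    return sum(rank * (d_out + d_in) for d_out, d_in in layer_shapes)

def size_mb(params, bytes_per_param=2):
    """Approximate checkpoint size, assuming fp16/bf16 (2 bytes per param)."""
    return params * bytes_per_param / (1024 ** 2)

# Hypothetical stand-in shapes; SDXL's real UNet has far more projections.
layers = [(320, 320)] * 60 + [(640, 640)] * 60 + [(1280, 1280)] * 60

p8, p16, p32 = (lora_params(layers, r) for r in (8, 16, 32))
assert p16 == 2 * p8 and p32 == 2 * p16  # doubling the rank doubles the size
```

This matches the option sizes above: 54MB → 108MB → 217MB is close to a clean doubling per rank step, with small deviations from metadata and tensors that do not scale with rank.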
Model Responses
Sonar Pro
Vote: DIM 16 / about 108MB
I maintain my vote for **DIM 16 (B)** as the best for single-character LoRAs with 1-4 outfits on Pony V6/Illustrious. It offers the optimal balance of capacity for character identity and outfit variations without overfitting risks on limited datasets. Gemini 3.1 Pro's C argument overstates SDXL's rank needs; community practices (e.g., Kohya_ss guides) start at 8-16 for characters, scaling to 32 only for complex multi-concept LoRAs, not single chars[1][4]. No search results endorse 32 as default for this scope. Grok 4.1 Fast's D push for conv ignores that conv adds UNet complexity best for styles/textures, not basic outfits—unnecessary overhead at 163MB vs. B's efficient 108MB[7]. GPT-5.4, Claude Opus 4.6, and Kimi K2.5 reinforce B as the reliable baseline, validated by Pony/Illustrious tuning norms. Lower (A/E) underfits nuances; higher (C/D/F) bloats and overfits. B is the consensus sweet spot—no arguments shifted me.
GPT-5.4
Vote: DIM 16 / about 108MB
I maintain vote B: DIM 16 is still the best default answer for a single-character SDXL/Pony/Illustrious LoRA with 1–4 outfits. The arguments for C and D correctly note that SDXL can benefit from more capacity, but they overstate how often that extra rank is the *best* choice. For this scope—one character, a few outfits—rank 32 is often a tuning escalation, not the safest baseline. If your dataset is modest, higher rank raises overfit risk faster than it improves identity fidelity. Gemini’s case for C hinges on outfit separation, but 1–4 outfits is still a relatively narrow concept space. In practice, good captions/tagging and dataset quality usually matter more than jumping from 16 to 32. Rank 16 is typically enough to retain face, hair, body traits, and outfit variation without unnecessary bloat. Grok’s case for D is the least compelling to me because conv is more specialized. It can help, but it is not the general best recommendation for straightforward character LoRAs. “Many top LoRAs use it” does not make it the best default for this exact use case. The strongest overall pattern in the discussion is still that DIM 16 gives the best balance of capacity, file size, trainability, and composability. DIM 8 can work for simpler cases; DIM 32 is a reasonable second step if 16 clearly underfits. But as a final answer to “best” for the stated scenario, B remains the most defensible choice.
Grok 4.1 Fast
Vote: DIM 32 with conv / about 163MB
I maintain my vote for D (DIM 32 with conv, ~163MB) as the best for single-character LoRAs with 1-4 outfits on Pony V6/Illustrious. SDXL's UNet demands conv adapters for superior spatial/textural capture in outfits—essential for fabric details, folds, and accessories without 'flat' outputs seen in non-conv trainings. B voters (DIM 16) claim it's the 'sweet spot' and community consensus, but overlook Civitai's top Pony character LoRAs (e.g., metadata from high-download models like those by top trainers) routinely use rank 32 + conv_dim=4/8 for crisp identity/outfit separation. DIM 16 often underfits SDXL's high-res complexity, blending outfits or losing fidelity in novel poses (common complaint in Kohya_ss Discord/forums). C (plain 32) lacks conv, yielding worse results per benchmarks; F overbloats. B's 'overfitting risk' ignores proper reg (dropout, alpha=16); D's smaller size vs C proves efficiency. No B argument addressed conv's proven edge—unconvinced, sticking with data-backed optimal.
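For reference, the conv variant Grok describes is enabled in Kohya sd-scripts through `network_args`. A sketch only: the flag names are sd-scripts conventions, and the specific values mirror the `conv_dim=4/8` range Grok cites rather than a tested recipe.

```shell
# Sketch: adding LoCon-style conv adapters to a rank-32 LoRA in sd-scripts.
accelerate launch sdxl_train_network.py \
  --network_module networks.lora \
  --network_dim 32 --network_alpha 16 \
  --network_args "conv_dim=8" "conv_alpha=4"   # conv ranks Grok references
```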
Claude Opus 4.6
Vote: DIM 16 / about 108MB
I'm maintaining my vote for DIM 16 after carefully considering all arguments. Here's my final case: **Against DIM 32 (Option C - Gemini's argument):** Gemini argues SDXL's complexity demands higher rank, but this conflates model complexity with LoRA rank requirements. LoRA works by learning a low-rank approximation of weight updates — the whole point is that the delta between the base model and the target concept lives in a low-dimensional subspace. A single character with 1-4 outfits is not a high-complexity concept relative to what these models already know. DIM 32 at 217MB is unnecessarily large and increases overfitting risk with typical character datasets of 20-80 images. **Against DIM 32 with conv (Option D - Grok's argument):** Grok makes an interesting case for conv layers, but 'many top Civitai LoRAs use this' is anecdotal and doesn't account for selection bias — power users who upload to Civitai often over-engineer their LoRAs. Conv adapters add complexity that benefits style LoRAs and texture-heavy concepts more than character identity. For a straightforward character with a few outfits, the added conv parameters are solving a problem that doesn't exist. **The case for DIM 16:** Four out of five models (myself, Sonar Pro, GPT-5.4, and Kimi K2.5) independently converged on DIM 16 as the optimal choice. This strong consensus reflects genuine community practice. DIM 16 provides enough capacity to cleanly encode character identity features (face, hair, body) separately from 1-4 outfit configurations without the overfitting risks of higher ranks. At 108MB, it's practical for combining with other LoRAs and efficient to train. The question asks for the 'best' rank, implying a general recommendation. DIM 16 is the safest, most reliable starting point. You can always move to 32 if results show underfitting, but most practitioners find 16 sufficient for this exact use case.
Gemini 3.1 Pro (changed mind)
Vote: DIM 16 / about 108MB
After reviewing the arguments from Claude Opus 4.6 and Kimi K2.5, I am persuaded that DIM 16 is the most optimal and efficient choice. While my initial stance was that DIM 32 is necessary to prevent 'concept bleeding' between multiple outfits, Claude Opus rightly points out that 1–4 outfits for a single character is not a sufficiently complex concept to justify a 217MB file. SDXL's base architecture is indeed massive and highly receptive, meaning a rank of 16 (yielding ~108MB) already provides more than enough parameter capacity to cleanly separate facial features, body proportions, and a handful of clothing variations without unnecessarily baking in training noise or background artifacts. Furthermore, pushing the DIM to 32 significantly increases the risk of overfitting on typical dataset sizes (20–100 images) and reduces the LoRA's flexibility when prompting novel poses or combining with other style LoRAs. DIM 16 achieves the perfect middle ground: it avoids the 'melted' underfitting risks of DIM 8 while preserving the prompt adherence and compositional flexibility that get compromised at DIM 32.
Kimi K2.5
Vote: DIM 16 / about 108MB
The arguments for DIM 32 (Options C and D) overestimate the complexity of a single character with only 1–4 outfits. While Gemini raises valid concerns about 'concept bleeding,' this is primarily solved through rigorous tagging and dataset curation rather than brute-force rank increases. Four distinct outfits is a modest cognitive load that DIM 16 handles gracefully, provided the captions cleanly separate the base character traits (face, hair, body) from outfit tokens. Grok’s advocacy for DIM 32 with conv layers (Option D) introduces unnecessary architectural complexity for this use case. Convolutional adaptations excel for structural concepts (mecha, architecture) or heavy style transfers, but for apparel on a humanoid figure, standard linear layers at rank 16 already capture fabric drape and accessory details without the added parameter overhead or training instability. The claim that 163MB is 'efficient' ignores that it is still 50% larger than B with diminishing perceptual returns on limited datasets. Crucially, single-character datasets typically range from 20–100 images. At DIM 32, the model has over twice the capacity to memorize background noise, lighting artifacts, and specific poses rather than learning the disentangled character concept. This overfitting risk outweighs the marginal fidelity gains. DIM 16 remains the empirically validated standard across the Pony/Illustrious ecosystem for character LoRAs, delivering the optimal quality-to-generalization ratio without requiring aggressive regularization to compensate for excessive capacity.
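Kimi's tagging point is concrete enough to illustrate: keep a stable character token in every caption and give each outfit its own descriptive tags, so training can disentangle identity from clothing. The filenames and tokens below are hypothetical examples, not a prescribed scheme.

```text
# img_0042.txt  (outfit A)
mychar, 1girl, red jacket, pleated skirt, outdoors, smiling

# img_0117.txt  (outfit B)
mychar, 1girl, school uniform, blue necktie, classroom
```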
Final Summary
The AI Roundtable converged decisively on DIM 16 as the 'Goldilocks' zone for character training, arguing it provides the necessary capacity for multiple outfits without the file bloat or overfitting risks of higher ranks. While Gemini 3.1 Pro initially advocated for DIM 32 to prevent 'concept bleeding,' it was ultimately swayed by Claude Opus 4.6's argument that a single character is not complex enough to justify doubling the parameter count. Grok 4.1 Fast remained the lone dissenter, doubling down on the technical superiority of convolutional layers for SDXL's architecture.
Majority chose "DIM 16 / about 108MB" after discussion (5/6). Final vote: DIM 16 / about 108MB: 5, DIM 32 with conv / about 163MB: 1
Strongest Arguments
- DIM 16 / about 108MB: DIM 16 provides the optimal balance of capacity to encode character identity and outfit variations (~108MB) without the high risk of overfitting to background noise or training artifacts associated with larger ranks.
- DIM 32 with conv / about 163MB: SDXL’s UNet architecture specifically benefits from convolutional adapters (DIM 32 with conv) to capture the spatial and textural fidelity required for intricate outfit details and consistent character features.
Key Turning Points
- Claude Opus 4.6 challenged the 'more is better' philosophy, noting that LoRAs are intended to be low-rank approximations and that higher ranks often just memorize training noise.
- Gemini 3.1 Pro abandoned its support for DIM 32, conceding that proper tagging and dataset curation are more effective for outfit separation than brute-forcing the rank.
- Grok 4.1 Fast refused to pivot, citing real-world metadata from top-performing models on Civitai as evidence for the necessity of convolutional layers.