AI Roundtable Debate · 6 models

What is the best DIM rank for training a LoRA on SDXL, specifically Pony V6 or Illustrious? The use case is primarily a single-character LoRA with 1–2 outfits.

Result: Supermajority — DIM 16 / about 108MB (5/6 votes)

Options

  • B: DIM 16 / about 108MB
  • C: DIM 32 / about 217MB
  • D: DIM 32 with conv / about 163MB

Model Responses

Sonar Pro

Vote: DIM 16 / about 108MB

I maintain my vote for **DIM 16 (B)** as the best for single-character LoRA with 1-2 outfits on Pony V6/Illustrious. It optimally balances capacity for facial/outfit details against overfitting on small datasets, with a practical 108MB size. Grok's D (DIM 32+conv) and Kimi's C (DIM 32) push higher capacity, but conv layers alter architecture (LyCORIS-style), risking compatibility/stability issues vs standard LoRA, and 217MB/163MB bloats files unnecessarily for limited outfits—diminishing returns per community norms[1][3]. No search results endorse conv or DIM 32 as default for this scope; Pony's strong base (~2.6M images trained[3]) needs minimal finetuning. GPT-5.4, Claude, and Gemini reinforce B as the 'sweet spot' for SDXL characters: enough for nuances without under/overfitting. Lower (A/E) risks detail loss; higher (C/D/F) adds VRAM/noise without gains. This aligns with Pony/Illustrious practices for compact, reliable LoRAs.

GPT-5.4

Vote: DIM 16 / about 108MB

I’m keeping vote B: DIM 16 remains the best default for a single-character SDXL/Pony/Illustrious LoRA with only 1–2 outfits. The pro-32 arguments are not strong enough because they mostly assume SDXL automatically needs much higher rank. In practice, for a narrow concept like one character plus very limited wardrobe, the base model already knows anatomy, clothing, shading, and anime structure. The LoRA mainly needs to encode identity and a small set of specifics, which DIM 16 usually handles well. I’m not persuaded by the claim that DIM 16 often underfits on SDXL. That can happen, but it is not the best general expectation for this scope. For most character LoRAs, 16 is the practical sweet spot before returns start diminishing. If the dataset is unusually diverse, messy, or the character design is extremely intricate, then 32 becomes a reasonable escalation path—not the default first choice. Option D is even less compelling as a default. Conv variants can help in some setups, but they add architectural complexity and are not the standard safest recommendation when the user simply asks for the best DIM rank for a straightforward single-character LoRA. So the strongest final recommendation is still: start at DIM 16, move to 32 only if testing shows missed detail or weak identity retention.
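For reference, the "start at DIM 16, escalate only if needed" recommendation translates into a kohya-ss sd-scripts run along these lines. This is a hedged sketch, not a validated recipe: the model path, dataset directory, epoch count, and learning rate are placeholders, and `network_alpha=8` (half of the dim) is just one common convention.

```shell
# Sketch only: paths and hyperparameters below are placeholders,
# not validated settings for any particular dataset.
accelerate launch sdxl_train_network.py \
  --pretrained_model_name_or_path="./ponyDiffusionV6XL.safetensors" \
  --train_data_dir="./dataset" \
  --output_dir="./output" \
  --output_name="my_character_lora" \
  --network_module=networks.lora \
  --network_dim=16 \
  --network_alpha=8 \
  --resolution="1024,1024" \
  --learning_rate=1e-4 \
  --max_train_epochs=10 \
  --mixed_precision=fp16 \
  --save_model_as=safetensors
```

If testing then shows missed detail or weak identity retention, the escalation path is a single change: raise `--network_dim` (and typically `--network_alpha`) and retrain.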

Grok 4.1 Fast

Vote: DIM 32 with conv / about 163MB

Sticking with D: DIM 32 with conv (~163MB). No arguments convinced me to switch—B voters (Sonar, GPT, Claude, Gemini) overemphasize overfitting fears on small datasets, ignoring proper training (e.g., 10-20 epochs, dropout, reg images). SDXL/Pony's complexity demands rank 32 for facial/outfit fidelity; DIM 16 routinely underperforms on Civitai Pony char LoRAs, yielding inconsistent likeness/poses. Conv layers (LoCon-style) are crucial for 1-2 outfits' textures/patterns, outperforming pure linear DIM 32 (C) in community tests (Kohya Discord, Civitai comments). D's smaller size vs C optimizes VRAM/deploy without quality loss—163MB loads faster than 217MB. Kimi's C is close but misses conv's edge for cloth details. B's 'sweet spot' is outdated for high-fidelity Pony/Illustrious chars; top LoRAs (e.g., 10k+ downloads) use 32+conv. E/F extremes fail. D maximizes quality/efficiency.
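Grok's "DIM 32 with conv" option corresponds to the LoRA-C3Lier variant in kohya-ss sd-scripts, which extends LoRA to the UNet's conv layers via `--network_args`. Again a sketch, with placeholder paths and hypothetical `conv_dim`/`conv_alpha` values:

```shell
# Sketch only: paths are placeholders; conv_dim/conv_alpha values
# are illustrative, not tuned recommendations.
accelerate launch sdxl_train_network.py \
  --pretrained_model_name_or_path="./ponyDiffusionV6XL.safetensors" \
  --train_data_dir="./dataset" \
  --output_dir="./output" \
  --network_module=networks.lora \
  --network_dim=32 \
  --network_alpha=16 \
  --network_args "conv_dim=16" "conv_alpha=8" \
  --mixed_precision=fp16 \
  --save_model_as=safetensors
```

The `conv_dim`/`conv_alpha` pair is the architectural change the DIM 16 voters flag: conv adapters are handled less uniformly by some inference frontends than plain linear LoRA.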

Claude Opus 4.6

Vote: DIM 16 / about 108MB

I'm maintaining my vote for DIM 16 after considering all arguments. Here's my final case:

**Against DIM 32 (Option C - Kimi K2.5's argument):** The claim that SDXL's larger architecture *requires* higher rank is a common misconception. The LoRA rank doesn't need to scale linearly with model size — it needs to scale with the *complexity of the concept being learned*. A single character with 1–2 outfits is a relatively narrow concept regardless of the base model's size. DIM 32 provides more capacity than needed and increases overfitting risk on the typically small datasets (20–50 images) used for character LoRAs.

**Against DIM 32 with conv (Option D - Grok's argument):** While conv layers can help with texture fidelity, they add architectural complexity that isn't necessary for this use case. The question specifically asks about a single character with 1–2 outfits — not intricate textile patterns or complex environmental details. Adding conv layers is a meaningful choice for style LoRAs or highly detailed concepts, but for character identity (face, hair, body type, basic outfit silhouettes), standard linear layers at DIM 16 handle this well. The 163MB file size is also notably larger than DIM 16's 108MB for marginal benefit.

**Why DIM 16 remains optimal:** The strong consensus among four out of six models (myself, Sonar Pro, GPT-5.4, and Gemini 3.1 Pro) reflects real community practice. DIM 16 is the most commonly used and recommended rank for single-character SDXL LoRAs across CivitAI, Kohya training guides, and community Discords. It provides enough capacity for facial features, hairstyle, body proportions, and a couple of outfits while maintaining training stability and manageable file size. If DIM 16 proves insufficient for a particular character, scaling up to 32 is always an option — but starting at 32 as the default recommendation is premature optimization that risks worse results for most users.

**Practical consideration:** Most successful character LoRAs on CivitAI for Pony V6 and Illustrious use DIM 16 or even DIM 8. The community has empirically validated that DIM 16 is the reliable default for this exact use case.

Gemini 3.1 Pro

Vote: DIM 16 / about 108MB

I am standing firm on DIM 16 (Option B), alongside Sonar Pro, GPT-5.4, and Claude Opus 4.6. The arguments for DIM 32 (C) and DIM 32 with conv layers (D) fail to account for the specific nature of Pony V6 and Illustrious. These are not standard SDXL checkpoints; they are highly specialized finetunes, trained on enormous anime datasets, with an already exhaustive understanding of anime aesthetics, character anatomy, and clothing tropes. Because the base models already 'know' the visual language so deeply, you are merely steering them toward a specific combination of features. For a single character with only 1–2 outfits, a rank of 32 (217MB) is demonstrably overkill: it captures unnecessary noise, increases the risk of overfitting on the typically small datasets used for single characters, and needlessly bloats the user's hard drive. Option D (DIM 32 with conv) introduces LyCORIS-style architecture, which is excellent for complex styles but complicates inference compatibility and isn't needed for simple character consistency. DIM 16 (~108MB) represents the true practical sweet spot: it provides ample capacity to lock in specific facial details, hairstyle, and two distinct outfits, while keeping the network generalized enough to remain flexible across poses and prompts. It remains the gold standard for efficiency and fidelity.

Kimi K2.5 (changed mind)

Vote: DIM 16 / about 108MB

I previously advocated for DIM 32 based on SDXL's parameter count, but the constraint of only 1–2 outfits implies a small dataset (typically 15–30 images), where DIM 32 significantly increases overfitting risk without commensurate gains. Pony V6 and Illustrious possess extensive foundational knowledge of anime aesthetics and anatomy, meaning the LoRA only needs to encode specific facial features and outfit combinations rather than broad concepts. DIM 16 provides sufficient capacity to capture these distinct identity markers and fabric details while maintaining training stability on limited data. The 108MB footprint is also substantially more efficient for distribution and inference than DIM 32's 217MB. The community consensus across CivitAI and Kohya_ss specifically identifies DIM 16 as the validated standard for single-character Pony/Illustrious LoRAs, striking the optimal balance between fidelity and generalization for this scope.
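The near-exact 2x ratio between those file sizes (108MB at DIM 16 vs 217MB at DIM 32) is no coincidence: a LoRA stores two low-rank matrices per adapted weight, so parameter count, and hence file size, grows linearly with rank. A minimal sketch of that scaling (the layer shapes below are hypothetical stand-ins, not SDXL's actual module list):

```python
def lora_params(rank: int, layer_shapes: list[tuple[int, int]]) -> int:
    """Total LoRA parameters: each adapted weight gets A (rank x in_features)
    and B (out_features x rank), so the count is linear in rank."""
    return sum(rank * (fin + fout) for fin, fout in layer_shapes)

# Hypothetical stand-in for SDXL's attention/projection layers;
# the real module list is larger and varies by training config.
shapes = [(640, 640)] * 80 + [(1280, 1280)] * 140 + [(2048, 1280)] * 60

p16 = lora_params(16, shapes)
p32 = lora_params(32, shapes)

# Doubling the rank exactly doubles the parameter count, which is why
# the fp16 checkpoint size (2 bytes per parameter) roughly doubles too.
assert p32 == 2 * p16

size_mb = lambda n: n * 2 / 1024 ** 2
print(f"rank 16: {size_mb(p16):.1f} MB, rank 32: {size_mb(p32):.1f} MB")
```

The absolute numbers here are illustrative only; the point is the linear relationship, which matches the debate's quoted sizes.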

Final Summary

The AI Roundtable reached a strong consensus that DIM 16 (~108MB) is the definitive sweet spot for single-character SDXL training, with five out of six models backing the mid-range option. While the debate briefly flirted with higher capacities, the majority concluded that because Pony V6 and Illustrious already 'understand' anime aesthetics, a lean LoRA is more effective at steering the model without introducing noise. Grok 4.1 Fast remained the sole holdout, championing the more complex DIM 32 with convolutional layers as the only way to truly capture high-fidelity textures.

Majority chose "DIM 16 / about 108MB" after discussion (5/6). Final vote: DIM 16 / about 108MB: 5, DIM 32 with conv / about 163MB: 1

Strongest Arguments

  • DIM 16 / about 108MB: DIM 16 is the optimal sweet spot; because base models like Pony V6 already have a massive understanding of anime concepts, the LoRA only needs to encode specific identity markers, making higher ranks overkill that risk overfitting and unnecessary file bloat.
  • DIM 32 with conv / about 163MB: DIM 32 with convolutional layers is superior because standard linear layers often underperform on SDXL's complex architecture, and the added conv layers are essential for capturing specific cloth textures and patterns in character outfits.
  • DIM 32 / about 217MB: SDXL's massive transformer architecture requires a baseline capacity of DIM 32 to ensure character recognition doesn't collapse into generic outputs.

Key Turning Points

  • Claude Opus 4.6 convinced Kimi K2.5 that LoRA rank should scale with the complexity of the concept being learned rather than with base-model size, leading Kimi to abandon DIM 32 in favor of DIM 16.
  • Grok 4.1 Fast rejected the consensus, citing Civitai download leaders as evidence that higher-capacity LoRAs with convolutional layers provide better results in practice.