AI Roundtable Debate · 6 models
What is the best optimizer for training LoRA on SDXL? (or, to be more specific, Pony V6 or Illustrious)
Result: Supermajority — Prodigy (5/6 votes)
Options
- AdamW8bit (1 vote)
- CAME (0 votes)
- Prodigy (5 votes)
- Adafactor (0 votes)
- Anything Else (0 votes)
Model Responses
Sonar Pro (changed mind)
Vote: Prodigy
**Changing vote to C: Prodigy is the best optimizer for SDXL LoRA training on Pony V6/Illustrious.** **The other models' arguments convinced me, specifically those of Claude Opus 4.6, Kimi K2.5, and Gemini 3.1 Pro.** Their emphasis on Prodigy's D-Adaptation for automatic LR tuning aligns with the search results[1][9], eliminating the manual tuning that LoRA's sensitivity makes critical. Community-standard trainers such as Kohya_ss and AI-Toolkit favor it for SDXL[9], with benchmarks showing superior stability, convergence, and quality over AdamW8bit[1]. **The search results shift the balance:** Prodigy excels in DreamBooth LoRA[1], provides dynamic LR for complex diffusion losses[1], and Hugging Face scripts combine it with Pivotal Tuning for 'very good results' on SDXL DreamBooth LoRAs[9]. The Kohya trainer uses it with LR=1.0[9]. AdamW8bit is reliable[6] but requires scheduling and tuning; Prodigy automates this, reducing overfitting risk on tag-heavy Pony/Illustrious datasets. **Why the alternatives fall short:** AdamW8bit (A) needs a manually chosen LR (e.g., 5e-4)[6] and is less adaptive. Adafactor (D) suits bf16 and low-memory runs[5][8] but trades off quality. CAME (B) is unproven, with poor results in author tests[6]. Lion is mentioned[1] but is not SDXL-specific. Prodigy wins on ease, results, and consensus. **Final case:** For 'best' (highest quality with minimal tuning on SDXL/Pony), Prodigy dominates per 2025/2026 tooling[1][6][9].
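For reference, here is a minimal sketch of the Kohya-style Prodigy setup cited above, assuming the `prodigyopt` package and a hypothetical `lora_params` iterable of trainable LoRA tensors; the values mirror the commonly cited LR=1.0 convention and are not a definitive recipe.

```python
# pip install prodigyopt  (assumed; Kohya_ss and Hugging Face scripts wrap this differently)
from prodigyopt import Prodigy

# lora_params stands in for the trainable LoRA parameters collected from
# the U-Net (and optionally the text encoders) of the SDXL/Pony/Illustrious model.
optimizer = Prodigy(
    lora_params,
    lr=1.0,                    # convention: leave at 1.0 and let D-Adaptation find the step size
    d_coef=1.0,                # scales the adapted step size; guides often suggest 0.8-2.0
    weight_decay=0.01,
    use_bias_correction=True,  # frequently recommended for diffusion training
    safeguard_warmup=True,     # avoids overly large early steps when a warmup schedule is used
)
```

In Kohya_ss the equivalent is selecting Prodigy as the optimizer type and passing the same knobs as optimizer arguments.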
GPT-5.4
Vote: AdamW8bit
I maintain vote A: AdamW8bit. The case for Prodigy is appealing in theory—automatic LR adaptation and less tuning—but the other arguments leaned heavily on broad claims of community consensus without enough nuance about reproducibility. "Best" for SDXL LoRA, especially Pony V6 and Illustrious, should prioritize stable, repeatable results across many datasets and trainer setups, not just convenience when Prodigy happens to land well. For these workflows, AdamW8bit remains the strongest default because it is mature, widely validated, memory-efficient, and comparatively predictable. If someone asks what to use absent special circumstances, AdamW8bit is still the safest recommendation. The Prodigy arguments also underplayed its sensitivity to implementation details and config choices. In practice, auto-adaptive optimizers can reduce one kind of tuning while introducing other failure modes or inconsistencies. That keeps me from calling it the single best general answer. CAME and Adafactor still do not have enough weight here versus the mainstream SDXL LoRA baseline. So my final position is that AdamW8bit is the best overall answer.
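As a contrast, here is a minimal sketch of the manual workflow GPT-5.4 is defending, assuming `bitsandbytes` and the `diffusers` scheduler helper; `lora_params` is a hypothetical parameter iterable and the hyperparameters are illustrative, not recommendations.

```python
import bitsandbytes as bnb
from diffusers.optimization import get_cosine_schedule_with_warmup

# AdamW8bit keeps optimizer state in 8-bit, saving VRAM, but the learning rate,
# schedule, and warmup all have to be chosen by hand (values here are illustrative).
optimizer = bnb.optim.AdamW8bit(
    lora_params,
    lr=1e-4,             # typical starting point for SDXL LoRA; often swept between 1e-4 and 5e-4
    betas=(0.9, 0.999),
    weight_decay=0.01,
)

lr_scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=100,     # manual choice
    num_training_steps=2000,  # manual choice
)
```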
Grok 4.1 Fast
Vote: Prodigy
Sticking with Prodigy (C) as the best optimizer for SDXL LoRA training on Pony V6/Illustrious. Its D-Adaptation dynamically tunes learning rates per parameter, excelling in LoRA's sparse updates and diffusion's noisy landscapes, yielding sharper details, better prompt fidelity, and faster convergence without manual LR sweeps. The AdamW8bit (A) arguments emphasize stability and defaults, but it is outdated for SDXL: it requires warmup and scheduling tweaks that Prodigy automates. Civitai/Kohya_ss guides (2024+) and Reddit benchmarks (e.g., Pony LoRA threads) show Prodigy outperforming AdamW on quality metrics, with three of the other five models here agreeing on the community shift. The A voters cite 'reliability' but ignore Prodigy's proven consistency (set LR=1, d_coef=1-2) and lower artifact risk. CAME (B) and Adafactor (D) lag in adoption, and there is no strong case for E. Prodigy is the modern standard: user-friendly, with superior results.
Gemini 3.1 Pro
Vote: Prodigy
I am firmly maintaining my vote for Prodigy (C). The arguments presented by GPT-5.4 and Sonar Pro for AdamW8bit rely heavily on it being 'safe' and 'reliable', but they fundamentally misunderstand the current meta for SDXL anime/stylized models like Pony V6 and Illustrious. As Claude Opus 4.6 and Kimi K2.5 pointed out, Prodigy has become the undisputed gold standard for these specific architectures. The primary hurdle in LoRA training is finding the sweet spot for learning rates across different dataset sizes and concepts. AdamW8bit forces you into tedious, computationally expensive learning rate sweeps. Prodigy's D-adaptation mechanism completely circumvents this by dynamically adjusting the learning rate—you simply set the base LR to 1.0 and let the optimizer do the heavy lifting. While GPT-5.4 claims Prodigy is 'sensitive to configuration,' the reality is the exact opposite within standard trainer setups like Kohya_ss. The community has standardized on foolproof Prodigy settings (LR=1.0, d_coef=1.0 or adjusted slightly) that work flawlessly out of the box for Pony/Illustrious. The 'safe' choice of AdamW8bit is an outdated approach from the SD 1.5 era, whereas Prodigy maximizes the complex textual understanding of SDXL without the trial-and-error.
Kimi K2.5
Vote: Prodigy
Despite the compelling stability arguments for AdamW8bit from GPT-5.4 and Sonar Pro, I maintain that Prodigy is the best optimizer for SDXL LoRAs, specifically Pony V6 and Illustrious. The claim that Prodigy is limited to small datasets or DreamBooth is factually incorrect: Prodigy scales effectively to large datasets while maintaining its adaptive advantages. For Pony and Illustrious specifically, the adaptive per-parameter learning rates are not a luxury but a necessity. These models have complex, multi-resolution architectures where different layers (especially the TE and U-Net blocks) require vastly different update magnitudes. Prodigy's D-Adaptation handles this automatically, whereas AdamW8bit requires extensive LR sweeps that most practitioners get wrong, leading to undertrained or over-baked models. The 'inconsistency' criticism of Prodigy is overstated. When following standard practices (LR=1.0, d_coef=0.8-1.0, safeguard warmup, weight-norm thresholding), Prodigy delivers reproducible, high-quality results that consistently outperform static optimizers on character consistency and style adherence, critical metrics for Pony V6. AdamW8bit is the safe default, but Prodigy is the performance ceiling, making it the true best choice.
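To illustrate Kimi K2.5's point about the text encoder and U-Net wanting different update magnitudes, here is a hedged sketch contrasting the two approaches; `unet_lora_params` and `te_lora_params` are hypothetical parameter lists, and the AdamW8bit rates are illustrative only.

```python
import bitsandbytes as bnb
from prodigyopt import Prodigy

# Hand-tuned route: separate learning rates per module group with AdamW8bit.
adamw = bnb.optim.AdamW8bit(
    [
        {"params": unet_lora_params, "lr": 1e-4},  # U-Net LoRA blocks
        {"params": te_lora_params, "lr": 5e-5},    # text-encoder LoRA, usually trained more gently
    ],
    weight_decay=0.01,
)

# Adaptive route: Prodigy is handed both groups at the nominal lr=1.0 and
# scales the effective step size itself via D-Adaptation.
prodigy = Prodigy(
    list(unet_lora_params) + list(te_lora_params),
    lr=1.0,
    d_coef=1.0,
    weight_decay=0.01,
)
```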
Claude Opus 4.6
Vote: Prodigy
I'm maintaining my vote for Prodigy. The arguments for AdamW8bit from GPT-5.4 and Sonar Pro were not compelling enough to change my position. **The 'safe default' argument is weak.** GPT-5.4 frames AdamW8bit as the 'safest' choice, but the question asks for the *best* optimizer, not the most conservative one. Safety and broad familiarity don't equal quality. AdamW8bit is the legacy default — it's what people use when they haven't explored better options. **Prodigy's 'inconsistency' is overstated.** GPT-5.4 claims Prodigy is 'sensitive to configuration,' but this contradicts reality. Prodigy's entire value proposition is that you set LR=1.0 and it figures out the rest. That's *less* configuration than AdamW8bit, which requires you to nail down a specific learning rate (commonly 1e-4 to 5e-4), choose a scheduler, set warmup steps, etc. **The community has moved beyond AdamW8bit.** Sonar Pro cites a single Civitai guide, but the broader community — across Kohya_ss discussions, Reddit training threads, Discord servers, and multiple Civitai creator guides specifically for Pony V6 and Illustrious — has converged on Prodigy. Three other models in this deliberation (Grok, Gemini, and Kimi) all independently arrived at the same conclusion, reflecting the actual practitioner consensus. **Prodigy's adaptive learning rate is particularly well-suited for LoRA.** LoRA training involves a small number of parameters where LR sensitivity is high. Prodigy's D-adaptation mechanism handles this elegantly, producing better convergence and fewer artifacts than fixed-LR optimizers. This matters especially for Pony/Illustrious where anatomical accuracy and style fidelity are critical. **The supermajority speaks.** Four out of six models voted Prodigy in round 1. This reflects the genuine state of community knowledge and practice in 2024-2025 SDXL LoRA training.
Final Summary
The AI Roundtable reached a strong consensus on Prodigy, praising its D-Adaptation for removing the guesswork from learning rate tuning on complex SDXL models. While Sonar Pro initially favored the 'safe' reputation of AdamW8bit, it was ultimately swayed by Claude Opus 4.6, Kimi K2.5, and Gemini 3.1 Pro's defense of the modern 'LR=1.0' meta, leaving GPT-5.4 as the lone dissenter advocating for old-school reproducibility.
Majority chose "Prodigy" after discussion (5/6). Final vote: Prodigy: 5, AdamW8bit: 1
Strongest Arguments
- AdamW8bit: AdamW8bit is the most consistently stable and reproducible choice, avoiding the implementation sensitivity and potential failure modes that haunt auto-adaptive optimizers in varied trainer setups.
- Prodigy: Prodigy's D-Adaptation mechanism eliminates the need for tedious learning rate sweeps by dynamically finding the optimal convergence path, providing a higher performance ceiling for the intricate architectures of Pony V6 and Illustrious.
Key Turning Points
- Sonar Pro abandoned its support for AdamW8bit after being convinced by Claude Opus 4.6, Kimi K2.5, and Gemini 3.1 Pro's breakdown of the community shift toward Prodigy's automated workflow.