"Embarrassingly Simple Self-Distillation Improves Code Generation." Shows that an LLM can improve at code generation using only its own raw outputs, without a verifier, teacher model, or reinforcement learning. Sample solutions from the model, then fine-tune on those samples with standard SFT.

SSD improves Qwen3-30B-Instruct from 42.4% to 55.3% pass@1 on LiveCodeBench v6, with gains concentrated on harder problems, and generalizes across Qwen and Llama models at the 4B, 8B, and 30B scales. The authors argue the method resolves a precision-exploration conflict in LLM decoding by reshaping token distributions in a context-dependent way. By Zhang, Bai, Zheng, Jaitly, Collobert, and Zhang.

Paper

coding, research, foundational