Minitron
Model compression technique combining depth pruning, width pruning, and distillation. Compressed Llama 3.1 8B to 4B and Mistral NeMo 12B to 8B with 1.2-2.7x speedup and minimal quality loss. Later evolved into MiniPuzzle (used in Nemotron-H 47B). Accepted at ICLR 2025.
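As a rough illustration of how the three ingredients fit together, the sketch below ranks layers (depth) or neurons (width) by an importance score and keeps the top ones, then scores a pruned student against its teacher with a KL-divergence loss on logits. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names `keep_top` and `distillation_loss`, the importance scores, and the temperature parameter are all hypothetical and chosen only for illustration.

```python
import math


def keep_top(importance, keep):
    """Return sorted indices of the `keep` highest-scoring units.

    The same selection applies to layers (depth pruning) or to
    neurons/heads/channels (width pruning). The scores are assumed inputs.
    """
    ranked = sorted(range(len(importance)), key=lambda i: importance[i], reverse=True)
    return sorted(ranked[:keep])


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) for one token's vocabulary distribution.

    Assumption: a plain forward KL on logits, used here only as a stand-in
    for whatever distillation objective a real pipeline would use.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical importance scores for four layers (or four neurons).
kept = keep_top([0.1, 0.9, 0.5, 0.3], keep=2)

# Hypothetical logits: the loss is zero when the student matches the teacher.
loss_same = distillation_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1])
loss_diff = distillation_loss([2.0, 1.0, 0.1], [0.1, 1.0, 2.0])
```

In this toy setup, pruning decides which units survive and the distillation loss is what the smaller model would then be trained to minimize against the original model's outputs.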
Paper
arXiv: 2408.11796
Venue: ICLR 2025