Muon is an optimizer based on matrix orthogonalization. This research demonstrates that it scales to large language model training.

Outputs (2)

Muon is Scalable for LLM Training

paper

arXiv: 2502.16982

Muon Library

library

GitHub Repository

training, optimization