Muon is an optimizer based on matrix orthogonalization. Research has demonstrated that it scales to large language model training.
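The idea of "matrix orthogonalization" in an optimizer can be illustrated with a minimal sketch. It assumes NumPy and uses the classic cubic Newton-Schulz iteration to approximate the orthogonal factor U Vᵀ of a matrix whose SVD is U S Vᵀ. Muon's reference implementation reportedly uses a tuned higher-order polynomial instead, so treat this as an illustration of the technique, not the library's code. The names `orthogonalize` and `muon_like_step`, and the learning-rate and momentum defaults, are hypothetical placeholders.

```python
import numpy as np


def orthogonalize(g: np.ndarray, steps: int = 10) -> np.ndarray:
    """Approximate the orthogonal (polar) factor of g with the cubic Newton-Schulz iteration."""
    # Scale by the Frobenius norm so every singular value is at most 1;
    # the cubic iteration converges for singular values in (0, sqrt(3)).
    x = g / (np.linalg.norm(g) + 1e-7)
    for _ in range(steps):
        # Each step maps every singular value s to 1.5*s - 0.5*s**3, pushing it
        # toward 1 while leaving the singular vectors unchanged.
        x = 1.5 * x - 0.5 * x @ x.T @ x
    return x


def muon_like_step(w: np.ndarray, grad: np.ndarray, momentum: np.ndarray,
                   lr: float = 0.02, beta: float = 0.95, steps: int = 10):
    """Hypothetical single update for a 2-D weight matrix: momentum, then orthogonalize."""
    momentum = beta * momentum + grad         # accumulate plain (non-Nesterov) momentum
    update = orthogonalize(momentum, steps)   # replace the update by its orthogonal factor
    return w - lr * update, momentum
```

Because every singular value of the update is driven toward 1, each direction in the update receives the same step size regardless of how large the raw gradient was along it. This only makes sense for 2-D weight matrices; other parameters would need a different update rule in such a setup.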

Outputs (2)

Muon is Scalable for LLM Training

Paper, 1 citation

Muon Library

Library
training, optimization