Scaling-laws paper exploring efficient mixture-of-experts (MoE) language models.

Paper: arXiv:2507.17702

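For orientation, a mixture-of-experts layer replaces a dense feed-forward block with several expert networks, of which only a few are activated per token. The sketch below shows a generic top-k gated MoE layer in PyTorch; the class name `MoELayer`, the gating scheme, and all layer sizes are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of a generic top-k gated mixture-of-experts layer.
# Illustrates the MoE mechanism only; not the paper's architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, n_experts: int, k: int = 2):
        super().__init__()
        self.k = k
        # Gating network: scores each token against every expert.
        self.router = nn.Linear(d_model, n_experts)
        # Each expert is a small feed-forward block.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model)
            )
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model). Route each token to its top-k experts.
        gates = F.softmax(self.router(x), dim=-1)         # (tokens, n_experts)
        weights, idx = torch.topk(gates, self.k, dim=-1)  # both (tokens, k)
        # Renormalize gate weights over the selected experts.
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e  # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# Tiny smoke test: output shape matches input shape.
layer = MoELayer(d_model=16, d_hidden=32, n_experts=4, k=2)
print(layer(torch.randn(8, 16)).shape)  # torch.Size([8, 16])
```

The per-expert loop is written for readability; production implementations typically batch tokens per expert with gather/scatter for efficiency.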