A paper on scaling laws for efficient mixture-of-experts (MoE) language models.

Paper

moescaling