Ant Group's general-purpose (non-thinking) LLM family, introduced in "Every FLOP Counts", a technical report on training massive MoE models without premium H100/A100 clusters. Includes Ling-mini / Ling-Lite (16B-class MoE) for edge and low-latency use cases.

Outputs (2)

Every FLOP Counts: Scaling a 300B Mixture-of-Experts LING LLM without Premium GPUs

paper

Technical details on training massive MoE models without premium clusters. Core philosophy behind Ant's efficient scaling approach.

arXiv: 2503.05139
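
The efficiency argument rests on sparse activation: a top-k router sends each token to only a few experts, so per-token compute scales with activated rather than total parameters. Below is a minimal illustrative sketch of top-k MoE routing, not the paper's implementation; all layer sizes and the class name are made up for clarity.

```python
# Illustrative top-k MoE layer (a sketch, not Ling's actual architecture).
# Shows why a large-total-parameter MoE spends far fewer FLOPs per token:
# only k of n_experts run for each token.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model: int, n_experts: int, k: int):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model). The router scores every expert per token,
        # but only the k best-scoring experts actually execute.
        scores = self.router(x)                      # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)   # keep top-k experts
        weights = F.softmax(weights, dim=-1)         # normalize over the k
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e in idx[:, slot].unique().tolist():
                mask = idx[:, slot] == e             # tokens routed to e
                out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out

moe = TopKMoE(d_model=64, n_experts=8, k=2)
print(moe(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```

With k=2 of 8 experts, each token touches roughly a quarter of the expert parameters, which is the same lever a 300B-total model uses to keep its activated FLOP budget small.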

Ling-mini / Ling-Lite

model

Smaller Ling variants for edge and low-latency use cases.

Architecture MOE
Parameters 16B
moe · open-weight · efficiency
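
Since the weights are open, the checkpoints can in principle be loaded with Hugging Face transformers. A hedged usage sketch follows; the hub id "inclusionAI/Ling-lite" is an assumption, so verify it against the actual release before use.

```python
# Hedged loading sketch for an open-weight Ling checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "inclusionAI/Ling-lite"  # assumed hub id; check the release page
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", trust_remote_code=True
)

inputs = tokenizer("Every FLOP counts because", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```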
