Nature Language Model (NatureLM)
Sequence-based science foundation model from Microsoft Research AI for Science that unifies small molecules, materials, proteins, DNA, and RNA for text-driven scientific discovery. Available in 1B, 8B, and 46.7B (8×7B MoE) sizes.
Trained on hundreds of billions of curated tokens from biology, chemistry, and materials science. Supports cross-domain tasks that combine knowledge across these modalities. Achieves top performance on many scientific tasks, on par with specialist models, with applications in drug discovery, protein design, materials engineering, and RNA design. By Xia, Jin, Xie, and 75+ co-authors at Microsoft Research AI for Science.
Model Details
Architecture: Mixture of Experts (MoE)
Parameters: 46.7B
Active params: 13B
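As a rough illustration of what the total and active parameter counts above imply for the 8x7B variant, the sketch below computes the share of parameters used per token and the memory needed just to hold the weights. The 16-bit weight precision is an assumption made for this example, not something stated in this card, and the helper names are hypothetical.

```python
# Figures taken from the model details above; treated as approximate.
TOTAL_PARAMS = 46.7e9   # all experts combined (NatureLM 8x7B)
ACTIVE_PARAMS = 13e9    # parameters used for any single token

# Assumption (not from the card): weights are stored at 16-bit precision.
BYTES_PER_PARAM = 2


def active_fraction(total: float, active: float) -> float:
    """Share of the model's parameters that participate per token."""
    return active / total


def weight_memory_gb(params: float, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Memory for the weights alone, in decimal gigabytes (no activations or cache)."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    print(f"active fraction: {active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS):.1%}")
    print(f"weights at 16-bit: {weight_memory_gb(TOTAL_PARAMS):.1f} GB")
```

Under these assumptions, roughly 28% of the parameters are active per token, while all 46.7B (about 93 GB at 16-bit) still have to be resident in memory, since any expert may be selected for a given token.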
Variants
| Name | Parameters | Notes |
|---|---|---|
| NatureLM 1B | 1B | — |
| NatureLM 8B | 8B | — |
| NatureLM 8x7B | 46.7B | Mixture of Experts |
Paper
arXiv: 2502.07527