Ouro
Named after the recursive Ouroboros, Ouro is an open-source family of Looped Language Models (LoopLM) that embed reasoning during pretraining rather than as a post-training stage. The models are dense decoder-only Transformers (RoPE, SwiGLU, sandwich norm) with recurrent parameter sharing across 4 loop steps and an entropy-regularized objective for learned per-token depth allocation. Released at 1.4B (24 layers) and 2.6B (48 layers), both pretrained on 7.7T tokens. Thinking variants are released alongside the base weights.
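A minimal sketch of the looped forward pass described above: one weight-tied block applied for 4 loop steps, with a softmax over per-step exit scores standing in for the learned per-token depth allocation and its entropy regularizer. All names, shapes, and the `exit_logit` head are illustrative assumptions, not Ouro's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_loops = 8, 4  # toy hidden size; 4 shared-parameter loop steps as in Ouro

# One shared weight matrix reused at every loop step (recurrent parameter
# sharing) -- a stand-in for a full Transformer layer stack.
W = rng.normal(0, 0.1, (d, d))
v = rng.normal(0, 0.1, d)  # hypothetical exit-score head

def block(h):
    return h + np.tanh(h @ W)  # residual update with the shared block

def exit_logit(h):
    return h @ v  # scalar exit score per token at this depth

h = rng.normal(0, 1, (3, d))  # hidden states for 3 tokens
states, logits = [], []
for _ in range(n_loops):
    h = block(h)               # same parameters applied each iteration
    states.append(h)
    logits.append(exit_logit(h))

# Per-token distribution over exit depths (softmax across the 4 loop steps).
logits = np.stack(logits, axis=-1)                 # (tokens, n_loops)
p = np.exp(logits - logits.max(-1, keepdims=True))
p /= p.sum(-1, keepdims=True)

# Entropy regularizer: a high-entropy depth distribution is rewarded during
# training so depth allocation does not collapse to a single step.
entropy = -(p * np.log(p + 1e-12)).sum(-1)         # one value per token

# Expected-depth output: mix the per-step states by the exit distribution.
out = np.einsum('ltd,tl->td', np.stack(states), p)
```

In the real model the exit distribution is learned jointly with the language-modeling objective, so easy tokens can spend probability mass on shallow exits while harder tokens use all 4 iterations.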
Delivers 2–3× parameter efficiency, matching standard Transformers several times its size: Ouro-1.4B matches 4B baselines (GSM8K 78.92 vs 72.86 for Qwen3-4B) and Ouro-2.6B outperforms 8B baselines on math (MATH500 90.85 vs 62.30 for Qwen3-8B), with the 1.4B/2.6B models matching 12B standard LMs across several benchmarks. Ablations attribute the gains to superior knowledge manipulation rather than increased capacity, and show that LoopLM reasoning traces align with final outputs more tightly than explicit chain-of-thought.
Completes a trio of 2025 looped-Transformer work alongside Google's "Reasoning with Latent Thoughts" (theoretical, ICLR 2025) and ByteDance Seed's Parallel Loop Transformer (inference-efficiency, Oct 2025). Collaboration between ByteDance Seed and researchers at UC Santa Cruz, Princeton, Mila / U. Montreal (incl. Yoshua Bengio), Peking University, CMU, and UPenn. CC-BY-SA 4.0.
Model Details
Variants
| Name | Parameters | Notes |
|---|---|---|
| Ouro-1.4B | 1.4B | 24 layers, 4 loops |
| Ouro-2.6B | 2.6B | 48 layers, 4 loops |
| Ouro-1.4B-Thinking | 1.4B | — |
| Ouro-2.6B-Thinking | 2.6B | — |