One RL to See Them All: Visual Triple Unified RL

Introduces the V-Triune system and the Orsta models (7B and 32B), which unify visual reasoning and perception tasks under a single reinforcement learning framework. Reports improvements of up to +14.1 on MEGA-Bench Core.

Paper (arXiv) | GitHub

Paper

arXiv HTML

vision, reasoning, training, research