Zhibin Gou

Core Researcher (R1 math reasoning, GRPO clipping strategy), DeepSeek

Affiliations

  • DeepSeek — Core Researcher (R1 math reasoning, GRPO clipping strategy)
    Formerly: Tsinghua University (MSc, advisor Yujiu Yang); Microsoft Research Asia (intern, ToRA)