Reinforcement learning (RL) for subgoal decomposition in formal mathematical reasoning. Includes the DeepSeek-ProverBench evaluation suite.

Model Details

Architecture: MoE
Parameters: 671B

Paper

reasoning, open-weight

Related