Introduced Reinforcement Learning from Community Feedback (RLCF) for aligning AI with scientific reasoning, accompanied by a dataset of 700,000 scientific preference signals.

Outputs (2)

Scientific Judge

paper

Introduced Reinforcement Learning from Community Feedback (RLCF) for aligning AI with scientific reasoning.

Scientific Judge Dataset

dataset

Dataset of 700,000 scientific preference signals for alignment research.

training, reasoning, research