Over 36 hours, we ran four DPO training iterations against Qwen2.5-Coder-7B-Instruct, trying to push HumanEval pass@1 above the base model's 87.20%. The first three iterations failed in different ways: -9.15pp, -1.22pp, and two NO-GO calls. The fourth rec...
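HumanEval pass@1 moves in fixed steps, so it can help to read these deltas as whole problems. The sketch below assumes the standard 164-problem HumanEval set with a single greedy sample per problem. Under that assumption each problem is worth 1/164 ≈ 0.61pp, the 87.20% baseline is 143 solved problems, -9.15pp means 15 fewer, and -1.22pp means 2 fewer. The helper names are hypothetical and are not taken from the IDFU tooling.

```python
# Hypothetical helpers: convert HumanEval pass@1 percentages into problem counts.
# Assumption: full 164-problem set, one greedy sample per problem.
N_PROBLEMS = 164


def pass_at_1(solved: int, total: int = N_PROBLEMS) -> float:
    """Single-sample pass@1, expressed as a percentage."""
    return 100.0 * solved / total


def delta_pp(solved_after: int, solved_before: int, total: int = N_PROBLEMS) -> float:
    """Change in pass@1, in percentage points."""
    return pass_at_1(solved_after, total) - pass_at_1(solved_before, total)


base = 143  # 143 / 164 -> 87.20%
print(f"base: {pass_at_1(base):.2f}%")                       # base: 87.20%
print(f"15 fewer: {delta_pp(base - 15, base):+.2f}pp")       # 15 fewer: -9.15pp
print(f"2 fewer: {delta_pp(base - 2, base):+.2f}pp")         # 2 fewer: -1.22pp
```

Framed this way, the -1.22pp iteration differs from the base model by only two problems. On a benchmark this small, that gap is easy to overread.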
In the previous post (https://dev.to/namakoo/-curating-python-failures-for-dpo-notes-from-the-rejected-side-2ff0), I described the curation philosophy for IDFU's rejected-side dataset: why I avoid synthetic bug generation, why stub detection matters, w...