← back
x.comSureshSun, May 31, 2026, 8:20 PM PDT
score 15.4
1likes1reply

Benchmark datasets repeat same bugs without deduplication

Original: @cwolferesearch mining failure cases works until you need to deduplicate them; otherwise the benchmark replays the same 3 bugs.

Source: x.com

Writing ELI5 summary…