AI benchmark testing costs far higher than publicly reported

Original: @cwolferesearch @tmkadamcz And ProgramBench (200 tasks) has reported something like 5-10K per model run on the high end (some tweet, hard to find on the spot). PB underelicits the capabilities, imo. S

Source: x.com
