AI labs spend billions on training data but lack public benchmarks

Original: If this is true, then I really hope we have some better public evaluation benchmarks that are released as a result of this spending. Even top benchmarks for coding agents are usually super small. For

Source: x.com
