x.com · Kexin Huang · Mon, May 18, 2026, 8:33 AM PDT
score 16.5
53 likes · 10 RT

Benchmark grades AI agents on biology research reasoning, not just answers

Original: 📢Introducing BiomniBench — the first benchmark focused on evaluating the process, not just the final answer, of AI agents on long-horizon biology research tasks.

Source: x.com
