Benchmark grades AI agents on biology research reasoning, not just answers

Original: 📢Introducing BiomniBench — the first benchmark focused on evaluating the process, not just the final answer, of AI agents on long-horizon biology research tasks.

Source: x.com
