New benchmark tests AI agents on long tasks of 12-72 hours

Original: Introducing EdgeBench, a benchmark designed to study how agents learn from environments over runs of at least 12 to 72 hours. We find that performance follows a log-sigmoid function of environment interactio…

Source: x.com