← back
x.comDeyao ZhuThu, Jul 2, 2026, 8:19 AM PDT
score 17.2
216likes61RT15reply

New benchmark tests AI agents on long tasks of 12-72 hours

Original: Introducing EdgeBench, a benchmark designed to study how agents learn from environments over at least 12~72-hour runs. We find that performance follows a log-sigmoid function of environment interactio

Source: x.com

Writing ELI5 summary…