x.comDeyao ZhuThu, Jul 2, 2026, 8:19 AM PDT
score 17.2
216likes61RT15reply
New benchmark tests AI agents on long tasks of 12-72 hours
Original: Introducing EdgeBench, a benchmark designed to study how agents learn from environments over at least 12~72-hour runs. We find that performance follows a log-sigmoid function of environment interactio
Source: x.com ↗
Writing ELI5 summary…