AI benchmarks need regular updates to stay meaningful

Original: Evaluations should not be static. We need to evolve evaluation sets and benchmarks over time so that they remain relevant and unsaturated.
