arXiv | Antonio Franca, Alexander Tong | Sat, Jun 6, 2026, 7:35 PM PDT

A popular metric for evaluating text generation models is fundamentally broken

Original: Hacking Generative Perplexity: Why Unconditional Text Evaluation Needs Distributional Metrics

Source: arxiv.org
