arXiv | Antonio Franca, Alexander Tong | Sat, Jun 6, 2026, 7:35 PM PDT
score 15.9
Popular metric for evaluating text generation models is fundamentally broken
Original: Hacking Generative Perplexity: Why Unconditional Text Evaluation Needs Distributional Metrics
Source: arxiv.org