arXiv · Chao Wen, Jacqueline Staub, Adish Singla · Tue, Jun 2, 2026, 6:25 AM PDT
Benchmark shows multimodal AI models struggle with visual geometry coding tasks
Original: TurtleAI: Benchmarking Multimodal Models for Visual Programming in Turtle Graphics
Source: arxiv.org