x.com | Omar Khattab | Fri, May 22, 2026, 9:03 AM PDT
score 15.7
433 likes | 44 RT | 7 replies
Training AI to juggle multiple goals instead of one
Original: RL has almost always meant trying to maximize a scalar reward.
Source: x.com