x.com · Omar Khattab · Fri, May 22, 2026, 9:03 AM PDT
Score 15.7 · 433 likes · 44 RT · 7 replies

Training AI to juggle multiple goals instead of one

Original: RL has almost always meant trying to maximize a scalar reward.
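The post contrasts the usual RL objective, a single scalar reward to maximize, with juggling several goals at once. The sketch below is not taken from the post and does not describe the author's method. It is a minimal illustration, assuming discounted returns, of what changes when each step yields a vector of rewards (one per goal) instead of one number. It shows two common ways to compare multi-goal outcomes: weighted-sum scalarization, which collapses the goals back into one number, and Pareto dominance, which does not. All function names are hypothetical.

```python
from typing import Sequence


def scalar_return(rewards: Sequence[float], gamma: float = 0.99) -> float:
    """Standard RL objective: discounted sum of one scalar reward per step."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def vector_return(rewards: Sequence[Sequence[float]], gamma: float = 0.99) -> list[float]:
    """Multi-goal variant (illustrative): one discounted return per goal.

    Each element of `rewards` is the reward vector for one time step.
    """
    if not rewards:
        return []
    totals = [0.0] * len(rewards[0])
    for t, step in enumerate(rewards):
        for i, r in enumerate(step):
            totals[i] += (gamma ** t) * r
    return totals


def weighted_sum(returns: Sequence[float], weights: Sequence[float]) -> float:
    """Linear scalarization: fixed weights turn the goal vector back into one number."""
    return sum(w * r for w, r in zip(weights, returns))


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Pareto dominance: `a` is at least as good on every goal and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


# Two hypothetical policies, each excelling at a different goal.
policy_a = [1.0, 0.0]
policy_b = [0.0, 1.0]
print(weighted_sum(policy_a, [0.5, 0.5]), weighted_sum(policy_b, [0.5, 0.5]))
print(dominates(policy_a, policy_b), dominates(policy_b, policy_a))
```

With equal weights the two policies tie, and neither Pareto-dominates the other. A single scalar hides the fact that they trade off different goals, while the vector view keeps that trade-off visible.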

Source: x.com
