Training AI to juggle multiple goals instead of one

Original: RL has almost always meant trying to maximize a scalar reward.

Writing ELI5 summary…