Reinforcement learning agents model the world as dynamical systems with rewarded transitions

In reinforcement learning, as opposed to supervised learning, an agent is tasked with coming up with a policy: a strategy for acting in its world. Often the world is modeled by augmenting a finite-state dynamical system with a notion of reward. A reward is attached to each transition, and a successful policy is described as one that maximizes the reward received in the long run.
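A minimal sketch of this setup (the state names, actions, and numbers below are illustrative assumptions, not from the text): a finite-state system whose transitions each carry a reward, and a policy scored by the discounted sum of rewards it accumulates in the long run.

```python
# transitions[state][action] = (next_state, reward): a finite-state
# dynamical system where every transition carries a reward.
transitions = {
    "s0": {"stay": ("s0", 0.0), "go": ("s1", 1.0)},
    "s1": {"stay": ("s1", 2.0), "go": ("s0", 0.0)},
}

def discounted_return(policy, start, gamma=0.9, horizon=50):
    """Follow the trajectory induced by a (deterministic) policy and
    sum its rewards, discounted by gamma so the long-run total is finite."""
    state, total = start, 0.0
    for t in range(horizon):
        state, reward = transitions[state][policy[state]]
        total += (gamma ** t) * reward
    return total

# Two candidate policies: one crosses to s1 and collects its reward loop,
# the other never leaves s0.
greedy = {"s0": "go", "s1": "stay"}
lazy = {"s0": "stay", "s1": "stay"}

print(discounted_return(greedy, "s0"))
print(discounted_return(lazy, "s0"))
```

Here "maximizing reward in the long run" is made concrete by the discount factor `gamma`, a standard device for keeping the infinite-horizon sum well defined; the `greedy` policy earns a strictly higher discounted return than `lazy`, so it is the better policy under this criterion.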