adversarial training: Minimally tweak evaluator inputs to cause the largest deviation in output from the ground truth. If it's an outcome-directed classifier, a picture with sad people might be tweaked in a way imperceptible to the human eye so that it's classified as containing happy people. Then use it as a high-impact training example, paired with the original ground-truth output. Extremely relevant note: there's a notion of provable (certified) adversarial robustness, which gives formal guarantees of robustness against adversarial attacks.
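A minimal sketch of the idea, assuming a toy logistic-regression evaluator in place of a real image classifier (the weights, input, and `fgsm_attack` name are illustrative, not from any specific implementation). It uses the classic fast gradient sign method: a small, input-sized step in the direction that maximizes the loss.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_attack(x, y, w, b, eps):
    """Fast Gradient Sign Method: nudge x by eps in the sign of the
    loss gradient w.r.t. the input, i.e. the loss-MAXIMIZING direction."""
    p = sigmoid(w @ x + b)      # evaluator's prediction on the clean input
    grad_x = (p - y) * w        # d(binary cross-entropy)/dx for logistic regression
    return x + eps * np.sign(grad_x)

# Toy stand-ins for a "happy/sad" evaluator and an image.
rng = np.random.default_rng(0)
w = rng.normal(size=16)
b = 0.0
x = rng.normal(size=16)
y = 0.0                         # ground truth: "sad"

x_adv = fgsm_attack(x, y, w, b, eps=0.1)
# The adversarial score is strictly higher than the clean score,
# while x_adv differs from x by at most eps per coordinate.
print(sigmoid(w @ x + b), sigmoid(w @ x_adv + b))
```

The adversarial pair `(x_adv, y)` is then fed back as a training example with the original label, which is the "impactful training example" above.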
[[contrastive-dreaming]]: Radically tweak evaluator inputs to cause a "perfect" evaluation. If it's an outcome-directed regressor, you'd tweak a neutral picture to elicit the concept of "happy people", reaching unrealistic, DeepDream-like territory. Use that as a training example, coupled with a low-humanness outcome, since you can clearly tell it's not what you want. While adversarial examples are crafted to maximize the loss, contrastive dreams are crafted to minimize it.
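The same toy logistic-regression evaluator can illustrate the contrast (again a sketch; the `dream` function and weights are illustrative). Instead of one small loss-maximizing step, we take many unconstrained gradient-descent steps on the input toward a "perfect" score, which is what lands the input in unrealistic territory.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dream(x, w, b, target=1.0, lr=0.5, steps=200):
    """Gradient-descend the INPUT toward a perfect evaluation.
    Unlike an adversarial perturbation, the change is large and
    unconstrained: it minimizes the loss w.r.t. the target score."""
    x = x.copy()
    for _ in range(steps):
        p = sigmoid(w @ x + b)
        grad_x = (p - target) * w   # d(binary cross-entropy)/dx
        x -= lr * grad_x            # descending drives p toward target
    return x

# Toy stand-in for "dreaming" a neutral picture into maximal "happy people".
rng = np.random.default_rng(0)
w = rng.normal(size=16)
x_dream = dream(rng.normal(size=16), w, 0.0)
print(sigmoid(w @ x_dream))         # near-perfect score on a fabricated input
```

The dreamed input, paired with a low-humanness label, then teaches the evaluator that inputs it scores as "perfect" can still be nothing like what you actually want.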