Hypothesis Subspace

What are concrete unilateral training signals?

  • adversarial training: Minimally tweak evaluator inputs to cause the largest deviation in output from the ground truth. If the evaluator is an outcome-directed classifier, a picture of sad people might be tweaked in a way imperceptible to the human eye so that it gets classified as containing happy people. Use the tweaked picture as an impactful training example, paired with the original ground-truth output. Extremely relevant note: there's a notion of provable adversarial robustness, which offers guarantees of robustness against adversarial attacks.
  • [[contrastive-dreaming]]: Radically tweak evaluator inputs to elicit a "perfect" evaluation. If the evaluator is an outcome-directed regressor, you'd tweak a neutral picture to maximize the concept of "happy people," reaching unrealistic DeepDream-like territory. Use the result as a training example, coupled with a low-humanness outcome, as you can clearly tell it's not what you want. While adversarial examples are crafted to maximize the loss, contrastive dreams are crafted to minimize it.
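The two signals above differ only in the sign and scale of the gradient step on the input. A minimal sketch, assuming a toy linear "happy people" classifier as a stand-in evaluator (all names and the FGSM-style attack choice are illustrative, not from the source):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical evaluator: a linear classifier scoring "contains happy people".
rng = np.random.default_rng(0)
w = rng.normal(size=8)   # evaluator weights (stand-in for a real model)
x = rng.normal(size=8)   # features of a picture of sad people (ground truth: 0)

def loss_and_grad(x, target):
    """Cross-entropy of the evaluator's 'happy' score vs. target, plus its input-gradient."""
    p = sigmoid(w @ x)
    loss = -(target * np.log(p) + (1 - target) * np.log(1 - p))
    grad = (p - target) * w  # d(loss)/dx for a logistic evaluator
    return loss, grad

# Adversarial example (FGSM-style): one small step that MAXIMIZES loss
# against the ground-truth label (sad = 0), i.e. a minimal tweak.
eps = 0.1
_, g = loss_and_grad(x, target=0.0)
x_adv = x + eps * np.sign(g)

# Contrastive dream: many large steps that MINIMIZE loss against a
# "perfect" evaluation (happy = 1), i.e. a radical, DeepDream-like tweak.
x_dream = x.copy()
for _ in range(200):
    _, g = loss_and_grad(x_dream, target=1.0)
    x_dream -= 0.5 * g  # descend toward an unrealistically "perfect" input

# Both then become training examples: (x_adv, original ground-truth output)
# and (x_dream, a low-humanness outcome).
```

The sign difference is the whole story: the adversarial step ascends the loss surface near a real input, while the dream descends it far past realistic inputs.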