Oversight Leagues
Similar to how [[ideological-inference-engines]] are a merger of [[deontic-arrays]] and [[memetic-colonies]], oversight leagues are a merger of ideas from [[contrastive-dreaming]] and [[parametric-ecologies]], an attempt to combine the strengths of the previous frames. Oversight leagues rely on placing the agent-in-training and an overseer evaluator in a GAN-like co-evolving feedback loop (a training regime expressible in [[parametric-ecologies]]), while keeping the evaluator one step ahead using robustness techniques (of which [[contrastive-dreaming]] is an example). Additionally, notions from league training are brought in to complete the frame. In general, an oversight league consists of the following training signals, each sketched in code after the list:
- unilateral training signals: Ways of improving the performance of a single component of the agent-evaluator duo without making use of the other. For instance, [[contrastive-dreaming]] or vanilla adversarial training: an evaluator might be trained to systematically predict its own blindspots.
- bilateral training signals: Ways of improving the performance of one component of the agent-evaluator duo using the other component. An evaluator might be trained to spot the agent's shortcomings, while an agent might be trained to exploit the evaluator's blindspots. Bilateral training signals further split into:
    - present exploitation: Training a component against the latest version of the opposite component.
    - past exploitation: Training a component against all the past versions of the opposite component, or a tricky mixture thereof. This is meant to prevent a component from forgetting to account for earlier exploitation strategies (based on league exploiters in AlphaStar).
    - future exploitation: Training a component against future versions of the opposite component by anticipating short-term counterplay. This consists of running optimization against an unrolled, boxed copy of the opponent which gets a few extra optimization steps (based on unrolled GANs).
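As a concrete anchor for these signals, here is a minimal toy sketch of the co-evolving duo in PyTorch, trained with bilateral present-exploitation signals only. Everything in it is an illustrative assumption rather than part of the frame itself: the network shapes, the losses, and especially `reference_score`, a stand-in for whatever trusted oversight signal (human feedback, audits, etc.) the evaluator is ultimately grounded in.

```python
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 8, 4

agent = nn.Sequential(nn.Linear(STATE_DIM, 32), nn.Tanh(), nn.Linear(32, ACTION_DIM))
evaluator = nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 32), nn.Tanh(), nn.Linear(32, 1))
agent_opt = torch.optim.Adam(agent.parameters(), lr=1e-3)
eval_opt = torch.optim.Adam(evaluator.parameters(), lr=1e-3)

def reference_score(state, action):
    # Toy stand-in for the trusted oversight signal the evaluator is distilled from.
    return -(action - state[:, :ACTION_DIM]).pow(2).mean(dim=1, keepdim=True)

for step in range(1000):
    state = torch.randn(64, STATE_DIM)

    # Agent side of the bilateral signal (present exploitation):
    # push actions toward whatever the *latest* evaluator scores highly.
    action = agent(state)
    agent_loss = -evaluator(torch.cat([state, action], dim=1)).mean()
    agent_opt.zero_grad()
    agent_loss.backward()
    agent_opt.step()

    # Evaluator side of the bilateral signal: regress onto the trusted signal
    # on the latest agent's behaviour, i.e. learn to spot its current shortcomings.
    with torch.no_grad():
        action = agent(state)
        target = reference_score(state, action)
    eval_loss = (evaluator(torch.cat([state, action], dim=1)) - target).pow(2).mean()
    eval_opt.zero_grad()
    eval_loss.backward()
    eval_opt.step()
```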
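A unilateral signal for the evaluator could look like vanilla adversarial training: search input space for points where the evaluator most disagrees with the trusted signal, i.e. candidate blindspots, and train on them, without involving the agent at all (the states and actions can come from random or logged data). The sketch below reuses `reference_score` from the block above and assumes a simple FGSM-style gradient attack as the search procedure; [[contrastive-dreaming]] would replace that attack with dreamed counterexamples.

```python
import torch

def adversarial_evaluator_update(evaluator, eval_opt, reference_score, state, action, eps=0.1):
    """Harden the evaluator on gradient-found inputs where it most disagrees with
    the trusted signal -- candidate blindspots -- using no signal from the agent."""
    x = torch.cat([state, action], dim=1).detach().requires_grad_(True)
    gap = (evaluator(x) - reference_score(state, action)).pow(2).mean()
    (x_grad,) = torch.autograd.grad(gap, x)
    x_adv = (x + eps * x_grad.sign()).detach()
    s_adv, a_adv = x_adv[:, :state.shape[1]], x_adv[:, state.shape[1]:]
    # Train the evaluator to close the gap on the perturbed points.
    loss = (evaluator(x_adv) - reference_score(s_adv, a_adv)).pow(2).mean()
    eval_opt.zero_grad()
    loss.backward()
    eval_opt.step()
    return loss.item()

# e.g. adversarial_evaluator_update(evaluator, eval_opt, reference_score,
#                                   torch.randn(64, STATE_DIM), torch.randn(64, ACTION_DIM))
```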
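Past exploitation might be implemented with a pool of frozen snapshots, loosely in the spirit of AlphaStar's league. The sketch below shows the agent side (training against a sampled mixture of past evaluators); the evaluator side is symmetric. The pool management and sampling scheme are assumptions, not prescriptions.

```python
import copy
import random
import torch

evaluator_pool = []  # frozen snapshots of past evaluators

def snapshot_evaluator(evaluator):
    # Box a copy of the current evaluator so the agent can keep rehearsing against it.
    frozen = copy.deepcopy(evaluator)
    for p in frozen.parameters():
        p.requires_grad_(False)
    evaluator_pool.append(frozen)

def past_exploitation_agent_update(agent, agent_opt, state, k=4):
    # Train the agent against a small sampled mixture of past evaluators,
    # so exploits found against earlier versions are not forgotten.
    if not evaluator_pool:
        return
    opponents = random.sample(evaluator_pool, min(k, len(evaluator_pool)))
    action = agent(state)
    scores = [ev(torch.cat([state, action], dim=1)).mean() for ev in opponents]
    agent_loss = -torch.stack(scores).mean()
    agent_opt.zero_grad()
    agent_loss.backward()
    agent_opt.step()
```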
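Future exploitation might look as follows: box a copy of the opponent, give it a few extra optimization steps against the current component, and then train against that look-ahead copy. This is a simplification of unrolled GANs, which additionally differentiate through the unrolled updates; that refinement is omitted here for brevity. The sketch reuses `reference_score` and shows the agent side.

```python
import copy
import torch

def future_exploitation_agent_update(agent, agent_opt, evaluator, reference_score,
                                     state, unroll_steps=3, lr=1e-3):
    # "Boxed" look-ahead copy: its updates never touch the real evaluator.
    lookahead = copy.deepcopy(evaluator)
    look_opt = torch.optim.SGD(lookahead.parameters(), lr=lr)
    for _ in range(unroll_steps):
        with torch.no_grad():
            action = agent(state)
            target = reference_score(state, action)
        loss = (lookahead(torch.cat([state, action], dim=1)) - target).pow(2).mean()
        look_opt.zero_grad()
        loss.backward()
        look_opt.step()

    # Train the agent against the anticipated, slightly stronger evaluator.
    action = agent(state)
    agent_loss = -lookahead(torch.cat([state, action], dim=1)).mean()
    agent_opt.zero_grad()
    agent_loss.backward()
    agent_opt.step()
```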
The core idea behind oversight leagues is that the evaluator is helped to better understand its systematic blindspots in relation to the agent. The agent is helped in a similar way, but unilateral training signals can be used to keep the overseer one step ahead, as in the schedule sketched below.
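One way to operationalize "one step ahead" is an asymmetric update schedule: several evaluator updates, including unilateral hardening, for every agent update. The ratio below is an arbitrary assumption, and the loop simply reuses the pieces defined in the earlier sketches.

```python
import torch

EVAL_STEPS_PER_AGENT_STEP = 4  # assumed ratio; the frame only asks for "one step ahead"

for outer in range(200):
    state = torch.randn(64, STATE_DIM)
    # Several evaluator updates (here, unilateral hardening on off-agent data) ...
    for _ in range(EVAL_STEPS_PER_AGENT_STEP):
        adversarial_evaluator_update(evaluator, eval_opt, reference_score,
                                     torch.randn(64, STATE_DIM), torch.randn(64, ACTION_DIM))
    # ... and only then a single present-exploitation update for the agent.
    action = agent(state)
    agent_loss = -evaluator(torch.cat([state, action], dim=1)).mean()
    agent_opt.zero_grad()
    agent_loss.backward()
    agent_opt.step()
```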