Hypothesis Subspace

Deontic Arrays

What if we engineered discreteness into the objective function, while preserving the appeal of end-to-end differentiability? A finite set of discrete structures (e.g. attractors, repellers, dipoles, etc.) could be used to exert force on a model and influence its dynamics, a bit like DeepMind actively shaping plasma inside a fusion reactor using a set of magnets. In the case of deontic arrays, individual structures could be human principles, while the target shape would correspond to a region of state space deemed safe. The discreteness of deontology (i.e. finite sets of moral laws) lends itself nicely to various generalization schemes, such as cross-validation followed by targeted red teaming. Deontic arrays could also populate a host of different state spaces (e.g. latent space during inference, model space during training, optimizer space during meta-learning, etc.).

[[how-do-deontic-arrays-relate-to-concrete-challenges-in-alignment]]
[[how-could-deontic-arrays-help-avoid-hfdt-takeover]]

Deontic Arrays