Hypothesis Subspace

Representational Alignment

As a step toward aligning AGI with human intent, we might first want to align the latent representations of ML models with the latent representations of humans. This would mean bypassing human language and behavior entirely, and instead incentivizing the ML model to employ representations which can accurately be translated to and from human neural activity. Such a constraint should generally ensure that human representations, and human representations alone, are used in the model's internal thought process. Finally, as a dial on capability, we might gradually relax the translatability constraint by applying it to individual model shards rather than to the model holistically. This is analogous to a team of developers working on a codebase, where each developer is responsible for only a chunk of it: the codebase is still represented in human brains, but no single brain represents all of it.
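As a minimal sketch of what the translatability incentive could look like in practice, the snippet below implements an auxiliary loss that rewards model latents for being accurately mappable to and from paired human neural recordings, plus a relaxed per-shard variant where each chunk of the latent space only needs to be translatable to some one human's recordings. Everything here is an illustrative assumption rather than an established method: the dimensions, the linear translators, the MSE objectives, and the availability of paired latent/neural data are all placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical dimensionalities for the model's latents and for recorded
# human neural activity; both are placeholder choices.
D_MODEL, D_NEURAL = 256, 128

# Learned "translators" between the two spaces. Linear maps are an
# assumption made for simplicity; richer decoders could be swapped in.
to_neural = nn.Linear(D_MODEL, D_NEURAL)
to_model = nn.Linear(D_NEURAL, D_MODEL)

def translatability_loss(latents, neural):
    """Penalize model latents (batch, D_MODEL) that cannot be translated
    to and from paired human neural recordings (batch, D_NEURAL)."""
    forward = F.mse_loss(to_neural(latents), neural)            # latents -> neural
    backward = F.mse_loss(to_model(neural), latents)            # neural -> latents
    cycle = F.mse_loss(to_model(to_neural(latents)), latents)   # round trip
    return forward + backward + cycle

def sharded_translatability_loss(latents, neural_per_human, shards, translators):
    """Relaxed variant: each latent shard only needs to be translatable to
    *some* human's recordings; no single human covers the whole latent."""
    total = torch.zeros(())
    for sl, neural, (enc, dec) in zip(shards, neural_per_human, translators):
        shard = latents[:, sl]
        total = total + F.mse_loss(enc(shard), neural) + F.mse_loss(dec(neural), shard)
    return total / len(shards)

# Usage sketch with dummy data: in training, the auxiliary loss would be
# added to the model's ordinary task loss.
latents = torch.randn(32, D_MODEL)
neural = torch.randn(32, D_NEURAL)
holistic = translatability_loss(latents, neural)

# Three "developers", each accountable for one third of the latent space.
shards = [slice(0, 86), slice(86, 171), slice(171, 256)]
translators = [
    (nn.Linear(sl.stop - sl.start, D_NEURAL), nn.Linear(D_NEURAL, sl.stop - sl.start))
    for sl in shards
]
neural_per_human = [torch.randn(32, D_NEURAL) for _ in shards]
relaxed = sharded_translatability_loss(latents, neural_per_human, shards, translators)
```

One design choice worth noting in this sketch: the cycle-consistency term only applies to the holistic loss, so relaxing to shards weakens the constraint in two ways at once, both by narrowing each human's coverage and by dropping the round-trip requirement.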