Hypothesis Subspace

What if training humans is not feasible?

Perhaps the internal representations used by the ML model are so alien that trying to teach humans this foreign language would fail spectacularly. The main reply to this family of counterarguments is that the ML model itself is incentivized to use its representational resources efficiently, striking a balance between "speaker economy" and "message clarity". Because human languages are shaped by the same pressure toward resource efficiency, we have reasons to believe the ML model's internal language and human languages might exhibit convergent evolution.

Still, even if the ML model comes up with its own internal "mathematical" format instead of a "written multiple-paragraph" one for various parts of its conceptual framework, that language might still be too advanced for humans to learn. Imagine having to learn Chinese in one week tops (an attempt to artificially constrain capabilities while boosting the difficulty of a human-made problem into superhuman territory). Alternatively, imagine teaching non-human primates to write coherent English by pressing keys on a keyboard. Neither is likely to happen to a reliable extent.

This feels like the opposite side of the [[what-if-the-bottleneck-layer-is-too-constraining]] struggle. It might be interesting to add constraints on the internal representation inspired by [[representational-alignment]], bringing the two modes of thought closer together and avoiding this clash in sophistication. This means both making the model's representations easier to perceive (the realm of [[bridger-languages]]) and making them closer to human representations (the realm of [[representational-alignment]]); a minimal sketch of such a constraint follows below.
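As a hedged illustration (not part of the original note), one way to impose such a constraint is an auxiliary loss that pulls the model's internal representations toward a human-derived similarity structure, e.g. by penalizing the mismatch between pairwise similarity matrices in the style of representational similarity analysis. The function names and the source of `human_repr` are assumptions made for the sketch.

```python
import torch
import torch.nn.functional as F


def similarity_matrix(x: torch.Tensor) -> torch.Tensor:
    """Cosine similarity between all pairs of items in a batch."""
    x = F.normalize(x, dim=-1)
    return x @ x.T


def representational_alignment_loss(model_repr: torch.Tensor,
                                    human_repr: torch.Tensor) -> torch.Tensor:
    """Penalize mismatch between the model's and humans' pairwise similarity
    structure over the same set of items (an RSA-style constraint)."""
    return F.mse_loss(similarity_matrix(model_repr),
                      similarity_matrix(human_repr))


# Hypothetical usage: human_repr could come from behavioral similarity
# judgments or human-interpretable embeddings of the same items.
# total_loss = task_loss + alignment_weight * representational_alignment_loss(z, human_repr)
```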

Another approach would be to split the task of acquiring the ML model's language into subtasks through a chain of bridger languages. The whole chain would be learned by the ML model, with each element of the chain constrained to stay within a threshold of divergence (e.g. KL divergence) from its neighboring languages. Additionally, one end of the chain would have to be a human language. Following this, people would gradually learn to bridge the representational gap. Potentially, new generations might gradually move across the gap, each internalizing a more alien version of the language, which would come to feel natural. A rough sketch of the chain constraint follows below.
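As a hedged sketch of the constraint (assuming each "language" in the chain is summarized by token logits over a shared vocabulary, which is an assumption of this illustration rather than something specified in the note), the penalty below keeps every adjacent pair of languages within a KL-divergence threshold and anchors one end of the chain to a human language.

```python
import torch
import torch.nn.functional as F


def chain_divergence_penalty(chain_logits: list[torch.Tensor],
                             human_logits: torch.Tensor,
                             max_kl: float = 0.5) -> torch.Tensor:
    """Penalize adjacent languages in the chain whose KL divergence exceeds
    a threshold, with the first link anchored to a human language.

    chain_logits: per-language token logits over a shared vocabulary,
    ordered from most human-like to most model-native.
    """
    links = [human_logits] + chain_logits
    penalty = 0.0
    for a, b in zip(links[:-1], links[1:]):
        # KL(a || b) between neighboring languages.
        kl = F.kl_div(F.log_softmax(b, dim=-1),
                      F.log_softmax(a, dim=-1),
                      log_target=True, reduction="batchmean")
        # Only penalize divergence beyond the allowed threshold.
        penalty = penalty + F.relu(kl - max_kl)
    return penalty
```

The threshold acts as the "step size" of the chain: small enough that each neighboring language stays learnable from the previous one, while the chain as a whole spans the gap to the model-native representation.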
