Pretrained models are universal computation engines

Take a pretrained language model, freeze nearly all of its weights, and fine-tune only ~0.1% of its parameters on a task from a different modality (e.g. vision): the model still performs remarkably well. Two explanations suggest themselves: natural signals may share underlying structure across modalities, or pretraining may discover a set of computational primitives that are useful in almost any setting. On this view, models pretrained on neural activity might yield extremely powerful primitives.
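The freeze-almost-everything recipe above can be sketched schematically. This is a minimal illustration with made-up parameter counts standing in for a real pretrained transformer (the group names and sizes are assumptions, not taken from any specific model); it shows only the bookkeeping of marking a tiny subset trainable and checking that the tuned fraction lands near 0.1%.

```python
# Hypothetical parameter counts, loosely shaped like a transformer stack.
# The numbers are illustrative assumptions, not a real model's sizes.
model_params = {
    "embeddings": 50_000_000,
    "attention_blocks": 80_000_000,
    "feedforward_blocks": 60_000_000,
    "layer_norms": 120_000,       # the small subset left trainable
    "output_head": 100_000,       # re-initialized for the new modality
}

# Freeze policy: everything stays frozen except these groups.
trainable = {"layer_norms", "output_head"}

def trainable_fraction(params, trainable_names):
    """Fraction of all parameters that will receive gradient updates."""
    total = sum(params.values())
    tuned = sum(v for name, v in params.items() if name in trainable_names)
    return tuned / total

frac = trainable_fraction(model_params, trainable)
print(f"fine-tuning {frac:.2%} of parameters")  # on the order of 0.1%
```

In a real deep-learning framework the same policy is expressed by disabling gradients on the frozen groups (e.g. setting `requires_grad = False` in PyTorch) and passing only the remaining parameters to the optimizer.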