Training a model on data about the world (e.g. Wikipedia, books, articles, papers) makes it difficult to box, as the model is likely to exploit loopholes in the world, from physics to sociology, based on that evidence. Given this, we could build a one-way model-to-human channel by tasking the model with learning a physics that reliably causes lifelike structures to emerge from noise, then gleaning insights from the resulting structures ourselves, in a Voyager-pretending-Earth-is-alien sort of way. Life could be operationalized as entropy-fighting across space (e.g. forming unlikely chunks of matter) and time (e.g. changing in unlikely ways from moment to moment), while the physics to be learned could be modeled by a transformer mapping particle-tokens from one timestep to the next. Restricting attention to local interactions could help fight its quadratic cost, while particle-tokens could have slots for velocity, momentum, and chemical properties, depending on the targeted level of abstraction.
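A minimal sketch of the particle-token idea, under several assumptions not in the original: 1D positions, a single chemical-property slot, plain dot-product attention, and a fixed interaction radius standing in for "local-only interactions". All names (`local_attention_step`, `RADIUS`) are hypothetical illustrations, not a proposed implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: each particle-token has slots for
# position (1D here), velocity, and one chemical property.
N, D = 8, 3            # number of particles, slots per token
RADIUS = 0.25          # interaction radius for local-only attention

tokens = rng.uniform(0, 1, size=(N, D))   # columns: [pos, vel, prop]

def local_attention_step(tokens, radius=RADIUS):
    """Map particle-tokens at timestep t to t+1 using attention
    restricted to spatially nearby particles, i.e. a sparse mask
    instead of full quadratic attention."""
    pos = tokens[:, 0]
    # Particle i attends only to particles within `radius` of it.
    mask = np.abs(pos[:, None] - pos[None, :]) <= radius
    # Toy dot-product attention over whole tokens, masked to neighbors.
    scores = np.where(mask, tokens @ tokens.T, -np.inf)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ tokens   # next-timestep tokens

next_tokens = local_attention_step(tokens)
print(next_tokens.shape)
```

A real version would learn the attention projections rather than use raw tokens, but the mask is the relevant part: the cost scales with the number of neighbor pairs inside the radius rather than with all N² pairs.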
At the moment, I don't think this is a particularly promising frame, for two main reasons. First, it might be very difficult to understand the emerging aliens beyond a mathematical formalism, at which point the exercise might as well be framed as a narrow AI task. Second, existing ML models already seem to exhibit enough alienness for people to find it insightful to probe them for hours; intentionally engineering more alienness might make the channel no more practical than a Ouija board.