translation is pervasive
Every act of communication is a miracle of translation.
A couple of years ago, I took an introductory course on linguistics. It covered popular models ranging from morphology all the way up to pragmatics, with brief stops at phonetics, syntax, and semantics. At some point, our prof mentioned an obscure theory that went as follows.
Each individual has their own unique language which they’re using when speaking or writing. Fortunately, those languages are similar enough to one another that we can understand each other. In this view, the reason you’re able to understand this text is not primarily because it’s written in English and you know English, but rather because there’s a huge overlap between me-ese and you-ese. Sure, fluency in languages as we know them correlates strongly with the pairwise overlap of such individual languages: I don’t understand Japanese, and I also wouldn’t be able to communicate directly with a Japanese-only speaker.
But this model of individual languages becomes clearer once we fix a conventional language in place and conduct thought experiments around it. Let’s assume two speakers, Alice and Bob, who are by all accounts fluent in English (e.g. they have fancy C2 certificates from Cambridge). If Alice is a lawyer, she’ll probably be highly proficient in what’s often referred to as legalese, the idiosyncratic style of speaking adopted by people dealing with legal matters. Searching for instances of legalese led me to: “In witness whereof, the parties hereunto have set their hands to these presents as a deed on the day month and year hereinbefore mentioned.”
Sure, it’s English, but it’s idiosyncratic enough to require one to first become proficient in this specific dialect of sorts. Let’s try a similar one: “A general complex matrix is positive definite if and only if its Hermitian part has all positive eigenvalues.” Again, it seems to be written English, in that an NLP model trained to classify languages would choose English over German, but that doesn’t say much. You still need deliberate practice to accustom yourself to this otherwise impenetrable way of speaking. In the limit, you might consider each individual as having a language of their own, one which differs ever so slightly from that of their peers (e.g. neighbors, friends, colleagues), and which is radically different from that of a remote stranger.
What’s more, if you and I have slightly different languages, then any communication between us is an act of translation. In our case, it’s quite covert: our languages are so close that I’m not putting much effort into phrasing things in a particular way so that you’d find them more natural. Similarly, you’re not putting much effort into deciphering my language, because it’s not that cryptic for you. If you had a radically different background, you might find it a bit more inaccessible.
This leads us to the case in which the inter-language distance between author and reader, between speaker and listener, is large. In those cases, translation becomes more overt. Some dedicate their careers to science communication, trying to translate advanced notions into a language that is easy for the layperson to understand. In my circles, there’s a lot of hype around being able to distill technical concepts while faithfully preserving their essence, in contrast to the popularizers, who allegedly focus more on entertainment value.
There’s an extreme case of translation when considering AI. In the field of explainable AI, researchers essentially work on devising tools which can translate the high-dimensional representations employed by ML models into human-readable terms. For instance, you might want to understand the role of a neuron not in terms of the hundreds of weights which link it to other neurons, but in terms of the features it generally responds to. In my thesis, I’m working on a way of distilling legible knowledge graphs (e.g. “apple” IS_A “fruit”) directly from the clouds of 784-dimensional embeddings which arise in an ML model. It’s like trying to translate arbitrary human knowledge into hamster terms.
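To give a feel for what one step of such a distillation might look like, here’s a minimal toy sketch. Everything in it is invented for illustration, not taken from my actual pipeline: the three-dimensional “embeddings”, the concept names, and the similarity threshold. It only proposes undirected “related-to” candidate edges between concepts whose vectors point in nearly the same direction, which is far cruder than extracting genuine IS_A relations.

```python
import math

# Toy stand-ins for the 784-dimensional embeddings mentioned above.
# All names and numbers are made up for this example.
embeddings = {
    "apple":  [0.9, 0.8, 0.1],
    "banana": [0.8, 0.9, 0.2],
    "fruit":  [0.85, 0.85, 0.15],
    "car":    [0.1, 0.2, 0.9],
}

def cosine(u, v):
    """Cosine similarity between two vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def propose_edges(embeddings, threshold=0.95):
    """Propose candidate graph edges between concepts whose embeddings
    are nearly parallel. A crude proxy for relation extraction."""
    names = list(embeddings)
    edges = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(embeddings[a], embeddings[b]) >= threshold:
                edges.append((a, b))
    return edges

edges = propose_edges(embeddings)
print(edges)  # → [('apple', 'banana'), ('apple', 'fruit'), ('banana', 'fruit')]
```

The fruit-related concepts cluster together while “car” stays disconnected; a real system would additionally need to orient and label the edges to get from “related-to” to “IS_A”.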
As another example, you might see personalized education as the task of translating concepts into the unique language of the student, while accounting for their idiosyncrasies and quirks. If they’re fluent in employing the notion of energy, you might frame training an ML model as a system which strives for a low-energy state, which seems to be LeCun’s headspace.
Related to this pedagogical example, I just finished reading The Diamond Age by Neal Stephenson, which depicts a magic book called the Primer. Its sole purpose is to translate useful concepts into the user’s language. The Primer itself is not magic, but rather a really advanced piece of engineering; it just happens to depict notions in a magical way. For instance, two-thirds of the way into the book, the history of computing is recapitulated as Princess Nell journeys from castle to castle:
So it went, as Princess Nell proceeded from castle to castle, inadvertently finding herself at the helm of a full-fledged rebellion against King Coyote. Each castle depended on some kind of a programmable system that was a little more complicated than the previous one. After the Castle of the Water-gates, she came to a castle with a magnificent organ, powered by air pressure and controlled by a bewildering grid of push-rods, which could play music stored on a roll of paper tape with holes punched through it. A mysterious dark knight had programmed the organ to play a sad, depressing tune, plunging the place into a profound depression so that no one worked or even got out of bed. With some playing around, Princess Nell established that the behavior of the organ could be simulated by an extremely sophisticated arrangement of water-gates, which meant, in turn, that it could just as well be reduced to an unfathomably long and complicated Turing machine program.

When she had the organ working properly and the residents cheered up, she moved on to a castle that functioned according to rules written in a great book, in a peculiar language. Some pages of the book had been ripped out by the mysterious dark knight, and Princess Nell had to reconstruct them, learning the language, which was extremely pithy and made heavy use of parentheses. Along the way, she proved what was a foregone conclusion, namely, that the system for processing this language was essentially a more complex version of the mechanical organ, hence a Turing machine in essence.

Next was a castle divided into many small rooms, with a system for passing messages between rooms through a pneumatic tube. In each room was a group of people who responded to the messages by following certain rules laid out in books, which usually entailed sending more messages to other rooms. After familiarizing herself with some of these rule-books and establishing that the castle was another Turing machine, Princess Nell fixed a problem in the message-delivery system that had been created by the vexatious dark knight, collected another ducal coronet, and moved on to castle number six.

This place was entirely different. It was much bigger. It was much richer. And unlike all of the other castles in the domain of King Coyote, it worked. As she approached the castle, she learned to keep her horse to the edge of the road, for messengers were constantly blowing past her at a full gallop in both directions.

It was a vast open marketplace with thousands of stalls, filled with carts and runners carrying product in all directions. But no vegetables, fish, spices, or fodder were to be seen here; all the product was information written down in books. The books were trundled from place to place on wheelbarrows and carried here and there on great long seedy-looking conveyor belts made of hemp and burlap. Book-carriers bumped into each other, compared notes as to what they were carrying and where they were going, and swapped books for other books. Stacks of books were sold in great, raucous auctions—and paid for not with gold but with other books. Around the edges of the market were stalls where books were exchanged for gold, and beyond that, a few alleys where gold could be exchanged for food.
The way I see it, Nell essentially learns about punched-card programming, low-level machine code, the von Neumann architecture, and the Internet while weaving herself into the narrative fabric, losing herself in the story. A story which translates specific bits of knowledge into a language accessible to a child, and which adopts constructivism as a first-class citizen by having adventures build on previous adventures. Quite a feat of engineering, indeed.
If we abstract away from the specific instances of translation practiced in day-to-day life, explainable AI, and personalized education, a common property emerges. There’s always a trade-off between adequacy and fluency, originally recognized in machine translation. Adequacy refers to the goal of conveying information in a way which preserves meaning as well as possible. Fluency refers to the goal of conveying information in a way which is natural and familiar for the person at the end of the line.
We could argue that distillers pride themselves on prioritizing adequacy, while popularizers focus primarily on fluency. In explainable AI, adequacy has been instantiated as functional-groundedness, the property of interpretability tools to accurately capture the model’s knowledge. In contrast, fluency has been instantiated as human-groundedness, the property of generating legible and cognitively ergonomic explanations above all. In education, adequacy alone would mean bluntly conveying precise definitions with no regard to the student’s language, while fluency alone would mean going all in on magical journeys without conveying the concepts deemed important. In all those instances, the two objectives compete. However, we can do more than just move along the Pareto frontier defined by the competing incentives. We can push it further out with the appropriate technology, just like Hackworth in The Diamond Age.
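The frontier talk can be made concrete in a few lines. A minimal sketch, assuming made-up (adequacy, fluency) scores for candidate explanations of some concept: a candidate sits on the Pareto frontier only if no other candidate beats it on both axes.

```python
def pareto_frontier(points):
    """Keep the points not dominated by any other point: q dominates p
    if q is at least as good on both axes and strictly better on one."""
    frontier = []
    for p in points:
        dominated = any(
            q != p
            and q[0] >= p[0] and q[1] >= p[1]
            and (q[0] > p[0] or q[1] > p[1])
            for q in points
        )
        if not dominated:
            frontier.append(p)
    return frontier

# Hypothetical (adequacy, fluency) scores for five candidate explanations.
explanations = [(0.9, 0.2), (0.7, 0.6), (0.4, 0.9), (0.5, 0.5), (0.2, 0.3)]
print(pareto_frontier(explanations))  # → [(0.9, 0.2), (0.7, 0.6), (0.4, 0.9)]
```

Choosing among the surviving points is moving along the frontier; better technology, like the Primer, adds new points that dominate the old ones and pushes the whole frontier outward.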