dixit kernel functions
For me as a student of AI, the concept of network feels much closer to the concept of brain than to the concept of Internet. However, for someone working on cloud infrastructure, the reverse is more likely to be the case. Similarly, for me as someone with a software development background, the concept of architecture feels much closer to the concept of computer than to the concept of building. However, for an architect or interior designer, the reverse is again more likely to be true. A good model of my conceptual framework (e.g. a set of semantic embeddings, one for each concept) is by necessity different from someone else’s.
This kind of individual variation is acknowledged far more often in cognitive psychology than in machine learning. In machine learning, a model is often frozen after training, resulting in an immutable blob of parameters. The representations it internalized during training then stay fixed across a whole range of industrial applications: all instances of the model “conceive” of things in exactly the same way. In contrast, cognitive models aim to simulate the thought process and actions of one unique individual, based on their own experience, interests, and knowledge. I often find myself wanting the best of both worlds in my work: a model of the mind which is as tailored to one’s thoughts as a cognitive model, yet as informationally rich as a machine learning model.
To do away with the vague handwaving of “internalized representations”, simply consider machine learning models which embed words into a semantic space. The similarity judgments I opened this article with can be computed by such a model, too. In fact, I ran the comparisons using the classic pretrained Word2Vec embeddings, and they appear to place the concept of network closer to Internet than to brain. Likewise, they seem to place architecture closer to building than to computer. Both of these judgments diverge from how I happen to see things, which makes it questionable whether off-the-shelf machine learning models can be directly repurposed as cognitive models.
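Comparisons like these usually boil down to cosine similarity between embedding vectors. Below is a minimal sketch of that check in plain Python. The 3-dimensional vectors are made up for illustration and are NOT real Word2Vec values (the pretrained Word2Vec vectors are 300-dimensional); `closer_to` is a hypothetical helper name. With real vectors loaded, a library such as gensim exposes the same measure through `KeyedVectors.similarity`.

```python
import math

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy embeddings, NOT actual Word2Vec values.
embeddings = {
    "network":  [0.9, 0.1, 0.3],
    "Internet": [0.8, 0.0, 0.4],
    "brain":    [0.2, 0.9, 0.1],
}

def closer_to(anchor, a, b, emb):
    """Return whichever of the words a or b is more similar to the anchor word."""
    sim_a = cosine(emb[anchor], emb[a])
    sim_b = cosine(emb[anchor], emb[b])
    return a if sim_a > sim_b else b

print(closer_to("network", "Internet", "brain", embeddings))  # Internet
```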
Cloud of Word2Vec embeddings (source)
There’s a rarely challenged assumption at play here, one which circulates in many machine learning circles. The representations internalized by large models (e.g. GPT-3), having been exposed to huge datasets, are deemed to fit the way things actually are in the world better than a single individual’s worldview does. A single bank of representations is encapsulated in a single model because the world is deemed to be singular. I only have an introductory philosophy course and some popular readings under my belt, but it seems to me that the epistemic philosophies underlying the two communities are vastly different, spanning a spectrum from a hardcore objectivism of absolute truth to a softer enactivism grounded in embodied cognition. I’d say the middle ground is taken by ML researchers working on robotics, and by cognitive psychologists who describe perception as the process of recovering physical properties of the environment. The rationalist forums might be a good place to find more debate on this.
The philosophy portrayed above is also reflected in the formalism of choice underlying many ML representations: embeddings in a latent space. Words, for instance, are said to be embedded in, or projected onto, a lower-dimensional space. The reduction in dimensionality is often equated with getting rid of happenstance noise and homing in on the absolute essence. The words are taken as input and fixed in place for good. How can this way of representing concepts be made more fluid? One way is to have multiple conceptual frameworks projected side by side in the same space, as is the case with the “aligned embeddings” in ConceptNet Numberbatch. Another way is to have separate latent spaces which can be aligned and mapped onto one another, like in this machine translation project.
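One common technique for mapping one embedding space onto another, used in some cross-lingual embedding work, is orthogonal Procrustes: given matched rows X and Y (the same concepts in two spaces), find the rotation W that minimizes the Frobenius norm of XW − Y. I don't know which method the linked project uses, so treat this as a minimal NumPy sketch of one possible approach. It assumes the rows are already paired (e.g. through a seed dictionary), and the "two conceptual frameworks" are synthetic: the second is just a random rotation of the first.

```python
import numpy as np

def procrustes(X, Y):
    """Orthogonal W minimizing ||X @ W - Y|| (Frobenius norm), via the SVD of X^T Y."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))                   # synthetic "my" embeddings, one row per concept
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))   # a random orthogonal matrix
Y = X @ Q                                      # synthetic "someone else's" space: same concepts, rotated

W = procrustes(X, Y)
print(np.allclose(X @ W, Y))  # True: the hidden rotation is recovered
```

Restricting W to be orthogonal preserves distances and angles within each space, so only the relative orientation of the two frameworks changes, not their internal structure.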
Aligning conceptual frameworks (source)
Yet another way is to talk not of concepts being located at a distance from each other, but of a kernel function which, given two concepts, outputs how similar they are to one another. Crucially, multiple kernel functions can take in the same words and output different similarities. Kernels therefore offer a neat way of talking about both absolute symbols and multiple ways of relating them at the same time.
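A minimal sketch of that idea, assuming each person's kernel is a weighted cosine similarity over the same shared vectors, where the weights stand in for which semantic dimensions that person attends to. The vectors, their dimension labels, the weights, and the `make_kernel` helper are all hypothetical and chosen only to mirror the network/brain/Internet example.

```python
import math

def make_kernel(weights):
    """Build a kernel k(x, y): cosine similarity after scaling each dimension by a weight."""
    def kernel(x, y):
        xs = [w * a for w, a in zip(weights, x)]
        ys = [w * b for w, b in zip(weights, y)]
        dot = sum(a * b for a, b in zip(xs, ys))
        norm_x = math.sqrt(sum(a * a for a in xs))
        norm_y = math.sqrt(sum(b * b for b in ys))
        return dot / (norm_x * norm_y)
    return kernel

# Hypothetical shared symbols; dimensions loosely read as (computing, biology, infrastructure).
concepts = {
    "network":  [0.6, 0.6, 0.6],
    "brain":    [0.1, 1.0, 0.0],
    "Internet": [0.1, 0.0, 1.0],
}

ai_student = make_kernel([1.0, 1.0, 0.1])      # barely attends to infrastructure
cloud_engineer = make_kernel([1.0, 0.1, 1.0])  # barely attends to biology

for name, k in [("AI student", ai_student), ("cloud engineer", cloud_engineer)]:
    to_brain = k(concepts["network"], concepts["brain"])
    to_internet = k(concepts["network"], concepts["Internet"])
    print(name, "->", "brain" if to_brain > to_internet else "Internet")
```

The symbols in `concepts` never change; only the kernel applied to them does, which is exactly the separation between fixed symbols and personal ways of relating them.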
Relying on abstractions alone is demanding, so let’s turn to an analogy with the board game DiXit. If you know the basic rules of DiXit, feel free to skip to the next paragraph. The game unfolds as follows. During a given round, each of the 3-12 players has 5 cards with abstract illustrations. When it’s your turn, you have to (1) choose one of your cards and (2) openly state a theme related to that card. Then, each of the other players secretly picks one of their own cards and submits it face down, so that together with your original choice they form a temporary deck of N cards. After all cards are shuffled, uncovered and lined up on the table, each of the other players votes for the card they think was the original, the one played by the round master. A few rules determine how the guesses translate into points, but in brief: players are encouraged to submit cards which appear to be good candidates for being the original (they’re close to the theme), while the round master is encouraged to make the connection neither too obvious nor too vague (so that only some of the others guess correctly).
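The scoring can be sketched as follows. This encodes my reading of the base game's rules in simplified form (expansions and house rules may differ): if everyone or no one finds the round master's card, the round master scores 0 and every other player scores 2; otherwise the round master and each correct guesser score 3; on top of that, each other player scores 1 point per vote their decoy card attracts. The function and argument names are hypothetical.

```python
from collections import Counter

def score_round(master, votes, owners):
    """Score one round under simplified base-game rules (my reading; variants differ).

    master: name of the round master (storyteller)
    votes:  dict voter -> card they voted for (the master does not vote)
    owners: dict card -> name of the player who submitted it
    """
    points = Counter()
    master_card = next(card for card, owner in owners.items() if owner == master)
    correct = [voter for voter, card in votes.items() if card == master_card]

    if len(correct) in (0, len(votes)):
        # Too obscure or too obvious: the master scores nothing, everyone else gets 2.
        for voter in votes:
            points[voter] += 2
    else:
        # Some but not all guessed right: master and correct guessers get 3.
        points[master] += 3
        for voter in correct:
            points[voter] += 3

    # Decoy bonus: 1 point to a card's owner for every vote that card attracts.
    for card in votes.values():
        if card != master_card:
            points[owners[card]] += 1
    return dict(points)

owners = {"candle": "Ana", "boat": "Ben", "moon": "Cy", "door": "Dee"}
votes = {"Ben": "candle", "Cy": "boat", "Dee": "candle"}
print(score_round("Ana", votes, owners))  # {'Ana': 3, 'Ben': 4, 'Dee': 3}
```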
Sample of DiXit cards (source)
Besides enjoying yourself, playing DiXit is all about running adversarial attacks on kernel functions. On the one hand, when it’s not your round, you have to submit cards which the other players will perceive as close to the theme. You have to exploit associations which you know the others are likely to have. For instance, an illustration of a lit candle might be close to the concept of miracle for someone well-versed in Jewish culture. On the other hand, when it’s your round, you have to strike a balance. If all the others find your card close to the theme, you get no points because it’s too obvious. If nobody gets it, you also get no points because the connection is too weak. You need to sample some options and pick a card-theme combo which achieves the largest possible spread of perceived similarity across the other players.
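The round master's search can be sketched as a small optimization. Assume, hypothetically, that you could estimate each other player's kernel, i.e. a similarity between 0 and 1 for a theme and a card as that player perceives it. Measuring "spread" as the variance across players is my own choice here; the range (max minus min) would work too. All numbers below are invented.

```python
from statistics import pvariance

def best_combo(perceived):
    """Pick the (card, theme) combo whose perceived similarity varies most across players.

    perceived: dict (card, theme) -> list of similarities, one per other player,
               as estimated by the round master.
    """
    return max(perceived, key=lambda combo: pvariance(perceived[combo]))

# Hypothetical estimates for three other players.
perceived = {
    ("candle", "light"):   [0.9, 0.9, 0.8],  # too obvious: everyone sees it
    ("candle", "miracle"): [0.9, 0.2, 0.7],  # only some players make the link
    ("candle", "ocean"):   [0.1, 0.0, 0.1],  # too obscure: nobody sees it
}
print(best_combo(perceived))  # ('candle', 'miracle')
```

Both failure modes from the rules show up as low variance: uniformly high similarity (too obvious) and uniformly low similarity (too weak) are penalized alike.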
You’ll also notice that DiXit is about similarities between stated themes and illustrations on cards, that is, between texts and images. Connecting texts and images has long been tricky in machine learning, but in recent years a host of (fixed) multimodal models like OpenAI’s CLIP have been trained to do exactly that: to tell how close texts are to images.
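CLIP-style models embed texts and images into a shared space and compare them with cosine similarity, scaled by a temperature; a softmax over those scores then reads like a player's guess among the cards on the table. A minimal NumPy sketch, assuming you already have the embeddings: random vectors stand in for real CLIP outputs, the 512 dimensions and the temperature of 100 are arbitrary choices here, and `vote_probabilities` is a hypothetical helper, not part of any CLIP library.

```python
import numpy as np

def vote_probabilities(theme_emb, card_embs, temperature=100.0):
    """Softmax over temperature-scaled cosine similarities between one theme and N cards."""
    theme = theme_emb / np.linalg.norm(theme_emb)
    cards = card_embs / np.linalg.norm(card_embs, axis=1, keepdims=True)
    logits = temperature * (cards @ theme)
    logits -= logits.max()  # subtract the max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

rng = np.random.default_rng(42)
card_embs = rng.normal(size=(6, 512))                  # stand-ins for 6 card image embeddings
theme_emb = card_embs[2] + 0.5 * rng.normal(size=512)  # a theme constructed to resemble card 2
probs = vote_probabilities(theme_emb, card_embs)
print(int(np.argmax(probs)))  # index of the card this "player" would most likely vote for
```

A fixed model like this amounts to a single shared kernel: every simulated player would vote the same way, which is precisely the limitation the DiXit analogy exposes.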
I mentioned earlier that I want a model of the mind which is both tailored to one’s thought process and informationally rich. The conceptarium aims to strike a bit of a balance here, by storing your thoughts and estimating how much you’ve been thinking about them. However, the semantic embeddings on which the conceptarium relies in order to understand the meaning of your language and imagery still suffer from the same immutability. This could be tackled by fine-tuning a model on your own thoughts, although I have yet to find user-created data which connects texts with images in large numbers. Such data might come from researchers captioning their figures, illustrators titling their artworks, or architects describing their sketches. For the time being, though, a fixed machine learning model is better than no model at all.