of graphs and spaces
Most knowledge management tools today allow users to connect documents by means of links, folders, and tags. Links relate two documents, folders bring together whole nested and mutually exclusive sets of documents, and tags are folder-like but allow overlaps. Let’s call this the graph family. A minority of tools, in contrast, mostly do away with those representations and rely almost exclusively on semantic embeddings extracted from the documents themselves, which are then relatively easy to compare. Let’s call this the spatial family. Note, however, that if a tool simply automates document tagging, I’m still considering it graph-like for the purpose of this article – I want to mostly focus on the nature of representations, what they capture and how they’re manipulated, rather than how they’re obtained. Also, note that those two families by no means capture the whole space of ways to represent how things relate to each other. Technicalities aside, I think both the graph and spatial families have their pluses and minuses, hinting at the value of combining their strengths into a richer unified representation. I want to use this article to (1) list those pros and cons, (2) cover a bit of prior art, and (3) present a somewhat novel approach to merging the graph and spatial representations. “Somewhat” novel because it’s entirely a combination of established paradigms, but hey, creativity is remixing.
- pros and cons
Let’s begin with the graph family. First, note that both linear note-taking and folder-based approaches qualify for membership, because trees and linked lists are also graphs, just less interesting ones. Now, an important advantage of links, folders, and tags as means of connecting documents is that they’re all reliable and predictable. If you put a few docs in a folder or tag them, you know what to expect later. Opening the folder will show you what you put in there before and only that, and same with tags. Following a link will get you to the same doc it got you to a month ago, and you know precisely which one. It’s easy to understand and interact with this representation because those discrete relations are articulated explicitly in human-readable form, so there’s not much uncertainty.
However, the very discrete symbols which guarantee the robustness of the graph family make it falter in the face of fuzziness and ambiguity. Connections which are not recognized explicitly are essentially non-existent. Tags and folders with overlapping scope force awkward decisions on the user, who doesn’t know which would fit better and has to collapse their intuitions into black or white. It’s often tricky to understand how documents not present in the graph might relate to its nodes – I can see what notes are related to this note of mine, but pure graph approaches make it difficult to surface notes related to an external piece of content, a conversation I’m having, or someone else’s notes. That finicky recall fails to deliver on the promise of extended memory. The graph family is clunky and not robust against the noise of unstructured data, even if you can occasionally PageRank your way to promising nodes.
If we now switch to the spatial family of representations, we find the opposite pattern. They can handle fuzzy variations alright, because things related to each other in obvious ways naturally go together due to similar embeddings. Every document is connected to every other one in some unique mostly-non-human-readable way. However, they’re difficult to understand and control. You never really know what you’re gonna get when you look something new up. That’s sometimes a plus, sometimes a minus, depending on context. There’s no reliable way of pointing to specific documents besides almost repeating the actual document, which defeats the purpose. Same for grouping documents in predictable ways. Connections recognized by the user are almost never truly taken into account by the encoder model, because it’s almost always a static pretrained blob of weights and biases.
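To make the spatial family concrete, here is a minimal sketch of embedding-based lookup. The three-dimensional vectors are made up for illustration (a real encoder would output hundreds of dimensions), and `rank_by_similarity` is a hypothetical helper, not any particular tool's API.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_similarity(query, embeddings):
    """Return doc names sorted from most to least similar to the query."""
    return sorted(
        embeddings,
        key=lambda name: cosine_similarity(query, embeddings[name]),
        reverse=True,
    )

# Toy embeddings, invented for this example.
embeddings = {
    "lexiscore": [0.9, 0.1, 0.0],
    "thoughtware": [0.8, 0.3, 0.1],
    "ai safety": [0.1, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]  # stands in for the embedding of an external document
ranking = rank_by_similarity(query, embeddings)
```

Every doc gets a score against every query, which is exactly the strength and the weakness: nothing in here knows about the links, folders, or tags you made.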
In short, the spatial family excels at ambiguity and uncertainty but largely fails at reliability and customization, while the graph family does the exact opposite. Ideally, we’d want the best of both worlds: a controllable and robust representation of knowledge which can also reliably handle new documents and commonsense relations. Being able to predictably group docs while still being able to automate estimates of relatedness among them, being able to recognize and state new connections while also having those applied at scale on large bodies of external items.
How can we achieve that? Let’s have a look at some prior art.
- just fine-tune the encoder
It’s possible in theory to simply incorporate the explicit relations stated by the user (links, folders, tags) into the encoder model by means of training it a bit more on the user’s data starting from a pretrained checkpoint. The model is incentivized to slightly reshape its semantic space so that things seen as connected by the user move closer to each other, while preserving most (but inevitably not all) of its prior knowledge. This might work in an ideal world of plentiful compute, but in practice you’d need some serious resources for fine-tuning the model. Either that or you’re patient enough to wait a few hours for each new model update… which is several orders of magnitude slower than seamless interaction, and also pretty bad for the planet. Current state-of-the-art ML models struggle to learn continuously as they’re being used in an online setting. Also, you still lack the perfect reliability of discrete symbols, even if you try your best to approximate them in a purely spatial representation.
- graph neural networks
To oversimplify, GNNs work by taking in a graph where each node has an embedding, and propagating information around the network via edges. It’s as if you’d look at a document and tell it that it is the average of the five other documents it spends most time with, or rather, links to. As information from neighbors gets injected into a node’s embedding, that embedding starts to adjust due to the “social pressure” of its local subgraph. After waves of updates propagating across a graph of documents, their embeddings will be enriched with information in a way directed by users through explicit connections. Document embeddings become more contextualized and unique to the knowledge base. It’s as if physical forces of attraction specified by the user result in documents being moved around the high-dimensional semantic space away from their initial locations. This video on GNNs in Obsidian might help.
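The “social pressure” idea can be sketched as one round of plain neighbor averaging. This is a deliberately stripped-down stand-in: real GNN layers use learned weights and nonlinearities, and both `propagate` and its `self_weight` parameter are names I made up for the illustration.

```python
def propagate(embeddings, links, self_weight=0.75):
    """One round of mean aggregation: blend each doc with the average of its neighbors."""
    updated = {}
    for doc, vec in embeddings.items():
        neighbors = [embeddings[n] for n in links.get(doc, [])]
        if not neighbors:
            updated[doc] = list(vec)  # unlinked docs stay where they are
            continue
        # Component-wise mean of the neighbors' embeddings.
        mean = [sum(column) / len(neighbors) for column in zip(*neighbors)]
        updated[doc] = [
            self_weight * v + (1 - self_weight) * m for v, m in zip(vec, mean)
        ]
    return updated

embeddings = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [0.0, 1.0]}
links = {"a": ["b"], "b": ["a"]}  # a single link between a and b
after = propagate(embeddings, links)
```

After one round, the linked docs `a` and `b` have drifted toward each other while `c` has not moved. Calling `propagate` repeatedly gives the “waves of updates.”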
While this approach manages to bring documents together in an approximation of discrete symbols without the burden of expensive training, it still misses a few features on our checklist. Most importantly, the background model of how things relate to each other is still unchanged. Documents simply move around within it through a set of new embeddings, trying to implement user-specified relations in an otherwise static and uncooperative environment. This is immediately apparent when probing the space with an external document (e.g. piece of content, conversation, someone else’s notes, etc.), as the way such documents get embedded is unchanged. It’s as if GNNs here manage to restructure the existing knowledge graph in semantic space, aligning relations to the user’s intent, but struggle to apply the same adjustments to external items, which almost seem not to share the same space with the graph anymore.
What if we could change the very structure of semantic space like in encoder fine-tuning, but computationally cheaper and more predictable than actual training? That’s where the novel approach comes in.
- semantic navigation system
For the next few paragraphs, try thinking of distance in terms of travel time, rather than actual distance. A supermarket might be 10 minutes away from you by foot. The supermarket might be 5 minutes away from a bank, and so on. Now, how do municipal authorities change how far things are from each other in terms of travel time without actually moving buildings around? It’s actually pretty easy – they implement a public transport network. Buses, subways, and trams all change the time it takes for you to get to different locations around the city. They achieve this by implementing a network of stations which are linked together by predefined routes, though new bus lines can be created and removed relatively quickly if necessary. Crucially, the public transport network doesn’t need to cover all available space – you don’t need to live in a bus station in order to be able to reach another one, you simply walk to the appropriate one and get on a bus to get there.
To recap the obvious, public transport networks manage to adjust distances between buildings not by moving them around, but by creating and removing routes near them. What’s more, the transport network seamlessly integrates with the space it inhabits, its very purpose (tourism aside) being to adjust the topology of that space for citizens. On top of that, this environment can be grasped both by a person thinking about how to best get somewhere and by a path finding algorithm planning an optimal route. Now, what if we changed physical space to semantic space and the transport network to a knowledge graph?
That would probably give us a knowledge representation which allows on-the-fly and predictable adjustments to how related things are. It would be able to account for fuzziness by traversing some segments “by foot” across bare semantic space, while traversing others through network connections “at higher speed.” External documents would be localized in there as per usual, but could actually make use of the same network put in place by the user to “reach” various other docs. Tags and folders translate to routes among all set members. A whole suite of path finding algorithms, both informed and uninformed, could be put forward to help with high-dimensional semantic similarity just as they tackle measuring travel time to the food market. It’s a sort of few-shot learning approach to encoder models, adjusting the structure of their latent spaces using a handful of explicit symbols, with no weight updates.
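Here is a rough sketch of what that could look like, assuming Euclidean distance in embedding space, a single “walking” speed, and a single faster “riding” speed along user-defined routes. The function names, the speeds, and the toy two-dimensional embeddings are all my own assumptions for illustration; Dijkstra is just one of the path finding algorithms that would fit.

```python
import heapq
import math
from itertools import combinations

def routes_from(links=(), tags=()):
    """Links become single routes; a tag or folder becomes routes among all its members."""
    routes = {frozenset(pair) for pair in links}
    for members in tags:
        routes.update(frozenset(pair) for pair in combinations(members, 2))
    return routes

def travel_times(source, embeddings, routes, walk_speed=1.0, ride_speed=4.0):
    """Dijkstra over a complete 'walking' graph plus faster user-defined routes.

    Returns the shortest travel time from `source` to every doc.
    """
    def time_between(a, b):
        distance = math.dist(embeddings[a], embeddings[b])
        speed = ride_speed if frozenset((a, b)) in routes else walk_speed
        return distance / speed

    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        elapsed, doc = heapq.heappop(queue)
        if elapsed > best[doc]:
            continue  # stale queue entry, a faster way was already found
        for other in embeddings:
            if other == doc:
                continue
            candidate = elapsed + time_between(doc, other)
            if candidate < best.get(other, math.inf):
                best[other] = candidate
                heapq.heappush(queue, (candidate, other))
    return best

# Toy embeddings, invented for this example.
embeddings = {
    "lexiscore": [0.0, 0.0],
    "thoughtware": [4.0, 0.0],
    "ai safety": [8.0, 0.0],
    "external": [0.0, 1.0],  # a doc that is not part of the user's graph
}
links = [("lexiscore", "thoughtware"), ("thoughtware", "ai safety")]
with_network = travel_times("external", embeddings, routes_from(links))
on_foot_only = travel_times("external", embeddings, set())
```

The external doc has no links of its own, yet it “walks” to the nearest station (the lexiscore note) and rides from there, so its travel time to the AI safety note drops from about 8.06 to 3.0. No encoder weights were touched; only the routes changed.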
I’ll keep the last paragraph or two for crazier speculation on what mechanics this knowledge representation could support which the previous ones couldn’t. What if you wanted to see how the concept of lexiscore might be related to the concept of imitative amplification? You’d see a route going something like:
- Start by heading to this note of yours related to the lexiscore.
- Follow a link to this note about thoughtware. The link states: “the lexiscore is an instance of thoughtware.”
- Move to another note about thoughtware.
- Follow the “thoughtware and AI safety share similar instrumental objectives” link to this note about AI safety.
- Your destination is on your right. Just kidding, this isn’t human-readable.
What else could this support, besides adjusting the results of semantic search and showing such traceroutes between two arbitrary documents using your knowledge base as the transport network? Well, it might be possible to combine multiple knowledge bases this way, blending conceptual maps of different people, adding up all their connections (or the connections on which you trust them). Let’s assume I got my hands on the knowledge base of Chris Olah, who works on transparency tools in ML. If I tried planning the same route as before I might see this “faster” one, a bit like Google Maps considering both bus and subway for a single trip:
- Go to this note related to the lexiscore.
- Follow a link to this note of yours about thoughtware. The link states: “the lexiscore is an instance of thoughtware.”
- Move to this note of Chris Olah’s about microscope AI.
- Follow Chris Olah’s “microscope AI is an approach to AI safety” link to this note of his about AI safety.
- Your destination is on your left… No? Okay.
Optimize then for the total number of changes, or for the number of inter-network hops so as not to involve a dozen people in the train(!) of thought. Play with the speeds of moving around “by foot” and “across network” and “across someone else’s network,” and so on…
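A hedged sketch of those knobs: each connection is labeled with the owner of the network it belongs to, every mode of travel gets its own speed, and a flat `boarding_cost` per network hop crudely stands in for minimizing changes. All names, speeds, and coordinates below are invented for illustration, and for simplicity each pair of docs carries at most one owned connection.

```python
import heapq
import math

def plan_route(start, goal, embeddings, routes, speeds, boarding_cost=0.0):
    """Return (travel_time, path); each path step is (doc, mode used to reach it).

    `routes` maps frozenset({a, b}) to the owner of that connection.
    `speeds` maps a mode ("foot" or an owner name) to a speed.
    `boarding_cost` is a flat penalty added to every network hop.
    """
    def options(a, b):
        distance = math.dist(embeddings[a], embeddings[b])
        yield distance / speeds["foot"], "foot"
        owner = routes.get(frozenset((a, b)))
        if owner is not None:
            yield distance / speeds[owner] + boarding_cost, owner

    best = {start: 0.0}
    came_from = {}
    queue = [(0.0, start)]
    while queue:
        elapsed, doc = heapq.heappop(queue)
        if doc == goal:
            break  # the first time the goal is popped, its time is final
        if elapsed > best[doc]:
            continue  # stale queue entry
        for other in embeddings:
            if other == doc:
                continue
            for cost, mode in options(doc, other):
                candidate = elapsed + cost
                if candidate < best.get(other, math.inf):
                    best[other] = candidate
                    came_from[other] = (doc, mode)
                    heapq.heappush(queue, (candidate, other))

    # Walk the predecessor chain back from the goal to recover the route.
    path, doc = [], goal
    while doc != start:
        previous, mode = came_from[doc]
        path.append((doc, mode))
        doc = previous
    path.reverse()
    return best[goal], path

# Toy embeddings and networks, invented for this example.
embeddings = {
    "lexiscore": [0.0, 0.0],
    "thoughtware": [4.0, 0.0],
    "microscope ai": [5.0, 0.0],
    "ai safety": [13.0, 0.0],
}
routes = {
    frozenset(("lexiscore", "thoughtware")): "me",
    frozenset(("microscope ai", "ai safety")): "olah",
}
speeds = {"foot": 1.0, "me": 4.0, "olah": 2.0}
time_taken, path = plan_route("lexiscore", "ai safety", embeddings, routes, speeds)
```

With these numbers the planner rides my link to thoughtware, walks over to the microscope AI note, and rides Olah's link to AI safety, for a total of 6.0 instead of 13.0 on foot. Crank `boarding_cost` up high enough and it goes back to walking the whole way.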
Similarly, I could simply select someone else’s model of how things relate to each other as a means of navigating and searching around my knowledge base. “Ah, so those are the items they might have thought of as related in my place.” Remember when I argued that the effectiveness of non-linear note-taking is due to the user being able to remix their past thought patterns? Swapping the “transport networks” for someone else’s would allow you to plug-and-play other conceptual maps on top of your documents, by simply including their routes during path finding.
Edit: A related thread from the undergrounds of the Napkin Slack led to the following idea. What if you treated the travel times computed between docs as pure distances again, and then generated new embeddings whose pairwise distances match them? While this might preserve the computational benefits of embeddings (i.e. no pathfinding required), it’s again tricky to place new docs in the new space, just like in the case of GNNs. However, depending on the algorithm used, the transform could be saved and simply applied to the new docs.
Make no mistake, this cluster of ideas is really vague and fluid for me at the moment, so take it with a grain of salt before it cools down into a more coherent view.
You arrived at your destination.