I’ll continue slipping miscellaneous updates at the top of new articles for the time being. First, I published a new project exploring how reinforcement learning and language models could help build versatile tutoring systems. Second, I tweaked my homepage a bit and added a link to a blogroll OPML file which contains all the RSS feeds I follow (and pipe through the lexiscore labeler weekly). If you’re into obscure blogs on tools for thought (which you are, as far as I can tell), you might be particularly interested in the following: esoteric takes on AI, building a second subconscious, and funky futurist commentary. Let me know if I missed any cool blogs; I’m adding a few every week.
A few months ago, I watched a recording of a Q&A session held by Conor White-Sullivan, the guy behind Roam Research. Throughout that session, Conor brings up interesting ideas about how collective knowledge could self-organize and provide a robust and adaptive epistemic layer for society. He highlights shortcomings of how we currently navigate truth online in the context of a pressing global health crisis, and hints at a bunch of possible mechanics to address those issues, many of which are inspired by the success of collaborative software development.
One idea I found particularly interesting was that of liquid epistemics. I can’t recall whether that was the exact phrasing used, but the concept goes as follows. In liquid democracy, you have the option of delegating your vote on certain subjects to other people – people you trust to be wiser on the topic. For instance, I have no idea how Internet regulation and policy-making work, so I might choose to delegate my vote on related matters to someone I trust to know their stuff on the subject, perhaps some ambitious advocate from Mozilla or EFF. Similarly, in liquid epistemics, you might have the means to strategically patch your model of the world with the conceptual maps of other people – people you trust to know their stuff on the topic. You might be able to instantaneously gauge your expert’s sentiment regarding the truthfulness of a statement, the feasibility of a project, or the potential of an idea. You’d be able to seamlessly tap into their specialized expertise. For instance, I might be able to casually consult an entire expert panel on online privacy, radically “educating” my guesses without needing a PhD in every topic I find relevant.
Why not just keep reading their blog posts, books, or peer-reviewed publications, and internalize that knowledge yourself? Because life is too short to gain expert-level knowledge in all the disciplines relevant to being a functional human being. There’s simply too much to know, and the world is just too complex. Why not, gasp, actually ask them to send you some advice? Mainly because, busy as those people tend to be, you’d only receive an answer a couple of weeks later, if you receive one at all.
It would be nice to simply be able to select a statement online, and a couple of seconds later get an estimate of its truthfulness. Better still if the impromptu report comes with a bit of relevant expert material. Perhaps expand the widget into a chatbox where the expert simulation can provide more insight? Perhaps it’s not an expert simulation, but just your plain old virtual assistant which you configured to use the same epistemic layer? Those last couple of ideas are rather far off, so I want to spend the rest of this article walking step by step through how the initial selection-to-truthfulness mechanic could be implemented with technology available today.
To get there, let me first attempt to generalize the problem. You have a set of statements living in a space of topics. For some topics, you might have chosen to rely on specific knowledge bases fed with content from people and organizations you trust. For most topics, however, you opted for the default option, perhaps Wikipedia. Some topics have almost no epistemic coverage whatsoever. In this context, given a new statement, you want to get a fuzzy truth value, a continuous representation of truthfulness which goes from 0 to 1, from falsehood to truth. That’s all there is to it at a high level.
How can this be implemented today? First, you’d need to specify who you trust on what topics. You might do this by adding entries to a dictionary structure. The keys are the topics expressed in natural language (e.g. “Internet regulation, policy-making, and online privacy”), while each value is a non-empty set of links to external sources you trust (e.g. an expert’s website or knowledge base, a peer-reviewed journal, a shared microverse of knowledge, etc.). Those delegation decisions have to be made in advance, although suggestions could pop up during use.
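To make the dictionary idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `TrustMap` alias, the helper names `add_delegation` and `validate`, and the placeholder `example.org` links are not taken from any existing tool.

```python
from typing import Dict, Set

# Topic (natural language) -> non-empty set of links to trusted sources.
TrustMap = Dict[str, Set[str]]

# Placeholder entries; the URLs are made up for illustration.
trusted_sources: TrustMap = {
    "Internet regulation, policy-making, and online privacy": {
        "https://example.org/privacy-advocate-blog",
        "https://example.org/digital-rights-journal",
    },
}


def add_delegation(trust_map: TrustMap, topic: str, source: str) -> None:
    """Record that `source` is trusted on `topic`, creating the topic if needed."""
    trust_map.setdefault(topic, set()).add(source)


def validate(trust_map: TrustMap) -> None:
    """Raise ValueError if any topic has no trusted source attached."""
    for topic, sources in trust_map.items():
        if not sources:
            raise ValueError(f"Topic without trusted sources: {topic!r}")


add_delegation(trusted_sources, "AI for extending human thinking",
               "https://example.org/shared-microverse")
validate(trusted_sources)
```

Using a set per topic keeps duplicate links out for free, and the validation step enforces the "non-empty" requirement mentioned above.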
Once you’ve specified this mapping between subjects on one hand and trusted sources on the other, you can proceed to verify a new statement. Let’s say you highlighted a sentence on this page, and you’d like to get a truthfulness estimate. A system could have a look at your statement, and begin by matching it to the most relevant topics listed as keys in the previous dictionary, say by means of semantic similarity. There might be one match, multiple ones, or none at all. We just figured out the trusted content sources relevant to your statement in an automatic fashion – checkpoint reached.
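The matching step could be sketched roughly as follows. To keep the example runnable without any model, it swaps in a crude bag-of-words cosine similarity where a real system would presumably use sentence embeddings. The function names and the `threshold` value are assumptions, not tuned recommendations.

```python
import math
import re
from collections import Counter
from typing import Iterable, List, Tuple


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a stand-in for a real encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def match_topics(statement: str, topics: Iterable[str],
                 threshold: float = 0.2) -> List[Tuple[str, float]]:
    """Return (topic, similarity) pairs above the threshold, best match first."""
    query = embed(statement)
    scored = [(topic, cosine(query, embed(topic))) for topic in topics]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


topics = [
    "Internet regulation, policy-making, and online privacy",
    "nutrition and exercise science",
]
matches = match_topics(
    "New online privacy regulation affects Internet companies", topics)
```

An empty result corresponds to the "none at all" case, where the system would fall back on the default source or report that it has no epistemic coverage.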
Following this, we’d filter all the content from the selected sources down to the documents which are particularly related to your statement. We might have, say, 100 paragraphs from trusted sources which are relevant for our current purposes. What next? Natural language inference (NLI). We have to figure out whether our trusted paragraphs support the selected statement. There are at least two ways I can think of in which this can be done. First, pipe each trusted paragraph together with the statement being investigated through a pretrained NLI model. Those models take in two short texts, and do their best to determine whether one entails or contradicts the other. The second approach consists of putting together a prompt composed of a trusted paragraph and the statement, plus the string “When seen in relation to the statement which follows it, the first paragraph can be said to…” Then ask GPT-3 to figure out whether it’s “support it” or “contradict it” that’s more likely, by constraining the available tokens.
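Both routes end in a number between 0 and 1 per trusted paragraph. The sketch below shows one way that conversion might look. It deliberately calls no real model: `NliModel` is a hypothetical interface for whatever pretrained NLI model gets plugged in, `stub_nli` is a toy stand-in so the code runs, and both the linear mapping from label probabilities to a support score and the two-way softmax over continuation log-probabilities are design choices of mine.

```python
import math
from typing import Callable, Dict

# Hypothetical interface: (premise, hypothesis) -> probabilities for the
# labels "entailment", "neutral" and "contradiction".
NliModel = Callable[[str, str], Dict[str, float]]


def support_from_nli(premise: str, hypothesis: str, nli: NliModel) -> float:
    """Map NLI label probabilities to [0, 1]; 0.5 means 'no opinion'."""
    probs = nli(premise, hypothesis)
    score = 0.5 + 0.5 * (probs["entailment"] - probs["contradiction"])
    return min(1.0, max(0.0, score))


def support_from_logprobs(logp_support: float, logp_contradict: float) -> float:
    """Second route: compare the log-probabilities a language model assigns
    to the continuations 'support it' and 'contradict it' (two-way softmax)."""
    return 1.0 / (1.0 + math.exp(logp_contradict - logp_support))


def stub_nli(premise: str, hypothesis: str) -> Dict[str, float]:
    """Toy stand-in so the sketch runs; it is not a real model."""
    if premise == hypothesis:
        return {"entailment": 0.9, "neutral": 0.1, "contradiction": 0.0}
    return {"entailment": 0.2, "neutral": 0.6, "contradiction": 0.2}


paragraph = "Encrypted messaging protects user privacy."
score_same = support_from_nli(paragraph, paragraph, stub_nli)
score_other = support_from_nli(paragraph, "Cats are mammals.", stub_nli)
```

Treating a mostly neutral verdict as 0.5 keeps irrelevant paragraphs from dragging the estimate towards either extreme.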
We now have numerical estimates for whether the trusted paragraphs support the selected statement – we reached another important checkpoint. What now? We can use a voting scheme to aggregate what each trusted paragraph “feels like” with respect to the statement. Maybe we simply average their results, or see how large the majority is. Different weights for trust in various sources might come into play, too. Anyway, those are details. The important thing is that we finally got our juicy truthfulness estimate, informed directly by trusted sources. What’s more, we know exactly which trusted paragraphs had a major say in the decision, so we can just reference them for explainability. There would also be a disclaimer reminding you to always ask the expert directly for the best results, but it would probably always be ignored because who has time for that?
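As one possible instance of such a voting scheme, here is a sketch of a trust-weighted average, along with a way to surface the paragraphs that swayed the result most. The `Verdict` record, the default weight of 1, and the "weight times distance from 0.5" influence measure are all assumptions, just one of the many design choices available.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Verdict:
    paragraph: str       # the trusted paragraph that was consulted
    support: float       # 0 = contradicts the statement, 1 = supports it
    weight: float = 1.0  # how much the source is trusted (assumed scale)


def aggregate(verdicts: List[Verdict]) -> Optional[float]:
    """Trust-weighted mean of support scores; None if there is no coverage."""
    total = sum(v.weight for v in verdicts)
    if total == 0:
        return None
    return sum(v.support * v.weight for v in verdicts) / total


def most_influential(verdicts: List[Verdict], k: int = 3) -> List[Verdict]:
    """Paragraphs that pushed hardest away from the neutral 0.5, for citing."""
    ranked = sorted(verdicts,
                    key=lambda v: v.weight * abs(v.support - 0.5),
                    reverse=True)
    return ranked[:k]


verdicts = [
    Verdict("Paragraph A", support=0.9, weight=2.0),
    Verdict("Paragraph B", support=0.1),
    Verdict("Paragraph C", support=0.5),
]
estimate = aggregate(verdicts)         # roughly 0.6 for this toy data
top = most_influential(verdicts, k=1)  # Paragraph A carries the most sway
```

Returning `None` rather than 0.5 when nothing was consulted keeps "no coverage" distinct from "the sources are split".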
It sounds way easier to implement than it is. In practice, you might need to decontextualize the statement. Also, NLI misses relations of entailment with multiple premises – you might need a pair of trusted paragraphs working together in there. Weigh trusted sources by similarity of the statement to the topic keys? Summarize the trusted paragraphs before NLI? Weigh research papers by replicability by, for instance, detecting preregistration? How exactly should the voting be implemented? There are probably at least 10 design choices on that alone. But, at least, we have a really rough sketch of a sketch of how we might implement this liquid epistemic layer, and help people better make sense of the world.
That was the crux of this article. What follows are some more loose ideas. Microverses of knowledge as shared past and future search results from one’s conceptarium could in principle act as high-signal content sources. Once I share one of mine centered around using AI to extend human thinking, one might simply grab it and partially patch their model of the world with part of mine if they trust me on the topic. In a sense, they can put together a conceptual atlas, if you will – a set of conceptual maps obtained from different people by means of trust. The connections they see could be taken along to assist in navigating the various regions of semantic space at high speed.
The whole purpose of liquid epistemics is to be able to deploy your own trusted sources to help you process the world. However, on a related note, it might be interesting to somehow tap into the content consumed by the author you’re reading, as you’re reading them. What if you could select a sentence of theirs and essentially run “git blame” on it? You’d be able to pinpoint the precise articles and books and talks which helped them reach that specific view or idea. It might work by finding the content consumed by the author just before they saved an idea in their conceptarium which is related to the current statement (see my last full paragraph here), with all the countless technical challenges and design choices involved. For instance, running this “git blame” on the previous sentence where I mentioned “git blame” might surface Conor’s talk. Imagine moving beyond explicit references, everything implicitly having its sources, waiting for you to poke at them if you feel like it.
Perhaps experts could monetize their knowledge work by minting a microverse they share as an NFT? I have no idea how web3 works, so I’d be eager to patch my world model with some more accurate maps of that. Yet again, there’s just so much to learn. Exciting, overwhelming, humbling – all at the same time.