thinking in public
A couple of people suggested I look into social features for the conceptarium, like being able to share (part of) your thoughts online. The reasoning behind this was two-fold. First, it would help prospective users get a sense of what the piece of thoughtware can do, and whether it’s useful for them, by using someone else’s instance. Second, social features might help grow the userbase through some growth hacking building on network effects. While the URL of a conceptarium is essentially a raw gateway into one’s knowledge, I was initially reluctant to embrace the “learning in public” attitude here and haven’t yet published my own instance. However, glimpses of powerful new ways in which knowledge sharing can enable new practices, together with a promising technical solution to the problem of determining who can access which thoughts, made me reconsider. In the rest of this article, I’ll briefly paint a dystopian picture of what could go wrong with thinking in public, then brighten it up with some utopian visions, before finally describing a specific way of designing our way into the brighter of the two futures.
First, let’s get the dark side out of the way. Besides the obvious issue of accidentally publishing sensitive information online, there are much subtler pitfalls to avoid. For instance, just as social media platforms and advertising companies constantly refine models of online users (e.g. their background, interests, views, peers…), an online model of one’s personal knowledge opens up the possibility of adversarial attacks driven by a broad range of goals, commercial and otherwise, which are potentially misaligned with the user’s. As a relatively low-hanging fruit, consider the memetic injection and excision patterns briefly hinted at here. While such ways of weaponizing language models and knowledge bases are still a long way off, NATO’s position on cognitive warfare already prompts research into defending against similar attack vectors, an obvious countermeasure being strong privacy at a societal level.
Everything made sense now. Far too much sense. Loraine had no scan file – but they’d broken into mine. […] They must have run my Copy through a few hundred virtual scenarios, and selected the one most likely to [succeed in obtaining a ransom]. A few hundred resurrections, a few hundred different delusions of extortion, a few hundred deaths. I didn’t care – the notion was far too bizarre, far too alien to move me – which was probably why there hadn’t been a very different ransom demand: ‘We have your copy…’ And the fake Loraine – not even a Copy of the real woman, but a construct based entirely on my knowledge of her, my memories, my mental images – what empathy, what loyalty, what love did I owe her?
Using a similar thoughtware stack, it also appears possible to simulate users in a conversational medium based on the thoughts they saved. While projects like dual specifically address beneficial use cases of this approach, many thoughtware primitives might also be used maliciously. It would be irresponsible of me to conveniently ignore those shortcomings, so I’m addressing them explicitly here. In this case, user models built on PKMs and capable of convincing natural conversation might plausibly be used for identity theft. Given incidents like the iconic bank heist enabled by an employee’s deepfaked voice, it’s not unreasonable to believe that information on someone’s personal views, interests, and mannerisms would enable even more convincing impersonations.
This echoes the concept of “deepfake ransomware,” which I had the dubious opportunity to coin a few years back, before it got picked up by orgs like MalwareBytes here. The idea was that public images from social media and beyond could be scraped Clearview-style and used as a basis for incriminating deepfake videos.
In more oppressive regimes, willingly sharing accurate user models which capture personal beliefs would be a highly suboptimal choice, to put it mildly. Just as honeypots are used in cybersecurity as stealthy devices configured to capture suspicious network activity for post-mortem diagnostics, semantic honeypots might be “forbidden” regions of semantic space where new thoughts trigger disturbing repercussions for the person who authored them. Alternatively, cognitive analytics similar to the ones explored in the ideoscope, like memetic drift and variability, could be applied at a societal level to inform policies which would make Newspeak appear cute in comparison. The reason I think even Orwell would get goosebumps here is that all of this is technically feasible. What concerns me most is this: if I, a curious 21-year-old with too much time on my hands, can think up those possibilities, what fail-safes are in place to prevent much more resourceful entities from giving them a shot?
As a final low-tech entry on the list of shortcomings of thinking largely in public, there’s the simple fact that the mere awareness of peers having access to those ideas influences the author in unique ways. As we’re social creatures, wired on some level to care about the opinion of others, sharing features can be seen as slightly stifling innovation due to the inherent social pressure at play. Such undercurrents of conformity to widespread dogma and accepted paradigms might especially prevent academics from pursuing wildcard trains of thought. Just as sending the Golden Records into the universe aboard Voyager I and II likely meant more to us as the authoring civilization than it will ever mean to alien ones, publishing might have an underestimated influence on the author.
Having completed a brief tour of potential failure modes of thinking in public and of thoughtware in general, let’s move on to a more cheerful part exploring the brighter side. First and foremost, sharing knowledge online means that other knowledge workers can learn from it and build on top of it, leading to a frictionless knowledge exchange. Intellectual effort gets recycled through mental models permeating highly connected communities, preventing redundant work from being done over and over again in separate academic silos. Ideals of what the internet should be like live on and keep driving technologists to course-correct.
Besides the immediate advantage of being able to tap into an expert’s knowledge, it becomes possible for the first time to derive a granular phylogeny of their ideas: a genealogical tree of how their thoughts evolved over time around various topics, how their views shifted, and what triggered those shifts. What richer autobiography than one’s conceptarium, with atomic thoughts timestamped and placed in semantic space for meaningful navigation? You rarely find information on the context in which an insight took shape, which is unfortunate, because such information might help derive new insights and find fragile assumptions at the root of various paradigms. In case of a reported fallacy or new compelling evidence against a meme, a notification could trickle down the social paths the meme traveled, informing everyone along the trail of the mild vulnerability in their belief system, a bit like Dependabot pings you when you’re using outdated dependencies in your GitHub projects.
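To make the Dependabot analogy a bit more concrete, here’s a minimal sketch of how such a notification might spread. It assumes a hypothetical per-meme record mapping each person to the people who picked the idea up from them; nothing like this exists in the conceptarium today, and the names are illustrative. A breadth-first traversal reaches everyone downstream of the flagged source, visiting each person once even if the graph contains cycles.

```python
from collections import deque


def notify_downstream(adoptions, source):
    """Return everyone a meme reached from `source`, in breadth-first order.

    `adoptions` is a hypothetical record mapping each person to the people
    who picked the idea up from them (who-borrowed-from-whom).
    """
    reached = []
    seen = {source}  # the source is never notified about its own thought
    queue = deque([source])
    while queue:
        person = queue.popleft()
        for follower in adoptions.get(person, []):
            if follower not in seen:  # guards against cycles and duplicate paths
                seen.add(follower)
                reached.append(follower)
                queue.append(follower)
    return reached


adoptions = {"ana": ["ben", "cy"], "ben": ["dee"], "cy": ["dee", "eve"]}
print(notify_downstream(adoptions, "ana"))  # ['ben', 'cy', 'dee', 'eve']
```

Note that “dee” is notified only once, even though the meme reached her through two different paths.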
Couple a public conceptarium with information about the knowledge worker’s informational diet, say through a sequence of meal preps derived using the lexiscore, and you immediately get associations between ideas and bibliographic resources. “Let’s see, what were other people reading when they thought of ideas related to this one?” Additionally, data on one’s informational diet would refine the tree of ideas described above even further, making it possible to pinpoint the influences of various thinkers. If those include interactions with others who also share important parts of their knowledge, then phylogenies from different users could get linked together, depicting the evolution of ideas across people: the forks and branches of a version control system for thought. The dream of many sociologists, historians, and possibly IP lawyers.
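A minimal sketch of the “what were other people reading” query, assuming two hypothetical inputs: thoughts related to the current one (e.g. found through a semantic search over public conceptaria) as (author, timestamp) pairs, and per-author reading logs of (timestamp, title) entries. The three-day window is an arbitrary assumption, not something the lexiscore defines, and the titles are toy data.

```python
from collections import Counter
from datetime import datetime, timedelta


def reading_context(thought_time, reading_log, window=timedelta(days=3)):
    """Titles read within `window` before a thought was saved, most recent first."""
    hits = [(t, title) for t, title in reading_log
            if thought_time - window <= t <= thought_time]
    return [title for _, title in sorted(hits, reverse=True)]


def what_others_read(related_thoughts, reading_logs, window=timedelta(days=3)):
    """Tally what the authors of related thoughts read shortly before writing them.

    related_thoughts: (author, timestamp) pairs for thoughts similar to yours.
    reading_logs: author -> list of (timestamp, title) entries.
    """
    tally = Counter()
    for author, thought_time in related_thoughts:
        tally.update(reading_context(thought_time, reading_logs.get(author, []), window))
    return tally.most_common()


logs = {
    "ana": [(datetime(2021, 5, 1), "essay A"), (datetime(2021, 4, 1), "essay B")],
    "ben": [(datetime(2021, 5, 2), "essay A"), (datetime(2021, 5, 3), "essay C")],
}
related = [("ana", datetime(2021, 5, 2)), ("ben", datetime(2021, 5, 4))]
print(what_others_read(related, logs))  # [('essay A', 2), ('essay C', 1)]
```

“essay B” drops out because it was read a month before ana’s related thought, well outside the window.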
Edit: A short exchange with Ton led to this other approach to content recommendation based on published conceptaria, one which doesn’t require bibliographic references or tracked content consumption. If a content item has a high lexiscore both for the user and for a set of other users (computed by the original user via their public conceptaria), then it becomes possible to determine whether the piece of content at hand appears interesting to a wide range of people with different backgrounds (determined, for instance, via a measure of overlap between their thoughts in semantic space), making it extra relevant.
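One way to sketch this idea, under several assumptions: lexiscores for the item are already computed per user (the lexiscore itself isn’t reimplemented here), each conceptarium is reduced to a list of thought embeddings, and “overlap” between two people is crudely approximated by the cosine similarity of their average embeddings. The item then gets two numbers: how many people find it interesting, and how dissimilar those people are from each other.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Average embedding of a conceptarium, a crude stand-in for someone's background."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def broad_appeal(item_scores, conceptaria, threshold=0.5):
    """Return (reach, diversity) for one content item.

    item_scores: user -> lexiscore-like score for the item (assumed precomputed).
    conceptaria: user -> list of thought embeddings.
    reach: fraction of users scoring the item at or above `threshold`.
    diversity: 1 - mean pairwise similarity between those users' backgrounds.
    """
    fans = [user for user, score in item_scores.items() if score >= threshold]
    reach = len(fans) / len(item_scores) if item_scores else 0.0
    if len(fans) < 2:
        return reach, 0.0
    centres = {user: centroid(conceptaria[user]) for user in fans}
    sims = [cosine(centres[a], centres[b]) for a, b in combinations(fans, 2)]
    return reach, 1.0 - sum(sims) / len(sims)


conceptaria = {"ana": [[1.0, 0.0], [0.9, 0.1]], "ben": [[0.0, 1.0]], "cy": [[1.0, 0.0]]}
print(broad_appeal({"ana": 0.9, "ben": 0.8, "cy": 0.2}, conceptaria))
```

Here ana and ben barely overlap in semantic space, so an item both of them enjoy ends up with a diversity close to 1, whereas an item enjoyed only by ana and cy would score close to 0.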
Besides getting precise records of how ideas evolve over time, public conceptaria would make it easier to find peers you share units of mutual interest with, as per Illich’s phrasing. “Let’s see, whose thoughts in the infosphere can answer my pressing question? Huh, promising results, I’ll use those as stepping stones and reach out to their authors for a more in-depth discussion…” Besides this way of finding people with goals at least partially aligned with yours, it might also be easier to instantly surface ideological overlaps between two knowledge workers in a collaboration, nurturing mutual understanding. Conversely, you might precisely want to step out of the comfort zone of echo chambers and look for people with different views on certain subjects. In this case, a credograph pattern based on access to both conceptaria could help mediate if arguments become unproductive.
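A minimal sketch of surfacing overlaps between two conceptaria, assuming both are available as (text, embedding) pairs produced by the same embedding model. The `shared_interests` name and the 0.8 cutoff are illustrative assumptions; each returned pair is a candidate “unit of mutual interest.”

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def shared_interests(mine, theirs, min_sim=0.8):
    """Pairs of thoughts (one from each conceptarium) that are close in meaning.

    mine, theirs: lists of (text, embedding) tuples.
    Returns (my_text, their_text, similarity) tuples, most similar first.
    """
    pairs = []
    for text_a, emb_a in mine:
        for text_b, emb_b in theirs:
            sim = cosine(emb_a, emb_b)
            if sim >= min_sim:
                pairs.append((text_a, text_b, sim))
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)


mine = [("tools", [1.0, 0.0]), ("gardens", [0.0, 1.0])]
theirs = [("convivial tools", [0.9, 0.1]), ("cooking", [-1.0, 0.0])]
print(shared_interests(mine, theirs))
```

With these toy vectors, only the “tools” / “convivial tools” pair clears the cutoff. The all-pairs loop is quadratic, which is fine for a sketch but would call for an approximate nearest-neighbor index at scale.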
If content creators somehow manage to correlate readers with their conceptaria, they might be able to inform their content creation process so as to make their work more surprising and thought-provoking, through analytics tools similar to the ideoscope and the lexiscore. The same society-level aggregate analytics used destructively in the dystopian vignettes above could be used to nurture a blossoming culture of powerful ideas, close to the vision of ideoponics and the nutritional guidelines for food for thought mentioned in the lexiscore write-up.
As a final plus in the list of pros, the same user emulations which can lead to identity theft can enable asynchronous collaboration between knowledge workers. PhD students could consult their supervisors at any point in time through a PKM-informed simulation, while experts could scale and even monetize their expertise in a far-reaching market of drop-in simulation aids. This could at least serve as a cheap preliminary solution, similar to how your voice assistant first tries to recognize your speech on your device before sending it over to the best resource available in the cloud if the easy option doesn’t cut it. Besides direct use of the simulation, services might pop up which offer collaborative AIs trained to effectively help users in their work (e.g. better understand concepts, generate new ideas, etc.) based on hundreds of years of conversational back-and-forth with the PKB-based simulation rather than the real user. Such a virtual sandbox for training collaborative assistants is a bit similar to how OpenAI trained a robot hand to solve a Rubik’s cube in a virtual game engine for efficiency before applying it in the real world. This would be the opposite use case to the adversarial attacks depicted in Egan’s short story above.
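The voice-assistant analogy is essentially a cascade: try the cheap option, escalate if it isn’t good enough. A minimal sketch, assuming two hypothetical callables: a PKM-based `simulation` that returns an answer together with a self-reported confidence, and an `expert` that stands for reaching out to the real person. How a simulation would estimate its own confidence reliably is an open question this sketch doesn’t solve.

```python
from typing import Callable, Tuple


def ask_with_fallback(
    question: str,
    simulation: Callable[[str], Tuple[str, float]],
    expert: Callable[[str], str],
    min_confidence: float = 0.7,
) -> Tuple[str, str]:
    """Try the cheap PKM-based simulation first; escalate to the real person if unsure.

    Returns (answer, source), where source is "simulation" or "expert".
    """
    answer, confidence = simulation(question)
    if confidence >= min_confidence:
        return answer, "simulation"
    return expert(question), "expert"


def toy_simulation(question):
    # Hypothetical stand-in: confident only about topics it has notes on.
    if "lexiscore" in question:
        return "See my notes on informational diets.", 0.9
    return "Not sure.", 0.2


def toy_expert(question):
    # Hypothetical stand-in for the (expensive) real person.
    return "Let's discuss this on a call."


print(ask_with_fallback("How is the lexiscore computed?", toy_simulation, toy_expert))
# ('See my notes on informational diets.', 'simulation')
print(ask_with_fallback("Should I switch labs?", toy_simulation, toy_expert))
# ("Let's discuss this on a call.", 'expert')
```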
Whether the pros outweigh the cons is far from obvious. Regardless of where you stand, ways to make the positive future more likely would be welcome. The main feature which makes the dystopian failure modes feasible is the accuracy of the user model: how closely and comprehensively it captures the user’s knowledge and belief system. Private spaces for ideas might counter the Big Brother affordances, while also reducing peer pressure and conformity in thought. Similarly, simulations used for identity theft would be narrower and less compelling, while adversarial attacks would be harder to put together, due to the lack of information about what they’re adversarial to. Fortunately, the existence of private spaces for ideas doesn’t detract much from the positive use cases. Narrow simulations of experts would still be useful, while idea phylogeny might only be tracked for a subset of ideas, likely in domains which would benefit most from the analysis. Large swaths of knowledge would still be available to build on, while autobiographical dumps could still be a possibility, as per one’s wishes regarding digital heirs.
The difficulty of implementing advanced privacy features in thoughtware lies in the friction they introduce in their vanilla formulation. Tagging each idea as public or private seems like nothing for a single note, but adds up quickly in the long run. Things get even trickier when dealing with different audiences you want to share things with, which requires more advanced book-keeping. Besides filtering on predefined keywords, you could also pass new thoughts through a zero-shot classification pipeline to choose between “sensitive” and “not sensitive,” for instance by simply testing whether the note entails the sentence “This text contains sensitive information” using a pretrained NLI model. However, that’s quite computationally expensive and might be difficult to extend to finer-grained distinctions between more specific audiences.
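A sketch of the two-stage filter described above: a cheap keyword check first, then the NLI entailment test as a fallback. The keyword list and the 0.5 threshold are made-up assumptions. The scorer relies on the Hugging Face `transformers` zero-shot classification pipeline with an MNLI model. With a single candidate label, the pipeline scores that hypothesis on its own (entailment vs. contradiction), which amounts to asking whether the note entails “This text contains sensitive information.” The scorer is passed in as a function, so the decision logic can be tried without downloading a model.

```python
# Hypothetical blocklist; a real one would be personal and much longer.
SENSITIVE_KEYWORDS = {"password", "diagnosis", "salary"}


def keyword_flag(note, keywords=SENSITIVE_KEYWORDS):
    """Cheap first pass: does the note contain any blocklisted word?"""
    words = {word.strip(".,;:!?\"'()").lower() for word in note.split()}
    return not words.isdisjoint(keywords)


def make_nli_scorer(model="facebook/bart-large-mnli"):
    """Build an entailment scorer on top of the transformers zero-shot pipeline.

    Downloads the model on first use, which is the expensive part mentioned above.
    """
    from transformers import pipeline  # imported lazily: only this scorer needs it

    classifier = pipeline("zero-shot-classification", model=model)

    def score(note):
        # Probability that the note entails "This text contains sensitive information."
        result = classifier(
            note,
            candidate_labels=["sensitive"],
            hypothesis_template="This text contains {} information.",
            multi_label=True,
        )
        return result["scores"][0]

    return score


def is_sensitive(note, nli_score, threshold=0.5):
    """Keyword check first; only pay for the NLI model if no keyword matched."""
    return keyword_flag(note) or nli_score(note) >= threshold


# Usage with a real model (slow, needs transformers and a backend such as torch):
# scorer = make_nli_scorer()
# is_sensitive("Notes from my appointment with the therapist", scorer)
```

Because `or` short-circuits, notes caught by the keyword check never reach the expensive model.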
A better option I’m suggesting is the following: the user could whitelist or blacklist thoughts for different parties using custom semantic regions. Essentially, if I want to give a group of people access to my thoughts about human-machine collaboration, I could whitelist a custom region of semantic space for them, based on a tiny set of handpicked thoughts. “Okay, so I want to create a custom gateway into my knowledge for those people. I want to give them access to thoughts which are similar in meaning to this note and this diagram. Huh, that seems to make for about 382 entries, looks good.” No constant tagging required and minimal one-off book-keeping, staying true to the conceptarium’s design principles. I could simply drop a couple of texts and images into an “hci-402951” folder, which would instantly define a semantic region whose passcode is the folder name. Or maybe a config file listing the couple of exemplars to follow, not sure. The tailored gateway resembles thoughts prompted by the memory navigator from about a year ago, specifically the idea of “the mind’s API.” Imagine having an RSS feed of a collaborator’s whitelisted thoughts, and filtering that on your side using the lexiscore: frictionless scenius?
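Here’s a minimal sketch of how such an exemplar-defined gateway might work, assuming thoughts and exemplars are already embedded in the same semantic space (e.g. by whatever model the conceptarium uses). A thought counts as inside a region if its cosine similarity to at least one exemplar reaches `min_sim`. The 0.75 default and the `SemanticRegion` / `gateway` names are assumptions for illustration. Only whitelisting is covered: anything outside every region matching the passcode stays hidden by default.

```python
import math
from dataclasses import dataclass
from typing import List


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


@dataclass
class SemanticRegion:
    """Hypothetical region of semantic space defined by a handful of exemplars."""
    passcode: str                  # e.g. the folder name "hci-402951"
    exemplars: List[List[float]]   # embeddings of the handpicked notes and images
    min_sim: float = 0.75          # assumed similarity cutoff

    def contains(self, embedding):
        # Inside if close enough to at least one exemplar.
        return max(cosine(embedding, e) for e in self.exemplars) >= self.min_sim


def gateway(thoughts, regions, passcode):
    """Texts of thoughts visible to someone holding `passcode` (default deny)."""
    allowed = [r for r in regions if r.passcode == passcode]
    return [text for text, emb in thoughts if any(r.contains(emb) for r in allowed)]


hci = SemanticRegion("hci-402951", [[1.0, 0.0]], min_sim=0.8)
thoughts = [("interfaces for thought", [0.95, 0.05]), ("grocery list", [0.0, 1.0])]
print(gateway(thoughts, [hci], "hci-402951"))  # ['interfaces for thought']
print(gateway(thoughts, [hci], "wrong-code"))  # []
```

Adding or removing an exemplar reshapes the region immediately, with no per-note tagging, which is the property that matters here.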
Automatically determining what access privileges are required for a certain thought is a classification problem, often a multi-class one. As such, different technical approaches to the problem can be systematically evaluated in terms of accuracy, precision, recall, etc. Sensitivity and specificity can be balanced based on how costly type 1 errors are relative to type 2 ones (e.g. ending up under predictive policing in an oppressive regime would be quite a costly mistake). Couple this performance with a measure of friction for the user, i.e. how much effort they have to put into defining their preferences, and you’ve got a pretty clear objective for a research program. I think semantic regions defined through a handful of exemplars might strike a very promising balance between performance and ease of use, and I’m excited to look into a related implementation as an important update for the conceptarium, before finally publishing my own whitelisted semantic regions online. A static snapshot which loses value with age wouldn’t really reflect my intended usage patterns for the conceptarium: it’s a living system after all, and ways of building privacy around that should be investigated. As usual now, the screencast of the thought process behind this article is available below.
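To make the evaluation side concrete, here’s a minimal sketch. It treats “sensitive” as the positive class, so a false positive (type 1) withholds a thought that could have been shared, while a false negative (type 2) leaks a private one. Given labelled notes and classifier scores, the cutoff is picked by minimizing total misclassification cost. The 1:20 cost ratio and the toy data are arbitrary assumptions, and the friction term mentioned above isn’t modelled.

```python
def confusion_counts(y_true, y_pred):
    """(tp, tn, fp, fn) for boolean labels, with True meaning 'sensitive'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    return tp, tn, fp, fn


def report(y_true, y_pred):
    """Standard metrics; recall is also called sensitivity."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)

    def ratio(a, b):
        return a / b if b else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


def cheapest_threshold(scores, y_true, cost_fp=1.0, cost_fn=20.0):
    """Return (cutoff, cost) minimizing cost_fp * FP + cost_fn * FN on labelled data."""
    best = None
    # Every observed score is a candidate cutoff; infinity means "flag nothing".
    for cutoff in sorted(set(scores)) + [float("inf")]:
        y_pred = [s >= cutoff for s in scores]
        _, _, fp, fn = confusion_counts(y_true, y_pred)
        cost = cost_fp * fp + cost_fn * fn
        if best is None or cost < best[1]:
            best = (cutoff, cost)
    return best


labels = [True, True, False, False]  # is the note actually sensitive?
scores = [0.9, 0.4, 0.6, 0.1]        # classifier scores for each note
cutoff, cost = cheapest_threshold(scores, labels)
print(cutoff, cost)  # 0.4 1.0
print(report(labels, [s >= cutoff for s in scores]))
```

With leaks priced at 20 times a wrongly withheld note, the chosen cutoff accepts one false positive to avoid any leak. Making leaks cheap instead (say `cost_fn=0.5`) pushes the cutoff up to 0.9.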