
lexiscore (stable release, online demo)

A web app which takes in content items (e.g. articles, papers, essays, books) and estimates how interesting they are for you by reconstructing them with a GPT-3-like model. Items which are trivial to predict are too boring, while items which are impossible to predict are too challenging.

As a first effort in tackling the theme of information overload in content consumption, I’ve been working on the lexiscore: a nutritional label for food for thought designed to help you narrow in on resources which personally bring you the most value. The open source companion software can automatically label raw text originating from RSS feeds, bookmarked pages, PDFs, EPUBs, and more. In the scope of this project, I’m considering valuable resources to be those from which you learn a lot, those which are packed with ideas you find surprising. Even if this framing is way closer to educational than entertaining, it’s been argued that in the proper context and at a proper level, learning can also be inherently enjoyable.

Before moving on to the inner workings of the lexiscore, I’d like to briefly go through a list of mainstream ways of finding valuable content online. This tour of prior art is particularly relevant because many design choices I made for the lexiscore are directly informed by shortcomings of those more traditional approaches.

We often follow individual content creators and subscribe to their channel, newsletter, or feed. By subscribing to their stream of content, be it original content or “just” curated from other sources, we partly outsource the task of finding valuable resources to them. They might be experts, thought leaders, public figures or whatnot, and we entrust them with bringing value into our lives through their proven ability to sift through content, increasing the signal-to-noise ratio in our informational diet. This slightly resembles liquid democracy, in that we intentionally choose to rely on someone else’s decisions based on trust. The pitfall of the pure subscription approach to content filtering, however, is that the people we subscribe to are also only human. They might be experts in their respective fields, but they still have trouble coping with information overload themselves. What’s more, resources which happen to be valuable to them might not always be valuable to you – some articles, posts, or episodes might be full of stuff you already know, or stuff which is way above your level of comprehension. Subscription is just a cheap heuristic for increasing the overall quality of the content we consume.

If we shift from the voice of the individual to the voice of the masses, we get to speak of content which is popular, trending, and hot: pieces of content which have received many likes, claps, upvotes, hearts, comments, shares, and other social signals. As you increase the voter pool, you’ll get heuristics which are increasingly misaligned with your own definition of valuable content, as they asymptotically approach the mainstream in the limit. Conversely, as you decrease the voter pool, you might get more relatable suggestions, but the signal will be increasingly sparse and jagged due to the low sample size. Besides those potential pitfalls, it’s also worth pointing out that extensive reliance on social heuristics alone might lead to groupthink, as it’s difficult to come up with novel ideas if you’re exposed to the same content as everyone else. Additionally, if one were to ponder what nutritional guidelines for content consumption might look like for the public at large, they surely wouldn’t advocate for reduced memetic variability across society, but would encourage individuals to independently stray off course once in a while, avoiding getting stuck in a local optimum at the societal level.

If the previous two heuristics for finding valuable content were based on outsourcing choice to other users, this one and the following one outsource it to artificial systems. Content-based recommendation systems work by suggesting content which is itself similar to content you’ve already consumed in the past. If you read a lot about AI safety, you’d get more content about AI safety this way. The “related articles” section at the bottom of a blog post is a special case of this, with the content history reduced to size one – only the current piece of content. The shortcoming of this approach is naturally that it’s easy to get stuck in echo chambers and filter bubbles, because those systems offer precisely what you want now, as opposed to what you might want in the future. In my view, Ken Liu seems to argue below that this type of recommendation system reduces memetic drift, slowing down the user’s velocity across the space of possible selves, and reinforcing immediate short-term relevance through high exploitation and low exploration.

Centillion is an algorithm that's gotten out of hand. It just gives you more of what it thinks you want. And we – people like me – think that's the root of the problem. Centillion has put us in little bubbles, where all we see and hear are echoes of ourselves, and we become ever more stuck in our existing beliefs and exaggerated in our inclinations. We stop asking questions and accept Tilly's judgement on everything.

While content-based systems offer content similar to what you’ve personally consumed, collaborative filtering shows you content which people similar to you have consumed. For instance, if you’re into the science of sleep, you might also get content about fitness or diet. Even if it’s not specifically about sleep, people who’ve been interested in sleep in the past have also turned out to be interested in fitness or diet, and so might you. This approach is close to the popularity metrics above in that it relies on information about what others want in order to understand what you want, and so brings in the same danger of the hivemind, though on a more granular level.

Search is in many ways the odd one out, in that you know precisely what you’re interested in – it’s not really open-ended. The shortcoming is that if you roughly know what you’re looking for in the first place, you’re unlikely to be surprised and experience serendipity, which is an important part of what makes content valuable in this setting. You often don’t know what you’d enjoy learning about, which is why virtually all the other approaches attempt to bring in content signalled by others, be it other people or machines.

Just to be clear, most online content platforms in use today employ several of the above, often all of them, in trying their best to surface valuable content. Unfortunately, their definition of valuable often differs from the user’s – it’s usually more about engagement than learning. This goal misalignment problem, coupled with the multitude of compounded shortcomings faced by most platforms today, led me to think about an alternative way of finding valuable content. To be sure, this iteration is nothing more than a modest preliminary attempt at tackling information overload from a different angle, and leaves a lot of room for refinement. Still, I think this initial exploration is useful in challenging the traditional ways of content filtering.

The lexiscore piggybacks on the pervasive metaphor of ideas and content as food for thought. We’re digesting ideas, we see poor quality content as junk food for the mind, and we try to tweak our informational diet. In this context, a nutritional label seemed like a fitting way to describe the value of a content item. If you live in Europe, you’ll find it obvious that the lexiscore name and label are inspired by the Nutri-Score, a widespread actual food label which apparently originates from France, but which I’ve also seen in Romania and the Netherlands. Its simplicity and ease of interpretation have been ported over as an important first principle to the lexiscore. The next few paragraphs expand on the rest of this “wishlist” of features.

The actual food label which served as inspiration (Source)

Just as the Nutri-Score can be applied to virtually any edible product, from water to cake, so should the lexiscore apply to a wide range of content. It should make as much sense for a full-length textbook as for a short blog post (length-invariant). It should work for fiction just as well as for non-fiction (veracity-invariant), because ideas found in novels can be just as useful, if only as visionary stepping stones to real-world impact. Heck, half of my projects have been inspired by technology I read about in science fiction, and I firmly believe that limiting ourselves to a rationalist conception of knowledge is harmful. When a robot has to find its way out of a maze, it often has to temporarily move further away from the exit in order to eventually reach it, avoiding the tyranny of the objective, pun not intended.

An animal caught in a trap would gnaw off its own leg to escape, what will you do?

What’s more, just as the Nutri-Score describes the nutritional value of a certain food product, rather than of a brand, so should the lexiscore be able to describe the value of an individual piece of content, regardless of the aggregate past value of its author or curator. This would enable consumers to go beyond the rudimentary heuristic of subscription and increase the signal-to-noise ratio in the process (granularity).

Additionally, just as the Nutri-Score highlights a wide variety of foods which might be good for you, so should the lexiscore be open-ended. There’s a contrast here which actually seems to break the metaphor. In formal education, you usually choose a topic and keep learning about it in a pretty directed way (i.e. a mix of search and subscription). The textbooks and papers you read over the years gradually lose their nutritional value, because you learn more and more about the topic and they get repetitive. It’s as if the goal of formal education itself is to make a predefined set of resources lose their nutritional value through learning. Yet when we translate this behavior to food, it suddenly feels extremely counterintuitive. You don’t plan your diet with the sole purpose of making a target food product stop being useful to you because you’ve already absorbed the nutrients it has to offer. “Oh yeah! I made it so that almond milk doesn’t bring me much anymore! My body is really well-nourished now!” We don’t really do that; we just look for food which is nutritious for us at any given moment, in an open-ended way, so why not do the same for learning?

The previous point serves as a good segue into this last one, which I’ve reserved for the end of the wishlist because it differs from food labels in a clear way. The lexiscore of a certain piece of content should be a function of the consumer’s knowledge at a given point in time. An introductory textbook in cellular biology might be useful in the first year of your degree, but it likely decreases in nutritional value as you progress towards expertise. It gets less surprising because you develop more accurate mental models of the subject. But it goes both ways. If you incur ideological debt by learning that deep learning is all there is to AI, content on other AI paradigms might become more surprising for you.

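As an intermezzo summary, this is the wishlist of features I had in mind when designing the lexiscore:

- simple and easy to interpret, like the Nutri-Score
- length-invariant
- veracity-invariant
- granular
- open-ended
- a function of the consumer’s knowledge at a given point in time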

Let’s now dive into the inner workings of the lexiscore. How does it turn a raw text into a clean nutritional label? Just as the Nutri-Score follows a certain algorithm based on the ratios of macronutrients, among others, so does the lexiscore rely on a tiny set of characteristics of the piece of content which it extracts as features. There are only two characteristics used here, the “skill” and “challenge,” which have been inspired by Mihaly Csikszentmihalyi’s theory of flow.

The Nutri-Score algorithm (Source)

In the Hungarian psychologist’s model, the state of flow is a pleasant feeling of complete focus, often experienced when the skill level of the individual is matched by how challenging the task at hand is. If their skill is much higher than the challenge, then the task is likely to be boring, as it’s too simple. On the other hand, if the challenge is disproportionately high, the task is likely to be frustrating. It’s been posited that the sweet spot in the middle, the flow channel, is conducive to flow.

Based on this theory, we further refine our definition of valuable content here as content which is conducive to flow. Being too challenging means being too unpredictable – you lack fluency in thinking about the topic, and your mental models have low predictive power. Being too boring means being too predictable – your knowledge is way beyond this piece of content and you have no trouble understanding it. In between them there’s a sweet spot of content which matches your skill level, and this is what we’re after.

Visualizing the flow channel (Source)

We estimate how skilled you are in thinking about a certain topic in the following way. Given a piece of content, we iterate through its paragraphs. For each paragraph, we determine the minimum semantic distance to a past thought of yours stored in the conceptarium. Finally, we take the average of those distances: the closer the paragraphs lie to your existing thoughts, the higher the skill. Concretely, with your conceptarium acting as a proxy for your knowledge, a high skill value for the article at hand can be thought of as being fluent in thinking about ideas related to it. I’m personally more skilled in thinking about human-machine interaction than in thinking about cellular biology, simply because I’ve accumulated more knowledge in this region of semantic space.
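To make the skill estimate more concrete, here is a minimal Python sketch of the procedure described above. It assumes that paragraphs and conceptarium thoughts have already been turned into embedding vectors by some text-embedding model, and it uses cosine distance as the semantic distance. Both are assumptions, as is the choice to negate the average so that a higher value means more skill. The names `estimate_skill` and `cosine_distance` are hypothetical and are not taken from the lexiscore codebase.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """Cosine distance (1 - cosine similarity) between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def estimate_skill(paragraphs: Sequence[Vector], thoughts: Sequence[Vector]) -> float:
    """Hypothetical skill estimate for one content item.

    For each paragraph embedding, find the distance to the closest thought
    embedding, then average over paragraphs. The result is negated (an assumed
    sign convention) so that content lying close to your thoughts scores higher.
    """
    if not paragraphs or not thoughts:
        raise ValueError("need at least one paragraph and one thought")
    nearest = [min(cosine_distance(p, t) for t in thoughts) for p in paragraphs]
    return -sum(nearest) / len(nearest)


# Toy 2-D embeddings standing in for real ones.
thoughts = [[1.0, 0.0], [0.0, 1.0]]
print(estimate_skill([[1.0, 0.1]], thoughts))  # near a stored thought: higher skill
print(estimate_skill([[1.0, 1.0]], thoughts))  # between thoughts: lower skill
```

Taking the minimum per paragraph means a single closely related note is enough to make a paragraph feel familiar. Averaging over paragraphs keeps the score comparable between short posts and long texts.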

The way we estimate how challenging a piece of text is might very well be the most intriguing technical detail of this project, or at least that’s how I see it. We first take surprisal to mean a contrast between reality and your mental models of the world. We then approximate your mental models by feeding your knowledge to an autoregressive language model like GPT-3. This way, we essentially have a model of your mental models – we’re using a far relative of GPT-3 as a cognitive model, instead of as a pure language model, a bit like dual. Following those two premises, it can now be argued that the objective difficulty faced by the cognitive model in reconstructing a piece of text (i.e. perplexity) is a decent estimate of the subjectively experienced surprisal. If your cognitive model has no knowledge of foundational principles of cellular biology, it will have a hard time predicting the flow of a related argument properly, just as you would. Everything will be utterly unpredictable and therefore too surprising.
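Perplexity itself is a standard quantity. It is the exponential of the average negative log-likelihood that the model assigns to each actual token: PPL = exp(-(1/N) Σ log p(x_i | x_<i)). The sketch below assumes the per-token log-probabilities have already been obtained from some autoregressive model. It does not show how the user’s knowledge is fed to that model, and the function name is hypothetical.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity of a text under an autoregressive model.

    `token_logprobs` holds the natural-log probability log p(x_i | x_<i) that
    the model assigned to each token actually present in the text (assumed to
    be obtained elsewhere). Perplexity is exp(mean negative log-likelihood).
    """
    if not token_logprobs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


# A model that gives every token probability 0.5 is as unsure as a coin flip.
print(perplexity([math.log(0.5)] * 4))  # roughly 2.0
```

Lower perplexity means the text was easier to reconstruct. Under the cognitive-model framing above, lower perplexity stands in for lower surprisal.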

If you’re a big fan of the theory of flow, you might notice something weird here. The challenge level above takes into account the unique knowledge of the individual, while in the original framing the challenge doesn’t differ from one person to another. That’s true, there’s a slight discrepancy here. The reason for it is that the original pretrained language models (e.g. raw GPT-3) are biased toward finding popular online content easier to predict, simply because they’ve been exposed to more of it. To account for this skew, which led to qualitatively subpar results, I opted for this personalized version, which describes the challenge posed by different content items to an individual, rather than to the internet. More specifically, I went for the reduction in perplexity obtained when the user’s knowledge is fed to the model, to further avoid those inherent biases of the training data by canceling them out.
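One reading of “reduction in perplexity given the user’s knowledge” is a relative change between two perplexities from the same base model: one computed with the user’s notes fed in, one without. This reading fits the plot caption further down, where -0.2 means 20% easier to predict. The sketch below assumes exactly that formula, which the text doesn’t spell out. Because both numbers come from the same pretrained model, any blanket advantage it has on popular content presumably affects both terms, which is the intuition behind the bias canceling out.

```python
def challenge(ppl_with_notes: float, ppl_without_notes: float) -> float:
    """Relative change in perplexity when the user's notes are fed to the model.

    Assumed formula: (ppl_with_notes - ppl_without_notes) / ppl_without_notes.
    Negative values mean the notes made the text easier to predict, e.g. -0.2
    means roughly 20% lower perplexity than the same model without the notes.
    """
    if ppl_without_notes <= 0 or ppl_with_notes <= 0:
        raise ValueError("perplexities must be positive")
    return (ppl_with_notes - ppl_without_notes) / ppl_without_notes


# Hypothetical numbers: baseline perplexity 50, perplexity 40 with notes.
print(challenge(40.0, 50.0))  # -0.2
```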

Wireframing the labeling software

Now that we’ve seen the two raw ingredients which go into the lexiscore, and we’ve also noted a technicality in how one of them is computed, we’re ready to take the final step towards the clean single-letter labels. Essentially, given the content items projected on the skill-challenge graph, the items which lie somewhere along the flow channel diagonal get the highest lexiscore, while items which are either way above or way below it get the lowest ones. Concretely, the skill-challenge coordinates get converted to polar coordinates, and some basic if-then-else conditions get applied based on the angle value. The exact hard-coded values have been loosely informed by the distribution of a sample of a couple hundred online articles.
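The exact thresholds aren’t given, so what follows is only a minimal sketch of an angle-based labeler under hypothetical assumptions. Skill and challenge are assumed to be centered (e.g. by subtracting the sample mean), so that the origin sits in the middle of the flow channel. The channel is taken to be the line skill = challenge. The letters run from A (on the diagonal) to E (perpendicular to it), borrowing the Nutri-Score’s A to E scale. The cut-offs in `THRESHOLDS` are made up for illustration.

```python
import math

# Hypothetical cut-offs on the angular distance from the diagonal, in degrees.
# These are NOT the values hard-coded in the lexiscore.
THRESHOLDS = [(15.0, "A"), (30.0, "B"), (45.0, "C"), (60.0, "D")]


def deviation_from_diagonal(skill: float, challenge: float) -> float:
    """Angle (0 to 90 degrees) between the item's polar direction and the line skill = challenge.

    Assumes skill and challenge are already centered so the origin sits in the flow channel.
    """
    theta = math.degrees(math.atan2(challenge, skill))  # polar angle, between -180 and 180
    # Fold modulo 180 so both ends of the diagonal (45 and -135 degrees) map to 0.
    return abs((theta - 45.0 + 90.0) % 180.0 - 90.0)


def tendency(skill: float, challenge: float) -> str:
    """Which side of the diagonal the item falls on."""
    return "frustrating" if challenge > skill else "boring"


def lexiscore(skill: float, challenge: float) -> str:
    """Map centered (skill, challenge) coordinates to a letter from A (best) to E."""
    off = deviation_from_diagonal(skill, challenge)
    for limit, letter in THRESHOLDS:
        if off < limit:
            return letter
    return "E"


print(lexiscore(0.5, 0.4), tendency(0.5, 0.4))  # A boring
```

Folding the angle modulo 180° treats the top-right and bottom-left ends of the diagonal alike. This matches the observation that both fresh takes on familiar topics and accessible introductions to unfamiliar ones can land in the sweet spot. `tendency` keeps track of which side of the channel an item falls on.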

The beauty of the lexiscore algorithm is that it’s fully automated and can scale your implicit content valuation heuristics through its models. After you add a set of content sources as batched jobs, the labeler software goes ahead and evaluates written content before you actually read it. Preliminary results with minimal performance optimization indicate that the labeler gets through text roughly 20x faster than the average human reader. The most expensive step from a computational point of view is measuring the challenge of a content item, but I can see quite a few low-hanging optimizations for improving it, mostly having to do with using a GPU and batching many paragraphs being reconstructed at the same time, potentially speeding things up by a couple of orders of magnitude. Truth is, I don’t feel the need for that at the moment, because I can simply let the labeler run overnight once a week, as part of my informational meal prep routine. It’s pretty hands-off.

Main UI of the labeling software

Results up until now are mostly qualitative. For instance, when I used Benjamin Wittorf’s collection of RSS feeds for testing, it happened to include the RSS feed of my own blog. Unsurprisingly, the articles I wrote myself were consistently evaluated as having poor nutritional value for me. Think about it: if I wrote them, then I’ve been thinking a lot about related ideas, and it’s likely that ideas expressed in my conceptarium help explain and predict the content of my written articles quite thoroughly. Of course I’m not learning new things from my own writing, because it’s a reflection of what I know. At the other end of the spectrum, there were articles on politics and international relations which Ben happened to be interested in. Because they were semantically far from my thoughts, and my mental models also struggled to bring in predictive power, they received a low lexiscore not because of high predicted boredom, but because of high predicted frustration – not much would make sense due to the lack of conceptual scaffolds.

Along the flow channel, however, there seemed to be the sweet spot. Far towards the top right, there were content items which were related to my interests, but which appeared to bring in surprising new perspectives and framings I wasn’t familiar with. At the other end of the spectrum, towards the bottom left, there were more entry-level and introductory resources on topics I wasn’t familiar with, but which seemed pretty accessible.

Distribution of a couple hundred articles across the skill-challenge space. If a content item gets placed at -0.2 on the y-axis, it roughly means that my notes have helped make the associated content item 20% easier to predict.

As intended, content items originating from the same stream of content have been spread out across the skill-challenge plane. For instance, some articles from Ness Labs about spaced repetition and AI productivity tools were predicted to be quite boring for me, while some other articles about decision-making fared better. Naturally, some juicy articles will still occasionally get a subpar lexiscore, while subpar articles will occasionally get an overly generous one. It’s not about building the perfect content filter, it’s more about increasing the signal-to-noise ratio ever so slightly through a new approach.

Going forward, a promising direction for improvement would be to adapt the skill-challenge-to-lexiscore transform based on user preferences. This would replace hard-coded values informed by a handful of qualitative assessments with a more robust and adaptive system. A wide range of traditional ML algorithms could help with this, from k-NN to Bayesian learning, or from GMMs to logistic regression. This means, however, that the user would somehow have to provide their own estimate of the lexiscore for some content items, offering feedback to the labeling system.
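To illustrate the feedback loop, here is a minimal sketch of a k-NN regressor, one of the options listed above. It assumes the user has rated a few items on a hypothetical numeric scale (say 5 for A down to 1 for E). It then predicts the score of a new item from its nearest rated neighbors in the skill-challenge plane. The data layout and the name `knn_score` are assumptions and do not come from the lexiscore.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (skill, challenge)


def knn_score(point: Point, feedback: Sequence[Tuple[Point, float]], k: int = 3) -> float:
    """Predict a score for `point` as the mean user score of its k nearest rated items.

    `feedback` holds ((skill, challenge), score) pairs; the numeric score scale
    (e.g. 5 for A down to 1 for E) is a hypothetical convention.
    """
    if not feedback or k < 1:
        raise ValueError("need at least one rated item and k >= 1")
    # Sort rated items by Euclidean distance in the skill-challenge plane.
    ranked = sorted(feedback, key=lambda item: math.dist(point, item[0]))
    nearest = ranked[:k]
    return sum(score for _, score in nearest) / len(nearest)


ratings = [((0.0, 0.0), 5.0), ((0.1, 0.1), 5.0), ((1.0, -1.0), 1.0), ((1.1, -0.9), 1.0)]
print(knn_score((0.05, 0.05), ratings, k=2))  # 5.0
```

With only a handful of ratings, k should stay small. As feedback accumulates, a larger k smooths out individual noisy ratings.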

Another discussion point is the significance of the radius in the polar coordinates. While the angle component seems to indicate how good a match a content item is, with boredom increasing clockwise and frustration increasing counterclockwise, the radius itself has been neglected so far. From the qualitative results, however, it seems that moving along the radius can be a knob for configuring the amount of exploration in information foraging. If you’re consuming highly nutritious food for thought, but in a broad range of fields, that’s strong exploration. If you only focus on the top-right part of the graph, finding new takes in your field, that’s a lot of exploitation. Setting target cognitive analytics through the ideoscope might be accompanied by concrete strategies of focusing on certain radius values.

Exported meal prep document which aggregates the most nutritious content items. The fact that it’s roughly a bunch of concatenated HTML makes it easy to deploy a whole range of browser extensions to tweak it at will, including the common “reader view” seen above, “print to PDF” exports, etc.

Just as the Nutri-Score isn’t the only relevant metric when considering food (e.g. you’ve also got the Eco-Score and the NOVA Score, among others), the lexiscore can’t provide a complete picture of content quality either. Ideally, you’d combine the lexiscore with other heuristics, such as a bit of popularity and subscription. Additionally, the lexiscore isn’t really a great fit for pleasure reading, just as the Nutri-Score doesn’t really help with finding particularly tasty food. It can’t determine how strong a connection you’ll develop with the protagonist of a space opera, or how suspenseful you’ll find the cliffhangers. It’s mainly useful for finding informative resources.

All in all, the lexiscore is a tiny but determined step towards improving the way we relate to content, content consumption, and content curation. Imagine what state of understanding could be reached by an individual with hundreds of labeling workers at their disposal, constantly on the lookout for nutritious content. Just as not having to cultivate our own food freed us up to do more impactful things, having goal-aligned content filters on autopilot might yield both educational gains and time savings.

Finally, as a peek into even more ambitious approaches to tackling information overload, consider this. Even with minor technical improvements (e.g. adaptability, improved performance), the lexiscore still frames information as residing in immutable packets of knowledge, be it books, blog posts, articles, or whatnot. The next step in my work will likely be designing an artificial sense organ which can stretch our perception into the Konishi polis digital realm and help us home in on specific salient insights. Not only should you be able to find nutritious content online, but in the future you should be able to find precisely the ideas which personally bring you the most value, given your current knowledge as a reference. If you believe in thoughtware being a viable path to radically optimized learning, consider supporting the development of open source tools like the lexiscore and the upcoming novaceptors. If not, I’d love to hear your take.