21.42 YRS

agency harvesters

Algorithms have been blamed for many of the perils of centralized social media platforms. Ah, weapons of math destruction, that’s such a good pun. Personally, I had a hard time wrapping my head around how exactly those algorithms might impair cognitive development, sow stress and loneliness, or erode the user’s self-image. How can a program that “simply” puts together a bundle of suggested posts lead to those outcomes? I could buy into the idea of echo chambers built by recommender systems encouraging polarization, but how exactly do we get from there to the full eight-section-long ledger of harms? I couldn’t really reason through it.

Enter Stuart Russell, distinguished professor and co-author of the go-to textbook on AI, which also happens to be the single most used textbook in the whole of computer science. On Lex Fridman’s podcast, Russell described the following. The recommender systems underlying most social media platforms are pressured to yield high click-through rates (CTR) on ads, as that’s mostly where the money comes from in the end. Initially, a decent strategy for the algorithms in question is simply to tailor the selection of ads shown to the user, in an attempt to connect them with relevant products and services. This might also have been the outcome the engineers had in mind at the beginning, and it seems pretty commonsense and almost harmless.
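For concreteness, here’s roughly what that innocent baseline boils down to, in a sketch of my own making (the CTR model and the ads are toy placeholders, not anything lifted from a real ad stack):

```python
# Sketch of the "innocent" baseline: per impression, score each candidate ad
# with a predicted click-through rate and show the winner. The CTR model and
# the ads below are toy placeholders, not anything from a real platform.

def pick_ad(user_features, candidate_ads, predict_ctr):
    """Greedily choose the ad the CTR model scores highest for this user."""
    return max(candidate_ads, key=lambda ad: predict_ctr(user_features, ad))

def toy_predict_ctr(user, ad):
    """Stand-in for a learned model: overlap between user interests and ad tags."""
    return len(set(user["interests"]) & set(ad["tags"])) / len(ad["tags"])

if __name__ == "__main__":
    user = {"interests": ["hiking", "cameras"]}
    ads = [
        {"name": "tent sale", "tags": ["hiking", "camping"]},
        {"name": "phone plan", "tags": ["telecom"]},
    ]
    print(pick_ad(user, ads, toy_predict_ctr)["name"])  # tent sale
```

The user only enters this loop as a fixed input to be matched against, which is about as harmless as ad tech gets.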

However, Russell thinks that’s not the whole story. He argues that besides tailoring the ads to the user in an attempt to improve their match, recommender systems might have also stumbled on the practice of tailoring the user to the ads while blindly optimizing for the same metric. Let’s call this the agency harvesting hypothesis, and let it sink in for a while. Instead of only innocently picking ads the user might be interested in, the user becomes part of the optimization loop in a disturbing act of specification gaming, a wireheading spree, a Goodhart extravaganza, or whatever you want to call it. Regardless of the term, it’s pretty damn scary.

It doesn’t help that (1) Russell’s hypothesis is technically feasible for the most part, given a couple of assumptions which we’ll address later in the article. The pattern of specifying a well-intended objective, only to have an AI mess up big time while maximizing it, is extremely familiar to AI safety people, perhaps the single most influential idea in the field. In thought experiments, you get the usual paperclip maximizers turning the universe into paperclip factories, well-being maximizers resorting to electrically stimulating reward centers in the human brain, or self-driving cars getting you to your destination in record time with five GTA San Andreas stars’ worth of police helicopters on your tail. In practice, you get RL agents exploiting bugs in the physics engine to move around freely and quickly, pneumonia classifiers figuring out where X-rays were taken in order to improve their guesses with better priors, plus a few dozen other well-documented examples out of what’s probably more like thousands of unnoticed clever hacks and unfortunate heuristics found by various systems.

In most of those cases, it’s entirely intuitive after the fact what tricks the AI is employing to further its objective, but it would take crazy amounts of imagination and red-teaming effort to anticipate those failure modes in advance. This means that if the agency harvesting hypothesis is true in the context of social media, it’s very likely that the engineers involved genuinely had no idea this outcome might come about. There was virtually no oversight or regulation in place to pair great responsibility with the great power that comes with wielding this amount of data. Many would argue that there still isn’t enough oversight or regulation in place to address this. My read of the consensus in the AI safety community is that fail-safe mechanisms still leave a lot of room for improvement.

It also doesn’t help that (2) Russell’s hypothesis has a great deal of explanatory power when it comes to the nasty outcomes observed in prolonged social media use. If I were a teen girl, one reliable way of getting me to buy more makeup and related products might be to instill in me the belief that my body isn’t enough, sowing the seeds of body dysmorphia. I couldn’t have made sense of this harm based on the “echo chambers bad” model alone, because I wouldn’t have been part of the “teen girl with deteriorated self-esteem” bubble in the first place. Similarly, many other harms fit this framing. However, there’s a great deal of post-hoc reasoning here, the model doesn’t cover the entire ledger, and there are quite a few assumptions involved. Let’s move on to those assumptions.

First, for a recommendation system to resort to investing actions and resources into tampering with the users themselves, the system specifically requires a long-term CTR objective rather than a short-term one. If the sole focus of a social media platform is to get me to click through stuff today, then it makes no sense to sacrifice short-term engagement opportunities (e.g. order tasty food now) in favor of more long-term plans (e.g. buy beauty products in a month). Agency harvesting wouldn’t help the system’s objective at all, hence it’s unlikely to pop up. However, if the system’s goal were to maximize future reward over a broader time horizon (e.g. a week, a month, a year), tampering with the user becomes increasingly valuable, as the AI gains a meaningful new degree of freedom in its user-ad matching life goal. Not making use of that would be irrational behavior as far as the system is concerned, so given the huge optimization pressure applied to the recommender system (e.g. computational, intellectual, financial resources), my guess is that long-term CTR objectives would inevitably bring about the agency harvesting failure mode. There’s a bitter irony here: the platforms themselves would need to delay the instant gratification of short-term user engagement and commit to more long-term goals if it’s total CTR they’re after.
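To make the horizon argument concrete, here’s a toy model I cooked up myself, with invented dynamics and made-up numbers, not Russell’s formulation and not any real platform’s objective. There are two actions: one cashes in the user’s current click-through rate, the other sacrifices clicks now to make the user slightly more receptive later.

```python
from functools import lru_cache

# Two invented actions: "match" shows the best-matching ad and cashes in the
# current CTR; "nudge" shows content that earns few clicks now but shifts the
# user toward a higher-CTR state. Dynamics and numbers are purely illustrative.
def step(ctr, action):
    if action == "match":
        return ctr, ctr                          # immediate reward, user unchanged
    return 0.2 * ctr, min(1.0, ctr + 0.05)       # cheap clicks now, pliable user later

@lru_cache(maxsize=None)
def best_plan(ctr, horizon):
    """Optimal total reward and action sequence via memoized lookahead."""
    if horizon == 0:
        return 0.0, ()
    best = (float("-inf"), ())
    for action in ("match", "nudge"):
        reward, next_ctr = step(ctr, action)
        future, plan = best_plan(next_ctr, horizon - 1)
        if reward + future > best[0]:
            best = (reward + future, (action,) + plan)
    return best

if __name__ == "__main__":
    for horizon in (1, 3, 10, 30):
        value, plan = best_plan(0.1, horizon)
        print(f"horizon={horizon:>2}  first actions={plan[:5]}  total reward={value:.2f}")
```

A horizon of one never bothers nudging; stretch the horizon and the optimal plan starts front-loading nudges, i.e. investing in changing the user. That’s all the first assumption claims, nothing more.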

Second, there’s the assumption of a recommendation system being powerful enough to pull this off, to turn engagement optima across behavioral space into attractors. By powerful, I mean the system would need to implement complex policies (i.e. decide what to show the user now in order to maximize expected reward), draw on a vast space of specific actions (i.e. choose a specific cluster of posts for the user’s feed, or generate one), and have a rich user model (i.e. demographics, interests, past interactions, social circle, etc.). If the recommendation system merely makes basic content-based suggestions (e.g. “You’re into AI safety, here’s more AI safety.”), it’s impossible for it to implement agency harvesting. I’d think of Pocket as one of the more innocent platforms in this regard, as I’ve seen them work with an “embedding database” start-up, which would be the perfect stack for implementing a simple content-based system. That said, if the recommender system is a full-blown RL agent which can orchestrate important parts of the user experience, then agency harvesting is more likely. It might be the case that no such system exists as of now, though it’s impossible to peek at the actual system architectures behind today’s platforms due to business secrecy. Still, I’d guess the financial incentives are too massive for insiders not to have thought of RL approaches. Plus, I’m using the heuristic of “If I managed to build on this idea myself, then the probability of smarter, better-funded, more ambitious people making use of it approaches 1.” That said, conventional marketing campaigns might be more than enough to persuade users, without the need for creepy automation.
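For contrast, here’s the kind of basic content-based system I have in mind, a minimal sketch with random stand-in embeddings rather than anything from an actual product:

```python
import numpy as np

# Minimal content-based recommender: rank items by embedding similarity to a
# fixed user-interest vector. All embeddings here are random stand-ins.
def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def recommend(user_embedding, item_embeddings, k=3):
    """Indices of the k items most similar to the user's current interests."""
    scores = [cosine(user_embedding, item) for item in item_embeddings]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    user = rng.normal(size=64)           # stand-in "into AI safety" vector
    items = rng.normal(size=(100, 64))   # stand-in article embeddings
    print(recommend(user, items))
```

It’s stateless with respect to the user, it can only mirror existing interests back, so there’s simply no lever for agency harvesting to grab onto.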

There’s a final note here which wouldn’t be enough to fill an entire new article, so I’m appending it here. I think “the extractive attention economy” is a very unfortunate misnomer. It’s not attention that’s at stake here, it’s an amalgam of agency, identity, and self, for lack of a better word. Attention is “merely” a channel which can be used for reaching into and tampering with the intricate self-image that makes up a human being. It’s a means, not an end in itself. It’s like calling the energy industry “the turbine economy”: the turbine is just a tool for harvesting the energy, it’s not where the ultimate value lies. Attention is a pathway which enables this osmosis of selves, which can be forced into alignment with specific end goals, an echo of the main theme of the article on societies of representations I scribbled two weeks ago when talking about transformers. I’d much rather go with something like “the agency harvesting economy” to describe this phenomenon, but only once Russell’s hypothesis is tested against more reliable empirical evidence than my armchair philosophy above. Though I’m not done ranting about social media, not by a long shot.