Ingenuity of emotions
As a somewhat analytical person (my SO would sneer at the qualifier), I’ve had my own fair share of challenges in truly coming to terms with the value of soft skills. This article describes an analogy which has helped me recognize the astounding ingenuity built into our more primal physiology, a bridge which has helped me relate to the murky nature of subcortical space. This analogy, together with a handful of others which might make it into future articles, led me to see relying on feelings as an eminently rational thing to do in complex situations. If you’re in the same region of worldview space as I was, I hope you’ll feel the same ideological currents flowing through. If not, you’re still in for a short article riddled with the usual cross-references between AI and psychology.
One end of the bridge starts with reinforcement learning (RL), one of the three major paradigms of machine learning today, alongside supervised and unsupervised learning. While the latter two are often concerned with learning a mapping between two domains (e.g. speech-to-text, image-to-label, customer-to-segment), reinforcement learning is concerned with training agents to act intelligently in various environments so as to obtain human-defined rewards. The term itself is borrowed from Skinner and friends, who argued more than half a century ago that rats and pigeons could be conditioned to learn arbitrarily complex behaviors (e.g. guiding missiles) given the appropriate incentives (i.e. tasty treats). This historical reference actually goes to the heart of the bridge we set out to build, but we’re getting ahead of ourselves.
The point here is that many AI researchers see reinforcement learning as particularly tricky and challenging. Imagine you’re learning chess and your only feedback on the moves you make comes dozens of moves later as a binary win/loss, or maybe as some loose hints related to captured pieces here and there. The feedback signal is far sparser than in the other two paradigms, especially compared to neat supervised learning, where you’d instantly get comprehensive feedback on every output. Plenty of bright minds in industry and academia all around the world are trying to come up with effective approaches to reinforcement learning.
Besides being a major scientific and engineering challenge on which much intellectual energy has been spent over the years, RL is also seen as one of the most dangerous approaches from an AI safety point of view. Letting a powerful, misaligned RL agent loose into the world might wreak havoc on society in a way that is difficult to imagine for supervised or unsupervised artifacts. Now-classic thought experiments range from agents repurposing all matter on Earth towards their goal of producing paperclips, to agents hedonistically stimulating the pleasure centers of the brain en masse towards their goal of improving humanity’s well-being, or perversely reducing the number of viral infections in humans by reducing the number of humans. There’s also the familiar comparison between how we see house cats and how such a superintelligent agent might see us, should the kinetics of its intelligence explosion lead to us being irreversibly outsmarted. These are all compelling reasons why we should invest more in AI safety.
To review, solving RL is a challenging intellectual task, and solving RL right appears key to us not getting outsmarted by an intelligence misaligned with our goals.
Now, let’s dive into one specific influential algorithm used in RL, to get a better sense of the concrete inner workings of such an agent. The algorithm I want to describe here is called value iteration. Its core idea is that during the training phase, the agent is free to move around the state space (e.g. reach different chess board configurations, move its robot arms into certain positions) in a pretty exploratory way. Whenever the agent gets one of those sparse rewards (e.g. captures a pawn, touches the object it has to grasp), the reward trickles back into nearby states: “If this board configuration led me to a reward soon after, then it has some value of its own.” After the reward signal propagates across the state space during training, the agent possesses a map of which states are most promising. From then on, greedily moving to the neighboring state with the highest value can earn quite a lot of reward in the end – a pretty intelligent behavior being learned. Of course, this hugely oversimplified description glosses over a whole range of technical challenges (for one, classic value iteration sweeps over every state using a known model of the environment’s dynamics, while sample-based relatives such as temporal-difference learning estimate the same values from the agent’s own exploratory experience), but the essence of the algorithm is that the reward signal trickles across nearby states as estimated “values” before the agent comes in and simply takes the paths that seem most valuable in the short term.
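To make the trickling concrete, here is a minimal Python sketch of tabular value iteration on a toy problem. Everything about the setting is an assumption for illustration, not something from the article: a hypothetical corridor of six states with a single reward of 1.0 for reaching the rightmost one, deterministic moves left or right, and a discount factor `GAMMA` of 0.9. Like classic value iteration, it sweeps over every state using a known model (the hypothetical `step` function) rather than letting an agent wander; each sweep backs a state’s value up from the best neighbor it can reach, so the reward gradually spreads backwards from the goal.

```python
# Minimal value iteration sketch on a hypothetical 1-D "corridor" world.
# Assumptions (not from the article): 6 states in a row, the rightmost one
# is a terminal goal that pays reward 1.0 on arrival, moves are deterministic,
# and future reward is discounted by GAMMA per step.

N_STATES = 6
GOAL = N_STATES - 1
ACTIONS = (-1, +1)  # step left, step right
GAMMA = 0.9


def step(state, action):
    """Known deterministic model: returns (next_state, reward)."""
    next_state = min(max(state + action, 0), N_STATES - 1)  # walls at both ends
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


def value_iteration(tol=1e-9):
    """Repeat Bellman backups over all states until values stop changing."""
    values = [0.0] * N_STATES
    while True:
        delta = 0.0
        for s in range(N_STATES):
            if s == GOAL:
                continue  # terminal state: nothing left to earn
            # Best one-step reward plus discounted value of where we land.
            best = max(r + GAMMA * values[s2] for s2, r in (step(s, a) for a in ACTIONS))
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


def greedy_action(values, state):
    """Pick the action with the highest immediate reward plus discounted next value."""
    def score(action):
        next_state, reward = step(state, action)
        return reward + GAMMA * values[next_state]
    return max(ACTIONS, key=score)


values = value_iteration()
print([round(v, 3) for v in values])  # → [0.656, 0.729, 0.81, 0.9, 1.0, 0.0]
print([greedy_action(values, s) for s in range(GOAL)])  # → [1, 1, 1, 1, 1]
```

The discount makes each step away from the goal slightly less valuable (a factor of 0.9 here), which is what turns the value table into a gradient the greedy agent can simply climb.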
We’ve come a long way on the first side of the conceptual bridge we’re building. Let’s now approach our construction project from the other side, so that we can finally join the two perspectives into a unifying mental model.
We’ll get started on the other side with classical conditioning. An influential theoretical model of why Pavlov’s dogs start salivating at the sound of the bell argues that the individual learns an association between the tasty treats and the bell. The sound of the bell appears to have predictive power regarding when the treats will arrive, allowing the dogs to anticipate the disruption of homeostasis and, in turn, prepare for it on a physiological level. Similarly, the mere scent of coffee lowers your blood pressure, and the mere sight of a bottle of Coke lowers your blood sugar, both as attempts to offset massive disruptions to homeostasis. Unfortunately, this leads to challenging cravings for people suffering from eating disorders and substance use: the learned anticipatory response itself throws their bodies off balance, and intake comes to be seen as the key to re-establishing it.
Besides those intricate learning mechanisms capitalizing on the predictive power of stimuli and encoding rich models of the world without conscious effort, another remarkable thing seems to happen. Even if you actively undermine the learned predictive association in a clinical setting, say through exposure with response prevention (i.e. forcing the binge eater to go through the motions preceding their eating spree, while making sure the spree does not happen), something else lingers. It’s not only the predictive relation that gets learned; the very valence of the response seems to trickle back into the perceived valence of the stimulus. The arachnophobic still perceives spiders as nasty little animals, even if they no longer believe they’ll get bitten and poisoned by sitting in the same room with one (a belief they might have picked up simply by being told so, for instance). The alcoholic still finds alcohol more appealing than a control beverage, even after they stop experiencing the anticipatory cravings driven by a learned association.
This is called hedonic shift, and separate clinical interventions have been devised to counter this residual bias. For instance, the arachnophobic is asked to interact with the spider while listening to their favorite music and eating their favorite snacks, in an attempt to exert a counter-valent influence on their perception, restoring balance and steering away from future disruptive failure modes (e.g. continuing to avoid spiders, which provides fertile breeding ground for new dysfunctional beliefs). The artificial nudge helps them confront inaccurate beliefs with first-hand evidence, pushing their behavior out of an attractive local optimum with a little forced exploration. Note that while our understanding of these mechanisms has been largely informed by practical applications in psychopathology specifically, the same mechanisms usually serve us well in maintaining homeostasis, avoiding existential threats, and surviving in general.
The two sides of the bridge are now ready to be connected. In both the RL agent and the human individual, there’s a common pattern of reward signals propagating backwards in time to earlier states. The RL agent might perceive a board configuration as “valuable” due to its proximity to reward, while the human individual might perceive the scent of coffee as “pleasant” due to its proximity to reward. The RL agent might lack the incentive to find a better sequence of chess moves due to the short-term risk of losing out through exploration, while a human individual might naturally lack the incentive to face their fear of spiders due to the predicted risk of serious injury. We force RL agents into exploration through strategies like epsilon-greedy or softmax policies, which occasionally pick seemingly suboptimal actions just for the sake of exploration, while we force phobics into exploration through occasional exposure therapy. Once the agent’s state-value landscape is learned, decisions can be made almost instantaneously by simply reaching for the immediately valuable states, while people are able to rely on their intuitions and gut feelings for quick decisions based on preferences for positive valence. This last point might very well be the evolutionary rationale for the mechanism: being able to make effective decisions in complex environments while keeping it metabolically cheap. This is roughly the somatic marker hypothesis, laid out by Antonio Damasio in his landmark book Descartes’ Error. The book is a long-form argument against Cartesian dualism, framing feelings as rich signals to be leveraged by the embodied, boundedly rational beings that we are. AI might come to be such a being too, if you ask James Lovelock in the context of his book Novacene.
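Both exploration strategies mentioned above fit in a few lines. The sketch below is illustrative, with hypothetical function names and parameters of my choosing: epsilon-greedy picks a uniformly random action with probability `epsilon` and the best-looking one otherwise, while softmax (Boltzmann) selection samples each action with probability proportional to `exp(value / temperature)`, so better-looking actions are favored but none is ever ruled out.

```python
import math
import random


def epsilon_greedy(action_values, epsilon, rng=random):
    """With probability epsilon explore uniformly at random, otherwise exploit the best-valued action."""
    if rng.random() < epsilon:
        return rng.randrange(len(action_values))
    return max(range(len(action_values)), key=lambda i: action_values[i])


def softmax_probabilities(action_values, temperature):
    """Boltzmann distribution over actions: higher values are likelier, but every action keeps some probability."""
    m = max(action_values)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp((v - m) / temperature) for v in action_values]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_action(action_values, temperature, rng=random):
    """Sample one action index according to the softmax probabilities."""
    probs = softmax_probabilities(action_values, temperature)
    return rng.choices(range(len(action_values)), weights=probs, k=1)[0]


# Hypothetical value estimates: action 0 currently looks best.
rng = random.Random(0)
estimates = [1.0, 0.5, 0.2]
picks = [epsilon_greedy(estimates, epsilon=0.1, rng=rng) for _ in range(1000)]
print(picks.count(0) / len(picks))  # roughly 0.93: 0.9 exploitation plus 0.1/3 random picks of action 0
print(softmax_probabilities(estimates, temperature=1.0))
print(softmax_action(estimates, temperature=1.0, rng=rng))
```

Lowering `temperature` (or `epsilon`) makes the agent more exploitative; raising it buys more of the forced exploration that, in the analogy, exposure therapy provides.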
Isn’t it fascinating how eerily similar the value iteration algorithm is to the hedonic shift seen in classical conditioning? I find it quite poetic that top-notch academics, in trying to build intelligent artifacts which act rationally, have stumbled upon a mechanism which lies at the core of our primal physiology – an age-old piece of subcortical circuitry inherited from way up the genealogical tree. Your feelings appear to provide “state-of-the-art” guidance in navigating the world effectively, having been refined incrementally over millions of years. Being in touch with them might very well be an intellectual feat, if that’s what you’re into. When it comes to the strengths and weaknesses of the two instances, the differences almost seem a matter of word preference across schools of thought (e.g. value versus valence, agent versus individual). What’s more, our experience with psychopathologies in clinical settings could inform ways around failure modes in RL, and maybe also the other way around. Maybe addressing Descartes’ alleged error and properly engaging with emotions might even help us avoid some of the grim catastrophic threats posed by AI.