21.46 YRS

structure and function

Excited to share that Youssef and I published velma, a new project investigating a few technical approaches to inferring a person’s beliefs from the writing they’ve authored, complete with write-up, demo, diagrams, code, etc. Also, I slightly updated my projects page (see themes section) and bio to reflect my latest focus.


The brain is to neural activity what an ML model is to latent activations. The brain itself and the ML model are relatively static, inert, difficult to meaningfully change in a short time. In contrast, neural activity and latent activations are much more fleeting, transient, easy to influence from one moment to the next. If we were to use terms typical of neuroscience, we might describe the brain and ML model as comprising the structural layer, while neural activity and latent activations would make up the functional layer.
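To make the split concrete, here’s a minimal numpy sketch (the two-layer toy network and its sizes are made up for illustration): the weight matrices persist across calls, while the activations exist only for the duration of a single forward pass.

```python
import numpy as np

rng = np.random.default_rng(0)

# Structural layer: weights, created once and relatively static.
W1 = rng.normal(size=(16, 8))
W2 = rng.normal(size=(8, 4))

def forward(x):
    """Functional layer: activations exist only for this call."""
    h = np.tanh(x @ W1)  # transient latent activation
    y = np.tanh(h @ W2)
    return y, h

x = rng.normal(size=(1, 16))
y, h = forward(x)
# h and y vanish once we're done with them; W1 and W2 persist.
```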

The structural layer greatly influences the state of the functional layer over time – its dynamics. Your brain is probably wired to instantly trigger a vaguely repulsive state at the sight of snakes and spiders. Side note: talking of causality gets us into tricky ethical territory, so let’s instead choose the framing of one’s brain heavily predisposing their thought patterns to unfold in a certain way – you still have a moral obligation to behave yourself in the face of primordial impulses. Going back to structure-function interactions, the influence is mutual. Sure, your brain’s wiring might influence its real-time dynamics (e.g. reactions, thought patterns, etc.), but what you’re thinking about right now also changes the brain’s structure. Think Hebb’s rule and the monumental feat of learning. When thinking about function-to-structure influence in terms of dynamics, a sleigh leaving a trail as it passes through snow and priming future rides to follow suit might be a useful image.

Over on the artificial side, an ML model’s parameters (i.e. its “wiring”) heavily determine how processing on a particular input unfolds. The weights are then updated based on the fleeting latent activations, as deemed fit by the optimizer. In reality, this simple forward/backward dichotomy isn’t mirrored well in the brain. There appear to be entire cascades of learning mechanisms operating at different time scales, some making structural changes which are more fleeting than others, effectively recruiting structure into transient computation. The brain can vary the density of receptors in a synapse, the number of synapses between two neurons, etc. In a sense, this means that structure-function is more of a spectrum than a dichotomy, but the simplified model still seems handy as a starter. Hyperparameter tuning and meta-learning paradigms in ML could be considered tiny steps beyond the dichotomy, though.
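The forward/backward dichotomy itself fits in a few lines – here’s a toy linear layer with a hand-derived gradient step (the learning rate, shapes, and mean-squared-error loss are arbitrary choices for illustration), showing how the fleeting activations drive the update to the persistent weights.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(3, 2)) * 0.1  # structure

x = rng.normal(size=(4, 3))  # batch of inputs
t = rng.normal(size=(4, 2))  # targets

# Forward: fleeting functional state.
y = x @ W
loss = ((y - t) ** 2).mean()

# Backward: the transient activations (x, y) drive the structural update.
grad_W = 2 * x.T @ (y - t) / (x.shape[0] * y.shape[1])
W -= 0.1 * grad_W  # function reshapes structure
```

Rerunning the forward pass after the update yields a lower loss – the same input now elicits a different functional state.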

To drive the analogy home, consider neuroimaging paradigms. Functional tools like EEG/ECoG/MEG, fMRI, PET, etc. are typically used to record what happens in the brain in the present moment. EEG/ECoG/MEG pick up faint electric or magnetic echoes of neural activity, fMRI tries to infer where stuff is happening based on where oxygen is being supplied (i.e. “This is primary visual cortex requesting backup, over. Copy that, support is on the way.”), and PET injects a (harmless) radioactive beacon and tracks it through the brain in real time. In contrast, structural tools like MRI or CT are typically used to obtain a static image of the tangible, physical parts of the brain which won’t go anywhere for the time being. This involves the cartography of gray matter and white matter, the overall shape of the brain, etc.

Over on the artificial side, functional neuroimaging techniques can be informally likened to post-hoc transparency tools which generate local explanations (while global explanations become the object of psychology). How does an ML model go about processing this particular input? How does processing unfold as information propagates through deeper layers? Where exactly is a certain computation performed? With AI, structural analogues aren’t as juicy as with the brain, because we usually have a perfect account of what is connected to what, how many neurons and layers there are, etc. Training regimes which also search over architectures might be an exception, and plotting the computational graph could be analogous to conducting an MRI scan. However, you can always investigate the structural connectivity realized by the model’s weights and uncover whole artificial circuits, an approach favored by Olah & co. at Anthropic.
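A crude stand-in for functional imaging on the artificial side might look like this: a toy forward pass that records each layer’s activations for one particular input (the stack of tanh layers and its sizes are made up for illustration).

```python
import numpy as np

rng = np.random.default_rng(2)
layers = [rng.normal(size=(8, 8)) * 0.5 for _ in range(3)]

def forward_with_recording(x):
    """Snapshot every layer's activations, akin to functional imaging."""
    recording = []
    h = x
    for W in layers:
        h = np.tanh(h @ W)
        recording.append(h.copy())  # local snapshot at this "region"
    return h, recording

x = rng.normal(size=(1, 8))
_, trace = forward_with_recording(x)
# trace[i] tells us what layer i "lit up" on for this particular input.
```

In practice, frameworks expose this via hooks on intermediate layers, but the idea is the same: instrument the functional layer without touching the structural one.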

Erratum: Someone who actually knows their stuff around neuroscience mentioned that there’s a subtle distinction between the structural neuroimaging techniques aiming to recover connectivity and the ones trying “merely” to map out regions of the brain containing gray or white matter. “Just by looking at MRI images it’s usually not possible to infer which areas are connected to which (although with stronger magnets we are getting closer https://www.nature.com/articles/d41586-018-07182-7),” argues Jan.

Backtracking to structure-function interactions, we could try to sketch a formalism for it. Try imagining a structural space, where each location is linked to a certain way of wiring the brain. Now, imagine a functional space, where each location is linked to a mental state, and there are as many dimensions as there are neurons in the brain. The functional space is then populated by a field which drives the flow from one thought to the next, a bit like the weather forecast showing a map of winds as currents through the atmosphere. Here, we’re talking thought patterns through functional space instead. Now, let a particular location in structural space be equivalent to a particular field across functional space. Move through structural space, and you redefine movement through functional space. Rewire your brain, and you change the way you react in the moment.

Over on the artificial side, structural space can be identified with model space (e.g. the space of all ways of parameterizing GPT-3). Similarly, the functional space can be identified with latent space, the set of all possible latent activations which could manifest themselves inside a model at inference. Fine-tune your model, and you change the way it responds to inputs at inference.
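The correspondence can be sketched with a toy recurrent map (the dynamics function and sizes are invented for illustration): a point in structural space is a weight matrix, and it defines the field that carries one functional state to the next. Perturb the weights, and the same initial state traces a different trajectory.

```python
import numpy as np

rng = np.random.default_rng(3)

def flow(W, x0, steps=5):
    """A point in structural space (W) defines a field over
    functional space: it maps each state to its successor."""
    x, traj = x0, [x0]
    for _ in range(steps):
        x = np.tanh(W @ x)  # one step of "thought"
        traj.append(x)
    return traj

W_a = rng.normal(size=(4, 4))
W_b = W_a + 0.5 * rng.normal(size=(4, 4))  # "rewired" / fine-tuned

x0 = rng.normal(size=4)
traj_a = flow(W_a, x0)
traj_b = flow(W_b, x0)
# Same initial state, different wiring, different unfolding.
```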

I wanted to touch on a couple of other things, but this article is already getting pretty dense. Expect to see at some point (1) a napkin-sketch explanation of a few psychological phenomena based on the formalism above, and (2) a few possible HCI and BCI paradigms based on dynamics (functional level) and dynamics of dynamics (structural level) as atoms. Those future articles will probably be offspring of this one and: expecting unexpected ideas, breaking frames, dynamical systems online, and dixit kernel functions.