Instrumentality

And so I take a step back.

Reducing moral uncertainty is instrumental to doing the right thing. After all, it is the process of getting to know what that thing is, and so of making it far more likely to be done. The alternative? Going all in on the few pockets of the moral realism roulette whose names we’ve picked up over the course of our civilizational childhood. I wouldn’t take this bet.

And so I take a step back: how could one reduce uncertainty about an immaterial object such as morality? That’s the focus of the first volume of Elements of Computational Philosophy. In brief, the book pursues an operationalization of metaphysical truth-seeking which provably boils down to a particular computable function. Unfortunately, this line of work is nowhere near the point of providing a complete and reflexively robust operationalization of truth. Exciting progress has been made on one-tenth of the puzzle, yes, but there is so much to be done.

And so I take a step back: how could one make progress on such an operationalization? A while back, I saw four approaches.

First, I could continue with independent research, coordinating with external collaborators in favorable contexts. While the academic freedom would be second to none, this approach would lack the organizational capacity required for genuinely tackling such questions.

Second, I could work my way up through organizations focused on related challenges, with technical alignment research being especially close in ambition. While the capacity would be second to none, this approach would mean defaulting to particular counterproductive frames, leaving limited effective academic freedom to anyone but senior researchers.

Third, I could start a non-profit, again. While this would seemingly restore the autonomy of independent research, capacity would still hinge on legible compatibility with particular counterproductive frames when it comes to fundraising.

Fourth, I could start a for-profit. The key appeal here is that the output responsible for producing legible value need not be exactly the same as the output responsible for advancing this line of work, as it would have to be in a non-profit. There is some degree of freedom to pursue in-house research which is not directly productizable. However, this path brings its own challenges around simultaneously aligning the product with market needs and with the other goal of baking morality into autonomous systems. That said, even if navigating these challenges in the highly volatile market that is AI is non-trivial, the potential upside makes the approach seem promising on net.

And so I take a step back: how could one make work on quantifying specific properties of AI systems commercially sustainable? There seem to be multiple such properties whose attested measurement and management are of near-future interest. This brings us to a theory of change for a commercial project focused on offerings related to tracking and sculpting properties of deployed models.

First, there is a shared focus on reliable evaluation infrastructure. If evaluations are abstracted as maps from model parameters, activations, gradients, or input-output samples to numerical values of interest, then different evaluations start to have a lot in common. You want measurements with test-retest reliability, guarantees on bounds, efficiency and scalability, results attestation, etc. You want reliable measurements of AI systems, in general. An automated proof of the moral defensibility of a model could end up having a lot in common, conceptually, with one about its adversarial robustness.
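
To make the abstraction concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the `Evaluation` alias, the `Measurement` record, and the `measure` helper are stand-ins for whatever interface such infrastructure might actually expose, with test-retest reliability surfaced alongside the point estimate.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Callable, Sequence

# An evaluation, abstracted as a map from input-output samples of a model
# to a single numerical value of interest. (Parameter-, activation-, or
# gradient-based evaluations would just swap out the domain type.)
Sample = tuple[str, str]                      # (prompt, model response)
Evaluation = Callable[[Sequence[Sample]], float]

@dataclass
class Measurement:
    value: float    # point estimate of the property being measured
    spread: float   # dispersion across repeated runs (test-retest reliability)
    runs: int       # number of repetitions behind the estimate

def measure(evaluation: Evaluation,
            draw_samples: Callable[[], Sequence[Sample]],
            runs: int = 5) -> Measurement:
    """Repeat an evaluation over fresh samples so that test-retest
    reliability is reported alongside the point estimate itself."""
    scores = [evaluation(draw_samples()) for _ in range(runs)]
    return Measurement(value=mean(scores), spread=pstdev(scores), runs=runs)

# The same scaffolding could back an adversarial-robustness score or a
# moral-defensibility score; only the evaluation function changes.
refusal_rate: Evaluation = lambda samples: mean(
    1.0 if "cannot help" in response else 0.0 for _, response in samples
)
```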

Second, there is a shared focus on the distribution of measurement instruments. This involves everything from precedents for coordinating with institutions and AI labs, all the way to having artifacts readily available across the cloud hyperscalers. In other words, meaningful operationalizations need to be made available if they are to end up shaping the developmental psychology of AI systems. Imagine authorities having the capacity for widespread attestation of moral defensibility in friendshored models.
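
As a toy illustration of what attesting a measurement could look like, continuing the sketch above, the snippet below tags a serialized measurement record with an HMAC so that a party holding the shared key can check it was not altered. The scheme is purely hypothetical; a real deployment would presumably involve asymmetric signatures, key management, and institutional trust anchors.

```python
import hashlib
import hmac
import json

def attest(record: dict, key: bytes) -> dict:
    """Serialize a measurement record canonically and tag it with an HMAC."""
    payload = json.dumps(record, sort_keys=True)
    tag = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    return {"payload": payload, "attestation": tag}

def verify(attested: dict, key: bytes) -> bool:
    """Check that an attested record still matches its tag."""
    expected = hmac.new(key, attested["payload"].encode(),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, attested["attestation"])

# Toy usage: a downstream integrator re-checks a published robustness score.
record = {"property": "adversarial_robustness", "value": 0.87, "runs": 5}
signed = attest(record, key=b"shared-secret")
assert verify(signed, key=b"shared-secret")
```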

Third, revenue from legibly useful measurements and auxiliary infrastructure, consumed by AI labs, third-party integrators, institutions, and others, could help fund research into these more charged operationalizations. That said, the tractability and exact shape of the product and market are beyond the scope of this note.

Fourth, even if this direction turns out not to be what ought to be done, there are a number of side effects that might still make it worthwhile. For instance, the management of dual-use capabilities might help mitigate misuse risks. In other words, this course of action appears reasonably defensible even when the prioritization of what needs to be done is deferred to global priorities research.

There are many other branches left behind in this backward chaining. For instance, effective communication about this line of work might also be high-leverage, as might engaging more actively with proponents of other frames. Alternatively, one might argue that another for-profit angle aligned with reducing moral uncertainty could be automated reasoning more broadly, and so on. Branches need to be pruned at some point, though, for better or worse.