What are useful axes to describe abstraction inductors along?
In general, abstraction inductors can be used to, well, induce particular abstractions into an ML model's conceptual framework. These "tweaks" to the ontology which naturally emerges during training can be split along various axes or spectra.
- narrow to wide scope of desired modification: If one wants to add, remove, or edit a single concept in the ML model's ontology, the modification can be described as having a relatively narrow scope. In contrast, if one wants to disrupt the interrelations between many concepts, the intervention has a relatively wide scope.
- light to heavy load of the modification target: If one wants to modify a concept which is already loaded with connotation and nuance (e.g. "human values"), the modification target is relatively heavy. If one wants to tweak a novel, tabula rasa concept, the target is relatively light (both axes are sketched in code after this list).
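To make the two axes concrete, here is a minimal sketch of how one might represent them in code. It is purely illustrative: the `Scope`, `Load`, and `Intervention` names are hypothetical, not drawn from any established library.

```python
from dataclasses import dataclass
from enum import Enum

class Scope(Enum):
    """Narrow-to-wide axis: how many concepts the intervention touches."""
    NARROW = "a single concept is added, removed, or edited"
    WIDE = "interrelations between many concepts are disrupted"

class Load(Enum):
    """Light-to-heavy axis: how much connotation the target already carries."""
    LIGHT = "a novel, tabula rasa concept"
    HEAVY = "a concept already loaded with nuance (e.g. 'human values')"

@dataclass
class Intervention:
    description: str
    scope: Scope
    load: Load

# Classifying two hypothetical interventions along the two axes:
token_edit = Intervention("bind a fresh token to a set of concepts", Scope.NARROW, Load.LIGHT)
values_rewrite = Intervention("edit what 'human values' refers to", Scope.NARROW, Load.HEAVY)
```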
Using those axes, it becomes easier to describe a given intervention. For instance, consider a narrow edit to a light target (e.g. a new token) which ties it to the conjunction of nice-to-have concepts (e.g. happiness, justice, etc.), before using that token to specify an objective function (sketched below).
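As a toy illustration of that example, the sketch below introduces a fresh token whose embedding is the normalized mean of a few nice-to-have concept embeddings, then scores states by similarity to it. Everything here is a stand-in: the random `emb` table and the `objective` function are hypothetical, and a real intervention would operate on an actual model's embeddings rather than random vectors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy embedding table standing in for a real model's token embeddings.
vocab = ["happiness", "justice", "fairness", "unrelated_a", "unrelated_b"]
emb = {tok: rng.normal(size=16) for tok in vocab}

# Narrow edit to a light target: a new token whose embedding is the
# normalized mean of the nice-to-have concept embeddings.
nice = ["happiness", "justice", "fairness"]
new_vec = np.mean([emb[t] for t in nice], axis=0)
emb["<good>"] = new_vec / np.linalg.norm(new_vec)

def objective(state_vec: np.ndarray) -> float:
    """Score a state by cosine similarity to the induced concept."""
    target = emb["<good>"]
    return float(state_vec @ target / (np.linalg.norm(state_vec) * np.linalg.norm(target)))

# States aligned with the nice-to-have concepts score higher.
print(objective(emb["happiness"]))    # relatively high
print(objective(emb["unrelated_a"]))  # near zero in expectation
```

This is only a caricature of "using it to specify an objective function", but it shows why a light target is convenient: one can build the desired conjunction directly, without fighting pre-existing connotations.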