It seems sensible that nudging knowledge base expansion towards "relevant" regions (i.e. regions on the topic of the verification target) would help alleviate a combinatorial explosion in knowledge base size. However, what is the upper bound on how much expansion heuristics can help?
Text is cheap to store, and actions might be too, so the knowledge base itself could grow relatively large (e.g. billions of items) without major technical issues. Semantic search using bi-encoders scales well, assuming cached embeddings, thanks to efficient approximate nearest neighbor algorithms. The main bottleneck appears to be the pairwise operations between the relevant knowledge base items and the verification target. Additionally, each expansion step is relatively costly (e.g. it requires LLM calls).
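To make the cost asymmetry concrete, here is a minimal sketch: retrieval over cached embeddings is one cheap vector operation (brute-force below; an ANN index would replace it at billion-item scale), while the pairwise entailment check costs one model call per retrieved item. All names and shapes (`kb_text`, `llm_entails`, the dimensions) are illustrative assumptions, not a real API.

```python
import numpy as np

# Cached, unit-normalized knowledge base embeddings, built offline.
# Scaled down here; the real store could hold billions of items.
kb_embeddings = np.random.randn(100_000, 384).astype(np.float32)
kb_embeddings /= np.linalg.norm(kb_embeddings, axis=1, keepdims=True)

def kb_text(i: int) -> str:
    return f"knowledge base item {i}"  # stand-in for a real item lookup

def llm_entails(premise: str, hypothesis: str) -> bool:
    return False  # stand-in for an LLM / cross-encoder entailment call

def retrieve(target_embedding: np.ndarray, k: int = 100) -> np.ndarray:
    """Cheap step: one matrix-vector product over cached embeddings
    (or an approximate nearest neighbor lookup at scale)."""
    scores = kb_embeddings @ target_embedding
    return np.argsort(-scores)[:k]

def check_entailment(target_text: str, candidate_ids: np.ndarray) -> list[bool]:
    """Expensive step: one model call per (item, target) pair.
    Cost grows linearly in k no matter how fast retrieval is."""
    return [llm_entails(kb_text(int(i)), target_text) for i in candidate_ids]
```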
Expansion heuristics might help direct compute towards relevant regions during expansion steps, but their usefulness in the entailment step itself might be limited: once candidates are retrieved, each pair still has to be checked. On a related note, knowledge base expansion appears quite parallelizable, whether by focusing on different patches at the same time or by running different self-play debate runs concurrently.
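A minimal sketch of that parallelism claim, assuming expansion steps over disjoint patches (or independent debate runs) share no state; `expand_patch` is a hypothetical stand-in for one costly LLM-backed expansion step:

```python
from concurrent.futures import ThreadPoolExecutor

def expand_patch(patch_id: str) -> list[str]:
    """Stand-in for one costly expansion step (e.g. an LLM call) that
    derives new items from one region ("patch") of the knowledge base."""
    return [f"new item derived from {patch_id}"]

# Independent regions or self-play debate runs to expand concurrently.
patches = ["patch-a", "patch-b", "patch-c"]

# Threads suit I/O-bound LLM calls; steps over disjoint patches share no
# state, so merging results is a simple union (deduplication aside).
with ThreadPoolExecutor(max_workers=8) as pool:
    new_items = [item for batch in pool.map(expand_patch, patches) for item in batch]
```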