How you apply PRF can matter more than where the feedback comes from: across 13 low-resource retrieval tasks, the feedback model can drive retrieval gains more than the feedback source does. LLM-generated feedback is the cheapest high-return option, unless a high-quality first-stage retriever supplies strong candidate documents, in which case corpus-derived feedback performs better.
Pseudo-relevance feedback (PRF) methods built on large language models (LLMs) can be organized along two key design dimensions: the feedback source, which is where the feedback text is derived from, and the feedback model, which is how the given feedback text is used to refine the query representation. However, the independent role that each dimension plays is unclear, as both are often entangled in empirical evaluations. In this paper, we address this gap by systematically studying, through controlled experimentation, how the choices of feedback source and feedback model affect PRF effectiveness. Across 13 low-resource BEIR tasks with five LLM PRF methods, our results show: (1) the choice of feedback model can play a critical role in PRF effectiveness; (2) feedback derived solely from LLM-generated text provides the most cost-effective solution; and (3) feedback derived from the corpus is most beneficial when utilizing candidate documents from a strong first-stage retriever. Together, our findings provide a better understanding of which elements in the PRF design space are most important.
Summary
Main Finding
When using LLM-based pseudo-relevance feedback (PRF), the choice of feedback model (how feedback is applied) critically affects retrieval effectiveness, and feedback source matters differently depending on context: LLM-generated feedback is the most cost-effective overall, while corpus-derived feedback helps most when candidate documents come from a strong first-stage retriever.
Key Points
- PRF design decomposes into two independent dimensions:
  - Feedback source: where the feedback text comes from (e.g., LLM-generated text vs. text drawn from the corpus).
  - Feedback model: how that feedback text is used to refine the query representation.
- Prior work often conflates these two dimensions; this study isolates them through controlled experiments.
- Across 13 low-resource BEIR tasks and five LLM PRF methods:
  - Feedback model choice can have a larger impact on retrieval quality than feedback source.
  - Purely LLM-generated feedback yields the best cost-effectiveness (good performance for lower cost).
  - Corpus-derived feedback becomes most useful only when the retrieval pipeline already supplies strong candidate documents from a high-quality first-stage retriever.
- The results clarify which elements of the PRF design space are most important to prioritize in practice.
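To make the two-axis decomposition concrete, here is a minimal Python sketch in which a feedback source (a function from a query to feedback texts) and a feedback model (a function from the query plus feedback to a weighted, refined query) are composed independently. Everything here is illustrative: `make_corpus_source`, `make_llm_source`, `interpolated_expansion`, and `prf` are hypothetical names, the LLM and first-stage retriever are stubs, and the feedback model is a simplified RM3-style term interpolation rather than any of the five methods the paper evaluates.

```python
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical signatures for the two PRF design dimensions.
FeedbackSource = Callable[[str], List[str]]                   # query -> feedback texts
FeedbackModel = Callable[[str, List[str]], Dict[str, float]]  # (query, feedback) -> term weights


def tokenize(text: str) -> List[str]:
    """Naive lowercase whitespace tokenizer (placeholder; no stemming or stopwords)."""
    return text.lower().split()


def make_corpus_source(first_stage: Callable[[str], List[str]], k: int = 3) -> FeedbackSource:
    """Corpus-derived feedback: the top-k documents from a first-stage retriever."""
    def source(query: str) -> List[str]:
        return first_stage(query)[:k]
    return source


def make_llm_source(generate: Callable[[str], str]) -> FeedbackSource:
    """LLM-generated feedback: text produced for the query (`generate` stands in for an LLM call)."""
    def source(query: str) -> List[str]:
        return [generate(query)]
    return source


def interpolated_expansion(query: str, feedback: List[str],
                           num_terms: int = 5, alpha: float = 0.5) -> Dict[str, float]:
    """Simplified RM3-style feedback model: interpolate the query's term
    distribution with the distribution of the top feedback terms."""
    q_counts = Counter(tokenize(query))
    q_total = sum(q_counts.values())
    top = dict(Counter(t for text in feedback for t in tokenize(text)).most_common(num_terms))
    fb_total = sum(top.values())
    weights: Dict[str, float] = {}
    for term, count in q_counts.items():
        weights[term] = alpha * count / q_total
    for term, count in top.items():
        weights[term] = weights.get(term, 0.0) + (1 - alpha) * count / fb_total
    return weights


def prf(query: str, source: FeedbackSource, model: FeedbackModel) -> Dict[str, float]:
    """Compose the two dimensions: the source picks the text, the model decides how to use it."""
    return model(query, source(query))


# Toy usage: the same feedback model applied to two different feedback sources.
docs = ["solar panel efficiency in cold climates", "panel installation cost", "wind turbine noise"]
first_stage = lambda q: docs                                              # stub ranking
fake_llm = lambda q: "solar panel output drops slightly in cold weather"  # stub LLM

corpus_query = prf("solar panel", make_corpus_source(first_stage, k=2), interpolated_expansion)
llm_query = prf("solar panel", make_llm_source(fake_llm), interpolated_expansion)
```

Because the source and the model are passed in separately, either axis can be swapped while the other is held fixed, which is the kind of control the study relies on.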
Data & Methods
- Tasks: 13 low-resource retrieval tasks from the BEIR benchmark suite.
- Methods: Evaluation of five LLM-based PRF methods, systematically varying:
  - Feedback source (LLM-generated text vs. corpus-derived text).
  - Feedback model (the mechanism that incorporates feedback into query refinement).
- Experimental design: Controlled experiments that disentangle the independent effects of source and model, and that examine performance under differing strengths of the first-stage retriever.
- Metrics and costs: Effectiveness measured by standard retrieval metrics (as is typical in BEIR studies); cost-effectiveness assessed by weighing LLM invocation cost against retrieval gains.
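The controlled design can be sketched as a full grid over feedback sources and feedback models, scored per task and then averaged. This sketch assumes nDCG@10 (the metric BEIR conventionally reports) computed with linear gains; the summary above does not name the exact metric, and some toolkits use exponential gains instead. `run` is a hypothetical callable standing in for a full retrieval run on one task, and `main_effects` averages over the other axis to compare the marginal effect of each source and each model.

```python
import math
from statistics import mean
from typing import Callable, Dict, Iterable, Sequence, Tuple


def ndcg_at_k(ranking: Sequence[str], qrels: Dict[str, int], k: int = 10) -> float:
    """nDCG@k with linear gains and a log2(rank + 1) discount, ranks starting at 1."""
    dcg = sum(qrels.get(doc, 0) / math.log2(i + 2) for i, doc in enumerate(ranking[:k]))
    ideal = sorted(qrels.values(), reverse=True)[:k]
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


def grid_scores(tasks: Iterable[str], sources: Iterable[str], models: Iterable[str],
                run: Callable[[str, str, str], float]) -> Dict[Tuple[str, str], float]:
    """Controlled design: every (source, model) pair, averaged over the same task set.
    `run(task, source, model)` is a hypothetical callable returning one task-level score."""
    tasks, models = list(tasks), list(models)
    return {(s, m): mean(run(t, s, m) for t in tasks) for s in sources for m in models}


def main_effects(scores: Dict[Tuple[str, str], float]):
    """Average over the other axis to get each source's and each model's marginal score."""
    sources = sorted({s for s, _ in scores})
    models = sorted({m for _, m in scores})
    by_source = {s: mean(scores[(s, m)] for m in models) for s in sources}
    by_model = {m: mean(scores[(s, m)] for s in sources) for m in models}
    return by_source, by_model
```

Comparing the spread of `by_model` with the spread of `by_source` gives a rough read on which axis matters more for a given set of runs; a real analysis would also look at per-task variation rather than only the means.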
Implications for AI Economics
- Cost allocation: Organizations should consider LLM-generated feedback as a high-return, lower-cost PRF option for low-resource retrieval tasks, which can reduce expenses tied to corpus annotation or expensive retrieval pipelines.
- Investment priorities: Greater ROI may come from investing in better feedback models (how feedback is used) than from collecting richer feedback sources alone, since improving the feedback-model component can yield larger performance gains.
- System design trade-offs: If investing in a strong first-stage retriever is feasible, augmenting it with corpus-derived feedback can further improve outcomes; otherwise, LLM-generated feedback is the more economical default.
- Adoption strategy: Firms and platforms deploying retrieval-augmented systems should weigh the marginal benefit per dollar of stronger retrievers against that of more sophisticated feedback models or additional LLM calls when designing retrieval stacks.
- Policy and accessibility: Cost-effective LLM-generated PRF lowers the barrier to building competitive retrieval systems in low-resource domains, which can democratize access to advanced search tools across smaller organizations and research groups.
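The "marginal benefit per dollar" comparison above can be expressed as a ratio of effectiveness gain to added cost, both measured against a no-PRF baseline. This is a minimal sketch under stated assumptions: the `Option` record, its fields, and the ranking rule are hypothetical; costs are per-query additions over the baseline; and options with zero added cost are left out because the ratio is undefined for them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Option:
    """A hypothetical PRF configuration with its measured score and added cost."""
    name: str
    score: float   # retrieval effectiveness, e.g. mean nDCG@10 across tasks
    cost: float    # added cost per query relative to the no-PRF baseline


def gain_per_dollar(option: Option, baseline_score: float) -> Optional[float]:
    """Effectiveness gained per unit of added cost; None when no cost is added."""
    if option.cost <= 0:
        return None
    return (option.score - baseline_score) / option.cost


def rank_by_efficiency(options: List[Option], baseline_score: float) -> List[str]:
    """Names of the priced options, most gain per unit cost first."""
    priced = [o for o in options if o.cost > 0]
    priced.sort(key=lambda o: gain_per_dollar(o, baseline_score), reverse=True)
    return [o.name for o in priced]


# Placeholder numbers for illustration only; they are not results from the study.
options = [
    Option("option_a", score=0.50, cost=2.0),
    Option("option_b", score=0.46, cost=0.5),
]
print(rank_by_efficiency(options, baseline_score=0.40))  # ['option_b', 'option_a']
```

The ratio favors cheap options with moderate gains over expensive options with larger gains, so in practice it is worth checking the absolute gain as well before choosing a configuration.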
Assessment
Claims (10)
| Claim | Category | Direction | Confidence | Outcome | Details |
|---|---|---|---|---|---|
| PRF design decomposes into two independent dimensions: feedback source (where feedback text comes from) and feedback model (how that feedback is used to refine the query). | Other | positive | high | PRF design components (feedback source vs. feedback model) | 0.12 |
| Prior work often conflates feedback source and feedback model; this study isolates them through controlled experiments. | Other | negative | medium | Degree to which prior studies separate PRF design dimensions (methodological assessment) | 0.07 |
| Feedback model choice can have a larger impact on retrieval quality than feedback source. | Output Quality | positive | medium | Retrieval effectiveness (standard BEIR retrieval metrics) | n=13; 0.07 |
| Purely LLM-generated feedback yields the best cost-effectiveness overall (best performance per unit LLM invocation cost) for low-resource retrieval tasks. | Organizational Efficiency | positive | medium | Cost-effectiveness (retrieval gains per LLM invocation cost) | n=13; 0.07 |
| Corpus-derived feedback becomes most useful only when the retrieval pipeline already supplies strong candidate documents from a high-quality first-stage retriever. | Output Quality | mixed | medium | Retrieval effectiveness conditional on first-stage retriever quality | n=13; 0.07 |
| Across 13 low-resource BEIR tasks and five LLM PRF methods, the choice of feedback model (how feedback is applied) critically affects retrieval effectiveness. | Output Quality | positive | medium | Retrieval effectiveness (standard BEIR metrics) | n=13; 0.07 |
| The study's results clarify which elements of the PRF design space are most important to prioritize in practice (i.e., prioritize feedback-model improvements over source collection in many low-resource settings). | Output Quality | positive | medium | Relative impact on retrieval performance and cost-effectiveness | n=13; 0.07 |
| Organizations should consider LLM-generated feedback as a high-return, lower-cost PRF option for low-resource retrieval tasks to reduce expenses tied to corpus annotation or expensive retrieval pipelines. | Organizational Efficiency | positive | low | Economic metric: return (retrieval gains) per dollar spent on LLM invocations or corpus annotation | n=13; 0.04 |
| Greater ROI may come from investing in better feedback models (how to use feedback) than solely collecting richer feedback sources. | Organizational Efficiency | positive | medium | Return on investment (performance improvement per resource invested in model vs. source) | n=13; 0.07 |
| If investing in a strong first-stage retriever is feasible, augmenting it with corpus-derived feedback can further improve outcomes; otherwise, LLM-generated feedback is the more economical default. | Output Quality | mixed | medium | Retrieval effectiveness and cost-effectiveness conditional on first-stage retriever strength | n=13; 0.07 |