A probe-then-plan LLM system deployed at JD.com lifts relevant recall and conversion, producing measurable increases in gross merchandise value; lightweight retrieval probes let the planner ground decisions in live inventory without incurring prohibitive latency.
Modern e-commerce search is evolving to resolve complex user intents. While Large Language Models (LLMs) offer strong reasoning, existing LLM-based paradigms face a fundamental blindness-latency dilemma: query rewriting is agnostic to retrieval capabilities and real-time inventory, yielding invalid plans; conversely, deep search agents rely on iterative tool calls and reflection, incurring seconds of latency incompatible with industrial sub-second budgets. To resolve this conflict, we propose Environment-Aware Search Planning (EASP), reformulating search planning as a dynamic reasoning process grounded in environmental reality. EASP introduces a Probe-then-Plan mechanism: a lightweight Retrieval Probe exposes the retrieval snapshot, enabling the Planner to diagnose execution gaps and generate grounded search plans. The methodology comprises three stages: (1) Offline Data Synthesis: A Teacher Agent synthesizes diverse, execution-validated plans by diagnosing the probed environment. (2) Planner Training and Alignment: The Planner is initialized via Supervised Fine-Tuning (SFT) to internalize diagnostic capabilities, then aligned with business outcomes (conversion rate) via Reinforcement Learning (RL). (3) Adaptive Online Serving: A complexity-aware routing mechanism selectively activates planning for complex queries, ensuring optimal resource allocation. Extensive offline evaluations and online A/B testing on JD.com demonstrate that EASP significantly improves relevant recall and achieves substantial lifts in UCVR and GMV. EASP has been successfully deployed in JD.com's AI-Search system.
Summary
Main Finding
Environment-Aware Search Planning (EASP) resolves the blindness-latency dilemma in LLM-based e-commerce search by grounding planning in a real-time retrieval snapshot. Using a lightweight "Probe-then-Plan" workflow, EASP produces execution-valid search plans without the multi-second latency of iterative agent loops, which keeps it compatible with industrial sub-second budgets. Offline evaluations and online A/B testing at JD.com show significantly higher relevant recall and substantial lifts in user conversion rate (UCVR) and gross merchandise value (GMV). EASP has been deployed in JD.com's production AI-Search system.
Key Points
- Problem: Existing LLM paradigms face a blindness-latency tradeoff.
  - Query rewriting is blind to retrieval capabilities and live inventory, producing plans that cannot be executed.
  - Deep search agents that call retrieval tools iteratively add seconds of latency, which is unacceptable under industrial sub-second SLAs.
- Solution: Environment-Aware Search Planning (EASP)
  - Reformulates search planning as dynamic reasoning grounded in an explicitly probed environment.
  - Probe-then-Plan mechanism:
    - Retrieval Probe: a lightweight call that returns a retrieval snapshot (what the current retrieval stack and inventory would yield).
    - Planner: consumes that snapshot, diagnoses execution gaps, and outputs grounded, executable search plans.
- System lifecycle (three stages):
  - Offline Data Synthesis: A Teacher Agent diagnoses probed environments and synthesizes diverse, execution-validated plans to build training data.
  - Planner Training & Alignment: The Planner is first trained by supervised fine-tuning (SFT) to learn diagnostic planning, then aligned to business objectives (conversion) via reinforcement learning.
  - Adaptive Online Serving: Complexity-aware routing invokes the Planner only for complex queries to balance effectiveness against latency and cost.
- Production impact: Improves relevant recall and drives higher conversion and GMV in A/B tests; successfully integrated into JD.com’s AI-Search pipeline.
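The Probe-then-Plan flow and the routing step described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy `INVENTORY`, the term-matching `retrieval_probe`, the word-count router `is_complex`, and the rule-based `plan` function are hypothetical stand-ins, not EASP's actual retrieval stack, routing model, or LLM Planner.

```python
from dataclasses import dataclass, field

# Hypothetical toy inventory standing in for a live index; every item is made up.
INVENTORY = [
    {"title": "waterproof hiking boots", "in_stock": True},
    {"title": "waterproof trail running shoes", "in_stock": False},
    {"title": "leather hiking boots", "in_stock": True},
    {"title": "kids rain boots", "in_stock": True},
]


@dataclass
class Snapshot:
    """What the probe saw: in-stock hit counts for the query and its terms."""
    query: str
    hits_per_term: dict = field(default_factory=dict)
    full_match_count: int = 0


def retrieval_probe(query: str) -> Snapshot:
    """One cheap pass over the index: no LLM call and no iteration."""
    terms = query.lower().split()
    live = [item for item in INVENTORY if item["in_stock"]]
    snap = Snapshot(query=query)
    for term in terms:
        snap.hits_per_term[term] = sum(term in item["title"].split() for item in live)
    snap.full_match_count = sum(
        all(t in item["title"].split() for t in terms) for item in live
    )
    return snap


def is_complex(query: str, max_simple_terms: int = 2) -> bool:
    """Stand-in for complexity-aware routing: longer queries count as complex."""
    return len(query.split()) > max_simple_terms


def plan(snapshot: Snapshot) -> list[str]:
    """Rule-based stand-in for the Planner: drop terms the environment cannot satisfy."""
    if snapshot.full_match_count > 0:
        return [snapshot.query]  # the original query already executes
    kept = [term for term, hits in snapshot.hits_per_term.items() if hits > 0]
    return [" ".join(kept)] if kept else []


def serve(query: str) -> list[str]:
    """Pass simple queries straight through; probe, then plan, for complex ones."""
    if not is_complex(query):
        return [query]
    return plan(retrieval_probe(query))
```

With this toy inventory, `serve("waterproof trail boots")` drops `trail`, because the only trail item is out of stock, and returns `["waterproof boots"]`. A blind rewriter would have no way to know that the `trail` constraint is unsatisfiable right now, which is the gap the probe is meant to close.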
Data & Methods
- Inputs and signals:
  - Lightweight retrieval probe snapshots capturing what the retrieval system would return given the live index and inventory constraints.
  - Query logs and contextual signals used to estimate query complexity and make routing decisions.
- Offline data synthesis:
  - A Teacher Agent inspects probe snapshots and generates multiple candidate, execution-validated search plans. These plans reflect real retrieval constraints to avoid infeasible instructions.
- Model training:
  - Supervised Fine-Tuning (SFT) on teacher-generated, execution-validated plans to teach diagnostic reasoning and grounded plan generation.
  - Reinforcement Learning (RL) to align Planner outputs with business metrics, using conversion rate (UCVR) as the objective signal.
- Serving architecture:
  - The probe is implemented as a lightweight retrieval call to minimize added latency.
  - A complexity-aware routing mechanism decides which queries warrant full planning; simpler queries bypass the Planner, which preserves sub-second response times.
- Evaluation:
  - Offline metrics: relevant recall and execution validity of generated plans.
  - Online metrics: UCVR (user conversion rate) and GMV measured via A/B testing on the JD.com platform.
- Deployment constraints:
  - The system must respect stringent latency budgets; selective invocation and a minimal probe footprint make production deployment viable.
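The two offline metrics named above can be made concrete with a small sketch. The definitions here are assumptions, since this summary does not give the paper's exact formulas: relevant recall is taken as the fraction of labeled relevant items that a plan retrieves, and a plan counts as execution-valid when it is non-empty and every sub-query returns at least one item. `TOY_INDEX` and `toy_execute` are hypothetical.

```python
from typing import Callable

# Hypothetical query-to-results index used only to exercise the metrics.
TOY_INDEX = {
    "waterproof boots": ["sku1", "sku2"],
    "hiking boots": ["sku2", "sku3"],
}


def toy_execute(query: str) -> list[str]:
    """Return the item ids a query retrieves (empty list when nothing matches)."""
    return TOY_INDEX.get(query, [])


def relevant_recall(retrieved: list[str], relevant: set[str]) -> float:
    """Fraction of the relevant items that appear among the retrieved items."""
    if not relevant:
        return 0.0
    return len(set(retrieved) & relevant) / len(relevant)


def execution_validity(
    plans: list[list[str]], execute: Callable[[str], list[str]]
) -> float:
    """Share of plans that are non-empty and whose every sub-query returns items."""
    if not plans:
        return 0.0
    valid = sum(bool(p) and all(execute(q) for q in p) for p in plans)
    return valid / len(plans)
```

For example, `relevant_recall(toy_execute("waterproof boots"), {"sku2", "sku3"})` gives 0.5, because only `sku2` of the two relevant items is retrieved.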
Implications for AI Economics
- Platform-level revenue and efficiency:
  - Environment-aware planning improves matching quality, increasing conversion and GMV, which directly affects platform revenue and marketplace liquidity.
  - Reducing invalid or infeasible plans decreases wasted impressions and improves the return on ranking/retrieval computation.
- Cost–benefit and resource allocation:
  - Complexity-aware routing demonstrates that selective use of expensive AI planning can yield high marginal returns while keeping operational costs and latency low.
  - The Probe-then-Plan paradigm pays a small upfront retrieval cost to prevent larger downstream inefficiencies.
- Incentives and market dynamics:
  - More execution-valid search results change seller exposure and demand patterns; this can shift competitive dynamics and advertising auction equilibria.
  - Aligning planners with conversion metrics via RL optimizes for platform objectives but raises questions about externalities (e.g., promotion bias toward higher-margin items).
- Generalizability and policy considerations:
  - The EASP approach is applicable to other marketplaces and real-time decision systems that require grounding plans in live environments (inventory, bids, constraints).
  - Economic evaluation should include second-order effects (seller strategy, fairness, search neutrality) and measure consumer surplus, not just platform GMV.
- Research takeaway:
  - Incorporating environment probes into reasoning pipelines is a high-leverage design pattern: it reduces model blind spots at low latency cost and enables principled alignment of model behavior with economic objectives.
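The cost-benefit argument for selective routing can be worked through as simple arithmetic. The sketch below is illustrative only: the `RoutingScenario` class and every number passed to it are hypothetical inputs chosen to show the calculation, not measurements reported for EASP.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoutingScenario:
    """All fields are hypothetical inputs, not figures from the paper."""
    share_routed: float   # fraction of queries sent to the Planner
    probe_ms: float       # latency of one retrieval probe
    planner_ms: float     # latency of one Planner call
    cost_per_plan: float  # compute cost of probing and planning one query
    gain_per_plan: float  # expected extra revenue from one planned query

    def mean_added_latency_ms(self) -> float:
        """Average latency added per query, spread over all traffic."""
        return self.share_routed * (self.probe_ms + self.planner_ms)

    def net_gain_per_query(self) -> float:
        """Expected revenue gain minus compute cost, averaged over all traffic."""
        return self.share_routed * (self.gain_per_plan - self.cost_per_plan)


# Selective routing: only complex queries are planned, where the gain is assumed larger.
selective = RoutingScenario(
    share_routed=0.2, probe_ms=20.0, planner_ms=300.0,
    cost_per_plan=0.004, gain_per_plan=0.010,
)

# Always-on planning: every query pays the cost, and the average gain is assumed smaller.
always_on = RoutingScenario(
    share_routed=1.0, probe_ms=20.0, planner_ms=300.0,
    cost_per_plan=0.004, gain_per_plan=0.003,
)
```

Under these made-up inputs, selective routing adds about 64 ms per query on average against 320 ms for always-on planning, and its net gain per query is positive while the always-on figure is negative. The conclusion depends entirely on the assumed gains and costs, so the sketch shows the structure of the tradeoff, not its size.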
Assessment
Claims (9)
| Claim | Direction | Confidence | Outcome | Details |
|---|---|---|---|---|
| Environment-Aware Search Planning (EASP) resolves the blindness-latency dilemma in LLM-based e-commerce search by grounding planning in the real retrieval environment via a Probe-then-Plan mechanism. [Other] | positive | medium | ability to produce environment-grounded search plans that address execution gaps (qualitative/behavioral outcome) | 0.36 |
| The Probe-then-Plan mechanism uses a lightweight Retrieval Probe to expose the retrieval snapshot, enabling the Planner to diagnose execution gaps and generate grounded search plans. [Other] | null_result | high | retrieval snapshot exposure and Planner diagnostic output (implementation/functional outcome) | 0.6 |
| EASP's Offline Data Synthesis stage: a Teacher Agent synthesizes diverse, execution-validated plans by diagnosing the probed environment. [Other] | null_result | high | synthesized execution-validated search plans (data generation outcome) | 0.6 |
| The Planner is trained via Supervised Fine-Tuning (SFT) to internalize diagnostic capabilities and then aligned with business outcomes (conversion rate) via Reinforcement Learning (RL). [Other] | null_result | high | Planner diagnostic behavior and policy alignment with conversion rate (model training outcome) | 0.6 |
| A complexity-aware routing mechanism selectively activates planning for complex queries, ensuring optimal resource allocation during online serving. [Other] | null_result | medium | selective activation of planning (system routing/resource allocation outcome) | 0.36 |
| Extensive offline evaluations and online A/B testing on JD.com show that EASP significantly improves relevant recall. [Output Quality] | positive | medium | relevant recall (retrieval effectiveness metric) | 0.36 |
| Online A/B testing on JD.com demonstrates that EASP achieves substantial lifts in UCVR (user conversion rate) and GMV (gross merchandise value). [Firm Revenue] | positive | medium | UCVR (user conversion rate) and GMV (gross merchandise value) | 0.36 |
| EASP has been successfully deployed in JD.com's AI-Search system. [Adoption Rate] | positive | medium | deployment status in production (operational outcome) | 0.36 |
| EASP offers a practical tradeoff between reasoning quality and latency by avoiding iterative LLM tool-calls at inference time while still producing grounded plans. [Task Completion Time] | positive | medium | inference latency vs. reasoning/plan validity tradeoff (system performance outcome) | 0.36 |