A system called Echo recycles users' corrections into training data and lifts code-completion acceptance rates from 25.7% to 35.7% in production, showing that continuous learning from real interaction logs can materially improve AI performance. The finding suggests a scalable path to break static performance ceilings where users routinely refine agent outputs.

Echo: Learning from Experience Data via User-Driven Refinement

Hande Dong, Xiaoyun Liang, Jiarui Yu, Jiayi Lin, Changqing Ai, Feng Liu, Wenjun Zhang, Rongbi Wei, Chaofan Zhu, Linjie Che, Feng Wu, Xin Shen, Dexu Kong, Xiaotian Wang, Qiuyuan Chen, Bingxu An, Yueting Lei, Qiang Lin · May 21, 2026

arXiv · quasi-experimental · medium evidence · 7/10 relevance

Echo converts users' refinement sequences from agent interactions into training signals, and in production raises code-completion acceptance rates from 25.7% to 35.7%.

Static "human data" faces inherent limitations: it is expensive to scale and bounded by the knowledge of its creators. Continuous learning from "experience data" - interactions between agents and their environments - promises to transcend these barriers. Today, the widespread deployment of AI agents grants us low-cost access to massive streams of such real-world experience. However, raw interaction logs are inherently noisy, filled with trial-and-error and low information density, rendering them inefficient for direct model training. We introduce Echo, a generalized framework designed to operationalize the transition from raw experience to learnable knowledge, effectively "echoing" environmental feedback back into the training loop for model optimization. In today's agent ecosystem, user refinement serves as a primary source of such feedback: driven by responsibility for the outcome, users rigorously transform flawed agent proposals into verified solutions. These user-driven refinement sequences inherently distill agents' crude attempts into high-quality training signals. Echo systematically harvests these signals to continuously align the agent with real-world needs. Large-scale validation in a production code completion environment confirms that Echo effectively harnesses this pipeline, breaking the static performance ceiling by increasing the acceptance rate from 25.7% to 35.7%.

Summary

Main Finding

Echo is a practical, environment-agnostic framework that converts noisy agent interaction logs into high-fidelity training targets by harvesting user-driven refinements. Implemented in a production code-completion service at Tencent, Echo increased the online acceptance rate from 25.7% to 35.7% (+10 percentage points absolute). This demonstrates that mining users’ corrective edits yields dense, scalable supervision that breaks the static-data performance ceiling.

Key Points

Conceptual shift: move from static, expensive human datasets to continuous "experience data" (agent–environment interactions), with user-driven refinement as the primary mechanism that embeds missing, task-specific knowledge into final artifacts.
Three pillars:
- Experience acquisition: capture sequential state → agent proposal → user edits from deployed services.
- Knowledge extraction: identify the final committed state after user edits (C_N) as the verified ground truth for the original context (C_0).
- Model optimization: train the agent to produce C_N directly from C_0, minimizing dist(Agent_θ(C_0), C_N).
Domain-agnostic: although demonstrated in code completion, the C_0 → C_1 → C_N pattern applies to many agent workflows where accountable stakeholders finalize outcomes.
Pipeline highlights (code completion):
- Continuous request / lifecycle tracking using static anchors (prefix/suffix) to define the completion gap and monitor when the gap is filled or broken.
- Gap-based extraction: when anchors break, extract the content between them as the commit (C_N).
- Intent-aligned truncation: an instruction-following Teacher LLM trims or segments C_N to the atomic unit corresponding to the original trigger. It changes only scope and readability, never logic.
- LLM-based verifiers and syntactic/quality filters (boundary checks, PII redaction, syntactic correctness, perplexity filters) denoise the data and guard against training-data poisoning.
- Data-proportioning and lifecycle monitoring to maintain useful length and difficulty distributions.
Advantages versus alternatives:
- Provides dense, instructive supervision (the actual corrected artifact) versus sparse scalar rewards used in RL.
- Scales with the deployed user base (low marginal cost per new interaction). Gains kept growing as more data was added, with no saturation reported.
Empirical outcomes: industrial-scale deployment with a significant online metric uplift, generalization to external users (not overfitting to specific editing patterns), and observed scaling benefits.
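The C_0 → C_1 → C_N pattern can be pictured as a small record. The sketch below is a minimal illustration and is not the paper's implementation. The `Episode` class, its field names, and the similarity threshold that separates light refinements from rewrites are all hypothetical. It highlights the key design choice in the knowledge-extraction step: the training target is the user's commit C_N, not the agent's own proposal C_1.

```python
import difflib
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Episode:
    """Hypothetical experience record for one completion request: C_0 -> C_1 -> C_N."""
    context: str   # C_0: editor state when the completion was requested
    proposal: str  # C_1: the agent's suggested completion
    commit: str    # C_N: what the user finally left in the gap

    def refinement_kind(self, rewrite_threshold: float = 0.5) -> str:
        """Label how far the user moved from the proposal (the threshold is an assumption)."""
        if self.commit == self.proposal:
            return "accepted"
        similarity = difflib.SequenceMatcher(None, self.proposal, self.commit).ratio()
        return "refined" if similarity >= rewrite_threshold else "rewritten"

    def training_pair(self) -> Tuple[str, str]:
        """Knowledge extraction: the supervision target is C_N, not the agent's own C_1."""
        return self.context, self.commit


ep = Episode(context="def add(a, b):\n    ", proposal="return a+b", commit="return a + b")
print(ep.refinement_kind())    # refined
print(ep.training_pair()[1])   # return a + b
```

Labeling the edit size is not required for training, because every episode contributes the pair (C_0, C_N). A label like this is only one plausible way to monitor how much correction users supply.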

Data & Methods

Data source: large-scale interaction logs from a production code auto-completion environment (Tencent Cloud / CodeBuddy). Logs contain continuous prefix/suffix updates, agent proposals, user accepts/edits and final commits.
Extraction method:
- Define static anchors at request time (the prefix anchor P_1 and the suffix anchor S_1), then track document evolution and cursor position.
- When anchors are broken or the user leaves the editing region, extract the content that filled the gap as the commit C_N.
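The anchor-based lifecycle tracking can be sketched as a small state machine. This is a hedged illustration under several assumptions:

- The anchors are fixed strings captured at request time and are unique within the file.
- The cursor is a character offset.
- The `GapTracker` name and its methods are hypothetical.

While both anchors stay intact and the cursor remains inside the gap, the tracker records the latest gap content. Once an anchor breaks or the cursor leaves, it finalizes the last intact gap as C_N.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class GapTracker:
    """Hypothetical lifecycle tracker for a single completion request.

    prefix_anchor / suffix_anchor play the role of P_1 / S_1: the text just
    before and after the cursor when the request was made. They are assumed
    to be unique in the file, so plain string search can find them.
    """
    prefix_anchor: str
    suffix_anchor: str
    last_gap: Optional[str] = None  # latest gap content seen with both anchors intact
    closed: bool = False

    def _locate_gap(self, document: str) -> Optional[Tuple[int, int]]:
        """Return (start, end) offsets of the gap, or None if an anchor is broken."""
        start = document.find(self.prefix_anchor)
        if start == -1:
            return None
        gap_start = start + len(self.prefix_anchor)
        gap_end = document.find(self.suffix_anchor, gap_start)
        if gap_end == -1:
            return None
        return gap_start, gap_end

    def observe(self, document: str, cursor: int) -> Optional[str]:
        """Feed one snapshot; return C_N when the lifecycle ends.

        Returns None while the user is still editing the gap, and also when
        no snapshot with intact anchors was ever seen.
        """
        if self.closed:
            return None
        span = self._locate_gap(document)
        if span is not None:
            self.last_gap = document[span[0]:span[1]]
            if span[0] <= cursor <= span[1]:
                return None  # anchors intact and user still inside the gap
        # An anchor broke or the cursor left the region: finalize the last intact gap.
        self.closed = True
        return self.last_gap


doc = "def add(a, b):\n    return a + b\n\nprint(add(1, 2))"
tracker = GapTracker(prefix_anchor="def add(a, b):\n    ", suffix_anchor="\n\nprint(")
print(tracker.observe(doc, cursor=doc.index("\n\nprint(")))  # None: user still in the gap
print(tracker.observe(doc, cursor=len(doc)))                 # return a + b
```

Exact string search keeps the sketch short. A production tracker would likely need more tolerant anchor matching.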
Signal processing:
- Use a Teacher LLM to truncate/segment raw C_N into the atomic completion unit that matches the original trigger (preserve logic; remove extraneous future context).
- Apply an LLM-based verifier to enforce boundary rules (silence vs. fill), remove PII, check syntax and variable consistency, and filter by perplexity/quality.
- Adjust sample proportions by length and void/non-void ratios to produce balanced training data.
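A few of the cheaper checks can be approximated without an LLM. The sketch below is not the paper's verifier; it swaps in deterministic stand-ins for three steps:

- A regular expression for e-mail addresses stands in for full PII detection.
- Python's `ast.parse` over prefix + commit + suffix stands in for the syntax and consistency check.
- Void samples (empty targets, i.e., cases where the model should stay silent) are downsampled to a target share.

The function names, the sample dictionary layout, and the 0.25 void share are hypothetical. Length bucketing and perplexity filtering are omitted.

```python
import ast
import math
import random
import re
from typing import Dict, List, Optional

# Stand-in PII pattern: e-mail addresses only (an assumption; real redaction covers far more).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def clean_sample(prefix: str, commit: str, suffix: str) -> Optional[Dict[str, str]]:
    """Redact e-mails in the commit; drop the sample if the filled-in file does not parse."""
    commit = EMAIL_RE.sub("<EMAIL>", commit)
    try:
        ast.parse(prefix + commit + suffix)  # syntax check of the whole file with the gap filled
    except SyntaxError:
        return None
    return {"prefix": prefix, "suffix": suffix, "target": commit}


def rebalance(samples: List[Dict[str, str]], void_share: float, seed: int = 0) -> List[Dict[str, str]]:
    """Downsample void (empty-target) samples so they make up at most void_share of the output."""
    rng = random.Random(seed)
    void = [s for s in samples if not s["target"].strip()]
    non_void = [s for s in samples if s["target"].strip()]
    if void_share >= 1:
        max_void = len(void)
    else:
        # Largest v with v / (v + len(non_void)) <= void_share; epsilon guards float round-off.
        max_void = math.floor(void_share * len(non_void) / (1 - void_share) + 1e-9)
    kept = non_void + rng.sample(void, min(len(void), max_void))
    rng.shuffle(kept)
    return kept


raw = [
    ("x = ", "1", "\n"),
    ("x = ", "(", "\n"),                      # unbalanced paren: dropped
    ("owner = ", "'dev@example.com'", "\n"),  # e-mail redacted
    ("y = 2\n", "", ""),                      # void sample (model should stay silent)
]
cleaned = [s for s in (clean_sample(*r) for r in raw) if s is not None]
print(len(cleaned))                               # 3
print(len(rebalance(cleaned, void_share=0.25)))   # 2
```

With only two non-void samples, keeping even one void sample would push the void share to one third. The 25% cap therefore removes it.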
Optimization:
- Supervised objective: minimize the divergence between the agent’s output for context C_0 and the verified commit C_N sampled from Echo data: L_echo(θ) = E_{(C_0, C_N) ∼ D_echo}[dist(Agent_θ(C_0), C_N)].
- Training pipeline integrates the distilled, denoised dataset into continual model updates (online/offline cycles).
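The summary leaves dist unspecified. Suppose it is instantiated as token-level cross-entropy, the usual choice for supervised fine-tuning but an assumption here, and the commit is tokenized as C_N = (y_1, ..., y_T). The objective then expands as shown below, where p_θ is the agent's next-token distribution (notation introduced for this illustration).

```latex
\mathcal{L}_{\text{echo}}(\theta)
  = \mathbb{E}_{(C_0,\, C_N) \sim \mathcal{D}_{\text{echo}}}
    \Big[ -\sum_{t=1}^{T} \log p_\theta\big(y_t \mid C_0,\, y_{<t}\big) \Big],
  \qquad C_N = (y_1, \dots, y_T)
```

Under this reading, an empty (void) target contributes no tokens. One common remedy is to append an end-of-sequence token, so that staying silent is itself a supervised prediction. The excerpt does not state this; it is an assumption.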
Evaluation:
- Online metric: acceptance rate of code completions increased from 25.7% to 35.7%.
- Additional analyses showed generalization beyond the original user cohort and monotonic improvement with more Echo data (no immediate saturation reported).
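The headline numbers translate into a relative lift as follows. The sketch also computes a pooled two-proportion z statistic, the kind of significance check the Assessment notes is missing from the excerpt. Two caveats apply:

- The request counts are made up for illustration, because the excerpt does not report how many completions were shown.
- The test treats the two periods as independent samples, which a before/after comparison may not satisfy.

```python
import math
from typing import Tuple


def lift(baseline_pct: float, treated_pct: float) -> Tuple[float, float]:
    """Return (absolute lift in percentage points, relative lift as a fraction)."""
    absolute_pp = treated_pct - baseline_pct
    return absolute_pp, absolute_pp / baseline_pct


def two_proportion_z(accepted_a: int, shown_a: int, accepted_b: int, shown_b: int) -> float:
    """Pooled two-proportion z statistic for p_b - p_a (assumes independent samples)."""
    p_a, p_b = accepted_a / shown_a, accepted_b / shown_b
    pooled = (accepted_a + accepted_b) / (shown_a + shown_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / shown_a + 1 / shown_b))
    return (p_b - p_a) / se


abs_pp, rel = lift(25.7, 35.7)
print(f"{abs_pp:.1f} pp absolute, {rel:.1%} relative")  # 10.0 pp absolute, 38.9% relative

# Hypothetical counts: 10,000 shown completions per period (NOT from the paper).
z = two_proportion_z(2570, 10_000, 3570, 10_000)
print(f"z = {z:.1f}")
```

With these invented counts, z comes out at about 15. That would be highly significant, but only if the counts and the independence assumption held. Because the real counts are unreported, the +10 pp gain cannot be checked this way from the excerpt alone.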

Implications for AI Economics

Data-as-a-product economics:
- Lowers the marginal cost of high-quality supervision by repurposing corrections that users already produce in normal workflows, reducing reliance on costly curated annotation campaigns.
- Platforms with large active user bases gain sustained, proprietary training pipelines—creating stronger data-driven moats and competitive advantages.
Incentives and labor effects:
- Shifts value from external annotators toward end users (developers, professionals) whose edits implicitly generate training data. This raises questions about whether and how users should be compensated for that contribution, and how the resulting value is shared.
- Potentially reduces demand for traditional annotation labor but increases the value of platforms that enable high-quality interaction capture and verification tooling.
Productivity and pricing:
- Faster model improvement from Echo-style signals can accelerate feature rollout and efficiency gains (e.g., higher completion acceptance → fewer keystrokes → higher developer productivity). That can alter pricing power for platform providers and raise consumer surplus for users.
Externalities, governance and risks:
- Privacy/compliance: mining user edits (especially code) raises IP and PII exposure risks; rigorous redaction, consent, and contractual clarity are needed.
- Feedback loops and overfitting risk: models trained on user-refined outputs could amplify common user patterns or proprietary coding styles; platforms must guard against reinforcement of suboptimal practices.
- Market concentration: platforms that can deploy Echo at scale may consolidate advantage, increasing concentration in developer tooling and downstream markets.
- Regulatory and ethical issues: need for transparency about data reuse, opt-out mechanisms, and possible revenue-sharing for substantive user contributions.
Strategic implications:
- Echo suggests a durable path to continual model improvement without always relying on larger pretraining datasets—favoring strategies that embed model learning into product usage.
- Firms should invest in instrumentation, lifecycle tracking, de-identification, and verifier tooling to unlock this low-cost supervision while managing legal/ethical constraints.


Assessment

Paper Type: quasi-experimental
Evidence Strength: medium. The result comes from a large-scale production environment (high external validity) and shows a sizable, concrete gain (acceptance rate +10 percentage points). However, the excerpt does not report randomization, explicit controls for temporal or usage confounders, or robustness checks. Causal attribution to Echo is therefore plausible but not firmly established.
Methods Rigor: medium. The approach, systematically harvesting user refinement sequences as a training signal, is conceptually sound and validated at scale, suggesting solid engineering and empirical work. The excerpt omits details about experimental design (e.g., randomized A/B testing), train/test splits, ablations, statistical significance, and handling of selection biases, which limits assessment of methodological rigor.
Sample: Interaction logs and user refinement sequences from a production code-completion environment (agents' proposals and users' verified refinements), used to produce training data and measure model acceptance rates. Exact sample size, time span, user composition, and platform specifics are not provided in the excerpt.
Themes: productivity, human_ai_collab
Identification: Evaluated via large-scale production deployment, comparing acceptance rates before and after (or across cohorts) when Echo's harvested refinement signals were used for continuous training. The excerpt reports no randomized controlled trial, clear counterfactual, or pre-registration, so causal inference relies on deployment-level performance changes rather than randomized assignment.
Generalizability:
- Results come from a code-completion product and may not generalize to other domains (e.g., dialog, vision, robotics).
- The method depends on environments where users actively refine and verify agent outputs, which requires engaged, skilled users.
- Platform- and user-population-specific behavior (developer workflows, tooling) may limit transfer to other firms or user bases.
- It is unclear how well the method scales where refinement is rare or of lower quality (low information density).
- Without randomized assignment, improvements may reflect concurrent system or user-behavior changes rather than the Echo pipeline itself.

Claims (7)

Claim	Category	Direction	Confidence	Outcome	Details
Static 'human data' is expensive to scale and bounded by the knowledge of its creators.	Other	negative	high	scalability and knowledge coverage of human-generated training data	0.08
Continuous learning from 'experience data' (interactions between agents and their environments) promises to transcend the scalability and knowledge limitations of static human data.	Other	positive	high	ability to overcome limitations of static human data	0.08
Widespread deployment of AI agents provides low-cost access to massive streams of real-world experience data.	Other	positive	high	availability and cost of experience data from deployed agents	0.24
Raw interaction logs are inherently noisy, contain trial-and-error and low information density, and are inefficient for direct model training.	Other	negative	high	information density and training efficiency of raw interaction logs	0.24
Echo is a generalized framework that operationalizes the transition from raw experience to learnable knowledge by echoing environmental feedback into the training loop for model optimization.	Other	positive	high	process of converting raw experience data into training signals	0.08
User-driven refinement sequences distill agents' flawed proposals into high-quality training signals.	Other	positive	high	quality of training signals produced via user refinement	0.48
Large-scale validation in a production code completion environment shows Echo increased the acceptance rate from 25.7% to 35.7%.	Output Quality	positive	high	acceptance rate of code completions	increase from 25.7% to 35.7%; 0.48