Evidence (13827 claims)
Adoption: 8454 claims
Productivity: 7544 claims
Governance: 6789 claims
Human-AI Collaboration: 6327 claims
Org Design: 4126 claims
Innovation: 4058 claims
Labor Markets: 3520 claims
Skills & Training: 2924 claims
Inequality: 2057 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 749 | 195 | 97 | 889 | 1979 |
| Governance & Regulation | 815 | 391 | 188 | 121 | 1539 |
| Organizational Efficiency | 771 | 189 | 124 | 83 | 1177 |
| Technology Adoption Rate | 624 | 233 | 123 | 96 | 1084 |
| Research Productivity | 410 | 121 | 56 | 331 | 929 |
| Output Quality | 466 | 177 | 59 | 47 | 749 |
| Decision Quality | 320 | 174 | 75 | 42 | 618 |
| Firm Productivity | 435 | 55 | 88 | 20 | 604 |
| AI Safety & Ethics | 214 | 276 | 65 | 33 | 593 |
| Market Structure | 178 | 166 | 122 | 24 | 495 |
| Task Allocation | 206 | 64 | 70 | 31 | 376 |
| Skill Acquisition | 165 | 57 | 60 | 17 | 299 |
| Innovation Output | 201 | 27 | 41 | 18 | 288 |
| Employment Level | 105 | 51 | 107 | 13 | 278 |
| Fiscal & Macroeconomic | 131 | 69 | 43 | 26 | 276 |
| Consumer Welfare | 116 | 63 | 42 | 11 | 232 |
| Firm Revenue | 149 | 46 | 26 | 3 | 224 |
| Inequality Measures | 44 | 122 | 49 | 6 | 221 |
| Task Completion Time | 169 | 29 | 8 | 12 | 219 |
| Worker Satisfaction | 89 | 61 | 20 | 12 | 182 |
| Error Rate | 69 | 91 | 10 | 2 | 172 |
| Regulatory Compliance | 76 | 68 | 14 | 5 | 163 |
| Training Effectiveness | 92 | 19 | 13 | 19 | 145 |
| Wages & Compensation | 77 | 36 | 25 | 6 | 144 |
| Automation Exposure | 51 | 54 | 22 | 12 | 142 |
| Team Performance | 86 | 17 | 27 | 9 | 140 |
| Developer Productivity | 94 | 17 | 14 | 6 | 132 |
| Job Displacement | 12 | 80 | 20 | 1 | 113 |
| Hiring & Recruitment | 51 | 7 | 8 | 3 | 69 |
| Skill Obsolescence | 5 | 45 | 6 | 1 | 57 |
| Creative Output | 31 | 16 | 7 | 2 | 57 |
| Social Protection | 27 | 16 | 8 | 2 | 53 |
| Labor Share of Income | 17 | 17 | 17 | — | 51 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
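The matrix is easier to compare across outcomes when the direction counts are turned into shares. The sketch below does that for a hand-copied subset of rows; the function names are illustrative and not part of the source page. It also reports how many claims a row's Total includes beyond the four listed direction columns, since for some rows (for example Other: 1930 listed versus a Total of 1979) the two do not match and the page does not say why.

```python
# Subset of the evidence matrix: (Positive, Negative, Mixed, Null, Total).
ROWS = {
    "Other": (749, 195, 97, 889, 1979),
    "Firm Productivity": (435, 55, 88, 20, 604),
    "Job Displacement": (12, 80, 20, 1, 113),
    "Inequality Measures": (44, 122, 49, 6, 221),
}


def direction_shares(counts):
    """Return each direction's share of the four listed direction counts."""
    positive, negative, mixed, null, _total = counts
    listed = positive + negative + mixed + null
    return {
        "positive": positive / listed,
        "negative": negative / listed,
        "mixed": mixed / listed,
        "null": null / listed,
    }


def unlisted(counts):
    """Claims counted in Total but not in any of the four direction columns."""
    return counts[4] - sum(counts[:4])


for outcome, counts in ROWS.items():
    shares = direction_shares(counts)
    print(f"{outcome}: positive {shares['positive']:.1%}, "
          f"negative {shares['negative']:.1%}, unlisted {unlisted(counts)}")
```

Shares are computed over the listed directions rather than over Total, so that rows with unlisted claims remain comparable with rows that have none.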
Agentic Technical Debt and Stochastic Tax are related but distinct: debt can amplify the tax.
Theoretical relationship asserted in the structural model; the note states that debt can amplify the recurring Stochastic Tax and supports this with model expressions, discussion, and an illustrative simulation.
Combining both levers yields a 502% improvement on single-cell RNA denoising over the initial baseline.
Reported experimental result in the paper comparing SIA to the initial baseline on the single-cell RNA denoising task (denoising metric unspecified in abstract).
Combining both levers yields a 91.9% runtime reduction on GPU kernels over the initial baseline.
Reported experimental result in the paper comparing SIA to the initial baseline on the low-level GPU kernel optimisation task (runtime measured).
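A percentage runtime reduction is easy to misread as a speedup factor. The small helper below (an illustrative function, not something from the paper) converts one into the other, assuming the reduction is measured relative to the baseline runtime.

```python
def speedup_from_reduction(reduction_pct):
    """Convert a percentage runtime reduction into a speedup factor."""
    if not 0 <= reduction_pct < 100:
        raise ValueError("reduction must be in [0, 100)")
    remaining = 1.0 - reduction_pct / 100.0  # fraction of baseline runtime left
    return 1.0 / remaining


# The reported 91.9% reduction leaves about 8.1% of the baseline runtime.
print(round(speedup_from_reduction(91.9), 2))
```

Under that assumption, the reported 91.9% reduction corresponds to a speedup of roughly 12x over the initial baseline.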
Combining both levers yields a 56.6% gain on LawBench (Chinese legal charge classification) over the initial baseline.
Reported experimental result in the paper comparing SIA to the initial baseline on the LawBench task.
Combining both levers (harness updates and weight updates) outperforms scaffold iteration alone on all three benchmarks.
Empirical comparison reported in the paper: experiments across the three domains comparing SIA (combined harness+weight updates) against scaffold-iteration-only baseline.
These results show that per-query configuration of the full retrieval pipeline is a practical alternative to static workload-level tuning.
Authors' conclusion drawn from the reported empirical results on MuSiQue, BrowseComp-Plus, and FinanceBench demonstrating BRANE's performance advantages.
BRANE outperforms LLM-routing, rule-based, and fine-tuned Qwen3-4B baselines.
Empirical comparisons against specified baselines (LLM-routing, rule-based approaches, and fine-tuned Qwen3-4B) across the reported benchmark sets. The text does not provide numeric performance metrics or sample sizes in the excerpt.
BRANE matches the best fixed configuration's accuracy at up to 89% lower cost.
Empirical result reported by the authors based on experiments on the named benchmarks (MuSiQue, BrowseComp-Plus, FinanceBench). The provided text states the magnitude ('up to 89% lower cost') but does not give sample sizes or confidence intervals.
Across MuSiQue, BrowseComp-Plus, and FinanceBench, BRANE consistently pushes the cost-quality Pareto frontier.
Empirical evaluation reported on three benchmark suites: MuSiQue, BrowseComp-Plus, and FinanceBench. The claim is based on experimental comparisons across these datasets; the paper does not state numeric sample sizes in the provided text.
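For readers unfamiliar with the term, a cost-quality Pareto frontier is the set of configurations that no other configuration beats on both cost and accuracy at once. The sketch below computes one from a list of points. The configuration names and numbers are hypothetical, chosen only so that one point matches the best fixed configuration's accuracy at 89% lower cost; they are not results from the paper.

```python
def pareto_frontier(points):
    """Return configurations not dominated on (lower cost, higher accuracy).

    points: list of (name, cost, accuracy) tuples.
    """
    frontier = []
    best_accuracy = float("-inf")
    # Sort by ascending cost; break cost ties by descending accuracy.
    for name, cost, accuracy in sorted(points, key=lambda p: (p[1], -p[2])):
        # A point joins the frontier only if it beats every cheaper point.
        if accuracy > best_accuracy:
            frontier.append((name, cost, accuracy))
            best_accuracy = accuracy
    return frontier


# Hypothetical numbers for illustration only.
configs = [
    ("cheap", 1.0, 0.55),
    ("mid", 3.0, 0.70),
    ("wasteful", 4.0, 0.65),
    ("best_fixed", 10.0, 0.80),
    ("per_query", 1.1, 0.80),
]
print([name for name, _, _ in pareto_frontier(configs)])
```

In this toy data the per-query point removes the fixed configurations from the frontier, which is the kind of shift the claim describes.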
There exists a data supply chain that runs from individual translators through language service providers (LSPs) and platforms to model developers.
Mapping and descriptive analysis of industry supply chains and intermediary roles provided in the paper; conceptual and empirical examples of flows of translation data from translators to model developers. No numerical sample reported.
Article 30-4 of the Japanese Copyright Act legitimates a mode of use the paper terms 'appropriation without consumption'—i.e., mining works for statistical features rather than reading or experiencing them.
Textual/legal analysis of Article 30-4 of the Japanese Copyright Act and its interpretation; comparative legal reading presented in the paper. No numerical sample reported.
The development of statistical machine translation (SMT), neural machine translation (NMT), the Transformer architecture, and multilingual large language models (LLMs) cannot be disentangled from the accumulation of translation data (TM/parallel corpora).
Historical and technical literature review linking MT/NLP methodological advances to the availability and use of parallel corpora and TM; comparative analysis of model development histories described in the paper. No numerical sample reported.
Translation memories (TM) and parallel corpora preserve a one-to-one correspondence between source and target text and therefore constitute extraordinarily valuable supervised training data for machine translation.
Conceptual argument and literature review of machine translation practice (discussion of TM/parallel corpora as supervised training data); examples and descriptive evidence from MT research and industry practice presented in the paper. No numerical sample reported.
To balance promotion of innovation with preservation of human creativity, it is essential to revise existing laws and introduce novel approaches such as defining a specific intellectual property right for AI-generated works or designating ownership among associated human agents.
Normative recommendation derived from the paper's comparative legal analysis and discussion of enforcement challenges (no empirical sample size).
Artificial intelligence systems are capable of autonomously generating artistic, literary, musical works, and even inventions without direct human intervention.
Stated as part of the paper's premise and supported by the paper's literature/theoretical review of advances in AI creative and inventive capabilities (no empirical sample size reported).
EmoDistill learns skills from offline agent-to-agent interactions, avoiding costly online negotiation during training.
Methodological claim that training is performed offline using recorded agent-to-agent interaction data rather than online interactions; described as part of framework benefits.
Transfer studies demonstrate generalization across domains, unseen counterparties, and trained-vs-trained tournaments.
Reported transfer experiments in which EmoDistill-trained policies were evaluated on different negotiation domains, with unseen counterparties, and in tournaments between trained agents; results reportedly show generalization. (Exact metrics and sample sizes not provided in the excerpt.)
Ablations show that emotion conditioning is essential.
Ablation experiments reported in the paper removing or altering emotion conditioning, which reportedly degrade performance relative to the full EmoDistill model. (No numeric results provided in the excerpt.)
Across four emotion-sensitive, high-stakes negotiation domains, SLM policies trained under the EmoDistill framework achieve the highest utility, outperforming vanilla SLM/LLM baselines and IQL-only emotion selection.
Empirical evaluation across four negotiation domains comparing EmoDistill-trained SLM policies to vanilla SLM/LLM baselines and an ablated IQL-only emotion selector. (Paper reports comparative utility results, but exact sample sizes and numeric effect sizes are not provided in the excerpt.)
EmoDistill decomposes emotional strategy into emotion selection and emotion expression: an Implicit Q-Learning (IQL) selector learns which emotion to express, while a Low-Rank Adaptation (LoRA)-based policy learns how to express it through Supervised Fine-Tuning (SFT) and Judge Policy Optimization (JPO).
Description of model architecture and training approach: IQL used as selector; LoRA-based policy trained with SFT and JPO for expression. (Design/implementation claim from methods section.)
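As background on the selector, Implicit Q-Learning fits its value function with an asymmetric squared loss (expectile regression), which lets it learn from offline data without querying actions outside the dataset. The sketch below shows that loss in isolation. The excerpt does not give EmoDistill's hyperparameters, so `tau=0.7` is an assumed, illustrative value, and the function name is hypothetical.

```python
def expectile_loss(residual, tau=0.7):
    """Asymmetric squared loss used for expectile regression.

    residual: Q(s, a) - V(s). tau lies in (0, 1); with tau > 0.5 the loss
    penalizes underestimates of the value more than overestimates.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must be in (0, 1)")
    weight = tau if residual >= 0 else 1.0 - tau
    return weight * residual ** 2


# Same magnitude of error, different penalty depending on its sign.
print(expectile_loss(1.0), expectile_loss(-1.0))
```

With `tau = 0.5` the loss reduces to an ordinary (scaled) squared error; larger values push the value estimate toward the better actions seen in the data.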
We introduce EmoDistill, an offline framework for distilling emotional negotiation skills into language model agents.
Methodological contribution described in the paper: design and presentation of the EmoDistill framework (decomposition, training pipeline). This is a description of a proposed method rather than an empirical result.
The present paper states the primitive contract, the toll identity, the within-boundary no-arbitrage result, and the budget guarantee that the later empirical, mechanism-design, and dynamic-underwriting companion papers depend on.
Paper's stated scope and organization asserting that these formal primitives and theorems are provided as foundations for follow-on empirical and companion studies.
(iv) A conservative runtime gating theorem translates high-probability toll envelopes into an executed-action budget guarantee.
Mathematical theorem in the paper proving that given high-probability bounds (toll envelopes), one can derive a guarantee on executed-action budget consumption (runtime gating).
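One way to picture a conservative gate of this kind is sketched below. It is an assumption about the general shape of such a mechanism, not the paper's construction: the class name, the additive reservation rule, and the union-bound accounting are all hypothetical. Each proposed action arrives with an upper toll envelope that holds with probability at least 1 - delta, and the gate executes the action only if the envelope still fits in the remaining budget.

```python
class BudgetGate:
    """Conservative gate: approve an action only if its toll envelope fits."""

    def __init__(self, budget):
        self.budget = budget
        self.reserved = 0.0      # sum of upper envelopes of executed actions
        self.failure_mass = 0.0  # union-bound mass over every proposal seen

    def propose(self, upper_envelope, delta):
        """upper_envelope bounds the action's toll with probability >= 1 - delta."""
        self.failure_mass += delta
        if self.reserved + upper_envelope > self.budget:
            return False  # caller falls back to the safe default
        self.reserved += upper_envelope
        return True


gate = BudgetGate(budget=10.0)
decisions = [gate.propose(envelope, delta=0.01) for envelope in (4.0, 5.0, 3.0, 1.0)]
print(decisions, gate.reserved, round(gate.failure_mass, 2))
```

If every envelope holds, the realized tolls of the executed actions sum to at most `reserved`, which never exceeds `budget`. By a union bound, all envelopes hold simultaneously with probability at least `1 - failure_mass`, which is why the mass is accumulated over every proposal rather than only the approved ones.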
(iii) An irreversible-authority premium is characterized and splits into a strictly positive action-level component plus an if-and-only-if characterization of the set-level robust capital increase.
Formal decomposition/theorem in the paper proving existence of the irreversible-authority premium, showing the action-level component is strictly positive, and providing an iff condition for set-level robust capital increases.
(ii, corollary) Gaming-resistance of the system is tied to the design of the underwriting boundary.
Corollary derived from the no-splitting theorem that links strategic gaming-resistance properties to specific features of the underwriting boundary.
(ii) A no-splitting property holds within an underwriting boundary that telescopes path-decomposed actions into a boundary potential.
Formal theorem in the paper proving a no-splitting property and showing how path-decomposed action contributions aggregate (telescoping) into a boundary potential.
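The telescoping idea can be illustrated with a toy calculation. The sketch assumes, purely for illustration, that each step's toll equals the change in a potential function between consecutive states; the potential `phi` and the integer states are hypothetical and are not taken from the paper.

```python
def path_toll(potential, path):
    """Sum per-step tolls, each defined as a potential difference."""
    return sum(potential(b) - potential(a) for a, b in zip(path, path[1:]))


def phi(state):
    """Hypothetical boundary potential."""
    return state ** 2


direct = path_toll(phi, [0, 4])          # one action from state 0 to state 4
split = path_toll(phi, [0, 1, 2, 3, 4])  # the same move split into four steps
print(direct, split)
```

Because the intermediate terms cancel, the total depends only on the endpoints, so under this assumption splitting an action into smaller steps cannot lower the total toll.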
(i) There exists a well-defined counterfactual toll under a chosen safe-default mapping and continuation policy.
Theoretical derivation / formal proof presented in the paper establishing existence of the toll under specified mappings and policies.
The framework treats per-action insurance as the primary unit of analysis and replaces post-hoc annual liability cover with a pre-action transaction layer.
Conceptual and design claim supported by the paper's theoretical argumentation and proposed contract primitives; no empirical validation reported.
We propose a foundational runtime actuarial layer for autonomous AI agents in which every side-effect-bearing action carries a time-consistent, counterfactual risk toll computed against a contractually fixed safe default, inside an explicit underwriting boundary.
Theoretical proposal and formal description of an actuarial framework presented in the paper (architectural/axiomatic exposition). No empirical sample or experiment reported.
Hybrid Fusion significantly accelerated the recovery of smaller Slow AI teams (+6.9% at N=4).
Reported intervention result: Hybrid Fusion produced a +6.9% acceleration in recovery for smaller Slow AI teams, reported at N=4.
Integrating these isolated veridical signals via Hybrid Fusion successfully rescued the Fast AI team (+7.6% at N=8).
Reported intervention result: application of Hybrid Fusion integration produced a +7.6% improvement in Fast AI team performance, reported at N=8.
The Riemannian Oracle adapted to task states by heavily restricting temporal windows (< 0.8s) to intercept fast reflexive compliance and widening windows (> 1.2s) to capture delayed cognitive conflict.
Reported algorithmic behavior of the 2D Adaptive Riemannian Oracle in response to measured spatial covariance: window sizes described as <0.8s for fast states and >1.2s for slow states.
In the Slow AI condition, behavioural teams (N=8) eventually recovered to 100.0%.
Reported team performance metric for behavioural teams in Slow AI condition with N=8; team performance reported to reach 100.0%.
Policy makers and education/training organizations should comprehensively consider AI and EPU to cope with market uncertainty and ensure the stability and sustainability of China’s ETM.
Policy recommendation derived from the paper's empirical findings on causality, quantile dependence, and asymmetric risk spillovers (argumentative/conclusion statement rather than a direct empirical result).
There is an interaction between AI and EPU: EPU promotes AI during periods of economic stability.
Cross-quantilogram analysis indicating quantile-specific causality/interactions, with EPU predicting AI in stable-period quantiles (method reported; sample size not stated).
There is an interaction between AI and EPU: AI promotes EPU in bullish markets.
Cross-quantilogram analysis showing quantile-dependent interaction (method reported; sample size not stated); specific result described for bullish-market quantiles.
The cross-quantilogram indicates quantile dependence among AI, EPU and ETM: the positive predictive effect of AI on ETM is mainly concentrated in bullish markets.
Cross-quantilogram analysis (quantile cross-dependence test) applied to AI and ETM time-series in China (method reported; sample size not stated).
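For orientation, the cross-quantilogram is usually defined as the cross-correlation of quantile-hit indicators: whether one series falls below its alpha-quantile at time t, against whether another did so at time t - k. The sketch below implements that sample statistic in plain Python on synthetic data. The series, quantile levels, and lag are assumptions for illustration; the paper's data and settings are not given in the excerpt.

```python
import math
import random


def empirical_quantile(values, alpha):
    """Simple order-statistic quantile (no interpolation)."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, max(0, math.ceil(alpha * len(ordered)) - 1))
    return ordered[index]


def quantile_hit(value, threshold, alpha):
    """psi_alpha(u) = 1[u < 0] - alpha, evaluated at u = value - threshold."""
    return (1.0 if value < threshold else 0.0) - alpha


def cross_quantilogram(y, x, alpha_y, alpha_x, lag):
    """Sample cross-correlation of quantile hits: y at time t, x at time t - lag."""
    q_y = empirical_quantile(y, alpha_y)
    q_x = empirical_quantile(x, alpha_x)
    hits_y = [quantile_hit(v, q_y, alpha_y) for v in y[lag:]]
    hits_x = [quantile_hit(v, q_x, alpha_x) for v in x[:len(x) - lag]]
    numerator = sum(a * b for a, b in zip(hits_y, hits_x))
    denominator = math.sqrt(sum(a * a for a in hits_y) * sum(b * b for b in hits_x))
    return numerator / denominator


# Synthetic example: y follows x with a one-period delay plus noise.
random.seed(0)
x = [random.gauss(0, 1) for _ in range(500)]
y = [0.0] + [x[t - 1] + 0.5 * random.gauss(0, 1) for t in range(1, 500)]
print(cross_quantilogram(y, x, 0.9, 0.9, lag=1))
```

Evaluating the statistic at high quantile levels corresponds to the bullish-market reading in the claims above, and at low levels to the bearish one. Inference (for example bootstrap confidence bands) is omitted from this sketch.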
The nonparametric quantile causality test shows a unidirectional causal relationship from AI to China’s education and training market (ETM).
Nonparametric quantile causality test applied to time-series data on AI and ETM in China (method reported; sample size not stated).
The nonparametric quantile causality test shows a unidirectional causal relationship from AI to EPU.
Nonparametric quantile causality test applied to time-series data on AI and Economic Policy Uncertainty (EPU) in China (method reported; sample size not stated in the provided text).
The proposed policy framework contributes to establishing a foundation for Vietnam to proactively embrace the Agent Economy safely and effectively.
Claim in abstract about the intended contribution/impact of the proposed framework; no empirical evaluation or measured outcomes presented.
The Agent Economy promises substantial gains in productivity and innovation.
Asserted in paper abstract as an anticipated outcome; no empirical measurement, sample size, or quantified effect provided.
We hope JobBench shifts the community's target labour-market effect from replacement to enhancement: building agents that do what humans actually want delegated, not only what is most economically valuable.
Authors' stated aim/goal for the benchmark (normative/aspirational statement in the paper).
Each task is packaged as a workspace of heterogeneous reference files, requiring the agent to reason through the cluttered information streams of real professional work.
Design description of task packaging in JobBench (benchmark construction/methodological detail).
We introduce JobBench, which evaluates AI agents on the workflows that experts identify as high-priority for delegation, with the aim of empowering humans according to their needs rather than replacing them on the basis of GDP value.
Description of a new benchmark (JobBench) presented by the authors; methodological design claim about target tasks and intent (expert-identified workflows prioritized for delegation).
This study proposes a Workforce Resilience Governance Framework (WRGF) that includes task-level exposure assessment, human augmentation design, reskilling, redeployment, transparent communication, psychological safety, workforce impact accountability, and policy alignment.
Conceptual framework proposed by the authors in the paper (design/proposal; no empirical test described in the excerpt).
The paper concludes with policy recommendations for accelerating human-centred AI integration in public-sector HRM.
Stated conclusion and policy recommendations section in the paper; recommendations derived from empirical findings.
Access to modern digital tools positively moderates AI uptake.
Reported moderation/interaction effects in regression/path analysis indicating that access to modern digital tools is associated with higher AI adoption/uptake; exact effect size not specified in summary.
Holding a managerial position is the strongest predictor of active AI adoption (OR = 1.609).
Reported odds ratio from the binary logistic regression for role/position predictor (managerial status) predicting active AI adoption; OR = 1.609.
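An odds ratio multiplies odds, not probabilities, so its practical size depends on the baseline. The sketch below applies the reported OR = 1.609 to a few hypothetical baseline adoption rates for non-managers; those baselines are assumptions, since the summary does not report them, and the function name is illustrative.

```python
def apply_odds_ratio(baseline_probability, odds_ratio):
    """Probability implied by multiplying the baseline odds by an odds ratio."""
    if not 0.0 < baseline_probability < 1.0:
        raise ValueError("baseline probability must be in (0, 1)")
    baseline_odds = baseline_probability / (1.0 - baseline_probability)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


OR_MANAGER = 1.609  # reported in the summary above
for baseline in (0.10, 0.30, 0.50):  # hypothetical non-manager adoption rates
    print(baseline, round(apply_odds_ratio(baseline, OR_MANAGER), 3))
```

For example, if half of non-managers were active adopters, the same odds ratio would imply roughly 62% of managers, holding the other predictors fixed.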
Internal HR factors exert a stronger influence on perceived HR effectiveness (β = 0.463) than external factors (β = 0.227).
Reported path/regression coefficients (the summary does not confirm whether they are standardized) from OLS/path analysis linking internal and external HR quality indices to perceived HR effectiveness; coefficients β = 0.463 and β = 0.227 respectively.
Future evaluations should use artifact-level denominators, reproducible parsing rules, correction taxonomies, and independent coding of governance events.
Authors' recommendations based on methodological lessons from this structured self-observed implementation case study and observed parsing/governance challenges.