Evidence (13827 claims)
Adoption (8454 claims)
Productivity (7544 claims)
Governance (6789 claims)
Human-AI Collaboration (6327 claims)
Org Design (4126 claims)
Innovation (4058 claims)
Labor Markets (3520 claims)
Skills & Training (2924 claims)
Inequality (2057 claims)
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 749 | 195 | 97 | 889 | 1979 |
| Governance & Regulation | 815 | 391 | 188 | 121 | 1539 |
| Organizational Efficiency | 771 | 189 | 124 | 83 | 1177 |
| Technology Adoption Rate | 624 | 233 | 123 | 96 | 1084 |
| Research Productivity | 410 | 121 | 56 | 331 | 929 |
| Output Quality | 466 | 177 | 59 | 47 | 749 |
| Decision Quality | 320 | 174 | 75 | 42 | 618 |
| Firm Productivity | 435 | 55 | 88 | 20 | 604 |
| AI Safety & Ethics | 214 | 276 | 65 | 33 | 593 |
| Market Structure | 178 | 166 | 122 | 24 | 495 |
| Task Allocation | 206 | 64 | 70 | 31 | 376 |
| Skill Acquisition | 165 | 57 | 60 | 17 | 299 |
| Innovation Output | 201 | 27 | 41 | 18 | 288 |
| Employment Level | 105 | 51 | 107 | 13 | 278 |
| Fiscal & Macroeconomic | 131 | 69 | 43 | 26 | 276 |
| Consumer Welfare | 116 | 63 | 42 | 11 | 232 |
| Firm Revenue | 149 | 46 | 26 | 3 | 224 |
| Inequality Measures | 44 | 122 | 49 | 6 | 221 |
| Task Completion Time | 169 | 29 | 8 | 12 | 219 |
| Worker Satisfaction | 89 | 61 | 20 | 12 | 182 |
| Error Rate | 69 | 91 | 10 | 2 | 172 |
| Regulatory Compliance | 76 | 68 | 14 | 5 | 163 |
| Training Effectiveness | 92 | 19 | 13 | 19 | 145 |
| Wages & Compensation | 77 | 36 | 25 | 6 | 144 |
| Automation Exposure | 51 | 54 | 22 | 12 | 142 |
| Team Performance | 86 | 17 | 27 | 9 | 140 |
| Developer Productivity | 94 | 17 | 14 | 6 | 132 |
| Job Displacement | 12 | 80 | 20 | 1 | 113 |
| Hiring & Recruitment | 51 | 7 | 8 | 3 | 69 |
| Skill Obsolescence | 5 | 45 | 6 | 1 | 57 |
| Creative Output | 31 | 16 | 7 | 2 | 57 |
| Social Protection | 27 | 16 | 8 | 2 | 53 |
| Labor Share of Income | 17 | 17 | 17 | — | 51 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
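The matrix can be read more easily as shares than as raw counts. The sketch below copies four rows from the table and computes the positive and negative share of each row's stated total. It also reports how many claims fall outside the four direction columns, because in several rows the columns sum to less than the Total (for example, 1930 versus 1979 for "Other"). Treating a dash as zero and using Total as the denominator are assumptions made here; the page does not say how unclassified claims are handled.

```python
# Rows copied from the Evidence Matrix: (positive, negative, mixed, null, total).
# A dash in the table is treated as 0 here (an assumption).
ROWS = {
    "Firm Productivity": (435, 55, 88, 20, 604),
    "Job Displacement": (12, 80, 20, 1, 113),
    "Labor Share of Income": (17, 17, 17, 0, 51),
    "Other": (749, 195, 97, 889, 1979),
}


def summarize(row):
    """Return direction shares of the stated total and the unclassified remainder."""
    positive, negative, mixed, null, total = row
    classified = positive + negative + mixed + null
    return {
        "positive_share": positive / total,
        "negative_share": negative / total,
        "unclassified": total - classified,
    }


for name, row in ROWS.items():
    s = summarize(row)
    print(
        f"{name}: {s['positive_share']:.1%} positive, "
        f"{s['negative_share']:.1%} negative, "
        f"{s['unclassified']} outside the four direction columns"
    )
```

Shares make rows of very different size comparable, such as Firm Productivity (604 claims) and Job Displacement (113 claims).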
AI adoption is accelerating.
Analysis of public institutional data on AI adoption using growth indicators (relative growth, CAGR, growth multipliers) within a conceptual-empirical quantitative diagnostic design (no causal econometric model).
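The three growth indicators named in this note have standard definitions, sketched below. The function names and the sample figures (an adoption measure rising from 10 to 40 over 4 periods) are hypothetical and are not taken from the paper.

```python
def relative_growth(start, end):
    """Change relative to the starting value, e.g. 3.0 means +300%."""
    return (end - start) / start


def growth_multiplier(start, end):
    """How many times larger the end value is than the start value."""
    return end / start


def cagr(start, end, periods):
    """Compound annual growth rate over the given number of periods."""
    return (end / start) ** (1 / periods) - 1


# Hypothetical illustration: an adoption indicator rising from 10 to 40 over 4 years.
start, end, years = 10.0, 40.0, 4
print(f"relative growth: {relative_growth(start, end):.0%}")
print(f"growth multiplier: {growth_multiplier(start, end):.1f}x")
print(f"CAGR: {cagr(start, end, years):.1%}")
```

All three indicators describe the size of a change between two points in time. None of them identifies a cause, which is consistent with the note that no causal econometric model is used.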
The paper recommends staged, governance-aware implementation for responsible AI adoption in SMEs.
Policy and practice recommendation from the review's synthesis and conclusions section.
This review extends the resource-based view to AI-enabled capabilities in SMEs.
Conceptual/theoretical contribution described in the paper based on synthesis of literature and interpretation of AI as a firm capability in SMEs.
AI enhances operational efficiency primarily in recruitment and performance analytics.
Synthesis across the 21 included studies in the review identifying recurring application domains (recruitment, performance analytics) and reported efficiency benefits.
Artificial intelligence (AI) is transforming human resource management (HRM) by automating tasks and enabling data-driven decisions.
Statement synthesized from the systematic literature review (PRISMA-based) of global studies on AI applications in HRM included in the paper; no single empirical estimate reported.
AI excels at structured, retrieval-grounded, and tool-mediated tasks.
Paper's synthesized conclusion from cross-stage analysis; appears to be based on qualitative benchmarking and review rather than a specific randomized trial in the excerpt.
Long-horizon agents can execute experiments, draft manuscripts, and simulate critique with minimal human input.
Qualitative claim based on the paper's end-to-end analysis of AI across the research lifecycle (review of developments through April 2026); no specific trials or sample sizes reported in the excerpt.
Fully automated systems can now generate research papers for as little as $15.
Statement in paper's introduction asserting observed market/practice examples and cost estimates; no specific empirical sample or experiment reported in the excerpt.
The results position value realization as the most informative predictive signal in the dataset and provide an interpretable basis for enterprise-level screening and managerial reflection rather than causal inference.
Interpretation based on model importance diagnostics (ai_iot_advantage_share as key predictor) and explicit statement in the paper emphasizing predictive/interpretive use over causal claims.
Nonlinear diagnostics indicate a threshold-like transition in predicted success around the mid-range of advantage attribution and a saturation pattern at higher values.
Partial dependence plots (PDP) and individual conditional expectation (ICE) analyses reported in the paper showing nonlinear relationships between ai_iot_advantage_share and predicted success.
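As a rough illustration of how ICE and PDP diagnostics are computed, the sketch below uses a hypothetical stand-in model, not the paper's fitted model. Each ICE curve varies one feature over a grid for a single observation while holding its other feature fixed. The PDP is the point-wise average of those curves. The function `toy_model`, its logistic shape, and the sample rows are assumptions, chosen only to mimic a mid-range threshold followed by saturation.

```python
import math


def toy_model(advantage_share, other):
    """Hypothetical predictor: a logistic step in the feature of interest."""
    step = 100 / (1 + math.exp(-(advantage_share - 50) / 5))
    return 0.8 * step + 0.2 * other


def ice_curves(predict, rows, feature_index, grid):
    """One curve per row: vary a single feature over the grid, hold the rest fixed."""
    curves = []
    for row in rows:
        curve = []
        for value in grid:
            modified = list(row)
            modified[feature_index] = value
            curve.append(predict(*modified))
        curves.append(curve)
    return curves


def partial_dependence(curves):
    """Point-wise average of the ICE curves."""
    return [sum(column) / len(curves) for column in zip(*curves)]


# Hypothetical observations: (advantage_share, other_feature).
rows = [(30, 10), (55, 40), (80, 70)]
grid = [0, 25, 50, 75, 100]

curves = ice_curves(toy_model, rows, feature_index=0, grid=grid)
pdp = partial_dependence(curves)
for value, avg in zip(grid, pdp):
    print(f"advantage_share={value:>3}: average prediction {avg:.1f}")
```

In this toy setup the average prediction climbs steeply through the middle of the grid and barely moves between 75 and 100, which is the kind of shape the claim describes.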
Across model families, ai_iot_advantage_share emerges as the most stable predictor of reported AI/IoT success.
Feature-importance analyses (permutation importance) and cross-model comparison reported in the paper.
Random Forest achieves the strongest out-of-sample predictive performance and reduces absolute errors relative to Elastic Net for most test observations.
Empirical model comparison using out-of-sample evaluation on the survey dataset (n = 1250); error reduction and relative performance reported in results.
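The phrase "reduces absolute errors ... for most test observations" describes a per-observation comparison rather than an aggregate score. A minimal sketch of that comparison follows. All numbers are hypothetical and the variable names (`rf_pred`, `en_pred`) are illustrative; the paper's actual predictions are not available in this excerpt.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(p - y) for y, p in zip(y_true, y_pred)) / len(y_true)


def share_with_lower_abs_error(y_true, pred_a, pred_b):
    """Fraction of observations where model A's absolute error is strictly smaller than B's."""
    wins = sum(abs(a - y) < abs(b - y) for y, a, b in zip(y_true, pred_a, pred_b))
    return wins / len(y_true)


# Hypothetical held-out outcomes on the 0-100 scale and two models' predictions.
y_test = [20, 45, 60, 80, 95]
rf_pred = [25, 44, 58, 70, 90]   # stand-in for Random Forest predictions
en_pred = [35, 50, 61, 65, 85]   # stand-in for Elastic Net predictions

print(f"MAE (Random Forest stand-in): {mean_absolute_error(y_test, rf_pred):.1f}")
print(f"MAE (Elastic Net stand-in):   {mean_absolute_error(y_test, en_pred):.1f}")
print(f"Share of observations where RF is closer: "
      f"{share_with_lower_abs_error(y_test, rf_pred, en_pred):.0%}")
```

Reporting both figures is useful because a model can have a lower average error while still being worse on a sizeable minority of observations.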
The paper compares a regularized linear baseline (Elastic Net) with nonlinear approaches (Decision Tree and Random Forest) under a consistent out-of-sample evaluation framework.
Methods section: model families listed and out-of-sample evaluation protocol described.
The study uses enterprise survey data from Slovakia and the Czech Republic (n = 1250).
Data description provided in the paper indicating source countries and total sample size.
This study develops and evaluates a firm-level predictive framework for the reported AI/IoT success rate, measured on a bounded 0–100 scale.
Methodological description in the paper: development of a predictive framework and definition of the dependent variable (reported AI/IoT success on 0–100).
Shifting the community's default mindset from optimizing models per task to sampling models from learned weight distributions will accelerate progress toward an era in which AI systems routinely improve or create other AI systems.
Normative/prognostic statement by the authors outlining the paper's intended impact and vision; not supported by empirical data in the abstract.
Adapter-scale and conditional generation are advancing rapidly.
Authors' assessment of the current research trajectory (statement in abstract); implies multiple recent papers showing progress at adapter and conditional scales but no specific quantification in the abstract.
The authors organize existing methods into a five-stage pipeline and survey applications where weight-space generative approaches are already practical.
Descriptive claim about the content and organization of this position paper (methodology and survey); evidence is the paper itself.
High-performing models occupy low-dimensional, highly structured regions of weight space shaped by symmetry, flatness, modularity, and shared subspaces.
Authors' theoretical/empirical contention synthesizing observations from recent work; presented as an explanatory claim in the paper's abstract rather than a specific experimental result.
Recent advances demonstrate that neural weights can be synthesized on demand, often matching fine-tuning performance while reducing adaptation cost by orders of magnitude.
Claim refers to recent empirical work in the literature showing weight-synthesis methods; no specific papers, sample sizes, or quantified studies are cited in the abstract.
Model checkpoints should be treated as a first-class data modality, and generative modeling in weight space should be standardized as a core machine learning primitive.
Normative argument made by the authors in the position paper (proposal/recommendation); not supported by an empirical study in the abstract.
Neural network checkpoints have quietly become a large-scale data resource: millions of trained weight vectors now exist, each encoding task-, domain-, and architecture-specific knowledge.
Statement in the paper's abstract describing the current state of checkpoints; references to public model zoos and industry practice are implied but not enumerated in the abstract.
Casting customer trajectory prediction as a maximum entropy RL problem balances reward maximization with stochasticity to better reflect customers with bounded rationality.
Methodological proposal and conceptual argument in the paper, supported by empirical comparisons that demonstrate more behaviorally realistic trajectories; direct empirical validation referenced but details not included in excerpt.
Trajectories generated by maximum entropy reinforcement learning (RL) align more closely with customer behaviour than those from Travelling Salesman Problem (TSP) and Probabilistic Nearest Neighbours (PNN) heuristics.
Comparison of RL-generated trajectories to TSP and PNN using real-world trajectory data from a convenience store; alignment metrics reported in the paper (specific metrics and sample size not provided in the excerpt).
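The balance between reward maximization and stochasticity in maximum entropy RL can be illustrated with its characteristic policy form, in which action probabilities are proportional to exp(Q / temperature). The sketch below is not the paper's implementation. The action names, Q-values, and temperatures are hypothetical, and the Q-values are taken as given rather than learned.

```python
import math


def soft_policy(q_values, temperature):
    """Boltzmann policy: probabilities proportional to exp(Q / temperature)."""
    best = max(q_values.values())  # subtract the max for numerical stability
    weights = {a: math.exp((q - best) / temperature) for a, q in q_values.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


def entropy(policy):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in policy.values() if p > 0)


# Hypothetical action values for a shopper's next store zone.
q = {"dairy": 2.0, "snacks": 1.5, "checkout": 0.5}

for temperature in (0.05, 1.0, 10.0):
    policy = soft_policy(q, temperature)
    probs = ", ".join(f"{a}={p:.2f}" for a, p in policy.items())
    print(f"temperature={temperature}: {probs} (entropy {entropy(policy):.2f})")
```

A low temperature approaches a deterministic, reward-maximizing choice, while a higher temperature spreads probability across less valuable actions. That spread is one way to represent customers who do not always take the optimal route.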
Deployment of GrowthGR delivered a non-trivial 0.3% gain in overall search GMV.
Reported result from the same production deployment / online A/B testing on Taobao (overall search GMV improvement claimed); no sample size or experimental details provided in the excerpt.
We successfully deployed GrowthGR on Taobao's production platform, achieving a substantial 5.3% lift in new item GMV.
Reported result from a production deployment / online A/B testing on Taobao (deployment and observed lift claimed in paper); no sample size or experimental details provided in the excerpt.
The Multi-Value-Aware Generative Retrieval (MultiGR) module, built on a semantic-ID-based generative retrieval architecture, leverages structured samples with search cascade signals and adopts a Multi-Value-Aware Policy Optimization (MoPO) training paradigm to align with multi-stage online values while explicitly balancing short-term transactional value and long-term growth potential estimated by ItemLTV.
Methodological description in the paper (design of MultiGR and MoPO); no empirical results cited in this sentence.
The Item Long-term Transaction Value Prediction (ItemLTV) module employs counterfactual inference to quantify the long-term value increment attributable to a single user interaction.
Methodological description in the paper (design of ItemLTV module); no experimental quantification provided in excerpt.
We propose a Multi-Value-Aware retrieval framework (GrowthGR) tailored for e-commerce search, designed to better align with the cascaded online values across different stages of the search system while balancing immediate conversion and long-term item growth.
Methodological contribution described in the paper (system/algorithm proposal); no empirical evaluation details in this sentence.
Root-cause-diagnosis accuracy rises from 75% to 100% when agents have causal grounding (Causely) in the active-fault scenario.
Reported diagnostic accuracy rates from benchmark experiments comparing runs without Causely (75%) and with Causely (100%) on the active-fault scenario.
Causal grounding lowers direct API cost per run by 57%.
Reported percent reduction in direct API cost per run from benchmark experiments with vs. without Causely.
Causely compresses the investigation footprint by 4.8× (in the active-fault scenario).
Reported multiplicative reduction in investigation footprint from benchmark experiments comparing runs with vs. without Causely.
On the active-fault scenario, causal grounding reduces mean tool-call count by 78%.
Benchmark experiment results comparing tool-call counts with vs. without Causely on the active-fault scenario (quantitative percentage reported in paper).
On the active-fault scenario, causal grounding reduces mean token consumption by 60%.
Benchmark experiment results comparing token consumption with vs. without Causely on the active-fault scenario (quantitative percentage reported in paper).
On the active-fault scenario, causal grounding reduces mean time-to-diagnosis by 63%.
Benchmark experiment results comparing agent runs with vs. without Causely on the active-fault scenario (quantitative percentage reported in paper).
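The benchmark figures above mix percentage reductions (63%, 60%, 78%, 57%) with one multiplicative factor (4.8×). The two forms are interchangeable, and the sketch below converts between them so the results can be compared on one scale. The conversion is plain arithmetic; the function names are illustrative.

```python
def fold_from_reduction(reduction):
    """A fractional reduction r leaves (1 - r) of the baseline, i.e. a 1 / (1 - r) fold cut."""
    return 1 / (1 - reduction)


def reduction_from_fold(fold):
    """An N-fold compression leaves 1 / N of the baseline, i.e. a 1 - 1 / N reduction."""
    return 1 - 1 / fold


# Percentage reductions as reported in the claims above.
reported = {
    "time-to-diagnosis": 0.63,
    "token consumption": 0.60,
    "tool-call count": 0.78,
    "direct API cost per run": 0.57,
}

for metric, reduction in reported.items():
    print(f"{metric}: {reduction:.0%} lower = {fold_from_reduction(reduction):.1f}x smaller")

print(f"investigation footprint: 4.8x smaller = {reduction_from_fold(4.8):.0%} lower")
```

On a common scale, the 4.8× footprint compression corresponds to a reduction of roughly 79%, close to the 78% reduction in mean tool-call count.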
Causely transforms raw telemetry into a live, queryable model providing the semantic and causal foundation AI agents require to diagnose, evaluate impact, and act safely in production.
System design / implementation claim in the paper (supported by downstream benchmark evaluation described elsewhere in the paper).
Causely is a causal intelligence layer that maintains a structured representation of environment topology, attribute dependencies, and causal relationships anchored to an ontological representation of the managed environment.
System design / implementation claim in the paper describing the Causely architecture.
Major deployed generative AI advertising systems preserve a visible boundary between commercial content and AI-generated responses.
Descriptive claim based on review of major deployed systems and design patterns; stated by authors as observed industry practice (no specific sample size or experiments reported in text).
Managerially, firms should pair GenAI access with short AIC micro-training and simple standard operating procedures (SOPs) to capture value consistently and avoid uneven adoption outcomes.
Authors' managerial recommendation drawn from experimental findings that AIC predicts gains and that scaffolding reduces variance; recommendation is an interpretation/synthesis rather than a directly tested organizational field intervention.
A scaffolding intervention (conceptual maps) reduced outcome variance, indicating that standardized workflows can mitigate inequality in AI-mediated performance.
Experimental inclusion of a scaffolding intervention (conceptual maps) and reported reduction in variance of outcomes among participants receiving scaffolding in conjunction with GenAI access.
Improvements were not predicted by GPA or prior knowledge, but were predicted by AI Interaction Competence (AIC) — the ability to elicit, filter, and verify model outputs.
Regression/subgroup analyses reported in the experiment linking improvements in task performance to measured predictors (GPA, prior knowledge, AIC); authors report null association for GPA/prior knowledge and positive association for AIC.
On average, GenAI access significantly increased task performance.
Reported randomized controlled experiment comparing task performance between LLM-assisted group and traditional-resources group; authors state the average increase was statistically significant.
Macro-level policy shocks activate market discipline in emerging-market debt markets, as illustrated by the observed penalty on AI washing firms.
Interpretive conclusion based on the observed post-FYP increases in debt financing costs for AI washing firms and associated analyses (inference from empirical results; not a direct test reported in the abstract).
Supply chain concentration and bank proximity attenuate the debt-cost penalty for AI washing firms.
Heterogeneity/interaction analyses indicating smaller post-shock financing-cost increases for AI washing firms with concentrated supply chains and closer bank relationships (moderator evidence; no sample sizes in abstract).
External validation shows this decoupling reflects strategic deception (AI washing) — evidenced by subsidy extraction and future regulatory violations — rather than benign ambition, supporting its validity as an AI washing proxy.
External validation analyses linking the residual decoupling to observed subsidy extraction and to higher incidence of future regulatory violations (validation methods described; sample size not provided in abstract).
The policy architecture required to escape the trap (targeting trust, sequencing, and team-level adoption) is characterised.
Model-derived policy prescriptions identifying interventions (trust-building, sequencing, team-level targeting) necessary to shift equilibria toward genuine adoption; theoretical argumentation. No empirical trial or sample.
Conditions are derived under which sustained but imperfect adoption pressure is welfare-improving.
Analytical derivation within the model framework characterising parameter regions where persistent imperfect adoption increases welfare (model-defined welfare metric). Theoretical analysis; no empirical sample.
A cost ratchet dynamic implies that failed adoption attempts permanently lower barriers even when embedding fails.
Model component introducing a cost-ratcheting mechanism; analytical/simulation results showing permanent barrier reductions following failed attempts. Theoretical model; no empirical sample.
Genflow establishes a robust framework for scalable, enterprise-grade generative systems.
Concluding claim in paper based on the proposed architecture and reported yield improvement; no broader deployment studies, scalability benchmarks, or enterprise trials detailed in this statement.
By transitioning to a multi-stage, self-correcting pipeline, Genflow improved the yield of brand-compliant video generations from 42% to 89%.
Reported empirical result comparing yield of brand-compliant video generations before and after applying Genflow; no sample size, dataset description, statistical significance, or experimental protocol provided in the text.
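The reported change from 42% to 89% can be expressed in several equivalent ways, sketched below. The percentage-point gain, relative lift, and failure rates follow directly from the two reported figures. The expected number of attempts per compliant video additionally assumes that each generation attempt succeeds independently with the stated probability, which the text does not establish.

```python
def yield_comparison(before, after):
    """Restate a before/after yield (as fractions) in several equivalent forms."""
    return {
        "point_gain": (after - before) * 100,   # percentage points
        "relative_lift": after / before,        # ratio of yields
        "failure_before": 1 - before,           # non-compliant share before
        "failure_after": 1 - after,             # non-compliant share after
        # Assumption: independent attempts, so the expected number of attempts
        # per compliant output is 1 / yield (mean of a geometric distribution).
        "attempts_before": 1 / before,
        "attempts_after": 1 / after,
    }


stats = yield_comparison(0.42, 0.89)
print(f"gain: {stats['point_gain']:.0f} percentage points")
print(f"relative lift: {stats['relative_lift']:.2f}x")
print(f"non-compliant share: {stats['failure_before']:.0%} to {stats['failure_after']:.0%}")
print(f"expected attempts per compliant video: "
      f"{stats['attempts_before']:.2f} to {stats['attempts_after']:.2f}")
```

Without a sample size or experimental protocol, none of these restatements says anything about the statistical reliability of the underlying figures.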