Evidence (13661 claims)
| Topic | Claims |
|---|---|
| Adoption | 8339 |
| Productivity | 7479 |
| Governance | 6715 |
| Human-AI Collaboration | 6267 |
| Org Design | 4098 |
| Innovation | 3987 |
| Labor Markets | 3488 |
| Skills & Training | 2888 |
| Inequality | 2016 |
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 740 | 192 | 95 | 871 | 1945 |
| Governance & Regulation | 796 | 388 | 185 | 119 | 1512 |
| Organizational Efficiency | 765 | 186 | 123 | 82 | 1166 |
| Technology Adoption Rate | 610 | 227 | 121 | 95 | 1061 |
| Research Productivity | 409 | 121 | 56 | 331 | 928 |
| Output Quality | 464 | 174 | 58 | 47 | 743 |
| Decision Quality | 318 | 173 | 75 | 42 | 615 |
| Firm Productivity | 432 | 55 | 88 | 20 | 601 |
| AI Safety & Ethics | 214 | 273 | 65 | 33 | 589 |
| Market Structure | 175 | 165 | 120 | 24 | 489 |
| Task Allocation | 206 | 64 | 70 | 31 | 376 |
| Skill Acquisition | 161 | 57 | 57 | 16 | 291 |
| Innovation Output | 201 | 27 | 41 | 18 | 288 |
| Fiscal & Macroeconomic | 130 | 69 | 43 | 26 | 275 |
| Employment Level | 104 | 50 | 105 | 13 | 274 |
| Consumer Welfare | 116 | 62 | 42 | 11 | 231 |
| Firm Revenue | 149 | 45 | 26 | 3 | 223 |
| Inequality Measures | 43 | 120 | 49 | 6 | 218 |
| Task Completion Time | 164 | 29 | 8 | 12 | 214 |
| Worker Satisfaction | 89 | 60 | 20 | 12 | 181 |
| Error Rate | 69 | 89 | 9 | 2 | 169 |
| Regulatory Compliance | 74 | 67 | 14 | 4 | 159 |
| Training Effectiveness | 91 | 19 | 13 | 19 | 144 |
| Wages & Compensation | 77 | 33 | 25 | 6 | 141 |
| Team Performance | 86 | 17 | 27 | 9 | 140 |
| Automation Exposure | 49 | 50 | 22 | 12 | 136 |
| Developer Productivity | 91 | 17 | 14 | 5 | 128 |
| Job Displacement | 12 | 80 | 19 | 1 | 112 |
| Hiring & Recruitment | 51 | 7 | 8 | 3 | 69 |
| Creative Output | 31 | 16 | 7 | 2 | 57 |
| Skill Obsolescence | 5 | 43 | 6 | 1 | 55 |
| Social Protection | 27 | 16 | 8 | 2 | 53 |
| Labor Share of Income | 17 | 17 | 17 | — | 51 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
The Process Performance Index (PI) is positively associated with abnormal earnings.
Empirical regression results using the panel dataset (3,515 firms; 20,076 firm-year observations) reporting a positive association between PI and abnormal earnings.
A Process Performance Index (PI) is constructed to measure AI-enabled operational capability across resource allocation efficiency, coordination effectiveness, and production performance dimensions.
Authors describe construction of PI using multi-dimensional indicators and the AHP–EWM weighting plus FCE aggregation procedure.
This study proposes a data-driven evaluation framework that integrates the Feltham–Ohlson enterprise value assessment with a multi-level performance evaluation framework (hybrid AHP–EWM weighting and Fuzzy Comprehensive Evaluation aggregation) to quantify the impact of AI on industrial process performance and enterprise value creation.
Methodological description in the paper: authors describe integrating Feltham–Ohlson valuation with AHP–EWM weighting and FCE aggregation to form a unified evaluation framework.
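The weighting and aggregation steps named above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the indicator scores, the AHP weights, the multiplicative rule for combining AHP and entropy weights, and the membership matrix are all assumptions made for the example.

```python
import math

def entropy_weights(matrix):
    """Entropy Weight Method (EWM).

    matrix[i][j] is firm i's positive score on indicator j.
    Needs at least two firms and at least one non-constant indicator.
    """
    n = len(matrix)
    m = len(matrix[0])
    k = 1.0 / math.log(n)
    divergences = []
    for j in range(m):
        column = [row[j] for row in matrix]
        total = sum(column)
        probs = [v / total for v in column]
        entropy = -k * sum(p * math.log(p) for p in probs if p > 0)
        divergences.append(1.0 - entropy)  # more dispersion gives more weight
    norm = sum(divergences)
    return [d / norm for d in divergences]

def combine_weights(ahp, ewm):
    """Assumed hybrid rule: normalized product of AHP and EWM weights."""
    products = [a * e for a, e in zip(ahp, ewm)]
    norm = sum(products)
    return [p / norm for p in products]

def fuzzy_evaluate(weights, membership):
    """Fuzzy Comprehensive Evaluation with the weighted-average operator.

    membership[j][g] is indicator j's degree of membership in grade g.
    """
    grades = len(membership[0])
    return [sum(w * row[g] for w, row in zip(weights, membership))
            for g in range(grades)]

# Hypothetical scores for three firms on three indicators:
# resource allocation, coordination, production performance.
scores = [[0.6, 0.8, 0.5],
          [0.7, 0.4, 0.9],
          [0.9, 0.6, 0.7]]
ahp_weights = [0.5, 0.3, 0.2]          # hypothetical expert (AHP) weights
ewm_weights = entropy_weights(scores)
weights = combine_weights(ahp_weights, ewm_weights)

# Hypothetical membership of one firm in the grades (low, medium, high).
membership = [[0.1, 0.3, 0.6],
              [0.2, 0.5, 0.3],
              [0.0, 0.4, 0.6]]
grade_vector = fuzzy_evaluate(weights, membership)
```

Because the combined weights sum to one and each membership row sums to one, `grade_vector` also sums to one and can be read as the firm's distribution over performance grades.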
New tendencies in managerial AI research and practice include explainable AI, human–AI collaboration, knowledge management, enterprise analytics, and algorithmic management.
Descriptive finding from the paper's literature synthesis (topics emphasized in the review); no quantitative prevalence or counts provided in the abstract.
Machine Learning and Deep Learning enhance employee productivity, business intelligence, process mining, and data-driven decision-making by enabling prediction, perception, and adaptive learning solutions.
Claim synthesized in the review from multiple studies identified via PRISMA screening; abstract does not list the number or identity of underlying empirical studies.
AI-based technologies can greatly enhance managerial efficiency by automating repetitive activities, improving resource allocation, enabling intelligent scheduling, and supporting predictive modelling and strategic planning.
Summary conclusion from the paper's literature review (PRISMA methodology referenced); no quantitative meta-analytic effect sizes provided in abstract.
Machine Learning, Artificial Intelligence, and Deep Learning are tools that can optimize managerial decisions, enable intelligent automation, streamline workflows, and improve organizational performance.
Synthesis claim from the paper's PRISMA-based literature review (no numeric sample size reported in the abstract).
The paper ends with strategic suggestions to foster inclusive growth and orchestrate disruption, contributing evidence-based insights to the future of work in Africa.
Description of the paper's conclusions/recommendations drawn from its systematic review; represents the paper's stated contribution rather than an empirical claim about external data.
AI and automation technologies are capable of raising productivity.
Synthesis from the paper's systematic review indicating productivity gains associated with AI/automation in the literature; no quantified meta-analytic estimate provided in the summary.
Future research should explore hybrid frameworks that combine LLM reasoning with quantitative optimization for cost-sensitive environments.
Recommendation in conclusion based on observed results (LLMs perform reasonably but lag optimized methods and transaction costs matter).
A transaction cost analysis revealed that low-turnover LLM strategies retain their competitiveness post-costs, surpassing cap-weighted benchmarks.
Post-transaction-cost analysis reported in results: LLM strategies with low turnover remained competitive after applying transaction cost assumptions and exceeded performance of cap-weighted benchmark.
LLM-generated portfolios outperformed naive diversification (Sharpe ratio up to 0.741).
Backtest results comparing LLM-generated portfolios against naive diversification; reported Sharpe ratio value (up to 0.741) for LLM strategies.
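To make the Sharpe ratio and post-cost comparison concrete, the sketch below computes an annualized Sharpe ratio and subtracts a proportional transaction cost scaled by turnover. The return series, turnover levels, cost rate, and monthly frequency are illustrative assumptions; none of them come from the paper.

```python
import math
import statistics

def sharpe_ratio(returns, risk_free=0.0, periods_per_year=12):
    """Annualized Sharpe ratio from per-period returns (sample standard deviation)."""
    excess = [r - risk_free for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess) * math.sqrt(periods_per_year)

def net_returns(gross_returns, turnovers, cost_rate):
    """Subtract a proportional cost: cost_rate times the fraction of the portfolio traded."""
    return [r - cost_rate * t for r, t in zip(gross_returns, turnovers)]

# Hypothetical monthly gross returns for one strategy.
gross = [0.02, -0.01, 0.03, 0.01, -0.005, 0.015]
cost_rate = 0.002                      # assumed 20 basis points per unit of turnover

low_turnover = [0.05] * len(gross)     # trades 5% of the portfolio each month
high_turnover = [0.80] * len(gross)    # trades 80% of the portfolio each month

sharpe_gross = sharpe_ratio(gross)
sharpe_low = sharpe_ratio(net_returns(gross, low_turnover, cost_rate))
sharpe_high = sharpe_ratio(net_returns(gross, high_turnover, cost_rate))
```

With identical gross returns, the low-turnover variant keeps more of its Sharpe ratio after costs, which is the mechanism behind the claim that low-turnover strategies stay competitive post-costs.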
Findings underscore the importance of robust evaluation frameworks for deploying VLMs in visually rich and safety-critical environments.
Synthesis/recommendation based on experimental results showing that visual inputs (images and colors) can influence VLM decisions and that mitigation effectiveness varies by model.
Policy frameworks, reskilling initiatives, and institutional adaptations are required to ensure inclusive technological progress.
Prescriptive conclusion presented in abstract based on the review and synthesis; no empirical validation or sample sizes provided in abstract.
AI simultaneously generates demand for higher-order problem solving, emotional intelligence, and human-AI collaboration skills.
Explicit finding reported in abstract from the review of interdisciplinary literature; no quantified effect sizes or sample sizes provided in abstract.
The majority of AI’s effect on potential GDP in the period under review was due to increased labor productivity and the optimization of existing processes.
Attribution/decomposition within the scenario analysis of aggregated industry data indicating productivity and process-optimization channels as principal contributors.
Artificial intelligence has become a significant factor in the growth of Russia’s potential GDP.
Findings reported from the scenario analysis and aggregated industry data reviewed in the paper and syntheses of Russian analytical sources.
AI implementation during 2023–2025 was accompanied by a positive contribution to Russia’s potential GDP.
Analysis of aggregated industry data and a scenario approach using Russian-language sources (Ministry of Digital Development, HSE, Digital Economy ANO, analytical reviews).
For memory workloads requiring stable facts and stateful computation, architecture matters more than retrieval scale or model strength alone.
Conclusion drawn by the authors based on comparative experimental results reported in the paper (xmemory vs retrieval/model-strength baselines); excerpt provides aggregate benchmark comparisons but not full experimental details.
On the application-level task, xmemory reaches 95.2% accuracy, outperforming specialised memory systems, code-generated Markdown harnesses, and customer-facing frontier-model application harnesses.
Empirical evaluation on an application-level task reported in the paper showing 95.2% accuracy for xmemory and claiming it outperforms several classes of alternative systems; excerpt lacks details on the task, dataset size, or baseline numeric results.
On the end-to-end memory benchmark, xmemory reaches 97.10% F1, compared with 80.16%-87.24% across the third-party baselines.
Empirical evaluation on the paper's end-to-end memory benchmark reporting F1 scores for xmemory and a range for third-party baselines; the excerpt does not provide dataset size or statistical significance details.
On the structured extraction benchmark (judge-in-the-loop configuration) the system reaches 90.42% object-level accuracy and 62.67% output accuracy, above all tested frontier structured-output baselines.
Empirical evaluation on the paper's structured extraction benchmark in the judge-in-the-loop configuration; the excerpt reports the numeric accuracies and states they exceed tested frontier structured-output baselines. The excerpt does not specify dataset size or number of runs.
This iterative, schema-aware write-path design shifts interpretation from the read path to the write path: reads become constrained queries over verified records rather than repeated inference over retrieved prose.
Conceptual claim about how the proposed architecture affects system behavior; supported by the architectural description in the paper rather than explicit quantitative evidence in the excerpt.
We present an iterative, schema-aware write path that decomposes memory ingestion into object detection, field detection, and field-value extraction, with validation gates, local retries, and stateful prompt control.
Description of the proposed method/architecture in the paper (methodological contribution); no numeric evaluation attached to the description in the excerpt.
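The three-stage write path described above can be sketched as below. This is a toy illustration under stated assumptions, not xmemory's implementation: regular expressions stand in for model calls, and `ObjectSchema`, `FieldSpec`, `PATTERNS`, and every function name are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Hypothetical patterns standing in for model-based detection and extraction.
PATTERNS = {
    "email": r"[\w.]+@[\w.]+\w",
    "age": r"\b(\d{1,3}) years old",
}

@dataclass
class FieldSpec:
    name: str
    validate: Callable[[str], bool]   # validation gate for an extracted value

@dataclass
class ObjectSchema:
    name: str
    trigger: str                      # keyword standing in for object detection
    fields: list

def detect_fields(text, schema):
    """Stage 2: keep only the schema fields that appear to be present."""
    return [f for f in schema.fields if re.search(PATTERNS[f.name], text)]

def extract_value(text, field_name, attempt):
    """Stage 3: extract one value. A real system might vary the prompt per attempt."""
    match = re.search(PATTERNS[field_name], text)
    if match is None:
        return None
    return match.group(match.lastindex or 0)

def extract_with_retries(text, spec, max_retries=2):
    """Local retry for a single field; returns None instead of inferring a value."""
    for attempt in range(max_retries + 1):
        value = extract_value(text, spec.name, attempt)
        if value is not None and spec.validate(value):
            return value
    return None

def write_path(text, schemas):
    records = []
    for schema in schemas:
        if schema.trigger not in text.lower():   # Stage 1: object detection
            continue
        record = {"object": schema.name}
        for spec in detect_fields(text, schema):
            value = extract_with_retries(text, spec)
            if value is not None:                # only validated values are stored
                record[spec.name] = value
        records.append(record)
    return records

CUSTOMER = ObjectSchema("customer", "customer", [
    FieldSpec("email", lambda v: "@" in v),
    FieldSpec("age", lambda v: v.isdigit() and 0 < int(v) < 130),
])

good = write_path("Customer Ana, 34 years old, can be reached at ana@example.com.", [CUSTOMER])
bad = write_path("Customer Bob, 340 years old.", [CUSTOMER])
```

In the second call the implausible age fails its validation gate on every retry, so the field is left out of the record rather than guessed.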
Reliable external AI memory must be schema-grounded (schemas define what must be remembered, what may be ignored, and which values must never be inferred).
Normative assertion supported by the paper's proposed design and subsequent experimental results (the paper introduces a schema-grounded approach and evaluates it against benchmarks), though the excerpt does not give full methodological details or sample sizes for this claim alone.
To manage AI legibility, creators perform four recurring forms of invisible authenticity labor: epistemic verification, linguistic naturalization, narrative restructuring, and performative embodiment.
Authors identify and name four recurrent practices from coding and analysis of 16 in-depth interviews with creators on Xiaohongshu and Douyin describing specific downstream repair and performance work.
Creators engage in 'AI passing': strategic efforts to conceal and humanize AI-assisted drafts so that outputs plausibly appear human-authored.
Concept introduced based on analysis of 16 in-depth interviews with creators on Xiaohongshu and Douyin describing tactics to hide AI involvement and present content as human-authored.
Latency relaxation expands feasible geography for placing inference workloads.
Result reported from the paper's modeling and stylized simulation (energy-latency frontier analysis showing marginal cost/carbon benefits from relaxing latency budgets).
The paper provides a transparent stylized simulation over representative global compute regions to show how heterogeneous latency tolerance separates workloads into local, regional, and energy-oriented execution layers.
Empirical/methodological evidence from a stylized simulation described in the paper; uses representative global compute regions and latency-tolerance heterogeneity to categorize workloads.
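The idea that a looser latency budget enlarges the set of feasible regions can be illustrated with a small sketch. The regions, latencies, and carbon intensities below are invented for the example and are not the paper's simulation inputs; `Region`, `feasible_regions`, and `best_region` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Region:
    name: str
    latency_ms: float          # round-trip latency from the user to the region
    carbon_g_per_kwh: float    # grid carbon intensity

# Hypothetical regions: all figures are illustrative assumptions.
REGIONS = [
    Region("local", 10, 450),
    Region("regional", 60, 300),
    Region("hydro-remote", 180, 30),
]

def feasible_regions(regions, latency_budget_ms):
    """Regions that can serve a workload within its latency budget."""
    return [r for r in regions if r.latency_ms <= latency_budget_ms]

def best_region(regions, latency_budget_ms):
    """Lowest-carbon feasible region, or None when no region meets the budget."""
    candidates = feasible_regions(regions, latency_budget_ms)
    if not candidates:
        return None
    return min(candidates, key=lambda r: r.carbon_g_per_kwh)

# Workloads with different latency tolerance land in different execution layers.
interactive = best_region(REGIONS, 20)     # only the local region is feasible
standard = best_region(REGIONS, 100)       # local and regional are feasible
batch = best_region(REGIONS, 500)          # every region is feasible
```

Each increase in the budget can only add candidates, so the chosen carbon intensity never gets worse as latency tolerance grows.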
AI inference is becoming a persistent and geographically distributed source of electricity demand.
Statement/assertion in the paper's introduction framing the motivation; no empirical sample or experiment reported in the provided text.
Effective governance requires coordinated action across technical, organizational, and regulatory domains (e.g., system-level audits, vendor guidelines, continuous monitoring, documentation across dependency chains) to establish meaningful accountability in distributed development environments.
Policy and technical recommendations derived from literature review, regulatory analysis, and the paper's conceptual findings (recommendation, not empirically validated).
Claw-Eval-Live suggests that workflow-agent evaluation should be grounded twice, in fresh external demand and in verifiable agent action.
Conclusion/recommendation drawn from the benchmark design and experimental findings; conceptual claim advocating evaluation grounded in external demand signals and verifiable actions.
The release contains 105 tasks spanning controlled business services and local workspace repair, and evaluates 13 frontier models under a shared public pass rule.
Benchmark release statistics reported in the paper: explicit counts of tasks and evaluated models (105 tasks; 13 models).
For grading, Claw-Eval-Live records execution traces, audit logs, service state, and post-run workspace artifacts, using deterministic checks when evidence is sufficient and structured LLM judging only for semantic dimensions.
Grading methodology described in the paper: instrumentation and hybrid deterministic/LLM-judging approach documented by authors (procedural description).
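A hybrid grader of the kind described above might look like the sketch below. It is an assumption-laden illustration rather than the benchmark's code: the `Check` type, the evidence layout, the stub judge, and the all-checks-must-pass rule are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Check:
    name: str
    # Returns True or False when the recorded evidence is sufficient, None otherwise.
    deterministic: Optional[Callable[[dict], Optional[bool]]]
    semantic: bool = False     # only semantic dimensions may fall back to an LLM judge

def grade(evidence, checks, llm_judge):
    """Prefer deterministic checks; call the judge only for semantic dimensions."""
    results = {}
    for check in checks:
        verdict = check.deterministic(evidence) if check.deterministic else None
        if verdict is None and check.semantic:
            verdict = llm_judge(check.name, evidence)
        results[check.name] = bool(verdict)   # an undecided non-semantic check fails
    return results

# Hypothetical evidence recorded after an agent run.
evidence = {
    "service_state": {"invoice_42": "paid"},
    "workspace_files": ["report.md"],
    "audit_log": ["POST /invoices/42/pay"],
}

judge_calls = []

def stub_judge(dimension, evidence):
    """Stand-in for a structured LLM judge; records which dimensions it was asked about."""
    judge_calls.append(dimension)
    return True

checks = [
    Check("invoice_paid", lambda e: e["service_state"].get("invoice_42") == "paid"),
    Check("report_exists", lambda e: "report.md" in e["workspace_files"]),
    Check("summary_is_faithful", None, semantic=True),
]

results = grade(evidence, checks, stub_judge)
passed = all(results.values())   # assumed pass rule: every check must succeed
```

Keeping the judge behind the `semantic` flag means state and file checks stay reproducible, and model-based judging is confined to dimensions that cannot be verified mechanically.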
Each release is constructed from public workflow-demand signals, with ClawHub Top-500 skills used in the current release, and materialized as controlled tasks with fixed fixtures, services, workspaces, and graders.
Description of release construction in the methods: uses public workflow-demand data and ClawHub Top-500 skills; tasks are materialized with controlled fixtures and graders (procedural detail from the paper).
We introduce Claw-Eval-Live, a live benchmark for workflow agents that separates a refreshable signal layer, updated across releases from public workflow-demand signals, from a reproducible, time-stamped release snapshot.
Methodological contribution described in the paper; design and architecture of the benchmark are presented by the authors (design description, no external sample needed).
LLM agents are expected to complete end-to-end units of work across software tools, business services, and local workspaces.
Framing/background statement in the paper describing expected capabilities of workflow agents; no empirical sample size reported for this expectation.
Substantive technological competencies play an important role in shaping network resilience and complement structure-based perspectives in understanding innovation networks.
Synthesis of empirical findings from composite metric identification and disruption simulations on the 282,778-patent-derived networks showing capability-based removals have stronger impacts than structure-only removals.
A composite technological capability metric can be constructed (from textual and network information) to identify core innovators beyond simple topological measures.
Construction and application of a composite metric combining text-derived technological value and network features on 282,778 patents; used to identify core innovators.
Latent Dirichlet Allocation (LDA) on the patent texts delineates fine-grained technological domains within the Chinese AI patent corpus.
Text-mining method applied to a corpus of 282,778 Chinese AI patents using LDA to extract topic/domains.
This study develops a multidimensional, knowledge-driven evaluation framework that integrates text mining with complex network analysis to identify core innovators.
Methodological description: framework built using Latent Dirichlet Allocation (LDA) on 282,778 Chinese AI patents, construction of a composite technological capability metric, and simulation of targeted disruptions across collaboration and knowledge networks.
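The targeted-disruption idea can be sketched on a toy collaboration network. The graph, the technological-value scores (which in the study would derive from LDA topics), the equal blend in `composite_score`, and the use of largest-component size as the resilience measure are all assumptions for illustration. The example shows the mechanics only and does not reproduce the study's results.

```python
from collections import deque

def largest_component_size(adj, removed):
    """Size of the largest connected component after deleting the removed nodes."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        best = max(best, size)
    return best

def composite_score(adj, tech_value, alpha=0.5):
    """Assumed blend of text-derived technological value and normalized degree."""
    max_degree = max(len(neighbours) for neighbours in adj.values())
    return {v: alpha * tech_value[v] + (1 - alpha) * len(adj[v]) / max_degree
            for v in adj}

def targeted_removal(adj, ranking, k):
    """Remove the k highest-ranked nodes and report the surviving largest component."""
    removed = sorted(adj, key=lambda v: ranking[v], reverse=True)[:k]
    return removed, largest_component_size(adj, removed)

# Hypothetical undirected collaboration network (adjacency sets).
adj = {
    "A": {"B", "C", "D"},
    "B": {"A"},
    "C": {"A"},
    "D": {"A", "E"},
    "E": {"D", "F"},
    "F": {"E"},
}
# Hypothetical text-derived technological value per innovator.
tech_value = {"A": 0.2, "B": 0.1, "C": 0.1, "D": 0.9, "E": 0.4, "F": 0.3}

degree = {v: len(neighbours) for v, neighbours in adj.items()}
by_degree = targeted_removal(adj, degree, 1)
by_capability = targeted_removal(adj, composite_score(adj, tech_value), 1)
```

Here the two rankings pick different innovators ("A" by degree, "D" by the composite score), which is the point of a capability-based metric: it can surface core actors that a purely topological measure would rank lower.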
Managing evolutionary dynamics in software is as urgent as AGI alignment for safeguarding society’s co-evolution with its machines.
Author's concluding normative claim in the abstract; argument based on scenario analysis rather than comparative empirical evidence.
Governance should shift focus from aligning goals to steering evolution; the paper proposes four guidance instruments: replication-rate thresholds (modeled on epidemiological R0), a public vulnerability registry for self-modifying code, tiered digital biosafety levels, and adaptive regulatory sandboxes.
Normative policy recommendation spelled out in the abstract; based on the paper's scenario analysis and argumentation rather than empirical validation.
Cloud platforms, open-source software supply chains, and crypto-economic incentives provide, at electronic speed, the three preconditions of evolution: replication, variation, and differential fitness.
Conceptual/mechanistic claim supported by theoretical argumentation and scenario-building in the paper (no empirical test or sample reported).
The proposed framework balances AI-driven productivity with the epistemic sovereignty necessary to manage increasingly opaque software ecosystems.
Normative/architectural claim about the proposed framework; presented conceptually in the paper without reported empirical testing in the excerpt.
To preserve long-term resilience, engineering leaders must move beyond prompt-based development to implement rigorous human-in-the-loop pedagogical standards.
Prescriptive recommendation based on the paper's conceptual analysis; no randomized trials or empirical validation of this intervention reported in the excerpt.
The findings offer practical insights for construction firms to enhance innovation performance through effective AI integration and help engineers better leverage AI tools in design and project management workflows.
Authors' stated practical implications based on their empirical findings (survey results linking AI capability, decision-making quality, and innovation performance).
Algorithmic transparency positively moderates the relationship between AI capability and decision-making quality.
Moderation analysis reported on questionnaire data (Credamo, time-lagged) with n=435; authors state a positive moderating effect of algorithmic transparency.
Decision-making quality mediates the relationship between AI capability and innovation performance.
Mediation analysis reported on the same survey dataset (time-lagged Credamo survey) with n=435 using established measurement scales; stated in results.
AI capability is positively associated with innovation performance.
Authors report statistical analysis of questionnaire data collected via the Credamo platform (time-lagged design) using established scales; sample size n=435; result stated in findings.
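The three survey findings above fit a standard first-stage moderated mediation structure, which can be written as two regression equations. This is a conventional formulation offered for orientation; the excerpt does not state which estimation procedure the authors used, and the symbols are assumptions: $X$ is AI capability, $M$ decision-making quality, $Y$ innovation performance, and $W$ algorithmic transparency.

```latex
\begin{align}
M &= a_0 + a_1 X + a_2 W + a_3 (X \times W) + \varepsilon_M \\
Y &= b_0 + c' X + b_1 M + \varepsilon_Y
\end{align}
```

Under this formulation, positive moderation corresponds to $a_3 > 0$, and mediation corresponds to a nonzero conditional indirect effect $(a_1 + a_3 W)\, b_1$.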