Evidence (8570 claims)
Adoption
8570 claims
Productivity
7631 claims
Governance
6869 claims
Human-AI Collaboration
6491 claims
Org Design
4175 claims
Innovation
4114 claims
Labor Markets
3566 claims
Skills & Training
2966 claims
Inequality
2066 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 758 | 199 | 100 | 900 | 2007 |
| Governance & Regulation | 826 | 400 | 191 | 122 | 1563 |
| Organizational Efficiency | 777 | 193 | 124 | 84 | 1189 |
| Technology Adoption Rate | 635 | 233 | 124 | 97 | 1098 |
| Research Productivity | 422 | 128 | 57 | 336 | 954 |
| Output Quality | 476 | 179 | 59 | 47 | 761 |
| Decision Quality | 328 | 177 | 81 | 47 | 640 |
| Firm Productivity | 435 | 57 | 88 | 20 | 606 |
| AI Safety & Ethics | 218 | 277 | 65 | 33 | 599 |
| Market Structure | 180 | 170 | 123 | 24 | 502 |
| Task Allocation | 213 | 64 | 72 | 33 | 387 |
| Skill Acquisition | 170 | 61 | 61 | 17 | 309 |
| Innovation Output | 203 | 27 | 43 | 18 | 292 |
| Employment Level | 105 | 54 | 107 | 13 | 281 |
| Fiscal & Macroeconomic | 131 | 69 | 43 | 26 | 276 |
| Consumer Welfare | 117 | 63 | 42 | 11 | 233 |
| Firm Revenue | 153 | 48 | 26 | 3 | 230 |
| Task Completion Time | 173 | 31 | 8 | 12 | 225 |
| Inequality Measures | 44 | 122 | 49 | 6 | 221 |
| Worker Satisfaction | 89 | 65 | 22 | 12 | 188 |
| Error Rate | 69 | 92 | 10 | 2 | 173 |
| Regulatory Compliance | 77 | 69 | 14 | 5 | 165 |
| Automation Exposure | 56 | 56 | 26 | 13 | 154 |
| Training Effectiveness | 94 | 21 | 13 | 19 | 149 |
| Wages & Compensation | 77 | 36 | 25 | 6 | 144 |
| Team Performance | 86 | 17 | 27 | 10 | 141 |
| Developer Productivity | 95 | 17 | 14 | 6 | 133 |
| Job Displacement | 12 | 80 | 20 | 1 | 113 |
| Hiring & Recruitment | 52 | 7 | 8 | 3 | 70 |
| Creative Output | 31 | 18 | 8 | 3 | 61 |
| Skill Obsolescence | 5 | 46 | 6 | 1 | 58 |
| Social Protection | 27 | 16 | 8 | 2 | 53 |
| Labor Share of Income | 17 | 19 | 17 | — | 53 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
Adoption
Providers hide the model, the tokenizer, and the execution to protect their IP, mitigate jailbreaks, and preserve user privacy, which means an auditor can only inspect proofs the provider supplies.
Conceptual/architectural claim about current commercial provider practices and their implications for auditability (argumentation in paper).
Limited data, resource constraints and skill gaps significantly influence the pace and form of AI adoption in SMEs.
Synthesis of barriers identified across multiple studies in the 2016-2024 literature (review-level claim without a single quantitative estimate).
Addressing ethical concerns, especially algorithmic bias, and maintaining human oversight remain essential for ensuring positive financial outcomes.
Argument and synthesis from the reviewed literature highlighting ethical risks and recommended governance (conceptual and empirical discussions across studies).
SMEs face barriers to AI adoption such as limited data, skill shortages, and high implementation costs.
Review synthesis of barriers reported in multiple studies from 2016-2024 (no pooled quantitative prevalence reported).
Existing generative AI models do not directly optimize marketplace performance.
Stated as an observed limitation / motivation for the proposed method in the paper (conceptual claim; not an empirical test reported in the excerpt).
Existing approaches either require a trusted central coordinator (cloud marketplaces), demand heavy blockchain infrastructure (Golem, BrokerChain), or lack an incentive layer entirely (BOINC, Petals).
Comparative characterization based on named existing platforms; presented as conceptual/qualitative analysis without empirical evaluation or quantified benchmarks.
Vast quantities of compute (GPU cycles on personal workstations, idle inference servers, and edge devices between jobs) go unused because no incentive-aligned protocol exists for their owners to share them safely and profitably.
Asserted in the paper's problem statement; no empirical data, sample, or measurement reported — presented as observed motivation.
Both humans and AI contribute wrong answers.
Reported error contributions from both human participants and AI agents in the experimental task.
Humans over-rely on AI when it misleads them; this occurs in 1.7% of opportunities.
Aggregate analysis of adoption decisions in the experiment (reported percentage of over-reliance on misleading AI suggestions).
Humans under-rely on correct AI suggestions, missing 3.9% of opportunities.
Aggregate analysis of adoption decisions in the experiment (reported percentage of missed opportunities to rely on correct AI suggestions).
Organizations increasingly deploy separate purpose-built AI tools across professional domains, often hiring domain specialists for each, recreating the staffing models AI was expected to transform.
Stated as an observational/introductory claim in the paper (no empirical data or sample size reported to support the general trend).
LLMs rely heavily on simulations when designing algorithms, and simulation-designed algorithms are notorious for breaking when transferred to real hardware.
Paper's claim grounded in known transferability issues between simulation and hardware; no experimental quantification provided in the abstract.
LLM pitfalls worsen on Radio Access Network (RAN) use cases: they hallucinate Application Programming Interfaces (APIs) and misread specifications, which kills interoperability of RAN components at the first mistake.
Author assertion / observed behavior reported in the paper (qualitative examples implied); no formal experiment or sample size provided in the abstract.
Cellular research and development (R&D) is throttled by six structural processes that each consume months of manual engineering work per iteration: (i) synthesizing new features from standards or research papers into production code; (ii) conformance and interoperability testing; (iii) hardening against field anomalies and diverse deployment environments; (iv) data-driven optimization of network functionalities; (v) discovering and prototyping novel waveforms, functionalities, and capabilities for future standards; and (vi) securing the stack against vulnerabilities.
Author assertion in the paper (qualitative analysis / domain expertise). No empirical sample size or quantitative study reported in the abstract.
Existing approaches remain fragmented across formal verification, runtime assurance, neuro-symbolic reasoning and trustworthy Artificial Intelligence (AI) research communities.
Author claim about the state of the research landscape; asserted fragmentation without bibliometric or survey data provided in excerpt.
Current reasoning systems still suffer from hidden logical inconsistencies, hallucinated symbolic transitions, unsupported theorem applications, and limited reliability guarantees.
Author assertion identifying failure modes of current reasoning systems; presented qualitatively without quantitative error rates or experimental sample sizes in the excerpt.
Stochastic Tax can remain positive even when Agentic Technical Debt is minimized.
Theoretical claim in the paper's model and discussion: even with minimized debt (stock), the model predicts a nonzero recurring operating burden from stochastic agents; illustrated via examples and an accounts-payable simulation.
Stochastic Tax is a recurring flow of operating burden that arises when stochastic agents are used in business workflows.
Definition provided in the paper as part of the conceptual framework describing Stochastic Tax as a flow (recurring operating burden) associated with stochastic agents in workflows.
There is an urgent question of how humans can effectively supervise and control an economy operated by AI agents when this system may expand beyond the capacity of traditional governance.
Framed as a central research/policy concern in the paper's abstract; conceptual argument rather than empirical finding.
The Agent Economy raises new regulatory challenges concerning data privacy, security, ethics, and the risk of job displacement.
Stated in paper abstract as identified risks; based on literature synthesis and comparative policy analysis approach (method described), but no empirical incidence metrics reported.
Organizations implementing AI without responsible transition mechanisms may worsen workforce anxiety, skill obsolescence, inequality, and trust erosion.
Paper's theoretical/conceptual assertion about risks of poorly-managed AI adoption; no empirical validation reported in the excerpt.
The International Monetary Fund estimates that nearly 40% of global employment is susceptible to AI, with exposure rising to 60% in advanced economies owing to cognitive task-oriented jobs.
Cited IMF estimate reported in the paper (reference to an IMF analysis; no sample size given in the excerpt).
Tenure negatively relates to AI use (OR = 0.846 per category).
Reported odds ratio from logistic regression for tenure categories predicting AI use; OR = 0.846 per tenure category.
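As a reading aid for the odds ratio above, the sketch below shows how a per-category OR compounds across tenure categories in a logistic model. Only the value 0.846 comes from the claim; the function names and the 50% baseline probability are illustrative assumptions, not figures from the paper.

```python
OR_PER_CATEGORY = 0.846  # reported odds ratio per tenure category


def odds_multiplier(category_steps: int) -> float:
    """Factor by which the odds of AI use change after moving up
    `category_steps` tenure categories (odds ratios multiply)."""
    return OR_PER_CATEGORY ** category_steps


def shifted_probability(baseline_prob: float, category_steps: int) -> float:
    """Convert a baseline probability to odds, apply the multiplier,
    and convert back. `baseline_prob` is a hypothetical input."""
    odds = baseline_prob / (1 - baseline_prob)
    new_odds = odds * odds_multiplier(category_steps)
    return new_odds / (1 + new_odds)


if __name__ == "__main__":
    # Three categories higher: odds shrink to roughly 61% of baseline.
    print(round(odds_multiplier(3), 3))
    # Hypothetical 50% baseline, one category higher.
    print(round(shifted_probability(0.5, 1), 3))
```

An odds ratio is not a probability ratio: the same OR implies different percentage-point changes depending on the baseline rate of AI use.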
The requirement that review + expected rework attention be lower than manual completion attention is substantially more stringent than the requirement that AI merely generate faster drafts.
Comparative analytical argument based on the model's derived stability conditions (theoretical/model-based reasoning; no empirical sample reported).
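The attention condition in this claim can be written as a simple inequality check. The sketch below assumes that expected rework attention is the rework probability times the rework cost; that decomposition, the parameter names, and the example numbers are illustrative assumptions and are not taken from the paper's model.

```python
def ai_saves_attention(review: float, rework_prob: float,
                       rework_cost: float, manual: float) -> bool:
    """True when review + expected rework attention is lower than
    the attention needed to complete the task manually."""
    expected_ai_attention = review + rework_prob * rework_cost
    return expected_ai_attention < manual


def ai_drafts_faster(draft_time: float, manual: float) -> bool:
    """The weaker requirement: the AI merely produces a draft faster."""
    return draft_time < manual


if __name__ == "__main__":
    # Hypothetical task, in minutes of human attention.
    manual = 10.0
    print(ai_drafts_faster(draft_time=1.0, manual=manual))  # weaker test passes
    print(ai_saves_attention(review=4.0, rework_prob=0.3,
                             rework_cost=25.0, manual=manual))  # 4 + 7.5 = 11.5, fails
```

With these hypothetical numbers the AI drafts ten times faster than a human, yet review plus expected rework still costs more attention than doing the task by hand.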
Under congestion, reviewers rationally raise the risk threshold for checking AI outputs, reducing scrutiny precisely when it would matter the most.
Analytical implication derived from the queueing model presented in the paper (theoretical/model-based inference; no empirical validation reported).
Mean-based metrics (e.g., tasks completed per worker-hour or mean handle time) can misrepresent AI's effects in workflows where tasks accumulate and compete for scarce human attention.
Argument and analysis presented in the paper; theoretical reasoning and illustrative queueing model (no empirical sample reported).
Regardless of apparent performance advances in AI technology, human and environmental factors of the organization may substantially attenuate — or even negate — the effective productivity benefits.
Conceptual argument in the paper; theoretical reasoning and literature synthesis (no primary empirical data reported in the abstract).
Adopting AI in organizational practice does not guarantee productivity gains, because human and environmental factors critically moderate the relationship between AI deployment and realized productivity improvements.
Position paper's conceptual argument presented in the abstract; no empirical sample or quantitative study reported.
AI evaluation methods (benchmarks, red teaming, leaderboards) cannot be easily applied to human workers or yield comparable metrics.
Conceptual critique in the paper contrasting standard AI evaluation methods with human evaluation (no empirical comparisons provided).
Common criteria used to assess people (e.g., education, experience, references) cannot feasibly scale to AI systems.
Argumentative claim in the paper contrasting human hiring/evaluation practices with AI system assessment (conceptual; no empirical validation provided).
Human and machine workers may 'compete' for a given task, reproducing aspects of adversarial games.
Theoretical/assertional claim in the paper (conceptual discussion; no empirical data provided).
The increased use of algorithms in allocation decisions creates a Reverse Turing Test dynamic wherein the machine is now the judge.
Conceptual framing and argument presented in the paper (theoretical description; no empirical test reported).
AI-driven efficiency pressures in IT services may compress billable work and alter hiring and wage structures, raising transition risks even for technical workers.
Abstract cites high-reliability sector evidence (Reuters 2026a; Nasscom) to support this sector-specific claim; no sample size provided in abstract.
Labor-market segmentation and digital capability gaps in India create distributional vulnerabilities.
Abstract cites Indian official statistics and household/labor surveys (PLFS, HCES, MoSPI–NSO) and integrates sector evidence; no specific sample size reported in abstract.
Refined exposure measures imply widespread task transformation rather than uniform job destruction, with accelerated skill change as a central risk for vulnerable workers.
Abstract cites labor-market analyses and ILO (2025) as the basis for refined exposure measures and conclusions; no sample size stated in abstract.
Global frameworks warn that uneven readiness may produce a 'Next Great Divergence' between countries.
Cited global reports in abstract (UNDP 2025, WTO 2025, OECD 2026) which are summarized as issuing this warning; no primary data sample size reported in paper abstract.
Persistent adoption gaps among groups suggest unequal access to AI-enabled productivity.
Abstract references global reports (OECD, WEF, UNDP, WTO) and sector evidence indicating adoption gaps; no numerical sample size given.
AI may widen capability inequality—inequalities in access to knowledge, digital infrastructure, computational resources, and organizational adoption—thereby shaping income opportunities and socio-economic security for low-income groups.
Argument presented using the paper's socio-technical political economy framework and validated secondary sources (OECD, ILO, UNDP, WTO, WEF) and official Indian statistics; no direct empirical sample from this paper reported.
Design choices that prioritize scalable growth introduce trade-offs in reusability, evolution, and auditability in A2A collaboration networks.
Synthesis of empirical findings (low reuse, manipulable rankings, unverified validations) connecting design incentives to negative side-effects.
EvoMap relies on agents to provide local execution logs as evidence that uploaded assets function correctly; because these validations are not independently verified, over 84% of approved assets bypass quality checks using vacuous tests (e.g., console.log).
Empirical audit of validation logs and acceptance tests reported in the paper showing >84% of approved assets used trivial/vacuous checks.
Agents can trivially manipulate their asset's scores by falsifying self-reported metadata.
Demonstrations/analyses in the paper that changing metadata values leads to predictable changes in GDI scores; examples like claimed lines-of-code manipulation are provided.
An asset's GDI rank is heavily dictated by unverified, self-reported metadata (e.g., claimed lines of code modified).
Correlation/causal analysis in the paper showing strong dependence of GDI scores on self-reported metadata fields rather than objective performance measures.
EvoMap employs an algorithm (GDI) to score and rank shared assets, and this scoring system is flawed.
Paper description of the GDI ranking algorithm and empirical analyses illustrating problems with how it operates.
Rewards become highly concentrated among a small fraction of agents.
Distributional analysis of credits/rewards across agents (inequality/concentration observed in reward allocation).
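Reward concentration of this kind is usually summarized with a top-share figure or a Gini coefficient. The sketch below shows both on a made-up credit distribution. The paper's actual metric is not stated in this entry, so the choice of measures, the function names, and the sample data are all assumptions.

```python
def top_share(credits: list[float], fraction: float) -> float:
    """Share of all credits held by the top `fraction` of agents."""
    ordered = sorted(credits, reverse=True)
    k = max(1, round(len(ordered) * fraction))
    return sum(ordered[:k]) / sum(ordered)


def gini(credits: list[float]) -> float:
    """Gini coefficient: 0 means equal rewards, values near 1 mean
    rewards are concentrated in very few agents."""
    xs = sorted(credits)
    n = len(xs)
    total = sum(xs)
    # Rank-weighted sum over the ascending-sorted values (ranks start at 1).
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * weighted) / (n * total) - (n + 1) / n


if __name__ == "__main__":
    # Hypothetical credits per agent, not data from the paper.
    sample = [0, 0, 1, 1, 2, 3, 5, 8, 80, 400]
    print(round(top_share(sample, 0.1), 3))
    print(round(gini(sample), 3))
```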
98% of assets are never reused.
Empirical reuse metric computed across the asset corpus reported in the paper.
Because rewards favor publication over adoption, agents mass-produce assets to accumulate credits.
Observed publishing behavior (large numbers of assets per agent) and the platform's incentive structure; paper links publication-focused rewards to high per-agent asset counts.
Rewards are tied primarily to publication rather than adoption.
Analysis of reward allocation rules and empirical patterns showing reward issuance linked to publication events more than measured reuse/adoption.
These findings challenge the prevailing theory of skill-biased technological change.
Empirical observation that high-skill, high-exposure neighborhoods experienced wage stagnation post-2023 despite continued inflows of high-skilled workers, interpreted in contrast to predictions of skill-biased technological change.
Since 2023, high-exposure neighborhoods have experienced wage stagnation even as they continue to attract high-skilled workers (a 'high-skill trap').
Temporal analysis of job-posting wage signals in Beijing neighborhoods (2018-2024) using the GenAI Exposure Index to compare wage trajectories before and after 2023 between high- and low-exposure neighborhoods.
GenAI exposure is highly concentrated in the city's core districts, deepening the intra-urban AI divide.
Spatial analysis of a neighborhood-level GenAI Exposure Index constructed from 5 million Beijing job postings (2018-2024), where task-level assessments were aggregated across five leading large language models to measure exposure by neighborhood.