Evidence (13827 claims)
| Topic | Claims |
|---|---|
| Adoption | 8454 |
| Productivity | 7544 |
| Governance | 6789 |
| Human-AI Collaboration | 6327 |
| Org Design | 4126 |
| Innovation | 4058 |
| Labor Markets | 3520 |
| Skills & Training | 2924 |
| Inequality | 2057 |
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 749 | 195 | 97 | 889 | 1979 |
| Governance & Regulation | 815 | 391 | 188 | 121 | 1539 |
| Organizational Efficiency | 771 | 189 | 124 | 83 | 1177 |
| Technology Adoption Rate | 624 | 233 | 123 | 96 | 1084 |
| Research Productivity | 410 | 121 | 56 | 331 | 929 |
| Output Quality | 466 | 177 | 59 | 47 | 749 |
| Decision Quality | 320 | 174 | 75 | 42 | 618 |
| Firm Productivity | 435 | 55 | 88 | 20 | 604 |
| AI Safety & Ethics | 214 | 276 | 65 | 33 | 593 |
| Market Structure | 178 | 166 | 122 | 24 | 495 |
| Task Allocation | 206 | 64 | 70 | 31 | 376 |
| Skill Acquisition | 165 | 57 | 60 | 17 | 299 |
| Innovation Output | 201 | 27 | 41 | 18 | 288 |
| Employment Level | 105 | 51 | 107 | 13 | 278 |
| Fiscal & Macroeconomic | 131 | 69 | 43 | 26 | 276 |
| Consumer Welfare | 116 | 63 | 42 | 11 | 232 |
| Firm Revenue | 149 | 46 | 26 | 3 | 224 |
| Inequality Measures | 44 | 122 | 49 | 6 | 221 |
| Task Completion Time | 169 | 29 | 8 | 12 | 219 |
| Worker Satisfaction | 89 | 61 | 20 | 12 | 182 |
| Error Rate | 69 | 91 | 10 | 2 | 172 |
| Regulatory Compliance | 76 | 68 | 14 | 5 | 163 |
| Training Effectiveness | 92 | 19 | 13 | 19 | 145 |
| Wages & Compensation | 77 | 36 | 25 | 6 | 144 |
| Automation Exposure | 51 | 54 | 22 | 12 | 142 |
| Team Performance | 86 | 17 | 27 | 9 | 140 |
| Developer Productivity | 94 | 17 | 14 | 6 | 132 |
| Job Displacement | 12 | 80 | 20 | 1 | 113 |
| Hiring & Recruitment | 51 | 7 | 8 | 3 | 69 |
| Skill Obsolescence | 5 | 45 | 6 | 1 | 57 |
| Creative Output | 31 | 16 | 7 | 2 | 57 |
| Social Protection | 27 | 16 | 8 | 2 | 53 |
| Labor Share of Income | 17 | 17 | 17 | — | 51 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
We evaluate collaborative performance from consensus-based routing among self-interested heterogeneous agents in AgentSociety on real-world datasets.
Empirical evaluation / experiments using real-world datasets to measure collaborative performance under consensus-based routing among heterogeneous agents.
We characterize the Nash equilibrium showing that agent payoffs are reflective of their marginal contributions.
Analytical game-theoretic characterization/proof of Nash equilibrium in the paper.
The mechanism incentivizes agents to selectively disclose information to their neighbor agents when doing so aligns with their self-interest, in order to garner influence.
Theoretical analysis and mechanism design arguments (and possibly supporting simulations) within the paper.
Delegation to more competent neighbor agents is incentive compatible and naturally generates a multi-agent routing path by consensus.
Formal theoretical proof/analysis presented in the paper (analytical/theoretical result).
We propose AgentSociety, a mechanism that enables decentralized agentic collaboration grounded in liquid democracy and information diffusion from social choice theory.
Description and design of the AgentSociety mechanism in the paper (mechanism proposal / system design).
AI assistance can stabilize an overloaded workflow only when (i) the fraction of tasks handled by AI exceeds a critical threshold, and (ii) the human attention required for review and expected rework is lower than the attention required for manual completion.
Formal analytical conditions derived from the paper's queueing model (model-based theoretical result; no empirical sample reported).
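The two conditions can be illustrated with a deliberately simple load calculation. This is a sketch under stated assumptions, not the paper's queueing model: one human with one unit of attention per unit time, tasks arriving at rate `lam`, a manual task costing `manual` units of attention, and an AI-handled task costing review time plus expected rework. All names and numbers are hypothetical.

```python
def ai_task_cost(review, p_rework, rework):
    """Expected human attention per AI-handled task: review plus expected rework."""
    return review + p_rework * rework


def load(lam, f, manual, cost):
    """Human attention demanded per unit time when a share f of tasks goes to AI."""
    return lam * ((1 - f) * manual + f * cost)


def critical_fraction(lam, manual, cost):
    """Smallest AI share above which load drops below 1, or None if unreachable.

    Assumes the workflow is stable exactly when load < 1 (single-server assumption).
    """
    if lam * manual < 1:
        return 0.0   # not overloaded to begin with
    if cost >= manual:
        return None  # condition (ii) fails: AI tasks cost at least as much attention
    f_star = (lam * manual - 1) / (lam * (manual - cost))
    return f_star if f_star < 1 else None


# Hypothetical numbers: 3 tasks/hour at 0.5 hours each is a manual load of 1.5.
lam, manual = 3.0, 0.5
cost = ai_task_cost(review=0.1, p_rework=0.2, rework=0.5)
f_star = critical_fraction(lam, manual, cost)
print(round(f_star, 3))
print(load(lam, 0.8, manual, cost) < 1)  # True: above the threshold, the load is below 1
```

In this toy version, condition (ii) is the `cost < manual` check and condition (i) is the requirement that the AI share exceed `f_star`.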
LLM-assisted systems make candidate generation, code comprehension, harness construction, proof-of-impact drafting, and report preparation cheaper at codebase scale.
Argument supported by analysis using public data from Anthropic's Mythos Preview and Mozilla Firefox collaborations (qualitative and illustrative examples; no sample size reported in the provided text).
The paper calls for action by stakeholders to consider human and environmental moderators when adopting AI.
Policy/recommendation statement in the paper's conclusion/abstract; normative recommendation rather than empirical finding.
We revise the existing framework to redefine effective organizational determinants and shed light on practical implications, including for industry and education.
Authors' proposed theoretical revision of an existing framework and discussion of implications; presented as a conceptual contribution within the paper.
Most practitioners assume that AI brings productivity boosts owing to enhanced technical capabilities.
Statement of common practitioner belief reported by the authors in the paper's framing; no supporting survey or sample reported in the abstract.
Adoption of Claude Code increases cumulative lifetime languages used by +0.51.
Panel analysis of 5,838 developers over 28 months using the Callaway & Sant'Anna estimator; treatment = first Claude-co-authored commit.
Adoption of Claude Code increases the count of newly-used languages by +0.31.
Same dataset and staggered-rollout estimator (Callaway & Sant'Anna), treatment = first Claude-co-authored commit; not-yet-treated controls.
Adoption of Claude Code increases Shannon language entropy by +0.14.
Estimated with the doubly robust Callaway & Sant'Anna approach on the 5,838-developer panel over 28 months, using first Claude-co-authored commit as treatment.
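For readers unfamiliar with the measure, Shannon entropy over a developer's language mix can be computed as below. This is a generic sketch: the unit of observation (one entry per commit), the use of the natural logarithm, and the sample data are assumptions, since the listing does not state how the study constructs the measure.

```python
import math
from collections import Counter


def shannon_entropy(languages):
    """Shannon entropy (natural log) of the language shares in `languages`.

    `languages` holds one label per observation, e.g. one per commit (assumed).
    """
    counts = Counter(languages)
    total = sum(counts.values())
    return sum(-(c / total) * math.log(c / total) for c in counts.values())


# Hypothetical commit histories.
single = ["python"] * 4
mixed = ["python", "python", "go", "rust"]

print(shannon_entropy(single))  # 0.0
print(shannon_entropy(mixed))
```

Entropy is zero when all activity is in one language and rises as activity spreads more evenly across more languages.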
Adoption of Claude Code increases the number of distinct programming languages used by a developer by +0.83.
Same panel and staggered-rollout estimation as above (Callaway & Sant'Anna), treatment = first Claude-co-authored commit.
Adoption of Claude Code increases the number of repositories a developer contributes to by +1.5 (monthly).
Same panel (5,838 developers, 28 months) and estimator (Callaway & Sant'Anna). Treatment = first Claude-co-authored commit; not-yet-treated controls.
Adoption of Claude Code is associated with an increase of +41 monthly commits per developer.
Analysis of a panel of 5,838 GitHub developers observed monthly over 28 months, exploiting staggered rollout of Claude Code (May 2025–Jan 2026). Treatment defined by developer's first Claude-co-authored commit; not-yet-treated developers used as controls. Estimates from the doubly robust Callaway and Sant'Anna (2021) staggered-difference-in-differences estimator.
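The core comparison behind a staggered difference-in-differences with not-yet-treated controls can be sketched as a group-time effect. The version below is the simple unconditional form, without the covariates or doubly robust adjustment the study reports using, and the panel is made up for illustration.

```python
from statistics import mean


def att_gt(panel, g, t):
    """Group-time effect for the cohort first treated in period g, evaluated at t >= g.

    panel: dict unit -> (first_treated_period or None, list of outcomes by period)
    Controls are units never treated or not yet treated by period t.
    """
    base = g - 1  # last pre-treatment period for cohort g
    treated = [y[t] - y[base] for first, y in panel.values() if first == g]
    controls = [y[t] - y[base] for first, y in panel.values()
                if first is None or first > t]
    if not treated or not controls:
        raise ValueError("need at least one treated and one control unit")
    return mean(treated) - mean(controls)


# Synthetic monthly outcomes for four hypothetical developers, periods 0..3.
panel = {
    "u1": (1, [10, 52, 53, 55]),
    "u2": (1, [20, 60, 61, 63]),
    "u3": (3, [15, 16, 18, 60]),     # not yet treated in periods 1 and 2
    "u4": (None, [30, 31, 33, 34]),  # never treated
}

print(att_gt(panel, g=1, t=1))  # 40
print(att_gt(panel, g=1, t=2))  # 39
```

A full analysis would estimate these group-time effects for every cohort and period and then aggregate them; the sketch only shows a single cell of that grid.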
Case studies demonstrate exact power-water consistency between virtual attributions and physical generation-side withdrawals.
Simulation results on the IEEE 30-bus and 118-bus test systems, for which the paper reports exact consistency (two test systems used).
Case studies on the IEEE 30-bus and 118-bus test systems demonstrate reliable convergence of the method.
Simulation experiments reported in the paper using two standard test systems (IEEE 30-bus and IEEE 118-bus). Sample size: 2 test systems.
Combined with fixed-point coordination, the framework enforces consistency between virtual water attribution and physical generation-side withdrawals.
Methodological claim about algorithmic properties (fixed-point coordination used to align attributions with physical withdrawals); supported by theoretical description and later case-study demonstrations.
The framework represents dispatch optimization as a differentiable optimization layer embedded within a deep learning architecture, enabling efficient end-to-end learning of coordination policies while preserving operational feasibility.
Methodological description claiming an implementation approach (differentiable optimization layer within deep learning); evidence likely from algorithmic implementation and simulation experiments described later in the paper.
This paper develops an operational electricity-computation-water (ECW) nexus framework that internalizes virtual water impacts directly into power system dispatch.
Primary methodological contribution described in the paper (development and formulation of an ECW framework; implementation details implied but not quantified in the excerpt).
The expansion of data centers (DCs) drives a sustained increase in electricity demand and associated water withdrawals at generation sites.
Background assertion in paper introduction; general empirical observation motivating the work (no specific dataset or sample size reported in the excerpt).
The contribution is a benchmark-ready evaluation framework for runtime actuarial control of autonomous-agent side effects.
Paper presents the AAI, Authority Frontier, metrics (C_full, Capital@k), taxonomy, implementations and experimental traces; authors present it as benchmark-ready.
We report a live Postgres panel in which three Azure-hosted models propose actions through the same contract.
Live-panel experiment described in the paper using three Azure-hosted models interacting with a Postgres panel under the AAI contract.
We instantiate AAI across four agentic environments (database mutation, customer-service refund, and the public tau-bench retail and airline tool-use traces).
Empirical instantiation described in the paper across four named environments/traces.
The framework provides (i) a deterministic quote-bind-commit protocol with toll-bounded capability tokens; (ii) a universal seven-class action taxonomy mapping heterogeneous tool calls to comparable authority units; (iii) replay determinism and pathwise reserve coverage under alpha-spending; (iv) cross-domain normalization via full reserve demand C_full and capital metrics Capital@k.
System design and theoretical specification in the paper; described as implemented across experiments.
We develop the Authority Frontier, an evaluation primitive measuring how much autonomous authority the runtime releases at each level of reserve capital.
Methodological contribution (definition and formulation of the Authority Frontier) described in the paper; subsequently instantiated empirically in experiments.
We propose the Actuarial Action Interface (AAI), a deterministic runtime contract that prices each such action against a contractually fixed safe default under a time-consistent risk mapping, and gates execution against a per-boundary reserve capital budget.
Methodological design and proposal described in the paper (no empirical test reported for the claim itself).
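The pricing and gating idea can be sketched as a small reserve ledger. Everything here is hypothetical: the class, the method names, the rule that an action's price is its risk in excess of the safe default, and the integer tokens are illustrative assumptions, not the AAI specification.

```python
from dataclasses import dataclass, field


@dataclass
class ReserveGate:
    """Toy quote-bind-commit gate over a per-boundary reserve budget (assumed design)."""
    budget: float                             # remaining reserve capital
    held: dict = field(default_factory=dict)  # token -> capital reserved by bind()
    next_token: int = 0

    def quote(self, action_risk, default_risk):
        # Price the action relative to the safe default (assumed pricing rule).
        return max(0.0, action_risk - default_risk)

    def available(self):
        return self.budget - sum(self.held.values())

    def bind(self, price):
        # Reserve capital for a quoted action; refuse if the budget cannot cover it.
        if price > self.available():
            return None
        token = self.next_token
        self.next_token += 1
        self.held[token] = price
        return token

    def commit(self, token):
        # Execute the bound action and consume its reserved capital.
        price = self.held.pop(token)
        self.budget -= price
        return price


gate = ReserveGate(budget=10.0)
first = gate.bind(gate.quote(action_risk=7.0, default_risk=1.0))   # price 6.0, accepted
second = gate.bind(gate.quote(action_risk=6.0, default_risk=1.0))  # price 5.0, refused
print(first, second, gate.available())  # 0 None 4.0
```

Because every step is a pure function of the ledger state and the inputs, replaying the same sequence of calls reproduces the same decisions.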
A profile-driven approach places humans and AI systems on shared scales, supporting comparisons that are predictive of novel-task performance, explanatory of why agents succeed or fail, and auditable.
Claim about anticipated benefits of the proposed profile-driven approach presented in the paper (theoretical argument; no empirical results reported).
Suitability evaluations for task-assignment should be profile-driven — based on assessments that infer latent constructs such as capabilities and propensities from observed performance.
Core proposal of the position paper (conceptual/methodological recommendation; no empirical pilot or validation reported).
As AI is integrated into the workplace, organisations increasingly face allocation decisions between human and machine workers, and these decisions are increasingly made or assisted by algorithms.
Position paper / conceptual argument in the paper's introduction (no empirical sample or quantitative data reported).
The paper proposes a policy architecture for 'shared gains' centered on learning equity, transition protections, accountable algorithmic management, and distribution-sensitive metrics beyond GDP.
Paper's normative policy proposal presented in abstract, based on the integrative framework and synthesis of secondary sources; no empirical sample size reported.
India's macro growth remains robust.
Statement in abstract referencing official Indian statistics (MoSPI–NSO GDP estimates, 2025); no numerical sample size provided in abstract.
Evidence indicates accelerating AI adoption among firms in advanced economies.
Abstract cites validated secondary sources including OECD (2026) and other global reports; no primary sample size reported in paper abstract.
AI is increasingly embedded in production, services, and workforce management.
Statement in paper's abstract supported by integrative socio-technical political economy framework and validated secondary sources (OECD, ILO, UNDP, WTO, WEF). No primary sample size reported.
Future A2A collaboration networks cannot rely on unverified self-reporting alone; scalable collaboration requires mechanisms that balance open participation with verifiable execution and trustworthy evaluation.
Paper's concluding recommendation based on the empirical problems documented (low reuse, ranking manipulation, vacuous validations).
EvoMap's credit economy rewards agents for publishing valuable assets, encouraging participation at scale.
Description and analysis of the platform's reward mechanism and observed high participation (agent counts); empirical linkage between reward rules and publishing behavior discussed in the paper.
Causal evidence indicates that structured AI-based interventions can transform access to scientific feedback from a largely private advantage into a more widely distributed resource.
Causal inference based on a randomized field experiment showing increased revision likelihood and broader uptake of LLM tools across diverse regions and author groups.
Effects were strongest among teams with lower h-indexes and earlier career stages.
Heterogeneous treatment effects by team-level metrics (h-index) and career stage reported in the randomized experiment.
Effects were strongest for manuscripts less embedded in the scholarly literature.
Heterogeneous treatment effects reported by manuscript-level embedding in literature (e.g., referencing/citation context) within the randomized experiment.
Effects of AI feedback were strongest among authors from non-English-dominant research regions.
Heterogeneous treatment effects reported in the randomized experiment stratified by authors' geographic / language-dominance region; sample includes authors from 133 geographic regions.
Exposure to AI feedback increased authors' subsequent use of LLM tools in their future papers, suggesting longer-run shifts in scientific practice.
Follow-up measurements in the randomized field experiment tracking authors' later behavior (use of LLM tools in subsequent papers); comparison between treatment and control authors.
Authors who received LLM-generated feedback had a significantly higher likelihood of revising their manuscripts, corresponding to a 12.55% relative increase over the baseline revision rate.
Randomized field experiment comparing treatment (LLM feedback) vs control; sample described as >31,000 arXiv preprints and >45,000 researchers; reported comparative revision rate and statistical significance.
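A relative increase is measured against the control group's rate, which is different from a percentage-point difference. The listing does not report the baseline revision rate, so the rates below are hypothetical and chosen only to show the arithmetic.

```python
def relative_increase(treated_rate, baseline_rate):
    """Relative change of the treated rate over the baseline (control) rate."""
    return (treated_rate - baseline_rate) / baseline_rate


def implied_treated_rate(baseline_rate, rel_increase):
    """Treated rate implied by a baseline rate and a relative increase."""
    return baseline_rate * (1 + rel_increase)


# Hypothetical rates: 20% of control authors revise, 23% of treated authors revise.
rel = relative_increase(0.23, 0.20)
print(f"{rel:.1%} relative increase, {0.23 - 0.20:.1%} percentage-point difference")

# Applying the reported 12.55% relative increase to the same hypothetical baseline.
print(round(implied_treated_rate(0.20, 0.1255), 4))
```

The same relative increase corresponds to a larger or smaller absolute difference depending on the baseline, so the two figures should not be read interchangeably.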
A difference-in-differences design centered on ChatGPT's release supports a causal interpretation of GenAI's local labor-market effects.
Quasi-experimental difference-in-differences analysis using ChatGPT's release as an event/shock, comparing outcomes across neighborhoods with different pre-existing GenAI exposure measures derived from 5 million job postings.
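The basic two-group, two-period comparison behind such a design can be sketched as follows. The split into high- and low-exposure neighborhoods, the outcome values, and the absence of covariates or fixed effects are simplifying assumptions for illustration only.

```python
from statistics import mean


def did_2x2(high_pre, high_post, low_pre, low_post):
    """Difference-in-differences: change in the high-exposure group minus
    change in the low-exposure group across the event date."""
    return (mean(high_post) - mean(high_pre)) - (mean(low_post) - mean(low_pre))


# Hypothetical outcome values per neighborhood, before and after the release.
estimate = did_2x2(
    high_pre=[100, 110], high_post=[90, 96],
    low_pre=[80, 84], low_post=[79, 81],
)
print(estimate)  # -10
```

The estimate is causal only if the two groups would have followed parallel trends without the event, which is the assumption such designs need to defend.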
A human-centered approach is needed that integrates technological advancement with reskilling initiatives, labor protections, and inclusive policies.
Authors' prescriptive recommendation based on their thematic synthesis of the reviewed literature (2010–2024).
The integration of AI into manufacturing offers substantial gains in efficiency, productivity, and operational performance.
Authors' systematic literature review of interdisciplinary studies (2010–2024) using thematic synthesis; synthesis of prior empirical and conceptual studies reporting efficiency/productivity effects of AI in manufacturing.
A-insensitivity increases with financial literacy, suggesting financially literate decision-makers perceive greater ambiguity in prediction accuracy.
Association reported in the incentivized laboratory experiment between participants' measured financial literacy and their measured a-insensitivity (correlational evidence; sample size not reported in abstract).
Decision-makers hold more optimistic beliefs about the accuracy of ML analysts than about human analysts, and this greater optimism predicts higher trust in ML analysts relative to human analysts.
Incentivized laboratory experiment measuring participants' optimism about forecast accuracy for human vs. ML analysts and examining the relationship between those beliefs and expressed trust (correlational/regression evidence; sample size not reported in abstract).
A human-centred approach underpinned by ongoing reskilling and ethical governance is vital for sustainable workforce evolution in the Indian IT sector.
Authors' policy recommendation derived from their literature synthesis and thematic analysis (qualitative conclusion).
The paper introduces a conceptual framework for hybrid intelligence within the Indian IT sector.
Authors present a new conceptual framework as part of this qualitative research article (conceptual contribution).