Evidence (13827 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome	Positive	Negative	Mixed	Null	Total
Other	749	195	97	889	1979
Governance & Regulation	815	391	188	121	1539
Organizational Efficiency	771	189	124	83	1177
Technology Adoption Rate	624	233	123	96	1084
Research Productivity	410	121	56	331	929
Output Quality	466	177	59	47	749
Decision Quality	320	174	75	42	618
Firm Productivity	435	55	88	20	604
AI Safety & Ethics	214	276	65	33	593
Market Structure	178	166	122	24	495
Task Allocation	206	64	70	31	376
Skill Acquisition	165	57	60	17	299
Innovation Output	201	27	41	18	288
Employment Level	105	51	107	13	278
Fiscal & Macroeconomic	131	69	43	26	276
Consumer Welfare	116	63	42	11	232
Firm Revenue	149	46	26	3	224
Inequality Measures	44	122	49	6	221
Task Completion Time	169	29	8	12	219
Worker Satisfaction	89	61	20	12	182
Error Rate	69	91	10	2	172
Regulatory Compliance	76	68	14	5	163
Training Effectiveness	92	19	13	19	145
Wages & Compensation	77	36	25	6	144
Automation Exposure	51	54	22	12	142
Team Performance	86	17	27	9	140
Developer Productivity	94	17	14	6	132
Job Displacement	12	80	20	1	113
Hiring & Recruitment	51	7	8	3	69
Skill Obsolescence	5	45	6	1	57
Creative Output	31	16	7	2	57
Social Protection	27	16	8	2	53
Labor Share of Income	17	17	17	—	51
Worker Turnover	11	12	—	3	26
Industry	—	—	—	1	1

We evaluate collaborative performance from consensus-based routing among self-interested heterogeneous agents in AgentSociety on real-world datasets.

Empirical evaluation / experiments using real-world datasets to measure collaborative performance under consensus-based routing among heterogeneous agents.

high positive AgentSociety: Incentivizing Agentic Social Intelligence collaborative performance from consensus-based routing

We characterize the Nash equilibrium showing that agent payoffs are reflective of their marginal contributions.

Analytical game-theoretic characterization/proof of Nash equilibrium in the paper.

high positive AgentSociety: Incentivizing Agentic Social Intelligence agent payoffs relative to marginal contributions

The mechanism incentivizes agents to selectively disclose information to their neighbor agents when doing so aligns with their self-interest, in order to garner influence.

Theoretical analysis and mechanism design arguments (and possibly supporting simulations) within the paper.

high positive AgentSociety: Incentivizing Agentic Social Intelligence information disclosure behavior and influence acquisition among agents

Delegation to more competent neighbor agents is incentive compatible and naturally generates a multi-agent routing path by consensus.

Formal theoretical proof/analysis presented in the paper (analytical/theoretical result).

high positive AgentSociety: Incentivizing Agentic Social Intelligence delegation behavior and emergence of routing paths (multi-agent routing by conse...
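The delegation claim above can be pictured with a small sketch. This is only an illustration, not the AgentSociety mechanism itself: it assumes each agent has a known scalar competence score and hands the task to its most competent neighbor whenever that neighbor is strictly more competent. The names `routing_path`, `neighbors`, and `competence` are hypothetical, and payoffs and information disclosure are not modeled.

```python
def routing_path(start, neighbors, competence):
    """Follow delegations from `start` until no neighbor is strictly more competent.

    neighbors: {agent: [adjacent agents]} (hypothetical communication graph)
    competence: {agent: score} (hypothetical, assumed known to neighbors)
    """
    path = [start]
    current = start
    while True:
        # Candidates are neighbors that strictly beat the current agent,
        # so competence rises at every hop and the loop cannot cycle.
        better = [
            n for n in neighbors.get(current, [])
            if competence[n] > competence[current]
        ]
        if not better:
            return path
        current = max(better, key=lambda n: competence[n])
        path.append(current)
```

Because every hop moves to a strictly more competent agent, the path always terminates at an agent with no better neighbor.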

We propose AgentSociety, a mechanism that enables decentralized agentic collaboration grounded in liquid democracy and information diffusion from social choice theory.

Description and design of the AgentSociety mechanism in the paper (mechanism proposal / system design).

high positive AgentSociety: Incentivizing Agentic Social Intelligence ability of agents to operate autonomously, strategically communicate, behave col...

AI assistance can stabilize an overloaded workflow only when (i) the fraction of tasks handled by AI exceeds a critical threshold, and (ii) the human attention required for review and expected rework is lower than the attention required for manual completion.

Formal analytical conditions derived from the paper's queueing model (model-based theoretical result; no empirical sample reported).

high positive Queue & AI: When Faster Tasks Slow Down the Workflow organizational efficiency
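The two conditions can be illustrated with a simple load calculation. This is a minimal sketch under stated assumptions, not the paper's queueing model: it assumes a single human reviewer, tasks arriving at `arrival_rate` per hour, `manual_hours` of attention per manually completed task, and `review_hours` of attention (review plus expected rework) per AI-handled task. All names and numbers are hypothetical.

```python
def utilization(arrival_rate, ai_fraction, manual_hours, review_hours):
    """Human attention demanded per hour of clock time (stable only if < 1)."""
    per_task = (1 - ai_fraction) * manual_hours + ai_fraction * review_hours
    return arrival_rate * per_task


def critical_ai_fraction(arrival_rate, manual_hours, review_hours):
    """AI share above which utilization falls below 1, or None if unreachable."""
    if arrival_rate * manual_hours < 1:
        return 0.0  # already stable without AI
    if review_hours >= manual_hours:
        return None  # condition (ii) fails: AI handling does not save attention
    threshold = (arrival_rate * manual_hours - 1) / (
        arrival_rate * (manual_hours - review_hours)
    )
    # A threshold of 1 or more means even full AI handling stays overloaded.
    return threshold if threshold < 1 else None
```

With 2 tasks per hour, 0.6 hours per manual task, and 0.2 hours per reviewed task, the sketch gives a critical share of about 0.25. If review costs as much as manual work, no share stabilizes the queue.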

LLM-assisted systems make candidate generation, code comprehension, harness construction, proof-of-impact drafting, and report preparation cheaper at codebase scale.

Argument supported by analysis using public data from Anthropic's Mythos Preview and Mozilla Firefox collaborations (qualitative and illustrative examples; no sample size reported in the provided text).

high positive Demystifying the Mythos or Disrupting Bugonomics? From Zero-... cost/effort to produce candidate vulnerabilities (generation, comprehension, har...

The paper calls for action by stakeholders to consider human and environmental moderators when adopting AI.

Policy/recommendation statement in the paper's conclusion/abstract; normative recommendation rather than empirical finding.

high positive Position: Adopting AI in Practice Does Not Guarantee the Pro... stakeholder policies and actions regarding AI adoption and moderation

We revise the existing framework to redefine effective organizational determinants and shed light on practical implications including industry and education.

Authors' proposed theoretical revision of an existing framework and discussion of implications; presented as a conceptual contribution within the paper.

high positive Position: Adopting AI in Practice Does Not Guarantee the Pro... organizational determinants and practical implications for industry and educatio...

Most practitioners assume that AI brings productivity boosts owing to enhanced technical capabilities.

Statement of common practitioner belief reported by the authors in the paper's framing; no supporting survey or sample reported in the abstract.

high positive Position: Adopting AI in Practice Does Not Guarantee the Pro... perceived productivity benefits from AI

Adoption of Claude Code increases cumulative lifetime languages used by +0.51.

Panel analysis of 5,838 developers over 28 months using the Callaway & Sant'Anna estimator; treatment = first Claude-co-authored commit.

high positive Coding Beyond Your Training: Claude Code and the Technologic... cumulative lifetime programming languages (count)

Adoption of Claude Code increases the count of newly-used languages by +0.31.

Same dataset and staggered-rollout estimator (Callaway & Sant'Anna), treatment = first Claude-co-authored commit; not-yet-treated controls.

high positive Coding Beyond Your Training: Claude Code and the Technologic... newly-used programming languages (monthly)

Adoption of Claude Code increases Shannon language entropy by +0.14.

Estimated with the doubly robust Callaway & Sant'Anna approach on the 5,838-developer panel over 28 months, using first Claude-co-authored commit as treatment.

high positive Coding Beyond Your Training: Claude Code and the Technologic... Shannon language entropy (diversity of languages used)
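For readers unfamiliar with the outcome measure, Shannon entropy over language shares can be computed as below. This is a sketch under assumptions: the entry does not say which logarithm base or unit of observation the paper uses, so natural logarithms and a per-commit language label are assumed here, and `shannon_entropy` is a hypothetical helper name.

```python
import math
from collections import Counter


def shannon_entropy(languages):
    """Shannon entropy (in nats, an assumed unit) of language shares.

    languages: one language label per commit, e.g. ["Python", "Go", ...].
    """
    counts = Counter(languages)
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values())
```

A developer working in one language scores 0. An even split across two languages scores ln 2, roughly 0.69, so an increase of +0.14 is a modest shift toward a more even language mix.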

Adoption of Claude Code increases the number of distinct programming languages used by a developer by +0.83.

Same panel and staggered-rollout estimation as above (Callaway & Sant'Anna), treatment = first Claude-co-authored commit.

high positive Coding Beyond Your Training: Claude Code and the Technologic... distinct programming languages used (monthly)

Adoption of Claude Code increases the number of repositories a developer contributes to by +1.5 (monthly).

Same panel (5,838 developers, 28 months) and estimator (Callaway & Sant'Anna). Treatment = first Claude-co-authored commit; not-yet-treated controls.

high positive Coding Beyond Your Training: Claude Code and the Technologic... repositories contributed to (monthly)

Adoption of Claude Code is associated with an increase of +41 monthly commits per developer.

Analysis of a panel of 5,838 GitHub developers observed monthly over 28 months, exploiting staggered rollout of Claude Code (May 2025–Jan 2026). Treatment defined by developer's first Claude-co-authored commit; not-yet-treated developers used as controls. Estimates from the doubly robust Callaway and Sant'Anna (2021) staggered-difference-in-differences estimator.

high positive Coding Beyond Your Training: Claude Code and the Technologic... monthly commits
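The core comparison behind a staggered difference-in-differences estimate with not-yet-treated controls can be sketched as follows. This is a simplified, unconditional group-time effect only: it omits the covariates, doubly robust weighting, and aggregation steps of the Callaway and Sant'Anna (2021) estimator, and it is not the paper's code. The function `att_gt` and its data layout are hypothetical.

```python
from statistics import mean


def att_gt(outcomes, first_treated, g, t):
    """Unconditional ATT(g, t) using not-yet-treated units as controls.

    outcomes: {unit: {period: value}} (hypothetical layout)
    first_treated: {unit: first treated period, or None if never treated}
    """
    base = g - 1  # last pre-treatment period for cohort g

    def change(unit):
        return outcomes[unit][t] - outcomes[unit][base]

    treated = [u for u, start in first_treated.items() if start == g]
    controls = [
        u for u, start in first_treated.items()
        if start is None or start > t  # still untreated as of period t
    ]
    # Difference between the cohort's average change and the controls' average change.
    return mean(change(u) for u in treated) - mean(change(u) for u in controls)
```

Each cohort's change since its last untreated month is compared with the change among developers who have not yet adopted by month `t`. That is the role not-yet-treated developers play as controls in the entry above.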

Case studies demonstrate exact power-water consistency between virtual attributions and physical generation-side withdrawals.

Simulation results on the IEEE 30-bus and 118-bus test systems, which the paper reports as showing exact consistency. Sample size: 2 test systems.

high positive From Accounting to Coordination: A Virtual Water-Aware Elect... power-water consistency (alignment between attributed virtual water and physical...

Case studies on the IEEE 30-bus and 118-bus test systems demonstrate reliable convergence of the method.

Simulation experiments reported in the paper using two standard test systems (IEEE 30-bus and IEEE 118-bus). Sample size: 2 test systems.

high positive From Accounting to Coordination: A Virtual Water-Aware Elect... convergence of the algorithm/method in simulations

Combined with fixed-point coordination, the framework enforces consistency between virtual water attribution and physical generation-side withdrawals.

Methodological claim about algorithmic properties (fixed-point coordination used to align attributions with physical withdrawals); supported by theoretical description and later case-study demonstrations.

high positive From Accounting to Coordination: A Virtual Water-Aware Elect... consistency between virtual water attribution and physical generation withdrawal...

The framework represents dispatch optimization as a differentiable optimization layer embedded within a deep learning architecture, enabling efficient end-to-end learning of coordination policies while preserving operational feasibility.

Methodological description claiming an implementation approach (differentiable optimization layer within deep learning); evidence likely from algorithmic implementation and simulation experiments described later in the paper.

high positive From Accounting to Coordination: A Virtual Water-Aware Elect... efficiency of end-to-end learning of coordination policies and preservation of o...

This paper develops an operational electricity-computation-water (ECW) nexus framework that internalizes virtual water impacts directly into power system dispatch.

Primary methodological contribution described in the paper (development and formulation of an ECW framework; implementation details implied but not quantified in the excerpt).

high positive From Accounting to Coordination: A Virtual Water-Aware Elect... integration of virtual water impacts into dispatch optimization

The expansion of data centers (DCs) drives a sustained increase in electricity demand and associated water withdrawals at generation sites.

Background assertion in paper introduction; general empirical observation motivating the work (no specific dataset or sample size reported in the excerpt).

high positive From Accounting to Coordination: A Virtual Water-Aware Elect... electricity demand and associated water withdrawals at power generation sites

The contribution is a benchmark-ready evaluation framework for runtime actuarial control of autonomous-agent side effects.

Paper presents the AAI, Authority Frontier, metrics (C_full, Capital@k), taxonomy, implementations and experimental traces; authors present it as benchmark-ready.

high positive Insuring Every Action: An Authority Frontier Framework for R... availability of a benchmark-ready evaluation framework

We report a live Postgres panel in which three Azure-hosted models propose actions through the same contract.

Live-panel experiment described in the paper using three Azure-hosted models interacting with a Postgres panel under the AAI contract.

high positive Insuring Every Action: An Authority Frontier Framework for R... models proposing actions under the contract in a live Postgres setup

We instantiate AAI across four agentic environments (database mutation, customer-service refund, and the public tau-bench retail and airline tool-use traces).

Empirical instantiation described in the paper across four named environments/traces.

high positive Insuring Every Action: An Authority Frontier Framework for R... successful instantiation of AAI across multiple agentic environments

The framework provides (i) a deterministic quote-bind-commit protocol with toll-bounded capability tokens; (ii) a universal seven-class action taxonomy mapping heterogeneous tool calls to comparable authority units; (iii) replay determinism and pathwise reserve coverage under alpha-spending; (iv) cross-domain normalization via full reserve demand C_full and capital metrics Capital@k.

System design and theoretical specification in the paper; described as implemented across experiments.

high positive Insuring Every Action: An Authority Frontier Framework for R... availability of protocol, taxonomy, determinism properties, and normalization me...

We develop the Authority Frontier, an evaluation primitive measuring how much autonomous authority the runtime releases at each level of reserve capital.

Methodological contribution (definition and formulation of the Authority Frontier) described in the paper; subsequently instantiated empirically in experiments.

high positive Insuring Every Action: An Authority Frontier Framework for R... amount of autonomous authority released as a function of reserve capital

We propose the Actuarial Action Interface (AAI), a deterministic runtime contract that prices each such action against a contractually fixed safe default under a time-consistent risk mapping, and gates execution against a per-boundary reserve capital budget.

Methodological design and proposal described in the paper (no empirical test reported for the claim itself).

high positive Insuring Every Action: An Authority Frontier Framework for R... ability to price actions and gate execution via a deterministic runtime contract

A profile-driven approach places humans and AI systems on shared scales, supporting comparisons that are predictive of novel-task performance, explanatory of why agents succeed or fail, and auditable.

Claim about anticipated benefits of the proposed profile-driven approach presented in the paper (theoretical argument; no empirical results reported).

high positive Reverse Turing Tests for Human-Machine Task Suitability Asse... predictive validity for novel-task performance; explanatory power; auditability ...

Suitability evaluations for task-assignment should be profile-driven — based on assessments that infer latent constructs such as capabilities and propensities from observed performance.

Core proposal of the position paper (conceptual/methodological recommendation; no empirical pilot or validation reported).

high positive Reverse Turing Tests for Human-Machine Task Suitability Asse... method for conducting suitability evaluations (profile-driven assessment of late...

As AI is integrated into the workplace, organisations increasingly face allocation decisions between human and machine workers, and these decisions are increasingly made or assisted by algorithms.

Position paper / conceptual argument in the paper's introduction (no empirical sample or quantitative data reported).

high positive Reverse Turing Tests for Human-Machine Task Suitability Asse... use of algorithms to make or assist allocation decisions between human and machi...

The paper proposes a policy architecture for 'shared gains' centered on learning equity, transition protections, accountable algorithmic management, and distribution-sensitive metrics beyond GDP.

Paper's normative policy proposal presented in abstract, based on the integrative framework and synthesis of secondary sources; no empirical sample size reported.

high positive ARTIFICIAL INTELLIGENCE, INEQUALITIES OF KNOWLEDGE AND RESOU... policy architecture elements for inclusive AI transitions

India's macro growth remains robust.

Statement in abstract referencing official Indian statistics (MoSPI–NSO GDP estimates, 2025); no numerical sample size provided in abstract.

high positive ARTIFICIAL INTELLIGENCE, INEQUALITIES OF KNOWLEDGE AND RESOU... macro growth

Evidence indicates accelerating AI adoption among firms in advanced economies.

Abstract cites validated secondary sources including OECD (2026) and other global reports; no primary sample size reported in paper abstract.

high positive ARTIFICIAL INTELLIGENCE, INEQUALITIES OF KNOWLEDGE AND RESOU... rate of AI adoption among firms in advanced economies

AI is increasingly embedded in production, services, and workforce management.

Statement in paper's abstract supported by integrative socio-technical political economy framework and validated secondary sources (OECD, ILO, UNDP, WTO, WEF). No primary sample size reported.

high positive ARTIFICIAL INTELLIGENCE, INEQUALITIES OF KNOWLEDGE AND RESOU... degree of AI embedding in production, services, and workforce management

Future A2A collaboration networks cannot rely on unverified self-reporting alone; scalable collaboration requires mechanisms that balance open participation with verifiable execution and trustworthy evaluation.

Paper's concluding recommendation based on the empirical problems documented (low reuse, ranking manipulation, vacuous validations).

high positive Behind EvoMap: Characterizing a Self-Evolving Agent-to-Agent... policy / mechanism design for verification and evaluation

EvoMap's credit economy rewards agents for publishing valuable assets, encouraging participation at scale.

Description and analysis of the platform's reward mechanism and observed high participation (agent counts); empirical linkage between reward rules and publishing behavior discussed in the paper.

high positive Behind EvoMap: Characterizing a Self-Evolving Agent-to-Agent... participation / publishing activity

The study provides causal evidence that structured AI-based interventions can transform access to scientific feedback from a largely private advantage into a more widely distributed resource.

Causal inference based on randomized field experiment showing increased revision likelihood and broader uptake of LLM tools across diverse regions and author groups.

high positive Human-AI Collaboration in Science at Scale: A Global Large-s... access and distribution of scientific feedback (measured via treated authors' be...

Effects were strongest among teams with lower h-indexes and earlier career stages.

Heterogeneous treatment effects by team-level metrics (h-index) and career stage reported in the randomized experiment.

high positive Human-AI Collaboration in Science at Scale: A Global Large-s... treatment effect (e.g., revision likelihood) by team h-index and author career s...

Effects were strongest for manuscripts less embedded in the scholarly literature.

Heterogeneous treatment effects reported by manuscript-level embedding in literature (e.g., referencing/citation context) within the randomized experiment.

high positive Human-AI Collaboration in Science at Scale: A Global Large-s... treatment effect (e.g., revision likelihood) by degree of manuscript embeddednes...

Effects of AI feedback were strongest among authors from non-English-dominant research regions.

Heterogeneous treatment effects reported in the randomized experiment stratified by authors' geographic / language-dominance region; sample includes authors from 133 geographic regions.

high positive Human-AI Collaboration in Science at Scale: A Global Large-s... treatment effect on revision likelihood (or other measured outcomes) by region

Exposure to AI feedback increased authors' subsequent use of LLM tools in their future papers, suggesting longer-run shifts in scientific practice.

Follow-up measurements in the randomized field experiment tracking authors' later behavior (use of LLM tools in subsequent papers); comparison between treatment and control authors.

high positive Human-AI Collaboration in Science at Scale: A Global Large-s... subsequent use of LLM tools in future papers

Authors who received LLM-generated feedback had a significantly higher likelihood of revising their manuscripts, corresponding to a 12.55% relative increase over the baseline revision rate.

Randomized field experiment comparing treatment (LLM feedback) vs control; sample described as >31,000 arXiv preprints and >45,000 researchers; reported comparative revision rate and statistical significance.

high positive Human-AI Collaboration in Science at Scale: A Global Large-s... likelihood (probability) of revising manuscripts
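A relative increase is measured against the baseline rate, not in percentage points. The sketch below shows the arithmetic; the entry does not report the underlying revision rates, so the rates in the note that follows are hypothetical and chosen only to reproduce a 12.55% relative change.

```python
def relative_increase(treated_rate, baseline_rate):
    """Relative change of treated_rate over baseline_rate, as a fraction."""
    return (treated_rate - baseline_rate) / baseline_rate
```

For instance, a hypothetical baseline revision rate of 20% rising to 22.51% is a 2.51 percentage-point gain, which corresponds to a 12.55% relative increase.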

A difference-in-differences design centered on ChatGPT's release supports a causal interpretation of GenAI's local labor-market effects.

Quasi-experimental difference-in-differences analysis using ChatGPT's release as an event/shock, comparing outcomes across neighborhoods with different pre-existing GenAI exposure measures derived from 5 million job postings.

high positive Generative AI impacts on intra-urban inequality and skill pr... causal effect of GenAI exposure on neighborhood-level labor-market outcomes (e.g...

A human-centered approach is needed that integrates technological advancement with reskilling initiatives, labor protections, and inclusive policies.

Authors' prescriptive recommendation based on their thematic synthesis of the reviewed literature (2010–2024).

high positive Artificial Intelligence in Manufacturing policy and programmatic responses (reskilling, protections, inclusion)

The integration of AI into manufacturing offers substantial gains in efficiency, productivity, and operational performance.

Authors' systematic literature review of interdisciplinary studies (2010–2024) using thematic synthesis; synthesis of prior empirical and conceptual studies reporting efficiency/productivity effects of AI in manufacturing.

high positive Artificial Intelligence in Manufacturing efficiency, productivity, and operational performance

A-insensitivity increases with financial literacy, suggesting financially literate decision-makers perceive greater ambiguity in prediction accuracy.

Association reported in the incentivized laboratory experiment between participants' measured financial literacy and their measured a-insensitivity (correlational evidence; sample size not reported in abstract).

high positive Trusting human versus machine predictions as a decision unde... a-insensitivity (ambiguity-generated insensitivity)

Decision-makers hold more optimistic beliefs about the accuracy of ML analysts than about human analysts, and this greater optimism predicts higher trust in ML analysts relative to human analysts.

Incentivized laboratory experiment measuring participants' optimism about forecast accuracy for human vs. ML analysts and examining the relationship between those beliefs and expressed trust (correlational/regression evidence; sample size not reported in abstract).

high positive Trusting human versus machine predictions as a decision unde... optimism about forecast accuracy and trust in analyst

A human-centred approach underpinned by ongoing reskilling and ethical governance is vital for sustainable workforce evolution in the Indian IT sector.

Authors' policy recommendation derived from their literature synthesis and thematic analysis (qualitative conclusion).

high positive Human–AI Collaboration in the Indian IT Industry: A Qualitat... sustainability of workforce evolution (effect of human-centred reskilling and go...

The paper introduces a conceptual framework for hybrid intelligence within the Indian IT sector.

Authors present a new conceptual framework as part of this qualitative research article (conceptual contribution).

high positive Human–AI Collaboration in the Indian IT Industry: A Qualitat... conceptual framework introduction
