Evidence (3062 claims)
| Category | Claims |
|---|---|
| Adoption | 5227 |
| Productivity | 4503 |
| Governance | 4100 |
| Human-AI Collaboration | 3062 |
| Labor Markets | 2480 |
| Innovation | 2320 |
| Org Design | 2305 |
| Skills & Training | 1920 |
| Inequality | 1311 |
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 373 | 105 | 59 | 439 | 984 |
| Governance & Regulation | 366 | 172 | 115 | 55 | 718 |
| Research Productivity | 237 | 95 | 34 | 294 | 664 |
| Organizational Efficiency | 364 | 82 | 62 | 34 | 545 |
| Technology Adoption Rate | 293 | 118 | 66 | 30 | 511 |
| Firm Productivity | 274 | 33 | 68 | 10 | 390 |
| AI Safety & Ethics | 117 | 178 | 44 | 24 | 365 |
| Output Quality | 231 | 61 | 23 | 25 | 340 |
| Market Structure | 107 | 123 | 85 | 14 | 334 |
| Decision Quality | 158 | 68 | 33 | 17 | 279 |
| Fiscal & Macroeconomic | 75 | 52 | 32 | 21 | 187 |
| Employment Level | 70 | 32 | 74 | 8 | 186 |
| Skill Acquisition | 88 | 31 | 38 | 9 | 166 |
| Firm Revenue | 96 | 34 | 22 | — | 152 |
| Innovation Output | 105 | 12 | 21 | 11 | 150 |
| Consumer Welfare | 68 | 29 | 35 | 7 | 139 |
| Regulatory Compliance | 52 | 61 | 13 | 3 | 129 |
| Inequality Measures | 24 | 68 | 31 | 4 | 127 |
| Task Allocation | 71 | 10 | 29 | 6 | 116 |
| Worker Satisfaction | 46 | 38 | 12 | 9 | 105 |
| Error Rate | 42 | 47 | 6 | — | 95 |
| Training Effectiveness | 55 | 12 | 11 | 16 | 94 |
| Task Completion Time | 76 | 5 | 4 | 2 | 87 |
| Wages & Compensation | 46 | 13 | 19 | 5 | 83 |
| Team Performance | 44 | 9 | 15 | 7 | 76 |
| Hiring & Recruitment | 39 | 4 | 6 | 3 | 52 |
| Automation Exposure | 18 | 16 | 9 | 5 | 48 |
| Job Displacement | 5 | 29 | 12 | — | 46 |
| Social Protection | 19 | 8 | 6 | 1 | 34 |
| Developer Productivity | 27 | 2 | 3 | 1 | 33 |
| Worker Turnover | 10 | 12 | — | 3 | 25 |
| Creative Output | 15 | 5 | 3 | 1 | 24 |
| Skill Obsolescence | 3 | 18 | 2 | — | 23 |
| Labor Share of Income | 8 | 4 | 9 | — | 21 |
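As a minimal sketch only (the underlying claims table is not shown here, so the column names below are assumptions), a matrix like the one above can be produced with a pandas crosstab:

```python
import pandas as pd

# Hypothetical flat claims table; "outcome" and "direction" are assumed column names.
claims = pd.DataFrame({
    "outcome": ["Firm Productivity", "Firm Productivity", "Error Rate", "Error Rate"],
    "direction": ["Positive", "Mixed", "Negative", "Positive"],
})

# Count claims by outcome category and direction of finding, with row totals.
matrix = pd.crosstab(claims["outcome"], claims["direction"],
                     margins=True, margins_name="Total")
print(matrix)
```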
The claims below are filtered to the Human-AI Collaboration category.
Weak or inconsistent explanations increase regulatory and medico-legal risk; standardized, validated XAI can lower compliance costs and liability exposure.
Logical inference connecting explanation reliability to regulatory scrutiny and liability concerns, presented as an implication in the review (no direct empirical legal analysis provided).
Preprocessing pipelines (filtering, artifact removal such as ICA, re-referencing, segmentation) materially affect XAI outputs.
Review cites multiple studies and methodological notes showing explanation maps vary with preprocessing choices; effect reported qualitatively across papers.
There is a scarcity of human/clinical validation studies testing whether explanations improve clinician decision-making or align with clinical reasoning.
Observation from literature survey: few reviewed works include clinician studies or longitudinal/clinical impact evaluations.
Identified methodological limitations include sensitivity of explanations to hyperparameters and preprocessing choices, inconsistent explanations across similar inputs, and poor correlation with known neurophysiology.
Synthesis of reported failure modes and limitations from multiple EEG-XAI studies reviewed in the paper.
Most studies focus on qualitative visualizations (e.g., heatmaps) rather than quantitative, reproducible metrics for explanation quality; few evaluate neuroscientific validity or clinical usefulness, and robustness to noise and preprocessing is often untested.
Review-level assessment of evaluation practices across papers, noting prevalence of visual inspection and scarcity of standardized quantitative metrics or clinical validation.
Current explainability methods for EEG frequently lack robustness, consistency, and alignment with neuroscientific knowledge, limiting their trustworthiness and practical utility.
Aggregate observations from reviewed EEG-XAI studies noting inconsistent attributions, sensitivity to analysis choices, and few studies that validate explanations against neuroscientific markers or clinical endpoints.
Optional LLM access without training was associated with shorter written answers compared with no LLM access.
Measured answer length in the randomized trial (n = 164); comparison between untrained optional-access arm and no-access arm showed shorter answers in the untrained-access group.
AI adoption can reinforce winner‑take‑most market dynamics and increase market concentration due to data‑ and AI‑driven advantages.
Theoretical arguments and industry analyses on platform markets and data economies; empirical market‑structure studies and descriptive evidence cited in the review; the claim is derived from synthesis rather than a single causal identification design.
Impacts of AI on labor are uneven globally: developing regions face larger risks due to digital infrastructure gaps, limited reskilling capacity, and weaker social protections.
Cross‑country comparative analyses, policy and industry reports highlighting infrastructure and institutional differences, and sectoral case studies; review notes geographic bias toward advanced economies in the empirical literature, making some cross‑region inference provisional.
There is widespread displacement of routine and lower‑skilled tasks associated with AI and automation.
Task‑based analyses decomposing occupations into automatable vs augmentable tasks, econometric studies correlating measures of automation/AI exposure with declines in employment and/or hours in routine occupations, and industry reports documenting automation of routine tasks; evidence is largely from cross‑country and country‑specific empirical work summarized in the review.
Strict oversight requirements for GLAI could raise fixed compliance costs (audit, certification, human-in-the-loop processes), benefiting incumbent firms and potentially reducing competition and barriers to entry.
Regulatory economics argument drawing on compliance-cost logic and market structure effects; no empirical entry-cost analysis or case studies.
Perception of increased legal risk and regulatory uncertainty may slow adoption of GLAI and redirect investment toward safer subfields (verification tools, retrieval-augmented systems, formal-reasoning hybrids).
Economic reasoning and market-design argumentation based on risk/uncertainty dynamics; no econometric or survey data presented.
Divergent regulatory regimes (e.g., strict EU rules vs. looser regimes elsewhere) may produce regulatory arbitrage, influencing where GLAI companies locate, invest, and trade internationally.
Cross-jurisdictional regulatory analysis and economic inference about firm behavior under differential regulation; no firm-level relocation data provided.
Interpretive, ad-hoc human-centered evaluation practices (e.g., “vibe checks”, team sense-making) are rational adaptations to LLM behavior rather than merely sloppy or inferior methodological choices.
Authors' interpretive argument based on interview evidence where practitioners explained why such practices persist and how they serve sense-making for unpredictable model behavior.
The possibility of strategic argument construction (gaming) motivates governance needs: standards for provenance, certification, and liability rules.
Policy recommendation based on anticipated incentive problems; no empirical governance evaluations.
AI changes the nature of capital (digital/algorithmic assets) and complicates productivity accounting; researchers should decompose firm-level productivity gains into AI technology, complementary organizational capital, and human capital effects.
Theoretical proposal grounded in productivity accounting literature and conceptual discussion; no single decomposition empirical result presented.
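A minimal sketch of the kind of decomposition the authors call for, assuming firm-level log productivity and illustrative measures of AI capital, organizational capital, and human capital (all variable names and values are hypothetical, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500  # hypothetical firm-year observations

# Illustrative regressors: log AI capital, log organizational capital, log human capital.
ai_capital = rng.normal(size=n)
org_capital = rng.normal(size=n)
human_capital = rng.normal(size=n)
noise = rng.normal(scale=0.5, size=n)

# Synthetic log productivity, generated only so the sketch runs end to end.
log_tfp = 0.10 * ai_capital + 0.25 * org_capital + 0.15 * human_capital + noise

# OLS decomposition of productivity gains into the three proposed components.
X = np.column_stack([np.ones(n), ai_capital, org_capital, human_capital])
coef, *_ = np.linalg.lstsq(X, log_tfp, rcond=None)
print(dict(zip(["const", "ai", "org", "human"], coef.round(3))))
```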
Conventional productivity statistics and standard evaluation methods may undercount benefits from conversational initiation assistance; new survey and administrative measures might be needed.
Policy and measurement recommendation based on the conceptual model; no empirical measurement validation provided.
Policy and governance issues become salient: liability, IP, security, and certification of AI-generated code require new standards for provenance, testing, and accountability.
Argument based on practitioner-raised concerns about security, IP, and provenance in the Netlight study; authors recommend policy attention; no legal/regulatory analysis or empirical policy evaluation provided.
Many AI-assisted decision systems operate in competitive settings (e.g., admission or hiring) where only a fraction of candidates can succeed.
Authors' characterization of real-world contexts motivating the study (literature-based/contextual claim within the paper).
The authors assess system performance on JobSearch-XS across retrieval tasks.
Paper states that system performance is assessed on JobSearch-XS across retrieval tasks. The excerpt does not provide the tasks, metrics, sample sizes, or numerical results.
Output quality saturates at approximately seven governed memories per entity.
Empirical analysis reported in the controlled experiments showing output quality vs. number of governed memories per entity, with saturation near seven memories.
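One illustrative way to operationalize such a saturation point, assuming an exponential-plateau functional form and synthetic data (neither is from the paper):

```python
import numpy as np
from scipy.optimize import curve_fit

# Saturating (exponential-plateau) model: quality approaches `a` as memories grow.
def saturating(n, a, k):
    return a * (1.0 - np.exp(-k * n))

# Hypothetical quality measurements for 1..12 governed memories per entity.
n_memories = np.arange(1, 13)
quality = saturating(n_memories, a=0.9, k=0.45) \
    + np.random.default_rng(1).normal(scale=0.01, size=12)

(a_hat, k_hat), _ = curve_fit(saturating, n_memories, quality, p0=[1.0, 0.3])

# Smallest memory count that already reaches 95% of the fitted plateau.
saturation_point = next(n for n in n_memories if saturating(n, a_hat, k_hat) >= 0.95 * a_hat)
print(a_hat, k_hat, saturation_point)
```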
A Sankey diagram of thematic evolution shows lexical convergence over time and indicates that a small set of authors has disproportionate influence in structuring the discourse.
Thematic evolution analysis visualized with a Sankey diagram; author influence inferred from performance trends (citations/publication counts) in the bibliometric data.
Distributed agency (Problem C) complicates classical principal–agent models; economists should develop models that capture multiple, overlapping agents and ambiguous attribution of outcomes.
Conceptual implication for economic modeling derived from the paper’s diagnosis of distributed agency; recommendation for formal modeling and simulations but none provided.
ToM alignment matters less (i.e., misalignment has smaller effect) in settings with explicit coordination protocols, strong signaling, or standardized conventions.
Analyses and experiments described in the paper showing smaller performance differences between matched and mismatched ToM orders when explicit conventions or reliable signals are available; reported as part of robustness/conditional analyses.
Manipulating costs and benefits of observation versus action in experiments can probe the switching behavior driven by System M.
Proposed experimental manipulation; no empirical data presented.
Ablation studies disabling System M or decoupling Systems A and B will help test whether meta-control provides empirical benefits.
Suggested experimental design (ablation study) in the methods section; no results provided.
The study is the first empirical investigation of human–AI assistance in a live CTF setting with a direct comparison to autonomous AI agents on the same fresh challenges.
Authors' positioning of their work as novel; methodology involved a live onsite CTF, instrumentation of human–AI interactions (41 participants), and direct benchmarking of four autonomous agents on the same fresh challenge set.
This is the first study to compare human–human and human–AI collaboration outcomes for temporary virtual tasks from employees’ perspective in an applied service-industry context.
Author-stated novelty claim in the paper (based on study design: online experiment with retail employees examining temporary, virtual teamwork).
Measuring AI's contribution to productivity and coordination effects will be challenging; new metrics (e.g., coordination time per task, error/rework rates attributable to communication lapses) are required.
Conceptual argument and recommended measurement agenda in the paper; no empirical testing of proposed metrics provided.
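A minimal sketch of how the proposed metrics might be computed from a task-level event log (the schema and field names are hypothetical, not from the paper):

```python
from dataclasses import dataclass

@dataclass
class TaskRecord:
    # Hypothetical per-task log fields.
    task_id: str
    total_minutes: float
    coordination_minutes: float  # time spent aligning with the AI / teammates
    rework_events: int           # reworks attributed to communication lapses
    outputs: int

def coordination_time_per_task(records):
    return sum(r.coordination_minutes for r in records) / len(records)

def rework_rate(records):
    return sum(r.rework_events for r in records) / max(1, sum(r.outputs for r in records))

tasks = [
    TaskRecord("t1", 60, 12, 1, 3),
    TaskRecord("t2", 45, 5, 0, 2),
]
print(coordination_time_per_task(tasks), rework_rate(tasks))
```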
AI adoption is not associated with significant changes in operating costs.
Analysis of operating costs in firm financials showing no significant post-adoption change for adopters relative to nonadopters.
The innovation effects of AI adoption are not concentrated among larger firms, financially unconstrained firms, or high-tech firms.
Heterogeneity tests across firm size, financial constraint status, and industry technology intensity showing no concentration of effects in these groups (as reported in the paper).
We did not observe significant differences between using Gemini (free or paid) and not using Gemini in terms of secure software development.
Statistical comparison of code-security outcomes across the three experimental groups (no AI, free Gemini, paid Gemini) in the n = 159 participant sample; the paper reports no statistically significant group differences.
Workers prefer systems that are straightforward, tolerant, and practical.
Survey responses from workers collected in the study on the representative sample of tasks (171) and possibly summarized/scaled via LMs.
Developers report emphasizing politeness, strictness, and imagination in system design.
Survey responses from developers collected as part of the study on the representative sample of tasks (171) and possibly summarized/scaled via LMs.
Prior work has mapped which workplace tasks are exposed to AI, but less is known about whether workers perceive these tasks as meaningful or as busywork.
Statement referencing prior literature (background motivation) in the paper; no new data provided for this claim within the excerpt.
SWE-Skills-Bench is the first requirement-driven benchmark that isolates the marginal utility of agent skills in real-world software engineering (SWE).
Authors present a new benchmark designed to evaluate marginal utility of skills; benchmark pairs skills with repositories and requirement documents and is described as requirement-driven and focused on isolating marginal utility.
Collaborative ability is distinct from individual problem-solving ability.
Model-based estimates from the Bayesian IRT framework that separately parameterize collaborative ability and individual problem-solving ability, with results indicating they are separable constructs (analysis on n = 667 benchmark data).
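A minimal sketch of how the two abilities can enter an IRT-style likelihood separately (the paper's actual Bayesian specification is not given in the excerpt; this is only an illustrative two-parameter logistic form):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def success_probability(indiv_ability, collab_ability, difficulty, collaborative):
    """IRT-style success probability.

    Individual items load only on individual ability; collaborative items load on
    both, so the two abilities are separately identified (illustrative assumption).
    """
    latent = indiv_ability - difficulty
    if collaborative:
        latent += collab_ability
    return sigmoid(latent)

# A participant can be strong individually but weak collaboratively (or vice versa).
print(success_probability(indiv_ability=1.0, collab_ability=-0.8, difficulty=0.2, collaborative=True))
print(success_probability(indiv_ability=1.0, collab_ability=-0.8, difficulty=0.2, collaborative=False))
```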
LLMs can be understood as condensates of human symbolic behavior—compressed, generative representations that render patterns of collective discourse computationally accessible.
Theoretical framing and conceptual argument provided by the authors; presented as an interpretive model rather than an empirically tested assertion in the excerpt.
Previous studies have identified language barriers as impediments to labor-market engagement, but empirical evidence assessing both policy reductions and the relative efficacy of professional, AI-assisted, and hybrid translation methods is scarce.
Paper's literature review claim that existing literature documents language barriers but lacks comparative empirical evaluations of policy reductions and multiple translation models; asserted as motivation for current study.
The translated backend, verified against existing reference implementations, achieves throughput parity with MJX (1.04x) on HalfCheetah in JAX.
Benchmarking HalfCheetah implemented in the translated backend versus MJX, reporting a 1.04x throughput ratio (approximate parity).
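A minimal sketch of how such a throughput ratio can be measured; the step functions below are placeholders, not real MJX or translated-backend calls:

```python
import time

def measure_throughput(step_fn, n_steps=10_000):
    """Return simulation steps per second for a given step function."""
    start = time.perf_counter()
    for _ in range(n_steps):
        step_fn()
    return n_steps / (time.perf_counter() - start)

# Placeholder step functions; in practice these would be jitted HalfCheetah steps
# from the translated backend and from MJX respectively (assumption, not shown here).
translated_step = lambda: sum(range(100))
mjx_step = lambda: sum(range(100))

ratio = measure_throughput(translated_step) / measure_throughput(mjx_step)
print(f"throughput ratio (translated / MJX): {ratio:.2f}x")
```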
Logistics efficiency does not fulfill its anticipated mediating role in transmitting AI's effects to supply chain stability.
Mechanism/mediation tests in the DML analysis on the 45 Chinese listed SEs (2012–2023) indicate no significant mediation via logistics efficiency.
Personal experience with an AI 'boss' did not affect workers' attitudes toward using AI in public decision making.
Same randomized design (N > 1,500) with attitudinal measures collected across a three-wave panel; comparison between AI-assigned and human-assigned participants showed no measurable effect on attitudes about AI in public decision making.
The study presents an advanced systematic ranking of I4.0 adoption barriers in the Thai automotive industry.
Paper outputs a ranked list of barriers produced by the integrated Fuzzy BWM-PROMETHEE II-DEMATEL framework; full ranked list and quantitative ranks not included in the supplied summary.
This study developed a unified framework that integrates technology acceptance and trust-based perspectives.
Conceptual/methodological claim in the paper: authors report constructing an integrated framework based on literature and their empirical testing.
In the sentiment-analysis task, those individual differences do not produce human–AI complementarity: the joint performance of humans and AI did not exceed that of either alone.
Empirical finding reported from the preregistered sentiment-analysis experiment showing no complementarity effect (joint human-AI performance ≤ best individual performance). (Statistical tests and sample size not included in the excerpt.)
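The complementarity criterion referenced here is simple to state; a minimal sketch with illustrative accuracy values (not the study's numbers):

```python
def is_complementary(human_acc, ai_acc, joint_acc):
    """Human-AI complementarity: the joint system beats the best solo performer."""
    return joint_acc > max(human_acc, ai_acc)

# Illustrative values only; the paper reports joint performance <= best individual.
print(is_complementary(human_acc=0.78, ai_acc=0.84, joint_acc=0.82))  # False: no complementarity
```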
We conducted a systematic review and meta-analysis of the literature on AI/HR analytics and organizational decision making, using 85 publications and grounding the work in theories of algorithm-automated decision-making (AST) and matching/hybrid models (STS).
Paper's methods: systematic review and meta-analysis; sample = 85 publications; theoretical framing explicitly stated as AST and STS.
Macroeconomic fiscal moderation remains empirically unvalidated.
Synthesis conclusion from the review noting an absence of empirical evidence that Agentic AI produces macroeconomic fiscal moderation; i.e., no validated studies showing broad fiscal relief effects were identified in the reviewed literature.
No significant differences emerged across genders in the job titles and industries suggested by GPT-5.
Empirical finding from analysis of GPT-5 outputs comparing suggested job titles and industries for the 24 profiles; exact statistical tests not specified in the summary.
Self-generated (model-authored) Skills provide no average benefit.
Comparison of three evaluation conditions (no Skills, curated Skills, self-authored Skills) across SkillsBench. Averaged pass-rate deltas show that model-authored Skills do not increase average pass rate relative to baseline; analysis used 7,308 trajectories over 86 tasks and 7 agent–model configurations.
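A minimal sketch of the averaged pass-rate-delta comparison described above, with hypothetical per-task pass rates (condition names follow the text; all numbers are illustrative):

```python
from statistics import mean

# Hypothetical per-task pass rates for each evaluation condition.
pass_rates = {
    "no_skills":     [0.40, 0.55, 0.60],
    "curated":       [0.50, 0.62, 0.70],
    "self_authored": [0.41, 0.54, 0.61],
}

baseline = pass_rates["no_skills"]
for condition in ("curated", "self_authored"):
    # Average pass-rate delta relative to the no-Skills baseline.
    deltas = [c - b for c, b in zip(pass_rates[condition], baseline)]
    print(condition, round(mean(deltas), 3))
```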
AI did not significantly moderate the relationship between workplace stress and job performance.
Moderation test in PLS-SEM (SmartPLS 4.0) on N = 350; reported non-significant AI × Stress → Performance moderator (paper reports no significant moderating effect).