Evidence (3062 claims)
| Category | Claims |
|---|---|
| Adoption | 5227 |
| Productivity | 4503 |
| Governance | 4100 |
| Human-AI Collaboration | 3062 |
| Labor Markets | 2480 |
| Innovation | 2320 |
| Org Design | 2305 |
| Skills & Training | 1920 |
| Inequality | 1311 |
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 373 | 105 | 59 | 439 | 984 |
| Governance & Regulation | 366 | 172 | 115 | 55 | 718 |
| Research Productivity | 237 | 95 | 34 | 294 | 664 |
| Organizational Efficiency | 364 | 82 | 62 | 34 | 545 |
| Technology Adoption Rate | 293 | 118 | 66 | 30 | 511 |
| Firm Productivity | 274 | 33 | 68 | 10 | 390 |
| AI Safety & Ethics | 117 | 178 | 44 | 24 | 365 |
| Output Quality | 231 | 61 | 23 | 25 | 340 |
| Market Structure | 107 | 123 | 85 | 14 | 334 |
| Decision Quality | 158 | 68 | 33 | 17 | 279 |
| Fiscal & Macroeconomic | 75 | 52 | 32 | 21 | 187 |
| Employment Level | 70 | 32 | 74 | 8 | 186 |
| Skill Acquisition | 88 | 31 | 38 | 9 | 166 |
| Firm Revenue | 96 | 34 | 22 | — | 152 |
| Innovation Output | 105 | 12 | 21 | 11 | 150 |
| Consumer Welfare | 68 | 29 | 35 | 7 | 139 |
| Regulatory Compliance | 52 | 61 | 13 | 3 | 129 |
| Inequality Measures | 24 | 68 | 31 | 4 | 127 |
| Task Allocation | 71 | 10 | 29 | 6 | 116 |
| Worker Satisfaction | 46 | 38 | 12 | 9 | 105 |
| Error Rate | 42 | 47 | 6 | — | 95 |
| Training Effectiveness | 55 | 12 | 11 | 16 | 94 |
| Task Completion Time | 76 | 5 | 4 | 2 | 87 |
| Wages & Compensation | 46 | 13 | 19 | 5 | 83 |
| Team Performance | 44 | 9 | 15 | 7 | 76 |
| Hiring & Recruitment | 39 | 4 | 6 | 3 | 52 |
| Automation Exposure | 18 | 16 | 9 | 5 | 48 |
| Job Displacement | 5 | 29 | 12 | — | 46 |
| Social Protection | 19 | 8 | 6 | 1 | 34 |
| Developer Productivity | 27 | 2 | 3 | 1 | 33 |
| Worker Turnover | 10 | 12 | — | 3 | 25 |
| Creative Output | 15 | 5 | 3 | 1 | 24 |
| Skill Obsolescence | 3 | 18 | 2 | — | 23 |
| Labor Share of Income | 8 | 4 | 9 | — | 21 |
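As a minimal sketch only (the underlying claims table is not shown here, so the column names below are assumptions), a matrix like the one above can be produced with a pandas crosstab:

```python
import pandas as pd

# Hypothetical flat claims table; "outcome" and "direction" are assumed column names.
claims = pd.DataFrame({
    "outcome": ["Firm Productivity", "Firm Productivity", "Error Rate", "Error Rate"],
    "direction": ["Positive", "Mixed", "Negative", "Positive"],
})

# Count claims by outcome category and direction of finding, with row totals.
matrix = pd.crosstab(claims["outcome"], claims["direction"],
                     margins=True, margins_name="Total")
print(matrix)
```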
The claims below are filtered to the Human-AI Collaboration category.
Weak or inconsistent explanations increase regulatory and medico-legal risk; standardized, validated XAI can lower compliance costs and liability exposure.
Logical inference connecting explanation reliability to regulatory scrutiny and liability concerns, presented as an implication in the review (no direct empirical legal analysis provided).
Preprocessing pipelines (filtering, artifact removal such as ICA, re-referencing, segmentation) materially affect XAI outputs.
Review cites multiple studies and methodological notes showing explanation maps vary with preprocessing choices; effect reported qualitatively across papers.
There is a scarcity of human/clinical validation studies testing whether explanations improve clinician decision-making or align with clinical reasoning.
Observation from literature survey: few reviewed works include clinician studies or longitudinal/clinical impact evaluations.
Identified methodological limitations include sensitivity of explanations to hyperparameters and preprocessing choices, inconsistent explanations across similar inputs, and poor correlation with known neurophysiology.
Synthesis of reported failure modes and limitations from multiple EEG-XAI studies reviewed in the paper.
Most studies focus on qualitative visualizations (e.g., heatmaps) rather than quantitative, reproducible metrics for explanation quality; few evaluate neuroscientific validity or clinical usefulness, and robustness to noise and preprocessing is often untested.
Review-level assessment of evaluation practices across papers, noting prevalence of visual inspection and scarcity of standardized quantitative metrics or clinical validation.
Current explainability methods for EEG frequently lack robustness, consistency, and alignment with neuroscientific knowledge, limiting their trustworthiness and practical utility.
Aggregate observations from reviewed EEG-XAI studies noting inconsistent attributions, sensitivity to analysis choices, and few studies that validate explanations against neuroscientific markers or clinical endpoints.
Optional LLM access without training was associated with shorter written answers compared with no LLM access.
Measured answer length in the randomized trial (n = 164); comparison between untrained optional-access arm and no-access arm showed shorter answers in the untrained-access group.
AI adoption can reinforce winner‑take‑most market dynamics and increase market concentration due to data‑ and AI‑driven advantages.
Theoretical arguments and industry analyses on platform markets and data economies; empirical market‑structure studies and descriptive evidence cited in the review; the claim is derived from synthesis rather than a single causal identification design.
Impacts of AI on labor are uneven globally: developing regions face larger risks due to digital infrastructure gaps, limited reskilling capacity, and weaker social protections.
Cross‑country comparative analyses, policy and industry reports highlighting infrastructure and institutional differences, and sectoral case studies; review notes geographic bias toward advanced economies in the empirical literature, making some cross‑region inference provisional.
There is widespread displacement of routine and lower‑skilled tasks associated with AI and automation.
Task‑based analyses decomposing occupations into automatable vs augmentable tasks, econometric studies correlating measures of automation/AI exposure with declines in employment and/or hours in routine occupations, and industry reports documenting automation of routine tasks; evidence is largely from cross‑country and country‑specific empirical work summarized in the review.
Strict oversight requirements for GLAI could raise fixed compliance costs (audit, certification, human-in-the-loop processes), benefiting incumbent firms and potentially reducing competition and barriers to entry.
Regulatory economics argument drawing on compliance-cost logic and market structure effects; no empirical entry-cost analysis or case studies.
Perception of increased legal risk and regulatory uncertainty may slow adoption of GLAI and redirect investment toward safer subfields (verification tools, retrieval-augmented systems, formal-reasoning hybrids).
Economic reasoning and market-design argumentation based on risk/uncertainty dynamics; no econometric or survey data presented.
Divergent regulatory regimes (e.g., strict EU rules vs. looser regimes elsewhere) may produce regulatory arbitrage, influencing where GLAI companies locate, invest, and trade internationally.
Cross-jurisdictional regulatory analysis and economic inference about firm behavior under differential regulation; no firm-level relocation data provided.
Interpretive, ad-hoc human-centered evaluation practices (e.g., “vibe checks”, team sense-making) are rational adaptations to LLM behavior rather than merely sloppy or inferior methodological choices.
Authors' interpretive argument based on interview evidence where practitioners explained why such practices persist and how they serve sense-making for unpredictable model behavior.
The possibility of strategic argument construction (gaming) motivates governance needs: standards for provenance, certification, and liability rules.
Policy recommendation based on anticipated incentive problems; no empirical governance evaluations.
AI changes the nature of capital (digital/algorithmic assets) and complicates productivity accounting; researchers should decompose firm-level productivity gains into AI technology, complementary organizational capital, and human capital effects.
Theoretical proposal grounded in productivity accounting literature and conceptual discussion; no single decomposition empirical result presented.
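A minimal sketch of the kind of decomposition the authors call for, assuming firm-level log productivity and illustrative measures of AI capital, organizational capital, and human capital (all variable names and values are hypothetical, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500  # hypothetical firm-year observations

# Illustrative regressors: log AI capital, log organizational capital, log human capital.
ai_capital = rng.normal(size=n)
org_capital = rng.normal(size=n)
human_capital = rng.normal(size=n)
noise = rng.normal(scale=0.5, size=n)

# Synthetic log productivity, generated only so the sketch runs end to end.
log_tfp = 0.10 * ai_capital + 0.25 * org_capital + 0.15 * human_capital + noise

# OLS decomposition of productivity gains into the three proposed components.
X = np.column_stack([np.ones(n), ai_capital, org_capital, human_capital])
coef, *_ = np.linalg.lstsq(X, log_tfp, rcond=None)
print(dict(zip(["const", "ai", "org", "human"], coef.round(3))))
```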
Conventional productivity statistics and standard evaluation methods may undercount benefits from conversational initiation assistance; new survey and administrative measures might be needed.
Policy and measurement recommendation based on the conceptual model; no empirical measurement validation provided.
Policy and governance issues become salient: liability, IP, security, and certification of AI-generated code require new standards for provenance, testing, and accountability.
Argument based on practitioner-raised concerns about security, IP, and provenance in the Netlight study; authors recommend policy attention; no legal/regulatory analysis or empirical policy evaluation provided.
Many AI-assisted decision systems operate in competitive settings (e.g., admission or hiring) where only a fraction of candidates can succeed.
Authors' characterization of real-world contexts motivating the study (literature-based/contextual claim within the paper).
The authors assess system performance on JobSearch-XS across retrieval tasks.
Paper states that system performance is assessed on JobSearch-XS across retrieval tasks. The excerpt does not provide the tasks, metrics, sample sizes, or numerical results.
Output quality saturates at approximately seven governed memories per entity.
Empirical analysis reported in the controlled experiments showing output quality vs. number of governed memories per entity, with saturation near seven memories.
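One illustrative way to operationalize such a saturation point, assuming an exponential-plateau functional form and synthetic data (neither is from the paper):

```python
import numpy as np
from scipy.optimize import curve_fit

# Saturating (exponential-plateau) model: quality approaches `a` as memories grow.
def saturating(n, a, k):
    return a * (1.0 - np.exp(-k * n))

# Hypothetical quality measurements for 1..12 governed memories per entity.
n_memories = np.arange(1, 13)
quality = saturating(n_memories, a=0.9, k=0.45) \
    + np.random.default_rng(1).normal(scale=0.01, size=12)

(a_hat, k_hat), _ = curve_fit(saturating, n_memories, quality, p0=[1.0, 0.3])

# Smallest memory count that already reaches 95% of the fitted plateau.
saturation_point = next(n for n in n_memories if saturating(n, a_hat, k_hat) >= 0.95 * a_hat)
print(a_hat, k_hat, saturation_point)
```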
A Sankey diagram of thematic evolution shows lexical convergence over time and indicates that a small set of authors has disproportionate influence in structuring the discourse.
Thematic evolution analysis visualized with a Sankey diagram; author influence inferred from performance trends (citations/publication counts) in the bibliometric data.
Distributed agency (Problem C) complicates classical principal–agent models; economists should develop models that capture multiple, overlapping agents and ambiguous attribution of outcomes.
Conceptual implication for economic modeling derived from the paper’s diagnosis of distributed agency; recommendation for formal modeling and simulations but none provided.
ToM alignment matters less (i.e., misalignment has smaller effect) in settings with explicit coordination protocols, strong signaling, or standardized conventions.
Analyses and experiments described in the paper showing smaller performance differences between matched and mismatched ToM orders when explicit conventions or reliable signals are available; reported as part of robustness/conditional analyses.
Manipulating costs and benefits of observation versus action in experiments can probe the switching behavior driven by System M.
Proposed experimental manipulation; no empirical data presented.
Ablation studies disabling System M or decoupling Systems A and B will help test whether meta-control provides empirical benefits.
Suggested experimental design (ablation study) in the methods section; no results provided.
The study is the first empirical investigation of human–AI assistance in a live CTF setting with a direct comparison to autonomous AI agents on the same fresh challenges.
Authors' positioning of their work as novel; methodology involved a live onsite CTF, instrumentation of human–AI interactions (41 participants), and direct benchmarking of four autonomous agents on the same fresh challenge set.
This is the first study to compare human–human and human–AI collaboration outcomes for temporary virtual tasks from employees’ perspective in an applied service-industry context.
Author-stated novelty claim in the paper (based on study design: online experiment with retail employees examining temporary, virtual teamwork).
Measuring AI's contribution to productivity and coordination effects will be challenging; new metrics (e.g., coordination time per task, error/rework rates attributable to communication lapses) are required.
Conceptual argument and recommended measurement agenda in the paper; no empirical testing of proposed metrics provided.
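A minimal sketch of how the proposed metrics might be computed from a task-level event log (the schema and field names are hypothetical, not from the paper):

```python
from dataclasses import dataclass

@dataclass
class TaskRecord:
    # Hypothetical per-task log fields.
    task_id: str
    total_minutes: float
    coordination_minutes: float  # time spent aligning with the AI / teammates
    rework_events: int           # reworks attributed to communication lapses
    outputs: int

def coordination_time_per_task(records):
    return sum(r.coordination_minutes for r in records) / len(records)

def rework_rate(records):
    return sum(r.rework_events for r in records) / max(1, sum(r.outputs for r in records))

tasks = [
    TaskRecord("t1", 60, 12, 1, 3),
    TaskRecord("t2", 45, 5, 0, 2),
]
print(coordination_time_per_task(tasks), rework_rate(tasks))
```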
AI adoption is not associated with significant changes in operating costs.
Analysis of operating costs in firm financials showing no significant post-adoption change for adopters relative to nonadopters.
The innovation effects of AI adoption are not concentrated among larger firms, financially unconstrained firms, or high-tech firms.
Heterogeneity tests across firm size, financial constraint status, and industry technology intensity showing no concentration of effects in these groups (as reported in the paper).
We did not observe significant differences between using Gemini (free or paid) and not using Gemini in terms of secure software development.
Statistical comparison of code-security outcomes across the three experimental groups (no AI, free Gemini, paid Gemini) in the n = 159 participant sample; the paper reports no statistically significant group differences.
Workers prefer systems that are straightforward, tolerant, and practical.
Survey responses from workers collected in the study on the representative sample of tasks (171) and possibly summarized/scaled via LMs.
Developers report emphasizing politeness, strictness, and imagination in system design.
Survey responses from developers collected as part of the study on the representative sample of tasks (171) and possibly summarized/scaled via LMs.
Prior work has mapped which workplace tasks are exposed to AI, but less is known about whether workers perceive these tasks as meaningful or as busywork.
Statement referencing prior literature (background motivation) in the paper; no new data provided for this claim within the excerpt.
SWE-Skills-Bench is the first requirement-driven benchmark that isolates the marginal utility of agent skills in real-world software engineering (SWE).
Authors present a new benchmark designed to evaluate marginal utility of skills; benchmark pairs skills with repositories and requirement documents and is described as requirement-driven and focused on isolating marginal utility.
Collaborative ability is distinct from individual problem-solving ability.
Model-based estimates from the Bayesian IRT framework that separately parameterize collaborative ability and individual problem-solving ability, with results indicating they are separable constructs (analysis on n = 667 benchmark data).
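A minimal sketch of how the two abilities can enter an IRT-style likelihood separately (the paper's actual Bayesian specification is not given in the excerpt; this is only an illustrative two-parameter logistic form):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def success_probability(indiv_ability, collab_ability, difficulty, collaborative):
    """IRT-style success probability.

    Individual items load only on individual ability; collaborative items load on
    both, so the two abilities are separately identified (illustrative assumption).
    """
    latent = indiv_ability - difficulty
    if collaborative:
        latent += collab_ability
    return sigmoid(latent)

# A participant can be strong individually but weak collaboratively (or vice versa).
print(success_probability(indiv_ability=1.0, collab_ability=-0.8, difficulty=0.2, collaborative=True))
print(success_probability(indiv_ability=1.0, collab_ability=-0.8, difficulty=0.2, collaborative=False))
```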
LLMs can be understood as condensates of human symbolic behavior—compressed, generative representations that render patterns of collective discourse computationally accessible.
Theoretical framing and conceptual argument provided by the authors; presented as an interpretive model rather than an empirically tested assertion in the excerpt.
Previous studies have identified language barriers as impediments to labor-market engagement, but empirical evidence assessing both policy reductions and the relative efficacy of professional, AI-assisted, and hybrid translation methods is scarce.
Paper's literature review claim that existing literature documents language barriers but lacks comparative empirical evaluations of policy reductions and multiple translation models; asserted as motivation for current study.
The translated backend, verified against existing reference implementations, achieves throughput parity with MJX (1.04x) on HalfCheetah in JAX.
Benchmarking HalfCheetah implemented in the translated backend versus MJX, reporting a 1.04x throughput ratio (approximate parity).
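A minimal sketch of how such a throughput ratio can be measured; the step functions below are placeholders, not real MJX or translated-backend calls:

```python
import time

def measure_throughput(step_fn, n_steps=10_000):
    """Return simulation steps per second for a given step function."""
    start = time.perf_counter()
    for _ in range(n_steps):
        step_fn()
    return n_steps / (time.perf_counter() - start)

# Placeholder step functions; in practice these would be jitted HalfCheetah steps
# from the translated backend and from MJX respectively (assumption, not shown here).
translated_step = lambda: sum(range(100))
mjx_step = lambda: sum(range(100))

ratio = measure_throughput(translated_step) / measure_throughput(mjx_step)
print(f"throughput ratio (translated / MJX): {ratio:.2f}x")
```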
Logistics efficiency does not fulfill its anticipated mediating role in transmitting AI's effects to supply chain stability.
Mechanism/mediation tests in the DML analysis on the 45 Chinese listed SEs (2012–2023) indicate no significant mediation via logistics efficiency.
Personal experience with an AI 'boss' did not affect workers' attitudes toward using AI in public decision making.
Same randomized design (N > 1,500) with attitudinal measures collected across a three-wave panel; comparison between AI-assigned and human-assigned participants showed no measurable effect on attitudes about AI in public decision making.
The study presents an advanced systematic ranking of I4.0 adoption barriers in the Thai automotive industry.
Paper outputs a ranked list of barriers produced by the integrated Fuzzy BWM-PROMETHEE II-DEMATEL framework; full ranked list and quantitative ranks not included in the supplied summary.
This study developed a unified framework that integrates technology acceptance and trust-based perspectives.
Conceptual/methodological claim in the paper: authors report constructing an integrated framework based on literature and their empirical testing.
In the sentiment-analysis task, those individual differences do not produce human–AI complementarity: the joint performance of humans and AI did not exceed that of either alone.
Empirical finding reported from the preregistered sentiment-analysis experiment showing no complementarity effect (joint human-AI performance ≤ best individual performance). (Statistical tests and sample size not included in the excerpt.)
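The complementarity criterion referenced here is simple to state; a minimal sketch with illustrative accuracy values (not the study's numbers):

```python
def is_complementary(human_acc, ai_acc, joint_acc):
    """Human-AI complementarity: the joint system beats the best solo performer."""
    return joint_acc > max(human_acc, ai_acc)

# Illustrative values only; the paper reports joint performance <= best individual.
print(is_complementary(human_acc=0.78, ai_acc=0.84, joint_acc=0.82))  # False: no complementarity
```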
We conducted a systematic review and meta-analysis of the literature on AI/HR analytics and organizational decision making, using 85 publications and grounding the work in theories of algorithm-automated decision-making (AST) and matching/hybrid models (STS).
Paper's methods: systematic review and meta-analysis; sample = 85 publications; theoretical framing explicitly stated as AST and STS.
Macroeconomic fiscal moderation remains empirically unvalidated.
Synthesis conclusion from the review noting an absence of empirical evidence that Agentic AI produces macroeconomic fiscal moderation; i.e., no validated studies showing broad fiscal relief effects were identified in the reviewed literature.
No significant differences emerged across genders in the job titles and industries suggested by GPT-5.
Empirical finding from analysis of GPT-5 outputs comparing suggested job titles and industries for the 24 profiles; exact statistical tests not specified in the summary.
Self-generated (model-authored) Skills provide no average benefit.
Comparison of three evaluation conditions (no Skills, curated Skills, self-authored Skills) across SkillsBench. Averaged pass-rate deltas show that model-authored Skills do not increase average pass rate relative to baseline; analysis used 7,308 trajectories over 86 tasks and 7 agent–model configurations.
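A minimal sketch of the averaged pass-rate-delta comparison described above, with hypothetical per-task pass rates (condition names follow the text; all numbers are illustrative):

```python
from statistics import mean

# Hypothetical per-task pass rates for each evaluation condition.
pass_rates = {
    "no_skills":     [0.40, 0.55, 0.60],
    "curated":       [0.50, 0.62, 0.70],
    "self_authored": [0.41, 0.54, 0.61],
}

baseline = pass_rates["no_skills"]
for condition in ("curated", "self_authored"):
    # Average pass-rate delta relative to the no-Skills baseline.
    deltas = [c - b for c, b in zip(pass_rates[condition], baseline)]
    print(condition, round(mean(deltas), 3))
```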
AI did not significantly moderate the relationship between workplace stress and job performance.
Moderation test in PLS-SEM (SmartPLS 4.0) on N = 350; reported non-significant AI × Stress → Performance moderator (paper reports no significant moderating effect).