Evidence (3470 claims)

Claim counts by topic:

| Topic | Claims |
|---|---|
| Adoption | 7395 |
| Productivity | 6507 |
| Governance | 5877 |
| Human-AI Collaboration | 5157 |
| Innovation | 3492 |
| Org Design | 3470 |
| Labor Markets | 3224 |
| Skills & Training | 2608 |
| Inequality | 1835 |
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 609 | 159 | 77 | 736 | 1615 |
| Governance & Regulation | 664 | 329 | 160 | 99 | 1273 |
| Organizational Efficiency | 624 | 143 | 105 | 70 | 949 |
| Technology Adoption Rate | 502 | 176 | 98 | 78 | 861 |
| Research Productivity | 348 | 109 | 48 | 322 | 836 |
| Output Quality | 391 | 120 | 44 | 40 | 595 |
| Firm Productivity | 385 | 46 | 85 | 17 | 539 |
| Decision Quality | 275 | 143 | 62 | 34 | 521 |
| AI Safety & Ethics | 183 | 241 | 59 | 30 | 517 |
| Market Structure | 152 | 154 | 109 | 20 | 440 |
| Task Allocation | 158 | 50 | 56 | 26 | 295 |
| Innovation Output | 178 | 23 | 38 | 17 | 257 |
| Skill Acquisition | 137 | 52 | 50 | 13 | 252 |
| Fiscal & Macroeconomic | 120 | 64 | 38 | 23 | 252 |
| Employment Level | 93 | 46 | 96 | 12 | 249 |
| Firm Revenue | 130 | 43 | 26 | 3 | 202 |
| Consumer Welfare | 99 | 51 | 40 | 11 | 201 |
| Inequality Measures | 36 | 105 | 40 | 6 | 187 |
| Task Completion Time | 134 | 18 | 6 | 5 | 163 |
| Worker Satisfaction | 79 | 54 | 16 | 11 | 160 |
| Error Rate | 64 | 78 | 8 | 1 | 151 |
| Regulatory Compliance | 69 | 64 | 14 | 3 | 150 |
| Training Effectiveness | 81 | 15 | 13 | 18 | 129 |
| Wages & Compensation | 70 | 25 | 22 | 6 | 123 |
| Team Performance | 74 | 16 | 21 | 9 | 121 |
| Automation Exposure | 41 | 48 | 19 | 9 | 120 |
| Job Displacement | 11 | 71 | 16 | 1 | 99 |
| Developer Productivity | 71 | 14 | 9 | 3 | 98 |
| Hiring & Recruitment | 49 | 7 | 8 | 3 | 67 |
| Social Protection | 26 | 14 | 8 | 2 | 50 |
| Creative Output | 26 | 14 | 6 | 2 | 49 |
| Skill Obsolescence | 5 | 37 | 5 | 1 | 48 |
| Labor Share of Income | 12 | 13 | 12 | — | 37 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
Org Design
Traditional questionnaires yielded slightly higher accuracy in risk assessment.
Result reported from the two experiments comparing traditional questionnaires to adaptive ARQuest versions; no numeric accuracy or sample size provided in the excerpt.
Insurers must blindly trust users' responses, increasing the chances of fraud.
Stated as a motivating problem in the paper; presented as logical/empirical concern rather than supported by a reported study within the paper.
Insurance application processes often rely on lengthy and standardized questionnaires that struggle to capture individual differences.
Descriptive claim in paper introduction arguing limitations of standard questionnaires; no experiment or sample size reported for this assertion.
A stylised inpatient capacity signalling example, analysed with minimal game-theoretic reasoning, suggests that task optimisation alone is unlikely to change system outcomes when incentives are unchanged.
Theoretical analysis using a stylised inpatient capacity signalling example and game-theoretic reasoning presented in the paper (no empirical data/sample reported in the abstract).
Deployment of AI systems carries significant costs, including ongoing monitoring costs, and it is unclear whether optimism about a deus ex machina solution is well placed.
Conceptual/argumentative claim made by the authors in the paper (no empirical study or sample size reported in the abstract).
Improvements in operational resilience (OR) effectively reduce corporate operational risk.
Further analysis reported in the paper linking higher OR to lower operational risk measures for firms in the sample.
AI promotes operational resilience by reducing management agency conflicts.
Mechanism (mediation) tests reported in the paper showing AI associated with reductions in measures of agency/management conflict, which in turn relate to OR improvements.
Practitioners identified specific functional deficiencies in AI: inability to maintain sustained partnerships.
Theme from semi-structured interviews with 10 practitioners; cited as an example of the functional gap.
Practitioners identified specific functional deficiencies in AI: inability to adapt contextually.
Theme from semi-structured interviews with 10 practitioners; cited as an example of the functional gap.
Practitioners identified specific functional deficiencies in AI: inability to negotiate responsibilities.
Theme from semi-structured interviews with 10 practitioners; cited as an example of the functional gap.
Practitioners currently view AI models as intellectual teammates rather than social partners and expect fewer SEI attributes from them than from human teammates.
Qualitative findings from semi-structured interviews with 10 software practitioners reported in the study.
Current AI systems lack SEI capabilities that humans bring to teamwork, creating a potential gap in collaborative dynamics.
Framed as background/context in the paper; asserted rather than empirically tested in this study.
Unbalanced or poorly governed adoption of Big Data and AI contributes to increased systemic risk, cybersecurity vulnerability, regulatory fragmentation and third-party dependence on BigTech platforms.
Argument based on qualitative literature review and synthesis of international empirical studies and comparative sector analysis; no single-sample empirical study in this paper.
Task orchestration is the most under-researched dimension among the five workplace-design components.
Finding from the PRISMA-guided systematic review of 120 papers, which mapped coverage across the five dimensions and identified task orchestration as having the least research attention.
Decision authority allocation emerges as the binding constraint for Society 5.0 transitions.
Result synthesized from the systematic review and theoretical analysis mapping the five workplace-design dimensions; stated as the binding constraint in the paper's findings.
A weak manager directing a weak worker achieves a 42% success rate, performing worse than the weak agent alone, which achieves 44%.
Empirical comparison across the same 200 SWE-bench Lite instances and pipeline configurations, comparing weak-manager+weak-worker pipeline to weak single-agent baseline.
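A quick arithmetic check, assuming both rates are computed over the same 200 instances: $0.42 \times 200 = 84$ solved instances for the weak-manager + weak-worker pipeline versus $0.44 \times 200 = 88$ for the weak single agent, a gap of four instances.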
Under low emotional intelligence, the model predicts higher risks of over-reliance on AI, emotionally detached communication, and weaker delegation quality.
Theoretical predictions derived from the EI-moderated human–AI model presented in the paper.
The common claim that generative AI simply amplifies the Dunning–Kruger effect is too coarse to capture the available evidence.
Paper's synthesis of heterogeneous empirical findings from human–AI interaction, learning research, and model evaluation, used to critique the uniform-amplification interpretation; no single empirical countertest reported.
LLM use degrades metacognitive accuracy and flattens the classic competence–confidence gradient across skill groups (i.e., reduces calibration and narrows differences in self-assessed confidence by skill level).
Synthesis of studies from human–AI interaction and learning research reported in the paper that document worsened calibration and a reduction in the competence–confidence gradient when users rely on LLM outputs; the paper does not report a single combined sample size.
The agent team topology exhibits higher operational fragility due to multi-author code generation.
Reported empirical observation from experiments comparing architectures, attributing increased fragility/errors to multi-author code generation in the agent team setup (stated qualitatively; no quantitative failure rates provided in the abstract).
Prominent studies predict substantial job displacement due to automation.
Paper asserts this as background, referencing the existence of prominent studies in the literature (no specific citations or sample sizes provided in the abstract).
For organizations of n humans with AI agents, the optimal team size decreases with agent capability.
Derived implication from the stylized model's analysis of multi-human organizations interacting with AI agents.
There is no smooth sublinear regime for human effort; it transitions sharply from O(E) to O(1) with no intermediate scaling class.
Mathematical derivation from a stylized model of human-AI collaboration that assumes tasks decompose into atomic decisions, a fraction ν are novel, and specification/verification/error correction scale with task size.
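A minimal sketch of how such a sharp transition can arise under the stated assumptions, in notation of our own rather than the paper's: let a task comprise $E$ atomic decisions, of which a fraction $\nu$ are novel and require human specification or verification at constant cost per decision, with a fixed overhead $c$. Human effort is then

$$H(E) = \nu E + c,$$

so for any $\nu > 0$, $H(E) = O(E)$, while at exactly $\nu = 0$ it collapses to the constant $c$, i.e. $O(1)$; no value of $\nu$ yields an intermediate class such as $O(\log E)$ or $O(\sqrt{E})$.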
There is a growing gap between rapid experimentation with AI tools and limited organizational capability to institutionalize them in everyday workflows.
Argument supported by targeted literature synthesis and review of recent scholarly and institutional sources; no primary empirical sample reported in this paper.
Technological proximity has a noteworthy negative effect on collaboration, underscoring the importance of complementary knowledge in AI innovation.
SAOM estimates from longitudinal patent collaboration data (2013–2024) showing a statistically negative coefficient for technological proximity (implying organizations closer in technology space are less likely to form ties).
Within the set of agentic-mention filings, autonomy evidence remains rare.
Empirical statement derived from analysis of the identified agentic-mention filings (small number of such filings reported across 2024–2025).
Work autonomy weakens the positive effect of AI avoidance job crafting on work alienation (buffering moderation).
Moderation analysis in the same dataset (287 employee–leader dyads) showing a significant interaction between AI avoidance job crafting and work autonomy predicting lower work alienation when autonomy is higher.
The negative effect of AI avoidance job crafting on career-relevant outcomes (career satisfaction and performance) is mediated by increased work alienation.
Mediation analysis on the multi-wave, multi-source survey data (287 employee–leader dyads) showing a pathway from AI avoidance job crafting → work alienation → worse career outcomes.
AI avoidance job crafting negatively predicts career satisfaction and performance.
Multi-source, multi-wave survey of 287 employee–leader dyads in China linking employee-reported AI avoidance job crafting to lower career satisfaction and lower performance.
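To make the statistical pattern behind these three claims concrete, here is a minimal, hypothetical sketch of the moderation and mediation analyses described. The variable names, simulated data, and Baron–Kenny-style mediation check are illustrative assumptions; the paper's actual measures and estimators may differ.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 287  # matches the reported number of employee-leader dyads

# Simulated pattern: avoidance crafting raises alienation, autonomy buffers
# that effect, and alienation in turn lowers career satisfaction.
avoidance = rng.normal(size=n)
autonomy = rng.normal(size=n)
alienation = 0.5 * avoidance - 0.3 * avoidance * autonomy + rng.normal(size=n)
satisfaction = -0.6 * alienation + rng.normal(size=n)
df = pd.DataFrame({"avoidance": avoidance, "autonomy": autonomy,
                   "alienation": alienation, "satisfaction": satisfaction})

# Moderation: a negative interaction coefficient means higher autonomy
# weakens the positive avoidance -> alienation effect (the buffering claim).
mod = smf.ols("alienation ~ avoidance * autonomy", data=df).fit()
print(mod.params[["avoidance", "avoidance:autonomy"]])

# Mediation: path a (avoidance -> alienation) times path b
# (alienation -> satisfaction, controlling for avoidance).
a = smf.ols("alienation ~ avoidance", data=df).fit().params["avoidance"]
b = smf.ols("satisfaction ~ alienation + avoidance",
            data=df).fit().params["alienation"]
print("indirect effect a*b:", a * b)
```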
Analysis of global datasets on energy dependency, economic concentration, debt levels, demographic trends, digital infrastructure, and AI adoption highlights that interconnected systemic risks can amplify economic instability.
Paper reports drawing upon multiple global datasets (energy dependency, economic concentration, debt, demographics, digital infrastructure, AI adoption) to analyze systemic risk interactions; specific datasets, sample sizes, and statistical methods are not detailed in the excerpt.
Events such as supply chain disruptions, oil price surges linked to geopolitical conflicts, and sudden labour market shifts due to reverse migration have exposed the limitations of prediction-based planning frameworks.
Illustrative examples cited in the paper; the claim is supported by referenced global events and the paper's use of global datasets, but no specific empirical case-study sample sizes or quantification are provided in the excerpt.
Traditional economic models that rely heavily on historical data and linear forecasting are increasingly inadequate in capturing the complexity and unpredictability of contemporary economic shocks.
Conceptual claim supported by discussion and examples of recent shocks (supply chain disruptions, oil price surges, labor market shifts); no specific empirical evaluation or quantified model comparison reported in the excerpt.
The global economic system is undergoing a structural transformation characterized by geopolitical tensions, energy price volatility, trade fragmentation, demographic imbalances, and rapid technological disruption driven by artificial intelligence.
Narrative synthesis in the paper drawing on global trends; the paper references global datasets on energy dependency, trade patterns, demographics, and AI adoption (no specific sample size or empirical study detailed in the excerpt).
The competence shadow compounds multiplicatively to produce degradation far exceeding naive additive estimates.
Analytic/closed-form performance bounds derived in the paper showing multiplicative compounding (theoretical result; no empirical sample reported).
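A worked illustration of why multiplicative compounding outpaces additive estimates (hypothetical numbers, not the paper's bounds): if each of $k$ successive AI-assisted analysis steps inflates the probability of missing a hazard by a factor $(1+\varepsilon)$, the compounded inflation is

$$(1+\varepsilon)^k = 1 + k\varepsilon + \binom{k}{2}\varepsilon^2 + \cdots \;>\; 1 + k\varepsilon \quad \text{for } k \ge 2,$$

e.g. $\varepsilon = 0.1$ and $k = 20$ give a factor of about $6.7$ versus the additive estimate of $3$.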
The competence shadow is a systematic narrowing of human reasoning induced by AI-generated safety analysis; it is defined not by what the AI presents but by what it prevents from being considered.
Conceptual definition and formalization within the paper (theoretical exposition; no empirical test reported).
Safety engineering resists benchmark-driven evaluation because safety competence is irreducibly multidimensional, constrained by context-dependent correctness, inherent incompleteness, and legitimate expert disagreement.
Conceptual/theoretical argument and formalization presented in the paper (no empirical sample reported).
Refining the state (as above) raises state-action blind mass from 0.0165 at τ = 50 to 0.1253 at τ = 1000.
Empirical measurement reported on the instantiated model over the BPI 2019 log, showing state-action blind mass values at two threshold (τ) settings.
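For scale: $0.1253 / 0.0165 \approx 7.6$, so the refined state representation increases blind mass roughly 7.6-fold between the two threshold settings.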
Currently, the region remains reactive as a 'recipient' rather than a 'creator' or an effective partner in the AI ecosystem.
Characterization reported by the authors based on their regional research and field study (qualitative findings from leaders across public/private sectors).
This gap hinders the ability of many governments in the region to push their countries toward joining the ranks of those benefiting from the AI revolution—both in developing the public sector and supporting economic growth and social development.
Authors' analysis and interpretation based on the regional research/field study described in the report.
The Arab region’s capacity for Artificial Intelligence (AI) governance remains limited relative to the accelerating pace of global AI developments and associated challenges.
Stated conclusion in the executive report based on a regional field study (authors' analysis of interviews/surveys and research across the region).
These harms increasingly translate into financial loss through litigation, enforcement penalties, brand erosion, and failed deployments.
Paper argues this linkage using conceptual reasoning and illustrative examples/case vignettes; cites regulatory and market incidents but does not provide systematic empirical estimates or a sample size.
AI systems can create material harms: discriminatory outcomes, privacy and security failures, opacity in decision logic, and regulatory noncompliance.
Paper lists these harms as core risks based on prior literature, regulatory developments, and conceptual risk analysis. Presented as well-documented categories rather than as new empirical findings; no sample size reported.
Insufficient organizational resources significantly inhibit AI adoption in procurement (β = -0.19, p < 0.05).
Same questionnaire survey (n=326) and multiple linear regression analysis; reported coefficient β=-0.19 with p<0.05.
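Assuming the reported β is a standardized coefficient (the excerpt does not say), a one-standard-deviation increase in resource insufficiency would correspond to roughly a 0.19 standard-deviation decrease in AI adoption in procurement, holding the other predictors constant.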
Measuring only technical model performance (such as predictive accuracy) is insufficient for assessing the strategic impact of AI in drug discovery.
Argued in the paper as a critique of current evaluation practices; presented as a conceptual point rather than supported by new empirical data in the excerpt.
Pressure remains high to increase the probability of success in order to improve the effectiveness of pharmaceutical R&D.
Asserted in the paper as motivational context for the work; framed as an industry pressure point rather than backed by a specific empirical sample or quantified survey in the excerpt.
Rising costs and failure rates in the pharmaceutical R&D process have not fundamentally improved over the last decade.
Stated as a contextual observation in the paper's opening paragraph; presented as a summary of industry trends (no specific dataset, sample size, or citation included in the excerpt).
Without support, performance stays stable up to three issues but declines as additional issues increase cognitive load.
Empirical study / human-AI negotiation case study in a property rental scenario that varied the number of negotiated issues; the paper reports observed performance across different numbers of issues (no sample size for this specific comparison stated in the abstract).
Reliance on automated content generation introduces risks of cognitive overreliance, algorithmic bias, and strategic misalignment.
The paper articulates these risks as conceptual/qualitative concerns in its discussion; no quantitative estimates or empirical tests of these specific risks are reported in the provided excerpt.
Wide disagreement among AIs created confusion and undermined appropriate reliance on advice.
Reported experimental finding from the paper: manipulating within-panel disagreement across tasks produced wide disagreement conditions that, according to the abstract, led to confusion and reduced appropriate reliance. No quantitative metrics reported in abstract.
High within-panel consensus fostered overreliance on AI advice.
Experimental manipulation of within-panel consensus across the three tasks; the abstract reports that high consensus increased participants' reliance on AI (interpreted as overreliance). Specific measures and sample size not provided in abstract.