Evidence (8570 claims)
Adoption: 8570 claims
Productivity: 7631 claims
Governance: 6869 claims
Human-AI Collaboration: 6491 claims
Org Design: 4175 claims
Innovation: 4114 claims
Labor Markets: 3566 claims
Skills & Training: 2966 claims
Inequality: 2066 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 758 | 199 | 100 | 900 | 2007 |
| Governance & Regulation | 826 | 400 | 191 | 122 | 1563 |
| Organizational Efficiency | 777 | 193 | 124 | 84 | 1189 |
| Technology Adoption Rate | 635 | 233 | 124 | 97 | 1098 |
| Research Productivity | 422 | 128 | 57 | 336 | 954 |
| Output Quality | 476 | 179 | 59 | 47 | 761 |
| Decision Quality | 328 | 177 | 81 | 47 | 640 |
| Firm Productivity | 435 | 57 | 88 | 20 | 606 |
| AI Safety & Ethics | 218 | 277 | 65 | 33 | 599 |
| Market Structure | 180 | 170 | 123 | 24 | 502 |
| Task Allocation | 213 | 64 | 72 | 33 | 387 |
| Skill Acquisition | 170 | 61 | 61 | 17 | 309 |
| Innovation Output | 203 | 27 | 43 | 18 | 292 |
| Employment Level | 105 | 54 | 107 | 13 | 281 |
| Fiscal & Macroeconomic | 131 | 69 | 43 | 26 | 276 |
| Consumer Welfare | 117 | 63 | 42 | 11 | 233 |
| Firm Revenue | 153 | 48 | 26 | 3 | 230 |
| Task Completion Time | 173 | 31 | 8 | 12 | 225 |
| Inequality Measures | 44 | 122 | 49 | 6 | 221 |
| Worker Satisfaction | 89 | 65 | 22 | 12 | 188 |
| Error Rate | 69 | 92 | 10 | 2 | 173 |
| Regulatory Compliance | 77 | 69 | 14 | 5 | 165 |
| Automation Exposure | 56 | 56 | 26 | 13 | 154 |
| Training Effectiveness | 94 | 21 | 13 | 19 | 149 |
| Wages & Compensation | 77 | 36 | 25 | 6 | 144 |
| Team Performance | 86 | 17 | 27 | 10 | 141 |
| Developer Productivity | 95 | 17 | 14 | 6 | 133 |
| Job Displacement | 12 | 80 | 20 | 1 | 113 |
| Hiring & Recruitment | 52 | 7 | 8 | 3 | 70 |
| Creative Output | 31 | 18 | 8 | 3 | 61 |
| Skill Obsolescence | 5 | 46 | 6 | 1 | 58 |
| Social Protection | 27 | 16 | 8 | 2 | 53 |
| Labor Share of Income | 17 | 19 | 17 | — | 53 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
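
The matrix can be read as shares of findings by direction. The sketch below is illustrative only: it copies three rows from the table and computes each direction's share of the four direction columns. In some rows the Total exceeds the sum of those columns (for example, Firm Productivity: 435 + 57 + 88 + 20 = 600 against a Total of 606). The page does not say what accounts for the difference, so the name `unattributed` is an assumption made here, as are `ROWS` and `direction_shares`.

```python
# Rows copied from the Evidence Matrix: (positive, negative, mixed, null, total).
ROWS = {
    "Firm Productivity": (435, 57, 88, 20, 606),
    "Job Displacement": (12, 80, 20, 1, 113),
    "Inequality Measures": (44, 122, 49, 6, 221),
}

DIRECTIONS = ("positive", "negative", "mixed", "null")


def direction_shares(positive, negative, mixed, null, total):
    """Return (shares, unattributed) for one matrix row.

    shares: fraction of each direction among the four direction columns.
    unattributed: hypothetical label for total minus the four columns;
    the source page does not explain this residual.
    """
    counts = (positive, negative, mixed, null)
    directed = sum(counts)
    shares = {name: count / directed for name, count in zip(DIRECTIONS, counts)}
    return shares, total - directed


for outcome, row in ROWS.items():
    shares, unattributed = direction_shares(*row)
    print(
        f"{outcome}: {shares['positive']:.1%} positive, "
        f"{shares['negative']:.1%} negative, {unattributed} unattributed"
    )
```

Dividing by the sum of the direction columns rather than by Total keeps the four shares summing to one even when the row has a residual.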
Adoption
Agent-written code introduces more security vulnerabilities than code authored by humans.
Comparative analysis of security vulnerabilities attributed to agent-authored code versus human-authored code within the SWE-chat dataset (method details not specified in excerpt).
Just 44% of all agent-produced code survives into user commits.
Empirical measurement of code provenance and survival within the SWE-chat dataset: proportion of agent-produced code that becomes part of subsequent user commits across sessions.
Despite rapidly improving capabilities, coding agents remain inefficient in natural settings.
Authors' summary claim supported by dataset-derived metrics such as agent code survival rate (44%) and user pushback (44% of turns); observational analysis of SWE-chat.
Regulated deployment imposes four load-bearing systems properties — deterministic replay, auditable rationale, multi-tenant isolation, statelessness for horizontal scale — and stateful architectures violate them by construction.
Conceptual/architectural argument presented in the paper (theoretical analysis), not an empirical measurement in the abstract.
Evaluation of four leading AI platforms shows that standard RAG-based approaches achieve an average of only 15% accuracy when information is insufficient.
Empirical evaluation described in paper: four AI platforms tested on benchmark; reported average accuracy of 15% for RAG-based approaches on cases with insufficient information.
Unemployment insurance adjudication has seen rapid integration of AI systems and the question of additional fact-finding poses the most significant bottleneck for a system that affects millions of applicants annually.
Contextual/introductory claim in paper; references to domain-scale impact and bottleneck; no specific numeric study sample provided in excerpt.
A well-known limitation of AI systems is presumptuousness: the tendency of AI systems to provide confident answers when information may be lacking.
Statement in paper framing the problem; general literature/contextual claim (no specific experiment cited in the excerpt).
Critical gaps persist in explainability, regulatory alignment, ethical governance, and context-specific validation.
Authors' synthesis and Conclusion listing persistent shortcomings identified across the reviewed literature.
Integration of decision intelligence principles into AI applications for financial risk management in emerging markets is nascent.
Authors' synthesis noting limited presence of decision intelligence frameworks or hybrid human-AI decision processes across the reviewed literature.
There is limited empirical validation of AI approaches in emerging market settings.
Review finding described in Results and Conclusion: comparatively few studies provide robust, context-specific empirical validation for emerging markets despite general claims of effectiveness.
Recent policy and academic discourse has increasingly acknowledged the infeasibility of full-stack AI sovereignty, but has not yet provided an integrating theoretical architecture for governing dependence under these conditions.
Literature/policy-discourse claim made in the paper (review/interpretation). No empirical sampling or quantitative evidence reported in the provided text.
The concentration of AI-related infrastructures is coalescing into distinct geocognitive power poles whose competing infrastructural ecosystems generate structural asymmetries that position small and medium-sized states within regimes of cognitive-informational dependence.
Theoretical/geopolitical argument introduced in the paper (conceptual framing). No empirical sample size or quantitative measurement provided in the excerpt.
There is a growing concentration of computational capacity, data ecosystems, and advanced model architectures within a limited number of technological actors, signaling the emergence of a cognitive-informational order in which influence is exercised through the architectures that shape how knowledge is generated, interpreted, and operationalized.
Theoretical/observational assertion in the paper (conceptual synthesis). No empirical details, sample sizes, or quantitative analyses provided in the supplied text.
The observed negative OPM effect is consistent with short-term 'J-curve' transition costs (process redesign and capability buildup) during early AI adoption.
Interpretation of empirical patterns (short-term decline in OPM concurrent with no ROA change) offered by the authors as an explanatory mechanism; not presented as separately estimated or experimentally tested.
AI adoption had a significantly negative impact on the operating profit margin (OPM).
Causal analysis of KOSDAQ-listed companies (2018–2025) with AI-adoption timing identified via multi-step, contextually validated text analysis of DART business reports; endogeneity addressed using two-way fixed effects (TWFE) and Propensity Score Matching (PSM).
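
For readers unfamiliar with the two-way fixed effects (TWFE) estimator named in the entry above, a minimal sketch follows. It is not the authors' implementation: the panel is synthetic, the effect size of -0.5 is made up, the function names are hypothetical, and the propensity score matching step is not shown. It assumes a balanced panel, where removing firm means and period means (and adding back the grand mean) is equivalent to including firm and period dummies.

```python
def demean_two_way(panel):
    """Subtract row (firm) and column (period) means, add back the grand mean.

    panel: list of rows, one per firm, each with one value per period
    (balanced panel assumed).
    """
    n, t = len(panel), len(panel[0])
    row_means = [sum(row) / t for row in panel]
    col_means = [sum(panel[i][j] for i in range(n)) / n for j in range(t)]
    grand = sum(row_means) / n
    return [
        [panel[i][j] - row_means[i] - col_means[j] + grand for j in range(t)]
        for i in range(n)
    ]


def twfe_slope(y, x):
    """OLS slope of y on x after two-way demeaning (single regressor)."""
    y_dm, x_dm = demean_two_way(y), demean_two_way(x)
    num = sum(a * b for ry, rx in zip(y_dm, x_dm) for a, b in zip(ry, rx))
    den = sum(b * b for rx in x_dm for b in rx)
    return num / den


# Synthetic, noise-free panel: 6 firms, 8 periods, staggered adoption.
# A start period of 99 means the firm never adopts within the window.
n_periods = 8
adoption_start = [2, 4, 6, 3, 99, 99]
true_effect = -0.5  # made-up effect of adoption on the margin

adopted = [[1.0 if t >= s else 0.0 for t in range(n_periods)] for s in adoption_start]
margin = [
    [0.1 * i + 0.05 * t + true_effect * adopted[i][t] for t in range(n_periods)]
    for i in range(len(adoption_start))
]

estimate = twfe_slope(margin, adopted)
print(round(estimate, 6))
```

Because the synthetic outcome is exactly firm effect plus period effect plus a constant adoption effect, the estimate recovers the made-up coefficient. With real data, and in particular with effects that vary across adoption cohorts, a TWFE estimate need not have this clean interpretation.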
For agentic systems, there are three structural breaks: decision diffusion, evidence fragmentation, and responsibility ambiguity.
Analytical identification and labeling of three specific structural problems for agentic AI within the paper's argumentation.
The paper introduces the 'cascade of uncertainty', showing how governance failures propagate through serial dependencies between framework layers.
Conceptual/theoretical model introduced and analyzed in the paper (cascade model linking framework layers and failure propagation).
Agentic AI systems encounter structural breaks that prevent normal framework fillability.
Paper's analytic assessment reports that agentic AI systems cause structural breaks undermining the framework's ability to fill DES-properties.
Classical ML systems achieve only minimal DES-property fillability.
Analytic comparison in the paper classifies classical ML systems as providing minimal governance evidence fillability.
When automated decision systems fail, organizations frequently discover that formally compliant governance infrastructure cannot reconstruct what happened or why.
Asserted by the paper as an observed problem motivating the study; presented as a general empirical/experiential claim (literature/examples synthesis) rather than a controlled empirical estimate.
Training data scarcity is an emerging challenge for organizations that aim to train proprietary LLMs.
Paper highlights training data scarcity as a challenge in its analysis and discussion sections (qualitative observation).
A gender gap persists, concentrated in the most exposed occupations.
Stratified/descriptive and regression analyses of the 2024 EWCS showing gender differences in self-reported generative AI adoption, with the gap largest among occupations with highest exposure; sample >36,600 workers across 35 countries.
AI is driving states to reconsider interdependence not as the source of peace, but as a battlefield of power.
Normative and interpretive conclusion drawn from the paper's analysis of AI's geopolitical implications; no empirical data or sample reported in the abstract.
AI is redefining foreign policy in a multipolar world by making the line between economic cooperation and strategic vulnerability indistinct.
Theoretical claim and synthesis in the paper's thesis; no empirical evidence or sample size provided in the abstract.
AI is reshaping economic relationships between countries that were previously sources of mutually beneficial relations into instruments of coercion.
The paper presents a theoretical analysis drawing on international political economy and foreign policy theory; no empirical measurements reported in the abstract.
AI enhances the weaponization of economic interdependence by enabling states to monitor, predict, manipulate, and disrupt transnational networks with unprecedented accuracy.
The paper advances a theoretical argument and synthesis of international political economy and foreign policy literatures; no empirical sample or quantitative data reported in the abstract.
The infrastructure for cross-user agent collaboration is entirely absent, let alone the governance mechanisms needed to secure it.
Authoritative claim in paper framing the research gap; presented as observational/argumentative (no empirical audit reported).
Current AI agent frameworks have made remarkable progress in automating individual tasks, yet all existing systems serve a single user.
Statement in paper's introduction/positioning; conceptual survey-style claim (no empirical study or systematic benchmark reported).
Standard benchmarks often fail to isolate an agent's core ability to parse queries and orchestrate computations.
Paper asserts that existing/standard benchmarks do not adequately isolate parsing and computation-orchestration abilities, motivating the new benchmark.
As multimodal AI achieves human-parity understanding of speech and gesture, [the keyboard's] necessity dissolves.
Theoretical claim supported by multidisciplinary review (history, neuroscience, technology, organizational studies); no quantified empirical test reported.
General-purpose LLMs pose misinformation risks for development and policy experts, lacking epistemic humility for verifiable outputs.
Conceptual/argumentative claim stated in the paper's motivation; no empirical test reported in the abstract.
Traditional machine learning approaches, including the baseline methodology proposed in previous studies, typically optimize global predictive accuracy and therefore fail to capture business-critical outcomes, especially the identification of high-risk clients.
Conceptual critique and literature/contextual claim in the paper; contrasted with the study's business-aware methods (no direct external benchmarking numbers provided in the abstract).
Classifying customers without a prior history at a given company is particularly challenging due to the absence of historical behavior, extreme class imbalance, heavy-tailed loss distributions, and strict operational constraints.
Argumentation / problem statement in the paper (no empirical test reported); descriptive characterization of the insurance cold-start classification problem.
Thin training coverage fosters anxiety about substitution and slows diffusion of AI tools.
Reported associations from surveys of mid-level managers and technical staff, interviews, and document analysis across cases; thematic coding identified links between limited training, worker anxiety, and slower diffusion. (Sample size not reported.)
Inequalities exist both in the emergence of algorithmic bias and in the transparency of these systems.
Paper states that inequalities and lack of transparency were observed/identified (citing Memarian, 2023; Bello, 2023; Gambacorta et al., 2024) and discusses these as findings.
Algorithmic bias in automated credit scoring systems may block marginalized groups from accessing financial services.
Explicit statement in the introduction citing prior literature (Agboola, 2025; Nwafor et al., 2024; Oguntibeju, 2024) and motivating the study.
In the geographical network, both technological diversity and technological proximity inhibit main path formation, implying macro-regional evolution requires specialized focus and complementary knowledge.
ERGM results for the geographical diffusion layer showing negative (inhibitory) associations for diversity and proximity variables; interpreted in regional evolution context.
The study identified significant implementation challenges including algorithmic bias, digital divide concerns, data privacy risks, and low technology readiness among HR teams in Tier 2 cities.
Synthesis of qualitative case study findings from 4 organizations plus survey responses (N=150) reporting barriers and risks encountered during adoption.
Current LLMs are unreliable delegates: they introduce sparse but severe errors that silently corrupt documents, compounding over long interaction.
Qualitative and quantitative analysis of errors observed across the DELEGATE-52 experiments (19 LLMs) showing sparse, high-severity, and silently introduced errors that accumulate over long workflows.
Degradation severity is exacerbated by document size, length of interaction, or presence of distractor files.
Additional experiments and analyses varying document size, interaction length, and presence of distractor files reported in the paper showing increased degradation under these conditions.
Agentic tool use does not improve performance on DELEGATE-52.
Additional experiments reported in the paper that compare plain LLM delegation vs. agentic tool-using configurations on DELEGATE-52 and find no performance improvement from agentic tool use.
Even frontier models (Gemini 3.1 Pro, Claude 4.6 Opus, GPT 5.4) corrupt an average of 25% of document content by the end of long workflows.
Reported results from the experiment evaluating 19 LLMs on DELEGATE-52; these named models are highlighted and an average corruption fraction (25%) is reported at the end of long workflows.
Our large-scale experiment with 19 LLMs reveals that current models degrade documents during delegation.
Large-scale experiment reported in the paper evaluating 19 LLMs on DELEGATE-52 long delegated workflows; observed document degradation across models.
Inherent algorithmic opacity and historical data biases tend to give rise to obvious group prejudices based on gender, educational background, age, and regional origin, thereby further exacerbating the structural inequalities that exist in the current employment market.
Claim made in abstract referencing known sources of algorithmic bias (opacity, historical data bias) and listing affected group attributes; presented as a problem motivating the study, without specific empirical statistics in the abstract.
Small and medium-sized practices face challenges of skill gaps and resource constraints that hinder adoption of technology and data analytics.
Consistent findings across included studies highlighting barriers in small and medium-sized practices (SMPs).
AI adoption is reinforcing existing structural disparities within the BRICS bloc, creating a two‑tier productivity hierarchy (China & India vs. Brazil, Russia & South Africa).
Observed divergence in TFP trajectories and differing links between AI indicators and TC/EC across the five BRICS economies; comparative analysis shows stronger frontier-shifting effects in China and India and weaker or negative effects in the other three economies.
Brazil, Russia, and South Africa experience stagnation or decline in both efficiency and technological advancement over 2005–2023.
Malmquist TFP decomposition (EC and TC) for each BRICS economy showing flat or negative trends in EC and TC for Brazil, Russia, and South Africa during 2005–2023.
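
The EC/TC split referred to above can be illustrated with the standard Malmquist decomposition, in which the TFP index is the product of efficiency change (EC) and technical change (TC). The sketch below assumes output-oriented distance function values are already available (in practice they are typically estimated, for example by DEA). The input numbers are hypothetical, and the function and parameter names are chosen here for illustration rather than taken from the paper.

```python
from math import sqrt


def malmquist(d_t_t, d_t1_t1, d_t_t1, d_t1_t):
    """Decompose the Malmquist TFP index into EC and TC.

    d_t_t:   D^t(x^t, y^t)          period-t data against period-t frontier
    d_t1_t1: D^{t+1}(x^{t+1}, y^{t+1})
    d_t_t1:  D^t(x^{t+1}, y^{t+1})  period-(t+1) data against period-t frontier
    d_t1_t:  D^{t+1}(x^t, y^t)      period-t data against period-(t+1) frontier
    Returns (ec, tc, tfp) with tfp = ec * tc.
    """
    ec = d_t1_t1 / d_t_t  # catching up to (or falling behind) the frontier
    tc = sqrt((d_t_t1 / d_t1_t1) * (d_t_t / d_t1_t))  # shift of the frontier itself
    return ec, tc, ec * tc


# Hypothetical values: unchanged efficiency, frontier shifting outward.
ec, tc, tfp = malmquist(d_t_t=0.8, d_t1_t1=0.8, d_t_t1=1.0, d_t1_t=0.64)
print(f"EC={ec:.3f} TC={tc:.3f} TFP={tfp:.3f}")
```

Values above 1 indicate improvement and values below 1 indicate decline, so flat or negative trends in both EC and TC correspond to indices at or below 1 in both components.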
While achieving financial autonomy, firms are also exposed to new constraints as their reliance shifts to third-party software, technological infrastructures, and opaque algorithms (Gaviyau & Godi, 2025; Suhrab et al., 2026).
Stated with citations to Gaviyau & Godi (2025) and Suhrab et al. (2026); presented as an observed/paraphrased risk or unintended consequence in the paper. No empirical sample details in the excerpt.
SMEs face various financial constraints and rely heavily on traditional financial institutions for their survival (Kadzima et al., 2025).
Statement supported by citation to Kadzima et al. (2025); presented as a literature-supported empirical generalization in the paper's background/introduction. No sample size or empirical details given in the excerpt.
Fluency is not reliability: without structures that stabilise both human and model reasoning, AI cannot be trusted or governed where it matters most.
Central thesis/claim of the paper; normative argument synthesising the paper's observations and proposals rather than an empirically tested finding provided here.