Evidence (6507 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome	Positive	Negative	Mixed	Null	Total
Other	609	159	77	736	1615
Governance & Regulation	664	329	160	99	1273
Organizational Efficiency	624	143	105	70	949
Technology Adoption Rate	502	176	98	78	861
Research Productivity	348	109	48	322	836
Output Quality	391	120	44	40	595
Firm Productivity	385	46	85	17	539
Decision Quality	275	143	62	34	521
AI Safety & Ethics	183	241	59	30	517
Market Structure	152	154	109	20	440
Task Allocation	158	50	56	26	295
Innovation Output	178	23	38	17	257
Skill Acquisition	137	52	50	13	252
Fiscal & Macroeconomic	120	64	38	23	252
Employment Level	93	46	96	12	249
Firm Revenue	130	43	26	3	202
Consumer Welfare	99	51	40	11	201
Inequality Measures	36	105	40	6	187
Task Completion Time	134	18	6	5	163
Worker Satisfaction	79	54	16	11	160
Error Rate	64	78	8	1	151
Regulatory Compliance	69	64	14	3	150
Training Effectiveness	81	15	13	18	129
Wages & Compensation	70	25	22	6	123
Team Performance	74	16	21	9	121
Automation Exposure	41	48	19	9	120
Job Displacement	11	71	16	1	99
Developer Productivity	71	14	9	3	98
Hiring & Recruitment	49	7	8	3	67
Social Protection	26	14	8	2	50
Creative Output	26	14	6	2	49
Skill Obsolescence	5	37	5	1	48
Labor Share of Income	12	13	12	—	37
Worker Turnover	11	12	—	3	26
Industry	—	—	—	1	1

Filter: Productivity

The community knowledge functions both as practical how-to guidance and as collective experimentation with platform rules and revenue mechanisms.

Observed dual nature in the 377-video corpus: instructional workflows alongside demonstrations/testing of platform-tailored monetization tactics and workarounds.

high mixed Monetizing Generative AI: YouTubers' Collective Knowledge on... co-occurrence of instructional content and platform-experimentation practices

Typical practices emphasized by creators include rapid mass production of content, productizing prompt engineering, repurposing existing material via synthesis/localization, and packaging AI outputs as sellable creative services or assets.

Recurring practices surfaced through qualitative coding of workflows, tools, and pipelines described in the 377 videos.

high mixed Monetizing Generative AI: YouTubers' Collective Knowledge on... presence and frequency of recommended production and productization practices

Across the 377 videos, creators converge on a set of repeatable use cases and platform‑tailored monetization tactics.

Thematic coding of 377 videos produced a catalog of recurring use cases and tactics; the paper reports convergence across that sample.

high mixed Monetizing Generative AI: YouTubers' Collective Knowledge on... frequency and recurrence of specific use cases and monetization tactics in the s...

YouTube creators have collectively constructed and circulated a practical knowledge repository about how to monetize GenAI-driven creative work.

Systematic qualitative content analysis (thematic coding) of 377 publicly available YouTube videos in which creators promote GenAI workflows and monetization strategies.

high mixed Monetizing Generative AI: YouTubers' Collective Knowledge on... presence and characteristics of a community knowledge repository (practical guid...

The topology of service-dependency graphs (modelled as DAGs of compute stages) is a first-order determinant of whether decentralised, price-based resource allocation will be stable and scalable.

Systematic ablation study using simulation: 1,620 runs total across six experiment types, sweeping graph topology (hierarchical vs cross-cutting), load, hybrid integrator presence, and governance constraints; metrics included price convergence/volatility and allocation throughput/quality. Effect sizes reported in the paper show topology had the largest impact on price stability and scalability.

high mixed Real-Time AI Service Economy: A Framework for Agentic Comput... price convergence / price volatility and system scalability (throughput and allo...

Choice of scaffold materially affects outcomes: an open-source scaffold outperformed vendor-provided scaffolds by up to approximately 5 percentage points.

Comparative experiments across three scaffolding approaches (vendor scaffolds and at least one open-source scaffold) showing up to ~5 percentage point differences in measured outcomes.

high mixed Re-Evaluating EVMBench: Are AI Agents Ready for Smart Contra... performance_difference_across_scaffolds (detection/exploitation_rates_difference...

Adoption of NFD approaches in regulated domains will depend on standards for validation, auditability, and update procedures.

Implications and governance discussion emphasizing regulatory constraints (finance, healthcare) and the need for validation/audit standards; a logical/normative claim rather than an empirical finding.

high mixed Nurture-First Agent Development: Building Domain-Expert AI A... adoption rate in regulated domains conditional on available validation/audit sta...

Absence of irreducibility, positive recurrence, or aperiodicity in the state dynamics can produce non-ergodic reward behavior.

Theoretical argument and examples in the paper illustrating how breakdowns of these chain conditions lead to multiple invariant measures or absorbing regimes; analysis-based evidence.

high mixed Ergodicity in reinforcement learning presence of non-ergodic long-run reward behavior (e.g., multiple invariant measu...

Standard Markov chain ergodicity conditions (irreducibility, positive recurrence, aperiodicity) imply ergodic reward processes when rewards depend only on the chain state.

Formal mapping in the paper between Markov-chain ergodicity properties and reward-process ergodicity; theoretical derivation (no empirical sample).

high mixed Ergodicity in reinforcement learning ergodicity of reward process (equivalence to chain ergodicity when rewards are s...
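
The claim above can be illustrated with a minimal sketch, assuming a hypothetical two-state chain that is irreducible, positive recurrent, and aperiodic, with a reward that depends only on the current state. The transition matrix, rewards, and function names are illustrative assumptions and are not taken from the paper. The time-average reward from either start state should land close to the stationary (ensemble) expectation.

```python
import random

# Hypothetical two-state chain: every entry of P is positive, so the chain is
# irreducible and aperiodic (and positive recurrent, since it is finite).
P = [[0.7, 0.3],
     [0.2, 0.8]]
REWARDS = [0.0, 1.0]  # reward depends only on the current state

def time_average_reward(p, rewards, start, steps, seed):
    """Average reward along one simulated trajectory of the chain."""
    rng = random.Random(seed)
    state = start
    total = 0.0
    for _ in range(steps):
        total += rewards[state]
        # Move to state 1 with probability p[state][1], otherwise to state 0.
        state = 1 if rng.random() < p[state][1] else 0
    return total / steps

# Closed-form stationary distribution of a two-state chain.
a, b = P[0][1], P[1][0]
STATIONARY = [b / (a + b), a / (a + b)]
ENSEMBLE_AVG = sum(pi * r for pi, r in zip(STATIONARY, REWARDS))

avg_from_0 = time_average_reward(P, REWARDS, start=0, steps=200_000, seed=1)
avg_from_1 = time_average_reward(P, REWARDS, start=1, steps=200_000, seed=2)
print(ENSEMBLE_AVG, avg_from_0, avg_from_1)
```

Both trajectories approximate the same value regardless of where they start, which is the behavior the ergodicity conditions are meant to guarantee.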

Non-ergodic processes admit path-dependent long-run behavior (e.g., absorbing sets, multiple invariant measures, path-dependent reinforcement), so different runs with the same policy can have different long-run averages.

Analytic discussion of Markov-chain examples and theory plus the paper's illustrative constructed example showing path-dependent locking into regimes; theoretical and example-driven evidence.

high mixed Ergodicity in reinforcement learning variance across realized long-run average rewards across trajectories under the ...
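
A minimal sketch of the path dependence described above, assuming a hypothetical chain with one transient start state and two absorbing states. This is not the paper's constructed example; the state names, rewards, and the 0.5 branching probability are illustrative assumptions. Every run uses the same dynamics, yet each run's long-run average locks onto one of two values, and neither equals the average across runs.

```python
import random

# Hypothetical rewards: the start state is transient, "good" and "bad" absorb.
REWARDS = {"start": 0.0, "good": 1.0, "bad": 0.0}

def long_run_average(steps, seed):
    """Average reward along one trajectory that starts in the transient state."""
    rng = random.Random(seed)
    state = "start"
    total = 0.0
    for _ in range(steps):
        total += REWARDS[state]
        if state == "start":
            # A single random step decides which absorbing regime is entered.
            state = "good" if rng.random() < 0.5 else "bad"
        # "good" and "bad" are absorbing, so the state never changes again.
    return total / steps

averages = [long_run_average(1000, seed) for seed in range(20)]

# Expected long-run reward across many independent runs (ensemble view).
ENSEMBLE_AVG = 0.5 * REWARDS["good"] + 0.5 * REWARDS["bad"]
print(sorted(set(averages)), ENSEMBLE_AVG)
```

Each individual average is either 0.0 or 0.999, while the ensemble value is 0.5, so no single trajectory reproduces the ensemble average.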

Ergodic reward processes are those where time averages along almost every long trajectory converge to the same value as the ensemble average.

Formal definition and discussion in the paper mapping ergodicity concepts from stochastic processes to reward processes; theoretical exposition.

high mixed Ergodicity in reinforcement learning convergence of time-average reward to ensemble average
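
One common way to write this definition is sketched below. The notation is an assumption rather than the paper's own: $s_t$ is the state at time $t$, $r$ is a state-dependent reward, and $\mu$ is the distribution over which the ensemble average is taken (for example, a stationary distribution).

```latex
\lim_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} r(s_t)
  \;=\; \mathbb{E}_{s \sim \mu}\bigl[ r(s) \bigr]
  \quad \text{almost surely}
```

The left-hand side is the time average along a single trajectory; the right-hand side is the ensemble average.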

Some patients value human contact for sensitive cases; automated interactions can feel impersonal.

Semi-structured interviews with patients/staff and open-ended survey responses documenting preferences for human interaction in sensitive/complex complaints.

high mixed The Role of Artificial Intelligence in Healthcare Complaint ... patient-reported preference for human contact and perceived interpersonal qualit...

India’s reported post-harvest loss is relatively low (3.2%) despite poor food-security outcomes (Global Hunger Index rank 111/125).

Reported statistics cited in the paper (FAO/Kaggle for post-harvest loss; Global Hunger Index ranking referenced).

high mixed AI in food inequality: Leveraging artificial intelligence to... post-harvest loss (percent) and Global Hunger Index rank

Data‑driven policies can either amplify or mitigate inequalities depending on data representativeness, model design, and deployment governance.

Multiple empirical examples and theoretical analyses in the review highlighting cases of both harm (bias amplification) and mitigation, identified across the 103 items.

high mixed Models, applications, and limitations of the responsible ado... distributional equity outcomes (inequality amplification or mitigation)

Citizen acceptance, transparency, and perceived fairness strongly shape adoption trajectories and the political feasibility of AI tools in government.

Repeated empirical findings in the reviewed literature linking public trust, transparency measures, and fairness perceptions to successful or failed deployments (drawn from multiple case studies in the 103 items).

high mixed Models, applications, and limitations of the responsible ado... adoption trajectory/political feasibility of government AI tools (measured via d...

Adoption of AI and data-driven governance is highly uneven across jurisdictions and sectors, driven by institutional capacity, governance frameworks, and public trust.

Cross‑regional and cross‑sector comparisons in the review corpus (103 items) showing varying maturity levels and repeated identification of institutional capacity, governance arrangements, and trust factors as determinants.

high mixed Models, applications, and limitations of the responsible ado... adoption level/maturity of AI-driven governance systems

Governance approaches are emerging at global, regional and national levels; they vary widely across sectors and jurisdictions, creating opportunities for regulatory experimentation but also risks of fragmentation and regulatory arbitrage.

Cross-jurisdictional comparison of existing/global/regional/national governance instruments and sectoral guidance; gap analysis highlighting heterogeneity.

high mixed AI Governance and Data Privacy: Comparative Analysis of U.S.... degree of regulatory heterogeneity, instances of fragmentation/regulatory arbitr...

Weak formal institutions often coexist with strong informal institutions in African contexts, shaping governance, trust, and enforcement mechanisms in supply chains.

Cross-disciplinary literature review presented in the paper; conceptual argumentation rather than primary empirical analysis.

high mixed Continental shift: operations and supply chain management re... relative strength of formal vs informal institutions and their effects on govern...

Technology effectiveness depends on institutional support (extension, property rights), finance, and local knowledge — technologies are not a silver bullet alone.

Conceptual frameworks and comparative analysis in the review; supporting case studies and program evaluations linking adoption and impact to institutional factors (extension reach, tenure security, access to credit).

high mixed MODERN APPROACHES TO SUSTAINABLE AGRICULTURAL TRANSFORMATION technology adoption rates, realized productivity gains, distribution of benefits...

Productivity gains from generative AI depend on task mix, integration design, and the availability of complementary human skills.

Theoretical evaluation and synthesis of heterogeneous empirical findings; authors highlight variation across firms, sectors, and tasks.

high mixed The Use of ChatGPT in Business Productivity and Workflow Opt... productivity change conditional on task mix/integration/human skills (productivi...

Existing evidence is time-sensitive and heterogeneous: rapidly evolving models, heterogeneous study designs, and many short-term lab/microtask studies limit direct comparability and long-run inference.

Meta-observation from the review: documented methodological limitations across the literature (variation in models, tasks, metrics; prevalence of short-term studies).

high mixed ChatGPT as a Tool for Programming Assistance and Code Develo... generalizability and comparability of empirical findings (study heterogeneity)

Real‑time and LLM‑based methods improve responsiveness but raise governance, transparency, and reproducibility challenges that BLS must manage (audit trails, uncertainty communication).

Operational tradeoff discussion in the paper identifying governance risks; no case studies or incident analyses provided.

high mixed Enhancing BLS Methodologies for Projecting AI's Impact on Em... tradeoff between responsiveness (timeliness/accuracy) and governance metrics (tr...

Distinguishing automation versus augmentation using causal methods changes policy responses (e.g., income support versus reskilling).

Policy implication drawn from conceptual separation of substitution and complementarity effects; logical inference rather than empirical demonstration in the paper.

high mixed Enhancing BLS Methodologies for Projecting AI's Impact on Em... policy prescriptions chosen contingent on causal classification (automation vs a...

Methodological caveats across the literature (heterogeneity of tasks/measures, publication bias, short-term studies) limit the generalizability of current findings.

Meta-level critique within the synthesis noting study heterogeneity, likely publication/short-term biases, and variable domain-specific performance dependent on user expertise and workflows.

high mixed ChatGPT as an Innovative Tool for Idea Generation and Proble... generalizability and external validity of LLM-assisted creativity findings

Standard productivity metrics are likely to undercount the value generated by AI-augmented ideation; quality-adjusted measures of creative output are required.

Measurement critique based on the mismatch between existing productivity statistics and the kinds of upstream idea-generation gains observed in empirical studies; supported by the review's methodological discussion.

high mixed ChatGPT as an Innovative Tool for Idea Generation and Proble... measured productivity vs. true quality-adjusted creative output

Realized value from AI methods (ML, predictive analytics, anomaly detection, XAI) is conditional: these technical methods deliver capabilities only when combined with strong data governance, standardized processes, and change management.

Thematic synthesis across the systematic review (2020–2025) showing repeated case-study and practitioner-report evidence that technical gains failed to scale without governance, process standardization, and organizational change efforts.

high mixed Integrating Artificial Intelligence and Enterprise Resource ... magnitude and durability of ERP-AI benefits (e.g., sustained accuracy gains, ado...

Despite laboratory and pilot successes, many engineered bioprocesses remain at bench or pilot scale and require techno‑economic validation before industrial competitiveness can be established.

Review aggregate noting scale and validation status of case studies (many reported at lab or pilot fermenter scale) and explicit references to the need for TEA and LCA for industrial assessment.

high mixed Harnessing Microbial Factories: Biotechnology at the Edge of... technology readiness level (lab/pilot vs commercial), presence/absence of publis...

Results and implications are limited by the sample and context: evidence comes from law students on a single issue-spotting exam using one brief training intervention, so generalizability to experienced professionals, other tasks, or other models is untested.

Authors’ reported sample (164 law students) and explicit caution about generalizability in the study summary; the intervention and outcome are specific to one exam and one ~10-minute training.

high mixed Training for Technology: Adoption and Productive Use of Gene... Generalizability/applicability to other populations and tasks

Some mechanism-specific estimates are imprecise due to the sample size; confidence intervals for those estimates are wide.

Authors report wide confidence intervals for mechanism decomposition (principal stratification) results based on the randomized sample of 164 students.

high mixed Training for Technology: Adoption and Productive Use of Gene... Precision of mechanism estimates (confidence interval width for adoption vs prod...

There is no consensus in the literature on net job effects — studies diverge on whether AI produces net job gains.

Direct finding from the review: the 17 peer‑reviewed studies produce heterogeneous results on net employment impacts (some positive, some negative, some neutral).

high mixed The role of generative artificial intelligence on labor mark... net job gains/losses

The effects of K_T adoption are heterogeneous across industries, firms, countries, and cohorts — early adopters and capital-rich firms/countries gain most — implying important transition dynamics for political economy.

Cross-country comparisons, industry- and firm-level panel heterogeneity analyses, and case studies demonstrating variation in adoption timing and gains; model simulations emphasizing transition path dependence.

high mixed The Macroeconomic Transition of Technological Capital in the... industry-/firm-/country-level productivity, income, employment, and adoption tim...

Aggregate productivity (output per worker or per unit of inputs) can rise while labor’s share and employment decline due to substitution toward K_T.

Macro growth-accounting exercises decomposing output growth into contributions from labor, traditional capital, and technological capital; model simulations showing productivity gains coexisting with falling labor shares under substitution elasticities.

high mixed The Macroeconomic Transition of Technological Capital in the... productivity (e.g., TFP or output per worker) and labor share

The document examines recent data on AI diffusion in the G7 economies, which point to large and persistent differences between SMEs and large firms.

Empirical examination of recent diffusion/adoption data across G7 economies as described in the paper; no sample size or specific datasets provided in the excerpt.

high negative Einführung von KI in kleinen und mittleren Unternehmen differences in AI diffusion between SMEs and large firms

Despite recent technological advances in AI tools, SMEs are more hesitant to adopt AI than they are to adopt other digital technologies, and more hesitant than larger firms.

Statement referencing 'recent data on AI diffusion in the G7 economies' showing differences between SMEs and large firms; implies empirical analysis of diffusion/adoption data (no sample size given in excerpt).

high negative Einführung von KI in kleinen und mittleren Unternehmen adoption/diffusion of AI technologies in SMEs versus large firms

In algorithm-triggered emotional escalations, workers showed lower engagement: they sent fewer messages, contributed a smaller share of total chat rounds, and showed less proactivity in information seeking and solution provision.

Behavioral measures derived from chat logs in the randomized experiment comparing worker actions post-escalation across escalation types; reported differences in message counts, share of rounds, and proxies for proactivity.

high negative Agentic AI and Human-in-the-Loop Interventions: Field Experi... worker engagement measures (message count, share of chat rounds, proactivity ind...

Human intervention is less effective in algorithm-triggered emotional escalations (where customers express frustration or dissatisfaction).

Experimental subgroup analysis comparing intervention outcomes for algorithm-triggered emotional escalations versus technical escalations; emotional escalations showed worse post-intervention outcomes.

high negative Agentic AI and Human-in-the-Loop Interventions: Field Experi... service quality after emotional escalations

AI deployment substantially lowers ratings for AI-eligible chats.

Randomized field experiment measuring customer ratings for AI-eligible chats; treated condition (AI + human oversight) produced substantially lower ratings relative to control (humans only).

high negative Agentic AI and Human-in-the-Loop Interventions: Field Experi... customer ratings for AI-eligible chats

AI deployment reduces average chat duration.

Randomized field experiment on Alibaba's Taobao platform: workers in treatment supervised an agentic AI resolving AI-eligible chats while handling AI-ineligible chats; control workers resolved all chats without AI. Effect observed on average chat duration in experiment data.

high negative Agentic AI and Human-in-the-Loop Interventions: Field Experi... average chat duration

Parsing through LLM-generated code can be tedious and time-consuming, potentially negating the productivity gains promised by AI-coding tools.

Motivation/background statement in the paper: a qualitative claim about the cost (time/effort) of reviewing LLM-generated code; presented as motivation rather than empirically quantified evidence in the excerpt.

high negative Viverra: Text-to-Code with Guarantees time/effort required to review LLM-generated code

Overthinking is a shared and exploitable vulnerability in modern reasoning systems, underscoring the need for more robust defenses.

Conclusion drawn by authors based on their empirical findings described in the abstract (amplification of output length across multiple models and transferability experiments).

high negative Inducing Overthink: Hierarchical Genetic Algorithm-based DoS... presence of shared vulnerability across models (qualitative security posture)

This overthinking behavior significantly increases inference latency and energy consumption, forming a potential vector for denial-of-service (DoS)-style resource exhaustion.

Authors assert increased latency and energy consumption as consequences of longer reasoning traces; framed as a potential attack vector in the abstract (no quantitative latency/energy measurements provided in abstract).

high negative Inducing Overthink: Hierarchical Genetic Algorithm-based DoS... inference latency and energy consumption

Large reasoning models (LRMs) exhibit a tendency to "overthink", producing excessively long and redundant reasoning traces when confronted with incomplete or logically inconsistent inputs.

Empirical observation reported by the authors based on experiments described in the paper (abstract references experiments across multiple SOTA reasoning models); no numerical sample size for inputs reported in abstract.

high negative Inducing Overthink: Hierarchical Genetic Algorithm-based DoS... response length / reasoning trace length (verbosity and redundancy)

Distinct readability issue patterns and limited effectiveness of prompt engineering reveal a latent technical debt in LLM-generated code that could affect long-term maintainability.

Interpretation/conclusion in paper combining empirical findings (distinct issue patterns and limited prompt impact) to argue for potential technical debt and maintainability risks; presented as a forward-looking implication rather than a quantified causal estimate.

high negative The Readability Spectrum: Patterns, Issues, and Prompt Effec... maintainability_risk / technical_debt_inferred_from_readability

LLM-generated code displays distinct readability issue patterns compared to human-written code.

Empirical analysis of readability subcomponents/features showing different patterns of readability issues between LLM-generated and human-written code (paper reports qualitative/quantitative distinctions in issue patterns).

high negative The Readability Spectrum: Patterns, Issues, and Prompt Effec... readability_issue_patterns (feature-level readability problems)

Policy responses in Europe are fragmented across the EU and Member State levels and do not match the potential scale of disruption from AGI.

Paper's policy analysis of EU- and Member-State-level responses (stated in abstract); no quantitative metrics provided in the abstract.

high negative Europe and the Geopolitics of AGI: The Need for a Preparedne... governance_and_regulation

Europe has low rates of industrial AI adoption.

Paper's empirical/policy review claiming low industrial AI adoption in Europe (as stated in abstract); the abstract does not provide numeric adoption rates or sample sizes.

high negative Europe and the Geopolitics of AGI: The Need for a Preparedne... adoption_rate

Europe exhibits structural weaknesses in compute infrastructure and talent retention.

Paper's structural assessment of Europe's AI value-chain capabilities (stated in abstract); no numerical measures provided in the abstract.

high negative Europe and the Geopolitics of AGI: The Need for a Preparedne... adoption_rate

Europe has limited strategic awareness of frontier AI progress.

Paper's assessment of Europe's positioning based on policy analysis and review of capabilities monitoring (as stated in abstract); no supporting metrics or sample sizes provided in the abstract.

high negative Europe and the Geopolitics of AGI: The Need for a Preparedne... governance_and_regulation

AGI could strain existing governance frameworks.

Paper's policy analysis describing potential mismatches between governance capacity and AGI-induced disruptions (as stated in abstract); no empirical tests or quantification reported in the abstract.

high negative Europe and the Geopolitics of AGI: The Need for a Preparedne... governance_and_regulation

AGI could intensify interstate competition.

Paper's geopolitical analysis and scenario-based reasoning informed by trends in AI capabilities (stated in abstract); no quantitative measures reported in the abstract.

high negative Europe and the Geopolitics of AGI: The Need for a Preparedne... governance_and_regulation
