Evidence (6869 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome	Positive	Negative	Mixed	Null	Total
Other	758	199	100	900	2007
Governance & Regulation	826	400	191	122	1563
Organizational Efficiency	777	193	124	84	1189
Technology Adoption Rate	635	233	124	97	1098
Research Productivity	422	128	57	336	954
Output Quality	476	179	59	47	761
Decision Quality	328	177	81	47	640
Firm Productivity	435	57	88	20	606
AI Safety & Ethics	218	277	65	33	599
Market Structure	180	170	123	24	502
Task Allocation	213	64	72	33	387
Skill Acquisition	170	61	61	17	309
Innovation Output	203	27	43	18	292
Employment Level	105	54	107	13	281
Fiscal & Macroeconomic	131	69	43	26	276
Consumer Welfare	117	63	42	11	233
Firm Revenue	153	48	26	3	230
Task Completion Time	173	31	8	12	225
Inequality Measures	44	122	49	6	221
Worker Satisfaction	89	65	22	12	188
Error Rate	69	92	10	2	173
Regulatory Compliance	77	69	14	5	165
Automation Exposure	56	56	26	13	154
Training Effectiveness	94	21	13	19	149
Wages & Compensation	77	36	25	6	144
Team Performance	86	17	27	10	141
Developer Productivity	95	17	14	6	133
Job Displacement	12	80	20	1	113
Hiring & Recruitment	52	7	8	3	70
Creative Output	31	18	8	3	61
Skill Obsolescence	5	46	6	1	58
Social Protection	27	16	8	2	53
Labor Share of Income	17	19	17	—	53
Worker Turnover	11	12	—	3	26
Industry	—	—	—	1	1

Filter: Governance

Methodological adaptations used to mitigate validity threats in uplift studies reduce but do not eliminate those threats; they often increase complexity and cost while leaving issues of generalizability and time-dependence unresolved.

Practitioner accounts (n=16) describing limits/tradeoffs of adaptations; authors' synthesis concluding residual threats remain despite adaptations.

medium · negative · RCTs & Human Uplift Studies: Methodological Challenges and P... · effectiveness and tradeoffs of mitigation strategies for validity threats

External validity is limited: results from a given trial may not generalize across model versions, populations, tasks, or to temporally distant deployments.

Interview-derived themes (16 practitioners) and authors' analytic mapping to external validity concerns; supported by examples of model/version dependence discussed in interviews.

medium · negative · RCTs & Human Uplift Studies: Methodological Challenges and P... · generalizability/external validity of trial results across versions, populations...

Construct validity is threatened because commonly used outcome measures can misrepresent the constructs of interest when AI changes task structure or human strategies.

Practitioners' reports in semi-structured interviews (n=16) and the authors' synthesis, illustrating cases where metrics no longer capture the intended constructs once AI is introduced.

medium · negative · RCTs & Human Uplift Studies: Methodological Challenges and P... · construct validity of outcome measures (accuracy of metrics in capturing intende...

Common internal validity threats in uplift studies of frontier AI include violations of treatment fidelity and of the stable unit treatment value assumption (SUTVA), for example through contamination or time-varying treatments.

The paper's validity-consequences section, based on thematic analysis of 16 interviews and mapping practitioner-reported problems to internal validity constructs.

medium · negative · RCTs & Human Uplift Studies: Methodological Challenges and P... · treatment fidelity and SUTVA adherence in RCTs measuring uplift

Porous real-world settings cause spillovers and contamination across experimental arms, violating SUTVA and threatening internal validity.

Multiple practitioners (n=16) reported examples of spillovers and contamination during deployment-like studies; thematic analysis mapped these to SUTVA/treatment-fidelity concerns.

medium · negative · RCTs & Human Uplift Studies: Methodological Challenges and P... · internal validity (SUTVA, treatment contamination) of uplift trials
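Contamination across arms can be made concrete with a small simulation. This is a minimal sketch under assumed parameters, not a model taken from the interviews: a constant true uplift `tau`, standard-normal baseline outcomes, and a share `contamination` of control participants who gain access to the AI tool anyway. Under these assumptions the difference in means estimates roughly `(1 - contamination) * tau` instead of `tau`, so spillovers bias the measured uplift toward zero.

```python
import random
import statistics


def simulate_uplift_trial(tau: float, contamination: float, n_per_arm: int, seed: int = 0) -> float:
    """Difference-in-means uplift estimate from one simulated two-arm trial.

    Hypothetical model (illustration only): baseline outcomes ~ N(0, 1);
    treated units gain `tau`; a share `contamination` of control units
    also receive the tool through spillover and gain `tau` as well.
    """
    rng = random.Random(seed)
    treated = [rng.gauss(0.0, 1.0) + tau for _ in range(n_per_arm)]
    control = [
        rng.gauss(0.0, 1.0) + (tau if rng.random() < contamination else 0.0)
        for _ in range(n_per_arm)
    ]
    return statistics.fmean(treated) - statistics.fmean(control)


if __name__ == "__main__":
    for c in (0.0, 0.3, 0.6):
        est = simulate_uplift_trial(tau=1.0, contamination=c, n_per_arm=20_000)
        print(f"contamination={c:.1f}  estimate={est:.3f}  expected approx {1.0 - c:.3f}")
```

With `contamination=0.3` the estimate lands near 0.7 rather than the true 1.0; the function name and parameters are illustrative only.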

Shifting baselines (changes in tools, protocols, or knowledge during and across studies) complicate defining an appropriate control or status quo.

Interview data (16 practitioners) and thematic analysis identifying shifting baselines as a recurring challenge reported by participants.

medium · negative · RCTs & Human Uplift Studies: Methodological Challenges and P... · construct validity of the control/status-quo definition in uplift studies

Rapidly evolving models (nonstationarity) make any single trial a moving target, undermining the temporal stability of measured uplift.

Practitioner reports from semi-structured interviews (n=16) describing model updates and performance changes during/after trials; thematic coding indicating nonstationarity as a common concern.

medium · negative · RCTs & Human Uplift Studies: Methodological Challenges and P... · temporal stability/generalizability of measured uplift across model versions

Properties of frontier AI — rapid model evolution, shifting baselines, heterogeneous and changing users, and porous real-world settings — regularly strain internal, construct, and external validity of human uplift studies.

Recurring themes identified via qualitative analysis of 16 practitioner interviews; mapped to internal/construct/external validity dimensions in the paper's results.

medium · negative · RCTs & Human Uplift Studies: Methodological Challenges and P... · internal, construct, and external validity of human uplift RCTs

Standardized platforms and benchmarks may create network effects and lock-in around dominant hardware–software stacks; antitrust and standards policy will matter to preserve competition.

Workshop participants' market-structure analysis and policy discussion included in the summary recommendations (NSF workshop, Sept 26–27, 2024).

medium · negative · Report for NSF Workshop on Algorithm-Hardware Co-design for ... · market concentration metrics, prevalence of platform lock-in, and competition in...

GDP and productivity metrics that ignore interpretive labor risk understating the inputs to creative and knowledge work; RATs offer a means to measure previously invisible inputs.

Policy argument in the measurement/productivity subsection; no empirical re-estimation of GDP/productivity presented.

medium · negative · Chasing RATs: Tracing Reading for and as Creative Activity · completeness of productivity/GDP measurement with respect to interpretive labor

Algorithmic feeds and AI summarizers tend to compress or automate interpretive traces, potentially erasing signals of reasoning, context, and tacit knowledge.

Conceptual claim supported by argumentation and examples in the paper; no empirical comparison between RATs and existing summarizers is presented.

medium · negative · Chasing RATs: Tracing Reading for and as Creative Activity · loss of interpretive trace signals (reasoning/context/tacit knowledge) when usin...

Contracts and incentives based on expected performance can incentivize strategies that deliver high expected returns but poor or unreliable time-average outcomes; incentive design should account for path-dependent risks.

Theoretical/incentive argument and examples in the paper linking objective mismatch to adverse incentives; illustrative reasoning rather than empirical contract studies.

medium · negative · Ergodicity in reinforcement learning · alignment/misalignment of incentives with reliable long-run (time-average) perfo...

Economic evaluations and deployment decisions that rely on ensemble expectations can misstate economic value and risk because firms and users experience single time-averaged trajectories; regulators and decision-makers should therefore prefer objectives reflecting single-run guarantees when relevant.

Conceptual mapping of the theoretical results to economic decision-making and deployment risk; policy and incentive discussion in the paper (argumentative, not empirical).

medium · negative · Ergodicity in reinforcement learning · accuracy of economic valuation and risk assessment when using ensemble expectati...
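The gap between ensemble expectations and single-trajectory experience can be illustrated with a textbook multiplicative gamble from the ergodicity-economics literature. It is swapped in here and is not the paper's own example; the parameters are assumptions. Each round, wealth is multiplied by 1.5 or 0.6 with equal probability: the expected multiplier is 1.05, but the time-average (geometric-mean) multiplier is √0.9 ≈ 0.949, so a typical individual trajectory shrinks.

```python
import math
import random
import statistics

# Hypothetical gamble: each round, wealth is multiplied by UP or DOWN with equal probability.
UP, DOWN, P_UP = 1.5, 0.6, 0.5


def ensemble_growth_factor() -> float:
    """Expected per-round multiplier, averaged across many parallel players."""
    return P_UP * UP + (1 - P_UP) * DOWN


def time_average_growth_factor() -> float:
    """Per-round multiplier one player experiences over a long run (geometric mean)."""
    return math.exp(P_UP * math.log(UP) + (1 - P_UP) * math.log(DOWN))


def simulate_final_wealth(rounds: int, players: int, seed: int = 0) -> list[float]:
    """Final wealth of each independent player, all starting from 1.0."""
    rng = random.Random(seed)
    finals = []
    for _ in range(players):
        wealth = 1.0
        for _ in range(rounds):
            wealth *= UP if rng.random() < P_UP else DOWN
        finals.append(wealth)
    return finals


if __name__ == "__main__":
    finals = simulate_final_wealth(rounds=100, players=10_000)
    print(f"ensemble factor per round:     {ensemble_growth_factor():.3f}")
    print(f"time-average factor per round: {time_average_growth_factor():.3f}")
    print(f"median final wealth:           {statistics.median(finals):.4f}")
    print(f"share of players who lost:     {sum(w < 1.0 for w in finals) / len(finals):.1%}")
```

Although the expected multiplier exceeds 1, the median player ends far below the starting wealth, which is the sense in which an ensemble expectation can misstate what a single firm or user experiences.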

The paper's illustrative example shows that a policy maximizing expected reward can produce trajectories that lock into high- or low-reward regimes, so an agent’s long-term realized reward is highly uncertain and not captured by the expectation.

Constructed example provided in the paper; demonstration of divergent single-trajectory outcomes under a single policy; no empirical sample size (example-based).

medium · negative · Ergodicity in reinforcement learning · distribution (uncertainty) of long-term realized reward across individual trajec...
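A minimal sketch of the lock-in pattern, under assumed parameters that are not taken from the paper: a hypothetical policy sends each run into a high-reward or a low-reward regime at the first step with equal probability, after which the regime is absorbing and per-step rewards are noisy around the regime mean. The expected per-step reward is 5.0, yet no individual run's time average comes close to it.

```python
import random
import statistics

# Hypothetical parameters for illustration only.
P_HIGH, R_HIGH, R_LOW, NOISE_SD = 0.5, 10.0, 0.0, 1.0


def expected_reward_per_step() -> float:
    """Ensemble expectation of the per-step reward over both regimes."""
    return P_HIGH * R_HIGH + (1 - P_HIGH) * R_LOW


def time_average_of_run(rng: random.Random, steps: int) -> float:
    """Realized average reward along one trajectory whose regime is absorbing."""
    regime_mean = R_HIGH if rng.random() < P_HIGH else R_LOW  # drawn once, never revisited
    return statistics.fmean(rng.gauss(regime_mean, NOISE_SD) for _ in range(steps))


if __name__ == "__main__":
    rng = random.Random(0)
    runs = [time_average_of_run(rng, steps=1_000) for _ in range(1_000)]
    print(f"expected reward per step: {expected_reward_per_step():.2f}")
    print(f"mean of time averages:    {statistics.fmean(runs):.2f}")
    print(f"run closest to 5.0:       {min(runs, key=lambda r: abs(r - 5.0)):.2f}")
```

Averaging across runs recovers roughly the expectation, but each run's own average clusters near 10 or near 0, which is the divergence between ensemble and single-trajectory outcomes that the claim describes.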

In contexts analogous to AI markets, a firm at a network/geographic disadvantage would need exponentially greater scale (users/data/compute) to match the probability of early discovery achieved by a better-positioned rival.

Interpretation/translation of the model's analytic scaling result to market-relevant quantities; this is a theoretical implication rather than an empirically tested claim.

medium · negative · Macroscopic Dominance from Microscopic Extremes: Symmetry Br... · required scale (users, data, compute) to match probability of early discovery fo...
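One hedged way to see why the required scale can grow exponentially is a toy model, which is an assumption here and not the paper's model. Each unit of scale (a user, a data point, a unit of compute) is treated as an independent trial that yields an early discovery with probability $q$, and a positional disadvantage $\Delta$ attenuates that probability exponentially, $q_B = q_A e^{-\lambda \Delta}$ for some assumed rate $\lambda > 0$. Matching the better-positioned rival's discovery probability then requires:

```latex
\begin{aligned}
P(N, q) &= 1 - (1 - q)^{N}, \\
P(N_B, q_B) = P(N_A, q_A) \quad &\Longleftrightarrow \quad N_B = N_A \, \frac{\ln(1 - q_A)}{\ln(1 - q_B)}, \\
N_B &\approx N_A \, \frac{q_A}{q_B} = N_A \, e^{\lambda \Delta} \qquad \text{for } q_A, q_B \ll 1.
\end{aligned}
```

Under this toy model, each additional unit of disadvantage multiplies the required scale by $e^{\lambda}$ rather than adding a fixed amount to it.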

MLOps and governance provisions shift costs from one-off implementation to ongoing maintenance, implying recurring costs that should be captured in economic evaluations.

Analytical/economic argument presented in the paper as an implication of including an MLOps layer (conceptual; no empirical cost accounting provided).

medium · negative · ALGORITHM FOR IMPLEMENTING AI IN THE MANAGEMENT LOOP OF SMES... · cost structure (recurring maintenance costs vs one-off implementation costs)
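A minimal sketch of why recurring MLOps and governance costs change an economic evaluation: the present value of an ongoing annual cost is added to the one-off implementation cost. The function name, the end-of-year discounting convention, and all figures are hypothetical and not taken from the paper.

```python
def total_cost_of_ownership(
    upfront: float, annual_recurring: float, discount_rate: float, years: int
) -> float:
    """Present value of a one-off cost plus a recurring cost paid at the end of each year."""
    if discount_rate <= -1.0:
        raise ValueError("discount_rate must be greater than -1")
    recurring_pv = sum(
        annual_recurring / (1.0 + discount_rate) ** t for t in range(1, years + 1)
    )
    return upfront + recurring_pv


if __name__ == "__main__":
    # Hypothetical figures: 100k implementation, 30k/year maintenance, 8% discount rate, 5 years.
    one_off_only = total_cost_of_ownership(100_000, 0, 0.08, 5)
    with_upkeep = total_cost_of_ownership(100_000, 30_000, 0.08, 5)
    print(f"one-off view: {one_off_only:,.0f}")
    print(f"with upkeep:  {with_upkeep:,.0f}")
```

With these assumed figures the five-year maintenance stream alone is worth more, in present-value terms, than the initial implementation, so an evaluation that counts only the one-off cost would understate the total by more than half.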

Differential adoption across firms (due to modular, scalable designs and data advantages) may create winner‑takes‑most effects and increase market concentration, benefiting early adopters with rich data/integration capabilities.

Market-structure claim supported by economic reasoning about scale and data advantages; no cross-firm empirical adoption study or market concentration time‑series is provided.

medium · negative · Next-Generation Financial Analytics Frameworks for AI-Enable... · market concentration metrics (e.g., HHI), firm market shares, adoption timing di...
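The Herfindahl–Hirschman Index (HHI) named as an outcome metric here is the sum of squared market shares, conventionally with shares in percent so that a monopoly scores 10,000. A minimal sketch; the firm names and sales figures are hypothetical.

```python
def hhi(sales_by_firm: dict[str, float]) -> float:
    """Herfindahl–Hirschman Index with market shares expressed in percent (0 to 10,000)."""
    total = sum(sales_by_firm.values())
    if total <= 0:
        raise ValueError("total sales must be positive")
    return sum((100.0 * sales / total) ** 2 for sales in sales_by_firm.values())


if __name__ == "__main__":
    # Hypothetical market before and after one early adopter pulls ahead.
    before = {"firm_a": 25, "firm_b": 25, "firm_c": 25, "firm_d": 25}
    after = {"firm_a": 70, "firm_b": 10, "firm_c": 10, "firm_d": 10}
    print(f"HHI before: {hhi(before):,.0f}")
    print(f"HHI after:  {hhi(after):,.0f}")
```

In this hypothetical, moving from four equal firms to one firm holding 70% raises the index from 2,500 to 5,200, which is the kind of winner-takes-most shift the claim anticipates.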

Initial investment, integration, and ongoing maintenance/compliance costs can be substantial and affect short-term ROI.

Interviewed administrators and implementation reports citing upfront and recurring costs (integration, model maintenance, compliance); quantitative budget figures not standardized across sites in the paper.

medium · negative · The Role of Artificial Intelligence in Healthcare Complaint ... · implementation and maintenance costs; short-term return on investment (ROI)

Risk of deskilling or reduced empathy if human roles are overly automated.

Thematic analysis of staff interviews and surveys reporting concerns about loss of practice, reduced patient contact, and potential diminishment of empathetic skills; no longitudinal measures of skill loss presented.

medium · negative · The Role of Artificial Intelligence in Healthcare Complaint ... · staff-reported empathy/skill levels and qualitative indicators of deskilling

Technical and organizational integration with legacy hospital IT systems is nontrivial.

Implementation reports and interviews describing integration work, time, and resource needs; descriptive accounts of technical and organizational barriers (no universal timelines/costs reported).

medium · negative · The Role of Artificial Intelligence in Healthcare Complaint ... · integration difficulty/time/cost (implementation burden)

Algorithmic bias in NLP models can misclassify complaints from underrepresented groups.

Observations from system classification error analyses (disparities reported by demographic group) and corroborating qualitative concerns from staff and administrators; specific subgroup sample sizes and effect magnitudes not provided.

medium · negative · The Role of Artificial Intelligence in Healthcare Complaint ... · differential misclassification rates by demographic group (bias in NLP classific...
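Differential misclassification by demographic group can be computed as the per-group share of complaints whose predicted category differs from the reference label, together with the largest gap between groups. A minimal sketch; the record layout, group labels, and categories are hypothetical, and the paper's exact computation is not reported.

```python
from collections import defaultdict


def misclassification_rates(records: list[tuple[str, str, str]]) -> dict[str, float]:
    """Per-group error rate from (group, true_label, predicted_label) records."""
    errors: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for group, true_label, predicted_label in records:
        totals[group] += 1
        errors[group] += int(true_label != predicted_label)
    return {group: errors[group] / totals[group] for group in totals}


def max_rate_gap(rates: dict[str, float]) -> float:
    """Largest difference in error rate between any two groups."""
    return max(rates.values()) - min(rates.values())


if __name__ == "__main__":
    # Hypothetical labelled complaints: (group, true category, model category).
    sample = [
        ("group_x", "billing", "billing"),
        ("group_x", "care", "care"),
        ("group_x", "care", "care"),
        ("group_x", "access", "billing"),
        ("group_y", "care", "billing"),
        ("group_y", "access", "access"),
        ("group_y", "care", "access"),
        ("group_y", "billing", "billing"),
    ]
    rates = misclassification_rates(sample)
    print(rates, f"gap={max_rate_gap(rates):.2f}")
```

A gap that persists on larger, representative samples would be evidence of the subgroup disparity the claim describes; on toy data like this it is only a demonstration of the metric.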

Data privacy and security risks arise from centralizing complaint text and metadata.

Stakeholder interviews, thematic coding of concerns, and risk assessment commentary based on centralized logs and metadata aggregation; no measured breach incidents reported here.

medium · negative · The Role of Artificial Intelligence in Healthcare Complaint ... · privacy/security risk (qualitative risk indicators; potential exposure of compla...

Organizations will incur additional governance and procurement costs (diversity audits, recalibration of reward models, multi-model infrastructures) to mitigate homogenization, shifting some economic benefits of AI toward governance spending.

Cost implication argued from the need for auditing and multi-model procurement described in recommendations; not supported by quantified cost analyses in the paper.

medium · negative · The Artificial Hivemind: Rethinking Work Design and Leadersh... · governance and procurement costs associated with LLM deployment

Inter-model convergence undermines product differentiation across AI providers and could accelerate commoditization of base LLM outputs.

Market-structure inference built on empirical finding of high cross-model output similarity across 70+ models and theoretical discussion of vendor differentiation; no market-level price or adoption time-series analyzed in the paper.

medium · negative · The Artificial Hivemind: Rethinking Work Design and Leadersh... · vendor product differentiation / commoditization of base outputs

Homogenized AI outputs reduce the value of AI as a source of varied cognitive complements to human labor, potentially lowering productivity gains from human–AI collaboration in tasks requiring creativity and exploration.

Economic argument drawing on measured decreases in model output diversity and theoretical literature on complementarities between diverse AI outputs and human creativity; no direct measured productivity changes reported in field settings within the paper.

medium · negative · The Artificial Hivemind: Rethinking Work Design and Leadersh... · productivity gains from human–AI collaboration (theoretical implication inferred...

Reward-model and evaluation miscalibration can cause organizations to prefer models that maximize apparent evaluation scores at the expense of useful stylistic or cognitive diversity.

Comparative analyses between automated evaluation/reward-model rankings and human preference/diversity assessments reported in the paper; examples where high-scoring models produced more consensus-style outputs.

medium · negative · The Artificial Hivemind: Rethinking Work Design and Leadersh... · model selection bias driven by automated evaluation scores; reduction in diversi...

Homogenized outputs increase organizational susceptibility to groupthink and correlated errors across teams using different models.

Argument based on observed inter-model convergence (high similarity across models), which implies correlated outputs and thus correlated mistakes across teams; no randomized organizational field experiment is reported, so this is a risk inferred from the empirical convergence data.

medium · negative · The Artificial Hivemind: Rethinking Work Design and Leadersh... · risk of correlated errors / susceptibility to groupthink (conceptual risk inferr...

Homogenization of LLM outputs erodes creative diversity in AI-assisted work and reduces the variety of solutions produced.

Inference drawn from measured decreases in response diversity (entropy/distinct-n) and the observed inter-model convergence across real-world queries; argument linking lower measured diversity to fewer distinct solution proposals in AI-augmented workflows.

medium · negative · The Artificial Hivemind: Rethinking Work Design and Leadersh... · creative diversity / number of distinct solution variants produced
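The diversity measures cited in the evidence (distinct-n and entropy) can be sketched as follows: distinct-n is the number of unique n-grams divided by the total number of n-grams pooled across a set of responses, and entropy is the Shannon entropy of that pooled n-gram distribution. Whitespace tokenization and lowercasing are simplifying assumptions; the paper's exact tokenization and normalization are not stated here.

```python
import math
from collections import Counter


def ngrams(text: str, n: int) -> list[tuple[str, ...]]:
    tokens = text.lower().split()  # simplifying assumption: whitespace tokenization
    return [tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1)]


def pooled_counts(responses: list[str], n: int) -> Counter:
    """Frequency of every n-gram across all responses combined."""
    return Counter(g for r in responses for g in ngrams(r, n))


def distinct_n(responses: list[str], n: int) -> float:
    """Unique n-grams divided by total n-grams (1.0 means no repetition)."""
    counts = pooled_counts(responses, n)
    total = sum(counts.values())
    return len(counts) / total if total else 0.0


def ngram_entropy(responses: list[str], n: int) -> float:
    """Shannon entropy in bits of the pooled n-gram distribution."""
    counts = pooled_counts(responses, n)
    total = sum(counts.values())
    if not total:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    varied = ["a lighthouse keeper adopts a fox", "two rivals share one umbrella"]
    homogenized = ["a robot learns to love", "a robot learns to paint"]
    for name, outputs in (("varied", varied), ("homogenized", homogenized)):
        print(name, round(distinct_n(outputs, 2), 3), round(ngram_entropy(outputs, 2), 3))
```

The two hypothetical response sets score 1.0 and 0.625 on distinct-2, showing how shared phrasing across outputs lowers the measure.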

Current reward models and automated evaluation metrics are biased toward consensus/high-probability responses, preferring consensus-style outputs even when stylistically diverse alternatives are judged equally high-quality by humans.

Human preference assessments compared with automated/reward-model scores on stylistically diverse outputs, showing cases where reward models favor higher-probability (consensus) outputs despite no human-judged quality advantage.

medium · negative · The Artificial Hivemind: Rethinking Work Design and Leadersh... · alignment between reward-model/automated evaluation scores and human quality jud...

Uneven inclusion in digital/AI deployments risks exacerbating digital divides and creating distributional harms.

Descriptive and case-based studies report differential access and uptake among demographic groups; limited causal quantification and varying measurement approaches across studies.

medium · negative · Digital Transformation and AI Adoption in Government: Evalua... · service coverage across demographic groups, measures of digital divide (access, ...

Limited auditability and explainability of AI systems increase trust and legitimacy risks.

Technical governance literature and case reports show challenges in model explainability and external audit; evidence is technical and illustrative rather than based on large-sample causal studies.

medium · negative · Digital Transformation and AI Adoption in Government: Evalua... · auditability metrics, transparency indicators, public trust measures

Inadequate regulatory frameworks raise privacy, accountability, and fairness concerns for AI in government.

Governance reviews and risk assessments documented in the literature highlight regulatory gaps and associated incidents/risks; empirical incident counts are not comprehensively tabulated in the review.

medium · negative · Digital Transformation and AI Adoption in Government: Evalua... · privacy breaches, accountability/audit findings, measures of fairness/bias incid...

Procurement, budgeting rules, and siloed incentives discourage cross-cutting transformation and modular iterative deployments.

Policy and institutional analyses in the reviewed literature point to rigid procurement cycles, capital budgeting practices, and siloed funding as obstacles; examples and case narratives are provided but systematic quantification is limited.

medium · negative · Digital Transformation and AI Adoption in Government: Evalua... · frequency of modular/iterative procurements, number of cross-cutting projects fu...

Organizational resistance and fragmented coordination block integrated rollouts of cross-cutting digital reforms.

Qualitative case studies and governance analyses repeatedly identify intra-governmental silos, conflicting incentives, and change-resistance as implementation barriers; evidence is primarily descriptive.

medium · negative · Digital Transformation and AI Adoption in Government: Evalua... · degree of cross-agency integration, completion rates of integrated projects, imp...

Skills shortages (technical, managerial, data literacy) impede adoption and maintenance of digital and AI systems.

Multiple surveys, policy briefs and qualitative studies cited in the review report workforce capacity gaps; often based on targeted assessments or organizational audits rather than representative sampling.

medium · negative · Digital Transformation and AI Adoption in Government: Evalua... · adoption rates, system maintenance capacity, time-to-value for deployments

Infrastructure deficits (connectivity, legacy systems) limit scale and reliability of digital/AI initiatives.

Recurring barrier documented across governance analyses and case studies; evidence includes reports of downtime, integration failures, and limited geographic reach; no unified cross-study sample provided.

medium · negative · Digital Transformation and AI Adoption in Government: Evalua... · system reliability/uptime, scalability, geographic/service coverage

Unresolved liability and regulatory uncertainty increase malpractice risk and insurance costs, leading insurers and providers to favor conservative adoption and continued human-in-the-loop safeguards.

Regulatory/legal analysis and stakeholder behavior models discussed in the review; observed cautious deployment patterns in practice noted in the literature.

medium · negative · Will AI Replace Physicians in the Near Future? AI Adoption B... · malpractice risk; insurance premiums; adoption conservatism; presence of human-i...

Regulatory pathways and approval standards are evolving but are not yet aligned with deployment of high-autonomy clinical systems.

Review of recent policy analyses and regulatory documents showing ongoing updates and gaps between current standards and requirements for high-autonomy AI deployment.

medium · negative · Will AI Replace Physicians in the Near Future? AI Adoption B... · alignment between regulatory frameworks and high-autonomy clinical deployment re...

Robust, locally appropriate data governance (privacy, interoperability, standards) is a public good that underpins trust and data-driven markets; weak governance raises risks of exclusion and foreign dependency.

Governance and policy literature synthesized in the review; conceptual arguments supported by examples but limited empirical evaluation in LMIC SME contexts.

medium · negative · Artificial Intelligence Adoption for Sustainable Development... · data governance robustness; SME inclusion in data-driven markets; foreign depend...

Platform effects and supplier ecosystems associated with AI may create winner-takes-most market dynamics, so policy should monitor market concentration and enable competitive access to core AI services.

Literature on platforms and market structure combined with case examples; review notes potential for concentration but lacks broad causal studies quantifying effects in LMIC SME markets.

medium · negative · Artificial Intelligence Adoption for Sustainable Development... · market concentration metrics; access to core AI services by SMEs

Fragmented or weak data governance (privacy rules, standards, interoperability, and trust) reduces SMEs’ ability to participate in data-driven markets and adopt AI.

Policy analyses and governance-focused studies in the review highlighting data governance weaknesses in LMICs and associated risks for SMEs; examples discussed rather than quantified nationally.

medium · negative · Artificial Intelligence Adoption for Sustainable Development... · data governance quality; SME participation in data markets; trust/interoperabili...

Sanctions and supply-chain restrictions affect access to hardware and software, altering adoption paths and increasing costs; domestic substitution or international cooperation will influence future trajectories.

Institutional analysis documenting sanctions/import restrictions and their implications for hardware/software access; qualitative assessment of substitution and cooperation options.

medium · negative · ADOPTION OF ARTIFICIAL INTELLIGENCE IN THE RUSSIAN EXTRACTIV... · availability and cost of hardware/software inputs for AI and resulting adoption ...

The barriers to AI adoption in Russia’s extractive industries interact systemically (e.g., lack of data reduces demand for talent; weak infrastructure deters investment), so piecemeal measures will have limited effect.

Analytical synthesis identifying co-moving constraints across cross-country trends and qualitative firm-level evidence showing interacting bottlenecks.

medium · negative · ADOPTION OF ARTIFICIAL INTELLIGENCE IN THE RUSSIAN EXTRACTIV... · overall effectiveness of isolated vs. coordinated interventions on AI diffusion ...

Institutional failures—weak standards/interoperability, limited public–private coordination, regulatory uncertainty, and sanctions/import restrictions—exacerbate diffusion problems for AI in extractive sectors.

Institutional review of standards, procurement and public–private coordination mechanisms; documentation of regulatory uncertainty and sanctions/import restrictions affecting hardware/software access.

medium · negative · ADOPTION OF ARTIFICIAL INTELLIGENCE IN THE RUSSIAN EXTRACTIV... · standards/interoperability quality, level of public–private coordination, regula...

Infrastructure (sensorization, edge/cloud connectivity, computing hardware, and localized software stacks) is underdeveloped in Russia relative to peers, and these shortfalls hinder deployment.

ICT infrastructure indicators, comparative metrics on sensorization/connectivity/computing availability, and project case evidence from extractive firms.

medium · negative · ADOPTION OF ARTIFICIAL INTELLIGENCE IN THE RUSSIAN EXTRACTIV... · sensor density, connectivity quality (edge/cloud readiness), availability of com...

There are human capital constraints: shortages of AI talent in industry-specific roles, limited retraining of engineering staff, and brain drain reduce the sector's capacity to absorb and deploy AI.

Workforce and education statistics, patent/activity counts, and expert commentary; qualitative case evidence showing limited retraining and talent shortages in industry-specific AI roles.

medium · negative · ADOPTION OF ARTIFICIAL INTELLIGENCE IN THE RUSSIAN EXTRACTIV... · industry-specific AI talent supply, retraining rates for engineering staff, meas...

Absolute and relative AI investment volumes in the Russian extractive sector are lower than in the US, China and EU; private risk capital is limited and public support insufficiently targeted to scale-up projects.

Investment datasets and national/industry statistics comparing public and private AI investment volumes (absolute and relative to output) for extractive sectors across jurisdictions (2020–2025).

medium · negative · ADOPTION OF ARTIFICIAL INTELLIGENCE IN THE RUSSIAN EXTRACTIV... · AI investment volumes (absolute and per unit of extractive output); availability...

Data access is a primary bottleneck: datasets are fragmented, often proprietary or closed, ownership rules are unclear, and mechanisms for safe data sharing are weak, hindering model training and cross-firm applications.

Review of data governance frameworks across jurisdictions and firm-level case evidence documenting closed/proprietary datasets and weak sharing mechanisms.

medium · negative · ADOPTION OF ARTIFICIAL INTELLIGENCE IN THE RUSSIAN EXTRACTIV... · availability and usability of industrial data for AI model training and cross-fi...

The gap is driven not only by smaller investment flows but also by institutional constraints—limited data access, weak data governance, human capital shortages, and inadequate digital infrastructure—that together suppress diffusion and scaling of AI applications.

Institutional analysis (review of data governance frameworks, regulatory regimes, standards, market structure) plus qualitative firm-level case studies and expert commentary illustrating how these factors impede adoption and scaling.

medium · negative · ADOPTION OF ARTIFICIAL INTELLIGENCE IN THE RUSSIAN EXTRACTIV... · diffusion and scaling of AI applications in extractive industries

Russia’s adoption of AI in extractive industries is both slower (lower growth rate) and shallower (lower depth of digitalization) than peer jurisdictions in 2020–2025.

Time-series comparison of digitalization/digital maturity proxies and AI investment volumes across countries for 2020–2025; synthesis of trend differences from public datasets and sectoral indices.

medium · negative · ADOPTION OF ARTIFICIAL INTELLIGENCE IN THE RUSSIAN EXTRACTIV... · rate of change in digitalization indicators and depth of digitalization (digit m...
