Evidence (8066 claims)

Claim counts by topic. Topic counts sum to more than the 8,066 total because a claim can fall under multiple topics.

| Topic | Claims |
|---|---|
| Adoption | 5586 |
| Productivity | 4857 |
| Governance | 4381 |
| Human-AI Collaboration | 3417 |
| Labor Markets | 2685 |
| Innovation | 2581 |
| Org Design | 2499 |
| Skills & Training | 2031 |
| Inequality | 1382 |
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 417 | 113 | 67 | 480 | 1091 |
| Governance & Regulation | 419 | 202 | 124 | 64 | 823 |
| Research Productivity | 261 | 100 | 34 | 303 | 703 |
| Organizational Efficiency | 406 | 96 | 71 | 40 | 616 |
| Technology Adoption Rate | 323 | 128 | 74 | 38 | 568 |
| Firm Productivity | 307 | 38 | 70 | 12 | 432 |
| Output Quality | 260 | 71 | 27 | 29 | 387 |
| AI Safety & Ethics | 118 | 179 | 45 | 24 | 368 |
| Market Structure | 107 | 128 | 85 | 14 | 339 |
| Decision Quality | 177 | 75 | 37 | 19 | 312 |
| Fiscal & Macroeconomic | 89 | 58 | 33 | 22 | 209 |
| Employment Level | 74 | 34 | 78 | 9 | 197 |
| Skill Acquisition | 98 | 36 | 40 | 9 | 183 |
| Innovation Output | 121 | 12 | 24 | 13 | 171 |
| Firm Revenue | 98 | 35 | 24 | — | 157 |
| Consumer Welfare | 73 | 31 | 37 | 7 | 148 |
| Task Allocation | 87 | 16 | 34 | 7 | 144 |
| Inequality Measures | 25 | 76 | 32 | 5 | 138 |
| Regulatory Compliance | 54 | 61 | 13 | 3 | 131 |
| Task Completion Time | 89 | 7 | 4 | 3 | 103 |
| Error Rate | 44 | 51 | 6 | — | 101 |
| Training Effectiveness | 58 | 12 | 12 | 16 | 99 |
| Worker Satisfaction | 47 | 33 | 11 | 7 | 98 |
| Wages & Compensation | 54 | 15 | 20 | 5 | 94 |
| Team Performance | 47 | 12 | 15 | 7 | 82 |
| Automation Exposure | 27 | 26 | 10 | 6 | 72 |
| Job Displacement | 6 | 39 | 13 | — | 58 |
| Hiring & Recruitment | 40 | 4 | 6 | 3 | 53 |
| Developer Productivity | 34 | 4 | 3 | 1 | 42 |
| Social Protection | 22 | 11 | 6 | 2 | 41 |
| Creative Output | 16 | 7 | 5 | 1 | 29 |
| Labor Share of Income | 12 | 6 | 9 | — | 27 |
| Skill Obsolescence | 3 | 20 | 2 | — | 25 |
| Worker Turnover | 10 | 12 | — | 3 | 25 |
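For readers who want to work with the matrix programmatically, a minimal sketch, assuming the table above has been exported to a CSV whose headers match the column names (the filename and export step are hypothetical):

```python
import pandas as pd

# Hypothetical CSV export of the evidence matrix above; headers assumed
# to match the table, with the "—" cells treated as missing values.
df = pd.read_csv("evidence_matrix.csv", na_values="—")

# Share of claims with a positive direction, per outcome category.
df["positive_share"] = df["Positive"] / df["Total"]

# Outcomes where negative findings outnumber positive ones, e.g.
# AI Safety & Ethics (179 vs 118) and Inequality Measures (76 vs 25).
skewed = df[df["Negative"] > df["Positive"]].sort_values("positive_share")
print(skewed[["Outcome", "Positive", "Negative", "positive_share"]])
```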
Claims & Evidence
Each claim is paired with the type of evidence reported for it.

Claim: Appropriate, solutions-oriented, and user-friendly tools for practitioners and decision-makers are insufficiently available; their availability should be increased.
Evidence: Tool development and iterative testing with end users within MYRIAD-EU, plus stakeholder feedback pointing to demand for more usable tools.

Claim: Methods are needed to generate both present-day and future multi-hazard and multi-risk scenarios that integrate climate change, socio-economic change, and cascading effects.
Evidence: Project development and testing of scenario methods, plus identification of remaining methodological gaps in scenario integration.

Claim: Concepts, definitions, and terminologies for multi-hazard and multi-risk work must be mainstreamed and harmonized to enable comparability and communication across disciplines and stakeholders.
Evidence: Stakeholder feedback and the project's synthesis of interdisciplinary outputs, highlighting conceptual fragmentation and communication barriers.

Claim: If quantum advantages accrue initially to well-capitalized incumbents (cloud providers, financial firms, pharmaceutical companies), we should expect increased market power and higher rents.
Evidence: Scenario analysis and historical analogs in which early compute advantages concentrated market power; qualitative market-structure modeling.

Claim: Benefits of quantum diffusion are likely to be uneven across countries, firms, and workers, boosting regions with strong innovation ecosystems and possibly increasing market concentration among compute-capable incumbents.
Evidence: Multi-region/sectoral modeling with heterogeneous adoption and capability parameters; historical analogs showing concentration following early compute advantages; scenario comparisons.

Claim: Without coordinated investments and governance, large theoretical gains may remain unrealized or be very unevenly distributed.
Evidence: Policy counterfactual scenarios in which underinvestment, fragmented governance, or restrictive export regimes reduce adoption elasticities and infrastructure readiness, producing lower and more concentrated macro gains than coordinated-investment scenarios.

Claim: On its own, high executive digital cognition tends to weaken the policy's positive effect on energy-utilization efficiency (interpreted as short-run adjustment costs of digital transformation).
Evidence: Interaction tests between policy treatment and an executive-level digital-cognition measure show a negative interaction coefficient in DID regressions; the authors interpret this as evidence of short-run adjustment costs.
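As a sketch of the kind of specification behind the finding above (all variable and file names are hypothetical; the authors' exact model is not reproduced here), a two-way fixed-effects DID with a policy × digital-cognition interaction:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical firm-year panel: 'treated_post' is the DID policy indicator,
# 'exec_digital_cognition' the executive-level digital-cognition measure.
panel = pd.read_csv("firm_panel.csv")

# Two-way fixed effects via firm and year dummies; the interaction
# coefficient is the quantity of interest (reported negative in the study).
model = smf.ols(
    "energy_efficiency ~ treated_post + treated_post:exec_digital_cognition"
    " + C(firm_id) + C(year)",
    data=panel,
).fit(cov_type="cluster", cov_kwds={"groups": panel["firm_id"]})
print(model.summary())
```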
Claim: The under-use of external text sources in the reviewed literature may be due to privacy, legal/regulatory uncertainty, or integration costs.
Evidence: Authors' interpretation linking the observed low coverage of external text sources (social media, news, reviews) in the 109 articles to plausible barriers (privacy/regulation/integration); no direct empirical test in the review.

Claim: Restrictions on cross-border data flows or fragmented privacy rules reduce the training data available to AI systems, lowering the quality and scalability of AI services exported internationally.
Evidence: Theoretical linkage and literature on AI training-data needs synthesized in the paper; no original empirical measurement of AI performance loss presented.
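One way to make this mechanism concrete is the standard power-law form from the neural scaling-law literature; this is an illustration under that stylized model, not an estimate from the paper:

```python
# Illustrative only: loss as a power law in dataset size, L(D) ~ D**(-beta),
# with beta ≈ 0.095 for language models (Kaplan et al., 2020).
BETA = 0.095

def relative_loss(data_fraction: float) -> float:
    """Loss relative to an unrestricted baseline when only `data_fraction`
    of the training corpus remains usable."""
    return data_fraction ** (-BETA)

# If data-flow restrictions cut usable training data in half, loss rises
# by roughly 7% under this stylized model.
print(f"{relative_loss(0.5):.3f}")  # ~1.068
```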
Claim: Support systems for digital-services exporters, especially SMEs, are inadequate in China.
Evidence: Review of policy documents and literature highlighting gaps in finance, legal support, and standards-compliance assistance for SME internationalization (qualitative).

Claim: China's platform firms show uneven internationalization, and their platform infrastructure is not consistently competitive internationally.
Evidence: Case examples and synthesis of domestic and international studies on platform internationalization included in the review (qualitative evidence).

Claim: China has limited influence in high-level trade rule formation.
Evidence: Policy review and comparative institutional analysis within the literature review; descriptive assessment of China's participation in multilateral rule-making (no formal measurement of influence).

Claim: Current institutional, technological, and market shortcomings limit China's ability to close the gap with economies operating under high-standard trade regimes.
Evidence: Qualitative comparative analysis of policy and institutional frameworks against high-standard trade members; literature and case examples (no new microdata).

Claim: Widespread deployment of similar models could create correlated failures or fraud vectors, implying systemic risk that may warrant macroprudential attention.
Evidence: Analytic caution based on model homogeneity and case/literature discussion; a speculative systemic-risk concern rather than an empirically demonstrated one.
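The systemic-risk intuition behind the model-homogeneity claim can be sketched with simple probability arithmetic (the numbers are assumptions for illustration, not from the paper):

```python
# Illustrative numbers (assumed, not from the paper).
p_fail = 0.01  # probability a model mishandles a given adverse input
n = 10         # number of institutions deploying a model

# Diverse, independent models: system-wide failure needs n independent errors.
p_all_fail_independent = p_fail ** n   # 1e-20

# One shared model: a single error replicates across every deployment,
# so the system-wide failure probability stays at the per-model rate.
p_all_fail_shared = p_fail             # 0.01

print(p_all_fail_independent, p_all_fail_shared)
```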
Claim: There is regulatory uncertainty around AI-generated filings and around responsibility/liability for automated outputs.
Evidence: Analysis and literature review discussing unclear regulatory positions and legal risks noted in case organizations' deployment considerations.

Claim: Integration complexity with legacy ERP/financial systems and shared-services-center processes is a significant implementation challenge.
Evidence: Case-study narratives describing integration work and friction points; analytic framing highlighting ERP compatibility issues.

Claim: Model hallucinations, lack of explainability, and limited audit trails constrain safe adoption.
Evidence: The paper cites literature and case observations about model reliability and explainability issues; examples and discussion are qualitative.

Claim: Data privacy, confidentiality, and cross-border data-transfer concerns are important barriers to deployment.
Evidence: Challenges enumerated from case studies and literature; specific organizational concerns cited in the cases (Xiaomi, Deloitte) and in the regulatory discussion.

Claim: Automation and human-robot assemblages can reproduce subjugation and vulnerability affecting care workers and marginalized users, requiring attention to distributional justice and labor-market impacts.
Evidence: Illustrative vignettes from healthcare robotics and literature synthesis on care ethics and labor impacts; no quantitative labor-market analysis presented.

Claim: Legal liability regimes and insurance products may systematically under- or mis-assign the costs of harm in socio-technical assemblages when primordial ethical demands are considered.
Evidence: Conceptual argument and suggested modeling directions; no empirical simulation or insurance-market data presented.

Claim: Treating responsibility as a Levinasian, asymmetrical moral obligation implies that it operates as a non-contractible externality that markets and contracts may fail to internalize, creating persistent externalities in AI deployment that standard economic models may miss.
Evidence: Theoretical implication derived from philosophical argument applied to economic concepts; suggested consequences, but no formal models or empirical validation in the paper.

Claim: Simple pluralist or multi-principle balancing approaches risk reproducing structural subordination by failing to foreground the asymmetrical ethical demand toward vulnerable Others.
Evidence: Normative critique supported by cross-disciplinary literature (care ethics, mediation, STS) and illustrative examples; no empirical test of pluralist approaches' effects.

Claim: The Levinasian framework helps reveal how human-robot interactions can both expose and reproduce systemic vulnerabilities, subjugation, and unaddressed harms (termed "Problem C": attribution of responsibility and distributed agency).
Evidence: Theoretical diagnosis supported by interdisciplinary literature synthesis and illustrative vignettes from healthcare robotics, autonomous vehicles, and algorithmic governance; no quantitative prevalence data.

Claim: Absent interoperability, divergence in data and AI rules will raise transaction costs, reduce trade gains, and create opportunities for regulatory arbitrage.
Evidence: Economic reasoning and scenario-based projections; asserted as an outcome of mechanism analysis rather than demonstrated with quantitative estimates.

Claim: Explainability, auditability, or data-localization requirements could favor larger vendors with compliance capacity, increasing market concentration and affecting competition among AI suppliers.
Evidence: Market-structure argument grounded in regulatory-compliance-burden analysis and comparative examples; not supported by empirical market data in the study.

Claim: Legal uncertainty and strict procedural requirements increase compliance costs and regulatory risk, which can slow AI adoption by firms and public agencies.
Evidence: Theoretical economic implications drawn from legal analysis and comparative observations; no empirical measurement of costs or adoption rates in the study.

Claim: AI can restrict or reshape human administrative discretion in legally sensitive ways.
Evidence: Doctrinal analysis of statutory specificity and formal procedural requirements in civil-law contexts, illustrated with Vietnam as the exemplar case; comparative observations.

Claim: Physical constraints (power-grid reliability, water consumption for cooling, and data-center capacity), together with diminishing marginal returns on scaling, make continued monolithic scaling economically and environmentally risky.
Evidence: Conceptual argumentation using known infrastructure constraints and economic reasoning about diminishing returns; no new empirical assessment or quantified risk analysis included.

Claim: Reasoning-augmented models (e.g., models using chain-of-thought, multi-step reasoning, or external retrieval/looping) can inflate per-query compute by orders of magnitude, exacerbating sustainability problems.
Evidence: Argument based on architectural patterns (multi-step reasoning, retrieval augmentation, multiple model passes) and per-query compute multipliers reported anecdotally in auxiliary literature; the paper provides no new benchmarked per-query compute measurements.
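A back-of-the-envelope sketch of the multiplier. The ~2 × params FLOPs-per-token rule of thumb for a transformer forward pass is standard; the model size and workload numbers are assumptions, not measurements from the paper:

```python
# Rule of thumb: a transformer forward pass costs ~2 * params FLOPs per token.
PARAMS = 70e9  # model size in parameters (assumed)

def query_flops(tokens_generated: int, passes: int = 1) -> float:
    return 2 * PARAMS * tokens_generated * passes

direct = query_flops(tokens_generated=200)  # plain single-pass answer
# Hypothetical reasoning pattern: 2,000 chain-of-thought tokens,
# sampled 10 times for re-ranking.
reasoning = query_flops(tokens_generated=2_000, passes=10)

print(reasoning / direct)  # 100.0 -- two orders of magnitude per query
```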
Claim: The energetic burden of generative AI is shifting from one-time training to recurring, potentially unbounded inference costs as models become productized and high-traffic.
Evidence: Synthesis of industry observations and early/anecdotal quantitative reports on operational workloads; no original empirical time-series or workload measurements provided in this paper.

Claim: Scaling monolithic LLMs toward artificial general intelligence (AGI) is colliding with hard physical and economic limits (energy, grid stress, water use, diminishing returns).
Evidence: Conceptual synthesis and argumentation drawing on observed industry trends (training/inference cost growth), infrastructure constraints (grid reliability, data-center cooling and water use), and theoretical diminishing marginal returns on model/data scaling; no new empirical dataset or controlled experiments reported in the paper.

Claim: Five qualitatively distinct D3 reflexive failure modes were identified in model responses, including categorical self-misidentification and false-positive self-attribution.
Evidence: Qualitative coding and taxonomy reported in Results: five D3 categories cataloged with examples; identification based on analysis of model responses to narrative dilemmas (sample drawn from the study runs).

Claim: A probe composed of deliberately unresolvable moral dilemmas embedded in literary (science-fiction) narrative resists surface performance and exposes a measurable gap between performed and authentic moral reasoning.
Evidence: Experimental application of the probe to 13 distinct LLM systems across 24 experimental conditions (13 blind, 4 declared re-tests, 7 ceiling-probe runs), with scoring and qualitative coding showing discriminating failure modes and a measurable gap in responses.

Claim: Existing AI moral-evaluation benchmarks largely measure surface-level, correct-sounding answers rather than genuine moral-reasoning capacity.
Evidence: Comparative argument based on study results showing a measurable gap when applying the authors' narrative-based probe (unresolvable SF dilemmas) versus standard benchmarks; empirical support comes from experiments across 24 conditions and 13 systems showing that systems produce plausible-sounding but reflexive/invalid reasoning on the narrative probe.

Claim: Capabilities and data advantages for certain vendors could lead to market concentration and platform dominance in AI-driven educational feedback.
Evidence: Expert concern about market dynamics synthesized from the workshop of 50 scholars; a theoretical warning without empirical market-structure analysis in the report.

Claim: Differential access to high-quality AI feedback systems and bias in training data can exacerbate educational inequalities and harm marginalized groups.
Evidence: Expert consensus and thematic analysis from the 50-scholar workshop, raising equity and bias risks; no empirical subgroup effectiveness estimates included.

Claim: Learners may over-rely on AI feedback or game systems to obtain desirable responses, reducing effortful learning.
Evidence: Workshop participant concerns synthesized qualitatively; cited as a risk and an open empirical question; no experimental data provided.

Claim: Field observations from an enterprise deployment demonstrate production failure modes traceable to missing identity propagation, timeout/budgeting policies, and machine-readable error semantics.
Evidence: Empirical context described as field lessons from an enterprise agent platform integrated with a major cloud provider's MCP servers; production failure vignettes and operational log analysis (client redacted).

Claim: MCP lacks three protocol-level primitives needed for reliable, production-scale agent operation: identity propagation, adaptive tool budgeting, and structured error semantics.
Evidence: Observational analysis and classification of production failures from an enterprise agent deployment; a taxonomy of failure modes identifying gaps in these specific areas.
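To make the three gaps concrete, a hypothetical envelope for a tool call, sketching what protocol-level support might look like. None of these fields exist in the actual MCP specification; they are invented here for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class ToolCallEnvelope:
    """Hypothetical wrapper around an MCP tool call; every field is
    invented for illustration and is NOT part of the MCP specification."""

    # 1. Identity propagation: whom the agent is acting for, end to end,
    #    rather than collapsing everything into one service account.
    on_behalf_of: str
    delegation_chain: list[str] = field(default_factory=list)

    # 2. Adaptive tool budgeting: limits a runtime could enforce and
    #    degrade against instead of hanging or fanning out unboundedly.
    max_wall_clock_ms: int = 30_000
    max_downstream_calls: int = 25

    # 3. Structured error semantics: machine-readable outcomes such as
    #    {"code": "RATE_LIMITED", "retriable": True, "retry_after_ms": 1200}
    #    instead of free-text error strings an agent must guess at.
    last_error: dict | None = None
```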
Claim: Reliance on single-agent outputs or non-diverse agent ensembles can understate substantive uncertainty and bias conclusions in automated policy evaluation or AI-assisted empirical research.
Evidence: Observed substantial agent-to-agent variability (non-standard errors, NSEs) across the experiment's 150 agents, demonstrating that single-agent results do not capture between-agent methodological uncertainty; imbalance between model families further implies potential bias if only one family is used.
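A minimal sketch of the between-agent dispersion statistic implied here, on synthetic data (the study's actual estimates and family labels are not reproduced):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in: one point estimate per agent for the same empirical
# question, tagged by model family (150 agents, echoing the study design).
families = rng.choice(["family_a", "family_b", "family_c"], size=150)
estimates = rng.normal(loc=0.12, scale=0.05, size=150)

# Between-agent dispersion (the "non-standard error"): variability of
# conclusions across agents, distinct from any one agent's sampling error.
nse = estimates.std(ddof=1)

# Family-level means show whether relying on one family would bias results.
for fam in np.unique(families):
    print(fam, round(float(estimates[families == fam].mean()), 4))
print("between-agent SD:", round(float(nse), 4))
```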
Claim: The post-exemplar convergence largely reflected imitation of exemplar choices rather than demonstrated understanding or principled correction by agents.
Evidence: Qualitative and behavioral analysis of agents' post-exposure outputs showing direct adoption of exemplar measures/procedures and a lack of substantive justification or mechanistic reasoning indicating comprehension; inference based on the content of agent code and writeups after exposure.

Claim: Chat-like interfaces commonly activate misleading beliefs, including overtrust in correctness/robustness, attribution of goals or moral agency, and underestimation of hallucination/bias/privacy risks.
Evidence: Aggregated observations from the HCI and ethics literature; suggested examples rather than empirical prevalence estimates; no sample size given.

Claim: Natural conversational style creates the impression that the system is human-like, intentional, or reliably knowledgeable.
Evidence: Conceptual claim supported by synthesis of prior work on anthropomorphism and conversational interfaces; no new quantitative data provided.

Claim: Reliance on preference signals risks learning spurious proxies and produces unstable behavior under distribution shift.
Evidence: Theoretical argument supported by examples of spurious proxies in ML and by observations in RLHF-trained models; the paper cites literature showing proxy behavior but does not present a unified empirical quantification specific to RLHF across many tasks.

Claim: Positive preference signals are continuous, context-dependent, and entangled with surface correlates (e.g., agreement with the user), which causes models trained on them to pick up spurious proxies and exhibit sycophancy and brittleness.
Evidence: Conceptual/theoretical argument describing structural properties of preference spaces, supported by cited observations of sycophantic behavior in models trained with preference-based objectives; no single definitive empirical quantification is provided within the paper, and supporting examples are drawn from recent literature.

Claim: Agents that attempt to infer others' reasoning depth may be vulnerable to strategic misrepresentation (partners could behave so as to induce incorrect ToM estimates).
Evidence: Conceptual analysis and discussion of strategic incentives in the paper, which identifies the risk and suggests potential mitigations (e.g., conservatism, verification, meta-reasoning).

Claim: Both too little and too much recursive reasoning (i.e., too shallow or too deep ToM) can produce poor joint behavior; miscalibrated anticipation harms coordination.
Evidence: Observed non-monotonic effects in the reported experiments, where fixed-order agents at either low or high ToM orders performed worse in mismatched pairings; evidence comes from the same multi-environment evaluation using joint-payoff and success-rate metrics.

Claim: Misalignment in Theory-of-Mind (ToM) order between agents (i.e., agents using different recursive reasoning depths) degrades coordination performance.
Evidence: Empirical experiments using LLM-driven agents with configurable ToM depth across four coordination environments (a repeated matrix game, two grid navigation tasks, and an Overcooked task); comparisons of matched (same-order) vs. mismatched (different-order) pairings using task-specific joint payoffs and success rates as metrics; see the harness sketch below.
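A skeletal version of the matched-vs-mismatched evaluation loop described above. The agents and environments are stubbed out with a noise-only placeholder; this sketches only the pairing design, not the paper's implementation or its results:

```python
from itertools import product
from statistics import mean
import random

def evaluate(run_episode, depths=(0, 1, 2), episodes=20):
    """Average joint payoff for every ordered pairing of ToM depths,
    then contrast matched (equal-depth) with mismatched cells."""
    cells = {
        (da, db): mean(run_episode(da, db) for _ in range(episodes))
        for da, db in product(depths, repeat=2)
    }
    matched = mean(v for (da, db), v in cells.items() if da == db)
    mismatched = mean(v for (da, db), v in cells.items() if da != db)
    return cells, matched, mismatched

# Dummy stand-in for the LLM-driven agents and the four environments;
# returns noise only and carries no empirical content.
dummy_episode = lambda depth_a, depth_b: random.random()

cells, matched, mismatched = evaluate(dummy_episode)
print(f"matched={matched:.2f} mismatched={mismatched:.2f}")
```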
Claim: There is a risk of manipulation and misinformation if argument mining/synthesis is unregulated or misaligned with social incentives, creating externalities that may justify public intervention.
Evidence: Conceptual risk assessment combining known misinformation dynamics and AI capabilities; no empirical incident data provided.

Claim: Increased error risk and weaker explainability from GLAI will raise malpractice and liability exposure for firms and lawyers, driving up insurance and compliance costs.
Evidence: Legal-risk analysis and economic reasoning connecting explainability and liability to insurance costs; no empirical cost studies presented.