Evidence (6917 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome	Positive	Negative	Mixed	Null	Total
Other	761	200	101	904	2020
Governance & Regulation	829	400	191	122	1566
Organizational Efficiency	784	193	125	84	1197
Technology Adoption Rate	637	236	124	97	1103
Research Productivity	431	131	58	340	972
Output Quality	481	183	59	47	770
Decision Quality	332	177	82	49	647
Firm Productivity	439	57	88	20	610
AI Safety & Ethics	218	279	66	33	602
Market Structure	181	170	123	24	503
Task Allocation	214	64	72	33	388
Skill Acquisition	174	62	62	17	315
Innovation Output	204	27	45	18	295
Employment Level	105	54	108	13	282
Fiscal & Macroeconomic	132	69	43	26	277
Consumer Welfare	117	63	42	11	233
Firm Revenue	154	48	26	3	231
Task Completion Time	173	31	8	12	225
Inequality Measures	44	123	50	6	223
Worker Satisfaction	89	65	22	12	188
Error Rate	71	92	10	2	175
Regulatory Compliance	77	69	14	5	165
Automation Exposure	58	56	26	13	156
Training Effectiveness	96	21	14	19	152
Wages & Compensation	77	37	25	6	145
Team Performance	86	17	27	10	141
Developer Productivity	95	17	14	6	133
Job Displacement	12	81	21	1	115
Hiring & Recruitment	52	7	8	3	70
Creative Output	32	20	8	3	64
Skill Obsolescence	5	47	6	1	59
Social Protection	28	16	8	2	54
Labor Share of Income	17	19	17	—	53
Worker Turnover	11	12	—	3	26
Industry	—	—	—	1	1

Filter: Governance

Vulnerability is path-dependent and contingent on states’ adaptive capacity—governance quality, industrial policy, and bargaining leverage determine whether a country captures upgrading opportunities or becomes a strategic casualty.

Comparative case analysis using indicators of governance, industrial policy presence, and bargaining outcomes; process tracing of critical junctures showing divergent trajectories. (Data sources: governance indicators, case comparisons; sample sizes not specified.)

medium	mixed	China-US Trade War and the Challenges for Developing Countri...	upgrading outcomes (e.g., movement into higher-value segments), differential FDI...

Trade diversion caused by tariff escalation and restrictions re-routes production and trade flows, but benefits are asymmetric: countries with stronger institutions, infrastructure, and policy capacity capture more investment and value-added.

Analysis of bilateral trade and FDI flow changes after tariffs; supply-chain mapping of relocation events; firm announcements of relocation; comparative cases emphasizing institutional/infrastructure differences. (Data sources: trade and investment flow data, supply-chain maps, firm-level announcements; sample sizes not specified.)

medium	mixed	China-US Trade War and the Challenges for Developing Countri...	FDI inflows into manufacturing/tech, share of value-added retained domestically,...

The benefits of AI come with governance, ethical, and sustainability challenges (standards, control, accountability) that require balancing against innovation incentives.

Synthesis of policy, ethics, and governance literature documenting concerns about standards, accountability, and incentive trade-offs; argument is qualitative and prescriptive rather than empirically tested within this paper.

medium	mixed	The Evolution and Societal Impact of Artificial Intelligence...	governance effectiveness, ethical compliance, and balance between regulation and...

AI has enhanced delivery in education, health, transportation, and government, improving some service outcomes while persistent issues like bias, privacy, transparency, and accountability remain.

Synthesis of applied-AI case studies and sectoral evaluations drawn from interdisciplinary literature; evidence described qualitatively without new empirical aggregation or meta-analysis in this paper.

medium	mixed	The Evolution and Societal Impact of Artificial Intelligence...	service delivery quality/accessibility and fairness/privacy/accountability indic...

AI reshapes demand for skills, redefines occupations, and accelerates the need for reskilling, with distributional effects that can increase inequality.

Narrative review of labor-economics and workforce studies documenting task reallocation and shifting skill requirements; based on observational studies and sectoral analyses summarized in the review (no unified sample size or new empirical test in this paper).

medium	mixed	The Evolution and Societal Impact of Artificial Intelligence...	skill demand, occupational employment composition, wages/distributional outcomes

A multi-hazard, multi-risk approach increases societal resilience but is complex and cross-disciplinary.

Project-wide synthesis, in-depth place-based case studies, and stakeholder engagement reported in MYRIAD-EU activities indicating benefits to resilience alongside noted disciplinary and practical complexity.

medium	mixed	Reducing risk together: moving towards a more holistic appro...	societal resilience

Shifting disaster risk management toward a genuinely multi-hazard, multi-risk paradigm is feasible and valuable but requires coordinated advances across conceptual mainstreaming, evidence on spatio-temporal hazard–exposure–vulnerability dynamics, scenario methods, usable decision-support tools, explicit equity integration, deep case-study coproduction, support for MHEWS, and strengthened ECR leadership.

Synthesis and reflection across MYRIAD-EU (2021–2025) project outputs, comparative synthesis of activities, lessons learned, and stakeholder feedback reported by the project.

medium	mixed	Reducing risk together: moving towards a more holistic appro...	feasibility and value of adopting a multi-hazard, multi-risk disaster risk manag...

Technical milestones (scalable, error-corrected qubits; hybrid algorithms) create fat-tailed outcome distributions where a small probability of breakthrough could yield outsized long-run effects.

Monte Carlo experiments and scenario ensembles that include low-probability, high-impact technical breakthrough parameters; expert elicitation of milestone probabilities.

medium	mixed	Modeling Macroeconomic Output Gains from Quantum-Driven Prod...	tail outcomes for GDP/TFP (extreme long-run gains)
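To make the "fat-tailed outcome distribution" in the claim above concrete, here is a minimal Monte Carlo sketch. It is not the paper's model: the function names, the mixture structure (a small breakthrough probability combined with a Pareto-distributed gain multiplier), and every parameter value are assumptions chosen only for illustration.

```python
import random
import statistics


def simulate_long_run_gains(n_draws, p_breakthrough, baseline_gain,
                            breakthrough_scale, tail_shape, seed=0):
    """Draw hypothetical long-run output gains from a two-regime mixture.

    All parameters are illustrative assumptions, not values from the source.
    """
    rng = random.Random(seed)
    gains = []
    for _ in range(n_draws):
        if rng.random() < p_breakthrough:
            # Breakthrough regime: Pareto multiplier is always >= 1, and a
            # smaller tail_shape gives a heavier right tail.
            multiplier = rng.paretovariate(tail_shape)
            gains.append(baseline_gain * breakthrough_scale * multiplier)
        else:
            # No breakthrough: the gain stays at the baseline.
            gains.append(baseline_gain)
    return gains


def tail_summary(gains):
    """Return mean, median, and the empirical 99th percentile of the draws."""
    ordered = sorted(gains)
    p99 = ordered[int(0.99 * (len(ordered) - 1))]
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "p99": p99,
    }


if __name__ == "__main__":
    draws = simulate_long_run_gains(
        n_draws=10_000, p_breakthrough=0.05, baseline_gain=0.01,
        breakthrough_scale=10.0, tail_shape=1.5, seed=1,
    )
    print(tail_summary(draws))
```

In a mixture like this the median sits at the baseline while the mean is pulled upward by rare large draws, which is the sense in which a low-probability breakthrough can dominate expected long-run effects.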

R&D funding, standards, regulatory clarity, export controls, and public–private partnerships shape quantum adoption trajectories; policy missteps can slow adoption and concentrate benefits.

Policy counterfactual scenarios and qualitative analysis of ecosystem roles; calibration informed by historical effects of policy on diffusion of strategic technologies.

medium	mixed	Modeling Macroeconomic Output Gains from Quantum-Driven Prod...	adoption rates, distribution of benefits, market concentration

Aggregate gains hinge on how quickly and broadly quantum technologies diffuse; early gains concentrated in frontier firms/sectors can take decades to propagate economy-wide.

Diffusion modeling using logistic/S-curve and Bass models calibrated to historical analog technologies; scenarios show long lag between frontier adoption and economy-wide diffusion.

medium	mixed	Modeling Macroeconomic Output Gains from Quantum-Driven Prod...	time to economy-wide propagation, aggregate GDP/TFP growth over decades
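The Bass model named in the evidence above can be sketched in a few lines. This is a generic discrete-time version, assuming hypothetical innovation and imitation coefficients `p` and `q`. The paper's calibration to historical analog technologies is not reproduced here, and the function names are illustrative.

```python
def bass_cumulative_adoption(p, q, periods):
    """Discrete-time Bass diffusion.

    p: coefficient of innovation (external influence), assumed value
    q: coefficient of imitation (internal influence), assumed value
    Returns the cumulative adoption share at the end of each period.
    """
    share = 0.0
    path = []
    for _ in range(periods):
        # New adopters = (innovation + imitation * current share)
        #                * remaining potential adopters
        share += (p + q * share) * (1.0 - share)
        path.append(share)
    return path


def periods_to_reach(path, target_share):
    """First period (1-indexed) at which adoption reaches target_share."""
    for period, share in enumerate(path, start=1):
        if share >= target_share:
            return period
    return None  # target not reached within the simulated horizon


if __name__ == "__main__":
    fast = bass_cumulative_adoption(p=0.01, q=0.30, periods=80)
    slow = bass_cumulative_adoption(p=0.01, q=0.15, periods=80)
    print("periods to 50% adoption (q=0.30):", periods_to_reach(fast, 0.5))
    print("periods to 50% adoption (q=0.15):", periods_to_reach(slow, 0.5))
```

Lowering the imitation coefficient stretches the S-curve, which is one simple way to see how frontier adoption and economy-wide diffusion can be separated by a long lag.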

As successive pilot batches of urban green data center policies are rolled out, the aggregate policy impact follows a nonlinear diffusion trajectory: it rises first and then declines.

Analysis across pilot-batch rollout timing showing a nonlinear (rise-then-fall) pattern in aggregate estimated effects as the number of pilot batches expands; modeled/visualized within the staggered-adoption DID framework.

medium	mixed	How Does Urban Green Data Center Policy Empower Corporate En...	aggregate policy impact on corporate energy utilization efficiency over pilot-ba...

Realizing NLP value in banks requires organizational investments (data pipelines, model deployment, CRM integration) and complementarity between AI tools and managerial/IT capabilities; returns will depend on these complementarities.

Conceptual implication derived from review of applied/engineering papers and literature on technology complementarities; not directly estimated empirically in the review.

medium	mixed	Natural language processing in bank marketing: a systematic ...	realized ROI from NLP adoption conditional on organizational investments and com...

Automated tax-preparation and filing could increase compliance rates but also make tax bases more sensitive to automated tax-optimization strategies, requiring updated regulatory oversight and audit tools.

Paper's policy and economic implications section combining case-based observations and literature; presented as plausible outcomes rather than measured effects.

medium	mixed	Explore the Impact of Generative AI on Finance and Taxation	tax compliance rates, prevalence of automated tax-optimization, regulatory/audit...

Ethics is distinct from and prior to law: legal codification cannot fully capture the primordial ethical demand.

Philosophical engagement with Derrida and Levinas; normative argumentation and conceptual examples. No empirical validation of precedence.

medium	mixed	Examining ethical challenges in human–robot interaction usin...	completeness of legal codification in representing primordial ethical demands (c...

Legal norms and technical reforms are necessary but incomplete: they must remain responsive to a primordial, non-codifiable ethical obligation that structures how responsibility is perceived and allocated in practice.

Conceptual analysis drawing on Derrida and Levinas; argument supported by illustrative cases across three domains (care robotics, AVs, algorithmic governance). No empirical measurement of legal efficacy.

medium	mixed	Examining ethical challenges in human–robot interaction usin...	adequacy of legal/technical reforms in capturing primordial ethical obligations ...

AI-driven productivity and data externalities can reconfigure which countries/regions specialize in which activities, with implications for labor demand, offshoring, and services trade patterns.

Mechanism and theory-based analysis drawing on literature about comparative advantage, automation, and data externalities; empirical testing recommended but not performed in the paper.

medium	mixed	Path Analysis of Digital Economy and Reconstruction of Inter...	specialization patterns, labor demand, offshoring levels, services trade composi...

Standard international trade models should be updated to incorporate data as an input, platform-mediated matching, algorithmic complementarities, and costs of regulatory fragmentation.

Theoretical critique and modeling recommendations based on mechanism analysis; no new formal model calibration or empirical testing presented in the paper.

medium	mixed	Path Analysis of Digital Economy and Reconstruction of Inter...	adequacy and predictive accuracy of trade models for AI-era trade patterns

AI-enabled markets tend toward winner-take-most platforms amplified by network effects.

Theoretical reasoning supported by platform literature and case illustrations of platform concentration dynamics; empirical magnitudes not estimated in the paper.

medium	mixed	Path Analysis of Digital Economy and Reconstruction of Inter...	market concentration / platform dominance

Competitive advantage is shifting away from asset- and labor-intensive models toward data-, model-, and platform-driven advantages, altering comparative advantage and market structure.

Mechanism/theoretical analysis drawing on platform and AI economics literature and qualitative examples; no empirical estimation provided in the paper.

medium	mixed	Path Analysis of Digital Economy and Reconstruction of Inter...	comparative advantage (sectoral specialization), market structure (incumbency, c...

Regulatory design acts as an economic instrument that can balance social value from AI with protection of rights, affecting social welfare, public trust, and long-term adoption rates.

Normative synthesis combining legal and economic reasoning; suggested as a theoretical mechanism rather than empirically validated within the paper.

medium	mixed	ARTIFICIAL INTELLIGENCE AND ADMINISTRATIVE GOVERNANCE: A CRI...	social welfare, public trust, long-term AI adoption rates

Automation of routine administrative tasks may reduce demand for certain clerical roles while increasing demand for oversight, auditing, and legal-technical expertise, altering public-sector labor composition and retraining needs.

Qualitative labor-market reasoning based on task-based automation literature and the administrative context; no field labor data or sample provided.

medium	mixed	ARTIFICIAL INTELLIGENCE AND ADMINISTRATIVE GOVERNANCE: A CRI...	demand for different job categories (clerical roles vs oversight/legal-technical...

AI feedback may either augment teacher productivity (complementarity) or substitute for routine teacher feedback tasks (substitution), with unclear net labor impacts.

Workshop deliberations among 50 scholars highlighting competing theoretical scenarios; no causal labor-market evidence provided.

medium	mixed	The Future of Feedback: How Can AI Help Transform Feedback t...	teacher time allocation; demand for teacher skills; employment levels in educati...

Easier conversational access to models can substitute for routine cognitive labor while complementing high-skill work; miscalibrated trust affects labor outcomes and supervision costs.

Labor and task-allocation implications argued conceptually; no labor-market empirical evidence or quantified substitution/complementarity rates presented.

medium	mixed	Why We Need to Destroy the Illusion of Speaking to A Human: ...	labor substitution for routine tasks, complementarity with high-skill tasks, sup...

Firms can compete on front-end design (transparency, trustworthiness) as a socially beneficial quality signal, but absent regulation competition may favor more persuasive (less honest) interfaces.

Economic argument about product differentiation and competitive incentives, drawn from market theory and literature; no empirical market study provided.

medium	mixed	Why We Need to Destroy the Illusion of Speaking to A Human: ...	firm competition strategies, prevalence of transparent vs. persuasive interfaces...

Misleading cues can create short-term surplus (user satisfaction) but long-term welfare losses if overtrust causes harms or misinformation.

Theoretical economic argument based on information asymmetry and externalities; no empirical quantification in the paper.

medium	mixed	Why We Need to Destroy the Illusion of Speaking to A Human: ...	short-term user satisfaction vs. long-term welfare (harms from misinformation/ov...

LLM-based chatbots’ conversational naturalness increases usability and adoption but also triggers misleading mental models (e.g., anthropomorphism, overtrust).

Paper-level main finding based on conceptual analysis and literature synthesis from HCI, ethics, and conversational analysis; no new large-scale empirical study or sample reported.

medium	mixed	Why We Need to Destroy the Illusion of Speaking to A Human: ...	usability, adoption (engagement/use rates), and prevalence of misleading mental ...

Human experts will likely shift roles from sole decision-makers to adjudicators, challengers, and validators of AI-generated arguments, changing required skills toward critical evaluation and dialectical oversight.

Conceptual labor-market projection; no empirical labor studies or surveys presented.

medium	mixed	Argumentative Human-AI Decision-Making: Toward AI Agents Tha...	changes in job tasks, skill demand, and employment shares for expert validators/...

Productivity gains from partial automation may be offset by negative externalities (incorrect legal outcomes, appeals, reputational damage) that impose social and private costs not captured by narrow productivity measures.

Theoretical economic analysis and illustrative case vignettes describing error propagation; no empirical quantification of externalities.

medium	mixed	Why Avoid Generative Legal AI Systems? Hallucination, Overre...	net social welfare/productivity after accounting for error-related externalities

Market demand will likely split between providers offering generative convenience with liability exposure and providers offering certified/verified, explainable tools at a premium, creating a two-tier market.

Market-structure analysis and illustrative projections; no empirical market data or sample size.

medium	mixed	Why Avoid Generative Legal AI Systems? Hallucination, Overre...	market segmentation between riskier low-cost generative providers and premium ve...

Reported monetary supervision cost was low (~$200) for this project, but the paper cautions that general equilibrium effects and scaling may change costs as demand for supervisors rises.

Paper provides reported supervision cost (≈$200) for the single project and includes a caveat about external validity and scaling; cost is self-reported and contextualized by authors.

medium	mixed	Semi-Autonomous Formalization of the Vlasov-Maxwell-Landau E...	monetary supervision cost for this project (≈$200) and authors' caution about sc...

Because these agents will be embedded in safety‑critical infrastructure, economic and technical outcomes will depend heavily on system architecture choices.

Systems‑engineering and policy reasoning drawing on analogies to Internet/IoT evolution and domain examples (disaster response, healthcare, industrial automation, mobility); conceptual argumentation rather than empirical measurement.

medium	mixed	The Internet of Physical AI Agents: Interoperability, Longev...	economic costs and technical system performance/resilience

Policymakers must weigh productivity gains from higher autonomy against increased systemic risk and governance costs; optimal allocation will vary by sector (high-consequence systems justify stricter human oversight; lower-consequence tasks may tolerate more autonomy).

Normative policy analysis and cost–benefit reasoning; sector-differentiated triage framework proposed (no quantitative welfare or sectoral optimization performed).

medium	mixed	Resilience Meets Autonomy: Governing Embodied AI in Critical...	policy-optimal oversight allocation by sector (trade-off between productivity ga...

Bounded-autonomy governance internalizes some externalities from automated interactions, reducing the probability of cascading failures and associated economic damages, but misaligned or heterogeneous governance across firms/sectors can still generate systemic vulnerabilities.

Theoretical argument combining externalities literature and governance design principles; illustrative scenarios and policy reasoning (no empirical validation).

medium	mixed	Resilience Meets Autonomy: Governing Embodied AI in Critical...	net effect on systemic risk (probability and expected loss from cascades) under ...

Modern critical infrastructure increasingly uses embodied AI for monitoring, predictive maintenance, and decision support, but these systems are typically trained for statistically representable uncertainty rather than systemic, cascading crises.

Review and synthesis of policy texts, industry descriptions, and safety/AI standards cited in the paper (EU AI Act, ISO standards) and literature on embodied-AI applications; conceptual argument (no original empirical sample).

medium	mixed	Resilience Meets Autonomy: Governing Embodied AI in Critical...	mismatch between training uncertainty assumptions and real-world systemic crisis...

Increasing benign-agent count and agent stubbornness are practical levers for improving robustness, but both carry costs: added compute/operational cost for scaling agents, and degraded consensus/coordination when stubbornness is high.

Argumentation supported by simulation results showing improved robustness with more agents or higher stubbornness, combined with discussion of computational cost (scaling) and observed consensus degradation; computational cost is presented as conceptual/operational reasoning rather than quantified in the summary.

medium	mixed	Don't Trust Stubborn Neighbors: A Security Framework for Age...	robustness to manipulation (improvement), computational/operational cost (increa...

Naïvely lowering trust weights assigned to suspected adversaries can limit adversarial influence but may also hinder cooperation and reduce task performance.

Simulations manipulating fixed trust weights and observing tradeoffs between reduced adversarial sway and decreased cooperative task performance/convergence; conceptual analysis of the tradeoff is provided.

medium	mixed	Don't Trust Stubborn Neighbors: A Security Framework for Age...	adversarial influence (reduction) and cooperative task performance / convergence...

Raising agents' innate stubbornness (peer resistance) reduces susceptibility to adversarial manipulation but impairs the network's ability to reach consensus or coordinate effectively.

Combined theoretical reasoning from FJ model (stubbornness is weight on innate opinion) and simulation experiments varying stubbornness parameters; measured outcomes include adversarial influence and measures of convergence/coordination or task performance.

medium	mixed	Don't Trust Stubborn Neighbors: A Security Framework for Age...	adversarial influence (reduction) and network coordination/consensus metrics or ...
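The role of stubbornness as the weight on an agent's innate opinion can be illustrated with a toy Friedkin-Johnsen (FJ) update. The network below (one fully stubborn adversary, uniform trust weights, benign agents sharing the same innate opinion) is a hypothetical setup chosen for simplicity, not the paper's experimental design, and all function names are illustrative.

```python
def fj_step(opinions, innate, weights, stubbornness):
    """One FJ update: x_i <- s_i * innate_i + (1 - s_i) * sum_j W_ij * x_j."""
    n = len(opinions)
    updated = []
    for i in range(n):
        social = sum(weights[i][j] * opinions[j] for j in range(n))
        updated.append(
            stubbornness[i] * innate[i] + (1.0 - stubbornness[i]) * social
        )
    return updated


def fj_run(innate, weights, stubbornness, steps=200):
    """Iterate the FJ update from the innate opinions for a fixed step count."""
    opinions = list(innate)
    for _ in range(steps):
        opinions = fj_step(opinions, innate, weights, stubbornness)
    return opinions


def benign_shift(benign_stubbornness, n_benign=4):
    """Average final opinion of benign agents (innate opinion 0.0) when one
    fully stubborn adversary (agent 0) holds opinion 1.0.

    Assumed toy topology: every agent trusts every agent equally.
    """
    n = n_benign + 1
    innate = [1.0] + [0.0] * n_benign
    weights = [[1.0 / n] * n for _ in range(n)]  # row-stochastic trust matrix
    stubbornness = [1.0] + [benign_stubbornness] * n_benign
    final = fj_run(innate, weights, stubbornness)
    return sum(final[1:]) / n_benign


if __name__ == "__main__":
    for s in (0.1, 0.5, 0.9):
        print(f"stubbornness={s}: benign shift toward adversary = "
              f"{benign_shift(s):.4f}")
```

In this toy setup the adversary's pull shrinks as benign stubbornness rises or as more benign agents are added. It shows only the manipulation side of the trade-off: because all benign agents start from the same innate opinion, the consensus degradation reported in the source does not appear here.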

Investments in interpretability that aim to fully 'rule‑ify' LLM competence may have diminishing returns; economic value may be better captured by research into robust behavioral evaluation, stress testing, and hybrid human‑AI workflows, while partial interpretability remains valuable.

R&D allocation and interpretability economics argument built on the central thesis; suggestion rather than empirical finding.

medium	mixed	Why the Valuable Capabilities of LLMs Are Precisely the Unex...	returns to different types of interpretability/AI safety R&D

The paper challenges a purely rule‑based view of scientific explanation: some explanatory power will remain in implicit model structure rather than explicit rules.

Philosophical/epistemological argument based on the main thesis about tacit competence; no empirical validation.

medium	mixed	Why the Valuable Capabilities of LLMs Are Precisely the Unex...	completeness of rule‑based scientific explanations when applied to LLM behavior

Liability regimes and penalties should account for limits of enforced compliance and false positives/negatives from probabilistic policy evaluations.

Normative/economic discussion in the paper highlighting probabilistic outputs of the Policy function and calibration challenges; no empirical validation.

medium	mixed	Runtime Governance for AI Agents: Policies on Paths	appropriateness of liability frameworks given probabilistic enforcement (policy ...

Firms will trade off compliance strictness against service quality (task completion rates), creating an economic tradeoff that shapes market offerings (e.g., safer-but-slower vs. faster-but-riskier agents).

Economic reasoning and conceptual models in the paper; suggested objective balancing task completion and legal/reputational costs; no empirical market data.

medium	mixed	Runtime Governance for AI Agents: Policies on Paths	tradeoff curve between task completion rate and compliance risk (expected violat...

Alignment and instruction tuning approaches intended to encourage up-to-date answers improve some behaviors but do not reliably solve time-sensitivity and cross-modal consistency issues.

Experiments applying alignment/instruction-tuning methods with measurement of correctness and consistency; reported partial or inconsistent improvements rather than full resolution.

medium	mixed	V-DyKnow: A Dynamic Benchmark for Time-Sensitive Knowledge i...	changes in correctness and consistency after alignment/instruction tuning

Diagnostic analysis links outdated predictions to (i) the static, time-stamped nature of training/evaluation datasets and (ii) mechanistic limits in how multimodal representations encode and retrieve temporal facts.

Error attribution analyses connecting incorrect answers to training snapshot timestamps and dataset provenance; representation-level analyses and qualitative case studies demonstrating multimodal encoding/retrieval limits.

medium	mixed	V-DyKnow: A Dynamic Benchmark for Time-Sensitive Knowledge i...	attribution of errors to dataset temporal mismatch and representation/mechanisti...

The economic value of deploying DeePC-based controllers depends critically on representativeness of training data and the costs of online adaptation and safety verification.

Authors' deployment-risk analysis and discussion of trade-offs (qualitative), grounded in methodological requirements of DeePC (need for representative, persistently exciting data and safeguards).

medium	mixed	Data-driven generalized perimeter control: Zürich case study	net economic value after accounting for data collection, adaptation, and verific...

System-level improvements from the controller do not imply uniform spatial/temporal benefits—distributional effects may favor certain routes or neighborhoods.

Authors' discussion and caution about distributional effects and equity; possibly supported by spatial analyses in simulation (qualitative discussion in paper).

medium	mixed	Data-driven generalized perimeter control: Zürich case study	spatial/temporal distribution of travel-time changes across network links or nei...

Deploying conformal factuality systems increases development cost (collecting representative calibration data) and inference cost (verifier compute), though efficient verifiers mitigate inference cost.

Discussion and empirical cost measurements: need for representative calibration datasets to maintain guarantees; measured verifier FLOPs; qualitative economic analysis in the paper.

medium	mixed	Is Conformal Factuality for RAG-based LLMs Robust? Novel Met...	development effort for calibration data, inference compute cost (FLOPs), margina...

Conformal filtering improves formal reliability (statistical factuality guarantees) but does not, by itself, deliver robustness and task utility without careful system design.

Aggregate empirical results: improved factuality guarantees after calibration/filtering, but concurrent reductions in informativeness and sensitivity to distribution shift/distractors unless calibration/data-processing are adapted.

medium	mixed	Is Conformal Factuality for RAG-based LLMs Robust? Novel Met...	post-filtering factuality guarantees, informativeness metrics, robustness under ...
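The trade-off between factuality guarantees and informativeness can be seen in a minimal split-conformal filtering sketch. This follows one common recipe (calibrate a score threshold on held-out responses, then drop claims scoring at or below it) and may differ from the paper's exact procedure. The verifier scores, data layout, and function names are assumptions for illustration.

```python
import math


def calibrate_threshold(calibration_sets, alpha):
    """Pick a verifier-score threshold from calibration data.

    calibration_sets: one list per calibration response, each holding
        (verifier_score, is_correct) pairs for that response's claims.
    alpha: target error rate. Under exchangeability, keeping only claims
        scoring above the threshold leaves a response with no false claim
        with probability at least 1 - alpha.
    """
    # Per-response nonconformity: the highest score given to a false claim
    # (-inf when the response contains no false claim).
    per_response = sorted(
        max((score for score, ok in claims if not ok), default=float("-inf"))
        for claims in calibration_sets
    )
    n = len(per_response)
    rank = math.ceil((n + 1) * (1.0 - alpha))
    if rank > n:
        # Too few calibration responses for this alpha: keep nothing.
        return float("inf")
    return per_response[rank - 1]


def filter_claims(claims, threshold):
    """Keep the text of claims whose verifier score exceeds the threshold."""
    return [text for text, score in claims if score > threshold]


if __name__ == "__main__":
    calibration = [
        [(0.9, True), (0.2, False)],
        [(0.8, True), (0.5, False), (0.3, False)],
        [(0.7, False), (0.95, True)],
    ]
    threshold = calibrate_threshold(calibration, alpha=0.5)
    response = [("claim A", 0.9), ("claim B", 0.5), ("claim C", 0.4)]
    print(threshold, filter_claims(response, threshold))
```

A stricter `alpha` or a small calibration set pushes the threshold up, so more claims are dropped. This is the loss of informativeness the evidence describes, and the guarantee only holds when calibration data resemble deployment inputs.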

DeepSeek-R1 exhibits a distributed memorization signature: 76.6% partial reconstruction rate but 0% verbatim recall on the TS‑Guessing probe.

Model-specific results from Experiment 3 (TS‑Guessing) reporting per-model rates of partial reconstruction and verbatim recall across the 513 MMLU items for DeepSeek-R1.

medium	mixed	Are Large Language Models Truly Smarter Than Humans?	partial reconstruction rate and verbatim recall rate (per-model)

Quantitative comparisons across tested models show a systematically nonzero Misapplication Rate (MR), even in settings where the Appropriate Application Rate (AAR) is high.

Aggregated MR and AAR statistics reported for multiple frontier models across the benchmark showing co‑occurrence of high AAR and nontrivial MR.

medium	mixed	BenchPreS: A Benchmark for Context-Aware Personalized Prefer...	Co‑occurrence of high Appropriate Application Rate (AAR) and nonzero Misapplicat...

Prompt‑based defensive instructions (explicitly instructing models to suppress preferences where inappropriate) reduce misapplication but fail to fully eliminate it.

Ablation experiments adding prompt‑based safety/defenses to model inputs and measuring MR and AAR; defenses produced reductions in MR but residual misapplication remained.

medium	mixed	BenchPreS: A Benchmark for Context-Aware Personalized Prefer...	Misapplication Rate (MR) and Appropriate Application Rate (AAR) under prompt‑bas...
