Evidence (6869 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome	Positive	Negative	Mixed	Null	Total
Other	758	199	100	900	2007
Governance & Regulation	826	400	191	122	1563
Organizational Efficiency	777	193	124	84	1189
Technology Adoption Rate	635	233	124	97	1098
Research Productivity	422	128	57	336	954
Output Quality	476	179	59	47	761
Decision Quality	328	177	81	47	640
Firm Productivity	435	57	88	20	606
AI Safety & Ethics	218	277	65	33	599
Market Structure	180	170	123	24	502
Task Allocation	213	64	72	33	387
Skill Acquisition	170	61	61	17	309
Innovation Output	203	27	43	18	292
Employment Level	105	54	107	13	281
Fiscal & Macroeconomic	131	69	43	26	276
Consumer Welfare	117	63	42	11	233
Firm Revenue	153	48	26	3	230
Task Completion Time	173	31	8	12	225
Inequality Measures	44	122	49	6	221
Worker Satisfaction	89	65	22	12	188
Error Rate	69	92	10	2	173
Regulatory Compliance	77	69	14	5	165
Automation Exposure	56	56	26	13	154
Training Effectiveness	94	21	13	19	149
Wages & Compensation	77	36	25	6	144
Team Performance	86	17	27	10	141
Developer Productivity	95	17	14	6	133
Job Displacement	12	80	20	1	113
Hiring & Recruitment	52	7	8	3	70
Creative Output	31	18	8	3	61
Skill Obsolescence	5	46	6	1	58
Social Protection	27	16	8	2	53
Labor Share of Income	17	19	17	—	53
Worker Turnover	11	12	—	3	26
Industry	—	—	—	1	1

Governance

Framing a change as bug-free reduces vulnerability detection rates by 16–93%.

Result reported from Study 1 controlled experiments across models and framing conditions (250 CVE pairs).

high negative Measuring and Exploiting Confirmation Bias in LLM-Assisted S... vulnerability detection rate

LLM-generated peer reviews place significantly less weight on clarity and significance of the research.

Comparative analysis between LLM-generated reviews and human reviews from the conference dataset; reported as a statistically significant difference, but exact statistics and sample size are not provided in the excerpt.

high negative How LLMs Distort Our Written Language importance/weight given to clarity and significance in peer review content

Significantly more heavy LLM users reported that the writing was less creative and not in their voice.

Self-reported measures from participants in the human user study comparing heavy LLM users to others; no sample size or exact statistics provided in the excerpt.

high negative How LLMs Distort Our Written Language self-reported creativity and 'in-your-voice' authenticity of writing

In Chicago, the model shows moderate under-detection of Black residents with DIR equal to 0.22.

Reported DIR value from simulation results on Chicago 2022 data.

high negative Unmasking Algorithmic Bias in Predictive Policing: A GAN-Bas... Disparate Impact Ratio (DIR) indicating under-detection of Black residents
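
For readers unfamiliar with the metric, a Disparate Impact Ratio is commonly computed as the outcome rate for one group divided by the rate for a reference group. The sketch below assumes that common definition; the paper's exact definition is not given in this excerpt, and the counts and the function name are hypothetical, chosen only so that the ratio comes out at 0.22.

```python
def disparate_impact_ratio(group_hits, group_total, ref_hits, ref_total):
    """Ratio of one group's outcome rate to a reference group's rate.

    Assumed definition (not taken from the paper): rate_group / rate_reference.
    """
    if group_total <= 0 or ref_total <= 0:
        raise ValueError("group sizes must be positive")
    ref_rate = ref_hits / ref_total
    if ref_rate == 0:
        raise ValueError("reference rate is zero; ratio is undefined")
    return (group_hits / group_total) / ref_rate


# Hypothetical counts: 11% detection rate in one group, 50% in the reference.
dir_value = disparate_impact_ratio(11, 100, 50, 100)
print(round(dir_value, 2))
```

A ratio of 1.0 would mean equal rates; values further below 1.0 indicate a larger gap relative to the reference group.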

It is impractical to uniformly apply an alignment method across diverse, independently developed AI models in strategic settings.

Paper assertion / motivating argument (stated as motivation for investigating zero-shot Nash-like behavior); not presented as an empirical finding within the paper.

high negative Reasonably reasoning AI agents can avoid game-theoretic fail... practicality/adoption feasibility of universal alignment methods

The crowding-out effect of AI washing on green innovation is heterogeneous: private enterprises, small and medium-sized enterprises (SMEs), and firms in highly competitive sectors suffer more severe negative impacts.

Subgroup/heterogeneity analysis reported in the paper on the same sample of Chinese A-share listed companies (2006–2024); abstract identifies private firms, SMEs, and firms in highly competitive industries as more affected.

high negative The Spillover Effects of Peer AI Rinsing on Corporate Green ... green innovation (heterogeneous treatment effects across firm types and industri...

The negative relationship between AI washing and green innovation is transmitted through dual channels in both product and capital markets.

Mechanism analysis reported in the paper (presumably mediation or channel analysis) using the same dataset of Chinese A-share firms' annual reports and firm-level market data; abstract states product- and capital-market channels convey the crowding-out effect.

high negative The Spillover Effects of Peer AI Rinsing on Corporate Green ... green innovation (via product-market and capital-market channels)

Corporate AI washing exerts a significant crowding-out effect on green innovation.

Empirical analysis using semantic measures of 'AI washing' derived from large language model (LLM) analysis of annual reports for Chinese A-share listed companies (2006–2024); paper reports statistically significant negative relationship between AI washing and firms' green innovation (details of regression models not provided in abstract).

high negative The Spillover Effects of Peer AI Rinsing on Corporate Green ... green innovation

Under the model's parameter regimes, exclusion-based cohesion can produce state-contingent illusory precision, effective input concentration, and dynamic lock-in at the same time.

Analytical model results showing co-occurrence of multiple adverse phenomena (bias that grows in tails, illusory precision, input concentration, lock-in) under the same exclusion mechanisms; derived within the paper's theoretical framework.

high negative Cohesion as Concentration: Exclusion-Driven Fragility in Fin... co-occurrence of multiple adverse outcomes: tail bias, observed disagreement, ef...

When the anchor belief is updated from internally filtered aggregates, the system can exhibit dynamic lock-in: delayed recognition of regime shifts followed by abrupt correction.

Analytical dynamics studied in the model when anchor updates depend on filtered (excluded) aggregates; derivations demonstrate delayed detection and abrupt adjustments. This is a theoretical/dynamical model result, no empirical data.

high negative Cohesion as Concentration: Exclusion-Driven Fragility in Fin... delay in regime recognition and magnitude/timing of corrective update

Exclusion leads to effective concentration of decision inputs: the effective number of independent inputs falls below the nominal participant count.

Model-derived analytic result showing that report shrinkage and discarding reduce effective information contributions, quantified relative to nominal participation in the theoretical framework. No empirical sample.

high negative Cohesion as Concentration: Exclusion-Driven Fragility in Fin... effective number of independent decision inputs (information concentration)

Exclusion-based cohesion induces 'illusory precision': observed disagreement can fall while actual estimation error in tail regimes rises (i.e., lower recorded variance despite higher true error).

Theoretical result derived from the signal-aggregation model showing a regime in which filtered reports reduce observed variance even as tail-regime estimation error increases. No empirical validation provided.

high negative Cohesion as Concentration: Exclusion-Driven Fragility in Fin... observed disagreement (reported variance) versus true estimation error in tail r...

Relative to a full-inclusion benchmark, exclusion-based cohesion produces state-contingent bias that is small in normal regimes but grows sharply under regime displacement (tail events).

Analytical comparisons between the exclusion model and a full-inclusion benchmark within the theoretical model; derivations showing bias as a function of regime and exclusion parameters. The result is from model analysis, not empirical data.

high negative Cohesion as Concentration: Exclusion-Driven Fragility in Fin... estimation bias (especially under regime displacement/tail events)

The establishment of the China–ASEAN Free Trade Area (CAFTA) reduced regional trade policy uncertainty.

Empirical analysis treats CAFTA as an exogenous policy shock and measures a decline in regional trade policy uncertainty using firm‑ and trade‑level data from the China Industrial Enterprise Database and China Customs Database covering 2000–2014; identification via difference‑in‑differences (DID). (Sample sizes not specified in provided summary.)

high negative How regional trade policy uncertainty affects agricultural i... regional trade policy uncertainty (measured at regional/firm level)
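
The difference-in-differences logic mentioned above can be illustrated with its simplest two-group, two-period form. This is only a sketch: the paper's actual regression specification, controls, and data are not given in the summary, and every number and name below is hypothetical.

```python
def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Two-by-two difference-in-differences on group means.

    Change in the treated group minus change in the control group.
    """
    return (treated_post - treated_pre) - (control_post - control_pre)


# Hypothetical mean uncertainty index values (not from the paper).
effect = did_estimate(
    treated_pre=0.60, treated_post=0.35,   # exposed to the policy shock
    control_pre=0.58, control_post=0.50,   # not exposed
)
print(round(effect, 2))
```

A negative estimate here would correspond to uncertainty falling more in the exposed group than in the comparison group over the same period.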

Securitization of economic dependencies—especially in strategic sectors (semiconductors, telecoms, cloud)—frames partner states as security risks and exposes them to blacklists, de-risking campaigns, and sudden loss of market access.

Process tracing of export controls and blacklisting episodes; chronologies of sanction/policy actions affecting firms and partners; policy documents and public lists (e.g., export-control lists). (Data sources: export-control lists, sanction policy documents, corporate/access denials; sample sizes not specified.)

high negative China-US Trade War and the Challenges for Developing Countri... incidence of blacklisting/sanctions affecting partners, sudden changes in market...

Large-scale AI models have significant energy and resource costs, creating a notable environmental footprint that must be addressed.

Narrative integration of prior empirical studies measuring compute, energy consumption, and embodied emissions of large models (cited literature); the review does not present new quantitative measurements itself.

high negative The Evolution and Societal Impact of Artificial Intelligence... energy consumption, carbon emissions, and resource use associated with large-sca...

As AI is deployed in safety-critical domains, reliability, regulation, and human-oriented system design become essential to avoid harms.

Review of literature on safety-critical systems, human–machine interaction studies, and regulatory policy discussions; the paper reports this as a consensus implication rather than presenting new empirical tests.

high negative The Evolution and Societal Impact of Artificial Intelligence... system reliability/safety and risk of harm in safety-critical deployments

Stronger empirical evidence is needed on how hazard, exposure, and vulnerability interact across space and time to shape aggregated multi-risks.

Evaluation of project activities and case studies identifying gaps in empirical spatio-temporal analyses of interacting risk components; synthesis recommends targeted empirical work.

high negative Reducing risk together: moving towards a more holistic appro... empirical understanding of spatio-temporal interactions among hazard, exposure, ...

The current literature is skewed toward descriptive and engineering work; there is a lack of causal, field‑experimental evidence on NLP interventions' effects on customer behavior and firm profits.

Review coding of study types in the sample (engineering/descriptive vs. experimental/causal) showing few field experiments or causal designs.

high negative Natural language processing in bank marketing: a systematic ... presence vs. absence of causal/experimental studies measuring effects on custome...

Important gaps include customer acquisition, personalization at scale, use of external text sources (social media, news, reviews), operational process improvement, and cross‑channel integration.

Gap detection via low‑density regions in the UMAP thematic map of sentence‑transformer embeddings and manual review showing low article counts for these topics within the 109‑article sample.

high negative Natural language processing in bank marketing: a systematic ... topical coverage by customer journey stage and source type (acquisition, persona...

Existing literature on NLP in marketing is concentrated around customer retention tasks (e.g., churn prediction, complaint handling, relationship management).

Thematic clustering from sentence‑transformer embeddings of article text combined with UMAP visualization, and manual review of article topics and keywords identifying frequent retention‑related themes.

high negative Natural language processing in bank marketing: a systematic ... topical frequency/coverage by customer journey stage (retention)

NLP applications in bank marketing are severely under‑studied.

Descriptive result from the PRISMA review showing only 8/109 articles focused on NLP in bank marketing (≈7%), plus thematic mapping showing sparse coverage in bank‑marketing/NLP intersection.

high negative Natural language processing in bank marketing: a systematic ... proportion and absolute count of studies at the intersection of NLP and bank mar...

AI‑enabled platforms can magnify winner‑takes‑most dynamics in digital services trade, concentrating market power.

Theoretical and empirical literature on network effects and platform markets reviewed in the paper; illustrative examples (no novel empirical aggregation).

high negative Analysis of Digital Services Trade and Export Competitivenes... market concentration / competition in digital services

Current data governance regimes in China can impede cross‑border data flows.

Comparative policy analysis and literature documenting data localization and privacy/regulatory regimes that restrict flows (descriptive evidence in the review).

high negative Analysis of Digital Services Trade and Export Competitivenes... volume/feasibility of cross‑border data flows

Institutional barriers—fragmented international rules on data flows and privacy, regulatory divergence including data localization, weak participation in multilateral rule setting, and uneven domestic regulation of platforms—impede digital services trade.

Comparative policy analysis and literature review, supported by policy documents and case examples (qualitative evidence; no original econometric tests).

high negative Analysis of Digital Services Trade and Export Competitivenes... cross‑border digital services trade / export competitiveness

Problem C is the practical difficulty of attributing responsibility and agency across distributed socio-technical systems (robots, algorithms, institutions, humans).

Conceptual diagnosis developed in the paper and exemplified with vignettes from three application domains; defined as an analytic concept rather than empirically measured.

high negative Examining ethical challenges in human–robot interaction usin... ability to attribute responsibility/agency in distributed socio-technical system...

Jurisdictions are taking divergent policy approaches (e.g., U.S. emphasis on innovation/competition, EU emphasis on rights/standards like GDPR), producing fragmented digital trade rules.

Comparative legal and policy analysis of existing national/regional rules and international instruments (examples cited include GDPR and U.S. policy orientations); descriptive, with specific regulatory texts analyzed.

high negative Path Analysis of Digital Economy and Reconstruction of Inter... regulatory fragmentation / interoperability of digital trade rules

AI creates novel non-tariff frictions, e.g., pressures toward data localization and regulatory requirements for algorithmic transparency.

Comparative legal and policy analysis of emerging regulations (e.g., data localization laws, algorithmic regulation initiatives) and illustrative jurisdictional examples.

high negative Path Analysis of Digital Economy and Reconstruction of Inter... non-tariff regulatory frictions (data-flow restrictions, transparency/compliance...

Vietnam's civil-law features—statutory specificity, formal procedures, and constitutional principles like legal certainty and fairness—make straightforward AI deployment legally fraught.

Close textual analysis of Vietnam's statutes, constitutional provisions, and administrative procedures (doctrinal legal analysis); no quantitative sample.

high negative ARTIFICIAL INTELLIGENCE AND ADMINISTRATIVE GOVERNANCE: A CRI... legal compatibility of AI deployment (degree of legal obstacles to deployment)

Automated decisions complicate assigning responsibility and hinder judicial and administrative reviewability.

Doctrinal examination of accountability and review mechanisms in administrative law plus comparative institutional analysis of automated decision-making governance.

high negative ARTIFICIAL INTELLIGENCE AND ADMINISTRATIVE GOVERNANCE: A CRI... clarity of accountability (ability to assign responsibility) and effectiveness o...

Opaque AI models risk violating notice, reason-giving, and appeal rights protected under administrative due process.

Analysis of procedural due-process requirements (notice, reason-giving, appeal) in Vietnam's legal framework and assessment of opacity issues in algorithmic systems; qualitative reasoning, no empirical testing.

high negative ARTIFICIAL INTELLIGENCE AND ADMINISTRATIVE GOVERNANCE: A CRI... compliance with due-process requirements (notice, reasons, appealability)

Provider incentives may be misaligned (e.g., optimizing for engagement or test performance instead of durable learning), requiring contracts, regulation, or purchaser design to align incentives.

Consensus from interdisciplinary workshop (50 scholars) highlighting incentive risks and market-design considerations; descriptive, not empirical.

high negative The Future of Feedback: How Can AI Help Transform Feedback t... provider optimization metrics (engagement/test performance) vs. durable learning...

Extensive learner data needed to personalize AI feedback raises privacy and data-governance concerns (consent, storage, usage).

Qualitative consensus from workshop participants (50 scholars) noting data-collection requirements and governance risks; no empirical governance studies included.

high negative The Future of Feedback: How Can AI Help Transform Feedback t... volume/type of learner data collected; privacy risk indicators; compliance with ...

Automated feedback may not capture pedagogical nuances expert teachers use (motivation, socio-emotional cues, complex reasoning), limiting pedagogical fit.

Expert syntheses from the workshop of 50 scholars highlighting limits of automation relative to expert teacher judgment; no empirical comparisons presented.

high negative The Future of Feedback: How Can AI Help Transform Feedback t... coverage of socio-emotional and complex-reasoning cues in feedback; corresponden...

AI-generated feedback can be incorrect, misleading, or misaligned with learning objectives; assessing feedback quality is nontrivial.

Repeated concern raised across workshop participants (50 scholars) in qualitative synthesis; noted as a substantive risk and open challenge rather than empirically quantified here.

high negative The Future of Feedback: How Can AI Help Transform Feedback t... feedback factual correctness; alignment with stated learning objectives; rate of...

Exposure to top-rated exemplar papers produced large reductions in interquartile range (IQR) of estimates—within converging measure families, IQR fell by roughly 80–99%.

Stage 3 of the protocol: after agents were shown top-rated exemplar papers, measured within-measure-family IQRs of agents' estimates decreased substantially; reported quantitative reduction range of 80%–99% within measure families that converged.

high negative Nonstandard Errors in AI Agents percentage reduction in interquartile range (IQR) of effect estimates within mea...
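
A percentage reduction in IQR of this kind can be computed as one minus the ratio of the post-exposure IQR to the pre-exposure IQR. The sketch below uses Python's standard library with its default quantile method; the estimates are invented for illustration, and the paper's quantile convention is not stated in this excerpt.

```python
from statistics import quantiles


def iqr(values):
    """Interquartile range using statistics.quantiles (default method)."""
    q1, _median, q3 = quantiles(values, n=4)
    return q3 - q1


def iqr_reduction_pct(before, after):
    """Percentage by which the IQR shrinks from `before` to `after`."""
    base = iqr(before)
    if base == 0:
        raise ValueError("baseline IQR is zero; reduction is undefined")
    return 100.0 * (1.0 - iqr(after) / base)


# Hypothetical effect estimates from eight agents, before and after exposure.
before = [0.1, 0.4, 0.9, 1.5, 2.2, 3.0, 3.8, 4.5]
after = [1.9, 2.0, 2.0, 2.1, 2.1, 2.2, 2.2, 2.3]
reduction = iqr_reduction_pct(before, after)
print(f"{reduction:.1f}% reduction in IQR")
```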

Integration costs—domain modeling, human-in-the-loop protocols, and regulatory/liability frameworks—are significant barriers to deployment.

Conceptual assessment of operational and regulatory requirements; no quantified cost studies provided.

high negative Argumentative Human-AI Decision-Making: Toward AI Agents Tha... implementation cost and organizational burden for deploying argumentative AI sys...

AFs and LLMs may be gamed or misled; adversaries may exploit systems leading to strategic argumentation or manipulation.

Conceptual security/adversarial concern based on known vulnerabilities in ML and strategic behavior; no adversarial tests reported.

high negative Argumentative Human-AI Decision-Making: Toward AI Agents Tha... system vulnerability metrics / susceptibility to adversarial manipulation

Faithful extraction—aligning LLM-extracted arguments with formal AF primitives and ensuring fidelity to source evidence—is a key technical challenge.

Paper's explicit identification of failure modes and alignment issues; grounded in documented limitations of IE/LLMs (no empirical quantification here).

high negative Argumentative Human-AI Decision-Making: Toward AI Agents Tha... fidelity/alignment error rate between extracted elements and source evidence

Computational argumentation approaches have required heavy feature engineering and domain-specific knowledge to be effective.

Conceptual claim grounded in prior work and practical experience reported in the literature; no quantitative cost estimates provided in the paper.

high negative Argumentative Human-AI Decision-Making: Toward AI Agents Tha... engineering cost / domain modeling effort required for AF-based systems

Automation bias (human tendency to defer to automated outputs) compounds the risk that GLAI errors become embedded in legal processes.

Behavioral literature review on automation bias and trust in AI systems; applied to legal-context vignettes. No primary empirical test within the paper.

high negative Why Avoid Generative Legal AI Systems? Hallucination, Overre... likelihood of human operators deferring to GLAI outputs (automation bias effect)

A key architectural risk is interoperability failure and fragmentation across vendors and protocols in agent ecosystems.

Comparative analysis with IoT and other platform histories showing vendor/protocol fragmentation; argument is conceptual and illustrative rather than empirically measured for future agent ecosystems.

high negative The Internet of Physical AI Agents: Interoperability, Longev... degree of interoperability and fragmentation across vendors/protocols

Affected domains such as disaster response, healthcare, industrial automation, and mobility are safety‑critical: failures there carry high social and economic costs.

Domain examples and policy reasoning; draws on general knowledge about those sectors and potential harms; no new empirical damage quantification provided in the paper.

high negative The Internet of Physical AI Agents: Interoperability, Longev... social and economic costs of failures in safety‑critical domains

IoT digitized perception at scale but exposed limitations such as fragmentation, weak security, limited autonomy, and poor sustainability.

Historical and comparative analysis of IoT deployments and literature cited illustratively in the paper; qualitative evidence from prior IoT incidents and ecosystem studies rather than new empirical data.

high negative The Internet of Physical AI Agents: Interoperability, Longev... levels of fragmentation, security robustness, autonomy, and sustainability in Io...

A single malicious or compromised LLM agent with high stubbornness and persuasive power can trigger a persuasion cascade that steers the collective opinion of a multi-agent LLM system (MAS).

Theoretical analysis using the Friedkin–Johnsen (FJ) opinion-formation model (analysis of fixed points and influence propagation) plus simulation experiments mapping LLM-MAS interactions to FJ dynamics across multiple network topologies and attacker profiles. (The paper reports simulation results; exact sample sizes are not given in the provided summary.)

high negative Don't Trust Stubborn Neighbors: A Security Framework for Age... extent of adversarial sway / shift in collective opinion (final consensus and op...
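
The cascade described above can be illustrated with the standard Friedkin–Johnsen update, in which each agent mixes its neighbours' current opinions with its own initial opinion. This is a minimal sketch under assumed parameters: the five-agent network, the weights, and the susceptibility values are hypothetical, and how the paper maps LLM interactions onto these quantities is not shown in the summary.

```python
def fj_step(opinions, initial, weights, susceptibility):
    """One Friedkin-Johnsen update.

    x_i <- s_i * sum_j(w_ij * x_j) + (1 - s_i) * x_i(0),
    where s_i is susceptibility (1 - stubbornness) and rows of w sum to 1.
    """
    updated = []
    for i, row in enumerate(weights):
        social = sum(w * x for w, x in zip(row, opinions))
        s = susceptibility[i]
        updated.append(s * social + (1.0 - s) * initial[i])
    return updated


def simulate(initial, weights, susceptibility, steps=200):
    opinions = list(initial)
    for _ in range(steps):
        opinions = fj_step(opinions, initial, weights, susceptibility)
    return opinions


n = 5                   # agent 0 is the attacker, agents 1..4 are honest
attacker_weight = 0.5   # hypothetical "persuasive power": weight others give it
peer_weight = (1.0 - attacker_weight) / (n - 2)

weights = [[1.0] + [0.0] * (n - 1)]  # attacker listens only to itself
for i in range(1, n):
    weights.append(
        [attacker_weight] + [0.0 if j == i else peer_weight for j in range(1, n)]
    )

initial = [1.0] + [0.0] * (n - 1)         # attacker pushes opinion 1.0
susceptibility = [0.0] + [0.9] * (n - 1)  # attacker is fully stubborn

final = simulate(initial, weights, susceptibility)
print([round(x, 3) for x in final])
```

In this toy setup the honest agents start at 0.0 and settle close to the attacker's position, even though each keeps some attachment to its initial opinion.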

Static ACLs evaluate deterministic rules that ignore partial execution paths and therefore can only capture a subset of organizational constraints.

Formal argument and examples showing static ACLs map to Policy functions that do not depend on partial_path; illustrative limitations presented.

high negative Runtime Governance for AI Agents: Policies on Paths coverage of organizational constraints by static ACLs (proportion of constraints...

Runtime evaluation imposes additional compute, latency, logging, and engineering costs that increase the marginal cost of deploying agents.

Operational discussion in the paper outlining additional runtime compute and logging requirements; cost implications argued qualitatively; no empirical cost measurements provided.

high negative Runtime Governance for AI Agents: Policies on Paths marginal deployment cost (compute/latency/engineering overhead)

Prompt-level instructions and static access control lists (ACLs) are limited special cases of a more general runtime policy-evaluation framework and cannot, in general, enforce path-dependent rules.

Formalization showing prompt/system messages and static ACLs map to restricted forms of the Policy(agent_id, partial_path, proposed_action, org_state) function; logical proof/argument in the paper and illustrative counterexamples.

high negative Runtime Governance for AI Agents: Policies on Paths ability to detect/enforce path-dependent policy violations (yes/no / coverage of...
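
The contrast between a static ACL and a path-dependent rule can be sketched using the Policy(agent_id, partial_path, proposed_action, org_state) signature quoted above. Everything else here is an assumption made for illustration: the boolean allow/deny return type, the action names, and the helper names are hypothetical and not taken from the paper.

```python
from typing import Callable, Mapping, Sequence, Set

# Assumed shape: returns True to allow the proposed action, False to deny it.
Policy = Callable[[str, Sequence[str], str, Mapping[str, object]], bool]


def make_static_acl(allowed: Mapping[str, Set[str]]) -> Policy:
    """Static ACL expressed as a Policy that never looks at partial_path."""
    def policy(agent_id, partial_path, proposed_action, org_state):
        return proposed_action in allowed.get(agent_id, set())
    return policy


def no_exfiltration(agent_id, partial_path, proposed_action, org_state):
    """Path-dependent rule: no external email after reading a confidential file."""
    if proposed_action == "send_external_email":
        return "read_confidential_file" not in partial_path
    return True


acl = make_static_acl(
    {"agent-1": {"read_confidential_file", "send_external_email"}}
)
path_so_far = ["read_confidential_file"]

acl_decision = acl("agent-1", path_so_far, "send_external_email", {})
path_decision = no_exfiltration("agent-1", path_so_far, "send_external_email", {})
print(acl_decision, path_decision)
```

Both actions are individually permitted by the ACL, so only the rule that inspects the partial path can reject the sequence.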

LLM-based agent behavior is non-deterministic and path-dependent: an agent's safety/compliance risk depends on the entire execution path, not just the current prompt or single action.

Formal/abstract execution model defined in the paper (states, actions, execution paths) and conceptual arguments/illustrative examples showing how earlier states/actions affect later behavior; no large-scale empirical dataset reported.

high negative Runtime Governance for AI Agents: Policies on Paths path-dependent compliance/safety risk (probability of policy violation condition...

Qualitative case studies show modality-specific failures, such as correct entity recognition but wrong factual attribute.

Paper includes qualitative examples/case studies from the benchmark where models identify entities in images correctly but produce incorrect time-sensitive attributes (e.g., current officeholder or company status).

high negative V-DyKnow: A Dynamic Benchmark for Time-Sensitive Knowledge i... case-study examples of modality-specific failure modes
