Evidence (6869 claims)
Adoption: 8570 claims
Productivity: 7631 claims
Governance: 6869 claims
Human-AI Collaboration: 6491 claims
Org Design: 4175 claims
Innovation: 4114 claims
Labor Markets: 3566 claims
Skills & Training: 2966 claims
Inequality: 2066 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 758 | 199 | 100 | 900 | 2007 |
| Governance & Regulation | 826 | 400 | 191 | 122 | 1563 |
| Organizational Efficiency | 777 | 193 | 124 | 84 | 1189 |
| Technology Adoption Rate | 635 | 233 | 124 | 97 | 1098 |
| Research Productivity | 422 | 128 | 57 | 336 | 954 |
| Output Quality | 476 | 179 | 59 | 47 | 761 |
| Decision Quality | 328 | 177 | 81 | 47 | 640 |
| Firm Productivity | 435 | 57 | 88 | 20 | 606 |
| AI Safety & Ethics | 218 | 277 | 65 | 33 | 599 |
| Market Structure | 180 | 170 | 123 | 24 | 502 |
| Task Allocation | 213 | 64 | 72 | 33 | 387 |
| Skill Acquisition | 170 | 61 | 61 | 17 | 309 |
| Innovation Output | 203 | 27 | 43 | 18 | 292 |
| Employment Level | 105 | 54 | 107 | 13 | 281 |
| Fiscal & Macroeconomic | 131 | 69 | 43 | 26 | 276 |
| Consumer Welfare | 117 | 63 | 42 | 11 | 233 |
| Firm Revenue | 153 | 48 | 26 | 3 | 230 |
| Task Completion Time | 173 | 31 | 8 | 12 | 225 |
| Inequality Measures | 44 | 122 | 49 | 6 | 221 |
| Worker Satisfaction | 89 | 65 | 22 | 12 | 188 |
| Error Rate | 69 | 92 | 10 | 2 | 173 |
| Regulatory Compliance | 77 | 69 | 14 | 5 | 165 |
| Automation Exposure | 56 | 56 | 26 | 13 | 154 |
| Training Effectiveness | 94 | 21 | 13 | 19 | 149 |
| Wages & Compensation | 77 | 36 | 25 | 6 | 144 |
| Team Performance | 86 | 17 | 27 | 10 | 141 |
| Developer Productivity | 95 | 17 | 14 | 6 | 133 |
| Job Displacement | 12 | 80 | 20 | 1 | 113 |
| Hiring & Recruitment | 52 | 7 | 8 | 3 | 70 |
| Creative Output | 31 | 18 | 8 | 3 | 61 |
| Skill Obsolescence | 5 | 46 | 6 | 1 | 58 |
| Social Protection | 27 | 16 | 8 | 2 | 53 |
| Labor Share of Income | 17 | 19 | 17 | — | 53 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
Governance
Framing a change as bug-free reduces vulnerability detection rates by 16–93%.
Result reported from Study 1 controlled experiments across models and framing conditions (250 CVE pairs).
LLM-generated peer reviews place significantly less weight on clarity and significance of the research.
Comparative analysis between LLM-generated reviews and human reviews from the conference dataset; the difference is reported as statistically significant, but exact statistics and sample size are not provided in the excerpt.
Significantly more heavy LLM users reported that the writing was less creative and not in their voice.
Self-reported measures from participants in the human user study comparing heavy LLM users to others; no sample size or exact statistics provided in the excerpt.
In Chicago, the model shows moderate under-detection of Black residents with DIR equal to 0.22.
Reported DIR value from simulation results on Chicago 2022 data.
It is impractical to uniformly apply an alignment method across diverse, independently developed AI models in strategic settings.
Paper assertion / motivating argument (stated as motivation for investigating zero-shot Nash-like behavior); not presented as an empirical finding within the paper.
The crowding-out effect of AI washing on green innovation is heterogeneous: private enterprises, small and medium-sized enterprises (SMEs), and firms in highly competitive sectors suffer more severe negative impacts.
Subgroup/heterogeneity analysis reported in the paper on the same sample of Chinese A-share listed companies (2006–2024); abstract identifies private firms, SMEs, and firms in highly competitive industries as more affected.
The negative relationship between AI washing and green innovation is transmitted through dual channels in both product and capital markets.
Mechanism analysis reported in the paper (presumably mediation or channel analysis) using the same dataset of Chinese A-share firms' annual reports and firm-level market data; abstract states product- and capital-market channels convey the crowding-out effect.
Corporate AI washing exerts a significant crowding-out effect on green innovation.
Empirical analysis using semantic measures of 'AI washing' derived from large language model (LLM) analysis of annual reports for Chinese A-share listed companies (2006–2024); the paper reports a statistically significant negative relationship between AI washing and firms' green innovation (details of the regression models are not provided in the abstract).
Exclusion-based cohesion can produce state-contingent illusory precision, effective input concentration, and dynamic lock-in simultaneously: these phenomena co-occur under the same parameter regimes of the model.
Analytical model results showing co-occurrence of multiple adverse phenomena (bias that grows in tails, illusory precision, input concentration, lock-in) under the same exclusion mechanisms; derived within the paper's theoretical framework.
When the anchor belief is updated from internally filtered aggregates, the system can exhibit dynamic lock-in: delayed recognition of regime shifts followed by abrupt correction.
Analytical dynamics studied in the model when anchor updates depend on filtered (excluded) aggregates; derivations demonstrate delayed detection and abrupt adjustments. This is a theoretical/dynamical model result with no empirical data.
Exclusion leads to effective concentration of decision inputs: the effective number of independent inputs falls below the nominal participant count.
Model-derived analytic result showing that report shrinkage and discarding reduce effective information contributions, quantified relative to nominal participation in the theoretical framework. No empirical sample.
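The gap between nominal and effective inputs can be illustrated with a short sketch. The summary does not say which measure the paper uses, so the inverse-Herfindahl formula below is an assumption, and the weights are hypothetical numbers chosen only for illustration.

```python
def effective_inputs(weights):
    """Inverse-Herfindahl effective count: (sum of w)^2 / (sum of w^2).

    Assumed measure for illustration; not necessarily the paper's definition.
    """
    total = sum(weights)
    squares = sum(w * w for w in weights)
    if squares == 0:
        raise ValueError("at least one weight must be non-zero")
    return total * total / squares


# Hypothetical weights for 10 nominal participants.
full_inclusion = [1.0] * 10                          # every report counted equally
with_exclusion = [1.0] * 4 + [0.2] * 3 + [0.0] * 3   # some shrunk, some discarded

n_full = effective_inputs(full_inclusion)
n_excl = effective_inputs(with_exclusion)
print(f"nominal=10, full inclusion={n_full:.2f}, with exclusion={n_excl:.2f}")
```

Under these assumed weights, shrinking three reports and discarding three others roughly halves the effective count while the nominal participant count stays at 10.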
Exclusion-based cohesion induces 'illusory precision': observed disagreement can fall while actual estimation error in tail regimes rises (i.e., lower recorded variance despite higher true error).
Theoretical result derived from the signal-aggregation model showing a regime in which filtered reports reduce observed variance even as tail-regime estimation error increases. No empirical validation provided.
Relative to a full-inclusion benchmark, exclusion-based cohesion produces state-contingent bias that is small in normal regimes but grows sharply under regime displacement (tail events).
Analytical comparisons between the exclusion model and a full-inclusion benchmark within the theoretical model; derivations showing bias as a function of regime and exclusion parameters. The result is from model analysis, not empirical data.
The establishment of the China–ASEAN Free Trade Area (CAFTA) reduced regional trade policy uncertainty.
Empirical analysis treats CAFTA as an exogenous policy shock and measures a decline in regional trade policy uncertainty using firm‑ and trade‑level data from the China Industrial Enterprise Database and China Customs Database covering 2000–2014; identification via difference‑in‑differences (DID). (Sample sizes not specified in provided summary.)
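As a reading aid, the core difference-in-differences comparison can be sketched in its simplest two-group, two-period form. The paper's actual specification is not given in the summary, and the uncertainty values below are hypothetical placeholders, not results from the study.

```python
def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """2x2 difference-in-differences: change in the treated group
    minus change in the control group."""
    return (treated_post - treated_pre) - (control_post - control_pre)


# Hypothetical group means of a trade-policy-uncertainty index.
effect = did_estimate(
    treated_pre=0.50, treated_post=0.30,   # exposed to the policy shock
    control_pre=0.48, control_post=0.44,   # not exposed
)
print(f"DID estimate: {effect:+.2f}")
```

A negative estimate in this toy setup would correspond to uncertainty falling more for the exposed group than for the comparison group, which is the direction of the claim above.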
Securitization of economic dependencies—especially in strategic sectors (semiconductors, telecoms, cloud)—frames partner states as security risks and exposes them to blacklists, de-risking campaigns, and sudden loss of market access.
Process tracing of export controls and blacklisting episodes; chronologies of sanction/policy actions affecting firms and partners; policy documents and public lists (e.g., export-control lists). (Data sources: export-control lists, sanction policy documents, corporate/access denials; sample sizes not specified.)
Large-scale AI models have significant energy and resource costs, creating a notable environmental footprint that must be addressed.
Narrative integration of prior empirical studies measuring compute, energy consumption, and embodied emissions of large models (cited literature); the review does not present new quantitative measurements itself.
As AI is deployed in safety-critical domains, reliability, regulation, and human-oriented system design become essential to avoid harms.
Review of literature on safety-critical systems, human–machine interaction studies, and regulatory policy discussions; the paper reports this as a consensus implication rather than presenting new empirical tests.
Stronger empirical evidence is needed on how hazard, exposure, and vulnerability interact across space and time to shape aggregated multi-risks.
Evaluation of project activities and case studies identifying gaps in empirical spatio-temporal analyses of interacting risk components; synthesis recommends targeted empirical work.
The current literature is skewed toward descriptive and engineering work; there is a lack of causal, field‑experimental evidence on NLP interventions' effects on customer behavior and firm profits.
Review coding of study types in the sample (engineering/descriptive vs. experimental/causal) showing few field experiments or causal designs.
Important gaps include customer acquisition, personalization at scale, use of external text sources (social media, news, reviews), operational process improvement, and cross‑channel integration.
Gap detection via low‑density regions in the UMAP thematic map of sentence‑transformer embeddings and manual review showing low article counts for these topics within the 109‑article sample.
Existing literature on NLP in marketing is concentrated around customer retention tasks (e.g., churn prediction, complaint handling, relationship management).
Thematic clustering from sentence‑transformer embeddings of article text combined with UMAP visualization, and manual review of article topics and keywords identifying frequent retention‑related themes.
NLP applications in bank marketing are severely under‑studied.
Descriptive result from the PRISMA review showing only 8/109 articles focused on NLP in bank marketing (≈7%), plus thematic mapping showing sparse coverage in bank‑marketing/NLP intersection.
AI‑enabled platforms can magnify winner‑takes‑most dynamics in digital services trade, concentrating market power.
Theoretical and empirical literature on network effects and platform markets reviewed in the paper; illustrative examples (no novel empirical aggregation).
Current data governance regimes in China can impede cross‑border data flows.
Comparative policy analysis and literature documenting data localization and privacy/regulatory regimes that restrict flows (descriptive evidence in the review).
Institutional barriers—fragmented international rules on data flows and privacy, regulatory divergence including data localization, weak participation in multilateral rule setting, and uneven domestic regulation of platforms—impede digital services trade.
Comparative policy analysis and literature review, supported by policy documents and case examples (qualitative evidence; no original econometric tests).
Problem C is the practical difficulty of attributing responsibility and agency across distributed socio-technical systems (robots, algorithms, institutions, humans).
Conceptual diagnosis developed in the paper and exemplified with vignettes from three application domains; defined as an analytic concept rather than empirically measured.
Jurisdictions are taking divergent policy approaches (e.g., U.S. emphasis on innovation/competition, EU emphasis on rights/standards like GDPR), producing fragmented digital trade rules.
Comparative legal and policy analysis of existing national/regional rules and international instruments (examples cited include GDPR and U.S. policy orientations); descriptive, with specific regulatory texts analyzed.
AI creates novel non-tariff frictions, e.g., pressures toward data localization and regulatory requirements for algorithmic transparency.
Comparative legal and policy analysis of emerging regulations (e.g., data localization laws, algorithmic regulation initiatives) and illustrative jurisdictional examples.
Vietnam's civil-law features—statutory specificity, formal procedures, and constitutional principles like legal certainty and fairness—make straightforward AI deployment legally fraught.
Close textual analysis of Vietnam's statutes, constitutional provisions, and administrative procedures (doctrinal legal analysis); no quantitative sample.
Automated decisions complicate assigning responsibility and hinder judicial and administrative reviewability.
Doctrinal examination of accountability and review mechanisms in administrative law plus comparative institutional analysis of automated decision-making governance.
Opaque AI models risk violating notice, reason-giving, and appeal rights protected under administrative due process.
Analysis of procedural due-process requirements (notice, reason-giving, appeal) in Vietnam's legal framework and assessment of opacity issues in algorithmic systems; qualitative reasoning, no empirical testing.
Provider incentives may be misaligned (e.g., optimizing for engagement or test performance instead of durable learning), requiring contracts, regulation, or purchaser design to align incentives.
Consensus from interdisciplinary workshop (50 scholars) highlighting incentive risks and market-design considerations; descriptive, not empirical.
Extensive learner data needed to personalize AI feedback raises privacy and data-governance concerns (consent, storage, usage).
Qualitative consensus from workshop participants (50 scholars) noting data-collection requirements and governance risks; no empirical governance studies included.
Automated feedback may not capture pedagogical nuances expert teachers use (motivation, socio-emotional cues, complex reasoning), limiting pedagogical fit.
Expert syntheses from the workshop of 50 scholars highlighting limits of automation relative to expert teacher judgment; no empirical comparisons presented.
AI-generated feedback can be incorrect, misleading, or misaligned with learning objectives; assessing feedback quality is nontrivial.
Repeated concern raised across workshop participants (50 scholars) in qualitative synthesis; noted as a substantive risk and open challenge rather than empirically quantified here.
Exposure to top-rated exemplar papers produced large reductions in interquartile range (IQR) of estimates—within converging measure families, IQR fell by roughly 80–99%.
Stage 3 of the protocol: after agents were shown top-rated exemplar papers, measured within-measure-family IQRs of agents' estimates decreased substantially; reported quantitative reduction range of 80%–99% within measure families that converged.
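A percentage reduction in IQR of this kind can be computed as in the sketch below. The quantile method (inclusive linear interpolation) and the sample estimates are assumptions for illustration; the summary does not state how the paper computed quartiles.

```python
import statistics


def iqr(values):
    """Interquartile range using inclusive linear interpolation (assumed method)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


def iqr_reduction(before, after):
    """Fractional reduction in IQR from `before` to `after` (0.9 means 90%)."""
    base = iqr(before)
    if base == 0:
        raise ValueError("baseline IQR is zero; reduction is undefined")
    return 1.0 - iqr(after) / base


# Hypothetical agent estimates within one measure family.
before_exemplars = [0.0, 1.0, 2.0, 3.0, 4.0]
after_exemplars = [1.8, 1.9, 2.0, 2.1, 2.2]

reduction = iqr_reduction(before_exemplars, after_exemplars)
print(f"IQR reduction: {reduction:.0%}")
```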
Integration costs—domain modeling, human-in-the-loop protocols, and regulatory/liability frameworks—are significant barriers to deployment.
Conceptual assessment of operational and regulatory requirements; no quantified cost studies provided.
AFs and LLMs may be gamed or misled: adversaries may exploit these systems, resulting in strategic argumentation or manipulation.
Conceptual security/adversarial concern based on known vulnerabilities in ML and strategic behavior; no adversarial tests reported.
Faithful extraction—aligning LLM-extracted arguments with formal AF primitives and ensuring fidelity to source evidence—is a key technical challenge.
Paper's explicit identification of failure modes and alignment issues; grounded in documented limitations of IE/LLMs (no empirical quantification here).
Computational argumentation approaches have required heavy feature engineering and domain-specific knowledge to be effective.
Conceptual claim grounded in prior work and practical experience reported in the literature; no quantitative cost estimates provided in the paper.
Automation bias (human tendency to defer to automated outputs) compounds the risk that GLAI errors become embedded in legal processes.
Behavioral literature review on automation bias and trust in AI systems; applied to legal-context vignettes. No primary empirical test within the paper.
A key architectural risk is interoperability failure and fragmentation across vendors and protocols in agent ecosystems.
Comparative analysis with IoT and other platform histories showing vendor/protocol fragmentation; argument is conceptual and illustrative rather than empirically measured for future agent ecosystems.
Safety‑critical domains such as disaster response, healthcare, industrial automation, and mobility will be affected; failures in these domains carry high social and economic costs.
Domain examples and policy reasoning; draws on general knowledge about those sectors and potential harms; no new empirical damage quantification provided in the paper.
IoT digitized perception at scale but exposed limitations such as fragmentation, weak security, limited autonomy, and poor sustainability.
Historical and comparative analysis of IoT deployments and literature cited illustratively in the paper; qualitative evidence from prior IoT incidents and ecosystem studies rather than new empirical data.
A single malicious or compromised LLM agent with high stubbornness and persuasive power can trigger a persuasion cascade that steers the collective opinion of a multi-agent LLM system (MAS).
Theoretical analysis using the Friedkin–Johnsen (FJ) opinion-formation model (analysis of fixed points and influence propagation) plus simulation experiments mapping LLM-MAS interactions to FJ dynamics across multiple network topologies and attacker profiles. (The provided summary reports simulation results but not exact sample sizes.)
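The cascade mechanism can be illustrated with a minimal sketch of the standard Friedkin–Johnsen update, x(t+1) = Λ W x(t) + (I − Λ) x(0), where Λ holds each agent's susceptibility (one minus stubbornness). The five-agent network, the weights, and the susceptibility values are hypothetical and are not taken from the paper's experiments.

```python
def fj_step(current, initial, weights, susceptibility):
    """One Friedkin-Johnsen update for every agent."""
    n = len(current)
    updated = []
    for i in range(n):
        # Weighted average of neighbours' current opinions (row i of W).
        social = sum(weights[i][j] * current[j] for j in range(n))
        # Blend social influence with the agent's own initial opinion.
        updated.append(
            susceptibility[i] * social + (1.0 - susceptibility[i]) * initial[i]
        )
    return updated


def fj_run(initial, weights, susceptibility, steps=200):
    """Iterate the FJ update for a fixed number of steps."""
    opinions = list(initial)
    for _ in range(steps):
        opinions = fj_step(opinions, initial, weights, susceptibility)
    return opinions


# Hypothetical setup: agent 0 is the attacker, agents 1-4 are ordinary peers.
initial = [1.0, 0.0, 0.0, 0.0, 0.0]          # attacker pushes opinion 1.0
susceptibility = [0.0, 0.9, 0.9, 0.9, 0.9]   # attacker is fully stubborn
attacker_row = [1.0, 0.0, 0.0, 0.0, 0.0]     # attacker listens only to itself
peer_row = [0.6, 0.1, 0.1, 0.1, 0.1]         # peers weight the attacker heavily
weights = [attacker_row] + [peer_row] * 4    # each row sums to 1

final = fj_run(initial, weights, susceptibility)
print([round(x, 3) for x in final])
```

In this toy configuration the four ordinary agents, who all start at 0, settle near 0.84, so a single stubborn and heavily weighted agent pulls the collective opinion most of the way toward its own position.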
Static ACLs evaluate deterministic rules that ignore partial execution paths and therefore can only capture a subset of organizational constraints.
Formal argument and examples showing static ACLs map to Policy functions that do not depend on partial_path; illustrative limitations presented.
Runtime evaluation imposes additional compute, latency, logging, and engineering costs that increase the marginal cost of deploying agents.
Operational discussion in the paper outlining additional runtime compute and logging requirements; cost implications argued qualitatively; no empirical cost measurements provided.
Prompt-level instructions and static access control lists (ACLs) are limited special cases of a more general runtime policy-evaluation framework and cannot, in general, enforce path-dependent rules.
Formalization showing prompt/system messages and static ACLs map to restricted forms of the Policy(agent_id, partial_path, proposed_action, org_state) function; logical proof/argument in the paper and illustrative counterexamples.
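The distinction can be sketched using the `Policy(agent_id, partial_path, proposed_action, org_state)` signature named above. Everything else in the example is hypothetical: the action names, the shape of `org_state`, and the "no external send after reading confidential data" rule are illustrative assumptions, not the paper's implementation.

```python
from typing import Mapping, Sequence, Set

OrgState = Mapping[str, Set[str]]  # hypothetical: agent id -> allowed actions


def static_acl(agent_id: str, partial_path: Sequence[str],
               proposed_action: str, org_state: OrgState) -> bool:
    """Static ACL as a restricted policy: the decision ignores partial_path."""
    return proposed_action in org_state.get(agent_id, set())


def path_policy(agent_id: str, partial_path: Sequence[str],
                proposed_action: str, org_state: OrgState) -> bool:
    """Runtime policy: same ACL check plus one path-dependent rule."""
    if not static_acl(agent_id, partial_path, proposed_action, org_state):
        return False
    # Hypothetical rule: once confidential data has been read on this path,
    # sending anything outside the organization is no longer allowed.
    if proposed_action == "send_external" and "read_confidential" in partial_path:
        return False
    return True


ORG = {"agent-1": {"read_public", "read_confidential", "send_external"}}

# Both actions are individually allowed, so the static ACL permits the sequence.
print(static_acl("agent-1", ["read_confidential"], "send_external", ORG))
# The path-dependent policy blocks the same action on the same path.
print(path_policy("agent-1", ["read_confidential"], "send_external", ORG))
```

Because `static_acl` never inspects `partial_path`, no choice of allowed-action sets can express the rule that `path_policy` enforces, which is the sense in which static ACLs are a limited special case.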
LLM-based agent behavior is non-deterministic and path-dependent: an agent's safety/compliance risk depends on the entire execution path, not just the current prompt or single action.
Formal/abstract execution model defined in the paper (states, actions, execution paths) and conceptual arguments/illustrative examples showing how earlier states/actions affect later behavior; no large-scale empirical dataset reported.
Qualitative case studies show modality-specific failures, such as recognizing an entity correctly but reporting a wrong factual attribute for it.
Paper includes qualitative examples/case studies from the benchmark where models identify entities in images correctly but produce incorrect time-sensitive attributes (e.g., current officeholder or company status).