The Commonplace

Evidence (3224 claims matching the active filter)

Adoption: 7395 claims
Productivity: 6507 claims
Governance: 5877 claims
Human-AI Collaboration: 5157 claims
Innovation: 3492 claims
Org Design: 3470 claims
Labor Markets: 3224 claims
Skills & Training: 2608 claims
Inequality: 1835 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome | Positive | Negative | Mixed | Null | Total
Other | 609 | 159 | 77 | 736 | 1615
Governance & Regulation | 664 | 329 | 160 | 99 | 1273
Organizational Efficiency | 624 | 143 | 105 | 70 | 949
Technology Adoption Rate | 502 | 176 | 98 | 78 | 861
Research Productivity | 348 | 109 | 48 | 322 | 836
Output Quality | 391 | 120 | 44 | 40 | 595
Firm Productivity | 385 | 46 | 85 | 17 | 539
Decision Quality | 275 | 143 | 62 | 34 | 521
AI Safety & Ethics | 183 | 241 | 59 | 30 | 517
Market Structure | 152 | 154 | 109 | 20 | 440
Task Allocation | 158 | 50 | 56 | 26 | 295
Innovation Output | 178 | 23 | 38 | 17 | 257
Skill Acquisition | 137 | 52 | 50 | 13 | 252
Fiscal & Macroeconomic | 120 | 64 | 38 | 23 | 252
Employment Level | 93 | 46 | 96 | 12 | 249
Firm Revenue | 130 | 43 | 26 | 3 | 202
Consumer Welfare | 99 | 51 | 40 | 11 | 201
Inequality Measures | 36 | 105 | 40 | 6 | 187
Task Completion Time | 134 | 18 | 6 | 5 | 163
Worker Satisfaction | 79 | 54 | 16 | 11 | 160
Error Rate | 64 | 78 | 8 | 1 | 151
Regulatory Compliance | 69 | 64 | 14 | 3 | 150
Training Effectiveness | 81 | 15 | 13 | 18 | 129
Wages & Compensation | 70 | 25 | 22 | 6 | 123
Team Performance | 74 | 16 | 21 | 9 | 121
Automation Exposure | 41 | 48 | 19 | 9 | 120
Job Displacement | 11 | 71 | 16 | 1 | 99
Developer Productivity | 71 | 14 | 9 | 3 | 98
Hiring & Recruitment | 49 | 7 | 8 | 3 | 67
Social Protection | 26 | 14 | 8 | 2 | 50
Creative Output | 26 | 14 | 6 | 2 | 49
Skill Obsolescence | 5 | 37 | 5 | 1 | 48
Labor Share of Income | 12 | 13 | 12 | — | 37
Worker Turnover | 11 | 12 | 3 | — | 26
Industry | 1 | — | — | — | 1
(— denotes a cell left blank in the source table.)
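Read programmatically, the matrix supports simple comparisons such as ranking outcome categories by the share of negative findings. A minimal Python sketch, using a few rows copied from the table above (illustrative only, not part of the dashboard):

```python
# A few rows copied from the evidence matrix above:
# outcome -> (positive, negative, mixed, null, total)
matrix = {
    "Job Displacement": (11, 71, 16, 1, 99),
    "Error Rate": (64, 78, 8, 1, 151),
    "Developer Productivity": (71, 14, 9, 3, 98),
}

def negative_share(row):
    """Fraction of a category's claims whose direction of finding is negative."""
    _positive, negative, _mixed, _null, total = row
    return negative / total

# Rank outcomes from most to least negative.
for outcome, row in sorted(matrix.items(), key=lambda kv: -negative_share(kv[1])):
    print(f"{outcome}: {negative_share(row):.0%} negative")
```

Job Displacement tops this subset at roughly 72% negative findings, versus about 14% for Developer Productivity.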
Active filter: Labor Markets
The sentiment-induced divergences lead to unstable and often inflated compensation predictions by the models.
Analysis of model-predicted compensation amounts under sentiment perturbations showing increased variability and upward bias compared to CJOL amounts.
high · negative · LLM Safety in Judicial AI: A Stress Test of Social Media Inf... · predicted compensation amounts (inflation and instability) from LLMs versus CJOL...
Public opinion (social media sentiment) substantially amplifies deviations between LLM outputs and real rulings.
Stress-test experiments in which injected social media sentiment increased the divergence of model outputs from CJOL judgments across the sample.
high · negative · LLM Safety in Judicial AI: A Stress Test of Social Media Inf... · change in deviation between LLM outputs and CJOL rulings when social media senti...
Models exhibit inherent deviations from real rulings.
Empirical comparison of LLM outputs to CJOL judgments showing systematic differences (based on the paper's reported comparisons across the dataset).
high · negative · LLM Safety in Judicial AI: A Stress Test of Social Media Inf... · magnitude and frequency of deviations between LLM outputs and actual court judgm...
Rather than broad job losses, evidence points to a reallocation at the entry level: AI automates tasks typically assigned to junior staff, shifting the nature of entry-level roles.
Synthesis of firm- and task-level empirical studies reported in the brief documenting automation of routine/junior tasks and changes in job-task composition; specific sample sizes vary by cited study and are not provided in the brief.
high · negative · AI, Productivity, and Labor Markets: A Review of the Empiric... · automation of entry-level/junior tasks and changes to entry-level job content
Large-scale AI models have significant energy and resource costs, creating a notable environmental footprint that must be addressed.
Narrative integration of prior empirical studies measuring compute, energy consumption, and embodied emissions of large models (cited literature); the review does not present new quantitative measurements itself.
high · negative · The Evolution and Societal Impact of Artificial Intelligence... · energy consumption, carbon emissions, and resource use associated with large-sca...
As AI is deployed in safety-critical domains, reliability, regulation, and human-oriented system design become essential to avoid harms.
Review of literature on safety-critical systems, human–machine interaction studies, and regulatory policy discussions; the paper reports this as a consensus implication rather than presenting new empirical tests.
high · negative · The Evolution and Societal Impact of Artificial Intelligence... · system reliability/safety and risk of harm in safety-critical deployments
The current literature is skewed toward descriptive and engineering work; causal, field‑experimental evidence on how NLP interventions affect customer behavior and firm profits is lacking.
Review coding of study types in the sample (engineering/descriptive vs. experimental/causal) showing few field experiments or causal designs.
high · negative · Natural language processing in bank marketing: a systematic ... · presence vs. absence of causal/experimental studies measuring effects on custome...
Important gaps include customer acquisition, personalization at scale, use of external text sources (social media, news, reviews), operational process improvement, and cross‑channel integration.
Gap detection via low‑density regions in the UMAP thematic map of sentence‑transformer embeddings and manual review showing low article counts for these topics within the 109‑article sample.
high · negative · Natural language processing in bank marketing: a systematic ... · topical coverage by customer journey stage and source type (acquisition, persona...
Existing literature on NLP in marketing is concentrated around customer retention tasks (e.g., churn prediction, complaint handling, relationship management).
Thematic clustering from sentence‑transformer embeddings of article text combined with UMAP visualization, and manual review of article topics and keywords identifying frequent retention‑related themes.
high · negative · Natural language processing in bank marketing: a systematic ... · topical frequency/coverage by customer journey stage (retention)
NLP applications in bank marketing are severely under‑studied.
Descriptive result from the PRISMA review showing only 8/109 articles focused on NLP in bank marketing (≈7%), plus thematic mapping showing sparse coverage in bank‑marketing/NLP intersection.
high · negative · Natural language processing in bank marketing: a systematic ... · proportion and absolute count of studies at the intersection of NLP and bank mar...
Vietnam's civil-law features—statutory specificity, formal procedures, and constitutional principles like legal certainty and fairness—make straightforward AI deployment legally fraught.
Close textual analysis of Vietnam's statutes, constitutional provisions, and administrative procedures (doctrinal legal analysis); no quantitative sample.
high · negative · ARTIFICIAL INTELLIGENCE AND ADMINISTRATIVE GOVERNANCE: A CRI... · legal compatibility of AI deployment (degree of legal obstacles to deployment)
Automated decisions complicate assigning responsibility and hinder judicial and administrative reviewability.
Doctrinal examination of accountability and review mechanisms in administrative law plus comparative institutional analysis of automated decision-making governance.
high · negative · ARTIFICIAL INTELLIGENCE AND ADMINISTRATIVE GOVERNANCE: A CRI... · clarity of accountability (ability to assign responsibility) and effectiveness o...
Opaque AI models risk violating notice, reason-giving, and appeal rights protected under administrative due process.
Analysis of procedural due-process requirements (notice, reason-giving, appeal) in Vietnam's legal framework and assessment of opacity issues in algorithmic systems; qualitative reasoning, no empirical testing.
high · negative · ARTIFICIAL INTELLIGENCE AND ADMINISTRATIVE GOVERNANCE: A CRI... · compliance with due-process requirements (notice, reasons, appealability)
Provider incentives may be misaligned (e.g., optimizing for engagement or test performance instead of durable learning), requiring contracts, regulation, or purchaser design to align incentives.
Consensus from interdisciplinary workshop (50 scholars) highlighting incentive risks and market-design considerations; descriptive, not empirical.
high · negative · The Future of Feedback: How Can AI Help Transform Feedback t... · provider optimization metrics (engagement/test performance) vs. durable learning...
Extensive learner data needed to personalize AI feedback raises privacy and data-governance concerns (consent, storage, usage).
Qualitative consensus from workshop participants (50 scholars) noting data-collection requirements and governance risks; no empirical governance studies included.
high · negative · The Future of Feedback: How Can AI Help Transform Feedback t... · volume/type of learner data collected; privacy risk indicators; compliance with ...
Automated feedback may not capture pedagogical nuances expert teachers use (motivation, socio-emotional cues, complex reasoning), limiting pedagogical fit.
Expert syntheses from the workshop of 50 scholars highlighting limits of automation relative to expert teacher judgment; no empirical comparisons presented.
high · negative · The Future of Feedback: How Can AI Help Transform Feedback t... · coverage of socio-emotional and complex-reasoning cues in feedback; corresponden...
AI-generated feedback can be incorrect, misleading, or misaligned with learning objectives; assessing feedback quality is nontrivial.
Repeated concern raised across workshop participants (50 scholars) in qualitative synthesis; noted as a substantive risk and open challenge rather than empirically quantified here.
high · negative · The Future of Feedback: How Can AI Help Transform Feedback t... · feedback factual correctness; alignment with stated learning objectives; rate of...
Generalization across domains and long-term robustness to adversarial adaptation require further validation.
Authors explicitly note the need for further validation; the paper's reported experiments do not (in the provided summary) disclose broad domain coverage, longitudinal tests, or adversarial evolution studies.
high · negative · CoMAI: A Collaborative Multi-Agent Framework for Robust and ... · generalization across domains; long-term robustness to adaptive adversaries
A modular system may increase engineering complexity and compute overhead compared to a single LLM endpoint.
Authors' caveat in the paper noting higher engineering and compute costs as a trade-off for modularity; the summary does not provide quantitative cost or latency measurements.
high · negative · CoMAI: A Collaborative Multi-Agent Framework for Robust and ... · engineering complexity and compute/resource overhead
Quality of CoMAI depends on rubric design and on how the finite-state machine and agent prompts are specified.
Authors' noted limitation/caveat in the paper that system performance hinges on rubric and prompt/FSM design choices; this is a qualitative dependency rather than an empirically quantified effect in the summary.
high · negative · CoMAI: A Collaborative Multi-Agent Framework for Robust and ... · assessment quality as a function of rubric/FSM/agent prompt design
Using C.A.P. entails trade-offs: potential increases in latency and compute cost and a risk of over-correction (unnecessary clarification).
Paper explicitly notes these trade-offs as part of the design discussion and proposes measuring latency, compute cost, and unnecessary clarification rate in evaluations; this is an acknowledged design risk rather than an empirically quantified result.
high · negative · A Context Alignment Pre-processor for Enhancing the Coherenc... · response latency, compute cost per session, rate of unnecessary clarifications
Integration costs—domain modeling, human-in-the-loop protocols, and regulatory/liability frameworks—are significant barriers to deployment.
Conceptual assessment of operational and regulatory requirements; no quantified cost studies provided.
high · negative · Argumentative Human-AI Decision-Making: Toward AI Agents Tha... · implementation cost and organizational burden for deploying argumentative AI sys...
AFs and LLMs may be gamed or misled; adversaries may exploit these systems through strategic argumentation or manipulation.
Conceptual security/adversarial concern based on known vulnerabilities in ML and strategic behavior; no adversarial tests reported.
high · negative · Argumentative Human-AI Decision-Making: Toward AI Agents Tha... · system vulnerability metrics / susceptibility to adversarial manipulation
Faithful extraction—aligning LLM-extracted arguments with formal AF primitives and ensuring fidelity to source evidence—is a key technical challenge.
Paper's explicit identification of failure modes and alignment issues; grounded in documented limitations of IE/LLMs (no empirical quantification here).
high · negative · Argumentative Human-AI Decision-Making: Toward AI Agents Tha... · fidelity/alignment error rate between extracted elements and source evidence
Computational argumentation approaches have required heavy feature engineering and domain-specific knowledge to be effective.
Conceptual claim grounded in prior work and practical experience reported in the literature; no quantitative cost estimates provided in the paper.
high · negative · Argumentative Human-AI Decision-Making: Toward AI Agents Tha... · engineering cost / domain modeling effort required for AF-based systems
Automation bias (human tendency to defer to automated outputs) compounds the risk that GLAI errors become embedded in legal processes.
Behavioral literature review on automation bias and trust in AI systems; applied to legal-context vignettes. No primary empirical test within the paper.
high · negative · Why Avoid Generative Legal AI Systems? Hallucination, Overre... · likelihood of human operators deferring to GLAI outputs (automation bias effect)
Current models heavily rely on large static datasets and batch training and exhibit poor lifelong/continual learning.
Synthesis of common practices in contemporary ML (supervised pretraining and offline training paradigms); no new experiments provided.
high · negative · Why AI systems don't learn and what to do about it: Lessons ... · continual learning performance; dependence on dataset size and batch training
Proactive AI at national scale amplifies concerns around transparency, accountability, privacy, and potential misuse, necessitating robust regulatory and ethical frameworks.
Normative and ethical analysis in the paper, supported by general literature on large-scale AI governance; no empirical assessment of regulatory effectiveness in Russia included.
high · negative · DIGITAL TRANSFORMATION OF THE RUSSIAN FEDERATION’S SOCIOECON... · risks to transparency, accountability, privacy and potential for misuse
Aggregating informal and recommendation data raises privacy and consent issues in low-regulation contexts, requiring governance safeguards.
Policy and ethical consideration based on the nature of the data used; no specific privacy-impact assessment reported in the summary.
high · negative · AI-Driven Skill Mapping and Gig Economy Matching Algorithm f... · privacy risk / consent compliance
NLP/ML systems can inherit biases from inputs (underrepresentation, noisy self-reports, biased recommendations) and may therefore disadvantage some youth unless transparency and fairness constraints are implemented.
Reasoned risk assessment grounded in known properties of ML/NLP; the pilot summary does not report an audit or measured bias outcomes.
high · negative · AI-Driven Skill Mapping and Gig Economy Matching Algorithm f... · bias in match outcomes / differential access by demographic group
Randomized controlled trials and longitudinal evaluations are scarce; few studies measure patient-relevant outcomes or economic impacts.
Literature synthesis noting scarcity of RCTs and long-term observational studies, and absence of widespread patient-outcome and cost-effectiveness evaluations in existing publications.
high · negative · Human-AI interaction and collaboration in radiology: from co... · number of RCTs/longitudinal studies, frequency of patient outcome and economic o...
Many published studies focus on standalone algorithm accuracy rather than clinician–AI joint performance in routine workflows.
Review of the literature categorizing study designs (preponderance of algorithm development/validation studies, fewer reader-in-the-loop, simulation, or deployment studies).
high · negative · Human-AI interaction and collaboration in radiology: from co... · proportion of studies reporting standalone algorithm metrics versus those report...
Advanced technologies' complexity and lack of explainability create risks for audit reliability and professional judgement.
Findings from literature synthesis and professional/regulatory perspectives included in the review; presented as an identified risk/challenge rather than quantified effect.
high · negative · Audit 5.0 and the Digital Transformation of Auditing: The Ro... · audit reliability and the exercise of professional judgement in presence of opaq...
Audit 5.0 introduces key challenges: data quality and integration issues, complexity and explainability of advanced technologies, regulatory and ethical uncertainty, and skills shortages combined with cultural resistance.
Systematic literature review and synthesis of professional standards and regulatory perspectives; assertions based on reviewed literature rather than a single empirical dataset.
high · negative · Audit 5.0 and the Digital Transformation of Auditing: The Ro... · barriers to adoption/readiness factors (data quality, explainability, regulatory...
Gaps in infrastructure readiness, digital awareness, and inclusive policy frameworks hinder equitable AI adoption among micro‑enterprises.
Cross‑study synthesis of barriers identified across the 55 included articles; infrastructural, awareness, and policy barriers are explicitly reported as recurring themes.
high · negative · Role of AI in Enhancing Work Efficiency and Opportunities fo... · barriers to AI adoption (infrastructure readiness, digital awareness, policy inc...
Only 24.4% of at-risk workers have viable transition pathways, where 'viable' is defined as sharing at least 3 skills and achieving at least 50% skill transfer.
Analysis of job-to-job transitions on the validated knowledge graph using an operational definition of viable pathways (≥3 shared skills and ≥50% skill transfer); proportion of at-risk workers meeting that criterion reported as 24.4% (underlying at-risk worker count not given in the excerpt).
high · negative · Graph-Based Analysis of AI-Driven Labor Market Transitions: ... · percentage of at-risk workers with viable transition pathways (per defined thres...
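The "viable pathway" threshold above is a simple operational rule, easy to express directly. A minimal sketch under one plausible reading (that "skill transfer" means the share of the source job's skills also present in the target job); the job records are hypothetical illustrations, not drawn from the paper's knowledge graph:

```python
def is_viable_transition(source_skills, target_skills,
                         min_shared=3, min_transfer=0.5):
    """Viability rule from the claim above: the two jobs share at least
    `min_shared` skills, and at least `min_transfer` of the source job's
    skills carry over to the target (one reading of 'skill transfer')."""
    if not source_skills:
        return False
    shared = set(source_skills) & set(target_skills)
    return (len(shared) >= min_shared
            and len(shared) / len(source_skills) >= min_transfer)

# Hypothetical job records for illustration.
clerk = {"data entry", "scheduling", "invoicing", "customer service"}
analyst = {"data entry", "scheduling", "invoicing", "sql", "reporting"}
print(is_viable_transition(clerk, analyst))  # 3 shared skills, 75% transfer -> True
```

Both conditions matter: a pair of jobs can share three skills yet still fail the rule if those skills are a small fraction of the source occupation's skill set.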
20.9% of jobs in the dataset face high automation risk.
Risk classification applied to the jobs represented in the knowledge graph (sample size: 9,978 job postings); proportion of jobs labeled as 'high automation risk' is reported as 20.9%.
high · negative · Graph-Based Analysis of AI-Driven Labor Market Transitions: ... · proportion of jobs classified as high automation risk
Japan's population is shrinking, the working-age share is falling, and the elderly population is growing rapidly.
Statement grounded in official national statistics referenced by the paper (demographic time series used to initialize and calibrate the system dynamics model).
high · negative · Fiscal Dynamics in Japan under Demographic Pressure · total population size; share (%) of working-age population; number and share (%)...
A preregistered, nationally representative replication experiment in the United States (N = 1,200) replicates the causal finding that a labor-replacing (vs. labor-creating) AI frame reduces willingness to politically engage with future AI developments.
Preregistered randomized experiment (nationally representative US sample, N = 1,200) replicating the UK manipulation and measuring willingness to engage politically regarding AI.
high · negative · Perceiving AI as labor-replacing reduces democratic legitima... · willingness to politically engage with future AI developments (self-reported)
A preregistered, nationally representative experiment in the United Kingdom (N = 1,202) shows that exposure to a labor-replacing (vs. labor-creating) AI frame causally reduces trust in democracy.
Preregistered randomized experiment (nationally representative UK sample, N = 1,202) manipulating AI framing (labor-replacing vs. labor-creating) and measuring trust/satisfaction with democratic institutions.
high · negative · Perceiving AI as labor-replacing reduces democratic legitima... · trust in democracy / satisfaction with democratic institutions (post-manipulatio...
Large-scale survey data indicate that the public tends to view AI as labor-replacing rather than labor-creating.
Cross-sectional survey (N = 37,079 respondents across 38 European countries); descriptive analysis of responses about AI's labor market impact.
high · negative · Perceiving AI as labor-replacing reduces democratic legitima... · public perception of AI's labor-market impact (labor-replacing vs. labor-creatin...
Only 12% of gig workers participate in retirement savings programs.
Survey and administrative measures of retirement-savings participation among gig workers in the 24-country sample.
high · negative · The Gig Economy and Labor Market Restructuring: Platform Wor... · proportion of gig workers participating in retirement savings programs (%)
Only 23% of gig workers report access to employer-provided health insurance.
Self-reported benefits coverage from labor force surveys and linked administrative records for gig workers across the 24 OECD countries (2015–2025).
high · negative · The Gig Economy and Labor Market Restructuring: Platform Wor... · proportion of gig workers reporting access to employer-provided health insurance...
Human judgment is constrained by bounded rationality, cognitive biases, and information-processing limitations.
Cited as established findings from prior research across decision sciences and related fields (extensive literature evidence referenced; no new empirical data in this paper's abstract).
high · negative · Reframing Organizational Decision-Making in the Age of Artif... · human judgment accuracy/quality and cognitive processing capacity
Ireland exhibits the largest gender gap in advanced digital task use: approximately 44% of men versus 18% of women perform advanced digital tasks — a 26 percentage point gap, close to double the European average.
Country-level descriptive statistics from ESJS for Ireland reporting shares of men and women performing advanced digital tasks. (Exact Irish sample size not provided in the excerpt.)
high · negative · Squandered skills? Bridging the digital gender skills gap fo... · Share (%) of men and women in Ireland performing advanced digital tasks; gender ...
Across Europe, women are around 15 percentage points less likely than men to perform advanced digital tasks in their jobs.
Empirical analysis of the European Skills and Jobs Survey (ESJS) (Cedefop, 2021) using regression-based estimates and descriptive statistics across European countries. (Exact sample size and country count not provided in the excerpt.)
high · negative · Squandered skills? Bridging the digital gender skills gap fo... · Probability / share of workers performing advanced digital tasks (binary indicat...
AI substitutes many routine tasks, including both manual and cognitive/rule-based activities, disproportionately affecting middle-skill occupations.
Task-based substitution reasoning within SBTC framework and cross-sectoral task analysis. The paper provides conceptual synthesis rather than presenting new microdata or quantified task-level estimates.
high · negative · Artificial Intelligence, Automation, and Employment Dynamics... · employment and wages in routine / middle-skill occupations; task displacement
Key implementation challenges include data quality and integration, model interpretability, cybersecurity and privacy, regulatory/compliance uncertainty, skills gaps among accounting professionals, and implementation costs.
Identified by the paper through literature review and practitioner reports; these are presented as recurring barriers rather than quantified with a specific sample.
high · negative · Role of Artificial Intelligence in the Accounting Sector · incidence/severity of implementation barriers (data quality scores, integration ...
The model identifies two regimes; in the inequality-decreasing regime, AI behaves like a broadly available commodity technology or labor-market institutions share rents widely (high ξ).
Model regime characterization and calibrated counterfactuals showing falling wage dispersion and ΔGini under commodity-like AI assumptions or higher rent-sharing elasticity.
high · negative · When AI Levels the Playing Field: Skill Homogenization, Asse... · wage dispersion and aggregate inequality (ΔGini)
Generative AI compresses within-task skill differences (reduces dispersion of individual task performance).
Theoretical task-based model and calibrated quantitative simulations (Method of Simulated Moments matching six empirical moments) showing reductions in within-task performance dispersion after introducing AI technology.
high · negative · When AI Levels the Playing Field: Skill Homogenization, Asse... · within-task performance dispersion (skill/ability variance within a task)