Evidence (8066 claims)
Claims by topic: Adoption, 5586; Productivity, 4857; Governance, 4381; Human-AI Collaboration, 3417; Labor Markets, 2685; Innovation, 2581; Org Design, 2499; Skills & Training, 2031; Inequality, 1382.
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 417 | 113 | 67 | 480 | 1091 |
| Governance & Regulation | 419 | 202 | 124 | 64 | 823 |
| Research Productivity | 261 | 100 | 34 | 303 | 703 |
| Organizational Efficiency | 406 | 96 | 71 | 40 | 616 |
| Technology Adoption Rate | 323 | 128 | 74 | 38 | 568 |
| Firm Productivity | 307 | 38 | 70 | 12 | 432 |
| Output Quality | 260 | 71 | 27 | 29 | 387 |
| AI Safety & Ethics | 118 | 179 | 45 | 24 | 368 |
| Market Structure | 107 | 128 | 85 | 14 | 339 |
| Decision Quality | 177 | 75 | 37 | 19 | 312 |
| Fiscal & Macroeconomic | 89 | 58 | 33 | 22 | 209 |
| Employment Level | 74 | 34 | 78 | 9 | 197 |
| Skill Acquisition | 98 | 36 | 40 | 9 | 183 |
| Innovation Output | 121 | 12 | 24 | 13 | 171 |
| Firm Revenue | 98 | 35 | 24 | — | 157 |
| Consumer Welfare | 73 | 31 | 37 | 7 | 148 |
| Task Allocation | 87 | 16 | 34 | 7 | 144 |
| Inequality Measures | 25 | 76 | 32 | 5 | 138 |
| Regulatory Compliance | 54 | 61 | 13 | 3 | 131 |
| Task Completion Time | 89 | 7 | 4 | 3 | 103 |
| Error Rate | 44 | 51 | 6 | — | 101 |
| Training Effectiveness | 58 | 12 | 12 | 16 | 99 |
| Worker Satisfaction | 47 | 33 | 11 | 7 | 98 |
| Wages & Compensation | 54 | 15 | 20 | 5 | 94 |
| Team Performance | 47 | 12 | 15 | 7 | 82 |
| Automation Exposure | 27 | 26 | 10 | 6 | 72 |
| Job Displacement | 6 | 39 | 13 | — | 58 |
| Hiring & Recruitment | 40 | 4 | 6 | 3 | 53 |
| Developer Productivity | 34 | 4 | 3 | 1 | 42 |
| Social Protection | 22 | 11 | 6 | 2 | 41 |
| Creative Output | 16 | 7 | 5 | 1 | 29 |
| Labor Share of Income | 12 | 6 | 9 | — | 27 |
| Skill Obsolescence | 3 | 20 | 2 | — | 25 |
| Worker Turnover | 10 | 12 | — | 3 | 25 |
Heterogeneity across universities implies that targeting high-performing institutions and diffusing their practices could be more effective than uniform expansion of AI training.
Observed variation in employment effectiveness, placement outcomes, and wages across the 191 universities; policy implication drawn from comparative performance patterns.
Labor market institutions (unions, collective bargaining), education and training systems, social safety nets, and regulations substantially mediate distributional and aggregate outcomes of AI adoption.
Comparative institutional analysis and equilibrium models linking institutional settings to wage-setting and reallocation dynamics, supported by empirical cross-jurisdiction comparisons where available.
Developing economies face different trade-offs from AI adoption than advanced economies, due to different occupational structures and complementarities.
Comparative analyses and sectoral studies drawing on cross-country microdata and institutional comparisons; theoretical models highlighting differences in task composition and absorptive capacity.
Occupational reallocation occurs: declines in some routine occupations alongside growth in AI-complementary roles (e.g., AI maintenance, oversight, and creative tasks).
Administrative and household employment data analyzed with occupational breakdowns, supplemented by task-mapping methods and panel/event-study approaches documenting shifting occupational shares over time.
Lower-skill roles experience mixed outcomes: some see adverse effects from automation while others benefit where AI is complementary to their tasks.
Microdata analyses and case studies showing heterogeneous effects by task complementarity; task-based exposure measures that differentiate which low-skill tasks are automatable versus augmentable.
AI contributes to wage polarization: earnings grow at the top of the distribution and stagnate or fall for middle occupations.
Wage distribution decompositions and panel regression studies that examine percentile-level wage changes, combined with task-based exposure measures linking AI adoption to differential impacts across the wage distribution.
The employment impact of automation depends crucially on labour-market structure (formal vs informal), availability of alternative employment, and social protections.
Theoretical framing supported by secondary literature comparing institutional contexts and their mediating effects on automation outcomes; no primary causal estimates in this paper.
Standard policy responses focused on retraining and active labor-market programs are necessary but insufficient to fully offset structural job losses where K_T substitutes broadly for tasks.
Model simulations and policy experiments in the calibrated dynamic model comparing scenarios with aggressive retraining versus structural fiscal/interventionist reforms; discussion of empirical limits from case studies and historical reskilling outcomes.
Automation of routine drafting tasks by GLAI may reduce demand for junior drafting labor while increasing demand for skilled reviewers, auditors, and legal technologists.
Labor-market reasoning based on task automation literature and illustrative vignettes; no labor-force survey or longitudinal employment data provided.
Unstructured physical trades and high-stakes caretaking roles exhibit absolute resilience to LLM-driven automation (i.e., very low OAI), a pattern the paper quantifies as a 'Cognitive Risk Asymmetry.'
Empirical classification from computed OAIs showing low exposure for unstructured physical trades and high-stakes caretaking roles; the excerpt does not provide specific OAI values or counts.
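The excerpt does not define how the OAIs were computed, but an occupational automation index of this kind is commonly a task-weighted mean of per-task automation probabilities. A minimal sketch under that assumption (all occupation names, task weights, and probabilities below are invented for illustration, not the paper's data):

```python
# Hypothetical sketch: an occupational automation index (OAI) as the
# task-weighted mean of per-task automation probabilities.
# Weights and probabilities are illustrative, not from the paper.

def oai(tasks):
    """tasks: list of (weight, p_automatable) pairs."""
    total_w = sum(w for w, _ in tasks)
    return sum(w * p for w, p in tasks) / total_w

# Mostly unstructured physical work -> low exposure.
electrician = [(0.6, 0.05), (0.3, 0.10), (0.1, 0.40)]
# Mostly routine drafting/review -> high exposure.
paralegal = [(0.5, 0.80), (0.3, 0.70), (0.2, 0.30)]

print(round(oai(electrician), 2))  # → 0.1
print(round(oai(paralegal), 2))    # → 0.67
```

Under this construction, an occupation dominated by hard-to-automate tasks receives a low OAI regardless of how automatable its minor tasks are, which is the mechanism behind the asymmetry the claim describes.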
Variance-based Human-in-the-Loop (HITL) validation with an expert panel demonstrates a profound cognitive gap: isolated algorithmic probabilities fail to encapsulate the "institutional premium" imposed by experts bounded by professional liability.
Empirical validation procedure reported: variance-based HITL validation involving an expert panel that compared algorithmic scores and expert adjustments, concluding a systematic difference attributed to institutional liability considerations. The excerpt does not give panel size or quantitative variance statistics.
Industry self-regulation has demonstrably failed, motivating the need for IASCA.
Proposal asserts a 'demonstrated failure of industry self-regulation' as rationale for IASCA; no specific empirical studies, incidents, or metrics are cited in the provided text.
Roughly half of the projected LFPR decline to 55% by 2050 is attributable to AI—equivalent to around 10 million lost jobs.
Authors' decomposition/interpretation of conditional forecast results under the rapid scenario reported in the abstract (ties LFPR decline to job-count equivalents).
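The job-count equivalence can be reproduced with back-of-envelope arithmetic. A sketch in which the working-age population and baseline LFPR are hypothetical placeholders, not figures from the paper:

```python
# Back-of-envelope: translate an LFPR decline into a job-count equivalent.
# Population and baseline LFPR are illustrative assumptions.
working_age_pop = 400e6   # assumed working-age population
lfpr_baseline = 0.60      # assumed baseline LFPR
lfpr_2050 = 0.55          # projected 2050 LFPR (from the claim)
ai_share = 0.5            # "roughly half" attributed to AI

total_decline = lfpr_baseline - lfpr_2050          # 5 percentage points
lost_jobs = working_age_pop * total_decline * ai_share
print(f"{lost_jobs / 1e6:.0f} million")            # → 10 million
```

Any combination of baseline LFPR and population consistent with the paper's forecast would scale this figure accordingly; the arithmetic only shows how a percentage-point decomposition maps to a headcount.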
Our findings echo observations of pervasive annotation errors in text-to-SQL benchmarks, suggesting quality issues are systemic in data engineering evaluation.
Comparative claim referencing prior observations in text-to-SQL literature and the authors' audit results on ELT-Bench; no new cross-benchmark quantitative analysis reported in the excerpt.
The measured machine-equivalent work appeared on no financial statement, workforce report, or government statistical return.
Claim about absence of reporting for the deployment's measured work (asserted in the paper for the deployment case).
The AI-as-advisor approach has limitations: people frequently ignore accurate advice, rely too much on inaccurate advice, and their decision-making skills may deteriorate over time.
Paper asserts these limitations in motivation/background and/or derives them from observed behavior in experiments (stated in abstract as known problems with AI-as-advisor).
When given a choice between which information source to give to an AI agent, a large portion of subjects fail to select the more informative one.
Experimental condition where subjects chose which source (prompt vs revealed-preference data) to provide to an AI agent; reported result that a large portion did not choose the more informative source.
The gap in predictive accuracy is driven by subjects' difficulty in translating their own preferences into written instructions.
Further analysis reported in the experiment attributing the observed accuracy gap to subjects' difficulty converting their preferences into prompts (presumably via analysis comparing content of prompts to revealed choices).
The emergence and diffusion of these technologies create an era of labor displacement.
Framed in the paper as a premise motivating policy proposals; presented as a conceptual claim rather than supported by original empirical estimates in the text provided.
Many automotive firms, especially those developing new energy and intelligent vehicles, have suffered financial distress and even exited the market.
Descriptive statement in the paper's introduction/motivation citing observed industry outcomes (financial distress and market exit) among automotive firms focused on NEV and intelligent vehicles.
The dominant mechanism behind the performance drop is a collapse of Type2_Contextual issue detection at config_B, consistent with attention dilution in long contexts.
Analysis of issue-type specific detection rates shows Type2_Contextual detection collapses at config_B; interpretation ties this to attention dilution in longer contexts.
Technological transformation in agentic finance is economically inevitable, and proactive intervention is critically urgent.
Author claim synthesizing the paper's argument and modeling results (normative conclusion based on earlier analysis and assertions, not a validated empirical finding).
Surveillance intensity is associated with hyper-vigilance (reported effect = -4.213).
One of the six propositions from the paper's trilevel framework; the abstract reports an effect value of '-4.213' associated with surveillance intensity → hyper-vigilance.
Platform workers receive 36.3% more third-party ratings than traditional workers.
Quantitative synthesis/summary reported in the paper (no primary sample size in abstract); likely aggregated from included studies.
Platform workers experience 59.6% higher digital speed determination than traditional workers.
Quantitative synthesis/summary reported in the paper (no primary sample size given in the abstract); presumably aggregated from included studies comparing platform and traditional workers.
Our findings surface practical limits on the complexity people can manage in human-AI negotiation.
Synthesis claim based on the empirical study varying number of issues and observed decline in performance beyond three issues; presented as a conceptual/practical implication of the results.
Multiple competing arbitrageurs drive down consumer prices, reducing the marginal revenue of model providers.
Analytic argument and empirical/simulation results reported in the paper showing that competition among arbitrageurs lowers prices faced by consumers and decreases marginal revenue for model providers.
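The competitive mechanism can be illustrated with a stylized Bertrand-style undercutting dynamic. This is a toy sketch, not the paper's model; all prices are in cents and purely hypothetical:

```python
# Stylized Bertrand-style undercutting among competing arbitrageurs.
# All values are in cents and purely hypothetical.
provider_price = 100   # assumed direct price for a model query
arb_cost = 40          # assumed arbitrageur marginal cost (API + serving)
step = 5               # undercutting increment

price = provider_price
while price - step >= arb_cost:
    price -= step      # some arbitrageur undercuts the best standing offer

print(f"equilibrium consumer price: {price} cents")  # pinned to arb_cost
```

Once undercutting halts at the arbitrageurs' marginal cost, the provider's direct margin is capped by that resale price, which is the revenue-compression effect the claim describes.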
Distillation further creates strong arbitrage opportunities, potentially at the expense of the teacher model's revenue.
Experiments or analyses involving model distillation reported in the paper showing that distilled/student models enable profitable arbitrage and may reduce revenue captured by the original teacher model.
The pre-existing AI community dissolved as the tools went mainstream, and the new vocabulary was absorbed into existing careers rather than binding a new occupation.
Interpretation of resume-data patterns: observed dispersion of previously coherent AI practitioners and spread of AI-related vocabulary into other occupational records rather than consolidation into a new occupational cluster.
Beyond an environment-specific optimum, scaling further degrades institutional fitness because trust erosion and cost penalties outweigh marginal capability gains.
Analytical argument from the Institutional Scaling Law together with illustrative examples and discussion of mechanisms (trust erosion, cost penalties) in the paper.
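The inverted-U shape implied by such a scaling law can be sketched with a toy functional form: concave capability gains minus a penalty that grows with scale. The specification and parameters below are hypothetical, not the paper's model:

```python
import math

# Toy institutional-fitness curve: concave capability gains minus a
# cost/trust penalty growing linearly with scale. The functional form
# and parameters are illustrative assumptions.
def fitness(scale, gain=1.0, penalty=0.02):
    return gain * math.log(1 + scale) - penalty * scale

best = max(range(1, 200), key=fitness)
print(best)  # → 49: interior optimum; fitness declines at larger scales
```

Because marginal gains shrink while penalties do not, fitness peaks at an environment-specific scale (here 49 under these toy parameters) and then falls, matching the claim that scaling past the optimum degrades fitness.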
Bias effects vary by vulnerability type, with injection flaws being more susceptible to framing bias than memory corruption bugs.
Subgroup analysis in Study 1 comparing framing sensitivity across vulnerability classes (injection vs memory corruption) within the experiment dataset.
Model convergence in DRL can lead to crowded trades, threatening market stability and motivating a robust regulatory framework that balances innovation against that systemic risk.
Analytical argument in the paper linking convergence/crowding to systemic effects; the excerpt does not include empirical market-impact studies, simulations, or measured incidence rates of crowding.
Deploying DRL at scale requires socio-technical infrastructure considerations including algorithmic governance, systemic risk management, and accounting for the environmental cost of large-scale computational finance.
Conceptual and system-level analysis presented in the paper; no empirical auditing data, carbon-footprint measurements, or governance case studies are provided in the excerpt.
Two sources of spurious performance addressed are memorization bias from ticker-specific pre-training and survivorship bias from flawed backtesting.
Problem identification and methodological focus: the paper names memorization bias and survivorship bias as primary confounders it aims to mitigate. The excerpt does not detail experiments that quantify the magnitude of those biases or the degree to which they were reduced.
Traditional ex ante regulatory approaches struggle to keep pace with AI development, exacerbating the 'pacing problem' and the Collingridge dilemma.
Theoretical/legal literature review and conceptual argument presented in the paper (no empirical sample or quantitative data reported in the abstract).
Low internal conflict or unanimity can be diagnostic of variance depletion (i.e., exclusion) rather than healthy integration, so governance systems should treat low conflict as a potential red flag until heterogeneity integration is verified.
Interpretive policy implication derived from the model's demonstration that exclusionary processes can produce deceptively low observed disagreement while increasing fragility; this recommendation is based on theoretical reasoning without empirical validation in the paper.
Most existing candidate matching systems act as keyword filters, failing to handle skill synonyms and nonlinear careers, resulting in missed candidates and opaque match scores.
Paper's introductory assertion about limitations of most current systems. The excerpt does not cite empirical studies, statistics, or systematic reviews to substantiate this claim.
TDD (test-driven development) prompting alone increased regressions to 9.94%.
Empirical result reported in the paper comparing a TDD prompting intervention against other workflows on the benchmark (values given in the excerpt).
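A regression rate of this kind is typically the share of previously passing tests that fail after the model's patch. A minimal sketch of that bookkeeping (data shape and names are hypothetical, not the benchmark's schema):

```python
# Sketch: regression rate over benchmark tasks.
# A "regression" = a test that passed before the patch but fails after.
# The data shape and test names below are hypothetical.

def regression_rate(runs):
    """runs: list of dicts with sets 'pass_before' and 'pass_after'."""
    regressed = total = 0
    for run in runs:
        total += len(run["pass_before"])
        regressed += len(run["pass_before"] - run["pass_after"])
    return regressed / total if total else 0.0

runs = [
    {"pass_before": {"t1", "t2", "t3"}, "pass_after": {"t1", "t3"}},  # t2 regressed
    {"pass_before": {"t4", "t5"}, "pass_after": {"t4", "t5"}},        # no regressions
]
print(f"{regression_rate(runs):.2%}")  # → 20.00% (1 of 5 previously passing)
```

Measuring the denominator over previously passing tests, rather than all tests, is what distinguishes regression behavior from the resolution rate most benchmarks report.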
Current benchmarks focus almost exclusively on resolution rate, leaving regression behavior under-studied.
Paper's critique of existing benchmark literature and practices (asserted by authors in background; no specific benchmark survey details in the excerpt).
The paper identifies five structural challenges arising from the memory governance gap: memory silos across agent workflows; governance fragmentation across teams and tools; unstructured memories unusable by downstream systems; redundant context delivery in autonomous multi-step executions; and silent quality degradation without feedback loops.
Qualitative analysis and problem framing presented in the paper (authors' identification of five specific challenges).
AI raises managerial cognitive complexity and creates recurring tensions between algorithmic optimisation and systemic, ethical reasoning.
Theoretical synthesis highlighting emergent tensions from integrating computational optimisation with systems thinking and ethical considerations; conceptual, no empirical tests.
Underprovision of verification is likely if left to market forces because information quality has positive externalities and misinformation imposes negative externalities, justifying public funding, subsidies, or regulation.
Economic reasoning and policy implications drawn from the study's findings and the literature on public goods/externalities.
Censorship, restricted data flows, and government interference fragment markets, limit economies of scale, and favor well-resourced, internationally connected actors—widening capacity gaps.
Interpretive economic analysis grounded in observed access constraints and comparative case material across the three platforms.
Limited data access and censorship reduce the efficacy of AI tools by creating training and validation gaps; legal risks complicate use of proprietary platforms and cloud services.
Interviews describing constraints on data availability and legal/operational barriers to using some platforms and cloud services; interpretive analysis of implications for AI training/validation.
Generative AI increases the volume and sophistication of misinformation (deepfakes, fabricated documents), raises false-positive risks, and can be weaponized by state or nonstate actors.
Interview accounts and qualitative analysis noting observed or anticipated misuse of generative models and associated verification challenges.
Resource constraints—limited staff time, funding, and technical capacity—are recurring operational challenges for these platforms.
Staff and stakeholder interviews plus analysis of organizational reports indicating staffing, funding, and technical limitations.
Platforms experience difficulty building and retaining audience trust and engagement, especially in contexts of high public skepticism or polarization.
Interview data from platform staff describing audience engagement challenges, supported by analysis of audience-focused platform formats and community-reporting strategies.
Platforms face limited or asymmetric access to primary data sources such as platform APIs, state data, and archives.
Interview accounts and document analysis noting restricted API access and barriers to state-held data and archives across the three cases.
Censorship and legal risks constrain reporting and distribution for these fact-checking platforms.
Consistent reports from interview subjects and corroborating document analysis indicating legal/censorship-related limitations on publishing and distribution.
Political instability, legal pressure, and censorship strongly shape what platforms can investigate, publish, and access in the region.
Thematic findings from semi-structured interviews with platform staff and document analysis of public reports and policy statements across the three country cases.