Evidence (5157 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome	Positive	Negative	Mixed	Null	Total
Other	609	159	77	736	1615
Governance & Regulation	664	329	160	99	1273
Organizational Efficiency	624	143	105	70	949
Technology Adoption Rate	502	176	98	78	861
Research Productivity	348	109	48	322	836
Output Quality	391	120	44	40	595
Firm Productivity	385	46	85	17	539
Decision Quality	275	143	62	34	521
AI Safety & Ethics	183	241	59	30	517
Market Structure	152	154	109	20	440
Task Allocation	158	50	56	26	295
Innovation Output	178	23	38	17	257
Skill Acquisition	137	52	50	13	252
Fiscal & Macroeconomic	120	64	38	23	252
Employment Level	93	46	96	12	249
Firm Revenue	130	43	26	3	202
Consumer Welfare	99	51	40	11	201
Inequality Measures	36	105	40	6	187
Task Completion Time	134	18	6	5	163
Worker Satisfaction	79	54	16	11	160
Error Rate	64	78	8	1	151
Regulatory Compliance	69	64	14	3	150
Training Effectiveness	81	15	13	18	129
Wages & Compensation	70	25	22	6	123
Team Performance	74	16	21	9	121
Automation Exposure	41	48	19	9	120
Job Displacement	11	71	16	1	99
Developer Productivity	71	14	9	3	98
Hiring & Recruitment	49	7	8	3	67
Social Protection	26	14	8	2	50
Creative Output	26	14	6	2	49
Skill Obsolescence	5	37	5	1	48
Labor Share of Income	12	13	12	—	37
Worker Turnover	11	12	—	3	26
Industry	—	—	—	1	1

Filter: Human Ai Collab

AI adoption in peripheral economies is not a purely technological or financial challenge but a social and human capital challenge, embedded in a biocultural environment shaped by brain drain, institutional thinness, and weak civic intermediation.

Synthesis of interview findings using Bitsani's Biocultural City framework; qualitative evidence from 12 interviews supports this argument.

high negative Artificial Intelligence, Social Capital, and Sustainable Emp... nature_of_challenges_to_AI_adoption

Knowledge deficits and financial constraints emerge as primary barriers [to AI adoption].

Thematic analysis of the twelve semi-structured interviews reporting these themes as primary barriers.

high negative Artificial Intelligence, Social Capital, and Sustainable Emp... barriers_to_AI_adoption

Disclosure banners, conversion A/B testing, UI dark-pattern taxonomies, and generic LLM safety scores were built for older interfaces and miss the prose-recommendation surface where the steering happens.

Argument in paper that existing governance/audit tools designed for ranked-list or older UIs do not cover the new single-sentence prose-recommendation surface; no empirical test reported in excerpt.

high negative TourMart: A Parametric Audit Instrument for Commission Steer... coverage/effectiveness of existing governance tools for prose recommendations

Common failures include replacing essential operations such as sweeps, lofts, and twist-extrudes with simpler sketch-and-extrude patterns.

Error-mode analysis described in the paper/abstract showing that models substitute complex CAD operations (sweep, loft, twist-extrude) with simpler sketch-and-extrude sequences.

high negative BenchCAD: A Comprehensive, Industry-Standard Benchmark for P... use_of_appropriate_CAD_operations_in_generated_code

Common failures include misinterpreting industrial design parameters.

Reported error analysis in the paper/abstract indicating models often misinterpret engineering/design parameters when generating CAD programs.

high negative BenchCAD: A Comprehensive, Industry-Standard Benchmark for P... accuracy_of_inferred_design_parameters

Common failures include missing fine 3D structure.

Qualitative and quantitative analysis of model outputs on BenchCAD reported in the paper/abstract noting missing fine 3D structural details as a frequent error mode.

high negative BenchCAD: A Comprehensive, Industry-Standard Benchmark for P... completeness_of_3D_structure_in_generated_models

Current AI development trajectory reflects value choices that prioritize conversational generality over domain specificity, accountability, and long-term social sustainability.

Normative/critical analysis in the paper highlighting design priorities and trade-offs; no empirical measurement provided.

high negative What if AI systems weren't chatbots? Relative prioritization of conversational generality versus domain specificity, ...

Sustained investment in large-scale chatbot infrastructures increases environmental costs.

Paper asserts environmental impacts from infrastructure investment (energy, resource use) as part of systemic critique; no quantified environmental measurements or sample size reported.

high negative What if AI systems weren't chatbots? Environmental costs associated with energy/resource use of chatbot infrastructur...

Chatbot-driven AI development contributes to concentration of economic power.

Argumentation about industry dynamics and infrastructure centralization in the paper; no empirical market-concentration metrics or sample provided.

high negative What if AI systems weren't chatbots? Concentration of economic power among firms/platforms producing and hosting chat...

The normalization of chatbots contributes to labor displacement.

Theoretical argument linking widespread chatbot adoption to changes in work and employment; no empirical displacement estimates provided.

high negative What if AI systems weren't chatbots? Labor displacement (job losses attributable to chatbot adoption)

Normalization of chatbot-mediated interaction alters patterns of work, learning, and decision-making, contributing to deskilling, homogenization of knowledge, and shifting expectations of expertise.

Analytical reasoning and literature-informed claims in the paper; no quantitative measurement or sample reported.

high negative What if AI systems weren't chatbots? Levels of skill retention/acquisition (deskilling), diversity of knowledge (hom...

Chatbot-based systems often fail to adequately meet user needs, particularly in complex or high-stakes contexts, while projecting confidence and authority.

Qualitative argumentation and illustrative examples in the paper; no reported controlled empirical study or sample size.

high negative What if AI systems weren't chatbots? Adequacy of chatbot responses to user needs in complex/high-stakes contexts and ...

The chatbot paradigm is not a neutral interface choice, but a dominant sociotechnical configuration whose widespread adoption reshapes social, economic, legal, and environmental systems.

Conceptual argument and synthesis in the paper (theoretical analysis); no empirical sample or quantitative data reported.

high negative What if AI systems weren't chatbots? Degree to which chatbot adoption reshapes social, economic, legal, and environme...

This reliance frequently leads to excessive dependence on mechanistic interpretability to address a deployment challenge beyond its intended scope.

Author argument drawing on conceptual critique and cited empirical distinctions (paper's argumentative content).

high negative The Open-Box Fallacy: Why AI Deployment Needs a Calibrated V... appropriateness of mechanistic interpretability as a gate for deployment

AI deployment in sensitive domains (health care, credit, employment, criminal justice) is often treated as unsafe to authorize until model internals can be explained.

Author assertion based on observed regulatory and institutional tendencies described in the paper (argumentative / contextual evidence within the paper).

high negative The Open-Box Fallacy: Why AI Deployment Needs a Calibrated V... authorization policy stance toward AI in sensitive domains (requirement for inte...

A scoping review found that only 9.0% of FDA-approved AI/ML device documents contained a prospective post-market surveillance study.

Paper references a scoping review that examined FDA-approved AI/ML device documents and reported the 9.0% figure.

high negative The Open-Box Fallacy: Why AI Deployment Needs a Calibrated V... presence of prospective post-market surveillance study in FDA AI/ML device docum...

A 53-percentage-point gap between internal representations and output correction shows that understanding may not translate into action.

Paper cites a recent empirical finding reporting a 53 percentage-point gap between models' internal representations and their ability to correct outputs (described as 'recent evidence').

high negative The Open-Box Fallacy: Why AI Deployment Needs a Calibrated V... gap between internal model representations and ability to correct outputs

In labor-intensive industries, industrial robots shorten the backward linkage length.

Heterogeneity analysis in the paper comparing effects across labor-intensive sub-sectors within the panel of 14 manufacturing sub-sectors; reported finding of a negative effect on backward linkage length in labor-intensive industries.

high negative Research on the impact of industrial robot application on th... backward linkage length (a component of global value chain length) in labor-inte...

Institutional inertia in property valuation poses risks to asset pricing, collateral risk modelling and investor confidence.

Analytical inference from interview findings and theoretical synthesis highlighting implications for property investment and financial market stability.

high negative Exploring barriers to valuation technology adoption in prope... risks to asset pricing, collateral risk modelling and investor confidence

Despite advances in automation, data analytics and AI, the sector has been slow to digitise.

Background statement supported by interview data and sector observation reported in the study.

high negative Exploring barriers to valuation technology adoption in prope... pace of digitisation in the property valuation sector

The IDOI framework provides a transferable model for understanding digital transformation in regulated, high-trust professions and highlights the market-level risks of institutional inertia in property valuation.

Development of the IDOI conceptual framework from qualitative data and theoretical integration; authors' claim about transferability and implications.

high negative Exploring barriers to valuation technology adoption in prope... transferability of the framework and market-level risks from institutional inert...

Generational divides, protectionist attitudes and fears of automation reinforce digital resistance.

Qualitative interview evidence reporting attitudes across cohorts of valuers and firm personnel; thematic analysis identifying cultural and attitudinal themes.

high negative Exploring barriers to valuation technology adoption in prope... cultural/attitudinal resistance to VTech

The Valuers Act (1948), fragmented infrastructure and sovereignty concerns limit innovation.

Interview data from practitioners, firm leaders and regulators in New Zealand citing specific regulatory and infrastructure constraints; thematic analysis.

high negative Exploring barriers to valuation technology adoption in prope... regulatory and infrastructure constraints on innovation

Barriers to adoption arise primarily from institutional conservatism, outdated regulation and weak data governance rather than technical shortcomings.

Qualitative semi-structured interviews with valuers, firm leaders and regulators in New Zealand; thematic analysis guided by Rogers' diffusion of innovations and institutional theory synthesised into the IDOI framework.

high negative Exploring barriers to valuation technology adoption in prope... barriers to VTech adoption

Even access to the true conditional vulnerability probability cannot eliminate misallocation: aleatoric uncertainty over individual vulnerability status is irreducible, and probabilistic targeting inevitably misallocates some resources.

Theoretical argument in the paper (conceptual/theoretical result about irreducible aleatoric uncertainty and its implications for probabilistic targeting).

high negative The Limits of AI-Driven Allocation: Optimal Screening under ... misallocation of resources (allocation error due to aleatoric uncertainty)
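
The arithmetic behind this claim can be illustrated with a minimal sketch. It assumes a planner who knows each individual's true vulnerability probability and targets everyone at or above a threshold. The function name, the threshold rule, and the example probabilities are hypothetical and are not taken from the paper.

```python
def expected_misallocation(probs, threshold=0.5):
    """Expected share of individuals who end up misallocated.

    Assumption (illustrative only): each p in probs is the TRUE
    probability that the individual is vulnerable, and the planner
    targets everyone with p >= threshold.
    """
    total = 0.0
    for p in probs:
        targeted = p >= threshold
        # A targeted person is not vulnerable with probability 1 - p;
        # a skipped person is vulnerable with probability p.
        total += (1.0 - p) if targeted else p
    return total / len(probs)


# Hypothetical true probabilities for four individuals.
probs = [0.9, 0.6, 0.3, 0.1]
rate = expected_misallocation(probs)
```

Under these assumptions the expected error is zero only when every probability is exactly 0 or 1. Any intermediate value leaves some expected misallocation even though the probabilities themselves are correct.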

Consequently, generated artifacts may exhibit brittle behavior and limited deployability.

Paper asserts that lack of production awareness leads to brittle artifacts and limited deployability; no quantitative measures or sample sizes provided in the abstract.

high negative Architectural Constraints Alignment in AI-assisted, Platform... brittleness of artifacts and deployability

AI-assisted development tools often lack awareness of architectural constraints, infrastructure dependencies, and organizational standards required in production environments.

Asserted observation in the paper arguing limitations of general-purpose AI code generation when targeting production-ready systems; no empirical sample size or methodological details provided in the excerpt.

high negative Architectural Constraints Alignment in AI-assisted, Platform... awareness of architectural constraints / suitability for production

Current AI tools are not yet mature enough to replace developers.

Conclusion drawn from the controlled experiment and participant feedback comparing AI-assisted vs traditional task-splitting.

high negative Splitting User Stories Into Tasks with AI -- A Foe or an All... suitability of AI to replace developers

Breaking down user stories into actionable tasks is a critical yet time-consuming process in agile software development.

Background/introductory statement in the paper describing the problem motivation; no experimental sample size reported for this claim.

high negative Splitting User Stories Into Tasks with AI -- A Foe or an All... time required to split user stories (descriptive claim about time consumption)

There are three practical failure modes produced or amplified by AI-assisted causal analysis: (1) method-data mismatch, where AI bypasses expertise at execution; (2) confidence laundering, where AI amplifies the credibility of formatted output; and (3) invisible forking, which spans both.

Taxonomy created and justified in the paper via conceptual argument and illustrative discussion; no empirical classification study or prevalence estimates provided.

high negative Vibe Econometrics and the Analysis Contract types of inferential failure modes arising in AI-assisted causal analysis

AI industrializes the packaging of existing inferential failure modes: the barrier between naming a method and executing it has collapsed, allowing weak foundations, dressed as rigorous analysis, to reach audiences at a scale, speed, and polish that previously required expertise.

Conceptual claim supported by narrative reasoning and illustrative examples; no empirical data on scale, speed, or reach are given.

high negative Vibe Econometrics and the Analysis Contract scale/speed/polish of dissemination of weak analyses (i.e., reach/adoption of lo...

AI changes the incidence, observability, and persuasive force of inferential failures enough to create a practically distinct governance problem (even if it does not invent previously nonexistent inferential failures).

Argumentative/theoretical reasoning in the paper; no empirical measurement of incidence, observability, or persuasiveness provided.

high negative Vibe Econometrics and the Analysis Contract governance challenge arising from changed incidence, observability, and persuasi...

When AI assists with methods whose validity depends on assumptions that cannot be verified from the output alone ("vibe inference"), the failure surface is structurally different: the output does not reliably signal invalidity, and when it does, recognizing the signal requires the expertise the workflow bypasses.

Logical/qualitative argument and definition development in the paper (no empirical validation or measured instances provided).

high negative Vibe Econometrics and the Analysis Contract observability/detectability of invalid inference and requirement of expert knowl...

AI-assisted methodology ("vibe methodology") democratizes the failure modes specific to each domain.

Conceptual/theoretical argument presented in the paper; no empirical sample, quantitative data, or experiments reported.

high negative Vibe Econometrics and the Analysis Contract democratization of domain-specific inferential failure modes (i.e., more widespr...

AI adoption deepens the negative indirect effect of CEO–TMT faultlines on green innovation via reduced eco-attention (moderated mediation).

Reported moderated mediation analysis on the panel dataset (35,347 firm-year observations) showing that AI moderates the indirect path from CEO–TMT faultlines to green innovation through eco-attention, making the indirect effect more negative when AI is greater.

high negative When AI Amplifies Negative Echoes: CEO–TMT Faultlines, Eco-A... green innovation (indirect effect via eco-attention)

AI technology strengthens the negative relationship between CEO–TMT faultlines and eco-attention (AI exacerbates the adverse effect of faultlines on eco-attention).

Moderation/interaction analysis reported in the paper using the same panel dataset (35,347 firm-year observations) indicating a significant interaction between AI adoption and CEO–TMT faultlines on eco-attention.

high negative When AI Amplifies Negative Echoes: CEO–TMT Faultlines, Eco-A... eco-attention

CEO–TMT faultlines reduce eco-attention (organizational attention to environmental issues).

Direct association reported in the paper from regression/mediation models using the panel dataset (35,347 firm-year observations) showing a negative relationship between CEO–TMT faultlines and eco-attention.

high negative When AI Amplifies Negative Echoes: CEO–TMT Faultlines, Eco-A... eco-attention

CEO–TMT faultlines negatively affect green innovation through reduced eco-attention.

Empirical mediation analysis on the panel dataset (35,347 firm-year observations, 2010–2023) testing CEO–TMT faultlines -> eco-attention -> green innovation.

high negative When AI Amplifies Negative Echoes: CEO–TMT Faultlines, Eco-A... green innovation (mediated by eco-attention)

Municipal 311 call centers and complaint intake systems face a structural mismatch between incoming volume and classification capacity that produces a bottleneck and differential service quality that follows income and racial lines.

Stated in the paper's introduction; cites prior work (Liu 2024 SLA) as support for the differential service-quality / demographic claim. No sample size or quantitative result reported in the excerpt.

high negative Scaling the Queue: Reinforcement Learning for Equitable Call... differential service quality by income and race

There is an absence of agreed-upon benchmarks for evaluating AI systems.

Introductory chapter notes lack of standardized evaluation benchmarks as a cross-cutting concern; presented as an analytical observation by the task force.

high negative Introduction: Artificial Intelligence, Politics, and Politic... existence of standardized evaluation benchmarks for AI

AI systems exhibit bias.

Introductory chapter points to bias in AI systems as a recurring theme; supported by the broader literature cited in the report (no numerical sample reported in the introduction).

high negative Introduction: Artificial Intelligence, Politics, and Politic... bias and fairness issues in AI system outputs and decisions

AI model outputs are often opaque and non-replicable.

Introductory chapter identifies opacity and non-replicability of AI outputs as a cross-cutting theme; claim is based on literature synthesis and conceptual critique in the report.

high negative Introduction: Artificial Intelligence, Politics, and Politic... transparency and replicability of AI model outputs

A small number of AI corporations have unprecedented power.

Introductory chapter highlights the theme of concentrated corporate power in AI; asserted as an observational claim in the report's framing rather than derived from a presented empirical sample in the introduction.

high negative Introduction: Artificial Intelligence, Politics, and Politic... concentration of corporate power in the AI industry (market control, platform in...

GPT-4.1 exhibits hidden workflow shortcuts despite achieving perfect TSR and HF1.

Model-level observation from the ASR analysis within the experiment (paper reports GPT-4.1 had perfect TSR and HF1 but failed trajectory-level fidelity).

high negative Beyond Task Success: Measuring Workflow Fidelity in LLM-Base... trajectory fidelity vs. standard metrics (TSR, HF1)

Applied to the Hierarchical Multi-Agent System for Payments (HMASP) across 18 LLMs and 90,000 task instances, ASR reveals that 10 of 18 models systematically skip a confirmation checkpoint during payment checkout, a deviation invisible to both TSR and HF1, while 8 models enforce the checkpoint perfectly.

Empirical evaluation reported in the paper: HMASP tested across 18 LLMs and 90,000 task instances; analysis via ASR showing checkpoint-skipping behavior for 10 models and correct enforcement for 8 models.

high negative Beyond Task Success: Measuring Workflow Fidelity in LLM-Base... adherence to expected workflow transitions (confirmation checkpoint adherence)
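
To illustrate why an outcome-only metric can miss a skipped checkpoint, here is a minimal sketch. It is not the paper's ASR, TSR, or HF1 computation; the step names and both helper functions are hypothetical, and the trajectory format (a list of step names) is an assumption.

```python
def task_succeeded(trajectory, goal="pay"):
    """Outcome-only check: did the run reach the goal step at all?"""
    return goal in trajectory


def respects_checkpoint(trajectory, checkpoint="confirm", goal="pay"):
    """Trajectory-level check: the checkpoint must occur before the goal."""
    if goal not in trajectory:
        return True  # the goal never ran, so there is nothing to guard
    goal_idx = trajectory.index(goal)  # first occurrence of the goal step
    return checkpoint in trajectory[:goal_idx]


# Hypothetical step names for two checkout runs.
faithful = ["add_to_cart", "confirm", "pay"]
shortcut = ["add_to_cart", "pay"]
```

Both runs pass the outcome-only check, but only `faithful` passes the trajectory-level check. That is the kind of deviation the claim describes as invisible to end-state metrics.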

From an information-theoretic perspective, this transition corresponds to an emergent information bottleneck in the human-AI loop, where entropy reduction reflects loss of diversity and support under closed-loop feedback rather than beneficial compression.

Theoretical / information-theoretic analysis in the paper linking observed dynamics to entropy reduction and information bottleneck concepts.

high negative Human-AI Co-Evolution and Epistemic Collapse: A Dynamical Sy... entropy (diversity/support) of the human-AI data loop and its interpretation as ...

Through a simple simulation, we demonstrate that increasing reliance on AI can induce a transition toward a low-diversity, suboptimal equilibrium.

Computational simulation reported in the paper (described as a 'simple simulation'); no sample size or experimental dataset reported in the provided text.

high negative Human-AI Co-Evolution and Epistemic Collapse: A Dynamical Sy... system transitioning to a low-diversity, suboptimal equilibrium as reliance on A...
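
A toy model can show how a reliance parameter might produce this kind of transition. The sketch below is not the paper's simulation. It assumes a population distribution over four options, an AI that returns a sharpened (squared and renormalized) copy of that distribution, and independent human input that is uniform. All names and numbers are hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy in nats."""
    return -sum(x * math.log(x) for x in p if x > 0)


def step(p, reliance):
    """One round of the closed loop.

    The AI output is the current distribution squared and renormalized
    (an assumed mode-seeking model). The next distribution mixes uniform
    independent input with the AI output, weighted by reliance.
    """
    k = len(p)
    sharpened = [x * x for x in p]
    z = sum(sharpened)
    ai = [x / z for x in sharpened]
    return [(1 - reliance) / k + reliance * a for a in ai]


def simulate(reliance, steps=200):
    # Option 0 starts with a small, arbitrary edge over the others.
    p = [0.28, 0.24, 0.24, 0.24]
    for _ in range(steps):
        p = step(p, reliance)
    return p


low = simulate(0.3)   # low reliance on the AI output
high = simulate(0.9)  # high reliance on the AI output
```

In this toy, low reliance returns the distribution to uniform, while high reliance concentrates mass on the option that happened to start ahead, whether or not it is the best one. Near the uniform state a small deviation is multiplied by roughly twice the reliance each round, so the switch occurs around a reliance of 0.5.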

DePAI entails risks including security, centralization, incentive failure, legal exposure, and the crowding-out of intrinsic motivation, requiring value-sensitive design and continuously adaptive governance.

Risk analysis and conceptual argument in the paper identifying possible failure modes and recommended design/governance responses; no empirical incidence data provided.

high negative DAO-enabled decentralized physical AI: A new paradigm for hu... security, centralization, incentive failure, legal exposure, and intrinsic motiv...

Experimental results show that current agents remain far from reliable workspace learning.

Authors' interpretation based on the reported agent performance (best agent 68.7% vs. human 80.7%; average 47.4%).

high negative Workspace-Bench 1.0: Benchmarking AI Agents on Workspace Tas... reliability of agents on workspace learning tasks

The average performance across evaluated agents is only 47.4%.

Reported mean performance across agents in the experiments (authors' aggregated result).

high negative Workspace-Bench 1.0: Benchmarking AI Agents on Workspace Tas... average benchmark score across agents
