The Commonplace

Evidence (4793 claims)

Adoption: 5539 claims
Productivity: 4793 claims
Governance: 4333 claims
Human-AI Collaboration: 3326 claims
Labor Markets: 2657 claims
Innovation: 2510 claims
Org Design: 2469 claims
Skills & Training: 2017 claims
Inequality: 1378 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 402 112 67 480 1076
Governance & Regulation 402 192 122 62 790
Research Productivity 249 98 34 311 697
Organizational Efficiency 395 95 70 40 603
Technology Adoption Rate 321 126 73 39 564
Firm Productivity 306 39 70 12 432
Output Quality 256 66 25 28 375
AI Safety & Ethics 116 177 44 24 363
Market Structure 107 128 85 14 339
Decision Quality 177 76 38 20 315
Fiscal & Macroeconomic 89 58 33 22 209
Employment Level 77 34 80 9 202
Skill Acquisition 92 33 40 9 174
Innovation Output 120 12 23 12 168
Firm Revenue 98 34 22 154
Consumer Welfare 73 31 37 7 148
Task Allocation 84 16 33 7 140
Inequality Measures 25 77 32 5 139
Regulatory Compliance 54 63 13 3 133
Error Rate 44 51 6 101
Task Completion Time 88 5 4 3 100
Training Effectiveness 58 12 12 16 99
Worker Satisfaction 47 32 11 7 97
Wages & Compensation 53 15 20 5 93
Team Performance 47 12 15 7 82
Automation Exposure 24 22 9 6 62
Job Displacement 6 38 13 57
Hiring & Recruitment 41 4 6 3 54
Developer Productivity 34 4 3 1 42
Social Protection 22 10 6 2 40
Creative Output 16 7 5 1 29
Labor Share of Income 12 5 9 26
Skill Obsolescence 3 20 2 25
Worker Turnover 10 12 3 25
Active filter: Productivity
Accumulated latent defects from unchecked AI outputs create negative externalities across dependent systems, complicating pricing and insurance; liability and cyber insurance markets may need to adapt.
Policy and economics argumentation drawing on externality theory; no actuarial or insurance-market empirical analysis provided.
medium negative Overton Framework v1.0: Cognitive Interlocks for Integrity i... incidence and cost of third-party harms attributable to AI-originated defects, i...
Measured productivity gains from AI-assisted development may overstate welfare gains if verification costs, defect externalities, and long-run fragility are omitted from accounting.
Economic reasoning and accounting argument; no empirical accounting studies or welfare analyses presented.
medium negative Overton Framework v1.0: Cognitive Interlocks for Integrity i... net productivity/welfare (productivity gains minus verification and remediation ...
The harm from latent defects is diffuse and slow-moving, making it easy for decision-makers to underweight these risks in adoption choices.
Descriptive argument drawing on behavioral economics concepts (discounting, salience); no empirical decision-making data included.
medium negative Overton Framework v1.0: Cognitive Interlocks for Integrity i... time-discounted valuation of future incident costs by decision-makers; observed ...
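The discounting mechanism behind this claim can be sketched numerically (a toy illustration with hypothetical numbers, not from the cited paper): the same nominal harm, deferred and spread thinly over many years, has a much lower present value than an equivalent upfront cost, which is why diffuse latent-defect risk tends to be underweighted in adoption decisions.

```python
# Hypothetical illustration of time-discounted incident costs.
# All figures (1000 units of harm, 8% discount rate, 20-year horizon)
# are assumptions for the sketch, not estimates from any cited paper.

def present_value(cash_flows, rate):
    """Discount a list of per-year costs (year 0 first) at the given rate."""
    return sum(c / (1 + rate) ** t for t, c in enumerate(cash_flows))

upfront = present_value([1000.0], rate=0.08)       # 1000 paid today
diffuse = present_value([50.0] * 20, rate=0.08)    # same 1000, over 20 years

# diffuse comes out well under upfront: identical nominal harm,
# but the slow-moving version looks far cheaper once discounted.
```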
Small, unverified changes accumulate over time into system-level fragility, hidden bugs, and security vulnerabilities (latent risk accumulation).
Causal reasoning and illustrative examples; no longitudinal empirical measurement of defect accumulation presented.
medium negative Overton Framework v1.0: Cognitive Interlocks for Integrity i... rate of latent defects/vulnerabilities per release over time; system fragility i...
AI-assisted code generation produces a throughput asymmetry: generation capacity rises much faster than human or automated verification capacity.
Synthesis of conceptual arguments and illustrative scenarios; no quantitative empirical evidence or sample-based analysis included in the paper.
medium negative Overton Framework v1.0: Cognitive Interlocks for Integrity i... relative growth rates of generation capacity vs verification capacity (generatio...
Verification (human review, testing, security analysis) does not scale at the same rate as AI-assisted generation and becomes the bottleneck.
Mechanism reasoning and qualitative argumentation; illustrative examples showing mismatch between generation and verification capacity. No empirical scaling measurements provided.
medium negative Overton Framework v1.0: Cognitive Interlocks for Integrity i... verification throughput (e.g., reviews/tests/sec, reviewer-hours per generated a...
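The generation-versus-verification asymmetry described in the two claims above can be made concrete with a toy model (all growth rates are hypothetical): when generation capacity compounds faster than verification capacity, the backlog of unverified output grows without bound, which is the bottleneck dynamic the claims point to.

```python
# Toy model of the generation/verification throughput asymmetry.
# Starting capacities and growth rates are illustrative assumptions.

def unverified_backlog(periods, gen0=100.0, gen_growth=0.30,
                       ver0=100.0, ver_growth=0.05):
    """Return the cumulative backlog of unverified artifacts per period."""
    backlog = 0.0
    history = []
    gen, ver = gen0, ver0
    for _ in range(periods):
        backlog += max(gen - ver, 0.0)  # unverified work carried forward
        history.append(backlog)
        gen *= 1 + gen_growth  # generation capacity compounds quickly
        ver *= 1 + ver_growth  # verification capacity compounds slowly
    return history

trail = unverified_backlog(8)
# The backlog is non-decreasing and accelerates once generation
# outpaces verification.
```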
Differences in access to AI tools and digital infrastructure could exacerbate global and within-country inequalities in research capacity and outputs.
Statement in Distributional and Competitive Effects. Motivated by observed heterogeneity in infrastructure and access; abstract does not provide empirical heterogeneity estimates or samples.
medium negative Artificial Intelligence for Improving Research Productivity ... access to AI tools/infrastructure, disparities in research outputs (publication ...
Institutions that adopt and integrate AI effectively may gain disproportionate advantages, increasing stratification in academic prestige and funding.
Presented as a distributional/competitive implication. Based on theory and possibly institutional case studies; no causal evidence or quantitative estimates provided in the abstract.
medium negative Artificial Intelligence for Improving Research Productivity ... changes in institutional prestige/rankings, funding allocation shifts, measures ...
Overreliance on generative AI risks eroding workers' critical thinking and tacit expertise.
Conceptual arguments supported by observational reports and theoretical concerns in the literature synthesis; limited empirical evidence cited.
medium negative The Use of ChatGPT in Business Productivity and Workflow Opt... measures of worker critical thinking, retention/loss of tacit skills, task profi...
Security vulnerabilities and IP leakage create negative externalities; absent internalization, social costs (breaches, legal disputes) may rise.
Security analyses, documented incidents, and economic externality reasoning synthesized from the literature; empirical quantification of social cost is limited.
medium negative ChatGPT as a Tool for Programming Assistance and Code Develo... social costs from security breaches and IP disputes (incidence and severity)
Generated code may incidentally reproduce copyrighted or licensed snippets from training data.
Analyses detecting verbatim or near-verbatim reproductions of licensed/copyrighted code in model outputs in selected tests and audits; evidence heterogeneous and depends on prompts and model/data.
medium negative ChatGPT as a Tool for Programming Assistance and Code Develo... frequency of reproduced copyrighted/licensed code in outputs
Outputs often lack deep, project-level contextual reasoning (e.g., design tradeoffs, architecture constraints).
Qualitative failure-mode analyses, user studies, and benchmark tasks showing limitations in system-level reasoning and context-aware design decisions; evidence from short-horizon labs and case studies.
medium negative ChatGPT as a Tool for Programming Assistance and Code Develo... ability to produce context-appropriate architectural/design decisions
There is a risk of shallow learning if learners over-rely on AI outputs without understanding fundamentals.
Educational studies and observational analyses indicating reduced engagement with underlying concepts for some learners using AI assistance, plus qualitative reports from instructors; studies often short-term.
medium negative ChatGPT as a Tool for Programming Assistance and Code Develo... depth of conceptual understanding and learning outcomes
Existing extrapolation‑based projection systems understate AI’s nonlinear, spillover, and augmentation effects and miss differential impacts across occupations, industries, regions, and demographic groups.
Theoretical argument and literature-based reasoning in the paper; no quantitative demonstration comparing extrapolation systems to the proposed approach.
medium negative Enhancing BLS Methodologies for Projecting AI's Impact on Em... magnitude and distribution of AI effects (nonlinearity, spillovers, augmentation...
Traditional BLS projection methods are insufficient for forecasting labor market changes driven by rapid AI adoption.
Conceptual critique and argumentation in the paper; no empirical evaluation or comparative forecast error statistics provided.
medium negative Enhancing BLS Methodologies for Projecting AI's Impact on Em... forecasting accuracy / ability to capture AI-driven labor market changes
Rapid post-2020 advances in AI (LLMs and multimodal models) have already rendered some pre-2020 profession-level conclusions obsolete by 2025.
Argument based on observed acceleration in AI capabilities after 2020 (LLMs, multimodal systems) discussed in the paper; evidence is temporal comparison of the state of capabilities and the applicability of older exposure indices rather than a single empirical re-test of all prior predictions.
medium negative Recent Methodologies on AI and Labour - a Desk Review validity/applicability of pre-2020 profession-level forecasts in 2025
Generative AI introduces risks such as model hallucinations and potential erosion of human skills over time.
Practitioner interview reports and authors' interpretive synthesis; qualitative evidence from consulting firms describing hallucination incidents and concerns about reduced skill practice. No longitudinal or quantitative measurement reported.
medium negative Where Automation Meets Augmentation: Balancing the Double-Ed... hallucination/error risk; consultant skill retention/skill erosion
Conversely, lack of standards or failed validation can create regulatory setbacks, reputational risk, and stranded R&D spending.
Case reports and regulatory analysis in the narrative review describing negative outcomes from failed validation or non-aligned AI tools (qualitative evidence).
medium negative Artificial Intelligence in Drug Discovery and Development: R... incidence of regulatory setbacks, reputational damage, amount of stranded/wasted...
Productivity gains from deploying agentic AI may be overstated if alignment costs, monitoring overhead, and coordination inefficiencies are ignored.
Conceptual economic accounting argument; recommends new accounting categories and empirical studies to quantify these factors.
medium negative Visioning Human-Agentic AI Teaming: Continuity, Tension, and... net productivity gains after accounting for alignment/monitoring costs
Agentic systems generate tail risks and endogenous systemic correlations (multiple systems converging on similar failure modes), creating new insurability challenges.
Theoretical risk analysis and analogy to systemic risk literature; proposed implications for insurance markets but no empirical testing.
medium negative Visioning Human-Agentic AI Teaming: Continuity, Tension, and... frequency/severity of tail events and systemic correlated failures among agentic...
Coordination and control mechanisms (hierarchies, protocols, monitoring) face scalability and specification problems when agents generate unforeseen actions.
Theoretical analysis and examples from multi-agent/organizational theory; no empirical measurement included.
medium negative Visioning Human-Agentic AI Teaming: Continuity, Tension, and... effectiveness/scalability of coordination and control mechanisms
Human cognitive learning processes (calibration, error-correction) may misalign with agentic AIs because humans and AIs learn from different signals and on different horizons.
Conceptual argument supported by cross-disciplinary literature synthesis; empirical tests are proposed but not conducted in the paper.
medium negative Visioning Human-Agentic AI Teaming: Continuity, Tension, and... alignment of learning/calibration processes between humans and AIs
Relational interaction mechanisms (trust, norms, mutual adjustment) can break down when AI objectives diverge or are opaque, reducing effective teaming.
Argument drawing on human factors and HAT literature; no new experimental data presented.
medium negative Visioning Human-Agentic AI Teaming: Continuity, Tension, and... strength/stability of trust, norms, and mutual adjustment in HAT
Agreement on bounded outputs (specifications, short-term goals) is insufficient for maintaining alignment with agentic AI.
Theoretical critique of specification-based alignment approaches; literature on limits of bounded specifications applied to open-ended systems.
medium negative Visioning Human-Agentic AI Teaming: Continuity, Tension, and... effectiveness of bounded-output alignment strategies
Agentic AI undermines key assumptions that shared awareness will reliably stabilize coordinated action over time.
Theoretical argument showing mismatches in representation, timescales, and learning dynamics between humans and agentic AIs; drawn from literature synthesis rather than empirical tests.
medium negative Visioning Human-Agentic AI Teaming: Continuity, Tension, and... stability of coordinated action given shared awareness
Under agentic conditions, alignment cannot be treated as a one-time agreement over bounded outputs; it must be continuously sustained as plans and priorities evolve.
Conceptual argument and modeling in the paper; literature synthesis highlighting limits of specification-based alignment approaches; no empirical validation presented.
medium negative Visioning Human-Agentic AI Teaming: Continuity, Tension, and... alignment persistence / need for continuous re-alignment
Agentic AI creates a new kind of structural uncertainty for human–AI teaming (HAT).
Theoretical/conceptual synthesis across literature on HAT, Team Situation Awareness (Team SA), human factors, multi-agent systems, and AI alignment; no new empirical data.
medium negative Visioning Human-Agentic AI Teaming: Continuity, Tension, and... structural uncertainty in human–AI teaming
Regulators can operationalize 'human oversight' through auditable handover architectures like DAR, but this will increase compliance and record-keeping costs for firms and public bodies.
Policy implication argued in the paper: coupling Reversal Register and hysteresis parameters to regulatory enforcement; no empirical cost estimates provided.
medium negative Human–AI Handovers: A Dynamic Authority Reversal Framework f... compliance_costs; recordkeeping_burden; regulator_enforceability
Current AI tooling often mismatches existing team workflows and CI/CD pipelines, reducing seamless adoption.
Qualitative observations and practitioner reports from the Netlight study describing tooling and workflow frictions; specific integrations or lack thereof discussed but not quantitatively evaluated.
medium negative Rethinking How IT Professionals Build IT Products with Artif... compatibility of AI tools with team processes and CI/CD
Generated code can introduce security vulnerabilities and licensing/IP ambiguity, raising quality, security, and IP concerns.
Practitioner concerns and examples documented in interviews and observations at Netlight; paper cites security and IP uncertainty as recurring themes; no systematic security scans or legal analyses reported.
medium negative Rethinking How IT Professionals Build IT Products with Artif... presence of security vulnerabilities and IP/licensing risk in AI-generated code ...
Compliance with GDPR/CCPA and auditing for bias/harms imposes non-trivial technical and legal costs; implementing federated learning and differential privacy (DP) increases engineering complexity and compute cost.
Paper's policy and cost discussion; cites increased engineering complexity and compute demands for privacy-preserving deployments but does not present quantified cost estimates.
medium negative Personalized Content Selection in Marketing Using BERT and G... engineering complexity metrics, compute/resource costs, legal/compliance expendi...
Firms need complementary investments (data pipelines, monitoring tools, feedback loops, human oversight systems) which materially affect the economics of adoption.
Industry case studies and practitioner reports synthesized in the review describing necessary complementary investments; no quantified investment sample or ROI analysis provided here.
medium negative The Effectiveness of ChatGPT in Customer Service and Communi... required investment levels, effect on adoption economics and ROI
Regulatory attention is likely to focus on transparency, liability for factual errors, data privacy, and nondiscrimination; compliance and auditing will add to adoption costs.
Policy and regulatory analyses aggregated in the review and references to ongoing regulatory discussions; no primary regulatory impact study conducted in this paper.
medium negative The Effectiveness of ChatGPT in Customer Service and Communi... regulatory compliance requirements, related adoption costs, and scope of regulat...
Generative AI currently lacks genuine empathy and relational capabilities necessary for high-stakes or sensitive interactions.
Conceptual analyses and practitioner case examples aggregated in the review; limited direct quantitative measurement cited in this brief review.
medium negative The Effectiveness of ChatGPT in Customer Service and Communi... empathy/relational effectiveness in sensitive interactions, customer satisfactio...
Generative models exhibit contextual misunderstandings and cannot reliably infer nuanced customer intent in all cases.
Synthesis of empirical studies and practitioner observations documenting misinterpretation and intent-detection failures; no new testing reported in this review.
medium negative The Effectiveness of ChatGPT in Customer Service and Communi... accuracy of intent detection and rate of context-related misunderstandings
There is substitution risk: routine ideation and drafting tasks may be automated, altering task-level labor demand and wage structure.
Task-automation literature and empirical studies of LLMs performing routine drafting/ideation tasks summarized in the review; no long-run labor-market causality established in the paper.
medium negative ChatGPT as an Innovative Tool for Idea Generation and Proble... employment and wages for routine ideation/drafting tasks
Generative AI lacks reliable situational judgment on ambiguous problems and on ethical trade-offs, making it insufficient for autonomous decision-making in such contexts.
Case examples and experimental studies cited in the synthesis showing inconsistent or inappropriate responses to ambiguous/ethical scenarios; no large-scale causal evidence provided.
medium negative ChatGPT as an Innovative Tool for Idea Generation and Proble... quality/appropriateness of situational judgment and ethical decision-making in t...
LLMs are prone to bias, mediocrity, and factual or logical errors when domain-specific context or experiential knowledge is absent.
Review of empirical evaluations documenting biased outputs, superficial or mediocre suggestions, and factual errors in open-ended tasks and domain-specific prompts; evidence comes from multiple short-term studies and applied examples.
medium negative ChatGPT as an Innovative Tool for Idea Generation and Proble... accuracy/factuality, bias indicators, perceived quality of outputs in domain-spe...
LLMs are predominantly recombinative — they tend to rework and recombine existing material rather than produce deeply novel insights.
Analytical synthesis of output analyses and creativity assessments from multiple empirical studies demonstrating frequent recombination of existing concepts and lower rates of highly original novelty; studies and measures vary.
medium negative ChatGPT as an Innovative Tool for Idea Generation and Proble... novelty/creativity metrics (e.g., originality scores, novelty ratings)
Proliferation of low-quality or biased AI-generated ideas creates externalities: increased filtering and reputational costs for firms and risks of poor product designs, ethical lapses, or regulatory violations if evaluation is insufficient.
Case studies and qualitative reports documenting filtering burdens and instances of biased/misleading outputs; theoretical reasoning about reputational and regulatory risks; direct quantification of these externalities is limited.
medium negative ChatGPT as an Innovative Tool for Idea Generation and Proble... filtering effort/costs; incidence of reputational/regulatory incidents tied to A...
Standard productivity metrics (e.g., TFP) may undercount the value of ideation and creative augmentation provided by generative AI, making attribution between human and AI contributions difficult.
Methodological discussion in the review supported by heterogeneity in outcome measures across studies and challenges in measuring implemented idea quality and long-run impacts.
medium negative ChatGPT as an Innovative Tool for Idea Generation and Proble... coverage/accuracy of productivity metrics for ideation-related gains; attributio...
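The TFP point can be stated with the standard Solow-residual accounting identity (a textbook sketch, not drawn from the cited paper):

```latex
% Measured TFP as the Solow residual, with output elasticity \alpha:
\ln A_t = \ln Y_t - \alpha \ln K_t - (1-\alpha)\,\ln L_t
```

Ideation and creative augmentation raise the quality of future output, so they enter measured output Y_t only with a lag, and the residual A_t bundles human and AI contributions together rather than attributing them separately, which is the measurement gap the claim describes.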
Generative models exhibit recombination bias: they tend to remix existing patterns rather than produce deeply original, paradigm-shifting insights.
Synthesis of output analyses across studies showing frequent recombination of known patterns and limited evidence of wholly novel, paradigm-changing ideas; claim based on qualitative and comparative analyses in reviewed literature.
medium negative ChatGPT as an Innovative Tool for Idea Generation and Proble... degree of novelty vs. recombination in generated outputs; incidence of paradigm-...
Integration complexity (data access, context continuity, privacy/security, workflow alignment) raises implementation costs and time-to-value.
Deployment case studies and vendor reports documenting engineering effort, data plumbing, compliance work, and multi-month integration timelines; no aggregated cost meta-analysis provided.
medium negative The Effectiveness of ChatGPT in Customer Service and Communi... implementation cost; time-to-value (time until measurable benefits)
Lack of genuine empathy and emotional intelligence undermines performance on complex or emotionally charged interactions.
Qualitative assessments and noisy measurement from pilot studies and customer feedback in complex cases; limited experimental validation and heterogeneous metrics.
medium negative The Effectiveness of ChatGPT in Customer Service and Communi... customer satisfaction/trust in emotionally charged interactions; resolution qual...
AI illiteracy (lack of understanding of AI capabilities/limits) impedes adoption and appropriate use of AI tools in finance.
Survey and interview data reporting lower adoption/intended use among respondents with limited self-reported AI understanding; supplemented by qualitative explanations; sample described as finance professionals across multinational institutions (size unspecified).
medium negative Human-AI Synergy in Financial Decision-Making: Exploring Tru... adoption rates; appropriate use of AI tools
Excessive reliance on algorithmic suggestions can erode human judgment and create systemic risks.
Interview reports and, where available, operational/risk metrics indicating overreliance patterns; authors note systemic-risk implications based on combined qualitative and quantitative observations (no causal identification reported).
medium negative Human-AI Synergy in Financial Decision-Making: Exploring Tru... quality of human judgment; systemic risk
Cognitive biases and inappropriate trust (both overtrust and distrust) distort decision outcomes and limit the benefits of AI-assisted decision-making.
Qualitative interview evidence describing instances of cognitive bias and misplaced trust; some quantitative indicators of decision distortion and risk where operational performance/risk metrics were available; sample: finance professionals across multinational institutions (detailed metrics not specified).
medium negative Human-AI Synergy in Financial Decision-Making: Exploring Tru... decision quality/distortion; systemic risk indicators
Market dominance by global platforms can stifle local entrants and distort competition; policies should address market power and data monopolies.
Review of platform economics and competition policy literature; policy argumentation rather than new empirical competition analysis in this paper.
medium negative Towards Responsible Artificial Intelligence Adoption: Emergi... market concentration indices, entry/exit rates of local firms, measures of compe...
If local data ownership, capacity and governance are weak, economic gains from AI risk accruing to foreign firms and exacerbating income and wealth concentration.
Conceptual synthesis referencing empirical studies on platform rents and data monetization; no original economic distribution analysis presented.
medium negative Towards Responsible Artificial Intelligence Adoption: Emergi... distribution of AI-related revenues, market share of foreign vs local firms, mea...
AI and automation can displace labour—particularly routine tasks—heightening the need for retraining, active labour policies and social protection.
Review of literature on automation and labour markets combined with normative inference for African contexts; no primary labour market data presented.
medium negative Towards Responsible Artificial Intelligence Adoption: Emergi... job displacement rates, changes in task composition, employment levels in routin...