Evidence (11633 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome	Positive	Negative	Mixed	Null	Total
Other	609	159	77	736	1615
Governance & Regulation	664	329	160	99	1273
Organizational Efficiency	624	143	105	70	949
Technology Adoption Rate	502	176	98	78	861
Research Productivity	348	109	48	322	836
Output Quality	391	120	44	40	595
Firm Productivity	385	46	85	17	539
Decision Quality	275	143	62	34	521
AI Safety & Ethics	183	241	59	30	517
Market Structure	152	154	109	20	440
Task Allocation	158	50	56	26	295
Innovation Output	178	23	38	17	257
Skill Acquisition	137	52	50	13	252
Fiscal & Macroeconomic	120	64	38	23	252
Employment Level	93	46	96	12	249
Firm Revenue	130	43	26	3	202
Consumer Welfare	99	51	40	11	201
Inequality Measures	36	105	40	6	187
Task Completion Time	134	18	6	5	163
Worker Satisfaction	79	54	16	11	160
Error Rate	64	78	8	1	151
Regulatory Compliance	69	64	14	3	150
Training Effectiveness	81	15	13	18	129
Wages & Compensation	70	25	22	6	123
Team Performance	74	16	21	9	121
Automation Exposure	41	48	19	9	120
Job Displacement	11	71	16	1	99
Developer Productivity	71	14	9	3	98
Hiring & Recruitment	49	7	8	3	67
Social Protection	26	14	8	2	50
Creative Output	26	14	6	2	49
Skill Obsolescence	5	37	5	1	48
Labor Share of Income	12	13	12	—	37
Worker Turnover	11	12	—	3	26
Industry	—	—	—	1	1

AI dissolves the boundaries that once separated firms, markets, experts, and consumers by internalizing human multimodal interfaces (language, vision, and behavioral data) into computational systems.

Theoretical argument and conceptual framework introduced in the paper (Structural Dissolution Framework); no empirical sample or quantitative analysis reported for this claim in the text provided.

high mixed Structural Dissolution: How Artificial Intelligence Dismantl... dissolution of boundaries between firms, markets, experts, and consumers

Failures are structured by task family and execution surface, with HR, management, and multi-system business workflows as persistent bottlenecks and local workspace repair comparatively easier but unsaturated.

Error-mode analysis across the 105 tasks and evaluated models reported in experiments; authors identify task-family-level patterns (HR, management, multi-system workflows) and relative ease of local workspace repair.

high mixed Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-Wor... failure distribution by task family / execution surface

Whether LLM-based assistants improve or degrade code quality remains unresolved: existing studies report contradictory outcomes contingent on context and evaluation criteria.

Review finds mixed/contradictory findings across included studies regarding code quality effects.

high mixed The Impact of LLM-Assistants on Software Developer Productiv... code quality (e.g., correctness, maintainability, defects)

Architectural interventions can instead be used to trade off personalization against preference privacy.

Proposed solution described in the paper (architectural interventions) as an alternative to prompt-level fixes; presented as a design tradeoff rather than empirically validated mitigation in the excerpt.

high mixed When Agents Shop for You: Role Coherence in AI-Mediated Mark... trade-off between personalization and preference privacy under architectural int...

AI-driven automation marks the beginning of a new political era—one in which the role of work in society becomes a central axis of welfare conflict.

Theoretical and interpretive claim in the paper, motivated by the survey findings and broader argumentation about political consequences.

high mixed AI, the Future of Work, and the Politics of the Welfare Stat... political salience of 'the role of work' in welfare politics / emergence of new ...

The system tends to be factually correct when it answers but often omits information (i.e., 'the system is right when it answers — it just leaves things out').

Interpretation combining reported factual accuracy (85.5%) with low completeness (0.40) from benchmark results.

high mixed Benchmarking Complex Multimodal Document Processing Pipeline... factual accuracy vs. answer completeness

Differences between models are large enough to shape outcomes in practice, so reliability should be incorporated alongside average performance when assessing and deploying LLMs in high-stakes decision contexts.

Authors' interpretation of empirical differences in funding decisions, scores, confidence, and reliability across models in the controlled experiment; presented as an implication/recommendation.

high mixed Algorithmic personalities and the myth of neutrality: financ... policy recommendation regarding assessment criteria (reliability + average perfo...

This hybrid Make governance form has qualitatively different economics, capability requirements, and governance structures than pre-AI in-house development.

Paper's conceptual comparison between pre-AI hierarchy and post-AI hybrid Make governance (theoretical reasoning and examples; no empirical quantification).

high mixed The Buy-or-Build Decision, Revisited: How Agentic AI Changes... economics and capability requirements of in-house development governance

AI reshapes seven canonical decision determinants for make-or-buy choices: cost, strategic differentiation, asset specificity, vendor lock-in, time-to-market, quality and compliance, and organizational capability.

Paper's factor-level conceptual analysis enumerating and discussing seven determinants (theoretical synthesis rather than empirical measurement).

high mixed The Buy-or-Build Decision, Revisited: How Agentic AI Changes... sensitivity of canonical make-or-buy determinants to AI

Demographic characteristics intersect with AI exposure—i.e., exposure varies by demographic groups.

Paper reports that it examines how demographic characteristics intersect with exposure based on recent empirical studies; no demographic breakdowns or sample sizes provided in the abstract.

high mixed AI Displacement Risk in the Labor Market: Evidence, Exposure... variation in AI exposure across demographic groups

Recent studies combine task-level exposure metrics with employment and usage data to assess AI exposure and impacts.

Paper notes that it draws on studies that use task-level exposure metrics alongside employment and usage data; methodological claim rather than a quantitative result.

high mixed AI Displacement Risk in the Labor Market: Evidence, Exposure... measurement approach for AI exposure (task-level exposure linked to employment/u...

Generative large language models (LLMs) present organizations with a transformative technology whose labor market implications remain nascent yet consequential.

Statement in paper synthesizing emerging empirical research; no specific study, method, or sample size reported in the abstract.

high mixed AI Displacement Risk in the Labor Market: Evidence, Exposure... labor market implications (disruption and augmentation)

The adoption of AI in Israel constitutes a systemic transformation of employment relations, necessitating doctrinal adaptation and institutional reform to keep the labor market aligned with foundational legal principles.

Synthesis and conclusion from the paper's combined legal and empirical analysis; presented as the author's overarching interpretive claim rather than as a specific quantified finding.

high mixed Artificial Intelligence in Israel, Trends, Developments, and... degree of systemic transformation of employment relations and need for doctrinal...

Within the public sector, there is an emerging policy trend to incorporate AI considerations into workforce planning, including examining whether human positions may be substituted by technological solutions prior to recruiting new employees.

Paper reports an observed policy trend in public-sector workforce planning; specific policy documents, jurisdictions, or counts not provided in the excerpt.

high mixed Artificial Intelligence in Israel, Trends, Developments, and... public sector workforce planning practices (consideration of substituting human ...

Objectives, constraints, and prompt guidance affect reliability and generalization.

Authors' analysis and discussion based on experiments and ablations described in the paper (qualitative/empirical observations about sensitivity to objectives, constraints, and prompts).

high mixed Agentic Architect: An Agentic AI Framework for Architecture ... organizational_efficiency

The architect's role is shifting, but the human remains central.

Authors' discussion and interpretive analysis about the role of humans in agentic AI-driven design processes.

high mixed Agentic Architect: An Agentic AI Framework for Architecture ... skill_acquisition

Across evolved designs, components often correspond to known techniques; the novelty lies in how they are coordinated.

Authors' qualitative analysis of evolved architectures and components reported in the paper (design inspection and interpretation of evolved solutions).

high mixed Agentic Architect: An Agentic AI Framework for Architecture ... innovation_output

The study establishes statistically significant relationships between organizational AI adoption and compensation dynamics.

Econometric estimates (difference-in-differences and propensity score matched comparisons) using the combined datasets listed in the paper and controlling for industry, firm size, geography, occupation characteristics, and macroeconomic variables.

high mixed The Generative AI Revolution: Early Evidence of Structural T... compensation dynamics (wages/pay)

The study establishes statistically significant relationships between organizational AI adoption and changes in occupational structures.

Same econometric approach (difference-in-differences and propensity score matching) applied to combined datasets (Anthropic Economic Index, Census Business Trends and Outlook Survey, Federal Reserve regional surveys, labor market analytics), with controls for industry, firm size, location, occupation-level characteristics, and macroeconomic environment.

high mixed The Generative AI Revolution: Early Evidence of Structural T... occupational structures

The study establishes statistically significant relationships between organizational AI adoption and changes in employment patterns in the United States during 2022–2025.

Econometric analysis using multiple large-scale data sources (Anthropic Economic Index, U.S. Census Bureau Business Trends and Outlook Survey, Federal Reserve regional surveys, labor market analytics) and methods described as difference-in-differences estimation and propensity score matching controlling for industry (NAICS 2-digit), firm size, geography, occupation characteristics, and macro conditions.

high mixed The Generative AI Revolution: Early Evidence of Structural T... employment patterns

We identify significant differences between human and AI negotiation behaviors, finding that humans favor lower-complexity deals and are significantly less reliable partners compared to LM-based agents.

Results from the user study comparing human vs LM-based agent negotiation behavior (statements in the results section).

high mixed Cooperate to Compete: Strategic Coordination in Multi-Agent ... deal complexity preference and partner reliability in negotiations

High-value uses require broader authority exposure — data access, workflow integration, and delegated authority — when governance controls have not yet decoupled capability from authority exposure.

Conceptual/mechanism claim articulated in the paper (motivating assumption for the analytical model; no empirical sample given in the abstract).

high mixed The Security Cost of Intelligence: AI Capability, Cyber Risk... authority exposure associated with AI deployment

Firms are deploying more capable AI systems, but organizational controls often have not kept pace.

Stated as background context in the paper's abstract/introduction (observational claim; no empirical sample or experiment reported in the abstract).

high mixed The Security Cost of Intelligence: AI Capability, Cyber Risk... deployment of capable AI systems / governance maturity

The distribution of complementary (non-AI) skills across the workforce shapes whether AI improvements generate productivity bottlenecks or concentration-driven inequality.

Derived from the task-based model analysis described in the article; framed as a theoretical mechanism with reference to empirical patterns but without specific empirical study details in the excerpt.

high mixed AI as Augmentation: How Human Capital Shapes Technology's Im... occurrence of productivity bottlenecks and concentration-driven wage/income ineq...

There is a strict policy reversal in optimal editorial policy sign: tightening is optimal pre-transition, loosening is optimal post-transition.

Analytical proof in the model showing the sign reversal of the editor's optimal constrained response as AI capability crosses the critical threshold.

high mixed Buying the Right to Monitor:Editorial Design in AI-Assisted ... direction of optimal editorial policy change (tighten vs loosen) across regimes

After the AI transition, editors must loosen acceptance standards while investing in AI detection, because further tightening only amplifies dissipative polishing without improving sorting.

Analytical characterization of the constrained optimal editorial response in the post-transition regime within the model; argument relies on the discontinuous reviewer-effort collapse and comparative statics.

high mixed Buying the Right to Monitor:Editorial Design in AI-Assisted ... optimal editorial policy (acceptance standards and investment in AI detection) a...

The reviewer-effort collapse creates a welfare misalignment: authors benefit from a weakened 'rat race' while editors suffer from degraded signal informativeness.

Comparative statics and welfare analysis in the theoretical model showing authors' equilibrium payoffs rise as competition/polishing dissipates, while editor's signal informativeness declines due to lower reviewer effort.

high mixed Buying the Right to Monitor:Editorial Design in AI-Assisted ... welfare for authors (utility/payoff) and informativeness of editorial signals

In academic peer review, generative AI enters both sides of the market: authors use AI to polish submissions, and reviewers use it to generate plausible reports without exerting evaluative effort.

Model assumption and motivation in the paper's three-sided equilibrium framework; described as the dual adoption mechanism analyzed analytically (no empirical sample size reported).

high mixed Buying the Right to Monitor:Editorial Design in AI-Assisted ... adoption of AI by authors and reviewers (change in task allocation and effort)

The paper extends paradox theory to conceptualise the Creativity Paradox in the context of GenAI.

Theoretical extension and conceptual development within the paper (no empirical tests reported).

high mixed Beyond the Creativity Paradox: A Theory-informed Framework f... extension of paradox theory (Creativity Paradox)

Within that n=11 subset, 9 of 11 agents shift by at least 2 ranks between composite and benchmark-only rankings.

Comparison of rank positions between composite and benchmark-only rankings on the 11-agent subset; reported count of agents that moved at least 2 ranks.

high mixed AgentPulse: A Continuous Multi-Signal Framework for Evaluati... count/proportion of agents with ≥2-rank shifts

The four factors capture largely complementary information (n=50; ρ_max = 0.61 for Adoption-Ecosystem, all others |ρ| ≤ 0.37).

Correlation analysis among the four factor scores computed on the 50-agent sample; reported maximum inter-factor Pearson/Spearman correlation coefficients.

high mixed AgentPulse: A Continuous Multi-Signal Framework for Evaluati... inter-factor correlations (Adoption vs Ecosystem and other factor pairs)

The intervention only modestly narrows the gap to a full-information benchmark.

Comparison between post-intervention calibration/auction outcomes and a full-information benchmark reported in the paper, showing only modest improvement.

high mixed MarketBench: Evaluating AI Agents as Market Participants remaining gap between post-intervention outcomes and full-information benchmark ...

Provisioned Throughput delivers the lowest latency at low concurrency but saturates its reserved capacity above approximately 20 concurrent users.

Empirical measurements from the instrumented system across concurrency up to 50 users and tier comparisons; the paper reports the observed saturation point near ~20 concurrent users.

high mixed Latency and Cost of Multi-Agent Intelligent Tutoring at Scal... response time (latency) and saturation threshold (concurrency where reserved cap...

Delegating tasks to genAI can be individually beneficial in the short term even as widespread adoption degrades future model performance (creating a social dilemma).

Result of the paper's behavioral model showing an individual-level incentive to use genAI versus a collective cost from adoption (theoretical/model-based; no empirical sample reported in abstract).

high mixed Generative artificial intelligence reduces social welfare th... individual short-term benefit vs future model performance (collective welfare)

Token usage is highly variable and inherently stochastic: runs on the same task can differ by up to 30x in total tokens.

Observed run-to-run variability in total token counts for identical tasks across the collected agentic trajectories from eight frontier LLMs on SWE-bench Verified.

high mixed How Do AI Agents Spend Your Money? Analyzing and Predicting ... run-to-run variability in total token consumption for the same task

ASC (adaptive stopping criterion) halts harmful refinement but incurs a 3.8 pp confidence-elicitation cost.

Reported experiment with ASC showing that it prevents harmful iterative refinement yet causes a measured cost described as 3.8 percentage points due to confidence elicitation.

high mixed When Does LLM Self-Correction Help? A Control-Theoretic Mark... trade-off between stopping harmful refinement and a confidence-elicitation cost ...

Only o3-mini (+3.4 pp, EIR = 0%), Claude Opus 4.6 (+0.6 pp, EIR ~ 0.2%), and o4-mini (+/-0 pp) remain non-degrading under self-correction; GPT-5 degrades by -1.8 pp.

Reported measured changes in accuracy (percentage-point changes) and measured EIR values for the named models after applying iterative self-correction across the experiment suite.

high mixed When Does LLM Self-Correction Help? A Control-Theoretic Mark... accuracy change from self-correction

Across 7 models and 3 datasets (GSM8K, MATH, StrategyQA), we find a sharp near-zero EIR threshold (<= 0.5%) separating beneficial from harmful self-correction.

Empirical experiments reported across 7 LLMs and 3 benchmark datasets (GSM8K, MATH, StrategyQA) comparing outcomes of iterative self-correction as a function of measured EIR.

high mixed When Does LLM Self-Correction Help? A Control-Theoretic Mark... accuracy change from self-correction as a function of EIR

Firms with a high market position tend to imitate the peer leader, whereas firms in middle and low market positions are more likely to follow the peer group.

Heterogeneity analysis / subgroup regressions in fixed-effects models on panel data of publicly listed Chinese firms (2012–2023), stratifying firms by market position (high, middle, low).

high mixed Following the Herd or the Bellwether: Peer Effects in Firms’... focal firm AI adoption level (differential peer influence by firm market positio...

AI influences innovation performance in organizations.

Discussion and synthesis of studies and reports on AI adoption and innovation performance presented in the review.

high mixed The Impact of AI on Employability and Evolving Job Roles of ... innovation performance

AI adoption is producing organizational implications, including changes in project management practices.

Findings synthesized from conference papers, case studies and industry reports included in the review.

high mixed The Impact of AI on Employability and Evolving Job Roles of ... project management practices / organizational processes

Automation, generative AI, and intelligent systems are reshaping task structures, leading to both job displacement risks and the creation of new AI-driven roles.

Synthesis of empirical studies, conference findings, and industry reports reporting both displacement risks and new role emergence (review paper).

high mixed The Impact of AI on Employability and Evolving Job Roles of ... job displacement and role creation

AI is rapidly transforming the nature of work, the demand for skills, and the professional roles of Information Technology (IT) practitioners.

Stated as a synthesis result from a narrative review of recent empirical studies, conference findings, and industry reports (review paper).

high mixed The Impact of AI on Employability and Evolving Job Roles of ... demand for skills / professional roles

Semiconductors are a representative case study for analyzing weaponized interdependence in advanced technology sectors.

Methodological claim in the paper: selection and focus on the semiconductor sector as illustrative of broader advanced-technology sector dynamics under export restraints and chokepoint activation.

high mixed Weaponized Interdependence and Dynamics of Partial Decouplin... suitability of semiconductors as a representative sector for studying weaponized...

Previous literature is based primarily on the short-term effectiveness of coercion; this paper shifts attention to the longer-term structural consequences of technological restraints.

Literature review and positioning in the paper contrasting prior studies' short-term focus with the paper's longer-term structural emphasis (methodological/literature-critique claim).

high mixed Weaponized Interdependence and Dynamics of Partial Decouplin... scholarly framing of effects of technological coercion (short-term vs. long-term...

Over time, U.S.–China reaction–counterreaction interactions generate three structural transformations: supply-chain reconfiguration, substitution, and regulations reinforcing segmentation.

Synthesis from the paper's longitudinal/case-analysis of semiconductor-related export restraints and subsequent industry and regulatory responses (qualitative identification of three emergent structural outcomes).

high mixed Weaponized Interdependence and Dynamics of Partial Decouplin... structural transformations in technology supply chains and regulatory regimes

Current instability in U.S.–China relations arises less from complete ideological divergence or failure of outright containment policy than from a structured reaction–counterreaction dynamic triggered by chokepoint activation.

Argument based on qualitative analysis of U.S. export restraints after the first Trump administration and application of the 'weaponized interdependence' framework to advanced-technology sectors (paper's theoretical argument and case discussion).

high mixed Weaponized Interdependence and Dynamics of Partial Decouplin... primary driver(s) of instability in U.S.–China technological relations

AIGC is reshaping the rights and obligations of platforms and workers.

Argument in the paper describing legal and practical impacts of AIGC on platform-worker relationships; based on doctrinal/legal analysis and discussion of platform practices rather than reported quantitative empirical data.

high mixed AIGC+ Determination of Labor Relations in the Context of the... rights and obligations (legal status)

The study explores implications of algorithmic enterprises for competitive advantage, labour markets, and regulatory policy.

Declared scope of the paper in the abstract; exploration is conceptual and analytical rather than reporting empirical findings or quantified effects.

high mixed Algorithmic Enterprises: Rethinking Firm Strategy in the Age... implications for firm competitive advantage, labour market outcomes, and policy

Survey evidence suggests public attitudes towards AI combine optimism with apprehension, and most respondents oppose granting AI systems final authority over hiring and dismissal decisions.

Review cites multiple public opinion and survey studies reporting mixed (optimistic and apprehensive) attitudes and opposition to AI final authority in employment decisions (survey evidence summarized).

high mixed From Technological Substitution to Institutional Response: A... public attitudes toward AI and policy preferences (authority in hiring/dismissal...

« Prev 1 2 3 4 5 … 232 233 Next »