Evidence (4793 claims)
- Adoption: 5539 claims
- Productivity: 4793 claims
- Governance: 4333 claims
- Human-AI Collaboration: 3326 claims
- Labor Markets: 2657 claims
- Innovation: 2510 claims
- Org Design: 2469 claims
- Skills & Training: 2017 claims
- Inequality: 1378 claims
Evidence Matrix
Claim counts by outcome category and direction of finding ("—" indicates no claims in that cell).
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 402 | 112 | 67 | 480 | 1076 |
| Governance & Regulation | 402 | 192 | 122 | 62 | 790 |
| Research Productivity | 249 | 98 | 34 | 311 | 697 |
| Organizational Efficiency | 395 | 95 | 70 | 40 | 603 |
| Technology Adoption Rate | 321 | 126 | 73 | 39 | 564 |
| Firm Productivity | 306 | 39 | 70 | 12 | 432 |
| Output Quality | 256 | 66 | 25 | 28 | 375 |
| AI Safety & Ethics | 116 | 177 | 44 | 24 | 363 |
| Market Structure | 107 | 128 | 85 | 14 | 339 |
| Decision Quality | 177 | 76 | 38 | 20 | 315 |
| Fiscal & Macroeconomic | 89 | 58 | 33 | 22 | 209 |
| Employment Level | 77 | 34 | 80 | 9 | 202 |
| Skill Acquisition | 92 | 33 | 40 | 9 | 174 |
| Innovation Output | 120 | 12 | 23 | 12 | 168 |
| Firm Revenue | 98 | 34 | 22 | — | 154 |
| Consumer Welfare | 73 | 31 | 37 | 7 | 148 |
| Task Allocation | 84 | 16 | 33 | 7 | 140 |
| Inequality Measures | 25 | 77 | 32 | 5 | 139 |
| Regulatory Compliance | 54 | 63 | 13 | 3 | 133 |
| Error Rate | 44 | 51 | 6 | — | 101 |
| Task Completion Time | 88 | 5 | 4 | 3 | 100 |
| Training Effectiveness | 58 | 12 | 12 | 16 | 99 |
| Worker Satisfaction | 47 | 32 | 11 | 7 | 97 |
| Wages & Compensation | 53 | 15 | 20 | 5 | 93 |
| Team Performance | 47 | 12 | 15 | 7 | 82 |
| Automation Exposure | 24 | 22 | 9 | 6 | 62 |
| Job Displacement | 6 | 38 | 13 | — | 57 |
| Hiring & Recruitment | 41 | 4 | 6 | 3 | 54 |
| Developer Productivity | 34 | 4 | 3 | 1 | 42 |
| Social Protection | 22 | 10 | 6 | 2 | 40 |
| Creative Output | 16 | 7 | 5 | 1 | 29 |
| Labor Share of Income | 12 | 5 | 9 | — | 26 |
| Skill Obsolescence | 3 | 20 | 2 | — | 25 |
| Worker Turnover | 10 | 12 | — | 3 | 25 |
Productivity
The broader cognitive automation potential is roughly five times larger than visible adoption and is geographically widespread (present across all states, not only coastal hubs).
Direct comparison of the two model-derived aggregates (11.7% vs 2.2%) and spatial analysis of the Iceberg Index across ~3,000 counties and all states in the simulation.
Broader cognitive automation potential across administrative, financial, and professional services amounts to 11.7% (~$1.2 trillion).
Iceberg Index computation summing the wage-value contributions of skills that current AI capabilities can perform; based on mapping of thousands of AI tools to ~32,000 skills and the simulated 151M-agent workforce across ~3,000 counties.
Visible AI adoption concentrated in computing/technology represents about 2.2% of U.S. wage value (~$211 billion).
Model-derived visible-adoption metric computed from mapped AI tool usage in technology/computing occupations, applied to the simulated 151M-worker population and national wage data to estimate percentage and dollar value.
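In spirit, both aggregates reduce to wage-value shares over a skill inventory. The toy sketch below uses made-up numbers purely for illustration; the actual computation maps thousands of AI tools to ~32,000 skills across the simulated 151M-agent workforce.

```python
# Toy skill inventory: (wage value in $bn, AI-capable?, visibly adopted?)
# All numbers here are illustrative, not the paper's data.
skills = [
    (300.0, True, True),    # computing/tech skills with visible AI adoption
    (500.0, True, False),   # admin/financial skills AI could perform today
    (400.0, True, False),   # professional-services skills AI could perform
    (800.0, False, False),  # skills beyond current AI capability
]

total = sum(w for w, _, _ in skills)
iceberg_share = sum(w for w, capable, _ in skills if capable) / total
visible_share = sum(w for w, _, adopted in skills if adopted) / total

print(f"potential: {iceberg_share:.1%}  visible: {visible_share:.1%}  "
      f"ratio: {iceberg_share / visible_share:.1f}x")
```

With the paper's estimates (11.7% vs 2.2%), the same ratio is roughly 5.3x, which is the "roughly five times larger" figure above.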
Reduced labor shares disproportionately harm lower- and middle-skill workers relative to higher-skill workers, increasing distributional inequality.
Micro and firm-case analyses linking K_T exposure to occupation- and skill-level wage/employment outcomes; regressions showing heterogeneous effects across skill groups; supporting evidence from sectoral studies.
The loss of labor share and payrolls materially undermines PAYG pension sustainability and payroll-tax revenue bases under realistic adoption trajectories.
Dynamic general equilibrium overlapping-generations model calibrated and simulated to incorporate substitution between labor and K_T and a PAYG pension sector; fiscal simulations show declining contributor bases and pressure on pension balances; sensitivity analyses across adoption speeds.
Wages for workers in K_T‑intensive firms/industries fall or grow more slowly relative to less-exposed counterparts, compressing wage contributions to income.
Panel regressions estimating wage outcomes conditional on K_T intensity measures, with controls and robustness specifications; supported by matched employer‑employee microdata in case studies and industry-level decompositions.
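The mechanism behind the labor-share and wage claims can be shown with a toy example, not the paper's calibrated OLG model: under a CES technology in which automation capital K_T and labor are gross substitutes, the labor share falls as K_T accumulates (parameter values here are assumptions chosen for illustration).

```python
# Toy CES economy: Y = (a*K_T**r + (1-a)*L**r)**(1/r), with r in (0,1)
# so K_T and labor are gross substitutes. Labor is paid its marginal product.
a, r, L = 0.4, 0.6, 1.0  # illustrative parameter values, not calibrated

def labor_share(K_T):
    Y = (a * K_T**r + (1 - a) * L**r) ** (1 / r)
    w = (1 - a) * L ** (r - 1) * Y ** (1 - r)  # marginal product dY/dL
    return w * L / Y

for K_T in (0.5, 1.0, 2.0, 4.0):
    print(f"K_T={K_T}: labor share {labor_share(K_T):.2f}")
```

As K_T grows, the wage bill's share of output declines monotonically, which is the channel the pension and payroll-tax simulations above build on.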
Significant implementation hurdles—chronic infrastructure gaps, weak data governance, severe digital skills shortages, high initial investment costs, and organizational inertia—create a 'pilot trap' that prevents successful AI pilots from scaling.
Qualitative findings from interviews/case studies in the mixed-methods research detailing recurring barriers to scaling AI projects in large enterprises and across the sector.
Strict oversight requirements for GLAI could raise fixed compliance costs (audit, certification, human-in-the-loop processes), benefiting incumbent firms, raising barriers to entry, and potentially reducing competition.
Regulatory economics argument drawing on compliance-cost logic and market structure effects; no empirical entry-cost analysis or case studies.
Perception of increased legal risk and regulatory uncertainty may slow adoption of GLAI and redirect investment toward safer subfields (verification tools, retrieval-augmented systems, formal-reasoning hybrids).
Economic reasoning and market-design argumentation based on risk/uncertainty dynamics; no econometric or survey data presented.
Divergent regulatory regimes (e.g., strict EU rules vs. looser regimes elsewhere) may produce regulatory arbitrage, influencing where GLAI companies locate, invest, and trade internationally.
Cross-jurisdictional regulatory analysis and economic inference about firm behavior under differential regulation; no firm-level relocation data provided.
The positive macroeconomic effects of AI are severely limited by structural issues, notably large petroleum import volumes and the fiscal burden of incomplete fuel subsidy reforms.
Integrated quantitative analysis showing that operational savings are outweighed by import volumes and subsidy fiscal costs; contextual fiscal data cited (fuel subsidy reform peak).
The authors identify concrete training gaps in current models: delegation, scoped execution, and mode switching are absent from current training data, and their absence limits splitting models into manager and worker roles.
Authors' diagnosis based on experimental outcomes and qualitative reasoning about model training distributions; recommendation for future training focus.
Interpretive, ad-hoc human-centered evaluation practices (e.g., “vibe checks”, team sense-making) are rational adaptations to LLM behavior rather than merely sloppy or inferior methodological choices.
Authors' interpretive argument based on interview evidence where practitioners explained why such practices persist and how they serve sense-making for unpredictable model behavior.
The possibility of strategic argument construction (gaming) motivates governance needs: standards for provenance, certification, and liability rules.
Policy recommendation based on anticipated incentive problems; no empirical governance evaluations.
Standard GDP statistics can mask AI-driven demand shortfalls; central banks and statistical agencies should therefore monitor labor-share–velocity links, distributional income measures, and consumption by income quantile in addition to headline GDP.
Theoretical Ghost GDP channel and calibration results showing divergence between measured GDP and consumption-relevant income; policy recommendation follows from those model results.
AI changes the nature of capital (digital/algorithmic assets) and complicates productivity accounting; researchers should decompose firm-level productivity gains into AI technology, complementary organizational capital, and human capital effects.
Theoretical proposal grounded in productivity accounting literature and conceptual discussion; no single decomposition empirical result presented.
Conventional productivity statistics and standard evaluation methods may undercount benefits from conversational initiation assistance; new survey and administrative measures might be needed.
Policy and measurement recommendation based on the conceptual model; no empirical measurement validation provided.
Policy and governance issues become salient: liability, IP, security, and certification of AI-generated code require new standards for provenance, testing, and accountability.
Argument based on practitioner-raised concerns about security, IP, and provenance in the Netlight study; authors recommend policy attention; no legal/regulatory analysis or empirical policy evaluation provided.
International shipping produces approximately 3% of global greenhouse gas emissions.
Contextual statement in the paper citing external estimates (specific source not provided in the excerpt).
Output quality saturates at approximately seven governed memories per entity.
Empirical analysis reported in the controlled experiments showing output quality vs. number of governed memories per entity, with saturation near seven memories.
The report provides scenario-based forecasts for HACCA emergence across near-, mid-, and long-term timelines, identifying capability thresholds to monitor.
Capability trajectory assessment combining trends in AI capabilities, automation of software tasks, computation availability, and diffusion dynamics; scenario and expert-judgment approach (qualitative forecasting).
An interpretable logistic-regression model, calibrated with isotonic regression, produces well-calibrated, individual-level attrition probabilities suitable for policy simulation.
Modeling pipeline: logistic regression for prediction, isotonic regression for calibration; authors report strong predictive performance and well-calibrated probabilities (specific performance metrics not included in the provided summary).
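The calibration step of such a pipeline can be sketched in a few lines. This is a minimal pool-adjacent-violators (PAV) implementation of isotonic calibration for illustration only; it is not the authors' pipeline, which also includes the logistic-regression predictor and reported performance metrics.

```python
def isotonic_calibrate(scores, labels):
    """Pool Adjacent Violators: fit a nondecreasing map from raw scores
    to calibrated probabilities, returned in the original sample order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    blocks = []  # each block: [label_sum, count, member_indices]
    for i in order:
        blocks.append([labels[i], 1, [i]])
        # Pool while the previous block's mean exceeds the current one's
        # (cross-multiplied to avoid float division in the comparison).
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            s, n, idx = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
            blocks[-1][2] += idx
    out = [0.0] * len(scores)
    for s, n, idx in blocks:
        for i in idx:
            out[i] = s / n  # each member gets its block's mean label
    return out

print(isotonic_calibrate([0.1, 0.4, 0.2, 0.3], [0, 1, 1, 0]))
# → [0.0, 1.0, 0.5, 0.5]
```

The fitted values are nondecreasing in score order, which is what makes the resulting probabilities well calibrated for downstream policy simulation.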
A Sankey diagram of thematic evolution shows lexical convergence over time and indicates that a small set of authors has disproportionate influence in structuring the discourse.
Thematic evolution analysis visualized with a Sankey diagram; author influence inferred from performance trends (citations/publication counts) in the bibliometric data.
This paper is one of the first systematic reviews focused specifically on NLP in bank marketing, organizing findings along the customer journey and the marketing mix to provide a practical taxonomy.
Authors' stated novelty claim based on the scoped literature search (2014–2024) and topical focus; novelty inferred from the small number of prior papers identified at the intersection.
Productivity gains from AI may be under- or mis-measured if national accounts and tax systems do not adjust for AI-driven quality changes in services.
Analytic observation in the paper's measurement and externalities discussion; not empirically tested within the study.
ToM alignment matters less (i.e., misalignment has smaller effect) in settings with explicit coordination protocols, strong signaling, or standardized conventions.
Analyses and experiments described in the paper showing smaller performance differences between matched and mismatched ToM orders when explicit conventions or reliable signals are available; reported as part of robustness/conditional analyses.
Manipulating costs and benefits of observation versus action in experiments can probe the switching behavior driven by System M.
Proposed experimental manipulation; no empirical data presented.
Ablation studies disabling System M, or decoupling Systems A and B, would help test whether meta-control provides empirical benefits.
Suggested experimental design (ablation study) in the methods section; no results provided.
The authors will publicly release the benchmark, code, and pre-trained models.
Statement in the paper (release/availability section) announcing plans to publish benchmark, code, and pre-trained models.
The study is the first empirical investigation of human–AI assistance in a live CTF setting with a direct comparison to autonomous AI agents on the same fresh challenges.
Authors' positioning of their work as novel; methodology involved a live onsite CTF, instrumentation of human–AI interactions (41 participants), and direct benchmarking of four autonomous agents on the same fresh challenge set.
This is the first study to compare human–human and human–AI collaboration outcomes for temporary virtual tasks from employees’ perspective in an applied service-industry context.
Author-stated novelty claim in the paper (based on study design: online experiment with retail employees examining temporary, virtual teamwork).
Measuring AI's contribution to productivity and coordination effects will be challenging; new metrics (e.g., coordination time per task, error/rework rates attributable to communication lapses) are required.
Conceptual argument and recommended measurement agenda in the paper; no empirical testing of proposed metrics provided.
Many early-stage AI advances have not translated into higher Phase II/III success rates.
Synthesis of reported outcomes and failures from industry experience; no new systematic statistical analysis provided.
After roughly a decade of adoption in large biopharma, AI has not yet changed late-stage (Phase II/III) clinical success rates.
Qualitative assessment of industrywide experience and reported outcomes; statement based on narrative review rather than systematic, long-run quantitative analysis or causal estimates.
Three primary adoption archetypes in large pharma are (1) partnership-driven acceleration, (2) culture-centric transformation, and (3) production-first democratization.
Conceptual classification in the editorial derived from trends and illustrative examples rather than empirical survey or sampling; no quantitative validation provided.
AI adoption is not associated with significant changes in operating costs.
Analysis of operating costs in firm financials showing no significant post-adoption change for adopters relative to nonadopters.
The innovation effects of AI adoption are not concentrated among larger firms, financially unconstrained firms, or high-tech firms.
Heterogeneity tests across firm size, financial constraint status, and industry technology intensity showing no concentration of effects in these groups (as reported in the paper).
SWE-Skills-Bench is the first requirement-driven benchmark that isolates the marginal utility of agent skills in real-world software engineering (SWE).
Authors present a new benchmark designed to evaluate marginal utility of skills; benchmark pairs skills with repositories and requirement documents and is described as requirement-driven and focused on isolating marginal utility.
Collaborative ability is distinct from individual problem-solving ability.
Model-based estimates from the Bayesian IRT framework that separately parameterize collaborative ability and individual problem-solving ability, with results indicating they are separable constructs (analysis on n = 667 benchmark data).
A complexity-aware routing mechanism selectively activates planning for complex queries, ensuring optimal resource allocation during online serving.
Method description in the paper explaining adaptive online serving and complexity-aware routing; evaluated in serving experiments.
AI has not yet significantly promoted university–industry collaborative R&D capabilities.
Mechanism analysis in the paper testing the university–industry collaborative R&D channel and reporting no statistically significant effect of AI adoption on that capability in the sample.
This study empirically tests a theoretically acknowledged but rarely tested relationship (AI adoption → performance conditional on structural constraints) in an emerging-economy setting.
Literature gap claim supported by the authors' review and execution of an empirical test using survey data from 280 Tunisian SMEs and PLS-SEM.
Institutional conditions do not exert a significant moderating influence on the relationship between AI adoption and firm performance in this sample.
PLS-SEM moderation tests on the 280 Tunisian SMEs found the institutional-environment moderator to be non-significant.
Key limitations in the literature include methodological heterogeneity, scarce safety data, and a focus on non-acute settings.
Authors' appraisal of the included studies as reported in the discussion section.
Unemployment does not exert a statistically significant effect on GDP growth in the estimated model.
Unemployment included among the macroeconomic determinants in the panel regressions but reported as statistically insignificant (no effect) in the provided summary; methods cited include OLS, FE, Difference and System GMM (sample details not included).
Previous studies have identified language barriers as impediments to labor-market engagement, but empirical evidence assessing both policy-driven reductions in those barriers and the relative efficacy of professional, AI-assisted, and hybrid translation methods is scarce.
Paper's literature review claim that existing literature documents language barriers but lacks comparative empirical evaluations of policy reductions and multiple translation models; asserted as motivation for current study.
Translation verified against existing performance implementations achieves throughput parity with MJX (1.04x) for HalfCheetah JAX.
Benchmarking HalfCheetah implemented in the translated backend versus MJX, reporting a 1.04x throughput ratio (approximate parity).
Levers such as raising taxes, reforming pensions, and boosting productivity interact with one another through feedback loops and time delays that are not yet well understood.
Literature and model motivation stated in the paper; the integrated model is built to capture such interactions and delays.
These efficiency and cost gains are achieved while maintaining accuracy parity with the matched hierarchical baseline.
Paper states accuracy parity was maintained in the empirical evaluation comparing the proposed framework to the matched hierarchical baseline on the 2,847-query testbed.
The short‑term effect of AI on labor‑intensive industries is weak.
Short‑run/dynamic subgroup analysis in the China 2003–2017 panel indicating minimal or weak immediate growth effects for labor‑intensive sectors.