Evidence (4503 claims)

Topics:
- Adoption: 5227 claims
- Productivity: 4503 claims
- Governance: 4100 claims
- Human-AI Collaboration: 3062 claims
- Labor Markets: 2480 claims
- Innovation: 2320 claims
- Org Design: 2305 claims
- Skills & Training: 1920 claims
- Inequality: 1311 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 373 | 105 | 59 | 439 | 984 |
| Governance & Regulation | 366 | 172 | 115 | 55 | 718 |
| Research Productivity | 237 | 95 | 34 | 294 | 664 |
| Organizational Efficiency | 364 | 82 | 62 | 34 | 545 |
| Technology Adoption Rate | 293 | 118 | 66 | 30 | 511 |
| Firm Productivity | 274 | 33 | 68 | 10 | 390 |
| AI Safety & Ethics | 117 | 178 | 44 | 24 | 365 |
| Output Quality | 231 | 61 | 23 | 25 | 340 |
| Market Structure | 107 | 123 | 85 | 14 | 334 |
| Decision Quality | 158 | 68 | 33 | 17 | 279 |
| Fiscal & Macroeconomic | 75 | 52 | 32 | 21 | 187 |
| Employment Level | 70 | 32 | 74 | 8 | 186 |
| Skill Acquisition | 88 | 31 | 38 | 9 | 166 |
| Firm Revenue | 96 | 34 | 22 | — | 152 |
| Innovation Output | 105 | 12 | 21 | 11 | 150 |
| Consumer Welfare | 68 | 29 | 35 | 7 | 139 |
| Regulatory Compliance | 52 | 61 | 13 | 3 | 129 |
| Inequality Measures | 24 | 68 | 31 | 4 | 127 |
| Task Allocation | 71 | 10 | 29 | 6 | 116 |
| Worker Satisfaction | 46 | 38 | 12 | 9 | 105 |
| Error Rate | 42 | 47 | 6 | — | 95 |
| Training Effectiveness | 55 | 12 | 11 | 16 | 94 |
| Task Completion Time | 76 | 5 | 4 | 2 | 87 |
| Wages & Compensation | 46 | 13 | 19 | 5 | 83 |
| Team Performance | 44 | 9 | 15 | 7 | 76 |
| Hiring & Recruitment | 39 | 4 | 6 | 3 | 52 |
| Automation Exposure | 18 | 16 | 9 | 5 | 48 |
| Job Displacement | 5 | 29 | 12 | — | 46 |
| Social Protection | 19 | 8 | 6 | 1 | 34 |
| Developer Productivity | 27 | 2 | 3 | 1 | 33 |
| Worker Turnover | 10 | 12 | — | 3 | 25 |
| Creative Output | 15 | 5 | 3 | 1 | 24 |
| Skill Obsolescence | 3 | 18 | 2 | — | 23 |
| Labor Share of Income | 8 | 4 | 9 | — | 21 |
Productivity
As technological progress devalues labor, the welfare benefits of steering first rise but, beyond a critical threshold, decline, and optimal policy shifts toward greater redistribution.
Theoretical model extension analyzing planner's optimal choice as labor's economic value changes; the paper states a non-monotonic relationship with a critical threshold.
Using pre-existing exposure as an instrument for ChatGPT adoption in a long-difference IV design, ChatGPT adoption causes households to spend more time on digital leisure activities while leaving total time spent on productive online activities unchanged.
IV long-difference empirical design: instrumenting household adoption with pre-ChatGPT exposure (2021 browsing); outcome measured as changes in categorized browsing durations (LLM-based classification into 'leisure' vs 'productive' sites); controls include demographic-by-region fixed effects and browsing composition controls.
Once efficiency is made explicit, the main practical question becomes how many efficiency doublings are required to keep scaling productive despite diminishing returns.
Framing/forecasting claim in the paper presenting an operational research question (conceptual; no empirical sample in excerpt).
The practical burden of scaling depends on how efficiently real resources are converted into that (logical) compute.
Argument in the paper linking conceptual 'logical compute' to real-world conversion efficiency (qualitative claim; no empirical sample in excerpt).
The compute variable is best understood as logical compute, an implementation-agnostic notion of model-side work.
Conceptual argument presented in the paper reframing 'compute' as an abstract, implementation-agnostic quantity (no empirical sample provided).
These patterns are consistent with a reorganization of the scientific production process rather than immediate efficiency gains, in line with theories of general-purpose technologies.
Interpretation linking observed changes in budget allocation, team size, and task breadth (from the proposal dataset and task-level analyses) to theoretical predictions about general-purpose technologies (GPTs); empirical findings show organizational change rather than large average short-run productivity gains.
This paper offers a forward-looking framework that emphasizes the decentralizing potential of AI on labor markets, moving beyond the traditional displacement-versus-creation dichotomy.
Paper's stated contribution; based on conceptual framework and synthesis of historical and contemporary analyses (no empirical validation presented in the abstract).
The emergence of artificial intelligence and robotics is catalyzing a profound transformation in the nature of human labor.
Stated as a central premise in the paper's abstract; supported by the paper's synthesis of economic history, contemporary labor market data, and analysis of digital platform growth (no specific datasets or sample sizes reported in the abstract).
The resulting AI safety profile is asymmetric: AI is bottlenecked on frontier research (novel tasks) but unbottlenecked on exploiting existing knowledge.
Theoretical implication of the novelty-bottleneck model distinguishing novel (human-judgment) vs. routine (covered by agent prior) components of tasks.
Wall-clock time can be reduced to O(√E) through team parallelism, but total human effort remains O(E).
Model-derived result showing parallelism across humans can speed wall-clock completion time while aggregate human effort does not drop asymptotically.
Better agents improve the coefficient on human effort but not the exponent (i.e., they reduce the constant factor but do not change the asymptotic scaling class).
Analytic result from the stylized model under the paper's assumptions about task decomposition and novelty fraction ν.
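One standard way an $O(\sqrt{E})$ wall-clock bound with $\Theta(E)$ total effort can arise is a linear coordination cost in team size. The excerpt does not specify the paper's actual mechanism, so the following is only an illustrative sketch with placeholder symbols ($E$ units of novel human work, team of $k$ humans, assumed per-person coordination overhead $c\,k$):

```latex
T(k) = \frac{E}{k} + c\,k, \qquad
k^{*} = \sqrt{E/c}, \qquad
T(k^{*}) = 2\sqrt{cE} = O(\sqrt{E}),
```

with total human effort $k^{*}\,T(k^{*}) = 2E = \Theta(E)$: parallelism compresses wall-clock time but not aggregate effort, and a better agent (smaller effective $E$ or $c$) improves the constants without changing the scaling class.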
India's Systematic Investment Plan (SIP) flows provide a high-frequency observable for the model's endogenous participation rate and constitute the natural empirical laboratory for the displacement–participation mechanism.
Empirical suggestion in the paper proposing SIP flows as an observable proxy for the modelled participation rate and recommending India as a lab to test the displacement–participation channel (no empirical test reported in the excerpt).
Three analytical results characterise non-linear financial fragility, regime-contingent risk premium divergence, and the general equilibrium alignment squeeze.
Stated analytical results in the paper derived from the theoretical model describing three named phenomena (non-linear fragility, regime-contingent divergence, alignment squeeze).
Whether AI is equity-bullish or equity-bearish depends on which channel dominates—a condition that differs sharply between deep financial markets, where the ARP is the dominant driver of elevated risk premia (Regime D), and shallow markets, where participation compression dominates (Regime E).
Model regime analysis in the paper distinguishing Regime D (deep markets, ARP-dominated) and Regime E (shallow markets, participation-compression-dominated) and stating comparative dominance determines net bullish/bearish outcome.
The equilibrium equity risk premium decomposes into three additively separable terms corresponding to these three channels (Proposition 1).
Formal proposition (Proposition 1) in the paper deriving an additive decomposition of the equilibrium ERP into the productivity, participation compression, and alignment risk terms.
We develop a heterogeneous-agent framework in which AI-driven labour displacement affects the equity risk premium (ERP) through three co-equal channels.
Stated model contribution in the paper: a theoretical heterogeneous-agent framework that posits three channels linking AI-driven labour displacement to the ERP (productivity, participation compression, alignment risk).
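Schematically, the additive decomposition in Proposition 1 can be written as follows (placeholder notation, not the paper's symbols):

```latex
\mathrm{ERP} \;=\;
\underbrace{\Pi_{\text{prod}}}_{\text{productivity channel}}
\;+\;
\underbrace{\Pi_{\text{part}}}_{\text{participation compression}}
\;+\;
\underbrace{\Pi_{\text{align}}}_{\text{alignment risk}}
```

Regime D versus Regime E then corresponds to which of the last two terms dominates in deep versus shallow markets.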
The top four models are statistically indistinguishable (mean score 0.147–0.153) while a clear tier gap separates them from the remaining four models (mean score ≤ 0.113).
Reported mean performance scores across 8 models and statement of statistical indistinguishability for the top four vs lower-tier four; numerical means provided.
Behavioral factors — specifically trust calibration, cognitive load, and affective reactions — shape the transition of corporate AI initiatives from pilot deployments to scalable, sustained use.
Synthesis of human-AI interaction literature integrated with adoption frameworks (TAM and TOE); conceptual linkage rather than new empirical testing in this paper.
AI accelerates value-chain maturation while creating distinct risks — including professional responsibility tensions and potential system-level externalities.
Conceptual argument and risk analysis in the Article (theoretical reasoning and synthesis of management/ethics literature). No empirical causal estimate reported in the excerpt.
The legal profession is at a crossroads, caught between intensifying fears of AI-driven displacement and a generational opportunity for transformation.
Author's synthesis and framing in the Article (conceptual assessment; literature/contextual synthesis). No empirical sample or experiment reported in the excerpt.
This advantage is contingent upon robust AI governance, ethical frameworks, and the transition from 'pilot-lite' projects to integrated, data-driven 'AI-first' business models.
Conditional claim in the paper linking success to governance, ethics, and organizational integration; appears to be normative/analytical rather than empirical in the abstract.
Machine-readable metrics and open scholarly infrastructure are reshaping scholarly profiles and incentives.
Conceptual and historical discussion referring to platforms and metrics (e.g., arXiv, Google Scholar, ORCID) as mechanisms changing incentives; no new empirical estimates provided.
That interconnected ecosystem is fundamentally restructuring who can do science (access), how fast discoveries propagate, and what counts as a valid scientific contribution.
Argumentative claim linking infrastructural and tool changes to changes in access, dissemination speed, and norms of contribution. The paper presents examples and narrative but no systematic empirical evaluation or sample.
The most consequential development is not any single tool but the emergence of an interconnected ecosystem—AI agents, preprint platforms, open source codebases, and citation infrastructure—that forms a feedback loop.
Synthesis/argument based on multiple examples (LLM agents, preprint servers like arXiv, open-source code repositories, citation indices). No quantitative measurement or causal identification reported.
The central tension in AI for science is between automation (building systems that replace human researchers) and augmentation (tools that amplify human creativity and judgement).
Analytical claim based on the paper's review of historical examples and conceptual discussion; no primary data or experimental design reported.
Science has repeatedly delegated its bottlenecks to machines—first inference, then search, then measurement, then the full workflow—and each delegation solves one problem while exposing a harder one underneath.
Interpretive historical argument drawing on examples across AI-for-science milestones (e.g., DENDRAL, search and inference systems, measurement automation, and contemporary end-to-end workflows). No quantitative sample or experimental method reported.
Testing revealed AI excels at computational tasks but consistently misses nuanced factors like new construction rent premiums and infrastructure proximity impacts, validating the framework's hybrid structure as essential for professional-grade underwriting.
Findings from the controlled ChatGPT-4 test on the single 150-unit scenario: qualitative and comparative observations showing AI handled computations well but failed to capture specific local-market nuances, leading authors to endorse a hybrid human-AI framework.
Phase Two requires human-led professional validation to correct AI limitations, apply local market knowledge, and integrate risk factors.
Framework description supported by observations from the controlled test where human review was used to correct AI outputs and apply local knowledge (e.g., adjusting for nuanced market factors).
Traffic performance is sensitive to the distribution of safe time gaps and the proportion of RL vehicles.
Simulation results comparing Fundamental Diagrams across scenarios with different distributions of safe time gaps and shares of RL-controlled vehicles. Number of simulation runs or replicates not stated in the claim text.
AUROC_2 and M-ratio produce fully inverted model rankings, demonstrating these metrics answer fundamentally different evaluation questions.
Metric comparison across models showing that AUROC_2-based ranking and M-ratio-based ranking are fully inverted in the reported results on the evaluated dataset.
Temperature manipulation shifts Type-2 criterion while meta-d' remains stable for two of four models, dissociating confidence policy from metacognitive capacity.
Experimental manipulation (temperature changes) applied to models; reported result that Type-2 criterion shifted with temperature while meta-d' was stable for two models (out of four) in the 224,000-trial dataset.
Metacognitive efficiency is domain-specific, with different models showing different weakest domains, invisible to aggregate metrics.
Domain-level analyses reported in the paper showing per-domain M-ratio results and identification of different weakest domains per model, contrasted with aggregate metric behavior.
Metacognitive efficiency varies substantially across models even when Type-1 sensitivity is similar — Mistral achieves the highest d' but the lowest M-ratio.
Empirical comparison of Type-1 sensitivity (d') and metacognitive efficiency (M-ratio) across the four evaluated LLMs on the 224,000 QA trials; explicit statement that Mistral had highest d' but lowest M-ratio.
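The Type-1 and efficiency measures contrasted above follow standard signal-detection definitions: d' is computed from hit and false-alarm rates, and M-ratio is meta-d' relative to d' (estimating meta-d' itself requires fitting a model to confidence ratings, which is omitted here). A minimal sketch; the rates and meta-d' value below are illustrative, not from the paper:

```python
from statistics import NormalDist

z = NormalDist().inv_cdf  # inverse standard-normal CDF

def d_prime(hit_rate: float, fa_rate: float) -> float:
    """Type-1 sensitivity: d' = z(H) - z(FA)."""
    return z(hit_rate) - z(fa_rate)

def m_ratio(meta_d: float, d: float) -> float:
    """Metacognitive efficiency: meta-d' / d'."""
    return meta_d / d

# Illustrative: a model can pair high Type-1 sensitivity with low
# metacognitive efficiency (the Mistral pattern reported above).
d = d_prime(0.85, 0.25)
eff = m_ratio(1.0, d)  # meta-d' = 1.0 assumed for illustration
```

An M-ratio below 1 means confidence ratings carry less information than first-order accuracy would permit, which is why d' and M-ratio can rank models differently.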
The paper's primary contribution is to combine established ingredients—attention scarcity, free-entry dilution, superstar effects, and preferential attachment—into a unified framework directed at claims about AI-enabled entrepreneurship.
Stated contribution and methodological description in the paper (synthesis and applied formalisation); this is a descriptive/methodological claim rather than an empirical result.
Modern pretrained time-series foundation models can forecast without task-specific training, but they do not fully incorporate economic behavior.
Statement in paper's introduction/abstract summarizing prior capabilities and limitations of pretrained time-series foundation models (no experimental sample or numeric evidence provided in the excerpt).
The governance risk-mitigation effects of AI operate through increasing financial risk exposure.
Authors' mechanism tests indicate a relationship between AI adoption and changes in financial risk exposure measures, which they interpret as a channel affecting executive behavior.
Organizational culture and technological readiness moderate the effectiveness of generative AI integration in decision-making processes.
The paper reports moderation effects tested in the SEM framework using survey data from senior managers, decision-makers, and AI adoption specialists (SmartPLS). No numeric moderator effect sizes or sample size provided in the excerpt.
Small language models offer privacy-preserving alternatives to frontier models, but their specialization is hindered by fragmented development pipelines that separate tool integration, data generation, and training.
Background claim stated in paper/abstract; no experimental data provided for this statement within the abstract.
Extensive synthetic experiments show that policy regularizations reshape the narrative on what is the best DRL method for inventory management.
Paper states results from extensive synthetic experiments that change which DRL methods are considered best under policy regularization; abstract does not provide the experimental sample size, specific methods, or quantitative comparisons.
Implementation of human-replacing technologies leads to significant transformations in skill demand: it reduces reliance on low-skilled labour while increasing demand for qualified engineers, system operators and specialists in digital technologies.
Sector-specific analysis and review of international labour-market studies cited in the article documenting skill-biased effects of automation and digitalization; qualitative assessment for Ukraine's mining and metallurgical sector under workforce shortage conditions.
Foreign direct investment (FDI) shows an insignificantly positive direct effect on local TFCP but a significantly negative indirect (spillover) effect, attributed to a 'pollution haven' effect.
Spatial Durbin Model estimates for FDI on panel (30 provinces, 2010–2023): direct coefficient positive but not significant; indirect coefficient significantly negative; interpretation given as pollution-haven mechanism.
Industrial intelligence exhibits regional heterogeneity: a significantly negative direct effect in the east, a significantly positive direct effect in the central region, an insignificant direct effect in the west, and positive indirect (spillover) effects in the east and west.
Regional/subsample Spatial Durbin Model analyses dividing the sample into east, central, and west regions (30 provinces, 2010–2023); reported region-specific direct and indirect coefficients and significance levels.
Industrial intelligence has an insignificantly negative direct effect on local TFCP, but its positive spatial spillover effect is significant at the 1% level, producing a significantly positive total effect.
Spatial Durbin Model results for industrial intelligence on panel (30 provinces, 2010–2023): direct coefficient negative and not statistically significant; indirect coefficient positive and significant at 1%; total effect positive and significant.
China's TFCP rose overall from 2010 to 2023 but exhibited a widening regional gap of 'higher in the east, lower in the west'.
Panel data of 30 Chinese provincial-level regions (2010–2023); TFCP measured using an undesirable-output super-efficiency SBM model and summarized temporal and spatial patterns.
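The direct/indirect (spillover) split reported for these Spatial Durbin Model results follows the standard LeSage-Pace effects decomposition, $S(W) = (I - \rho W)^{-1}(I\beta + W\theta)$: direct effects are the average diagonal of $S$, spillovers the average off-diagonal contribution. A minimal numpy sketch with a toy 4-region weight matrix and illustrative coefficients (not the paper's estimates):

```python
import numpy as np

# Toy row-standardized spatial weight matrix W (4 regions, illustrative)
W = np.array([[0., 1., 0., 1.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [1., 0., 1., 0.]])
W /= W.sum(axis=1, keepdims=True)

rho, beta, theta = 0.3, -0.05, 0.12  # illustrative SDM coefficients
n = W.shape[0]

# Effects matrix: S = (I - rho*W)^{-1} (I*beta + W*theta)
S = np.linalg.solve(np.eye(n) - rho * W, np.eye(n) * beta + W * theta)

direct = np.trace(S) / n    # average own-region effect
total = S.sum() / n         # average total effect
indirect = total - direct   # average spillover effect
```

With these toy numbers the pattern qualitatively mirrors the reported result for industrial intelligence (small negative direct effect, positive spillover, positive total effect); statistical significance cannot, of course, be illustrated this way.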
The study identifies the main AI-enabled mechanisms advancing CE principles in smart manufacturing, waste valorisation, supply-chain transparency, and sustainable design.
Bibliometric network analysis of 196 peer-reviewed articles (2023–2024) and systematic review of 104 studies, per the abstract; identification is presented as a product of these analyses.
Governmental structures, labor supply and demand, and incorporation of financial measures act as key intervening variables affecting achieved ROI from GenAI implementations.
Qualitative synthesis and theoretical analysis reported in the paper identifying contextual/intervening variables.
Generative AI serves as an effective 'wingman' for employment lawyers, capable of replacing substantial junior associate work while requiring continued human expertise for client counseling, supervision, and final legal advice preparation.
Authors' synthesis of experimental results showing AI-produced substantive analysis plus discussion about remaining limitations (e.g., citation errors) and required human oversight; qualitative assertion about substitutability for junior associate tasks.
The paper proposes new mechanisms through which big data affects individual welfare (beyond simple productivity gains), linking privacy costs, multiplier effects, and R&D transformation patterns.
Theoretical/mechanism development: the paper articulates new channels in its macro theoretical framework describing how data sharing impacts welfare via multiple mechanisms (model construction and analytic discussion; no empirical/sample validation).
Consumption is affected by the multiplier effect and the transformation patterns of R&D.
Theoretical: model analysis links consumption dynamics to a multiplier effect and to how R&D transforms inputs/outputs (comparative statics/dynamics in the theoretical framework).
Individuals’ welfare is influenced by both the privacy cost of big data sharing and their consumption levels.
Theoretical: welfare in the model is specified as a function of consumption and a privacy cost term arising from big data sharing; result follows from analytic derivation within the model (no empirical/sample data).
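The welfare specification described above can be written schematically (placeholder notation; the paper's exact functional forms are not given in the excerpt):

```latex
% Individual welfare: utility of consumption minus a privacy cost
% increasing in the amount of data shared d_i
V_i = u(c_i) - \phi(d_i), \qquad u' > 0,\ \phi' > 0,
```

where $c_i$ is consumption (itself shaped by the multiplier effect and the transformation patterns of R&D) and $\phi(d_i)$ is the privacy cost of big data sharing.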