Evidence (13870 claims)
Adoption
8467 claims
Productivity
7558 claims
Governance
6805 claims
Human-AI Collaboration
6363 claims
Org Design
4132 claims
Innovation
4065 claims
Labor Markets
3526 claims
Skills & Training
2945 claims
Inequality
2066 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 749 | 196 | 98 | 892 | 1984 |
| Governance & Regulation | 817 | 394 | 188 | 121 | 1544 |
| Organizational Efficiency | 771 | 189 | 124 | 83 | 1177 |
| Technology Adoption Rate | 627 | 233 | 123 | 96 | 1088 |
| Research Productivity | 411 | 123 | 56 | 332 | 933 |
| Output Quality | 467 | 178 | 59 | 47 | 751 |
| Decision Quality | 320 | 174 | 75 | 42 | 618 |
| Firm Productivity | 435 | 55 | 88 | 20 | 604 |
| AI Safety & Ethics | 214 | 276 | 65 | 33 | 593 |
| Market Structure | 178 | 167 | 122 | 24 | 496 |
| Task Allocation | 207 | 64 | 71 | 32 | 379 |
| Skill Acquisition | 165 | 59 | 60 | 17 | 301 |
| Innovation Output | 203 | 27 | 43 | 18 | 292 |
| Employment Level | 105 | 52 | 107 | 13 | 279 |
| Fiscal & Macroeconomic | 131 | 69 | 43 | 26 | 276 |
| Consumer Welfare | 116 | 63 | 42 | 11 | 232 |
| Firm Revenue | 150 | 48 | 26 | 3 | 227 |
| Inequality Measures | 44 | 122 | 49 | 6 | 221 |
| Task Completion Time | 169 | 29 | 8 | 12 | 219 |
| Worker Satisfaction | 89 | 63 | 20 | 12 | 184 |
| Error Rate | 69 | 92 | 10 | 2 | 173 |
| Regulatory Compliance | 76 | 68 | 14 | 5 | 163 |
| Training Effectiveness | 93 | 21 | 13 | 19 | 148 |
| Wages & Compensation | 77 | 36 | 25 | 6 | 144 |
| Automation Exposure | 51 | 54 | 22 | 12 | 142 |
| Team Performance | 86 | 17 | 27 | 9 | 140 |
| Developer Productivity | 94 | 17 | 14 | 6 | 132 |
| Job Displacement | 12 | 80 | 20 | 1 | 113 |
| Hiring & Recruitment | 51 | 7 | 8 | 3 | 69 |
| Creative Output | 31 | 17 | 7 | 3 | 59 |
| Skill Obsolescence | 5 | 46 | 6 | 1 | 58 |
| Social Protection | 27 | 16 | 8 | 2 | 53 |
| Labor Share of Income | 17 | 17 | 17 | — | 51 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
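The direction counts in the matrix can be turned into simple shares for comparison across outcomes. The sketch below is only an illustration using four rows copied from the table; the function names and the "net direction" summary are assumptions introduced here, not measures the source defines. Shares are taken over the four listed direction columns, which do not always sum to the Total column (for example, Firm Productivity lists 598 claims across directions against a Total of 604).

```python
# Direction counts copied from four rows of the Evidence Matrix.
# Tuple order: (positive, negative, mixed, null)
rows = {
    "Firm Productivity": (435, 55, 88, 20),
    "AI Safety & Ethics": (214, 276, 65, 33),
    "Inequality Measures": (44, 122, 49, 6),
    "Job Displacement": (12, 80, 20, 1),
}


def direction_shares(counts):
    """Return each direction's share of the four listed direction counts."""
    total = sum(counts)
    return tuple(c / total for c in counts)


def net_direction(counts):
    """Hypothetical summary: (positive - negative) / (positive + negative).

    Ranges from -1 (all directional claims negative) to +1 (all positive);
    mixed and null claims are ignored.
    """
    positive, negative = counts[0], counts[1]
    return (positive - negative) / (positive + negative)


for outcome, counts in rows.items():
    positive_share = direction_shares(counts)[0]
    print(
        f"{outcome}: positive share {positive_share:.2f}, "
        f"net direction {net_direction(counts):+.2f}"
    )
```

A summary like this only describes how claims are distributed by direction; it says nothing about effect sizes or the quality of the underlying studies.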
LLM-mediated reward design can affect demographic equity in occupant comfort (i.e., LLM reward shaping has the potential to exhibit or exacerbate disparities).
Motivation and empirical demonstration in paper: initial rounds showed disparities and later rounds changed group outcomes via LLM-generated reward adjustments.
Recent evidence has shown a nuanced pattern involving task automation, role transformation, displacement risk, augmentation, and new roles.
Claim in the paper referencing unspecified recent empirical work (no specific studies or sample sizes provided in the excerpt).
At the token-probability level, the distribution shifts continuously rather than via a threshold when histories bias later judgments.
Token-level analysis reported as a follow-up: observed continuous shifts in token probability distributions rather than abrupt threshold changes.
AI informativeness mediates the relationship between AI usage and learning outcomes.
Experimental manipulation/measurement of AI informativeness in the controlled logical reasoning task and analysis linking informativeness to observed patterns in learning and performance (details and sample size not provided in abstract).
Over the years, 'fast' AI caused a considerable number of incidents, though these have declined; 'imaginative' AI began to cause incidents with the mass introduction of generative AI.
Temporal analysis of incident reports (across the dataset of 1,524 incidents) showing trends in incident attribution by AI trait over time: early concentration in 'fast' AI incidents declining and later emergence of 'imaginative' (generative) AI incidents.
Chinese SMEs exhibit a distinctive policy- and platform-mediated adoption pathway, where state-backed digitalization lowers entry barriers but creates dependencies on external ecosystems.
Synthesis of Chinese case studies and context-specific analyses among the included studies; number of China-focused studies not specified in the summary.
The authors contend that commercial AI development is closely linked to prevailing social, political, and economic circumstances, and that we need to examine that closeness.
Stated argument in the paper's framing that motivates the critical software studies approach; presented as a theoretical claim rather than supported by empirical data in the excerpt.
AI anchors will not broadly replace human anchors, but can be strategically effective when matched to efficiency-oriented (utilitarian) consumer goals.
Authors' conclusion extrapolating from moderated mediation findings in the experiment (N = 439) showing limited conditions under which AI anchors generate trust.
Productivity gains tend to be larger in controlled experimental settings and smaller in open-source and enterprise contexts.
Moderator/subgroup comparisons reported in the meta-analysis comparing controlled experiments versus open-source and enterprise study contexts; the paper explicitly reports larger effects in controlled experimental settings and smaller effects in open-source/enterprise contexts.
This boundary is not explained by scale alone: some failures respond to targeted interventions, but the effects are model-specific rather than universal.
Intervention experiments reported in the paper showing that targeted interventions fixed some model failures, but response patterns varied across models (i.e., interventions worked for some models/tasks and not others).
Architectural and training differences among VLMs may lead to distinct behavioral responses to visual priming.
Observed heterogeneity in how different state-of-the-art VLMs responded to the same visual primes and mitigations; authors suggest model architecture and training as plausible explanatory factors. (Framed as a point for further investigation rather than a proven causal finding in the abstract.)
Further contribution of AI to potential GDP is associated with a reduction in human-resource constraints and the easing of industry constraints.
Scenario projections and conditional analysis in the study which link future AI-driven GDP gains to reductions in human-resource constraints and structural industry limitations.
GPT-4o can be characterized as a cautious allocator, combining relatively favorable evaluations with conservative funding decisions.
Comparative analysis of model outputs across 20 decks and repeated runs showing GPT-4o gives relatively favorable evaluation scores but recommends more conservative funding allocations.
There are systematic and statistically significant differences across models in funding recommendations, evaluation scores, and expressed confidence.
Controlled simulation comparing three models across 20 decks with repeated runs; paper reports statistical significance of model differences in funding recommendations, evaluation scores, and expressed confidence.
After accounting for these factors, the study identifies three interconnected propositions describing how AI adoption is fundamentally restructuring knowledge work.
Paper conclusion statement that, conditional on the described data and methods, it derives three propositions about AI-driven restructuring of knowledge work (propositions not detailed in the provided abstract).
There is a strong correlation between predictive performance and spectral entropy.
Analysis of predictive performance against spectral entropy metrics computed for the datasets in the benchmark; reported correlation finding in the paper.
Digitalization changes corporate governance in German industry, prompting either atomization of inter-corporate relations in the race for technologies and skills or the formation of new forms of cooperation and coordination influenced by institutional legacies and pressures to adjust business models.
Framing of research question and synthesis of findings from the authors' M&A analysis across German industry; the provided excerpt presents this as the central empirical/theoretical tension addressed by the paper.
The value-creation process of AI adoption varies according to industry-specific complementary asset structures and ecosystem conditions, with implications for digital transformation strategies, investment decisions, and AI diffusion policy.
Interpretive conclusion drawn from heterogeneous empirical findings (differential Tobin's Q outcomes and operational performance dynamics across industries) presented by the authors.
Tokenization economics, pricing structures, and budgeting constraints materially affect the buy-versus-build decision for enterprise LLM adoption.
Analytical discussion in the paper examining tokenization costs, pricing models, and budget considerations; illustrated via the Bills Converter case study.
This restructures professional expertise, organizational communication, and how productive labor is recognized.
Theoretical implications drawn from the central thesis and cross-disciplinary evidence; no empirical measurement of restructuring provided.
Many European countries have converged with both poles (i.e., they have integrated with both the US and China).
Network analysis of cross-country collaborations and citation links from multi-decade publication data, compared to randomized baselines.
Anthropic shows low consumer-channel risk and elevated risk in enterprise coding-agent segments in the authors' comparative mapping.
Results from the stylized calibration/comparative risk mapping applied to Anthropic (April 2026 data); authors' interpretation.
That boundary tracks where they locate professional identity, suggesting that the value of AI tooling may lie as much in where and how precisely it stops as in what it does.
Authors' interpretive conclusion drawn from the thematic analysis and patterns observed in the survey responses (n=860).
Lower-skill workers exhibit higher individual productivity gains from AI tools than senior workers, but this does not automatically translate into proportional GDP capture given the skill-weighted capture rate framework applied here.
Paper statement noting distributional asymmetry, described as consistent with Cognizant's (2026) internal findings and captured by the model's skill-weighted capture rate.
Local embedding conditions shape the internal allocation of AI activity along mapped sub-technology branches, implying place-based AI innovation policy relevance.
Synthesis/conclusion from empirical findings (inverted U relationship, moderation by complexity, heterogeneity by region and city size) indicating that local context affects sub-technology allocation.
AI diffusion in China has proceeded at an uneven pace across cities.
Descriptive statements supported by the panel data on AI patenting and urban statistics showing spatial variation across Chinese cities (2014–2023).
AI effectiveness depends on staff training, ethical governance, and strategic alignment.
Commonly reported moderating factors and prerequisites across the included studies (qualitative and possibly empirical evidence across the 27 studies).
Long-term competitive performance in B2B firms is more closely associated with the organisational alignment of governance structures, innovation capabilities, and GenAI adoption than with technology adoption alone, challenging technology-deterministic assumptions.
Synthesis of PLS-SEM findings from survey data of 104 Portuguese B2B managers showing multiple organisational factors (governance, innovation orientation, GenAI adoption) jointly relate to performance and that governance was the strongest correlate.
The role of GenAI adoption is complementary rather than dominant for long-term competitive performance.
Survey of 104 Portuguese B2B managers and PLS-SEM results indicating other organisational factors (e.g., governance, innovation capabilities) have central roles alongside GenAI adoption.
The authors observed weak value misalignment in the coding models and describe how they addressed it.
Case study reports observation of value misalignment in models and reports mitigation/handling strategies (descriptive, not quantitatively evaluated in abstract).
These findings challenge the traditional Routine-Biased Technological Change (RBTC) hypothesis by showing substantial exposure among non-routine cognitive occupations.
Interpretation of cross-sectional OAI results compared to RBTC expectations (which predict routine tasks are most exposed). The paper claims empirical OAIs contradict RBTC for LLMs.
AI's career impact is organizationally mediated rather than technologically predetermined.
Interpretation/conclusion drawn from the study's survey, regression, and mediation results (empirical analyses described in paper; sample size not stated).
The platform's algorithmic content distribution mechanism can moderate the competing interests between AIGC scale and consumer preference for HGC.
Deeper analysis of distribution mechanisms reported in the paper indicating that algorithmic ranking/distribution influences how AIGC and HGC are surfaced and can therefore affect their relative reach and engagement.
We present, to the best of our knowledge, the first large-scale study of real-world conversational programming in IDE-native settings.
Authors' assertion about novelty; study scope described (analysis of messages from Cursor and GitHub Copilot across public repositories).
The observed episodic sequence of routine-job adjustments is likely shaped by technological change alongside macroeconomic and institutional forces.
Interpretation offered by authors based on timing of routine-job adjustments and contextual factors; informed by decomposition analyses but described as a likely cause.
The observed behaviors stem from a root cause: current models are trained as monolithic agents, so splitting them into director/worker roles conflicts with their training distribution; retaining each model close to its trained mode (text generation for the manager, tool use for the worker) and externalizing organizational structure to code enables the pipeline to succeed.
Qualitative analysis and interpretation of experimental results and pipeline design choices reported in the paper (comparison of different pipeline structures and model modes).
This abstraction (logical compute) helps explain both why the laws travel so well across settings and why they give rise to a persistent efficiency game in hardware, algorithms, and systems.
Paper-provided explanatory argument connecting abstraction to empirical observations of cross-setting regularity and continued efficiency-focused innovation (no numerical evidence in excerpt).
Mundlak (correlated random effects) specifications indicate that the between-country components are statistically insignificant, while within-country effects remain significant.
Results from Mundlak (correlated RE) specifications reported in abstract indicating insignificance of between-country components and significance of within-country components (no numeric coefficients for the between/within split given in abstract).
Keyword-style queries persist even among experienced users.
Analysis of query types across experience levels in the Asta dataset showing continued presence of keyword-style queries among users labeled as experienced.
Prior research has primarily focused on automating user actions through clicks and keystrokes; this paradigm overlooks human intention, where users value the ability to explore, iterate, and refine their ideas while maintaining agency.
Literature characterization and conceptual argument presented in the paper's introduction (qualitative claim based on authors' synthesis of prior work and user values).
It develops a new, evidence-based typology of AI governance models and shows that differences across countries are driven by institutional structures and not by ethical principles alone.
Authors' typology constructed from coded indices (n=24), together with their causal argument that institutional structures, rather than shared ethical language, explain cross-country differences.
These differences reflect the historically embedded political–economic institutions shaping each regime.
Interpretive causal claim linking comparative coding results to historical political-economic institutional contexts of the regions; based on theory-guided analysis of the 24 documents.
Our results suggest that arbitrage can be a powerful force in AI model markets with implications for model development, distillation, and deployment.
Synthesis/conclusion based on the paper's empirical findings (case study, robustness experiments, distillation analysis) and economic interpretation.
The paper provides supporting empirical evidence spanning frontier laboratory dynamics, post-training alignment evolution, and the rise of sovereign AI as a geopolitical selection pressure.
Empirical/observational sections in the paper that the authors state cover those three areas (specific datasets, experiments, or case studies are referenced in the text but not quantified in the abstract).
The paper develops an illustrative empirical application based on event studies of AI-agent capability disclosures and heterogeneous market repricing.
Methodological description in the paper: an illustrative empirical application using event-study methodology on AI capability disclosures and observing heterogeneous market repricing; the excerpt does not report sample size or quantified results.
Macroeconomic effects remain hard to observe because of a 'productivity J-curve': firms often must invest in organizational changes first and only later realize measurable financial/productivity gains from AI.
Conceptual synthesis supported by firm-level case studies and empirical papers in the reviewed literature indicating implementation lags; the brief frames this as an interpretation of mixed short-run macro evidence rather than a single causal estimate.
There are architectural tensions between actor-critic frameworks and value-based methods in DRL for finance, and state-space representation and reward function engineering are important to performance in complex financial environments.
Analytical comparison and emphasis in the paper; the excerpt does not include quantitative comparisons, ablation studies, or dataset descriptions to substantiate which architectures perform better under which conditions.
The paper provides an extensive system-level investigation into the deployment of DRL architectures for dynamic portfolio optimization.
Stated scope of the paper (system-level investigation); details about methods, datasets, experimental design, or sample sizes are not given in the provided text.
An extended evaluation over 2024–2025 reveals market-regime dependency: the learned policy performs well in volatile conditions but shows reduced alpha in trending bull markets.
Out-of-sample robustness claim: evaluation over an extended period (calendar 2024 through 2025). The excerpt states qualitative regime-dependent performance but does not provide quantitative splits, volatility/trend definitions, sample sizes, or per-regime performance metrics.
The success of regulatory sandboxes ultimately depends on sound institutional safeguards, proportionality, and alignment with broader policy objectives.
Normative conclusion derived from the paper's analytical framework and comparative lessons (no empirical validation reported in the abstract).