Evidence (14922 claims)
Search and filter individual claims pulled from the papers. If you're looking for a specific finding (e.g., "what's the effect on wages?"), you're in the right place. To compare whole outcome categories against each other instead, use the Evidence Explorer.
The board below groups claims in two ways: by broad theme (nine paper-level topics) and by outcome category (the 34 claim-level outcomes that the Explorer and Syntheses also use).
Browse by theme
Nine broad, paper-level topics. The counts overlap (together they sum to far more than the 14922 claims), so a single claim can appear under several themes.

| Theme | Claims |
|---|---|
| Adoption | 9047 |
| Productivity | 8066 |
| Governance | 7278 |
| Human-AI Collaboration | 6912 |
| Org Design | 4439 |
| Innovation | 4359 |
| Labor Markets | 3652 |
| Skills & Training | 3018 |
| Inequality | 2160 |
Claims by outcome category
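Counts by direction of finding. These are the same 34 outcome categories the Explorer compares and the Syntheses are written for. A linked row has a published synthesis. A dash in a cell means no claims, and the four direction columns do not always add up to the Total.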
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 795 | 210 | 105 | 955 | 2131 |
| Governance & Regulation | 886 | 414 | 197 | 126 | 1654 |
| Organizational Efficiency | 826 | 204 | 129 | 87 | 1257 |
| Technology Adoption Rate | 681 | 259 | 128 | 110 | 1189 |
| Research Productivity | 464 | 138 | 65 | 349 | 1028 |
| Output Quality | 503 | 196 | 61 | 53 | 813 |
| Decision Quality | 351 | 180 | 84 | 51 | 673 |
| AI Safety & Ethics | 238 | 288 | 71 | 34 | 637 |
| Firm Productivity | 455 | 58 | 92 | 20 | 631 |
| Market Structure | 186 | 172 | 123 | 25 | 511 |
| Task Allocation | 222 | 70 | 76 | 34 | 407 |
| Innovation Output | 238 | 28 | 48 | 18 | 334 |
| Skill Acquisition | 177 | 62 | 62 | 17 | 318 |
| Employment Level | 107 | 57 | 108 | 13 | 287 |
| Fiscal & Macroeconomic | 135 | 72 | 44 | 26 | 284 |
| Firm Revenue | 172 | 50 | 28 | 5 | 256 |
| Consumer Welfare | 121 | 68 | 45 | 12 | 246 |
| Task Completion Time | 183 | 33 | 10 | 13 | 240 |
| Inequality Measures | 45 | 126 | 50 | 6 | 227 |
| Worker Satisfaction | 95 | 74 | 23 | 12 | 204 |
| Error Rate | 77 | 98 | 11 | 4 | 190 |
| Regulatory Compliance | 84 | 73 | 17 | 7 | 181 |
| Automation Exposure | 61 | 61 | 27 | 14 | 166 |
| Training Effectiveness | 98 | 21 | 14 | 19 | 154 |
| Wages & Compensation | 78 | 37 | 25 | 6 | 146 |
| Developer Productivity | 105 | 18 | 14 | 6 | 144 |
| Team Performance | 87 | 17 | 28 | 10 | 143 |
| Job Displacement | 12 | 83 | 23 | 1 | 119 |
| Hiring & Recruitment | 53 | 8 | 8 | 3 | 72 |
| Social Protection | 39 | 17 | 8 | 2 | 66 |
| Creative Output | 32 | 20 | 8 | 3 | 64 |
| Skill Obsolescence | 5 | 50 | 6 | 1 | 62 |
| Labor Share of Income | 17 | 20 | 17 | — | 54 |
| Worker Turnover | 15 | 15 | — | 3 | 33 |
| Industry | — | — | — | 1 | 1 |
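Categories differ in size by three orders of magnitude, so comparing raw counts can mislead. A minimal sketch of normalizing a few rows by their totals; the rows are copied from the table above (dash cells entered as 0), and the helper names `direction_shares` and `unclassified` are illustrative. `unclassified` reports how many claims are counted in Total but in none of the four direction columns.

```python
DIRECTIONS = ("Positive", "Negative", "Mixed", "Null")

# A few rows copied from the outcome table; dash cells are entered as 0.
ROWS = {
    "Job Displacement": {"Positive": 12, "Negative": 83, "Mixed": 23, "Null": 1, "Total": 119},
    "Firm Productivity": {"Positive": 455, "Negative": 58, "Mixed": 92, "Null": 20, "Total": 631},
    "Research Productivity": {"Positive": 464, "Negative": 138, "Mixed": 65, "Null": 349, "Total": 1028},
    "Worker Turnover": {"Positive": 15, "Negative": 15, "Mixed": 0, "Null": 3, "Total": 33},
}


def direction_shares(row):
    """Share of the row total that falls in each direction column."""
    return {d: row[d] / row["Total"] for d in DIRECTIONS}


def unclassified(row):
    """Claims counted in Total but not in any of the four direction columns."""
    return row["Total"] - sum(row[d] for d in DIRECTIONS)


for name, row in ROWS.items():
    s = direction_shares(row)
    print(f"{name:22s} positive {s['Positive']:.0%}  negative {s['Negative']:.0%}  "
          f"outside direction columns: {unclassified(row)}")
```

For example, Job Displacement is about 70% negative while Firm Productivity is about 72% positive, and Firm Productivity has 6 claims (631 minus 625) outside the four direction columns.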
The study theoretically extends workforce integration and social inclusion frameworks by explicitly incorporating language access mechanisms.
The authors assert a theoretical contribution based on empirical findings linking translation access to labor-market integration, discussed in the paper's theoretical framing and implications sections.
The research is novel in performing a comparative, multi-model evaluation of translation methods within a single labor-market context, providing empirical evidence not previously available in the literature.
Study design explicitly compares professional, AI-assisted, and hybrid models using combined quantitative and qualitative methods within specified U.S. cities; the paper frames this comparative, single-market approach as filling a literature gap.
Hybrid translation models produced approximately 20% higher retention rates relative to conventional methods.
Comparative retention-rate analysis reported from the study's quantitative dataset (a survey of 150 limited-English-proficiency [LEP] immigrants plus placement/retention tracking), analyzed in SPSS v28.
Hybrid human–AI translation models achieved up to 40% greater accuracy in job placement compared to conventional translation methods.
Comparative quantitative evaluation reported in the study comparing placement accuracy across translation models (professional, AI-assisted, hybrid) using survey outcomes and placement metrics derived from the sample and analyzed in SPSS v28.
Professional and hybrid human–AI translation services significantly enhance employment alignment, retention, and workplace satisfaction for immigrants with limited English proficiency.
Quantitative analysis of survey data (n=150 LEP immigrants) and corroborating qualitative interview data (50 employers, 20 providers) analyzed via SPSS v28 and thematic coding in NVivo 14; the paper reports statistically significant improvements attributed to professional and hybrid translation models.
Multi-agent systems demonstrated improved collaborative behavior when guided by standardized prompt frameworks, reducing ambiguity and enhancing synergistic task execution.
Experimental simulations of multi-agent systems employing standardized prompt frameworks, with assessments of collaborative behavior expressed as coordination coherence and synergistic task execution efficiency. (Number of agents, experimental runs, and quantitative results not specified in the provided text.)
Well-constructed prompts significantly strengthened agents' ability to interpret complex inputs, generate context-appropriate actions, and maintain consistent performance under variable conditions.
Findings drawn from the experimental simulations comparing prompt quality (described as 'well-constructed' versus alternatives) and reporting improvements across interpretation, action-generation, and performance consistency metrics. (Details on experimental replication, sample size, and statistical significance not provided in the excerpt.)
Structured, context-rich, and strategically layered prompts improved agents’ situational awareness, reasoning accuracy, and operational adaptability.
Quantitative research design using experimental simulations where prompt structure was manipulated and agent outputs were evaluated. Performance indicators cited include response accuracy, task completion efficiency, coordination coherence, and error rates. (Paper does not report sample size or statistical values in the provided text.)
Hierarchical verification (property, interaction, and rollout tests) confirms semantic equivalence for all five environments; cross-backend policy transfer confirms zero sim-to-sim gap for all five.
Verification methodology described in the paper: hierarchical tests (property checks, interaction tests, rollout comparisons) applied to each of the five environments, plus cross-backend policy transfer experiments showing identical behavior/performance between backends.
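The three verification levels can be illustrated on a toy problem. This is not the paper's test suite: it is a minimal sketch assuming a hypothetical 1-D walk environment written twice, once as a stateful reference class (`ReferenceEnv`) and once as a pure step function (`fast_step`) of the kind a functional or JAX port might use. Property tests check invariants, interaction tests compare single steps from every state, and rollout tests drive both implementations with the same action sequences.

```python
import random

ACTIONS = (0, 1)  # hypothetical action space: 0 = left, 1 = right


class ReferenceEnv:
    """Hypothetical reference implementation: a walk on cells 0..9, goal at 9."""

    def reset(self, seed):
        self.pos = seed % 10
        return self.pos

    def step(self, action):
        # Walls clip the position to 0..9; reward 1.0 and done=True at the goal.
        self.pos = max(0, min(9, self.pos + (1 if action == 1 else -1)))
        return self.pos, (1.0 if self.pos == 9 else 0.0), self.pos == 9


def fast_step(pos, action):
    """Hypothetical port of the same dynamics as a pure function (state in, state out)."""
    new_pos = min(9, max(0, pos + 2 * action - 1))
    return new_pos, float(new_pos == 9), new_pos == 9


def property_test():
    """Level 1: an invariant holds for every state/action (observation stays in bounds)."""
    return all(0 <= fast_step(p, a)[0] <= 9 for p in range(10) for a in ACTIONS)


def interaction_test():
    """Level 2: one step from every state agrees between the two implementations."""
    for p in range(10):
        for a in ACTIONS:
            env = ReferenceEnv()
            env.pos = p
            if env.step(a) != fast_step(p, a):
                return False
    return True


def rollout_test(seed, n_steps=200):
    """Level 3: identical random action sequences give identical trajectories."""
    rng = random.Random(seed)
    ref = ReferenceEnv()
    pos = ref.reset(seed)  # both implementations start from the same state
    for _ in range(n_steps):
        action = rng.choice(ACTIONS)
        expected, got = ref.step(action), fast_step(pos, action)
        if expected != got:
            return False
        pos = got[0]
    return True


print(property_test(), interaction_test(), all(rollout_test(s) for s in range(20)))
```

Cheap checks run first, so a failing property or single-step test localizes a bug before a long rollout comparison is attempted.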
TCGJax is the first deployable JAX Pokemon TCG engine, achieving 717K steps per second (SPS) for random actions and 153K SPS for PPO; it is 6.6x faster than the Python reference.
New environment synthesized from a web-extracted specification with throughput benchmarks for random-action and PPO modes, and a direct comparison to a Python reference implementation yielding 6.6x speedup.
The translated HalfCheetah JAX implementation outperforms Brax by 5x at matched GPU batch sizes.
Benchmarks comparing throughput of the HalfCheetah JAX translation against Brax under matched GPU batch sizes, reporting a 5x improvement.
PokeJAX is the first GPU-parallel Pokemon battle simulator, achieving 500M steps-per-second (SPS) for random actions and 15.2M SPS for PPO; 22,320x faster than the TypeScript reference.
Throughput benchmarks reported for PokeJAX (random-action SPS and PPO SPS) and direct comparison of SPS to a TypeScript reference implementation yielding the 22,320x factor. (Single environment: Pokemon battle simulator.)
EmuRust yields a 1.5x PPO speedup via Rust parallelism for a Game Boy emulator.
Benchmark comparison of PPO training/inference throughput between the reference implementation and EmuRust; reported PPO speedup factor of 1.5x. (Single environment: Game Boy emulator.)
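The SPS figures and speedup factors above are throughput measurements and ratios of them. A minimal sketch of how such numbers can be computed, assuming a batched step function that advances `batch_size` environments per call; `measure_sps`, `speedup`, and `toy_batched_step` are hypothetical names, and the toy step is a pure-Python stand-in rather than any of the engines described. With JAX, a real benchmark should also exclude the first (compilation) call and call `block_until_ready()` on the result before stopping the clock, because JAX dispatches work asynchronously.

```python
import time


def measure_sps(step_fn, state, batch_size, n_iters):
    """Environment steps per second when each call advances `batch_size` envs."""
    state = step_fn(state)  # warm-up call, excluded from timing
    start = time.perf_counter()
    for _ in range(n_iters):
        state = step_fn(state)
    elapsed = time.perf_counter() - start
    return batch_size * n_iters / elapsed


def speedup(sps_new, sps_reference):
    """Speedup as a throughput ratio: 6.6x means 6.6 times the reference SPS."""
    return sps_new / sps_reference


def toy_batched_step(states):
    # Hypothetical stand-in for a vectorized environment step.
    return [s + 1 for s in states]


sps = measure_sps(toy_batched_step, [0] * 1024, batch_size=1024, n_iters=100)
print(f"{sps:,.0f} SPS")
```

Because SPS counts environment steps, the same engine reports different figures for random actions and for PPO, so a speedup is only meaningful when both sides are measured in the same mode.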
A reusable recipe (generic prompt template, hierarchical verification, iterative agent-assisted repair) produces semantically equivalent high-performance RL environments for <$10 in compute cost.
Methodological description in the paper: recipe combining prompt template, hierarchical verification, and agent-assisted repair; demonstrated by producing multiple environments with reported compute cost under $10. Empirical support comes from the set of reproduced environments (five total) and their reported build costs.
As AI adoption rises within companies, industries, and regions, demand for complementary skills increases even in non-AI roles.
Longitudinal/cross-sectional analysis of job postings (n ≈ 30 million, 2018–2024) with measures of AI diffusion at company, industry, and regional levels and comparisons of skill demand in non-AI roles over time and across contexts.
Complementary (non-technical) skills are associated with meaningful wage premiums, particularly in managerial, sales, or finance roles working with AI.
Wage/salary analysis linked to skill requirements within the same nearly 30 million job postings dataset (2018–2024), with subgroup analysis for managerial, sales, and finance roles identified as working with AI.
The success of sustainable development is deeply tied to the responsiveness and credibility of governance systems.
Central thesis of the paper supported by synthesis of governance frameworks, SDGs, and illustrative international examples; the summary does not provide quantitative metrics or sample-based validation.
Governance innovations, information systems, and inclusive institutions increase the prospects of just and adaptable progress.
Illustrated through selected international examples and a conceptual synthesis against SDG and governance frameworks; no specific sample size or controlled empirical study is described in the summary.
Transparency, inclusive participation, robust regulation, and the rule of law shape development outcomes across economic, social, environmental, and institutional spheres.
Conceptual analysis leveraging global governance frameworks and the Sustainable Development Goals (SDGs), supported by international examples and literature cited in the paper; no quantitative sample size or statistical analysis is reported in the summary.
Alongside the concerns it raises, AI proliferation may introduce new, positive affordances for military decision-making organizations.
Normative/analytical claim by the author based on argumentation; no empirical demonstration, experimental results, or case-study evidence is provided in the excerpt.
Military AI adoption is incentivized by competitive pressures and expanding national security needs.
Author assertion based on qualitative argumentation and literature-informed reasoning; no empirical study, dataset, or sample size reported in the text.
Process-oriented skills appear in 15.6% of feasible transition pathways and emerge as the highest-leverage intervention.
Feature analysis of the 4,534 identified transitions showing process-oriented skills present in 15.6% of pathways; statement that these skills constitute the highest-leverage intervention (comparative ranking implied by analysis).
Eliciting probabilities (instead of forcing binary labels) enables post-hoc recalibration that improves both individual-worker and crowd-level label quality.
Methodological approach in the field experiment: comparison between binary-label interface and elicited-probability interface, followed by linear-in-log-odds recalibration applied to probabilistic responses at worker and crowd aggregation levels. Improvements in label quality reported (specific metrics and sizes not included in the excerpt).
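A minimal sketch of linear-in-log-odds (LLO) recalibration, assuming the common two-parameter form logit(p') = a + b·logit(p) fitted by maximum likelihood, which is equivalent to a logistic regression of the 0/1 label on the log-odds of the elicited probability. The Newton-Raphson fitter, function names, and synthetic data are illustrative, not the paper's procedure. With b > 1 the map pushes probabilities away from 0.5 (correcting under-confidence); with b < 1 it pulls them toward 0.5. For crowd-level recalibration, the same fit could be applied to an aggregate such as the mean of workers' log-odds; that aggregation rule is an assumption, since the excerpt does not specify one.

```python
import math
import random


def logit(p, eps=1e-6):
    """Log-odds of p, clipped away from 0 and 1 so the result stays finite."""
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))


def sigmoid(z):
    """Numerically stable inverse of logit."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_llo(probs, labels, iters=50, tol=1e-10):
    """Fit logit(p_cal) = a + b * logit(p) by maximum likelihood (Newton-Raphson)."""
    xs = [logit(p) for p in probs]
    a, b = 0.0, 1.0  # start from the identity map (no recalibration)
    for _ in range(iters):
        ga = gb = haa = hab = hbb = 0.0
        for x, y in zip(xs, labels):
            q = sigmoid(a + b * x)
            w = q * (1.0 - q)
            ga += q - y        # gradient of the negative log-likelihood
            gb += (q - y) * x
            haa += w           # Hessian entries
            hab += w * x
            hbb += w * x * x
        det = haa * hbb - hab * hab
        if det <= 0:
            break
        da = (hbb * ga - hab * gb) / det  # Newton step: H^-1 g
        db = (haa * gb - hab * ga) / det
        a, b = a - da, b - db
        if abs(da) + abs(db) < tol:
            break
    return a, b


def recalibrate(p, a, b):
    return sigmoid(a + b * logit(p))


def mean_log_loss(probs, labels):
    return -sum(math.log(p) if y == 1 else math.log(1.0 - p)
                for p, y in zip(probs, labels)) / len(labels)


# Hypothetical synthetic check: reports are too cautious, i.e. the true
# log-odds are twice the reported log-odds.
rng = random.Random(0)
reported = [rng.uniform(0.05, 0.95) for _ in range(5000)]
labels = [1 if rng.random() < sigmoid(2.0 * logit(p)) else 0 for p in reported]
a, b = fit_llo(reported, labels)
calibrated = [recalibrate(p, a, b) for p in reported]
print(f"a = {a:.2f}, b = {b:.2f}")
print(f"log loss: {mean_log_loss(reported, labels):.3f} -> {mean_log_loss(calibrated, labels):.3f}")
```

Starting from the identity map means the fitted map can never score worse than the raw reports on the data it is fitted to; out-of-sample gains need a held-out set.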
The improvements from balanced feedback, probabilistic elicitation, and pipeline-level recalibration carry through to downstream convolutional neural network (CNN) reliability out of sample.
The study trained convolutional neural networks on labels produced under the different labeling and recalibration pipelines and evaluated out-of-sample reliability; reported that the gains observed at the labeling stage improved downstream CNN reliability (exact architectures, training/validation splits, and quantitative out-of-sample results not provided in the excerpt).
Pipeline-level recalibration substantially improves probabilistic calibration of labels.
Empirical evaluation in the DiagnosUs experiment where probabilistic labels were recalibrated (linear-in-log-odds) and calibration metrics were compared pre- and post-recalibration (specific calibration metrics and numeric results not provided in the excerpt).
Post-processing probabilistic labels using a linear-in-log-odds recalibration approach at the worker and crowd levels substantially improves classification performance.
The paper applied linear-in-log-odds recalibration to elicited probabilistic labels at both individual-worker and aggregated crowd levels, then evaluated classification performance on labels before and after recalibration (methods and quantitative effect sizes not provided in the excerpt).
Balanced feedback (higher positive prevalence in the feedback stream) and probabilistic elicitation reduce rare-event misses.
Results from the DiagnosUs field experiment comparing conditions that vary feedback prevalence (20% vs. 50%) and response interface (binary labels vs. elicited probabilities); miss rates were compared across conditions (sample sizes not given in the excerpt).
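The miss rate compared here is the share of true positive cases labeled negative, FN / (TP + FN). The excerpt does not say how elicited probabilities were scored; a minimal sketch assuming they are turned into labels by a threshold, which is one way (an assumption, not the paper's stated mechanism) that probability reports can reduce rare-event misses. The data and the names `binarize` and `miss_rate` are hypothetical.

```python
def binarize(probs, threshold=0.5):
    """Turn elicited probabilities into 0/1 labels at a chosen threshold."""
    return [1 if p >= threshold else 0 for p in probs]


def miss_rate(truth, predictions):
    """Share of true positive cases (the rare events) that were labeled negative."""
    preds_on_positives = [pred for y, pred in zip(truth, predictions) if y == 1]
    if not preds_on_positives:
        raise ValueError("miss rate is undefined without positive cases")
    return sum(1 for pred in preds_on_positives if pred == 0) / len(preds_on_positives)


# Hypothetical ground truth and elicited probabilities for 10 images (4 positives).
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
probs = [0.90, 0.45, 0.35, 0.60, 0.20, 0.10, 0.40, 0.05, 0.30, 0.15]

print(miss_rate(truth, binarize(probs, 0.5)))  # 0.5
print(miss_rate(truth, binarize(probs, 0.3)))  # 0.0
```

Lowering the threshold trades misses for false alarms: at 0.3, the two negatives reported at 0.40 and 0.30 also flip to positive.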
A combined scenario pairing moderate productivity gains with moderate cost control nearly eliminates the deficit by 2050.
Specific combined policy scenario simulated in the model projecting fiscal indicators to 2050; reported outcome is near-elimination of the government deficit under those assumptions.
Policy experiments show that productivity improvements and controlling per-person costs offer the most effective near-term relief, because they act quickly through revenue and spending channels.
Counterfactual/policy scenario simulations run with the calibrated system dynamics model comparing effects of productivity gains and per-person cost controls versus other levers; near-term (short- to medium-run) impacts reported.
The model, grounded in official statistics, tracks historical trends reasonably well.
Model historical validation presented in the paper comparing model outputs to observed historical time series (fit to past demographic and fiscal indicators).
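The excerpt does not name the fit statistic behind "tracks historical trends reasonably well." Mean absolute percentage error (MAPE) between simulated and observed series is one common choice for this kind of historical validation; a minimal sketch with hypothetical numbers.

```python
def mape(simulated, observed):
    """Mean absolute percentage error (%) of a simulated series against observed data."""
    if len(simulated) != len(observed) or not observed:
        raise ValueError("series must be non-empty and the same length")
    if any(o == 0 for o in observed):
        raise ValueError("MAPE is undefined when an observed value is zero")
    return 100 * sum(abs(s - o) / abs(o) for s, o in zip(simulated, observed)) / len(observed)


# Hypothetical historical series (e.g. an annual fiscal indicator) and model output.
observed = [100.0, 110.0, 120.0]
simulated = [98.0, 112.0, 126.0]
print(f"MAPE = {mape(simulated, observed):.2f}%")  # 2.94%
```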
This study offers the first systematic analysis of labor markets and the qualitative traits of participants in the criminal ecosystem of the SDE.
Authors' stated contribution claiming novelty; systematic analysis of labor-market roles and participant traits within the paper (methods described as systematic analysis/qualitative review; no external verification or comparative bibliometric analysis provided).
AI innovation produces significant positive spatial spillover effects on employment in neighboring cities, promoting expansion of their employment scale.
Spatial analysis (spatial econometric tests) on the 268 Chinese cities (2010–2023) indicating positive spillovers to neighboring cities' employment.
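Spatial econometric models capture spillovers between cities through a spatial weights matrix W: the spatial lag W·x gives each city the weighted average of x among its neighbors, and the coefficient on that lag measures the spillover. The excerpt does not state which weights (contiguity, distance, or economic) or which model form the paper uses; the four-city layout and index values below are hypothetical.

```python
def row_standardize(W):
    """Scale each row of a spatial weights matrix to sum to 1 (all-zero rows stay zero)."""
    out = []
    for row in W:
        s = sum(row)
        out.append([w / s for w in row] if s else [0.0] * len(row))
    return out


def spatial_lag(W, x):
    """W·x: for each city, the weighted average of x over its neighbors."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


# Hypothetical 4 cities on a line (0-1-2-3), neighbors share a border.
W = row_standardize([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
ai_innovation = [10.0, 4.0, 2.0, 8.0]  # hypothetical AI-innovation index per city
print(spatial_lag(W, ai_innovation))  # [4.0, 6.0, 6.0, 2.0]
```

City 1 has no strong innovation of its own (4.0), but its neighbors average 6.0; a positive coefficient on this lag in an employment regression is what a positive spillover means.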
Temporally, AI innovation affects urban employment through both immediate and lagged effects, with the magnitude of these effects diminishing over time.
Temporal (lag) analysis in extended tests on the 268-city panel covering 2010–2023.
Governmental digital attention positively moderates the relationship between AI innovation and urban employment.
Moderation analysis using measures of governmental digital attention and AI innovation in the 268-city panel (2010–2023).
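Moderation is commonly tested with an interaction term, E = b0 + b_ai·AI + b_att·Att + b_int·AI·Att, so the effect of AI innovation on employment is b_ai + b_int·Att and "positively moderates" corresponds to b_int > 0. Whether the paper uses exactly this specification is not stated; the coefficients below are hypothetical.

```python
def marginal_effect_of_ai(beta_ai, beta_interaction, attention):
    """dE/dAI for E = b0 + beta_ai*AI + b_att*Att + beta_interaction*AI*Att, at Att = attention."""
    return beta_ai + beta_interaction * attention


# Hypothetical coefficients: a positive interaction means the AI effect
# grows with governmental digital attention.
beta_ai, beta_interaction = 0.20, 0.05
for attention in (0.0, 2.0, 4.0):
    print(attention, round(marginal_effect_of_ai(beta_ai, beta_interaction, attention), 3))
```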
AI innovation indirectly promotes employment growth by enhancing urban economic density (mediation effect).
Mechanism (mediation) analysis conducted on the 268-city panel (2010–2023) showing economic density as an intermediary channel.
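A mediation channel like economic density is often quantified with the product-of-coefficients approach: a is the effect of X on the mediator M, b is the effect of M on Y holding X fixed, and a·b is the indirect effect. The paper's estimator (controls, fixed effects, inference method) is not given in the excerpt; this is a minimal pure-Python sketch on hypothetical simulated data with a true indirect effect of 0.5 × 0.8 = 0.4 and a direct effect of 0.3.

```python
import random


def ols_slope(x, y):
    """Slope from a simple regression of y on x (with intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def ols_two(x, m, y):
    """Coefficients on x and m from regressing y on both (with intercept)."""
    n = len(y)
    mx, mm, my = sum(x) / n, sum(m) / n, sum(y) / n
    xc = [v - mx for v in x]
    mc = [v - mm for v in m]
    yc = [v - my for v in y]
    sxx = sum(v * v for v in xc)
    smm = sum(v * v for v in mc)
    sxm = sum(p * q for p, q in zip(xc, mc))
    sxy = sum(p * q for p, q in zip(xc, yc))
    smy = sum(p * q for p, q in zip(mc, yc))
    det = sxx * smm - sxm * sxm
    # Solve the 2x2 normal equations by Cramer's rule.
    return (smm * sxy - sxm * smy) / det, (sxx * smy - sxm * sxy) / det


def mediation(x, m, y):
    """Product-of-coefficients mediation: returns (indirect a*b, direct effect)."""
    a = ols_slope(x, m)           # X -> M
    direct, b = ols_two(x, m, y)  # direct X -> Y, and M -> Y holding X fixed
    return a * b, direct


# Hypothetical data: X = AI innovation, M = economic density, Y = employment.
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(5000)]
m = [0.5 * xi + rng.gauss(0, 1) for xi in x]
y = [0.3 * xi + 0.8 * mi + rng.gauss(0, 1) for xi, mi in zip(x, m)]
indirect, direct = mediation(x, m, y)
print(f"indirect = {indirect:.2f}, direct = {direct:.2f}")  # close to 0.40 and 0.30
```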
The positive employment effect of AI innovation is stronger in southern cities than in others.
Geographic heterogeneity analysis across 268 Chinese cities (2010–2023).
The positive employment effect of AI innovation is more pronounced in the tertiary sector.
Heterogeneity/sectoral analysis using the panel of 268 Chinese cities (2010–2023).
The positive employment effect of AI innovation is more pronounced in the secondary sector.
Heterogeneity/sectoral analysis using the same panel of 268 Chinese cities (2010–2023).
Overall, AI innovation has a positive effect on urban employment.
Empirical testing on a panel of 268 Chinese cities over the period 2010–2023 (integrated theoretical and empirical analysis).
Our framework achieves a 67% cost reduction compared to the matched hierarchical baseline.
Empirical comparison against a matched hierarchical baseline on the reported evaluation set; paper reports a 67% reduction in cost (operational/cost-per-query as reported by authors).
Our framework achieves an 85% reduction in conversational rework compared to the matched hierarchical baseline.
Empirical comparison against a matched hierarchical baseline on the reported evaluation set; paper reports an 85% reduction in conversational rework.
Our framework achieves a 72% reduction in time-to-accurate-answer compared to the matched hierarchical baseline.
Empirical comparison against a matched hierarchical baseline on the reported evaluation set (2,847 queries); paper reports a 72% reduction in the time-to-accurate-answer metric.
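Percentage reductions are easy to misread as speedup factors. A reduction of r% leaves (100 - r)% of the baseline, which corresponds to a factor of 1 / (1 - r/100). A short sketch applying that arithmetic to the three reported reductions; the helper names are illustrative.

```python
def remaining_fraction(reduction_pct):
    """A reduction of r% leaves (100 - r)% of the baseline value."""
    return 1 - reduction_pct / 100


def times_lower(reduction_pct):
    """Express a percentage reduction as an 'N times lower' factor."""
    return 1 / remaining_fraction(reduction_pct)


for metric, pct in [("cost", 67), ("conversational rework", 85), ("time-to-accurate-answer", 72)]:
    print(f"{metric}: {remaining_fraction(pct):.2f} of baseline (about {times_lower(pct):.1f}x lower)")
```

An 85% reduction in rework, for instance, leaves 0.15 of the baseline, roughly 6.7 times less.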
Successful adaptation requires neither wholesale abandonment of traditional models nor uncritical embrace of technology, but rather deliberate institutional redesign that balances technological innovation with the preservation of core academic values.
Authors' synthesis and prescriptive conclusion drawn from the analysis; presented as a recommended strategy rather than empirically validated practice.
Strategic recommendations emphasize hybrid models that integrate AI capabilities while preserving irreplaceable human elements in higher education.
Paper's concluding recommendations based on its comparative function analysis and normative assessment; not accompanied by empirical trials of proposed hybrid models.
Workforce development systems need lifelong learning infrastructure and dynamic credentialing to support continuous reskilling in an AI-rich environment.
Prescriptive conclusion from the authors based on projected labor-market and skills impacts; no empirical pilot or sample study cited to validate the recommendation.
The transformation driven by AI requires governments to redesign accreditation frameworks and quality assurance mechanisms.
Policy recommendation arising from the paper's analysis of accreditation and validation issues; presented as normative guidance rather than empirically tested intervention.
AI systems democratize knowledge access, personalize learning, and offer scalable skills training.
The paper presents this as a conceptual claim based on literature synthesis and theoretical analysis; no empirical sample size or primary data reported.
Systematic economic impact assessment is vital for guiding public investments, workforce development, and policy decisions related to agricultural technology adoption.
Author conclusion based on study findings from IMPLAN 2022 I–O modeling and the observed differences between robotics and traditional greenhouse scenarios; normative recommendation.
Technological innovation in agriculture (robotics) not only boosts productivity but also contributes to broader regional resilience and economic diversification.
Synthesis of I–O model outcomes (expanded sectoral impacts and higher multipliers) and conceptual arguments in the paper relating diversified economic linkages and productivity gains to regional resilience.
Robotics adoption supports sustainable employment opportunities (i.e., durable regional jobs) rather than simply eliminating jobs.
I–O modeling results showing induced and indirect employment effects from robotics investments in NWI; study discussion framing these as sustainable employment opportunities.
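Input–output models such as IMPLAN rest on the Leontief inverse: with technical-coefficient matrix A and final demand f, gross output is x = (I - A)^-1 f, and the column sums of (I - A)^-1 are Type I output multipliers (direct plus indirect effects). Induced effects like those mentioned above additionally require treating household spending as endogenous (Type II multipliers). A minimal two-sector sketch; the sectors and coefficients are hypothetical, not the study's IMPLAN data.

```python
def leontief_inverse_2x2(A):
    """(I - A)^-1 for a 2-sector technical-coefficient matrix A."""
    a, b = 1 - A[0][0], -A[0][1]
    c, d = -A[1][0], 1 - A[1][1]
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def total_output(L, final_demand):
    """x = L f: gross output in each sector needed to deliver final demand f."""
    return [sum(L[i][j] * final_demand[j] for j in range(len(final_demand)))
            for i in range(len(L))]


def output_multipliers(L):
    """Type I output multiplier of sector j = column sum of the Leontief inverse."""
    return [sum(L[i][j] for i in range(len(L))) for j in range(len(L[0]))]


# Hypothetical coefficients: sector 0 = greenhouse production, sector 1 = local services.
A = [[0.10, 0.05],
     [0.20, 0.15]]
L = leontief_inverse_2x2(A)
print(output_multipliers(L))       # total output per $1 of final demand, by sector
print(total_output(L, [1.0, 0.0])) # output triggered by $1 of greenhouse final demand
```

A technology that strengthens local purchasing links raises entries of A and therefore the multipliers, which is the mechanism behind "higher multipliers" in I–O comparisons.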