Evidence (7278 claims)
Search and filter individual claims pulled from the papers. If you're looking for a specific finding ("what's the effect on wages?"), you're in the right place. To compare whole outcome categories against each other, use the Evidence Explorer instead.
The board below groups claims two ways: by broad theme (nine paper-level topics) and by outcome category (the 34 claim-level outcomes that the Explorer and Syntheses also use).
Browse by theme
Nine broad, paper-level topics. Click one to filter the claims below.
| Theme | Claims |
|---|---|
| Adoption | 9047 |
| Productivity | 8066 |
| Governance (current filter) | 7278 |
| Human-AI Collaboration | 6912 |
| Org Design | 4439 |
| Innovation | 4359 |
| Labor Markets | 3652 |
| Skills & Training | 3018 |
| Inequality | 2160 |

Claims by outcome category
Counts by direction of finding. These are the same 34 outcome categories the Explorer compares and the Syntheses are written for. A linked row has a published synthesis.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 795 | 210 | 105 | 955 | 2131 |
| Governance & Regulation | 886 | 414 | 197 | 126 | 1654 |
| Organizational Efficiency | 826 | 204 | 129 | 87 | 1257 |
| Technology Adoption Rate | 681 | 259 | 128 | 110 | 1189 |
| Research Productivity | 464 | 138 | 65 | 349 | 1028 |
| Output Quality | 503 | 196 | 61 | 53 | 813 |
| Decision Quality | 351 | 180 | 84 | 51 | 673 |
| AI Safety & Ethics | 238 | 288 | 71 | 34 | 637 |
| Firm Productivity | 455 | 58 | 92 | 20 | 631 |
| Market Structure | 186 | 172 | 123 | 25 | 511 |
| Task Allocation | 222 | 70 | 76 | 34 | 407 |
| Innovation Output | 238 | 28 | 48 | 18 | 334 |
| Skill Acquisition | 177 | 62 | 62 | 17 | 318 |
| Employment Level | 107 | 57 | 108 | 13 | 287 |
| Fiscal & Macroeconomic | 135 | 72 | 44 | 26 | 284 |
| Firm Revenue | 172 | 50 | 28 | 5 | 256 |
| Consumer Welfare | 121 | 68 | 45 | 12 | 246 |
| Task Completion Time | 183 | 33 | 10 | 13 | 240 |
| Inequality Measures | 45 | 126 | 50 | 6 | 227 |
| Worker Satisfaction | 95 | 74 | 23 | 12 | 204 |
| Error Rate | 77 | 98 | 11 | 4 | 190 |
| Regulatory Compliance | 84 | 73 | 17 | 7 | 181 |
| Automation Exposure | 61 | 61 | 27 | 14 | 166 |
| Training Effectiveness | 98 | 21 | 14 | 19 | 154 |
| Wages & Compensation | 78 | 37 | 25 | 6 | 146 |
| Developer Productivity | 105 | 18 | 14 | 6 | 144 |
| Team Performance | 87 | 17 | 28 | 10 | 143 |
| Job Displacement | 12 | 83 | 23 | 1 | 119 |
| Hiring & Recruitment | 53 | 8 | 8 | 3 | 72 |
| Social Protection | 39 | 17 | 8 | 2 | 66 |
| Creative Output | 32 | 20 | 8 | 3 | 64 |
| Skill Obsolescence | 5 | 50 | 6 | 1 | 62 |
| Labor Share of Income | 17 | 20 | 17 | — | 54 |
| Worker Turnover | 15 | 15 | — | 3 | 33 |
| Industry | — | — | — | 1 | 1 |
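
One way to read the direction counts is as shares of each category's total. The sketch below does that arithmetic for three rows copied from the table above; the dict layout and the function name `direction_shares` are illustrative assumptions, not part of the site. Note that the four direction counts do not always sum to the Total column (Firm Productivity sums to 625 against a stated total of 631), so the shares are taken against the stated total and need not add up to 100%.

```python
# Rows copied from the table above: (positive, negative, mixed, null, total).
# The dict layout and function name are illustrative, not part of the site.
ROWS = {
    "Firm Productivity": (455, 58, 92, 20, 631),
    "Job Displacement": (12, 83, 23, 1, 119),
    "Wages & Compensation": (78, 37, 25, 6, 146),
}


def direction_shares(row):
    """Return each direction's share of the row's stated total."""
    positive, negative, mixed, null, total = row
    return {
        "positive": positive / total,
        "negative": negative / total,
        "mixed": mixed / total,
        "null": null / total,
    }


for outcome, row in ROWS.items():
    shares = direction_shares(row)
    print(f"{outcome}: {shares['positive']:.0%} positive, "
          f"{shares['negative']:.0%} negative")
```

Computing shares rather than comparing raw counts makes small categories such as Job Displacement comparable with large ones such as Firm Productivity.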
Governance
Revealing hidden state reduces label uncertainty.
Experiments (hidden-state ablations) in the compact hidden-budget bidding task and/or two-hotel benchmark where providing hidden state information to the learner reduced uncertainty in inferred labels.
Countries need coordinated, long-term policy frameworks to adapt to AI-driven structural transformation; trade policy, industrial policy, and digital regulation should be addressed within an integrated strategy.
Conclusion and policy-recommendation section of the study; argument on normative advice and the need for coordination; no empirical evidence or implementation examples are given.
For developing countries, early investment in digital infrastructure may open new windows of competitiveness.
Conceptual argument; policy orientation and strategic recommendation; no empirical test or quantitative evidence is presented.
As automation and smart manufacturing systems spread, comparative advantages based on cheap labor are expected to erode, and trends toward the return of production to advanced economies or allied countries (reshoring and friendshoring) are expected to gain momentum.
Conceptual analysis and logical inferences about expected technology→consumption/production mechanisms; the study presents no empirical test or quantitative data.
Simulations calibrated to a real multifamily rental market confirm that supra-competitive outcomes arise robustly beyond the theoretical assumptions, including under finite horizons, heterogeneous products, and nonlinear logit demand.
Simulation experiments calibrated to a real multifamily rental market; simulations test finite-horizon settings, product heterogeneity, and nonlinear logit demand formulations.
Under symmetric exploration, prices can reach monopoly levels.
Theoretical result derived in the ODE analysis showing convergence to monopoly-level prices in symmetric exploration scenarios.
Supra-competitive prices arise when firms explore within similar price ranges on the same side of the Nash price.
Analytical characterization from the fluid-limit ordinary differential equation (ODE) analysis of the explore-then-exploit pipeline with misspecified monopoly-style estimation.
Simple algorithmic pricing systems can systematically produce collusive-like (supra-competitive) prices in multi-firm markets.
Theoretical model of multi-firm pricing with an explore-then-exploit pipeline and misspecified monopoly-style demand estimation; fluid-limit ODE analysis characterizing convergence; supporting simulations calibrated to a real multifamily rental market.
We demonstrate the GDPR's extraterritorial scope for gaining access to elements such as employment contracts and NDAs that have never been provided to the workers concerned.
Reported legal/empirical demonstration in paper: GDPR requests resulting in access to employment contracts and nondisclosure agreements (NDAs) that workers had not previously received. (Exact number of successful requests not stated in the excerpt.)
We audit the working conditions of content moderators in Kenya and Nigeria employed by business process outsourcing (BPO) companies by using the European General Data Protection Regulation (GDPR).
Method reported in paper: use of GDPR data-subject access / information requests to BPOs and platforms to obtain employment-related documents for content moderators in Kenya and Nigeria. (Sample size / number of requests not stated in the excerpt.)
Design principles that promote disagreement and decentralization—contextual grounding, community customization, continual adaptation, and polycentric governance—should be used so oversight is distributed across many legitimate centers rather than centralized in one institutional or moral chokepoint.
Normative design recommendations and governance proposals provided in the paper (argumentative; no empirical governance evaluation reported).
A range of technical directions (e.g., data filtering and upsampling, pre- and post-training, evaluations, collaborative value collection) are relevant for supporting positive alignment across different phases of the LLM and agents lifecycle.
Prescriptive technical recommendations and research directions described by the authors (conceptual proposals, not reported empirical tests).
Several existing failures of alignment (e.g., engagement hacking, loss of human autonomy, failures in truth-seeking, low epistemic humility, error correction, lack of diverse viewpoints, and being primarily reactive rather than proactive) may be better addressed through positive alignment, including cultivating virtues and maximizing human flourishing.
Theoretical argument and illustrative examples presented in the paper (no experimental or observational results reported).
Positive Alignment is a distinct and necessary agenda within AI alignment research.
Normative argumentation in the paper advocating for a separate research agenda (no empirical validation presented).
Positive Alignment is the development of AI systems that (i) actively support human and ecological flourishing in a pluralistic, polycentric, context-sensitive, and user-authored way while (ii) remaining safe and cooperative.
Paper's definitional proposal / conceptual framing (normative definition rather than empirical evidence).
Policy frameworks are necessary to govern verifiable machine intelligence in modern socio-technical infrastructures.
Normative recommendation and policy discussion in the paper; no empirical policy evaluation or legislative case studies are presented in the supplied text.
Process-based supervision has broader implications for algorithmic fairness and can reduce black-box opacity.
High-level discussion in the paper linking process-verifiability to fairness and reduced opacity; no empirical fairness audits or quantitative fairness metrics reported in the provided text.
Integrating reinforcement learning with process-oriented feedback can foster a more transparent AI ecosystem where the path to a conclusion is as scrutinized as the conclusion itself.
Conceptual claim and proposed benefit in the paper; presented as an argument rather than supported by empirical transparency or interpretability studies in the supplied text.
Process-based supervision significantly improves the reliability of models in high-stakes domains such as law, medicine, and engineering.
Asserted by the authors as an advantage of PRMs for high-stakes applications; presented as argumentation rather than backed by reported empirical trials or case-study sample sizes in the provided text.
Optimizing PRMs through reinforcement learning enhances the verifiability and robustness of multi-step reasoning in large-scale model architectures.
Central argumentative claim of the paper (theoretical proposal and conceptual analysis); no experimental results or quantitative evaluation provided in the text supplied.
Process-Based Reward Models (PRMs) assign value to each distinct stage of a reasoning chain, providing a more granular signal for training than outcome-only approaches.
Methodological description and conceptual argument in the paper; described as a design/approach rather than empirically validated with data.
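To make the contrast in the claim above concrete, here is a minimal sketch of the two reward signals. It is not the paper's method: `toy_scorer` is a hypothetical stand-in for a learned process reward model, and the example reasoning chain is invented for illustration.

```python
from typing import Callable, List


def outcome_rewards(steps: List[str], final_correct: bool) -> List[float]:
    """Outcome-only signal: zero on every step except the last."""
    rewards = [0.0] * len(steps)
    if steps:
        rewards[-1] = 1.0 if final_correct else 0.0
    return rewards


def process_rewards(steps: List[str],
                    score_step: Callable[[str], float]) -> List[float]:
    """Process-based signal: one score per reasoning step."""
    return [score_step(step) for step in steps]


def toy_scorer(step: str) -> float:
    """Hypothetical scorer standing in for a learned PRM."""
    return 0.0 if "error" in step else 1.0


# Invented reasoning chain with a flawed middle step.
chain = [
    "parse the question",
    "apply formula (error: wrong sign)",
    "report result",
]
print(outcome_rewards(chain, final_correct=False))  # [0.0, 0.0, 0.0]
print(process_rewards(chain, toy_scorer))           # [1.0, 0.0, 1.0]
```

The outcome-only signal says the chain failed but not where; the per-step signal localizes the failure to the second step, which is the granularity the claim refers to.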
Overall, the study provides a cross-sectoral empirical foundation for understanding how budget flexibility, governance, and technology interact to support resilient financial systems in uncertain economic environments.
Synthesis statement based on the paper's cross-sectoral comparative analysis combining firm 10-K data (four firms), Open Budget Survey, OECD database, GAO reports, and the Flexibility Index.
In the public sector, systems characterized by strong transparency frameworks and Medium-Term Expenditure Frameworks demonstrate higher alignment between planned and actual expenditures.
Cross-sectional analysis using Open Budget Survey 2023, OECD Budget Practices Database, and U.S. GAO oversight reports linking transparency and MTEFs to alignment between planned and actual expenditures.
Firms with decentralized budgeting structures and embedded predictive analytics exhibit lower forecast deviations and faster resource reallocation.
Comparative empirical analysis of four large firms using Form 10-K data (2019–2023) and the Flexibility Index to relate decentralization and AI integration to forecast deviations and reallocation speed.
Methodologically, the study demonstrates how expert reasoning can be operationalized as a benchmark for evaluating AI systems in urban infrastructure contexts, addressing gaps in empirical assessment and governance tools.
Study design: creation of Delphi-derived rubric from 20 experts and its use as an evaluation benchmark for six LLMs; reported as a methodological contribution.
The Delphi process elicited and refined expert reasoning criteria, producing a rubric that emphasized public safety, regulatory compliance, contextual judgment, financial stewardship, and system reliability.
Method: Delphi process with 20 infrastructure professionals that generated and refined reasoning criteria; resulting rubric content reported in paper.
In an empirical study of the Community Health Centers rollout, estimated spillovers account for a substantial share of the effect on older-adult mortality.
Empirical application reported in the paper applying the proposed methods to the Community Health Centers rollout; estimated spillover component contributes substantially to the measured effect on older-adult mortality (results from observational data analysis).
Monte Carlo simulations show the proposed estimators have small bias for these effects and the associated confidence intervals have coverage close to the nominal level.
Monte Carlo simulation evidence reported in the paper indicating small bias of the proposed estimators and coverage of confidence intervals close to nominal in the simulated settings.
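For readers unfamiliar with how bias and coverage are checked by simulation, the sketch below shows the generic recipe. It is an assumption-laden stand-in: it uses the sample mean with a normal-approximation 95% interval rather than the paper's estimators, and the sample size, number of replications, and data-generating process are arbitrary choices.

```python
import random
import statistics


def simulate(n_reps=2000, n=50, true_mean=1.0, seed=0):
    """Estimate the bias of the sample mean and the coverage of a
    normal-approximation 95% confidence interval by repeated sampling."""
    rng = random.Random(seed)
    estimates = []
    covered = 0
    for _ in range(n_reps):
        # Draw one synthetic dataset from a known distribution.
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        est = statistics.fmean(sample)
        half_width = 1.96 * statistics.stdev(sample) / n ** 0.5
        # Count the replication if the interval contains the true value.
        if est - half_width <= true_mean <= est + half_width:
            covered += 1
        estimates.append(est)
    bias = statistics.fmean(estimates) - true_mean
    return bias, covered / n_reps


bias, coverage = simulate()
print(f"bias={bias:.4f}, coverage={coverage:.3f}")
```

"Small bias" means the first number is close to zero; "coverage close to the nominal level" means the second is close to 0.95.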
The framework contributes to improving understanding of enterprise coordination and governance under constrained legal conditions and offers a basis for future analytical and empirical research.
Author-stated contribution of the paper based on the developed theoretical framework; positioned as foundation for future work.
The analysis identifies theoretical conditions under which such governance may support verifiable integrity, adaptive compliance, and access to formal markets.
Theoretical conditions derived from the review and theory synthesis (no empirical testing reported in this paper).
The study develops a theory-based framework explaining how RegTech-supported governance may, under specified conditions, enable sanctions-safe enterprise ecosystems during post-conflict reconstruction.
Primary contribution of the paper: theory synthesis built from integrative review of five literature streams (RegTech, sanctions compliance, institutional voids, supply-chain governance, algorithmic accountability).
Post-conflict reconstruction relies heavily on private enterprises to bring back employment, rebuild supply networks, and reconnect damaged economies.
Statement grounded in literature cited in the review (paper positions this as a general premise from post-conflict reconstruction literature); no primary data reported.
A causal ablation confirms that each of the four mechanical enforcement primitives is individually necessary.
Causal ablation experiments reported by authors in the synthetic banking domain: removing each primitive degrades performance/governance, implying individual necessity. Abstract does not report exact experimental counts or effect sizes.
Mechanical enforcement raises task accuracy from MCC ~0.43 to 0.88.
Reported Matthews correlation coefficient (MCC) for task accuracy under text-only governance (≈0.43) versus mechanical enforcement (≈0.88) in the paper's synthetic experiments; sample size not provided in abstract.
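For context on the metric, the Matthews correlation coefficient is computed from a binary confusion matrix and ranges from -1 to 1, with 0 corresponding to chance-level prediction. The sketch below implements the standard formula; the counts are hypothetical and are not taken from the paper, which does not report its confusion matrices in the abstract.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0.0 when a marginal is empty and MCC is undefined.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical counts for illustration only; not from the paper.
print(round(mcc(tp=45, tn=45, fp=5, fn=5), 2))  # 0.8
```

Unlike plain accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when the classes are imbalanced.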
Mechanical enforcement more than doubles deferral information content.
Comparison of information-content measures for deferrals between text-only governance and mechanical enforcement in the synthetic banking domain experiments; exact numeric basis not given in abstract.
Mechanical enforcement reduces the rate of deferrals that carry no decision-relevant information by 73%.
Head-to-head comparison between text-only governance and a mechanically enforced architecture (four primitives) in the paper's synthetic banking experiments; specific sample size not stated in abstract.
These results challenge the presumed universality of the fairness-accuracy tradeoff and demonstrate that well-designed modeling improvements can advance both fairness and accuracy in large-scale public sector systems.
Synthesis of the three complementary analyses (observational county-level correlations, simulation experiments with added property features, and simulations incorporating Census data) performed on the 26 million-sale dataset covering ~95% of U.S. counties.
Incorporating publicly available Census data into assessment models, a feasible reform in most counties, would significantly improve both accuracy and fairness relative to status quo assessments.
Simulated reforms adding publicly available Census covariates to assessment models and comparing resulting accuracy and fairness metrics to status-quo assessments across the dataset of 26 million sales covering ~95% of U.S. counties.
When accuracy improves in the simulated assessment models, fairness almost always improves as well.
Analysis of simulated model outcomes showing joint changes in accuracy and fairness metrics across many simulated configurations and counties; reported near-universal co-improvement when accuracy rises.
In simulated assessment models, adding property features improves accuracy in most cases.
Simulation experiments using alternative assessment models that include additional property-level features; comparisons between baseline and feature-augmented simulated models across many counties/cases.
Assessment accuracy and fairness, measured using domain-relevant metrics, are strongly correlated across counties under status quo practices.
Observational analysis of status-quo assessment outcomes using a dataset of 26 million property sales spanning ~95% of U.S. counties; county-level correlation analysis between domain-relevant accuracy metrics and fairness metrics.
Policy should prioritize employment‑centered digital strategies that are spatially differentiated and institutionally grounded to mitigate negative labor and development effects.
Normative policy recommendation arising from the paper's theoretical framework and regional field observations (policy prescription; not an empirically estimated intervention in the paper).
PRIF yielded an average ROI of 83%.
Reported financial evaluation/ROI estimate following PRIF adoption in the paper (derived from pilot/case study cost-benefit or sample analysis).
PRIF adoption reduced financial misstatements by 47%.
Reported change in financial misstatement incidence after PRIF implementation in the paper's evaluation (case studies/forensic report analysis).
PRIF adoption reduced compliance resolution time by 58%.
Reported performance metric after PRIF adoption in pilot/case studies described in the paper.
Client retention was 91% for high SCI versus 54% for low SCI.
Reported retention rates stratified by SCI levels in paper (presumably derived from the sample used for SCI analysis).
The Stakeholder Communication Index (SCI) revealed a strong correlation (r = 0.83) between report quality and client retention.
Statistical analysis reported in paper linking SCI-derived report quality scores to client retention; correlation coefficient r = 0.83 provided.
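The reported r is presumably a Pearson correlation coefficient, although the excerpt does not say so. As a reminder of what that statistic measures, here is a minimal sketch of the standard formula; the function name and the toy inputs are illustrative, and no data from the paper is reproduced.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations, then the two root sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Toy inputs: a perfectly linear relationship gives r of (approximately) 1.
print(pearson_r([1, 2, 3], [2, 4, 6]))
```

A value of 0.83 indicates a strong linear association, but it says nothing by itself about sample size or causal direction.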
Accuracy increased from 62% to 89–94% after integration of AI and blockchain.
Reported accuracy figures in results section based on PRIF evaluation (presumably from analyzed forensic reports/case studies).
Integration of AI and blockchain reduced the risk detection time from 47 days post-event to 9–22 days pre-event.
Reported results from PRIF implementation/pilot using case studies and forensic report analysis (paper cites these temporal comparisons).
This study pioneers a Proactive Risk Intelligence Framework (PRIF) for Chartered Accountant (CA) firms, targeting gaps in risk anticipation, stakeholder communication, and compliance.
Paper description of study objective and framework development (mixed-method design, interviews, case studies, forensic report analysis).