Evidence (13827 claims)
Adoption: 8454 claims
Productivity: 7544 claims
Governance: 6789 claims
Human-AI Collaboration: 6327 claims
Org Design: 4126 claims
Innovation: 4058 claims
Labor Markets: 3520 claims
Skills & Training: 2924 claims
Inequality: 2057 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 749 | 195 | 97 | 889 | 1979 |
| Governance & Regulation | 815 | 391 | 188 | 121 | 1539 |
| Organizational Efficiency | 771 | 189 | 124 | 83 | 1177 |
| Technology Adoption Rate | 624 | 233 | 123 | 96 | 1084 |
| Research Productivity | 410 | 121 | 56 | 331 | 929 |
| Output Quality | 466 | 177 | 59 | 47 | 749 |
| Decision Quality | 320 | 174 | 75 | 42 | 618 |
| Firm Productivity | 435 | 55 | 88 | 20 | 604 |
| AI Safety & Ethics | 214 | 276 | 65 | 33 | 593 |
| Market Structure | 178 | 166 | 122 | 24 | 495 |
| Task Allocation | 206 | 64 | 70 | 31 | 376 |
| Skill Acquisition | 165 | 57 | 60 | 17 | 299 |
| Innovation Output | 201 | 27 | 41 | 18 | 288 |
| Employment Level | 105 | 51 | 107 | 13 | 278 |
| Fiscal & Macroeconomic | 131 | 69 | 43 | 26 | 276 |
| Consumer Welfare | 116 | 63 | 42 | 11 | 232 |
| Firm Revenue | 149 | 46 | 26 | 3 | 224 |
| Inequality Measures | 44 | 122 | 49 | 6 | 221 |
| Task Completion Time | 169 | 29 | 8 | 12 | 219 |
| Worker Satisfaction | 89 | 61 | 20 | 12 | 182 |
| Error Rate | 69 | 91 | 10 | 2 | 172 |
| Regulatory Compliance | 76 | 68 | 14 | 5 | 163 |
| Training Effectiveness | 92 | 19 | 13 | 19 | 145 |
| Wages & Compensation | 77 | 36 | 25 | 6 | 144 |
| Automation Exposure | 51 | 54 | 22 | 12 | 142 |
| Team Performance | 86 | 17 | 27 | 9 | 140 |
| Developer Productivity | 94 | 17 | 14 | 6 | 132 |
| Job Displacement | 12 | 80 | 20 | 1 | 113 |
| Hiring & Recruitment | 51 | 7 | 8 | 3 | 69 |
| Skill Obsolescence | 5 | 45 | 6 | 1 | 57 |
| Creative Output | 31 | 16 | 7 | 2 | 57 |
| Social Protection | 27 | 16 | 8 | 2 | 53 |
| Labor Share of Income | 17 | 17 | 17 | — | 51 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
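One way to read the matrix is as the share of positive findings per outcome. The sketch below is a minimal illustration using three rows copied from the table. The function names are hypothetical. Treating any gap between the four direction counts and the Total column as claims in direction categories not shown here is an assumption, not something the source states.

```python
# Rows copied from the Evidence Matrix: (positive, negative, mixed, null, total).
ROWS = {
    "Governance & Regulation": (815, 391, 188, 121, 1539),
    "Firm Productivity": (435, 55, 88, 20, 604),
    "Job Displacement": (12, 80, 20, 1, 113),
}


def positive_share(row):
    """Fraction of all claims for an outcome that report a positive finding."""
    positive, _negative, _mixed, _null, total = row
    return positive / total


def unlisted_claims(row):
    """Claims counted in Total but not in the four listed directions.

    Assumption: the remainder belongs to direction categories not shown.
    """
    *directions, total = row
    return total - sum(directions)


for name, row in ROWS.items():
    print(f"{name}: {positive_share(row):.1%} positive, "
          f"{unlisted_claims(row)} unlisted")
```

For these three rows the positive shares come to roughly 53%, 72%, and 11%, with 24, 6, and 0 claims outside the four listed directions.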
Post-conflict reconstruction relies heavily on private enterprises to bring back employment, rebuild supply networks, and reconnect damaged economies.
Statement grounded in literature cited in the review (paper positions this as a general premise from post-conflict reconstruction literature); no primary data reported.
A causal ablation confirms that each of the four mechanical enforcement primitives is individually necessary.
Causal ablation experiments reported by authors in the synthetic banking domain: removing each primitive degrades performance/governance, implying individual necessity. Abstract does not report exact experimental counts or effect sizes.
Mechanical enforcement raises task accuracy from MCC ~0.43 to 0.88.
Reported Matthews correlation coefficient (MCC) for task accuracy under text-only governance (≈0.43) versus mechanical enforcement (≈0.88) in the paper's synthetic experiments; sample size not provided in abstract.
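For readers unfamiliar with the metric, the Matthews correlation coefficient can be computed from a binary confusion matrix as sketched below. The counts are hypothetical and are not taken from the paper, which does not report its sample size in the abstract.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0.0 when a marginal is empty and MCC is undefined.
    return numerator / denominator if denominator else 0.0


# Hypothetical counts, for illustration only.
print(mcc(tp=90, tn=90, fp=10, fn=10))  # 0.8
```

MCC ranges from -1 to 1, with 0 corresponding to chance-level agreement, so a move from about 0.43 to 0.88 is a shift from moderate to strong agreement with the ground truth.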
Mechanical enforcement more than doubles deferral information content.
Comparison of information-content measures for deferrals between text-only governance and mechanical enforcement in the synthetic banking domain experiments; exact numeric basis not given in abstract.
Mechanical enforcement reduces the rate of deferrals that carry no decision-relevant information by 73%.
Head-to-head comparison between text-only governance and a mechanically enforced architecture (four primitives) in the paper's synthetic banking experiments; specific sample size not stated in abstract.
These results challenge the presumed universality of the fairness-accuracy tradeoff and demonstrate that well-designed modeling improvements can advance both fairness and accuracy in large-scale public sector systems.
Synthesis of the three complementary analyses (observational county-level correlations, simulation experiments with added property features, and simulations incorporating Census data) performed on the 26 million-sale dataset covering ~95% of U.S. counties.
Incorporating publicly available Census data into assessment models, a feasible reform in most counties, would significantly improve both accuracy and fairness relative to status quo assessments.
Simulated reforms adding publicly available Census covariates to assessment models and comparing resulting accuracy and fairness metrics to status-quo assessments across the dataset of 26 million sales covering ~95% of U.S. counties.
When accuracy improves in the simulated assessment models, fairness almost always improves as well.
Analysis of simulated model outcomes showing joint changes in accuracy and fairness metrics across many simulated configurations and counties; reported near-universal co-improvement when accuracy rises.
In simulated assessment models, adding property features improves accuracy in most cases.
Simulation experiments using alternative assessment models that include additional property-level features; comparisons between baseline and feature-augmented simulated models across many counties/cases.
Assessment accuracy and fairness, measured using domain-relevant metrics, are strongly correlated across counties under status quo practices.
Observational analysis of status-quo assessment outcomes using a dataset of 26 million property sales spanning ~95% of U.S. counties; county-level correlation analysis between domain-relevant accuracy metrics and fairness metrics.
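The summary does not name the "domain-relevant metrics". As an assumption, the sketch below uses two measures that are standard in property assessment practice: the coefficient of dispersion (COD) as an accuracy or uniformity measure, and the price-related differential (PRD) as a vertical-equity measure. The paper may use different metrics, and the input values are hypothetical.

```python
from statistics import mean, median


def sales_ratios(assessed, sale_prices):
    """Assessed value divided by sale price, per property."""
    return [a / s for a, s in zip(assessed, sale_prices)]


def coefficient_of_dispersion(assessed, sale_prices):
    """Mean absolute deviation of ratios from the median ratio, in percent.

    Lower values indicate more uniform (more accurate) assessments.
    """
    ratios = sales_ratios(assessed, sale_prices)
    med = median(ratios)
    return 100 * mean([abs(r - med) for r in ratios]) / med


def price_related_differential(assessed, sale_prices):
    """Mean ratio divided by the sale-weighted mean ratio.

    Values above 1 suggest regressivity: cheaper homes are over-assessed
    relative to expensive ones.
    """
    ratios = sales_ratios(assessed, sale_prices)
    return mean(ratios) / (sum(assessed) / sum(sale_prices))


# Hypothetical county: the cheapest home is over-assessed, the dearest under-assessed.
assessed = [60, 100, 180]
sales = [50, 100, 200]
print(coefficient_of_dispersion(assessed, sales))
print(price_related_differential(assessed, sales))
```

Under this assumption, a county-level correlation between accuracy and fairness would correspond to counties with a high COD also tending to have a PRD further from 1.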
The research contributes to the literature on technology adoption in developing economies and offers policymakers and business leaders in sub-Saharan Africa valuable insights.
Paper's stated contribution in the abstract; a general claim about the study's scholarly and policy relevance rather than a quantifiable empirical result.
Targeted policy interventions — such as upskilling initiatives and supportive regulatory frameworks — are important to harness AI’s benefits while mitigating adverse impacts on workers.
Paper conclusion/recommendation drawn from empirical findings (positive association of AI with productivity and sales, plus observed cross-country variation). This is presented as a policy implication; no empirical evaluation of specific policies is reported in the excerpt.
AI adoption has a significant positive relationship with firm sales growth in the selected sub-Saharan African countries.
Same firm-level World Bank Enterprise Surveys (2007–2024) and regression methods (FGLS, robust OLS, HDFE) as the labour productivity analysis. Paper statement: "AI has a significant positive relationship with ... sales growth." Exact sample size and numeric effect not provided in excerpt.
AI adoption has a significant positive relationship with firm labour productivity in the selected sub-Saharan African countries.
Firm-level dataset from the World Bank Enterprise Surveys covering 2007–2024; empirical analysis using feasible generalized least squares (FGLS), robust OLS, and high-dimensional fixed effects (HDFE) linear regressions. Paper statement: "AI has a significant positive relationship with firm labour productivity." Exact firm sample size not reported in the provided excerpt.
Policy should prioritize employment‑centered digital strategies that are spatially differentiated and institutionally grounded to mitigate negative labor and development effects.
Normative policy recommendation arising from the paper's theoretical framework and regional field observations (policy prescription; not an empirically estimated intervention in the paper).
There is a positive spillover effect on AI-ineligible chats: treated workers adapted their multitasking workflow to devote greater attention to these chats.
Experiment-level observations comparing worker behavior on AI-ineligible chats between treatment and control; treated workers reallocated attention/effort (multitasking workflow changes) leading to improved attention on AI-ineligible chats.
Early intervention is essential for sustaining high post-escalation intervention effort.
Temporal analysis of intervention timing within the randomized experiment showing an association between earlier human intervention after escalation and higher subsequent intervention effort.
Human intervention preserves service quality in algorithm-triggered technical escalations (unresolved customer issues beyond the AI's capability).
Experimental subgroup analysis of escalations categorized as algorithm-triggered technical escalations; post-escalation human interventions were observed to maintain service quality in these cases.
PRIF yielded an average ROI of 83%.
Reported financial evaluation/ROI estimate following PRIF adoption in the paper (derived from pilot/case study cost-benefit or sample analysis).
PRIF adoption reduced financial misstatements by 47%.
Reported change in financial misstatement incidence after PRIF implementation in the paper's evaluation (case studies/forensic report analysis).
PRIF adoption reduced compliance resolution time by 58%.
Reported performance metric after PRIF adoption in pilot/case studies described in the paper.
Client retention was 91% for high SCI versus 54% for low SCI.
Reported retention rates stratified by SCI levels in paper (presumably derived from the sample used for SCI analysis).
The Stakeholder Communication Index (SCI) revealed a strong correlation (r = 0.83) between report quality and client retention.
Statistical analysis reported in paper linking SCI-derived report quality scores to client retention; correlation coefficient r = 0.83 provided.
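For context, a Pearson correlation such as the reported r = 0.83 is computed as in the sketch below. The summary does not say which correlation measure the paper uses, so Pearson's r is an assumption, and the report-quality scores and retention rates are hypothetical values, not the paper's data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(var_x * var_y)


# Hypothetical report-quality scores and client retention rates per firm.
quality = [2.0, 3.5, 4.0, 4.5, 5.0]
retention = [0.50, 0.62, 0.80, 0.78, 0.93]
print(round(pearson_r(quality, retention), 2))
```

A coefficient this high describes a strong linear association, but on its own it does not show that better reports cause higher retention.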
Accuracy increased from 62% to 89–94% after integration of AI and blockchain.
Reported accuracy figures in results section based on PRIF evaluation (presumably from analyzed forensic reports/case studies).
Integration of AI and blockchain reduced the risk detection time from 47 days post-event to 9–22 days pre-event.
Reported results from PRIF implementation/pilot using case studies and forensic report analysis (paper cites these temporal comparisons).
This study pioneers a Proactive Risk Intelligence Framework (PRIF) for Chartered Accountant (CA) firms, targeting gaps in risk anticipation, stakeholder communication, and compliance.
Paper description of study objective and framework development (mixed-method design, interviews, case studies, forensic report analysis).
By reframing reskilling as a shared, supported, and bounded process, AI-driven change can foster long-term career resilience, professional identity renewal, and sustainable human–AI integration.
Conceptual conclusion/implication drawn by the authors from the proposed model and recommendations; no empirical validation included in the paper.
The paper advances a set of sustainable, collective strategies—such as role-linked learning, protected learning time, skill prioritization, and phased AI adoption—to interrupt the reskilling loop and redistribute adaptive demands across organizations.
Prescriptive/theoretical recommendations proposed by the authors; no empirical evaluation or trial evidence presented.
The paper proposes a reconstructed labour law framework based on economic dependency rather than traditional employment classification, including recognition of dependent contractor status, platform liability for worker welfare, algorithmic transparency, social security obligations, and specialised grievance mechanisms.
Normative legal/policy proposal articulated by the author(s) based on theoretical argument and the comparative analysis of existing regulatory gaps; prescriptive recommendation rather than empirically tested intervention.
Because the method sits architecturally below the current safety stack, the same formula provides a real-time warning signal that current alignment does not supply, portable across current and future ChatGPT-like AI architectures and instantiable in application domains where competing response classes can be defined.
Theoretical/architectural claim in paper, supported by cross-architecture empirical tests and theoretical argument (no further quantitative sample size provided in excerpt).
The authors made an a priori time-stamped prediction eleven months before the Stanford 'Delusional Spirals' corpus appeared, and that prediction was independently confirmed by the corpus of 207,443 human–AI exchanges.
Time-stamped prediction reported in paper; independent confirmation claimed via the Stanford 'Delusional Spirals' corpus containing 207,443 human–AI exchanges.
The shift phenomenon and forecasting persist at production scale across ten frontier chatbots.
Empirical observation reported in paper: tests across ten production/frontier chatbots.
The method achieved 90 percent correct forecasting across seven AI models spanning two orders of magnitude in parameter count (124M–12B).
Empirical test reported in paper: seven AI models evaluated for forecasting accuracy; model parameter counts reported as 124M–12B.
The shift-condition approach is validated across six independent tests.
Paper statement listing six independent validation tests (method: multiple independent experiments/tests).
The shift condition is neither model-specific nor driven by stochastic sampling.
Claim supported by cross-model empirical tests reported in the paper (tests spanning multiple model sizes and production chatbots).
The shift condition is derivable mathematically and results from group-level competition between the conversation-so-far (C) and the desirable (B) and undesirable (D) basin dynamics, which can be estimated in advance for a given application.
Paper claims an explicit mathematical derivation (theoretical/mathematical methods reported).
A vector generalization of fusion–fission group dynamics (observed in living and active-matter systems) drives — and can forecast — future shifts in an AI's behavior.
Theoretical proposal plus empirical validation reported in paper (validated across six independent tests as stated).
The appropriate design response to Metis tasks is centaur architectures in which humans lead and AI supports, rather than pursuing further automation.
Prescriptive recommendation based on the conceptual analysis and normative reasoning in the paper; not supported by empirical evaluation or quantified comparisons of architectures.
Policy conclusion: while palliative care is an ethical imperative, its expansion must be decoupled from the oncological paradigm and matched with state-funded long-term care to protect against clinical decline and financial shocks.
Normative recommendation based on the empirical distributional findings (average protective effects but harmful tails for vulnerable groups) and cross-national differences reported in the analysis.
We introduce a Synthetic Data Generation framework using Tabular Denoising Diffusion Probabilistic Models within a Two-Learner architecture to synthesize high-fidelity digital twins from pan-European SHARE data (2016-2021).
Methodological contribution described in the paper; implementation details include use of diffusion-based tabular generative models and a Two-Learner architecture applied to SHARE microdata from 2016–2021.
On average, palliative care (PC) acts as a 'double shield', truncating out-of-pocket expenditures (financial toxicity) and informal caregiving shadow values (time poverty).
Analysis of pan-European SHARE data (2016-2021) using a Synthetic Data Generation framework (Tabular Denoising Diffusion Probabilistic Models within a Two-Learner architecture) to create digital twins and estimate treatment effects.
The study highlights the importance of reskilling and education reforms to ensure inclusive labor market outcomes in the era of AI-driven transformation.
Authors' policy recommendation based on their empirical findings from the survey (n=320) and SEM analysis; presented as a conclusion/recommendation rather than a quantified empirical result.
The model explained 49% of variance in wage dynamics (R^2 = 0.49).
SEM model statistics reported for the survey-based model (n=320); R-squared for wage dynamics = 0.49.
The model explained 45% of variance in skill transformation (R^2 = 0.45).
SEM model statistics reported for the survey-based model (n=320); R-squared for skill transformation = 0.45.
The model explained 52% of variance in employment patterns (R^2 = 0.52).
SEM model fit/variance-explained statistics reported for the survey-based model (n=320); R-squared for employment patterns = 0.52.
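The variance-explained figures above (0.49, 0.45, 0.52) can be read through the usual definition of R-squared, one minus the ratio of residual to total sum of squares. The sketch below is a generic illustration with hypothetical values. It is not the paper's SEM estimation, which would need a dedicated SEM package and the survey data.

```python
def r_squared(observed, predicted):
    """Share of variance in `observed` accounted for by `predicted`."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# Hypothetical outcome scores and model predictions.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.5, 3.5, 3.5]
print(r_squared(observed, predicted))  # 0.8
```

On this reading, an R-squared of 0.49 for wage dynamics means the model leaves about half of the variation in that outcome unexplained.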
Mediation analysis confirmed that skill transformation plays a significant mediating role linking AI adoption with wage distribution/outcomes.
Mediation analysis within the SEM framework applied to the survey data (n=320); authors report a significant mediation effect (no numeric indirect effect reported in the summary).
Mediation analysis confirmed that skill transformation plays a significant mediating role linking AI adoption with employment outcomes.
Mediation analysis within the SEM framework applied to the survey data (n=320); authors report a significant mediation effect (no numeric indirect effect reported in the summary).
Skill transformation significantly affected wage dynamics (β = 0.55, p < 0.001).
Structural equation modeling (SEM) on the same sample (n=320); reported standardized path coefficient β = 0.55 with p < 0.001.
Skill transformation significantly affected employment patterns (β = 0.58, p < 0.001).
Structural equation modeling (SEM) mediation/causal-path analysis on the survey (n=320); reported standardized path coefficient β = 0.58 with p < 0.001.
AI adoption significantly influenced wage dynamics (β = 0.61, p < 0.001).
Structural equation modeling (SEM) on the same survey sample (n=320); reported standardized path coefficient β = 0.61 with p < 0.001.