Evidence (11633 claims)

| Topic | Claims |
|---|---|
| Adoption | 7395 |
| Productivity | 6507 |
| Governance | 5877 |
| Human-AI Collaboration | 5157 |
| Innovation | 3492 |
| Org Design | 3470 |
| Labor Markets | 3224 |
| Skills & Training | 2608 |
| Inequality | 1835 |
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 609 | 159 | 77 | 736 | 1615 |
| Governance & Regulation | 664 | 329 | 160 | 99 | 1273 |
| Organizational Efficiency | 624 | 143 | 105 | 70 | 949 |
| Technology Adoption Rate | 502 | 176 | 98 | 78 | 861 |
| Research Productivity | 348 | 109 | 48 | 322 | 836 |
| Output Quality | 391 | 120 | 44 | 40 | 595 |
| Firm Productivity | 385 | 46 | 85 | 17 | 539 |
| Decision Quality | 275 | 143 | 62 | 34 | 521 |
| AI Safety & Ethics | 183 | 241 | 59 | 30 | 517 |
| Market Structure | 152 | 154 | 109 | 20 | 440 |
| Task Allocation | 158 | 50 | 56 | 26 | 295 |
| Innovation Output | 178 | 23 | 38 | 17 | 257 |
| Skill Acquisition | 137 | 52 | 50 | 13 | 252 |
| Fiscal & Macroeconomic | 120 | 64 | 38 | 23 | 252 |
| Employment Level | 93 | 46 | 96 | 12 | 249 |
| Firm Revenue | 130 | 43 | 26 | 3 | 202 |
| Consumer Welfare | 99 | 51 | 40 | 11 | 201 |
| Inequality Measures | 36 | 105 | 40 | 6 | 187 |
| Task Completion Time | 134 | 18 | 6 | 5 | 163 |
| Worker Satisfaction | 79 | 54 | 16 | 11 | 160 |
| Error Rate | 64 | 78 | 8 | 1 | 151 |
| Regulatory Compliance | 69 | 64 | 14 | 3 | 150 |
| Training Effectiveness | 81 | 15 | 13 | 18 | 129 |
| Wages & Compensation | 70 | 25 | 22 | 6 | 123 |
| Team Performance | 74 | 16 | 21 | 9 | 121 |
| Automation Exposure | 41 | 48 | 19 | 9 | 120 |
| Job Displacement | 11 | 71 | 16 | 1 | 99 |
| Developer Productivity | 71 | 14 | 9 | 3 | 98 |
| Hiring & Recruitment | 49 | 7 | 8 | 3 | 67 |
| Social Protection | 26 | 14 | 8 | 2 | 50 |
| Creative Output | 26 | 14 | 6 | 2 | 49 |
| Skill Obsolescence | 5 | 37 | 5 | 1 | 48 |
| Labor Share of Income | 12 | 13 | 12 | — | 37 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
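The matrix supports simple arithmetic, such as each direction's share of a row's Total. The following minimal Python sketch uses a few rows copied from the table and treats empty (dash) cells as zero, which is an assumption about what the dash means. It also counts claims that appear in a row's Total but in none of the four direction columns. For several rows the four columns sum to less than the Total (for Other, 609 + 159 + 77 + 736 = 1581 against 1615), and the page does not say how the remainder is classified.

```python
# A few rows copied from the Evidence Matrix: (positive, negative, mixed, null, total).
# Empty (dash) cells are entered as 0, an assumption about the table's notation.
ROWS = {
    "Other": (609, 159, 77, 736, 1615),
    "Output Quality": (391, 120, 44, 40, 595),
    "Job Displacement": (11, 71, 16, 1, 99),
    "Labor Share of Income": (12, 13, 12, 0, 37),
}


def direction_shares(pos, neg, mixed, null, total):
    """Return each direction's share of the row's reported Total."""
    return {
        "positive": pos / total,
        "negative": neg / total,
        "mixed": mixed / total,
        "null": null / total,
    }


def unclassified(pos, neg, mixed, null, total):
    """Claims counted in Total but in none of the four direction columns."""
    return total - (pos + neg + mixed + null)


if __name__ == "__main__":
    for name, row in ROWS.items():
        shares = direction_shares(*row)
        print(f"{name}: positive {shares['positive']:.1%}, "
              f"negative {shares['negative']:.1%}, unclassified {unclassified(*row)}")
```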
Digitization is reshaping the dependence structures described by Resource Dependence Theory (RDT) rather than eliminating them entirely (Yordanova & Hristozov, 2025).
Conceptual/theoretical claim supported by citation to Yordanova & Hristozov (2025); presented as an interpretive conclusion about how digitization interacts with organizational dependence structures. No empirical details provided in the excerpt.
CLARITI matches GPT-5's resolution rate on underspecified issues while generating 41% fewer questions.
Empirical evaluation comparing CLARITI and GPT-5 on a task set of underspecified software engineering issues; the result reported in the abstract indicates parity in resolution rate and a quantified reduction in questions (41%) but the abstract does not report sample size, test set composition, or statistical significance.
They can produce fluent outputs that resemble reflection, but lack temporal continuity, causal feedback, and anchoring in real-world interaction.
Descriptive claim made in the text contrasting surface-level fluency with missing properties; no empirical data or experiments provided.
A within-subject human study with 20 players and 600 games shows that our interventions significantly improve performance for low- and mid-skill players while matching expert-engine interventions for high-skill players.
Within-subject human experiment reported in the paper: N = 20 players, 600 games total; comparisons of performance under the proposed interventions versus expert-engine interventions.
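The entry gives the design (20 players, 600 games, an average of 30 games per player) but not the statistical test. In a within-subject design, each player serves as their own control, so conditions are compared on per-player differences. Below is a minimal sketch of one standard option, a paired t statistic on per-player mean scores. The function name, the scores, and the choice of test are assumptions for illustration, not the paper's analysis.

```python
import math
import statistics


def paired_t(scores_a, scores_b):
    """Paired t statistic for per-player scores under two conditions.

    scores_a[i] and scores_b[i] must belong to the same player (within-subject design).
    Returns (mean difference, t statistic, degrees of freedom).
    """
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equal-length lists covering at least 2 players")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    t_stat = mean_d / (sd_d / math.sqrt(len(diffs)))
    return mean_d, t_stat, len(diffs) - 1


# Hypothetical per-player mean scores for the same four players in two conditions.
scores_a = [0.60, 0.55, 0.70, 0.50]
scores_b = [0.50, 0.50, 0.60, 0.45]
mean_diff, t_stat, dof = paired_t(scores_a, scores_b)
```

The t statistic would then be compared against a t distribution with `dof` degrees of freedom. Splitting players into skill bands, as the claim does, amounts to running the comparison within each band.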
This work establishes a foundation for understanding how generative AI systems not only augment cognitive performance but also reshape self-perception and perceived expertise.
Paper's stated contribution presenting theory and conceptual groundwork; no empirical validation provided in the abstract.
The LLM fallacy has implications for education, hiring, and AI literacy.
Implications and argumentation presented in the paper; these are prospective and conceptual rather than supported by empirical data in the abstract.
Further research is needed to explore the longitudinal impact of these AI deployments on local labor markets and the creation of indigenous datasets that reflect Cameroon’s unique linguistic diversity.
Authors' identified research gaps and recommendations; statement of future research needs rather than empirical result.
The analysis reveals a non-linear, U-shaped relationship between changes in frontier skill intensity and employment growth.
Statistical linkage of changes in frontier skill intensity (OTSS changes) to employment growth using administrative data from 2012–2023; reported functional form is U-shaped.
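A U-shaped relationship is usually detected by adding a squared term: with fitted curve y = b0 + b1·x + b2·x², a U shape needs b2 > 0 and a turning point x* = −b1 / (2·b2) inside the observed range of x. Here is a minimal sketch on noiseless synthetic data with a known minimum at x = 0.2. The data and variable names are hypothetical, and the paper's actual specification is not reported in this entry.

```python
import numpy as np


def fit_quadratic(x, y):
    """OLS fit of y = b0 + b1*x + b2*x**2; returns (b0, b1, b2)."""
    b2, b1, b0 = np.polyfit(x, y, deg=2)  # polyfit returns the highest degree first
    return b0, b1, b2


def turning_point(b1, b2):
    """x where the fitted parabola reaches its minimum (b2 > 0) or maximum (b2 < 0)."""
    return -b1 / (2.0 * b2)


# Synthetic illustration only: growth that dips and then recovers.
x = np.linspace(-1.0, 1.0, 21)   # hypothetical change in frontier skill intensity
y = 0.5 * (x - 0.2) ** 2 - 0.1   # exact U with its minimum at x = 0.2
b0, b1, b2 = fit_quadratic(x, y)
x_star = turning_point(b1, b2)
is_u_shaped = bool(b2 > 0 and x.min() < x_star < x.max())
```

A sign check like this is only a first step. A formal U-test also requires the slope to be significantly negative at the low end of the data and significantly positive at the high end.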
Frontier technologies remain concentrated in specialised occupations, while digital technologies are widespread.
Distributional analysis of OTSS across occupations showing concentration patterns of frontier technologies versus ubiquity of digital technologies.
For the average worker in 2023, manual technologies account for the largest share of skill content (42 per cent), followed by digital (38 per cent) and frontier technologies (20 per cent).
Computed OTSS applied to occupation-level data for Germany in 2023; reported shares for the "average worker".
Removing safety layers made the system less useful: structured validation feedback guided the model to correct outcomes in fewer turns, while the unconstrained system hallucinated success.
Qualitative and quantitative comparisons from the deployed evaluation across the three conditions (observations about turn counts, validation-feedback loops, and model hallucinations in the unconstrained condition over the 25 scenario trials).
The results show how non-IID data, competition intensity, and incentives shape organizational strategies and social welfare.
Findings from the paper's experiments and analyses that vary non-IIDness, competition intensity, and incentive parameters; no numeric sample sizes provided in abstract.
Outcomes are shaped not only by benchmark quality but also by competitive pressure, including user switching, routing decisions, and operational constraints.
Argument/assertion in paper framing motivations for Marketplace Evaluation; conceptual reasoning listing mechanisms (user switching, routing, operational constraints); no empirical tests or sample size reported.
AI plays a dual role as enhancer and eroder, strengthening performance while simultaneously eroding underlying expertise (the 'AI-as-Amplifier Paradox').
Framing claim presented in the paper's conceptual argument and grounded by the paper's stated year-long empirical study among cancer specialists (no numerical sample size reported in abstract).
Cross-border citations show continued technological interdependence rather than decoupling, with Chinese AI inventors relying more heavily on U.S. frontier knowledge than vice versa.
Citation analysis of cross-border patent citations between Chinese and U.S. AI patents (paper reports asymmetry in reliance based on citation patterns).
The organization of AI innovation differs sharply: U.S. AI patenting is concentrated among large private incumbents and established hubs, whereas Chinese AI patenting is more geographically diffuse and institutionally diverse, with larger roles for universities and state-owned enterprises.
Analysis of assignee types, geographic dispersion, and institutional composition of AI patents in the two countries (concentration metrics and assignee categorizations described in paper).
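The entry mentions concentration metrics without naming them. A common choice is the Herfindahl-Hirschman index (HHI), the sum of squared shares: 1.0 when one assignee holds every patent, and 1/n when n assignees hold equal shares. The following sketch assumes HHI and uses hypothetical toy portfolios, not data from the paper.

```python
from collections import Counter


def hhi(counts):
    """Herfindahl-Hirschman index on shares: 1/n for n equal groups, 1.0 for a monopoly."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("need at least one patent")
    return sum((c / total) ** 2 for c in counts)


def hhi_of_labels(labels):
    """HHI over one label per patent (assignee, assignee type, or city)."""
    return hhi(list(Counter(labels).values()))


# Hypothetical toy portfolios, not data from the paper.
concentrated = ["FirmA"] * 6 + ["FirmB"] * 3 + ["FirmC"]
diffuse = ["Univ1", "Univ2", "SOE1", "FirmD", "FirmE"] * 2

print(round(hhi_of_labels(concentrated), 2))  # 0.46
print(round(hhi_of_labels(diffuse), 2))       # 0.2
```

The same function works for geographic dispersion if each patent is labeled with its city instead of its assignee.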
Across all settings, AI Organizations composed of aligned models produce solutions with higher utility but greater misalignment compared to a single aligned model.
Reported experimental results aggregated across two practical settings (AI consultancy and AI software team) and 12 tasks; direct comparison between AI Organizations of aligned models and a single aligned model.
Multi-agent "AI organizations" are more effective at achieving business goals than individual AI agents, but less aligned.
Experimental comparison reported in the paper between multi-agent AI organizations and single aligned agents across tasks and settings.
Alignment operates as a two-way translation, where models are made 'safe for worlds' while those worlds are reshaped to be 'safe for models.'
Conceptual claim supported by ethnographic examples illustrating reciprocal adaptations between models and social/institutional contexts in Nairobi's credit-scoring ecosystem.
Algorithmic credit scoring is accomplished through the ongoing work of alignment that stabilizes risk under conditions of persistent uncertainty, taking epistemic, modeling, and contextual forms.
The paper's theoretical argument grounded in nine-month ethnographic observations and analysis of how practitioners and institutions engage in alignment work across epistemic, modeling, and contextual dimensions.
Practitioners negotiate model performance via technical and political means.
Observational data from the ethnography showing technical adjustments, benchmarks, and political negotiation (e.g., with regulators or management) to establish acceptable performance.
Practitioners formulate risk through multiple interpretations.
Ethnographic evidence from interviews and observations indicating that risk is characterized differently across actors (technical, legal, business interpretations).
Practitioners construct alternative data using technical and legal workarounds.
Field observations and interviews showing practitioners employing technical methods and legal strategies to create or repurpose alternative data sources for credit scoring.
Algorithmic credit scoring is being transformed by new actors, techniques, and shifting regulations.
Ethnographic fieldwork documenting the entry of new actors, novel technical techniques, and regulatory changes affecting credit scoring in Nairobi's digital lending ecosystem.
Credit scoring is an increasingly central and contested domain of data and AI governance.
Nine-month ethnography of credit scoring practices in Nairobi, Kenya; participant observation and interviews across stakeholders in digital lending.
Although some frontier models exceed human performance, model accuracy is still far below what would enable reliable experimental guidance.
Paper reports instances where top-performing (frontier) models outperform aggregate human expert accuracy on SciPredict, but concludes overall accuracies are insufficient for reliable experimental guidance.
The local labor market will follow a dual trajectory: low-skill, routine jobs face high automation risk while demand will rise for AI-collaborative, higher-skill roles.
Paper's analytical prediction based on distinguishing current job roles into routine/repetitive vs cognitive/non-routine and projecting likely impacts; no numeric forecasts or sample sizes provided in the excerpt.
Professional and Technical Services, Information, and Finance and Insurance account for approximately 86 percent of the base-case direct contribution.
Sectoral decomposition of base-case direct contribution in the model; paper explicitly reports the three sectors' combined share as ~86%.
The inverted U-shaped pattern between AI knowledge stickiness and technological concentration is more clearly detected in eastern cities and in small and medium-sized cities; in large cities the quadratic term is not statistically significant.
Heterogeneity/subsample regressions by region (east vs. other) and city size categories within the city-year panel (2014–2023); statistical significance of quadratic term differs across subsamples.
Technological complexity moderates the nonlinear (inverted U) association between AI knowledge stickiness and technological concentration by altering its strength and curvature rather than producing a simple, uniform shift in the turning point.
Interaction/heterogeneity analyses in the two-way fixed-effects city-year panel (2014–2023), examining moderating role of a technological complexity measure on the quadratic association.
There is an inverted U-shaped association between AI knowledge stickiness and technological concentration: concentration rises with stickiness up to a turning point and declines beyond it.
City-year panel combining AI patent applications with urban statistics for 2014–2023; two-way fixed-effects regression showing significant positive linear and negative quadratic terms (nonlinear association).
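The three stickiness entries can be read against a generic two-way fixed-effects specification. The notation below is hypothetical: Conc_it is technological concentration in city i and year t, S_it is AI knowledge stickiness, C_it is technological complexity, X_it are unspecified controls, and μ_i and λ_t are city and year fixed effects. The interaction form is one common way to model moderation, not necessarily the paper's.

```latex
\begin{aligned}
\mathrm{Conc}_{it} &= \beta_0 + \beta_1 S_{it} + \beta_2 S_{it}^2 + \gamma^{\top} X_{it} + \mu_i + \lambda_t + \varepsilon_{it},
  \qquad \beta_1 > 0,\ \beta_2 < 0,\\
S^{*} &= -\frac{\beta_1}{2\beta_2},\\
\mathrm{Conc}_{it} &= \beta_0 + (\beta_1 + \beta_3 C_{it})\, S_{it} + (\beta_2 + \beta_4 C_{it})\, S_{it}^2
  + \beta_5 C_{it} + \gamma^{\top} X_{it} + \mu_i + \lambda_t + \varepsilon_{it},\\
S^{*}(C) &= -\frac{\beta_1 + \beta_3 C}{2(\beta_2 + \beta_4 C)}.
\end{aligned}
```

In this form, complexity changes the curvature through β₂ + β₄C and the slope through β₁ + β₃C. The turning point moves only through the ratio of the two, so a moderator can reshape the curve substantially while shifting the turning point little, or not uniformly.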
Subjectivity persisted in AI-powered recruitment decisions; human judgment remained an important factor.
Theme 2 (subjectivity in AI-powered recruitment) from interviews indicating retained human subjectivity and judgement in recruitment processes (n = 22).
Experiments on the MovieLens-100k dataset illustrate when the empirical payout aligns with — and diverges from — Shapley fairness across different settings and algorithms.
Empirical evaluation performed on the MovieLens-100k dataset (≈100,000 ratings) comparing the proposed payout rule and algorithmic outcomes to Shapley-value allocations across multiple experimental settings and algorithms.
For heterogeneous agents the cooperative game still admits a non-empty core, though convexity and Shapley value core-membership are no longer guaranteed.
Theoretical analysis for heterogeneous-agent case provided in the paper: establishes core non-emptiness but shows convexity and Shapley-in-core do not generally hold.
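To make the core and Shapley concepts in these entries concrete, the following Python sketch computes exact Shapley values by averaging marginal contributions over all player orderings, then checks core membership. The three-player game is a textbook-style toy, not the paper's model. Its core is the single payout (1, 0, 0), yet its Shapley value (2/3, 1/6, 1/6) lies outside the core. This is the same pattern the entry describes for heterogeneous agents: a non-empty core that need not contain the Shapley value.

```python
from itertools import combinations, permutations
from math import factorial


def shapley_values(players, v):
    """Exact Shapley values: average marginal contribution over all orderings.

    v maps a frozenset of players to the coalition's worth, with v(frozenset()) == 0.
    Cost grows as n!, so this is only suitable for small illustrative games.
    """
    phi = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            phi[p] += v(with_p) - v(coalition)
            coalition = with_p
    n_orders = factorial(len(players))
    return {p: total / n_orders for p, total in phi.items()}


def in_core(players, v, payout, tol=1e-9):
    """True if payout splits v(grand coalition) and no proper coalition can do better alone."""
    if abs(sum(payout.values()) - v(frozenset(players))) > tol:
        return False
    for r in range(1, len(players)):
        for coalition in combinations(players, r):
            if sum(payout[p] for p in coalition) < v(frozenset(coalition)) - tol:
                return False
    return True


# Hypothetical toy game: only coalitions containing A plus at least one partner earn 1.
PLAYERS = ["A", "B", "C"]
WORTH = {frozenset("AB"): 1.0, frozenset("AC"): 1.0, frozenset("ABC"): 1.0}


def v(coalition):
    return WORTH.get(frozenset(coalition), 0.0)


phi = shapley_values(PLAYERS, v)                        # A: 2/3, B: 1/6, C: 1/6
shapley_in_core = in_core(PLAYERS, v, phi)              # False: A and B get 5/6 < 1
core_point_ok = in_core(PLAYERS, v, {"A": 1.0, "B": 0.0, "C": 0.0})  # True
```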
User interactions in online recommendation platforms create interdependencies among content creators: feedback on one creator's content influences the system's learning and, in turn, the exposure of other creators' contents.
Conceptual/empirical motivation stated in the paper; motivates the multi-agent bandit modeling of creator interactions in recommender systems.
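The interdependence in this entry can be seen even in a much simpler learner than the paper's multi-agent bandit model. In the hypothetical sketch below, a greedy recommender has one slot and shows the creator with the highest estimated like rate. Negative feedback on one creator's content is enough to hand the slot to another creator who received no feedback at all. The class name, pseudo-count prior, and tie-breaking rule are assumptions for illustration.

```python
class GreedyRecommender:
    """Hypothetical one-slot recommender that shows the creator with the best estimated like rate.

    All creators compete for the same slot, so feedback on one creator's content
    changes the estimates that decide every creator's exposure.
    """

    def __init__(self, creators, prior_likes=1.0, prior_views=2.0):
        # Pseudo-counts give every unseen creator a starting estimate of 0.5.
        self.likes = {c: prior_likes for c in creators}
        self.views = {c: prior_views for c in creators}

    def estimate(self, creator):
        return self.likes[creator] / self.views[creator]

    def choose(self):
        # max() keeps the first maximum, so ties go to the alphabetically first creator.
        return max(sorted(self.likes), key=self.estimate)

    def update(self, creator, liked):
        self.views[creator] += 1
        if liked:
            self.likes[creator] += 1


rec = GreedyRecommender(["alice", "bob"])
first_shown = rec.choose()            # tie at 0.5, so "alice"
rec.update(first_shown, liked=False)  # feedback on alice's content only
second_shown = rec.choose()           # "bob" now gets the exposure
```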
Sensitivity analyses indicate the observed positive belief changes likely reflect recovery from carry-over effects rather than genuine training-induced shifts.
Authors' sensitivity analyses discussed in the paper that examined alternative explanations (e.g., carry-over effects) and concluded the belief-change result is likely due to recovery from such effects.
Simulations demonstrate that standard methods, such as principal components analysis and inverse covariance weighting, can generate spurious cross-study differences, whereas our approach recovers comparable latent treatment effects.
Simulation experiments reported in the paper comparing the proposed method to PCA and inverse covariance weighting; results show PCA and inverse-covariance-weighted estimators can produce spurious cross-study differences while the proposed method recovers comparable latent treatment effects (no simulation sample sizes provided in the abstract).
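For reference, the following sketch shows the inverse-covariance-weighted (ICW) summary index that the entry compares against, in the style commonly attributed to Anderson (2008). It is not the authors' proposed method. Each outcome is standardized by the control group's mean and standard deviation. The standardized outcomes are then averaged with weights proportional to the row sums of their inverse covariance matrix. Because these weights come from each sample's own covariance, two studies with similar treatment effects can still weight their outcomes differently. The function name and the toy data are hypothetical.

```python
import numpy as np


def icw_index(outcomes, control_mask):
    """Inverse-covariance-weighted summary index (sketch).

    outcomes: (n_units, n_outcomes) array; control_mask: boolean array of length n_units.
    Returns (index per unit, weights per outcome), with weights normalized to sum to one.
    """
    y = np.asarray(outcomes, dtype=float)
    ctrl = np.asarray(control_mask, dtype=bool)
    # Standardize each outcome against the control group.
    z = (y - y[ctrl].mean(axis=0)) / y[ctrl].std(axis=0, ddof=1)
    # Weights: row sums of the inverse covariance matrix of the standardized outcomes.
    sigma_inv = np.linalg.inv(np.cov(z, rowvar=False))
    w = sigma_inv.sum(axis=1)
    w = w / w.sum()
    return z @ w, w


# Hypothetical toy data: two outcomes with identical spread, so the weights come out equal.
outcomes = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 3.0], [3.0, 2.0]])
control = np.array([True, True, False, False])
index, weights = icw_index(outcomes, control)
```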
We ran two large preregistered experiments (N=17,950 responses from 14,779 people) using conversational AI models to persuade participants on a range of attitudinal and behavioural outcomes, including signing real petitions and donating money to charity.
Statement in paper reporting two preregistered experiments, sample sizes (17,950 responses; 14,779 people), use of conversational AI models, and target outcomes including petition signing and charitable donations.
Big data analytics (BDA) adoption is a risky strategy with potentially high rewards for start-ups.
Stated as a summary conclusion based on empirical analysis of a large sample of start-ups in Germany comparing adopters and non-adopters across multiple performance measures (survival, costs, sales, employee growth, access to financing).
While AI may reduce certain traditional roles, it also enhances job quality and creates new career pathways within the commerce sector.
Reported finding from the paper's synthesis of existing studies and sectoral observations (qualitative literature synthesis).
AI exhibits a dual nature—both as a disruptor and an enabler of employment in the commerce sector.
Paper-level synthesis of contradictory findings and sectoral patterns reported across reviewed literature (qualitative literature synthesis).
Bounded agents act as an amplifying but not necessary extension to the foundation-model stack for changing work coordination.
Conceptual argument within the paper distinguishing bounded agents from the core stack; no empirical comparison or measurement reported.
The spatial spillover effects are geographically constrained and vary significantly across regions.
Reported heterogeneity in spatial Durbin model results and discussion of geographic constraint and inter-regional variation (regional heterogeneity analysis).
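For readers unfamiliar with the model named here, the standard spatial Durbin model and its effect decomposition are sketched below. W is an n × n spatial weights matrix, and x_k is the k-th explanatory variable. The paper's specific choice of W, for example contiguity or inverse distance, is not given in this entry.

```latex
\begin{aligned}
y &= \rho W y + X\beta + W X\theta + \varepsilon,\\
\frac{\partial y}{\partial x_k^{\top}} &= (I_n - \rho W)^{-1}\left(I_n \beta_k + W \theta_k\right).
\end{aligned}
```

The direct effect of x_k is the average diagonal element of this n × n matrix, and the indirect (spillover) effect is the average row sum of its off-diagonal elements. A geographically constrained spillover would appear as off-diagonal effects that shrink quickly as the distance between regions grows.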
The effects of generative AI on work and organisations are heterogeneous and context-dependent, shaped by job roles, skill levels, and institutional environments.
Synthesis across the included studies noting variation in outcomes conditional on role, skill, and institutional context.
Overall, AI emerges as a transformative but context-dependent tool for business decision-making in Latin America.
The authors' overall interpretation and synthesis of the 27 reviewed studies highlighting variable outcomes depending on context and readiness.
The positive effect of big data applications on firms' markups exhibits heterogeneity across organizational, technological, and environmental dimensions.
Paper reports heterogeneity analysis showing variation in the magnitude of the positive markup effect across organizational, technological and environmental factors; based on model implications and empirical subgroup/interaction tests using micro-level firm data (sample size not reported).
Although the concurrent paradigm performs worse than the sequential paradigm in terms of immediate task performance, it is more effective in promoting users' emotional trust.
Comparison between concurrent and sequential AI-assisted decision-making paradigms in the RCT (N=120); authors report concurrent < sequential for immediate task performance, but concurrent > sequential for emotional trust.
AI adoption outcomes depend on organizational routines, data arrangements, accountability structures, and public values.
Empirical and theoretical literature review and argument in the article drawing on scholarship in digital government and public-sector technology adoption.
If employment losses are relatively small and productivity gains are realised, AI adoption could boost Exchequer revenues. But if job displacement is sizeable, tax receipts fall while welfare spending rises, resulting in potentially large pressures on the public finances.
Conditional fiscal scenarios simulated in the report combining employment, wage and benefit changes with the public finance implications (tax receipts and welfare spending); reported as scenario-based outcomes.
Ireland’s tax and welfare system absorbs most of the income loss for lower income households, and roughly half of the loss for households at the top of the income distribution.
Microsimulation using SWITCH to model taxes and transfers applied to simulated income changes across income groups; reported as a finding in the report.
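The report's exact metric is not spelled out here. One standard way to express the share of a market-income loss absorbed by taxes and transfers is the income stabilization coefficient for income group g, where ΔY^M is the change in market income and ΔY^D is the change in disposable income. The worked figures below are hypothetical.

```latex
\tau_g = 1 - \frac{\Delta Y^{D}_{g}}{\Delta Y^{M}_{g}},
\qquad \text{e.g. } \Delta Y^{M} = -1000,\ \Delta Y^{D} = -400
\;\Rightarrow\; \tau = 1 - \frac{-400}{-1000} = 0.6 .
```

In these terms, "absorbs most of the income loss" corresponds to τ above 0.5 for lower-income households, and "roughly half" corresponds to τ near 0.5 at the top of the distribution.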