Evidence (7395 claims)
Adoption: 7395 claims
Productivity: 6507 claims
Governance: 5877 claims
Human-AI Collaboration: 5157 claims
Innovation: 3492 claims
Org Design: 3470 claims
Labor Markets: 3224 claims
Skills & Training: 2608 claims
Inequality: 1835 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 609 | 159 | 77 | 736 | 1615 |
| Governance & Regulation | 664 | 329 | 160 | 99 | 1273 |
| Organizational Efficiency | 624 | 143 | 105 | 70 | 949 |
| Technology Adoption Rate | 502 | 176 | 98 | 78 | 861 |
| Research Productivity | 348 | 109 | 48 | 322 | 836 |
| Output Quality | 391 | 120 | 44 | 40 | 595 |
| Firm Productivity | 385 | 46 | 85 | 17 | 539 |
| Decision Quality | 275 | 143 | 62 | 34 | 521 |
| AI Safety & Ethics | 183 | 241 | 59 | 30 | 517 |
| Market Structure | 152 | 154 | 109 | 20 | 440 |
| Task Allocation | 158 | 50 | 56 | 26 | 295 |
| Innovation Output | 178 | 23 | 38 | 17 | 257 |
| Skill Acquisition | 137 | 52 | 50 | 13 | 252 |
| Fiscal & Macroeconomic | 120 | 64 | 38 | 23 | 252 |
| Employment Level | 93 | 46 | 96 | 12 | 249 |
| Firm Revenue | 130 | 43 | 26 | 3 | 202 |
| Consumer Welfare | 99 | 51 | 40 | 11 | 201 |
| Inequality Measures | 36 | 105 | 40 | 6 | 187 |
| Task Completion Time | 134 | 18 | 6 | 5 | 163 |
| Worker Satisfaction | 79 | 54 | 16 | 11 | 160 |
| Error Rate | 64 | 78 | 8 | 1 | 151 |
| Regulatory Compliance | 69 | 64 | 14 | 3 | 150 |
| Training Effectiveness | 81 | 15 | 13 | 18 | 129 |
| Wages & Compensation | 70 | 25 | 22 | 6 | 123 |
| Team Performance | 74 | 16 | 21 | 9 | 121 |
| Automation Exposure | 41 | 48 | 19 | 9 | 120 |
| Job Displacement | 11 | 71 | 16 | 1 | 99 |
| Developer Productivity | 71 | 14 | 9 | 3 | 98 |
| Hiring & Recruitment | 49 | 7 | 8 | 3 | 67 |
| Social Protection | 26 | 14 | 8 | 2 | 50 |
| Creative Output | 26 | 14 | 6 | 2 | 49 |
| Skill Obsolescence | 5 | 37 | 5 | 1 | 48 |
| Labor Share of Income | 12 | 13 | 12 | — | 37 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
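As a reading aid, each row of the matrix can be turned into direction shares. The sketch below uses three rows copied from the table; the function name `direction_shares` is illustrative. It assumes shares are taken over the sum of the four listed directions, not the Total column, because the two do not always agree (the "Other" row's four counts sum to 1581 against a Total of 1615).

```python
# Rows copied from the matrix above: (positive, negative, mixed, null).
ROWS = {
    "Job Displacement": (11, 71, 16, 1),
    "Task Completion Time": (134, 18, 6, 5),
    "Market Structure": (152, 154, 109, 20),
}

def direction_shares(counts):
    """Return each direction's share of the four listed counts."""
    labels = ("positive", "negative", "mixed", "null")
    total = sum(counts)
    return {label: n / total for label, n in zip(labels, counts)}

for outcome, counts in ROWS.items():
    shares = direction_shares(counts)
    print(outcome, {k: round(v, 2) for k, v in shares.items()})
```

Read this way, Job Displacement claims are mostly negative (71 of 99), while Task Completion Time claims are mostly positive (134 of 163).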
Adoption
Operationalizing hardware-based governance must address transition realities including legacy hardware, attestation at scale, and protection of civil liberties.
Policy implementation analysis in the paper identifying practical challenges to deploying hardware-layer controls (conceptual/operational analysis; no empirical trial data provided).
For LLM agents, memory management critically impacts efficiency, quality, and security.
Statement in paper framing and motivation; supported conceptually by literature linking memory design to system properties (no specific experimental details provided in abstract).
Coding patterns are bimodal: in 41% of sessions, agents author virtually all committed code ("vibe coding"), while in 23%, humans write all code themselves.
Empirical analysis of authorship attribution across the 6,000 sessions in the SWE-chat dataset; percentages derived from session-level classification.
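A session-level classification of this kind might look like the sketch below. It is an assumption-laden illustration, not the dataset's method: the `near_all` cutoff for "virtually all", the label names, and the sample sessions are all hypothetical.

```python
from collections import Counter

def classify_session(agent_lines, human_lines, near_all=0.95):
    """Label one session by who authored its committed lines.

    `near_all` is a hypothetical cutoff for "virtually all"; the source
    does not state the threshold it used.
    """
    total = agent_lines + human_lines
    if total == 0:
        return "no_commits"
    agent_share = agent_lines / total
    if agent_share >= near_all:
        return "agent_authored"
    if agent_share == 0:
        return "human_authored"
    return "mixed"

# Made-up sessions as (agent_lines, human_lines); not SWE-chat data.
sessions = [(120, 0), (0, 80), (40, 60), (200, 3), (0, 15)]
labels = Counter(classify_session(a, h) for a, h in sessions)
for label, count in labels.items():
    print(label, count / len(sessions))
```

A bimodal pattern shows up as most of the mass landing in the two extreme labels rather than in `mixed`.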
A determinism study of 10 replays per case at temperature zero shows both architectures inherit residual API-level nondeterminism, but DPM exposes one nondeterministic call while summarization exposes N compounding calls.
Determinism experiment with 10 replays per case at temperature zero; qualitative/quantitative observation about number of nondeterministic LLM calls exposed by each architecture.
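The replay idea can be sketched as follows. This is a minimal illustration with hypothetical names (`distinct_outputs`, `chain_reproducibility`, `flaky_model`), not the paper's harness. The compounding formula assumes that each call reproduces independently with the same probability, which the source does not state.

```python
import itertools

def distinct_outputs(run, case, replays=10):
    """Replay `run(case)` and return the number of distinct outputs seen."""
    return len({run(case) for _ in range(replays)})

def chain_reproducibility(p_single, n_calls):
    """Chance that all n calls reproduce, assuming independent calls
    that each reproduce with probability p_single."""
    return p_single ** n_calls

# Stand-in for a model call that occasionally returns a different answer.
_variants = itertools.cycle(["answer A", "answer A", "answer B"])

def flaky_model(case):
    return next(_variants)

print(distinct_outputs(flaky_model, "case-1"))   # more than 1 means nondeterminism
print(chain_reproducibility(0.99, 1))            # one exposed call
print(chain_reproducibility(0.99, 8))            # eight compounding calls
```

Under that independence assumption, an architecture that exposes one nondeterministic call keeps the per-call reproducibility, while a chain of N calls multiplies it N times.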
Advanced prompting methods improve accuracy on inconclusive cases but over-correct, withholding decisions even on clear cases.
Empirical comparison of prompting methods reported in paper: advanced prompts increased accuracy on inconclusive (insufficient-information) cases but led to excessive deferral/withholding on clear cases.
There is significant heterogeneity in methodological rigor across studies.
Authors' thematic observation from quality appraisal/extraction noting wide variation in methods, validation approaches, and reporting standards among the 64 studies.
AI is increasingly being integrated into both existing and newly emerging digital infrastructures, altering their architecture, functional role, and strategic significance as these systems begin to operate as embedded cognitive infrastructures shaping knowledge production, decision-making, and institutional processes.
Conceptual and descriptive claim presented by the paper (theoretical analysis/literature-informed observation). No empirical sample size or quantitative methods reported in the provided text.
Hybrid ML+rules systems achieve partial DES-property fillability.
Result of the paper's analytic comparison across the four architectures identifying relative fillability levels for hybrid ML+rules systems.
Open-source versus closed-source trade-offs (including deployment architectures and competitive differentiation) are a central strategic consideration when selecting an enterprise LLM approach.
Paper's comparative analysis of open-source and closed-source alternatives and discussion of strategic implications; supported by the Bills Converter design rationale.
AI is becoming a geopolitical tool that defines trade, finance, supply chains, surveillance abilities, and diplomatic bargaining power.
Conceptual/qualitative synthesis in the paper's argument; no empirical methods or sample size reported in the abstract.
The proposed safety-filter outperforms a standalone deep reinforcement learning-based controller in energy and cost metrics, with only a slight increase in comfort temperature violations.
Reported experimental comparison between the safety-filter-enhanced controller and a standalone DRL controller in the paper; specific metrics and sample size not provided in the excerpt.
Confirmatory Factor Analysis (CFA) and Structural Equation Modeling (SEM) verified correlations among educational background, gender inclusiveness, digital literacy, and perceived algorithmic fairness.
Paper reports use of CFA and SEM to test relationships among those variables; reliability/fit supported by Composite Reliability (CR), Average Variance Extracted (AVE), and model-fit indicators.
Benefits of technology and data analytics are context-dependent, with emerging markets facing unique regulatory and infrastructural barriers.
Narrative synthesis of included studies noting heterogeneity by context and reports of regulatory/infrastructural constraints in emerging markets.
Cybersecurity has a moderating effect on audit data analytics.
Synthesis statement in the review summarizing included studies that report cybersecurity influences the effectiveness/usability of audit data analytics.
Digitization is reshaping the structures of Resource Dependence Theory (RDT) instead of eliminating it completely (Yordanova & Hristozov, 2025).
Conceptual/theoretical claim supported by citation to Yordanova & Hristozov (2025); presented as an interpretive conclusion about how digitization interacts with organizational dependence structures. No empirical details provided in the excerpt.
Outcomes are shaped not only by benchmark quality but also by competitive pressure, including user switching, routing decisions, and operational constraints.
Argument/assertion in paper framing motivations for Marketplace Evaluation; conceptual reasoning listing mechanisms (user switching, routing, operational constraints); no empirical tests or sample size reported.
Alignment operates as a two-way translation, where models are made 'safe for worlds' while those worlds are reshaped to be 'safe for models.'
Conceptual claim supported by ethnographic examples illustrating reciprocal adaptations between models and social/institutional contexts in Nairobi's credit-scoring ecosystem.
Algorithmic credit scoring is accomplished through the ongoing work of alignment that stabilizes risk under conditions of persistent uncertainty, taking epistemic, modeling, and contextual forms.
The paper's theoretical argument grounded in nine-month ethnographic observations and analysis of how practitioners and institutions engage in alignment work across epistemic, modeling, and contextual dimensions.
Practitioners negotiate model performance via technical and political means.
Observational data from the ethnography showing technical adjustments, benchmarks, and political negotiation (e.g., with regulators or management) to establish acceptable performance.
Practitioners formulate risk through multiple interpretations.
Ethnographic evidence from interviews and observations indicating that risk is characterized differently across actors (technical, legal, business interpretations).
Practitioners construct alternative data using technical and legal workarounds.
Field observations and interviews showing practitioners employing technical methods and legal strategies to create or repurpose alternative data sources for credit scoring.
Algorithmic credit scoring is being transformed by new actors, techniques, and shifting regulations.
Ethnographic fieldwork documenting the entry of new actors, novel technical techniques, and regulatory changes affecting credit scoring in Nairobi's digital lending ecosystem.
Credit scoring is an increasingly central and contested domain of data and AI governance.
Nine-month ethnography of credit scoring practices in Nairobi, Kenya; participant observation and interviews across stakeholders in digital lending.
The local labor market will follow a dual trajectory: low-skill, routine jobs face high automation risk while demand will rise for AI-collaborative, higher-skill roles.
Paper's analytical prediction based on distinguishing current job roles into routine/repetitive vs cognitive/non-routine and projecting likely impacts; no numeric forecasts or sample sizes provided in the excerpt.
Professional and Technical Services, Information, and Finance and Insurance account for approximately 86 percent of the base-case direct contribution.
Sectoral decomposition of base-case direct contribution in the model; paper explicitly reports the three sectors' combined share as ~86%.
The inverted U-shaped pattern between AI knowledge stickiness and technological concentration is more clearly detected in eastern cities and in small and medium-sized cities; in large cities the quadratic term is not statistically significant.
Heterogeneity/subsample regressions by region (east vs. other) and city size categories within the city-year panel (2014–2023); statistical significance of quadratic term differs across subsamples.
Technological complexity moderates the nonlinear (inverted U) association between AI knowledge stickiness and technological concentration by altering its strength and curvature rather than producing a simple, uniform shift in the turning point.
Interaction/heterogeneity analyses in the two-way fixed-effects city-year panel (2014–2023), examining moderating role of a technological complexity measure on the quadratic association.
There is an inverted U-shaped association between AI knowledge stickiness and technological concentration: concentration rises with stickiness up to a turning point and falls thereafter.
City-year panel combining AI patent applications with urban statistics for 2014–2023; two-way fixed-effects regression showing a significant positive linear and negative quadratic term (nonlinear association).
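For a quadratic specification, the turning point follows directly from the two coefficients. The sketch below assumes a model of the form y = b1*x + b2*x**2 plus other terms; the coefficient values are hypothetical and are not estimates from the paper.

```python
def turning_point(b_linear, b_quadratic):
    """Return x* = -b1 / (2 * b2) for y = b1*x + b2*x**2 (+ other terms)."""
    if b_quadratic == 0:
        raise ValueError("no turning point: quadratic coefficient is zero")
    return -b_linear / (2 * b_quadratic)

def is_inverted_u(b_linear, b_quadratic, x_min, x_max):
    """Inverted U needs b2 < 0 and a turning point inside the observed range."""
    if b_quadratic >= 0:
        return False
    return x_min < turning_point(b_linear, b_quadratic) < x_max

# Hypothetical coefficients, not estimates from the paper.
b1, b2 = 0.8, -0.5
print(turning_point(b1, b2))             # 0.8
print(is_inverted_u(b1, b2, 0.0, 2.0))   # True
```

Checking that the turning point lies inside the observed range matters: a negative quadratic term alone is also consistent with a curve that only flattens within the data.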
Subjectivity persisted in AI-powered recruitment decisions; human judgment remained an important factor.
Theme 2 (subjectivity in AI-powered recruitment) from interviews indicating retained human subjectivity and judgment in recruitment processes (n = 22).
Big data analytics (BDA) adoption is a risky strategy with potentially high rewards for start-ups.
Stated as a summary conclusion based on empirical analysis of a large sample of start-ups in Germany comparing adopters and non-adopters across multiple performance measures (survival, costs, sales, employee growth, access to financing).
Bounded agents act as an amplifying but not necessary extension to the foundation-model stack for changing work coordination.
Conceptual argument within the paper distinguishing bounded agents from the core stack; no empirical comparison or measurement reported.
The spatial spillover effects are geographically constrained and vary significantly across regions.
Reported heterogeneity in spatial Durbin model results and discussion of geographic constraint and inter-regional variation (regional heterogeneity analysis).
The effects of generative AI on work and organisations are heterogeneous and context-dependent, shaped by job roles, skill levels, and institutional environments.
Synthesis across the included studies noting variation in outcomes conditional on role, skill, and institutional context.
Overall, AI emerges as a transformative but context-dependent tool for business decision-making in Latin America.
The authors' overall interpretation and synthesis of the 27 reviewed studies highlighting variable outcomes depending on context and readiness.
Although the concurrent paradigm performs worse than the sequential paradigm in terms of immediate task performance, it is more effective in promoting users' emotional trust.
Comparison between concurrent and sequential AI-assisted decision-making paradigms in the RCT (N=120); the authors report that the concurrent paradigm scored lower than the sequential paradigm on immediate task performance but higher on emotional trust.
AI adoption outcomes depend on organizational routines, data arrangements, accountability structures, and public values.
Empirical and theoretical literature review and argument in the article drawing on scholarship in digital government and public-sector technology adoption.
If employment losses are relatively small and productivity gains are realised, AI adoption could boost Exchequer revenues. But if job displacement is sizeable, tax receipts fall while welfare spending rises, resulting in potentially large pressures on the public finances.
Conditional fiscal scenarios simulated in the report combining employment, wage and benefit changes with the public finance implications (tax receipts and welfare spending); reported as scenario-based outcomes.
Ireland’s tax and welfare system absorbs most of the income loss for lower income households, and roughly half of the loss for households at the top of the income distribution.
Microsimulation using SWITCH to model taxes and transfers applied to simulated income changes across income groups; reported as a finding in the report.
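The "absorbed" share can be read as the part of a gross income loss that does not reach disposable income. The sketch below shows that arithmetic only; the function name and the figures are illustrative assumptions, not outputs of the report's microsimulation.

```python
def absorbed_share(gross_loss, disposable_loss):
    """Share of a gross income loss offset by lower taxes and higher transfers."""
    if gross_loss <= 0:
        raise ValueError("gross_loss must be positive")
    return 1 - disposable_loss / gross_loss

# Illustrative figures only; not taken from the report.
print(absorbed_share(gross_loss=1000, disposable_loss=500))  # 0.5
```

A value near 1 corresponds to a household whose loss is mostly absorbed by the system, and 0.5 corresponds to the "roughly half" case described for top-income households.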
India exhibits a distinctive polarisation pattern: a shrinking middle-skill workforce alongside a persistently large low-skill labour segment.
Descriptive analysis of secondary data and official reports from 2020–2024 comparing occupational and skill distributions in India.
Mathematics (SAFI: 73.2) and Programming (71.8) receive the highest automation feasibility scores; Active Listening (42.2) and Reading Comprehension (45.5) receive the lowest.
SAFI benchmark results reported for specific O*NET skills (numerical SAFI scores provided in the paper).
Only a small subset of LLM retailers can consistently achieve capital appreciation, while many hover around the break-even point.
Empirical results from the 20-agent benchmark experiments reported in the paper, contrasting capital appreciation for winners vs break-even for many agents.
Benchmarking on 20 open- and closed-source LLM agents reveals significant performance disparities and a winner-take-most phenomenon.
Empirical evaluation described in the paper using 20 LLM agents (open- and closed-source); results reported show uneven performance distribution.
Tool developers, users, and social scientists conceptualize 'context' differently, and these divergent conceptualizations reveal specific pitfalls inherent in computational approaches to context.
Analytic comparison across stakeholder perspectives derived from interviews and conceptual analysis in the paper (qualitative evidence; sample size unspecified).
AI adoption significantly reshaped task profiles for 73% of respondents, particularly affecting routine data processing, administrative tasks, and scheduling activities.
Survey data and secondary data analysis reported in this study (sample size not stated); self-reported change in task profiles with reported percentage (73%).
There is a robust inverted U-shaped relationship between robotics manufacturing development and urban carbon emissions.
Panel data analysis using 277 Chinese prefecture-level cities from 2008 to 2019; econometric analysis reported in the paper finds an inverted U-shaped association and robustness checks are claimed.
AI adoption across firms is heterogeneous, varying across sectors such as finance, technology, and manufacturing.
Survey of 150 leading Nigerian firms across finance, tech, and manufacturing showing variation in AI integration; supported by qualitative interviews and policy analysis.
The rapid, heterogeneous integration of Artificial Intelligence (AI) technologies is profoundly reshaping the dynamics of work across the Nigerian business sector, generating both significant economic opportunities and acute labor market challenges.
Mixed-methods study combining a quantitative survey of 150 leading Nigerian firms across finance, tech, and manufacturing and qualitative analysis of government policy and workforce interviews.
Both rapid model improvement and benchmark quality issues contributed to underestimating agent capabilities.
Synthesis of results: improved LLM performance plus audit findings showing benchmark errors together explain the prior underestimation; based on the re-evaluation and audit described in the paper.
Models performed well on commonly discussed topics but struggled with specialized health data.
Task-level performance comparison across topics in the elicited population statistics: better accuracy on commonly discussed topics, poorer performance on specialized health data tasks.
In a preliminary experiment, giving models web search access degraded predictions for already-accurate models, while modestly improving predictions for weaker ones.
A preliminary comparative test where some models were given web search access and changes in predictive performance were observed: degradation for already-accurate models and modest improvement for weaker models.