Evidence (11633 claims)
- Adoption: 7395 claims
- Productivity: 6507 claims
- Governance: 5877 claims
- Human-AI Collaboration: 5157 claims
- Innovation: 3492 claims
- Org Design: 3470 claims
- Labor Markets: 3224 claims
- Skills & Training: 2608 claims
- Inequality: 1835 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 609 | 159 | 77 | 736 | 1615 |
| Governance & Regulation | 664 | 329 | 160 | 99 | 1273 |
| Organizational Efficiency | 624 | 143 | 105 | 70 | 949 |
| Technology Adoption Rate | 502 | 176 | 98 | 78 | 861 |
| Research Productivity | 348 | 109 | 48 | 322 | 836 |
| Output Quality | 391 | 120 | 44 | 40 | 595 |
| Firm Productivity | 385 | 46 | 85 | 17 | 539 |
| Decision Quality | 275 | 143 | 62 | 34 | 521 |
| AI Safety & Ethics | 183 | 241 | 59 | 30 | 517 |
| Market Structure | 152 | 154 | 109 | 20 | 440 |
| Task Allocation | 158 | 50 | 56 | 26 | 295 |
| Innovation Output | 178 | 23 | 38 | 17 | 257 |
| Skill Acquisition | 137 | 52 | 50 | 13 | 252 |
| Fiscal & Macroeconomic | 120 | 64 | 38 | 23 | 252 |
| Employment Level | 93 | 46 | 96 | 12 | 249 |
| Firm Revenue | 130 | 43 | 26 | 3 | 202 |
| Consumer Welfare | 99 | 51 | 40 | 11 | 201 |
| Inequality Measures | 36 | 105 | 40 | 6 | 187 |
| Task Completion Time | 134 | 18 | 6 | 5 | 163 |
| Worker Satisfaction | 79 | 54 | 16 | 11 | 160 |
| Error Rate | 64 | 78 | 8 | 1 | 151 |
| Regulatory Compliance | 69 | 64 | 14 | 3 | 150 |
| Training Effectiveness | 81 | 15 | 13 | 18 | 129 |
| Wages & Compensation | 70 | 25 | 22 | 6 | 123 |
| Team Performance | 74 | 16 | 21 | 9 | 121 |
| Automation Exposure | 41 | 48 | 19 | 9 | 120 |
| Job Displacement | 11 | 71 | 16 | 1 | 99 |
| Developer Productivity | 71 | 14 | 9 | 3 | 98 |
| Hiring & Recruitment | 49 | 7 | 8 | 3 | 67 |
| Social Protection | 26 | 14 | 8 | 2 | 50 |
| Creative Output | 26 | 14 | 6 | 2 | 49 |
| Skill Obsolescence | 5 | 37 | 5 | 1 | 48 |
| Labor Share of Income | 12 | 13 | 12 | — | 37 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
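The matrix above can be queried programmatically. A minimal sketch, using rows copied from the table (restricted to outcomes whose direction counts sum exactly to the listed total, since some rows include claims not classified by direction), computes the share of positive findings per outcome; the helper name is illustrative:

```python
# Rows mirror the matrix columns: (positive, negative, mixed, null).
# Only outcomes whose direction counts sum to the listed total are included.
matrix = {
    "Task Completion Time": (134, 18, 6, 5),   # listed total 163
    "Job Displacement": (11, 71, 16, 1),       # listed total 99
    "Error Rate": (64, 78, 8, 1),              # listed total 151
}

def positive_share(row):
    """Fraction of claims with a positive direction among all classified claims."""
    pos, neg, mixed, null = row
    return pos / (pos + neg + mixed + null)

for outcome, row in matrix.items():
    print(f"{outcome}: {positive_share(row):.2f}")
```

On these rows, Task Completion Time skews strongly positive (0.82) while Job Displacement skews negative (0.11), matching the table's qualitative pattern.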
There are important regional differences—especially in developing contexts—that necessitate context-specific approaches to improving women’s participation in AI-enabled work.
Observation reported in the review drawing on geographically diverse studies and policy analyses; the abstract does not quantify differences or report sample sizes for cross-region comparisons.
Social, cultural, and ethical considerations influence women’s engagement in AI-centric workplaces.
Claim made in the review, based on interdisciplinary literature that includes sociocultural analyses and ethical discussions; the abstract does not provide empirical effect estimates or sample sizes.
AI applications—ranging from recruitment algorithms to workplace automation—can either reinforce gender disparities or promote equitable employment outcomes.
Stated in the review based on collated findings from multiple studies and analyses that document both harms (e.g., biased recruitment algorithms) and potential benefits (e.g., tools designed to reduce bias); no single empirical study or pooled effect size provided in the abstract.
Artificial Intelligence (AI) is rapidly transforming workplaces across the globe, offering both novel opportunities and unique challenges for women in technology-driven industries.
Stated in the paper's introduction/abstract as a summary conclusion based on a narrative literature review of peer-reviewed studies, policy analyses, and preprint research; no specific sample size or primary empirical method reported in the abstract.
The study proposes a sectoral risk classification to better understand vulnerability patterns and workforce implications.
Paper reports development/proposal of a sectoral risk classification as a contribution (the classification itself and validation details are not described in the abstract).
The rapid integration of Artificial Intelligence (AI) across industries is fundamentally reshaping occupational structures and redefining employment dynamics.
Stated as an overall conclusion of the paper based on a systematic review of recent literature from major academic databases (details of included studies not provided in the abstract).
These efficiency gains are offset by a growing 'Efficiency-Legitimacy Paradox' (i.e., improvements in efficiency come with worsening legitimacy concerns).
Conceptual synthesis from the systematic review (2018-2026) identifying a recurring trade-off across reviewed studies; specific empirical quantification not provided in abstract.
There is a structural shift from 'street-level' bureaucracies to 'system-level' architectures, which can be defined as the institutional delegation of 'Artificial Discretion' to algorithmic infrastructures.
Synthesis from the PRISMA-guided systematic review of literature (2018-2026) reporting observed changes in administrative architectures; specific studies not enumerated in abstract.
As a General-Purpose Technology (GPT), Artificial Intelligence (AI) is fundamentally reconfiguring state capacity, as well as the mechanics of global economic management.
Systematic review of current research studies (2018-2026) conducted following PRISMA guidelines; synthesis of literature claiming broad institutional and macroeconomic effects. Number of studies not specified in abstract.
Agentic AI differs from traditional algorithmic trading and generative AI through its capacity for goal-oriented autonomy, continuous learning, and multi-agent coordination.
Analytic comparison and synthesis across prior research and technical architectures in the survey; descriptive/definitional rather than empirical testing.
Uncertainty-aware exploration (in algorithms) alters fairness metrics compared to policies that ignore uncertainty.
Results from simulation experiments compare uncertainty-aware exploration policies to baseline policies and report changes in fairness metrics (as described in the abstract and results).
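The mechanism behind this claim can be illustrated with a toy bandit simulation: a UCB-style policy adds an uncertainty bonus that shrinks as a group is sampled more, while a greedy policy ranks groups by point estimates alone and can starve under-observed groups. This is a sketch under assumed dynamics, not the paper's actual experiment; all names, rates, and parameters here are hypothetical:

```python
import math
import random

def ucb_score(mean, n, t, c=1.0):
    # Point estimate plus an uncertainty bonus that shrinks as n grows.
    return mean + c * math.sqrt(math.log(t + 1) / (n + 1))

def run_policy(true_rates, rounds=2000, uncertainty_aware=True, seed=0):
    """Repeatedly pick one group per round; return per-group selection rates."""
    rng = random.Random(seed)
    counts = {g: 0 for g in true_rates}
    successes = {g: 0 for g in true_rates}
    for t in range(rounds):
        def value(g):
            mean = successes[g] / counts[g] if counts[g] else 0.0
            return ucb_score(mean, counts[g], t) if uncertainty_aware else mean
        g = max(true_rates, key=value)
        counts[g] += 1
        if rng.random() < true_rates[g]:
            successes[g] += 1
    total = sum(counts.values())
    return {g: counts[g] / total for g in counts}

rates = {"A": 0.6, "B": 0.5}
print(run_policy(rates, uncertainty_aware=False))  # greedy locks onto one group
print(run_policy(rates, uncertainty_aware=True))   # UCB keeps sampling both
```

The greedy policy never revisits group B once group A's estimate is positive, so a selection-rate fairness metric (e.g., the gap between groups) differs sharply between the two policies, which is the qualitative pattern the claim describes.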
Analysis of more than two decades of M&A deals reveals shifts in acquisition activity and allows mapping of corporate linkages and overlapping investments.
Empirical longitudinal analysis of M&A deals over a period exceeding 20 years; method: mapping corporate linkages from M&A data (sample size/dataset not specified in the excerpt).
The emissions effects of digital trade are conditional rather than uniform, depending on complementary policy (carbon pricing, regulatory stringency), technological (AI-enhanced logistics), and energy (renewables) factors.
Synthesis of findings from fixed-effects regressions with interactions, carbon-pricing threshold analysis, machine-learning threshold detection, and SEM mediation on the monthly panel of 38 OECD economies (2000–2024).
Operationalizing hardware-based governance must address transition realities including legacy hardware, attestation at scale, and protection of civil liberties.
Policy implementation analysis in the paper identifying practical challenges to deploying hardware-layer controls (conceptual/operational analysis; no empirical trial data provided).
For LLM agents, memory management critically impacts efficiency, quality, and security.
Statement in paper framing and motivation; supported conceptually by literature linking memory design to system properties (no specific experimental details provided in abstract).
The experimental findings are consistent with the paper's theoretical predictions.
Comparison reported in the paper between theoretical model predictions and observed outcomes from the controlled AI-agent trading experiments.
Coding patterns are bimodal: in 41% of sessions, agents author virtually all committed code ("vibe coding"), while in 23%, humans write all code themselves.
Empirical analysis of authorship attribution across the 6,000 sessions in the SWE-chat dataset; percentages derived from session-level classification.
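A session-level classification like the one described can be sketched as a simple thresholding rule over authorship shares; the thresholds and function name here are assumptions for illustration, not the paper's actual criteria:

```python
def classify_session(agent_lines, human_lines, hi=0.95, lo=0.05):
    """Label a session by the share of committed lines authored by the agent."""
    total = agent_lines + human_lines
    if total == 0:
        return "no code"
    share = agent_lines / total
    if share >= hi:
        return "vibe coding"      # agent authors virtually all committed code
    if share <= lo:
        return "human-authored"   # humans write all code themselves
    return "mixed"
```

Applying such a rule across all sessions and tallying the labels would yield the bimodal distribution (41% vibe coding, 23% human-authored) the claim reports.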
A determinism study of 10 replays per case at temperature zero shows both architectures inherit residual API-level nondeterminism, but DPM exposes one nondeterministic call while summarization exposes N compounding calls.
Determinism experiment with 10 replays per case at temperature zero; qualitative/quantitative observation about number of nondeterministic LLM calls exposed by each architecture.
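The compounding effect described above follows from basic probability: if each API call independently deviates on replay with some small probability, an architecture exposing N calls diverges far more often than one exposing a single call. The per-call deviation rate below is an assumed illustrative value, not a figure from the paper:

```python
def replay_divergence_prob(p_call, n_calls):
    """Chance that at least one of n_calls independent API calls deviates on
    a replay, assuming each call deviates with probability p_call."""
    return 1.0 - (1.0 - p_call) ** n_calls

# One exposed nondeterministic call (DPM-style) vs. N compounding calls
# (summarization-style), at an assumed 2% per-call deviation rate:
print(replay_divergence_prob(0.02, 1))
print(replay_divergence_prob(0.02, 10))
```

With ten compounding calls the replay-divergence probability rises roughly ninefold, which is why the single-call architecture is easier to audit with a 10-replay determinism study.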
Advanced prompting methods improve accuracy on inconclusive cases but over-correct, withholding decisions even on clear cases.
Empirical comparison of prompting methods reported in paper: advanced prompts increased accuracy on inconclusive (insufficient-information) cases but led to excessive deferral/withholding on clear cases.
Multi-agent workflows and benchmark evaluation reveal current capabilities, limitations, and research frontiers in agentic AI for physical design.
The paper states it analyzes recent experience with multi-agent workflows and benchmark evaluation; the abstract does not provide specific benchmark names, metrics, or sample sizes.
Effective AI policy mixes are contingent on regional resource endowments and development conditions (i.e., variation across configurations indicates contingency on regional context).
Observed variation across the fsQCA-derived configurations; authors interpret differences as reflecting dependence on regional resources and development conditions.
The study was a preregistered experiment across seven leading LLMs and twelve investment scenarios covering legitimate, high-risk, and objectively fraudulent opportunities.
Methodological description in the paper stating preregistration, 7 LLMs, 12 scenarios; combined dataset included 3,360 AI advisory conversations and a 1,201-participant human benchmark.
There is significant heterogeneity in methodological rigor across studies.
Authors' thematic observation from quality appraisal/extraction noting wide variation in methods, validation approaches, and reporting standards among the 64 studies.
AI is increasingly being integrated into both existing and newly emerging digital infrastructures, altering their architecture, functional role, and strategic significance; these systems are beginning to operate as embedded cognitive infrastructures that shape knowledge production, decision-making, and institutional processes.
Conceptual and descriptive claim presented by the paper (theoretical analysis/literature-informed observation). No empirical sample size or quantitative methods reported in the provided text.
Hybrid ML+rules systems achieve partial DES-property fillability.
Result of the paper's analytic comparison across the four architectures identifying relative fillability levels for hybrid ML+rules systems.
Artificial intelligence raises the threshold at which refinement adds value.
Theoretical/analytical statement in the paper describing AI's effect on the marginal value of refinement; no empirical quantification provided in the excerpt.
Open-source versus closed-source trade-offs (including deployment architectures and competitive differentiation) are a central strategic consideration when selecting an enterprise LLM approach.
Paper's comparative analysis of open-source and closed-source alternatives and discussion of strategic implications; supported by the Bills Converter design rationale.
AI is associated with a shift toward younger, relatively less educated workers.
Reported association in the paper's baseline empirical results linking AI presence/pervasiveness to changes in workforce composition (age and education).
AI is becoming a geopolitical tool that defines trade, finance, supply chains, surveillance abilities, and diplomatic bargaining power.
Conceptual/qualitative synthesis in the paper's argument; no empirical methods or sample size reported in the abstract.
Variable importance improvements to zero-shot tabular classification produce mixed results with respect to algorithmic fairness.
Authors report experiments applying variable-importance-based adjustments to zero-shot LLM tabular classification and evaluating resulting algorithmic fairness outcomes; described as producing mixed results. (Sample size not provided in abstract.)
Targeted prompt interventions significantly alter the magnitude of market bubbles (they can amplify or suppress bubble size).
Randomized (or otherwise experimentally manipulated) prompt interventions applied to LLM agents in the simulated open-call auction, with resulting differences in measured bubble magnitude reported.
Analysis of agents' reasoning text through a twenty-mechanism scoring framework shows that targeted prompt interventions causally amplify or suppress specific behavioral mechanisms.
Qualitative and quantitative analysis of agents' chain-of-thought / reasoning text using a 20-mechanism scoring framework; experimental manipulations of prompts reported to change mechanism scores (interpreted causally as interventions on prompts).
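Bubble magnitude in experiments like this is commonly quantified with the relative absolute deviation (RAD) from experimental finance: the mean absolute gap between traded prices and fundamental values, normalized by the mean fundamental value. Whether this paper uses RAD specifically is an assumption; the sample numbers are illustrative:

```python
def relative_absolute_deviation(prices, fundamentals):
    """RAD: mean absolute deviation of price from fundamental value,
    normalized by the mean fundamental value."""
    assert len(prices) == len(fundamentals) and prices
    mean_f = sum(fundamentals) / len(fundamentals)
    return sum(abs(p - f) for p, f in zip(prices, fundamentals)) / (len(prices) * mean_f)

# A price path that tracks fundamentals has RAD 0; an inflated path does not.
baseline = relative_absolute_deviation([100, 100, 100], [100, 100, 100])
bubble = relative_absolute_deviation([110, 130, 150], [100, 100, 100])
```

Comparing such a metric across prompt-intervention arms is one way the reported amplification or suppression of bubble size could be measured.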
Given the results, educators should revisit pair programming as an educational tool in addition to embracing modern AI.
Authors' recommendation in the paper's conclusion based on experimental findings (performance, workload, emotion, retention outcomes).
Both US and Chinese strategies depend on cross-country relationships in AI innovation.
Conceptual assertion motivating the network analysis of international collaborations and citations.
Formal network verification has made substantial progress in proving correctness properties but is typically applied in offline, pre-deployment settings and faces challenges in accommodating continuous changes and validating live production behavior.
Authors' summary of the state of the art in network verification (assertion in paper; no empirical data in abstract).
Overall, the proposed HRL framework improves learning efficiency and scalability, outperforming heuristic baselines while remaining below the perfect-information oracle bound.
Results reported in the paper from simulation experiments comparing the HRL framework to heuristic baselines and the oracle; pairwise differences analyzed (Wilcoxon tests referenced). The paper asserts better performance than heuristics but still worse than the oracle.
The proposed safety-filter outperforms a standalone deep reinforcement learning-based controller in energy and cost metrics, with only a slight increase in comfort temperature violations.
Reported experimental comparison between the safety-filter-enhanced controller and a standalone DRL controller in the paper; specific metrics and sample size not provided in the excerpt.
Results also reveal divergences between the two interaction scenario types.
Abstract statement that divergences vary across different interaction contexts/scenario types.
Results reveal divergences between purely simulated and human study datasets.
Abstract reports that findings diverge between simulation experiments and the human-subjects dataset; comparisons drawn across the two datasets (simulation N=2000, human N=290).
Confirmatory Factor Analysis (CFA) and Structural Equation Modeling (SEM) verified correlations among educational background, gender inclusiveness, digital literacy, and perceived algorithmic fairness.
Paper reports use of CFA and SEM to test relationships among those variables; reliability/fit supported by Composite Reliability (CR), Average Variance Extracted (AVE), and model-fit indicators.
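The Composite Reliability (CR) and Average Variance Extracted (AVE) indicators mentioned above have standard closed-form definitions over standardized factor loadings. A sketch using those standard formulas (the loadings are made up, not the paper's data):

```python
def average_variance_extracted(loadings):
    """AVE: mean squared standardized loading of a construct's indicators."""
    return sum(l * l for l in loadings) / len(loadings)

def composite_reliability(loadings):
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances),
    where each standardized indicator's error variance is 1 - loading^2."""
    s = sum(loadings)
    err = sum(1 - l * l for l in loadings)
    return s * s / (s * s + err)

# Common rules of thumb: AVE > 0.5 and CR > 0.7 suggest adequate convergent
# validity and reliability (illustrative loadings only).
loadings = [0.8, 0.8, 0.8]
print(average_variance_extracted(loadings), composite_reliability(loadings))
```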
Experienced developers maintain control through detailed delegation while novices struggle between over-reliance and cautious avoidance.
Observed behaviors and accounts from the AI-assisted debugging task (10 juniors) and senior participants in ACTA/Delphi and blind review phases (5 + 5 seniors).
AI is not just changing how engineers code—it is reshaping who holds agency across work and professional growth.
Qualitative synthesis of findings across the three-phase study (Delphi with 5 seniors; debugging task with 10 juniors; blind reviews by 5 seniors).
The rapid advancement of artificial intelligence (AI) technologies, particularly generative AI and large language models, has reignited debates about the future of work and the potential for widespread labor market disruption.
Statement in the paper's introduction/abstract citing recent empirical studies, industry reports, and ongoing debates; no original sample or numerical evidence reported in the abstract.
How software developers interact with AI-powered tools, including Large Language Models (LLMs), plays a vital role in how those tools impact them.
Based on qualitative analysis of twenty-two interviews with software developers about using LLMs for software development; asserted as a central finding in the paper's analysis.
Outcomes of AI deployment in labor-market settings depend on complementary organizational practices, workers’ access to skills, and the regulatory environment.
Synthesis-derived moderator/mechanism claim from qualitative analysis of the 19 included studies identifying organizational practices, skill access, and regulation as contextual moderators.
Benefits of technology and data analytics are context-dependent, with emerging markets facing unique regulatory and infrastructural barriers.
Narrative synthesis of included studies noting heterogeneity by context and reports of regulatory/infrastructural constraints in emerging markets.
Cybersecurity has a moderating effect on audit data analytics.
Synthesis statement in the review summarizing included studies that report cybersecurity influences the effectiveness/usability of audit data analytics.
No aggregation mechanism can simultaneously satisfy all desiderata of collective rationality (connection to Arrow's Impossibility Theorem); multi-agent deliberation navigates rather than resolves this constraint.
Theoretical argument connecting empirical multi-agent deliberation results to Arrow's Impossibility Theorem and observations that deliberation trades off competing desiderata rather than achieving all simultaneously.
Alignment systematically shapes negotiation strategies and allocation patterns between agents.
Experimentally comparing negotiation behavior and allocation outcomes across agent pairs where one agent is aligned (via RAG) and the partner is either unaligned or adversarially prompted; patterns of strategy and allocation differences reported.
The design space articulates four configurations—No AI, Hidden AI, Translucent AI, and Visible AI—each trading off among accountability, autonomy, and coordination cost.
Conceptual taxonomy introduced in the paper (design artifact). No empirical evaluation or sample reported in the abstract; tradeoffs are argued theoretically.