Evidence (8066 claims)
- Adoption: 5586 claims
- Productivity: 4857 claims
- Governance: 4381 claims
- Human-AI Collaboration: 3417 claims
- Labor Markets: 2685 claims
- Innovation: 2581 claims
- Org Design: 2499 claims
- Skills & Training: 2031 claims
- Inequality: 1382 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 417 | 113 | 67 | 480 | 1091 |
| Governance & Regulation | 419 | 202 | 124 | 64 | 823 |
| Research Productivity | 261 | 100 | 34 | 303 | 703 |
| Organizational Efficiency | 406 | 96 | 71 | 40 | 616 |
| Technology Adoption Rate | 323 | 128 | 74 | 38 | 568 |
| Firm Productivity | 307 | 38 | 70 | 12 | 432 |
| Output Quality | 260 | 71 | 27 | 29 | 387 |
| AI Safety & Ethics | 118 | 179 | 45 | 24 | 368 |
| Market Structure | 107 | 128 | 85 | 14 | 339 |
| Decision Quality | 177 | 75 | 37 | 19 | 312 |
| Fiscal & Macroeconomic | 89 | 58 | 33 | 22 | 209 |
| Employment Level | 74 | 34 | 78 | 9 | 197 |
| Skill Acquisition | 98 | 36 | 40 | 9 | 183 |
| Innovation Output | 121 | 12 | 24 | 13 | 171 |
| Firm Revenue | 98 | 35 | 24 | — | 157 |
| Consumer Welfare | 73 | 31 | 37 | 7 | 148 |
| Task Allocation | 87 | 16 | 34 | 7 | 144 |
| Inequality Measures | 25 | 76 | 32 | 5 | 138 |
| Regulatory Compliance | 54 | 61 | 13 | 3 | 131 |
| Task Completion Time | 89 | 7 | 4 | 3 | 103 |
| Error Rate | 44 | 51 | 6 | — | 101 |
| Training Effectiveness | 58 | 12 | 12 | 16 | 99 |
| Worker Satisfaction | 47 | 33 | 11 | 7 | 98 |
| Wages & Compensation | 54 | 15 | 20 | 5 | 94 |
| Team Performance | 47 | 12 | 15 | 7 | 82 |
| Automation Exposure | 27 | 26 | 10 | 6 | 72 |
| Job Displacement | 6 | 39 | 13 | — | 58 |
| Hiring & Recruitment | 40 | 4 | 6 | 3 | 53 |
| Developer Productivity | 34 | 4 | 3 | 1 | 42 |
| Social Protection | 22 | 11 | 6 | 2 | 41 |
| Creative Output | 16 | 7 | 5 | 1 | 29 |
| Labor Share of Income | 12 | 6 | 9 | — | 27 |
| Skill Obsolescence | 3 | 20 | 2 | — | 25 |
| Worker Turnover | 10 | 12 | — | 3 | 25 |
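
The matrix can be summarized as the share of positive findings per outcome. The following is a minimal sketch, not part of the source page: the helper names `parse_row` and `positive_share` are hypothetical, and reading a dash cell as zero is an assumption (it is consistent with the Total column in the rows where a dash appears). The sketch divides by the listed Total rather than the sum of the four direction columns, because in several rows the two differ.

```python
def parse_row(row: str) -> dict:
    """Parse one markdown table row; any non-numeric cell (the dash) is read as 0."""
    cells = [c.strip() for c in row.strip().strip("|").split("|")]
    name, *counts = cells
    positive, negative, mixed, null, total = (
        int(c) if c.isdigit() else 0 for c in counts
    )
    return {
        "outcome": name,
        "positive": positive,
        "negative": negative,
        "mixed": mixed,
        "null": null,
        "total": total,
    }


def positive_share(record: dict) -> float:
    """Fraction of an outcome's claims that report a positive finding."""
    return record["positive"] / record["total"]


# Three rows copied from the matrix above; \u2014 is the table's dash cell.
rows = [
    "| Task Completion Time | 89 | 7 | 4 | 3 | 103 |",
    "| Firm Revenue | 98 | 35 | 24 | \u2014 | 157 |",
    "| Job Displacement | 6 | 39 | 13 | \u2014 | 58 |",
]
records = [parse_row(r) for r in rows]

for rec in records:
    print(f"{rec['outcome']}: {positive_share(rec):.1%} positive")
```
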
Regional TFCP shows significant positive spatial autocorrelation.
Spatial analysis (Spatial Durbin Model and spatial statistics) applied to a panel of 30 provincial-level regions; the study reports significant spatial autocorrelation (a positive Moran's I is implied).
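
For readers unfamiliar with the statistic this entry refers to, global Moran's I compares each region's deviation from the mean with the deviations of its neighbours. The sketch below is illustrative only: the function name `morans_i`, the binary contiguity weights, and the four-region toy data are assumptions, not the study's data or code.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and a square spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # total weight
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    denom = sum(d * d for d in dev)
    return (n / s0) * (cross / denom)


# Hypothetical layout: four regions in a row, each adjacent to its neighbours.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

clustered = morans_i([1, 2, 3, 4], chain)      # similar values sit together
alternating = morans_i([1, -1, 1, -1], chain)  # neighbours always differ
print(clustered, alternating)
```

A positive value, as in the clustered case, indicates that neighbouring regions tend to have similar values; a negative value indicates that neighbours tend to differ.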
Across 378 hardware-validated experiments, concise human-expert skills with structured expert knowledge enable near-perfect success rates across platforms.
Reported experimental results: 378 hardware-validated experiments across platforms comparing agent configurations; finding reported that human-expert skills produce near-perfect success rates (no numeric success rate provided in excerpt).
Large language models (LLMs) and agentic systems have shown promise for automated software development.
Statement in paper referencing prior successes of LLMs and agentic systems for automated software development (no empirical data reported in this excerpt).
Trained participants assigned tasks to the agent by defining strategies more often than participants who did not receive teamwork training.
Behavioral measure in experiment (frequency of assigning tasks using defined strategies) comparing trained vs. untrained participants in the KeyWe game with a scripted agent.
Participants who received the training delegated a higher percentage of tasks to the agent than participants who did not receive teamwork training.
Between-subjects comparison in KeyWe testbed with a scripted agent; measured percentage of tasks delegated by participants in trained vs. untrained groups.
A HAT training intervention that took less than 30 minutes was developed to train humans on seven teamwork competencies.
Study description: developed a training intervention under 30 minutes targeting seven teamwork competencies; implemented as part of the experiment.
The largest gains appear when AI is embedded in an orchestrated workflow rather than deployed as an isolated coding assistant.
Central thesis supported by comparisons across five delivery configurations (traditional baseline and V1–V4) in a retrospective longitudinal field study of the Chiron platform applied to three real software modernization programs; authors observe greater portfolio-level improvements when AI is integrated into coordinated workflows.
V3 and V4 add acceptance-criteria validation, repository-native review, and hybrid human-agent execution, simultaneously improving speed, coverage, and issue load.
Observed differences across the five delivery configurations (baseline, V1–V4) in the field study of three modernization programs; authors link feature additions in V3/V4 to measured improvements in stage durations, coverage, and validation issues.
First-release coverage rises from 77.0% to 90.5% across the portfolio as platform versions progress.
Observed first-release coverage measured in the retrospective longitudinal field study of three real modernization programs, reported as percentages across delivery configurations.
Validation-stage issue load falls from 8.03 to 2.09 issues per 100 tasks across the portfolio as platform versions progress.
Observed outcomes from the retrospective field study on three programs; validation-stage issues counted and normalized per 100 tasks across delivery configurations.
Modeled senior-equivalent effort falls from 1080.0 to 139.5 SEE-days under the platform configurations studied.
Modeled senior-equivalent effort computed from the study's staffing scenarios and observed outputs across the three real programs.
Modeled raw effort falls from 1080.0 to 232.5 person-days under the platform configurations studied (baseline to V4, aggregate).
Modeled outcomes computed from observed task volumes and explicit staffing scenarios in the retrospective longitudinal field study covering three real programs.
Portfolio totals move from 36.0 to 9.3 summed project-weeks under baseline staffing assumptions (across the three studied programs and five delivery configurations).
Retrospective longitudinal field study of the Chiron platform applied to three real software modernization programs (COBOL banking migration ~30k LOC, accounting modernization ~400k LOC, .NET/Angular mortgage modernization ~30k LOC); observed and modeled outcomes were aggregated to produce portfolio totals under explicit staffing scenarios.
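
The before and after figures in the entries above can be turned into relative changes with simple arithmetic. This is a minimal sketch using only the numbers quoted in those entries; the helper `pct_change` and the dictionary keys are hypothetical names introduced here.

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# (before, after) pairs as quoted in the entries above.
metrics = {
    "validation issues per 100 tasks": (8.03, 2.09),
    "senior-equivalent effort (SEE-days)": (1080.0, 139.5),
    "raw effort (person-days)": (1080.0, 232.5),
    "summed project-weeks": (36.0, 9.3),
}

changes = {name: round(pct_change(b, a), 1) for name, (b, a) in metrics.items()}
for name, change in changes.items():
    print(f"{name}: {change:+.1f}%")

# Coverage is already a percentage, so its change is stated in percentage points.
coverage_gain_pp = round(90.5 - 77.0, 1)
print(f"first-release coverage: +{coverage_gain_pp} percentage points")
```
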
There is untapped potential for optimizing the interaction of artificial intelligence with the labor market, and AI needs to be adapted to the specifics of national economic models.
Conclusions drawn from the envelope-model results showing heterogeneity across countries and implied gaps/opportunities for policy and adaptation; the paper emphasizes policy implications and the need for AI adaptation to national economic specifics.
Certain countries can optimally transform AI diffusion into positive domestic labor-market outcomes (economic development and realization of human capital potential): the Netherlands, France, Portugal, Italy, and Malta.
Comparative envelope-model analysis across the sample of European Union countries produced a ranking or identification of countries judged able to optimally transform AI diffusion into labor-market and human-capital results; these five countries are named in the paper.
Introducing an 'AI Engineer' occupational category could catalyze population cohesion around the already-formed vocabulary, completing the co-attractor.
Speculative policy suggestion based on the co-attractor framework and empirical observation that vocabulary exists but population cohesion is absent.
Applied to 8.2 million US resumes (2022-2026), the method correctly identifies established occupations.
Empirical application of the method to a dataset of 8.2 million US resumes spanning 2022–2026; claim that results match known/established occupations (implies validation against existing taxonomy or known labels).
The co-attractor concept enables a zero-assumption method for detecting occupational emergence from resume data, requiring no predefined taxonomy or job titles: we test vocabulary cohesion and population cohesion independently, with ablation to test whether the vocabulary is the mechanism binding the population.
Methodological claim describing the approach applied to resume data: independent tests of vocabulary cohesion and population cohesion, plus ablation experiments. Supported by the method's implementation on the resume dataset.
A genuine occupation is a self-reinforcing structure (a bipartite co-attractor) in which a shared professional vocabulary makes practitioners cohesive as a group, and the cohesive group sustains the vocabulary.
Theoretical/conceptual proposal introduced by the authors as the defining mechanism for occupational emergence; motivates the detection method.
Occupations form and evolve faster than classification systems can track.
Argument supported by the paper's analysis approach and motivating observation; asserted as motivation for developing a detection method. No specific numerical test reported in the excerpt beyond the large resume dataset.
The effect is amplified in Japanese, where experiential queries draw 62.1% non-OTA citations compared to 50.0% in English.
Subset analysis by language within the audited sample comparing non-OTA citation shares for experiential queries in Japanese vs English; percentages reported in paper.
Experiential queries draw 55.9% of their citations from non-OTA sources, compared to 30.8% for transactional queries, a gap of 25.1 percentage points (p < 5 × 10^{-20}).
Quantitative comparison of citation-source types in the audited sample (1,357 citations across 156 queries), classifying queries as 'experiential' vs 'transactional' and computing share of citations from non-OTA sources; reported p-value indicates statistical test of difference.
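
The excerpt does not say which statistical test produced the reported p-value. One common choice for comparing two shares is a two-proportion z-test, sketched below as an illustration only. The function name `two_proportion_z` is hypothetical, and the counts in the example are made up for demonstration, because the source does not report how the 1,357 citations split between query types.

```python
import math


def two_proportion_z(x1: int, n1: int, x2: int, n2: int):
    """Two-sided z-test for the difference between two proportions.

    x1/n1 and x2/n2 are successes/trials in each group.
    Returns (z statistic, two-sided p-value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value


# HYPOTHETICAL counts, not taken from the paper: 280 of 500 vs 154 of 500.
z, p = two_proportion_z(280, 500, 154, 500)
print(f"z = {z:.2f}, p = {p:.2e}")
```
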
Because instructional signals are usable only when the learner has acquired the prerequisites needed to parse them, the effective communication channel depends on the learner's current state of knowledge and becomes more informative as learning progresses.
Theoretical consequence derived from the model's prerequisite-structure assumption and sequential teaching formalization (as described in the abstract).
Generative AI has transformed the economics of information production, making explanations, proofs, examples, and analyses available at very low cost.
Statement in paper (intro/abstract) asserting an empirical/observational fact about generative AI; no empirical sample or data reported in the abstract.
These results highlight the importance of trustworthy AI mediation tools in contexts where not only truth, but also trust and confidence matter.
Policy/recommendation based on experimental findings that AI mediation lowers perceived trust and confidence even when accuracy is unchanged.
The study recommends establishing more accessible AI systems for decision-making, improving digital literacy programmes through regulatory support, and creating special resources for communities that lack essential services.
Authors' policy/research recommendations derived from the study's mixed-methods findings.
AI functions as an essential instrument for advancing financial inclusion in Zimbabwe by enhancing banking access, operational efficiency, and the security of banking services.
Synthesis of mixed-methods findings (survey n=293; interviews n=12) indicating improvements in access, efficiency, and security associated with AI use in banks.
Anomaly detection systems had the most significant impact on financial outcomes, explaining 62.3% of the outcome differences produced by AI technologies.
Quantitative analysis reported in the paper (presumably regression/variance decomposition) based on the survey data (n=293) showing anomaly detection explains 62.3% of variance in the measured financial outcome.
Organisations strongly supported AI systems for decision-making and fraud detection.
Survey responses and/or summary statistics from the questionnaire (n=293) indicating organisational support for AI in decision-making and fraud detection.
AI enables loan processing and makes financial products more accessible through three main functions: usability, safety in transactions, and financial literacy training.
Findings reported from the study's mixed-methods analysis (survey n=293 and interviews n=12) describing perceived AI functions in banking.
Successful implementation of automated tax systems requires a governance framework that integrates transparency, accountability, and user support mechanisms.
Normative and policy-oriented conclusions derived from the synthesis of the 36 articles, which highlight governance features associated with better outcomes in studies examined.
Automation has improved taxpayer compliance across diverse contexts.
Synthesis of results from the reviewed literature (36 studies) indicating higher rates of compliance associated with automated systems such as e-filing, automated reporting, and AI risk profiling.
Automation (e-filing platforms, AI-driven risk profiling, real-time reporting systems) has enhanced administrative efficiency in tax administration.
Synthesis of empirical findings across the 36 reviewed studies reporting improvements to administrative processes attributable to automation tools (e.g., faster processing, streamlined workflows).
Reinforcement learning (post-training) on our corpus improves downstream embodied manipulation performance.
Downstream evaluation described in the paper showing improved performance on embodied manipulation tasks after RL post-training on MultihopSpatial-Train.
Reinforcement learning (post-training) on our MultihopSpatial-Train corpus enhances intrinsic VLM spatial reasoning.
Experimental intervention: RL-based post-training on the authors' training corpus followed by evaluation on intrinsic spatial reasoning benchmarks (described in the paper).
We provide MultihopSpatial-Train, a dedicated large-scale training corpus intended to foster spatial intelligence in VLMs.
Dataset/resource contribution described in the paper (existence and intended use of MultihopSpatial-Train).
We propose Acc@50IoU, a complementary metric that simultaneously evaluates reasoning and visual grounding by requiring both answer selection and precise bounding box prediction.
Methodological contribution in the paper defining the Acc@50IoU metric and its intended use to measure combined answer correctness and bounding-box IoU >= 0.5.
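
Based on the description above (a correct answer plus a bounding box with IoU >= 0.5), the metric can be sketched as follows. This is an illustrative reading, not the authors' implementation: the `(x1, y1, x2, y2)` box format, the field names, and the function names `iou` and `acc_at_50_iou` are assumptions.

```python
def iou(box_a, box_b) -> float:
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def acc_at_50_iou(samples, threshold: float = 0.5) -> float:
    """Share of samples with the right answer AND a box with IoU >= threshold."""
    hits = sum(
        1
        for s in samples
        if s["pred_answer"] == s["gold_answer"]
        and iou(s["pred_box"], s["gold_box"]) >= threshold
    )
    return hits / len(samples)


# Toy samples (hypothetical): only the first and last satisfy both conditions.
samples = [
    {"pred_answer": "A", "gold_answer": "A", "pred_box": (0, 0, 2, 2), "gold_box": (0, 0, 2, 2)},
    {"pred_answer": "B", "gold_answer": "B", "pred_box": (0, 0, 2, 2), "gold_box": (1, 0, 3, 2)},
    {"pred_answer": "C", "gold_answer": "D", "pred_box": (0, 0, 2, 2), "gold_box": (0, 0, 2, 2)},
    {"pred_answer": "A", "gold_answer": "A", "pred_box": (0, 0, 4, 4), "gold_box": (0, 0, 4, 3)},
]
score = acc_at_50_iou(samples)
print(score)
```

The second sample has the right answer but a poorly placed box, and the third has a perfect box but the wrong answer; neither counts, which is what distinguishes this metric from plain answer accuracy.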
We introduce MultihopSpatial, a comprehensive benchmark designed for multi-hop and compositional spatial reasoning, featuring 1- to 3-hop complex queries across diverse spatial perspectives.
Dataset/benchmark construction described in the paper (design and scope of MultihopSpatial).
Spatial reasoning is foundational for Vision-Language Models (VLMs), particularly when deployed as Vision-Language-Action (VLA) agents in physical environments.
Conceptual/introductory statement in the paper motivating the work (literature-based argument about VLMs and VLA agents).
An approach is needed focused on emerging and future interdependencies between professionals and generative machine learning, implying extending but also reimagining theoretical perspectives on expertise, work and organizations.
Paper's central argument based on theoretical reasoning and literature synthesis about generative ML characteristics and their implications for professionals; method: conceptual/theoretical development; no empirical sample.
Existing theories need to be extended whilst also responding to the distinctive characteristics of generative machine learning and the implications for how we theorize change.
Argumentative/theoretical claim in the paper based on comparison of features of generative ML with prior digital/algorithmic technologies; method: conceptual analysis and literature engagement; no empirical sample.
We develop an approach using insights from existing literature on digital, algorithmic and artificial intelligence technologies.
Paper's stated contribution: theoretical development based on synthesis of existing literature (digital, algorithmic, AI). Method: conceptual synthesis; no empirical testing or sample reported.
There is a need for an approach to theorizing professional work and professional service firms in the generative machine learning age.
Conceptual argument presented in the paper (literature-based rationale); method is theoretical/literature review and argumentation; no empirical sample reported.
The findings position AI not merely as an operational tool but as a strategic orchestrator of regenerative production systems, offering a clear roadmap for accelerating circular transitions in line with the Sustainable Development Goals.
Conclusions drawn from the mixed-methods review (bibliometric analysis of 196 articles and systematic review of 104 studies) as reported in the abstract.
Artificial intelligence is emerging as a powerful driver of the circular economy (CE), enabling production systems to become more resource-efficient, less waste-intensive and strategically aligned with sustainability goals.
Mixed-methods assessment combining bibliometric network analysis (196 peer-reviewed articles, 2023–2024) and a systematic review of 104 studies, as reported in the abstract.
AI can reduce production scrap by as much as 30% in documented cases.
Systematic review of studies (paper reports a systematic review of 104 studies); the abstract cites documented cases showing up to 30% reduction in production scrap.
AI can increase resource-efficiency metrics by up to 25% in documented cases.
Systematic review of studies (paper reports a systematic review of 104 studies); the abstract states documented cases showing up to 25% increases in resource-efficiency metrics.
Policy must shift from simply promoting technology to proactively shaping the regulatory and infrastructural ecosystems that govern AI deployment to ensure a just transition.
Policy recommendation based on study’s empirical findings about conditionality and heterogeneity of AI effects; prescriptive statement by authors.
AI markedly improves recognition justice.
Dimension-level analysis of the energy justice index showing significant positive effects of AI on recognition justice component.
AI markedly improves procedural justice.
Dimension-level analysis of the multidimensional energy justice index indicating significant positive effects of AI on procedural justice component.