Evidence (4114 claims)
| Topic | Claims |
|---|---|
| Adoption | 8570 |
| Productivity | 7631 |
| Governance | 6869 |
| Human-AI Collaboration | 6491 |
| Org Design | 4175 |
| Innovation | 4114 |
| Labor Markets | 3566 |
| Skills & Training | 2966 |
| Inequality | 2066 |
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 758 | 199 | 100 | 900 | 2007 |
| Governance & Regulation | 826 | 400 | 191 | 122 | 1563 |
| Organizational Efficiency | 777 | 193 | 124 | 84 | 1189 |
| Technology Adoption Rate | 635 | 233 | 124 | 97 | 1098 |
| Research Productivity | 422 | 128 | 57 | 336 | 954 |
| Output Quality | 476 | 179 | 59 | 47 | 761 |
| Decision Quality | 328 | 177 | 81 | 47 | 640 |
| Firm Productivity | 435 | 57 | 88 | 20 | 606 |
| AI Safety & Ethics | 218 | 277 | 65 | 33 | 599 |
| Market Structure | 180 | 170 | 123 | 24 | 502 |
| Task Allocation | 213 | 64 | 72 | 33 | 387 |
| Skill Acquisition | 170 | 61 | 61 | 17 | 309 |
| Innovation Output | 203 | 27 | 43 | 18 | 292 |
| Employment Level | 105 | 54 | 107 | 13 | 281 |
| Fiscal & Macroeconomic | 131 | 69 | 43 | 26 | 276 |
| Consumer Welfare | 117 | 63 | 42 | 11 | 233 |
| Firm Revenue | 153 | 48 | 26 | 3 | 230 |
| Task Completion Time | 173 | 31 | 8 | 12 | 225 |
| Inequality Measures | 44 | 122 | 49 | 6 | 221 |
| Worker Satisfaction | 89 | 65 | 22 | 12 | 188 |
| Error Rate | 69 | 92 | 10 | 2 | 173 |
| Regulatory Compliance | 77 | 69 | 14 | 5 | 165 |
| Automation Exposure | 56 | 56 | 26 | 13 | 154 |
| Training Effectiveness | 94 | 21 | 13 | 19 | 149 |
| Wages & Compensation | 77 | 36 | 25 | 6 | 144 |
| Team Performance | 86 | 17 | 27 | 10 | 141 |
| Developer Productivity | 95 | 17 | 14 | 6 | 133 |
| Job Displacement | 12 | 80 | 20 | 1 | 113 |
| Hiring & Recruitment | 52 | 7 | 8 | 3 | 70 |
| Creative Output | 31 | 18 | 8 | 3 | 61 |
| Skill Obsolescence | 5 | 46 | 6 | 1 | 58 |
| Social Protection | 27 | 16 | 8 | 2 | 53 |
| Labor Share of Income | 17 | 19 | 17 | — | 53 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
Innovation
The study implies policy actions to promote high-quality development based on the finding that innovation and the digital economy now play larger roles in growth.
Authors' discussion/conclusion drawing policy implications from empirical findings (declining capital elasticity, rising TFP and digital economy contribution).
Overall, China's growth model shifted over 2010–2022 from being investment-driven to being innovation-driven.
Synthesis of results: declining capital elasticity, rising TFP contribution, substantial share of digital economy in TFP, and regional patterns reported by the study.
The study's method is novel because it uses both migrant worker monitoring data and digital-economy proxy indicators, giving a more accurate picture of how labor quality and technological progress affect each other.
Author-reported methodological description: extended Cobb–Douglas approach combined with quality-adjusted labor measures derived from migrant worker monitoring data and proxy indicators for the digital economy.
Regional analysis shows that growth in coastal regions has been innovation-driven, with an estimated innovation coefficient of approximately 0.31.
Regional decomposition/estimation reported in the paper's analysis of coastal vs inland regions using the extended production function and digital/labour-quality measures.
The digital economy accounted for 40% of the observed increase in TFP (i.e., made up 40% of the TFP contribution).
Attribution within the growth decomposition from the extended production function, where digital economy indicators are included and their contribution to TFP is estimated.
The contribution rate of total factor productivity (TFP) rose from 18% to 26% between the earlier and later periods.
Decomposition of growth using the extended Cobb–Douglas production function for China over 2010–2022, reporting TFP contribution rates for the two periods.
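A contribution rate of this kind is typically obtained by growth accounting on a Cobb–Douglas production function. The sketch below shows only the basic Solow-residual arithmetic. It does not reproduce the study's extended specification (quality-adjusted labor, digital-economy proxies), and every input value is a hypothetical placeholder rather than a figure from the paper.

```python
def tfp_contribution(g_output, g_capital, g_labor, alpha, beta):
    """Return (TFP growth, TFP share of output growth) via the Solow residual.

    Assumes Y = A * K**alpha * L**beta, so in growth rates
    g_A = g_Y - alpha * g_K - beta * g_L.
    """
    g_tfp = g_output - alpha * g_capital - beta * g_labor
    return g_tfp, g_tfp / g_output


# Placeholder inputs (hypothetical, not taken from the study).
g_tfp, share = tfp_contribution(
    g_output=0.06, g_capital=0.08, g_labor=0.01, alpha=0.5, beta=0.5
)
print(f"TFP growth: {g_tfp:.4f}, contribution rate: {share:.1%}")
```

Under these assumptions, a falling capital elasticity (`alpha`) or slower capital growth raises the residual share, which is the pattern the claim describes.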
The paper proposes design principles for effective, accountable, and adaptive sandboxes to contribute to debates on experimentalism in AI governance.
Stated contribution of the paper (descriptive claim about content; abstract does not list the principles or empirical testing).
Regulatory sandboxes (RSs) have emerged as a potential solution to AI regulatory challenges.
Descriptive observation and normative framing within the paper; contextual reference to the EU AI Act's treatment of sandboxes (no empirical sample reported in the abstract).
PIER is an offline reinforcement learning framework that learns fuel‑efficient, safety‑aware routing policies from physics‑calibrated environments grounded in historical vessel tracking data and ocean reanalysis products, requiring no online simulator.
Methodological description of PIER in the paper: offline RL trained on environments constructed from AIS and reanalysis data; no online simulator used for policy learning (implementation details provided).
Bootstrap 95% confidence interval for PIER mean CO2 savings relative to great-circle routing is [2.9%, 15.7%].
Bootstrap analysis applied to the 2023 AIS validation results (840 episodes per method) producing the stated 95% CI for mean percent savings.
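A bootstrap interval like the one above can be sketched with a percentile bootstrap over per-episode savings. The paper's exact resampling scheme is not stated here, so the function below is an assumption, and the 840 data points are synthetic values generated for illustration, not the AIS validation results.

```python
import random


def bootstrap_mean_ci(values, n_resamples=2000, level=0.95, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    means = []
    for _ in range(n_resamples):
        # Resample n observations with replacement and record the mean.
        resample = [values[rng.randrange(n)] for _ in range(n)]
        means.append(sum(resample) / n)
    means.sort()
    tail = (1.0 - level) / 2.0
    lo = means[int(tail * n_resamples)]
    hi = means[min(n_resamples - 1, int((1.0 - tail) * n_resamples))]
    return lo, hi


# Synthetic per-episode savings in percent (hypothetical, not the paper's data).
data_rng = random.Random(1)
savings = [data_rng.gauss(9.0, 20.0) for _ in range(840)]
ci_low, ci_high = bootstrap_mean_ci(savings)
print(f"95% CI for mean savings: [{ci_low:.1f}%, {ci_high:.1f}%]")
```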
PIER reduces per‑voyage fuel consumption variance by a factor of 3.5 (p < 0.001).
Statistical comparison of per-voyage fuel variance between PIER and baseline routing on 840 episodes per method from 2023 AIS data; significance reported with p < 0.001.
The main results are robust to inclusion of firm, industry, and year fixed effects, DID identification using the 2018 SCD pilot, and multiple robustness checks addressing potential confounders and endogeneity.
Authors report baseline regressions with firm/industry/year fixed effects, DID specifications exploiting the 2018 Supply Chain Innovation and Application Pilot Program as a quasi-natural experiment, and a battery of robustness tests (alternative specifications, controls, and checks).
The positive effect of SCD on green innovation is stronger for substantive green innovation (actual environmentally beneficial R&D and technologies) than for strategic green innovation (symbolic/labeling or reputation‑oriented activities).
Heterogeneous outcome analysis splitting green innovation into 'substantive' (e.g., green patents, technological R&D outputs) versus 'strategic' (signaling/compliance indicators); regression and DID estimates show larger and statistically significant coefficients for substantive measures compared to smaller or weaker effects on strategic measures.
Supply chain digitalization (SCD) significantly increases corporate green innovation among Chinese A-share listed firms (2012–2022).
Panel analysis of Chinese A-share listed firms over 2012–2022 using regression models with firm, industry, and year fixed effects; difference-in-differences (DID) identification exploiting the 2018 Supply Chain Innovation and Application Pilot Program as an exogenous shock to SCD; firm-level controls included; multiple robustness checks reported.
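The identification idea can be illustrated with the simplest 2x2 difference-in-differences estimator. This is a simplified stand-in: the paper reports regressions with firm, industry, and year fixed effects and firm-level controls, which are not reproduced here. The panel below is synthetic, and the treatment assignment, effect size, and noise levels are all hypothetical.

```python
import random
from statistics import mean


def did_estimate(rows):
    """2x2 difference-in-differences from (treated, post, outcome) rows."""
    def cell(treated, post):
        return mean(y for t, p, y in rows if t == treated and p == post)

    # (treated after - treated before) - (control after - control before)
    return (cell(1, 1) - cell(1, 0)) - (cell(0, 1) - cell(0, 0))


# Synthetic balanced firm-year panel (hypothetical data).
rng = random.Random(0)
TRUE_EFFECT = 0.5
rows = []
for firm in range(200):
    treated = firm % 2  # half the firms join the hypothetical pilot
    firm_effect = rng.gauss(0.0, 1.0)
    for year in range(2012, 2023):
        post = int(year >= 2018)
        outcome = (
            firm_effect
            + 0.1 * (year - 2012)            # common time trend
            + TRUE_EFFECT * treated * post   # effect only for treated, post-2018
            + rng.gauss(0.0, 0.2)
        )
        rows.append((treated, post, outcome))

estimate = did_estimate(rows)
print(f"DID estimate: {estimate:.3f} (true effect {TRUE_EFFECT})")
```

The estimator recovers the effect only if treated and control firms would have followed parallel trends absent the pilot, which is the assumption the paper's robustness checks are meant to probe.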
Research priorities include empirically quantifying AI's effects on productivity, wages, inequality, and environmental costs; developing standardized sustainability and governance metrics; and evaluating regulatory impacts on innovation and welfare.
Stated research agenda based on gaps identified in the narrative review; identifies directions for future empirical work rather than presenting new empirical findings.
AI has progressed from symbolic systems to data-driven, generative architectures and large-scale computational infrastructures, becoming a foundational technology across sectors.
Narrative synthesis of historical and technical literature across AI research and innovation studies; qualitative tracing of architectural shifts (symbolic → statistical → deep learning/generative models) and increased deployment across industries. No original empirical measurement or sample size reported in this paper.
Mechanisms linking digital services to export performance include reduced transaction and search costs, platform network and scale effects, data as an input improving service quality and customization, and task‑level specialization changing comparative advantage.
Conceptual/theoretical synthesis drawing on multiple strands of literature and illustrative case studies presented in the review (no new causal identification).
Digital services trade is shifting from traditional cross‑border delivery toward online, platform‑based models, with cross‑border data flows a core input and determinant of competitiveness.
Integrative literature and policy review synthesizing domestic and international studies; theoretical/conceptual synthesis and cited case examples (no new econometric analysis or primary microdata).
An asynchronous sliding-window engine treats the GPU as a sliding compute window and overlaps GPU computation with CPU-side parameter updates and multi-tier I/O to hide data movement and synchronization overheads.
System design and implementation described in the paper: an asynchronous runtime that coordinates GPU kernels, CPU updates, and multi-tier I/O. This is a design/implementation claim rather than a measured outcome; the summary links the design to performance improvements.
Evaluation metrics for the benchmark include task-specific metrics such as win-rate for battling and completion time for speedruns, as well as strategic robustness measures.
Paper's evaluation section lists metrics used: win-rate, completion time, strategic robustness; describes how they are computed and used to compare agents.
Speedrunning Track includes an open-source multi-agent orchestration system and standardized evaluation scenarios for reproducible multi-agent comparisons.
Paper describes and releases an open-source harness for orchestrating LLMs/agents and provides standardized scenarios and evaluation tools meant for reproducibility.
Community interest in the benchmark was validated by a NeurIPS 2025 competition with 100+ teams and published analyses of winning submissions.
Paper reports organization/validation via a NeurIPS 2025 competition, states participation of 100+ teams, and includes documentation/analyses of top submissions.
The project is a living benchmark: the Battling Track has a live leaderboard and the Speedrunning Track uses self-contained evaluation to ensure reproducibility.
Paper/documentation notes a live leaderboard for Battling and provides self-contained evaluation pipelines/orchestration for Speedrunning intended to support reproducible runs.
Baselines include heuristic rule-based agents, reinforcement-learning (RL) agents trained for specialist play, and LLM-based agents/harnesses for generalist approaches.
Paper presents baseline implementations and experiments spanning heuristic, RL, and LLM-based agents and describes training procedures and architectures used for each baseline category.
The benchmark is split into two complementary tracks: a Battling Track (competitive, partial-observability battles) and a Speedrunning Track (long-horizon RPG tasks with a multi-agent orchestration harness).
Paper structure and dataset descriptions specify two tracks, their scopes, and the inclusion of a multi-agent orchestration system for the Speedrunning Track.
The Battling Track dataset contains more than 20 million recorded battle trajectories.
Paper reports a Battling Track dataset of >20M recorded battle trajectories collected from simulated/match play; size reported explicitly in dataset and methods section.
PokeAgent Challenge is a large, realistic multi-agent benchmark built on Pokemon that stresses partial observability, game-theoretic reasoning, and long-horizon planning simultaneously.
Paper describes design and motivation of the benchmark, detailing two tracks (Battling and Speedrunning) intended to capture partial observability, adversarial/game-theoretic interactions, and long-horizon sequential planning; benchmark implementation built on Pokemon simulator and described task specifications.
iDaVIE's modular architecture supports extensibility (planned features include subcube loading, advanced render modes, video scripting, and collaborative VR sessions).
Paper describes modular architecture and lists planned/possible future features; this is a software design claim rather than an empirical result.
Because iDaVIE is open-source and extensible, software licensing costs are low and marginal adoption costs fall over time.
Paper states iDaVIE is open-source and designed for community-driven enhancements; economic claim based on general properties of open-source software rather than empirical cost accounting.
iDaVIE includes interaction features such as selection, cropping/subcube tools, catalogue overlays, and export back to existing pipelines.
Feature list in paper describing selection, cropping, overlays, in-VR metrics and export functionality; demonstrated integration to export edited masks/subcubes.
Streaming and downsampling pipelines implemented as Unity plug-ins make large volumes interactively viewable in VR while preserving needed detail for inspection.
Technical description of custom Unity plug-ins for streaming/downsampling and on-the-fly statistics; tested on HI cubes (telescopes listed) per the paper.
iDaVIE (v1.0) is a working VR software suite that lets astronomers import, render, inspect, and interactively edit very large 3D data cubes in real time.
Described implementation of iDaVIE v1.0 built on Unity/SteamVR with custom plug-ins for parsing/downsampling and real-time rendering; tested on large 3D spectral (HI) cubes from radio telescopes (MeerKAT, ASKAP, APERTIF) as reported in the paper.
HindSight reveals a large, real difference between systems that is missed by LLM-based judging (i.e., HindSight detects the retrieval-augmentation advantage while LLM-judged metrics do not).
Combined empirical results: HindSight shows a 2.5× advantage (p < 0.001) for retrieval augmentation while LLM-as-Judge reports no significant difference (p = 0.584).
Experiments in the paper cover 10 AI/ML research topics and use a 30-month forward evaluation window.
Experimental setup reported in the paper: scope explicitly stated as 10 AI/ML topics and a 30-month forward window after cutoff T.
Generated ideas can be algorithmically compared to future publications and matched items can be assigned scores reflecting downstream impact (citation counts and venue acceptance).
Method section: description of algorithmic matching procedure and scoring rules that use citation counts and venue acceptance as impact proxies.
A retrieval-augmented idea generator produces 2.5× higher-scoring ideas than a vanilla generator according to HindSight (p < 0.001).
Empirical comparison reported in the paper across the specified experiments (10 AI/ML topics, time-split at T, 30-month forward window); statistical test reporting a 2.5× difference with p < 0.001.
HindSight is a time-split, retrospective evaluation that (1) restricts idea generation to pre-cutoff literature (time T), (2) compares generated ideas to papers published in the following 30 months, and (3) scores matches by downstream impact (citation counts and venue acceptance).
Method described in paper: time-split protocol with a temporal cutoff T, a 30-month forward window, algorithmic matching of generated ideas to later publications, and scoring based on downstream impact metrics (citations and venue acceptance).
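The three-step protocol can be sketched as follows. The matching procedure and scoring rule are not specified in this summary, so the token-overlap matcher (`jaccard`), the match threshold, and the citation-plus-venue-bonus score are all hypothetical stand-ins chosen only to make the time-split logic concrete.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Paper:
    title: str
    published: date
    citations: int
    accepted: bool  # accepted at a peer-reviewed venue


def add_months(d, months):
    """Shift a date forward by whole months, clamping the day to 28."""
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, min(d.day, 28))


def jaccard(a, b):
    """Token-overlap similarity; a stand-in for the paper's matcher."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def hindsight_score(idea, papers, cutoff, window_months=30, threshold=0.5):
    """Score an idea by its best-matching paper inside the forward window."""
    end = add_months(cutoff, window_months)
    best = 0.0
    for p in papers:
        in_window = cutoff < p.published <= end
        if in_window and jaccard(idea, p.title) >= threshold:
            # Hypothetical impact proxy: citations plus a venue bonus.
            best = max(best, p.citations + (10 if p.accepted else 0))
    return best


cutoff = date(2023, 1, 1)
papers = [
    Paper("retrieval augmented idea generation for science", date(2024, 3, 1), 40, True),
    Paper("retrieval augmented idea generation for science", date(2022, 1, 1), 500, True),
]
matched = hindsight_score("retrieval augmented idea generation", papers, cutoff)
unmatched = hindsight_score("quantum error correction codes", papers, cutoff)
print(matched, unmatched)
```

Only the paper published after the cutoff and within the window can contribute, so an idea gets no credit for restating work that already existed at time T.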
The paper introduces a Multi-Object Decoder (MOD) that extends SAM 3D to jointly reconstruct multiple objects from a single image, targeting physically plausible, non-penetrating object configurations and realistic contacts.
Method section: MOD is described as an extension of the single-object SAM 3D architecture to jointly decode multiple object shapes and poses from a monocular image; the method explicitly aims to reduce inter-object penetration and model contacts.
Managing captures, traces, and replay sessions from a single unified design database ensures consistency across replay targets and sessions.
Method description emphasizes a single design database coordinating captures and replays across simulation and emulation for the demonstrator system. (Operational claim demonstrated in the implementation; no metrics on error reduction provided.)
The captured traces can be deterministically replayed across different execution targets (software/hardware simulation and hardware emulation), reducing cross-platform setup complexity and discrepancies.
The same captured waveforms/traces were replayed on both simulation and emulation environments for the ODIN demonstrator; cross-target replay was part of the described method. (Demonstrated on the single reported system; no broad cross-toolchain study provided.)
The paper provides concrete, regulation-inspired policy examples (e.g., content prohibition, sensitive data exfiltration) showing how they map into the Policy function.
Worked, illustrative examples included in the paper mapping regulatory constraints to the Policy(agent_id, partial_path, proposed_action, org_state) formalism.
Runtime policy evaluation can intercept, score, and log each proposed action, then allow, modify, or block it and update organizational state, all as part of an agent's execution loop (reference implementation architecture).
Reference implementation design described in the paper (runtime policy evaluator hooks, logging, enforcement actions); architectural reasoning and pseudo-workflows provided; no production deployment data.
Policies can be formalized as deterministic functions p_violation = Policy(agent_id, partial_path, proposed_action, org_state) that return a probability or score of violation for a proposed next action.
Formal definition and mapping in the paper; worked examples showing how regulatory-style constraints map into this function; no large-scale empirical validation.
Effective governance for agentic LLM systems requires treating the execution path as the central object and performing runtime evaluation of proposed next actions given the partial path.
Theoretical argument and formal proposal of runtime policy evaluator that takes (agent_id, partial_path, proposed_action, org_state) and returns a violation probability; reference architecture described; illustrative examples.
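A minimal sketch of this runtime evaluation loop is shown below, using the `Policy(agent_id, partial_path, proposed_action, org_state)` signature from the claims above. Everything else is an assumption made for illustration: the `OrgState` fields, the action strings, the scores, and the blocking threshold are hypothetical and are not taken from the paper's reference implementation.

```python
from dataclasses import dataclass, field


@dataclass
class OrgState:
    # Hypothetical organizational state: sensitive terms plus an audit log.
    blocked_terms: set = field(default_factory=lambda: {"ssn", "password"})
    audit_log: list = field(default_factory=list)


def policy(agent_id, partial_path, proposed_action, org_state):
    """Return a violation score in [0, 1] for the proposed next action."""
    text = proposed_action.lower()
    sends_out = text.startswith("send_external")
    touches_sensitive = any(term in text for term in org_state.blocked_terms)
    read_secrets_earlier = any("read_secrets" in step for step in partial_path)
    if sends_out and touches_sensitive:
        return 1.0
    if sends_out and read_secrets_earlier:
        return 0.7  # risk depends on the path so far, not the action alone
    return 0.0


def evaluate_step(agent_id, partial_path, proposed_action, org_state, block_at=0.5):
    """Score, log, and allow or block one proposed action."""
    p_violation = policy(agent_id, partial_path, proposed_action, org_state)
    decision = "block" if p_violation >= block_at else "allow"
    org_state.audit_log.append((agent_id, proposed_action, p_violation, decision))
    if decision == "allow":
        partial_path.append(proposed_action)
    return decision


state = OrgState()
path = []
first = evaluate_step("agent-1", path, "read_secrets vault/db", state)
second = evaluate_step("agent-1", path, "send_external report.txt", state)
print(first, second, len(state.audit_log))
```

The second action is harmless in isolation and is blocked only because of what the path already contains, which is the reason for treating the execution path as the object of governance.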
The surrogate-driven inverse-design pipeline transfers to physical hardware — designs produced by the CNN+GA pipeline were realized and validated experimentally.
Two fabricated prototypes implemented the optimized pixelated combiners and GaN HEMT Doherty PAs; measured performance metrics correspond to the designs, demonstrating transfer from surrogate-driven design to hardware.
Under a 20 MHz 5G-NR-like waveform (9 dB PAPR) with digital predistortion (DPD), each prototype reached average PAE greater than 51% while meeting ACLR ≤ −60.8 dBc.
Realistic waveform testing described: a 20 MHz 5G‑NR-like signal with 9 dB PAPR was applied to the prototypes, DPD was used, and measurements reported average PAE > 51% and ACLR ≤ −60.8 dBc for each prototype.
Each prototype demonstrated drain efficiency greater than 52% at 9 dB back-off.
Back-off efficiency measurements reported for the fabricated prototypes showing drain efficiency > 52% at 9 dB back-off.
Each prototype produced output power exceeding 44.1 dBm at 2.75 GHz.
Measured output power reported from RF characterization of the two fabricated prototypes; reported value > 44.1 dBm at the test frequency.
Each fabricated prototype achieved peak drain efficiency greater than 74%.
Measured RF characterization reported for the two prototypes showing peak drain efficiency > 74%; measurements conducted on fabricated hardware at 2.75 GHz.
A genetic-algorithm (GA) blackbox optimizer paired with the CNN surrogate can effectively search the discrete multi-port pixel layout space to synthesize output combiners for Doherty amplifiers.
Method description: CNN surrogate embedded in a blackbox Doherty framework and used within a GA to select pixelated combiner layouts; successful designs were produced and taken to fabrication.
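The search loop can be sketched as a genetic algorithm over binary pixel layouts with a surrogate scoring each candidate. The `surrogate_score` function below is a toy stand-in for the trained CNN, which is not reproduced here, and the population size, mutation rate, and layout size are hypothetical choices.

```python
import random


def surrogate_score(layout):
    """Toy stand-in for the CNN surrogate: count pixels matching a fixed pattern."""
    target = [(i * 7) % 3 == 0 for i in range(len(layout))]
    return sum(int(bool(p) == t) for p, t in zip(layout, target))


def genetic_search(n_pixels=64, pop_size=40, generations=60,
                   mutation_rate=0.02, seed=0):
    """Evolve binary pixel layouts toward a higher surrogate score."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_pixels)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the top quarter unchanged (elitism).
        population.sort(key=surrogate_score, reverse=True)
        elite = population[: pop_size // 4]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_pixels)  # single-point crossover
            child = a[:cut] + b[cut:]
            # Flip each pixel with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = elite + children
    return max(population, key=surrogate_score)


best = genetic_search()
print(surrogate_score(best), "of", len(best), "pixels match the toy target")
```

Because each evaluation calls a cheap surrogate rather than an electromagnetic simulation, the loop can afford thousands of candidate layouts; the fabricated prototypes are what check whether the surrogate's ranking holds on hardware.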