Evidence (2354 claims)
Adoption (5267 claims)
Productivity (4560 claims)
Governance (4137 claims)
Human-AI Collaboration (3103 claims)
Labor Markets (2506 claims)
Innovation (2354 claims)
Org Design (2340 claims)
Skills & Training (1945 claims)
Inequality (1322 claims)
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 378 | 106 | 59 | 455 | 1007 |
| Governance & Regulation | 379 | 176 | 116 | 58 | 739 |
| Research Productivity | 240 | 96 | 34 | 294 | 668 |
| Organizational Efficiency | 370 | 82 | 63 | 35 | 553 |
| Technology Adoption Rate | 296 | 118 | 66 | 29 | 513 |
| Firm Productivity | 277 | 34 | 68 | 10 | 394 |
| AI Safety & Ethics | 117 | 177 | 44 | 24 | 364 |
| Output Quality | 244 | 61 | 23 | 26 | 354 |
| Market Structure | 107 | 123 | 85 | 14 | 334 |
| Decision Quality | 168 | 74 | 37 | 19 | 301 |
| Fiscal & Macroeconomic | 75 | 52 | 32 | 21 | 187 |
| Employment Level | 70 | 32 | 74 | 8 | 186 |
| Skill Acquisition | 89 | 32 | 39 | 9 | 169 |
| Firm Revenue | 96 | 34 | 22 | — | 152 |
| Innovation Output | 106 | 12 | 21 | 11 | 151 |
| Consumer Welfare | 70 | 30 | 37 | 7 | 144 |
| Regulatory Compliance | 52 | 61 | 13 | 3 | 129 |
| Inequality Measures | 24 | 68 | 31 | 4 | 127 |
| Task Allocation | 75 | 11 | 29 | 6 | 121 |
| Training Effectiveness | 55 | 12 | 12 | 16 | 96 |
| Error Rate | 42 | 48 | 6 | — | 96 |
| Worker Satisfaction | 45 | 32 | 11 | 6 | 94 |
| Task Completion Time | 78 | 5 | 4 | 2 | 89 |
| Wages & Compensation | 46 | 13 | 19 | 5 | 83 |
| Team Performance | 44 | 9 | 15 | 7 | 76 |
| Hiring & Recruitment | 39 | 4 | 6 | 3 | 52 |
| Automation Exposure | 18 | 17 | 9 | 5 | 50 |
| Job Displacement | 5 | 31 | 12 | — | 48 |
| Social Protection | 21 | 10 | 6 | 2 | 39 |
| Developer Productivity | 29 | 3 | 3 | 1 | 36 |
| Worker Turnover | 10 | 12 | — | 3 | 25 |
| Skill Obsolescence | 3 | 19 | 2 | — | 24 |
| Creative Output | 15 | 5 | 3 | 1 | 24 |
| Labor Share of Income | 10 | 4 | 9 | — | 23 |
Innovation
AI chat appears in the same broad phase of the purchase journey as traditional search and well before order placement.
Sequence/timestamp analysis of user journeys in platform logs showing the relative timing of chat, search, and order placement within journeys.
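The timing comparison can be sketched as a per-journey lead-time calculation. This is a minimal sketch that assumes a hypothetical log schema of `(journey_id, event_type, timestamp)` rows with event types `"chat"`, `"search"`, and `"order"`. The platform's actual log format is not described here.

```python
from collections import defaultdict
from statistics import median


def lead_times(events):
    """Median seconds from the first chat / first search to the first order.

    `events` holds hypothetical (journey_id, event_type, timestamp) rows.
    Journeys without an order are skipped.
    """
    journeys = defaultdict(list)
    for journey_id, event_type, ts in events:
        journeys[journey_id].append((ts, event_type))

    gaps = {"chat": [], "search": []}
    for steps in journeys.values():
        steps.sort()  # chronological order within the journey
        order_times = [ts for ts, kind in steps if kind == "order"]
        if not order_times:
            continue
        first_order = order_times[0]
        for kind in gaps:
            earlier = [ts for ts, k in steps if k == kind and ts < first_order]
            if earlier:
                gaps[kind].append(first_order - earlier[0])
    return {kind: median(values) for kind, values in gaps.items() if values}


events = [
    ("j1", "search", 0), ("j1", "chat", 30), ("j1", "order", 600),
    ("j2", "chat", 10), ("j2", "search", 20), ("j2", "order", 400),
    ("j3", "search", 5),  # no order: ignored
]
print(lead_times(events))  # {'chat': 480.0, 'search': 490.0}
```

If chat and search both show similar lead times well before the order, that pattern matches the claim.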
Adoption of the embedded shopping AI is highest among older consumers, female users, and highly engaged existing users, reversing the younger, male-dominated profile commonly documented for general-purpose AI tools.
Descriptive demographic analysis of adoption rates across users in the Ctrip dataset (user-level adoption comparisons by age, gender, and prior engagement). Sample drawn from the 31 million users in the platform logs.
Grok attracts users primarily for its content policy.
Survey items asking users for reasons they use each platform; reported attribution of content policy as primary reason for Grok (overall N=388).
DeepSeek attracts users primarily through word-of-mouth.
Survey items asking users for reasons they use each platform; reported attribution of word-of-mouth as primary reason for DeepSeek (overall N=388).
Claude attracts users primarily for answer quality.
Survey items asking users for reasons they use each platform; reported attribution of answer quality as primary reason for Claude (overall N=388).
ChatGPT attracts users primarily for its interface.
Survey items asking users for reasons they use each platform; reported attribution of interface as primary reason for ChatGPT (overall N=388).
Over 80% of respondents use two or more platforms (multi-platform usage is common).
Survey self-reports aggregated across respondents (paper reports 'over 80%'); overall sample N=388.
We conducted a cross-platform survey of 388 active AI chat users comparing satisfaction, adoption drivers, use case performance, and qualitative frustrations across seven major platforms: ChatGPT, Claude, Gemini, DeepSeek, Grok, Mistral, and Llama.
Cross-sectional online survey described in the paper; sample size reported as 388 users; seven named platforms explicitly listed.
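Two survey statistics reduce to simple tallies: the share of multi-platform users and each platform's most-cited primary reason. This minimal sketch assumes a hypothetical schema in which each respondent maps every platform they use to one primary reason. The toy responses are illustrative only and do not come from the survey.

```python
from collections import Counter, defaultdict


def summarize(responses):
    """Return (share of multi-platform users, most-cited reason per platform).

    Hypothetical schema: each response is {"platforms": {platform: primary_reason}}.
    """
    multi = sum(1 for r in responses if len(r["platforms"]) >= 2)
    reasons = defaultdict(Counter)
    for r in responses:
        for platform, reason in r["platforms"].items():
            reasons[platform][reason] += 1
    top_reason = {p: counts.most_common(1)[0][0] for p, counts in reasons.items()}
    return multi / len(responses), top_reason


toy = [
    {"platforms": {"ChatGPT": "interface", "Claude": "answer quality"}},
    {"platforms": {"Claude": "answer quality", "Grok": "content policy"}},
    {"platforms": {"ChatGPT": "interface"}},
    {"platforms": {"DeepSeek": "word-of-mouth", "ChatGPT": "price"}},
]
share, top = summarize(toy)
print(share)           # 0.75
print(top["ChatGPT"])  # interface
```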
Robustness tests confirm that the core conclusions about IRs improving urban energy resilience and the identified mechanisms/moderators are highly reliable.
Multiple robustness checks reported by the authors (unspecified in the abstract) applied to the DML estimates on the 280-city panel (2009–2023).
Science expenditure (SE) positively moderates the promoting effect of IRs on urban energy resilience; the interaction term coefficient is significantly positive.
Moderation analysis reported in the paper using interaction terms between IRs and science expenditure in the DML framework on the 280-city panel (2009–2023); reported statistically significant positive interaction coefficient.
Environmental regulation (ER) positively moderates the promoting effect of IRs on urban energy resilience; the interaction term coefficient is significantly positive.
Moderation analysis reported in the paper using interaction terms between IRs and environmental regulation in the DML framework on the 280-city panel (2009–2023); reported statistically significant positive interaction coefficient.
Green technology innovation is a main mediating path through which IRs improve urban energy resilience.
Mediation/transmission mechanism analysis reported in the paper based on the DML approach applied to the 280-city panel (2009–2023).
Industrial structure upgrading is a main mediating path through which IRs improve urban energy resilience.
Mediation/transmission mechanism analysis reported in the paper based on the same DML framework and the 280-city panel (2009–2023).
Industrial robots (IRs) significantly promote the improvement of urban energy resilience (UER).
Empirical analysis using Double Machine Learning (DML) on a panel of 280 Chinese cities at or above the prefecture level from 2009 to 2023; various robustness tests reported.
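A DML estimate in the partially linear model can be sketched as follows. The sketch assumes a single treatment `d` (for example, a city's industrial-robot exposure), an outcome `y` (an energy-resilience index), and controls `X`, all hypothetical names. The nuisance regressions here are plain OLS, used only to keep the example self-contained. The paper's learners, fixed effects, and panel clustering are not reproduced.

```python
import numpy as np


def ols_predict(X_train, y_train, X_test):
    """Linear nuisance learner; a stand-in for the ML learners DML normally uses."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    return A_test @ coef


def dml_plr(y, d, X, n_folds=5, seed=0):
    """Cross-fitted partialling-out estimate of theta in y = theta*d + g(X) + e.

    Returns (theta, robust standard error).
    """
    n = len(y)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), n_folds)
    y_res, d_res = np.empty(n), np.empty(n)
    for test in folds:
        train = np.setdiff1d(np.arange(n), test)
        # Residualize outcome and treatment on X using models fit off-fold.
        y_res[test] = y[test] - ols_predict(X[train], y[train], X[test])
        d_res[test] = d[test] - ols_predict(X[train], d[train], X[test])
    theta = np.sum(d_res * y_res) / np.sum(d_res ** 2)
    psi = (y_res - theta * d_res) * d_res  # orthogonal (Neyman) score
    se = np.sqrt(np.mean(psi ** 2) / np.mean(d_res ** 2) ** 2 / n)
    return theta, se


# Simulated check: true effect 0.5, treatment confounded by X.
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 3))
d = X @ np.array([1.0, -0.5, 0.2]) + rng.normal(size=2000)
y = 0.5 * d + X @ np.array([0.3, 0.3, -1.0]) + rng.normal(size=2000)
theta, se = dml_plr(y, d, X)
print(round(theta, 3), round(se, 3))  # theta should land close to 0.5
```

Cross-fitting means that each observation's residuals come from nuisance models that never saw that observation. This guards against overfitting bias when flexible learners replace OLS.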
The best designs often do not originate from top-ranked ILP candidates, indicating that global optimization exposes improvements missed by sub-kernel search.
Analysis comparing the origins of the best final designs with their ILP ranking, reported across the 12-kernel benchmark set.
Gains are larger on harder benchmarks: streamcluster exceeds 20×, and kmeans reaches approximately 10×.
Per-benchmark empirical results reported for streamcluster and kmeans in the evaluation.
Scaling from 1 to 10 agents yields a mean 8.27× speedup over baseline.
Empirical evaluation across the reported benchmark set comparing performance with 1 agent versus 10 agents; mean speedup stated in the results.
We evaluate the approach on 12 kernels from HLS-Eval and Rodinia-HLS using Claude Code (Opus 4.5/4.6) with AMD Vitis HLS.
Experimental setup described in the paper reporting evaluation on 12 kernels drawn from HLS-Eval and Rodinia-HLS, using Claude Code (Opus 4.5/4.6) and AMD Vitis HLS.
In Stage 2, the pipeline launches N expert agents over the top ILP solutions, each exploring cross-function optimizations such as pragma recombination, loop fusion, and memory restructuring that are not captured by sub-kernel decomposition.
Method section describing Stage 2 which runs multiple expert agents exploring cross-function optimizations on top ILP solutions.
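The Stage 2 fan-out can be sketched as seeding N agents round-robin from the top-ranked ILP solutions and keeping the best result. `explore` is a hypothetical placeholder for one expert agent, and the round-robin seeding policy is an assumption. Threads are used only so that independent agent runs can overlap.

```python
from concurrent.futures import ThreadPoolExecutor


def run_stage2(top_solutions, n_agents, explore):
    """Seed n_agents explorations round-robin from the top ILP solutions.

    `explore(seed_solution, agent_id)` is a hypothetical stand-in for one expert
    agent and must return a (latency, design) pair; the lowest latency wins.
    """
    seeds = [top_solutions[i % len(top_solutions)] for i in range(n_agents)]
    with ThreadPoolExecutor(max_workers=n_agents) as pool:
        results = list(pool.map(explore, seeds, range(n_agents)))
    return min(results, key=lambda result: result[0])


def fake_explore(seed, agent_id):
    # Toy agent: pretends each agent shaves agent_id cycles off its seed design.
    return seed["latency"] - agent_id, {"seed": seed["name"], "agent": agent_id}


top = [{"name": "A", "latency": 100}, {"name": "B", "latency": 90}]
print(run_stage2(top, 3, fake_explore))  # (89, {'seed': 'B', 'agent': 1})
```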
In Stage 1, the pipeline decomposes a design into sub-kernels, independently optimizes each using pragma and code-level transformations, and formulates an Integer Linear Program (ILP) to assemble globally promising configurations under an area constraint.
Method section describing Stage 1 decomposition, per-sub-kernel optimization and ILP assembly under an area constraint.
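The assembly step picks one candidate configuration per sub-kernel so that total area stays within budget. This sketch assumes each candidate is a `(latency, area)` pair and that latency and area add across sub-kernels. Both are simplifications for illustration. It enumerates combinations exhaustively as a stand-in for the ILP. At realistic sizes, an ILP solver would take its place. Returning the top k rather than a single optimum mirrors how several promising configurations are handed on.

```python
from itertools import product


def top_k_assemblies(options, area_budget, k=3):
    """Rank feasible assemblies: one (latency, area) candidate per sub-kernel.

    Exhaustive enumeration stands in for the ILP; total latency and area are
    assumed additive across sub-kernels (a simplification for illustration).
    """
    feasible = []
    for choice in product(*(range(len(cands)) for cands in options)):
        latency = sum(options[i][c][0] for i, c in enumerate(choice))
        area = sum(options[i][c][1] for i, c in enumerate(choice))
        if area <= area_budget:
            feasible.append((latency, area, choice))
    feasible.sort()  # lowest total latency first, ties broken by area
    return feasible[:k]


# Hypothetical candidates: (latency, area) for two sub-kernels.
options = [
    [(100, 10), (40, 30)],
    [(80, 5), (30, 25)],
]
print(top_k_assemblies(options, area_budget=40, k=2))
# [(120, 35, (1, 0)), (130, 35, (0, 1))]
```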
We introduce an agent factory, a two-stage pipeline that constructs and coordinates multiple autonomous optimization agents.
Method description in the paper describing the design and implementation of the two-stage 'agent factory' pipeline.
Deployment validation across 43 classrooms demonstrated an 18× efficiency gain in the assessment workflow.
Field deployment described in the paper: system was validated across 43 classrooms and an efficiency gain of 18× in the assessment workflow is reported.
Interaction2Eval achieves up to 88% agreement with human expert judgments.
Reported evaluation results comparing Interaction2Eval outputs to human expert annotations (rubric-based judgments) on the dataset.
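The exact agreement metric is not given here. One common reading is the share of rubric items on which model and expert scores match, optionally within a tolerance. The following is a minimal sketch of that calculation. Chance-corrected statistics such as Cohen's kappa are stricter alternatives.

```python
def agreement(model_scores, expert_scores, tolerance=0):
    """Share of items where |model - expert| <= tolerance (0 means exact match)."""
    if len(model_scores) != len(expert_scores) or not model_scores:
        raise ValueError("need two non-empty, item-aligned score lists")
    hits = sum(abs(m - e) <= tolerance for m, e in zip(model_scores, expert_scores))
    return hits / len(model_scores)


model = [3, 4, 2, 5, 1]
expert = [3, 3, 2, 5, 3]
print(agreement(model, expert))               # 0.6
print(agreement(model, expert, tolerance=1))  # 0.8
```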
Interaction2Eval, an LLM-based framework, addresses domain-specific challenges (child speech recognition, Mandarin homophone disambiguation, rubric-based reasoning).
Methodological description in the paper: a specialized LLM-based pipeline designed to handle listed domain challenges; presented as the approach used to extract structured quality indicators.
TEPE-TCI-370h is the first large-scale dataset of naturalistic teacher-child interactions in Chinese preschools (370 hours, 105 classrooms) with standardized ECQRS-EC and SSTEW annotations.
Authors' dataset construction and description: 370 hours of recorded interactions from 105 classrooms, annotated with ECQRS-EC and SSTEW rubrics as reported in the paper.
The dataset provides a reproducible and scalable foundation for research on technological diffusion, regional digitalisation, and industry-level transformation, and can be readily extended to future years or adapted to other countries.
Text asserts reproducibility, scalability, and extensibility of the dataset and methods for future years and other countries.
By providing indicators for two benchmark years, the dataset supports the study of how AI adoption evolves across the Spanish business landscape.
Text highlights the availability of indicators for 2023 and 2025 and claims this supports temporal study of adoption evolution.
This multi-dimensional structure enables users to explore territorial patterns, sectoral differences, and size-related disparities in the uptake of AI.
Text claims that the dataset's dimensions make it possible to explore spatial (territorial), sectoral, and size-related patterns in AI uptake.
For each province–sector–size combination, the dataset reports whether firms adopt AI, whether they apply it internally, whether it is embedded in their offerings, and how many firms have valid website content.
Text explicitly lists the reported indicators at the province–sector–size aggregation level (adoption, internal use, embedded in offerings, count of valid website content).
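The province-sector-size indicators can be produced by grouping firm-level flags into cells. This minimal sketch assumes hypothetical field names for each firm's location, sector, size class, website validity, and the three AI flags. The dataset's actual schema is not specified here, nor is whether it reports counts or shares.

```python
from collections import defaultdict

FLAGS = ("adopts_ai", "internal_use", "embedded_in_offering")


def aggregate(firms):
    """Count firms with valid website text and each AI flag per cell.

    Cells are (province, sector, size) keys; field names are hypothetical.
    """
    cells = defaultdict(lambda: {"valid_web": 0, **{flag: 0 for flag in FLAGS}})
    for firm in firms:
        if not firm["valid_web"]:
            continue  # firms without usable website text are not counted
        cell = cells[(firm["province"], firm["sector"], firm["size"])]
        cell["valid_web"] += 1
        for flag in FLAGS:
            cell[flag] += int(firm[flag])
    return dict(cells)
```

Dividing each flag count by `valid_web` would turn the counts into adoption shares per cell.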
The dataset offers a detailed portrait of AI adoption across regions (NUTS 3), industries, and firm size categories.
Text claims multi-dimensional reporting by region (NUTS 3), industry, and firm size categories in the dataset.
The pipeline identifies explicit evidence of AI use both in firms' internal processes and embedded in their products or services.
Text states the structured rubric is used to identify explicit evidence of AI use in internal processes and in products/services.
The paper uses a systematic pipeline based on large language models (LLMs) to segment website text, semantically filter it, and evaluate it with a structured rubric.
Text describes methodological pipeline components (LLM-based segmentation, semantic filtering, structured rubric evaluation).
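The three pipeline stages (segmentation, filtering, rubric evaluation) can be sketched as shown here. This is a simplified stand-in. A keyword filter replaces the semantic filtering step, and `judge` is a hypothetical callable standing in for the LLM rubric evaluation. The term list and chunk size are illustrative assumptions.

```python
import re

# Illustrative term list; " ai " is padded to avoid matching words like "said".
AI_TERMS = ("artificial intelligence", "machine learning", "chatbot", " ai ")


def segment(text, max_chars=500):
    """Greedily pack sentences into chunks of about max_chars characters."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks


def is_relevant(chunk):
    """Keyword stand-in for the semantic filtering step."""
    padded = f" {chunk.lower()} "
    return any(term in padded for term in AI_TERMS)


def rate_firm(text, judge, max_chars=500):
    """Apply the rubric judge to relevant chunks and merge the labels.

    `judge(chunk)` is a hypothetical LLM call returning labels drawn from
    {"internal", "offering"}.
    """
    labels = set()
    for chunk in filter(is_relevant, segment(text, max_chars)):
        labels |= judge(chunk)
    return {
        "adopts_ai": bool(labels),
        "internal_use": "internal" in labels,
        "embedded_in_offering": "offering" in labels,
    }
```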
The dataset comprises 225,628 firm-year observations (112,814 firms × 2 benchmark years).
Text explicitly reports 225,628 firm-year observations derived from the dataset across the two benchmark years.
The paper introduces a nationwide dataset that maps how 112,814 Spanish firms communicate and implement artificial intelligence (AI) on their corporate websites in 2023 and 2025.
Text states dataset coverage and firm count (112,814 firms) and benchmark years (2023 and 2025).
Equilibria of the extended model also show increasing concentration consistent with power-law-like distributions (i.e., winner-take-most or superstar effects).
Theoretical model combining quality heterogeneity and reinforcement dynamics that yields equilibrium distributions with heavy tails; argument and formalization presented in the paper; no empirical testing reported.
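The reinforcement mechanism behind this concentration can be illustrated with a quality-weighted Pólya-urn simulation. In it, each unit of attention goes to a producer with probability proportional to quality times (1 + attention already received). This is a generic illustration, not the paper's model. The lognormal quality draw and all parameters are assumptions.

```python
import random


def simulate(n_producers=100, n_rounds=5000, seed=0):
    """Quality-weighted Polya urn: each unit of attention goes to producer i with
    probability proportional to quality_i * (1 + attention_i)."""
    rng = random.Random(seed)
    quality = [rng.lognormvariate(0.0, 0.5) for _ in range(n_producers)]
    attention = [0] * n_producers
    for _ in range(n_rounds):
        weights = [q * (1 + a) for q, a in zip(quality, attention)]
        winner = rng.choices(range(n_producers), weights=weights)[0]
        attention[winner] += 1
    return attention


def top_share(attention, frac=0.1):
    """Share of all attention captured by the top `frac` of producers."""
    ranked = sorted(attention, reverse=True)
    k = max(1, int(len(ranked) * frac))
    return sum(ranked[:k]) / sum(ranked)


attention = simulate()
print(round(top_share(attention), 2))  # compare with 0.10 under an even split
```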
Even as the number of producers increases and average attention per producer falls, total output expands (production scales elastically).
Same formal theoretical model (analytical result): production scales elastically in the model despite finite attention; no empirical validation provided.
The identified mechanisms, network structure evolution and increased relational embeddedness, contribute to a broader understanding of how digital transformation shapes innovation dynamics across geographical boundaries in a globalized knowledge economy.
Synthesis of empirical network evolution results and mediation/structural analyses from the 2011–2021 dataset of digital transformation indicators and patent collaboration networks among cities and firms.
These results provide empirical evidence from a major emerging economy (China) that can offer insights to inform policies and strategies in other regions undergoing digital transition.
Generalization claim based on empirical findings from the 2011–2021 analysis of A-share listed companies' digital transformation and patent collaboration patterns in China.
When the volume of digital patent applications surpasses a certain threshold, the positive effect of digital transformation on the quality of cross-regional collaborative innovation accelerates (nonlinear threshold effect).
Threshold regression / nonlinear analysis relating counts of digital patent applications to the marginal effect of digital transformation on collaborative innovation quality, using 2011–2021 patent and digitalization data from A-share listed firms.
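A threshold effect of this kind can be sketched as a Hansen-style single-threshold regression. For each candidate threshold γ in the trimmed range of the threshold variable, the model fits separate slopes below and above γ. It then keeps the γ with the smallest sum of squared residuals. Variable roles are hypothetical: `q` stands for digital patent applications, `x` for digital transformation, and `y` for collaborative-innovation quality. The bootstrap test of whether a threshold exists at all is omitted.

```python
import numpy as np


def threshold_regression(y, x, q, trim=0.15):
    """Single-threshold model y = a + b1*x*1(q <= g) + b2*x*1(q > g) + e.

    Grid-searches g over observed values of q inside the trimmed range and
    returns (g, b1, b2) for the split with the smallest sum of squared residuals.
    """
    lo, hi = np.quantile(q, [trim, 1 - trim])
    candidates = np.unique(q)
    best = None
    for g in candidates[(candidates >= lo) & (candidates <= hi)]:
        below = q <= g
        A = np.column_stack([np.ones_like(x), x * below, x * ~below])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        ssr = float(np.sum((y - A @ coef) ** 2))
        if best is None or ssr < best[0]:
            best = (ssr, g, coef[1], coef[2])
    return best[1], best[2], best[3]


rng = np.random.default_rng(0)
q = rng.uniform(0, 10, 1000)   # threshold variable
x = rng.normal(size=1000)      # regressor whose slope may shift
y = 1.0 + np.where(q <= 6.0, 0.5, 2.0) * x + 0.1 * rng.normal(size=1000)
print(threshold_regression(y, x, q))  # threshold near 6, slopes near 0.5 and 2.0
```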
Advancement of digital transformation positively contributes to both the quality and the quantity of cross-regional cooperative innovation.
Empirical econometric analysis (panel regressions) linking measures of corporate/urban digital transformation to indicators of cross-regional cooperative innovation quality and counts, using A-share listed companies' digital transformation indicators and patent collaboration data, 2011–2021.
China’s urban collaborative innovation network demonstrates a notable quadrilateral spatial structure and has evolved toward a multicenter pattern over time.
Spatio-temporal network analysis based on the same 2011–2021 dataset of digital transformation indicators and patent/co-patent links among cities inferred from A-share listed companies' patent data.
The cooperative innovation network exhibits pronounced small-world characteristics.
Network analysis of cross-regional collaborative innovation using digital transformation and patent data from A-share listed companies on the Shanghai and Shenzhen stock exchanges (2011–2021).
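Small-world structure is usually diagnosed by combining high clustering with short average path lengths relative to a comparable random graph. This minimal sketch uses the σ ratio with Erdős-Rényi approximations (C_rand ≈ k/n, L_rand ≈ ln n / ln k). The metric the paper actually used is not stated here. NetworkX's `sigma` function offers a randomization-based alternative.

```python
from collections import deque
from math import log


def avg_clustering(adj):
    """Mean local clustering coefficient; adj maps each node to a set of neighbors.

    Node labels must be mutually comparable (e.g. ints) so each pair is counted once.
    """
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue  # degree < 2 contributes a coefficient of 0
        links = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)


def avg_path_length(adj):
    """Mean BFS shortest-path length over all reachable ordered pairs."""
    dist_sum, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        dist_sum += sum(dist.values())
        pairs += len(dist) - 1
    return dist_sum / pairs


def small_world_sigma(adj):
    """sigma = (C / C_rand) / (L / L_rand) with Erdos-Renyi approximations
    C_rand = k/n and L_rand = ln(n)/ln(k); sigma well above 1 suggests small-world."""
    n = len(adj)
    k = sum(len(nbrs) for nbrs in adj.values()) / n  # mean degree, must exceed 1
    c_rand, l_rand = k / n, log(n) / log(k)
    return (avg_clustering(adj) / c_rand) / (avg_path_length(adj) / l_rand)


# Ring lattice (each node linked to 2 neighbors on each side) plus two shortcuts.
n = 60
adj = {i: {(i + d) % n for d in (-2, -1, 1, 2)} for i in range(n)}
for a, b in [(0, 30), (15, 45)]:
    adj[a].add(b)
    adj[b].add(a)
print(round(avg_clustering(adj), 3), round(small_world_sigma(adj), 2))
```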
If you can prove the value and the effort behind API token spending (agent memory), you can resell it.
Normative/operational claim within the paper's proposal; presented as an implication of verifiable provenance and market layering, with no empirical proof or transactional data.
Enabling timely memory transfer reduces repeated exploration.
Argument in the paper asserting that shared/tradable memory decreases redundant exploration; no experimental or observational data provided.
Together, clawgang and meowtrade transform one-shot API token spending into reusable and tradable assets.
High-level systems argument in the paper; no empirical measurements of reuse or tradability presented.
Meowtrade is a market layer for listing, transferring, and governing certified memory artifacts.
Design proposal described in the paper; no pilot deployment, user adoption metrics, or experimental data provided.
Clawgang binds memory to verifiable computational provenance.
System/design claim describing the proposed mechanism (clawgang) in the paper; no implementation results or empirical validation reported.
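One reading of "binding memory to verifiable provenance" is to hash the memory artifact together with a provenance record and then authenticate the result. The following sketch is hypothetical and does not describe clawgang's actual design. The provenance fields are illustrative. A shared-key HMAC stands in for whatever attestation the system would use. For third-party buyers, public-key signatures or hardware attestation would likely be more appropriate.

```python
import hashlib
import hmac
import json


def _payload(record):
    # Canonical JSON so the same record always serializes to the same bytes.
    return json.dumps(record, sort_keys=True).encode()


def bind(memory, provenance, key):
    """Return a record tying the memory's digest to its provenance, plus a MAC tag."""
    record = {
        "memory_sha256": hashlib.sha256(memory.encode()).hexdigest(),
        "provenance": provenance,
    }
    record["tag"] = hmac.new(key, _payload(record), hashlib.sha256).hexdigest()
    return record


def verify(memory, record, key):
    """Fail if the memory text, any provenance field, or the tag was altered."""
    body = {k: v for k, v in record.items() if k != "tag"}
    if body["memory_sha256"] != hashlib.sha256(memory.encode()).hexdigest():
        return False
    expected = hmac.new(key, _payload(body), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record["tag"])


key = b"attestation-key"  # hypothetical secret held by the execution environment
prov = {"model": "example-model", "tokens_spent": 1200,
        "trace_sha256": hashlib.sha256(b"tool trace").hexdigest()}
rec = bind("route notes", prov, key)
print(verify("route notes", rec, key))   # True
print(verify("route notes!", rec, key))  # False
```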
Agent memory can serve as an economic commodity in the agent economy, if buyers can verify that it is authentic, effort-backed, and produced in a compatible execution context.
Conceptual argument in the paper's proposal; no empirical evaluation, sample size, or experiments reported.
Economic theory can be used to generate structured synthetic data that improves foundation-model predictions when the theory implies observable patterns in the data.
General conclusion drawn from the paper's experimental findings: improvement in model predictions after fine-tuning on theory-derived synthetic data.
Fine-tuning on GARP-consistent synthetic data substantially improves prediction relative to zero-shot Chronos-2 at all forecast horizons we study.
Empirical results comparing fine-tuned Chronos-2 to zero-shot Chronos-2 across multiple forecast horizons on the authors' experimental panel (no numeric metrics or sample sizes given in the excerpt).
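Any utility-maximizing consumer generates GARP-consistent data, and a Cobb-Douglas demand system is a simple choice. This sketch generates such data and checks GARP with the standard revealed-preference test: a Warshall transitive closure followed by a strict-preference check. The generator and the price and income ranges are assumptions. How the paper turns such data into series for Chronos-2 is not shown.

```python
import random


def cobb_douglas_data(n_obs, alphas, seed=0):
    """(prices, bundle) pairs from a Cobb-Douglas consumer with weights summing to 1.

    Demand x_i = alpha_i * income / p_i maximizes utility, so the data satisfy GARP.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n_obs):
        prices = [rng.uniform(0.5, 2.0) for _ in alphas]
        income = rng.uniform(50.0, 150.0)
        data.append((prices, [a * income / p for a, p in zip(alphas, prices)]))
    return data


def cost(prices, bundle):
    return sum(p * x for p, x in zip(prices, bundle))


def satisfies_garp(data, eps=1e-9):
    """Revealed-preference test: transitive closure (Warshall) plus strict check."""
    n = len(data)
    # R[t][s] is True when bundle t is (directly) revealed preferred to bundle s.
    R = [[cost(data[t][0], data[t][1]) >= cost(data[t][0], data[s][1]) - eps
          for s in range(n)] for t in range(n)]
    for k in range(n):
        for i in range(n):
            if R[i][k]:
                for j in range(n):
                    if R[k][j]:
                        R[i][j] = True
    # GARP fails if t is revealed preferred to s while t was strictly cheaper at s.
    return not any(
        R[t][s] and cost(data[s][0], data[s][1]) > cost(data[s][0], data[t][1]) + eps
        for t in range(n) for s in range(n)
    )


print(satisfies_garp(cobb_douglas_data(30, [0.3, 0.7])))                     # True
print(satisfies_garp([([1.0, 2.0], [1.0, 2.0]), ([2.0, 1.0], [2.0, 1.0])]))  # False
```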