Evidence (2340 claims)

Claim counts by category:

- Adoption: 5267 claims
- Productivity: 4560 claims
- Governance: 4137 claims
- Human-AI Collaboration: 3103 claims
- Labor Markets: 2506 claims
- Innovation: 2354 claims
- Org Design: 2340 claims
- Skills & Training: 1945 claims
- Inequality: 1322 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 378 | 106 | 59 | 455 | 1007 |
| Governance & Regulation | 379 | 176 | 116 | 58 | 739 |
| Research Productivity | 240 | 96 | 34 | 294 | 668 |
| Organizational Efficiency | 370 | 82 | 63 | 35 | 553 |
| Technology Adoption Rate | 296 | 118 | 66 | 29 | 513 |
| Firm Productivity | 277 | 34 | 68 | 10 | 394 |
| AI Safety & Ethics | 117 | 177 | 44 | 24 | 364 |
| Output Quality | 244 | 61 | 23 | 26 | 354 |
| Market Structure | 107 | 123 | 85 | 14 | 334 |
| Decision Quality | 168 | 74 | 37 | 19 | 301 |
| Fiscal & Macroeconomic | 75 | 52 | 32 | 21 | 187 |
| Employment Level | 70 | 32 | 74 | 8 | 186 |
| Skill Acquisition | 89 | 32 | 39 | 9 | 169 |
| Firm Revenue | 96 | 34 | 22 | — | 152 |
| Innovation Output | 106 | 12 | 21 | 11 | 151 |
| Consumer Welfare | 70 | 30 | 37 | 7 | 144 |
| Regulatory Compliance | 52 | 61 | 13 | 3 | 129 |
| Inequality Measures | 24 | 68 | 31 | 4 | 127 |
| Task Allocation | 75 | 11 | 29 | 6 | 121 |
| Training Effectiveness | 55 | 12 | 12 | 16 | 96 |
| Error Rate | 42 | 48 | 6 | — | 96 |
| Worker Satisfaction | 45 | 32 | 11 | 6 | 94 |
| Task Completion Time | 78 | 5 | 4 | 2 | 89 |
| Wages & Compensation | 46 | 13 | 19 | 5 | 83 |
| Team Performance | 44 | 9 | 15 | 7 | 76 |
| Hiring & Recruitment | 39 | 4 | 6 | 3 | 52 |
| Automation Exposure | 18 | 17 | 9 | 5 | 50 |
| Job Displacement | 5 | 31 | 12 | — | 48 |
| Social Protection | 21 | 10 | 6 | 2 | 39 |
| Developer Productivity | 29 | 3 | 3 | 1 | 36 |
| Worker Turnover | 10 | 12 | — | 3 | 25 |
| Skill Obsolescence | 3 | 19 | 2 | — | 24 |
| Creative Output | 15 | 5 | 3 | 1 | 24 |
| Labor Share of Income | 10 | 4 | 9 | — | 23 |
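One way to read the matrix is by each outcome's share of positive findings. A minimal sketch in Python, with counts copied from three rows above; "—" cells are treated as zero, and shares are computed from the sum of the four direction columns rather than the table's Total column:

```python
# Share of positive findings for selected rows of the evidence matrix.
# Counts are copied from the table above; "—" cells count as zero, and
# shares use the sum of the four direction columns, not the Total column.
rows = {
    "Firm Productivity":  {"positive": 277, "negative": 34,  "mixed": 68, "null": 10},
    "AI Safety & Ethics": {"positive": 117, "negative": 177, "mixed": 44, "null": 24},
    "Job Displacement":   {"positive": 5,   "negative": 31,  "mixed": 12, "null": 0},
}

def positive_share(counts):
    """Positive claims as a fraction of all direction-coded claims."""
    total = sum(counts.values())
    return counts["positive"] / total if total else 0.0

for outcome, counts in rows.items():
    print(f"{outcome}: {positive_share(counts):.0%} positive")
```

The direction mix varies sharply by outcome: Firm Productivity skews positive, while AI Safety & Ethics and Job Displacement skew negative.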
Org Design
Recommended next steps for validation include controlled pilots, before-after studies on operational metrics, and cross-firm panel analyses to estimate economic impacts and risk reductions.
Authors' explicit recommendations for empirical validation in the Data & Methods and Implications sections.
There is no reported large-scale quantitative evaluation (e.g., productivity gains, cost-benefit metrics, or causal impact estimates) supporting the framework in the paper.
Explicit limitation noted by the authors stating absence of large-scale quantitative evaluation.
The evidence base for the paper is qualitative: a synthesis of industry best practices and lessons from multi-sector enterprise implementations; methods used include conceptual framework development, architecture design, and case-based illustration.
Explicit methodological statement in the Data & Methods section of the paper.
Limitation: the study analyzes national‑level formal policy texts only and does not measure enforcement, implementation outcomes, or public reactions.
Author‑stated limitations in the paper specifying scope restricted to formal policy documents and absence of empirical enforcement/compliance data.
The paper uses qualitative content analysis, coding documents against the four analytical dimensions to generate a comparative typology of coordination approaches.
Method description: manual qualitative coding of the 36 documents into the specified dimensions, producing the typology distinguishing Chinese and U.S. approaches.
The study's empirical basis comprises 36 national‑level policy documents (18 from China; 18 from the United States) focused on scientific data governance.
Author‑reported dataset and sampling description in the Data & Methods section.
The comparative analysis is organized across four dimensions: coordination objectives, institutional actors, governance mechanisms, and stakeholder legitimacy.
Methodological design reported in the paper; documents were coded against these four analytic categories.
The authors recommend empirical approaches for future work including randomized controlled trials in labs, before-after adoption studies, and collection of microdata on instrument usage, model versions, and provenance to measure impacts.
Explicit methodological recommendations in the Measurement and empirical research agenda section; these are proposals rather than executed studies.
There is a need for rigorous evaluation metrics and benchmarks for safety, reproducibility, and empirical studies quantifying productivity or scientific impact of LLM-driven instrument control.
Identified research gaps and recommended empirical research agenda described by the authors; these are recommendations rather than empirical findings.
The evidence presented consists mainly of qualitative arguments drawn from documented advances and discussion of prototypes; no controlled experimental evaluation is presented.
Authors' own description in the Data & Methods section about the nature of evidence supporting their perspective.
This paper is a conceptual perspective/review rather than an original empirical study.
Explicit statement in the Data & Methods section that the contribution is a perspective synthesizing literature and illustrative examples with no controlled experimental evaluation.
Modern microscopes are increasingly software-driven and data-intensive, while existing ML tools for microscopy are task-specific and fragmented.
Synthesis of recent literature on optical microscopes, detectors, and task-specific ML for image analysis referenced in the perspective (descriptive claim; no new empirical data collected).
Empirical validation of the book’s proposals would require complementary case studies, model documentation, and outcome measurements.
Author/reviewer recommendation in the blurb about methodological limitations and next steps; not an empirical finding.
The book is predominantly conceptual and policy-analytic and uses illustrative case vignettes rather than presenting a single empirical study.
Explicit methodological description in the Data & Methods blurb: synthesis of technical ideas, governance requirements, and illustrative vignettes; no empirical sample or experimental protocol described.
The evidence base is qualitative: the study uses conceptual framework synthesis, comparative analysis of multi-sector implementations, and case examples rather than randomized or large-sample empirical evaluation.
Methods and limitations section of the paper explicitly describing the evidence base and methods (qualitative synthesis, pattern extraction, cross-case lessons).
The paper presents a deployment pattern intended to be adapted by sector and regulatory context rather than a one-size-fits-all blueprint.
Explicit statement in the paper and the described pattern design; based on qualitative pattern extraction and prescriptive guidance.
Methodological claim: combining fixed-effects panel estimation, mediation analysis, and panel threshold models is an effective multi-method approach to (a) estimate average effects, (b) unpack causal channels, and (c) detect nonlinear stage-dependent impacts.
The paper's applied methodology: fixed-effects panel regressions, mediation framework, and panel threshold modeling on the 2012–2022 provincial panel.
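The fixed-effects component of this multi-method design can be illustrated with a minimal two-way (unit and year) within estimator on synthetic data; the units, years, effect sizes, and true slope (2.0) below are invented, not the paper's provincial panel:

```python
# Minimal two-way fixed-effects (within) estimator for a single regressor,
# illustrating the demeaning logic behind unit + year fixed effects.
# The panel, effect sizes, and true slope (2.0) are synthetic.
import random
from statistics import mean

random.seed(0)
units = ["A", "B", "C"]
years = [2012, 2013, 2014, 2015]
unit_fe = {"A": 1.0, "B": -0.5, "C": 3.0}
year_fe = {2012: 0.0, 2013: 0.4, 2014: -0.2, 2015: 1.0}

# Balanced panel of (unit, year, x, y) with y = 2*x + unit FE + year FE
# (noise-free so the within estimator recovers the slope exactly).
panel = []
for i in units:
    for t in years:
        x = random.random()
        panel.append((i, t, x, 2.0 * x + unit_fe[i] + year_fe[t]))

def within(col):
    """Two-way demeaning: subtract unit and year means, add back the grand mean."""
    grand = mean(r[col] for r in panel)
    by_unit = {i: mean(r[col] for r in panel if r[0] == i) for i in units}
    by_year = {t: mean(r[col] for r in panel if r[1] == t) for t in years}
    return [r[col] - by_unit[r[0]] - by_year[r[1]] + grand for r in panel]

x_dm, y_dm = within(2), within(3)
beta = sum(a * b for a, b in zip(x_dm, y_dm)) / sum(a * a for a in x_dm)
print(f"within estimate of beta: {beta:.6f}")
```

In a balanced panel, double demeaning absorbs the additive unit and year effects exactly, which is why the slope is recovered without bias here.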
The paper constructs a multidimensional digitalization index composed of digital infrastructure, digital service capacity, and the digital development environment.
Index construction described in data/methods: composite indicator combining measures of connectivity/broadband (infrastructure), e-commerce/digital finance (service capacity), and policy/institutional/human capital indicators (development environment).
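The composite-index construction can be sketched as min-max normalization of each sub-indicator followed by aggregation across the three dimensions; the province values, indicator names, and equal weighting below are illustrative assumptions (the paper's exact indicators and weighting scheme may differ):

```python
# Illustrative composite digitalization index: min-max normalize each
# sub-indicator across provinces, then average the three dimensions.
# Values, names, and equal weights are assumptions for this sketch.
provinces = {
    "P1": {"infrastructure": 0.80, "service_capacity": 0.55, "environment": 0.40},
    "P2": {"infrastructure": 0.30, "service_capacity": 0.70, "environment": 0.60},
    "P3": {"infrastructure": 0.50, "service_capacity": 0.20, "environment": 0.90},
}
dims = ["infrastructure", "service_capacity", "environment"]

def minmax(values):
    """Rescale values to [0, 1] across provinces."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]

names = list(provinces)
normalized = {d: dict(zip(names, minmax([provinces[p][d] for p in names])))
              for d in dims}
index = {p: sum(normalized[d][p] for d in dims) / len(dims) for p in names}
print(index)
```

Real composite indices often use data-driven weights (e.g., entropy weighting) instead of the equal weights shown here.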
The study is observational (panel) and subject to limitations: residual confounding is possible; two-way fixed-effects estimators can be biased with heterogeneous treatment timing or dynamics; external validity beyond China and non-grain crops is not established.
Authors' stated limitations and caveats in the paper regarding identification and generalizability of results from the CLDS 2014–2018 observational panel.
The study uses two-way fixed-effects (household and year) models as the primary identification strategy and employs propensity score matching (PSM) as a robustness check.
Methods section of the paper describing estimation strategy applied to the CLDS 2014–2018 panel of grain-producing households.
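The PSM robustness check can be sketched as 1:1 nearest-neighbor matching on propensity scores. The scores and outcomes below are invented for illustration; in practice the scores would first be estimated from covariates (e.g., by logistic regression):

```python
# Nearest-neighbor propensity score matching (1:1, with replacement), as a
# robustness-check sketch. Scores are given directly here; in practice they
# would be estimated from covariates.
treated  = {"h1": 0.62, "h2": 0.75, "h3": 0.41}   # household -> propensity score
controls = {"c1": 0.60, "c2": 0.44, "c3": 0.78, "c4": 0.30}
outcome  = {"h1": 5.1, "h2": 6.0, "h3": 4.2,
            "c1": 4.8, "c2": 4.0, "c3": 5.5, "c4": 3.9}

def match(treated, controls):
    """Match each treated unit to the control with the closest score."""
    return {t: min(controls, key=lambda c: abs(controls[c] - s))
            for t, s in treated.items()}

pairs = match(treated, controls)
# Average treatment effect on the treated: mean outcome gap within pairs.
att = sum(outcome[t] - outcome[c] for t, c in pairs.items()) / len(pairs)
print(pairs, f"ATT estimate: {att:.3f}")
```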
The Manager-Worker two-agent pipeline (an expensive text-only manager plus a cheaper worker with repo access) can substitute for expensive execution: the expensive model does the reasoning in the manager role while the cheaper model handles execution in the worker role.
System design description plus empirical results on 200 SWE-bench Lite instances showing parity in success rates between a strong-manager/weak-worker pipeline and a strong single agent while using fewer strong-model tokens.
A minimal review-only manager loop adds only 2 percentage points over the baseline, whereas structured exploration and planning by the manager add 11 percentage points, demonstrating that active direction (not mere reviewing) produces most of the benefit.
Ablation-style comparison of pipeline variants on the 200-instance SWE-bench Lite evaluation: review-only manager loop versus manager with structured exploration and planning; reported improvements in percentage points.
A strong manager directing a weak worker achieves a 62% success rate on software-engineering tasks, comparable to the 60% of a strong single agent, while using a fraction of the strong-model tokens.
Empirical evaluation on 200 instances from SWE-bench Lite across five pipeline configurations and model pairings; measured task success rates and token usage for manager-worker pipelines versus single-agent baselines.
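The pipeline described above can be sketched as a plan-execute-review loop in which the manager plans and redirects and the worker executes. The model-call functions below are hypothetical stubs with canned responses, not the paper's actual interfaces:

```python
# Sketch of a manager-worker pipeline: an expensive "manager" model plans
# and reviews; a cheaper "worker" model executes against the repository.
# call_strong_model / call_cheap_model are hypothetical stubs.
def call_strong_model(prompt: str) -> str:           # hypothetical stub
    if "plan" in prompt:
        return "1) locate failing test 2) patch function 3) rerun tests"
    return "ACCEPT"                                   # manager review verdict

def call_cheap_model(prompt: str) -> str:             # hypothetical stub
    return "patch applied: fixed off-by-one in parser"

def run_pipeline(task: str, max_rounds: int = 3) -> str:
    # Structured exploration and planning by the manager (the step the
    # ablation credits with most of the gain), then execute-review rounds.
    plan = call_strong_model(f"Explore the repo and plan: {task}")
    result = ""
    for _ in range(max_rounds):
        result = call_cheap_model(f"Execute: {plan}\nTask: {task}")
        verdict = call_strong_model(f"Review this result: {result}")
        if verdict == "ACCEPT":
            break
        plan = verdict                                # manager redirects worker
    return result

print(run_pipeline("fix the failing SWE-bench instance"))
```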
Overall, the HCT is a robust, accurate, and transparent alternative to the AI-as-advisor approach, offering a simple mechanism to tap into the wisdom of hybrid crowds.
Overall conclusion drawn from the empirical comparisons across datasets and analyses described in the paper (summary statement in abstract).
Using signal detection theory, the paper finds that the HCT outperforms the AI-as-advisor approach because people cannot discriminate well enough between correct and incorrect AI advice.
Analysis in the paper applying signal detection theory to the empirical results (as stated in abstract).
The HCT also performed better in almost all cases in which the AI offered an explanation of its judgment.
Empirical results on the subset of four datasets with AI explanations (abstract reports HCT performed better in 'almost all' of these cases).
The HCT outperformed the AI-as-advisor approach in all datasets.
Empirical comparisons reported across the 10 datasets (statement in abstract that HCT 'outperformed' in all datasets). Specific performance metrics not provided in abstract.
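The signal detection analysis mentioned above reduces to a sensitivity index d': how well people separate correct from incorrect AI advice. A minimal sketch, where a "hit" is following correct advice and a "false alarm" is following incorrect advice; the rates below are illustrative, not the paper's data:

```python
# Signal-detection sketch: d' = z(hit rate) - z(false-alarm rate),
# measuring how well people discriminate correct from incorrect AI advice.
# The rates below are illustrative, not values from the paper.
from statistics import NormalDist

def d_prime(hit_rate: float, fa_rate: float) -> float:
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)

# Near-chance discrimination, matching the paper's qualitative finding
# that people cannot discriminate well between correct and incorrect advice:
print(f"d' = {d_prime(0.55, 0.45):.3f}")
```

A d' near zero means advice-taking is essentially undiscriminating, which is why aggregating the hybrid crowd outperforms relying on individual advice-taking.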
The results (conceptual/model results) support corporate GenAI policies, leadership development programs, and HR assessment of leader readiness for GenAI-enabled delegation and communication.
Practical implications and recommendations section arguing policy and HR applications based on the conceptual model.
The article introduces an EI-driven trust-calibration framework as an explanatory mechanism showing when generative AI improves leadership effectiveness and when it amplifies managerial errors.
Novel theoretical framework developed in the paper synthesizing EI, trust calibration, and psychological safety to explain boundary conditions of AI in leadership.
The paper provides an operationalization toolkit including measures: GenAI use intensity; delegation quality indices (clarity, boundaries, success criteria); communication quality indices (empathy, tone, transparency); psychological safety markers; and behavioral trust-calibration measures.
Operationalization section in the paper listing suggested indices and markers for empirical measurement.
As a follow-up validation path, the paper proposes a two-wave time-lag design and 180° assessment (leader + subordinates) to reduce common-method bias.
Methodological proposal in the paper describing longitudinal and multi-rater validation approaches.
The paper proposes a 'Package B' rapid empirical design: a randomized online experiment manipulating access to generative AI in core managerial tasks (decision, delegation, team communication), combined with EI measurement and trust-calibration indicators.
Methodology section proposing the rapid randomized online experiment design as the primary empirical test.
Emotional intelligence strengthens the positive impact of generative AI on managerial outcomes when trust is properly calibrated and psychological safety is maintained.
Conceptual model and integrative argument combining EI, trust-calibration, and psychological safety; supported by proposed empirical test design.
The paper conceptualizes human–AI leadership as an integrated managerial competence.
Conceptual modeling presented in the paper integrating EI theory, psychological safety, and trust calibration (theoretical synthesis).
Large language model (LLM) use can improve observable output and short-term task performance.
Paper synthesizes empirical findings from human–AI interaction studies, learning-research experiments, and model-evaluation work indicating improved produced outputs and short-term task performance when humans use LLMs; no single pooled sample size or unified effect estimate is reported in the paper.
These empirical insights provide actionable guidelines advocating dynamically routed architectures that adapt their collaborative structures to real-time task complexity.
Authors' recommendation derived from reported empirical findings comparing architectures under varying time budgets and task complexities (prescriptive claim based on study results).
Given extended compute budgets, the agent team topology achieves the deep theoretical alignment necessary for complex architectural refactoring.
Empirical benchmarks run with longer/extended computational budgets showing agent teams perform better on complex architectural refactoring tasks (qualitative claim; no numeric effect sizes or sample counts provided in the abstract).
The subagent mode functions as a highly resilient, high-throughput search engine optimal for broad, shallow optimizations under strict time constraints.
Benchmark comparisons in the execution-based testbed under strictly fixed computational time budgets showing subagent architecture excels in throughput/resilience for broad, shallow optimization tasks (qualitative claim in paper; no numeric effect sizes provided).
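The "dynamically routed architectures" recommendation can be sketched as a router that picks a collaboration topology from task complexity and the available time budget; all thresholds and labels below are illustrative assumptions, not values from the paper:

```python
# Sketch of dynamic routing between collaborative architectures:
# broad/shallow work under tight budgets goes to parallel subagents,
# deep refactoring with slack goes to an agent team. Thresholds are
# illustrative assumptions.
def route(complexity: float, time_budget_s: float) -> str:
    if time_budget_s < 300 or complexity < 0.3:
        return "subagent"       # resilient, high-throughput shallow search
    if complexity > 0.7 and time_budget_s > 1800:
        return "agent-team"     # deep alignment for architectural refactoring
    return "single-agent"       # default middle ground

print(route(0.9, 3600), route(0.2, 120), route(0.5, 600))
```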
Task-level analyses show that activities expanded in AI-enabled projects—particularly ideation and experimentation—are increasingly compatible with large language model capabilities, suggesting potential for future productivity gains as these technologies mature.
Task-level classification mapping tasks described in proposals to LLM-relevant capabilities using LLM-based classification; finding that tasks expanded in AI-enabled projects cluster on ideation and experimentation, which align with current LLM strengths.
AI-enabled projects undertake a broader set of tasks.
Task-level analysis of proposal descriptions (task inventories) classifying tasks via keyword extraction and LLMs, showing AI-enabled proposals list a wider variety of activities than non-AI proposals.
AI-enabled projects involve larger teams.
Comparison of team structure in proposals (team size) between AI-enabled and non-AI projects using the same comprehensive proposal dataset and LLM-based classification of AI presence.
AI-enabled projects reallocate resources toward human capital (i.e., shift budget allocations toward labor / human capital).
Analysis of detailed budget allocations in the proposal dataset, comparing projects identified as AI-enabled versus non-AI projects using keyword extraction and LLM classification to identify AI presence and role.
In the short run, AI adoption is associated with modest improvements in scientific outcomes concentrated in the upper tail.
Observational analysis linking identified AI presence in a comprehensive dataset of research proposals (funded and unfunded) to subsequent publication outcomes; AI presence identified via keyword extraction combined with large language model (LLM) classification; publication outcomes measured after proposal submission.
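The AI-presence identification described above (keyword extraction combined with LLM classification) can be sketched as a two-stage filter; the keyword list and the LLM stub below are illustrative assumptions, not the study's actual pipeline:

```python
# Two-stage tagging sketch: a keyword screen over proposal text, then an
# LLM classification pass (stubbed here). Keywords and stub logic are
# illustrative assumptions.
AI_KEYWORDS = {"machine learning", "deep learning", "neural network",
               "artificial intelligence", "large language model"}

def keyword_screen(text: str) -> bool:
    t = text.lower()
    return any(k in t for k in AI_KEYWORDS)

def llm_classify(text: str) -> bool:                 # hypothetical stub
    """Stand-in for an LLM call confirming AI is used as a research tool."""
    return "predict" in text.lower() or "model" in text.lower()

def is_ai_enabled(proposal_text: str) -> bool:
    # Cheap screen first, expensive classification second.
    return keyword_screen(proposal_text) and llm_classify(proposal_text)

print(is_ai_enabled("We train a neural network to predict crop yields."))
```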
Education and workforce development should shift focus from rote knowledge accumulation to cultivating skills in human-AI collaboration, creative problem-solving, and the design of novel economic domains.
Normative policy recommendation derived from the paper's framework and analysis of anticipated labor market changes (no empirical evaluation or trial data reported in the abstract).
Human-AI co-evolution will significantly increase individual productivity and open new frontiers of economic activity.
Projected outcome based on combined analysis of AI capabilities, historical patterns, and platform growth; the abstract does not report empirical measurement or sample sizes for this projection.
AI-driven productivity augmentation dramatically lowers the barriers to creating economic value, enabling the decentralized generation of employment.
Argument supported by paper's analysis of contemporary labor market dynamics and the growth of digital platforms; no quantified empirical estimates or sample sizes provided in the abstract.
The transition to an AI-civilization will fundamentally restructure the mechanisms of employment creation from a centralized model (few organizations creating jobs for the many) to a decentralized ecosystem where individuals are empowered to generate their own employment opportunities.
Central thesis of the paper, motivated by theoretical argumentation and synthesis of contemporary data on labor markets and digital platforms (no empirical test or sample sizes specified in the abstract).
Historical precedents from past technological revolutions suggest that innovation tends to expand, rather than shrink, the scope of economic activity and employment in the long run.
Paper draws on analysis of economic history (qualitative historical analysis implied; no specific historical datasets or sample sizes provided in the abstract).
By formalizing the end-to-end transaction model together with its asset and incentive layers, EpochX reframes agentic AI as an organizational design problem: building infrastructures in which verifiable work leaves persistent, reusable artifacts and value flows sustain durable human-agent collaboration.
Theoretical framing and normative claim in the paper; no empirical evaluation demonstrating that this reframing yields measurable benefits.
Credits lock task bounties, allow budget delegation, settle rewards upon acceptance, and compensate creators when verified assets are reused.
Functional description of the credit mechanics and settlement rules within the proposed EpochX marketplace; presented as part of system design without empirical settlement or user-behavior data.
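The credit mechanics can be sketched as a small ledger with lock, settle, and royalty operations; the class, method names, and amounts below are illustrative, since the paper describes the mechanics without a concrete implementation:

```python
# Ledger sketch of the described credit mechanics: credits lock a bounty at
# task posting, settle to the worker on acceptance, and pay the creator a
# royalty when a verified asset is reused. All names are illustrative.
class CreditLedger:
    def __init__(self):
        self.balances, self.locked = {}, {}

    def deposit(self, who, amount):
        self.balances[who] = self.balances.get(who, 0) + amount

    def lock_bounty(self, poster, task, amount):
        assert self.balances.get(poster, 0) >= amount, "insufficient credits"
        self.balances[poster] -= amount
        self.locked[task] = (poster, amount)

    def settle(self, task, worker):
        """Release the locked bounty to the worker once work is accepted."""
        _, amount = self.locked.pop(task)
        self.deposit(worker, amount)

    def reuse_royalty(self, payer, creator, amount):
        """Compensate the creator when a verified asset is reused."""
        assert self.balances.get(payer, 0) >= amount, "insufficient credits"
        self.balances[payer] -= amount
        self.deposit(creator, amount)

ledger = CreditLedger()
ledger.deposit("org", 100)
ledger.lock_bounty("org", "task-1", 40)   # bounty locked at posting
ledger.settle("task-1", "agent-7")        # released on acceptance
ledger.reuse_royalty("org", "agent-7", 5) # creator paid on asset reuse
print(ledger.balances)
```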