Evidence (4333 claims)

Claims by topic:

- Adoption: 5539 claims
- Productivity: 4793 claims
- Governance: 4333 claims
- Human-AI Collaboration: 3326 claims
- Labor Markets: 2657 claims
- Innovation: 2510 claims
- Org Design: 2469 claims
- Skills & Training: 2017 claims
- Inequality: 1378 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 402 | 112 | 67 | 480 | 1076 |
| Governance & Regulation | 402 | 192 | 122 | 62 | 790 |
| Research Productivity | 249 | 98 | 34 | 311 | 697 |
| Organizational Efficiency | 395 | 95 | 70 | 40 | 603 |
| Technology Adoption Rate | 321 | 126 | 73 | 39 | 564 |
| Firm Productivity | 306 | 39 | 70 | 12 | 432 |
| Output Quality | 256 | 66 | 25 | 28 | 375 |
| AI Safety & Ethics | 116 | 177 | 44 | 24 | 363 |
| Market Structure | 107 | 128 | 85 | 14 | 339 |
| Decision Quality | 177 | 76 | 38 | 20 | 315 |
| Fiscal & Macroeconomic | 89 | 58 | 33 | 22 | 209 |
| Employment Level | 77 | 34 | 80 | 9 | 202 |
| Skill Acquisition | 92 | 33 | 40 | 9 | 174 |
| Innovation Output | 120 | 12 | 23 | 12 | 168 |
| Firm Revenue | 98 | 34 | 22 | — | 154 |
| Consumer Welfare | 73 | 31 | 37 | 7 | 148 |
| Task Allocation | 84 | 16 | 33 | 7 | 140 |
| Inequality Measures | 25 | 77 | 32 | 5 | 139 |
| Regulatory Compliance | 54 | 63 | 13 | 3 | 133 |
| Error Rate | 44 | 51 | 6 | — | 101 |
| Task Completion Time | 88 | 5 | 4 | 3 | 100 |
| Training Effectiveness | 58 | 12 | 12 | 16 | 99 |
| Worker Satisfaction | 47 | 32 | 11 | 7 | 97 |
| Wages & Compensation | 53 | 15 | 20 | 5 | 93 |
| Team Performance | 47 | 12 | 15 | 7 | 82 |
| Automation Exposure | 24 | 22 | 9 | 6 | 62 |
| Job Displacement | 6 | 38 | 13 | — | 57 |
| Hiring & Recruitment | 41 | 4 | 6 | 3 | 54 |
| Developer Productivity | 34 | 4 | 3 | 1 | 42 |
| Social Protection | 22 | 10 | 6 | 2 | 40 |
| Creative Output | 16 | 7 | 5 | 1 | 29 |
| Labor Share of Income | 12 | 5 | 9 | — | 26 |
| Skill Obsolescence | 3 | 20 | 2 | — | 25 |
| Worker Turnover | 10 | 12 | — | 3 | 25 |
Governance
Persistent Gini coefficients of 0.43 to 0.62 across all conditions indicate concentrated detection inequality.
Reported range of Gini coefficients from simulation experiments across conditions.
Experiments reveal extreme and year-variant bias in Baltimore's detected mode, with mean annual DIR up to 15,714 in 2019.
Reported experimental result from simulations on Baltimore data giving mean annual DIR up to 15,714 for 2019.
We compute four monthly bias metrics across 264 city-year-mode observations: the Disparate Impact Ratio (DIR), Demographic Parity Gap, Gini Coefficient, and a composite Bias Amplification Score.
Statement of metrics computed and the number of observations (264 city-year-mode observations) reported in the paper.
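Two of the four metrics can be sketched from their standard definitions; the paper's exact formulas are not reproduced here, and the sample numbers are illustrative assumptions:

```python
def disparate_impact_ratio(rate_a, rate_b):
    """Ratio of detection rates between two demographic groups."""
    return rate_a / rate_b

def gini(values):
    """Gini coefficient of detections across areas (0 = equal, 1 = fully concentrated)."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    cum = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * cum) / (n * total) - (n + 1) / n

dir_ = disparate_impact_ratio(0.30, 0.02)  # detection rate 0.30 vs 0.02 gives DIR of about 15
g = gini([1, 1, 2, 10])                    # -> 0.5 (detections concentrated in one of four areas)
```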
The study uses 145,000+ Part 1 crime records from Baltimore (2017–2019) and 233,000+ records from Chicago (2022), augmented with US Census ACS demographic data.
Reported dataset sizes and data sources in the paper (crime records from Baltimore and Chicago; ACS demographic augmentation).
We present a reproducible simulation framework that couples a Generative Adversarial Network (GAN) with a Noisy OR patrol detection model to measure how racial bias propagates through the full enforcement pipeline from crime occurrence to police contact.
Description of methods in paper: coupling a GAN (CTGAN) for synthetic crime generation with a Noisy OR detection/patrol model; method-level claim rather than a numerical result.
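A Noisy-OR model combines independent per-cause detection probabilities; a minimal sketch under that standard formulation (the paper's parameterization and the probabilities below are assumptions):

```python
from math import prod

def noisy_or_detection(exposure_probs):
    """Probability an incident is detected given independent patrol 'causes'.

    Each entry is the chance a single patrol pass detects the incident on
    its own; Noisy-OR assumes the causes fail independently, so overall
    detection is one minus the product of the failure probabilities.
    """
    return 1.0 - prod(1.0 - p for p in exposure_probs)

# Three patrol passes with per-pass detection chances 0.2, 0.1, and 0.05:
p = noisy_or_detection([0.2, 0.1, 0.05])  # 1 - 0.8*0.9*0.95, about 0.316
```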
Empirical simulations of five game scenarios (ranging from repeated prisoner's dilemma to stylized repeated marketing promotion games) validate the theoretical predictions: AI agents naturally exhibit the proposed reasoning patterns and attain stable equilibrium behaviors intrinsically.
Simulation experiments reported in the paper across five distinct game scenarios; these simulations are presented as empirical validation of the theoretical results.
Relaxing the common-knowledge payoff assumption—allowing stage payoffs to be unknown and each agent to observe only its own privately realized stochastic payoffs—still yields the same on-path Nash convergence guarantee.
Theoretical extension/proof in the paper showing convergence results hold under private, stochastic stage payoffs (no common-knowledge of payoffs).
We prove that 'reasonably reasoning' agents—agents capable of forming beliefs about others' strategies from previous observation and learning to best respond to these beliefs—eventually behave along almost every realized play path in a way that is weakly close to a Nash equilibrium of the continuation game.
Formal theoretical proof provided in the paper (mathematical analysis of agent belief-formation and best-response learning leading to on-path closeness to Nash equilibria).
Off-the-shelf reasoning AI agents can achieve Nash-like play zero-shot, without explicit post-training.
Stated claim in the paper supported by a combination of theoretical results (formal proofs about convergence properties of 'reasonably reasoning' agents) and empirical simulations across five game scenarios (including repeated prisoner's dilemma and stylized repeated marketing promotion games).
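As a minimal illustration of belief-based play in one of the listed scenarios, the sketch below runs fictitious play in a repeated prisoner's dilemma: each agent best-responds to the empirical frequency of its opponent's past actions. With these standard (assumed) payoffs, play settles on the stage-game Nash equilibrium (defect, defect); the paper's agents reason about continuation games, so this illustrates only the convergence mechanic, not the full result.

```python
# Row-player payoffs: C vs C -> 3, C vs D -> 0, D vs C -> 5, D vs D -> 1.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def best_response(opp_counts):
    """Best-respond to the empirical frequency of the opponent's actions."""
    total = sum(opp_counts.values())
    belief = {a: n / total for a, n in opp_counts.items()}
    expected = lambda my: sum(belief[o] * PAYOFF[(my, o)] for o in ("C", "D"))
    return max(("C", "D"), key=expected)

def play(rounds=50):
    counts = [{"C": 1, "D": 1}, {"C": 1, "D": 1}]  # uniform prior beliefs
    history = []
    for _ in range(rounds):
        a = best_response(counts[0])  # player 0 responds to its beliefs about player 1
        b = best_response(counts[1])
        counts[0][b] += 1             # each player observes the other's action
        counts[1][a] += 1
        history.append((a, b))
    return history
```

Because defection strictly dominates in the stage game, `play()` returns `("D", "D")` in every round regardless of the prior beliefs.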
This paper employs large language models to conduct semantic analysis on the text of annual reports from Chinese A-share listed companies from 2006 to 2024.
Methodological statement in the abstract describing use of LLM-based semantic analysis on annual report texts spanning 2006–2024.
The paper recommends that the government design targeted support tools to 'enhance market returns and alleviate financing constraints', adopt a differentiated regulatory strategy, and establish a disclosure mechanism combining 'professional identification and reputational sanctions' to curb peer AI washing behaviour.
Policy prescriptions derived from empirical findings and simulation results reported in the paper; presented as recommendations in the abstract.
Simulation results indicate that a combination of policy tools can effectively improve market equilibrium (mitigating the negative effects of AI washing).
Simulation exercises reported in the paper (model specification not provided in abstract) testing policy tool combinations and their effects on market equilibrium.
The paper proposes design principles for effective, accountable, and adaptive sandboxes to contribute to debates on experimentalism in AI governance.
Stated contribution of the paper (descriptive claim about content; abstract does not list the principles or empirical testing).
Regulatory sandboxes (RSs) have emerged as a potential solution to AI regulatory challenges.
Descriptive observation and normative framing within the paper; contextual reference to the EU AI Act's treatment of sandboxes (no empirical sample reported in the abstract).
External inputs that bypass internal filtering shorten recognition delays (i.e., speed up detection of regime shifts).
Model extensions/analysis showing that when some inputs are allowed to bypass internal exclusion mechanisms, the dynamics of anchor updating detect regime changes faster; result comes from theoretical model manipulations, not empirical testing.
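The mechanism can be sketched with a toy anchor-updating model (all parameter names and values are ours, not the paper's): an exponentially smoothed anchor ignores observations far from its current value unless they bypass the filter, so a higher bypass rate shortens the time to register a mean shift.

```python
import random

def rounds_to_detect(bypass_rate, shift=5.0, alpha=0.2, band=2.0, tol=1.0):
    """Rounds until the anchor moves within `tol` of a new regime mean.

    The anchor updates by exponential smoothing, but an internal filter
    excludes observations more than `band` away from the current anchor;
    with probability `bypass_rate` an input skips that filter.
    """
    random.seed(0)
    anchor, t = 0.0, 0
    while abs(anchor - shift) > tol and t < 10_000:
        t += 1
        x = shift  # after the regime change, inputs arrive from the new mean
        blocked = abs(x - anchor) > band and random.random() > bypass_rate
        if not blocked:
            anchor += alpha * (x - anchor)
    return t

# Full bypass registers the shift in a handful of rounds; heavy filtering is slower:
fast, slow = rounds_to_detect(1.0), rounds_to_detect(0.1)
```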
Immediate practical steps include improved documentation, stakeholder audits, and multi‑metric evaluation; medium‑term steps include standards for participatory evaluation and tooling for transparency and monitoring; long‑term steps include institutional governance, interoperable safety APIs, and public‑interest evaluation infrastructure.
Prescriptive roadmap in the paper based on conceptual analysis and prior literature; these are recommended policy/program milestones rather than empirically validated interventions.
Transparency (detailed documentation of data, objectives, evaluation processes, and deployment constraints; audit and contest mechanisms) is a necessary mechanism for accountable alignment.
Normative and practical argumentation supported by prior work on model cards, documentation standards, and auditing; no new audits are presented in the paper.
Pluralistic evaluation—using multiple, diverse evaluation criteria and stakeholder‑informed metrics rather than single aggregated alignment scores—will better capture the values and harms at stake.
Argumentative rationale and literature synthesis advocating multi‑metric evaluation approaches; examples from prior evaluation critiques are referenced rather than new empirical comparison.
The Flourishing–Justice–Autonomy (FJA) framework should guide alignment efforts, emphasizing (1) Flourishing (human well‑being and meaningful opportunities), (2) Justice (distributional fairness and protection of vulnerable groups), and (3) Autonomy (informed choice and user control).
Prescriptive proposal grounded in conceptual analysis and synthesis of ethical and technical literature; the paper defines and motivates the three principles as its core normative contribution.
The positive spillover effects of CAFTA on third‑country agricultural imports are concentrated in medium and large firms.
Heterogeneity analysis using firm‑size subgroup DID estimates derived from the China Industrial Enterprise Database (2000–2014) showing stronger effects for medium and large enterprises.
CAFTA induced spillovers that significantly increased China's agricultural imports from non‑ASEAN (third) countries.
Difference‑in‑differences (DID) estimation exploiting CAFTA as an exogenous shock; import outcomes drawn from China Customs Database 2000–2014; robustness checks reported (mediator tests and subgroup analyses).
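The identifying logic of the canonical two-group, two-period DID can be sketched as follows; the numbers are hypothetical, and the paper's regression version additionally includes controls and robustness checks:

```python
from statistics import mean

def did_estimate(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """2x2 difference-in-differences: the change over time for treated
    units minus the change over time for control units."""
    return (mean(treat_post) - mean(treat_pre)) - (mean(ctrl_post) - mean(ctrl_pre))

# Hypothetical import values: treated firms rise 10 -> 18 while controls
# rise 10 -> 12, so the estimated spillover effect is 8 - 2 = 6.
effect = did_estimate([10, 10], [18, 18], [10, 10], [12, 12])  # -> 6.0
```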
The report issues seven policy recommendations grouped into three goals: (1) improve understanding of the emerging threat, (2) strengthen defenses, and (3) ensure responsible development and deployment.
Policy synthesis based on threat analysis and governance review (report-authored recommendations; descriptive).
The main results are robust to inclusion of firm, industry, and year fixed effects, DID identification using the 2018 SCD pilot, and multiple robustness checks addressing potential confounders and endogeneity.
Authors report baseline regressions with firm/industry/year fixed effects, DID specifications exploiting the 2018 Supply Chain Innovation and Application Pilot Program as a quasi-natural experiment, and a battery of robustness tests (alternative specifications, controls, and checks).
The positive effect of SCD on green innovation is stronger for substantive green innovation (actual environmentally beneficial R&D and technologies) than for strategic green innovation (symbolic/labeling or reputation‑oriented activities).
Heterogeneous outcome analysis splitting green innovation into 'substantive' (e.g., green patents, technological R&D outputs) versus 'strategic' (signaling/compliance indicators); regression and DID estimates show larger and statistically significant coefficients for substantive measures compared to smaller or weaker effects on strategic measures.
Supply chain digitalization (SCD) significantly increases corporate green innovation among Chinese A-share listed firms (2012–2022).
Panel analysis of Chinese A-share listed firms over 2012–2022 using regression models with firm, industry, and year fixed effects; difference-in-differences (DID) identification exploiting the 2018 Supply Chain Innovation and Application Pilot Program as an exogenous shock to SCD; firm-level controls included; multiple robustness checks reported.
Algorithmic transparency and interpretability are important so investors and regulators can understand how ESG inputs affect automated decision systems.
Normative recommendation grounded in literature on model risk, accountability, and regulatory needs; not an empirical finding but a consensus implication of reviewed work.
Research priorities include empirically quantifying AI's effects on productivity, wages, inequality, and environmental costs; developing standardized sustainability and governance metrics; and evaluating regulatory impacts on innovation and welfare.
Stated research agenda based on gaps identified in the narrative review; identifies directions for future empirical work rather than presenting new empirical findings.
AI has progressed from symbolic systems to data-driven, generative architectures and large-scale computational infrastructures, becoming a foundational technology across sectors.
Narrative synthesis of historical and technical literature across AI research and innovation studies; qualitative tracing of architectural shifts (symbolic → statistical → deep learning/generative models) and increased deployment across industries. No original empirical measurement or sample size reported in this paper.
MYRIAD-EU synthesizes progress and remaining challenges and proposes concrete directions for continued research and practice in multi-hazard, multi-risk DRR.
Overall project scope: synthesis and reflection on interdisciplinary research and practice conducted across MYRIAD-EU (2021–2025), as reported in the paper.
MYRIAD-EU conducted in-depth, place-based case studies co-produced with local stakeholders to test methods and tools for multi-risk assessment.
Reported methods include in-depth place-based case studies co-produced with local stakeholders as part of MYRIAD-EU activities (2021–2025).
The main results are robust to inclusion of controls and a range of heterogeneity and moderation checks, supporting that findings are not driven by simple time trends or obvious confounders.
Reported robustness checks in the staggered-DID framework (control variables, alternative specifications, subgroup tests) and discussion of parallel-trends assumption.
Implementation of urban green data center pilot policies leads to measurable improvements in firms' energy utilization efficiency.
Staggered-adoption difference-in-differences (DID) using an unbalanced firm–year panel of Chinese A-share listed firms linked to prefecture-level cities (2012–2024); treatment is timing/location of urban green data center pilot designation; results reported as statistically significant and robust to controls and alternative specifications.
Mechanisms linking digital services to export performance include reduced transaction and search costs, platform network and scale effects, data as an input improving service quality and customization, and task‑level specialization changing comparative advantage.
Conceptual/theoretical synthesis drawing on multiple strands of literature and illustrative case studies presented in the review (no new causal identification).
Digital services trade is shifting from traditional cross‑border delivery toward online, platform‑based models, with cross‑border data flows a core input and determinant of competitiveness.
Integrative literature and policy review synthesizing domestic and international studies; theoretical/conceptual synthesis and cited case examples (no new econometric analysis or primary microdata).
Policy recommendations include standards on explainability, audit trails, certification for finance/tax AI systems, stronger data governance, and public–private coordination to update regulatory guidance.
Paper's policy and governance recommendations drawn from case findings and literature synthesis; prescriptive content rather than evaluated interventions.
Deployments should build governance, explainability, and auditability into systems and start with pilots on high-volume, well-structured tasks before scaling.
Paper recommendations based on case experience and analytic framing; advocated strategy rather than empirically validated at scale within the paper.
To mitigate risks and realize benefits, AI systems in finance/tax should combine AI with human-in-the-loop controls and clear escalation paths.
Prescriptive recommendation grounded in case lessons and literature on safe AI deployment; presented as a best-practice guideline rather than tested intervention.
Technical building blocks leveraged in these deployments include large language models (LLMs), OCR plus structured information extraction, retrieval-augmented generation (RAG) and knowledge bases, and process automation/RPA.
Explicit technical characteristics section and case descriptions in the paper identify these components as core to implementations.
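Of those building blocks, retrieval-augmented generation is the least self-explanatory. Below is a minimal sketch of the retrieve-then-prompt pattern; the bag-of-words scoring and the knowledge-base snippets are illustrative stand-ins for the embedding search and document corpora a production system would use:

```python
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, corpus: list[str]) -> str:
    """Return the corpus snippet most similar to the query."""
    q = Counter(query.lower().split())
    return max(corpus, key=lambda doc: cosine(q, Counter(doc.lower().split())))

def build_prompt(query: str, corpus: list[str]) -> str:
    """Assemble an LLM prompt grounded in the retrieved snippet."""
    context = retrieve(query, corpus)
    return f"Context: {context}\nQuestion: {query}\nAnswer using only the context."

kb = [
    "VAT invoices must be archived for ten years.",
    "Travel expenses above the per-diem cap require manager approval.",
]
prompt = build_prompt("How long must VAT invoices be archived?", kb)
```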
Generative AI is used for risk control and audit functions, including real-time monitoring, fraud detection, KYC/AML screening, and automated exception reporting.
Reported use-cases in the two case organizations and corroborating industry reports discussed in the literature review portion of the paper.
For tax declaration, generative AI enables extraction of tax-relevant facts from invoices and contracts, drafting of tax returns, compliance checks, and scenario simulations.
Case examples and literature synthesis describing OCR + information extraction and LLM-assisted drafting workflows used in practice.
Generative AI is applied to fund management tasks such as cashflow forecasting, anomaly detection, and automated workflows for payments and collections.
Case descriptions and technical mapping in the paper showing implementations at the sharing center and professional services firm level.
Accounting automation use-cases include automated bookkeeping, reconciliations, journal entry suggestion, and error detection using LLMs and document understanding.
Detailed scope mapping and case examples in Xiaomi and Deloitte illustrating these accounting applications; supported by literature review of technical capabilities.
Realizing these AI-driven gains in Vietnam requires legal and institutional redesign.
Close reading of Vietnam's constitutional provisions, administrative statutes, procedural rules and judicial doctrine (doctrinal legal analysis) combined with comparative lessons from other jurisdictions; no quantitative data.
A supplemental theological differentiator probe achieved perfect rank-order agreement between the two ceiling judges (Spearman rs = 1.00), supporting judge reliability for the ceiling probe.
Reported Spearman rank correlation rs = 1.00 between Gemini Pro and Copilot Pro on the theological differentiator probe used as a reliability check.
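Spearman's rs is the Pearson correlation of the ranks, so identical orderings give rs = 1.00 regardless of the raw scores. A minimal implementation for tie-free data (the scores below are toy values, not the probe's ratings):

```python
def spearman(x, y):
    """Spearman rank correlation for tie-free data: Pearson on the ranks."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var = (sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)) ** 0.5
    return cov / var

# Two judges scoring five items in the same rank order yield rs = 1.0:
rs = spearman([4.0, 2.5, 3.8, 1.2, 5.0], [9, 5, 8, 1, 10])
```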
Rigorous research priorities include randomized controlled trials with long-run follow-ups, cost-effectiveness studies, structural adoption models, and validated metrics for feedback quality and learning durability.
Actionable research recommendations produced by the 50-scholar interdisciplinary meeting; prescriptive synthesis rather than empirical results.
Different model families (Sonnet 4.6 vs. Opus 4.6) exhibit stable, systematic differences in methodological preferences and choice patterns—distinct empirical 'styles'.
Comparison of choice patterns and methodological decisions across agents instantiated with Sonnet 4.6 versus Opus 4.6 within the 150-agent experiment, showing consistent between-family differences in measure selection and estimation procedures.
Agents split on measure choice (e.g., autocorrelation vs. variance-ratio tests; dollar-volume vs. share-volume measures), producing different substantive estimates from the same raw data and hypotheses.
Observed categorical divergences in measure selection across the 150 agents during independent analyses of SPY TAQ (2015–2024); documented alternative test/measure families and corresponding divergent effect estimates for the six hypotheses.
AI-to-AI variation (nonstandard errors, NSEs) across autonomous coding agents produces substantial uncertainty in empirical results analogous to human researcher heterogeneity.
Experimental results from 150 autonomous Claude Code agents (two model families: Sonnet 4.6 and Opus 4.6) independently analyzing the same SPY TAQ data (NYSE TAQ, 2015–2024) on six pre-specified hypotheses; recorded agent-to-agent variation in methodological choices and resulting effect estimates (dispersion measured via IQR and related diagnostics).
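One way to reproduce the IQR dispersion diagnostic is to take the interquartile range over the per-agent effect estimates for a single hypothesis; the estimates below are hypothetical:

```python
import statistics

def estimate_spread(estimates):
    """IQR of effect estimates produced by different agents for one hypothesis."""
    q1, _, q3 = statistics.quantiles(estimates, n=4)  # default 'exclusive' method
    return q3 - q1

# Hypothetical estimates from eight agents, two of which chose a different
# measure family and landed far from the rest:
spread = estimate_spread([0.12, 0.15, 0.11, 0.30, 0.14, 0.13, 0.16, 0.29])
```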
Observations span multiple agent platforms (Moltbook, The Colony, 4claw) with more than 167,000 agents interacting as peers.
Author-reported coverage from naturalistic observations across the named platforms during the one-month observation window; count reported as ≈167k agents.
Structured argumentation frameworks make chains of inference inspectable and machine-checkable, improving transparency and verifiability of AI outputs.
Argument from formal properties of AFs and representation; no empirical user studies but relies on known formal semantics.
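The machine-checkable character of such frameworks can be made concrete with Dung's grounded semantics: iterate the characteristic function (the set of arguments defended by the current set) from the empty set up to a least fixpoint. A minimal sketch; the three-argument framework is an illustrative example, not from the paper:

```python
def grounded_extension(args, attacks):
    """Grounded extension of an abstract argumentation framework.

    args: set of argument ids; attacks: set of (attacker, target) pairs.
    Computed as the least fixpoint of the characteristic function.
    """
    def defended_by(s):
        def is_defended(a):
            attackers = {atk for (atk, tgt) in attacks if tgt == a}
            # every attacker of `a` must itself be attacked by some member of s
            return all(any((d, atk) in attacks for d in s) for atk in attackers)
        return {a for a in args if is_defended(a)}

    current = set()
    while True:
        nxt = defended_by(current)
        if nxt == current:
            return current
        current = nxt

# b attacks a, and c attacks b: c is unattacked, so c reinstates a.
ext = grounded_extension({"a", "b", "c"}, {("b", "a"), ("c", "b")})  # -> {"a", "c"}
```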