Evidence (7448 claims)
Adoption
5267 claims
Productivity
4560 claims
Governance
4137 claims
Human-AI Collaboration
3103 claims
Labor Markets
2506 claims
Innovation
2354 claims
Org Design
2340 claims
Skills & Training
1945 claims
Inequality
1322 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 378 | 106 | 59 | 455 | 1007 |
| Governance & Regulation | 379 | 176 | 116 | 58 | 739 |
| Research Productivity | 240 | 96 | 34 | 294 | 668 |
| Organizational Efficiency | 370 | 82 | 63 | 35 | 553 |
| Technology Adoption Rate | 296 | 118 | 66 | 29 | 513 |
| Firm Productivity | 277 | 34 | 68 | 10 | 394 |
| AI Safety & Ethics | 117 | 177 | 44 | 24 | 364 |
| Output Quality | 244 | 61 | 23 | 26 | 354 |
| Market Structure | 107 | 123 | 85 | 14 | 334 |
| Decision Quality | 168 | 74 | 37 | 19 | 301 |
| Fiscal & Macroeconomic | 75 | 52 | 32 | 21 | 187 |
| Employment Level | 70 | 32 | 74 | 8 | 186 |
| Skill Acquisition | 89 | 32 | 39 | 9 | 169 |
| Firm Revenue | 96 | 34 | 22 | — | 152 |
| Innovation Output | 106 | 12 | 21 | 11 | 151 |
| Consumer Welfare | 70 | 30 | 37 | 7 | 144 |
| Regulatory Compliance | 52 | 61 | 13 | 3 | 129 |
| Inequality Measures | 24 | 68 | 31 | 4 | 127 |
| Task Allocation | 75 | 11 | 29 | 6 | 121 |
| Training Effectiveness | 55 | 12 | 12 | 16 | 96 |
| Error Rate | 42 | 48 | 6 | — | 96 |
| Worker Satisfaction | 45 | 32 | 11 | 6 | 94 |
| Task Completion Time | 78 | 5 | 4 | 2 | 89 |
| Wages & Compensation | 46 | 13 | 19 | 5 | 83 |
| Team Performance | 44 | 9 | 15 | 7 | 76 |
| Hiring & Recruitment | 39 | 4 | 6 | 3 | 52 |
| Automation Exposure | 18 | 17 | 9 | 5 | 50 |
| Job Displacement | 5 | 31 | 12 | — | 48 |
| Social Protection | 21 | 10 | 6 | 2 | 39 |
| Developer Productivity | 29 | 3 | 3 | 1 | 36 |
| Worker Turnover | 10 | 12 | — | 3 | 25 |
| Skill Obsolescence | 3 | 19 | 2 | — | 24 |
| Creative Output | 15 | 5 | 3 | 1 | 24 |
| Labor Share of Income | 10 | 4 | 9 | — | 23 |
Guerreiro et al. (2022) characterize the optimal Mirrleesian tax system with automation and find that robot taxes should be transitional: high when incumbent workers cannot retrain, then converging to zero as new cohorts adjust their skill investments.
Citation reported in the paper summarizing Guerreiro et al. (2022)'s theoretical result on transitional robot taxes.
If labor becomes economically redundant, the policy focus shifts from steering innovation to redesigning public finance and redistribution (e.g., new tax instruments, redistribution mechanisms).
Theoretical scenario analysis in the paper with references to related works (Korinek and Juelfs 2024; Korinek and Lockwood 2026).
Evaluation is carried out under three frozen context configurations (diff only: config_A; diff with file content: config_B; full context: config_C), enabling systematic ablation of context provision strategies.
Methodological description: three fixed context configurations defined and used for ablation experiments.
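The three frozen configurations can be pictured as a small immutable lookup. The following is a minimal sketch, not the paper's code: only the labels config_A, config_B, and config_C come from the claim. The field names, the `build_context` helper, and the assumption that "full context" is a superset of the other two are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: a configuration cannot be mutated mid-experiment
class ContextConfig:
    name: str
    include_diff: bool
    include_file_content: bool
    include_full_context: bool


# Assumption: each configuration adds context on top of the previous one.
CONFIGS = {
    "config_A": ContextConfig("config_A", True, False, False),  # diff only
    "config_B": ContextConfig("config_B", True, True, False),   # diff + file content
    "config_C": ContextConfig("config_C", True, True, True),    # full context
}


def build_context(config, diff, file_content, full_context):
    """Assemble the text shown to the model under one configuration."""
    parts = []
    if config.include_diff:
        parts.append(diff)
    if config.include_file_content:
        parts.append(file_content)
    if config.include_full_context:
        parts.append(full_context)
    return "\n\n".join(parts)
```

Running the same task once per entry in `CONFIGS` and holding everything else fixed is what makes the comparison an ablation.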
Traffic performance is evaluated using the Fundamental Diagram (FD) under varying driver heterogeneity, heterogeneous time-gap penetration levels, and different shares of RL-controlled vehicles.
Description of experimental/evaluation setup in the paper: macroscopic evaluation via Fundamental Diagram across varied scenario parameters. No numeric sample size provided in the claim text.
CriQ is a sister app to Dream11, India's largest fantasy sports platform with over 250 million users.
Descriptive statement in the paper providing context about the application domain and user base.
We performed an extensive evaluation of 37 state-of-the-art Vision-Language Models on MultihopSpatial.
Empirical evaluation described in the paper listing the number of models evaluated (37).
We critically compare LLM-generated rulings against 10,000 real-world court judgments from China Judgments Online (CJOL).
Dataset statement: the paper compares model outputs to a corpus of 10,000 CJOL labor dispute judgments.
We introduce a novel stress test that evaluates LLM-generated labor dispute outcomes by injecting social media sentiment as an external pressure.
Methodological description in the paper: a designed stress test where social media sentiment is used to perturb LLM outputs for labor dispute cases.
The paper treats data as a new type of production factor and endogenizes it within the production function.
Theoretical/methodological: the paper constructs a macro-level theoretical model that explicitly includes data as an endogenous input in the production function (no empirical/sample data).
In the near term, the most plausible equilibrium is bounded autonomy, in which AI agents operate as supervised co-pilots, monitoring systems, and constrained execution modules embedded within human decision processes.
Theoretical argument and forward-looking assessment by the authors based on the proposed framework and plausibility considerations; not presented as the result of a causal empirical study in the excerpt.
Economic evaluations of GLAI should account for end-to-end risk externalities (error propagation, institutional trust, rights impacts), not only short-term productivity gains.
Methodological recommendation grounded in conceptual synthesis of technical, behavioral, and legal risks; normative argument rather than empirical result.
Generative Legal AI (GLAI) systems are built on token-prediction (LLM) architectures rather than formal legal-reasoning architectures.
Conceptual and technical analysis in the paper distinguishing GLAI from other legal-tech; literature synthesis on common LLM architectures. No original empirical dataset or sample size—qualitative/technical review.
The paper's formalism shows that prompt/system messages shape distributions over possible execution paths (indirect control) but do not evaluate actual partial paths at runtime.
Formal mapping in the paper that treats prompts as shaping prior over paths; conceptual argument and illustrative examples.
Through a thematic review of existing research, the authors identified recurring themes about incentive schemes: their components, how researchers manipulate them, and their impact on research outcomes.
Authors' stated method and findings: thematic review (the scope/number of reviewed papers not specified in excerpt).
A critical aspect of conducting human–AI decision-making studies is the role of participants, often recruited through crowdsourcing platforms.
Claim based on the authors' thematic literature review noting participant sourcing practices (specific studies and counts not given in excerpt).
Researchers conduct empirical studies investigating how humans use AI assistance for decision-making and how this collaboration impacts results.
Statement summarizing the research landscape; supported implicitly by the authors' thematic review of existing empirical studies (number of studies not specified in excerpt).
The study provides empirical evidence specific to a small open EU economy (Slovakia) on the relationship between AI adoption and labour productivity.
Use of harmonised Eurostat enterprise and productivity data for Slovakia and EU27 over 2021–2024, analysed with descriptive statistics, gap analysis, dynamics of change, correlation, and an illustrative regression model.
Returns to AI are heterogeneous across firms; estimating treatment effects requires attention to selection, complementarities, and dynamic adoption pipelines.
Methodological argument referencing treatment-effect literature and observed firm heterogeneity; supported by conceptual examples rather than a single empirical treatment-effect estimate.
A subset of four datasets included settings in which the AI provided explanations of its decision.
Paper states that four of the datasets involved AI explanations (explicitly stated in abstract).
The study compared HCT to the AI-as-advisor approach using 10 datasets from various domains, including medical diagnostics and misinformation discernment.
Paper reports an empirical comparison across 10 datasets spanning multiple domains (explicitly stated in abstract).
The hybrid confirmation tree (HCT) elicits a human judgment and an AI judgment independently; if they agree, that decision is accepted, and if they disagree, a second human breaks the tie.
Description of the HCT method in the paper (procedural/design specification).
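The HCT procedure described above can be sketched in a few lines of Python. This is an illustrative reading of the claim rather than the authors' implementation. The function name and the choice to pass the second human as a callable (so that they are consulted only on disagreement) are assumptions.

```python
def hybrid_confirmation_tree(first_human, ai, second_human):
    """Return the HCT decision.

    first_human, ai: independently elicited judgments (any comparable labels).
    second_human: zero-argument callable returning a judgment; it is only
    called when the first human and the AI disagree (assumed design choice).
    """
    if first_human == ai:
        # Agreement: the shared judgment is accepted without further review.
        return first_human
    # Disagreement: the second human's judgment decides.
    return second_human()
```

For example, `hybrid_confirmation_tree("benign", "malignant", lambda: "malignant")` returns the second human's call, while two matching judgments are accepted without consulting the second human at all.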
This chapter is based on a systematic literature review using the PRISMA framework and includes a thematic analysis followed by a bibliometric coupling of 23 documents from the Scopus database.
Methodological statement in the paper: systematic literature review using PRISMA, thematic analysis, bibliometric coupling; sample drawn from Scopus; 23 documents.
The cross-sectional, self-reported survey design prevents strong causal claims about the effect of algorithms or selective exposure on polarization.
Authors explicitly note methodological limitations: cross-sectional survey of N = 450, reliance on self-reported consumption, and lack of platform log or longitudinal/experimental data.
The study adopted a positivist philosophy and a descriptive-correlational design.
Methods section statement in the paper describing the research philosophy and study design.
Data were collected from innovation-focused executives across 39 licensed Kenyan commercial banks.
Paper statement specifying sample source: 'Using data from innovation-focused executives across 39 licensed banks.'
Technological innovation was assessed via adoption of new systems, integration of digital channels, and use of Artificial Intelligence and data analytics.
Measurement description provided in the paper listing the components used to operationalize technological innovation.
Competitiveness in the study was measured through market share, return on equity and customer satisfaction.
Measurement description provided in the paper describing dependent variable operationalization (explicit list of three indicators).
The research method is normative legal research using statutory, conceptual, and comparative approaches, supported by a literature analysis of SINTA-indexed national journals and reputable international journals.
Method statement given explicitly in the paper's abstract/methodology.
The study assesses the adequacy of the legal protection available to workers affected by termination of employment (Pemutusan Hubungan Kerja, PHK) resulting from AI adoption.
Statement of the research objective and analytical approach (normative, comparative), supported by a literature review of selected journals.
The study aims to analyze how the Job Creation Law (Undang-Undang Cipta Kerja) and its implementing regulations classify and justify termination of employment (PHK) resulting from AI adoption.
Statement of the research objective given in the methodology/introduction section; statutory approach within normative legal research.
The user study had N=50 participants.
Reported user study sample size (N=50) used to evaluate AI-assisted intent expansion in ecologically valid settings.
Under the current evaluation resolution, 5W3H, CO-STAR, and RISEN achieve similarly high goal-alignment scores, suggesting that dimensional decomposition itself is an important active ingredient.
Controlled comparison between three structured frameworks (5W3H, CO-STAR, RISEN) across the evaluated outputs, with no meaningful differences reported between them.
The study evaluated 3,240 model outputs (3 languages x 6 conditions x 3 models x 3 domains x 20 tasks) using an independent judge (DeepSeek-V3).
Reported experimental design and evaluation: 3 languages, 6 conditions, 3 models, 3 domains, 20 tasks; judged by DeepSeek-V3.
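The reported total follows directly from the factorial design. A short check, using placeholder indices for each factor because the excerpt does not name the individual languages, conditions, models, domains, or tasks:

```python
from itertools import product

# Factor sizes as reported in the claim; the levels themselves are placeholders.
FACTORS = {
    "language": 3,
    "condition": 6,
    "model": 3,
    "domain": 3,
    "task": 20,
}

# One design cell per combination of factor levels (full factorial crossing).
design_cells = list(product(*(range(n) for n in FACTORS.values())))
n_outputs = len(design_cells)  # 3 * 6 * 3 * 3 * 20 = 3240
```

The count matches the 3,240 outputs only if every combination was run exactly once, which is what the stated design implies.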
The paper frames the LLM-politician relationship through principal-agent theory and bounded rationality, conceptualizing the legislator as a principal delegating advisory tasks to a boundedly rational agent under structural information asymmetry.
Explicit theoretical framing described in the introduction or theory section of the paper.
Model outputs were evaluated using a dual framework combining LLM-as-Judge semantic scoring and programmatic text similarity metrics.
Paper describes the evaluation methodology: semantic scoring via LLM-as-Judge plus programmatic text similarity measures applied to model-generated rationales vs official memoranda.
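A dual evaluation of this kind could be wired together as below. This is a minimal sketch under stated assumptions: the excerpt does not say which similarity metrics or judge prompt the paper uses, so token-level Jaccard overlap and `difflib.SequenceMatcher` stand in for the programmatic metrics, and `judge` is a hypothetical callable that would wrap an LLM call.

```python
from difflib import SequenceMatcher


def token_jaccard(a, b):
    """Jaccard overlap of lowercase whitespace tokens (stand-in metric)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0  # two empty texts are treated as identical
    return len(ta & tb) / len(ta | tb)


def sequence_ratio(a, b):
    """Character-level similarity in [0, 1] from the standard library."""
    return SequenceMatcher(None, a, b).ratio()


def evaluate(rationale, memorandum, judge):
    """Score a model-generated rationale against the official memorandum.

    judge: hypothetical callable (rationale, memorandum) -> float, standing
    in for the LLM-as-Judge semantic score.
    """
    return {
        "judge_score": judge(rationale, memorandum),
        "token_jaccard": token_jaccard(rationale, memorandum),
        "sequence_ratio": sequence_ratio(rationale, memorandum),
    }
```

Reporting the two families of scores side by side, rather than averaging them, keeps it visible when a rationale is semantically close to the memorandum but worded very differently.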
Six LLMs were evaluated: GPT-5-mini and GPT-5-chat (OpenAI); Claude Haiku 4.5 (Anthropic); and Llama 4 Maverick, Llama 3.3 70B, and Llama 3.1 8B (Meta).
Paper explicitly lists the six evaluated models spanning three provider families and multiple capability tiers.
The study uses a dataset of 15 Romanian Senate law proposals paired with their official explanatory memoranda (expuneri de motive).
Explicit statement in the paper describing the dataset composition: 15 Romanian Senate law proposals each paired with its official explanatory memorandum.
We implement a rigorously controlled execution-based testbed featuring Git worktree isolation and explicit global memory to evaluate agent coordination frameworks.
Methodological description in the paper indicating the testbed design choices (Git worktree isolation, explicit global memory) used to ensure controlled, reproducible execution of agent-generated code.
We benchmark a single-agent baseline against two multi-agent paradigms: a subagent architecture (parallel exploration with post-hoc consolidation) and an agent team architecture (experts with pre-execution handoffs) using a rigorously controlled, execution-based testbed.
Description of experimental setup in the paper: an execution-based testbed with Git worktree isolation and explicit global memory; experiments explicitly compare single-agent, subagent, and agent-team architectures under fixed computational time budgets.
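Git worktree isolation gives each agent its own checkout and branch while sharing a single repository's object store. The sketch below shows one way to set this up from Python. It is an assumption about how such a testbed might work, not the paper's harness, and the function names, paths, and branch naming are hypothetical.

```python
import subprocess


def worktree_add_command(repo_dir, worktree_path, branch, base="HEAD"):
    """Build the git command that creates an isolated worktree on a new branch."""
    return [
        "git", "-C", repo_dir,      # run git as if started inside repo_dir
        "worktree", "add",
        "-b", branch,               # create a fresh branch for this agent
        worktree_path,              # separate working directory for the agent
        base,                       # commit-ish the new branch starts from
    ]


def create_worktree(repo_dir, worktree_path, branch, base="HEAD"):
    """Run the command; raises CalledProcessError if git reports a failure."""
    subprocess.run(
        worktree_add_command(repo_dir, worktree_path, branch, base),
        check=True,
    )
```

Calling `create_worktree` once per agent from the same base commit means every agent starts from an identical state and cannot overwrite another agent's files.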
Rather than proposing new recognition models, the contribution focuses on a system-level comparison of both paradigms under realistic edge constraints.
Stated scope in the abstract: the paper emphasizes system-level comparison instead of introducing new recognition models, demonstrated via the described hybrid system and evaluations.
The system is implemented on a GPU-enabled edge device and evaluated with respect to latency, resource usage, and operational trade-offs using a demonstrator-based setup.
Authors state implementation on a GPU-enabled edge device and describe evaluation of latency, resource usage, and operational trade-offs in a demonstrator-based experiment. The abstract does not include numeric metrics or sample sizes.
Data construction: The authors treat Wikipedia technology pages as distinct technologies and trace them across patents and job postings from 1976 to 2007, using technical bigrams to identify technologies in texts.
Description of dataset construction building on Kalyani et al. (2025) in Section 2; methodological description of linking Wikipedia pages, patent text, and job postings.
Proposition 1: With a constant pace of technology creation (m(b)=m), the model admits a unique balanced growth path (BGP) along which real wages and output grow at rate g and the skill premium remains constant and independent of m.
Analytical result (proposition) proved in the paper's model appendix under model assumptions.
The modal technology in the top 1% densest locations (e.g., New York, San Francisco) is 34 years old, while the modal technology in the bottom 50% lowest-density locations is 48 years old, indicating sizable diffusion gaps.
Empirical measurement from the text-based technology dataset tracking vintage of technologies across locations; reported modal ages by location density percentile.
Limitations: the Comscore data observe household internet activity on home (non-mobile) devices and do not capture offline or mobile device activities, so extrapolation to total at-home activities should be done with caution.
Authors' explicit limitation discussion in paper stating data do not include mobile devices or offline activities.
ChatGPT adoption leaves the total time spent on productive online activities (including any time spent using ChatGPT) unchanged.
IV long-difference estimates reported in the paper; authors state 'leaving time spent on productive digital tasks unchanged' and that total productive activity time does not decline significantly.
The analysis uses detailed Internet browsing microdata from over 200,000 U.S. households' home devices from 2021 to 2024.
Comscore web browsing panel described in paper; authors state dataset covers 'over 200,000 U.S. households' across 2021-2024; data provides timestamps, visit durations, URLs, demographic bins, etc.
The present review examined the intersection of artificial intelligence, sustainable finance, ESG performance, FinTech, climate risk analytics, algorithmic governance, and responsible investing.
Statement of the paper's scope and aims (description of the review content and topics covered).
The literature on AI-based ESG scoring, green finance, and data-driven sustainability reporting is disjointed across finance, management, and technology fields and requires application of the PRISMA framework to provide transparency and methodological rigor in systematic reviews.
Paper's methodological assessment and recommendation based on the authors' systematic review process and literature mapping (statement about the state of the literature and methodological needs). No numeric evidence provided in the excerpt.
The analysis draws on data from 170 countries for 2020–2024 for the Government AI Readiness Index (GAIRI)–EGDI comparison.
Data description in abstract explicitly reporting the GAIRI–EGDI sample coverage as 170 countries for 2020–2024.