Evidence (14055 claims)
| Topic | Claims |
|---|---|
| Adoption | 8570 |
| Productivity | 7631 |
| Governance | 6869 |
| Human-AI Collaboration | 6491 |
| Org Design | 4175 |
| Innovation | 4114 |
| Labor Markets | 3566 |
| Skills & Training | 2966 |
| Inequality | 2066 |
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 758 | 199 | 100 | 900 | 2007 |
| Governance & Regulation | 826 | 400 | 191 | 122 | 1563 |
| Organizational Efficiency | 777 | 193 | 124 | 84 | 1189 |
| Technology Adoption Rate | 635 | 233 | 124 | 97 | 1098 |
| Research Productivity | 422 | 128 | 57 | 336 | 954 |
| Output Quality | 476 | 179 | 59 | 47 | 761 |
| Decision Quality | 328 | 177 | 81 | 47 | 640 |
| Firm Productivity | 435 | 57 | 88 | 20 | 606 |
| AI Safety & Ethics | 218 | 277 | 65 | 33 | 599 |
| Market Structure | 180 | 170 | 123 | 24 | 502 |
| Task Allocation | 213 | 64 | 72 | 33 | 387 |
| Skill Acquisition | 170 | 61 | 61 | 17 | 309 |
| Innovation Output | 203 | 27 | 43 | 18 | 292 |
| Employment Level | 105 | 54 | 107 | 13 | 281 |
| Fiscal & Macroeconomic | 131 | 69 | 43 | 26 | 276 |
| Consumer Welfare | 117 | 63 | 42 | 11 | 233 |
| Firm Revenue | 153 | 48 | 26 | 3 | 230 |
| Task Completion Time | 173 | 31 | 8 | 12 | 225 |
| Inequality Measures | 44 | 122 | 49 | 6 | 221 |
| Worker Satisfaction | 89 | 65 | 22 | 12 | 188 |
| Error Rate | 69 | 92 | 10 | 2 | 173 |
| Regulatory Compliance | 77 | 69 | 14 | 5 | 165 |
| Automation Exposure | 56 | 56 | 26 | 13 | 154 |
| Training Effectiveness | 94 | 21 | 13 | 19 | 149 |
| Wages & Compensation | 77 | 36 | 25 | 6 | 144 |
| Team Performance | 86 | 17 | 27 | 10 | 141 |
| Developer Productivity | 95 | 17 | 14 | 6 | 133 |
| Job Displacement | 12 | 80 | 20 | 1 | 113 |
| Hiring & Recruitment | 52 | 7 | 8 | 3 | 70 |
| Creative Output | 31 | 18 | 8 | 3 | 61 |
| Skill Obsolescence | 5 | 46 | 6 | 1 | 58 |
| Social Protection | 27 | 16 | 8 | 2 | 53 |
| Labor Share of Income | 17 | 19 | 17 | — | 53 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
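One way to read the table above is to convert each row's counts into directional shares. The following is a minimal Python sketch using three rows copied from the matrix. The helper name `direction_shares` is an assumption made for illustration, and shares are taken over the sum of the four listed columns rather than the Total column, since the two do not always agree.

```python
# Counts copied from three rows of the Evidence Matrix above.
ROWS = {
    "Firm Productivity": {"positive": 435, "negative": 57, "mixed": 88, "null": 20},
    "Job Displacement": {"positive": 12, "negative": 80, "mixed": 20, "null": 1},
    "Inequality Measures": {"positive": 44, "negative": 122, "mixed": 49, "null": 6},
}


def direction_shares(counts):
    """Return each direction's share of the listed counts (hypothetical helper)."""
    listed = sum(counts.values())
    return {direction: n / listed for direction, n in counts.items()}


for outcome, counts in ROWS.items():
    shares = direction_shares(counts)
    summary = ", ".join(f"{d} {s:.1%}" for d, s in shares.items())
    print(f"{outcome}: {summary}")
```

Normalizing within a row makes outcomes with very different claim volumes comparable, at the cost of hiding how many claims sit behind each share.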
Each of the four published papers used in the experiments contained an error that I helped identify or correct.
Author statement that the 4 papers each contained an error; author involvement in identification/correction is asserted.
I conducted experiments in which I asked several AI models (Gemini, Refine, Claude, and ChatGPT) to check the correctness of four published papers in economic theory.
Author reports running direct experiments: prompted listed models to check 4 published economic-theory papers.
The paper proposes a five-pillar diagnostic framework combining fundamental valuation, residual-exuberance tests, SADF/GSADF explosive-root procedures, LPPL/HLPPL price-pattern diagnostics, sentiment and issuance measures, and capex-payback analysis.
Methodological proposal presented in the paper (framework description); this is a stated contribution rather than an empirical result.
From Codeforces histories we build an AI-prompt signature characterised by more first-attempt acceptances and fewer attempts and retries, consistent with AI-assisted practice.
Empirical construction from CF submission histories (pattern: increased first-try accepts, fewer retries). Method: analysis of historical submission logs; sample size not stated in abstract.
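The two quantities named in this claim (first-attempt acceptances and attempts per problem) could be computed from a submission log roughly as follows. This is a sketch under stated assumptions, not the paper's construction: the record layout `(problem_id, verdict)`, the verdict string `"OK"`, and the function name `practice_signature` are all illustrative choices.

```python
from collections import defaultdict


def practice_signature(submissions):
    """Summarize a time-ordered log of (problem_id, verdict) tuples.

    Hypothetical helper: the record format and the "OK" verdict label
    are assumptions, not taken from the paper.
    """
    attempts = defaultdict(list)
    for problem_id, verdict in submissions:
        attempts[problem_id].append(verdict)
    if not attempts:
        raise ValueError("no submissions to summarize")
    n_problems = len(attempts)
    # A problem counts as a first-attempt acceptance if its first verdict is "OK".
    first_try = sum(1 for verdicts in attempts.values() if verdicts[0] == "OK")
    total_attempts = sum(len(verdicts) for verdicts in attempts.values())
    return {
        "first_attempt_accept_rate": first_try / n_problems,
        "mean_attempts_per_problem": total_attempts / n_problems,
    }


# Toy log for illustration only; not real contest data.
log = [("A", "OK"), ("B", "WRONG_ANSWER"), ("B", "OK"), ("C", "OK")]
print(practice_signature(log))
```

A higher acceptance rate together with a lower attempt count would match the pattern the claim describes, though neither statistic on its own identifies AI assistance.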
The International Collegiate Programming Contest (ICPC) and the International Olympiad in Informatics (IOI) prohibit AI under proctoring and admit entrants through qualification rounds, whereas online Codeforces (CF) contests are unproctored and open to all.
Descriptive factual claim about contest rules and formats (institutional description in paper); based on contest rules and organizational formats referenced by authors.
Future research should adopt a more intersectional approach exploring how race, class, and geography interact with gender to shape platform work experiences.
Research limitations and implications section of the paper recommends more intersectional research directions.
This paper conducted a systematic literature review and thematic synthesis of 48 peer‑reviewed studies (2010–2024) to analyze the gendered dynamics of AI‑mediated digital labor.
Methods statement in the paper: systematic literature review and thematic synthesis; explicitly reports reviewing 48 peer‑reviewed studies covering 2010–2024.
We evaluate the system on operator feedback and a question set collected from production usage, graded by human and automated panels.
Paper's stated evaluation methodology: operator feedback + production question set, graded by humans and automated panels.
There is a need to examine the impacts of LLMs on workers in jobs where the technology is prominent.
Recommendation in the paper's conclusion based on the observed concentration of LLM exposure in lower-precarity occupations.
These occupations (those with higher LLM exposure and lower precariousness) have previously been sheltered from technological change.
Statement in the paper's conclusion asserting that occupations with higher LLM exposure are ones historically sheltered from technological change (no specific empirical evidence provided in abstract).
The study used Canada's Labour Force Survey, developed a multidimensional index summarizing occupational exposure to precarity (contractual instability, earnings inadequacy, schedule unpredictability, working-time mismatch), and estimated associations using four multivariate linear regression models with cluster-robust standard errors plus a fifth model for the multidimensional index.
Methods description in abstract specifying data source (Canada's Labour Force Survey), index construction, and multivariate linear regression models with cluster-robust standard errors.
This study benchmarks Algeria’s readiness to adopt AI against Morocco, Egypt, and Turkey using data from the World Bank (2022), the Oxford Insights Government AI Readiness Index, and sector-specific studies.
Methodological statement in the paper specifying data sources used for the comparative assessment (World Bank 2022, Oxford Insights index, sector studies).
Over 100 participants collaborated with one of four frontier models (Claude-Opus-4.6, GPT-5.4, Gemini-3.1-Pro, and MiniMax-M2.7) on a long-horizon coding task lasting around five hours.
Study description: experimental participants (reported as "Over 100 participants") each paired with one of four named models on a ~5-hour coding task designed to mimic real-world workflows.
We conduct the first large-scale study of human oversight in AI coding sabotage.
Authors state they ran a large-scale user study; described as the first such study focused on human oversight in AI coding sabotage (methodological claim).
The study uses four waves of data from the China Family Panel Studies (CFPS) from 2022 to 2025, constructs an individual-level indicator of the skill wage gap, and adopts an occupational task automation exposure index as a proxy variable for technological shocks.
Authors report using four waves of CFPS (2022–2025); they state they constructed an individual-level skill-wage-gap indicator and use an occupational task automation exposure index as the proxy for technological shocks (methodological description in paper).
The article aims to provide systematic literature support for subsequent research and adaptive policy formulation.
Statement of the paper's stated objective; methodological and policy-intent claim from the authors.
This article is based on a systematic literature review and summarizes the four core theoretical mechanisms of substitution, complementarity, new task creation, and skill mismatch.
Methodological claim from the paper: the authors conducted a systematic literature review and identified these four theoretical mechanisms.
Traditional software and agentic systems are distinct: in traditional software code is the carrier of decision logic, whereas in agentic systems code is ephemeral tooling used by an LLM-driven reasoning loop.
Formalization and conceptual definitions developed in the paper (first-principles formal distinction; no empirical sample size reported).
For over half a century, software engineering has operated on a foundational premise: human engineers decompose problems, encode decision logic into static code, and manually adapt that code as requirements evolve.
Historical/descriptive claim presented in the paper's framing and literature review; citation of longstanding software engineering practices (qualitative, no empirical sample size reported).
We implement a two-stage processing architecture separating document-level extraction (Stage 1) from claim-level synthesis (Stage 2).
Implementation description in paper: architecture design and pipeline stages described by the authors.
The study introduces a methodological framework for evaluating LLM citation behaviors, integrating information retrieval theory, semantic search optimization, and structured content engineering.
Explicit claim about the paper's contribution: introduction of a methodological framework combining IR theory, semantic search, and structured content engineering. This is a factual statement about the paper's content (no sample size reported in excerpt).
Traditional SEO strategies have historically focused on keyword density, backlink authority, and ranking positions within search engine results pages (SERPs).
Descriptive claim about historical SEO practices presented as background/context in the paper; based on domain knowledge and literature references (no new empirical data reported in the excerpt).
We extend the representation-completion principle to device cold-start by constructing cohort-based embeddings from demographic features.
Methodological extension described in paper (approach for device cold-start handled via cohort-based demographic embeddings).
We propose Shallow-RHS, an asymmetric link-prediction architecture in which the left-hand side (LHS) device tower leverages temporally valid watch-history message passing to capture collaborative signals, while the right-hand side (RHS) content tower is intentionally shallow and encodes content solely from intrinsic features.
Model architecture description in paper (design specification; no numeric evaluation included in excerpt).
We formulate cold-start recommendation as an inductive graph-completion problem on a temporal bipartite device-content graph.
Methodological framing presented in the paper (problem formulation).
In Tubi's production retrieval system, new content must be assigned a standalone embedding immediately, and the model must also produce device embeddings suitable for approximate nearest-neighbor retrieval.
Description of production serving constraints in Tubi stated in paper (system design / operational constraint).
In neither unit did internal control mechanisms identify any information-security incident, sensitive-data leakage, or formal compliance challenge from external oversight bodies during the period examined.
Author reports absence of recorded incidents in internal control mechanisms and no external oversight challenges for both units over the study period; based on internal records and SEI-GDF auditable indicators.
Verified word-count analysis of the Executive Order shows the word 'security' appears 17× and the word 'cyber' appears 14×, while there are zero mentions of 'labor', 'education', 'culture', 'fairness', 'transparency', 'attribution', 'provenance', 'meaning', or 'commons'.
Automated/count-based analysis of the EO text (single-document word-count reported in the paper).
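A count of this kind can be reproduced with a few lines of Python. The sketch below assumes case-insensitive whole-word matching, which may differ from the counting rule the paper used; the helper name `count_terms` is hypothetical, and the sample sentence is a placeholder, not text from the Executive Order.

```python
import re
from collections import Counter


def count_terms(text, terms):
    """Count case-insensitive whole-word occurrences of each term (hypothetical helper)."""
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    return {term: words[term.lower()] for term in terms}


# Placeholder text for illustration only; NOT the Executive Order.
sample = "Cyber security and national security depend on secure systems."
print(count_terms(sample, ["security", "cyber", "labor"]))
```

Whole-word matching means a compound such as "cybersecurity" would count toward neither "cyber" nor "security", so the matching rule should be reported alongside any such tally.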
The aggregate Stanford HAI AI Vibrancy Score shows no significant within-country effect on tourism’s direct GDP share after controlling for macroeconomic factors.
Fixed-effects estimation with clustered standard errors on panel data from 33 countries (2017–2023); reported coefficient β = 0.061, p = 0.622, with macroeconomic controls.
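For readers unfamiliar with the estimator, the within-country logic can be illustrated with a simplified sketch. It assumes a single regressor, no macroeconomic controls, and no small-sample correction, so it is not the paper's specification. The data are synthetic, with dimensions (33 units, 7 periods) chosen only to mirror the panel shape, and `within_estimator` is a hypothetical helper.

```python
import numpy as np


def within_estimator(y, x, groups):
    """One-regressor fixed-effects (within) estimate with a cluster-robust SE.

    Simplified sketch: no controls and no small-sample correction.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    groups = np.asarray(groups)
    yd = np.empty_like(y)
    xd = np.empty_like(x)
    labels = np.unique(groups)
    # Demean within each group to absorb the group fixed effects.
    for g in labels:
        mask = groups == g
        yd[mask] = y[mask] - y[mask].mean()
        xd[mask] = x[mask] - x[mask].mean()
    sxx = xd @ xd
    beta = (xd @ yd) / sxx
    resid = yd - beta * xd
    # Sandwich "meat": squared within-cluster sum of x * residual.
    meat = sum((xd[groups == g] @ resid[groups == g]) ** 2 for g in labels)
    se = np.sqrt(meat) / sxx
    return beta, se


# Synthetic panel for illustration only (true slope set to 0.5).
rng = np.random.default_rng(0)
n_units, n_periods = 33, 7
groups = np.repeat(np.arange(n_units), n_periods)
unit_effect = rng.normal(size=n_units)[groups]
x = rng.normal(size=groups.size) + unit_effect
y = 0.5 * x + 2.0 * unit_effect + rng.normal(scale=0.1, size=groups.size)

beta, se = within_estimator(y, x, groups)
print(f"beta = {beta:.3f}, cluster-robust SE = {se:.3f}")
```

Because the regressor is correlated with the unit effect in this synthetic setup, a pooled regression would be biased, while demeaning within units recovers the slope from within-unit variation only.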
The study integrates ICT4D, socio-technical systems theory, and the capability approach as its theoretical framing.
Methodological/theoretical statement in the paper describing the integrative framework used for analysis.
While grounded in the DRC, the findings offer broader insights into AI adoption dynamics across informal economies in Sub-Saharan Africa and beyond.
Authors' claim of broader relevance/generalizability based on the DRC case study and theoretical framing.
AI adoption in the DRC emerges through hybrid socio-technical interactions between bottom-up youth innovation and weakly coordinated institutional frameworks, rather than following policy-led or infrastructure-first trajectories.
Theoretical integration (ICT4D, socio-technical systems, capability approach) and qualitative interview evidence used to characterize observed adoption pathways.
The article introduces 'compressed professionalization', defined as the accelerated acquisition and immediate market enactment of professional-level digital capabilities outside formal institutional pathways.
Conceptual/theoretical contribution presented and defined in the paper, supported by illustrative field observations from the interviews.
The study drew on 125 semi-structured interviews conducted in Kinshasa, Lubumbashi, and Goma.
Primary qualitative fieldwork reported in the paper: 125 semi-structured interviews across three DRC cities (Kinshasa, Lubumbashi, Goma).
The research is grounded in the Resource-Based View (RBV) and Dynamic Capabilities Theory (DCT) to explain how technological and managerial resources contribute to organizational performance.
Author statement in the paper describing the theoretical framework (RBV and DCT) used to frame the study.
The study adopts a quantitative research design and analyzes collected data using Partial Least Squares Structural Equation Modeling (PLS-SEM).
Author statement in the paper describing research design and analytical method.
Digital Leadership did not demonstrate a statistically significant direct effect on Employee Productivity (β = -0.094, p = 0.275).
Reported quantitative result from the study using PLS-SEM; β and p-value provided in the paper showing a non-significant direct effect. Sample size not reported in the excerpt.
We scored over 2.1 million twin responses on 500 participants and 183 held-out questions.
Reported evaluation counts in the paper: 2.1M responses, 500 participants, 183 held-out questions.
The construction-method grid covers three open-weight LLMs, five cumulative information depths ranked by normalized Shannon entropy, two embedding methods, and two reasoning modes.
Paper's experimental design specification (methods section).
We construct detailed individual-level twins from the German Socio-Economic Panel (SOEP) and evaluate them across a 3 × 5 × 2 × 2 construction-method grid.
Methodological description of the study: experimental construction and evaluation on SOEP data.
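The 3 × 5 × 2 × 2 grid implies 60 construction configurations. A short Python sketch of the enumeration follows. Only the axis sizes come from the claims above; every label is a placeholder, since the specific models, embedding methods, and reasoning modes are not named here.

```python
from itertools import product

# Axis sizes follow the stated grid; all labels are placeholders.
llms = ["llm_a", "llm_b", "llm_c"]       # three open-weight LLMs
depths = [1, 2, 3, 4, 5]                 # five cumulative information depths
embeddings = ["embed_a", "embed_b"]      # two embedding methods
reasoning_modes = ["mode_a", "mode_b"]   # two reasoning modes

grid = list(product(llms, depths, embeddings, reasoning_modes))
print(len(grid))  # 3 * 5 * 2 * 2 = 60
```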
These are mechanism-oriented synthetic results, not estimates of real firm behavior in a jurisdiction or industry.
Explicit qualification in the abstract stating the scope and limits of inference (paper text).
The study uses a synthetic agent-based reinforcement-learning simulation that separates actual conduct near a legal threshold from proximity in the computable enforcement signal.
Methodological description in abstract: ABM/RL simulation with explicit separation of conduct vs. computable signal; run counts reported (150 seed-level scenario runs, 378 computability-sweep runs, 288 Latin-hypercube runs) and a 2,880,000-row firm-period panel.
Ordinary adaptive updates do not reliably reduce boundary search.
ABM/RL simulation experiments reported in the paper (multiple runs and the firm-period panel); qualitative comparative statement from simulation outputs.
There is no evidence of improved win rates for AI-flagged complaints; AI-flagged complaints are more likely to be dismissed and to terminate at earlier procedural phases.
Outcome analysis linking AI-flag status to litigation outcomes (win rates, dismissal rates, termination phase) using case metadata.
A large-scale empirical study on Harvey LAB used 12,510 agent trajectories.
Paper states an empirical study run on Harvey LAB with a sample described as 12,510 agent trajectories.
The paper analyzes multiple dimensions of scientific creativity and impact, specifically recombinant novelty, object novelty, 3-year short-run citation impact, and 10-year long-run citation impact.
Methodological description in paper listing the specific dependent variables and time horizons used to measure novelty and impact.
The analysis draws on over one million publications from OpenAlex.
Descriptive statement in paper specifying dataset source (OpenAlex) and sample size of publications used for analysis.
This study uses panel data from 281 Chinese cities between 2005 and 2022, treats establishment of national GIPs as a quasi‑natural experiment, and applies a double machine learning approach.
Methods description in the paper explicitly states data coverage (281 Chinese cities, 2005–2022), research design (quasi‑natural experiment), and estimation strategy (double machine learning).
Experts rated 24 AI risks on harm probability and severity, sector and actor vulnerability, actor responsibility, and overall concern.
Study design described in paper: set of 24 defined AI risks rated across several dimensions by Delphi panel participants (n=272).
We conducted a three-round Delphi study in late 2025 with 272 international AI experts.
Methodological description in the paper: three-round Delphi study, timing reported as late 2025, sample size reported as 272 international AI experts.