Evidence (13827 claims)
| Topic | Claims |
|---|---|
| Adoption | 8454 |
| Productivity | 7544 |
| Governance | 6789 |
| Human-AI Collaboration | 6327 |
| Org Design | 4126 |
| Innovation | 4058 |
| Labor Markets | 3520 |
| Skills & Training | 2924 |
| Inequality | 2057 |
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 749 | 195 | 97 | 889 | 1979 |
| Governance & Regulation | 815 | 391 | 188 | 121 | 1539 |
| Organizational Efficiency | 771 | 189 | 124 | 83 | 1177 |
| Technology Adoption Rate | 624 | 233 | 123 | 96 | 1084 |
| Research Productivity | 410 | 121 | 56 | 331 | 929 |
| Output Quality | 466 | 177 | 59 | 47 | 749 |
| Decision Quality | 320 | 174 | 75 | 42 | 618 |
| Firm Productivity | 435 | 55 | 88 | 20 | 604 |
| AI Safety & Ethics | 214 | 276 | 65 | 33 | 593 |
| Market Structure | 178 | 166 | 122 | 24 | 495 |
| Task Allocation | 206 | 64 | 70 | 31 | 376 |
| Skill Acquisition | 165 | 57 | 60 | 17 | 299 |
| Innovation Output | 201 | 27 | 41 | 18 | 288 |
| Employment Level | 105 | 51 | 107 | 13 | 278 |
| Fiscal & Macroeconomic | 131 | 69 | 43 | 26 | 276 |
| Consumer Welfare | 116 | 63 | 42 | 11 | 232 |
| Firm Revenue | 149 | 46 | 26 | 3 | 224 |
| Inequality Measures | 44 | 122 | 49 | 6 | 221 |
| Task Completion Time | 169 | 29 | 8 | 12 | 219 |
| Worker Satisfaction | 89 | 61 | 20 | 12 | 182 |
| Error Rate | 69 | 91 | 10 | 2 | 172 |
| Regulatory Compliance | 76 | 68 | 14 | 5 | 163 |
| Training Effectiveness | 92 | 19 | 13 | 19 | 145 |
| Wages & Compensation | 77 | 36 | 25 | 6 | 144 |
| Automation Exposure | 51 | 54 | 22 | 12 | 142 |
| Team Performance | 86 | 17 | 27 | 9 | 140 |
| Developer Productivity | 94 | 17 | 14 | 6 | 132 |
| Job Displacement | 12 | 80 | 20 | 1 | 113 |
| Hiring & Recruitment | 51 | 7 | 8 | 3 | 69 |
| Skill Obsolescence | 5 | 45 | 6 | 1 | 57 |
| Creative Output | 31 | 16 | 7 | 2 | 57 |
| Social Protection | 27 | 16 | 8 | 2 | 53 |
| Labor Share of Income | 17 | 17 | 17 | — | 51 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
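As a reading aid for the matrix above, the sketch below turns a row's counts into direction shares. The three rows are copied from the table. Some row totals are larger than the sum of the four direction columns (Firm Productivity lists 604, but its columns sum to 598), so dividing by the column sum rather than the listed total is an assumption about the right denominator. The helper names `direction_shares` and `unclassified` are illustrative.

```python
# Rows copied from the Evidence Matrix: (positive, negative, mixed, null, total).
ROWS = {
    "Firm Productivity": (435, 55, 88, 20, 604),
    "Inequality Measures": (44, 122, 49, 6, 221),
    "Job Displacement": (12, 80, 20, 1, 113),
}


def direction_shares(positive, negative, mixed, null):
    """Return each direction's share of the claims that carry a direction."""
    counted = positive + negative + mixed + null
    return {
        "positive": positive / counted,
        "negative": negative / counted,
        "mixed": mixed / counted,
        "null": null / counted,
    }


def unclassified(positive, negative, mixed, null, total):
    """Claims in the listed total that no direction column accounts for."""
    return total - (positive + negative + mixed + null)


for name, (pos, neg, mix, nul, total) in ROWS.items():
    shares = direction_shares(pos, neg, mix, nul)
    gap = unclassified(pos, neg, mix, nul, total)
    print(f"{name}: positive {shares['positive']:.1%}, "
          f"negative {shares['negative']:.1%}, unclassified {gap}")
```

Read this way, Firm Productivity claims are mostly positive, while Job Displacement and Inequality Measures claims are mostly negative.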
PRISM successfully identifies and repairs production regressions caused by LLM behavioral drift within a 24-hour detection window.
Reported result in abstract claiming detection and repair of production regressions within 24 hours during deployment.
PRISM achieves 99% production reliability across all evaluated agents.
Reported quantitative outcome in abstract from the three-week evaluation over 35 agents.
PRISM reduces median prompt authoring time from 2 days to under 30 minutes.
Reported quantitative outcome in abstract from the three-week evaluation across the 35 agents.
We demonstrate its extraterritorial scope for gaining access to elements such as employment contracts and NDAs that have never been provided to the workers concerned.
Reported legal/empirical demonstration in paper: GDPR requests resulting in access to employment contracts and nondisclosure agreements (NDAs) that workers had not previously received. (Exact number of successful requests not stated in the excerpt.)
We audit the working conditions of content moderators in Kenya and Nigeria employed by business process outsourcing (BPO) companies by using the European General Data Protection Regulation (GDPR).
Method reported in paper: use of GDPR data-subject access / information requests to BPOs and platforms to obtain employment-related documents for content moderators in Kenya and Nigeria. (Sample size / number of requests not stated in the excerpt.)
Design principles that promote disagreement and decentralization—contextual grounding, community customization, continual adaptation, and polycentric governance—should be used so oversight is distributed across many legitimate centers rather than centralized in one institutional or moral chokepoint.
Normative design recommendations and governance proposals provided in the paper (argumentative; no empirical governance evaluation reported).
A range of technical directions (e.g., data filtering and upsampling, pre- and post-training, evaluations, collaborative value collection) are relevant for supporting positive alignment across different phases of the LLM and agent lifecycle.
Prescriptive technical recommendations and research directions described by the authors (conceptual proposals, not reported empirical tests).
Several existing failures of alignment (e.g., engagement hacking, loss of human autonomy, failures in truth-seeking, low epistemic humility, error correction, lack of diverse viewpoints, and being primarily reactive rather than proactive) may be better addressed through positive alignment, including cultivating virtues and maximizing human flourishing.
Theoretical argument and illustrative examples presented in the paper (no experimental or observational results reported).
Positive Alignment is a distinct and necessary agenda within AI alignment research.
Normative argumentation in the paper advocating for a separate research agenda (no empirical validation presented).
Positive Alignment is the development of AI systems that (i) actively support human and ecological flourishing in a pluralistic, polycentric, context-sensitive, and user-authored way while (ii) remaining safe and cooperative.
Paper's definitional proposal / conceptual framing (normative definition rather than empirical evidence).
Policy frameworks are necessary to govern verifiable machine intelligence in modern socio-technical infrastructures.
Normative recommendation and policy discussion in the paper; no empirical policy evaluation or legislative case studies are presented in the supplied text.
Process-based supervision has broader implications for algorithmic fairness and can reduce black-box opacity.
High-level discussion in the paper linking process-verifiability to fairness and reduced opacity; no empirical fairness audits or quantitative fairness metrics reported in the provided text.
Integrating reinforcement learning with process-oriented feedback can foster a more transparent AI ecosystem where the path to a conclusion is as scrutinized as the conclusion itself.
Conceptual claim and proposed benefit in the paper; presented as an argument rather than supported by empirical transparency or interpretability studies in the supplied text.
Process-based supervision significantly improves the reliability of models in high-stakes domains such as law, medicine, and engineering.
Asserted by the authors as an advantage of PRMs for high-stakes applications; presented as argumentation rather than backed by reported empirical trials or case-study sample sizes in the provided text.
Optimizing PRMs through reinforcement learning enhances the verifiability and robustness of multi-step reasoning in large-scale model architectures.
Central argumentative claim of the paper (theoretical proposal and conceptual analysis); no experimental results or quantitative evaluation provided in the text supplied.
Process-Based Reward Models (PRMs) assign value to each distinct stage of a reasoning chain, providing a more granular signal for training than outcome-only approaches.
Methodological description and conceptual argument in the paper; described as a design/approach rather than empirically validated with data.
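The difference between step-level and outcome-only signals can be illustrated with a minimal sketch. The per-step scores are hypothetical verifier outputs in [0, 1], and the function names and the 0.5 threshold are assumptions made for illustration; the excerpt does not describe the paper's actual reward model.

```python
from typing import List, Optional


def outcome_reward(num_steps: int, final_correct: bool) -> List[float]:
    """Outcome-only signal: every step inherits the final verdict."""
    reward = 1.0 if final_correct else 0.0
    return [reward] * num_steps


def process_reward(step_scores: List[float]) -> List[float]:
    """Process-based signal: each reasoning step keeps its own score."""
    return list(step_scores)


def first_faulty_step(step_scores: List[float],
                      threshold: float = 0.5) -> Optional[int]:
    """Index of the first step scored below the threshold, or None."""
    for index, score in enumerate(step_scores):
        if score < threshold:
            return index
    return None


# Hypothetical four-step chain whose third step is wrong.
scores = [0.9, 0.8, 0.2, 0.7]
print(outcome_reward(len(scores), final_correct=False))
print(process_reward(scores))
print(first_faulty_step(scores))
```

With the outcome-only signal, all four steps receive the same reward, so the training signal cannot say which step failed. The step-level scores localize the fault to index 2.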
Overall, the study provides a cross-sectoral empirical foundation for understanding how budget flexibility, governance, and technology interact to support resilient financial systems in uncertain economic environments.
Synthesis statement based on the paper's cross-sectoral comparative analysis combining firm 10-K data (four firms), Open Budget Survey, OECD database, GAO reports, and the Flexibility Index.
In the public sector, systems characterized by strong transparency frameworks and Medium-Term Expenditure Frameworks demonstrate higher alignment between planned and actual expenditures.
Cross-sectional analysis using Open Budget Survey 2023, OECD Budget Practices Database, and U.S. GAO oversight reports linking transparency and MTEFs to alignment between planned and actual expenditures.
Firms with decentralized budgeting structures and embedded predictive analytics exhibit lower forecast deviations and faster resource reallocation.
Comparative empirical analysis of four large firms using Form 10-K data (2019–2023) and the Flexibility Index to relate decentralization and AI integration to forecast deviations and reallocation speed.
Methodologically, the study demonstrates how expert reasoning can be operationalized as a benchmark for evaluating AI systems in urban infrastructure contexts, addressing gaps in empirical assessment and governance tools.
Study design: creation of Delphi-derived rubric from 20 experts and its use as an evaluation benchmark for six LLMs; reported as a methodological contribution.
The Delphi process elicited and refined expert reasoning criteria, producing a rubric that emphasized public safety, regulatory compliance, contextual judgment, financial stewardship, and system reliability.
Method: Delphi process with 20 infrastructure professionals that generated and refined reasoning criteria; resulting rubric content reported in paper.
Policymakers should combine support for technological development with strategic investments in finance, trade integration, and public infrastructure to maximize AI's economic benefits and transform its potential into sustainable and inclusive growth.
Policy recommendation derived from the empirical findings (positive AI effects and positive interactions with financial innovation, trade openness, and government consumption) reported for 19 G20 countries (2005–2023) using GMM.
The interaction between AI and government final consumption expenditure helps strengthen economic growth by improving public infrastructure, institutional quality, and capacity to leverage new technologies.
GMM interaction specifications using panel data for 19 G20 countries (2005–2023); reported AI × government final consumption expenditure interaction coefficient is positive and statistically significant, with interpretation linking it to public infrastructure and institutional capacity.
The interaction between AI and trade openness is positive and significant, underscoring the role of international trade in technological diffusion and competitiveness to boost growth.
GMM interaction models on panel data (19 G20 countries, 2005–2023); reported AI × trade openness interaction coefficient is positive and statistically significant.
The interaction between AI and financial innovation has a positive and significant impact on economic growth, indicating that innovative finance mediates AI's technological potential into tangible economic gains.
GMM models with interaction terms using panel data of 19 G20 countries (2005–2023); reported AI × financial innovation interaction coefficient is positive and statistically significant.
AI-related innovation has a positive and significant effect on economic growth (linear model, GMM).
Panel analysis of 19 G20 countries (2005–2023) using the Generalized Method of Moments (GMM) linear model; reported positive and statistically significant coefficient for AI-related innovation.
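For orientation, the interaction claims above can be read against a generic dynamic-panel specification. The form below is an illustrative assumption, not the paper's reported equation: the lagged growth term, the control vector $X_{it}$, and the country effect $\mu_i$ are standard in GMM growth regressions but are not confirmed by the excerpt. $Z_{it}$ stands for one of the three moderators (financial innovation, trade openness, or government final consumption expenditure).

```latex
g_{it} = \alpha\, g_{i,t-1} + \beta_1\, \mathrm{AI}_{it} + \beta_2\, Z_{it}
       + \beta_3\, (\mathrm{AI}_{it} \times Z_{it}) + \gamma^{\top} X_{it}
       + \mu_i + \varepsilon_{it},
\qquad
\frac{\partial g_{it}}{\partial \mathrm{AI}_{it}} = \beta_1 + \beta_3\, Z_{it}
```

Under this reading, a positive and significant $\beta_3$ means the growth effect of AI rises with the level of the moderator, which is how the three interaction claims are interpreted.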
In an empirical study of the Community Health Centers rollout, estimated spillovers account for a substantial share of the effect on older-adult mortality.
Empirical application reported in the paper applying the proposed methods to the Community Health Centers rollout; estimated spillover component contributes substantially to the measured effect on older-adult mortality (results from observational data analysis).
Monte Carlo simulations show the proposed estimators have small bias for these effects and the associated confidence intervals have coverage close to the nominal level.
Monte Carlo simulation evidence reported in the paper indicating small bias of the proposed estimators and coverage of confidence intervals close to nominal in the simulated settings.
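The kind of check described here (bias and confidence-interval coverage under simulation) can be sketched as follows. The sample mean stands in for the paper's estimators, which the excerpt does not describe; the data-generating process, sample size, and replication count are likewise assumptions.

```python
import math
import random
import statistics


def simulate(true_mean=2.0, sd=1.0, n=50, reps=2000, seed=0):
    """Estimate bias and 95% CI coverage of the sample mean by simulation."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    estimates = []
    covered = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        estimate = statistics.fmean(sample)
        std_error = statistics.stdev(sample) / math.sqrt(n)
        lower = estimate - 1.96 * std_error
        upper = estimate + 1.96 * std_error
        # Count the replication if the interval contains the true value.
        covered += lower <= true_mean <= upper
        estimates.append(estimate)
    bias = statistics.fmean(estimates) - true_mean
    return bias, covered / reps


bias, coverage = simulate()
print(f"bias={bias:.4f}, coverage={coverage:.3f}")
```

A well-behaved estimator shows bias near zero and coverage near the nominal 0.95, which is the pattern the claim reports for the proposed estimators.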
Synthetic scenarios in the paper illustrate that the revised metric distinguishes between frequent low-leverage use, semantically repetitive prompting, and more autonomous, higher-consequence AI-assisted work.
Paper includes synthetic scenario simulations/illustrations demonstrating metric behavior across different usage patterns (synthetic examples; no real-world sample reported).
The authors derive sub-daily update rules and a bounded interpretation layer for estimated efficiency and financial impact from the IIQ metric.
Analytic derivation in the methods: paper presents update rules (sub-daily) and an interpretation layer mapping IIQ to estimated efficiency and financial impact (theoretical derivation / worked examples). No empirical validation sample reported.
The formulation produces a raw Intelligence Adoption Index (IAI) and a normalized 0-1000 IIQ index for comparison between heterogeneous users and units.
Methodological description: authors define a raw IAI and describe normalization to a 0–1000 IIQ scale for comparability (model/specification). No empirical sample reported.
IIQ combines a novelty-weighted, time-decayed token stock with usage frequency, a grace-period recency gate, organizational leverage, task complexity, and autonomy to form its measurement.
Methodological formulation in the paper: component-level specification of the IIQ metric (theoretical specification / algorithmic description). No empirical validation sample reported.
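Two of the listed ingredients, a novelty-weighted, time-decayed token stock and a bounded 0-1000 scale, can be sketched as below. The excerpt gives no formulas, so everything here is an assumption for illustration: the exponential decay, the 30-day half-life, the saturating normalization, and both function names. The other components (frequency, recency gate, leverage, complexity, autonomy) are omitted.

```python
import math


def decayed_token_stock(events, now, half_life_days=30.0):
    """Sum tokens weighted by novelty and by exponential decay with age.

    events: iterable of (day, tokens, novelty), with novelty in [0, 1].
    The decay form and half-life are assumptions, not the paper's values.
    """
    rate = math.log(2) / half_life_days
    return sum(
        tokens * novelty * math.exp(-rate * (now - day))
        for day, tokens, novelty in events
    )


def to_bounded_index(raw, scale=1000.0, midpoint=10_000.0):
    """Map a non-negative raw score into [0, scale) with a saturating curve.

    A raw score equal to `midpoint` maps to half of `scale`. This curve is
    a stand-in for the paper's normalization, which the excerpt omits.
    """
    return scale * raw / (raw + midpoint)


# Hypothetical usage log: (day, tokens, novelty).
events = [(0, 8000, 1.0), (30, 4000, 0.5), (60, 6000, 1.0)]
raw = decayed_token_stock(events, now=60)
print(round(raw, 1), round(to_bounded_index(raw), 1))
```

The decay makes old usage count for less, the novelty weight discounts repetitive prompting, and the bounded map keeps heavy users comparable on one scale.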
The Intelligence Impact Quotient (IIQ) is a composite metric intended to quantify the depth to which AI systems are integrated into organizational work and their impact.
Paper framing and definition: the authors introduce IIQ as a composite metric and describe its purpose as quantifying AI integration depth and impact (conceptual/methodological description). No empirical sample reported.
Experiments show consistent advantages in viewer engagement.
Reported experimental comparison vs named baselines; claim of consistent advantage in viewer engagement without numeric effect size provided in the excerpt.
Experiments show consistent advantages in tactfulness.
Reported experimental comparison vs named baselines; claim of consistent advantage in tactfulness without numeric effect size provided in the excerpt.
Experiments against GPT-5.4, Claude Sonnet 4.6, Gemini 3.1 Pro, and other baselines demonstrate gains of 18% on factual correctness.
Reported experimental comparison vs named baselines; specific numeric improvement stated (18% gain on factual correctness). Evaluation dataset or sample size not provided in the excerpt.
Experiments against GPT-5.4, Claude Sonnet 4.6, Gemini 3.1 Pro, and other baselines demonstrate gains of 23% on informativeness.
Reported experimental comparison vs named baselines; specific numeric improvement stated (23% gain on informativeness). Evaluation dataset or sample size not provided in the excerpt.
We fine-tune a large language model on this data to deliver empathetic, commercially oriented responses, adapting to viewer intent through empathetic amplification, evidence-backed rebuttal, and humor-mediated deflection.
Methodological contribution: fine-tuning an LLM on the collected annotated data, described in the paper.
We collect and annotate 1,475 live-commerce interactions spanning diverse viewer intents.
Dataset creation reported in the methods: explicitly states 1,475 annotated live-commerce interactions.
We construct a domain knowledge base of product specifications and a curated sales terminology lexicon that anchor product-related responses in verified expertise.
Methodological contribution described in the paper: construction of a domain knowledge base and curated sales terminology lexicon.
A skilled live-commerce host is not merely a narrator, but a sales agent who converts viewer curiosity into purchase intent through expert product knowledge, emotionally intelligent response tactics, and entertainment that serves as a vehicle for product exposure.
Conceptual description in the paper's introduction; no empirical data or experimental method cited in the excerpt.
The document contributes to the ongoing efforts of the G7 and the OECD to promote the diffusion of innovative, trustworthy, and productivity-enhancing AI in line with the OECD AI Principles.
Descriptive claim about the paper's intended contribution to G7/OECD efforts and alignment with OECD AI Principles (self-declared by the paper).
The findings underscore that governments should support strategies that accelerate AI adoption in SMEs and foster a digital transformation that benefits everyone.
Policy recommendation based on the paper's synthesis of data analysis and case studies; presented as the paper's conclusion (no causal estimate provided in excerpt).
The document introduces a taxonomy of AI-using SMEs based on digital maturity, complexity of use, and scope of application, which aims to support policymaking.
Descriptive statement that the paper develops a taxonomy (method: taxonomy construction based on those three dimensions); presented as part of the paper's contributions (no empirical validation details in excerpt).
This discussion paper was prepared by the OECD Secretariat at the request of the G7 Presidency to provide background material for the G7's discussions on a blueprint for AI adoption in SMEs.
Descriptive statement of the paper's provenance and purpose (administrative/factual about document preparation).
Under Canada's 2025 G7 Presidency, accelerating AI adoption in SMEs was declared a top priority.
Factual statement about G7 policy priorities as reported by the paper (administrative/policy fact reported by OECD secretariat).
Artificial intelligence (AI) is a promising approach to boosting productivity and innovation in businesses, particularly small and medium-sized enterprises (SMEs).
Authoritative statement in the paper's summary; based on literature review and general argumentation rather than a specific empirical test reported in this excerpt.
The framework contributes to improving understanding of enterprise coordination and governance under constrained legal conditions and offers a basis for future analytical and empirical research.
Author-stated contribution of the paper based on the developed theoretical framework; positioned as foundation for future work.
The analysis identifies theoretical conditions under which such governance may support verifiable integrity, adaptive compliance, and access to formal markets.
Theoretical conditions derived from the review and theory synthesis (no empirical testing reported in this paper).
The study develops a theory-based framework explaining how RegTech-supported governance may, under specified conditions, enable sanctions-safe enterprise ecosystems during post-conflict reconstruction.
Primary contribution of the paper: theory synthesis built from integrative review of five literature streams (RegTech, sanctions compliance, institutional voids, supply-chain governance, algorithmic accountability).