Evidence (13827 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome	Positive	Negative	Mixed	Null	Total
Other	749	195	97	889	1979
Governance & Regulation	815	391	188	121	1539
Organizational Efficiency	771	189	124	83	1177
Technology Adoption Rate	624	233	123	96	1084
Research Productivity	410	121	56	331	929
Output Quality	466	177	59	47	749
Decision Quality	320	174	75	42	618
Firm Productivity	435	55	88	20	604
AI Safety & Ethics	214	276	65	33	593
Market Structure	178	166	122	24	495
Task Allocation	206	64	70	31	376
Skill Acquisition	165	57	60	17	299
Innovation Output	201	27	41	18	288
Employment Level	105	51	107	13	278
Fiscal & Macroeconomic	131	69	43	26	276
Consumer Welfare	116	63	42	11	232
Firm Revenue	149	46	26	3	224
Inequality Measures	44	122	49	6	221
Task Completion Time	169	29	8	12	219
Worker Satisfaction	89	61	20	12	182
Error Rate	69	91	10	2	172
Regulatory Compliance	76	68	14	5	163
Training Effectiveness	92	19	13	19	145
Wages & Compensation	77	36	25	6	144
Automation Exposure	51	54	22	12	142
Team Performance	86	17	27	9	140
Developer Productivity	94	17	14	6	132
Job Displacement	12	80	20	1	113
Hiring & Recruitment	51	7	8	3	69
Skill Obsolescence	5	45	6	1	57
Creative Output	31	16	7	2	57
Social Protection	27	16	8	2	53
Labor Share of Income	17	17	17	—	51
Worker Turnover	11	12	—	3	26
Industry	—	—	—	1	1
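
As a reading aid, the directional shares for any row can be worked out from the counts above. The sketch below copies three rows from the matrix; the function names are illustrative and not part of the source. For some rows the four direction columns sum to less than Total (for example, Firm Productivity: 435 + 55 + 88 + 20 = 598 against a Total of 604), so shares are taken over the directional sum rather than Total. That choice is an assumption.

```python
# Rows copied from the matrix: (positive, negative, mixed, null, total).
ROWS = {
    "Firm Productivity": (435, 55, 88, 20, 604),
    "Job Displacement": (12, 80, 20, 1, 113),
    "Inequality Measures": (44, 122, 49, 6, 221),
}


def direction_shares(counts):
    """Return (positive, negative, mixed, null) shares of the directional sum."""
    *directions, _total = counts
    denom = sum(directions)
    return tuple(c / denom for c in directions)


def outside_direction_columns(counts):
    """Claims counted in Total but in none of the four direction columns."""
    *directions, total = counts
    return total - sum(directions)


for name, counts in ROWS.items():
    pos, neg, _mixed, _null = direction_shares(counts)
    print(
        f"{name}: positive {pos:.1%}, negative {neg:.1%}, "
        f"outside the four columns: {outside_direction_columns(counts)}"
    )
```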

PRISM successfully identifies and repairs production regressions caused by LLM behavioral drift within a 24-hour detection window.

Reported result in abstract claiming detection and repair of production regressions within 24 hours during deployment.

high positive PRISM: Prompt Reliability via Iterative Simulation and Monit... time-to-detection/repair of production regressions

PRISM achieves 99% production reliability across all evaluated agents.

Reported quantitative outcome in abstract from the three-week evaluation over 35 agents.

high positive PRISM: Prompt Reliability via Iterative Simulation and Monit... production reliability

PRISM reduces median prompt authoring time from 2 days to under 30 minutes.

Reported quantitative outcome in abstract from the three-week evaluation across the 35 agents.

high positive PRISM: Prompt Reliability via Iterative Simulation and Monit... median prompt authoring time

We demonstrate the GDPR's extraterritorial scope for gaining access to elements such as employment contracts and NDAs that have never been provided to the workers concerned.

Reported legal/empirical demonstration in paper: GDPR requests resulting in access to employment contracts and nondisclosure agreements (NDAs) that workers had not previously received. (Exact number of successful requests not stated in the excerpt.)

high positive Auditing African Content Moderators' Working Conditions by U... access to employment contracts and NDAs via GDPR (extraterritorial application)

We audit the working conditions of content moderators in Kenya and Nigeria employed by business process outsourcing (BPO) companies by using the European General Data Protection Regulation (GDPR).

Method reported in paper: use of GDPR data-subject access / information requests to BPOs and platforms to obtain employment-related documents for content moderators in Kenya and Nigeria. (Sample size / number of requests not stated in the excerpt.)

high positive Auditing African Content Moderators' Working Conditions by U... use of GDPR to access employment-related documents for content moderators

Design principles that promote disagreement and decentralization—contextual grounding, community customization, continual adaptation, and polycentric governance—should be used so oversight is distributed across many legitimate centers rather than centralized in one institutional or moral chokepoint.

Normative design recommendations and governance proposals provided in the paper (argumentative; no empirical governance evaluation reported).

high positive Positive Alignment: Artificial Intelligence for Human Flouri... promotion of disagreement and decentralization in AI oversight/governance

A range of technical directions (e.g., data filtering and upsampling, pre- and post-training, evaluations, collaborative value collection) are relevant for supporting positive alignment across different phases of the LLM and agent lifecycle.

Prescriptive technical recommendations and research directions described by the authors (conceptual proposals, not reported empirical tests).

high positive Positive Alignment: Artificial Intelligence for Human Flouri... applicability of listed technical interventions to LLM/agent lifecycle for posit...

Several existing failures of alignment (e.g., engagement hacking, loss of human autonomy, failures in truth-seeking, low epistemic humility, error correction, lack of diverse viewpoints, and being primarily reactive rather than proactive) may be better addressed through positive alignment, including cultivating virtues and maximizing human flourishing.

Theoretical argument and illustrative examples presented in the paper (no experimental or observational results reported).

high positive Positive Alignment: Artificial Intelligence for Human Flouri... mitigation of specific alignment failures (engagement hacking, autonomy loss, tr...

Positive Alignment is a distinct and necessary agenda within AI alignment research.

Normative argumentation in the paper advocating for a separate research agenda (no empirical validation presented).

high positive Positive Alignment: Artificial Intelligence for Human Flouri... need for a distinct research agenda in alignment

Positive Alignment is the development of AI systems that (i) actively support human and ecological flourishing in a pluralistic, polycentric, context-sensitive, and user-authored way while (ii) remaining safe and cooperative.

Paper's definitional proposal / conceptual framing (normative definition rather than empirical evidence).

high positive Positive Alignment: Artificial Intelligence for Human Flouri... definition and intended properties of 'Positive Alignment' systems

Policy frameworks are necessary to govern verifiable machine intelligence in modern socio-technical infrastructures.

Normative recommendation and policy discussion in the paper; no empirical policy evaluation or legislative case studies are presented in the supplied text.

high positive Optimizing Process Based Reward Models through Reinforcement... existence/need for governance and regulation

Process-based supervision has broader implications for algorithmic fairness and can reduce black-box opacity.

High-level discussion in the paper linking process-verifiability to fairness and reduced opacity; no empirical fairness audits or quantitative fairness metrics reported in the provided text.

high positive Optimizing Process Based Reward Models through Reinforcement... algorithmic fairness / model opacity

Integrating reinforcement learning with process-oriented feedback can foster a more transparent AI ecosystem where the path to a conclusion is as scrutinized as the conclusion itself.

Conceptual claim and proposed benefit in the paper; presented as an argument rather than supported by empirical transparency or interpretability studies in the supplied text.

high positive Optimizing Process Based Reward Models through Reinforcement... transparency / interpretability of model reasoning

Process-based supervision significantly improves the reliability of models in high-stakes domains such as law, medicine, and engineering.

Asserted by the authors as an advantage of PRMs for high-stakes applications; presented as argumentation rather than backed by reported empirical trials or case-study sample sizes in the provided text.

high positive Optimizing Process Based Reward Models through Reinforcement... model reliability in high-stakes domains

Optimizing PRMs through reinforcement learning enhances the verifiability and robustness of multi-step reasoning in large-scale model architectures.

Central argumentative claim of the paper (theoretical proposal and conceptual analysis); no experimental results or quantitative evaluation provided in the text supplied.

high positive Optimizing Process Based Reward Models through Reinforcement... verifiability and robustness of multi-step reasoning

Process-Based Reward Models (PRMs) assign value to each distinct stage of a reasoning chain, providing a more granular signal for training than outcome-only approaches.

Methodological description and conceptual argument in the paper; described as a design/approach rather than empirically validated with data.

high positive Optimizing Process Based Reward Models through Reinforcement... training signal granularity / training effectiveness
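
The contrast drawn in the entry above, between a per-step signal and an outcome-only signal, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the toy scorer that stands in for a learned PRM, and the convention of placing the outcome reward on the final step. None of it is taken from the paper.

```python
from typing import Callable, List


def outcome_rewards(steps: List[str], final_correct: bool) -> List[float]:
    """Outcome-only signal: zero everywhere except the final step."""
    rewards = [0.0] * len(steps)
    if steps:
        rewards[-1] = 1.0 if final_correct else 0.0
    return rewards


def process_rewards(
    steps: List[str], score_step: Callable[[List[str], str], float]
) -> List[float]:
    """Process-based signal: one score per step, given the steps before it."""
    return [score_step(steps[:i], step) for i, step in enumerate(steps)]


def toy_scorer(prefix: List[str], step: str) -> float:
    """Hypothetical stand-in for a learned PRM: flags steps marked as wrong."""
    return 0.0 if "ERROR" in step else 1.0


chain = ["2 + 3 = 5", "5 * 4 = 21 ERROR", "21 - 1 = 20"]
print(outcome_rewards(chain, final_correct=False))  # [0.0, 0.0, 0.0]
print(process_rewards(chain, toy_scorer))           # [1.0, 0.0, 1.0]
```

With the outcome-only signal, the failed chain yields the same zero reward at every step; the per-step signal localizes the fault to the second step.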

Overall, the study provides a cross-sectoral empirical foundation for understanding how budget flexibility, governance, and technology interact to support resilient financial systems in uncertain economic environments.

Synthesis statement based on the paper's cross-sectoral comparative analysis combining firm 10-K data (four firms), Open Budget Survey, OECD database, GAO reports, and the Flexibility Index.

high positive Budgeting for Agility: A Cross-Sectoral Analysis of Fiscal F... resilience of financial systems to uncertainty

In the public sector, systems characterized by strong transparency frameworks and Medium-Term Expenditure Frameworks demonstrate higher alignment between planned and actual expenditures.

Cross-sectional analysis using Open Budget Survey 2023, OECD Budget Practices Database, and U.S. GAO oversight reports linking transparency and MTEFs to alignment between planned and actual expenditures.

high positive Budgeting for Agility: A Cross-Sectoral Analysis of Fiscal F... alignment between planned and actual expenditures (forecast/policy alignment)

Firms with decentralized budgeting structures and embedded predictive analytics exhibit lower forecast deviations and faster resource reallocation.

Comparative empirical analysis of four large firms using Form 10-K data (2019–2023) and the Flexibility Index to relate decentralization and AI integration to forecast deviations and reallocation speed.

high positive Budgeting for Agility: A Cross-Sectoral Analysis of Fiscal F... forecast deviation (predictive alignment) and speed of resource reallocation

Methodologically, the study demonstrates how expert reasoning can be operationalized as a benchmark for evaluating AI systems in urban infrastructure contexts, addressing gaps in empirical assessment and governance tools.

Study design: creation of Delphi-derived rubric from 20 experts and its use as an evaluation benchmark for six LLMs; reported as a methodological contribution.

high positive Governance risks of AI reasoning in urban infrastructure thr... feasibility of operationalizing expert reasoning as evaluation benchmark

The Delphi process elicited and refined expert reasoning criteria, producing a rubric that emphasized public safety, regulatory compliance, contextual judgment, financial stewardship, and system reliability.

Method: Delphi process with 20 infrastructure professionals that generated and refined reasoning criteria; resulting rubric content reported in paper.

high positive Governance risks of AI reasoning in urban infrastructure thr... content/themes of the expert-derived rubric

Policymakers should combine support for technological development with strategic investments in finance, trade integration, and public infrastructure to maximize AI's economic benefits and transform its potential into sustainable and inclusive growth.

Policy recommendation derived from the empirical findings (positive AI effects and positive interactions with financial innovation, trade openness, and government consumption) reported for 19 G20 countries (2005–2023) using GMM.

high positive Artificial intelligence and economic growth in G20 economies... economic growth (implied)

The interaction between AI and government final consumption expenditure helps strengthen economic growth by improving public infrastructure, institutional quality, and capacity to leverage new technologies.

GMM interaction specifications using panel data for 19 G20 countries (2005–2023); reported AI × government final consumption expenditure interaction coefficient is positive and statistically significant, with interpretation linking it to public infrastructure and institutional capacity.

high positive Artificial intelligence and economic growth in G20 economies... economic growth

The interaction between AI and trade openness is positive and significant, underscoring the role of international trade in technological diffusion and competitiveness to boost growth.

GMM interaction models on panel data (19 G20 countries, 2005–2023); reported AI × trade openness interaction coefficient is positive and statistically significant.

high positive Artificial intelligence and economic growth in G20 economies... economic growth

The interaction between AI and financial innovation has a positive and significant impact on economic growth, indicating that innovative finance mediates AI's technological potential into tangible economic gains.

GMM models with interaction terms using panel data of 19 G20 countries (2005–2023); reported AI × financial innovation interaction coefficient is positive and statistically significant.

high positive Artificial intelligence and economic growth in G20 economies... economic growth

AI-related innovation has a positive and significant effect on economic growth (linear model, GMM).

Panel analysis of 19 G20 countries (2005–2023) using the Generalized Method of Moments (GMM) linear model; reported positive and statistically significant coefficient for AI-related innovation.

high positive Artificial intelligence and economic growth in G20 economies... economic growth
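
To make the interaction findings in the G20 entries above concrete, a generic interaction specification can be written as follows. This is a sketch under stated assumptions, not the paper's estimating equation: the symbols, the control vector, and the country effect are illustrative, and the excerpt does not report the instrument set or whether a lagged dependent variable is included.

```latex
% Illustrative specification only; notation is assumed, not taken from the paper.
% g_{it}: growth of country i in year t;  AI_{it}: AI-related innovation;
% Z_{it}: one moderator (financial innovation, trade openness, or
%         government final consumption expenditure);
% X_{it}: controls;  \mu_i: country effect.
\begin{align}
  g_{it} &= \alpha + \beta\,\mathrm{AI}_{it} + \gamma\,Z_{it}
            + \delta\,(\mathrm{AI}_{it} \times Z_{it})
            + \theta^{\top} X_{it} + \mu_i + \varepsilon_{it}, \\
  \frac{\partial g_{it}}{\partial \mathrm{AI}_{it}} &= \beta + \delta\,Z_{it}.
\end{align}
```

Under this reading, a positive and significant \delta means the marginal effect of AI on growth rises with the moderator Z. The linear model corresponds to dropping the interaction term.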

In an empirical study of the Community Health Centers rollout, estimated spillovers account for a substantial share of the effect on older-adult mortality.

Empirical application reported in the paper applying the proposed methods to the Community Health Centers rollout; estimated spillover component contributes substantially to the measured effect on older-adult mortality (results from observational data analysis).

high positive Identification and Estimation of Staggered Difference-in-Dif... older-adult mortality

Monte Carlo simulations show the proposed estimators have small bias for these effects and the associated confidence intervals have coverage close to the nominal level.

Monte Carlo simulation evidence reported in the paper indicating small bias of the proposed estimators and coverage of confidence intervals close to nominal in the simulated settings.

high positive Identification and Estimation of Staggered Difference-in-Dif... estimator bias and confidence interval coverage
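
For readers unfamiliar with how bias and coverage are assessed by Monte Carlo, the following sketch shows the general procedure on a deliberately simple case: a sample mean with a normal-approximation 95% interval. It does not implement the paper's staggered difference-in-differences estimators; the data-generating process, function name, and parameters are all assumptions chosen for illustration.

```python
import math
import random
import statistics


def bias_and_coverage(true_mean: float, n: int, reps: int, seed: int = 0):
    """Simulate `reps` samples of size `n`; return (bias, CI coverage rate)."""
    rng = random.Random(seed)
    z = 1.96  # normal critical value for a nominal 95% interval
    estimates = []
    covered = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        estimate = statistics.fmean(sample)
        std_err = statistics.stdev(sample) / math.sqrt(n)
        lower, upper = estimate - z * std_err, estimate + z * std_err
        covered += lower <= true_mean <= upper
        estimates.append(estimate)
    bias = statistics.fmean(estimates) - true_mean
    return bias, covered / reps


bias, coverage = bias_and_coverage(true_mean=2.0, n=100, reps=1000)
print(f"bias = {bias:+.4f}, coverage = {coverage:.3f} (nominal 0.95)")
```

"Small bias" then means the first number is close to zero, and "coverage close to nominal" means the second is close to 0.95.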

Synthetic scenarios in the paper illustrate that the revised metric distinguishes between frequent low-leverage use, semantically repetitive prompting, and more autonomous, higher-consequence AI-assisted work.

Paper includes synthetic scenario simulations/illustrations demonstrating metric behavior across different usage patterns (synthetic examples; no real-world sample reported).

high positive Intelligence Impact Quotient (IIQ): A Framework for Measurin... ability of the metric to discriminate types of AI use

The authors derive sub-daily update rules and a bounded interpretation layer for estimated efficiency and financial impact from the IIQ metric.

Analytic derivation in the methods: paper presents update rules (sub-daily) and an interpretation layer mapping IIQ to estimated efficiency and financial impact (theoretical derivation / worked examples). No empirical validation sample reported.

high positive Intelligence Impact Quotient (IIQ): A Framework for Measurin... estimated efficiency and financial impact

The formulation produces a raw Intelligence Adoption Index (IAI) and a normalized 0-1000 IIQ index for comparison between heterogeneous users and units.

Methodological description: authors define a raw IAI and describe normalization to a 0–1000 IIQ scale for comparability (model/specification). No empirical sample reported.

high positive Intelligence Impact Quotient (IIQ): A Framework for Measurin... normalized adoption/index score across users/units

IIQ combines a novelty-weighted, time-decayed token stock with usage frequency, a grace-period recency gate, organizational leverage, task complexity, and autonomy to form its measurement.

Methodological formulation in the paper: component-level specification of the IIQ metric (theoretical specification / algorithmic description). No empirical validation sample reported.

high positive Intelligence Impact Quotient (IIQ): A Framework for Measurin... operationalization of AI usage (components driving the metric)
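
One way to picture the first two components named above, the novelty-weighted, time-decayed token stock and the grace-period recency gate, is the sketch below. The excerpt does not give the paper's functional forms, so the exponential decay, the half-life parameter, the hard 0/1 gate, and all names are assumptions made purely for illustration.

```python
import math
from dataclasses import dataclass
from typing import Iterable


@dataclass
class UsageEvent:
    day: float      # when the interaction happened, in days
    tokens: int     # tokens exchanged in the interaction
    novelty: float  # assumed weight in [0, 1]; 1.0 means entirely new content


def decayed_token_stock(
    events: Iterable[UsageEvent], now: float, half_life_days: float
) -> float:
    """Sum of novelty-weighted tokens, each discounted by its age.

    Exponential decay is an assumed form, not the paper's definition.
    """
    decay_rate = math.log(2) / half_life_days
    return sum(
        e.novelty * e.tokens * math.exp(-decay_rate * (now - e.day))
        for e in events
        if e.day <= now
    )


def recency_gate(days_since_last_use: float, grace_days: float) -> float:
    """Assumed hard gate: 1.0 within the grace period, 0.0 afterwards."""
    return 1.0 if days_since_last_use <= grace_days else 0.0


events = [UsageEvent(day=0, tokens=1000, novelty=1.0),
          UsageEvent(day=7, tokens=400, novelty=0.5)]
stock = decayed_token_stock(events, now=7, half_life_days=7)
print(round(stock, 2), recency_gate(days_since_last_use=0, grace_days=3))
```

In this example the week-old event contributes half of its 1,000 tokens and the same-day event contributes its 200 novelty-weighted tokens, so repeated or stale usage counts for less than fresh, novel usage.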

The Intelligence Impact Quotient (IIQ) is a composite metric intended to quantify the depth to which AI systems are integrated into organizational work and their impact.

Paper framing and definition: the authors introduce IIQ as a composite metric and describe its purpose as quantifying AI integration depth and impact (conceptual/methodological description). No empirical sample reported.

high positive Intelligence Impact Quotient (IIQ): A Framework for Measurin... depth of AI integration into work / AI impact

Experiments show consistent advantages in viewer engagement.

Reported experimental comparison vs named baselines; claim of consistent advantage in viewer engagement without numeric effect size provided in the excerpt.

high positive VerbalValue: A Socially Intelligent Virtual Host for Sales-D... viewer engagement

Experiments show consistent advantages in tactfulness.

Reported experimental comparison vs named baselines; claim of consistent advantage in tactfulness without numeric effect size provided in the excerpt.

high positive VerbalValue: A Socially Intelligent Virtual Host for Sales-D... tactfulness

Experiments against GPT-5.4, Claude Sonnet 4.6, Gemini 3.1 Pro, and other baselines demonstrate gains of 18% on factual correctness.

Reported experimental comparison vs named baselines; specific numeric improvement stated (18% gain on factual correctness). Evaluation dataset or sample size not provided in the excerpt.

high positive VerbalValue: A Socially Intelligent Virtual Host for Sales-D... factual correctness

Experiments against GPT-5.4, Claude Sonnet 4.6, Gemini 3.1 Pro, and other baselines demonstrate gains of 23% on informativeness.

Reported experimental comparison vs named baselines; specific numeric improvement stated (23% gain on informativeness). Evaluation dataset or sample size not provided in the excerpt.

high positive VerbalValue: A Socially Intelligent Virtual Host for Sales-D... informativeness

We fine-tune a large language model on this data to deliver empathetic, commercially oriented responses, adapting to viewer intent through empathetic amplification, evidence-backed rebuttal, and humor-mediated deflection.

Methodological contribution: fine-tuning an LLM on the collected annotated data, described in the paper.

high positive VerbalValue: A Socially Intelligent Virtual Host for Sales-D... ability to produce empathetic, commercially oriented responses

We collect and annotate 1,475 live-commerce interactions spanning diverse viewer intents.

Dataset creation reported in the methods: explicitly states 1,475 annotated live-commerce interactions.

high positive VerbalValue: A Socially Intelligent Virtual Host for Sales-D... size of annotated dataset

We construct a domain knowledge base of product specifications and a curated sales terminology lexicon that anchor product-related responses in verified expertise.

Methodological contribution described in the paper: construction of a domain knowledge base and curated sales terminology lexicon.

high positive VerbalValue: A Socially Intelligent Virtual Host for Sales-D... availability of domain knowledge and sales lexicon (artifact creation)

A skilled live-commerce host is not merely a narrator, but a sales agent who converts viewer curiosity into purchase intent through expert product knowledge, emotionally intelligent response tactics, and entertainment that serves as a vehicle for product exposure.

Conceptual description in the paper's introduction; no empirical data or experimental method cited in the excerpt.

high positive VerbalValue: A Socially Intelligent Virtual Host for Sales-D... purchase intent / sales conversion

The document contributes to ongoing G7 and OECD efforts to promote the diffusion of innovative, trustworthy, and productivity-enhancing AI in line with the OECD AI Principles.

Descriptive claim about the paper's intended contribution to G7/OECD efforts and alignment with OECD AI Principles (self-declared by the paper).

high positive Einführung von KI in kleinen und mittleren Unternehmen policy-aligned promotion of trustworthy, productivity-enhancing AI

The findings underline that governments should support strategies that accelerate AI adoption in SMEs and foster a digital transformation that benefits everyone.

Policy recommendation based on the paper's synthesis of data analysis and case studies; presented as the paper's conclusion (no causal estimate provided in excerpt).

high positive Einführung von KI in kleinen und mittleren Unternehmen effectiveness of government strategies to accelerate AI adoption in SMEs /...

The document introduces a taxonomy of AI-using SMEs based on digital maturity, complexity of use, and scope of application, intended to support policymaking.

Descriptive statement that the paper develops a taxonomy (method: taxonomy construction based on those three dimensions); presented as part of the paper's contributions (no empirical validation details in excerpt).

high positive Einführung von KI in kleinen und mittleren Unternehmen categorization/taxonomy of AI-using SMEs

This discussion paper was prepared by the OECD Secretariat at the request of the G7 Presidency to provide background material for the G7 discussions on a blueprint for AI adoption in SMEs.

Descriptive statement of the paper's provenance and purpose (administrative/factual about document preparation).

high positive Einführung von KI in kleinen und mittleren Unternehmen preparation and purpose of the discussion paper

Under Canada's 2025 G7 Presidency, accelerating AI adoption in SMEs was declared a top priority.

Factual statement about G7 policy priorities as reported by the paper (administrative/policy fact reported by OECD secretariat).

high positive Einführung von KI in kleinen und mittleren Unternehmen political/policy prioritization of AI adoption in SMEs

Artificial intelligence (AI) is a promising approach to increasing productivity and innovation in firms, particularly small and medium-sized enterprises (SMEs).

Authoritative statement in the paper's summary; based on literature review and general argumentation rather than a specific empirical test reported in this excerpt.

high positive Einführung von KI in kleinen und mittleren Unternehmen productivity and innovation in firms (particularly SMEs)

The framework contributes to improving understanding of enterprise coordination and governance under constrained legal conditions and offers a basis for future analytical and empirical research.

Author-stated contribution of the paper based on the developed theoretical framework; positioned as foundation for future work.

high positive RegTech-enabled governance of sanctions-safe enterprise ecos... conceptual contribution to understanding enterprise coordination and governance

The analysis identifies theoretical conditions under which such governance may support verifiable integrity, adaptive compliance, and access to formal markets.

Theoretical conditions derived from the review and theory synthesis (no empirical testing reported in this paper).

high positive RegTech-enabled governance of sanctions-safe enterprise ecos... verifiable integrity, adaptive compliance, access to formal markets

The study develops a theory-based framework explaining how RegTech-supported governance may, under specified conditions, enable sanctions-safe enterprise ecosystems during post-conflict reconstruction.

Primary contribution of the paper: theory synthesis built from integrative review of five literature streams (RegTech, sanctions compliance, institutional voids, supply-chain governance, algorithmic accountability).

high positive RegTech-enabled governance of sanctions-safe enterprise ecos... potential for RegTech-supported governance to enable sanctions-safe enterprise e...
