Evidence (5267 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome	Positive	Negative	Mixed	Null	Total
Other	378	106	59	455	1007
Governance & Regulation	379	176	116	58	739
Research Productivity	240	96	34	294	668
Organizational Efficiency	370	82	63	35	553
Technology Adoption Rate	296	118	66	29	513
Firm Productivity	277	34	68	10	394
AI Safety & Ethics	117	177	44	24	364
Output Quality	244	61	23	26	354
Market Structure	107	123	85	14	334
Decision Quality	168	74	37	19	301
Fiscal & Macroeconomic	75	52	32	21	187
Employment Level	70	32	74	8	186
Skill Acquisition	89	32	39	9	169
Firm Revenue	96	34	22	—	152
Innovation Output	106	12	21	11	151
Consumer Welfare	70	30	37	7	144
Regulatory Compliance	52	61	13	3	129
Inequality Measures	24	68	31	4	127
Task Allocation	75	11	29	6	121
Training Effectiveness	55	12	12	16	96
Error Rate	42	48	6	—	96
Worker Satisfaction	45	32	11	6	94
Task Completion Time	78	5	4	2	89
Wages & Compensation	46	13	19	5	83
Team Performance	44	9	15	7	76
Hiring & Recruitment	39	4	6	3	52
Automation Exposure	18	17	9	5	50
Job Displacement	5	31	12	—	48
Social Protection	21	10	6	2	39
Developer Productivity	29	3	3	1	36
Worker Turnover	10	12	—	3	25
Skill Obsolescence	3	19	2	—	24
Creative Output	15	5	3	1	24
Labor Share of Income	10	4	9	—	23

Adoption

We applied this framework to four LLMs (Llama-3-8B-Instruct, Mistral-7B-Instruct-v0.3, Llama-3-8B-Base, Gemma-2-9B-Instruct) across 224,000 factual QA trials.

Experimental methods reported in the paper listing the four model variants and total trial count (224,000 factual QA trials).

high positive Do LLMs Know What They Know? Measuring Metacognitive Efficie... empirical evaluation of models' Type-1 and Type-2 metrics across factual QA tria...

We introduce an evaluation framework based on Type-2 Signal Detection Theory that decomposes these capacities using meta-d' and the metacognitive efficiency ratio M-ratio.

Methodological contribution described in the paper: specification of a Type-2 SDT framework and use of meta-d' and M-ratio as measurement constructs.

high positive Do LLMs Know What They Know? Measuring Metacognitive Efficie... decomposition of Type-1 vs Type-2 capacities using meta-d' and M-ratio
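
To make the two constructs concrete: in Type-2 signal detection theory the M-ratio is conventionally defined as meta-d' divided by the Type-1 sensitivity d'. The sketch below computes d' from a hit rate and a false-alarm rate and then forms the ratio. Fitting meta-d' itself (usually done by maximum likelihood over confidence ratings) is not shown, and the function names and example numbers are illustrative assumptions, not values from the paper.

```python
from statistics import NormalDist


def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Type-1 sensitivity: z(hit rate) - z(false-alarm rate)."""
    z = NormalDist().inv_cdf  # inverse CDF of the standard normal
    return z(hit_rate) - z(false_alarm_rate)


def m_ratio(meta_d: float, d: float) -> float:
    """Metacognitive efficiency: meta-d' expressed as a fraction of d'."""
    if d == 0:
        raise ValueError("M-ratio is undefined when d' is zero")
    return meta_d / d


# Hypothetical inputs: rates are made up, and meta_d is assumed to come
# from a separate meta-d' fitting routine that is not part of this sketch.
type1_d = d_prime(hit_rate=0.8, false_alarm_rate=0.2)
efficiency = m_ratio(meta_d=1.0, d=type1_d)
```

An M-ratio near 1 would indicate that confidence judgments use roughly all the information available to the Type-1 decision; values below 1 indicate metacognitive loss.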

Deployment validation across 43 classrooms demonstrated an 18x efficiency gain in the assessment workflow.

Field deployment described in the paper: system was validated across 43 classrooms and an efficiency gain of 18x in the assessment workflow is reported.

high positive When AI Meets Early Childhood Education: Large Language Mode... efficiency of the assessment workflow (time/resources per assessment)

Interaction2Eval achieves up to 88% agreement with human expert judgments.

Reported evaluation results comparing Interaction2Eval outputs to human expert annotations (rubric-based judgments) on the dataset.

high positive When AI Meets Early Childhood Education: Large Language Mode... agreement between AI-generated assessments and human expert judgments

Interaction2Eval, an LLM-based framework, addresses domain-specific challenges (child speech recognition, Mandarin homophone disambiguation, rubric-based reasoning).

Methodological description in the paper: a specialized LLM-based pipeline designed to handle listed domain challenges; presented as the approach used to extract structured quality indicators.

high positive When AI Meets Early Childhood Education: Large Language Mode... capability to handle domain-specific technical challenges in automated assessmen...

TEPE-TCI-370h is the first large-scale dataset of naturalistic teacher-child interactions in Chinese preschools (370 hours, 105 classrooms) with standardized ECQRS-EC and SSTEW annotations.

Authors' dataset construction and description: 370 hours of recorded interactions from 105 classrooms, annotated with ECQRS-EC and SSTEW rubrics as reported in the paper.

high positive When AI Meets Early Childhood Education: Large Language Mode... availability of a large-scale annotated dataset for preschool teacher-child inte...

The dataset provides a reproducible and scalable foundation for research on technological diffusion, regional digitalisation, and industry-level transformation, and can be readily extended to future years or adapted to other countries.

Text asserts reproducibility, scalability, and extendability of the dataset and methods for future years and other countries.

high positive AI adoption in Spain (2023–2025): A web-derived dataset base... adoption_rate

By providing indicators for two benchmark years, the dataset supports the study of how AI adoption evolves across the Spanish business landscape.

Text highlights the availability of indicators for 2023 and 2025 and claims this supports temporal study of adoption evolution.

high positive AI adoption in Spain (2023–2025): A web-derived dataset base... adoption_rate

This multi-dimensional structure enables users to explore territorial patterns, sectoral differences, and size-related disparities in the uptake of AI.

Text claims that the dataset's dimensions make it possible to explore spatial (territorial), sectoral, and size-related patterns in AI uptake.

high positive AI adoption in Spain (2023–2025): A web-derived dataset base... adoption_rate

For each province–sector–size combination, the dataset reports whether firms adopt AI, whether they apply it internally, whether it is embedded in their offerings, and how many firms have valid website content.

Text explicitly lists the reported indicators at the province–sector–size aggregation level (adoption, internal use, embedded in offerings, count of valid website content).

high positive AI adoption in Spain (2023–2025): A web-derived dataset base... adoption_rate

The dataset offers a detailed portrait of AI adoption across regions (NUTS 3), industries, and firm size categories.

Text claims multi-dimensional reporting by region (NUTS 3), industry, and firm size categories in the dataset.

high positive AI adoption in Spain (2023–2025): A web-derived dataset base... adoption_rate

The pipeline identifies explicit evidence of AI use both in firms' internal processes and embedded in their products or services.

Text states the structured rubric is used to identify explicit evidence of AI use in internal processes and in products/services.

high positive AI adoption in Spain (2023–2025): A web-derived dataset base... adoption_rate

The paper uses a systematic pipeline based on large language models (LLMs) to segment website text, semantically filter it, and evaluate it with a structured rubric.

Text describes methodological pipeline components (LLM-based segmentation, semantic filtering, structured rubric evaluation).

high positive AI adoption in Spain (2023–2025): A web-derived dataset base... other

The dataset results in 225,628 firm-year observations.

Text explicitly reports 225,628 firm-year observations derived from the dataset across the two benchmark years.

high positive AI adoption in Spain (2023–2025): A web-derived dataset base... adoption_rate

The paper introduces a nationwide dataset that maps how 112,814 Spanish firms communicate and implement artificial intelligence (AI) on their corporate websites in 2023 and 2025.

Text states dataset coverage and firm count (112,814 firms) and benchmark years (2023 and 2025).

high positive AI adoption in Spain (2023–2025): A web-derived dataset base... adoption_rate
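
The firm count and the firm-year count reported in the entries above can be cross-checked with simple arithmetic. The sketch below assumes each firm contributes one observation per benchmark year (a balanced panel); that assumption is an inference from the numbers, not something the excerpts state.

```python
# Figures quoted in the claims above.
firms = 112_814
benchmark_years = (2023, 2025)
reported_firm_years = 225_628

# Assumption: one observation per firm per benchmark year.
implied_firm_years = firms * len(benchmark_years)

# True when the reported total matches the balanced-panel assumption.
balanced_panel_consistent = implied_firm_years == reported_firm_years
```

The two figures agree exactly under this assumption, which suggests the same set of firms was observed in both 2023 and 2025.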

These results provide a mechanistic account of how humans adapt their trust in AI confidence signals through experience.

Combined behavioral evidence (N = 200) and computational modeling (LLO + Rescorla–Wagner) presented in the paper.

high positive Learning to Trust: How Humans Mentally Recalibrate AI Confid... mechanistic explanation of trust adaptation to AI confidence signals

The model indicates that humans adapt by updating two components: baseline trust and confidence sensitivity, and they use asymmetric learning rates that prioritize the most informative errors.

Parameter recovery / model-fitting results reported in the paper showing updates to baseline trust and sensitivity parameters and asymmetric learning-rate estimates.

high positive Learning to Trust: How Humans Mentally Recalibrate AI Confid... latent learning parameters (baseline trust, confidence sensitivity, asymmetric l...

A computational model using a linear-in-log-odds (LLO) transformation combined with a Rescorla–Wagner learning rule explains the observed learning dynamics.

Modeling analysis reported in the paper fitting an LLO + Rescorla–Wagner model to participants' behavioral data (N = 200).

high positive Learning to Trust: How Humans Mentally Recalibrate AI Confid... model fit to behavioral learning dynamics
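
As a rough illustration of how a linear-in-log-odds transformation can be combined with a Rescorla-Wagner style delta rule, the sketch below maps the AI's stated confidence to a predicted probability of correctness and nudges two parameters (baseline trust and confidence sensitivity) after each outcome. The exact parameterization, the update of sensitivity, the learning-rate values, and the rule of choosing the rate by the sign of the prediction error are all assumptions made for this example; the excerpts do not specify the paper's equations.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def predicted_trust(ai_conf: float, baseline: float, sensitivity: float) -> float:
    """LLO mapping: log-odds of trust is linear in log-odds of AI confidence."""
    return sigmoid(baseline + sensitivity * logit(ai_conf))


def rw_update(ai_conf, outcome, baseline, sensitivity, lr_pos=0.1, lr_neg=0.3):
    """One delta-rule step; outcome is 1 if the AI was correct, else 0.

    lr_pos / lr_neg are hypothetical asymmetric learning rates.
    """
    error = outcome - predicted_trust(ai_conf, baseline, sensitivity)
    lr = lr_pos if error >= 0 else lr_neg
    new_baseline = baseline + lr * error
    new_sensitivity = sensitivity + lr * error * logit(ai_conf)
    return new_baseline, new_sensitivity


# Hypothetical trial: the AI states 90% confidence but turns out to be wrong.
b, s = rw_update(ai_conf=0.9, outcome=0, baseline=0.0, sensitivity=1.0)
```

After a confidently wrong answer both parameters fall in this sketch, so the same stated confidence produces lower predicted trust on the next trial.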

Humans can compensate for monotonic miscalibration (overconfidence and underconfidence) through repeated experience.

Behavioral experiment results showing participants adapted successfully in overconfidence and underconfidence conditions (N = 200, 50 trials).

high positive Learning to Trust: How Humans Mentally Recalibrate AI Confid... compensation for monotonic miscalibration (ability to adjust to over/underconfid...

Robust learning occurred across all calibration conditions (standard, overconfidence, underconfidence, reverse) with participants improving accuracy, discrimination, and calibration.

Behavioral experiment (N = 200) reporting consistent learning improvements across the four experimental conditions over 50 trials.

high positive Learning to Trust: How Humans Mentally Recalibrate AI Confid... learning (improvements in accuracy, discrimination, calibration) across conditio...

Participants significantly improved their calibration alignment (alignment between their confidence predictions and actual AI correctness) over 50 trials.

Behavioral experiment (N = 200) reporting improvements in calibration alignment metrics across trials.

high positive Learning to Trust: How Humans Mentally Recalibrate AI Confid... calibration alignment (match between predicted confidence and AI correctness)

Participants significantly improved their discrimination (ability to distinguish correct vs. incorrect AI outputs) over 50 trials.

Behavioral experiment (N = 200) reporting improved discrimination metrics across repeated trials.

high positive Learning to Trust: How Humans Mentally Recalibrate AI Confid... discrimination (ability to separate correct from incorrect AI outputs)

Participants significantly improved their prediction accuracy of the AI's correctness over 50 trials.

Behavioral experiment (N = 200), longitudinal measurement across 50 trials reporting statistically significant improvement in accuracy.

high positive Learning to Trust: How Humans Mentally Recalibrate AI Confid... accuracy (participants' correctness in predicting AI correctness)

Extensive offline evaluations demonstrate OneSearch-V2's strong query recognition and user profiling capabilities.

Author statement referencing extensive offline evaluations showing these capabilities; no metrics, datasets, or sample sizes provided in the excerpt.

high positive OneSearch-V2: The Latent Reasoning Enhanced Self-distillatio... query recognition and user profiling performance

OneSearch-V2 introduces a behavior preference alignment optimization system which mitigates reward hacking arising from the single conversion metric and addresses personal preference via direct user feedback.

Methodological description of an optimization/feedback component in the paper; no empirical quantification of mitigation or user-feedback effects provided in the excerpt.

high positive OneSearch-V2: The Latent Reasoning Enhanced Self-distillatio... mitigation of reward hacking from single-metric optimization and alignment with ...

OneSearch-V2 contains a reasoning-internalized self-distillation training pipeline that uncovers users' potential yet precise e-commerce intentions beyond log-fitting through implicit in-context learning.

Methodological description of the training pipeline in the paper; no direct quantitative evidence or ablation results given in the excerpt.

high positive OneSearch-V2: The Latent Reasoning Enhanced Self-distillatio... ability to infer latent user intent beyond behavior logs

OneSearch-V2 includes a thought-augmented complex query understanding module that enables deep query understanding and overcomes the shallow semantic matching limitations of direct inference.

Methodological description of the proposed module in the paper; no standalone evaluation numbers for this module provided in the excerpt.

high positive OneSearch-V2: The Latent Reasoning Enhanced Self-distillatio... query understanding capability (depth of understanding vs. shallow semantic matc...

OneSearch-V2 effectively mitigates common search system issues such as information bubbles and long-tail sparsity, without incurring additional inference costs or serving latency.

Author claim in the paper stating mitigation of these issues and no added inference/latency costs; no quantitative measures, benchmarks, or latency numbers provided in the excerpt.

high positive OneSearch-V2: The Latent Reasoning Enhanced Self-distillatio... information bubbles and long-tail sparsity (and inference/serving latency)

Manual evaluation confirms gains in query-item relevance, with +1.37%.

Reported manual evaluation metric in the paper; no sample size or annotation protocol provided in the excerpt.

high positive OneSearch-V2: The Latent Reasoning Enhanced Self-distillatio... query-item relevance

Manual evaluation confirms gains in search experience quality, with +1.65% in page good rate.

Reported manual evaluation metric in the paper; no sample size or annotation protocol provided in the excerpt.

high positive OneSearch-V2: The Latent Reasoning Enhanced Self-distillatio... page good rate

OneSearch-V2 increases order volume by +2.11% in online A/B tests.

Reported online A/B test result in the paper; no sample size, test duration, or statistical significance reported in the excerpt.

high positive OneSearch-V2: The Latent Reasoning Enhanced Self-distillatio... order volume

OneSearch-V2 increases buyer conversion rate by +3.05% in online A/B tests.

Reported online A/B test result in the paper; no sample size, test duration, or statistical significance reported in the excerpt.

high positive OneSearch-V2: The Latent Reasoning Enhanced Self-distillatio... buyer conversion rate

OneSearch-V2 increases item CTR by +3.98% in online A/B tests.

Reported online A/B test result in the paper; no sample size, test duration, or statistical significance reported in the excerpt.

high positive OneSearch-V2: The Latent Reasoning Enhanced Self-distillatio... item CTR

OneSearch, as a representative industrial-scale deployed generative search framework, has brought significant commercial and operational benefits.

Author assertion describing OneSearch as industrial-scale and commercially/operationally beneficial; no supporting numerical evidence or sample size reported in the excerpt.

high positive OneSearch-V2: The Latent Reasoning Enhanced Self-distillatio... commercial and operational benefits

Generative Retrieval (GR) offers advantages over multi-stage cascaded architectures such as end-to-end joint optimization and high computational efficiency.

Statement in paper positioning GR as a promising paradigm and listing these advantages; no quantitative study or sample size reported in the excerpt.

high positive OneSearch-V2: The Latent Reasoning Enhanced Self-distillatio... computational efficiency and ability to perform end-to-end joint optimization

Late disclosure of AI involvement improved affective engagement for AI-enhanced content.

Reported experimental result in the abstract from the two online studies (study 1: n = 325; study 2: n = 371) manipulating disclosure timing (early vs. late).

high positive AI content labeling and user engagement on social media: The... affective engagement for AI-enhanced content under late disclosure

Automation in Japanese manufacturing increased even during periods of slow productivity growth.

Empirical finding from applying the framework to industry-level data in Japanese manufacturing; comparison of inferred automation trends with observed productivity growth periods (exact sample/time not provided in the summary).

high positive The macroeconomics of automation trend in automation versus productivity growth (automation increased despite slo...

Applying the framework to Japanese manufacturing industries shows that automation increased through capital deepening.

Empirical application of the theoretical framework to Japanese manufacturing industries (industry-level analysis); estimation/inference using industry macro observables. (Paper states result; exact sample size/time span not provided in the summary.)

high positive The macroeconomics of automation increase in automation (share of tasks by capital) attributable to capital deepe...

The model provides a transparent mapping from standard macroeconomic observables (capital-labor ratio, output per worker, elasticity of substitution) into the degree of automation, allowing automation to be measured without relying on technology-specific indicators.

Theoretical mapping derived from the CES structure that links observable macro variables to the endogenous degree of automation; methodological claim about inference procedure.

high positive The macroeconomics of automation degree of automation inferred from macro observables

Aggregating task-level decisions generates a CES production function in which the economy-wide degree of automation emerges endogenously.

Analytical derivation in the paper: aggregation of task-level adoption decisions yields a CES aggregate production function with endogenous automation parameter.

high positive The macroeconomics of automation form of aggregate production function / emergence of economy-wide automation par...

The degree of automation is defined as the share of tasks performed by capital rather than labor.

Explicit model definition provided in the paper (conceptual/theoretical definition).

high positive The macroeconomics of automation share of tasks performed by capital

The degree of automation in the aggregate economy emerges endogenously as an equilibrium outcome and can be inferred from standard macroeconomic data.

Theoretical development in a task-based production framework with endogenous technology adoption; mapping from model to observable macro variables (capital-labor ratio, output per worker, elasticity of substitution).

high positive The macroeconomics of automation degree of automation (economy-wide share of tasks performed by capital)
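
To illustrate how a share parameter can be backed out of the three observables named above, the sketch below uses a textbook normalized CES function, y = (a * k^rho + (1 - a))^(1/rho) with rho = (sigma - 1) / sigma, where k is the capital-labor ratio, y is output per worker, and the distribution parameter a stands in for the degree of automation. This is an assumed stand-in with unit factor productivities; the paper's own mapping is not reproduced in the excerpts and may differ.

```python
def ces_output_per_worker(k: float, a: float, sigma: float) -> float:
    """Textbook CES in per-worker terms (requires sigma != 1)."""
    rho = (sigma - 1.0) / sigma
    return (a * k**rho + (1.0 - a)) ** (1.0 / rho)


def implied_share(k: float, y: float, sigma: float) -> float:
    """Invert the CES above for the share parameter a (requires k != 1)."""
    rho = (sigma - 1.0) / sigma
    return (y**rho - 1.0) / (k**rho - 1.0)


# Hypothetical numbers chosen only to show the round trip.
true_share = 0.3
sigma = 0.5          # elasticity of substitution (assumed value)
k = 4.0              # capital-labor ratio (assumed value)
y = ces_output_per_worker(k, true_share, sigma)
recovered_share = implied_share(k, y, sigma)
```

The round trip recovers the share used to generate y, which shows the sense in which such a parameter is identified from k, y, and sigma alone without technology-specific indicators.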

The results of this regional research outline a multi-dimensional policy roadmap that dives deep into the region’s current capabilities and the hurdles it faces in catching up with the AI revolution from a governance and policy perspective, presenting them in a practical framework for public sector leaders.

Report summary claiming that the study's results produce a comprehensive roadmap and practical framework (content description).

high positive Charting AI Governance Future in the Arab Region: A Policy R... comprehensiveness and practicality of the policy roadmap produced by the study

This executive report provides a roadmap for establishing an AI governance infrastructure through a set of strategic policy recommendations across seven key pillars.

Document assertion describing the content and structure of the report (authors' deliverable).

high positive Charting AI Governance Future in the Arab Region: A Policy R... existence of a multi-pillar policy roadmap in the report

The reality of limited AI governance capacity calls for a series of policy interventions at both local and regional levels to empower the AI ecosystem in the Arab region.

Authors' policy recommendation derived from the regional study and synthesis of findings.

high positive Charting AI Governance Future in the Arab Region: A Policy R... adoption of policy interventions to strengthen AI governance and ecosystem

A governance model linking 'trustworthy AI' practices to competitive advantage yields reduced uncertainty, faster deployment cycles, and higher stakeholder trust.

Central claim of the paper tying the proposed AIGSF to business benefits; supported by conceptual linkage and illustrative examples rather than quantified empirical evidence or controlled evaluation.

high positive Artificial Intelligence Governance In Corporate Strategy: Et... firm_revenue

Case illustrations across hiring, credit, consumer services, and generative AI draw lessons on controls such as model documentation, algorithmic audits, impact assessments, and human-in-the-loop oversight.

Paper includes qualitative case illustrations in the listed domains to demonstrate governance controls; these are presented as examples and lessons rather than as systematic empirical studies (no sample sizes reported).

high positive Artificial Intelligence Governance In Corporate Strategy: Et... regulatory_compliance

The paper develops an AI Governance Strategic Framework (AIGSF) and an implementation roadmap that connect ethical accountability, regulatory readiness, cybersecurity resilience, and performance outcomes.

Paper contribution described as an integrative conceptual framework and roadmap; supported by theoretical grounding and illustrative cases rather than empirical validation; no sample size provided.

high positive Artificial Intelligence Governance In Corporate Strategy: Et... organizational_efficiency

AI governance should be treated as a strategic governance function—anchored in board oversight and enterprise risk management—rather than a narrow technical or compliance task.

Central normative recommendation and thesis of the paper; derived from an integrative conceptual framework grounded in corporate governance theory, ERM, and emerging regulation. No empirical testing or sample reported.

high positive Artificial Intelligence Governance In Corporate Strategy: Et... governance_and_regulation

AI has moved from a peripheral digital capability to a central driver of corporate strategy, reshaping decision-making, customer engagement, operations, and risk exposure.

Statement presented in the paper's introduction and motivation; supported by integrative conceptual design and literature grounding (theory and descriptive citations). No empirical sample or quantitative analysis reported.

high positive Artificial Intelligence Governance In Corporate Strategy: Et... organizational_efficiency
