Evidence (13870 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome	Positive	Negative	Mixed	Null	Total
Other	749	196	98	892	1984
Governance & Regulation	817	394	188	121	1544
Organizational Efficiency	771	189	124	83	1177
Technology Adoption Rate	627	233	123	96	1088
Research Productivity	411	123	56	332	933
Output Quality	467	178	59	47	751
Decision Quality	320	174	75	42	618
Firm Productivity	435	55	88	20	604
AI Safety & Ethics	214	276	65	33	593
Market Structure	178	167	122	24	496
Task Allocation	207	64	71	32	379
Skill Acquisition	165	59	60	17	301
Innovation Output	203	27	43	18	292
Employment Level	105	52	107	13	279
Fiscal & Macroeconomic	131	69	43	26	276
Consumer Welfare	116	63	42	11	232
Firm Revenue	150	48	26	3	227
Inequality Measures	44	122	49	6	221
Task Completion Time	169	29	8	12	219
Worker Satisfaction	89	63	20	12	184
Error Rate	69	92	10	2	173
Regulatory Compliance	76	68	14	5	163
Training Effectiveness	93	21	13	19	148
Wages & Compensation	77	36	25	6	144
Automation Exposure	51	54	22	12	142
Team Performance	86	17	27	9	140
Developer Productivity	94	17	14	6	132
Job Displacement	12	80	20	1	113
Hiring & Recruitment	51	7	8	3	69
Creative Output	31	17	7	3	59
Skill Obsolescence	5	46	6	1	58
Social Protection	27	16	8	2	53
Labor Share of Income	17	17	17	—	51
Worker Turnover	11	12	—	3	26
Industry	—	—	—	1	1

OneSearch-V2 effectively mitigates common search system issues such as information bubbles and long-tail sparsity, without incurring additional inference costs or serving latency.

Author claim in the paper stating mitigation of these issues and no added inference/latency costs; no quantitative measures, benchmarks, or latency numbers provided in the excerpt.

high positive OneSearch-V2: The Latent Reasoning Enhanced Self-distillatio... information bubbles and long-tail sparsity (and inference/serving latency)

Manual evaluation confirms gains in query-item relevance, with +1.37%.

Reported manual evaluation metric in the paper; no sample size or annotation protocol provided in the excerpt.

high positive OneSearch-V2: The Latent Reasoning Enhanced Self-distillatio... query-item relevance

Manual evaluation confirms gains in search experience quality, with +1.65% in page good rate.

Reported manual evaluation metric in the paper; no sample size or annotation protocol provided in the excerpt.

high positive OneSearch-V2: The Latent Reasoning Enhanced Self-distillatio... page good rate

OneSearch-V2 increases order volume by +2.11% in online A/B tests.

Reported online A/B test result in the paper; no sample size, test duration, or statistical significance reported in the excerpt.

high positive OneSearch-V2: The Latent Reasoning Enhanced Self-distillatio... order volume

OneSearch-V2 increases buyer conversion rate by +3.05% in online A/B tests.

Reported online A/B test result in the paper; no sample size, test duration, or statistical significance reported in the excerpt.

high positive OneSearch-V2: The Latent Reasoning Enhanced Self-distillatio... buyer conversion rate

OneSearch-V2 increases item CTR by +3.98% in online A/B tests.

Reported online A/B test result in the paper; no sample size, test duration, or statistical significance reported in the excerpt.

high positive OneSearch-V2: The Latent Reasoning Enhanced Self-distillatio... item CTR
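The A/B claims above report relative lifts without sample sizes or significance tests. The sketch below shows one standard way such a lift could be checked: a pooled two-proportion z-test. All counts are hypothetical. They were chosen only so that the lift equals the reported +3.05% buyer conversion figure, and they are not from the paper.

```python
import math


def two_proportion_ztest(conv_control, n_control, conv_treat, n_treat):
    """Pooled two-proportion z-test; returns (relative lift, z statistic, two-sided p-value)."""
    p_c = conv_control / n_control
    p_t = conv_treat / n_treat
    # Pooled rate under the null hypothesis that both arms convert equally.
    pooled = (conv_control + conv_treat) / (n_control + n_treat)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_treat))
    z = (p_t - p_c) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    relative_lift = p_t / p_c - 1
    return relative_lift, z, p_value


# Hypothetical arms: 10,000 vs 10,305 conversions out of 100,000 users each.
lift, z, p = two_proportion_ztest(10_000, 100_000, 10_305, 100_000)
```

With these made-up counts the +3.05% lift gives z of roughly 2.26, which is significant at the 5% level. With ten times fewer users per arm, the same lift would not be significant. This is why the missing sample sizes matter when reading the claims.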

OneSearch, as a representative industrial-scale deployed generative search framework, has brought significant commercial and operational benefits.

Author assertion describing OneSearch as industrial-scale and commercially/operationally beneficial; no supporting numerical evidence or sample size reported in the excerpt.

high positive OneSearch-V2: The Latent Reasoning Enhanced Self-distillatio... commercial and operational benefits

Generative Retrieval (GR) offers advantages over multi-stage cascaded architectures such as end-to-end joint optimization and high computational efficiency.

Statement in paper positioning GR as a promising paradigm and listing these advantages; no quantitative study or sample size reported in the excerpt.

high positive OneSearch-V2: The Latent Reasoning Enhanced Self-distillatio... computational efficiency and ability to perform end-to-end joint optimization

The framework aims to support more comparable benchmarks and cumulative research on human-AI readiness, advancing safer and more accountable human-AI collaboration.

Stated aims and intended impact in paper; aspirational/conceptual rather than empirically demonstrated in excerpt.

high positive From Accuracy to Readiness: Metrics and Benchmarks for Human... benchmarks, cumulative research, safety and accountability in human-AI collabora...

Operationalizing evaluation through interaction traces rather than model properties or self-reported trust enables deployment-relevant assessment of calibration, error recovery, and governance.

Methodological claim/proposed approach in paper; presented as enabling assessment but no empirical evaluation reported in excerpt.

high positive From Accuracy to Readiness: Metrics and Benchmarks for Human... assessment of calibration, error recovery, governance via interaction traces

The taxonomy and metrics are connected to the Understand-Control-Improve (U-C-I) lifecycle of human-AI onboarding and collaboration.

Conceptual mapping described in paper; no empirical tests or sample reported in excerpt.

high positive From Accuracy to Readiness: Metrics and Benchmarks for Human... linking metrics to U-C-I onboarding lifecycle

We introduce a four-part taxonomy of evaluation metrics spanning outcomes, reliance behavior, safety signals, and learning over time.

Explicit methodological claim in paper announcing a taxonomy; described as a contribution rather than empirically tested in excerpt.

high positive From Accuracy to Readiness: Metrics and Benchmarks for Human... evaluation metrics taxonomy (outcomes, reliance behavior, safety signals, learni...

This paper proposes a measurement framework for evaluating human-AI decision-making centered on team readiness.

Methodological contribution presented in paper; conceptual framework proposed (no empirical validation reported in excerpt).

high positive From Accuracy to Readiness: Metrics and Benchmarks for Human... team readiness evaluation

Artificial intelligence (AI) systems are deployed as collaborators in human decision-making.

Statement in paper (conceptual/observational claim); no empirical sample or method provided in excerpt.

high positive From Accuracy to Readiness: Metrics and Benchmarks for Human... deployment of AI as collaborators

Late disclosure of AI involvement improved affective engagement for AI-enhanced content.

Reported experimental result in the abstract from the two online studies (study 1: n = 325; study 2: n = 371) manipulating disclosure timing (early vs. late).

high positive AI content labeling and user engagement on social media: The... affective engagement for AI-enhanced content under late disclosure

Automation in Japanese manufacturing increased even during periods of slow productivity growth.

Empirical finding from applying the framework to industry-level data in Japanese manufacturing; comparison of inferred automation trends with observed productivity growth periods (exact sample/time not provided in the summary).

high positive The macroeconomics of automation trend in automation versus productivity growth (automation increased despite slo...

Applying the framework to Japanese manufacturing industries shows that automation increased through capital deepening.

Empirical application of the theoretical framework to Japanese manufacturing industries (industry-level analysis); estimation/inference using industry macro observables. (Paper states result; exact sample size/time span not provided in the summary.)

high positive The macroeconomics of automation increase in automation (share of tasks by capital) attributable to capital deepe...

The model provides a transparent mapping from standard macroeconomic observables (capital-labor ratio, output per worker, elasticity of substitution) into the degree of automation, allowing automation to be measured without relying on technology-specific indicators.

Theoretical mapping derived from the CES structure that links observable macro variables to the endogenous degree of automation; methodological claim about inference procedure.

high positive The macroeconomics of automation degree of automation inferred from macro observables
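The excerpt does not give the paper's functional form. The sketch below is a hedged illustration that assumes a plain two-factor CES with no factor-augmenting productivity terms, Y = [α K^ρ + (1 − α) L^ρ]^(1/ρ) with ρ = (σ − 1)/σ. It reads the capital weight α as the degree of automation. Dividing by L gives y = [α k^ρ + 1 − α]^(1/ρ), where k = K/L and y = Y/L. This inverts in closed form to α = (y^ρ − 1)/(k^ρ − 1). Task-based derivations may instead produce weights such as α^(1/σ), in which case the inversion has to be done numerically. The function names here are hypothetical.

```python
def output_per_worker(k: float, alpha: float, sigma: float) -> float:
    """Assumed CES in intensive form: y = [alpha * k**rho + (1 - alpha)]**(1/rho), sigma != 1."""
    rho = (sigma - 1.0) / sigma
    return (alpha * k ** rho + (1.0 - alpha)) ** (1.0 / rho)


def automation_degree(k: float, y: float, sigma: float) -> float:
    """Infer alpha (share of tasks done by capital) from k = K/L, y = Y/L and sigma,
    under the assumed CES above; k and y are taken as normalized to a base point."""
    if sigma <= 0 or sigma == 1:
        raise ValueError("sigma must be positive and != 1 (Cobb-Douglas limit not handled)")
    if k == 1:
        raise ValueError("alpha is not identified when k == 1")
    rho = (sigma - 1.0) / sigma
    # y**rho = alpha * k**rho + 1 - alpha  =>  alpha = (y**rho - 1) / (k**rho - 1)
    return (y ** rho - 1.0) / (k ** rho - 1.0)


# Round trip: generate y from a known alpha, then recover alpha from (k, y, sigma).
y = output_per_worker(k=2.0, alpha=0.3, sigma=0.8)
recovered = automation_degree(k=2.0, y=y, sigma=0.8)
```

The round trip recovers α up to floating-point error. It shows the kind of "transparent mapping" the claim describes: only the capital-labor ratio, output per worker, and the elasticity of substitution enter.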

Aggregating task-level decisions generates a CES production function in which the economy-wide degree of automation emerges endogenously.

Analytical derivation in the paper: aggregation of task-level adoption decisions yields a CES aggregate production function with endogenous automation parameter.

high positive The macroeconomics of automation form of aggregate production function / emergence of economy-wide automation par...

The degree of automation is defined as the share of tasks performed by capital rather than labor.

Explicit model definition provided in the paper (conceptual/theoretical definition).

high positive The macroeconomics of automation share of tasks performed by capital

The degree of automation in the aggregate economy emerges endogenously as an equilibrium outcome and can be inferred from standard macroeconomic data.

Theoretical development in a task-based production framework with endogenous technology adoption; mapping from model to observable macro variables (capital-labor ratio, output per worker, elasticity of substitution).

high positive The macroeconomics of automation degree of automation (economy-wide share of tasks performed by capital)

The results of this regional research outline a multidimensional policy roadmap that examines the region's current capabilities and the hurdles it faces, from a governance and policy perspective, in catching up with the AI revolution, and present them as a practical framework for public sector leaders.

Report summary claiming that the study's results produce a comprehensive roadmap and practical framework (content description).

high positive Charting AI Governance Future in the Arab Region: A Policy R... comprehensiveness and practicality of the policy roadmap produced by the study

This executive report provides a roadmap for establishing an AI governance infrastructure through a set of strategic policy recommendations across seven key pillars.

Document assertion describing the content and structure of the report (authors' deliverable).

high positive Charting AI Governance Future in the Arab Region: A Policy R... existence of a multi-pillar policy roadmap in the report

The reality of limited AI governance capacity calls for a series of policy interventions at both local and regional levels to empower the AI ecosystem in the Arab region.

Authors' policy recommendation derived from the regional study and synthesis of findings.

high positive Charting AI Governance Future in the Arab Region: A Policy R... adoption of policy interventions to strengthen AI governance and ecosystem

A governance model linking 'trustworthy AI' practices to competitive advantage yields reduced uncertainty, faster deployment cycles, and higher stakeholder trust.

Central claim of the paper tying the proposed AIGSF to business benefits; supported by conceptual linkage and illustrative examples rather than quantified empirical evidence or controlled evaluation.

high positive Artificial Intelligence Governance In Corporate Strategy: Et... Firm Revenue

Case illustrations across hiring, credit, consumer services, and generative AI draw lessons on controls such as model documentation, algorithmic audits, impact assessments, and human-in-the-loop oversight.

Paper includes qualitative case illustrations in the listed domains to demonstrate governance controls; these are presented as examples and lessons rather than as systematic empirical studies (no sample sizes reported).

high positive Artificial Intelligence Governance In Corporate Strategy: Et... Regulatory Compliance

The paper develops an AI Governance Strategic Framework (AIGSF) and an implementation roadmap that connect ethical accountability, regulatory readiness, cybersecurity resilience, and performance outcomes.

Paper contribution described as an integrative conceptual framework and roadmap; supported by theoretical grounding and illustrative cases rather than empirical validation; no sample size provided.

high positive Artificial Intelligence Governance In Corporate Strategy: Et... Organizational Efficiency

AI governance should be treated as a strategic governance function—anchored in board oversight and enterprise risk management—rather than a narrow technical or compliance task.

Central normative recommendation and thesis of the paper; derived from an integrative conceptual framework grounded in corporate governance theory, ERM, and emerging regulation. No empirical testing or sample reported.

high positive Artificial Intelligence Governance In Corporate Strategy: Et... Governance & Regulation

AI has moved from a peripheral digital capability to a central driver of corporate strategy, reshaping decision-making, customer engagement, operations, and risk exposure.

Statement presented in the paper's introduction and motivation; supported by integrative conceptual design and literature grounding (theory and descriptive citations). No empirical sample or quantitative analysis reported.

high positive Artificial Intelligence Governance In Corporate Strategy: Et... Organizational Efficiency

A policy of 20% mandatory practice preserves 92% more capability than the simulation baseline (baseline includes a 5% background AI-failure rate).

Simulation comparing baseline (5% background AI-failure rate) to a counterfactual with 20% mandatory practice; reported 92% relative preservation of capability.

high positive The enrichment paradox: critical capability thresholds and i... preserved human capability under mandatory practice policy vs baseline

The model predicts that periodic AI failures improve human capability 2.7-fold (relative improvement reported in simulations).

Simulation experiments comparing scenarios with/without periodic AI failures; reported fold-change in capability of 2.7×.

high positive The enrichment paradox: critical capability thresholds and i... human capability (H) under periodic AI-failure regime

Validated against 15 countries' PISA data (102 points), the model achieves R^2 = 0.946 with 3 parameters and attains the lowest BIC among compared specifications.

Empirical validation using PISA dataset covering 15 countries and 102 data points; reported fit statistics (R^2, number of parameters, BIC).

high positive The enrichment paradox: critical capability thresholds and i... fit of model to PISA data (explained variance, model selection via BIC)
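The PISA validation is reported as R^2 = 0.946 with 3 parameters on 102 points, plus the lowest BIC among the compared specifications. The sketch below shows how these two statistics are commonly computed. It assumes Gaussian errors, so BIC = n ln(RSS/n) + k ln n up to an additive constant. The paper's exact likelihood and its competing specifications are not given in the excerpt, and the numbers used below are toy values.

```python
import math


def r_squared(y, y_hat):
    """Share of variance in y explained by the fitted values y_hat."""
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def gaussian_bic(y, y_hat, n_params):
    """BIC under i.i.d. Gaussian errors (constant terms dropped); lower is better."""
    n = len(y)
    rss = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    return n * math.log(rss / n) + n_params * math.log(n)


# Toy data: the same fit scored as a 3-parameter and a 5-parameter model.
observed = [1.0, 2.0, 3.0, 4.0]
fitted = [1.1, 1.9, 3.2, 3.8]
r2 = r_squared(observed, fitted)
bic_small = gaussian_bic(observed, fitted, n_params=3)
bic_large = gaussian_bic(observed, fitted, n_params=5)
```

For equal fit quality, the model with fewer parameters has the lower BIC. This is the sense in which a 3-parameter model with a high R^2 can "win" model selection.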

The model was calibrated to four domains: education, medicine, navigation, and aviation.

Model calibration procedures applied separately to four named domains reported in the paper.

high positive The enrichment paradox: critical capability thresholds and i... model parameter fits across domains

We present a two-variable dynamical systems model coupling capability (H) and delegation (D), grounded in three axioms: learning requires capability, learning requires practice, and disuse causes forgetting.

Model specification and theoretical construction described in the paper (two-variable dynamical system; three axioms).

high positive The enrichment paradox: critical capability thresholds and i... human capability as a dynamical variable (H) and delegation level (D)
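The excerpt names the three axioms but not the model's equations. The following is a hypothetical toy reading, not the paper's model. It holds delegation D fixed as a policy lever, whereas the paper couples H and D dynamically. Learning is taken to be proportional to H (capability), (1 − D) (practice), and (1 − H) (room to improve). Forgetting is taken to be proportional to D·H (disuse). Under these assumed forms, capability decays to zero once D exceeds a/(a + b), a simple analogue of a critical threshold. The parameter names learn_rate (a) and forget_rate (b) are illustrative.

```python
def simulate_capability(h0, delegation, learn_rate, forget_rate, dt=0.01, steps=5000):
    """Forward-Euler integration of the hypothetical toy ODE
    dH/dt = learn_rate * H * (1 - D) * (1 - H) - forget_rate * D * H, with D held fixed."""
    h = h0
    practice = 1.0 - delegation
    for _ in range(steps):
        dh = learn_rate * h * practice * (1.0 - h) - forget_rate * delegation * h
        h = max(0.0, h + dt * dh)  # capability cannot go negative
    return h


def steady_state(delegation, learn_rate, forget_rate):
    """Stable fixed point of the toy ODE: interior if practice outweighs forgetting, else zero."""
    practice = 1.0 - delegation
    if learn_rate * practice <= forget_rate * delegation:
        return 0.0
    return 1.0 - forget_rate * delegation / (learn_rate * practice)


def critical_delegation(learn_rate, forget_rate):
    """Delegation level above which capability in the toy model decays to zero."""
    return learn_rate / (learn_rate + forget_rate)
```

With a = 1 and b = 0.5, the toy threshold is D = 2/3. At D = 0.4, capability settles near 2/3. At D = 0.8, it collapses toward zero from any starting level.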

These results demonstrate a practical path toward high-precision, low-latency text-to-SQL applications using domain-specialized, self-hosted language models in large-scale production environments.

Conclusion drawn by the authors based on their implementation, token reduction, and reported accuracy/latency-related claims; generalization to large-scale production is asserted but not supported by detailed production deployment metrics in the excerpt.

high positive Schema on the Inside: A Two-Phase Fine-Tuning Method for Hig... feasibility of production-grade text-to-SQL (precision and latency)

The resulting system achieves 98.4% execution success and 92.5% semantic accuracy, substantially outperforming a prompt-engineered baseline using Google's Gemini Flash 2.0 (95.6% execution, 89.4% semantic accuracy).

Reported empirical evaluation comparing the authors' system to a prompt-engineered baseline (Gemini Flash 2.0) with explicit performance percentages for execution success and semantic accuracy; no sample size, test set composition, statistical significance, or evaluation protocol provided in the excerpt.

high positive Schema on the Inside: A Two-Phase Fine-Tuning Method for Hig... execution success rate; semantic accuracy
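The excerpt gives no evaluation protocol for "execution success" and "semantic accuracy." The sketch below shows one common way to operationalize them for text-to-SQL. Execution success is the share of predicted queries that run without error. Semantic accuracy is approximated here by execution-result matching against a gold query. The paper may define semantic accuracy differently, for example by human or model judgment. The `batting` table, its rows, and the query pairs are hypothetical.

```python
import sqlite3
from collections import Counter


def evaluate(conn, pairs):
    """Return (execution success rate, result-match rate) over (predicted_sql, gold_sql) pairs.
    Gold queries are assumed valid; results are compared as multisets (row order ignored)."""
    executed = 0
    matched = 0
    for predicted_sql, gold_sql in pairs:
        try:
            predicted_rows = conn.execute(predicted_sql).fetchall()
        except sqlite3.Error:
            continue  # an execution failure counts against both metrics
        executed += 1
        gold_rows = conn.execute(gold_sql).fetchall()
        if Counter(predicted_rows) == Counter(gold_rows):
            matched += 1
    return executed / len(pairs), matched / len(pairs)


# Hypothetical cricket-statistics table and three predicted/gold query pairs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE batting (player TEXT, runs INTEGER)")
conn.executemany("INSERT INTO batting VALUES (?, ?)", [("A", 120), ("B", 45), ("C", 88)])
pairs = [
    ("SELECT player FROM batting WHERE runs > 50", "SELECT player FROM batting WHERE runs >= 51"),
    ("SELECT player FROM batting WHERE runs > 100", "SELECT player FROM batting WHERE runs > 50"),
    ("SELEC player FROM batting", "SELECT player FROM batting"),
]
exec_rate, sem_acc = evaluate(conn, pairs)
```

Here two of the three predictions execute, and only the first returns the same rows as its gold query. This gives an execution success rate of 2/3 and a result-match rate of 1/3. For questions whose answers depend on ordering (for example "top five scorers"), a list comparison would be more appropriate than the multiset comparison used here.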

The approach replaces costly external API calls with efficient local inference.

System design claim: the model is self-hosted and performs local inference instead of using external API-based LLM calls; no cost accounting or latency benchmarks provided in the excerpt.

high positive Schema on the Inside: A Two-Phase Fine-Tuning Method for Hig... use of external API calls vs local inference (cost/efficiency implication)

The approach reduces input tokens by over 99%, from a 17k-token baseline to fewer than 100.

Reported measurement comparing input token counts before and after applying their approach (explicit numerical baseline and resulting counts provided); no sample size or distribution of token counts reported.

high positive Schema on the Inside: A Two-Phase Fine-Tuning Method for Hig... input token count
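A quick arithmetic check of the "over 99%" figure follows. It takes the rounded 17k baseline as 17,000 tokens, since the excerpt gives no exact count, and uses 100 tokens as the upper bound on the new prompt:

```latex
\text{reduction} \;>\; \frac{17\,000 - 100}{17\,000} \;=\; 1 - \frac{100}{17\,000} \;\approx\; 0.9941 \;=\; 99.41\%
```

Any baseline of at least 10,000 tokens with a prompt under 100 tokens would already give a reduction above 99%. The claim is therefore not sensitive to how "17k" was rounded.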

A novel two-phase supervised fine-tuning approach enables the model to internalize the entire database schema, eliminating the need for long-context prompts.

Methodological description (two-phase supervised fine-tuning) and claim that this internalization removes reliance on long-context prompts; no detailed experimental protocol or sample size provided in the excerpt.

high positive Schema on the Inside: A Two-Phase Fine-Tuning Method for Hig... need for long-context prompts / model internalization of schema

We present a specialized, self-hosted 8B-parameter model designed for a conversational bot in CriQ, a sister app to Dream11 that answers user queries about cricket statistics.

Stated implementation detail in the paper describing the model architecture and deployment target (CriQ conversational bot). No experimental sample size reported for this statement.

high positive Schema on the Inside: A Two-Phase Fine-Tuning Method for Hig... model specification and deployment

Legal professionals, courts, and regulators should replace the outdated 'black box' mental model with verification protocols based on how these systems actually fail.

Policy recommendation stated in the abstract based on the paper's analysis; no trial or deployment evidence of such protocols provided in the excerpt.

high positive When AI output tips to bad but nobody notices: Legal implica... adoption of verification protocols / change in mental model

The adoption of generative AI across commercial and legal professions offers dramatic efficiency gains.

Asserted in the paper's introduction/abstract; no empirical data, sample, or quantitative study reported in the excerpt.

high positive When AI output tips to bad but nobody notices: Legal implica... efficiency gains

Equilibria of the extended model also show increasing concentration consistent with power-law-like distributions (i.e., winner-take-most or superstar effects).

Theoretical model combining quality heterogeneity and reinforcement dynamics that yields equilibrium distributions with heavy tails; argument and formalization presented in the paper; no empirical testing reported.

high positive The Economics of Builder Saturation in Digital Markets market concentration / distribution of returns (power-law-like)
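The excerpt describes concentration arising from quality heterogeneity combined with reinforcement dynamics, but it does not give the model. The sketch below is a hypothetical toy of that general mechanism, a Pólya-urn-style process. Each unit of attention goes to a producer with probability proportional to quality, optionally multiplied by (1 + attention already received). The comparison shows how reinforcement alone raises the top-decile share. All names and parameter values are illustrative.

```python
import random


def simulate_attention(qualities, draws, reinforcement, seed=0):
    """Allocate `draws` attention units one at a time. Weights are quality alone, or
    quality * (1 + current attention) when reinforcement is on (hypothetical toy rule)."""
    rng = random.Random(seed)
    attention = [0] * len(qualities)
    producers = range(len(qualities))
    for _ in range(draws):
        if reinforcement:
            weights = [q * (a + 1) for q, a in zip(qualities, attention)]
        else:
            weights = qualities
        winner = rng.choices(producers, weights=weights)[0]
        attention[winner] += 1
    return attention


def top_share(attention, fraction=0.1):
    """Share of total attention captured by the top `fraction` of producers."""
    ranked = sorted(attention, reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return sum(ranked[:k]) / sum(ranked)
```

For example, with 100 producers whose qualities are drawn uniformly from [0.5, 1.5] and 5,000 attention units, the quality-only rule keeps the top decile's share modest. The reinforcing rule concentrates attention considerably more. This is the qualitative winner-take-most pattern the claim describes, although this toy is not a substitute for the paper's equilibrium analysis.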

Even as the number of producers increases and average attention per producer falls, total output expands (production scales elastically).

Same formal theoretical model (analytical result): production scales elastically in the model despite finite attention; no empirical validation provided.

high positive The Economics of Builder Saturation in Digital Markets total market output

The identified mechanisms, network structure evolution and increased relational embeddedness, contribute to a broader understanding of how digital transformation shapes innovation dynamics across geographical boundaries in a globalized knowledge economy.

Synthesis of empirical network evolution results and mediation/structural analyses from the 2011–2021 dataset of digital transformation indicators and patent collaboration networks among cities and firms.

high positive How Does Digital Transformation Affect Cross-Regional Collab... role of network structure evolution and relational embeddedness as mechanisms li...

These results provide empirical evidence from a major emerging economy (China) that can offer insights to inform policies and strategies in other regions undergoing digital transition.

Generalization claim based on empirical findings from the 2011–2021 analysis of A-share listed companies' digital transformation and patent collaboration patterns in China.

high positive How Does Digital Transformation Affect Cross-Regional Collab... policy relevance / generalizability of findings to other regions

When the volume of digital patent applications surpasses a certain threshold, the positive effect of digital transformation on the quality of cross-regional collaborative innovation accelerates (nonlinear threshold effect).

Threshold regression / nonlinear analysis relating counts of digital patent applications to the marginal effect of digital transformation on collaborative innovation quality, using 2011–2021 patent and digitalization data from A-share listed firms.

high positive How Does Digital Transformation Affect Cross-Regional Collab... quality of cross-regional collaborative innovation (and its change above a paten...
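The excerpt reports a threshold effect but not the paper's specification. The sketch below illustrates the general technique: a least-squares sample-split threshold regression in the spirit of Hansen's threshold models. For each candidate threshold γ on a threshold variable q (here standing in for digital patent applications), it fits separate regressions below and above γ and keeps the γ with the smallest total sum of squared residuals. Real applications add panel fixed effects, controls, and bootstrap tests of whether the threshold is significant. All data and names below are hypothetical.

```python
import math


def _ols_ssr(xs, ys):
    """Sum of squared residuals from a simple regression of ys on xs with an intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = my - slope * mx
    return sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))


def fit_threshold(q, x, y, min_obs=5):
    """Grid-search the threshold gamma on q that minimizes the combined SSR of the
    regimes q <= gamma and q > gamma; each regime must keep at least min_obs points."""
    best_gamma, best_ssr = None, math.inf
    for gamma in sorted(set(q)):
        low = [i for i, qi in enumerate(q) if qi <= gamma]
        high = [i for i, qi in enumerate(q) if qi > gamma]
        if len(low) < min_obs or len(high) < min_obs:
            continue
        ssr = (_ols_ssr([x[i] for i in low], [y[i] for i in low])
               + _ols_ssr([x[i] for i in high], [y[i] for i in high]))
        if ssr < best_ssr:
            best_gamma, best_ssr = gamma, ssr
    return best_gamma, best_ssr


# Synthetic data: the slope of y on x jumps from 0.5 to 2.0 once q exceeds 19.
q = list(range(40))
x = [i % 7 + 1 for i in range(40)]
y = [1 + (0.5 if qi <= 19 else 2.0) * xi for qi, xi in zip(q, x)]
gamma, ssr = fit_threshold(q, x, y)
```

On this noise-free example, the search recovers γ = 19, the point above which the marginal effect of x on y steepens. That steepening is the shape of the "accelerating" effect the claim describes.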

Advancement of digital transformation positively contributes to both the quality and the quantity of cross-regional cooperative innovation.

Empirical econometric analysis (panel regressions) linking measures of corporate/urban digital transformation to indicators of cross-regional cooperative innovation quality and counts, using A-share listed companies' digital transformation indicators and patent collaboration data, 2011–2021.

high positive How Does Digital Transformation Affect Cross-Regional Collab... quality and quantity (counts) of cross-regional cooperative innovation

China’s urban collaborative innovation network demonstrates a notable quadrilateral spatial structure and has evolved toward a multicenter pattern over time.

Spatio-temporal network analysis based on the same 2011–2021 dataset of digital transformation indicators and patent/co-patent links among cities inferred from A-share listed companies' patent data.

high positive How Does Digital Transformation Affect Cross-Regional Collab... spatio-temporal structure of urban collaborative innovation network (quadrilater...

The cooperative innovation network exhibits pronounced small-world characteristics.

Network analysis of cross-regional collaborative innovation using digital transformation and patent data from A-share listed companies on the Shanghai and Shenzhen stock exchanges (2011–2021).

high positive How Does Digital Transformation Affect Cross-Regional Collab... presence of small-world characteristics in the cooperative innovation network
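The excerpt does not say which network statistics support the small-world finding. The sketch below shows a common criterion. A small-world network has clustering much higher than a comparable random graph and an average shortest-path length of a similar magnitude. These are summarized as σ = (C / C_rand) / (L / L_rand), often read as small-world when σ > 1. The random baselines use the usual Erdős-Rényi approximations C_rand ≈ ⟨k⟩/n and L_rand ≈ ln n / ln ⟨k⟩. The ring-lattice test graph is hypothetical.

```python
import math
from collections import deque
from itertools import combinations


def ring_lattice(n, k):
    """Hypothetical test graph: each node linked to its k // 2 nearest neighbours on each side."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


def average_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 contribute 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def average_path_length(adj):
    """Mean shortest-path length over reachable ordered pairs (BFS; intended for connected graphs)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


def small_world_index(adj):
    """sigma = (C / C_rand) / (L / L_rand) with Erdos-Renyi approximations for the baselines."""
    n = len(adj)
    mean_degree = sum(len(nbrs) for nbrs in adj.values()) / n  # must exceed 1
    c_rand = mean_degree / n
    l_rand = math.log(n) / math.log(mean_degree)
    return (average_clustering(adj) / c_rand) / (average_path_length(adj) / l_rand)
```

For a 20-node ring lattice with degree 4, clustering is 0.5 and the average path length is 55/19. Adding a single long-range shortcut (for example, linking nodes 0 and 10) shortens paths while barely changing clustering. This is the mechanism behind small-world structure in collaboration networks.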
