Evidence (13870 claims)
Adoption: 8467 claims
Productivity: 7558 claims
Governance: 6805 claims
Human-AI Collaboration: 6363 claims
Org Design: 4132 claims
Innovation: 4065 claims
Labor Markets: 3526 claims
Skills & Training: 2945 claims
Inequality: 2066 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 749 | 196 | 98 | 892 | 1984 |
| Governance & Regulation | 817 | 394 | 188 | 121 | 1544 |
| Organizational Efficiency | 771 | 189 | 124 | 83 | 1177 |
| Technology Adoption Rate | 627 | 233 | 123 | 96 | 1088 |
| Research Productivity | 411 | 123 | 56 | 332 | 933 |
| Output Quality | 467 | 178 | 59 | 47 | 751 |
| Decision Quality | 320 | 174 | 75 | 42 | 618 |
| Firm Productivity | 435 | 55 | 88 | 20 | 604 |
| AI Safety & Ethics | 214 | 276 | 65 | 33 | 593 |
| Market Structure | 178 | 167 | 122 | 24 | 496 |
| Task Allocation | 207 | 64 | 71 | 32 | 379 |
| Skill Acquisition | 165 | 59 | 60 | 17 | 301 |
| Innovation Output | 203 | 27 | 43 | 18 | 292 |
| Employment Level | 105 | 52 | 107 | 13 | 279 |
| Fiscal & Macroeconomic | 131 | 69 | 43 | 26 | 276 |
| Consumer Welfare | 116 | 63 | 42 | 11 | 232 |
| Firm Revenue | 150 | 48 | 26 | 3 | 227 |
| Inequality Measures | 44 | 122 | 49 | 6 | 221 |
| Task Completion Time | 169 | 29 | 8 | 12 | 219 |
| Worker Satisfaction | 89 | 63 | 20 | 12 | 184 |
| Error Rate | 69 | 92 | 10 | 2 | 173 |
| Regulatory Compliance | 76 | 68 | 14 | 5 | 163 |
| Training Effectiveness | 93 | 21 | 13 | 19 | 148 |
| Wages & Compensation | 77 | 36 | 25 | 6 | 144 |
| Automation Exposure | 51 | 54 | 22 | 12 | 142 |
| Team Performance | 86 | 17 | 27 | 9 | 140 |
| Developer Productivity | 94 | 17 | 14 | 6 | 132 |
| Job Displacement | 12 | 80 | 20 | 1 | 113 |
| Hiring & Recruitment | 51 | 7 | 8 | 3 | 69 |
| Creative Output | 31 | 17 | 7 | 3 | 59 |
| Skill Obsolescence | 5 | 46 | 6 | 1 | 58 |
| Social Protection | 27 | 16 | 8 | 2 | 53 |
| Labor Share of Income | 17 | 17 | 17 | — | 51 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
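
When reading the matrix, it can help to convert counts into shares and to check whether the four direction columns account for each row's total. The following is a minimal Python sketch using four rows copied from the table above; the function name `summarize` and the dictionary keys are illustrative choices, not part of the source. For some rows the listed columns sum to less than the reported Total, and the source does not explain that gap.

```python
# Rows copied from the evidence matrix:
# (outcome, positive, negative, mixed, null, reported total)
ROWS = [
    ("Other", 749, 196, 98, 892, 1984),
    ("Governance & Regulation", 817, 394, 188, 121, 1544),
    ("Output Quality", 467, 178, 59, 47, 751),
    ("Job Displacement", 12, 80, 20, 1, 113),
]


def summarize(row):
    """Return the positive share of the reported total and the number of
    claims not covered by the four listed direction columns."""
    outcome, pos, neg, mixed, null, total = row
    listed = pos + neg + mixed + null
    return {
        "outcome": outcome,
        "positive_share": pos / total,
        "unlisted": total - listed,
    }


for row in ROWS:
    s = summarize(row)
    print(
        f"{s['outcome']}: {s['positive_share']:.1%} positive, "
        f"{s['unlisted']} claims outside the four columns"
    )
```

Shares are computed against the reported Total rather than the column sum, so a row with unlisted claims will have direction shares that add up to less than 100%.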
OneSearch-V2 effectively mitigates common search system issues such as information bubbles and long-tail sparsity, without incurring additional inference costs or serving latency.
Author claim in the paper stating mitigation of these issues and no added inference/latency costs; no quantitative measures, benchmarks, or latency numbers provided in the excerpt.
Manual evaluation confirms a +1.37% gain in query-item relevance.
Reported manual evaluation metric in the paper; no sample size or annotation protocol provided in the excerpt.
Manual evaluation confirms a gain in search experience quality, with a +1.65% increase in page good rate.
Reported manual evaluation metric in the paper; no sample size or annotation protocol provided in the excerpt.
OneSearch-V2 increases order volume by +2.11% in online A/B tests.
Reported online A/B test result in the paper; no sample size, test duration, or statistical significance reported in the excerpt.
OneSearch-V2 increases buyer conversion rate by +3.05% in online A/B tests.
Reported online A/B test result in the paper; no sample size, test duration, or statistical significance reported in the excerpt.
OneSearch-V2 increases item CTR by +3.98% in online A/B tests.
Reported online A/B test result in the paper; no sample size, test duration, or statistical significance reported in the excerpt.
OneSearch, as a representative industrial-scale deployed generative search framework, has brought significant commercial and operational benefits.
Author assertion describing OneSearch as industrial-scale and commercially/operationally beneficial; no supporting numerical evidence or sample size reported in the excerpt.
Generative Retrieval (GR) offers advantages over multi-stage cascaded architectures such as end-to-end joint optimization and high computational efficiency.
Statement in paper positioning GR as a promising paradigm and listing these advantages; no quantitative study or sample size reported in the excerpt.
The framework aims to support more comparable benchmarks and cumulative research on human-AI readiness, advancing safer and more accountable human-AI collaboration.
Stated aims and intended impact in paper; aspirational/conceptual rather than empirically demonstrated in excerpt.
Operationalizing evaluation through interaction traces rather than model properties or self-reported trust enables deployment-relevant assessment of calibration, error recovery, and governance.
Methodological claim/proposed approach in paper; presented as enabling assessment but no empirical evaluation reported in excerpt.
The taxonomy and metrics are connected to the Understand-Control-Improve (U-C-I) lifecycle of human-AI onboarding and collaboration.
Conceptual mapping described in paper; no empirical tests or sample reported in excerpt.
We introduce a four part taxonomy of evaluation metrics spanning outcomes, reliance behavior, safety signals, and learning over time.
Explicit methodological claim in paper announcing a taxonomy; described as a contribution rather than empirically tested in excerpt.
This paper proposes a measurement framework for evaluating human-AI decision-making centered on team readiness.
Methodological contribution presented in paper; conceptual framework proposed (no empirical validation reported in excerpt).
Artificial intelligence (AI) systems are deployed as collaborators in human decision-making.
Statement in paper (conceptual/observational claim); no empirical sample or method provided in excerpt.
Late disclosure of AI involvement improved affective engagement for AI-enhanced content.
Reported experimental result in the abstract from the two online studies (study 1: n = 325; study 2: n = 371) manipulating disclosure timing (early vs. late).
Automation in Japanese manufacturing increased even during periods of slow productivity growth.
Empirical finding from applying the framework to industry-level data in Japanese manufacturing; comparison of inferred automation trends with observed productivity growth periods (exact sample/time not provided in the summary).
Applying the framework to Japanese manufacturing industries shows that automation increased through capital deepening.
Empirical application of the theoretical framework to Japanese manufacturing industries (industry-level analysis); estimation/inference using industry macro observables. (Paper states result; exact sample size/time span not provided in the summary.)
The model provides a transparent mapping from standard macroeconomic observables (capital-labor ratio, output per worker, elasticity of substitution) into the degree of automation, allowing automation to be measured without relying on technology-specific indicators.
Theoretical mapping derived from the CES structure that links observable macro variables to the endogenous degree of automation; methodological claim about inference procedure.
Aggregating task-level decisions generates a CES production function in which the economy-wide degree of automation emerges endogenously.
Analytical derivation in the paper: aggregation of task-level adoption decisions yields a CES aggregate production function with endogenous automation parameter.
The degree of automation is defined as the share of tasks performed by capital rather than labor.
Explicit model definition provided in the paper (conceptual/theoretical definition).
The degree of automation in the aggregate economy emerges endogenously as an equilibrium outcome and can be inferred from standard macroeconomic data.
Theoretical development in a task-based production framework with endogenous technology adoption; mapping from model to observable macro variables (capital-labor ratio, output per worker, elasticity of substitution).
The results of this regional research outline a multi-dimensional policy roadmap that dives deep into the region’s current capabilities and the hurdles it faces in catching up with the AI revolution from a governance and policy perspective, presenting them in a practical framework for public sector leaders.
Report summary claiming that the study's results produce a comprehensive roadmap and practical framework (content description).
This executive report provides a roadmap for establishing an AI governance infrastructure through a set of strategic policy recommendations across seven key pillars.
Document assertion describing the content and structure of the report (authors' deliverable).
The reality of limited AI governance capacity calls for a series of policy interventions at both local and regional levels to empower the AI ecosystem in the Arab region.
Authors' policy recommendation derived from the regional study and synthesis of findings.
A governance model linking 'trustworthy AI' practices to competitive advantage yields reduced uncertainty, faster deployment cycles, and higher stakeholder trust.
Central claim of the paper tying the proposed AIGSF to business benefits; supported by conceptual linkage and illustrative examples rather than quantified empirical evidence or controlled evaluation.
Case illustrations across hiring, credit, consumer services, and generative AI draw lessons on controls such as model documentation, algorithmic audits, impact assessments, and human-in-the-loop oversight.
Paper includes qualitative case illustrations in the listed domains to demonstrate governance controls; these are presented as examples and lessons rather than as systematic empirical studies (no sample sizes reported).
The paper develops an AI Governance Strategic Framework (AIGSF) and an implementation roadmap that connect ethical accountability, regulatory readiness, cybersecurity resilience, and performance outcomes.
Paper contribution described as an integrative conceptual framework and roadmap; supported by theoretical grounding and illustrative cases rather than empirical validation; no sample size provided.
AI governance should be treated as a strategic governance function—anchored in board oversight and enterprise risk management—rather than a narrow technical or compliance task.
Central normative recommendation and thesis of the paper; derived from an integrative conceptual framework grounded in corporate governance theory, ERM, and emerging regulation. No empirical testing or sample reported.
AI has moved from a peripheral digital capability to a central driver of corporate strategy, reshaping decision-making, customer engagement, operations, and risk exposure.
Statement presented in the paper's introduction and motivation; supported by integrative conceptual design and literature grounding (theory and descriptive citations). No empirical sample or quantitative analysis reported.
A policy of 20% mandatory practice preserves 92% more capability than the simulation baseline (baseline includes a 5% background AI-failure rate).
Simulation comparing baseline (5% background AI-failure rate) to a counterfactual with 20% mandatory practice; reported 92% relative preservation of capability.
The model predicts that periodic AI failures improve human capability 2.7-fold (relative improvement reported in simulations).
Simulation experiments comparing scenarios with/without periodic AI failures; reported fold-change in capability of 2.7×.
Validated against 15 countries' PISA data (102 points), the model achieves R^2 = 0.946 with 3 parameters and attains the lowest BIC among compared specifications.
Empirical validation using PISA dataset covering 15 countries and 102 data points; reported fit statistics (R^2, number of parameters, BIC).
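
To put the reported fit statistics in context, the sketch below computes an adjusted R^2 from the reported values (R^2 = 0.946, 102 data points, 3 parameters) and shows how a BIC comparison penalizes extra parameters. It rests on assumptions not stated in the excerpt: the 3 parameters are treated as regressors excluding an intercept, and BIC is written in its common Gaussian-error form, n·ln(RSS/n) + k·ln(n). The paper's exact BIC definition and residuals are not given, so the RSS value used below is a placeholder.

```python
import math


def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k fitted parameters
    (assumption: k excludes the intercept)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def gaussian_bic(rss, n, k):
    """BIC under i.i.d. Gaussian errors, up to an additive constant.
    Lower is better; each extra parameter costs ln(n)."""
    return n * math.log(rss / n) + k * math.log(n)


n, k, r2 = 102, 3, 0.946          # values reported in the claim above
print(round(adjusted_r2(r2, n, k), 3))

# Placeholder RSS (hypothetical): with equal fit, the model with more
# parameters gets the higher (worse) BIC.
rss = 10.0
print(gaussian_bic(rss, n, 3) < gaussian_bic(rss, n, 5))
```

With 102 points, each additional parameter adds ln(102) ≈ 4.62 to the BIC, so a richer specification has to reduce the residual sum of squares enough to offset that penalty.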
The model was calibrated to four domains: education, medicine, navigation, and aviation.
Model calibration procedures applied separately to four named domains reported in the paper.
We present a two-variable dynamical systems model coupling capability (H) and delegation (D), grounded in three axioms: learning requires capability, learning requires practice, and disuse causes forgetting.
Model specification and theoretical construction described in the paper (two-variable dynamical system; three axioms).
These results demonstrate a practical path toward high-precision, low-latency text-to-SQL applications using domain-specialized, self-hosted language models in large-scale production environments.
Conclusion drawn by the authors based on their implementation, token reduction, and reported accuracy/latency-related claims; generalization to large-scale production is asserted but not supported by detailed production deployment metrics in the excerpt.
The resulting system achieves 98.4% execution success and 92.5% semantic accuracy, substantially outperforming a prompt-engineered baseline using Google's Gemini Flash 2.0 (95.6% execution, 89.4% semantic accuracy).
Reported empirical evaluation comparing the authors' system to a prompt-engineered baseline (Gemini Flash 2.0) with explicit performance percentages for execution success and semantic accuracy; no sample size, test set composition, statistical significance, or evaluation protocol provided in the excerpt.
The approach replaces costly external API calls with efficient local inference.
System design claim: the model is self-hosted and performs local inference instead of using external API-based LLM calls; no cost accounting or latency benchmarks provided in the excerpt.
This reduces input tokens by over 99%, from a 17k-token baseline to fewer than 100.
Reported measurement comparing input token counts before and after applying their approach (explicit numerical baseline and resulting counts provided); no sample size or distribution of token counts reported.
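
The arithmetic behind the token-reduction claim, and the accuracy gaps reported for the same system, can be checked in a few lines. This sketch assumes "17k" means 17,000 tokens and uses 100 as the upper bound for the reduced prompt, so the computed reduction is a lower bound; the helper names are illustrative.

```python
def relative_reduction(before, after):
    """Fractional reduction when going from `before` to `after`."""
    return (before - after) / before


def point_gap(system_pct, baseline_pct):
    """Difference in percentage points, rounded to one decimal."""
    return round(system_pct - baseline_pct, 1)


# Assumption: "17k" = 17,000 tokens; "fewer than 100" bounded above by 100.
reduction = relative_reduction(17_000, 100)
print(f"Input-token reduction: at least {reduction:.2%}")

# Percentages as reported for the fine-tuned system vs. the Gemini Flash 2.0 baseline.
print("Execution success gap (points):", point_gap(98.4, 95.6))
print("Semantic accuracy gap (points):", point_gap(92.5, 89.4))
```

Under these assumptions the reduction comes out above 99.4%, consistent with the "over 99%" figure, and the reported gaps over the baseline are 2.8 and 3.1 percentage points.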
A novel two-phase supervised fine-tuning approach enables the model to internalize the entire database schema, eliminating the need for long-context prompts.
Methodological description (two-phase supervised fine-tuning) and claim that this internalization removes reliance on long-context prompts; no detailed experimental protocol or sample size provided in the excerpt.
We present a specialized, self-hosted 8B-parameter model designed for a conversational bot in CriQ, a sister app to Dream11 that answers user queries about cricket statistics.
Stated implementation detail in the paper describing the model architecture and deployment target (CriQ conversational bot). No experimental sample size reported for this statement.
Legal professionals, courts, and regulators should replace the outdated 'black box' mental model with verification protocols based on how these systems actually fail.
Policy recommendation stated in the abstract based on the paper's analysis; no trial or deployment evidence of such protocols provided in the excerpt.
The adoption of generative AI across commercial and legal professions offers dramatic efficiency gains.
Asserted in the paper's introduction/abstract; no empirical data, sample, or quantitative study reported in the excerpt.
Those extended-model equilibria also show increasing concentration consistent with power-law-like distributions (i.e., winner-take-most / superstar effects).
Theoretical model combining quality heterogeneity and reinforcement dynamics that yields equilibrium distributions with heavy tails; argument and formalization presented in the paper; no empirical testing reported.
Even as the number of producers increases and average attention per producer falls, total output expands (production scales elastically).
Same formal theoretical model (analytical result): production scales elastically in the model despite finite attention; no empirical validation provided.
Mechanisms identified — network structure evolution and increased relational embeddedness — contribute to a broader understanding of how digital transformation shapes innovation dynamics across geographical boundaries in a globalized knowledge economy.
Synthesis of empirical network evolution results and mediation/structural analyses from the 2011–2021 dataset of digital transformation indicators and patent collaboration networks among cities and firms.
These results provide empirical evidence from a major emerging economy (China) that can offer insights to inform policies and strategies in other regions undergoing digital transition.
Generalization claim based on empirical findings from the 2011–2021 analysis of A-share listed companies' digital transformation and patent collaboration patterns in China.
When the volume of digital patent applications surpasses a certain threshold, the positive effect of digital transformation on the quality of cross-regional collaborative innovation accelerates (nonlinear threshold effect).
Threshold regression / nonlinear analysis relating counts of digital patent applications to the marginal effect of digital transformation on collaborative innovation quality, using 2011–2021 patent and digitalization data from A-share listed firms.
Advancement of digital transformation positively contributes to both the quality and the quantity of cross-regional cooperative innovation.
Empirical econometric analysis (panel regressions) linking measures of corporate/urban digital transformation to indicators of cross-regional cooperative innovation quality and counts, using A-share listed companies' digital transformation indicators and patent collaboration data, 2011–2021.
China’s urban collaborative innovation network demonstrates a notable quadrilateral spatial structure and has evolved toward a multicenter pattern over time.
Spatio-temporal network analysis based on the same 2011–2021 dataset of digital transformation indicators and patent/co-patent links among cities inferred from A-share listed companies' patent data.
The cooperative innovation network exhibits pronounced small-world characteristics.
Network analysis of cross-regional collaborative innovation using digital transformation and patent data from A-share listed companies on the Shanghai and Shenzhen stock exchanges (2011–2021).