Evidence (14055 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome	Positive	Negative	Mixed	Null	Total
Other	758	199	100	900	2007
Governance & Regulation	826	400	191	122	1563
Organizational Efficiency	777	193	124	84	1189
Technology Adoption Rate	635	233	124	97	1098
Research Productivity	422	128	57	336	954
Output Quality	476	179	59	47	761
Decision Quality	328	177	81	47	640
Firm Productivity	435	57	88	20	606
AI Safety & Ethics	218	277	65	33	599
Market Structure	180	170	123	24	502
Task Allocation	213	64	72	33	387
Skill Acquisition	170	61	61	17	309
Innovation Output	203	27	43	18	292
Employment Level	105	54	107	13	281
Fiscal & Macroeconomic	131	69	43	26	276
Consumer Welfare	117	63	42	11	233
Firm Revenue	153	48	26	3	230
Task Completion Time	173	31	8	12	225
Inequality Measures	44	122	49	6	221
Worker Satisfaction	89	65	22	12	188
Error Rate	69	92	10	2	173
Regulatory Compliance	77	69	14	5	165
Automation Exposure	56	56	26	13	154
Training Effectiveness	94	21	13	19	149
Wages & Compensation	77	36	25	6	144
Team Performance	86	17	27	10	141
Developer Productivity	95	17	14	6	133
Job Displacement	12	80	20	1	113
Hiring & Recruitment	52	7	8	3	70
Creative Output	31	18	8	3	61
Skill Obsolescence	5	46	6	1	58
Social Protection	27	16	8	2	53
Labor Share of Income	17	19	17	—	53
Worker Turnover	11	12	—	3	26
Industry	—	—	—	1	1

We analyzed over 1.5M assets and 128K agents in EvoMap.

Descriptive dataset statement in the paper reporting the scope of the empirical analysis (asset and agent counts).

high null result Behind EvoMap: Characterizing a Self-Evolving Agent-to-Agent... dataset_size

We conducted a global large-scale randomized field experiment, delivering customized LLM-generated feedback for over 31,000 arXiv preprints across 150 fields and more than 45,000 researchers from 133 geographic regions.

Statement in paper describing experimental design and scale: randomized field experiment; sample described as >31,000 preprints, >45,000 researchers, 150 fields, 133 regions.

high null result Human-AI Collaboration in Science at Scale: A Global Large-s... n/a (description of experimental sample and coverage)

The study uses 5 million job postings from Beijing covering 2018--2024 as its primary data source.

Stated dataset scope and size in the paper's description of data.

high null result Generative AI impacts on intra-urban inequality and skill pr... dataset size and temporal coverage

We construct a neighborhood-level GenAI Exposure Index by aggregating task-level assessments from five leading large language models.

Methodological construction described in the paper: task-level GenAI suitability assessments from five LLMs applied to tasks in 5 million Beijing job postings (2018--2024), aggregated to the neighborhood level.

high null result Generative AI impacts on intra-urban inequality and skill pr... GenAI Exposure Index (measurement / adoption proxy)
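The GenAI Exposure Index claim above can be read as a three-level averaging of scores. This is a minimal sketch under stated assumptions, not the authors' procedure. It assumes each of the five LLMs returns a suitability score in [0, 1] for each task, and that unweighted means are taken first across models, then across tasks within a posting, and then across postings within a neighborhood. All function and argument names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def exposure_index(task_scores, posting_tasks, posting_neighborhood):
    """Aggregate task-level LLM suitability scores into a neighborhood index.

    Hypothetical inputs (assumptions, not taken from the paper):
      task_scores:          task_id -> list of scores, one per LLM, in [0, 1]
      posting_tasks:        posting_id -> list of task_ids in that posting
      posting_neighborhood: posting_id -> neighborhood_id
    Uses unweighted means at every level (an assumption).
    """
    # 1. Consensus score per task: average over the LLM raters.
    task_consensus = {task: mean(scores) for task, scores in task_scores.items()}

    # 2. Score per posting: average over its tasks (postings with no tasks are skipped).
    posting_score = {
        posting: mean(task_consensus[t] for t in tasks)
        for posting, tasks in posting_tasks.items()
        if tasks
    }

    # 3. Index per neighborhood: average over the postings located there.
    by_neighborhood = defaultdict(list)
    for posting, score in posting_score.items():
        by_neighborhood[posting_neighborhood[posting]].append(score)
    return {hood: mean(scores) for hood, scores in by_neighborhood.items()}
```

Weighted means, for example by task importance or by number of vacancies, would change only steps 2 and 3 and leave the overall structure intact.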

Decision-makers (DMs) are similarly ambiguity-seeking and show similar ambiguity-generated likelihood insensitivity (a-insensitivity), regardless of whether the analyst is a human or a machine learning (ML) model.

Incentivized laboratory experiment in which participants' ambiguity attitudes were measured for forecasts attributed to human and ML analysts; comparison of ambiguity-seeking and a-insensitivity across analyst type reported in the paper (sample size not reported in abstract).

high null result Trusting human versus machine predictions as a decision unde... ambiguity attitude (ambiguity-seeking and a-insensitivity)

There is a significant shortage of India-centric qualitative studies of human-AI collaboration in the IT sector.

Authors' review of peer-reviewed literature and secondary data concluding a gap in India-focused qualitative studies (literature gap analysis). No numeric count provided.

high null result Human–AI Collaboration in the Indian IT Industry: A Qualitat... quantity/coverage of India-centric qualitative research

The faster-than-reality bias found for imagined AI assistance was not observed when participants imagined receiving help from another human.

Empirical comparison reported in the abstract: predictions about receiving help from another human did not show the same faster-than-reality bias as predictions about AI assistance (from the same preregistered study, N = 1237).

high null result Cognitive offloading and the speedup illusion in human-AI in... predicted completion time when imagining help from another human

Actual completion times did not differ between independent and AI-assisted task completion.

Empirical result reported in the abstract comparing measured completion times for independent vs. AI-assisted task completion in the preregistered study (N = 1237).

high null result Cognitive offloading and the speedup illusion in human-AI in... actual completion time

We conducted a preregistered large-scale behavioral study (N = 1237) to characterize mismatches between expectations and reality, with a focus on simple cognitive tasks.

Authors report study design and sample size in the abstract: preregistered behavioral experiment with N = 1237 participants.

high null result Cognitive offloading and the speedup illusion in human-AI in... study design / sample size (methodological claim)

Identification strategy exploits import lumpiness in product categories linked to automation technologies (including robots) to disentangle adoption effects from selection into adoption.

Methodological claim: use of import 'lumpiness' in automation-related product categories as a plausibly exogenous source of adoption variation within a difference-in-differences framework.

high null result Firm size and the automation wage premium identification strategy (exogeneity of adoption variation)

We integrate datasets on trade activities and on firm and worker characteristics for the population of Italian importing firms from 2011 to 2019.

Data integration described in abstract; population-level administrative datasets on trade, firm, and worker characteristics for Italian importing firms covering years 2011–2019.

high null result Firm size and the automation wage premium coverage of datasets (population of Italian importing firms 2011–2019)

The study examines the impact of AI technologies on Uzbekistan's labor market transformation in the context of implementing the national strategy 'Digital Uzbekistan - 2030' and the Strategy for the Development of AI Technologies until 2030.

Framing and scope statement in the paper; analysis based on national strategy documents, statistical data, industry reviews, and regulatory legal documents.

high null result The Impact of Artificial Intelligence During the Transformat... impact of AI in the context of national digital/AI strategies

The degree of persuasiveness of LLM-based narrative explanations did not meaningfully affect decision accuracy compared with a simple AI prediction alone.

Large-scale human behavioral experiment comparing decision accuracy with AI prediction alone versus AI prediction plus narrative explanations of varying persuasiveness (method described in paper).

high null result Human Decision-Making with Persuasive and Narrative LLM Expl... decision accuracy

The system was evaluated on a real 64-GPU A100 testbed emulating three wind-powered sites with Azure production traces.

Experimental evaluation described in abstract: 64-GPU A100 testbed, emulation of three sites, use of Azure production traces.

high null result XWind: A Cross-site Router for Large Language Model Inferenc... experimental evaluation setup

The paper reports experimental comparisons against accelerated baselines.

Statement in experimental section that comparisons to accelerated baselines were performed; specific baselines and results are in the paper.

high null result CHRONOS: Temporally-Aware Multi-Agent Coordination for Evolv... comparative performance vs. baselines

The paper examines the legal implications of overusing export controls.

Statement of the paper's analytic scope and structure (description of content).

high null result Strategic Stalemates: The Paradox of Export Controls in the ... legal implications of export control overuse

AI infrastructure decisions involve trade-offs across physical resource systems including energy, land, water, and labor.

Descriptive claim in the abstract and framing sections; supported by cited prior work on the economic, physical, and moral limits of AI development and by illustrative regional cases.

high null result The AI Infrastructure Triad in Regional Governance: How Regi... resource demands/trade-offs (energy, land, water, labor) associated with AI infr...

The evidence is used illustratively rather than as a full causal test.

Explicit methodological statement in the abstract describing the role of the evidence (coded comments and cases) as illustrative.

high null result The AI Infrastructure Triad in Regional Governance: How Regi... strength/type of empirical inference (illustrative vs causal)

The article interprets stakeholder and regional positions as different ways of prioritizing the triad's frontiers.

Analysis of the coded public comments and illustrative regional cases used to map stakeholder/regional positions onto the Progress/Sustainability/Equity triad.

high null result The AI Infrastructure Triad in Regional Governance: How Regi... mapping of stakeholder/regional positions onto triad priorities

The article draws on a previously coded dataset of 10,068 public comments submitted to the 2025 U.S. AI Action Plan.

Empirical resource used in the paper; dataset size explicitly reported as 10,068 coded public comments.

high null result The AI Infrastructure Triad in Regional Governance: How Regi... stakeholder/public comment content regarding the U.S. AI Action Plan

We sample 50 benchmark games from a 2,000-game generated pool and evaluate nine frontier and open-weight LLMs in a head-to-head tournament with over 36,000 matches.

Empirical setup reported in the paper's abstract: 50 sampled games, 2,000-game pool, nine LLMs, >36,000 head-to-head matches.

high null result GENSTRAT: Toward a Science of Strategic Reasoning in Large L... evaluation sample size / tournament scale (matches run)

We interviewed 24 product-focused individuals at a large technology firm about how AI has impacted their own work, their work within their product team, and their professional interactions.

Qualitative semi-structured interviews with 24 product-focused employees at a single large technology firm; sample size = 24.

high null result Beyond the Org Chart: AI and the Transformation of Invisible... description of sample and data collection

This study is a systematic literature review conducted following PRISMA 2020 guidelines synthesizing peer-reviewed studies published between 2019 and 2025 identified via searches in Scopus, Web of Science and Google Scholar.

Author-stated methodology in the paper: PRISMA 2020 systematic literature review covering 2019–2025 with database searches in Scopus, Web of Science, and Google Scholar.

high null result Yapay Zeka Sistemleri ve İnsan İşbirliğinin Psikolojik, Sosy... scope and coverage of literature search / methodological transparency

This scoping review adhered to the PRISMA-ScR guidelines and encompassed 29 peer-reviewed empirical studies published from 2020 to 2025.

Methods statement in the paper (explicit methodological description).

high null result The influence of AI-Driven Employee Performance Management (... scope and methodological adherence of the review (PRISMA-ScR; n=29 studies)

AI capability is conceptualized/measured as having sub-dimensions including technical infrastructure and management.

Measurement/model description in paper: AI capability broken into sub-dimensions (technical infrastructure, management); supported by survey instrument and measurement model using PLS-SEM on 251 firms.

high null result AI for decision-making: exploring the linkage from AI capabi... construct dimensionality of AI capability

A mixed-method approach combining partial least squares–structural equation modeling (PLS-SEM) and fuzzy-set qualitative comparative analysis (fsQCA) was used to analyze survey data from 251 firms.

Methods statement in paper: authors report using a mixed-method approach (PLS-SEM and fsQCA) on survey data; sample size explicitly stated as 251 firms.

high null result AI for decision-making: exploring the linkage from AI capabi... research methodology / analytic approach

The paper identifies five major research gaps and proposes future research directions in intelligent international marketing.

Author-reported outcome of the paper's systematic review and content analysis (2010–2025); descriptive claim about the paper's contributions.

high null result Research on International Marketing in the Context of Intell... identification of research gaps and proposed directions

Prior productivity does not predict AI use.

Analysis linking prior productivity measures to reported AI adoption in the Census Bureau survey data; finding of no predictive relationship reported.

high null result The Adoption of Industrial AI in America predictive relationship between prior productivity and AI adoption

The analysis uses a mandatory, purpose-designed Census Bureau survey of approximately 28,500 establishments.

Census Bureau mandatory survey specifically designed for this study; sample size stated as approximately 28,500 establishments.

high null result The Adoption of Industrial AI in America survey_sample_size / data source

Large language models are routinely used as automated evaluators (to review code, moderate content, or score outputs), often with many items passing through one conversation.

Background/introductory claim in the paper describing common practice; not an experimental result but contextual motivation.

high null result AMEL: Accumulated Message Effects on LLM Judgments prevalence of LLM use as automated evaluators

Position of biased turns does not matter: five biased turns placed anywhere in a 50-turn history produce the same shift.

Follow-up experiment manipulating the positions of biased turns within 50-turn histories and observing equivalent bias magnitudes.

high null result AMEL: Accumulated Message Effects on LLM Judgments dependence of AMEL on the position of biased messages in conversation history

Bias does not grow with context length: 5 prior turns and 50 produce the same shift (Spearman |r| < 0.01; OLS slope p = 0.80).

Correlation and OLS analysis of bias magnitude versus context-length (number of prior turns) reported in the experiments.

high null result AMEL: Accumulated Message Effects on LLM Judgments relationship between context length and magnitude of AMEL
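Both checks reported for context length, a Spearman rank correlation and an OLS slope of bias magnitude on the number of prior turns, fit in a few lines of Python. This is a pure-Python sketch that assumes one bias measurement per run, paired with that run's number of prior turns. It returns the OLS t-statistic rather than a p-value. In practice `scipy.stats.spearmanr` computes the rank correlation directly, and a two-sided p-value for the slope follows from a t distribution with n − 2 degrees of freedom. The function names are illustrative.

```python
import math


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation (assumes neither input is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def ols_slope(x, y):
    """Slope b, its standard error, and t = b / se for y = a + b*x + e."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sxx
    intercept = my - b * mx
    rss = sum((c - intercept - b * a) ** 2 for a, c in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    t = b / se if se > 0 else math.inf  # perfect fit: t is unbounded
    return b, se, t
```

Calling `spearman(turns, shift)` and `ols_slope(turns, shift)` on per-run data yields the quantities behind a statement such as |r| < 0.01. A near-zero rank correlation combined with a slope indistinguishable from zero indicates that the shift does not grow with context length.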

We conducted 75,898 API calls to 11 models from 4 providers (OpenAI, Anthropic, Google, and four open-source models).

Descriptive statement of the experimental scope reported in the paper: total number of API calls and models/providers tested.

high null result AMEL: Accumulated Message Effects on LLM Judgments experimental sample size / scope (number of API calls and models)

When execution is standardized on a cheaper Gemini Flash scaffold (separating planning from execution), a pooled 32-game planner bake-off is consistent with near-equality among planners (p ≈ 0.821).

Empirical experiment: 32-game planner-only comparison where execution was standardized; reported p-value ≈ 0.821 indicating no significant difference among planners.

high null result Evaluating Large Language Models as Live Strategic Agents: P... planner performance equality (pooled test)

We study this setting in a timed multi-phase Risk environment with explicit victory targets and repeated planning and execution cycles.

Methodological description of the experimental environment used in the paper (timed multi-phase Risk environment with explicit victory targets and repeated cycles).

high null result Evaluating Large Language Models as Live Strategic Agents: P... experimental_environment_description

Identification of effects uses within-firm variation with firm and city-by-year fixed effects.

Identification strategy reported in abstract: within-firm variation under firm and city-by-year fixed effects.

high null result Toward Sustainable Workforce Development: How AI Reshapes Sk... identification approach / econometric controls
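Within-firm identification with firm and city-by-year fixed effects amounts to regressing demeaned outcomes on demeaned exposure. Because the two sets of effects are not nested, one common way to partial them out is to alternate group demeaning until the values stop changing (the method of alternating projections). The sketch below assumes a single regressor stored in pure-Python lists. Variable names are hypothetical, and it omits standard errors, which would normally be clustered.

```python
from collections import defaultdict


def demean(values, groups):
    """Subtract each group's mean from its members (one fixed-effect projection)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def absorb(values, firm, city_year, tol=1e-12, max_iter=10_000):
    """Partial out firm and city-by-year effects by alternating projections."""
    current = list(values)
    for _ in range(max_iter):
        updated = demean(demean(current, firm), city_year)
        if max(abs(a - b) for a, b in zip(updated, current)) < tol:
            return updated
        current = updated
    return current  # not converged within max_iter; a caller may prefer to raise


def within_slope(y, x, firm, city_year):
    """Slope of y on x after absorbing firm and city-by-year fixed effects."""
    y_tilde = absorb(y, firm, city_year)
    x_tilde = absorb(x, firm, city_year)
    return sum(a * b for a, b in zip(x_tilde, y_tilde)) / sum(a * a for a in x_tilde)
```

By the Frisch-Waugh-Lovell theorem, the slope on the doubly demeaned data equals the coefficient from the full regression with both sets of dummies included.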

The study measures four skill-category demand shares and their within-category importance from job-description text.

Methodological statement in abstract: measurement of four skill-category demand shares and within-category importance via job-description text.

high null result Toward Sustainable Workforce Development: How AI Reshapes Sk... skill-category demand shares and within-category importance

AI exposure is decomposed into displacement and augmentation components based on task routineness.

Methodological claim in abstract: decomposition of exposure into displacement and augmentation using a routineness criterion for tasks.

high null result Toward Sustainable Workforce Development: How AI Reshapes Sk... decomposed AI exposure measures (displacement vs augmentation)
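The displacement/augmentation split described above can be illustrated with a hard routineness threshold. The rule is a hypothetical choice made for this sketch: tasks at or above the threshold count toward displacement and the rest count toward augmentation. Exposure and routineness are both assumed to lie in [0, 1], and each task carries a weight such as its importance. The paper's actual routineness criterion may differ.

```python
def decompose_exposure(tasks, threshold=0.5):
    """Split weighted AI exposure into displacement and augmentation parts.

    tasks: iterable of (exposure, routineness, weight) tuples, with exposure
    and routineness assumed in [0, 1] and positive weights (hypothetical).
    Rule (an assumption for this sketch): routineness >= threshold counts as
    displacement, otherwise as augmentation.
    """
    tasks = list(tasks)
    total_weight = sum(w for _, _, w in tasks)
    if total_weight <= 0:
        raise ValueError("task weights must sum to a positive number")
    displacement = sum(e * w for e, r, w in tasks if r >= threshold) / total_weight
    augmentation = sum(e * w for e, r, w in tasks if r < threshold) / total_weight
    return displacement, augmentation
```

Because every task lands in exactly one bucket, the two components sum to the task-weighted total exposure, which keeps the decomposition additive.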

The authors construct firm-by-year potential AI exposure via semantic matching between AI patent texts and detailed occupation task descriptions.

Method description in abstract: semantic matching of AI patent texts to occupation task descriptions to build firm-by-year exposure.

high null result Toward Sustainable Workforce Development: How AI Reshapes Sk... firm-by-year potential AI exposure (constructed measure)
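Semantic matching of this kind is often done with text embeddings. The sketch below substitutes bag-of-words cosine similarity so that it runs without external libraries. It makes three hypothetical assumptions: a task's exposure is its maximum similarity to any AI patent filed up to the given year, an occupation's exposure is the mean over its tasks, and a firm's exposure in a year is the employment-share-weighted mean over its occupations. All names are illustrative.

```python
import math
import re
from collections import Counter


def bow(text):
    """Lower-cased bag-of-words term counts (a crude stand-in for embeddings)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity of two term-count vectors; 0.0 if either is empty."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def task_exposure(task_text, patent_texts):
    """Highest similarity between a task description and any AI patent text."""
    task_vec = bow(task_text)
    return max((cosine(task_vec, bow(p)) for p in patent_texts), default=0.0)


def firm_year_exposure(occ_tasks, occ_shares, patents_by_year, year):
    """Employment-share-weighted exposure using patents filed up to `year`.

    occ_tasks:       occupation -> list of task descriptions
    occ_shares:      occupation -> the firm's employment share in `year`
    patents_by_year: filing year -> list of AI patent texts
    """
    patents = [p for y, texts in patents_by_year.items() if y <= year for p in texts]
    occ_exposure = {
        occ: sum(task_exposure(t, patents) for t in tasks) / len(tasks)
        for occ, tasks in occ_tasks.items()
    }
    return sum(share * occ_exposure[occ] for occ, share in occ_shares.items())
```

Replacing `bow` and `cosine` with a sentence-embedding model would leave the aggregation steps unchanged. The year filter is what lets the measure vary at the firm-by-year level even when occupation shares are held fixed.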

The study uses approximately 67 million online job postings from two major Chinese recruitment platforms (2019–2024).

Statement in paper abstract describing dataset size and source (job postings from two major Chinese recruitment platforms over 2019–2024).

high null result Toward Sustainable Workforce Development: How AI Reshapes Sk... dataset size and coverage (number of job postings, platforms, years)

The study extends the Technology Acceptance Model (TAM), Dynamic Capabilities Theory, and the Technology-Organisation-Environment (TOE) framework into the qualitative, emerging-economy entrepreneurial context.

Authors' stated theoretical contribution based on mapping thematic results to TAM, Dynamic Capabilities, and TOE frameworks within analysis and discussion sections.

high null result Navigating the Intelligence Frontier: AI Adoption as a Succe... theoretical contribution / framework extension

This study employed an interpretivist, qualitative research design using sixteen in-depth semi-structured interviews with entrepreneurs across fintech, edtech, health-tech, logistics, retail, and SaaS in Delhi/NCR, India, and used Braun & Clarke's (2006) six-phase thematic analysis framework.

Explicit methodological description in the paper: interpretivist qualitative design; n=16 in-depth semi-structured interviews across specified sectors in Delhi/NCR; thematic analysis following Braun & Clarke (2006).

high null result Navigating the Intelligence Frontier: AI Adoption as a Succe... research design / data collection (qualitative interviews)

The study takes a qualitative approach based on 17 expert interviews with startup employees.

Methods statement in paper specifying qualitative study design and sample size of 17 interviews.

high null result From Prompt To Process: Qualitative Insights On How Genai Us... study methodology and sample

Process-related insights into how GenAI transforms startups remain limited in the existing literature.

Authors' literature positioning / gap statement in paper (no empirical metric provided).

high null result From Prompt To Process: Qualitative Insights On How Genai Us... availability of process-related insights in literature

The paper's findings are based on three pre-registered user studies with a combined sample size of N = 2691.

Statement in the paper's abstract reporting three pre-registered user studies and combined N = 2691.

high null result The efficiency-gain illusion: People underestimate the rate ... study_sample_description

Light AI users perform similarly to matched users who do not use AI.

Controlled logical reasoning experiment with on-demand AI assistance comparing light AI users to matched non-users (sample size not stated in abstract).

high null result The Impact of AI Usage and Informativeness on Skill Developm... post-AI performance / skill development

We map that space through six interconnected elements: sociotechnical context, decision-making frameworks, human decision participants, AI capabilities, interaction, and holistic evaluation.

The paper's proposed analytical/framework contribution listing six elements (descriptive of the authors' mapping work).

high null result Addressing the Synergy Gap: The Six Elements of the Design S... n/a (framework description)

Most current work treats human-AI combination as an engineering problem and concentrates on interpretability, trust calibration, or interface design.

Authors' characterization of the existing literature and dominant research foci (qualitative literature assessment; no quantitative breakdown provided).

high null result Addressing the Synergy Gap: The Six Elements of the Design S... research focus/themes in human-AI combination literature

We call this persistent shortfall the 'synergy gap.'

Terminology/definition introduced by the authors in the paper (conceptual claim, not an empirical finding).

high null result Addressing the Synergy Gap: The Six Elements of the Design S... n/a (terminology defining a phenomenon)

Agentic payments are distinct from traditional automated systems because they emphasise autonomy, contextual reasoning and adaptability.

Conceptual distinction asserted in the abstract (comparative analysis between agentic payments and traditional automated systems).

high null result AI Agents in Payments: Applications, Risks and Regulations system characteristics (autonomy, contextual reasoning, adaptability)
