Evidence (14055 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome	Positive	Negative	Mixed	Null	Total
Other	758	199	100	900	2007
Governance & Regulation	826	400	191	122	1563
Organizational Efficiency	777	193	124	84	1189
Technology Adoption Rate	635	233	124	97	1098
Research Productivity	422	128	57	336	954
Output Quality	476	179	59	47	761
Decision Quality	328	177	81	47	640
Firm Productivity	435	57	88	20	606
AI Safety & Ethics	218	277	65	33	599
Market Structure	180	170	123	24	502
Task Allocation	213	64	72	33	387
Skill Acquisition	170	61	61	17	309
Innovation Output	203	27	43	18	292
Employment Level	105	54	107	13	281
Fiscal & Macroeconomic	131	69	43	26	276
Consumer Welfare	117	63	42	11	233
Firm Revenue	153	48	26	3	230
Task Completion Time	173	31	8	12	225
Inequality Measures	44	122	49	6	221
Worker Satisfaction	89	65	22	12	188
Error Rate	69	92	10	2	173
Regulatory Compliance	77	69	14	5	165
Automation Exposure	56	56	26	13	154
Training Effectiveness	94	21	13	19	149
Wages & Compensation	77	36	25	6	144
Team Performance	86	17	27	10	141
Developer Productivity	95	17	14	6	133
Job Displacement	12	80	20	1	113
Hiring & Recruitment	52	7	8	3	70
Creative Output	31	18	8	3	61
Skill Obsolescence	5	46	6	1	58
Social Protection	27	16	8	2	53
Labor Share of Income	17	19	17	—	53
Worker Turnover	11	12	—	3	26
Industry	—	—	—	1	1
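The Total column does not always equal the sum of the four direction columns. For "Other", 758 + 199 + 100 + 900 = 1,957, while Total reads 2,007, and the page does not say what the difference represents. The Python sketch below is a hypothetical cross-check, not part of the site. It re-adds every row, reads "—" as 0 (an assumption), and confirms that the Total column sums to the 14,055 claims in the page heading.

```python
# Rows transcribed from the Evidence Matrix; "—" is read as 0 (an assumption:
# the page does not say whether a dash means zero or missing).
# Columns: outcome, positive, negative, mixed, null, total
ROWS = [
    ("Other", 758, 199, 100, 900, 2007),
    ("Governance & Regulation", 826, 400, 191, 122, 1563),
    ("Organizational Efficiency", 777, 193, 124, 84, 1189),
    ("Technology Adoption Rate", 635, 233, 124, 97, 1098),
    ("Research Productivity", 422, 128, 57, 336, 954),
    ("Output Quality", 476, 179, 59, 47, 761),
    ("Decision Quality", 328, 177, 81, 47, 640),
    ("Firm Productivity", 435, 57, 88, 20, 606),
    ("AI Safety & Ethics", 218, 277, 65, 33, 599),
    ("Market Structure", 180, 170, 123, 24, 502),
    ("Task Allocation", 213, 64, 72, 33, 387),
    ("Skill Acquisition", 170, 61, 61, 17, 309),
    ("Innovation Output", 203, 27, 43, 18, 292),
    ("Employment Level", 105, 54, 107, 13, 281),
    ("Fiscal & Macroeconomic", 131, 69, 43, 26, 276),
    ("Consumer Welfare", 117, 63, 42, 11, 233),
    ("Firm Revenue", 153, 48, 26, 3, 230),
    ("Task Completion Time", 173, 31, 8, 12, 225),
    ("Inequality Measures", 44, 122, 49, 6, 221),
    ("Worker Satisfaction", 89, 65, 22, 12, 188),
    ("Error Rate", 69, 92, 10, 2, 173),
    ("Regulatory Compliance", 77, 69, 14, 5, 165),
    ("Automation Exposure", 56, 56, 26, 13, 154),
    ("Training Effectiveness", 94, 21, 13, 19, 149),
    ("Wages & Compensation", 77, 36, 25, 6, 144),
    ("Team Performance", 86, 17, 27, 10, 141),
    ("Developer Productivity", 95, 17, 14, 6, 133),
    ("Job Displacement", 12, 80, 20, 1, 113),
    ("Hiring & Recruitment", 52, 7, 8, 3, 70),
    ("Creative Output", 31, 18, 8, 3, 61),
    ("Skill Obsolescence", 5, 46, 6, 1, 58),
    ("Social Protection", 27, 16, 8, 2, 53),
    ("Labor Share of Income", 17, 19, 17, 0, 53),
    ("Worker Turnover", 11, 12, 0, 3, 26),
    ("Industry", 0, 0, 0, 1, 1),
]


def row_gaps(rows):
    """Map each outcome whose Total differs from its direction sum to the difference."""
    gaps = {}
    for outcome, pos, neg, mixed, null, total in rows:
        diff = total - (pos + neg + mixed + null)
        if diff != 0:
            gaps[outcome] = diff
    return gaps


GRAND_TOTAL = sum(row[-1] for row in ROWS)  # should match the heading's claim count
GAPS = row_gaps(ROWS)

print("Sum of Total column:", GRAND_TOTAL)
for outcome, diff in GAPS.items():
    print(f"{outcome}: Total exceeds direction sum by {diff}")
```

Run as written, it lists 19 outcomes whose Total exceeds the direction sum, by 153 claims in all. The remaining rows add up exactly.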

Total (aggregate) unemployment is not statistically significant in explaining sustainable development, indicating that aggregate measures mask critical distributional differences across skill groups.

ARDL (autoregressive distributed lag) estimation results reported in the paper show an insignificant coefficient for total unemployment; the discussion emphasizes distributional masking.

high null result Artificial Intelligence, Disaggregated Unemployment, And Sus... sustainable development (effect of total unemployment)

This study constructs a comprehensive evaluation system of urban ecological resilience from three dimensions: potential, elasticity, and stability.

Methodological description in the paper: authors state they constructed a composite resilience evaluation system composed of three specified dimensions for prefecture-level cities.

high null result The impact of artificial intelligence on urban ecological re... urban ecological resilience index (constructed measure)

AI-assisted feedback does not significantly change time per character (the time cost per unit of feedback).

Time per character was measured in the randomized field experiment; the authors report no significant change in time per character associated with the AI-assisted drafts. Student-level/completion-level data from the experiment (n=88); 11 TAs.

high null result AI Assistance for Discretionary Work: Increasing Feedback Pr... time per character (effort per unit of feedback)

AI-assisted feedback does not negatively affect student usefulness ratings.

Measured student ratings of usefulness in the randomized field experiment; authors report no negative effect of the treatment on these ratings (no significant decrease reported). Student-level sample n=88; 11 TAs.

high null result AI Assistance for Discretionary Work: Increasing Feedback Pr... student usefulness ratings of feedback

Two-stage field experiments in healthcare prescription messaging encompassed 693,139 patient visits in total.

Paper statement of total sample size across Stage 1 and Stage 2.

high null result Beyond One-shot: AI Agents for Learning in Field Experiments total experimental sample size

Stage 2 (Tool-Augmented Agentic AI) autonomously extracted principles from Stage 1 data and generated 17 new message variants tested on 248,448 patient visits.

Study design and reported results from Stage 2 of the two-stage field experiment described in the paper.

high null result Beyond One-shot: AI Agents for Learning in Field Experiments experiment sample allocation (AI-generated variants and patient visits)

Stage 1 (Human + Chatbot) produced 13 message variants and was tested on 444,691 patient visits.

Study design details reported in the paper describing the two-stage field experiment.

high null result Beyond One-shot: AI Agents for Learning in Field Experiments experiment sample allocation (number of message variants and patient visits)
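The three sample-size entries above are mutually consistent. This minimal Python check adds the per-stage patient visits and message variants. The variable names are illustrative, and the variant count only sums the variants each stage is reported to have produced.

```python
# Per-stage figures as reported in the Stage 1 and Stage 2 entries.
# "variants" counts variants produced in each stage (13 by Human + Chatbot,
# 17 new ones by the agent); whether Stage 2 re-tested Stage 1 variants is not stated.
STAGES = {
    "Stage 1 (Human + Chatbot)": {"variants": 13, "visits": 444_691},
    "Stage 2 (Tool-Augmented Agentic AI)": {"variants": 17, "visits": 248_448},
}
REPORTED_TOTAL_VISITS = 693_139

total_visits = sum(stage["visits"] for stage in STAGES.values())
total_variants = sum(stage["variants"] for stage in STAGES.values())

# The per-stage visits add up exactly to the reported two-stage total.
assert total_visits == REPORTED_TOTAL_VISITS
print(f"{total_visits:,} visits, {total_variants} variants")  # 693,139 visits, 30 variants
```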

The empirical analysis is based on panel data of new energy vehicle firms in the Yangtze River Delta from 2001 to 2023.

Dataset description provided in the paper's abstract/introduction indicating the time span and regional coverage.

high null result Mechanisms and Effects of Artificial Intelligence on New Qua... dataset/time coverage

R&D expenditure does not constitute a significant mediating channel between artificial intelligence and firms' new quality productive forces.

Mediation analysis using the panel data and constructed indicators; the reported mediation effect of R&D expenditure is nonsignificant (no sample size or statistics are reported in the excerpt).

high null result Mechanisms and Effects of Artificial Intelligence on New Qua... new quality productive forces (mediating role of R&D expenditure)

The system was evaluated on OMH-Polyglot, a multilingual coding benchmark spanning Turkish, Arabic, Chinese, and code-switched specifications.

Experimental evaluation reported in the paper using the OMH-Polyglot benchmark.

high null result Cross-Lingual Token Arbitrage: Optimizing Code Agent Context... benchmark evaluation on OMH-Polyglot (coverage of languages and code-switched sp...

Explicit commercial content (product placement) shows no engagement premium (−3.8%, not significant).

Analysis comparing videos labeled for explicit commercial content (product placement) to others; reported percent difference and non-significance.

high null result Auditing Engagement Incentives in the Kidfluencer Ecosystem:... view counts (percent difference)

We conducted a multimodal AI audit of 5,051 videos across 79 kidfluencer channels using weak supervision (LLM-based classification of titles and GPT-4 Vision analysis of thumbnails and descriptions across six literature-grounded dimensions) to assign a probabilistic exploitation score to each video.

Described dataset and methods in paper: multimodal automated pipeline combining weak supervision labeling functions (LLM classifiers on titles, GPT-4 Vision on thumbnails/descriptions) applied to 5,051 videos from 79 channels.

high null result Auditing Engagement Incentives in the Kidfluencer Ecosystem:... probabilistic exploitation score (automated)

The study developed a manufacturing value chain resilience (MVCR) index system based on three dimensions: Readiness, Response, and Recovery, using the CSMAR database.

Methodological description: construction of MVCR index using CSMAR microdata and a three-dimension framework (Readiness, Response, Recovery).

high null result Industrial Robot Application and the Manufacturing Value Cha... manufacturing value chain resilience (MVCR) index

The study constructed indices of industrial robot application at the enterprise-industry-year level by matching industry-level industrial robot data published by the International Federation of Robotics (IFR) with microdata from Chinese A-share listed companies.

Methodological description in the paper: matching IFR industry-level industrial robot data to microdata from Chinese A-share listed firms to build enterprise-industry-year robot-application indices.

high null result Industrial Robot Application and the Manufacturing Value Cha... index of industrial robot application (enterprise-industry-year)

The study uses listed companies in China's manufacturing industry from 2010 to 2023 as the research sample.

Authors explicitly state the empirical sample: listed manufacturing firms in China covering 2010–2023.

high null result Big data technology application and carbon emission efficien... research sample/time period (data description)

The positive relationship between big data technology application (BDTA) and carbon emission efficiency (CEE) remains robust after a series of robustness and endogeneity tests.

Authors state they conducted robustness checks and endogeneity tests (unspecified in the summary) and report that the main regression results remain robust.

high null result Big data technology application and carbon emission efficien... carbon emission efficiency (CEE) (robustness of main effect)

Brain privacy has both personal and social attributes; its protection therefore implicates individual interests and technological development.

Normative/legal argumentation and conceptual analysis presented in the paper (no empirical data reported).

high null result Empowerment or behavioral regulation? governing brain–comput... scope of brain-privacy (personal vs. social) and implicated interests

The experiment was run twice: a first run with unrealistically loud injections, and a second run with signals rescaled to a physically motivated SNR range.

Protocol described in paper explicitly states two runs with different injection SNR scalings (one 'unrealistically loud', one physically motivated).

high null result First head-to-head comparison of agentic AI applied to the a... other

Both agents received identical written specifications and identical compute resources.

Methodological statement in paper specifying that both agents were given the same written spec and the same shared computing infrastructure.

high null result First head-to-head comparison of agentic AI applied to the a... other

The pipeline comprised power spectral density estimation from raw Einstein Telescope simulated noise, geometric template bank generation, matched filter recovery of 100 binary black hole signal injections, automated results generation, and large language model-assisted production of a manuscript formatted in the style of Physical Review D.

Protocol description in paper; matched filter recovery included 100 injected signals (explicitly stated).

high null result First head-to-head comparison of agentic AI applied to the a... other
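The matched-filter step in this pipeline can be illustrated with a toy example. This Python/NumPy sketch assumes white Gaussian noise of known standard deviation, so it skips the PSD estimation and whitening that a real Einstein Telescope analysis needs. It also uses a windowed linear chirp rather than a binary black hole waveform, and all function names are hypothetical. It further shows why rescaling injections to a target SNR, as in the second run described above, needs only one multiplicative factor: matched-filter SNR is linear in signal amplitude.

```python
import numpy as np


def chirp_template(n=256, f0=0.05, f1=0.25):
    """Toy Hann-windowed linear chirp (frequencies in cycles/sample); not a BBH waveform."""
    t = np.arange(n)
    phase = 2 * np.pi * (f0 * t + (f1 - f0) * t**2 / (2 * n))
    return np.hanning(n) * np.sin(phase)


def matched_filter(data, template, sigma):
    """SNR series <d, h> / (sigma * ||h||) at every offset, valid for white noise of std sigma."""
    corr = np.correlate(data, template, mode="valid")  # corr[k] = sum_n data[n + k] * h[n]
    return corr / (sigma * np.linalg.norm(template))


def inject(noise, template, offset, target_snr, sigma):
    """Add the template at `offset`, scaled so its expected matched-filter SNR is target_snr."""
    amp = target_snr * sigma / np.linalg.norm(template)  # SNR scales linearly with amplitude
    data = noise.copy()
    data[offset:offset + len(template)] += amp * template
    return data


rng = np.random.default_rng(0)
sigma = 1.0
noise = rng.normal(0.0, sigma, 4096)
h = chirp_template()
data = inject(noise, h, offset=1500, target_snr=12.0, sigma=sigma)
snr = matched_filter(data, h, sigma)
best = int(np.argmax(snr))
print(best, round(float(snr[best]), 1))
```

The SNR peak should fall at or within a few samples of the injection offset (1500), with a value near the target of 12.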

We compared two state-of-the-art agentic AI systems, Claude Code (Anthropic) and Codex (OpenAI), tasked with autonomously executing a simple end-to-end gravitational wave data analysis pipeline on a shared computing infrastructure without human intervention.

Experimental design described in paper: two named agents were given identical written specifications and identical compute resources and executed the full pipeline autonomously.

high null result First head-to-head comparison of agentic AI applied to the a... other

Greater frontier-level compute does not consistently translate to better performance.

Empirical observation in the paper's findings: increasing compute capacity at the Pareto frontier did not uniformly improve task performance across evaluated tasks.

high null result When Cloud Agents Meet Device Agents: Lessons from Hybrid Mu... task performance as a function of available compute at the frontier

New York City’s Local Law 144 mandates annual bias audits to increase transparency.

Statement of law/policy in paper (factual claim about NYC Local Law 144); legal requirement as described in the text.

high null result Towards Using Ai Bias Audits As Inputs For Red Teaming And P... annual bias audit mandate (LL144)

The fairness of AI-enabled hiring systems remains uncertain.

Statement in paper (background/interpretive claim); no direct empirical measure provided in the excerpt.

high null result Towards Using Ai Bias Audits As Inputs For Red Teaming And P... fairness of AI-enabled hiring systems

The study employs a comparative mixed-methods approach (comparative institutional analysis) of leading financial systems in China, the United States, and the United Kingdom (2022–2025), integrating secondary quantitative indicators with qualitative documentary evidence.

Direct methodological statement in the abstract describing the study design and data sources.

high null result Artificial Intelligence in Financial Security Markets: Catal... methodological approach (comparative mixed-methods)

The distinction matters: debt is a stock of design and governance liability, while the tax is a flow of operating cost that arises because stochastic agents act through tools and workflows.

Conceptual argument in the paper articulating difference between two defined concepts (Agentic Technical Debt vs Stochastic Tax); no empirical demonstration.

high null result Governing Technical Debt in Agentic AI Systems conceptual distinction between liability (stock) and operating cost (flow)

Stochastic Tax is the recurring operating burden of keeping probabilistic agent behavior within acceptable bounds.

Paper provides a formal definition / conceptual framing of 'Stochastic Tax'; stated as an operational concept (no empirical quantification provided).

high null result Governing Technical Debt in Agentic AI Systems operating burden from probabilistic agent behavior

Agentic Technical Debt is the accumulated liability created when prompts, memory, tool schemas, orchestration graphs, control policies, and observability routines are patched together faster than they can be validated, standardized, and governed.

Paper provides a formal definition / conceptual framing of 'Agentic Technical Debt'; presented as a definitional contribution rather than an empirically measured quantity.

high null result Governing Technical Debt in Agentic AI Systems conceptual definition of a technical/governance liability

Agentic AI systems reason over multiple steps, call tools, act through workflows, and adapt through memory and feedback.

Descriptive/definitional statement in the paper; presented as characteristics of agentic systems rather than supported by empirical measurement.

high null result Governing Technical Debt in Agentic AI Systems architectural/behavioral characteristics of agentic AI systems

Agentic AI systems are increasingly being explored as production infrastructure.

Stated as an observation in the paper's introduction/abstract; no empirical data, sample, or formal measurement provided (conceptual/observational claim).

high null result Governing Technical Debt in Agentic AI Systems exploration/adoption of agentic AI as production infrastructure

The audit samples 2,000 runs from a design space of 10 personas × 8 prompts × 3 model configurations × N=10 reps (2,400 runs at full coverage), with the two OpenAI cells at full 8-prompt coverage and the Anthropic sonnet-4.6 / low cell at 4-prompt coverage.

Stated audit design and sample counts in paper (method section describing factorial design and coverage of model/prompt cells).

high null result Persona Conditioning of Brand Recommendations in Retrieval-A... audit sample size and experimental design coverage
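The 2,000-run figure follows from the stated coverage. A full factorial would be 10 × 8 × 3 × 10 = 2,400 runs, but the Anthropic cell covers only 4 of the 8 prompts. Here is a short Python check. The two OpenAI labels are placeholders because the entry does not name those configurations.

```python
PERSONAS = 10
PROMPTS = 8
REPS = 10  # N = 10 repetitions per persona-prompt-model cell

# Prompts covered per model configuration, as stated in the audit design.
# The OpenAI labels are placeholders; the entry does not name those configurations.
PROMPT_COVERAGE = {
    "OpenAI config A": 8,
    "OpenAI config B": 8,
    "Anthropic sonnet-4.6 / low": 4,
}

full_factorial = PERSONAS * PROMPTS * len(PROMPT_COVERAGE) * REPS
sampled_runs = sum(PERSONAS * covered * REPS for covered in PROMPT_COVERAGE.values())
print(full_factorial, sampled_runs)  # 2400 2000
```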

The paper evaluates the proposed architecture using the outcome metric 'time-to-insight'.

Methodological statement in the paper listing evaluation metrics.

high null result Beyond the Data Mesh Illusion: Designing Modern AI-augmented... time-to-insight (time required to generate actionable insight from data)

The paper evaluates the proposed architecture using the outcome metric 'time-to-find'.

Methodological statement in the paper listing evaluation metrics.

high null result Beyond the Data Mesh Illusion: Designing Modern AI-augmented... time-to-find (time required to locate relevant data/products)

The paper evaluates the proposed architecture using the outcome metric 'data product adoption'.

Methodological statement in the paper listing evaluation metrics.

high null result Beyond the Data Mesh Illusion: Designing Modern AI-augmented... data product adoption

In the first acquisition, the acquirer pursued a disruptive 'rip-and-replace' strategy for the target’s proprietary ERP system.

Empirical observation from the paper's comparative case study of two consecutive acquisitions of the same digital target (qualitative case evidence).

high null result From Knowledge Loss To Knowledge Leverage: How Gen Ai Afford... IS integration strategy (rip-and-replace)

We identify four archetypes (data orchestrators, aggregators, niche specialists, and cloud orchestrators).

Paper states it develops a taxonomy and explicitly lists four archetypes; based on the taxonomy development and conceptual classification reported in the paper (no sample size or quantitative empirical test reported in the abstract).

high null result An Ai Economy Beyond Big Tech Hyperscalers? A Taxonomy Of Ma... presence_of_archetypes (data orchestrators, aggregators, niche specialists, clou...

This paper contributes a large-scale empirical dataset involving 57,954 essays from 10,195 students across 120 schools over two years.

The paper explicitly states the dataset size and coverage in the abstract: 57,954 essays, 10,195 students, 120 schools, two-year period.

high null result Double-Edged Sword or Sharp Tool? Designing and Evaluating T... dataset_creation / sample coverage

We leverage logo design job posts before and after the launch of an early-stage platform-embedded logo-AI tool on the online labour market EPWK, using a difference-in-differences design and a new large language model-based skill extraction and embedding framework.

Paper's described empirical design and methods: dataset of logo design job posts on EPWK around the logo-AI tool launch; difference-in-differences analytic approach; LLM-based skill extraction and embedding pipeline. No sample size provided in the abstract.

high null result Exploring The Effect Of Platform-Embedded Generative Ai On S... methodological approach (use of DID and LLM-based skill extraction on EPWK logo ...

Existing research mainly examines general-purpose GenAI, such as ChatGPT, and focuses on aggregate outcomes, including falling demand and compressed prices in easily automated tasks, while revealing little about the demand for work skills and the role of platform-embedded GenAI.

Paper's literature review / background statement summarizing prior empirical work on general-purpose GenAI (e.g., studies documenting falling demand and price compression in automatable tasks). No sample size reported in this statement.

high null result Exploring The Effect Of Platform-Embedded Generative Ai On S... scope of existing research (focus on aggregate outcomes like demand and prices v...

We distill our findings into a meta-design and four design principles (DPs), grounded in kernel theories, for systems where human contextual intelligence and algorithmic recognition must coexist.

Design contribution presented in the paper (meta-design artifact and four DPs derived from the study).

high null result Schnitzel-Prediction: Designing Human-Ai Collaboration For C... design principles and meta-design artifact

We developed a collaborative forecasting system that leverages semantic processing using large language models (LLMs) to solve the 'cold-start' problem for novel menu items while preserving human agency via override mechanisms.

Description of system design and implementation produced during the ADR project (practice-driven abductive approach).

high null result Schnitzel-Prediction: Designing Human-Ai Collaboration For C... resolution of cold-start forecasting for novel menu items; preservation of human...

This paper reports on a 9-month action design research (ADR) project at a German financial services firm.

Explicit methodological description in the paper (study duration and organizational context).

high null result Schnitzel-Prediction: Designing Human-Ai Collaboration For C... study duration and setting

We examined how different degrees of embodiment affect team performance and conversational dynamics in a real-life escape room; teams were composed of either three humans or two humans and an artificial agent (a Box, an Avatar, or a hyper-realistic humanoid).

Experimental field study reported in the paper: a real-life escape room experiment comparing team compositions (3 humans vs. 2 humans + agent of three embodiment types). Sample size not reported in the provided text.

high null result Teaming Up with Artificial Agents in Non-routine Analytical ... team composition / experimental manipulation (embodiment)

To the best of the authors' knowledge, no prior study has examined the psychological mechanism through which algorithmic management shapes employee voice and silence behaviour outside of gig economy and platform work contexts.

Author claim based on literature review (stated gap in existing research).

high null result Algorithmic Management and Acquiescent Silence: The Mediatin... existence/absence of prior studies on psychological mechanisms in non-platform c...

The empirical strategy uses panel local projections to estimate the dynamic effects of AI adoption.

Methodological statement in the paper: application of panel local projections to panel data of industries/establishments over 2017-2025.

high null result AI Adoption and Labor Market Responses: Evidence from Job Po... estimation method / dynamic impulse responses
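Panel local projections estimate a separate regression for each horizon h, rather than deriving the dynamics from a single model. The NumPy sketch below is minimal and assumes the specification y[i, t+h] - y[i, t-1] = a_i + beta_h * x[i, t] + e with no further controls. The paper's exact specification is not given here. On synthetic data with a known impulse response, the per-horizon estimates should land close to the true values.

```python
import numpy as np


def panel_local_projection(y, x, horizons):
    """OLS estimate of beta_h for each horizon h in
    y[i, t+h] - y[i, t-1] = a_i + beta_h * x[i, t] + e[i, t],
    with unit effects a_i removed by demeaning within each unit.
    y and x are arrays of shape (units, periods).
    """
    periods = y.shape[1]
    betas = []
    for h in horizons:
        t = np.arange(1, periods - h)              # keeps t - 1 >= 0 and t + h < periods
        dy = y[:, t + h] - y[:, t - 1]             # cumulative change from t-1 to t+h
        xt = x[:, t]
        dy = dy - dy.mean(axis=1, keepdims=True)   # within (fixed-effects) transformation
        xt = xt - xt.mean(axis=1, keepdims=True)
        betas.append(float((xt * dy).sum() / (xt * xt).sum()))  # pooled OLS slope
    return betas


# Synthetic panel with a known impulse response (illustration only)
rng = np.random.default_rng(1)
units, periods = 50, 40
true_irf = [1.0, 0.6, 0.3, 0.0]
x = rng.normal(size=(units, periods))  # stand-in for an adoption shock
y = np.repeat(rng.normal(scale=5.0, size=(units, 1)), periods, axis=1)  # unit levels
for k, b in enumerate(true_irf):
    y[:, k:] += b * x[:, : periods - k]  # y[i, t] += b * x[i, t-k]

est = panel_local_projection(y, x, horizons=range(len(true_irf)))
print([round(e, 2) for e in est])
```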

AI adoption is measured using the share of establishment-level job postings that explicitly require AI-related skills across 13 industries over 2017-2025.

Study design / data description: share of establishment-level job postings requiring AI skills; coverage across 13 industries for years 2017-2025.

high null result AI Adoption and Labor Market Responses: Evidence from Job Po... AI adoption (share of job postings requiring AI skills)
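The adoption measure is a simple ratio: AI-requiring postings over all postings. The sketch below computes it per establishment and year. That aggregation level, the dictionary keys, and the boolean `requires_ai` flag are assumptions for illustration. It does not cover how the paper decides that a posting explicitly requires AI-related skills.

```python
from collections import defaultdict


def ai_posting_share(postings):
    """Share of job postings requiring AI skills per (establishment, year).

    `postings` is an iterable of dicts with hypothetical keys "establishment",
    "year", and "requires_ai" (bool).
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [AI postings, all postings]
    for post in postings:
        key = (post["establishment"], post["year"])
        counts[key][0] += int(post["requires_ai"])
        counts[key][1] += 1
    return {key: ai / total for key, (ai, total) in counts.items()}


sample = [
    {"establishment": "E1", "year": 2020, "requires_ai": True},
    {"establishment": "E1", "year": 2020, "requires_ai": False},
    {"establishment": "E1", "year": 2020, "requires_ai": False},
    {"establishment": "E1", "year": 2020, "requires_ai": True},
    {"establishment": "E2", "year": 2020, "requires_ai": False},
]
print(ai_posting_share(sample))  # {('E1', 2020): 0.5, ('E2', 2020): 0.0}
```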

Estimation accuracy depended only weakly on message volume, indicating that more text alone does not guarantee better inference.

Analysis reported in the paper examining the relationship between message volume and estimation accuracy; described as a weak dependency.

high null result Can AI Guess What You Know? Performance Comparison of Large ... relationship between message volume (amount of text) and model estimation accura...

This paper uses the Difference-in-Differences method for empirical research.

Methodological statement in the excerpt explicitly naming the DiD approach.

high null result Impact of artificial intelligence innovation on labor struct... research design / estimation strategy (Difference-in-Differences)
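In its simplest two-group, two-period form, the difference-in-differences estimate is the treated group's before-after change minus the control group's. The minimal Python sketch below uses hypothetical outcome values. The paper's data, treatment definition, and controls are not described here.

```python
from statistics import mean


def did_2x2(treated_pre, treated_post, control_pre, control_post):
    """Two-group, two-period difference-in-differences on group means."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical outcome values, for illustration only
effect = did_2x2(
    treated_pre=[30, 32], treated_post=[40, 42],
    control_pre=[31, 33], control_post=[35, 37],
)
print(effect)  # treated change 10, control change 4, estimate 6
```

The control group's change stands in for what the treated group would have done without treatment. That is the parallel-trends assumption any DiD design relies on.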

Regression models and moderation analyses were performed in R to examine associations between governance exposure, AI maturity, and adaptation intensity.

Methods statement: 'Regression models and moderation analyses were performed in R (R Computing, Austria) to examine associations between governance exposure, AI maturity, and adaptation intensity.'

high null result Research on the adaptation path of corporate strategy based ... associations_between_governance_exposure_AI_maturity_and_adaptation_indices

Path-specific composite indices for bifurcation, modularity, ethical signaling, and compartmentalization were quantified using validated scales.

Methods description in the paper: 'Path-specific composite indices ... were quantified using validated scales.'

high null result Research on the adaptation path of corporate strategy based ... composite_adaptation_indices (bifurcation, modularity, ethical signaling, compar...
