Evidence (13870 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome	Positive	Negative	Mixed	Null	Total
Other	749	196	98	892	1984
Governance & Regulation	817	394	188	121	1544
Organizational Efficiency	771	189	124	83	1177
Technology Adoption Rate	627	233	123	96	1088
Research Productivity	411	123	56	332	933
Output Quality	467	178	59	47	751
Decision Quality	320	174	75	42	618
Firm Productivity	435	55	88	20	604
AI Safety & Ethics	214	276	65	33	593
Market Structure	178	167	122	24	496
Task Allocation	207	64	71	32	379
Skill Acquisition	165	59	60	17	301
Innovation Output	203	27	43	18	292
Employment Level	105	52	107	13	279
Fiscal & Macroeconomic	131	69	43	26	276
Consumer Welfare	116	63	42	11	232
Firm Revenue	150	48	26	3	227
Inequality Measures	44	122	49	6	221
Task Completion Time	169	29	8	12	219
Worker Satisfaction	89	63	20	12	184
Error Rate	69	92	10	2	173
Regulatory Compliance	76	68	14	5	163
Training Effectiveness	93	21	13	19	148
Wages & Compensation	77	36	25	6	144
Automation Exposure	51	54	22	12	142
Team Performance	86	17	27	9	140
Developer Productivity	94	17	14	6	132
Job Displacement	12	80	20	1	113
Hiring & Recruitment	51	7	8	3	69
Creative Output	31	17	7	3	59
Skill Obsolescence	5	46	6	1	58
Social Protection	27	16	8	2	53
Labor Share of Income	17	17	17	—	51
Worker Turnover	11	12	—	3	26
Industry	—	—	—	1	1

Most existing approaches implicitly assume that once a decision is produced, it is eligible for execution.

Author assertion / conceptual critique of existing approaches presented in the paper (no empirical test reported).

high null result Right-to-Act: A Pre-Execution Non-Compensatory Decision Prot... implicit execution-eligibility assumption in prior AI safety approaches

Most existing approaches to AI safety, risk management, and governance focus on post-hoc validation, probabilistic risk estimation, or certification of model behavior.

Author statement summarizing the literature / prior work in AI safety and governance (conceptual claim in the paper's introduction). No empirical survey or sample size reported.

high null result Right-to-Act: A Pre-Execution Non-Compensatory Decision Prot... characterization of prevailing AI safety and governance approaches (post-hoc val...

We develop a formal model in which institutions choose the scale of automation, the degree of codification, and safeguards on iterative use.

Methodological statement: the paper presents a formal/theoretical model specifying institutional choice variables (model description rather than empirical result).

high null result AI Governance under Political Turnover: The Alignment Surfac... institutional choices regarding automation scale, codification, and safeguards (...

On the n=11 subset with published SWE-bench scores, composite and benchmark-only rankings are nearly uncorrelated (ρ_s=0.25).

Spearman rank correlation between composite rankings and benchmark-only rankings on an 11-agent subset that has published SWE-bench scores; reported correlation.

high null result AgentPulse: A Continuous Multi-Signal Framework for Evaluati... rank correlation between composite ranking and benchmark-only ranking
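The rank correlation reported in this entry can be illustrated with a minimal sketch. It assumes two tie-free rankings of the same items and uses the standard closed form for Spearman's coefficient; the function name and the five-agent ranks below are hypothetical and are not data from the paper.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman rank correlation for two tie-free rankings of the same items."""
    n = len(rank_a)
    if n != len(rank_b) or n < 2:
        raise ValueError("rankings must have equal length of at least 2")
    expected = list(range(1, n + 1))
    if sorted(rank_a) != expected or sorted(rank_b) != expected:
        raise ValueError("each ranking must be a permutation of 1..n (no ties)")
    # Closed form valid only without ties: 1 - 6 * sum(d^2) / (n * (n^2 - 1))
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical ranks for five agents (illustrative only).
composite_rank = [1, 2, 3, 4, 5]
benchmark_rank = [2, 1, 4, 3, 5]
print(round(spearman_rho(composite_rank, benchmark_rank), 2))  # 0.8
```

A value near 1 means the two rankings order the items almost identically, and a value near 0 means one ranking says little about the other.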

We document the performance of a market-based scaffolding with these LLMs.

Empirical documentation reported in the paper describing how a market-based scaffolding performs when using the six LLMs on the 93 tasks.

high null result MarketBench: Evaluating AI Agents as Market Participants performance metrics of a market-based scaffolding using LLM self-reports

We use a 93-task subset of SWE-bench Lite, a software engineering benchmark, with six recently released LLMs as a demonstration.

Empirical setup described in the paper: evaluation uses a 93-task subset of SWE-bench Lite and six recent LLMs.

high null result MarketBench: Evaluating AI Agents as Market Participants experimental dataset size and model set used for demonstration

We propose MarketBench, a benchmark for assessing whether AI agents have these capabilities.

Paper contribution claim: introduction of a benchmark named MarketBench described in the paper.

high null result MarketBench: Evaluating AI Agents as Market Participants existence of the MarketBench benchmark

In order to effectively participate in markets, agents need to have informative signals of their own ability to successfully complete a task and the cost of doing so.

Conceptual claim / design requirement motivating the benchmark; stated as part of the paper's framing rather than an empirical result.

high null result MarketBench: Evaluating AI Agents as Market Participants informativeness/calibration of self-reported ability and cost signals

We instrument ITAS, a four-agent tutoring system built on Gemini 2.5 Flash and Google Vertex AI, across three throughput tiers (Standard PayGo, Priority PayGo, and Provisioned Throughput) and eleven concurrency levels up to 50 simultaneous users, producing over 3,000 requests drawn from a live graduate STEM deployment.

Methods statement in paper describing experimental setup: four-agent ITAS built on Gemini 2.5 Flash and Google Vertex AI; three throughput tiers; eleven concurrency levels up to 50; over 3,000 requests from a live graduate STEM deployment.

high null result Latency and Cost of Multi-Agent Intelligent Tutoring at Scal... instrumented request sample (number of requests and concurrency levels)

We compare LLM-guided bidding against truthful and heuristic strategies using the Vickrey-Clarke-Groves (VCG) mechanism as a benchmark for incentive-compatible, dominant-strategy truthfulness.

Methodological claim describing the comparative experimental design: simulations use VCG as benchmark and include comparisons to truthful and heuristic bidding strategies. No sample size or detailed experimental parameters are provided in the excerpt.

high null result Strategic Bidding in 6G Spectrum Auctions with Large Languag... comparative performance of bidding strategies
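Why VCG serves as a truthfulness benchmark can be seen in its simplest special case. For a single indivisible item, VCG reduces to a sealed-bid second-price (Vickrey) auction. The sketch below assumes that special case with no budget constraints or repetition, so it is not the paper's simulation setup; the function names and the numeric values are hypothetical.

```python
def vickrey_outcome(bids):
    """Return (winner_index, price) for a sealed-bid second-price auction."""
    # Stable sort: among equal top bids, the lowest index wins.
    order = sorted(range(len(bids)), key=lambda i: bids[i], reverse=True)
    winner = order[0]
    price = bids[order[1]] if len(bids) > 1 else 0.0
    return winner, price


def utility(value, my_bid, rival_bids):
    """Utility of bidder 0, who values the item at `value` and bids `my_bid`."""
    winner, price = vickrey_outcome([my_bid] + list(rival_bids))
    return value - price if winner == 0 else 0.0


# Hypothetical valuation and rival bids (illustrative only).
print(utility(10.0, 10.0, [7.0, 4.0]))  # truthful bid wins and pays 7 -> 3.0
print(utility(10.0, 6.0, [7.0, 4.0]))   # shaded bid loses the item -> 0.0
print(utility(10.0, 12.0, [11.0]))      # overbid wins but pays 11 -> -1.0
```

In each case the truthful bid does at least as well as the deviation, which is the dominant-strategy property the benchmark relies on.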

When the theoretical assumptions guaranteeing truthfulness hold, LLM bidders recover near-equilibrium outcomes consistent with VCG predictions.

Simulation experiments comparing LLM-guided bidding to the VCG benchmark and to truthful/heuristic strategies under conditions where VCG assumptions are satisfied. The paper reports that LLM outcomes were close to the VCG-predicted equilibrium. No numeric sample size or quantitative effect sizes reported in the provided text.

high null result Strategic Bidding in 6G Spectrum Auctions with Large Languag... equilibrium outcomes / allocation and utility relative to VCG benchmark

We investigate the use of Large Language Models (LLMs) as bidding agents in repeated 6G spectrum auctions with budget constraints in vehicular networks.

Descriptive statement of the study design: the paper reports simulation/experimental evaluation where each user equipment (UE) is modeled as a rational player in repeated spectrum auctions; comparison against truthful and heuristic strategies under Vickrey-Clarke-Groves (VCG) benchmark. No numeric sample size reported in the provided text.

high null result Strategic Bidding in 6G Spectrum Auctions with Large Languag... use of LLMs as bidding agents (methodological evaluation)

The welfare consequences of genAI can be organized by a two-dimensional taxonomy: the strength of the incentive to perform the task without AI, and the severity of model collapse.

Analytical organization derived from the theoretical model presented in the paper (conceptual taxonomy based on model parameters; no empirical sample reported in abstract).

high null result Generative artificial intelligence reduces social welfare th... social welfare outcomes as a function of incentive strength and model collapse s...

We develop a parsimonious model of behavior in collaborative interactions in which individuals can either exert human effort, rely on genAI, or refrain from work altogether.

Methodological claim: authors present a formal theoretical model with the specified choice set (model description in paper; no empirical sample reported in abstract).

high null result Generative artificial intelligence reduces social welfare th... choice among effort modalities (human effort, genAI reliance, abstention)

Predictive performance exhibits saturation beyond a certain context length.

Experiments varying the context (input) length in foundation models and observing changes in forecasting performance; reported saturation effect in analyses.

high null result FETS Benchmark: Foundation Models Outperform Dataset-specifi... change in forecast accuracy as context length increases

Task difficulty rated by human experts only weakly aligns with actual token costs, revealing a fundamental gap between human-perceived complexity and the computational effort agents actually expend.

Analysis comparing human expert difficulty ratings to measured token costs for tasks in SWE-bench Verified; weak alignment reported in the paper between ratings and token consumption.

high null result How Do AI Agents Spend Your Money? Analyzing and Predicting ... correspondence/alignment between human-rated task difficulty and measured token ...

Higher token usage does not translate into higher accuracy; accuracy often peaks at intermediate cost and saturates at higher costs.

Comparison of accuracy (task success) versus total token usage across runs/trajectories in the agentic coding experiments on SWE-bench Verified; reported observed relationship (peak at intermediate costs and saturation thereafter).

high null result How Do AI Agents Spend Your Money? Analyzing and Predicting ... task accuracy as a function of token usage

Learning-based control offers a more adaptive alternative, but it remains unclear whether such methods... can sustain hours of reliable operation, deliver consistent quality, and behave safely around people on a live production line.

Framing of a research gap in the paper's introduction; no primary experimental data presented here (statement of uncertainty motivating the study).

high null result Learning-augmented robotic automation for real-world manufac... operational reliability, product quality consistency, safety around people for l...

The study is based on a repeated cross-sectional survey of licensed employees at a non-university research institution.

Author statement in the abstract: repeated cross-sectional survey of licensed employees of the research institution under study; methodological description in the abstract.

high null result Generative KI in der Wissensarbeit: Wahrnehmung, Nutzen und ... study design / data basis (repeated cross-sectional survey)

The paper provides a natural definition of benchmark hacking in this strategic context by comparing a player's equilibrium effort allocation to that of a single-agent baseline scenario.

Conceptual/theoretical definition introduced in the model comparing equilibrium effort allocations to a single-agent (non-competitive) baseline.

high null result On Benchmark Hacking in ML Contests: Modeling, Insights and ... benchmark hacking (difference in effort allocation versus single-agent baseline)

The main findings hold across multiple robustness checks.

Paper reports multiple unspecified robustness checks applied to the fixed-effects regression analyses on the panel of publicly listed Chinese firms (2012–2023).

high null result Following the Herd or the Bellwether: Peer Effects in Firms’... robustness of reported peer effect findings

We use a unified amortized framework to isolate semantic differences between eight Shapley variants under the low-latency constraints of operational risk workflows.

Methodological contribution described in the paper: a unified amortized computational framework applied to eight Shapley variants, evaluated under latency constraints typical of operational workflows.

high null result Rethinking XAI Evaluation: A Human-Centered Audit of Shapley... ability to isolate semantic differences among Shapley variants under low-latency...

No formulation improved objective analyst performance.

Controlled/empirical experiment reported in the paper evaluating eight Shapley variants with professional analysts in the fraud-detection environment; performance measured over 3,735 case reviews.

high null result Rethinking XAI Evaluation: A Human-Centered Audit of Shapley... objective analyst performance (e.g., accuracy on case reviews)

Standard quantitative metrics, such as sparsity and faithfulness, are decoupled from human-perceived clarity and decision utility.

Empirical comparison in the paper between quantitative metrics (sparsity, faithfulness) and human-judged clarity/decision-utility across the datasets and analyst reviews; based on the authors' large-scale evaluation.

high null result Rethinking XAI Evaluation: A Human-Centered Audit of Shapley... correlation/alignment between quantitative explanation metrics (sparsity, faithf...

We conduct a large-scale empirical evaluation across four risk datasets and a realistic fraud-detection environment involving professional analysts and 3,735 case reviews.

Experimental methods reported in the paper: evaluation across four risk datasets and a fraud-detection environment with professional analysts; stated sample of 3,735 case reviews.

high null result Rethinking XAI Evaluation: A Human-Centered Audit of Shapley... number of case reviews / scale of empirical evaluation

A central issue is how humans interpret the algorithm's choice of features, which affects the design and evaluation of highlighting policies.

Framing and motivation in the paper: conceptual claim motivating the formal models and analysis (theoretical/argumentative).

high null result Algorithmic Feature Highlighting for Human-AI Decision-Makin... impact of human interpretation on policy design and evaluation

We illustrate our framework in a calibrated empirical exercise based on the American Housing Survey.

An empirical/calibrated exercise using data from the American Housing Survey reported in the paper; the claim is that the framework is illustrated empirically (data-based demonstration).

high null result Algorithmic Feature Highlighting for Human-AI Decision-Makin... empirical illustration of highlighting policies using American Housing Survey da...

Humans may interpret the algorithm's choice of features in different ways: a sophisticated agent correctly conditions on the selection rule, while a naive agent updates only on revealed feature values and treats the selection event as exogenous.

Conceptual/behavioral modeling in the paper that defines two agent-types (sophisticated vs naive) and analyzes their distinct inference processes (theoretical/modeling).

high null result Algorithmic Feature Highlighting for Human-AI Decision-Makin... human inference model (conditioning behavior) in response to highlighted feature...

Highlighting can be modeled as a constrained information policy that selects a small number of features to reveal.

Modeling framework developed in the paper: formal definition of highlighting as an information policy with a feature-selection constraint (theoretical/modeling).

high null result Algorithmic Feature Highlighting for Human-AI Decision-Makin... representation of highlighting within a formal decision-theoretic framework

We study this question using 10,659 matched human-agent pairs from Moltbook, a social media platform where each autonomous agent is publicly linked to its owner's Twitter/X account.

Descriptive statement of the study dataset reported in the paper: dataset of 10,659 matched human-agent pairs from Moltbook with public linkage to owner's Twitter/X account.

high null result Behavioral Transfer in AI Agents: Evidence and Privacy Impli... count of matched human-agent pairs

The paper proposes a conceptual framework linking AI adoption to employability and role transformation, mediated by skill adaptation, continuous learning, and organizational readiness.

Author-proposed conceptual framework presented in the review paper (theoretical linkage based on literature synthesis).

high null result The Impact of AI on Employability and Evolving Job Roles of ... linkage between AI adoption and employability

This study takes food delivery riders as the research object and analyzes the dilemma of labor relations determination under AIGC.

Methodological statement in the paper specifying the chosen subject of analysis (food delivery riders); this is an explicit description of the paper's scope rather than an empirical finding.

high null result AIGC+ Determination of Labor Relations in the Context of the... research scope / sample (food delivery riders)

The paper develops an interdisciplinary conceptual framework that integrates insights from economics, management theory, and digital governance to characterize algorithmic enterprises.

Methodological claim about the paper's approach; stated in abstract as the paper's contribution (conceptual framework built from interdisciplinary literature).

high null result Algorithmic Enterprises: Rethinking Firm Strategy in the Age... existence and structure of a conceptual interdisciplinary framework

Future research should strengthen cross-national comparisons, longitudinal tracking, and interdisciplinary collaboration to support development of a technology governance framework that balances efficiency with equity.

Author recommendation based on identified research gaps in the literature review (prescriptive/recommendation).

high null result From Technological Substitution to Institutional Response: A... recommended research approaches and governance framework design

Existing research has clear gaps: limited evidence from developing-country contexts, insufficient attention to within-occupation heterogeneity, incomplete accounts of psychological mechanisms underlying AI anxiety, and a shortage of rigorous evaluations of reskilling policy effectiveness.

Author's assessment based on the reviewed literature identifying thematic gaps and methodological limitations (critical literature review).

high null result From Technological Substitution to Institutional Response: A... completeness and scope of existing research (research gaps)

The study uses a mixed-methods design combining a quantitative survey of 312 senior managers/strategy professionals and 28 semi-structured interviews across four sectors in Zimbabwe.

Methods reported in the paper: quantitative survey n = 312; qualitative 28 interviews across manufacturing, financial services, telecommunications, and retail.

high null result Harnessing Competitive Intelligence and AI for Corporate Gro... study design / sample composition

This study leverages the establishment of National New-Generation Artificial Intelligence Innovation and Development Pilot Zones as a quasi-natural experiment and employs a multi-period DID model on A-share listed manufacturing firms from 2010 to 2023.

Methodological description provided in the paper: policy rollout as quasi-natural experiment; multi-period difference-in-differences estimation; sample frame specified as A-share listed manufacturing firms on the Shanghai and Shenzhen Stock Exchanges, 2010–2023.

high null result The Impact of National New-Generation Artificial Intelligenc... method/design (DID on firm panel 2010–2023)
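The logic of a difference-in-differences estimate can be sketched in its simplest two-group, two-period form. This is a deliberate simplification of the multi-period design described in the entry, and the function name and group means below are hypothetical rather than values from the paper.

```python
def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Two-group, two-period difference-in-differences on group means."""
    treated_change = treated_post - treated_pre
    control_change = control_post - control_pre
    # The control group's change stands in for the trend the treated group
    # would have followed without the policy (parallel-trends assumption).
    return treated_change - control_change


# Hypothetical mean outcomes (illustrative only).
effect = did_estimate(
    treated_pre=10.0, treated_post=14.0,
    control_pre=9.0, control_post=11.0,
)
print(effect)  # (14 - 10) - (11 - 9) = 2.0
```

A staggered, multi-period rollout is typically estimated by regression with firm and year fixed effects rather than by differencing four means directly.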

The paper synthesizes sector-specific insights across manufacturing, information technology, healthcare, and finance to examine AI's influence on task automation, job augmentation, and skill requirements.

Descriptive claim about the scope of the review (sectors named in the abstract); no breakdown of sectoral evidence or counts provided in the abstract.

high null result AI and the Future of Job Profiles: A systematic Review of Se... sectoral coverage in the review

There is a lack of comparative sectoral assessments and standardized risk evaluation frameworks in the literature.

Identified research gap reported by the authors from their systematic review (no counts or formal gap-analysis metrics provided in the abstract).

high null result AI and the Future of Job Profiles: A systematic Review of Se... availability of comparative assessments and standardized frameworks

A structured methodology (systematic review) was adopted to identify literature on AI-driven job transformation and associated employment risks using major academic databases.

Methodological statement in the paper claiming a systematic review approach (specific databases, search terms, inclusion/exclusion criteria and number of studies are not reported in the abstract).

high null result AI and the Future of Job Profiles: A systematic Review of Se... methodological approach / literature coverage

An exploratory evaluation compared unstructured vibe coding, structured prompt engineering, and the Shift-Up approach in the development of a web application.

Paper reports an exploratory evaluation / comparative study described in the abstract; the task context is a web application development exercise comparing three approaches (no sample size reported in abstract).

high null result Shift-Up: A Framework for Software Engineering Guardrails in... comparative evaluation of development approaches

The First Fundamental Theorem of Welfare Economics assumes that welfare-bearing agents are autonomous and implicitly relies on a binary distinction between autonomy and instrumentality.

Explicit statement in the paper's introduction/abstract describing the theorem's assumptions; conceptual/theoretical textual analysis (no empirical sample).

high null result Post-AGI Economies: Autonomy and the First Fundamental Theor... assumption about welfare-bearing agents (autonomy vs instrumentality)

This paper was generated by AI, using https://github.com/chenandrewy/ralph-wiggum-asset-pricing/.

Author statement in the abstract declaring the paper was generated by AI and providing a GitHub link.

high null result Hedging the Singularity authorship/generation method of the paper

The paper integrates information processing theory, the resource-based view, and the dynamic capabilities perspective to develop an integrated framework linking digital technology adoption, visibility, and resilience.

Theoretical framing described in the paper (explicit mention of the three theories and their integration).

high null result The Role of Digital Technologies in Enhancing Supply Chain V... theoretical integration / conceptual framework

The study employs hierarchical regression, structural equation modeling (SEM), and rigorous endogeneity controls including instrumental variables and propensity score matching.

Methods section summary reported in the paper; explicit listing of regression, SEM, IV, and propensity score matching.

high null result The Role of Digital Technologies in Enhancing Supply Chain V... methodological approach / identification strategy

The study draws on survey data from 742 manufacturing and logistics firms across 23 countries.

Reported sample description in the paper: survey of 742 firms across 23 countries (manufacturing and logistics).

high null result The Role of Digital Technologies in Enhancing Supply Chain V... sample scope (firms sampled)

This review was conducted following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines.

Methodological statement in the paper's abstract indicating PRISMA adherence; no further protocol details or study counts provided in the abstract.

high null result Artificial Intelligence, Public Policy and Governance - impl... methodological adherence to PRISMA reporting standards

The staggered expansion of Turkey's national natural gas pipeline network provides plausibly exogenous variation in connectivity because pipeline routing is determined by energy distribution priorities rather than digital demand.

Identification strategy described by the authors: using pipeline expansion as an instrument/conduit for fiber-optic deployment; argument rests on institutional routing rules and timing.

high null result Digital Infrastructure, AI Adoption, and Firm Performance exogeneity of pipeline-based connectivity variation (instrument validity assumpt...

We evaluate structural validity, semantic alignment, reproducibility, and refinement effort to characterize authoring scalability.

Reported evaluation dimensions in the paper; implies empirical assessments were performed along these axes (details not provided in the abstract).

high null result Developing Models of Procedural Skills using an AI-assisted ... structural validity; semantic alignment; reproducibility; refinement effort

The paper foregrounds industrial firms' own digital agency as a less understood aspect in the literature on digitalization and governance.

Authors' positioning of their contribution and literature review claim in the paper (qualitative/theoretical claim).

high null result Industry 4.0 Inc.—Mergers and acquisitions and the digital t... research gap concerning firms' digital agency
