Evidence (2215 claims)
Claims by category:
- Adoption — 5126 claims
- Productivity — 4409 claims
- Governance — 4049 claims
- Human-AI Collaboration — 2954 claims
- Labor Markets — 2432 claims
- Org Design — 2273 claims
- Innovation — 2215 claims
- Skills & Training — 1902 claims
- Inequality — 1286 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 369 | 105 | 58 | 432 | 972 |
| Governance & Regulation | 365 | 171 | 113 | 54 | 713 |
| Research Productivity | 229 | 95 | 33 | 294 | 655 |
| Organizational Efficiency | 354 | 82 | 58 | 34 | 531 |
| Technology Adoption Rate | 277 | 115 | 63 | 27 | 486 |
| Firm Productivity | 273 | 33 | 68 | 10 | 389 |
| AI Safety & Ethics | 112 | 177 | 43 | 24 | 358 |
| Output Quality | 228 | 61 | 23 | 25 | 337 |
| Market Structure | 105 | 118 | 81 | 14 | 323 |
| Decision Quality | 154 | 68 | 33 | 17 | 275 |
| Employment Level | 68 | 32 | 74 | 8 | 184 |
| Fiscal & Macroeconomic | 74 | 52 | 32 | 21 | 183 |
| Skill Acquisition | 85 | 31 | 38 | 9 | 163 |
| Firm Revenue | 96 | 30 | 22 | — | 148 |
| Innovation Output | 100 | 11 | 20 | 11 | 143 |
| Consumer Welfare | 66 | 29 | 35 | 7 | 137 |
| Regulatory Compliance | 51 | 61 | 13 | 3 | 128 |
| Inequality Measures | 24 | 66 | 31 | 4 | 125 |
| Task Allocation | 64 | 6 | 28 | 6 | 104 |
| Error Rate | 42 | 47 | 6 | — | 95 |
| Training Effectiveness | 55 | 12 | 10 | 16 | 93 |
| Worker Satisfaction | 42 | 32 | 11 | 6 | 91 |
| Task Completion Time | 71 | 5 | 3 | 1 | 80 |
| Wages & Compensation | 38 | 13 | 19 | 4 | 74 |
| Team Performance | 41 | 8 | 15 | 7 | 72 |
| Hiring & Recruitment | 39 | 4 | 6 | 3 | 52 |
| Automation Exposure | 17 | 15 | 9 | 5 | 46 |
| Job Displacement | 5 | 28 | 12 | — | 45 |
| Social Protection | 18 | 8 | 6 | 1 | 33 |
| Developer Productivity | 25 | 1 | 2 | 1 | 29 |
| Worker Turnover | 10 | 12 | — | 3 | 25 |
| Creative Output | 15 | 5 | 3 | 1 | 24 |
| Skill Obsolescence | 3 | 18 | 2 | — | 23 |
| Labor Share of Income | 7 | 4 | 9 | — | 20 |
Innovation
We develop the Institutional Fitness Manifold, a mathematical framework that evaluates AI systems along four dimensions: capability, institutional trust, affordability, and sovereign compliance.
Theoretical/model development presented in the paper (formal definition of the manifold and its four dimensions).
There have been five eras of AI development since 1943, and within the current Generative AI Era there are four distinct epochs, each initiated by a discontinuous event.
Descriptive/historical classification within the paper (counts of eras and epochs; named initiating events such as the transformer and the 'DeepSeek Moment').
Open research challenges that define the research agenda include scaling beyond benchmarks, achieving compositionality over changes, metrics for validating specifications, handling rich logics, and designing human-AI specification interactions.
Authors' explicit enumeration of open problems and a proposed multi-disciplinary research agenda; presented as expert opinion rather than empirical finding.
A graph neural network (GNN) is built over the agents' reasoning embeddings, and trading decisions are made by a PPO-DSR policy.
Method description: the paper reports embedding agents' reasoning, building a graph neural network (GNN) from those embeddings, and using a PPO-DSR reinforcement learning policy to trade. Specific GNN/PPO-DSR hyperparameters and architecture are not provided in the excerpt.
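The graph-construction step can be read as connecting agents whose reasoning embeddings are similar; a minimal sketch under that assumption (the paper's actual graph definition, GNN architecture, and PPO-DSR details are not in the excerpt):

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def build_graph(embeddings, threshold=0.5):
    """Connect agent nodes whose reasoning embeddings are similar.

    Returns an edge list over node indices; in the paper's pipeline a GNN
    would then operate on this graph (details not given in the excerpt).
    """
    edges = []
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            if cosine(embeddings[i], embeddings[j]) >= threshold:
                edges.append((i, j))
    return edges

# Four agents' (toy) reasoning embeddings
emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(build_graph(emb))  # → [(0, 1), (2, 3)]
```

The threshold and cosine metric are illustrative choices, not the paper's.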
Four LLM agents output scores along with reasoning.
Method description: the paper states that four LLM agents produce numeric scores and associated textual reasoning. The number of agents is explicitly given as four; no further architecture or model-family details included in the excerpt.
BlindTrade anonymizes tickers and company names (blindfolding agents by anonymizing all identifiers).
Methodological description in the paper: the system design explicitly replaces tickers and company names with anonymized identifiers. Implementation details and examples not provided in the excerpt.
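The blindfolding step amounts to a deterministic identifier-substitution pass; a minimal sketch (BlindTrade's actual implementation is not described in the excerpt, and the `ASSET_n` alias scheme is an assumption):

```python
def anonymize(documents, identifiers):
    """Replace tickers/company names with stable anonymous IDs.

    Aliases are assigned in first-seen order, so the same real name
    always maps to the same anonymized identifier across documents.
    """
    mapping = {}
    out = []
    for doc in documents:
        for name in identifiers:
            if name in doc:
                alias = mapping.setdefault(name, f"ASSET_{len(mapping) + 1}")
                doc = doc.replace(name, alias)
        out.append(doc)
    return out, mapping

docs = ["AAPL rallied after Apple's earnings.", "AAPL and MSFT diverged."]
blinded, key = anonymize(docs, ["AAPL", "Apple", "MSFT"])
print(blinded[0])  # ASSET_1 rallied after ASSET_2's earnings.
```

Keeping the alias mapping stable matters: agents can still reason about an asset across documents without knowing which company it is.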
The dependent variable is the Market Opportunity Index, which is a combination of indicators of innovation activity, the share of firms with new products, and the share of opportunity-oriented entrepreneurs.
Paper provides the construction/definition of the dependent variable (components listed in the excerpt).
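A composite of the three listed components can be sketched as a weighted average; equal weights are an assumption for illustration, since the paper's aggregation rule is not reported in the excerpt:

```python
def market_opportunity_index(innovation_activity, new_product_share,
                             opportunity_entrepreneur_share,
                             weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted composite of the three components named in the excerpt.

    Assumes all components are already normalized to a common scale;
    the paper's actual weighting/normalization is not given.
    """
    components = (innovation_activity, new_product_share,
                  opportunity_entrepreneur_share)
    return sum(w * c for w, c in zip(weights, components))

# Toy component values on a common 0-1 scale
print(market_opportunity_index(0.6, 0.3, 0.45))
```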
The model used lags of the dependent variable to take into account inertia in the development of entrepreneurial opportunities, and the stability of the impact of cognitive tools was tested.
Paper states the model specification included lagged dependent variables and that stability tests for the impact of cognitive tools were performed (no further details on lag length or test statistics in the excerpt).
The study's methodological foundation was panel econometric modelling, which made it possible to account for cross-country differences over time and for the dynamics of domestic indicators.
Description of methods in the paper: use of panel econometric modelling on an international panel over the 2020–2024 period (sample size not specified in the excerpt).
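One standard dynamic panel form consistent with the lagged-dependent-variable description (the symbols are assumptions; the excerpt gives no explicit specification) is

```latex
MOI_{it} = \alpha\, MOI_{i,t-1} + \beta^{\top} x_{it} + \mu_i + \lambda_t + \varepsilon_{it},
```

where MOI_{it} is the Market Opportunity Index for country i in year t, x_{it} collects the cognitive-tool indicators, μ_i and λ_t are country and time effects, and the lagged term captures inertia in opportunity development; stability of the cognitive-tool impact would then be checked by re-estimating β across subsamples or alternative specifications.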
Practical recommendations for firms and policymakers include investing in training for AI curation/evaluation/coordination, experimenting with decentralised decision rights and governance safeguards, and monitoring competitive dynamics related to model/platform providers.
Policy and practitioner takeaways explicitly presented in the discussion/implications sections, deriving from the conceptual framework and mapped literature.
The paper recommends a research agenda for AI economists: causal microeconometric studies (DiD, IVs, RCTs), structural models with hybrid human–AI agents, measurement work on GenAI use, distributional analysis and policy evaluation.
Explicit recommendations listed in the implications and research agenda sections; logical follow‑on from bibliometric findings about gaps in causal and measurement evidence.
Bibliometric mapping profiles the intellectual structure and evolution of the field but does not establish causal effects of GenAI on organisational outcomes.
Methodological limitation explicitly stated in the paper; bibliometric approach (co‑word, citation, thematic mapping) is descriptive and historical in scope.
Co‑word and thematic analyses reveal six coherent conceptual clusters that bridge technical AI topics (e.g., LLMs, GANs) with managerial themes (e.g., autonomy, coordination, decision‑making).
Thematic mapping and co‑word network analysis performed on the 212‑paper corpus; identification of six clusters reported in results.
Bibliometric and conceptual tools (VOSviewer, Bibliometrix) were used to identify performance trends, co‑word structures, thematic maps, and conceptual evolution in the GenAI–organisation literature.
Methods section: use of VOSviewer for network visualization and Bibliometrix for bibliometric statistics, co‑word analysis, thematic mapping and Sankey thematic evolution.
The study analysed a corpus of 212 Scopus‑indexed publications covering 2018–2025 to map emergent literature on Generative AI and organisational change.
Bibliometric dataset constructed from Scopus; sample size = 212 peer‑reviewed articles; time window 2018–2025; analyses performed with Bibliometrix and VOSviewer.
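The co-word step underlying such analyses counts keyword co-occurrences across papers; a minimal sketch with toy keywords echoing the cluster themes (VOSviewer/Bibliometrix internals are not reproduced here):

```python
from collections import Counter
from itertools import combinations

def coword_counts(papers):
    """Count keyword co-occurrences across papers (core of co-word analysis).

    Each paper is a set of keywords; an edge's weight is the number of
    papers in which both keywords appear. Clustering the resulting
    weighted network yields the thematic groups.
    """
    counts = Counter()
    for keywords in papers:
        for a, b in combinations(sorted(keywords), 2):
            counts[(a, b)] += 1
    return counts

papers = [
    {"LLMs", "decision-making", "autonomy"},
    {"LLMs", "coordination"},
    {"GANs", "decision-making", "autonomy"},
]
edges = coword_counts(papers)
print(edges[("autonomy", "decision-making")])  # co-occurs in 2 papers
```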
The paper identifies future research directions, including empirical causal studies on how DPP+AI interventions change recycling rates, second‑hand market prices, and firm investment in circular processes; and modeling firm strategy around proprietary vs shared DPP data.
Stated research agenda and gaps in the paper informed by the study's findings and limitations; these are recommendations rather than empirical claims.
The study used a mixed-methods design focused on the Italian fashion and cosmetics industries, employing two online surveys, k‑means clustering (consumer segmentation), principal component analysis (to identify underlying dimensions of DPP functionalities and sustainability practices), and logistic regression (to identify adoption drivers).
Methods section summary provided in the paper; explicit statement of methods and industry context. Note: sample sizes and survey instrument details are not provided in the summary.
Two consumer segments were identified: 'aware' consumers (environmentally attuned and receptive to digital innovation and sustainability information) and 'unaware' consumers (prioritize immediate, tangible benefits like price and convenience over sustainability information).
K‑means cluster analysis applied to consumer responses from one of the online surveys in the Italian fashion and cosmetics context; summary identifies two clusters; sample sizes not reported.
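A two-cluster segmentation of this kind can be sketched with a minimal k-means on toy survey features (the paper's actual features, sample sizes, and library are not reported; the feature names here are assumptions):

```python
import random

def kmeans(points, k=2, iters=20, seed=0):
    """Minimal k-means: assign points to the nearest centroid, then update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda j: sum((a - b) ** 2
                                            for a, b in zip(p, centroids[j])))
            clusters[nearest].append(p)
        centroids = [[sum(dim) / len(c) for dim in zip(*c)] if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Toy (sustainability_interest, price_sensitivity) survey responses:
# one receptive segment, one price/convenience-driven segment.
pts = [(0.9, 0.2), (0.8, 0.3), (0.85, 0.25),
       (0.2, 0.9), (0.3, 0.8), (0.25, 0.85)]
cents, groups = kmeans(pts)
print(sorted(len(g) for g in groups))  # two segments of three respondents
```

With well-separated responses like these, the two recovered clusters correspond to the 'aware' and 'unaware' profiles described above.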
This work is a conceptual/policy analysis rather than an original empirical study.
Explicit statement in the paper's Data & Methods section.
Study limitations include single-country (China) listed‑firm sample and reliance on secondary/administrative proxies for digitalization and innovation, which may miss internal qualitative aspects and introduce measurement error.
Authors’ stated limitations: sample restricted to Chinese A-share listed firms (2012–2022) and measures of digitalization/innovation derived from administrative/secondary data rather than direct observation/survey of internal practices.
Evaluation metrics for the architecture should include sample efficiency, generalization across tasks, robustness to distribution shift, autonomy (fraction of learning decisions made internally), transfer speed, lifelong retention, and safety/constraint adherence.
Explicit recommendations for evaluation metrics in the paper.
This paper is a conceptual/theoretical architecture proposal rather than an empirical study; empirical validation should follow via suggested experiments.
Explicit statement in the paper about nature of contribution.
Suggested empirical research directions for AI economists include: comparing LLM performance and economic outcomes on rule‑encodable vs tacit tasks; quantifying performance decline when forcing LLMs into interpretable rule representations; studying contracting/pricing where buyers cannot verify internal rules; and measuring returns to scale attributable to tacit capabilities.
Explicitly enumerated recommended research agenda items in the paper; these are proposed studies rather than executed work.
New metrics are needed to value tacit capabilities — e.g., measures of transfer, generalization under distribution shifts, ease of integrating with human workflows, and irreducibility to compressed rule representations.
Methodological recommendation in the paper listing specific metric categories for future empirical work.
Suggested empirical validations (not performed) include benchmarking LLMs versus rule systems on allegedly rule‑encodable tasks, attempting rule extraction and measuring fidelity loss, and compression/distillation studies to quantify irreducible task performance.
Recommendations and proposed experimental directions listed in the paper; these are proposals, not executed studies.
The paper contains mostly qualitative and historically grounded empirical content and reports no primary datasets or large‑scale experimental results in support of the formal thesis.
Explicit declaration in the Data & Methods section that empirical content is qualitative/historical and no new datasets were collected.
The paper's core methodological approach is conceptual and theoretical argumentation (formal/logical proof, historical examples, and philosophical framing), not empirical experimentation.
Stated Data & Methods description indicating reliance on formal logic, historical case analysis, and philosophical argument; absence of primary datasets.
LLM-as-Judge finds no significant difference between the retrieval-augmented and vanilla generators (p = 0.584).
Comparative evaluation using standard LLM-as-Judge metrics reported in the paper on the same experimental setup; reported p-value = 0.584.
MessyKitchens is designed to stress occlusion, object variety, and complex inter-object relations (i.e., it is more realistic/physically-rich than prior datasets).
Design and motivation section in paper stating dataset construction targets clutter, occlusion, object variety, and complex object relations; dataset includes explicit contact annotations to capture interactions.
MessyKitchens is a high-fidelity real-world dataset of cluttered indoor kitchen scenes with object-level 3D ground truth (object shapes, object poses, and explicit contact information between objects).
Dataset description in paper: collected real-world kitchen scenes and annotated object-level 3D shapes, poses, and contact/interaction labels. (No scene/instance counts provided in the supplied summary.)
Detailed quantitative coverage, throughput, or other numeric validation metrics were not reported beyond the timeline (quarter-level) claim.
Summary states measured benefits were qualitative and process metrics; no detailed quantitative throughput/coverage numbers provided. (Meta-claim about the evidence reported.)
Measuring the marginal cost of runtime governance, the tradeoff curve between task completion and compliance risk, and calibrating violation probabilities are open empirical research questions identified by the paper.
Explicit list of open problems and proposed empirical research agenda in the Implications/Measurement sections of the paper.
No large empirical dataset or large-scale field experiments were used; the work is primarily theoretical/formal with simulations and worked examples rather than empirical validation.
Paper's Methods/Data section explicitly states the work is theoretical/formal and lists reference implementation and simulations instead of large empirical studies.
Risk calibration—mapping violation probabilities to enforcement actions and thresholds—is a key unsolved operational problem for runtime governance.
Paper highlights open problems including risk calibration; argued via conceptual analysis and operational concerns (false positives/negatives, costs of blocking actions).
Two Doherty power amplifier prototypes with GaN HEMT transistors and three-port pixelated combiners were fabricated and tested at 2.75 GHz.
Paper reports fabrication of two prototypes built with GaN HEMT transistors and the optimized three-port pixelated combiners; RF characterization performed at 2.75 GHz.
Roughly 25% of the training corpus is Italian-language data.
Corpus composition reported by the authors: Italian-language share ≈25% of total training tokens. The summary cites this proportion but does not list the datasets or language-detection methodology.
The model was trained on approximately 2.5 trillion tokens of data.
Training-data size reported in the paper (aggregate token count ≈2.5T). The summary provides this number; no per-dataset breakdown or provenance details are included in the summary.
Approximately 3 billion parameters are active per inference (sparse activation / ~3B active parameters at runtime).
Paper reports sparse MoE design with ≈3B active parameters per forward pass. Evidence comes from model design description (active set / routing), not from independent runtime FLOP logs in the summary.
EngGPT2-16B-A3B is a Mixture-of-Experts (MoE) model trained from scratch with a total of 16 billion parameters.
Model specification reported in the paper: architecture described as MoE and total parameter count listed as 16B. No contrary empirical test needed; claim is a declarative model spec.
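The 16B-total versus ~3B-active split follows directly from sparse expert routing: every token uses the shared (attention/embedding) parameters plus only its top-k routed experts. A sketch of the bookkeeping, with illustrative expert counts and sizes (assumptions, not reported in the summary):

```python
def moe_param_counts(shared, n_experts, expert_size, top_k):
    """Total vs per-token active parameter counts in a sparse MoE.

    `shared` covers parameters every token touches; only `top_k` of the
    `n_experts` expert blocks are activated per forward pass.
    """
    total = shared + n_experts * expert_size
    active = shared + top_k * expert_size
    return total, active

# Illustrative numbers only: chosen to reproduce the reported 16B / ~3B
# split; the actual expert count and sizes are not in the summary.
total, active = moe_param_counts(shared=1.0e9, n_experts=60,
                                 expert_size=0.25e9, top_k=8)
print(f"total ~ {total / 1e9:.0f}B, active ~ {active / 1e9:.0f}B")
```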
The project developed domain- and specialty-focused models: Fanar-Sadiq (Islamic content multi-agent architecture), Fanar-Diwan (classical Arabic poetry), and FanarShaheen (bilingual translation).
Paper enumerates these domain/specialty models and their stated focuses as part of the product stack.
FanarGuard is a 4B bilingual moderation model focused on Arabic safety and cultural alignment.
Paper lists FanarGuard in the expanded product stack and specifies model size (4B) and bilingual moderation purpose emphasizing Arabic safety/cultural alignment.
Fanar-27B was produced by continual pre-training from the Gemma-3-27B backbone.
Paper describes model development: continual pre-training of Fanar-27B from the Gemma-3-27B backbone.
The Fanar 2.0 training corpus is a curated set totalling approximately 120 billion high-quality tokens organized into three data 'recipes' emphasizing Arabic and cross-lingual relevance.
Paper reports a curated corpus of ~120B high-quality tokens split across three data recipes; emphasis on relevance and quality for Arabic and cross-lingual performance.
Training and operations for Fanar 2.0 were performed on-premises using 256 NVIDIA H100 GPUs at QCRI.
Paper states compute and infrastructure: training and operations performed on 256 NVIDIA H100 GPUs, fully on-premises at QCRI (HBKU).
A three-layer evaluation framework was applied systematically: Layer 1 = syntactic validity; Layer 2 = semantic correctness; Layer 3 = hardware executability (with sublayer 3b = end-to-end evaluation on quantum hardware).
Methods section describes application of a three-layer evaluation framework to each reviewed system, including the explicit sublayer 3b definition.
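The layered framework reads naturally as a short-circuiting pipeline: an artifact only reaches semantic or hardware checks once it passes the layers before them. A minimal sketch with hypothetical checks (the review's actual checkers are a parser, simulators, and quantum hardware):

```python
def evaluate(artifact, layers):
    """Run ordered evaluation layers, stopping at the first failure.

    `layers` is a list of (name, check) pairs; each check returns bool.
    Returns (layers passed, first failing layer or None).
    """
    passed = []
    for name, check in layers:
        if not check(artifact):
            return passed, name
        passed.append(name)
    return passed, None

# Hypothetical boolean checks for illustration only.
layers = [
    ("syntactic validity", lambda a: a["parses"]),
    ("semantic correctness", lambda a: a["correct"]),
    ("hardware executability", lambda a: a["runs_on_hw"]),
]
artifact = {"parses": True, "correct": True, "runs_on_hw": False}
print(evaluate(artifact, layers))
```

Sublayer 3b (end-to-end evaluation on quantum hardware) would slot in as an additional check after "hardware executability".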
The review grouped training regimes across the systems as supervised fine-tuning, verifier-in-the-loop reinforcement learning (RL), diffusion/graph generation, and agentic optimization.
Surveyed systems' training descriptions were classified into these training-regime categories during the review's analytical synthesis.
The review organized artifacts along artifact-type axes: Qiskit code, OpenQASM programs, and circuit graphs.
Analytical organization described in the methods: artifact-type axis enumerated as Qiskit, OpenQASM, and circuit graphs across the surveyed systems.
"Quantum code" in this review is defined as program artifacts (Qiskit code, OpenQASM); quantum error-correcting code (QEC) generation was excluded.
Inclusion/exclusion criteria specified in the review explicitly limited scope to program artifacts such as Qiskit and OpenQASM and excluded QEC-focused works.
A structured scoping review (Hugging Face, arXiv, provenance tracing; Jan–Feb 2026) identified 13 generative systems and 5 supporting datasets relevant to quantum circuit / quantum code generation.
Structured search of Hugging Face model/dataset listings, arXiv literature, and provenance tracing conducted between January and February 2026; results yielded 13 systems and 5 datasets (sample counts reported in the review).
This work is conceptual/theoretical and reports no original empirical dataset; it explicitly calls for mixed-methods empirical validation (case studies, field experiments, longitudinal studies), measurement development, and multi-level data collection.
Explicit methodological statement in the paper describing its nature as a theoretical synthesis and listing empirical needs; no empirical sample provided.