Evidence (3492 claims)
Adoption (7395 claims)
Productivity (6507 claims)
Governance (5877 claims)
Human-AI Collaboration (5157 claims)
Innovation (3492 claims)
Org Design (3470 claims)
Labor Markets (3224 claims)
Skills & Training (2608 claims)
Inequality (1835 claims)
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 609 | 159 | 77 | 736 | 1615 |
| Governance & Regulation | 664 | 329 | 160 | 99 | 1273 |
| Organizational Efficiency | 624 | 143 | 105 | 70 | 949 |
| Technology Adoption Rate | 502 | 176 | 98 | 78 | 861 |
| Research Productivity | 348 | 109 | 48 | 322 | 836 |
| Output Quality | 391 | 120 | 44 | 40 | 595 |
| Firm Productivity | 385 | 46 | 85 | 17 | 539 |
| Decision Quality | 275 | 143 | 62 | 34 | 521 |
| AI Safety & Ethics | 183 | 241 | 59 | 30 | 517 |
| Market Structure | 152 | 154 | 109 | 20 | 440 |
| Task Allocation | 158 | 50 | 56 | 26 | 295 |
| Innovation Output | 178 | 23 | 38 | 17 | 257 |
| Skill Acquisition | 137 | 52 | 50 | 13 | 252 |
| Fiscal & Macroeconomic | 120 | 64 | 38 | 23 | 252 |
| Employment Level | 93 | 46 | 96 | 12 | 249 |
| Firm Revenue | 130 | 43 | 26 | 3 | 202 |
| Consumer Welfare | 99 | 51 | 40 | 11 | 201 |
| Inequality Measures | 36 | 105 | 40 | 6 | 187 |
| Task Completion Time | 134 | 18 | 6 | 5 | 163 |
| Worker Satisfaction | 79 | 54 | 16 | 11 | 160 |
| Error Rate | 64 | 78 | 8 | 1 | 151 |
| Regulatory Compliance | 69 | 64 | 14 | 3 | 150 |
| Training Effectiveness | 81 | 15 | 13 | 18 | 129 |
| Wages & Compensation | 70 | 25 | 22 | 6 | 123 |
| Team Performance | 74 | 16 | 21 | 9 | 121 |
| Automation Exposure | 41 | 48 | 19 | 9 | 120 |
| Job Displacement | 11 | 71 | 16 | 1 | 99 |
| Developer Productivity | 71 | 14 | 9 | 3 | 98 |
| Hiring & Recruitment | 49 | 7 | 8 | 3 | 67 |
| Social Protection | 26 | 14 | 8 | 2 | 50 |
| Creative Output | 26 | 14 | 6 | 2 | 49 |
| Skill Obsolescence | 5 | 37 | 5 | 1 | 48 |
| Labor Share of Income | 12 | 13 | 12 | — | 37 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
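The direction counts in the matrix can be turned into shares for easier comparison across outcomes. The following is a minimal sketch, not part of the source page: the function names `direction_shares` and `net_direction` are hypothetical, and shares are computed over the sum of the four listed directions because the Total column does not always equal that sum (for example, Firm Productivity lists 539 against a row sum of 533).

```python
# Rows copied from the Evidence Matrix: (positive, negative, mixed, null).
ROWS = {
    "Task Completion Time": (134, 18, 6, 5),
    "Job Displacement": (11, 71, 16, 1),
    "Inequality Measures": (36, 105, 40, 6),
    "Error Rate": (64, 78, 8, 1),
}


def direction_shares(counts):
    """Return each direction's share of the row's four listed counts."""
    total = sum(counts)
    return tuple(c / total for c in counts)


def net_direction(counts):
    """Hypothetical summary: (positive - negative) / all listed claims."""
    positive, negative, _mixed, _null = counts
    return (positive - negative) / sum(counts)


for name, counts in ROWS.items():
    positive_share = direction_shares(counts)[0]
    print(f"{name}: positive share {positive_share:.2f}, "
          f"net direction {net_direction(counts):+.2f}")
```

A "positive" finding is not always a desirable one (a positive Job Displacement claim, for instance), so these shares describe the direction of reported findings rather than their welfare implications.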
Innovation
The near-uncorrelated rankings and rank shifts on the n=11 subset are driven by a strong negative Adoption-Capability correlation among closed-source high-capability agents within this subset.
Subgroup analysis/observation within the 11-agent SWE-bench overlap indicating a negative correlation between Adoption and Capability for closed-source high-capability agents (no numerical coefficient reported in the excerpt).
Static benchmarks measure what AI agents can do at a fixed point in time but not how they are adopted, maintained, or experienced in deployment.
Conceptual statement in the paper; no empirical sample cited for this specific claim (framing/argumentation).
Under our definition, contestants with types below a certain threshold (low types) always engage in benchmark hacking, whereas those above the threshold do not.
Theoretical result (characterization/theorem) derived from the contest model showing threshold behavior in equilibrium across contestant types.
Each new task domain requires painstaking, expert-driven harness engineering: designing the prompts, tools, orchestration logic, and evaluation criteria that make a foundation model effective.
Author assertion in the paper's introduction/abstract describing the state of practice; no empirical method, dataset, or sample size reported in the excerpt.
Industry digital maturity weakens the effect of the peer leader on a focal firm’s AI adoption.
Interaction/heterogeneity analysis in fixed-effects regression models on panel data of publicly listed Chinese firms (2012–2023), using an industry digital maturity moderator.
Technological interdependence is not dissolving but being selectively restructured, producing a durable condition of partial, segmented decoupling in which interdependence persists under increasingly politicized rules of access.
Interpretation based on case-study observations of export controls, allied coordination, Chinese countermeasures, and emergent supply-chain and regulatory changes described in the paper.
When the United States employs export controls and allied coordination to manage perceived technological risks, China responds through defensive reconfiguration aimed at reducing asymmetric vulnerability and, in certain instances, retaliates with rare-earth export controls.
Case-study evidence centered on advanced-technology sectors (particularly semiconductors) and observed policy responses following U.S. export restraints after the first Trump administration (qualitative policy and reaction examples described in the paper).
The transformation toward algorithmic enterprises raises critical concerns regarding agency, accountability, data monopolization, and algorithmic bias.
Presented as a principal concern in the paper's conceptual discussion and interdisciplinary critique; based on analysis of governance and ethical literature rather than new empirical evidence in the abstract.
Market incompleteness distorts the efficient development of AI (i.e., distorts innovation/output).
Claim made in the abstract as a theoretical implication of the asset-pricing model; no empirical data provided.
Market incompleteness distorts valuations.
Stated in the abstract as an implication of the model (theoretical analysis); no empirical quantification provided.
Every additional mechanism we test (planner evolution, per-tool selection, cold-start initialization, skill extraction, and three credit assignment methods) degrades performance.
Findings from the nine-variant ablation study reported in the paper; comparison of variants that add each listed mechanism versus the memory+reflection combination.
Agentic AI introduces novel challenges related to market stability, regulatory compliance, interpretability, and systemic risk.
Survey discussion synthesizing literature on systemic and governance risks of autonomous systems in markets; draws on conceptual and empirical prior work but does not present new quantitative results.
Consolidation of corporate control of critical technologies (driven by AI industrial strategies that do not center democratic economic governance) threatens key democratic and societal objectives.
Stated implication in the paper's opening argument; supported by the paper's conceptual framing and (as indicated) review of how past and emerging tech/AI industrial strategies interact with democratic objectives. No quantitative sample size provided in the excerpt.
Unless governments develop industrial policy strategies centered on strengthening democratic economic governance, they risk consolidating corporate control of critical technologies.
Main argumentative claim of the paper as stated in the abstract/introduction; presented as a normative risk argument supported in the paper by conceptual analysis and review of policy trends and historical examples (no empirical sample size reported in the excerpt).
A threat model taxonomy mapping misuse vectors to hardware, software, institutional, and liability layers illustrates why no single governance mechanism suffices.
Threat model taxonomy developed in the paper (conceptual taxonomy; illustrative mapping rather than empirical testing).
Restricting access to open-weight models deepens asymmetries while driving proliferation into unsupervised settings.
Argumentation and threat-model reasoning in the paper describing likely consequences of restrictions (theoretical analysis; no empirical sample cited).
Access restrictions, without governed alternatives, may displace risks rather than reduce them.
Theoretical argument and threat-model analysis in the paper showing possible risk displacement (conceptual reasoning; no empirical sample reported).
No single policy instrument is sufficient to produce high regional science and technology industrial competitiveness.
Result of fuzzy-set qualitative comparative analysis (fsQCA) on AI policy instruments issued by provincial-level governments in China, reported in the study; fsQCA finds no individual condition is sufficient.
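In fsQCA, a condition is usually judged sufficient for an outcome by its consistency score, the sum of min(x, y) over cases divided by the sum of x. The sketch below illustrates that standard calculation only. The membership scores are hypothetical and are not the study's data, and the 0.8 cutoff is a commonly used benchmark assumed here, not a value reported in the excerpt.

```python
def sufficiency_consistency(condition, outcome):
    """Consistency of 'condition is sufficient for outcome':
    sum(min(x, y)) / sum(x) over fuzzy membership scores."""
    overlap = sum(min(x, y) for x, y in zip(condition, outcome))
    return overlap / sum(condition)


def sufficiency_coverage(condition, outcome):
    """Coverage: share of outcome membership accounted for by the condition."""
    overlap = sum(min(x, y) for x, y in zip(condition, outcome))
    return overlap / sum(outcome)


# Hypothetical fuzzy membership scores for five provinces (illustration only).
single_instrument = [0.9, 0.8, 0.7, 0.3, 0.6]
high_competitiveness = [0.4, 0.9, 0.3, 0.5, 0.2]

ASSUMED_THRESHOLD = 0.8  # assumed benchmark, not taken from the study

consistency = sufficiency_consistency(single_instrument, high_competitiveness)
coverage = sufficiency_coverage(single_instrument, high_competitiveness)
is_sufficient = consistency >= ASSUMED_THRESHOLD
print(f"consistency={consistency:.3f}, coverage={coverage:.3f}, "
      f"sufficient={is_sufficient}")
```

With these illustrative scores the single instrument falls below the assumed threshold, which is the pattern the claim describes for every individual condition.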
The observed negative effect on operating profit margin (OPM) is consistent with short-term 'J-curve' transition costs (process redesign and capability buildup) during early AI adoption.
Interpretation of empirical patterns (a short-term decline in OPM with no concurrent change in return on assets, ROA) offered by the authors as an explanatory mechanism; not presented as separately estimated or experimentally tested.
AI adoption had a significantly negative impact on the operating profit margin (OPM).
Causal analysis of KOSDAQ-listed companies (2018–2025) with AI-adoption timing identified via multi-step, contextually validated text analysis of DART business reports; endogeneity addressed using two-way fixed effects (TWFE) and Propensity Score Matching (PSM).
Artificial intelligence introduces systemic risks through unprovenanced AI-derived metadata.
Cautionary claim made by the authors; stated as a systemic risk linked to provenance issues of AI-generated metadata, without empirical incident data in the excerpt.
The debate about scholarly knowledge infrastructure has long been framed as a contest between openness and commercial enclosure, and this framing distorts both policy and practice.
Conceptual/persuasive claim made in the paper's opening paragraph; no empirical data or sample reported in the excerpt.
AI is driving states to reconsider interdependence not as a source of peace but as a battlefield of power.
Normative and interpretive conclusion drawn from the paper's analysis of AI's geopolitical implications; no empirical data or sample reported in the abstract.
AI is redefining foreign policy in a multipolar world by making the line between economic cooperation and strategic vulnerability indistinct.
Theoretical claim and synthesis in the paper's thesis; no empirical evidence or sample size provided in the abstract.
AI is turning economic relationships between countries, previously sources of mutual benefit, into instruments of coercion.
The paper presents a theoretical analysis drawing on international political economy and foreign policy theory; no empirical measurements reported in the abstract.
AI enhances the weaponization of economic interdependence by enabling states to monitor, predict, manipulate, and disrupt transnational networks with unprecedented accuracy.
The paper advances a theoretical argument and synthesis of international political economy and foreign policy literatures; no empirical sample or quantitative data reported in the abstract.
Standard benchmarks often fail to isolate an agent's core ability to parse queries and orchestrate computations.
Paper asserts that existing/standard benchmarks do not adequately isolate parsing and computation-orchestration abilities, motivating the new benchmark.
Current session-based context handling (sessions ending, context windows filling, memory APIs returning flat facts) produces intelligence that is powerful per session but amnesiac across time.
Descriptive diagnostic argument in the paper; no empirical measurement reported in this text.
The US restricts mobility and knowledge flows and challenges regulatory efforts to protect its advantage.
Descriptive claim about US strategy (policy observation stated in the paper's framing; not quantified in the excerpt).
The AI race amplifies security risks and international tensions.
Introductory/interpretive claim motivating the study (no specific empirical quantification provided in the excerpt).
The US and China form two poles around which global AI research increasingly revolves (i.e., global AI research is polarizing around these two countries).
Longitudinal network analysis of international collaboration and citation patterns derived from publication data compared to random realizations.
The US and China have long diverged in both cross-country collaboration and citation links, forming two poles around which global AI research increasingly revolves.
Large-scale data of scientific publications spanning three decades; analysis comparing cross-country collaboration and citation links to their random realizations (null models).
Under logit demand and symmetric rivals, the QoS gap is strictly decreasing in API price and rival entry elasticity.
Comparative statics derived from the analytical model (logit demand, symmetric rivals).
Traditional machine learning approaches, including the baseline methodology proposed in previous studies, typically optimize global predictive accuracy and therefore fail to capture business-critical outcomes, especially the identification of high-risk clients.
Conceptual critique and literature/contextual claim in the paper; contrasted with the study's business-aware methods (no direct external benchmarking numbers provided in the abstract).
Classifying customers without a prior history at a given company is particularly challenging due to the absence of historical behavior, extreme class imbalance, heavy-tailed loss distributions, and strict operational constraints.
Argumentation / problem statement in the paper (no empirical test reported); descriptive characterization of the insurance cold-start classification problem.
The pharmaceutical R&D process is persistently challenged by high financial costs, protracted timelines, and remarkably low success rates.
Background statement in the review synthesizing prior literature and field knowledge; no original empirical data or sample sizes reported in the provided text.
In the geographical network, both technological diversity and technological proximity inhibit main path formation, implying macro-regional evolution requires specialized focus and complementary knowledge.
ERGM results for the geographical diffusion layer showing negative (inhibitory) associations for diversity and proximity variables; interpreted in regional evolution context.
AI adoption is reinforcing existing structural disparities within the BRICS bloc, creating a two‑tier productivity hierarchy (China & India vs. Brazil, Russia & South Africa).
Observed divergence in total factor productivity (TFP) trajectories and differing links between AI indicators and technical change (TC) and efficiency change (EC) across the five BRICS economies; comparative analysis shows stronger frontier-shifting effects in China and India and weaker or negative effects in the other three economies.
Brazil, Russia, and South Africa experience stagnation or decline in both efficiency and technological advancement over 2005–2023.
Malmquist TFP decomposition (EC and TC) for each BRICS economy showing flat or negative trends in EC and TC for Brazil, Russia, and South Africa during 2005–2023.
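The Malmquist index factors TFP change between two periods into efficiency change (EC, catching up to the frontier) and technical change (TC, movement of the frontier itself), so that TFP change = EC × TC. The sketch below shows the standard output-oriented form of that decomposition. The distance-function values are hypothetical, and the excerpt does not state which orientation or estimation method (for example, DEA) the study used.

```python
from math import sqrt


def malmquist_decomposition(d_t_t, d_t_t1, d_t1_t, d_t1_t1):
    """Decompose the Malmquist TFP index into (EC, TC, TFP change).

    Arguments are output distance function values:
      d_t_t   : period-t data against the period-t frontier
      d_t_t1  : period-(t+1) data against the period-t frontier
      d_t1_t  : period-t data against the period-(t+1) frontier
      d_t1_t1 : period-(t+1) data against the period-(t+1) frontier
    Under this convention, values above 1 indicate improvement.
    """
    efficiency_change = d_t1_t1 / d_t_t
    technical_change = sqrt((d_t_t1 / d_t1_t1) * (d_t_t / d_t1_t))
    tfp_change = sqrt((d_t_t1 / d_t_t) * (d_t1_t1 / d_t1_t))
    return efficiency_change, technical_change, tfp_change


# Hypothetical distance values for one economy (illustration only).
ec, tc, tfp = malmquist_decomposition(
    d_t_t=0.80, d_t_t1=0.90, d_t1_t=0.70, d_t1_t1=0.80
)
print(f"EC={ec:.3f}, TC={tc:.3f}, TFP change={tfp:.3f}")
```

In this convention, the "flat or negative trends" described above would correspond to EC and TC values at or below 1 over the period.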
Despite rapid progress, a key problem remains: none of these systems can build complex 3D assemblies with moving parts. For example, no existing system can build a piston, a pendulum, or even a pair of scissors.
Negative capability claim based on the authors' survey of prior work (asserted limitation); no systematic benchmark or exhaustive evaluation numbers provided in the excerpt.
While achieving financial autonomy, firms are also exposed to new constraints as their reliance shifts to third-party software, technological infrastructures, and opaque algorithms (Gaviyau & Godi, 2025; Suhrab et al., 2026).
Stated with citations to Gaviyau & Godi (2025) and Suhrab et al. (2026); presented as an observed/paraphrased risk or unintended consequence in the paper. No empirical sample details in the excerpt.
SMEs face various financial constraints and rely heavily on traditional financial institutions for their survival (Kadzima et al., 2025).
Statement supported by citation to Kadzima et al. (2025); presented as a literature-supported empirical generalization in the paper's background/introduction. No sample size or empirical details given in the excerpt.
Aligning the generative policy with nuanced user preference signals is a challenge for generative recommendation.
Paper lists this as one of three scaling challenges motivating the proposed methods (problem statement about preference alignment).
Encoding long user behavior sequences with multi-token item representations based on semantic IDs is prohibitively costly (a scaling challenge).
Paper lists this as one of three scaling challenges for deploying GR at industrial scale (problem statement about computational/cost burden).
Within a single request, identical model inputs may produce inconsistent outputs due to the pagination request mechanism (a challenge for GR/NTP recommendation at industrial scale).
Paper lists this as one of three scaling challenges for generative retrieval in large-scale industrial systems (problem statement).
Early iterations suffered severe execution decay.
Reported observation from the longitudinal study describing early-phase performance problems (qualitative; no quantitative metric in the excerpt).
Execution-based environments suffer from adversarial 'Test Evasion' by unconstrained agents.
Stated assertion in the paper's motivation/abstract; presented as a limitation of execution-based evaluation (no empirical sample size or experiment details provided in the excerpt).
Current paradigms, such as Reinforcement Learning from Human Feedback (RLHF) and AI Feedback (RLAIF), frequently induce model sycophancy.
Stated assertion in the paper's motivation/abstract; presented as a limitation of existing alignment paradigms (no empirical sample size or experiment details provided in the excerpt).
Environmental demands place an upper bound on the degree of heterogeneity required in a distributed production system.
Theoretical claim derived from the Distributed Production System framework and discussed in the paper; supported by conceptual argument and model constraints rather than empirical data; no sample size reported.
Lower survival rates among BDA adopters are driven by greater uncertainty in sales.
Paper states greater uncertainty in sales is an interrelated factor explaining lower survival for BDA adopters, based on empirical analysis of German start-ups.