Evidence (14,922 claims)

Search and filter individual claims pulled from the papers. If you're looking for a specific finding (e.g., "what's the effect on wages?"), you're in the right place. If you want to compare whole outcome categories against each other instead, use the Evidence Explorer.

The board below groups claims two ways: by broad theme (nine paper-level topics) and by outcome category (the 34 claim-level outcomes that the Explorer and Syntheses also use).

Browse by theme

Nine broad, paper-level topics. Click one to filter the claims below.

Human-AI Collaboration

Claims by outcome category

Counts by direction of finding. These are the same 34 outcome categories the Explorer compares and the Syntheses are written for. A linked row has a published synthesis.

Outcome	Positive	Negative	Mixed	Null	Total
Other	795	210	105	955	2131
Governance & Regulation	886	414	197	126	1654
Organizational Efficiency	826	204	129	87	1257
Technology Adoption Rate	681	259	128	110	1189
Research Productivity	464	138	65	349	1028
Output Quality	503	196	61	53	813
Decision Quality	351	180	84	51	673
AI Safety & Ethics	238	288	71	34	637
Firm Productivity	455	58	92	20	631
Market Structure	186	172	123	25	511
Task Allocation	222	70	76	34	407
Innovation Output	238	28	48	18	334
Skill Acquisition	177	62	62	17	318
Employment Level	107	57	108	13	287
Fiscal & Macroeconomic	135	72	44	26	284
Firm Revenue	172	50	28	5	256
Consumer Welfare	121	68	45	12	246
Task Completion Time	183	33	10	13	240
Inequality Measures	45	126	50	6	227
Worker Satisfaction	95	74	23	12	204
Error Rate	77	98	11	4	190
Regulatory Compliance	84	73	17	7	181
Automation Exposure	61	61	27	14	166
Training Effectiveness	98	21	14	19	154
Wages & Compensation	78	37	25	6	146
Developer Productivity	105	18	14	6	144
Team Performance	87	17	28	10	143
Job Displacement	12	83	23	1	119
Hiring & Recruitment	53	8	8	3	72
Social Protection	39	17	8	2	66
Creative Output	32	20	8	3	64
Skill Obsolescence	5	50	6	1	62
Labor Share of Income	17	20	17	—	54
Worker Turnover	15	15	—	3	33
Industry	—	—	—	1	1

The reasoning preset (1:5 input:output) elevates frontier closed models that the chat preset penalizes on price.

Observed leaderboard reordering when using a reasoning workload preset (1:5) compared to the chat preset; specific elevation of frontier closed models noted (no numeric counts provided in excerpt).

medium positive Token Arena: A Continuous Benchmark Unifying Energy and Cogn... leaderboard ranking changes for frontier closed models under the reasoning prese...

To our knowledge, this is the first demonstration of an AI agentic system autonomously identifying and experimentally validating a nontrivial, previously unreported physical mechanism.

Authors' novelty claim, supported by the reported autonomous proposal and experimental validation of the optical bilinear interaction in their study.

medium positive End-to-end autonomous scientific discovery on a real optical... novelty of AI-driven autonomous experimental discovery (identification + experim...

Qiushi Engine converts an abstract coherence-order theory into experimental observables, providing the first observation of this class of coherence-order structure.

Reported experimental procedure that translates coherence-order theory into measurable observables, with the authors claiming the first observation of that class of structure; the paper presents experimental data and analysis supporting the observation.

medium positive End-to-end autonomous scientific discovery on a real optical... observation of coherence-order structure predicted by theory

Gradient attribution is established as a computationally validated signal for model-informed reward allocation in participatory weather sensing.

Synthesis/conclusion in the paper based on computational experiments and evaluations (results across >400 configurations showing both fidelity and limitations).

medium positive Calibrating Attribution Proxies for Reward Allocation in Par... validity of gradient attribution as a reward allocation signal

Attribution captures near-optimal sensor placement utility with monotonically faithful payments.

Comparative experiments in the paper showing that gradient attribution corresponds closely to near-optimal sensor placement utility and yields monotonically faithful payment signals (experimental comparisons to optimal/benchmark placements).

medium positive Calibrating Attribution Proxies for Reward Allocation in Par... sensor placement utility captured by attribution; monotonicity/faithfulness of p...

Principled symbolic abstraction bridges generative AI and the numerical precision required for engineering design.

Broader conclusion drawn by the authors based on the reported modular architecture, symbolic lifting operator, and experimental improvements in geometric error and structural validity.

medium positive Language Models Refine Mechanical Linkage Designs Through Sy... ability to combine generative models with numerical precision for engineering ta...

These simulations produce rich experiential learning signals, whose effectiveness is validated by significant improvements in agent performance on both in-domain and out-of-domain productivity evaluations.

Evaluation experiments reported in the paper claiming statistically/qualitatively significant improvements in agent performance on in-domain and out-of-domain productivity benchmarks after training on simulation-generated signals.

medium positive Synthetic Computers at Scale for Long-Horizon Productivity S... agent performance on productivity evaluations (in-domain and out-of-domain)

Embedding governance into agent reasoning produces more consistent, explainable, and auditable compliance than external enforcement.

Comparative claim asserted in the paper, apparently supported by the reported production deployment results (95% compliance, zero false escalations); explicit experimental comparison details are not provided in the abstract.

medium positive Think Before You Act -- A Neurocognitive Governance Model fo... consistency, explainability, and auditability of compliance

Technology has increased efficiency in organisations based in large cities in India.

Review finding: efficiency gains in urban organisations are reported across the summarized literature (no single sample size reported in excerpt).

medium positive A Comprehensive Review of Technology Adoption and Its Impact... organizational efficiency gains in urban organisations

Controversial questions frequently result in an AI Overview (AIO).

Analysis of the 11,500-query benchmark with annotation/identification of 'controversial' queries and observed higher incidence of AIO generation for those queries.

medium positive How Generative AI Disrupts Search: An Empirical Study of Goo... likelihood of AIO generation for controversial queries

AI-enabled process capability contributes to sustained enterprise value growth.

Authors report empirical associations between PI (AI-enabled process capability) and measures tied to enterprise value (Feltham–Ohlson based abnormal earnings / profitability) across the panel sample.

medium positive A Data-Driven Evaluation Framework for Quantifying the Impac... enterprise value / sustained value growth

Prompt modifications, Chain-of-Thought (CoT) reasoning, and visual token reduction can mitigate visual-priming effects on VLM behavior (with varying effectiveness across models).

Intervention experiments applying prompt engineering, CoT-style prompts, and reducing the number of visual tokens to observe whether these interventions reduce the influence of image content and color cues on IPD choices across several VLMs. (Abstract states these mitigation strategies were explored and their effectiveness varied by model; precise quantitative mitigation effects not provided in abstract.)

medium positive The Effects of Visual Priming on Cooperative Behavior in Vis... reduction in priming-induced changes to cooperation/defection choices after appl...

Generative AI is increasingly embedded in China's short-video production.

Authors' background claim supported by qualitative data collection with 16 in-depth interviews of short-video creators active on Xiaohongshu and Douyin; observational grounding in participant reports.

medium positive AI passing and invisible authenticity labor: trust vulnerabi... degree of AI adoption in short-video production

Reliability did not come from the base model alone; it emerged from the operating layer around the model (prompt compilation, typed controls, policy validation, execution guards, memory design, and trace-level observability).

Comparative analysis of system design and observed failure modes during pre-launch testing and deployment; qualitative/operational reasoning linking operating-layer components to improved reliability.

medium positive Operating-Layer Controls for Onchain Language-Model Agents U... system reliability attributable to operating-layer components

The proposed, validated model can equip fintech managers and regulators with a governance-based approach to tackling algorithmic bias and better position them to engender trust and financial inclusion.

Concluding assertion based on the integrated framework developed from the SLR (45 papers) and the structured five-expert validation; positioned as the intended practical utility of the model rather than an empirically measured outcome.

medium positive Corporate-Governance-Driven Algorithmic Fairness in SME Fint... trust and financial inclusion outcomes resulting from governance-based mitigatio...

Participants showed strong willingness to substitute human IT support at costs well below human benchmarks.

Participant responses and willingness-to-pay / substitution questions collected in the controlled study (reported qualitatively in the paper); comparison to unspecified human-cost benchmarks.

medium positive SecMate: Multi-Agent Adaptive Cybersecurity Troubleshooting ... willingness to substitute human IT support (cost thresholds / preferences)

Claude 3.5 Sonnet aligns with a narrative funder profile, showing greater responsiveness to qualitative aspects of the pitch, somewhat higher funding levels, and strong cross-run reliability.

Comparative observations across the experiment: Claude 3.5 Sonnet was more responsive to qualitative information in pitch decks, tended to recommend higher funding levels, and demonstrated strong reliability across runs.

medium positive Algorithmic personalities and the myth of neutrality: financ... responsiveness to qualitative aspects; funding levels; reliability

Agentic Architect is the first end-to-end open-source framework for agentic AI architecture exploration and optimization.

Authors' claim in the abstract and paper asserting novelty and open-source release. No independent verification provided in the abstract.

medium positive Agentic Architect: An Agentic AI Framework for Architecture ... adoption_rate

Our results establish C2C as a testbed for studying and building LM-based agents that can navigate the sophisticated coordination required for real-world deployments.

Authors' interpretation/implication based on the experiments and dataset produced (conclusion statement).

medium positive Cooperate to Compete: Strategic Coordination in Multi-Agent ... suitability of C2C as a research testbed

The system provides infrastructure supporting interactive search, cohort generation, and downstream LLM-powered clinical applications without requiring specialized informatics expertise.

Conclusion asserts the deployed system can support interactive search, cohort generation, and downstream LLM applications and that it does not require specialized informatics expertise for these uses.

medium positive Health System Scale Semantic Search Across Unstructured Clin... need_for_specialized_informatics_expertise / applicability_to_downstream_applica...

Health-system-scale semantic search is both technically and operationally feasible.

Authors conclude feasibility based on successful deployment, measured latency, cost, retrieval quality, and clinical utility experiments.

medium positive Health System Scale Semantic Search Across Unstructured Clin... feasibility_of_health_system_scale_deployment

The paper's findings provide practical guidance for selecting between joint and modular training modalities based on environmental conditions to optimize reinforcement learning–based scheduling performance.

Authors' stated implication/conclusion based on their sensitivity analysis and comparative evaluations across environmental regimes.

medium positive An Analysis of the Coordination Gap between Joint and Modula... guidance effectiveness for selecting training modality to optimize performance

Foundation models are strong potential solutions for scalable and generalizable forecasting in the energy domain, particularly in data-constrained and privacy-sensitive settings.

Synthesis and interpretation of benchmark results showing generalization across datasets and better performance in scenarios with limited data; argument made in paper conclusions.

medium positive FETS Benchmark: Foundation Models Outperform Dataset-specifi... suitability of foundation models for data-constrained and privacy-sensitive fore...

These results establish a practical pathway for extending industrial automation with learning-based methods.

Authors' concluding claim based on the reported deployment results (interpretation/implication rather than a new empirical measurement).

medium positive Learning-augmented robotic automation for real-world manufac... practical applicability/adoption potential of learning-based automation methods

The paper proposes a safety-oriented inductive bias for rational AI decision-makers whose desiderata align with implementable policy constraints in high-stakes, low-signal situations.

Theoretical proposal and normative argument in the paper linking the proposed inductive bias (negligibility threshold and associated norms) to policy-implementable constraints; argued rather than empirically demonstrated.

medium positive Bounding the Long Tail: Ai Norms for Decision-Making Under N... alignment of a proposed inductive bias with implementable policy constraints; im...

These patterns are consistent with transfer emerging through accumulated interaction between owners (or owners' computer environments) and their agents in everyday use.

Interpretation offered by the authors based on observed alignment patterns and robustness checks; the paper argues consistency with an interaction-driven transfer mechanism rather than providing a direct experimental causal test.

medium positive Behavioral Transfer in AI Agents: Evidence and Privacy Impli... inferred_mechanism_of_transfer (accumulated_interaction)

This transfer persists among agents without explicit configuration.

Subgroup analyses (described in paper) isolating agents lacking explicit configuration settings and comparing behavioral alignment to owners; reported persistence of alignment in that subgroup.

medium positive Behavioral Transfer in AI Agents: Evidence and Privacy Impli... behavioral_alignment_in_unconfigured_agents

Trade unions have increasingly pursued algorithmic transparency and stronger technology governance rights through collective bargaining, and governments are accelerating legislative initiatives to establish and protect workplace technology rights.

Descriptive review of labor-movement responses and recent government legislative initiatives reported in the literature (case studies and policy reviews).

medium positive From Technological Substitution to Institutional Response: A... union bargaining activity and government legislative action on workplace technol...

Using these artifacts shifts human effort toward higher-level design and validation activities.

Reported as a preliminary finding from the exploratory evaluation; the abstract states that human effort shifted from low-level implementation to higher-level design/validation when artifacts were embedded (no sample size or time-allocation metrics provided).

medium positive Shift-Up: A Framework for Software Engineering Guardrails in... allocation of human effort to design and validation

Embedding machine-readable requirements and architectural artifacts stabilizes agent behavior.

Reported as a preliminary finding from the exploratory evaluation comparing approaches; the abstract states that embedding such artifacts stabilizes agent behavior (no numeric metrics or sample size reported).

medium positive Shift-Up: A Framework for Software Engineering Guardrails in... agent behavior stability

The central obstacle to agent self-improvement is not what to remember but how to use what has been remembered (which retrieval policy to apply, how to interpret prior outcomes, and when the current strategy itself must change).

Conceptual claim supported by authors' argumentation and by the experimental results (ablation showing gains from reflection/use mechanisms rather than added architectural complexity).

medium positive AEL: Agent Evolving Learning for Open-Ended Environments bottleneck characterization for agent self-improvement

Visibility mechanisms, such as public algorithm registers or role-sensitive explainability, can be effective tools in regaining citizen trust.

Review examines studies on transparency/visibility mechanisms; abstract states these mechanisms are examined for effectiveness but does not report definitive quantitative results or study counts.

medium positive Artificial Intelligence, Public Policy and Governance - impl... citizen trust in algorithmic governance

By capturing complete interaction traces with human vs. agent code authorship attribution, SWE-chat provides an empirical foundation for moving beyond curated benchmarks towards an evidence-based understanding of how AI agents perform in real developer workflows.

Claims about dataset capabilities and intended use: the dataset contains interaction traces and authorship labels enabling empirical research; asserted by authors as an implication of the dataset contents.

medium positive SWE-chat: Coding Agent Interactions From Real Users in the W... utility of SWE-chat for empirical research and benchmark improvement

Enterprise deployment of long-horizon decision agents in regulated domains (underwriting, claims adjudication, tax examination) is dominated by retrieval-augmented pipelines despite a decade of increasingly sophisticated stateful memory architectures.

Stated observation/argument in the paper's introduction; no empirical sample size or systematic industry survey reported in the abstract.

medium positive Stateless Decision Memory for Enterprise AI Agents prevalence of retrieval-augmented pipelines in enterprise deployment

An accompanying open-source interactive tool, the Co-creation Provenance Lab, enables policymakers to audit and iteratively improve summaries, establishing genuine human-in-the-loop oversight at scale.

Statement in the paper about an open-source tool released alongside the research; a demonstration or software repository is likely provided.

medium positive Participatory provenance as representational auditing for AI... availability and claimed capability of the Co-creation Provenance Lab to support...

AI adoption enhances the reliability of financial reporting and the effectiveness of audits by reducing information asymmetry and strengthening internal monitoring processes.

Argument grounded in theory and supported empirically via SEM showing AI adoption associated with greater reporting transparency and internal control quality, which are linked to higher audit quality.

medium positive Artificial Intelligence Adoption in Financial Reporting and ... financial reporting reliability and audit effectiveness (via reduced information...

AI-enabled reporting systems strengthen firm-level governance mechanisms (e.g., reporting transparency and internal controls), which enhances audit quality (governance substitution perspective complemented by institutional and technology diffusion theories).

Theoretical framing (governance substitution, institutional and technology diffusion theories) combined with empirical SEM results linking AI adoption to proxies for governance (reporting transparency, internal control quality) and to audit quality.

medium positive Artificial Intelligence Adoption in Financial Reporting and ... firm-level governance mechanisms (reporting transparency, internal control quali...

Differences in institutional quality, digital infrastructure, and absorptive capacity explain the disparity in technology impacts between GCC and non-GCC countries.

Exploratory/mediation or interaction analysis linking institutional quality, measures of digital infrastructure, and absorptive capacity to heterogeneity in estimated technology effects across countries in the panel.

medium positive Digital Transformation, AI Efficiency, and Sustainable Devel... heterogeneity in the effect of digital transformation/AI on sustainable developm...

The capital market evaluates AI investment as a future 'growth option' selectively in industries with strong data infrastructure, digital workforce readiness, and absorptive capacity.

Inference from heterogeneous positive Tobin's Q effect found in the ICT industry and null average effect across all firms; authors argue market valuation responds to industry-specific complementary assets and ecosystem conditions.

medium positive The Dynamic Causal Effects of Corporate AI Adoption on Profi... market valuation response to AI investment (interpreted as growth-option pricing...

Pair programming between students is well studied and known to be beneficial to self-efficacy and academic achievement.

Background literature claim presented in the paper's introduction (cites existing research on pair programming benefits).

medium positive Fast and Forgettable: A Controlled Study of Novices' Perform... self-efficacy and academic achievement associated with pair programming

Developing and further developed countries only integrate with China, signaling China's expanding influence over the international AI research landscape.

Observed integration patterns in the publication-based collaboration and citation networks showing that (some) developing and further developed countries connect primarily with China rather than the US; comparison to randomized networks.

medium positive Polarization and Integration in Global AI Research international research integration of developing and further developed countries...

The calibration mapping suggests Google and OpenAI face conditions most conducive to foreclosure.

Outcomes of the paper's stylized calibration/comparative mapping across four providers (April 2026 data); authors' interpretation.

medium positive The Inference Bottleneck: A Formal Model of Vertical Foreclo... conduciveness to foreclosure

Artificial intelligence algorithms are increasingly used by firms to set prices.

Statement in paper's introduction/abstract referencing prior adoption trends; no specific empirical study or sample reported in the excerpt.

medium positive Convergence to collusion in algorithmic pricing use/adoption of AI algorithms for pricing by firms

The proposed approach aligns machine learning with actuarial portfolio optimization by explicitly integrating profit-driven objectives and operational constraints, offering two practical and scalable solutions for risk-based decision-making in real-world insurance settings.

Conceptual claim supported by the combination of methodological design and empirical results presented in the paper (method descriptions + experimental validation).

medium positive Advanced Insurance Risk Modeling for Pseudo-New Customers Us... risk-based_decision-making_effectiveness

The balanced ensemble provides the most favourable trade-off between predictive performance, robustness, interpretability, and computational efficiency, making it suitable for deployment in regulated insurance environments.

Authors' synthesis of experimental results (performance, robustness tests, interpretability considerations, and computational efficiency measurements) and discussion regarding regulatory deployment suitability.

medium positive Advanced Insurance Risk Modeling for Pseudo-New Customers Us... suitability_for_deployment / trade-off_between_metrics

These variables (education, gender inclusiveness, digital literacy, perceived fairness) are mutually dependent and the use of AI combined with inclusive policies is necessary to sustainably realize financial inclusion.

Paper asserts mutual dependence based on SEM results and provides a policy recommendation that AI plus inclusive policies are necessary, citing prior literature (Salami et al., 2025; Berg et al., 2019; Fuster et al., 2021).

medium positive A Machine Learning Perspective on FinTech-Driven Inclusion: ... sustainable financial inclusion

Synthetic experiments complement the theoretical results and showcase the benefits of collective action across different market regimes.

Simulation-based experiments described in the paper (synthetic experiments across market regimes). Paper does not report a real-world sample size; results are from computational experiments.

medium positive Stochastic wage suppression on gig platforms and how to orga... benefit of collective action (improvements in wages/total spending across simula...

Spatial heterogeneity: Eastern regions are driven by knowledge recombination opportunities.

Reported spatial heterogeneity findings indicating Eastern China’s diffusion is driven more by recombination/opportunity measures than by reliance on core hubs.

medium positive Mapping China’s digital transformation: a multilayer network... drivers of diffusion in Eastern regions (knowledge recombination)

Spatial heterogeneity: Western regions rely heavily on core technological hubs.

Spatial analysis / heterogeneity results reported by region indicating Western China depends on core technological hubs as diffusion sources or anchors.

medium positive Mapping China’s digital transformation: a multilayer network... regional dependence on core technological hubs (Western regions)

Heterogeneity analysis: market-driven enterprises heavily rely on high-value core technologies.

Reported heterogeneity results indicating enterprises (market-driven actors) concentrate on and depend upon core, high-value technologies within identified diffusion paths.

medium positive Mapping China’s digital transformation: a multilayer network... enterprises' reliance on core high-value technologies
