Evidence (14922 claims)

Search and filter individual claims pulled from the papers. If you are looking for a specific finding ("what's the effect on wages?"), you're in the right place. To compare whole outcome categories against each other instead, use the Evidence Explorer.

The board below groups claims two ways: by broad theme (nine paper-level topics) and by outcome category (the 34 claim-level outcomes that the Explorer and Syntheses also use).

Browse by theme

Nine broad, paper-level topics. Click one to filter the claims below.

Human-AI Collaboration

Claims by outcome category

Counts by direction of finding. These are the same 34 outcome categories the Explorer compares and the Syntheses are written for. A linked row has a published synthesis.

Outcome	Positive	Negative	Mixed	Null	Total
Other	795	210	105	955	2131
Governance & Regulation	886	414	197	126	1654
Organizational Efficiency	826	204	129	87	1257
Technology Adoption Rate	681	259	128	110	1189
Research Productivity	464	138	65	349	1028
Output Quality	503	196	61	53	813
Decision Quality	351	180	84	51	673
AI Safety & Ethics	238	288	71	34	637
Firm Productivity	455	58	92	20	631
Market Structure	186	172	123	25	511
Task Allocation	222	70	76	34	407
Innovation Output	238	28	48	18	334
Skill Acquisition	177	62	62	17	318
Employment Level	107	57	108	13	287
Fiscal & Macroeconomic	135	72	44	26	284
Firm Revenue	172	50	28	5	256
Consumer Welfare	121	68	45	12	246
Task Completion Time	183	33	10	13	240
Inequality Measures	45	126	50	6	227
Worker Satisfaction	95	74	23	12	204
Error Rate	77	98	11	4	190
Regulatory Compliance	84	73	17	7	181
Automation Exposure	61	61	27	14	166
Training Effectiveness	98	21	14	19	154
Wages & Compensation	78	37	25	6	146
Developer Productivity	105	18	14	6	144
Team Performance	87	17	28	10	143
Job Displacement	12	83	23	1	119
Hiring & Recruitment	53	8	8	3	72
Social Protection	39	17	8	2	66
Creative Output	32	20	8	3	64
Skill Obsolescence	5	50	6	1	62
Labor Share of Income	17	20	17	—	54
Worker Turnover	15	15	—	3	33
Industry	—	—	—	1	1
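The counts above can be turned into direction shares with a few lines of arithmetic. The sketch below is a minimal illustration, not part of the site: `OutcomeRow`, `listed`, and `share` are hypothetical names, and only three rows from the table are copied in. It assumes shares should be taken over the four listed direction columns rather than over Total, because for some rows the two differ (for "Other", the four columns sum to 2065 against a Total of 2131). The page does not say what accounts for that difference, so the code only reports it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutcomeRow:
    """One row of the outcome table (hypothetical helper type)."""
    outcome: str
    positive: int
    negative: int
    mixed: int
    null: int
    total: int

    @property
    def listed(self) -> int:
        # Sum of the four direction columns shown in the table.
        return self.positive + self.negative + self.mixed + self.null

    def share(self, direction: str) -> float:
        # Assumption: the denominator is the listed sum, not the Total column.
        return getattr(self, direction) / self.listed


# Three rows copied verbatim from the table above.
ROWS = [
    OutcomeRow("Wages & Compensation", 78, 37, 25, 6, 146),
    OutcomeRow("Job Displacement", 12, 83, 23, 1, 119),
    OutcomeRow("Other", 795, 210, 105, 955, 2131),
]

for row in ROWS:
    gap = row.total - row.listed  # claims counted in Total but in no listed column
    print(
        f"{row.outcome}: {row.share('positive'):.1%} positive, "
        f"{row.share('negative'):.1%} negative, "
        f"Total minus listed columns = {gap}"
    )
```

Rows that show a dash in place of a count (for example "Labor Share of Income") would need the dash mapped to a number before being loaded this way; the sketch does not handle that case.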

Firms that successfully combine AI with learning and knowledge coordination can reduce inefficiencies, accelerate innovation cycles and improve overall performance.

Authors' conclusion and managerial implication derived from observed associations in the survey (AIDLC → KO → OI → IP).

medium positive Enhancing innovation in Pakistan’s IT sector efficiency, innovation cycle speed, overall performance

AI can reduce knowledge gaps and help employees adapt to change; well-designed AI systems complement human creativity, improve judgment and reduce repetitive tasks rather than simply replacing workers.

Authors' discussion and normative claim drawing on study findings and literature; not presented as a directly tested causal result in the survey.

medium positive Enhancing innovation in Pakistan’s IT sector job displacement / complementarity (adaptation, creativity, reduction of repetit...

This LLM-based retrieval ensures that small creative variants from the advertiser yield consistent and explainable delivery results to the user.

Paper asserts that semantic-aware retrieval produces consistent and explainable delivery across small creative perturbations; claimed empirical support via online validation/experiments but no quantitative numbers provided in excerpt.

medium positive LLM Retrieval for Stable and Predictable Ad Recommendations consistency/repeatability and explainability of delivery for small creative vari...

The findings offer practical implications for corporate R&D strategies and innovation policy design in the era of AI.

Discussion/implications section asserting that the study's findings can inform corporate R&D and policy design.

medium positive Knowledge flows from science to AI technology: Identifying c... practical implications for R&D strategy and policy design

The study elucidates the structural pathways of knowledge flow from science to technology in AI.

Combined analysis of patent–publication citation links and semantic topic mapping intended to reveal structural knowledge-flow pathways.

medium positive Knowledge flows from science to AI technology: Identifying c... structure/pathways of science-to-technology knowledge flow

The analysis traces key technological trends in AI across the studied period.

Results from topic modeling and longitudinal analysis of patent and cited-publication topics across 2002–2021.

medium positive Knowledge flows from science to AI technology: Identifying c... technological trends over time

Text optimization with LLM-based search is a general-purpose problem-solving paradigm, unifying tasks traditionally requiring domain-specific algorithms under a single framework (claimed as a first-time result).

Synthesis claim based on the collection of experiments across the six tasks and ablations reported in the paper; presented as a novel, unifying demonstration.

medium positive optimize_anything: A Universal API for Optimizing any Text P... generality / applicability of LLM-based text optimization across problem types

The self-evolving verification layer improves verifier reliability using execution-grounded feedback.

Design and experimental claim in the paper that the verification layer is self-evolving and that it enhances verifier reliability via execution-grounded feedback loops.

medium positive OpenComputer: Verifiable Software Worlds for Computer-Use Ag... verifier reliability / accuracy

Human-governed collaboration is the most credible deployment paradigm.

Policy/recommendation from the paper based on cross-stage analysis and synthesis; not presented as the result of a controlled experiment in the excerpt.

medium positive AI for Auto-Research: Roadmap & User Guide credibility of deployment paradigms (human-governed vs autonomous)

Only RL-based predictions yield product-repositioning decisions for impulse products that align with those derived from actual trajectory data, resulting in comparable estimated profit gains.

Comparison of recommended repositioning decisions derived from RL versus those derived from observed (actual) trajectories and from heuristic models; reported that RL recommendations match actual-derived recommendations and produce similar estimated profit gains. No numerical profit figures or sample sizes are provided in the excerpt.

medium positive Modelling Customer Trajectories with Reinforcement Learning ... alignment of repositioning decisions and estimated profit gains from repositioni...

RL-based trajectories provide more accurate estimates of impulse purchase rates and shelf traffic densities than TSP and PNN.

Model-based comparisons against real-world trajectory data showing that outputs from RL more closely match observed impulse purchase rates and shelf traffic densities; specific quantitative comparisons and sample sizes not provided in the excerpt.

medium positive Modelling Customer Trajectories with Reinforcement Learning ... accuracy of estimated impulse purchase rates and shelf traffic densities

Extensive online analysis and A/B testing demonstrate GrowthGR's positive impact on the overall ecosystem value.

Paper reports extensive online analysis and A/B testing as supporting evidence (no further quantitative details or sample sizes provided in the excerpt).

medium positive Towards Sustainable Growth: A Multi-Value-Aware Retrieval Fr... overall ecosystem value (aggregate platform/ecosystem metrics)

Behavioral studies report that compact trajectories correlate with higher resolution rates.

Statement summarizing prior behavioral studies in SE literature (no specific study or sample size cited in excerpt).

medium positive Same Signal, Different Semantics: A Cross-Framework Behavior... issue resolution rate (correlation with trajectory compactness)

Behavioral studies report that short error cascades correlate with higher resolution rates.

Statement summarizing prior behavioral studies in SE literature (no specific study or sample size cited in excerpt).

medium positive Same Signal, Different Semantics: A Cross-Framework Behavior... issue resolution rate (correlation with short error cascades)

Behavioral studies of LLM-based software engineering agents extract operational rules about which trajectory shapes correlate with higher resolution rates (e.g., that a test step follows a code modification).

Statement summarizing prior behavioral studies in SE literature (no specific study or sample size cited in excerpt).

medium positive Same Signal, Different Semantics: A Cross-Framework Behavior... issue resolution rate (correlation with trajectory pattern: test step following ...

Hierarchical decomposition without deliberation achieves the best absolute performance for most models.

Observed performance rankings across the evaluated configurations and models (six models across five model families) in the CybORG CAGE-2 evaluation (3,475 episodes), comparing monolithic ReAct vs. delegation to specialized sub-agents with and without deliberation tools.

medium positive Context, Reasoning, and Hierarchy: A Cost-Performance Study ... absolute mean return

Effective AI implementation, coupled with employee training and transparent communication, can reduce resistance and anxiety among employees.

Interpretation and conclusion drawn from the observed negative relationship between perceived opportunities and challenges and the pattern of survey responses; presented as a recommended approach in the study.

medium positive Opportunities and Challenges of Human-AI Collaboration in W... reduction in resistance/anxiety (perceived)

Our work also highlights the benefits of legislation aimed at protecting individuals' data rights as a counterweight to the tech industry's discourse of exceptionalism, which obscures its dependence on BPOs to externalise labour costs and accountability.

Argument and empirical demonstration in paper that data-rights legislation (GDPR) enabled access to documents and exposed BPO practices; used to argue for policy benefits. (Empirical extent and generalizability not quantified in the excerpt.)

medium positive Auditing African Content Moderators' Working Conditions by U... effectiveness of data-protection legislation in revealing/exercising worker righ...

PRIF shifts forensic accounting from reactive detection to proactive prevention, advancing stakeholder trust and industry standards.

Paper's concluding claim about the conceptual shift and expected industry/stakeholder outcomes following PRIF adoption (argumentative/interpretive).

medium positive Enhancing Forensic Accounting Practice: A Proactive Risk Man... shift from reactive to proactive practices; stakeholder trust and standards

PRIF provides practical benefits including scalable toolkits for firms and policy guidance for regulators with a broader impact on financial governance.

Paper's discussion/recommendations claiming practical toolkits and policy guidance; asserted broader impact on financial governance.

medium positive Enhancing Forensic Accounting Practice: A Proactive Risk Man... availability of scalable toolkits and policy guidance (practical benefits)

Wage inequality increased due to differential skill adaptation across workers.

Authors' conclusion drawn from observed effects of AI adoption and skill transformation on wage dynamics in the SEM applied to the survey (n=320); statement presented qualitatively in the results/discussion (no inequality coefficient provided in the summary).

medium positive ARTIFICIAL INTELLIGENCE, AUTOMATION, AND LABOR MARKET TRANSF... wage inequality / distributional effects

AI created opportunities by increasing demand for high-skilled labor.

Authors' interpretation of SEM results and descriptive analysis from the survey of n=320 employees indicating skill-upgrading effects; specific numerical evidence for 'demand for high-skilled labor' not reported in the summary.

medium positive ARTIFICIAL INTELLIGENCE, AUTOMATION, AND LABOR MARKET TRANSF... demand for high-skilled labor

Our results provide actionable guidance for firms choosing among multiple candidate recommendation systems.

Claim in abstract that theoretical results can inform firm decisions; implies prescriptive insights derived from the framework (no empirical validation or sample size given in abstract).

medium positive Logging Policy Design for Off-Policy Evaluation guidance usefulness for selecting recommendation systems (improved selection via...

Participants reported greater trust in the process under the same conditions where facilitators exerted directional influence on outcomes.

Post-task survey trust measures reported higher trust for facilitator conditions that also showed directional shifts in allocation outcomes (as measured above).

medium positive Real-Time Group Dynamics with LLM Facilitation: Evidence fro... trust in the deliberation process (self-reported)

AI-assisted annotation has become standard in large-scale labeling workflows.

Background claim made in the paper's introduction as contextual motivation for the study (no specific evidence or data reported in the abstract).

medium positive From Model Uncertainty to Human Attention: Localization-Awar... adoption of AI-assisted annotation

"Augmented Intelligence" models, which combine human contextual judgment with algorithmic precision, reduce attrition by 22% compared with complete automation.

Reported comparative result in the paper's analysis (paper claims comparative attrition rates between augmented and fully automated approaches; exact data source not explicitly tied to one of the stated samples in the abstract).

medium positive Augmented Intelligence: Resolving the AI integration-obsoles... employee attrition (turnover)

The shift toward solo entry is particularly pronounced in categories that historically favored team-based ventures.

Category-level breakdowns within the Product Hunt dataset showing larger increases in solo-founder launches in categories with a historical bias toward team-based ventures.

medium positive Generative AI Fuels Solo Entrepreneurship, but Teams Still L... change in solo-founder share by category (relative increase)

Across the (lambda, kappa) grid both arms pass family-wise scenario-clustered correction (p<0.001 / p=0.008).

Statistical analysis across a grid of governance parameter settings (lambda, kappa) with family-wise scenario-clustered multiple-testing correction; p-values reported for both arms.

medium positive TourMart: A Parametric Audit Instrument for Commission Steer... statistical significance of steering effects across parameter grid after correct...

Societies have long governed opaque expertise through credentials, monitoring, liability, appeal, and revocation rather than mechanism-level explanation.

Historical/institutional claim made by the authors as conceptual evidence for alternative governance approaches (argument and analogy to existing institutions).

medium positive The Open-Box Fallacy: Why AI Deployment Needs a Calibrated V... prevalent governance mechanisms for opaque expertise

This paper connects formal fairness research with legal and ethical requirements to search for less discriminatory alternatives, offering a principled foundation for evaluating and comparing algorithmic decision systems.

Conceptual discussion linking the theoretical characterization of the Pareto frontier and fairness trade-offs to legal/ethical norms and decision-making practice; proposed framework for evaluation/comparison based on the derived results.

medium positive Fairness vs Performance: Characterizing the Pareto Frontier ... utility of the theoretical results for legal/ethical evaluation and comparison o...

For lenders and investors, wider VTech adoption can enhance valuation accuracy, portfolio transparency and collateral risk assessment, strengthening confidence in property markets and capital allocation.

Interpretation and implications drawn from interview data and theoretical synthesis; no quantitative measurement reported in the study.

medium positive Exploring barriers to valuation technology adoption in prope... valuation accuracy, portfolio transparency and collateral risk assessment

Switchcraft enables cost-aware agentic AI deployment without sacrificing correctness.

Synthesis claim based on Switchcraft achieving comparable accuracy (82.9%) while substantially reducing cost (84% reduction) in the paper's experiments.

medium positive Switchcraft: AI Model Router for Agentic Tool Calling trade-off between cost reduction and correctness (accuracy)

Based on the findings, firms should invest in proprietary AI models and governments should promote open data initiatives.

Policy recommendations presented in the conclusion, motivated by empirical findings (inverted-U, homogenization trap, heterogeneity).

medium positive The Inverted-U Relationship Between AI and Corporate Innovat... policy/recommendation implications (firm and government actions)

High-performing, human-comparable legal AI no longer requires the largest externally hosted models.

Conclusion/interpretation in paper based on the Olava Extract results outperforming/competing with frontier models while being self-hosted and smaller.

medium positive A Few Good Clauses: Comparing LLMs vs Domain-Trained Small L... requirement of large externally hosted models for high-performing legal AI (impl...

Fewer hallucinations and unsupported extractions reduce operational risk and downstream review burden in legal workflows.

Argument presented in paper linking lower hallucination/unsupported extraction rates to reduced operational risk and review burden; framed as an important distinction for legal workflows.

medium positive A Few Good Clauses: Comparing LLMs vs Domain-Trained Small L... operational risk and downstream review burden

Olava Extract achieved the highest precision scores, producing fewer hallucinated and unsupported extractions.

Reported precision metrics and qualitative/quantitative statements about hallucination/unsupported extraction rates in the comparison against frontier models.

medium positive A Few Good Clauses: Comparing LLMs vs Domain-Trained Small L... precision score and hallucination (unsupported extraction) rate

Smart manufacturing provides a practical pathway for enhancing economic performance while reducing environmental impact.

Framing/theoretical claim in the paper's introduction motivating the study; supported by cited literature rather than the paper's primary empirical DiD test.

medium positive Environmental policy synergies: unintended benefits for smar... economic performance and environmental impact in relation to smart manufacturing...

Improvements in firms' resource allocation efficiency enhance their ability to adopt smart manufacturing technologies (mechanism).

Mechanism analysis within the study showing that gains in resource allocation efficiency at the firm level are associated with higher adoption of smart manufacturing after LCCP implementation.

medium positive Environmental policy synergies: unintended benefits for smar... firms' resource allocation efficiency and subsequent adoption of smart manufactu...

City-level human capital upgrading lowers firms' costs of adopting smart manufacturing technologies, facilitating adoption (mechanism).

Mechanism analysis reported in the paper linking city-level human capital improvements to reduced firm-level adoption costs and increased adoption; likely based on city-level measures of human capital interacting with treatment in the DiD framework.

medium positive Environmental policy synergies: unintended benefits for smar... firms' cost of adopting smart manufacturing technologies (mediated by city-level...

Generation-protocol variants show that crowding can be reduced through targeted design, making diversity collapse an actionable, development-time evaluation target for population-aware creative AI.

Experimental evidence in the paper demonstrating that modifying generation protocols (design choices) reduces crowding; abstract states results across protocol variants but does not provide quantitative effect sizes or sample counts.

medium positive Ex Ante Evaluation of AI-Induced Idea Diversity Collapse change in crowding (Δ or ρ) under generation-protocol variants

Estimates stabilize with feasible model-only sample sizes.

Empirical/stability analysis reported in the paper (abstract claims convergence/stabilization of estimates with feasible numbers of model-only samples), but the abstract does not quantify what 'feasible' means or give sample counts.

medium positive Ex Ante Evaluation of AI-Induced Idea Diversity Collapse stability/convergence of crowding estimates as model-only sample size increases

Resource-based environmental taxation (the water resource tax reform) can play a role in promoting food security under rigid water constraints.

Interpretation and policy discussion based on the empirical results showing increased grain yield following the reform.

medium positive Can water resource tax reform increase grain yield?—Evidence... food security (via grain yield)

The reform improves water-use efficiency (a channel through which it raises agricultural productivity).

Mechanism analysis in the paper indicating strengthened water-use efficiency following the reform.

medium positive Can water resource tax reform increase grain yield?—Evidence... water-use efficiency

Trajectory-level evaluation is essential in regulated domains.

Conclusion drawn by the authors based on the ASR findings (hidden shortcuts, metric blind spots, and remediation gains); presented as a policy/recommendation implication.

medium positive Beyond Task Success: Measuring Workflow Fidelity in LLM-Base... suitability/necessity of trajectory-level evaluation in regulated contexts

A DLM (Schema-1) eliminates the preprocessing pipelines that currently stand between raw tabular data and AI systems that consume it.

Claims based on model's native consumption of raw cell values and experimental demonstrations (design and reported evaluations suggest reduced need for preprocessing; specific operational workflow impacts not quantified in the abstract).

medium positive Data Language Models: A New Foundation Model Class for Tabul... presence/absence or reduction of preprocessing pipeline steps required before mo...

Schema-1 identifies the industry sector of any unseen dataset from raw cell values alone, reliably across any domain—a task no prior tabular model can perform.

Reported experiments demonstrating industry-sector identification from raw cell values on unseen datasets and cross-domain reliability (details of datasets, number of domains, and metrics not provided in the abstract).

medium positive Data Language Models: A New Foundation Model Class for Tabul... accuracy/reliability of industry-sector identification from raw tabular data

Reinforcement learning in post-training, now the dominant paradigm at the frontier, is structured around task completion and maps more directly onto the task-based architecture of occupational classifications than prior approaches.

Argument based on current ML research practices (framing claim about dominant technical paradigm) and theoretical mapping to task-based occupational taxonomies.

medium positive What Jobs Can AI Learn? Measuring Exposure by Reinforcement ... suitability of RL (post-training) for modeling occupational tasks

Future progress in AI-based software engineering depends on equipping agents with explicit architectural foresight so generated software is maintainable, not just functional.

Conclusion/recommendation based on the empirical findings (Reasoning-Complexity Trade-off and Volume-Quality Inverse Law) and failures of prompting and correctness to mitigate decay.

medium positive AI-Generated Smells: An Analysis of Code and Architecture in... need for architectural foresight in agent design to improve maintainability

Production agentic systems make many model calls per user request, and most of those calls are short, structured, and routine.

Contextual claim motivating the work; presented as an empirical generalization about production agent pipelines, but not quantified in the abstract.

medium positive AgentFloor: How Far Up the tool use Ladder Can Small Open-We... distribution of model-call types in production agentic systems (short/structured...

Small and mid-sized open-weight models are already sufficient for much of the short-horizon, structured tool use work that dominates real agent pipelines.

Aggregate benchmark results across AgentFloor tiers showing high performance of smaller and mid-sized open-weight models on short-horizon structured tasks; supported by the 16,542 scored runs and model comparisons reported in the paper.

medium positive AgentFloor: How Far Up the tool use Ladder Can Small Open-We... ability to complete short-horizon, structured tool-use tasks on the AgentFloor b...
