Evidence (8974 claims)

Search and filter individual claims pulled from the papers. If you are looking for a specific finding ("what's the effect on wages?"), you're in the right place. To compare whole outcome categories against each other instead, use the Evidence Explorer.

The board below groups claims two ways: by broad theme (nine paper-level topics) and by outcome category (the 34 claim-level outcomes that the Explorer and Syntheses also use).

Browse by theme

Nine broad, paper-level topics. Click one to filter the claims below.

Human-AI Collaboration

Claims by outcome category

Counts by direction of finding. These are the same 34 outcome categories the Explorer compares and the Syntheses are written for. A linked row has a published synthesis.

Outcome	Positive	Negative	Mixed	Null	Total
Other	882	244	117	1097	2424
Governance & Regulation	1010	469	229	135	1875
Organizational Efficiency	977	235	149	90	1462
Technology Adoption Rate	781	299	143	128	1362
Research Productivity	506	155	74	363	1110
Output Quality	555	219	71	70	915
Decision Quality	395	200	95	54	751
Firm Productivity	523	67	101	27	724
AI Safety & Ethics	262	309	75	36	688
Market Structure	195	201	135	30	566
Task Allocation	248	77	96	38	464
Innovation Output	300	34	55	20	411
Skill Acquisition	207	75	65	21	368
Employment Level	138	67	119	24	350
Fiscal & Macroeconomic	156	80	53	33	329
Task Completion Time	211	38	13	16	280
Firm Revenue	183	52	29	5	270
Consumer Welfare	131	77	48	13	269
Inequality Measures	50	141	54	9	254
Worker Satisfaction	104	85	25	13	227
Error Rate	87	112	11	5	215
Automation Exposure	69	69	37	20	198
Wages & Compensation	102	49	31	11	193
Team Performance	115	30	30	11	187
Regulatory Compliance	88	74	17	7	186
Training Effectiveness	109	22	14	21	168
Developer Productivity	116	21	15	8	161
Job Displacement	12	92	26	1	131
Hiring & Recruitment	57	12	9	5	83
Skill Obsolescence	6	59	10	2	77
Social Protection	43	17	8	2	70
Creative Output	35	21	9	4	70
Labor Share of Income	18	23	17	1	59
Worker Turnover	15	16	—	4	35
Industry	—	—	—	1	1
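
To read the table as proportions rather than raw counts, each direction count can be divided by its row total. The sketch below is illustrative only: `direction_shares` and `to_count` are hypothetical helper names, the three rows are copied from the table above, and treating a dash as zero is an assumption.

```python
# Rows copied from the table above: (outcome, positive, negative, mixed, null, total).
ROWS = [
    ("Wages & Compensation", "102", "49", "31", "11", "193"),
    ("Job Displacement", "12", "92", "26", "1", "131"),
    ("Worker Turnover", "15", "16", "—", "4", "35"),
]


def to_count(cell):
    # Assumption: a dash in the table means "no claims", i.e. zero.
    return 0 if cell.strip() == "—" else int(cell)


def direction_shares(row):
    """Return each direction's share of the row total (hypothetical helper)."""
    outcome, *cells = row
    positive, negative, mixed, null, total = (to_count(c) for c in cells)
    counts = {"positive": positive, "negative": negative, "mixed": mixed, "null": null}
    return outcome, {name: value / total for name, value in counts.items()}


for row in ROWS:
    outcome, shares = direction_shares(row)
    summary = ", ".join(f"{name} {share:.0%}" for name, share in shares.items())
    print(f"{outcome}: {summary}")
```

In some rows the four direction counts sum to less than Total (for example, Other: 2,340 of 2,424), so the shares computed this way need not add up to 100%.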

Productivity

The positive impact of AI application on enterprise innovation efficiency is stronger in state-owned enterprises.

Heterogeneity/subsample analysis using firm ownership status in the 2012–2023 A-share panel showing larger effects for state-owned enterprises.

high positive Research on the Influence Mechanism of Artificial Intelligen... enterprise innovation efficiency (heterogeneous by ownership)

The positive impact of AI application on enterprise innovation efficiency is stronger in firms located in central and western regions of China.

Heterogeneity/subsample analysis on the 2012–2023 panel of Shanghai and Shenzhen A-share listed firms showing larger estimated effects for firms in central and western regions.

high positive Research on the Influence Mechanism of Artificial Intelligen... enterprise innovation efficiency (heterogeneous by region)

The effect of AI application on enterprise innovation efficiency is mediated by enterprise ESG performance.

Mediation analysis on Shanghai and Shenzhen A-share listed firms (2012–2023) demonstrating a significant mediating role for ESG performance.

high positive Research on the Influence Mechanism of Artificial Intelligen... enterprise innovation efficiency (via ESG performance)

The effect of AI application on enterprise innovation efficiency is mediated by the enterprise's data factor utilization level.

Mediation analysis (empirical) using the 2012–2023 A-share firm panel showing significant mediating effects of data factor utilization.

high positive Research on the Influence Mechanism of Artificial Intelligen... enterprise innovation efficiency (via data factor utilization)

The effect of AI application on enterprise innovation efficiency is mediated by improvements in enterprise "new-quality productivity".

Mediation analysis (empirical) on the 2012–2023 panel of Shanghai and Shenzhen A-share listed firms showing a significant mediating role for new-quality productivity.

high positive Research on the Influence Mechanism of Artificial Intelligen... enterprise innovation efficiency (via new-quality productivity)

AI application can significantly improve enterprise innovation efficiency.

Empirical analysis of Shanghai and Shenzhen A-share listed enterprises using panel data from 2012–2023; baseline regressions showing a significant positive relationship between AI application measures and enterprise innovation efficiency.

high positive Research on the Influence Mechanism of Artificial Intelligen... enterprise innovation efficiency

The review synthesizes fragmented evidence and links AI use to SME performance improvements, while outlining directions for future research on sustainable AI adoption.

Self-description of the article's contribution based on the authors' focused literature review (2016-2024).

high positive The Role of Artificial Intelligence in Strengthening Financi... synthesis quality and linkage of AI to performance improvements

Cloud-based AI solutions, targeted employee training, and explainable AI are identified strategies to overcome AI adoption challenges in SMEs.

Recommendations synthesized from the reviewed literature (2016-2024); presented as enabling strategies rather than results from a single empirical intervention.

high positive The Role of Artificial Intelligence in Strengthening Financi... effectiveness of strategies for enabling AI adoption

AI supports more data-driven financial planning for SMEs.

Identified across the reviewed empirical and conceptual studies in the 2016-2024 literature (synthesis rather than new empirical estimate).

high positive The Role of Artificial Intelligence in Strengthening Financi... use of data-driven methods in financial planning

AI enables real-time fraud detection for SMEs.

Synthesis of empirical and conceptual literature reporting AI applications in fraud detection (review-level claim; no aggregated quantitative effect provided).

high positive The Role of Artificial Intelligence in Strengthening Financi... timeliness and effectiveness of fraud detection

AI enables more accurate credit risk assessment for SMEs.

Review synthesizing studies on credit scoring and risk assessment within the 2016-2024 corpus (no single pooled sample size or unified effect estimate provided).

high positive The Role of Artificial Intelligence in Strengthening Financi... credit risk assessment accuracy

AI improves cash flow and financial forecasting for SMEs.

Synthesis of empirical studies and conceptual papers in the 2016-2024 literature reviewed (review article does not report primary sample sizes/effect estimates).

high positive The Role of Artificial Intelligence in Strengthening Financi... cash flow and financial forecasting accuracy

AI offers strong potential to enhance the financial stability and growth of SMEs when supported by suitable organizational capacities and governance.

Focused review of high-quality research (2016-2024) synthesizing empirical and conceptual studies on AI applications in SME finance (no single-sample primary data reported).

high positive The Role of Artificial Intelligence in Strengthening Financi... financial stability and growth of SMEs

There is a need for privacy-preserving deployments and richer, structure-aware representations of human knowledge for practical use.

Authors' recommendation/conclusion drawn from observed accuracy/limitations and privacy considerations in using long-term Slack logs.

high positive Can AI Guess What You Know? Performance Comparison of Large ... requirement for privacy-preserving deployment practices and improved representat...

Gemini 2.5 Flash achieved the lowest error (MAE 21.13%).

Reported model evaluation results comparing MAE across models; Gemini 2.5 Flash reported as lowest with MAE 21.13%.

high positive Can AI Guess What You Know? Performance Comparison of Large ... mean absolute error (MAE) of skill estimates
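
For readers unfamiliar with the metric in the entry above, mean absolute error (MAE) is the average absolute gap between estimated and reference values. The sketch below assumes skill levels on a 0 to 100 percent scale; the numbers are made up for illustration and are not taken from the paper.

```python
def mean_absolute_error(predicted, actual):
    """Average of |predicted - actual| over paired observations."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Hypothetical skill estimates (percent) versus self-reported levels (percent).
estimated = [50, 80, 20]
reported = [60, 70, 40]
print(f"MAE: {mean_absolute_error(estimated, reported):.2f} percentage points")
```

A lower MAE means the estimates sit closer to the reference values on average; it says nothing about the direction of the errors.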

We analyze 27,188 messages from 43 users to investigate whether LLMs can infer individual domain knowledge from long-term Slack logs.

Dataset description reported in the paper: 27,188 Slack messages from 43 users.

high positive Can AI Guess What You Know? Performance Comparison of Large ... dataset size and coverage (messages and users analyzed)

Our project website, including the leaderboard, dataset, and code, is available at https://dong7313.github.io/muse-benchmark/.

Statement in abstract and provided URL pointing to project artifacts.

high positive MUSE: Benchmarking Manufacturable, Functional, and Assemblab... availability of project website, leaderboard, dataset, and code

Together, MUSE provides a realistic benchmark and evaluation framework for advancing Text-to-CAD from geometric generation toward true engineering design.

Paper's stated contribution and intended purpose (abstract) and provision of dataset/benchmark artifacts via project website.

high positive MUSE: Benchmarking Manufacturable, Functional, and Assemblab... utility of benchmark and evaluation framework for advancing Text-to-CAD toward e...

To enable scalable evaluation, we use a rubric-based visual language model (VLM) judge and validate its reliability through human annotation.

Method and validation claim in abstract stating use of rubric-based VLM and validation against human annotations.

high positive MUSE: Benchmarking Manufacturable, Functional, and Assemblab... reliability of rubric-based VLM judge (agreement with human annotation)

The final stage uses design-specific rubrics to assess functionality, manufacturability, and assemblability, moving beyond shape matching toward practical design quality.

Paper's description of the benchmark's evaluation rubric and intended assessment criteria (abstract).

high positive MUSE: Benchmarking Manufacturable, Functional, and Assemblab... assessed functionality, manufacturability, and assemblability of generated CAD m...

MUSE pairs practical design instances with structured Design Specifications and evaluates generated models through a three-stage protocol: code check, geometric check, and design-intent alignment.

Methodological description in abstract indicating dataset pairing and three-stage evaluation protocol.

high positive MUSE: Benchmarking Manufacturable, Functional, and Assemblab... evaluation pipeline effectiveness (code executability, geometric validity, desig...

We introduce MUSE, a Text-to-CAD benchmark focused on complex, editable boundary representation (B-Rep) assemblies.

Paper contribution / dataset creation described in abstract; supported by project website and accompanying dataset/code.

high positive MUSE: Benchmarking Manufacturable, Functional, and Assemblab... availability of a Text-to-CAD benchmark for complex B-Rep assemblies

Two non-negotiable design requirements guide the architecture: cognitive-load redistribution (DR1) and bounded autonomy with alignment (DR2).

Design requirements explicitly stated in the paper guiding the HARMONY architecture.

high positive From Replacement to Orchestration: A Socio-Technical Archite... degree to which design reduces researcher cognitive load and constrains agentic ...

The model introduces 'Orchestration Leverage' as a candidate productivity metric suited to human–agent hybrid systems.

Conceptual proposal within the paper (new metric introduced as part of HARMONY).

high positive From Replacement to Orchestration: A Socio-Technical Archite... productivity of human–agent hybrid research teams (via proposed metric)

We propose HARMONY (Hybrid Agentic Research Model for Organisational New Yield), a four-pillar socio-technical architecture comprising ResOps (Industrialized Execution), the Control Tower (Strategic Visibility and Drift Detection), the Ethics Fabric (Bounded Autonomy by Design), and the Talent Studio (Sciencepreneur Capability).

Design Science Research artifact (proposed operating model described in the paper).

high positive From Replacement to Orchestration: A Socio-Technical Archite... organizational capability to conduct agentic R&D / R&D productivity

Augment Engineering completes a three-discipline progression: Prompt Engineering (one tool), Context Engineering (reproducible pipelines), Augment Engineering (a portfolio of tools across domains).

Conceptual framing presented in the paper describing a proposed progression of disciplines.

high positive Augment Engineering: A Methodology for Multi-Tool AI Orchest... conceptual progression among related disciplines

A Wright's Law fit (n = 82 artifacts, p < 0.01) shows production acceleration across the artifact portfolio.

Quantitative model reported in the paper: Wright's Law fit on 82 artifacts with reported p-value < 0.01.

high positive Augment Engineering: A Methodology for Multi-Tool AI Orchest... production acceleration (learning curve effects) across produced artifacts
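
Wright's Law models the effort for the n-th unit as y_n = a * n^(-b), so a fit is commonly done by least squares on log(y) against log(n). The sketch below shows that generic procedure on synthetic data. The entry above does not state the paper's exact specification, so the function name, the data, and the choice of ordinary least squares are all assumptions.

```python
import math


def fit_wrights_law(effort_per_unit):
    """Fit y_n = a * n**(-b) by least squares in log-log space.

    effort_per_unit[i] is the effort (e.g. hours) for unit i + 1.
    Returns (a, b); b > 0 means later units took less effort.
    """
    n = len(effort_per_unit)
    if n < 2:
        raise ValueError("need at least two units to fit a slope")
    xs = [math.log(i) for i in range(1, n + 1)]
    ys = [math.log(v) for v in effort_per_unit]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic example: 20 units following a = 10, b = 0.3 exactly.
synthetic = [10 * n ** -0.3 for n in range(1, 21)]
a, b = fit_wrights_law(synthetic)
# 2**(-b) is the "progress ratio": the effort multiplier per doubling of output.
print(f"a = {a:.3f}, b = {b:.3f}, progress ratio = {2 ** -b:.3f}")
```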

A Cochran-Armitage trend test (n = 200 interactions across two chat LLMs, p < 0.01) shows first-pass acceptance rising with prompt-sophistication level.

Quantitative test reported in the paper: Cochran-Armitage trend test on 200 interactions across two chat LLMs, reported p-value < 0.01.

high positive Augment Engineering: A Methodology for Multi-Tool AI Orchest... first-pass acceptance rate of generated outputs as a function of prompt sophisti...
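
The Cochran-Armitage test checks whether a proportion (here, first-pass acceptance) changes monotonically across ordered groups (here, prompt-sophistication levels). The sketch below implements the standard normal-approximation form of the test. The counts and the equally spaced scores are hypothetical and are not the paper's data.

```python
import math


def cochran_armitage(successes, totals, scores=None):
    """Cochran-Armitage trend test for proportions across ordered groups.

    Returns (z, two_sided_p) using the normal approximation.
    """
    k = len(totals)
    if len(successes) != k:
        raise ValueError("successes and totals must have equal length")
    if scores is None:
        scores = list(range(k))  # Assumption: equally spaced levels.
    n_all = sum(totals)
    p_bar = sum(successes) / n_all  # pooled success proportion
    # Score-weighted deviation of observed from expected successes.
    t = sum(s * (r - n * p_bar) for s, r, n in zip(scores, successes, totals))
    sum_ns = sum(n * s for n, s in zip(totals, scores))
    sum_ns2 = sum(n * s * s for n, s in zip(totals, scores))
    variance = p_bar * (1 - p_bar) * (sum_ns2 - sum_ns ** 2 / n_all)
    if variance <= 0:
        raise ValueError("variance is zero; the test is undefined")
    z = t / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value


# Hypothetical data: accepted outputs out of 50 at three sophistication levels.
z, p = cochran_armitage(successes=[10, 20, 30], totals=[50, 50, 50])
print(f"z = {z:.3f}, p = {p:.2g}")
```

A positive z indicates that the acceptance rate rises with the level score; the p-value alone does not convey how large the increase is.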

A 5-month formative case study (Nov 2025 to Mar 2026) documents a single practitioner applying Augment Engineering skills across a ten-component orchestration stack spanning seven professional domains, producing work products that would traditionally involve separate domain specialists.

Case study reported in the paper describing one practitioner's activities over five months across a 10-component stack in seven domains; sample size = 1 practitioner.

high positive Augment Engineering: A Methodology for Multi-Tool AI Orchest... ability of one practitioner to produce cross-domain work products that tradition...

The paper presents a six-phase orchestration methodology and four portability metrics for Augment Engineering.

Stated methodological contribution within the paper (description of methodology and metrics).

high positive Augment Engineering: A Methodology for Multi-Tool AI Orchest... methodology and metrics for orchestration and portability

Augment Engineering is a discipline of orchestrating multiple purpose-built AI tools across distinct professional domains, applying prompt and context engineering as portable competencies that transfer across tool boundaries.

Definition and conceptual development presented in the paper (methodological contribution).

high positive Augment Engineering: A Methodology for Multi-Tool AI Orchest... existence/definition of a new discipline (Augment Engineering)

Prompt engineering (interaction-level optimization) and context engineering (structured input pipeline design) are domain-portable meta-skills: a practitioner who masters them can apply them to any purpose-built AI tool in any domain.

Conceptual claim supported by the paper's argumentation and exemplified by a single-practitioner case study.

high positive Augment Engineering: A Methodology for Multi-Tool AI Orchest... portability of prompt and context engineering skills across tools and domains

The framework has implications for digital health, education, AI personalisation, and personal agency.

Authors' discussion in paper of potential implications across these application domains; presented qualitatively.

high positive You Are in Control of Your State: Why Human Outcomes Are Con... implications for listed application domains

The authors list six operational requirements for state-aware systems.

Explicit statement in paper that six operational requirements are listed; descriptive rather than empirically tested in abstract.

high positive You Are in Control of Your State: Why Human Outcomes Are Con... number of operational requirements

The authors derive seven testable predictions from the state-aware framework.

Explicit statement in paper that seven testable predictions are derived from the framework; no individual prediction effects quantified in abstract.

high positive You Are in Control of Your State: Why Human Outcomes Are Con... number of derived testable predictions

The paper is supported by a 24-month observational base from a deployed behavioural platform spanning more than 200,000 consented users across four occupational personas (research period 2023 to 2026).

Empirical dataset described in the paper: observational deployment over 24 months, >200,000 consented users, four occupational personas, timeframe given (2023–2026).

high positive You Are in Control of Your State: Why Human Outcomes Are Con... existence and scale of observational dataset

The framework is motivated by six strands of established evidence: causal inference, predictive processing, allostasis, attentional bottleneck, chronobiology, and computational psychiatry.

Explicit statement in paper describing the literature strands used to motivate the framework.

high positive You Are in Control of Your State: Why Human Outcomes Are Con... theoretical grounding of framework

Taken together, these claims imply that the outcome of a given event is controllable, conditionally, on the state-trajectory at the time of intervention.

Synthesis/implication drawn by authors from the conceptual framework and the six literature strands; argued but not quantified in abstract.

high positive You Are in Control of Your State: Why Human Outcomes Are Con... conditional controllability of event outcomes

The conscious channel through which outcomes are reportable is a narrow attentional bottleneck whose contents are themselves state-dependent.

Theoretical claim supported by attentional bottleneck literature cited in the paper; presented as part of the conceptual framework.

high positive You Are in Control of Your State: Why Human Outcomes Are Con... attentional bottleneck content dependency on state

The weighting vector (state) is dynamic at sub-daily timescales.

Claim motivated by chronobiology and related literature cited in the paper; authors state the sub-daily dynamism as part of their framework.

high positive You Are in Control of Your State: Why Human Outcomes Are Con... temporal dynamics of latent state

The relationship between state, decision, and outcome is causal rather than correlational.

Argument grounded in causal inference literature cited by the authors; presented as a core theoretical claim in the paper rather than demonstrated by a specific randomized experiment in the abstract.

high positive You Are in Control of Your State: Why Human Outcomes Are Con... causal influence of state on decisions/outcomes

A state can be defined as the time-indexed weighting vector over the dimensions that govern how an individual's biology, physiology, and neuropsychology process the next event into a decision and an outcome.

Explicit definitional claim / framework component introduced by the authors; justified conceptually via multidisciplinary literature cited in the paper.

high positive You Are in Control of Your State: Why Human Outcomes Are Con... conceptual definition of latent state

Human outcomes are controllable in a precise and operational sense through interventions that target the state and its weighting at the moment a decision is being formed.

Theoretical argument in the paper, motivated by the six literature strands; supported in part by the authors' deployed behavioural platform (see separate claim about dataset) but no randomized effect sizes reported in abstract.

high positive You Are in Control of Your State: Why Human Outcomes Are Con... controllability of outcomes via state-targeted interventions

This persistent variability belongs in a dynamic latent state of the person (i.e., is best modelled as a time-varying latent state).

Conceptual claim supported by integration of six strands of established evidence (causal inference, predictive processing, allostasis, attentional bottleneck, chronobiology, computational psychiatry) cited in the paper.

high positive You Are in Control of Your State: Why Human Outcomes Are Con... attribution of outcome variance to latent state

Within-person variability persists: the same individual, presented with the same observable input, produces different outcomes on different occasions, and different individuals produce divergent outcomes that no observable covariate fully predicts.

Statement motivated by literature review across behavioural sciences; argued in paper as empirical puzzle rather than proven with new statistics in this manuscript.

high positive You Are in Control of Your State: Why Human Outcomes Are Con... variation in individual outcomes / decisions

Agents share successes and failures to reduce redundant exploration during long-running experiments.

Design of AutoScientists includes mechanisms for recording and sharing experimental outcomes; asserted benefit in paper that this reduces redundant exploration (qualitative and supported by experimental comparisons).

high positive AutoScientists: Self-Organizing Agent Teams for Long-Running... redundant exploration (qualitative/system-level reduction)

Applied without modification across all 217 ProteinGym assays, the same method improves over the prior state of the art by +6.5% (Spearman correlation).

Empirical evaluation across all 217 assays in the ProteinGym benchmark; reported aggregate improvement in Spearman correlation versus prior state-of-the-art.

high positive AutoScientists: Self-Organizing Agent Teams for Long-Running... Spearman correlation averaged across 217 ProteinGym assays

On ProteinGym fitness prediction, AutoScientists discovers a method for ACE2-Spike binding that improves over the current state-of-the-art model by +12.5% in Spearman correlation.

Empirical evaluation on the ACE2-Spike assay within the ProteinGym benchmark; reported relative improvement in Spearman correlation versus prior state-of-the-art.

high positive AutoScientists: Self-Organizing Agent Teams for Long-Running... Spearman correlation on ACE2-Spike binding fitness prediction
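
Both ProteinGym entries above report gains in Spearman correlation, which measures how well predicted scores preserve the ranking of measured fitness values. The sketch below computes it as the Pearson correlation of average ranks; the scores are made up for illustration and the function names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical measured fitness values versus model predictions.
measured = [1, 2, 3, 4, 5]
predicted = [2, 1, 4, 3, 5]
print(f"Spearman rho: {spearman(measured, predicted):.2f}")
```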

On GPT training optimization, AutoScientists continues discovering improvements from a starting champion where the single-agent approach finds none (7 vs. 0 accepted improvements).

Empirical comparison of discovered/accepted improvements during GPT training optimization; counts of accepted improvements for AutoScientists (7) versus single-agent approach (0).

high positive AutoScientists: Self-Organizing Agent Teams for Long-Running... count of accepted improvements discovered

On GPT training optimization, AutoScientists reaches a target validation bits-per-byte 1.9x faster than Autoresearch.

Empirical training-time comparison between AutoScientists and Autoresearch on GPT training optimization tasks; reported speedup multiplier to reach a validation bits-per-byte target.

high positive AutoScientists: Self-Organizing Agent Teams for Long-Running... time-to-target (validation bits-per-byte)
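
Bits-per-byte is commonly computed by converting the total cross-entropy loss from nats to bits and dividing by the number of bytes in the evaluated text, which makes models with different tokenizers comparable. The sketch below uses that common definition and a simple time ratio for the speedup. The entry above does not say how the paper computes either quantity, so both functions and all numbers are assumptions.

```python
import math


def bits_per_byte(mean_loss_nats_per_token, n_tokens, n_bytes):
    """Convert a mean per-token loss in nats into bits per byte of text."""
    total_bits = mean_loss_nats_per_token * n_tokens / math.log(2)
    return total_bits / n_bytes


def speedup(baseline_time, candidate_time):
    """How many times faster the candidate reached the same target."""
    return baseline_time / candidate_time


# Hypothetical validation run: 1,000 tokens covering 4,200 bytes of text.
bpb = bits_per_byte(mean_loss_nats_per_token=3.0, n_tokens=1_000, n_bytes=4_200)
print(f"validation bits-per-byte: {bpb:.3f}")

# Hypothetical wall-clock minutes to reach the same bits-per-byte target.
print(f"speedup: {speedup(baseline_time=190, candidate_time=100):.1f}x")
```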
