Evidence (8807 claims)

Search and filter individual claims pulled from the papers. If you are looking for a specific finding ("what's the effect on wages?"), you're in the right place. To compare whole outcome categories against each other instead, use the Evidence Explorer.

The board below groups claims two ways: by broad theme (nine paper-level topics) and by outcome category (the 34 claim-level outcomes that the Explorer and Syntheses also use).

Browse by theme

Nine broad, paper-level topics. Click one to filter the claims below.

Human-AI Collaboration

Claims by outcome category

Counts by direction of finding. These are the same 34 outcome categories the Explorer compares and the Syntheses are written for. A linked row has a published synthesis.

Outcome	Positive	Negative	Mixed	Null	Total
Other	870	233	116	1066	2363
Governance & Regulation	976	451	218	133	1809
Organizational Efficiency	949	224	144	88	1416
Technology Adoption Rate	764	287	141	122	1325
Research Productivity	501	152	74	362	1101
Output Quality	542	216	69	69	896
Decision Quality	387	198	94	54	740
Firm Productivity	513	67	101	27	714
AI Safety & Ethics	249	303	73	36	667
Market Structure	190	192	134	27	548
Task Allocation	243	77	91	36	452
Innovation Output	291	33	55	20	401
Skill Acquisition	206	72	65	21	364
Employment Level	133	63	115	22	335
Fiscal & Macroeconomic	153	79	52	32	323
Task Completion Time	206	37	12	15	272
Firm Revenue	179	52	29	5	266
Consumer Welfare	130	76	47	13	266
Inequality Measures	48	137	51	6	242
Worker Satisfaction	101	81	25	13	220
Error Rate	84	110	11	5	210
Wages & Compensation	98	47	30	10	185
Regulatory Compliance	88	73	17	7	185
Automation Exposure	66	64	33	16	182
Team Performance	105	29	30	11	176
Training Effectiveness	109	22	14	21	168
Developer Productivity	114	21	14	8	158
Job Displacement	12	90	24	1	127
Hiring & Recruitment	57	9	9	5	80
Skill Obsolescence	6	56	9	1	72
Social Protection	43	17	8	2	70
Creative Output	35	21	9	4	70
Labor Share of Income	18	21	17	1	57
Worker Turnover	15	16	—	4	35
Industry	—	—	—	1	1

Filter: Productivity

Current models heavily rely on large static datasets and batch training and exhibit poor lifelong/continual learning.

Synthesis of common practices in contemporary ML (supervised pretraining and offline training paradigms); no new experiments provided.

high negative Why AI systems don't learn and what to do about it: Lessons ... continual learning performance; dependence on dataset size and batch training

When identical replies are labeled as coming from AI rather than from a human, recipients report feeling less heard and less validated (an attribution effect).

Controlled attribution labeling experiment within the study: identical replies presented with different source labels (AI vs. human) and recipient-rated perceptions of being heard/validated measured.

high negative Practicing with Language Models Cultivates Human Empathic Co... recipient-rated feelings of being heard and validated

HindSight scores are negatively correlated with LLM-judged novelty (Spearman ρ = −0.29, p < 0.01), indicating LLM judges tend to overvalue novel-sounding ideas that do not materialize in the literature.

Reported Spearman correlation between HindSight scores and LLM-judged novelty across the generated ideas; ρ = −0.29 with p < 0.01. The interpretation that LLMs overvalue novel-sounding ideas is drawn from the negative correlation.

high negative HindSight: Evaluating LLM-Generated Research Ideas via Futur... Correlation between HindSight score (downstream impact) and LLM-judged novelty s...
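
For readers unfamiliar with the statistic in this claim, the following is a minimal sketch of how a Spearman rank correlation is computed: rank both variables, then take the Pearson correlation of the ranks. The two score lists are invented toy values, not data from the HindSight paper, and the helper names `average_ranks` and `spearman_rho` are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Toy data (assumed for illustration only).
hindsight_scores = [0.9, 0.4, 0.7, 0.2, 0.5]
llm_novelty = [2, 4, 1, 5, 3]

rho = spearman_rho(hindsight_scores, llm_novelty)
print(f"Spearman rho = {rho:.2f}")
```

A negative value means that ideas ranked as more novel by the judge tend to rank lower on the other score, which is the direction of the relationship the claim reports.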

Barriers to adoption include toolchain cost, trace data storage/transfer demands, IP-security concerns when sharing traces, and organizational inertia.

Listed as practical caveats and limitations in the summary; based on the authors' experience and reasoning rather than a quantified study.

high negative ODIN-Based CPU-GPU Architecture with Replay-Driven Simulatio... adoption barriers (cost, storage, security, organizational factors)

Adoption requires up-front investment in tooling and infrastructure for deterministic capture/replay, plus management of large trace data and integration with existing validation/IP/security workflows.

Authors explicitly list these practical caveats in the summary: needs tooling/infrastructure, trace data management, and integration with validation flows and IP/security constraints. (Descriptive claim based on implementation experience; no cost figures provided.)

high negative ODIN-Based CPU-GPU Architecture with Replay-Driven Simulatio... required tooling/infrastructure and trace-data management burden

Static ACLs evaluate deterministic rules that ignore partial execution paths and therefore can only capture a subset of organizational constraints.

Formal argument and examples showing static ACLs map to Policy functions that do not depend on partial_path; illustrative limitations presented.

high negative Runtime Governance for AI Agents: Policies on Paths coverage of organizational constraints by static ACLs (proportion of constraints...

Runtime evaluation imposes additional compute, latency, logging, and engineering costs that increase the marginal cost of deploying agents.

Operational discussion in the paper outlining additional runtime compute and logging requirements; cost implications argued qualitatively; no empirical cost measurements provided.

high negative Runtime Governance for AI Agents: Policies on Paths marginal deployment cost (compute/latency/engineering overhead)

Prompt-level instructions and static access control lists (ACLs) are limited special cases of a more general runtime policy-evaluation framework and cannot, in general, enforce path-dependent rules.

Formalization showing prompt/system messages and static ACLs map to restricted forms of the Policy(agent_id, partial_path, proposed_action, org_state) function; logical proof/argument in the paper and illustrative counterexamples.

high negative Runtime Governance for AI Agents: Policies on Paths ability to detect/enforce path-dependent policy violations (yes/no / coverage of...
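
To make the distinction in this claim concrete, here is a minimal sketch, assuming a boolean allow/deny return value and string-valued actions. Only the four-argument signature `Policy(agent_id, partial_path, proposed_action, org_state)` comes from the evidence note; the action names, the agent ID, and the helpers `static_acl` and `no_send_after_read` are hypothetical and are not taken from the paper.

```python
from typing import Callable, Sequence

# Assumption: a policy returns True to allow the proposed action.
Policy = Callable[[str, Sequence[str], str, dict], bool]


def static_acl(allowed: dict) -> Policy:
    """Build a static ACL: the decision ignores partial_path and org_state."""
    def policy(agent_id, partial_path, proposed_action, org_state):
        return proposed_action in allowed.get(agent_id, set())
    return policy


def no_send_after_read(agent_id, partial_path, proposed_action, org_state):
    """Path-dependent rule: deny external email once customer records were read."""
    if proposed_action == "send_external_email":
        return "read_customer_records" not in partial_path
    return True


acl = static_acl({"agent-1": {"read_customer_records", "send_external_email"}})

clean_path = []                          # nothing sensitive touched yet
tainted_path = ["read_customer_records"]  # sensitive read earlier in the run

# The static ACL gives the same answer on both paths.
acl_decisions = [
    acl("agent-1", p, "send_external_email", {}) for p in (clean_path, tainted_path)
]
# The path-dependent policy distinguishes them.
path_decisions = [
    no_send_after_read("agent-1", p, "send_external_email", {})
    for p in (clean_path, tainted_path)
]

print(acl_decisions)
print(path_decisions)
```

In this sketch the static ACL is a policy that never inspects `partial_path`, so it cannot tell the two runs apart, while the second policy can.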

LLM-based agent behavior is non-deterministic and path-dependent: an agent's safety/compliance risk depends on the entire execution path, not just the current prompt or single action.

Formal/abstract execution model defined in the paper (states, actions, execution paths) and conceptual arguments/illustrative examples showing how earlier states/actions affect later behavior; no large-scale empirical dataset reported.

high negative Runtime Governance for AI Agents: Policies on Paths path-dependent compliance/safety risk (probability of policy violation condition...

Real-world deployment will require representative data coverage and online adaptation despite the method’s robustness mechanisms.

Authors' discussion/limitations section: theoretical requirements for persistently exciting/representative trajectories for DeePC and recommendation for online adaptation and continual data collection for deployment.

high negative Data-driven generalized perimeter control: Zürich case study data representativeness and need for online adaptation (deployment readiness/ris...

Agent performance degrades markedly as environment complexity, stochasticity, and non-stationarity increase, revealing core limitations of current LLM-based agents for long-horizon, multi-factor decision problems.

Experimental results across progressively harder RetailBench environments showing performance falloff for multiple LLMs under increased task complexity and non-stationarity.

high negative RetailBench: Evaluating Long-Horizon Autonomous Decision-Mak... overall agent performance across increasing environment complexity (e.g., fulfil...

Proactive AI at national scale amplifies concerns around transparency, accountability, privacy, and potential misuse, necessitating robust regulatory and ethical frameworks.

Normative and ethical analysis in the paper, supported by general literature on large-scale AI governance; no empirical assessment of regulatory effectiveness in Russia included.

high negative DIGITAL TRANSFORMATION OF THE RUSSIAN FEDERATION’S SOCIOECON... risks to transparency, accountability, privacy and potential for misuse

Aggregating informal and recommendation data raises privacy and consent issues in low-regulation contexts, requiring governance safeguards.

Policy and ethical consideration based on the nature of the data used; no specific privacy-impact assessment reported in the summary.

high negative AI-Driven Skill Mapping and Gig Economy Matching Algorithm f... privacy risk / consent compliance

NLP/ML systems can inherit biases from inputs (underrepresentation, noisy self-reports, biased recommendations) and may therefore disadvantage some youth unless transparency and fairness constraints are implemented.

Reasoned risk assessment grounded in known properties of ML/NLP; the pilot summary does not report an audit or measured bias outcomes.

high negative AI-Driven Skill Mapping and Gig Economy Matching Algorithm f... bias in match outcomes / differential access by demographic group

There are limited randomized controlled trials or longitudinal evaluations; few studies measure patient-relevant outcomes or economic impacts.

Literature synthesis noting scarcity of RCTs and long-term observational studies, and absence of widespread patient-outcome and cost-effectiveness evaluations in existing publications.

high negative Human-AI interaction and collaboration in radiology: from co... number of RCTs/longitudinal studies, frequency of patient outcome and economic o...

Many published studies focus on standalone algorithm accuracy rather than clinician–AI joint performance in routine workflows.

Review of the literature categorizing study designs (preponderance of algorithm development/validation studies, fewer reader-in-the-loop, simulation, or deployment studies).

high negative Human-AI interaction and collaboration in radiology: from co... proportion of studies reporting standalone algorithm metrics versus those report...

Regulators and payers remain central bottlenecks—AI can accelerate discovery but cannot bypass clinical evidence requirements.

Policy discussion and regulatory analysis in the paper noting that approvals require clinical evidence independent of discovery modality.

high negative Has AI Reshaped Drug Discovery, or Is There Still a Long Way... regulatory and payer requirements as constraints on the impact of AI-driven disc...

Downstream clinical development costs and translational failure rates remain the major drivers of total R&D expenditure; early-stage AI savings may not translate into proportionate increases in approved drugs.

Economic analysis and discussion in the paper referencing known cost distributions in drug development and historical attrition rates in clinical phases.

high negative Has AI Reshaped Drug Discovery, or Is There Still a Long Way... contribution of clinical development costs and failure rates to total R&D expend...

Inherent biological complexity and translational gaps between in silico predictions, preclinical models, and human biology constrain downstream success rates.

Review of translational failures and literature cited in the paper demonstrating mismatch between preclinical signals and clinical outcomes; conceptual analysis of biological complexity.

high negative Has AI Reshaped Drug Discovery, or Is There Still a Long Way... translational success rate from preclinical predictions to clinical efficacy

Gaps exist between computational designs and chemical/experimental feasibility (e.g., synthetic accessibility and assay readiness), limiting the usefulness of some generative outputs.

Case studies and critiques in the paper showing generated molecules that are synthetically infeasible or incompatible with experimental constraints; discussion of missing integration of practical constraints in many generative models.

high negative Has AI Reshaped Drug Discovery, or Is There Still a Long Way... fraction of computationally designed molecules that are synthetically accessible...

Many models have limited interpretability and insufficient uncertainty quantification, hampering trust and decision-making.

Methodological analysis in the paper noting common deep-learning approaches lacking clear interpretability and uncertainty estimates; references to literature on model explainability and calibration gaps.

high negative Has AI Reshaped Drug Discovery, or Is There Still a Long Way... degree of model interpretability and presence/quality of uncertainty quantificat...

Poor data quality, fragmentation, and limited accessibility reduce model reliability and generalizability.

Survey of data characteristics and limitations presented in the paper; examples of biased or sparse datasets and the paper's discussion of impacts on model performance and transferability.

high negative Has AI Reshaped Drug Discovery, or Is There Still a Long Way... model reliability/generalizability as a function of data quality, coverage, and ...

AI remains an augmenting technology rather than a standalone solution: no AI-only originated drug has yet achieved regulatory approval.

Review of drug-approval records and company disclosures summarized in the paper; explicit statement that to date no entirely AI-originated molecule has received full regulatory approval.

high negative Has AI Reshaped Drug Discovery, or Is There Still a Long Way... regulatory approval status of AI-originated drug candidates (number of approvals...

Ethical and legal issues—patient privacy, algorithmic bias, intellectual property, and equitable access—pose risks to AI deployment in drug development.

Ethics and legal analyses, policy reports, and documented case examples collated in the review that identify these recurring concerns.

high negative From Algorithm to Medicine: AI in the Discovery and Developm... ethical/legal risk incidence; privacy breaches; bias outcomes; access inequities

Regulatory uncertainty about validation standards and liability for AI tools raises investment risk and may slow deployment.

Regulatory and policy reports included in the narrative review describing evolving standards and open questions about validation, explainability, and liability for ML-based tools.

high negative From Algorithm to Medicine: AI in the Discovery and Developm... regulatory clarity; investment risk and deployment timelines

Adoption of AI in drug R&D requires high upfront investment in data curation, compute infrastructure, and specialized talent.

Industry reports and economic analyses summarized in the review reporting capital and operational needs for building AI capabilities; qualitative synthesis rather than quantitative costing across firms.

high negative From Algorithm to Medicine: AI in the Discovery and Developm... fixed upfront costs (data curation, compute, hiring/training)

Limited transparency and interpretability of many AI algorithms (black-box models) complicate clinical and regulatory trust and adoption.

Regulatory reports, methodological critiques, and case examples in the review highlighting interpretability concerns and their impact on clinical/regulatory acceptance.

high negative From Algorithm to Medicine: AI in the Discovery and Developm... clinical/regulatory acceptance, trust, and adoption rates; explainability metric...

Performance of AI models in drug R&D depends on large, high-quality, and representative biomedical datasets; dataset bias or gaps substantially undermine model performance and generalizability.

Methodological literature and case studies cited in the review documenting failures or limited generalization when training data are biased, sparse, or non-representative; thematic synthesis rather than pooled quantification.

high negative From Algorithm to Medicine: AI in the Discovery and Developm... model performance/generalizability across populations and contexts

Predictions from AI depend on data quality and coverage and still require experimental (wet-lab) validation.

Discussion of early failures and limits in case studies and expert observations within the narrative review; methodological argument about dependence of ML models on input data.

high negative Learning from the successes and failures of early artificial... predictive validity of computational models / need for experimental validation

High-quality, standardized, interoperable data (clean, annotated, connected across modalities) is a critical limiting factor for translating AI capability into sustained impact.

Conceptual emphasis and domain knowledge argument in the editorial; no empirical measurement of data quality's causal effect included.

high negative AI as the Catalyst for a New Paradigm in Biomedical Research ability to translate AI capability into sustained impact (dependent on data qual...

The paper's evidence base is constrained by its reliance on early-stage projects with little longitudinal outcome data and on publicly available project information, which may be incomplete or biased.

Methods and limitations explicitly stated in the paper (qualitative review; reliance on secondary sources; two case studies; absence of large-scale quantitative evaluation).

high negative Decentralized Autonomous Organizations in the Pharmaceutical... completeness and robustness of empirical evidence supporting claims about DAO ef...

Data protection and privacy (especially sensitive health data) complicate open-data DAO models.

Conceptual analysis referencing privacy/data-protection concerns for health data (e.g., GDPR-like regimes); no empirical evaluation of privacy breaches within DAOs provided.

high negative Decentralized Autonomous Organizations in the Pharmaceutical... data privacy risk level, feasibility of open-data sharing for clinical data

Significant barriers remain for DAOs in pharma: regulatory uncertainty about tokenized securities, IP fractionalization, and clinical data sharing.

Legal/regulatory analysis and literature synthesis highlighting unclear classifications and open regulatory questions; no new regulatory rulings provided.

high negative Decentralized Autonomous Organizations in the Pharmaceutical... regulatory clarity/status for tokenized securities and IP models; legal risk ind...

Pharmaceutical R&D faces rising costs, long approval timelines, supply-chain inefficiencies, and low patient involvement.

Literature review and synthesis of well-documented industry challenges cited in the paper (secondary sources); no new primary data presented in this study.

high negative Decentralized Autonomous Organizations in the Pharmaceutical... R&D cost per approved drug, average time-to-approval, supply-chain performance m...

The black-box nature of many deep learning models undermines scientific interpretability and experimental trust, limiting adoption in materials research.

Cited concerns and methodological papers advocating interpretable architectures and post hoc explanation methods reviewed in the paper; synthesis of community critique.

high negative Machine Learning-Driven R&D of Perovskites and Spinels: From... model interpretability and experimental adoption/trust

Insufficient attention to model reliability—particularly uncertainty miscalibration—reduces real-world utility because experimentalists need reliable confidence estimates, not only point predictions.

Survey of literature on uncertainty estimation and calibration (Bayesian NNs, ensembles, temperature scaling, conformal prediction) and papers reporting calibration issues; recommendations drawn from these sources.

high negative Machine Learning-Driven R&D of Perovskites and Spinels: From... calibration of predictive uncertainties (e.g., calibration error, coverage) and ...
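
Coverage, one of the calibration measures named in this entry, can be checked with very little code. The sketch below assumes a model that outputs a lower and upper bound per prediction at a stated nominal level; all numbers are invented toy values, and `empirical_coverage` is a hypothetical helper, not a function from the reviewed paper.

```python
def empirical_coverage(y_true, lower, upper):
    """Fraction of true values that fall inside their predicted intervals."""
    if not y_true or not (len(y_true) == len(lower) == len(upper)):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(lo <= y <= hi for y, lo, hi in zip(y_true, lower, upper))
    return hits / len(y_true)


# Toy measured property values and predicted interval bounds (assumed).
y_true = [1.10, 2.30, 0.80, 3.10, 1.70]
lower = [0.90, 2.00, 1.00, 2.80, 1.50]
upper = [1.30, 2.60, 1.40, 3.00, 1.90]

nominal = 0.90  # the level the intervals claim to achieve (assumed)
coverage = empirical_coverage(y_true, lower, upper)
gap = nominal - coverage

print(f"nominal={nominal:.2f} empirical={coverage:.2f} gap={gap:.2f}")
```

A large positive gap between nominal and empirical coverage is one sign of overconfident intervals, which is the kind of miscalibration the claim says limits usefulness for experimentalists.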

Progress of DL-driven materials discovery is limited by scarcity of high-quality, diverse labeled datasets; small, noisy, or biased datasets limit model generalization.

Review and synthesis of empirical studies and methodological papers documenting dataset size/quality issues and their impact on model performance; no new dataset analysis in this paper.

high negative Machine Learning-Driven R&D of Perovskites and Spinels: From... model generalization / predictive performance on out-of-distribution materials o...

Advanced technologies' complexity and lack of explainability create risks for audit reliability and professional judgement.

Findings from literature synthesis and professional/regulatory perspectives included in the review; presented as an identified risk/challenge rather than quantified effect.

high negative Audit 5.0 and the Digital Transformation of Auditing: The Ro... audit reliability and the exercise of professional judgement in presence of opaq...

Audit 5.0 introduces key challenges: data quality and integration issues, complexity and explainability of advanced technologies, regulatory and ethical uncertainty, and skills shortages combined with cultural resistance.

Systematic literature review and synthesis of professional standards and regulatory perspectives; assertions based on reviewed literature rather than a single empirical dataset.

high negative Audit 5.0 and the Digital Transformation of Auditing: The Ro... barriers to adoption/readiness factors (data quality, explainability, regulatory...

At the question level, incorrect chatbot suggestions substantially reduce caseworker accuracy, with a two-thirds reduction on easy questions where the control group performed best.

Question-level analysis from the randomized experiment comparing cases where chatbot suggestions were incorrect versus control; paper reports a ~66% reduction in accuracy on easy questions when chatbot suggestions were incorrect (exact denominators and statistics not provided in the excerpt).

high negative LLMs in social services: How does chatbot accuracy affect hu... caseworker accuracy on easy questions when presented with incorrect chatbot sugg...

The article identifies and lays out several concerns regarding the government's approach to regulating AI.

Analytical critique presented in the paper (legal/policy analysis summarizing potential regulatory shortcomings). Based on the author's review and argumentation rather than primary empirical data.

high negative Regulation and governance of artificial intelligence in Indi... adequacy and risks of the government's AI regulatory approach

Gaps in infrastructure readiness, digital awareness, and inclusive policy frameworks hinder equitable AI adoption among micro‑enterprises.

Cross‑study synthesis of barriers identified across the 55 included articles; infrastructural, awareness, and policy barriers are explicitly reported as recurring themes.

high negative Role of AI in Enhancing Work Efficiency and Opportunities fo... barriers to AI adoption (infrastructure readiness, digital awareness, policy inc...

Japan's population is shrinking, the share of working-age people is falling, and the number of elderly is growing fast.

Statement grounded in official national statistics referenced by the paper (demographic time series used to initialize and calibrate the system dynamics model).

high negative Fiscal Dynamics in Japan under Demographic Pressure total population size; share (%) of working-age population; number and share (%)...

Significant challenges persist for AI-enhanced GS-BESS deployment, including limited data availability, poor model generalization, high computational requirements, scalability issues, and regulatory gaps.

Barriers and limitations identified across the literature as reported in this systematic review (PRISMA-based synthesis). The excerpt does not enumerate which studies reported each barrier or provide prevalence statistics.

high negative Grid-Scale Battery Energy Storage and AI-Driven Intelligent ... Barriers to effective AI application and large-scale GS-BESS deployment (data av...

The sample is limited to Chinese A-share-listed design enterprises (2014–2023), which may limit generalizability to small and medium-sized enterprises (SMEs) or firms in other countries/regions.

Study sample description: A-share-listed design-oriented enterprises in China between 2014 and 2023; authors explicitly note this as a limitation.

high negative AI-driven design management: enhancing organizational produc... External validity / generalizability of results

Using TFP as a proxy for project efficiency aggregates effects at the firm level and therefore lacks micro-level insight into specific project workflows or design iteration processes.

Methodological limitation acknowledged in the paper: TFP is used as a firm-level proxy and the dataset does not include micro-level project workflow or iteration logs.

high negative AI-driven design management: enhancing organizational produc... Granularity of project-efficiency measurement (limitation of TFP proxy)

AI adoption in Slovakia consistently remained below the EU27 average over the 2021–2024 period.

Gap analysis comparing Slovak enterprise AI adoption indicators to EU27 averages using harmonised Eurostat data for 2021–2024.

high negative Artificial Intelligence Adoption and Labour Productivity in ... AI adoption rate among enterprises (Slovakia vs EU27 average)

There exists a systemic governance vacuum around GenAI, including gaps in privacy, accountability, and intellectual property protections.

Authors' synthesis of governance-related gaps reported across the 28 secondary studies and research agendas in the review.

high negative The Landscape of Generative AI in Information Systems: A Syn... adequacy of governance mechanisms for privacy, accountability, and intellectual ...

Societal and ethical risks—such as bias, misuse, and skill erosion—constrain GenAI adoption.

Themes synthesized from the reviewed literature (28 papers) reporting societal and ethical concerns associated with GenAI deployment.

high negative The Landscape of Generative AI in Information Systems: A Syn... societal-ethical risk level associated with GenAI (bias incidence, misuse potent...

Technical unreliability—manifesting as hallucinations and performance drift—is a major constraint on GenAI adoption.

Recurring identification of technical reliability issues (hallucinations, performance drift) in the 28 reviewed papers and authors' aggregation of technical risks.

high negative The Landscape of Generative AI in Information Systems: A Syn... technical reliability of GenAI systems (frequency/severity of hallucinations and...
