Evidence (5126 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome	Positive	Negative	Mixed	Null	Total
Other	369	105	58	432	972
Governance & Regulation	365	171	113	54	713
Research Productivity	229	95	33	294	655
Organizational Efficiency	354	82	58	34	531
Technology Adoption Rate	277	115	63	27	486
Firm Productivity	273	33	68	10	389
AI Safety & Ethics	112	177	43	24	358
Output Quality	228	61	23	25	337
Market Structure	105	118	81	14	323
Decision Quality	154	68	33	17	275
Employment Level	68	32	74	8	184
Fiscal & Macroeconomic	74	52	32	21	183
Skill Acquisition	85	31	38	9	163
Firm Revenue	96	30	22	—	148
Innovation Output	100	11	20	11	143
Consumer Welfare	66	29	35	7	137
Regulatory Compliance	51	61	13	3	128
Inequality Measures	24	66	31	4	125
Task Allocation	64	6	28	6	104
Error Rate	42	47	6	—	95
Training Effectiveness	55	12	10	16	93
Worker Satisfaction	42	32	11	6	91
Task Completion Time	71	5	3	1	80
Wages & Compensation	38	13	19	4	74
Team Performance	41	8	15	7	72
Hiring & Recruitment	39	4	6	3	52
Automation Exposure	17	15	9	5	46
Job Displacement	5	28	12	—	45
Social Protection	18	8	6	1	33
Developer Productivity	25	1	2	1	29
Worker Turnover	10	12	—	3	25
Creative Output	15	5	3	1	24
Skill Obsolescence	3	18	2	—	23
Labor Share of Income	7	4	9	—	20
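
One way to read the matrix is to convert counts into shares by direction. The sketch below is illustrative only: `ROWS`, `parse_row`, and `direction_shares` are hypothetical names, the three rows are copied from the table above, and reading a dash cell as zero is an assumption (the page does not say whether a dash means zero or missing). The four direction columns do not always sum to the stated Total (for Other they sum to 964 against a Total of 972), so shares are taken over the sum of the four directions and the gap is reported separately.

```python
# Rows copied from the matrix above (tab-separated); "\u2014" is the dash cell.
ROWS = """\
Other\t369\t105\t58\t432\t972
Task Completion Time\t71\t5\t3\t1\t80
Job Displacement\t5\t28\t12\t\u2014\t45
"""

DIRECTIONS = ("Positive", "Negative", "Mixed", "Null")


def parse_row(line):
    """Split one table row into (outcome name, counts by direction, stated total)."""
    name, *cells = line.split("\t")
    # Assumption: a dash cell is read as zero claims.
    numbers = [0 if cell == "\u2014" else int(cell) for cell in cells]
    return name, dict(zip(DIRECTIONS, numbers[:4])), numbers[4]


def direction_shares(counts):
    """Share of each direction, relative to the sum of the four direction columns."""
    total = sum(counts.values())
    return {direction: n / total for direction, n in counts.items()}


for line in ROWS.splitlines():
    name, counts, stated_total = parse_row(line)
    shares = direction_shares(counts)
    gap = stated_total - sum(counts.values())
    print(f"{name}: positive share {shares['Positive']:.2f}, gap to stated total {gap}")
```

Comparing shares rather than raw counts makes small categories such as Task Completion Time comparable with large ones such as Other.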

Adoption

AI-driven impacts will be heterogeneous across education, race, gender, age, firm size, and geography, implying crucial equity concerns and the need for disaggregated reporting and targeted validation.

Policy analysis and literature synthesis in the paper; this claim reflects widely documented labor economics findings about heterogeneous technological impacts, though no new empirical breakdowns are provided here.

high negative Enhancing BLS Methodologies for Projecting AI's Impact on Em... distribution of employment/wage/transition impacts across demographic and firm/r...

The study is limited by being a single-domain (CMM) case study with a likely modest sample size and dependence on specific AR hardware and MLLM capabilities; further validation across other machines and larger samples is needed.

Authors note these limitations in their discussion; the summary explicitly lists single-case domain, likely modest sample size, and dependency on particular hardware/MLLM as limitations.

high negative Augmented Reality-Based Training System Using Multimodal Lan... External validity/generalizability of findings (limitations stated)

Key failure modes for AI in drug R&D include overfitting, poor generalizability, dataset bias, insufficient external validation, and misalignment with evolving regulatory expectations.

Synthesis of literature and case reports in the narrative review describing observed failures and risks across projects (qualitative evidence).

high negative Artificial Intelligence in Drug Discovery and Development: R... failure incidence of AI projects (model performance collapse, regulatory rejecti...

Absent rigorous controls (validation, applicability-domain reporting, attention to dataset bias), AI models risk overfitting, producing inequitable outcomes and regulatory friction that can undermine economic benefits.

Theoretical arguments plus case reports and literature cited in the review documenting instances and mechanisms of overfitting, dataset bias, and regulatory challenges; narrative summary rather than systematic quantification.

high negative Artificial Intelligence in Drug Discovery and Development: R... model generalizability (out-of-sample performance), subgroup performance dispari...

Adaptive RL-driven campaigns complicate attribution and causal inference, so rigorous experimental designs (multi-armed trials, off-policy evaluation) are required for valid measurement.

Methodological claim in the implications section; supported by discussion of policy adaptivity and the need for specific evaluation techniques. No empirical demonstration provided.

high negative Personalized Content Selection in Marketing Using BERT and G... bias in causal estimates, validity of attribution, off-policy evaluation error
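
To make the off-policy evaluation point concrete, here is a minimal sketch of an inverse propensity scoring (IPS) estimator, one standard off-policy technique. It is not taken from the paper: `LoggedEvent`, `ips_estimate`, and the toy log are hypothetical, and the sketch assumes the logging policy's action probabilities were recorded and are nonzero for every action the target policy can take.

```python
from dataclasses import dataclass


@dataclass
class LoggedEvent:
    action: str          # content variant that was shown
    reward: float        # observed outcome, e.g. 1.0 for a click
    logging_prob: float  # probability the logging policy chose this action


def ips_estimate(events, target_prob):
    """Estimate the mean reward a target policy would earn, from logged data.

    Each logged reward is reweighted by target_prob(action) / logging_prob,
    which corrects for the logging policy's own action preferences.
    """
    weighted = [
        e.reward * target_prob(e.action) / e.logging_prob for e in events
    ]
    return sum(weighted) / len(events)


# Toy log: the logging policy picked A or B uniformly at random.
log = [
    LoggedEvent("A", 1.0, 0.5),
    LoggedEvent("A", 0.0, 0.5),
    LoggedEvent("B", 1.0, 0.5),
    LoggedEvent("B", 1.0, 0.5),
]


def always_b(action):
    """Hypothetical target policy that always shows variant B."""
    return 1.0 if action == "B" else 0.0


def same_as_logging(action):
    """Target policy identical to the logging policy."""
    return 0.5


print(ips_estimate(log, always_b))         # value of always showing B
print(ips_estimate(log, same_as_logging))  # reduces to the plain mean reward
```

The estimate is unbiased only under the stated assumptions; with an adaptive logging policy the recorded propensities change over time, which is part of why the claim calls for careful experimental design.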

The system raises privacy, fairness, and safety risks including data leakage, demographic bias in generated content, manipulative targeting, and potential regulatory non-compliance.

Risk assessment and red-team / audit practices described; paper cites known classes of ML deployment risks and recommends logs/audits. This is a conceptual identification rather than a quantified empirical finding.

high negative Personalized Content Selection in Marketing Using BERT and G... incidence/risk of data leakage, demographic bias metrics, examples of manipulati...

Integration and engineering complexity (legacy systems, privacy/compliance pipelines, multi-channel platforms) is a persistent barrier to deployment.

Industry case studies and practitioner reports synthesized in the review documenting integration challenges; no systematic cost accounting or sample sizes presented.

high negative The Effectiveness of ChatGPT in Customer Service and Communi... integration complexity metrics, implementation time/cost, number of integration ...

Hallucinations and factual errors from generative AI can damage service quality and customer trust.

Documented failure cases and empirical reports from the literature aggregated by the review; no novel incident count or experimental data in this paper.

high negative The Effectiveness of ChatGPT in Customer Service and Communi... incidence of factual errors/hallucinations, measures of service quality and cust...

Factual errors and 'hallucinations' create misinformation risks and can produce costly service failures.

Model evaluation studies, incident case reports from deployments, and academic/industry analyses documenting hallucination rates and concrete failure examples.

high negative The Effectiveness of ChatGPT in Customer Service and Communi... factual accuracy / hallucination rate; incidents of service failure (operational...

The study population was restricted to CHI conference papers that had publicly shared study data and analysis code (a self-selected subset), which introduces a self-selection bias that may overestimate reproducibility rates for the broader set of CHI papers.

Authors' stated sampling strategy and limitations noted in the paper (sample restricted to artifact-sharing papers and potential overestimation of reproducibility).

high negative On the Computational Reproducibility of Human-Computer Inter... generalizability of the measured reproducibility rate (bias due to sampling)

Ethical, privacy, and legal restrictions sometimes limit the ability to share data and thereby hamper reproducibility.

Authors' observations from reproduction work and survey/interview responses indicating that some datasets could not be shared for legal/ethical reasons.

high negative On the Computational Reproducibility of Human-Computer Inter... incidence of data-sharing restrictions affecting reproducibility

High linguistic diversity in Africa makes building and evaluating multilingual language technologies more difficult and is a barrier to inclusive AI.

Synthesis of technical literature on NLP and multilingual model development and policy/NGO reports highlighting missing language resources; no original model evaluation reported.

high negative Towards Responsible Artificial Intelligence Adoption: Emergi... language technology availability, model performance across African languages, nu...

Structural constraints—limited digital infrastructure, scarce and skewed data, and high linguistic diversity—complicate AI development, deployment and evaluation in African contexts.

Desk review of infrastructure and data availability reports and scholarly literature demonstrating gaps and their effects; no new measurement in this paper.

high negative Towards Responsible Artificial Intelligence Adoption: Emergi... internet/digital infrastructure coverage, availability and representativeness of...

Privacy concerns, regulatory/compliance issues, biased or opaque models, and the need for change management and HR analytics capability building are significant risks constraining adoption.

Recurring risks and constraints reported by multiple included studies; summarized in the review's 'risks and constraints' theme.

high negative Data-Driven Strategies in Human Resource Management: The Rol... adoption constraints, incidence of privacy/regulatory/bias issues

Implementation of data-driven HRM faces recurring challenges: data quality, privacy and ethics, algorithmic bias, and deficiencies in skills and organizational readiness.

Commonly reported implementation issues across the 47 reviewed studies; extracted as a central theme in the review's thematic analysis.

high negative Data-Driven Strategies in Human Resource Management: The Rol... implementation success/failure factors, incidence of data/ethical issues

Rapid skill obsolescence in AI necessitates frequent curriculum updates and responsive governance.

Identified as a risk: the paper notes AI skill change rates and recommends frequent updates and governance mechanisms. This aligns with general domain knowledge; the paper does not provide empirical measurement of obsolescence rates.

high negative Curriculum engineering: organisation, orientation, and manag... update frequency, lag between skill demand change and curriculum update

Aligning multiple standards is complex, posing a disadvantage and implementation risk.

Stated explicitly in Disadvantages/Risks: complexity of aligning multiple standards is listed. This is a reasoned observation in the paper rather than empirically demonstrated.

high negative Curriculum engineering: organisation, orientation, and manag... complexity measures (number of standards to reconcile, conflicts identified), ti...

Implementing this framework requires significant resources and continuous updating.

Stated explicitly under Main Finding and Disadvantages/Risks; paper lists cost/time metrics to track (cost-per-curriculum, time-to-update) and highlights resource intensity. Support is descriptive/analytic rather than empirical.

high negative Curriculum engineering: organisation, orientation, and manag... resource intensity (cost-per-curriculum), time-to-update, maintenance burden

Algorithmic bias, unequal digital financial literacy, caregiving time constraints, and limited access to personalized solutions can sustain or reproduce gender investment gaps if not addressed.

Synthesis of literature on barriers to financial inclusion and AI fairness concerns, plus platform report observations (review of empirical and conceptual studies; not a single empirical test).

high negative Women's Investment Behaviour and Technology: Exploring the I... gender investment gap, differential product offerings, access metrics

In some settings, women exhibit statistically greater risk aversion than men.

Summary of empirical survey and experimental studies on gender differences in risk attitudes discussed in the review (multiple cross‑sectional and lab/field experiments referenced).

high negative Women's Investment Behaviour and Technology: Exploring the I... measured risk aversion / willingness to take financial risk

The digital divide (lack of reliable electricity and connectivity) constrains adoption of MIS and AI, creating geographic and regional inequities in who benefits from the framework.

Infrastructure constraint argument presented in the paper; no quantified coverage maps or population-level access statistics included.

high negative Establishes a technical and academic bridge between the educ... coverage of system access, differential adoption rates by region, inequality in ...

AI-driven equivalency systems carry risks including algorithmic bias, opaque decisions without explainability, and potential reinforcement of inequities when training data under-represents some regions/institutions.

Risk assessment drawing on established AI ethics literature; no empirical bias audit from the proposed system is provided.

high negative Establishes a technical and academic bridge between the educ... measures of algorithmic bias (disparate impact), explainability scores, unequal ...

The major disadvantage of an MIS is dependency on reliable electricity and internet, creating systemic vulnerability due to the digital divide.

Paper notes infrastructure dependency as a constraint; assertion grounded in common infrastructural realities but no measured connectivity or outage statistics from DRC/SA are provided.

high negative Establishes a technical and academic bridge between the educ... geographic/regional access to equivalency services and system uptime availabilit...

Potential limitations include limited methodological detail on case selection and measurement, possible selection and reporting bias from practitioner-sourced examples, and variable generalizability to small firms or highly regulated industries.

Authors' self-reported limitations in the Methods/Limitations section (qualitative assessment).

high negative Governed Hyperautomation for CRM and ERP: A Reference Patter... methodological completeness and generalizability (qualitative limitation)

Prompt fraud exploits the natural-language interface of large language models (LLMs) to produce outputs that appear authoritative (reports, audit trails, explanations) without system intrusion, credential theft, or software exploitation.

Definition and threat-model description using conceptual examples and case vignettes; literature/regulatory review to position the threat relative to traditional fraud vectors.

high negative Prompt Engineering or Prompt Fraud? Governance Challenges fo... production of authoritative-appearing artifacts by LLMs without technical system...

Data privacy and cross-border compliance issues arise from using cloud and SECaaS, complicating legal compliance for firms.

Regulatory analyses and compliance reports; documented examples in case studies and industry guidance on cross-border data flows.

high negative Security-as-a-service: enhancing cloud security through m... compliance incident rates / regulatory risk exposure

The cloud shared responsibility model creates potential ambiguities in liability between providers and customers.

Regulatory guidance, legal analyses, and documented post-incident case studies showing confusion over responsibilities.

high negative Security-as-a-service: enhancing cloud security through m... clarity/ambiguity of security and liability responsibilities

China manages the openness–security trade-off through a centralized, developmentalist, techno‑sovereignty approach that privileges coordinated state direction and control.

Qualitative content analysis of national‑level policy texts: 18 Chinese policy documents coded across four analytical dimensions (coordination objectives, institutional actors, governance mechanisms, stakeholder legitimacy).

high negative Balancing openness and security in scientific data governanc... governance logic / institutional coordination type (centralized, state‑led)

Automation and LLM-driven orchestration add opacity; errors in instrument control or analysis could propagate quickly, raising liability, insurance, and reproducibility concerns.

Analytical discussion of risks and analogies to automated systems in other domains; no incident-level empirical data from microscopy given.

high negative ChatMicroscopy: A Perspective Review of Large Language Model... frequency and impact of errors, liability exposure, reproducibility failures

Ethical and governance issues related to LLM-driven microscopy include accountability, reproducibility, access inequities, data privacy, and concentration of capabilities in large providers.

Policy-oriented synthesis and analogies to governance challenges observed in other AI deployments; no new empirical measurement in microscopy contexts.

high negative ChatMicroscopy: A Perspective Review of Large Language Model... presence of governance risks: accountability gaps, reproducibility problems, une...

Integration of LLMs with microscopes faces challenges including safety and reliability of instrument control, verification of scientific outputs, data provenance, and alignment with experimental constraints.

Analytical discussion based on known reliability and safety issues in automated systems and AI tool use; no empirical incident data from microscopy provided.

high negative ChatMicroscopy: A Perspective Review of Large Language Model... risks to safety, reliability, and scientific validity when deploying LLM-driven ...

There is substantial uncertainty in economic forecasts due to possible scale-up failures, regulatory constraints, feedstock price volatility, and path‑dependent lock‑in effects.

Synthesis of technical failure modes, regulatory uncertainty, and sensitivity analyses reported in TEA/LCA literature and economic modeling sections of the review.

high negative Harnessing Microbial Factories: Biotechnology at the Edge of... forecast variance in cost trajectories, probability of commercial success, and s...

Regulatory and biosafety concerns (including environmental release risks and dual‑use issues) increase fixed costs and create entry barriers that shape industry structure and diffusion.

Policy and governance literature reviewed alongside technical case studies; citations of regulatory requirements, biosafety frameworks, and examples of compliance costs affecting project viability.

high negative Harnessing Microbial Factories: Biotechnology at the Edge of... regulatory compliance costs, time-to-market, number of approved facilities/proce...

Engineering and economic challenges—scale‑up hurdles, process robustness, feedstock cost, and downstream purification—limit industrial deployment of many bio-based processes.

Case study TEA/LCA summaries and process reports in the review highlighting scale-up failures or increased costs at larger scales, purification complexity for low‑concentration products, and sensitivity to feedstock prices.

high negative Harnessing Microbial Factories: Biotechnology at the Edge of... capital and operating costs, purification yield and cost, process robustness met...

Technical biological limitations—metabolic burden, pathway crosstalk, byproduct formation, and genetic instability—remain major constraints on strain performance and scalability.

Multiple experimental reports and method papers cited in the review documenting decreased growth/productivity due to engineered pathway burden, unintended interactions between pathways, accumulation of byproducts, and genetic mutations during production runs.

high negative Harnessing Microbial Factories: Biotechnology at the Edge of... strain growth rate, productivity (g/L/h), byproduct concentrations, genetic muta...

The described pipeline is cross-sectional as presented and should be extended to dynamic models (temporal embeddings, change-point detection) for trend or causal analyses.

Method description in summary indicates cross-sectional pipeline; recommendation to extend for temporal/dynamic modeling when analyzing trends or causal effects.

high negative Soft-Prompted Semantic Normalization for Unsupervised Analys... temporal modeling capabilities (ability to analyze trends/change over time)

LLMs and corpora may reflect disciplinary, geographic, or language biases; analyses should adjust or stratify accordingly.

Caveat explicitly stated in summary noting potential biases in LLMs and corpora; recommendation to adjust/stratify analyses.

high negative Soft-Prompted Semantic Normalization for Unsupervised Analys... presence and impact of disciplinary/geographic/language biases in topic maps and...

Cluster reliability should be validated (e.g., bootstrap, perturbations) and automatic labels complemented with expert human validation for critical analyses.

Caveat and recommended validation steps provided in summary; suggests bootstrap/perturbation and manual validation as best practices. No empirical stability metrics provided in summary.

high negative Soft-Prompted Semantic Normalization for Unsupervised Analys... cluster stability/reliability and accuracy of automatically generated labels
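
The bootstrap check recommended above can be sketched roughly as follows. This is not the paper's pipeline: a one-dimensional two-means clusterer stands in for the real embedding-based clustering, and `two_means_1d`, `assign`, `rand_index`, and `bootstrap_stability` are hypothetical names. The idea is to refit the clusterer on resampled data, relabel the original points, and measure agreement with the reference labels using the Rand index, which ignores how cluster labels are numbered.

```python
import random
from itertools import combinations


def two_means_1d(values, iters=20):
    """Tiny stand-in clusterer: two centroids on a line, seeded at min and max."""
    lo, hi = min(values), max(values)
    for _ in range(iters):
        left = [v for v in values if abs(v - lo) <= abs(v - hi)]
        right = [v for v in values if abs(v - lo) > abs(v - hi)]
        if not left or not right:
            break
        lo, hi = sum(left) / len(left), sum(right) / len(right)
    return lo, hi


def assign(values, centroids):
    """Label each value 0 or 1 by its nearest centroid."""
    lo, hi = centroids
    return [0 if abs(v - lo) <= abs(v - hi) else 1 for v in values]


def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree (same vs. different cluster)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def bootstrap_stability(values, n_boot=100, seed=0):
    """Mean Rand index between reference labels and labels from bootstrap refits."""
    rng = random.Random(seed)
    reference = assign(values, two_means_1d(values))
    scores = []
    for _ in range(n_boot):
        sample = rng.choices(values, k=len(values))  # resample with replacement
        labels = assign(values, two_means_1d(sample))
        scores.append(rand_index(reference, labels))
    return sum(scores) / len(scores)


# Toy data: two well-separated groups of ten points each.
data = [i / 10 for i in range(10)] + [10 + i / 10 for i in range(10)]
print(f"mean bootstrap Rand index: {bootstrap_stability(data):.3f}")
```

A score near 1 suggests the partition survives resampling; low or highly variable scores would flag clusters that need expert review before interpretation.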

Results are sensitive to model and prompt choice; researchers should perform robustness checks across LLMs, soft prompts, and embedding models.

Caveat explicitly stated in the paper summary noting model and prompt sensitivity; recommended validation steps include robustness checks across models and prompts.

high negative Soft-Prompted Semantic Normalization for Unsupervised Analys... sensitivity of clustering/labeling results to LLM, prompt design, and embedding ...

Empirical validation is concentrated on the Agora-12 corpus; generalizability to other architectures, scales, or deployment contexts is unproven and identified as a limitation.

Authors' own limitations section and scope of empirical tests (analyses limited to Agora-12 and four clinical cases).

high negative Model Medicine: A Clinical Framework for Understanding, Diag... Scope of empirical validation (limited to Agora-12 dataset and 4 case studies)

Higher complaint volume is significantly associated with near-term stock price declines.

Fixed-effects panel path models estimated on monthly data for 261 financial firms (2018–2023) report statistically significant negative associations between firm–month complaint volume and subsequent abnormal returns.

high negative More than words: valuation of words for stock price by using... near-term abnormal stock returns

Consumer complaints—measured by monthly volume, topic composition, and VADER sentiment of complaint narratives—contain behavioral signals that predict short-term abnormal stock returns in U.S. financial firms.

CFPB complaint records matched to 261 publicly traded U.S. financial firms (monthly observations, 2018–2023); analyses use fixed-effects panel path models to link firm–month complaint features (volume, LDA topic prevalences, aggregated VADER sentiment) to firm-level abnormal returns; complementary machine-learning models evaluate out-of-sample predictive performance.

high negative More than words: valuation of words for stock price by using... short-term firm-level abnormal stock returns

Measurement issues (task-based output measurement, attributing output changes to AI) and selection into early adoption bias the estimated productivity gains upward.

Methodological robustness checks reported in the paper: task-based measures, bounding exercises, placebo tests, and analysis of pre-trends; discussions of selection on unobservables and potential upward bias.

high negative S-TCO: A Sustainable Teacher Context Ontology for Educationa... validity/bias of estimated productivity effects

Implementing the governed hyperautomation pattern raises upfront costs (governance tooling, monitoring, validation, compliance processes).

Economic and cost-structure discussion in the paper, based on qualitative reasoning and industry experience; no quantified cost estimates or sample-based cost analysis provided.

high negative Governed Hyperautomation for CRM and ERP: A Reference Patter... upfront implementation costs (governance tooling, validation, compliance overhea...

Use of standardized (non-adaptive) dialogues limits ecological validity relative to live adaptive chatbots.

Limitations section acknowledges that standardized (non-adaptive) experimental dialogues reduce ecological validity compared with live/adaptive chatbot interactions.

high negative AI Chatbots as Informatics-Enabled Marketing Service Systems... ecological validity

Platform KPIs (e.g., eCPM) can diverge from social welfare metrics (consumer surplus, privacy harms), creating metric misalignment.

Conceptual critique with examples of common platform metrics versus welfare economics; not accompanied by a quantitative comparison dataset.

high negative Artificial Intelligence for Personalized Digital Advertising... alignment between platform KPIs and social welfare measures

Privacy constraints reduce observability and necessitate privacy-preserving study designs that complicate estimation.

Methodological analysis referencing differential privacy, federated learning and their effects on statistical power/observability; no experimental power analyses with sample sizes presented here.

high negative Artificial Intelligence for Personalized Digital Advertising... observability and estimation precision under privacy constraints
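
The precision cost of privacy can be illustrated with the Laplace mechanism from differential privacy. The sketch below is an assumption-laden illustration, not the paper's method: `laplace_noise`, `private_mean`, and `noise_std` are hypothetical names, values are clipped to a known range, and the number of records is treated as public. The noise scale is sensitivity divided by epsilon, so a stricter privacy budget (smaller epsilon) directly widens the error on the released estimate.

```python
import math
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_mean(values, epsilon, lower, upper, rng):
    """Release a noisy mean of values clipped to [lower, upper].

    Returns (noisy_mean, noise_scale). Replacing one record can move the
    clipped mean by at most (upper - lower) / n, which is the sensitivity.
    """
    clipped = [min(max(v, lower), upper) for v in values]
    sensitivity = (upper - lower) / len(clipped)
    scale = sensitivity / epsilon
    noisy = sum(clipped) / len(clipped) + laplace_noise(scale, rng)
    return noisy, scale


def noise_std(scale):
    """Standard deviation of Laplace noise with the given scale."""
    return math.sqrt(2) * scale


# Toy data: 100 binary conversion outcomes with a true rate of 0.2.
outcomes = [1.0] * 20 + [0.0] * 80
rng = random.Random(0)
for epsilon in (1.0, 0.1):
    estimate, scale = private_mean(outcomes, epsilon, 0.0, 1.0, rng)
    print(f"epsilon={epsilon}: estimate={estimate:.3f}, noise std={noise_std(scale):.3f}")
```

Cutting epsilon by a factor of ten multiplies the noise standard deviation by ten, which is one concrete sense in which privacy constraints reduce estimation precision.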

Data access asymmetries (platforms holding proprietary logs) limit external auditability and replication of advertising research.

Empirical and institutional observation about industry data practices; supported by calls for privacy-preserving shared datasets in the paper; no quantified survey sample included.

high negative Artificial Intelligence for Personalized Digital Advertising... external auditability and ability to replicate studies

Attribution complexity — multi-touch, cross-device, and delayed conversions — confounds causal inference in advertising measurement.

Methodological discussion referencing causal inference challenges and standard problems in attribution; widely documented in the literature though not re-measured in this paper.

high negative Artificial Intelligence for Personalized Digital Advertising... accuracy of causal attribution for ad effects
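
A small sketch shows why multi-touch journeys make attribution ambiguous. Two common heuristic rules, last-touch and linear, are applied to the same toy journeys and assign very different credit to each channel. The function names, channel names, and journeys are all hypothetical, and neither rule is a causal estimate.

```python
from collections import Counter


def last_touch(path):
    """Give all credit for a conversion to the final touchpoint."""
    return {path[-1]: 1.0}


def linear(path):
    """Split credit for a conversion equally across every touchpoint."""
    credit = Counter()
    for channel in path:
        credit[channel] += 1.0 / len(path)
    return dict(credit)


def attribute(journeys, rule):
    """Sum per-channel credit over all converting journeys under one rule."""
    totals = Counter()
    for path in journeys:
        for channel, credit in rule(path).items():
            totals[channel] += credit
    return dict(totals)


# Toy converting journeys, each an ordered list of touchpoints.
journeys = [
    ["search", "social", "email"],
    ["email"],
    ["social", "email"],
]

print("last-touch:", attribute(journeys, last_touch))
print("linear:    ", attribute(journeys, linear))
```

Both rules distribute the same three conversions, yet last-touch credits email with all of them while linear spreads credit to search and social, so any downstream budget decision depends on a modeling choice rather than on a measured effect.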

Complex automated systems make attribution and responsibility harder to establish when harms occur (the automation vs. accountability trade-off).

Qualitative institutional analysis and case-study reasoning about multi-agent automated pipelines and opaque model decisions; no single empirical incident dataset provided.

high negative Artificial Intelligence for Personalized Digital Advertising... clarity of attribution and accountability in case of harms
