The Commonplace

Evidence (2954 claims)

Adoption: 5126 claims
Productivity: 4409 claims
Governance: 4049 claims
Human-AI Collaboration: 2954 claims
Labor Markets: 2432 claims
Org Design: 2273 claims
Innovation: 2215 claims
Skills & Training: 1902 claims
Inequality: 1286 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

| Outcome | Positive | Negative | Mixed | Null | Total |
| --- | --- | --- | --- | --- | --- |
| Other | 369 | 105 | 58 | 432 | 972 |
| Governance & Regulation | 365 | 171 | 113 | 54 | 713 |
| Research Productivity | 229 | 95 | 33 | 294 | 655 |
| Organizational Efficiency | 354 | 82 | 58 | 34 | 531 |
| Technology Adoption Rate | 277 | 115 | 63 | 27 | 486 |
| Firm Productivity | 273 | 33 | 68 | 10 | 389 |
| AI Safety & Ethics | 112 | 177 | 43 | 24 | 358 |
| Output Quality | 228 | 61 | 23 | 25 | 337 |
| Market Structure | 105 | 118 | 81 | 14 | 323 |
| Decision Quality | 154 | 68 | 33 | 17 | 275 |
| Employment Level | 68 | 32 | 74 | 8 | 184 |
| Fiscal & Macroeconomic | 74 | 52 | 32 | 21 | 183 |
| Skill Acquisition | 85 | 31 | 38 | 9 | 163 |
| Firm Revenue | 96 | 30 | 22* | | 148 |
| Innovation Output | 100 | 11 | 20 | 11 | 143 |
| Consumer Welfare | 66 | 29 | 35 | 7 | 137 |
| Regulatory Compliance | 51 | 61 | 13 | 3 | 128 |
| Inequality Measures | 24 | 66 | 31 | 4 | 125 |
| Task Allocation | 64 | 6 | 28 | 6 | 104 |
| Error Rate | 42 | 47 | 6* | | 95 |
| Training Effectiveness | 55 | 12 | 10 | 16 | 93 |
| Worker Satisfaction | 42 | 32 | 11 | 6 | 91 |
| Task Completion Time | 71 | 5 | 3 | 1 | 80 |
| Wages & Compensation | 38 | 13 | 19 | 4 | 74 |
| Team Performance | 41 | 8 | 15 | 7 | 72 |
| Hiring & Recruitment | 39 | 4 | 6 | 3 | 52 |
| Automation Exposure | 17 | 15 | 9 | 5 | 46 |
| Job Displacement | 5 | 28 | 12* | | 45 |
| Social Protection | 18 | 8 | 6 | 1 | 33 |
| Developer Productivity | 25 | 1 | 2 | 1 | 29 |
| Worker Turnover | 10 | 12 | 3* | | 25 |
| Creative Output | 15 | 5 | 3 | 1 | 24 |
| Skill Obsolescence | 3 | 18 | 2* | | 23 |
| Labor Share of Income | 7 | 4 | 9* | | 20 |

*Starred rows omitted one cell in the source; the starred value may belong to either Mixed or Null (the other is zero). In several rows (e.g., Other, Governance & Regulation) the Total exceeds the sum of the four listed directions, so the source evidently counts directions not displayed here.
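A matrix like this is a crosstab of claim records over (outcome, direction) pairs. The sketch below shows one minimal way such counts could be derived; the records, function name, and direction labels are illustrative, not the site's actual data or pipeline.

```python
from collections import Counter

# Hypothetical claim records: (outcome category, direction of finding).
claims = [
    ("Error Rate", "positive"),
    ("Error Rate", "negative"),
    ("Error Rate", "negative"),
    ("Firm Revenue", "positive"),
    ("Firm Revenue", "mixed"),
]

DIRECTIONS = ("positive", "negative", "mixed", "null")

def evidence_matrix(records):
    """Count claims per (outcome, direction) and append a row total."""
    counts = Counter(records)
    outcomes = sorted({outcome for outcome, _ in records})
    matrix = {}
    for outcome in outcomes:
        row = {d: counts[(outcome, d)] for d in DIRECTIONS}
        row["total"] = sum(row.values())
        matrix[outcome] = row
    return matrix

m = evidence_matrix(claims)
```

Note that in this sketch each row's total is by construction the sum of its four direction cells, whereas in the table above some totals exceed that sum.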
Active filter: Human-AI Collaboration. Each entry below gives the claim, its evidence basis, and a summary line listing confidence, direction of finding, source paper (title truncated), and the outcome measured (truncated).
Improving explainability can trade off with predictive performance, privacy, and robustness; these trade-offs must be managed rather than ignored.
Review aggregates technical literature and conceptual analyses documenting trade-offs reported by researchers (e.g., simpler interpretable models sometimes having lower predictive accuracy; disclosure risks to privacy; robustness concerns). No single causal estimate provided.
high negative Explainable AI in High-Stakes Domains: Improving Trust, Tran... predictive performance, privacy risk, model robustness
The evidence base presented is limited to a single SME pilot, so generalizability across sectors, firm sizes, and data regimes is untested and requires further research.
Explicit limitation noted in the paper and the fact that the pilot illustrated is a single case study (sample size = 1 SME pilot).
high negative ALGORITHM FOR IMPLEMENTING AI IN THE MANAGEMENT LOOP OF SMES... external validity / generalizability of results beyond the single pilot
Tasks that are routine, repetitive, or pattern‑based (e.g., boilerplate coding, refactoring, unit test generation, some accessibility fixes) will be increasingly automated by AI.
Task‑level decomposition and examples of current automation capabilities (code generation, test suggestion tools); conceptual projection rather than empirical measurement.
high negative How AI Will Transform the Daily Life of a Techie within 5 Ye... rate of automation for routine software development tasks (proportion of such ta...
Upfront costs for AI adoption are substantial: development, clinical validation, regulatory compliance, EHR integration, and ongoing monitoring.
Implementation and regulatory literature synthesized in the review documenting typical cost categories and reported expenditures for clinical AI projects.
high negative Will AI Replace Physicians in the Near Future? AI Adoption B... fixed and recurring implementation costs
Large language models (LLMs) suffer from hallucinations (fabricated facts), overconfidence, and unpredictable failure modes in open-ended tasks.
Technical papers and benchmarks on LLM factuality, calibration, and failure modes summarized in the review; empirical evaluations showing instances of fabricated outputs and calibration issues.
high negative Will AI Replace Physicians in the Near Future? AI Adoption B... factual accuracy of outputs; calibration (confidence vs accuracy); failure rate ...
Contemporary AI systems have no capacity for physical examination, sensorimotor procedures, or direct patient-contact diagnostics.
Technical limitations of CNNs and LLMs described in literature (lack of embodiment, no sensorimotor capabilities) and absence of credible empirical demonstrations of safe autonomous physical clinical procedures in reviewed studies.
high negative Will AI Replace Physicians in the Near Future? AI Adoption B... ability to perform physical exam / procedural tasks / direct patient-contact dia...
Current models exhibit poor out-of-distribution (OOD) generalization: performance degrades when inputs differ from training distributions.
Technical literature and robustness/domain-shift research reviewed in the paper documenting declines in model accuracy under domain shift and dataset changes.
high negative Will AI Replace Physicians in the Near Future? AI Adoption B... model accuracy/performance under domain shift / OOD inputs
Heterogeneity in study designs and contexts within the literature limits direct comparability and generalizability of findings.
Limitation noted in the paper based on the authors' assessment of diversity across the 103 reviewed studies (varying methods, contexts, metrics).
high negative Models, applications, and limitations of the responsible ado... comparability/generalizability of evidence across studies
Institutional inertia, fragmented governance structures, limited technical capacity, and weak data stewardship impede scale‑up of AI systems in the public sector.
Thematic synthesis of barriers reported across empirical studies and institutional reports within the systematic review (103 items).
high negative Models, applications, and limitations of the responsible ado... ability to scale AI systems / scale‑up rate
Low‑ and middle‑income contexts face persistent gaps—infrastructure, data ecosystems, and talent retention—that slow AI adoption in public governance.
Consistent findings across multiple studies in the 103‑item corpus reporting infrastructure deficits, weak data ecosystems, and brain drain/retention issues in LMIC settings.
high negative Models, applications, and limitations of the responsible ado... rate/extent of AI adoption in public governance in low- and middle‑income contex...
AI-generated code can introduce security vulnerabilities and raise licensing/intellectual-property concerns.
Case studies of security incidents, analyses of generated code provenance, and vulnerability-detection studies synthesized in the review.
high negative ChatGPT as a Tool for Programming Assistance and Code Develo... incidence of security vulnerabilities in generated code; instances of license or...
LLMs sometimes generate incorrect, nonsensical, or insecure code (hallucinations).
Multiple benchmarks, code-generation accuracy tests, and incident case studies documented in the empirical literature showing incorrect or fabricated outputs.
high negative ChatGPT as a Tool for Programming Assistance and Code Develo... code correctness/error rate; incidence of hallucinated outputs (false or fabrica...
Results reflect small-scale e-commerce use cases; external validity to larger firms, other sectors, or more complex tasks is not established.
Scope of deployments limited to small-scale e-commerce settings as stated in methods; no cross-sector or large-firm samples reported in summary.
high negative Artificial Intelligence Agents in Knowledge Work: Transformi... generalisability/external validity of observed productivity effects
The study's evidence is observational rather than randomized controlled trials, so causal estimates about productivity impacts are suggestive rather than definitive.
Declared study design: applied experimentation and observational analysis of deployments (no randomized assignment); methods section explicitly notes observational limitation.
high negative Artificial Intelligence Agents in Knowledge Work: Transformi... strength of causal inference (ability to attribute observed productivity changes...
High upfront costs, weak digital/physical infrastructure, limited access to credit, low digital literacy, insecure land tenure, and sociocultural factors (including gendered access) limit uptake of digital and precision technologies among smallholders.
Consistent findings across program evaluations, qualitative stakeholder interviews, participatory assessments, and case studies cited in the synthesis.
high negative MODERN APPROACHES TO SUSTAINABLE AGRICULTURAL TRANSFORMATION technology adoption rates (uptake), barriers to adoption
Integrating AI raises questions of accountability, transparency, fairness, privacy, and bias; managerial responsibility includes governance design, validation, and audit of AI decisions.
Normative and governance-focused synthesis citing ethical frameworks and illustrative cases; identifies governance tasks and validation/audit needs rather than empirical prevalence rates.
high negative Modern Management in the Age of Artificial Intelligence: Str... presence and quality of AI governance mechanisms (accountability frameworks, tra...
Deficits in governance, auditing, and interpretability constrain the safe deployment of generative AI in firms.
Synthesis of industry reports and conceptual literature noting gaps in governance and interpretability; no quantitative governance dataset reported.
high negative The Use of ChatGPT in Business Productivity and Workflow Opt... presence/absence of governance processes, frequency of audit findings, deploymen...
Algorithmic biases in generative AI can amplify and codify discriminatory patterns in organizational decisions.
Extensive literature on algorithmic bias synthesized in the review and applied to generative models; case examples referenced.
high negative The Use of ChatGPT in Business Productivity and Workflow Opt... disparities in decision outcomes (error rates, disparate impact metrics by group...
Generative AI use introduces significant organizational risks including data privacy breaches and leakage when models or third‑party services are used.
Conceptual analysis and references to documented incidents and industry reports within the review; no single aggregated incident dataset provided.
high negative The Use of ChatGPT in Business Productivity and Workflow Opt... incidence of data breaches/leakage, number of privacy violations
Generated code can introduce security vulnerabilities.
Security analyses and code audits documenting examples where LLM-generated code contains known vulnerability patterns; incident-oriented case studies and controlled experiments assessing vulnerability incidence.
high negative ChatGPT as a Tool for Programming Assistance and Code Develo... incidence of security vulnerabilities in AI-generated code
LLMs can produce plausible-looking but incorrect or insecure code (so-called 'hallucinations').
Benchmarks and controlled tests demonstrating incorrect outputs; security analyses and replicated examples showing erroneous or insecure snippets produced by LLMs across multiple models and prompts.
high negative ChatGPT as a Tool for Programming Assistance and Code Develo... code correctness/error rate and frequency of insecure code returned
Risks: because the framework depends on LLM behavior, hallucinations, bias, or misaligned reasoning can propagate into simulated outcomes; Chain-of-Thought reasoning may be hard to fully verify, posing interpretability and auditability challenges.
Paper's cautions section listing potential failure modes and ethical/interpretability risks; these are identified risks rather than quantified failures observed in experiments.
high negative An LLM-Driven Multi-Agent Simulation Framework for Coupled E... propagation of LLM-induced errors/bias into simulation outcomes and interpretabi...
The study is limited by being a single-domain (CMM) case study with a likely modest sample size and dependence on specific AR hardware and MLLM capabilities; further validation across other machines and larger samples is needed.
Authors note these limitations in their discussion; the summary explicitly lists single-case domain, likely modest sample size, and dependency on particular hardware/MLLM as limitations.
high negative Augmented Reality-Based Training System Using Multimodal Lan... External validity/generalizability of findings (limitations stated)
Governing-logic stability uncertainty (whether decision logic or objectives remain stationary) is a distinct risk posed by agentic AI.
Conceptual argument and proposed taxonomy; no empirical tests reported.
high negative Visioning Human-Agentic AI Teaming: Continuity, Tension, and... stability of AI decision logic/objectives over time
Epistemic grounding uncertainty (uncertainty about how/why an AI produced a particular output) increases with agentic AI.
Literature synthesis on model-level opacity and causal explanation limits; conceptual reasoning in the paper.
high negative Visioning Human-Agentic AI Teaming: Continuity, Tension, and... ability to explain/ground AI outputs
Behavioral trajectory uncertainty (difficulty predicting long-run actions) is a primary form of uncertainty introduced by agentic AI.
Conceptual classification and argument; proposed as one of three principal uncertainties; no empirical estimation.
high negative Visioning Human-Agentic AI Teaming: Continuity, Tension, and... predictability of long-run agentic AI actions
Integration cost: AI-generated outputs often require human revision, testing, and manual integration into existing systems.
Reported practitioner experience and observed practices from the field study at Netlight; authors note time and effort spent on revision and integration; no quantitative time-cost estimates provided.
high negative Rethinking How IT Professionals Build IT Products with Artif... human time/effort required to adapt AI outputs for production
AI systems lack full project context, design rationale, and long-term constraints, creating context gaps for development tasks.
Interviews and workflow observations at Netlight where practitioners reported contextual limitations of AI tools; qualitative examples provided; single-firm qualitative evidence.
high negative Rethinking How IT Professionals Build IT Products with Artif... degree of project/contextual awareness in AI-produced recommendations
AI outputs commonly contain errors and hallucinations: generated code can be incorrect, incomplete, or misleading.
Practitioner reports and observed interactions with AI tools documented in the Netlight qualitative study; specific instances and practitioner concerns described in the paper; no quantitative error rates provided.
high negative Rethinking How IT Professionals Build IT Products with Artif... accuracy and correctness of AI-generated outputs
Integration and engineering complexity (legacy systems, privacy/compliance pipelines, multi-channel platforms) is a persistent barrier to deployment.
Industry case studies and practitioner reports synthesized in the review documenting integration challenges; no systematic cost accounting or sample sizes presented.
high negative The Effectiveness of ChatGPT in Customer Service and Communi... integration complexity metrics, implementation time/cost, number of integration ...
Hallucinations and factual errors from generative AI can damage service quality and customer trust.
Documented failure cases and empirical reports from the literature aggregated by the review; no novel incident count or experimental data in this paper.
high negative The Effectiveness of ChatGPT in Customer Service and Communi... incidence of factual errors/hallucinations, measures of service quality and cust...
Generative AI is susceptible to social and representational biases and to factual errors or hallucinations; it lacks tacit, contextual domain expertise.
Documented examples in the literature of biased outputs and hallucinations; controlled evaluations and audits of model outputs; qualitative reports highlighting lack of tacit knowledge in domain-specific tasks.
high negative ChatGPT as an Innovative Tool for Idea Generation and Proble... incidence of biased content; factual error/hallucination rate; performance on do...
The quality of AI-generated outputs is highly variable; models frequently produce mediocre but plausible-sounding content that requires human filtering.
Multiple user studies and qualitative reports documenting variability in output quality and the need for human curation; outcome measures include error rates, user-rated quality, and time spent vetting.
high negative ChatGPT as an Innovative Tool for Idea Generation and Proble... output quality distributions; user-perceived quality; time/effort for human filt...
Factual errors and 'hallucinations' create misinformation risks and can produce costly service failures.
Model evaluation studies, incident case reports from deployments, and academic/industry analyses documenting hallucination rates and concrete failure examples.
high negative The Effectiveness of ChatGPT in Customer Service and Communi... factual accuracy / hallucination rate; incidents of service failure (operational...
Resource, compute, privacy, and deployment costs associated with CRAEA were not fully quantified in the paper.
Authors note that resource, compute, privacy, and deployment costs were not fully quantified; no cost analyses or benchmarks provided in the summary.
high negative Context-Rich Adaptive Embodied Agents: Enhancing LLM-Powered... Quantification of resource/compute/privacy/deployment costs (absence of measurem...
Evaluation was performed in an artificial/simulated home environment; therefore real-world transfer, robustness to noisy perception, and hardware constraints remain open questions.
Authors explicitly state evaluations occurred in a simulated home environment and acknowledge limits on real-world transfer and robustness. This is a stated limitation rather than an experimental finding.
high negative Context-Rich Adaptive Embodied Agents: Enhancing LLM-Powered... Generalizability/real-world transfer (qualitative limitation)
High linguistic diversity in Africa makes building and evaluating multilingual language technologies more difficult and is a barrier to inclusive AI.
Synthesis of technical literature on NLP and multilingual model development and policy/NGO reports highlighting missing language resources; no original model evaluation reported.
high negative Towards Responsible Artificial Intelligence Adoption: Emergi... language technology availability, model performance across African languages, nu...
Structural constraints—limited digital infrastructure, scarce and skewed data, and high linguistic diversity—complicate AI development, deployment and evaluation in African contexts.
Desk review of infrastructure and data availability reports and scholarly literature demonstrating gaps and their effects; no new measurement in this paper.
high negative Towards Responsible Artificial Intelligence Adoption: Emergi... internet/digital infrastructure coverage, availability and representativeness of...
Privacy concerns, regulatory/compliance issues, biased or opaque models, and the need for change management and HR analytics capability building are significant risks constraining adoption.
Recurring risks and constraints reported by multiple included studies; summarized in the review's 'risks and constraints' theme.
high negative Data-Driven Strategies in Human Resource Management: The Rol... adoption constraints, incidence of privacy/regulatory/bias issues
Implementation of data-driven HRM faces recurring challenges: data quality, privacy and ethics, algorithmic bias, and deficiencies in skills and organizational readiness.
Commonly reported implementation issues across the 47 reviewed studies; extracted as a central theme in the review's thematic analysis.
high negative Data-Driven Strategies in Human Resource Management: The Rol... implementation success/failure factors, incidence of data/ethical issues
Constraints and risks include model risk (overfitting, drift), algorithmic bias, privacy and data-sharing limits, legacy ERP complexity, interoperability challenges, and limited organizational readiness and skills.
Reviewed literature (empirical studies, technical evaluations, and standards) documenting technical and organizational failures, risk incidents, and common barriers to implementation.
high negative Integrating Artificial Intelligence and Enterprise Resource ... risk-related outcomes (e.g., model degradation rates, incidence of biased decisi...
Algorithmic bias, unequal digital financial literacy, caregiving time constraints, and limited access to personalized solutions can sustain or reproduce gender investment gaps if not addressed.
Synthesis of literature on barriers to financial inclusion and AI fairness concerns, plus platform report observations (review of empirical and conceptual studies; not a single empirical test).
high negative Women's Investment Behaviour and Technology: Exploring the I... gender investment gap, differential product offerings, access metrics
In some settings, women exhibit statistically greater risk aversion than men.
Summary of empirical survey and experimental studies on gender differences in risk attitudes discussed in the review (multiple cross‑sectional and lab/field experiments referenced).
high negative Women's Investment Behaviour and Technology: Exploring the I... measured risk aversion / willingness to take financial risk
Evaluation is carried out under three frozen context configurations (diff only: config_A; diff with file content: config_B; full context: config_C) enabling systematic ablation of context provision strategies.
Methodological description: three fixed context configurations defined and used for ablation experiments.
high neutral SWE-PRBench: Benchmarking AI Code Review Quality Against Pul... effect of context-provision design on model performance
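The three frozen configurations amount to an ablation over which inputs the reviewing model sees alongside the pull request. A hypothetical sketch; the config names (config_A/B/C) come from the claim, but the field names and selection logic are illustrative, not the benchmark's actual schema:

```python
# Hypothetical sketch of SWE-PRBench-style context ablation: each frozen
# configuration fixes which inputs accompany the pull-request diff.
# Field names are illustrative assumptions, not the benchmark's real schema.

CONTEXT_CONFIGS = {
    "config_A": {"diff": True, "file_content": False, "full_context": False},
    "config_B": {"diff": True, "file_content": True,  "full_context": False},
    "config_C": {"diff": True, "file_content": True,  "full_context": True},
}

def build_prompt_inputs(config_name, pr):
    """Select only the context pieces the given frozen configuration allows."""
    cfg = CONTEXT_CONFIGS[config_name]
    inputs = {"diff": pr["diff"]}
    if cfg["file_content"]:
        inputs["file_content"] = pr["file_content"]
    if cfg["full_context"]:
        inputs["full_context"] = pr["full_context"]
    return inputs

# A stand-in pull-request record (contents elided).
pr = {"diff": "...", "file_content": "...", "full_context": "..."}
```

Freezing the configurations keeps the comparison systematic: any performance difference across config_A/B/C is attributable to the context-provision strategy rather than per-example context choices.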
We performed an extensive evaluation of 37 state-of-the-art Vision-Language Models on MultihopSpatial.
Empirical evaluation described in the paper listing the number of models evaluated (37).
high neutral MultihopSpatial: Multi-hop Compositional Spatial Reasoning B... benchmark coverage across models evaluated
Economic evaluations of GLAI should account for end-to-end risk externalities (error propagation, institutional trust, rights impacts), not only short-term productivity gains.
Methodological recommendation grounded in conceptual synthesis of technical, behavioral, and legal risks; normative argument rather than empirical result.
high neutral Why Avoid Generative Legal AI Systems? Hallucination, Overre... comprehensiveness of economic evaluations (inclusion of externalities vs. narrow...
Generative Legal AI (GLAI) systems are built on token-prediction (LLM) architectures rather than formal legal-reasoning architectures.
Conceptual and technical analysis in the paper distinguishing GLAI from other legal-tech; literature synthesis on common LLM architectures. No original empirical dataset or sample size—qualitative/technical review.
high neutral Why Avoid Generative Legal AI Systems? Hallucination, Overre... underlying model architecture type (token-prediction vs. formal-reasoning)
Through a thematic review of existing research, the authors identified recurring themes about incentive schemes: their components, how researchers manipulate them, and their impact on research outcomes.
Authors' stated method and findings: thematic review (the scope/number of reviewed papers not specified in excerpt).
high neutral Incentive-Tuning: Understanding and Designing Incentives for... themes in incentive design practices and reported impacts on empirical study out...
A critical aspect of conducting human–AI decision-making studies is the role of participants, often recruited through crowdsourcing platforms.
Claim based on the authors' thematic literature review noting participant sourcing practices (specific studies and counts not given in excerpt).
high neutral Incentive-Tuning: Understanding and Designing Incentives for... participant recruitment source (e.g., crowdsourcing) and its influence on study ...
Researchers conduct empirical studies investigating how humans use AI assistance for decision-making and how this collaboration impacts results.
Statement summarizing the research landscape; supported implicitly by the authors' thematic review of existing empirical studies (number of studies not specified in excerpt).
high neutral Incentive-Tuning: Understanding and Designing Incentives for... human behaviour and decision outcomes when assisted by AI (empirical study outco...