The Commonplace

Evidence (4560 claims)

Adoption: 5267 claims
Productivity: 4560 claims
Governance: 4137 claims
Human-AI Collaboration: 3103 claims
Labor Markets: 2506 claims
Innovation: 2354 claims
Org Design: 2340 claims
Skills & Training: 1945 claims
Inequality: 1322 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 378 106 59 455 1007
Governance & Regulation 379 176 116 58 739
Research Productivity 240 96 34 294 668
Organizational Efficiency 370 82 63 35 553
Technology Adoption Rate 296 118 66 29 513
Firm Productivity 277 34 68 10 394
AI Safety & Ethics 117 177 44 24 364
Output Quality 244 61 23 26 354
Market Structure 107 123 85 14 334
Decision Quality 168 74 37 19 301
Fiscal & Macroeconomic 75 52 32 21 187
Employment Level 70 32 74 8 186
Skill Acquisition 89 32 39 9 169
Firm Revenue 96 34 22 152
Innovation Output 106 12 21 11 151
Consumer Welfare 70 30 37 7 144
Regulatory Compliance 52 61 13 3 129
Inequality Measures 24 68 31 4 127
Task Allocation 75 11 29 6 121
Training Effectiveness 55 12 12 16 96
Error Rate 42 48 6 96
Worker Satisfaction 45 32 11 6 94
Task Completion Time 78 5 4 2 89
Wages & Compensation 46 13 19 5 83
Team Performance 44 9 15 7 76
Hiring & Recruitment 39 4 6 3 52
Automation Exposure 18 17 9 5 50
Job Displacement 5 31 12 48
Social Protection 21 10 6 2 39
Developer Productivity 29 3 3 1 36
Worker Turnover 10 12 3 25
Skill Obsolescence 3 19 2 24
Creative Output 15 5 3 1 24
Labor Share of Income 10 4 9 23
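
As a rough illustration of how the matrix can be read, the sketch below takes four rows copied from the table above and computes the share of negative findings against each row's stated total. It also reports how many claims the four direction columns leave unaccounted for. The function and variable names are hypothetical, and only rows with all five cells present are used, since the flattened table does not show which column is empty in the shorter rows.

```python
# Rows copied from the Evidence Matrix above:
# (positive, negative, mixed, null, stated total).
ROWS = {
    "Firm Productivity": (277, 34, 68, 10, 394),
    "AI Safety & Ethics": (117, 177, 44, 24, 364),
    "Task Completion Time": (78, 5, 4, 2, 89),
    "Developer Productivity": (29, 3, 3, 1, 36),
}


def summarize(row):
    """Return the negative share and the gap between the stated total
    and the sum of the four direction columns (hypothetical helper)."""
    positive, negative, mixed, null, total = row
    directional = positive + negative + mixed + null
    return {
        "negative_share": negative / total,
        "unaccounted": total - directional,
    }


SUMMARY = {name: summarize(row) for name, row in ROWS.items()}

if __name__ == "__main__":
    for name, s in SUMMARY.items():
        print(f"{name}: negative share {s['negative_share']:.1%}, "
              f"unaccounted {s['unaccounted']}")
```

For Firm Productivity and AI Safety & Ethics the four direction columns fall short of the stated total by 5 and 2 claims respectively. The page does not say why, so any explanation for the gap would be a guess.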
Filter: Productivity
Audit 5.0 introduces key challenges: data quality and integration issues, complexity and explainability of advanced technologies, regulatory and ethical uncertainty, and skills shortages combined with cultural resistance.
Systematic literature review and synthesis of professional standards and regulatory perspectives; assertions based on reviewed literature rather than a single empirical dataset.
high negative Audit 5.0 and the Digital Transformation of Auditing: The Ro... barriers to adoption/readiness factors (data quality, explainability, regulatory...
At the question level, incorrect chatbot suggestions substantially reduce caseworker accuracy, with a two-thirds reduction on easy questions where the control group performed best.
Question-level analysis from the randomized experiment comparing cases where chatbot suggestions were incorrect versus control; paper reports a ~66% reduction in accuracy on easy questions when chatbot suggestions were incorrect (exact denominators and statistics not provided in the excerpt).
high negative LLMs in social services: How does chatbot accuracy affect hu... caseworker accuracy on easy questions when presented with incorrect chatbot sugg...
The article identifies and lays out several concerns regarding the government's approach to regulating AI.
Analytical critique presented in the paper (legal/policy analysis summarizing potential regulatory shortcomings). Based on the author's review and argumentation rather than primary empirical data.
high negative Regulation and governance of artificial intelligence in Indi... adequacy and risks of the government's AI regulatory approach
Gaps in infrastructure readiness, digital awareness, and inclusive policy frameworks hinder equitable AI adoption among micro‑enterprises.
Cross‑study synthesis of barriers identified across the 55 included articles; infrastructural, awareness, and policy barriers are explicitly reported as recurring themes.
high negative Role of AI in Enhancing Work Efficiency and Opportunities fo... barriers to AI adoption (infrastructure readiness, digital awareness, policy inc...
Japan's population is shrinking, the share of working-age people is falling, and the number of elderly is growing fast.
Statement grounded in official national statistics referenced by the paper (demographic time series used to initialize and calibrate the system dynamics model).
high negative Fiscal Dynamics in Japan under Demographic Pressure total population size; share (%) of working-age population; number and share (%)...
Significant challenges persist for AI-enhanced GS-BESS deployment, including limited data availability, poor model generalization, high computational requirements, scalability issues, and regulatory gaps.
Barriers and limitations identified across the literature as reported in this systematic review (PRISMA-based synthesis). The excerpt does not enumerate which studies reported each barrier or provide prevalence statistics.
high negative Grid-Scale Battery Energy Storage and AI-Driven Intelligent ... Barriers to effective AI application and large-scale GS-BESS deployment (data av...
The sample is limited to Chinese A-share-listed design enterprises (2014–2023), which may limit generalizability to small and medium-sized enterprises (SMEs) or firms in other countries/regions.
Study sample description: A-share-listed design-oriented enterprises in China between 2014 and 2023; authors explicitly note this as a limitation.
high negative AI-driven design management: enhancing organizational produc... External validity / generalizability of results
Using TFP as a proxy for project efficiency aggregates effects at the firm level and therefore lacks micro-level insight into specific project workflows or design iteration processes.
Methodological limitation acknowledged in the paper: TFP is used as a firm-level proxy and the dataset does not include micro-level project workflow or iteration logs.
high negative AI-driven design management: enhancing organizational produc... Granularity of project-efficiency measurement (limitation of TFP proxy)
AI adoption in Slovakia consistently remained below the EU27 average over the 2021–2024 period.
Gap analysis comparing Slovak enterprise AI adoption indicators to EU27 averages using harmonised Eurostat data for 2021–2024.
high negative Artificial Intelligence Adoption and Labour Productivity in ... AI adoption rate among enterprises (Slovakia vs EU27 average)
There exists a systemic governance vacuum around GenAI, including gaps in privacy, accountability, and intellectual property protections.
Authors' synthesis of governance-related gaps reported across the 28 secondary studies and research agendas in the review.
high negative The Landscape of Generative AI in Information Systems: A Syn... adequacy of governance mechanisms for privacy, accountability, and intellectual ...
Societal and ethical risks—such as bias, misuse, and skill erosion—constrain GenAI adoption.
Themes synthesized from the reviewed literature (28 papers) reporting societal and ethical concerns associated with GenAI deployment.
high negative The Landscape of Generative AI in Information Systems: A Syn... societal-ethical risk level associated with GenAI (bias incidence, misuse potent...
Technical unreliability—manifesting as hallucinations and performance drift—is a major constraint on GenAI adoption.
Recurring identification of technical reliability issues (hallucinations, performance drift) in the 28 reviewed papers and authors' aggregation of technical risks.
high negative The Landscape of Generative AI in Information Systems: A Syn... technical reliability of GenAI systems (frequency/severity of hallucinations and...
Adoption of GenAI is constrained by multiple interrelated challenges.
Cross-paper synthesis from the systematic review of 28 studies identifying recurring barriers and constraints reported in the literature.
high negative The Landscape of Generative AI in Information Systems: A Syn... level/extent of GenAI adoption (barriers to adoption)
Ongoing issues include data access, model transparency, ethical concerns, and varying relevance across Global North and Global South contexts.
Critical synthesis within the review drawing on discussions and critiques in the literature about barriers and ethical challenges; based on reported limitations and regional comparisons in reviewed studies (no numerical breakdown provided).
high negative Advancing Urban Analytics: GeoAI Applications in Spatial Dec... barriers to GeoAI adoption and trustworthy use: data accessibility, model interp...
Human judgment is constrained by bounded rationality, cognitive biases, and information-processing limitations.
Cited as established findings from prior research across decision sciences and related fields (extensive literature evidence referenced; no new empirical data in this paper's abstract).
high negative Reframing Organizational Decision-Making in the Age of Artif... human judgment accuracy/quality and cognitive processing capacity
There are significantly negative spatial spillover effects between digital–real integration and New Quality Productive Forces (i.e., each variable has negative spillover impacts on the other across regions).
Spatial spillover coefficients estimated in the GS3SLS spatial simultaneous equations model using panel data for 30 provinces (2011–2022) are reported as statistically significant and negative.
high negative Spatial Interplay Between Digital–Real Integration and New Q... Spatial spillover effects of Digital–Real Integration and New Quality Productive...
AI substitutes many routine tasks, including both manual and cognitive/rule-based activities, disproportionately affecting middle-skill occupations.
Task-based substitution reasoning within SBTC framework and cross-sectoral task analysis. The paper provides conceptual synthesis rather than presenting new microdata or quantified task-level estimates.
high negative Artificial Intelligence, Automation, and Employment Dynamics... employment and wages in routine / middle-skill occupations; task displacement
Key implementation challenges include data quality and integration, model interpretability, cybersecurity and privacy, regulatory/compliance uncertainty, skills gaps among accounting professionals, and implementation costs.
Identified by the paper through literature review and practitioner reports; these are presented as recurring barriers rather than quantified with a specific sample.
high negative Role of Artificial Intelligence in the Accounting Sector incidence/severity of implementation barriers (data quality scores, integration ...
Many studies on serious-game DSTs are small-scale or experimental, and long-term impact data on behavioral change and emissions outcomes are sparse, limiting generalizability.
Review of the literature summarized in the chapter showing predominance of case studies, prototypes, and short-term evaluations rather than longitudinal or large-sample studies.
high negative Serious games and decision support tools: Supporting farmer ... Study scale/sample size, duration of follow-up, evidence on long-term behavior c...
Ensuring scientific validity of game models, scaling co-design processes, measuring real-world behavioral change, and aligning incentives (policy/subsidies, markets) are remaining challenges to using serious games for DST uptake.
Chapter discussion of limitations and gaps identified in the reviewed literature; absence or sparsity of long-term validation studies and large-scale co-design implementations documented in existing research.
high negative Serious games and decision support tools: Supporting farmer ... Model validity (accuracy vs. empirical data), scalability of co-design processes...
Current uptake of DSTs for net zero remains limited because of issues of trust, usability, lack of evidence linking actions to farm profitability, and poor integration into farmer workflows.
Literature synthesis, qualitative interviews and surveys, case studies documenting low adoption and barriers; multiple practice reports and studies cited in the chapter. Many studies report limited or uneven adoption across contexts.
high negative Serious games and decision support tools: Supporting farmer ... DST adoption/use rates; reported barriers (trust, usability, integration)
Regulatory uncertainty around blockchain/DeFi for corporate finance and cross-border data rules is a material risk to adoption.
Paper notes regulatory uncertainty as a risk; no jurisdictional legal analysis or compliance case studies provided in the summary.
high negative Developing Cloud-Based Financial Solutions for The Engineeri... regulatory clarity (existence of applicable rules, legal enforceability of on-ch...
Cybersecurity and data-privacy concerns arise from cloud provider centralization versus blockchain transparency.
Paper highlights this trade-off in its challenges section; discussion-based evidence rather than quantified security assessment in the summary.
high negative Developing Cloud-Based Financial Solutions for The Engineeri... data-privacy risk, exposure due to centralization, privacy vs transparency trade...
Integration complexity with legacy ERPs and heterogeneous vendor ecosystems is a significant implementation challenge.
Paper lists this as a challenge/limitation based on pilot experience and analysis. No quantified measure of integration effort is provided in the summary.
high negative Developing Cloud-Based Financial Solutions for The Engineeri... integration complexity (number/types of legacy systems, integration effort/time/...
EPC projects feature milestone-based payments, complex stakeholder flows, and large working-capital needs that strain traditional on-premise ERPs.
Problem context statement presented in the paper; consistent with commonly reported characteristics of EPC projects. The summary does not cite empirical industry-wide data.
high negative Developing Cloud-Based Financial Solutions for The Engineeri... operational complexity indicators (payment structure: milestone-based; stakehold...
If deployed without mitigation, GenAI CDS risks widening disparities by performing worse on underrepresented groups or being unequally distributed across resource-rich versus resource-poor settings.
Fairness literature, subgroup performance concerns, and distributional risk analysis cited in the paper; direct empirical demonstrations of widened disparities due to GenAI CDS are limited in the literature per the paper.
high negative GenAI and clinical decision making in general practice differences in performance/outcomes across demographic and socioeconomic groups;...
Limited public datasets and vendor lock-in constrain independent reproducible evaluations and audits of current generative models in healthcare.
Observation and policy analysis in the paper noting scarcity of public clinical datasets for state-of-the-art models and proprietary constraints; no dataset counts provided.
high negative GenAI and clinical decision making in general practice availability of public datasets; reproducibility of model evaluations; number of...
GenAI CDS creates data privacy and security risks because of high-value medical data and use of external cloud services.
Known cybersecurity risks and documented incidents in health IT; the paper cites the general risk context rather than specific breach sample counts tied to GenAI deployments.
high negative GenAI and clinical decision making in general practice data breaches; unauthorized access incidents; compliance violations
GenAI CDS can amplify bias and inequities if training data underrepresent groups or reflect historical disparities.
Fairness and robustness audit literature and subgroup performance analyses referenced in the paper; specific empirical demonstrations for contemporary GenAI CDS are limited and sample sizes not given.
high negative GenAI and clinical decision making in general practice performance disparities across demographic subgroups; differential error rates; ...
GenAI CDS systems hallucinate and can produce incorrect but plausible recommendations, which can cause patient harm if trusted unchecked.
Documented failure modes of generative models and examples from controlled evaluations; the paper references known hallucination behavior from model audits and case reports, though it does not quantify incidence rates or provide large-scale observational harm data.
high negative GenAI and clinical decision making in general practice adverse events; erroneous recommendations; clinician reliance/misuse leading to ...
There is limited long-term impact evidence and few system-level assessments of AI in developing-country agriculture.
Authors' methodological caveat based on the temporal scope and types of studies available in the >60-study review.
high negative A systematic review of the economic impact of artificial int... presence/absence of long-term impact evaluations and system-level assessments
The evidence base is skewed toward pilots and high‑performer contexts; there is a lack of long‑panel, multi‑project longitudinal studies to validate typical returns and scalability.
Authors' assessment of evidence types in the 160 studies: mix of conceptual papers, case studies, pilots, and only limited larger empirical evaluations.
high negative Digital Twins Across the Asset Lifecycle: Technical, Organis... representativeness and longitudinal robustness of evidence
Substantial compute and resource requirements for training and inference concentrate capabilities among well‑resourced labs and firms.
Paper discusses large compute budgets for training/inference and states that performance scales with data, model size, and compute; it infers concentration of capabilities but provides no empirical market concentration measures.
high negative Protein structure prediction powered by artificial intellige... distribution of computational capability/resources across organizations and resu...
Structure predictors depend on training data and exhibit biases; experimental validation remains necessary.
Paper notes dependence on training data biases and the need for experimental validation; references data sources (PDB, UniRef, metagenomic catalogs) but does not quantify bias magnitudes.
high negative Protein structure prediction powered by artificial intellige... bias in model predictions attributable to training data coverage/quality; requir...
Current limitations include inaccurate prediction of multi‑chain complexes, flexible or rare conformational states, and limited prediction of dynamic ensembles.
Paper explicitly enumerates these limitations in the 'Ongoing limitations' section; no quantitative failure rates are given.
high negative Protein structure prediction powered by artificial intellige... accuracy for multi‑chain complexes, flexible/rare conformations, and ensemble/dy...
Traditional computational methods struggle without homologous templates or with complex folding/dynamics.
Paper discusses limitations of traditional computational methods, emphasizing dependence on homologous templates and difficulty with complex folding/dynamics; specific method comparisons or sample sizes are not provided.
high negative Protein structure prediction powered by artificial intellige... accuracy/success of traditional computational structure prediction in low‑homolo...
Inequities in climate-AI systems appear across three development phases—Inputs, Process, and Outputs—creating multiple failure points where Global North advantages propagate into final products.
Conceptual framework developed from cross-disciplinary synthesis, literature review, and illustrative examples (Inputs → Process → Outputs mapping).
high negative The Rise of AI in Weather and Climate Information and its Im... Presence of inequities at each phase of the AI development lifecycle (data avail...
Foundation-model development and high-performance computing (HPC) capacity are overwhelmingly located in the Global North.
Descriptive mapping of global HPC infrastructure and foundation-model authorship described in the paper (infrastructure mapping and authorship analysis). No single quantitative sample size reported; evidence based on spatial mapping and documented locations of compute centers and model-development institutions.
high negative The Rise of AI in Weather and Climate Information and its Im... Geographic distribution of HPC capacity and foundation-model development (locati...
Performance degrades when forecasted features are removed from the downstream regression model.
Ablation study results reported in the paper which compare full FutureBoosting against variants without TSFM-generated forecasted features using the same evaluation protocols.
high negative Regression Models Meet Foundation Models: A Hybrid-AI Approa... Increase in MAE (worse forecast error) after removing forecasted features
When pipelines have cross-cutting ties, prices oscillate, allocation quality drops, and management becomes difficult.
Empirical simulation results from the ablation study: configurations with non-hierarchical, cross-cutting graph structures produced larger price volatility, frequent oscillations in price updates, and lower allocation value/throughput compared to hierarchical graphs (measured across many runs and random seeds within the 1,620-run experimental set).
high negative Real-Time AI Service Economy: A Framework for Agentic Comput... price volatility and oscillation frequency; allocation quality (value/throughput...
On the 22 postdating (contamination-free) incidents, no agent achieved end-to-end exploitation success across all 110 agent–incident pairs evaluated.
Empirical evaluation of 110 agent–incident pairs reported in the study (end-to-end exploit attempts on the 22 incidents).
high negative Re-Evaluating EVMBench: Are AI Agents Ready for Smart Contra... end_to_end_exploitation_success_rate (per_agent_per_incident)
The original EVMbench had a data contamination risk because it relied on audit-contest data published before every evaluated model's release, which could have been seen during model training.
Timing relationship between the audit-contest dataset used by EVMbench and the release dates of evaluated models (dataset predated model releases).
high negative Re-Evaluating EVMBench: Are AI Agents Ready for Smart Contra... dataset_contamination_risk (potential_training_data_leakage)
The original EVMbench evaluation was narrow: it evaluated 14 agent configurations and most models were tested only with their vendor-provided scaffold.
Description of the original EVMbench experimental setup (number of agent configurations and scaffold usage) cited in this study.
high negative Re-Evaluating EVMBench: Are AI Agents Ready for Smart Contra... evaluation_breadth (number_of_agent_configurations; scaffold_variety)
There is a risk that NFD will overfit to individual practices and lead to privacy/IP leakage if crystallization is not carefully governed.
Limitations and risk analysis in the paper; conceptual argument and case study discussion raising privacy/IP concerns. No empirical incidence rates provided.
high negative Nurture-First Agent Development: Building Domain-Expert AI A... degree of overfitting to individual practice; instances of privacy/IP leakage
NFD requires sustained practitioner engagement and incentive alignment to be effective.
Limitations and discussion sections of the paper explicitly state this requirement; logical inference from method (human-in-the-loop commercialization and continual crystallization).
high negative Nurture-First Agent Development: Building Domain-Expert AI A... practitioner engagement/time invested
Limitations of the study include reliance on self-reported perceptions (subject to response and survivorship bias), lack of experimental/causal identification, potential non-representative sample, and cross-sectional design limiting inference about long-term productivity effects.
Authors' stated limitations in the paper summary.
high negative Artificial Intelligence as a Catalyst for Innovation in Soft... validity threats (self-report bias, lack of causal design) as reported by author...
A mathematical analysis bounds or relates expected performance loss of the surrogate to measurable distribution mismatch between the training parameter distribution (samples) and the target parameter distribution.
Theoretical derivations presented in the paper that relate performance loss to distribution mismatch; the summary states the analysis provides a measurable diagnostic for when retraining or reweighting is needed.
high negative MCMC Informed Neural Emulators for Uncertainty Quantificatio... expected performance loss (e.g., increase in predictive loss) as a function of d...
The current bottleneck is that quantum and classical resources are disparate and operate in isolation, which leads to manual job orchestration, inefficient scheduling, data-movement overheads, and slow iteration that limit productivity and algorithmic exploration.
Use-case-driven analysis and observations from early hybrid deployments and literature; systems design decomposition highlighting latency and data-staging requirements; no quantitative benchmark data.
high negative Reference Architecture of a Quantum-Centric Supercomputer developer/researcher productivity, iteration latency, scheduling and data-transf...
If the value that matters in deployment is the time-average reward of a single agent, optimizing the usual expected-value objective can lead to poor real-world outcomes.
Reasoning plus the paper's illustrative example demonstrating policies with high expected reward but poor or highly variable realized time-average outcomes; theoretical exposition, no empirical dataset.
high negative Ergodicity in reinforcement learning realized long-run (time-average) reward of deployed agent
Optimizing the expected cumulative reward (ensemble average across trajectories) can be misleading when reward-generating dynamics are non-ergodic because the ensemble expectation does not generally equal the time-average experienced by a single deployed agent.
Theoretical argumentation and a constructive illustrative example in the paper showing divergence between ensemble expectation and single-trajectory time-average; no empirical sample; analysis-based evidence.
high negative Ergodicity in reinforcement learning expected cumulative reward (ensemble expectation) vs. time-average realized rewa...
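
The gap between ensemble expectation and single-trajectory time-average described in the two claims above can be seen in a small simulation. The sketch below uses a standard multiplicative coin-toss process as the illustration; it is not taken from the paper, and the multipliers 1.5 and 0.6, the function names, and the simulation sizes are all assumptions chosen for this example.

```python
import math
import random

# Assumed multipliers for a fair coin: wealth grows by 50% on heads
# and shrinks by 40% on tails. These values are illustrative only.
UP, DOWN = 1.5, 0.6


def ensemble_growth_per_step():
    """Expected one-step multiplier, averaged across many parallel agents."""
    return 0.5 * UP + 0.5 * DOWN


def time_average_growth_per_step():
    """Long-run per-step multiplier along one trajectory (geometric mean)."""
    return math.exp(0.5 * math.log(UP) + 0.5 * math.log(DOWN))


def simulate_final_wealth(steps, rng):
    """Play the coin-toss game for `steps` rounds, starting from wealth 1."""
    wealth = 1.0
    for _ in range(steps):
        wealth *= UP if rng.random() < 0.5 else DOWN
    return wealth


def fraction_below_start(n_agents=2000, steps=200, seed=0):
    """Fraction of simulated agents that end with less than they started."""
    rng = random.Random(seed)
    finals = [simulate_final_wealth(steps, rng) for _ in range(n_agents)]
    return sum(w < 1.0 for w in finals) / n_agents


if __name__ == "__main__":
    print("ensemble growth per step:", ensemble_growth_per_step())
    print("time-average growth per step:", time_average_growth_per_step())
    print("fraction of agents below start:", fraction_below_start())
```

Under these assumed multipliers the expected one-step multiplier is above 1 while the geometric-mean multiplier is below 1, so an objective based on the ensemble expectation looks favorable even though most individual trajectories lose value over time.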