The Commonplace
Home Dashboard Papers Evidence Syntheses Digests 🎲

Evidence (6507 claims)

Adoption
7395 claims
Productivity
6507 claims
Governance
5877 claims
Human-AI Collaboration
5157 claims
Innovation
3492 claims
Org Design
3470 claims
Labor Markets
3224 claims
Skills & Training
2608 claims
Inequality
1835 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 609 159 77 736 1615
Governance & Regulation 664 329 160 99 1273
Organizational Efficiency 624 143 105 70 949
Technology Adoption Rate 502 176 98 78 861
Research Productivity 348 109 48 322 836
Output Quality 391 120 44 40 595
Firm Productivity 385 46 85 17 539
Decision Quality 275 143 62 34 521
AI Safety & Ethics 183 241 59 30 517
Market Structure 152 154 109 20 440
Task Allocation 158 50 56 26 295
Innovation Output 178 23 38 17 257
Skill Acquisition 137 52 50 13 252
Fiscal & Macroeconomic 120 64 38 23 252
Employment Level 93 46 96 12 249
Firm Revenue 130 43 26 3 202
Consumer Welfare 99 51 40 11 201
Inequality Measures 36 105 40 6 187
Task Completion Time 134 18 6 5 163
Worker Satisfaction 79 54 16 11 160
Error Rate 64 78 8 1 151
Regulatory Compliance 69 64 14 3 150
Training Effectiveness 81 15 13 18 129
Wages & Compensation 70 25 22 6 123
Team Performance 74 16 21 9 121
Automation Exposure 41 48 19 9 120
Job Displacement 11 71 16 1 99
Developer Productivity 71 14 9 3 98
Hiring & Recruitment 49 7 8 3 67
Social Protection 26 14 8 2 50
Creative Output 26 14 6 2 49
Skill Obsolescence 5 37 5 1 48
Labor Share of Income 12 13 12 37
Worker Turnover 11 12 3 26
Industry 1 1
Clear
Productivity Remove filter
Insufficient attention to model reliability—particularly uncertainty miscalibration—reduces real-world utility because experimentalists need reliable confidence estimates, not only point predictions.
Survey of literature on uncertainty estimation and calibration (Bayesian NNs, ensembles, temperature scaling, conformal prediction) and papers reporting calibration issues; recommendations drawn from these sources.
high negative Machine Learning-Driven R&D of Perovskites and Spinels: From... calibration of predictive uncertainties (e.g., calibration error, coverage) and ...
Progress of DL-driven materials discovery is limited by scarcity of high-quality, diverse labeled datasets; small, noisy, or biased datasets limit model generalization.
Review and synthesis of empirical studies and methodological papers documenting dataset size/quality issues and their impact on model performance; no new dataset analysis in this paper.
high negative Machine Learning-Driven R&D of Perovskites and Spinels: From... model generalization / predictive performance on out-of-distribution materials o...
Advanced technologies' complexity and lack of explainability create risks for audit reliability and professional judgement.
Findings from literature synthesis and professional/regulatory perspectives included in the review; presented as an identified risk/challenge rather than quantified effect.
high negative Audit 5.0 and the Digital Transformation of Auditing: The Ro... audit reliability and the exercise of professional judgement in presence of opaq...
Audit 5.0 introduces key challenges: data quality and integration issues, complexity and explainability of advanced technologies, regulatory and ethical uncertainty, and skills shortages combined with cultural resistance.
Systematic literature review and synthesis of professional standards and regulatory perspectives; assertions based on reviewed literature rather than a single empirical dataset.
high negative Audit 5.0 and the Digital Transformation of Auditing: The Ro... barriers to adoption/readiness factors (data quality, explainability, regulatory...
At the question level, incorrect chatbot suggestions substantially reduce caseworker accuracy, with a two-thirds reduction on easy questions where the control group performed best.
Question-level analysis from the randomized experiment comparing cases where chatbot suggestions were incorrect versus control; paper reports a ~66% reduction in accuracy on easy questions when chatbot suggestions were incorrect (exact denominators and statistics not provided in the excerpt).
high negative LLMs in social services: How does chatbot accuracy affect hu... caseworker accuracy on easy questions when presented with incorrect chatbot sugg...
The article identifies and lays out several concerns regarding the government's approach to regulating AI.
Analytical critique presented in the paper (legal/policy analysis summarizing potential regulatory shortcomings). Based on the author's review and argumentation rather than primary empirical data.
high negative Regulation and governance of artificial intelligence in Indi... adequacy and risks of the government's AI regulatory approach
Gaps in infrastructure readiness, digital awareness, and inclusive policy frameworks hinder equitable AI adoption among micro‑enterprises.
Cross‑study synthesis of barriers identified across the 55 included articles; infrastructural, awareness, and policy barriers are explicitly reported as recurring themes.
high negative Role of AI in Enhancing Work Efficiency and Opportunities fo... barriers to AI adoption (infrastructure readiness, digital awareness, policy inc...
Japan's population is shrinking, the share of working-age people is falling, and the number of elderly is growing fast.
Statement grounded in official national statistics referenced by the paper (demographic time series used to initialize and calibrate the system dynamics model).
high negative Fiscal Dynamics in Japan under Demographic Pressure total population size; share (%) of working-age population; number and share (%)...
Significant challenges persist for AI-enhanced GS-BESS deployment, including limited data availability, poor model generalization, high computational requirements, scalability issues, and regulatory gaps.
Barriers and limitations identified across the literature as reported in this systematic review (PRISMA-based synthesis). The excerpt does not enumerate which studies reported each barrier or provide prevalence statistics.
high negative Grid-Scale Battery Energy Storage and AI-Driven Intelligent ... Barriers to effective AI application and large-scale GS-BESS deployment (data av...
The sample is limited to Chinese A-share-listed design enterprises (2014–2023), which may limit generalizability to small and medium-sized enterprises (SMEs) or firms in other countries/regions.
Study sample description: A-share-listed design-oriented enterprises in China between 2014 and 2023; authors explicitly note this as a limitation.
high negative AI-driven design management: enhancing organizational produc... External validity / generalizability of results
Using TFP as a proxy for project efficiency aggregates effects at the firm level and therefore lacks micro-level insight into specific project workflows or design iteration processes.
Methodological limitation acknowledged in the paper: TFP is used as a firm-level proxy and the dataset does not include micro-level project workflow or iteration logs.
high negative AI-driven design management: enhancing organizational produc... Granularity of project-efficiency measurement (limitation of TFP proxy)
AI adoption in Slovakia consistently remained below the EU27 average over the 2021–2024 period.
Gap analysis comparing Slovak enterprise AI adoption indicators to EU27 averages using harmonised Eurostat data for 2021–2024.
high negative Artificial Intelligence Adoption and Labour Productivity in ... AI adoption rate among enterprises (Slovakia vs EU27 average)
There exists a systemic governance vacuum around GenAI, including gaps in privacy, accountability, and intellectual property protections.
Authors' synthesis of governance-related gaps reported across the 28 secondary studies and research agendas in the review.
high negative The Landscape of Generative AI in Information Systems: A Syn... adequacy of governance mechanisms for privacy, accountability, and intellectual ...
Societal and ethical risks—such as bias, misuse, and skill erosion—constrain GenAI adoption.
Themes synthesized from the reviewed literature (28 papers) reporting societal and ethical concerns associated with GenAI deployment.
high negative The Landscape of Generative AI in Information Systems: A Syn... societal-ethical risk level associated with GenAI (bias incidence, misuse potent...
Technical unreliability—manifesting as hallucinations and performance drift—is a major constraint on GenAI adoption.
Recurring identification of technical reliability issues (hallucinations, performance drift) in the 28 reviewed papers and authors' aggregation of technical risks.
high negative The Landscape of Generative AI in Information Systems: A Syn... technical reliability of GenAI systems (frequency/severity of hallucinations and...
Adoption of GenAI is constrained by multiple interrelated challenges.
Cross-paper synthesis from the systematic review of 28 studies identifying recurring barriers and constraints reported in the literature.
high negative The Landscape of Generative AI in Information Systems: A Syn... level/extent of GenAI adoption (barriers to adoption)
Ongoing issues remain such as data access, model transparency, ethical concerns, and the varying relevance across Global North and Global South contexts.
Critical synthesis within the review drawing on discussions and critiques in the literature about barriers and ethical challenges; based on reported limitations and regional comparisons in reviewed studies (no numerical breakdown provided).
high negative Advancing Urban Analytics: GeoAI Applications in Spatial Dec... barriers to GeoAI adoption and trustworthy use: data accessibility, model interp...
Human judgment is constrained by bounded rationality, cognitive biases, and information-processing limitations.
Cited as established findings from prior research across decision sciences and related fields (extensive literature evidence referenced; no new empirical data in this paper's abstract).
high negative Reframing Organizational Decision-Making in the Age of Artif... human judgment accuracy/quality and cognitive processing capacity
There are significantly negative spatial spillover effects between digital–real integration and New Quality Productive Forces (i.e., each variable has negative spillover impacts on the other across regions).
Spatial spillover coefficients estimated in the GS3SLS spatial simultaneous equations model using panel data for 30 provinces (2011–2022) are reported as statistically significant and negative.
high negative Spatial Interplay Between Digital–Real Integration and New Q... Spatial spillover effects of Digital–Real Integration and New Quality Productive...
AI substitutes many routine tasks, including both manual and cognitive/rule-based activities, disproportionately affecting middle-skill occupations.
Task-based substitution reasoning within SBTC framework and cross-sectoral task analysis. The paper provides conceptual synthesis rather than presenting new microdata or quantified task-level estimates.
high negative Artificial Intelligence, Automation, and Employment Dynamics... employment and wages in routine / middle-skill occupations; task displacement
Key implementation challenges include data quality and integration, model interpretability, cybersecurity and privacy, regulatory/compliance uncertainty, skills gaps among accounting professionals, and implementation costs.
Identified by the paper through literature review and practitioner reports; these are presented as recurring barriers rather than quantified with a specific sample.
high negative Role of Artificial Intelligence in the Accounting Sector incidence/severity of implementation barriers (data quality scores, integration ...
Many studies on serious-game DSTs are small-scale or experimental, and long-term impact data on behavioral change and emissions outcomes are sparse, limiting generalizability.
Review of the literature summarized in the chapter showing predominance of case studies, prototypes, and short-term evaluations rather than longitudinal or large-sample studies.
high negative Serious games and decision support tools: Supporting farmer ... Study scale/sample size, duration of follow-up, evidence on long-term behavior c...
Ensuring scientific validity of game models, scaling co-design processes, measuring real-world behavioral change, and aligning incentives (policy/subsidies, markets) are remaining challenges to using serious games for DST uptake.
Chapter discussion of limitations and gaps identified in the reviewed literature; absence or sparsity of long-term validation studies and large-scale co-design implementations documented in existing research.
high negative Serious games and decision support tools: Supporting farmer ... Model validity (accuracy vs. empirical data), scalability of co-design processes...
Current uptake of DSTs for net zero remains limited because of issues of trust, usability, lack of evidence linking actions to farm profitability, and poor integration into farmer workflows.
Literature synthesis, qualitative interviews and surveys, case studies documenting low adoption and barriers; multiple practice reports and studies cited in the chapter. Many studies report limited or uneven adoption across contexts.
high negative Serious games and decision support tools: Supporting farmer ... DST adoption/use rates; reported barriers (trust, usability, integration)
Regulatory uncertainty around blockchain/DeFi for corporate finance and cross-border data rules is a material risk to adoption.
Paper notes regulatory uncertainty as a risk; no jurisdictional legal analysis or compliance case studies provided in the summary.
high negative Developing Cloud-Based Financial Solutions for The Engineeri... regulatory clarity (existence of applicable rules, legal enforceability of on-ch...
Cybersecurity and data-privacy concerns arise from cloud provider centralization versus blockchain transparency.
Paper highlights this trade-off in its challenges section; discussion-based evidence rather than quantified security assessment in the summary.
high negative Developing Cloud-Based Financial Solutions for The Engineeri... data-privacy risk, exposure due to centralization, privacy vs transparency trade...
Integration complexity with legacy ERPs and heterogeneous vendor ecosystems is a significant implementation challenge.
Paper lists this as a challenge/limitation based on pilot experience and analysis. No quantified measure of integration effort is provided in the summary.
high negative Developing Cloud-Based Financial Solutions for The Engineeri... integration complexity (number/types of legacy systems, integration effort/time/...
EPC projects feature milestone-based payments, complex stakeholder flows, and large working-capital needs that strain traditional on-premise ERPs.
Problem context statement presented in the paper; consistent with commonly reported characteristics of EPC projects. The summary does not cite empirical industry-wide data.
high negative Developing Cloud-Based Financial Solutions for The Engineeri... operational complexity indicators (payment structure: milestone-based; stakehold...
If deployed without mitigation, GenAI CDS risks widening disparities by performing worse on underrepresented groups or being unequally distributed across resource-rich versus resource-poor settings.
Fairness literature, subgroup performance concerns, and distributional risk analysis cited in the paper; direct empirical demonstrations of widened disparities due to GenAI CDS are limited in the literature per the paper.
high negative GenAI and clinical decision making in general practice differences in performance/outcomes across demographic and socioeconomic groups;...
Limited public datasets and vendor lock-in constrain independent reproducible evaluations and audits of current generative models in healthcare.
Observation and policy analysis in the paper noting scarcity of public clinical datasets for state-of-the-art models and proprietary constraints; no dataset counts provided.
high negative GenAI and clinical decision making in general practice availability of public datasets; reproducibility of model evaluations; number of...
GenAI CDS creates data privacy and security risks because of high-value medical data and use of external cloud services.
Known cybersecurity risks and documented incidents in health IT; the paper cites the general risk context rather than specific breach sample counts tied to GenAI deployments.
high negative GenAI and clinical decision making in general practice data breaches; unauthorized access incidents; compliance violations
GenAI CDS can amplify bias and inequities if training data underrepresent groups or reflect historical disparities.
Fairness and robustness audit literature and subgroup performance analyses referenced in the paper; specific empirical demonstrations for contemporary GenAI CDS are limited and sample sizes not given.
high negative GenAI and clinical decision making in general practice performance disparities across demographic subgroups; differential error rates; ...
GenAI CDS systems hallucinate and can produce incorrect but plausible recommendations, which can cause patient harm if trusted unchecked.
Documented failure modes of generative models and examples from controlled evaluations; the paper references known hallucination behavior from model audits and case reports, though it does not quantify incidence rates or provide large-scale observational harm data.
high negative GenAI and clinical decision making in general practice adverse events; erroneous recommendations; clinician reliance/misuse leading to ...
There is limited long-term impact evidence and few system-level assessments of AI in developing-country agriculture.
Authors' methodological caveat based on the temporal scope and types of studies available in the >60-study review.
high negative A systematic review of the economic impact of artificial int... presence/absence of long-term impact evaluations and system-level assessments
The evidence base is skewed toward pilots and high‑performer contexts; there is a lack of long‑panel, multi‑project longitudinal studies to validate typical returns and scalability.
Authors' assessment of evidence types in the 160 studies: mix of conceptual papers, case studies, pilots, and only limited larger empirical evaluations.
high negative Digital Twins Across the Asset Lifecycle: Technical, Organis... representativeness and longitudinal robustness of evidence
Substantial compute and resource requirements for training and inference concentrate capabilities among well‑resourced labs and firms.
Paper discusses large compute budgets for training/inference and states that performance scales with data, model size, and compute; it infers concentration of capabilities but provides no empirical market concentration measures.
high negative Protein structure prediction powered by artificial intellige... distribution of computational capability/resources across organizations and resu...
Structure predictors depend on training data and exhibit biases; experimental validation remains necessary.
Paper notes dependence on training data biases and the need for experimental validation; references data sources (PDB, UniRef, metagenomic catalogs) but does not quantify bias magnitudes.
high negative Protein structure prediction powered by artificial intellige... bias in model predictions attributable to training data coverage/quality; requir...
Current limitations include inaccurate prediction of multi‑chain complexes, flexible or rare conformational states, and limited prediction of dynamic ensembles.
Paper explicitly enumerates these limitations in the 'Ongoing limitations' section; no quantitative failure rates are given.
high negative Protein structure prediction powered by artificial intellige... accuracy for multi‑chain complexes, flexible/rare conformations, and ensemble/dy...
Traditional computational methods struggle without homologous templates or with complex folding/dynamics.
Paper discusses limitations of traditional computational methods, emphasizing dependence on homologous templates and difficulty with complex folding/dynamics; specific method comparisons or sample sizes are not provided.
high negative Protein structure prediction powered by artificial intellige... accuracy/success of traditional computational structure prediction in low‑homolo...
Inequities in climate-AI systems appear across three development phases—Inputs, Process, and Outputs—creating multiple failure points where Global North advantages propagate into final products.
Conceptual framework developed from cross-disciplinary synthesis, literature review, and illustrative examples (Inputs → Process → Outputs mapping).
high negative The Rise of AI in Weather and Climate Information and its Im... Presence of inequities at each phase of the AI development lifecycle (data avail...
Foundation-model development and high-performance computing (HPC) capacity are overwhelmingly located in the Global North.
Descriptive mapping of global HPC infrastructure and foundation-model authorship described in the paper (infrastructure mapping and authorship analysis). No single quantitative sample size reported; evidence based on spatial mapping and documented locations of compute centers and model-development institutions.
high negative The Rise of AI in Weather and Climate Information and its Im... Geographic distribution of HPC capacity and foundation-model development (locati...
Performance degrades when forecasted features are removed from the downstream regression model.
Ablation study results reported in the paper which compare full FutureBoosting against variants without TSFM-generated forecasted features using the same evaluation protocols.
high negative Regression Models Meet Foundation Models: A Hybrid-AI Approa... Increase in MAE (worse forecast error) after removing forecasted features
When pipelines have cross-cutting ties, prices oscillate, allocation quality drops, and management becomes difficult.
Empirical simulation results from the ablation study: configurations with non-hierarchical, cross-cutting graph structures produced larger price volatility, frequent oscillations in price updates, and lower allocation value/throughput compared to hierarchical graphs (measured across many runs and random seeds within the 1,620-run experimental set).
high negative Real-Time AI Service Economy: A Framework for Agentic Comput... price volatility and oscillation frequency; allocation quality (value/throughput...
On the 22 postdating (contamination-free) incidents, no agent achieved end-to-end exploitation success across all 110 agent–incident pairs evaluated.
Empirical evaluation of 110 agent–incident pairs reported in the study (end-to-end exploit attempts on the 22 incidents).
high negative Re-Evaluating EVMBench: Are AI Agents Ready for Smart Contra... end_to_end_exploitation_success_rate (per_agent_per_incident)
The original EVMbench had a data contamination risk because it relied on audit-contest data published before every evaluated model's release, which could have been seen during model training.
Timing relationship between the audit-contest dataset used by EVMbench and the release dates of evaluated models (dataset predated model releases).
high negative Re-Evaluating EVMBench: Are AI Agents Ready for Smart Contra... dataset_contamination_risk (potential_training_data_leakage)
The original EVMbench evaluation was narrow: it evaluated 14 agent configurations and most models were tested only with their vendor-provided scaffold.
Description of the original EVMbench experimental setup (number of agent configurations and scaffold usage) cited in this study.
high negative Re-Evaluating EVMBench: Are AI Agents Ready for Smart Contra... evaluation_breadth (number_of_agent_configurations; scaffold_variety)
There is a risk that NFD will overfit to individual practices and lead to privacy/IP leakage if crystallization is not carefully governed.
Limitations and risk analysis in the paper; conceptual argument and case study discussion raising privacy/IP concerns. No empirical incidence rates provided.
high negative Nurture-First Agent Development: Building Domain-Expert AI A... degree of overfitting to individual practice; instances of privacy/IP leakage
NFD requires sustained practitioner engagement and incentive alignment to be effective.
Limitations and discussion sections of the paper explicitly state this requirement; logical inference from method (human-in-the-loop commercialization and continual crystallization).
high negative Nurture-First Agent Development: Building Domain-Expert AI A... practitioner engagement/time invested
Limitations of the study include reliance on self-reported perceptions (subject to response and survivorship bias), lack of experimental/causal identification, potential non-representative sample, and cross-sectional design limiting inference about long-term productivity effects.
Authors' stated limitations in the paper summary.
high negative Artificial Intelligence as a Catalyst for Innovation in Soft... validity threats (self-report bias, lack of causal design) as reported by author...
A mathematical analysis bounds or relates expected performance loss of the surrogate to measurable distribution mismatch between the training parameter distribution (samples) and the target parameter distribution.
Theoretical derivations presented in the paper that relate performance loss to distribution mismatch; the summary states the analysis provides a measurable diagnostic for when retraining or reweighting is needed.
high negative MCMC Informed Neural Emulators for Uncertainty Quantificatio... expected performance loss (e.g., increase in predictive loss) as a function of d...