The Commonplace

Evidence (7448 claims)

Adoption: 5267 claims
Productivity: 4560 claims
Governance: 4137 claims
Human-AI Collaboration: 3103 claims
Labor Markets: 2506 claims
Innovation: 2354 claims
Org Design: 2340 claims
Skills & Training: 1945 claims
Inequality: 1322 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 378 | 106 | 59 | 455 | 1007 |
| Governance & Regulation | 379 | 176 | 116 | 58 | 739 |
| Research Productivity | 240 | 96 | 34 | 294 | 668 |
| Organizational Efficiency | 370 | 82 | 63 | 35 | 553 |
| Technology Adoption Rate | 296 | 118 | 66 | 29 | 513 |
| Firm Productivity | 277 | 34 | 68 | 10 | 394 |
| AI Safety & Ethics | 117 | 177 | 44 | 24 | 364 |
| Output Quality | 244 | 61 | 23 | 26 | 354 |
| Market Structure | 107 | 123 | 85 | 14 | 334 |
| Decision Quality | 168 | 74 | 37 | 19 | 301 |
| Fiscal & Macroeconomic | 75 | 52 | 32 | 21 | 187 |
| Employment Level | 70 | 32 | 74 | 8 | 186 |
| Skill Acquisition | 89 | 32 | 39 | 9 | 169 |
| Firm Revenue | 96 | 34 | 22 | | 152 |
| Innovation Output | 106 | 12 | 21 | 11 | 151 |
| Consumer Welfare | 70 | 30 | 37 | 7 | 144 |
| Regulatory Compliance | 52 | 61 | 13 | 3 | 129 |
| Inequality Measures | 24 | 68 | 31 | 4 | 127 |
| Task Allocation | 75 | 11 | 29 | 6 | 121 |
| Training Effectiveness | 55 | 12 | 12 | 16 | 96 |
| Error Rate | 42 | 48 | 6 | | 96 |
| Worker Satisfaction | 45 | 32 | 11 | 6 | 94 |
| Task Completion Time | 78 | 5 | 4 | 2 | 89 |
| Wages & Compensation | 46 | 13 | 19 | 5 | 83 |
| Team Performance | 44 | 9 | 15 | 7 | 76 |
| Hiring & Recruitment | 39 | 4 | 6 | 3 | 52 |
| Automation Exposure | 18 | 17 | 9 | 5 | 50 |
| Job Displacement | 5 | 31 | 12 | | 48 |
| Social Protection | 21 | 10 | 6 | 2 | 39 |
| Developer Productivity | 29 | 3 | 3 | 1 | 36 |
| Worker Turnover | 10 | 12 | 3 | | 25 |
| Skill Obsolescence | 3 | 19 | 2 | | 24 |
| Creative Output | 15 | 5 | 3 | 1 | 24 |
| Labor Share of Income | 10 | 4 | 9 | | 23 |

Automated compliance and credentialing systems raise governance issues (auditability, appeals mechanisms) and risk incorrect automated deregistration if not properly governed.
Governance and algorithmic-risk discussion in the paper; logical argumentation rather than case-based evidence.
high negative Electrotechnical education, institutional complianc... rate of incorrect automated decisions, existence and effectiveness of appeal pro...
The paper models career progression as a continuous function and treats certification gaps as discontinuities that impede labour-market mobility.
Mathematical/conceptual modeling described in the methods (career-progression-as-continuous-function approach); this is a modeling choice reported in the paper rather than an empirical finding.
high negative Electrotechnical education, institutional complianc... labour-market mobility / continuity of career progression (in the conceptual mod...
Industrial robotization (IR) is a robust negative predictor of provincial IWE after controlling for fixed effects and covariates.
Multiple regression specifications using province and year fixed effects and control variables; the negative IR–IWE coefficient remains statistically significant across alternative model specifications (robustness checks reported in the paper).
high negative Can Industrial Robotization Drive Sustainable Industrial Was... Industrial wastewater emissions (IWE)
Adoption of industrial robots substantially reduces industrial wastewater emissions (IWE) across Chinese provinces (2013–2022).
Panel data covering 30 Chinese provinces for 2013–2022 (≈300 province-year observations); fixed-effects regressions with province and year fixed effects and covariates; estimated negative coefficient on provincial IR intensity.
high negative Can Industrial Robotization Drive Sustainable Industrial Was... Industrial wastewater emissions (IWE) at the provincial level
There is limited long-term impact evidence and few system-level assessments of AI in developing-country agriculture.
Authors' methodological caveat based on the temporal scope and types of studies available in the >60-study review.
high negative A systematic review of the economic impact of artificial int... presence/absence of long-term impact evaluations and system-level assessments
The evidence base is skewed toward pilots and high‑performer contexts; there is a lack of long‑panel, multi‑project longitudinal studies to validate typical returns and scalability.
Authors' assessment of evidence types in the 160 studies: mix of conceptual papers, case studies, pilots, and only limited larger empirical evaluations.
high negative Digital Twins Across the Asset Lifecycle: Technical, Organis... representativeness and longitudinal robustness of evidence
Substantial compute and resource requirements for training and inference concentrate capabilities among well‑resourced labs and firms.
Paper discusses large compute budgets for training/inference and states that performance scales with data, model size, and compute; it infers concentration of capabilities but provides no empirical market concentration measures.
high negative Protein structure prediction powered by artificial intellige... distribution of computational capability/resources across organizations and resu...
Structure predictors depend on training data and exhibit biases; experimental validation remains necessary.
Paper notes dependence on training data biases and the need for experimental validation; references data sources (PDB, UniRef, metagenomic catalogs) but does not quantify bias magnitudes.
high negative Protein structure prediction powered by artificial intellige... bias in model predictions attributable to training data coverage/quality; requir...
Current limitations include inaccurate prediction of multi‑chain complexes, flexible or rare conformational states, and limited prediction of dynamic ensembles.
Paper explicitly enumerates these limitations in the 'Ongoing limitations' section; no quantitative failure rates are given.
high negative Protein structure prediction powered by artificial intellige... accuracy for multi‑chain complexes, flexible/rare conformations, and ensemble/dy...
Traditional computational methods struggle without homologous templates or with complex folding/dynamics.
Paper discusses limitations of traditional computational methods, emphasizing dependence on homologous templates and difficulty with complex folding/dynamics; specific method comparisons or sample sizes are not provided.
high negative Protein structure prediction powered by artificial intellige... accuracy/success of traditional computational structure prediction in low‑homolo...
Opacity, bias, and errors in AI systems demand auditing, standards, and governance (algorithmic accountability) to ensure trustworthy assessment.
Synthesis of literature on algorithmic bias and accountability plus policy analysis recommending audits and standards; supported by country cases that discuss governance concerns.
high negative The Future of Assessment: Rethinking Evaluation in an AI-Ass... algorithmic fairness, transparency, and reliability
Student data used by AI vendors raises risks around consent, reuse, commercial exploitation, and other data-privacy concerns.
Policy analysis and literature on data governance, privacy law debates; examples from national policy documents in the comparative cases. No original data on breaches or misuse presented.
high negative The Future of Assessment: Rethinking Evaluation in an AI-Ass... privacy risks and governance of student data
Empirical evaluation of integrated defenses, quantitative cost/benefit analyses, and standardized threat models for VR are research gaps that remain unaddressed in the literature window surveyed (2023–2025).
Authors' stated limitations from their comparative literature review of 31 studies noting an absence of primary empirical validation and quantitative economic analyses in the reviewed corpus.
high negative Securing Virtual Reality: Threat Models, Vulnerabilities, an... presence/absence of empirical validation, cost‑benefit studies, and standard thr...
Immersive VR systems collect continuous multimodal signals (motion tracking, gaze, voice, biometrics) that enable novel inference, spoofing, and manipulation attacks beyond traditional IT threats.
Synthesis of threat descriptions across the 31 reviewed peer‑reviewed studies (2023–2025) documenting sensor modalities and attack vectors; qualitative comparative evaluation of attack surfaces.
high negative Securing Virtual Reality: Threat Models, Vulnerabilities, an... existence and extent of expanded attack surface due to multimodal signal collect...
The Omnibus overlaps substantively with the DSA and other digital policies, creating potential jurisdictional and interpretive ambiguities about which rules apply to platforms and AI-enabled services.
Comparative mapping and legal/regulatory review identifying overlapping provisions; qualitative analysis of proposed texts (no quantitative sample).
high negative The Digital Omnibus and the Future of EU Regulation: Implica... jurisdictional/interpretive clarity of applicable rules for platforms and AI ser...
Pakistan prioritizes economic and digital governance objectives, with comparatively weak governance of military AI.
Review of Pakistan’s economic and digital governance plans, export‑control materials, and secondary literature on Pakistan’s civil–military relations.
high negative Regulating AI in National Security: A Comparative S... strength and formality of military AI governance
Large-scale machine learning enables invisible inferences about users from seemingly innocuous data.
Conceptual claim presented in the workshop and supported by referenced technical literature on inference capabilities of ML models (discussion in position papers); workshop itself did not present a new empirical experiment.
high negative Moving Beyond Clicks: Rethinking Consent and User Control in... privacy risk from inferred attributes (inference accuracy / presence of invisibl...
Inequities in climate-AI systems appear across three development phases—Inputs, Process, and Outputs—creating multiple failure points where Global North advantages propagate into final products.
Conceptual framework developed from cross-disciplinary synthesis, literature review, and illustrative examples (Inputs → Process → Outputs mapping).
high negative The Rise of AI in Weather and Climate Information and its Im... Presence of inequities at each phase of the AI development lifecycle (data avail...
Foundation-model development and high-performance computing (HPC) capacity are overwhelmingly located in the Global North.
Descriptive mapping of global HPC infrastructure and foundation-model authorship described in the paper (infrastructure mapping and authorship analysis). No single quantitative sample size reported; evidence based on spatial mapping and documented locations of compute centers and model-development institutions.
high negative The Rise of AI in Weather and Climate Information and its Im... Geographic distribution of HPC capacity and foundation-model development (locati...
Ambiguity about the probability of data leaks (a 10–50% range) reduces user adoption of AI personalization relative to a neutral privacy presentation.
Between-subjects online experiment, 2 (information environment: Risk vs Ambiguity) × 3 (privacy-treatment conditions), N = 610 participants randomized across arms. Leak-probability ambiguity presented as a 10–50% range; adoption (choice of personalized vs standard basket) was measured and privacy-threatening conditions under ambiguity produced a statistically significant reduction in adoption compared to neutral.
high negative The Data-Dollars Tradeoff: Privacy Harms vs. Economic Risk i... Adoption choice: proportion choosing AI-personalized basket versus standard bask...
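A between-subjects comparison like this one is typically analyzed with a two-proportion z-test. The sketch below is illustrative only: the summary reports N = 610 and a significant reduction, not the per-arm adoption counts, so the counts and the `two_prop_ztest` helper here are hypothetical.

```python
from math import sqrt, erf

def two_prop_ztest(x1, n1, x2, n2):
    """Two-sided z-test for a difference in proportions (pooled SE)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal CDF via erf.
    pval = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return p1 - p2, z, pval

# Hypothetical cell counts: adopters of the personalized basket in the
# neutral arm vs. the ambiguity (10-50% leak range) arm.
diff, z, p = two_prop_ztest(180, 305, 140, 305)
print(f"adoption difference = {diff:.3f}, z = {z:.2f}, p = {p:.4f}")
```

With these made-up counts the ambiguity arm adopts about 13 percentage points less often, and the difference clears conventional significance thresholds.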
Rank stability analysis across the whole citation distribution shows instability not only at the tail but across frequently cited domains; rankings shift substantially across samples.
Distribution-wide rank-stability methods applied to repeated-sample citation data from the three platforms and three topics, comparing domain ranks across samples and quantifying rank-change frequency and magnitude.
high negative Quantifying Uncertainty in AI Visibility: A Statistical Fram... rank stability of domains by citation frequency across repeated samples
Bootstrap-based confidence intervals show wide uncertainty: many domain-level differences that look meaningful in single-run snapshots fall within measurement noise.
Bootstrap resampling applied to repeated-sample data (collected across nine days and high-frequency sampling) to compute confidence intervals for citation shares and prevalence; many pairwise or between-domain differences were not statistically separable once CIs were considered.
high negative Quantifying Uncertainty in AI Visibility: A Statistical Fram... width of bootstrap confidence intervals for domain citation shares / prevalence ...
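A percentile bootstrap over repeated runs can be sketched as follows; the run indicators are hypothetical and `bootstrap_ci` is an illustrative helper, not the authors' code.

```python
import random

def bootstrap_ci(samples, stat, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(samples)."""
    rng = random.Random(seed)
    reps = sorted(
        stat([rng.choice(samples) for _ in samples]) for _ in range(n_boot)
    )
    lo = reps[int(alpha / 2 * n_boot)]
    hi = reps[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

def share(xs):
    return sum(xs) / len(xs)

# Hypothetical repeated-query results: 1 if a given domain was cited in
# that run, 0 otherwise, over 40 runs of the same query.
runs = [1] * 14 + [0] * 26
lo, hi = bootstrap_ci(runs, share)
print(f"share = {share(runs):.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

The point is the interval's width: with 40 runs, two domains whose single-run shares differ by ten points can easily have overlapping intervals.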
Single-run point estimates of citation share or prevalence are misleading; visibility metrics should be treated as estimators with uncertainty and reported with confidence intervals.
Comparison of single-run snapshots to distributions obtained from repeated sampling (daily and 10-minute interval regimes) and bootstrap resampling showing wide sample-to-sample variation and wide CI widths for domain-level shares and prevalence metrics.
high negative Quantifying Uncertainty in AI Visibility: A Statistical Fram... bias/precision of single-run estimates of domain citation share and prevalence
Generative search platforms are non-deterministic: the same query at different times can yield different answers and different cited domains.
Repeated-query experiments performed on three platforms (Perplexity Search, OpenAI SearchGPT, Google Gemini) across three consumer-product topics, using multi-day sampling (one collection per day over nine days) and high-frequency sampling (repeated queries at 10-minute intervals); observed variation in responses and cited domains across runs.
high negative Quantifying Uncertainty in AI Visibility: A Statistical Fram... response variability (changes in generated answers) and cited domains per query
Performance degrades when forecasted features are removed from the downstream regression model.
Ablation study results reported in the paper which compare full FutureBoosting against variants without TSFM-generated forecasted features using the same evaluation protocols.
high negative Regression Models Meet Foundation Models: A Hybrid-AI Approa... Increase in MAE (worse forecast error) after removing forecasted features
Despite LoRA being parameter-efficient, fine-tuning and iterative human-in-the-loop workflows still require compute resources and researcher time; governance/versioning of tuned models is necessary.
Caveat stated in the paper about remaining computational and governance costs; no quantitative resource usage reported in the summary.
high negative THETA: A Textual Hybrid Embedding-based Topic Analysis Frame... compute/resource requirements and governance burden
Embedding fine-tuning (DAFT) risks amplifying domain-specific biases present in the tuning corpus, so domain experts and robust evaluation protocols are necessary.
Paper caveat noting bias-amplification risk from fine-tuning embeddings; aligns with known risks in the literature but no empirical bias audit results provided in the summary.
high negative THETA: A Textual Hybrid Embedding-based Topic Analysis Frame... amplification of biases in tuned embeddings / need for bias mitigation
Mean emotional self-alignment between poster and responder is 32.7%, indicating systematic affective mismatch rather than congruence.
Pairwise comparison of emotion labels across post–response pairs in the dataset; computation of mean percentage where poster and immediate responder share the same emotion (32.7%).
high negative What Do AI Agents Talk About? Emergent Communication Structu... percentage of post–response pairs with identical emotion labels (emotional self-...
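The metric itself is a simple match rate over post-response pairs; a minimal sketch with hypothetical emotion labels (not the paper's data):

```python
# Hypothetical emotion labels for six post-response pairs.
pairs = [("joy", "joy"), ("anger", "sadness"), ("joy", "neutral"),
         ("fear", "fear"), ("sadness", "anger"), ("neutral", "neutral")]

def self_alignment(pairs):
    """Fraction of post-response pairs with identical emotion labels."""
    return sum(a == b for a, b in pairs) / len(pairs)

print(f"self-alignment = {self_alignment(pairs):.1%}")  # 3 of 6 match: 50.0%
```

In the paper this quantity averages 32.7% over the full dataset, i.e. roughly two of every three responses carry a different emotion than the post they answer.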
Conversational coherence declines rapidly with thread depth, indicating shallow, weakly connected multi-turn exchanges.
Lexical-semantic coherence metrics (e.g., embedding-based similarity) computed across comment threads of varying depth in the Moltbook dataset; observed rapid decrease in coherence scores as thread depth increases.
high negative What Do AI Agents Talk About? Emergent Communication Structu... coherence (similarity) metric as a function of thread depth
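Embedding-based coherence of this kind reduces to cosine similarity between a root post and replies at increasing depth. The vectors below are toy stand-ins; a real pipeline would use a sentence encoder.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

# Hypothetical embeddings for a root post and replies at depths 1-3,
# chosen so topical overlap with the root fades with depth.
root = [0.9, 0.1, 0.2]
replies = {1: [0.8, 0.2, 0.3], 2: [0.5, 0.5, 0.4], 3: [0.1, 0.9, 0.6]}

for depth, vec in replies.items():
    print(f"depth {depth}: coherence = {cosine(root, vec):.2f}")
```

A rapid fall-off in this similarity as depth grows is what the paper reports as shallow, weakly connected exchanges.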
When pipelines have cross-cutting ties, prices oscillate, allocation quality drops, and management becomes difficult.
Empirical simulation results from the ablation study: configurations with non-hierarchical, cross-cutting graph structures produced larger price volatility, frequent oscillations in price updates, and lower allocation value/throughput compared to hierarchical graphs (measured across many runs and random seeds within the 1,620-run experimental set).
high negative Real-Time AI Service Economy: A Framework for Agentic Comput... price volatility and oscillation frequency; allocation quality (value/throughput...
On the 22 postdating (contamination-free) incidents, no agent achieved end-to-end exploitation success across all 110 agent–incident pairs evaluated.
Empirical evaluation of 110 agent–incident pairs reported in the study (end-to-end exploit attempts on the 22 incidents).
high negative Re-Evaluating EVMBench: Are AI Agents Ready for Smart Contra... end_to_end_exploitation_success_rate (per_agent_per_incident)
The original EVMbench had a data contamination risk because it relied on audit-contest data published before every evaluated model's release, which could have been seen during model training.
Timing relationship between the audit-contest dataset used by EVMbench and the release dates of evaluated models (dataset predated model releases).
high negative Re-Evaluating EVMBench: Are AI Agents Ready for Smart Contra... dataset_contamination_risk (potential_training_data_leakage)
The original EVMbench evaluation was narrow: it evaluated 14 agent configurations and most models were tested only with their vendor-provided scaffold.
Description of the original EVMbench experimental setup (number of agent configurations and scaffold usage) cited in this study.
high negative Re-Evaluating EVMBench: Are AI Agents Ready for Smart Contra... evaluation_breadth (number_of_agent_configurations; scaffold_variety)
There is a risk that NFD will overfit to individual practices and lead to privacy/IP leakage if crystallization is not carefully governed.
Limitations and risk analysis in the paper; conceptual argument and case study discussion raising privacy/IP concerns. No empirical incidence rates provided.
high negative Nurture-First Agent Development: Building Domain-Expert AI A... degree of overfitting to individual practice; instances of privacy/IP leakage
NFD requires sustained practitioner engagement and incentive alignment to be effective.
Limitations and discussion sections of the paper explicitly state this requirement; logical inference from method (human-in-the-loop commercialization and continual crystallization).
high negative Nurture-First Agent Development: Building Domain-Expert AI A... practitioner engagement/time invested
Stated limitations include reliance on self-reported perceptions (subject to response and survivorship bias), the absence of experimental or causal identification, a potentially non-representative sample, and a cross-sectional design that limits inference about long-term productivity effects.
Authors' stated limitations in the paper summary.
high negative Artificial Intelligence as a Catalyst for Innovation in Soft... validity threats (self-report bias, lack of causal design) as reported by author...
A mathematical analysis bounds or relates expected performance loss of the surrogate to measurable distribution mismatch between the training parameter distribution (samples) and the target parameter distribution.
Theoretical derivations presented in the paper that relate performance loss to distribution mismatch; the summary states the analysis provides a measurable diagnostic for when retraining or reweighting is needed.
high negative MCMC Informed Neural Emulators for Uncertainty Quantificatio... expected performance loss (e.g., increase in predictive loss) as a function of d...
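The summary does not state the paper's exact inequality, but bounds of this general shape follow from standard arguments. One common version, assuming a bounded loss, controls the expected-performance gap between the target parameter distribution p and the training distribution q by their total-variation distance:

```latex
\bigl|\,\mathbb{E}_{\theta \sim p}[\ell(\theta)] - \mathbb{E}_{\theta \sim q}[\ell(\theta)]\,\bigr|
  \;\le\; 2\,\|\ell\|_{\infty}\,\mathrm{TV}(p,q),
\qquad
\mathrm{TV}(p,q) \;=\; \tfrac{1}{2}\int \bigl|p(\theta) - q(\theta)\bigr|\,d\theta .
```

Because the mismatch term can be estimated from samples, a bound of this form doubles as the diagnostic the paper describes: when the estimated mismatch grows, retraining or reweighting is warranted.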
Neural estimators are less interpretable than closed-form or equilibrium-based estimators, which matters for policy applications and audits.
Conceptual claim/caveat: reasoning about model interpretability and regulatory transparency; not an empirical measurement in the summary.
high negative ForwardFlow: Simulation only statistical inference using dee... interpretability / transparency (qualitative)
Estimator performance depends on the fidelity of the simulation model to real data; misspecified simulation-generating processes can yield misleading estimates.
Methodological caveat: conceptual argument and standard concern about simulation-based inference; no specific empirical counterexamples provided in the summary, but stated as an important limitation.
high negative ForwardFlow: Simulation only statistical inference using dee... external validity / susceptibility to model misspecification (qualitative claim ...
MSE-trained point-estimator networks do not directly provide calibrated interval estimates or valid standard errors; integrating conditional density estimators or bootstrap-calibration is needed for uncertainty quantification.
Methodological caveat: logical/statistical argument and recommendation based on the fact that training with MSE produces point estimates; no empirical demonstration in the summary, but the limitation follows from standard statistical principles.
high negative ForwardFlow: Simulation only statistical inference using dee... availability of calibrated uncertainty quantification (absence of calibrated int...
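One of the recommended remedies, bootstrap calibration of a point estimator, can be sketched as follows. The Gaussian-mean setup is a toy stand-in for an MSE-trained network, not the paper's model.

```python
import random

def parametric_bootstrap_se(data, estimator, simulate, n_boot=1000, seed=0):
    """Parametric bootstrap: refit the estimator on data simulated under
    the point estimate; the spread of the refits estimates the SE."""
    rng = random.Random(seed)
    theta_hat = estimator(data)
    reps = [estimator(simulate(theta_hat, len(data), rng)) for _ in range(n_boot)]
    m = sum(reps) / len(reps)
    se = (sum((r - m) ** 2 for r in reps) / (len(reps) - 1)) ** 0.5
    return theta_hat, se

def sample_mean(xs):
    return sum(xs) / len(xs)

def simulate_norm(theta, n, rng):
    return [rng.gauss(theta, 1.0) for _ in range(n)]

# Toy problem: estimate the mean of Gaussian draws with known unit noise;
# the true parameter is 2.0, so the true SE is 1/sqrt(100) = 0.1.
rng0 = random.Random(1)
data = [rng0.gauss(2.0, 1.0) for _ in range(100)]
theta, se = parametric_bootstrap_se(data, sample_mean, simulate_norm)
print(f"estimate = {theta:.2f} +/- {1.96 * se:.2f}")
```

The same wrapper applies to a trained point-estimator network by substituting the network for `sample_mean` and the simulator for `simulate_norm`.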
Basic/minimal BSBM architectures (without ancilla modes or generalized postprocessing) are not universal generative models.
Analytical proof/argument in the paper demonstrating non-universality of the minimal BSBM architecture; theoretical reasoning about expressive limitations of the plain model family (no empirical sample size).
high negative Universality of Classically Trainable, Quantum-Deployed Boso... generative universality / expressive power (failure of universality)
Current bottlenecks are disparate quantum and classical resources operating in isolation, causing manual job orchestration, inefficient scheduling, data-movement overheads, and slow iteration that limit productivity and algorithmic exploration.
Use-case-driven analysis and observations from early hybrid deployments and literature; systems design decomposition highlighting latency and data-staging requirements; no quantitative benchmark data.
high negative Reference Architecture of a Quantum-Centric Supercomputer developer/researcher productivity, iteration latency, scheduling and data-transf...
If the value of deployment is the time-average reward realized by a single agent, optimizing the usual expected-value objective can lead to poor real-world outcomes.
Reasoning plus the paper's illustrative example demonstrating policies with high expected reward but poor or highly variable realized time-average outcomes; theoretical exposition, no empirical dataset.
high negative Ergodicity in reinforcement learning realized long-run (time-average) reward of deployed agent
Optimizing the expected cumulative reward (ensemble average across trajectories) can be misleading when reward-generating dynamics are non-ergodic because the ensemble expectation does not generally equal the time-average experienced by a single deployed agent.
Theoretical argumentation and a constructive illustrative example in the paper showing divergence between ensemble expectation and single-trajectory time-average; no empirical sample; analysis-based evidence.
high negative Ergodicity in reinforcement learning expected cumulative reward (ensemble expectation) vs. time-average realized rewa...
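The divergence can be reproduced with a textbook multiplicative-reward example (the numbers below are illustrative, not taken from the paper):

```python
# Each step multiplies the agent's wealth by 1.5 or 0.6 with equal
# probability (hypothetical factors).
up, down = 1.5, 0.6

# Ensemble average: expected one-step growth factor across trajectories.
ensemble_growth = 0.5 * up + 0.5 * down   # > 1, so the objective looks good

# Time average: a single long trajectory compounds the factors, so its
# per-step growth is their geometric mean.
time_avg_growth = (up * down) ** 0.5      # < 1, so one deployed agent decays

print(ensemble_growth, time_avg_growth)
```

The expected-value objective sees growth of about 5% per step, yet almost every individual trajectory shrinks over time, which is exactly the non-ergodic gap the paper constructs.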
A small linear spatial disadvantage requires an exponentially larger population to obtain the same probability of early discovery (scaling relation).
Analytic scaling result derived from extreme-value analysis of first-passage times in the model, with confirmation by numerical simulations (stochastic realizations; number of runs not specified). The result is internal to the theoretical model.
high negative Macroscopic Dominance from Microscopic Extremes: Symmetry Br... population size required to match probability of early discovery (or probability...
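The mechanism can be sketched with standard extreme-value reasoning, here assuming an exponential left tail for the first-passage times (the paper's exact model may differ). The earliest discovery among N independent searchers satisfies

```latex
P\Bigl(\min_{i \le N} T_i \le t\Bigr)
  = 1 - \bigl(1 - F(t)\bigr)^{N}
  \approx 1 - e^{-N F(t)},
\qquad
F(t) \sim C\, e^{(t-\mu)/\beta} \quad (t \ll \mu).
```

A linear disadvantage that shifts the location from μ to μ + δ multiplies F(t) by e^{-δ/β}, so holding the early-discovery probability fixed requires inflating the population to N' = N e^{δ/β}: exponential in the linear gap δ.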
Standard RLHF expected-cost constraints ignore distributional shape and can fail under heavy tails or rare catastrophic events.
Analytic/motivating argument presented in the paper contrasting expectation-based constraints with distributional behavior; illustrative examples and discussion of heavy-tailed and rare-event failure modes (no sample-size or dataset details provided in the summary).
high negative Safe RLHF Beyond Expectation: Stochastic Dominance for Unive... safety cost distribution properties (tail probability of high-cost/unsafe rollou...
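The failure mode is easy to exhibit with two hypothetical cost distributions that an expectation-only constraint cannot distinguish:

```python
# Two safety-cost distributions as (cost, probability) pairs with equal
# expected cost but very different tails (illustrative numbers).
routine = [(1.0, 0.5), (3.0, 0.5)]        # bounded, no extreme outcomes
heavy   = [(0.0, 0.99), (200.0, 0.01)]    # rare catastrophic outcome

def mean(dist):
    return sum(v * p for v, p in dist)

def tail(dist, c):
    """P(cost > c): the quantity an expected-cost constraint ignores."""
    return sum(p for v, p in dist if v > c)

print(mean(routine), mean(heavy))            # expectations match
print(tail(routine, 10.0), tail(heavy, 10.0))  # tail risks do not
```

Both distributions satisfy the same expected-cost budget, yet only the second carries a 1% chance of a catastrophic rollout, which motivates the paper's move to distribution-aware (stochastic-dominance) constraints.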
Improving explainability can trade off with predictive performance, privacy, and robustness; these trade-offs must be managed rather than ignored.
Review aggregates technical literature and conceptual analyses documenting trade-offs reported by researchers (e.g., simpler interpretable models sometimes having lower predictive accuracy; disclosure risks to privacy; robustness concerns). No single causal estimate provided.
high negative Explainable AI in High-Stakes Domains: Improving Trust, Tran... predictive performance, privacy risk, model robustness
The evidence base presented is limited to a single SME pilot, so generalizability across sectors, firm sizes, and data regimes is untested and requires further research.
Explicit limitation noted in the paper and the fact that the pilot illustrated is a single case study (sample size = 1 SME pilot).
high negative ALGORITHM FOR IMPLEMENTING AI IN THE MANAGEMENT LOOP OF SMES... external validity / generalizability of results beyond the single pilot
Tasks that are routine, repetitive, or pattern‑based (e.g., boilerplate coding, refactoring, unit test generation, some accessibility fixes) will be increasingly automated by AI.
Task‑level decomposition and examples of current automation capabilities (code generation, test suggestion tools); conceptual projection rather than empirical measurement.
high negative How AI Will Transform the Daily Life of a Techie within 5 Ye... rate of automation for routine software development tasks (proportion of such ta...
Common barriers to effective RM implementation include siloed functions/weak coordination, limited resources or expertise, poor data quality/lack of metrics, and cultural resistance driven by short-term incentives.
Frequent identification of these barriers across the reviewed literature and practitioner sources synthesized via thematic analysis over the last ten years.
high negative The Role of Risk Management as an Organizational Management ... barriers to RM adoption/implementation; likelihood of successful RM