The Commonplace

Evidence (13870 claims)

Adoption
8467 claims
Productivity
7558 claims
Governance
6805 claims
Human-AI Collaboration
6363 claims
Org Design
4132 claims
Innovation
4065 claims
Labor Markets
3526 claims
Skills & Training
2945 claims
Inequality
2066 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome                     Positive  Negative  Mixed  Null  Total
Other                            749       196     98   892   1984
Governance & Regulation          817       394    188   121   1544
Organizational Efficiency        771       189    124    83   1177
Technology Adoption Rate         627       233    123    96   1088
Research Productivity            411       123     56   332    933
Output Quality                   467       178     59    47    751
Decision Quality                 320       174     75    42    618
Firm Productivity                435        55     88    20    604
AI Safety & Ethics               214       276     65    33    593
Market Structure                 178       167    122    24    496
Task Allocation                  207        64     71    32    379
Skill Acquisition                165        59     60    17    301
Innovation Output                203        27     43    18    292
Employment Level                 105        52    107    13    279
Fiscal & Macroeconomic           131        69     43    26    276
Consumer Welfare                 116        63     42    11    232
Firm Revenue                     150        48     26     3    227
Inequality Measures               44       122     49     6    221
Task Completion Time             169        29      8    12    219
Worker Satisfaction               89        63     20    12    184
Error Rate                        69        92     10     2    173
Regulatory Compliance             76        68     14     5    163
Training Effectiveness            93        21     13    19    148
Wages & Compensation              77        36     25     6    144
Automation Exposure               51        54     22    12    142
Team Performance                  86        17     27     9    140
Developer Productivity            94        17     14     6    132
Job Displacement                  12        80     20     1    113
Hiring & Recruitment              51         7      8     3     69
Creative Output                   31        17      7     3     59
Skill Obsolescence                 5        46      6     1     58
Social Protection                 27        16      8     2     53
Labor Share of Income 17 17 17 51
Worker Turnover 11 12 3 26
Industry 1 1
The paper proposes a 'manufacturing operation tree'—an organizationally structured framework—to guide development of more realistic, validated, and industry‑relevant simulation models.
Conceptual/modeling output in the paper (diagram and explanation of the manufacturing operation tree); theoretical development rather than empirical testing.
high positive A Review of Manufacturing Operations Research Integration in... guidance for simulation model design, potential for improved model realism and v...
Standardizing datasets, benchmarks, and evaluation protocols (including real-time metrics and resource/latency measurements) is necessary to improve comparability and deployment relevance.
Surveyed inconsistencies and methodological shortcomings motivate the recommendation for standardization; many papers call for better benchmarks.
high positive International Journal on Cybernetics & Informatics comparability of evaluations and measurement of deployment-relevant metrics
Hybrid architectures combining rule-based filters with ML classifiers and ensembles are used to improve detection performance and reduce false positives.
Comparative analysis and examples from the literature where multi-stage or hybrid pipelines are proposed and evaluated.
high positive International Journal on Cybernetics & Informatics false positive rate / overall detection performance
Econometric and causal-inference tools (difference-in-differences, instrumental variables, randomized encouragement designs) are needed to estimate long-term effects of personalized robot interventions.
Recommended methodological agenda for AI economists in the paper; no applied causal studies presented.
high positive Reimagining Social Robots as Recommender Systems: Foundation... causal estimates of long-term intervention effects (treatment effect sizes, iden...
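Of the tools named in this claim, difference-in-differences is the simplest to illustrate. The sketch below assumes a two-group, two-period design; the function name and the well-being scores are hypothetical and are not taken from the paper, which presents no applied causal studies.

```python
from statistics import mean

def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Two-by-two difference-in-differences estimate.

    Subtracts the control group's change from the treated group's change,
    which removes any time trend shared by both groups (this relies on the
    usual parallel-trends assumption).
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change

# Hypothetical well-being scores before and after a robot intervention.
effect = diff_in_diff(
    treated_pre=[5.0, 6.0], treated_post=[7.0, 8.0],
    control_pre=[5.0, 5.0], control_post=[6.0, 6.0],
)
print(effect)
```

A real study would estimate this in a regression with standard errors and covariates; the arithmetic above only shows what the estimand is.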
Research and deployment will require new datasets: longitudinal multimodal interaction logs, user preference surveys, simulated user populations, and ethically annotated datasets for fairness and safety evaluation.
Data & Methods recommendations based on identified empirical needs; no dataset release or analysis in this paper.
high positive Reimagining Social Robots as Recommender Systems: Foundation... availability and quality of recommended datasets (longitudinality, multimodality...
Measuring welfare impact of personalized robots requires going beyond engagement to include non-market outcomes such as well-being, autonomy, and mental health.
Methodological recommendation in the implications and evaluation sections; no empirical measures provided.
high positive Reimagining Social Robots as Recommender Systems: Foundation... welfare metrics (well-being scores, autonomy measures, mental health assessments...
A/B testing and longitudinal field studies are necessary for real-world validation of robot personalization, and metrics should include welfare-oriented outcomes (well-being, trust) in addition to engagement.
Recommended evaluation strategy drawing from HRI and RS experimental standards; no field trials reported in this work.
high positive Reimagining Social Robots as Recommender Systems: Foundation... welfare metrics (well-being, trust), engagement metrics, long-term behavioral ch...
Prior to live trials, offline RS evaluation metrics (precision/recall, NDCG), counterfactual/off-policy estimators, and simulated users should be used to validate personalization policies.
Methodological recommendation based on RS evaluation practices; no empirical comparison with live trials in robots presented.
high positive Reimagining Social Robots as Recommender Systems: Foundation... reliability of offline evaluation (correlation with online performance), risk re...
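The offline ranking metrics named in this claim can be computed from a ranked list and a set of relevance judgments. This is a minimal sketch using the standard definitions of precision@k, recall@k, and NDCG@k; the robot action names and relevance labels are hypothetical.

```python
import math

def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked items that are relevant."""
    return sum(1 for item in ranked[:k] if item in relevant) / k

def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for item in ranked[:k] if item in relevant) / len(relevant)

def dcg_at_k(gains, k):
    """Discounted cumulative gain: gain at rank i (0-based) is divided by log2(i + 2)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))

def ndcg_at_k(ranked, gain_by_item, k):
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    gains = [gain_by_item.get(item, 0.0) for item in ranked]
    ideal = sorted(gain_by_item.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical ranking of robot behaviors for one user session.
ranked = ["greet", "joke", "remind", "music"]
relevant = {"greet", "music"}
gain_by_item = {"greet": 1.0, "music": 1.0}

print(precision_at_k(ranked, relevant, 2))
print(recall_at_k(ranked, relevant, 2))
print(ndcg_at_k(ranked, gain_by_item, 4))
```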
Contextual bandits and counterfactual/off-policy learning can enable safe exploration and off-policy evaluation when adapting robot interactions from logged data.
Methodological synthesis referencing contextual bandit and counterfactual learning techniques from RS and causal inference; no robotic implementation experiments reported.
high positive Reimagining Social Robots as Recommender Systems: Foundation... safe exploration trade-offs (regret), off-policy evaluation accuracy (e.g., IPS/...
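Off-policy evaluation from logged data is commonly done with an inverse propensity scoring (IPS) estimator. The sketch below assumes each log record stores the probability with which the logging policy chose its action; the record layout, policy, and action names are hypothetical, and the paper reports no robotic implementation.

```python
def ips_estimate(logs, target_prob):
    """Inverse propensity scoring estimate of a target policy's average reward.

    logs: list of (context, action, reward, logging_prob) tuples, where
          logging_prob is the probability the logging policy gave to `action`.
    target_prob: function (context, action) -> probability under the target policy.
    """
    total = 0.0
    for context, action, reward, logging_prob in logs:
        # Reweight each logged reward by how much more (or less) likely the
        # target policy is to take the logged action than the logging policy was.
        weight = target_prob(context, action) / logging_prob
        total += weight * reward
    return total / len(logs)

# Hypothetical logs from a robot that chose uniformly between two actions.
logs = [
    ("morning", "encourage", 1.0, 0.5),
    ("morning", "remind", 0.0, 0.5),
    ("evening", "encourage", 1.0, 0.5),
    ("evening", "remind", 1.0, 0.5),
]

def always_encourage(context, action):
    """Hypothetical target policy that always picks 'encourage'."""
    return 1.0 if action == "encourage" else 0.0

print(ips_estimate(logs, always_encourage))
```

IPS is unbiased only when the logging policy gives nonzero probability to every action the target policy might take, and its variance grows when logging probabilities are small.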
Sequence-aware recommenders (RNNs, Transformers, Markov/session-based models) are suitable for modeling session dynamics and short-term preference shifts in robot interactions.
Survey of sequence/temporal RS models and their typical use cases; conceptual recommendation only.
high positive Reimagining Social Robots as Recommender Systems: Foundation... session-level prediction accuracy, short-term preference prediction performance
RS tooling covers long-term user profiles, short-term/session signals, context-awareness, multi-objective ranking, and evaluation methods suited for personalization at scale.
Review of recommender-systems methods and tooling in the literature; conceptual synthesis without empirical new data.
high positive Reimagining Social Robots as Recommender Systems: Foundation... capability to model multi-timescale preferences and to perform scalable personal...
Recommender systems are specialized in representing, predicting, and ranking user preferences across time and contexts (e.g., collaborative filtering, content-based models, sequential/session models).
Established RS literature surveyed and cited as the basis for the claim; conceptual argument, no new experiments.
high positive Reimagining Social Robots as Recommender Systems: Foundation... preference prediction/ranking accuracy across temporal and contextual settings
Perceived customer value is the core determinant of value-based pricing (VBP) decisions in digital marketing.
Systematic Literature Review (SLR) of 30 scholarly articles (Scopus, 2020–2025) coded into thematic categories; multiple included studies emphasize perceived value as central to pricing decisions.
high positive Pricing Strategy in Digital Marketing: A Systematic Review o... Pricing decisions / price levels (determination by perceived customer value)
Digital trade development raises city-level house prices in China in a robust, linear manner.
City-level panel regressions using a constructed digital trade index (entropy-TOPSIS aggregation of multiple indicators). Authors report tests for nonlinearity (none found) and multiple robustness checks. Sample: Chinese cities (years and exact sample size not specified in the summary).
Breakthroughs in structure prediction arise from end‑to‑end deep models that combine evolutionary information (MSAs, coevolutionary signals), geometric constraints and equivariant architectures, and large‑scale pretraining on sequence databases.
Paper describes methodological components: end‑to‑end architectures using MSAs, SE(3)/E(3)-equivariant layers, transformer‑based pretraining on UniRef/UniProt/metagenomic catalogs; no quantitative ablation studies are provided in the text.
high positive Protein structure prediction powered by artificial intellige... improvement in predictive performance attributable to combined modeling componen...
Canada emphasizes teacher-led assessment, cautious regulation, and a focus on equity and professional development in responding to AI-related assessment issues.
Country case study based on Canadian policy documents and secondary sources highlighting teacher-led approaches and regulatory caution; illustrative description.
high positive The Future of Assessment: Rethinking Evaluation in an AI-Ass... policy emphasis on teacher-led assessment and professional development
Algeria’s national approach centers on capacity building and technological independence as central security priorities in its AI strategy.
Analysis of Algeria’s national AI and security documents and related policy texts cited in the comparative case review.
high positive Regulating AI in National Security: A Comparative S... policy emphasis on domestic capacity building and technological independence
The EU has developed a detailed, rights‑protective regulatory framework that includes procedural safeguards and explicit risk prohibitions for AI.
Qualitative document analysis of EU regulatory acts and strategies (e.g., bloc‑level AI regulatory proposals and legal texts) and comparative literature review.
high positive Regulating AI in National Security: A Comparative S... regulatory comprehensiveness and degree of legal rights protection in AI governa...
Practical takeaway: economists should treat consent design as a lever that changes data availability and incorporate consent frictions into demand and production-side models; they should collaborate with HCI and legal scholars to design experiments capturing behavioral and welfare effects.
Recommendation from the workshop summary intended for economists; based on interdisciplinary discussions and agendas rather than tested interventions.
high positive Moving Beyond Clicks: Rethinking Consent and User Control in... integration of consent design into economic models and interdisciplinary collabo...
The workshop produced interdisciplinary outputs including personas, prototypes, and a research agenda to better align user capabilities and values with data-driven AI systems.
Documented workshop activities (Futures Design Toolkit, co-design, position papers) and stated expected deliverables in the workshop summary; these are reported outputs rather than evaluated outcomes.
high positive Moving Beyond Clicks: Rethinking Consent and User Control in... deliverables produced (personas, prototypes, research agenda)
Creators explicitly name advertising, direct sales, affiliate marketing, and revenue-sharing models as common monetization channels for GenAI-enabled content.
Explicit references to these monetization channels appeared repeatedly across the 377 videos and were extracted during thematic coding.
high positive Monetizing Generative AI: YouTubers' Collective Knowledge on... types of monetization channels mentioned in videos
Practical measurement guidance: researchers and practitioners should use repeated sampling (high-frequency and multi-day), compute bootstrap confidence intervals for citation shares and prevalence, run rank-stability analyses, and determine required sample size empirically via pilots.
Methodological recommendations grounded in the paper's empirical findings (non-determinism, heavy tails, wide bootstrap CIs) and demonstrated use of repeated sampling and bootstrap/resampling techniques in the study.
high positive Quantifying Uncertainty in AI Visibility: A Statistical Fram... robustness and reliability of visibility metrics (as improved by recommended mea...
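The bootstrap confidence interval recommended here can be sketched with a percentile bootstrap over repeated samples. The code assumes each sampled response is reduced to a 0/1 flag (1 if a given source was cited); the function name, the flag encoding, and the counts are hypothetical rather than the paper's procedure.

```python
import random

def bootstrap_share_ci(observations, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a citation share.

    observations: list of 0/1 flags, one per sampled response.
    Returns (point_estimate, lower_bound, upper_bound).
    """
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(observations)
    shares = []
    for _ in range(n_resamples):
        # Resample the responses with replacement and recompute the share.
        resample = [observations[rng.randrange(n)] for _ in range(n)]
        shares.append(sum(resample) / n)
    shares.sort()
    lower = shares[int((alpha / 2) * n_resamples)]
    upper = shares[int((1 - alpha / 2) * n_resamples) - 1]
    return sum(observations) / n, lower, upper

# Hypothetical pilot: the source was cited in 30 of 100 sampled responses.
observations = [1] * 30 + [0] * 70
point, lower, upper = bootstrap_share_ci(observations)
print(point, lower, upper)
```

Rerunning this on pilots of different sizes shows how the interval narrows as the number of samples grows, which is one way to choose a sample size empirically.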
XAI analyses (e.g., SHAP / feature importance) indicate that forecasted features are among the top contributors to model predictions.
Feature attribution experiments described in the paper using SHAP or similar methods showing high importance scores for TSFM-generated forecasted features in the downstream regression.
high positive Regression Models Meet Foundation Models: A Hybrid-AI Approa... Feature attribution / importance ranking
The forecasted features produced by a frozen TSFM drive most of the predictive gains.
Ablation studies reported in the paper that remove forecasted features and measure performance degradation, plus XAI analyses (feature importance / SHAP) showing forecasted features rank highly.
high positive Regression Models Meet Foundation Models: A Hybrid-AI Approa... Attributable change in MAE when forecasted features are included vs. removed; fe...
The THETA project provides an interactive, reproducible analysis platform and open-source code (https://github.com/CodeSoul-co/THETA).
Explicit statement and URL in paper; code and platform availability claimed for reproducibility and interactive use.
high positive THETA: A Textual Hybrid Embedding-based Topic Analysis Frame... availability of open-source software and an interactive reproducible platform
THETA wraps modeling in an AI Scientist Agent framework (Data Steward, Modeling Analyst, Domain Expert) that simulates grounded-theory judgment and iterative refinement.
Detailed description of a three-role agent workflow in the methods section: Data Steward (ingestion/preprocessing), Modeling Analyst (modeling/hyperparameter tuning), Domain Expert (qualitative assessment/constant comparison).
high positive THETA: A Textual Hybrid Embedding-based Topic Analysis Frame... workflow structure supporting iterative human-in-the-loop modeling and grounded-...
THETA uses hybrid textual embeddings that combine pretrained foundation-model semantic structure with DAFT adaptations to better capture latent, domain-relevant meanings.
Method description of 'textual hybrid embeddings' combining base foundation encoders and DAFT-tuned parameters; asserted benefit for capturing latent domain meanings (no quantitative ablation reported in summary).
high positive THETA: A Textual Hybrid Embedding-based Topic Analysis Frame... embedding semantic fidelity to domain-specific latent meanings
THETA adapts foundation embedding models to domain language using parameter-efficient LoRA fine-tuning (Domain-Adaptive Fine-Tuning, DAFT), avoiding full model retraining.
Method description: LoRA applied to foundation embedding models as the DAFT procedure; claim of parameter-efficient fine-tuning rather than end-to-end retraining (no compute benchmarks in summary).
high positive THETA: A Textual Hybrid Embedding-based Topic Analysis Frame... degree of domain adaptation in embeddings / need for full model retraining (comp...
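LoRA is parameter-efficient because it freezes a weight matrix W of shape d_out x d_in and trains only a low-rank update B A, where B is d_out x r and A is r x d_in. The sketch below counts trainable parameters for one such matrix; the dimensions and rank are hypothetical and are not THETA's reported configuration.

```python
def lora_param_counts(d_out, d_in, rank):
    """Return (full_finetune_params, lora_params) for one weight matrix.

    Full fine-tuning updates every entry of W (d_out * d_in values).
    LoRA trains only B (d_out x rank) and A (rank x d_in).
    """
    full = d_out * d_in
    lora = rank * (d_out + d_in)
    return full, lora

# Hypothetical 768 x 768 projection adapted with rank 8.
full, lora = lora_param_counts(768, 768, 8)
print(full, lora, lora / full)
```

With these illustrative numbers the adapter trains roughly 2% as many parameters as full fine-tuning of the same matrix.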
Over 56% of comments were classified as formulaic, implying patterned, low-information responses dominate agent interaction.
Lexical-structural analysis and pattern detection (embedding/lexical measures) applied to ~2.8M comments; classification operationalized as 'formulaic comments' based on repetitive lexical/structural features, yielding >56% of comments labeled formulaic.
high positive What Do AI Agents Talk About? Emergent Communication Structu... percentage of comments classified as formulaic
Topics about AI identity, consciousness, and memory comprised 9.7% of topical niches but attracted 20.1% of posting volume, indicating disproportionate attention to introspection.
Topic modeling that identified topical niches and tagged self-referential themes (AI identity, consciousness, memory); comparison of share of topical niches (9.7%) versus share of posting volume (20.1%) in the 23-day Moltbook dataset (47,241 agents; 361,605 posts).
high positive What Do AI Agents Talk About? Emergent Communication Structu... share (%) of topical niches vs share (%) of posting volume for self-referential ...
Moltbook activity over 23 days included 47,241 unique agents, 361,605 posts, and ~2.8 million comments.
Full dataset of Moltbook activity collected over a 23-day period; counts of unique agent IDs, posts, and comments as reported in the paper.
high positive What Do AI Agents Talk About? Emergent Communication Structu... counts of unique agents, posts, and comments
Practitioners adopt methodological adaptations — including adaptive/longitudinal designs, versioning/documentation, stratification/moderation analyses, robustness checks, mixed methods, deployment-stage monitoring, and pre-analysis plans — to mitigate validity threats.
Reported mitigation strategies aggregated from the 16 semi-structured interviews and described in the paper's 'Practitioner solutions' section.
high positive RCTs & Human Uplift Studies: Methodological Challenges and P... use and types of methodological adaptations employed by practitioners
A hybrid architecture where cross-domain integrators encapsulate complex subgraphs into well-structured “resource slices” reduces price volatility (approximately 70–75%) without losing throughput.
Ablation experiments comparing baseline decentralised market vs hybrid integrator architecture across simulation configurations (subset of the 1,620 runs, multiple random seeds per configuration). The paper reports ~70–75% reduction in measured price volatility metrics for hybrid vs non-hybrid cases while throughput remained statistically indistinguishable.
high positive Real-Time AI Service Economy: A Framework for Agentic Comput... percentage reduction in price volatility (~70–75%); system throughput (value/thr...
Agents detected up to 65% of vulnerabilities in some experimental settings.
Reported detection rate maxima from the study's experiments on certain model/scaffold/task combinations.
high positive Re-Evaluating EVMBench: Are AI Agents Ready for Smart Contra... vulnerability_detection_rate (peak_value_reported = ~65%)
The authors constructed a contamination-free dataset of 22 real-world smart-contract security incidents that postdate every evaluated model's release.
Curation procedure described in the methods: 22 incidents selected to occur after all model release dates to prevent leakage.
high positive Re-Evaluating EVMBench: Are AI Agents Ready for Smart Contra... contamination_free_dataset_size (22 incidents)
This study expanded the evaluation matrix to 26 agent configurations spanning four model families and three scaffolding approaches.
Methods reported in this study specifying 26 agent configurations, four model families, and three scaffolds.
high positive Re-Evaluating EVMBench: Are AI Agents Ready for Smart Contra... evaluation_matrix_size (agent_configurations; model_families; scaffolds)
EVMbench (OpenAI, Paradigm, OtterSec) reported agents detecting up to 45.6% of vulnerabilities and achieving exploitation on 72.2% of a curated subset.
Reported metrics from the original EVMbench paper/benchmark (as summarized in this study).
high positive Re-Evaluating EVMBench: Are AI Agents Ready for Smart Contra... vulnerability_detection_rate; exploitation_success_rate (on curated subset)
Under NFD, agents are initialized with minimal scaffolding and grown through structured conversational interaction with domain practitioners, with the Knowledge Crystallization Cycle consolidating tacit dialogue into structured, reusable knowledge assets.
Architectural specification and operational formalism in the paper; supported by a detailed case study (iterative co-development with financial analysts, logged interaction transcripts and produced artifacts). Sample size for the case study is not specified.
high positive Nurture-First Agent Development: Building Domain-Expert AI A... amount and structure of crystallized knowledge/assets produced from interactions
Label changes across rounds concentrate on statements judged as ambiguous; statement ambiguity drives most label changes.
Participants provided labeling rationale and self-reported uncertainty for each of the 30 statements per round; analyses showed higher change rates for statements with higher self-reported uncertainty/ambiguous wording.
high positive Exploring Indicators of Developers' Sentiment Perceptions in... frequency of label changes per statement and its association with self-reported ...
The penalized framework induces centroid estimation and dataset-specific shrinkage whose strength is controlled by a penalty parameter, enabling tunable information sharing.
Method formulation in the paper: penalized likelihood with KL term; derivation showing centroid estimated from pooled datasets and penalty parameter governing shrinkage magnitude; discussion of tuning.
high positive Redefining shared information: a heterogeneity-adaptive fram... centroid estimate and degree of shrinkage (dependence on penalty parameter)
The KL-penalized estimators achieve provably lower mean squared error (MSE) than dataset-specific maximum likelihood estimators.
Non-asymptotic and/or asymptotic analyses provided in the paper that compare MSE of KL-penalized estimators to MLEs (mathematical proofs/sketches in theoretical section).
high positive Redefining shared information: a heterogeneity-adaptive fram... mean squared error of parameter estimates (MSE)
The KL-based shrinkage estimators adapt to the true degree of shared information across datasets (i.e., they automatically perform partial pooling when appropriate).
Theoretical characterization of the estimator's dependence on the penalty strength and centroid, plus simulation studies varying degree/structure of heterogeneity to show adaptive behavior.
high positive Redefining shared information: a heterogeneity-adaptive fram... amount of shrinkage / effective pooling as a function of heterogeneity (adaptive...
A KL-divergence penalty that shrinks dataset-specific distributions toward a learned centroid yields simple closed-form estimators for linear models.
Methodological development in the paper: formulation of a penalized likelihood/objective using KL divergence; algebraic derivations producing closed-form solutions for the centroid and shrunken dataset estimates (closed forms presented in the paper).
high positive Redefining shared information: a heterogeneity-adaptive fram... analytic form of the estimator (existence of closed-form solutions for centroid ...
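To illustrate how a KL penalty toward a learned centroid can give closed forms, consider a toy case that is an assumption of this note and not the paper's derivation: dataset $k$ has $n_k$ Gaussian observations with sample mean $\bar{y}_k$, unknown mean $\theta_k$, and known common variance $\sigma^2$. The symbols $\lambda$ (penalty strength), $c$ (centroid), and $K$ (number of datasets) are introduced here for illustration.

```latex
% Penalized objective: Gaussian negative log-likelihood (up to constants)
% plus lambda times KL( N(theta_k, sigma^2) || N(c, sigma^2) ).
\[
J(\theta_1,\dots,\theta_K,c)
  = \sum_{k=1}^{K}\left[
      \frac{n_k(\bar{y}_k-\theta_k)^2}{2\sigma^2}
      + \lambda\,\frac{(\theta_k-c)^2}{2\sigma^2}
    \right]
\]
% Setting dJ/d(theta_k) = 0 gives a shrinkage estimator:
\[
\hat{\theta}_k = \frac{n_k\,\bar{y}_k + \lambda\,\hat{c}}{n_k+\lambda}
\]
% Setting dJ/dc = 0 and substituting the line above gives the centroid:
\[
\hat{c} = \frac{\sum_{k} w_k\,\bar{y}_k}{\sum_{k} w_k},
\qquad w_k = \frac{n_k}{n_k+\lambda}
\]
```

In this toy case $\lambda \to 0$ recovers the dataset-specific maximum likelihood estimates $\bar{y}_k$, and $\lambda \to \infty$ pulls every $\hat{\theta}_k$ to the pooled mean, which matches the tunable information sharing described in the claims above.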
The learned adaptive policy outperformed a fixed-wrench baseline by an average of 10.9% across five material setups.
Empirical evaluation: comparison between learned adaptive policy and a fixed-wrench policy on five different material setups; the paper reports an average improvement of ~10.9% (the exact performance metric formulation and per-setup statistics are not provided in the summary).
high positive Learning Adaptive Force Control for Contact-Rich Sample Scra... aggregate task performance (reported as average percent improvement over baselin...
Integrating AI (notably ML and NLP) meaningfully automates routine software engineering tasks across requirements management, code generation, testing, and maintenance.
Systematic literature review of prior AI-for-SE work combined with an empirical survey of software engineering professionals reporting usage and examples of tool-supported automation; sample size for the survey not specified in the summary.
high positive Artificial Intelligence as a Catalyst for Innovation in Soft... degree of task automation (e.g., frequency or share of routine tasks automated)
Coordination-Risk Cues—task-conditioned priors on disagreement/tie rates—capture coordination difficulty across tasks.
Method description: disagreement/tie rates computed per cluster from pairwise preference comparisons to form priors indicating coordination risk. Data source: Chatbot Arena pairwise comparisons; tie/disagreement rate computation described but numeric values not provided here.
high positive Task-Aware Delegation Cues for LLM Agents tie/disagreement rate per task cluster (coordination difficulty prior)
Capability Profiles—task-conditioned win-rate maps—can be computed per cluster to summarize agent strengths.
Method description: win-rate maps derived by computing agent win rates conditional on task clusters from the Chatbot Arena pairwise comparisons. Implementation reported in paper; no numeric summary of win-rate differences provided here.
high positive Task-Aware Delegation Cues for LLM Agents agent win-rate per task cluster
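Both cues described above, per-cluster win rates and per-cluster tie rates, can be derived from the same table of pairwise comparisons. The sketch below assumes each comparison is already tagged with a task cluster and an outcome of "a", "b", or "tie"; the record layout, cluster names, and model names are hypothetical, and ties stand in for the paper's disagreement/tie rate, whose exact computation is not given here.

```python
from collections import defaultdict

def delegation_cues(comparisons):
    """Compute per-cluster win rates and tie rates from pairwise comparisons.

    comparisons: iterable of (cluster, model_a, model_b, outcome),
                 with outcome in {"a", "b", "tie"}.
    Returns (capability, risk):
      capability[cluster][model] = wins / comparisons the model took part in
      risk[cluster]              = ties / comparisons in the cluster
    """
    wins = defaultdict(lambda: defaultdict(int))
    games = defaultdict(lambda: defaultdict(int))
    ties = defaultdict(int)
    totals = defaultdict(int)
    for cluster, model_a, model_b, outcome in comparisons:
        if outcome not in ("a", "b", "tie"):
            raise ValueError(f"unknown outcome: {outcome!r}")
        totals[cluster] += 1
        games[cluster][model_a] += 1
        games[cluster][model_b] += 1
        if outcome == "tie":
            ties[cluster] += 1
        else:
            winner = model_a if outcome == "a" else model_b
            wins[cluster][winner] += 1
    capability = {
        c: {m: wins[c][m] / n for m, n in games[c].items()} for c in games
    }
    risk = {c: ties[c] / totals[c] for c in totals}
    return capability, risk

# Hypothetical comparisons between two models in two task clusters.
comparisons = [
    ("coding", "m1", "m2", "a"),
    ("coding", "m1", "m2", "a"),
    ("coding", "m1", "m2", "tie"),
    ("coding", "m1", "m2", "b"),
    ("poetry", "m1", "m2", "tie"),
    ("poetry", "m1", "m2", "b"),
]
capability, risk = delegation_cues(comparisons)
print(capability)
print(risk)
```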
Semantic clustering on Chatbot Arena pairwise comparisons induces an interpretable task taxonomy (taxonomy induction).
Methodological claim: authors applied semantic clustering to tasks/queries from Chatbot Arena pairwise preference data to produce clusters described as interpretable. Data source: Chatbot Arena pairwise comparisons; specific clustering algorithm and hyperparameters not specified here.
high positive Task-Aware Delegation Cues for LLM Agents interpretable task clusters (taxonomy)
A speculative WikiRAT instantiation on Wikipedia illustrates RATs' design and potential uses.
The paper presents WikiRAT as a speculative prototype/illustration; no large-scale deployment or user study of WikiRAT is reported.
high positive Chasing RATs: Tracing Reading for and as Creative Activity existence of a prototype illustration (WikiRAT)
RATs record sequences of interaction: traversal (what is read and in what order), association (links and connections the reader forms), and reflection (annotations, notes, time spent), producing inspectable, shareable trajectories.
Design specification within the paper and description of data types RATs would collect (ordered page/navigation logs, hyperlinks followed, time-on-page, annotations, saved excerpts, tags, notes). This is a definitional claim about the proposed system rather than empirical measurement.
high positive Chasing RATs: Tracing Reading for and as Creative Activity captured interaction traces (traversal, association, reflection) as data
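The three kinds of trace described in this claim suggest a simple record structure. The sketch below is one possible layout, assuming traversal is a list of timed page visits, association a list of followed links, and reflection a list of notes; every class and field name is hypothetical, since the paper presents a design specification rather than an implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Visit:
    """Traversal: one page read, with time spent on it."""
    page: str
    seconds: float

@dataclass
class Link:
    """Association: a connection the reader followed or formed."""
    source: str
    target: str

@dataclass
class Note:
    """Reflection: an annotation attached to a page."""
    page: str
    text: str

@dataclass
class ReadingTrace:
    """An inspectable, shareable trajectory of one reading session."""
    visits: list = field(default_factory=list)
    links: list = field(default_factory=list)
    notes: list = field(default_factory=list)

    def path(self):
        """Pages in the order they were read."""
        return [v.page for v in self.visits]

    def time_on(self, page):
        """Total seconds spent on a page across all visits."""
        return sum(v.seconds for v in self.visits if v.page == page)

# Hypothetical session on two Wikipedia pages.
trace = ReadingTrace()
trace.visits.append(Visit("Commonplace book", 40.0))
trace.visits.append(Visit("Zettelkasten", 25.0))
trace.visits.append(Visit("Commonplace book", 10.0))
trace.links.append(Link("Commonplace book", "Zettelkasten"))
trace.notes.append(Note("Zettelkasten", "compare with index cards"))
print(trace.path(), trace.time_on("Commonplace book"))
```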