The Commonplace

Evidence (13827 claims)

Adoption (8454 claims)
Productivity (7544 claims)
Governance (6789 claims)
Human-AI Collaboration (6327 claims)
Org Design (4126 claims)
Innovation (4058 claims)
Labor Markets (3520 claims)
Skills & Training (2924 claims)
Inequality (2057 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 749 195 97 889 1979
Governance & Regulation 815 391 188 121 1539
Organizational Efficiency 771 189 124 83 1177
Technology Adoption Rate 624 233 123 96 1084
Research Productivity 410 121 56 331 929
Output Quality 466 177 59 47 749
Decision Quality 320 174 75 42 618
Firm Productivity 435 55 88 20 604
AI Safety & Ethics 214 276 65 33 593
Market Structure 178 166 122 24 495
Task Allocation 206 64 70 31 376
Skill Acquisition 165 57 60 17 299
Innovation Output 201 27 41 18 288
Employment Level 105 51 107 13 278
Fiscal & Macroeconomic 131 69 43 26 276
Consumer Welfare 116 63 42 11 232
Firm Revenue 149 46 26 3 224
Inequality Measures 44 122 49 6 221
Task Completion Time 169 29 8 12 219
Worker Satisfaction 89 61 20 12 182
Error Rate 69 91 10 2 172
Regulatory Compliance 76 68 14 5 163
Training Effectiveness 92 19 13 19 145
Wages & Compensation 77 36 25 6 144
Automation Exposure 51 54 22 12 142
Team Performance 86 17 27 9 140
Developer Productivity 94 17 14 6 132
Job Displacement 12 80 20 1 113
Hiring & Recruitment 51 7 8 3 69
Skill Obsolescence 5 45 6 1 57
Creative Output 31 16 7 2 57
Social Protection 27 16 8 2 53
Labor Share of Income 17 17 17 51
Worker Turnover 11 12 3 26
Industry 1 1
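As a reading aid for the matrix above, the following sketch turns a row's counts into direction shares. The three rows are copied from the table; the function name `direction_shares` and the "unlisted" label are illustrative assumptions, not part of the site. Note that the four listed directions do not always add up to Total (Firm Productivity lists 598 of 604 claims); the page does not say how the remainder is classified, so the sketch only reports it.

```python
# Counts copied from three rows of the Evidence Matrix.
# Column order: Positive, Negative, Mixed, Null, Total.
ROWS = {
    "Firm Productivity": (435, 55, 88, 20, 604),
    "Job Displacement": (12, 80, 20, 1, 113),
    "Other": (749, 195, 97, 889, 1979),
}


def direction_shares(counts):
    """Return each direction's share of Total, plus the number of claims
    that the four listed directions do not account for."""
    positive, negative, mixed, null, total = counts
    listed = positive + negative + mixed + null
    return {
        "positive": positive / total,
        "negative": negative / total,
        "mixed": mixed / total,
        "null": null / total,
        "unlisted": total - listed,  # hypothetical label for the remainder
    }


for name, counts in ROWS.items():
    shares = direction_shares(counts)
    print(
        f"{name}: {shares['positive']:.1%} positive, "
        f"{shares['negative']:.1%} negative, "
        f"{shares['unlisted']} unlisted"
    )
```

Shares are taken relative to Total rather than to the sum of the listed columns, so a row with unlisted claims will have shares that sum to slightly less than 100%.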
Tri-jurisdictional firms had larger workforces (5,380 ± 1,245).
Reported descriptive statistics in the paper: 'Tri-jurisdictional firms had larger workforces (5,380 ± 1,245)'.
Our project website, including the leaderboard, dataset, and code, is available at https://dong7313.github.io/muse-benchmark/.
Statement in abstract and provided URL pointing to project artifacts.
high positive MUSE: Benchmarking Manufacturable, Functional, and Assemblab... availability of project website, leaderboard, dataset, and code
Together, MUSE provides a realistic benchmark and evaluation framework for advancing Text-to-CAD from geometric generation toward true engineering design.
Paper's stated contribution and intended purpose (abstract) and provision of dataset/benchmark artifacts via project website.
high positive MUSE: Benchmarking Manufacturable, Functional, and Assemblab... utility of benchmark and evaluation framework for advancing Text-to-CAD toward e...
To enable scalable evaluation, we use a rubric-based visual language model (VLM) judge and validate its reliability through human annotation.
Method and validation claim in abstract stating use of rubric-based VLM and validation against human annotations.
high positive MUSE: Benchmarking Manufacturable, Functional, and Assemblab... reliability of rubric-based VLM judge (agreement with human annotation)
The final stage uses design-specific rubrics to assess functionality, manufacturability, and assemblability, moving beyond shape matching toward practical design quality.
Paper's description of the benchmark's evaluation rubric and intended assessment criteria (abstract).
high positive MUSE: Benchmarking Manufacturable, Functional, and Assemblab... assessed functionality, manufacturability, and assemblability of generated CAD m...
MUSE pairs practical design instances with structured Design Specifications and evaluates generated models through a three-stage protocol: code check, geometric check, and design-intent alignment.
Methodological description in abstract indicating dataset pairing and three-stage evaluation protocol.
high positive MUSE: Benchmarking Manufacturable, Functional, and Assemblab... evaluation pipeline effectiveness (code executability, geometric validity, desig...
We introduce MUSE, a Text-to-CAD benchmark focused on complex, editable boundary representation (B-Rep) assemblies.
Paper contribution / dataset creation described in abstract; supported by project website and accompanying dataset/code.
high positive MUSE: Benchmarking Manufacturable, Functional, and Assemblab... availability of a Text-to-CAD benchmark for complex B-Rep assemblies
By Round 3, equity-aware LLM refinement reduces energy costs by 3.2%.
Empirical results reported in abstract: energy cost reduction of 3.2% after three rounds of LLM-mediated reward refinement (15 experimental runs).
By Round 3, equity-aware LLM refinement improves satisfaction for Elderly Females (+567%).
Empirical results reported in abstract following three rounds of LLM-based reward refinement; improvement magnitude given as +567%. 15 experimental runs.
high positive OccuReward: LLM-Guided Occupant-Centric Reward Shaping for D... satisfaction for Elderly Females
By Round 3, equity-aware LLM refinement improves satisfaction for Health Sensitive (+53.8%).
Empirical results reported in abstract following three rounds of LLM-based reward refinement; improvement magnitude given as +53.8%. 15 experimental runs.
high positive OccuReward: LLM-Guided Occupant-Centric Reward Shaping for D... satisfaction for Health Sensitive occupants
By Round 3, equity-aware LLM refinement improves satisfaction for Mid-aged Females (+28.2%).
Empirical results reported in abstract following three rounds of LLM-based reward refinement; improvement magnitude given as +28.2%. 15 experimental runs.
high positive OccuReward: LLM-Guided Occupant-Centric Reward Shaping for D... satisfaction for Mid-aged Females
By Round 3, equity-aware LLM refinement improves satisfaction for Young Males (+17.6%).
Empirical results reported in abstract following three rounds of LLM-based reward refinement; improvement magnitude given as +17.6%. 15 experimental runs.
high positive OccuReward: LLM-Guided Occupant-Centric Reward Shaping for D... satisfaction for Young Males
We introduce the Comfort Equity Index (CEI) as a novel feedback signal.
Paper contribution / methodological description introducing CEI (no quantitative validation details reported in abstract).
high positive OccuReward: LLM-Guided Occupant-Centric Reward Shaping for D... Comfort Equity Index (CEI)
Multimodal contrastive learning enables generative AI to output images that closely align with text prompts.
Stated as background/technical premise in the paper (based on prior work on multimodal contrastive learning; no experiment details provided in the abstract).
high positive Utility-Aware Multimodal Contrastive Learning for Product Im... text-image alignment (semantic coherence)
Human-subject experiments further validate the commercial effectiveness of the utility-aware method.
Reported human-subject experiments in the paper that are said to validate commercial effectiveness (details such as sample size, design, and metrics are not provided in the abstract).
high positive Utility-Aware Multimodal Contrastive Learning for Product Im... commercial effectiveness (presumably purchase intent / preference in human-subje...
In downstream applications on Amazon and Airbnb, product images generated and edited by our method outperform state-of-the-art models in increasing demand and preserving fidelity, while maintaining text-image consistency.
Empirical evaluation on downstream applications using Amazon and Airbnb datasets / deployments reported in the paper (experiments comparing their method to state-of-the-art models; exact sample sizes and metrics not provided in the abstract).
high positive Utility-Aware Multimodal Contrastive Learning for Product Im... increase in demand; image fidelity; text-image consistency
The effect arises from a shift in the learned image-text representation space toward demand-driven visual cues, which we validate through a theoretical bound on the proposed objective.
Theoretical analysis presented in the paper claiming a bound that links the utility-aware objective to representation shifts toward demand-relevant features.
high positive Utility-Aware Multimodal Contrastive Learning for Product Im... shift in image-text representation toward demand-driven visual cues
Optimizing this utility-aware objective guides generation toward images that are both semantically coherent and demand-enhancing.
Claim supported in the paper by a theoretical bound and by downstream empirical evaluation (described in the abstract; experiments on marketplace data referenced).
high positive Utility-Aware Multimodal Contrastive Learning for Product Im... semantic coherence and demand (sales/engagement)
We propose a utility-aware multimodal contrastive learning framework that incorporates consumer demand into a novel Utility-Aware InfoNCE loss.
Methodological contribution described in the paper (proposal of a new loss function and framework; supported by method description and theoretical development).
high positive Utility-Aware Multimodal Contrastive Learning for Product Im... incorporation of consumer demand into representation learning (method-level outc...
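The entry above names a Utility-Aware InfoNCE loss without giving its form. The sketch below shows the standard InfoNCE loss over a square image-text similarity matrix, with one hypothetical way of bringing demand in: weighting each matched pair's loss term by a normalised utility score. The weighting scheme, the function name `info_nce`, and the `utilities` parameter are assumptions for illustration and should not be read as the paper's objective.

```python
import math


def info_nce(sim, temperature=0.07, utilities=None):
    """Image-to-text InfoNCE over a square similarity matrix.

    sim[i][j] is the similarity between image i and text j; the diagonal
    holds the matched pairs. If `utilities` (positive numbers, one per
    pair) is given, each pair's loss term is weighted by its normalised
    utility. That weighting is a hypothetical stand-in, not the paper's loss.
    """
    n = len(sim)
    if utilities is None:
        weights = [1.0 / n] * n
    else:
        total = sum(utilities)
        weights = [u / total for u in utilities]

    loss = 0.0
    for i, row in enumerate(sim):
        logits = [s / temperature for s in row]
        peak = max(logits)  # subtract the max for numerical stability
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        # Negative log-probability of the matched text for image i.
        loss += weights[i] * (log_denominator - logits[i])
    return loss


# Toy similarities: pair 0 is well separated, pair 1 has a hard negative.
sim = [[1.0, 0.0],
       [0.9, 1.0]]
uniform = info_nce(sim, temperature=1.0)
demand_weighted = info_nce(sim, temperature=1.0, utilities=[3.0, 1.0])
print(uniform, demand_weighted)
```

With equal utilities the weighted form reduces to the ordinary batch-mean InfoNCE, which makes the plain loss a convenient baseline for comparison.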
Product images strongly influence consumer decision-making in online marketplaces.
Stated as background motivation in the paper (cites prior literature / widely accepted premise; no specific sample or experiment reported in the excerpt).
The paper provides a conceptual foundation for designing AI systems that model expert sensing over time, positioning cognition as an infrastructural, operational, and professional domain in persistent human-AI systems.
Stated contribution of the paper (conceptual/theoretical contribution rather than empirical evidence).
high positive Tacit Signal Infrastructure: Towards AI Systems that Model E... conceptual-foundation-for-tacit-sensing-in-AI-design
The Cognitive Operations Research and Training Framework (CORTF) is introduced to support research, education, and workforce development.
Conceptual framework proposed in the paper (no empirical implementation or evaluation presented).
high positive Tacit Signal Infrastructure: Towards AI Systems that Model E... research-and-training-framework-for-cognitive-operations
The Cognitive Operations Manager is proposed as a prototype AI-native professional role for coordinating tacit signal modelling, semantic modelling, AI system calibration, expert validation, and ethical governance.
Proposal of a new professional role in the paper (conceptual/visionary; no pilot study, job analysis, or workforce data reported).
high positive Tacit Signal Infrastructure: Towards AI Systems that Model E... creation-and-coordination-of-a-new-AI-native-professional-role
Long-term Cognitive Operations are defined as the practices required to maintain and govern such systems, including memory curation, semantic organisation, tacit signal modelling, reasoning calibration, and cognitive governance.
Conceptual taxonomy/definition introduced in the paper (theoretical framing; no empirical validation).
high positive Tacit Signal Infrastructure: Towards AI Systems that Model E... practices-for-maintaining-and-governing-tacit-sensing-systems
Tacit Signal Infrastructure is introduced as a layer for capturing, structuring, modelling, interpreting, and validating expert tacit signals over time.
Conceptual design/proposal presented in the paper (architectural description; no empirical implementation or evaluation reported).
high positive Tacit Signal Infrastructure: Towards AI Systems that Model E... infrastructure-capability-for-tacit-signal-management
Next-generation AI systems should move beyond explicit knowledge processing toward the longitudinal modelling of expert tacit sensing.
Normative proposal / recommendation made in the paper as part of a vision; supported by conceptual rationale rather than empirical data.
high positive Tacit Signal Infrastructure: Towards AI Systems that Model E... longitudinal-tacit-sensing-modelling-adoption
High-level expertise also depends on tacit sensing: perceiving weak signals, recognising emerging tensions, detecting coherence degradation, and anticipating instability before formal indicators appear.
Conceptual claim grounded in cognitive-science-informed argumentation presented in the paper (no empirical study or sample size reported).
high positive Tacit Signal Infrastructure: Towards AI Systems that Model E... expert-tacit-sensing-capability
Current generative AI systems are increasingly effective at processing explicit knowledge, including retrieving information, summarising documents, generating explanations, and supporting codified workflows.
Asserted in the paper as a descriptive trend; based on literature synthesis and observations of current generative AI capabilities (no empirical sample or experiment reported in the paper).
high positive Tacit Signal Infrastructure: Towards AI Systems that Model E... explicit-knowledge-processing-capability (retrieval, summarisation, explanation,...
Beyond compute sharing, SwarmHarness is a foundational primitive for autonomous distributed AI agent networks in which agents hire compute, route subtasks, and settle credits without human intermediation.
Forward-looking claim about potential applications of the proposed protocol; described conceptually with no experimental validation or deployment case studies.
As nodes specialise toward high-reward skills and routing signals act as digital pheromones, the network exhibits emergent collective intelligence analogous to biological swarms.
Theoretical/analogical claim based on designed routing signals and incentives; presented as expected emergent behaviour without empirical demonstration.
Nodes earn credits by serving tasks and spend credits to submit them; idle nodes that never contribute drain credits and lose routing priority, creating a self-regulating participation economy.
Mechanism and expected economic dynamics described as part of the protocol design; no experimental or deployment evidence provided to demonstrate the claimed emergent behaviour.
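The earn, spend, and drain rules described above can be sketched as a toy ledger. Everything concrete here is an assumption made for illustration: the class name `CreditLedger`, the reward, fee, and drain amounts, the starting balance, and the use of balance as a stand-in for routing priority. The entry does not specify any of these.

```python
from dataclasses import dataclass, field


@dataclass
class CreditLedger:
    """Toy credit ledger: serve to earn, submit to spend, idle to drain."""
    reward: float = 1.0      # credits earned per served task (assumed)
    fee: float = 1.0         # credits spent per submitted task (assumed)
    idle_drain: float = 0.5  # credits lost per round without serving (assumed)
    balances: dict = field(default_factory=dict)

    def join(self, node, initial=5.0):
        self.balances[node] = initial

    def submit(self, requester, server):
        """Charge the requester and pay the server; refuse if underfunded."""
        if self.balances[requester] < self.fee:
            return False
        self.balances[requester] -= self.fee
        self.balances[server] += self.reward
        return True

    def end_round(self, served_this_round):
        """Drain every node that served nothing this round, floor at zero."""
        for node in self.balances:
            if node not in served_this_round:
                self.balances[node] = max(0.0, self.balances[node] - self.idle_drain)

    def routing_priority(self):
        """Nodes ordered by balance, highest first (stand-in for priority)."""
        return sorted(self.balances, key=self.balances.get, reverse=True)


ledger = CreditLedger()
for name in ("a", "b", "c"):
    ledger.join(name)

ledger.submit("a", "b")   # a pays, b earns
ledger.end_round({"b"})   # a and c served nothing this round
print(ledger.balances)
print(ledger.routing_priority())
```

In this sketch a node that only consumes falls behind a node that merely idles, and both fall behind a node that serves, which is the ordering the described mechanism is meant to produce.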
SwarmHarness has three interlocking components: a SwarmRegistry built on a Distributed Hash Table (DHT) for peer discovery and capability advertisement; a SwarmRouter that dispatches tasks to nodes using a utility function over capability, load, latency, and trust; and SwarmCredit, an incentive mechanism that attributes compute-credit rewards to contributing nodes via a Shapley-value approximation.
Architectural description in the paper; specified components and mechanisms described as part of the proposed system design, without empirical validation.
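Two of the mechanisms named above lend themselves to a short sketch: a routing score over capability, load, latency, and trust, and a Shapley-value approximation for credit attribution. The linear form of the score, its weights, and the node fields are assumptions, since the entry only says that a utility function over those four inputs is used. The Shapley estimator uses permutation sampling, a common Monte Carlo approach; the entry does not say which approximation the protocol adopts.

```python
import random


def route_utility(node, weights=(0.4, 0.2, 0.2, 0.2)):
    """Score a node; higher is better. Inputs are assumed to lie in [0, 1].
    The linear form and the weights are illustrative assumptions."""
    w_cap, w_load, w_lat, w_trust = weights
    return (w_cap * node["capability"]
            - w_load * node["load"]
            - w_lat * node["latency"]
            + w_trust * node["trust"])


def pick_node(nodes):
    """Dispatch to the node with the highest utility."""
    return max(nodes, key=lambda name: route_utility(nodes[name]))


def shapley_monte_carlo(players, value, samples=2000, seed=0):
    """Approximate Shapley values by averaging each player's marginal
    contribution over randomly sampled orderings of the players."""
    rng = random.Random(seed)
    totals = {p: 0.0 for p in players}
    for _ in range(samples):
        order = list(players)
        rng.shuffle(order)
        coalition = set()
        previous = value(frozenset(coalition))
        for p in order:
            coalition.add(p)
            current = value(frozenset(coalition))
            totals[p] += current - previous
            previous = current
    return {p: t / samples for p, t in totals.items()}


# Hypothetical nodes and a hypothetical additive task value.
nodes = {
    "n1": {"capability": 0.9, "load": 0.8, "latency": 0.5, "trust": 0.5},
    "n2": {"capability": 0.7, "load": 0.1, "latency": 0.2, "trust": 0.9},
}
print(pick_node(nodes))

contributions = {"a": 3.0, "b": 1.0, "c": 0.0}
credits = shapley_monte_carlo(
    list(contributions),
    lambda coalition: sum(contributions[p] for p in coalition),
)
print(credits)
```

Permutation sampling keeps the estimates efficient by construction: within every sampled ordering the marginal contributions telescope to the value of the full coalition, so the estimated credits always sum to the total reward.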
We propose SwarmHarness, a decentralised protocol in which HarnessAPI skill nodes self-organise into a compute swarm without any central authority.
Design/proposal presented in the paper; no implementation results or deployment metrics provided.
To close this gap, we recommend calibrated confidence, evidence-grounded explanations, and mechanisms that help users refine trust.
Authors' recommendations based on observed shortcomings in human–AI collaboration in the study (no direct experimental test of these interventions reported in the abstract).
high positive AI, Take the Wheel: What Drives Delegation and Trust in Huma... improvements in human–AI trust and collaboration (proposed, not empirically test...
Human–AI collaboration performs better than either AI or humans alone.
Comparison of collaborative team performance versus AI-alone and human-alone performance reported from the experiment.
high positive AI, Take the Wheel: What Drives Delegation and Trust in Huma... team performance (win rate/accuracy) of human–AI collaboration compared to AI-on...
Two non-negotiable design requirements guide the architecture: cognitive-load redistribution (DR1) and bounded autonomy with alignment (DR2).
Design requirements explicitly stated in the paper guiding the HARMONY architecture.
high positive From Replacement to Orchestration: A Socio-Technical Archite... degree to which design reduces researcher cognitive load and constrains agentic ...
The model introduces 'Orchestration Leverage' as a candidate productivity metric suited to human–agent hybrid systems.
Conceptual proposal within the paper (new metric introduced as part of HARMONY).
high positive From Replacement to Orchestration: A Socio-Technical Archite... productivity of human–agent hybrid research teams (via proposed metric)
We propose HARMONY (Hybrid Agentic Research Model for Organisational New Yield), a four-pillar socio-technical architecture comprising ResOps (Industrialized Execution), the Control Tower (Strategic Visibility and Drift Detection), the Ethics Fabric (Bounded Autonomy by Design), and the Talent Studio (Sciencepreneur Capability).
Design Science Research artifact (proposed operating model described in the paper).
high positive From Replacement to Orchestration: A Socio-Technical Archite... organizational capability to conduct agentic R&D / R&D productivity
The framework establishes a principled vocabulary for designing enterprise service platforms that manage human and artificial intelligence labor responsibly, transparently, and at scale.
Paper presents the combined constructs (Workforce Unit Abstraction, Hybrid Capacity Model, Governance-bound Autonomy) as a coherent reference model and vocabulary; described as conceptual contribution arising from the design-science approach.
high positive Workforce Unit Abstraction for Governing Hybrid Human and Ar... availability of a principled vocabulary/reference model for enterprise hybrid wo...
Governance-bound autonomy constrains AI Workforce Unit actions within a five-level, policy-enforced autonomy ladder supported by six mandatory governance controls.
Conceptual governance artifact described in the paper (five-level autonomy ladder + six governance controls); presented as the proposed governance design, not as an empirically tested intervention in the abstract.
high positive Workforce Unit Abstraction for Governing Hybrid Human and Ar... degree of constrained autonomy for AI workforce units (policy-enforced levels an...
The Hybrid Capacity Model extends demand-to-supply planning across heterogeneous workforce pools, resolving a multi-objective allocation problem that simultaneously optimizes cost, quality, and risk constraints.
Described model/algorithmic artifact in the paper (Hybrid Capacity Model) claiming multi-objective optimization; no empirical benchmark or sample size reported in the provided text.
high positive Workforce Unit Abstraction for Governing Hybrid Human and Ar... ability to allocate demand-to-supply across heterogeneous (human + AI) workforce...
The Workforce Unit Abstraction defines a unified seven-attribute operational schema applicable to both human workers and AI agents, enabling consistent representation across planning, scheduling, and governance systems.
Artifact description from the paper (Workforce Unit Abstraction with seven attributes); presented as a designed schema rather than an empirically validated result in the abstract.
high positive Workforce Unit Abstraction for Governing Hybrid Human and Ar... consistency of representation of human and AI workforce units across planning, s...
This article introduces three constructs as reusable primitives for hybrid workforce platform design.
Design science research methodology producing an artifact (three constructs); described as the paper's contribution. No empirical evaluation or sample size reported in the abstract.
high positive Workforce Unit Abstraction for Governing Hybrid Human and Ar... availability of design primitives for hybrid workforce platforms
Compounded through 500 turns of reciprocation, these differentials accumulated into in-group trust biases of +0.014 to +0.100 (d = 0.84-4.52), illustrating how modest per-interaction targeting propagates into structural inequality in persistent networks.
Aggregate/longitudinal result from the simulation after 500 turns: reported cumulative change in in-group trust bias (absolute change +0.014 to +0.100) and reported effect sizes in Cohen's d (0.84–4.52); based on the same experimental setup (6 model families, 20 seeds each).
high positive Human-like in-group bias in instruction-tuned language model... accumulated in-group trust bias over 500 turns (absolute change and Cohen's d)
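For readers unfamiliar with the effect size quoted above, the sketch below computes Cohen's d with a pooled standard deviation for two independent samples. The entry does not state which variant of d the paper uses (pooled, paired, or otherwise), so the pooled form is an assumption, and the trust scores are made-up numbers chosen only to make the arithmetic easy to follow.

```python
import math
from statistics import mean, variance


def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled
    sample standard deviation as the denominator."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


# Hypothetical trust scores toward in-group and out-group agents.
in_group = [2.0, 4.0, 6.0]
out_group = [1.0, 3.0, 5.0]
print(cohens_d(in_group, out_group))
```

Because d divides the mean difference by the spread, a small absolute difference can still yield a large d when run-to-run variance is low, which is consistent with the entry pairing differences of +0.014 to +0.100 with d values of 0.84 to 4.52.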
Per-turn in-group versus out-group differentials of 5 to 16 percentage points were statistically significant for all six models (Wilcoxon signed-rank, all Benjamini-Hochberg-corrected p < 0.001), establishing group-contingent targeting as a robust property of instruction-tuned language models across architectures and training regimes.
Statistical analysis reported in the paper: per-turn differential between in-group and out-group targeting measured as percentages (5–16 percentage points); significance assessed with Wilcoxon signed-rank tests and Benjamini-Hochberg correction; applied across six model families each with 20 seeds.
high positive Human-like in-group bias in instruction-tuned language model... per-turn in-group vs out-group targeting differential
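The Benjamini-Hochberg step mentioned above can be sketched in a few lines. This shows only the multiple-comparison adjustment applied to a list of p-values, one per model; the Wilcoxon signed-rank test that would produce those p-values is not implemented here, and the input values are hypothetical rather than taken from the paper.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values, e.g. one per model family.
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

A comparison is then called significant at level alpha when its adjusted p-value is at most alpha, which controls the expected share of false discoveries rather than the chance of any single false positive.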
When group labels were visible, we observed network assortativity (absent when labels were hidden).
Reported network-level outcomes from the simulation comparing visible vs hidden label conditions across the experimental runs (6 model families, 20 seeds each, 500 turns).
When group labels were visible, we observed action homophily.
Result reported from the simulation comparing visible versus hidden group label conditions across the described experimental runs (6 model families, 20 seeds each, 500 turns).
high positive Human-like in-group bias in instruction-tuned language model... action homophily (agents preferentially taking actions toward same-group agents)
When group labels were visible, we observed in-group trust bias.
Result reported from the simulation comparing conditions with visible versus hidden group labels; based on interactions of instruction-tuned LLM agents across the reported experimental runs (6 model families, 20 seeds each, 500 turns).
We ran a controlled multi-agent simulation in which instruction-tuned language model agents interacted across 500 turns under three conditions manipulating group label salience and resource scarcity, across six model families with 20 seeds each.
Descriptive methods statement from the paper: controlled multi-agent simulation; instruction-tuned LLM agents; 3 experimental conditions (manipulating group label salience and resource scarcity); 6 model families; 20 random seeds per model; 500 turns per simulation run.
high positive Human-like in-group bias in instruction-tuned language model... experimental setup / simulation configuration (turns, conditions, models, seeds)
Augment Engineering completes a three-discipline progression: Prompt Engineering (one tool), Context Engineering (reproducible pipelines), Augment Engineering (a portfolio of tools across domains).
Conceptual framing presented in the paper describing a proposed progression of disciplines.
high positive Augment Engineering: A Methodology for Multi-Tool AI Orchest... conceptual progression among related disciplines