The Commonplace

Evidence (6574 claims)

Adoption: 8625 claims
Productivity: 7686 claims
Governance: 6917 claims
Human-AI Collaboration: 6574 claims
Org Design: 4189 claims
Innovation: 4131 claims
Labor Markets: 3588 claims
Skills & Training: 2985 claims
Inequality: 2066 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 761 200 101 904 2020
Governance & Regulation 829 400 191 122 1566
Organizational Efficiency 784 193 125 84 1197
Technology Adoption Rate 637 236 124 97 1103
Research Productivity 431 131 58 340 972
Output Quality 481 183 59 47 770
Decision Quality 332 177 82 49 647
Firm Productivity 439 57 88 20 610
AI Safety & Ethics 218 279 66 33 602
Market Structure 181 170 123 24 503
Task Allocation 214 64 72 33 388
Skill Acquisition 174 62 62 17 315
Innovation Output 204 27 45 18 295
Employment Level 105 54 108 13 282
Fiscal & Macroeconomic 132 69 43 26 277
Consumer Welfare 117 63 42 11 233
Firm Revenue 154 48 26 3 231
Task Completion Time 173 31 8 12 225
Inequality Measures 44 123 50 6 223
Worker Satisfaction 89 65 22 12 188
Error Rate 71 92 10 2 175
Regulatory Compliance 77 69 14 5 165
Automation Exposure 58 56 26 13 156
Training Effectiveness 96 21 14 19 152
Wages & Compensation 77 37 25 6 145
Team Performance 86 17 27 10 141
Developer Productivity 95 17 14 6 133
Job Displacement 12 81 21 1 115
Hiring & Recruitment 52 7 8 3 70
Creative Output 32 20 8 3 64
Skill Obsolescence 5 47 6 1 59
Social Protection 28 16 8 2 54
Labor Share of Income 17 19 17 53
Worker Turnover 11 12 3 26
Industry 1 1

Filter: Human-AI Collaboration
To enable scalable evaluation, we use a rubric-based visual language model (VLM) judge and validate its reliability through human annotation.
Method and validation claim from the abstract: the benchmark uses a rubric-based VLM judge and validates it against human annotations.
high positive MUSE: Benchmarking Manufacturable, Functional, and Assemblab... reliability of rubric-based VLM judge (agreement with human annotation)
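The entry above does not say which agreement statistic was used to validate the judge. As a minimal sketch, assuming categorical rubric labels, one common choice is Cohen's kappa, which corrects raw agreement for agreement expected by chance. The function name and the example labels are illustrative, not taken from the paper.

```python
from collections import Counter

def cohens_kappa(judge_labels, human_labels):
    """Cohen's kappa between two equal-length sequences of categorical labels."""
    if len(judge_labels) != len(human_labels) or not judge_labels:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(judge_labels)
    # Observed agreement: fraction of items where both raters gave the same label.
    observed = sum(j == h for j, h in zip(judge_labels, human_labels)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    judge_counts = Counter(judge_labels)
    human_counts = Counter(human_labels)
    expected = sum(judge_counts[c] * human_counts[c] for c in judge_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)

# Invented labels for illustration only.
judge = ["pass", "pass", "fail", "fail"]
human = ["pass", "pass", "fail", "pass"]
kappa = cohens_kappa(judge, human)
```

A kappa near 1 indicates agreement well above chance; a value near 0 indicates agreement no better than chance.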
The final stage uses design-specific rubrics to assess functionality, manufacturability, and assemblability, moving beyond shape matching toward practical design quality.
Paper's description of the benchmark's evaluation rubric and intended assessment criteria (abstract).
high positive MUSE: Benchmarking Manufacturable, Functional, and Assemblab... assessed functionality, manufacturability, and assemblability of generated CAD m...
MUSE pairs practical design instances with structured Design Specifications and evaluates generated models through a three-stage protocol: code check, geometric check, and design-intent alignment.
Methodological description in abstract indicating dataset pairing and three-stage evaluation protocol.
high positive MUSE: Benchmarking Manufacturable, Functional, and Assemblab... evaluation pipeline effectiveness (code executability, geometric validity, desig...
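The three-stage protocol named above (code check, geometric check, design-intent alignment) can be pictured as an ordered pipeline. The sketch below assumes that each stage gates the next, which the entry does not confirm, and every check function is a hypothetical stand-in rather than the benchmark's actual logic.

```python
from typing import Callable, Dict, List, Tuple

# Each stage is a (name, check) pair; check returns True when the candidate passes.
Stage = Tuple[str, Callable[[str], bool]]

def run_staged_evaluation(candidate: str, stages: List[Stage]) -> Dict[str, object]:
    """Run the checks in order and stop at the first failing stage."""
    passed: List[str] = []
    for name, check in stages:
        if not check(candidate):
            return {"passed": passed, "failed_at": name}
        passed.append(name)
    return {"passed": passed, "failed_at": None}

def compiles(source: str) -> bool:
    """Placeholder code check: does the text parse as Python source?"""
    try:
        compile(source, "<candidate>", "exec")
        return True
    except SyntaxError:
        return False

# Hypothetical stand-ins for the geometric and design-intent stages.
stages: List[Stage] = [
    ("code check", compiles),
    ("geometric check", lambda src: "box" in src),
    ("design-intent alignment", lambda src: "hole" in src),
]

result = run_staged_evaluation("box = 1\nhole = 2", stages)
```

Recording the stage at which a candidate fails keeps syntax failures separate from geometry or design-intent failures when results are aggregated.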
We introduce MUSE, a Text-to-CAD benchmark focused on complex, editable boundary representation (B-Rep) assemblies.
Paper contribution / dataset creation described in abstract; supported by project website and accompanying dataset/code.
high positive MUSE: Benchmarking Manufacturable, Functional, and Assemblab... availability of a Text-to-CAD benchmark for complex B-Rep assemblies
By Round 3, equity-aware LLM refinement reduces energy costs by 3.2%.
Empirical results reported in abstract: energy cost reduction of 3.2% after three rounds of LLM-mediated reward refinement (15 experimental runs).
By Round 3, equity-aware LLM refinement improves satisfaction for Elderly Females (+567%).
Empirical results reported in abstract following three rounds of LLM-based reward refinement; improvement magnitude given as +567%. 15 experimental runs.
high positive OccuReward: LLM-Guided Occupant-Centric Reward Shaping for D... satisfaction for Elderly Females
By Round 3, equity-aware LLM refinement improves satisfaction for Health Sensitive (+53.8%).
Empirical results reported in abstract following three rounds of LLM-based reward refinement; improvement magnitude given as +53.8%. 15 experimental runs.
high positive OccuReward: LLM-Guided Occupant-Centric Reward Shaping for D... satisfaction for Health Sensitive occupants
By Round 3, equity-aware LLM refinement improves satisfaction for Mid-aged Females (+28.2%).
Empirical results reported in abstract following three rounds of LLM-based reward refinement; improvement magnitude given as +28.2%. 15 experimental runs.
high positive OccuReward: LLM-Guided Occupant-Centric Reward Shaping for D... satisfaction for Mid-aged Females
By Round 3, equity-aware LLM refinement improves satisfaction for Young Males (+17.6%).
Empirical results reported in abstract following three rounds of LLM-based reward refinement; improvement magnitude given as +17.6%. 15 experimental runs.
high positive OccuReward: LLM-Guided Occupant-Centric Reward Shaping for D... satisfaction for Young Males
We introduce the Comfort Equity Index (CEI) as a novel feedback signal.
Paper contribution / methodological description introducing CEI (no quantitative validation details reported in abstract).
high positive OccuReward: LLM-Guided Occupant-Centric Reward Shaping for D... Comfort Equity Index (CEI)
The paper provides a conceptual foundation for designing AI systems that model expert sensing over time, positioning cognition as an infrastructural, operational, and professional domain in persistent human-AI systems.
Stated contribution of the paper (conceptual/theoretical contribution rather than empirical evidence).
high positive Tacit Signal Infrastructure: Towards AI Systems that Model E... conceptual-foundation-for-tacit-sensing-in-AI-design
The Cognitive Operations Research and Training Framework (CORTF) is introduced to support research, education, and workforce development.
Conceptual framework proposed in the paper (no empirical implementation or evaluation presented).
high positive Tacit Signal Infrastructure: Towards AI Systems that Model E... research-and-training-framework-for-cognitive-operations
The Cognitive Operations Manager is proposed as a prototype AI-native professional role for coordinating tacit signal modelling, semantic modelling, AI system calibration, expert validation, and ethical governance.
Proposal of a new professional role in the paper (conceptual/visionary; no pilot study, job analysis, or workforce data reported).
high positive Tacit Signal Infrastructure: Towards AI Systems that Model E... creation-and-coordination-of-a-new-AI-native-professional-role
Long-term Cognitive Operations are defined as the practices required to maintain and govern such systems, including memory curation, semantic organisation, tacit signal modelling, reasoning calibration, and cognitive governance.
Conceptual taxonomy/definition introduced in the paper (theoretical framing; no empirical validation).
high positive Tacit Signal Infrastructure: Towards AI Systems that Model E... practices-for-maintaining-and-governing-tacit-sensing-systems
Tacit Signal Infrastructure is introduced as a layer for capturing, structuring, modelling, interpreting, and validating expert tacit signals over time.
Conceptual design/proposal presented in the paper (architectural description; no empirical implementation or evaluation reported).
high positive Tacit Signal Infrastructure: Towards AI Systems that Model E... infrastructure-capability-for-tacit-signal-management
Next-generation AI systems should move beyond explicit knowledge processing toward the longitudinal modelling of expert tacit sensing.
Normative proposal / recommendation made in the paper as part of a vision; supported by conceptual rationale rather than empirical data.
high positive Tacit Signal Infrastructure: Towards AI Systems that Model E... longitudinal-tacit-sensing-modelling-adoption
High-level expertise also depends on tacit sensing: perceiving weak signals, recognising emerging tensions, detecting coherence degradation, and anticipating instability before formal indicators appear.
Conceptual claim grounded in cognitive-science-informed argumentation presented in the paper (no empirical study or sample size reported).
high positive Tacit Signal Infrastructure: Towards AI Systems that Model E... expert-tacit-sensing-capability
Current generative AI systems are increasingly effective at processing explicit knowledge, including retrieving information, summarising documents, generating explanations, and supporting codified workflows.
Asserted in the paper as a descriptive trend; based on literature synthesis and observations of current generative AI capabilities (no empirical sample or experiment reported in the paper).
high positive Tacit Signal Infrastructure: Towards AI Systems that Model E... explicit-knowledge-processing-capability (retrieval, summarisation, explanation,...
To close this gap, we recommend calibrated confidence, evidence-grounded explanations, and mechanisms that help users refine trust.
Authors' recommendations based on observed shortcomings in human–AI collaboration in the study (no direct experimental test of these interventions reported in the abstract).
high positive AI, Take the Wheel: What Drives Delegation and Trust in Huma... improvements in human–AI trust and collaboration (proposed, not empirically test...
Human–AI collaboration performs better than either AI or humans alone.
Comparison of collaborative team performance versus AI-alone and human-alone performance reported from the experiment.
high positive AI, Take the Wheel: What Drives Delegation and Trust in Huma... team performance (win rate/accuracy) of human–AI collaboration compared to AI-on...
Two non-negotiable design requirements guide the architecture: cognitive-load redistribution (DR1) and bounded autonomy with alignment (DR2).
Design requirements explicitly stated in the paper guiding the HARMONY architecture.
high positive From Replacement to Orchestration: A Socio-Technical Archite... degree to which design reduces researcher cognitive load and constrains agentic ...
The model introduces 'Orchestration Leverage' as a candidate productivity metric suited to human–agent hybrid systems.
Conceptual proposal within the paper (new metric introduced as part of HARMONY).
high positive From Replacement to Orchestration: A Socio-Technical Archite... productivity of human–agent hybrid research teams (via proposed metric)
We propose HARMONY (Hybrid Agentic Research Model for Organisational New Yield), a four-pillar socio-technical architecture comprising ResOps (Industrialized Execution), the Control Tower (Strategic Visibility and Drift Detection), the Ethics Fabric (Bounded Autonomy by Design), and the Talent Studio (Sciencepreneur Capability).
Design Science Research artifact (proposed operating model described in the paper).
high positive From Replacement to Orchestration: A Socio-Technical Archite... organizational capability to conduct agentic R&D / R&D productivity
The framework establishes a principled vocabulary for designing enterprise service platforms that manage human and artificial intelligence labor responsibly, transparently, and at scale.
Paper presents the combined constructs (Workforce Unit Abstraction, Hybrid Capacity Model, Governance-bound Autonomy) as a coherent reference model and vocabulary; described as conceptual contribution arising from the design-science approach.
high positive Workforce Unit Abstraction for Governing Hybrid Human and Ar... availability of a principled vocabulary/reference model for enterprise hybrid wo...
Governance-bound autonomy constrains AI Workforce Unit actions within a five-level, policy-enforced autonomy ladder supported by six mandatory governance controls.
Conceptual governance artifact described in the paper (five-level autonomy ladder + six governance controls); presented as the proposed governance design, not as an empirically tested intervention in the abstract.
high positive Workforce Unit Abstraction for Governing Hybrid Human and Ar... degree of constrained autonomy for AI workforce units (policy-enforced levels an...
The Hybrid Capacity Model extends demand-to-supply planning across heterogeneous workforce pools, resolving a multi-objective allocation problem that simultaneously optimizes cost, quality, and risk constraints.
Described model/algorithmic artifact in the paper (Hybrid Capacity Model) claiming multi-objective optimization; no empirical benchmark or sample size reported in the provided text.
high positive Workforce Unit Abstraction for Governing Hybrid Human and Ar... ability to allocate demand-to-supply across heterogeneous (human + AI) workforce...
The Workforce Unit Abstraction defines a unified seven-attribute operational schema applicable to both human workers and AI agents, enabling consistent representation across planning, scheduling, and governance systems.
Artifact description from the paper (Workforce Unit Abstraction with seven attributes); presented as a designed schema rather than an empirically validated result in the abstract.
high positive Workforce Unit Abstraction for Governing Hybrid Human and Ar... consistency of representation of human and AI workforce units across planning, s...
This article introduces three constructs as reusable primitives for hybrid workforce platform design.
Design science research methodology producing an artifact (three constructs); described as the paper's contribution. No empirical evaluation or sample size reported in the abstract.
high positive Workforce Unit Abstraction for Governing Hybrid Human and Ar... availability of design primitives for hybrid workforce platforms
Augment Engineering completes a three-discipline progression: Prompt Engineering (one tool), Context Engineering (reproducible pipelines), Augment Engineering (a portfolio of tools across domains).
Conceptual framing presented in the paper describing a proposed progression of disciplines.
high positive Augment Engineering: A Methodology for Multi-Tool AI Orchest... conceptual progression among related disciplines
A Wright's Law fit (n = 82 artifacts, p < 0.01) shows production acceleration across the artifact portfolio.
Quantitative model reported in the paper: Wright's Law fit on 82 artifacts with reported p-value < 0.01.
high positive Augment Engineering: A Methodology for Multi-Tool AI Orchest... production acceleration (learning curve effects) across produced artifacts
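Wright's Law models the cost (or time) of the n-th unit as a power law of cumulative output, cost_n = a * n^b, with b < 0 when production accelerates. The entry does not describe how the fit was computed, so the following is only a sketch of the usual approach: ordinary least squares on log-transformed values. The data are synthetic and the function name is an assumption.

```python
import math

def fit_wrights_law(unit_costs):
    """Fit cost_n = a * n**b by least squares in log-log space.

    unit_costs[i] is the cost (or time) of the (i+1)-th unit produced.
    Returns (a, b, learning_rate), where learning_rate = 1 - 2**b is the
    fractional cost reduction per doubling of cumulative output.
    """
    if len(unit_costs) < 2:
        raise ValueError("need at least two units to fit a curve")
    xs = [math.log(n) for n in range(1, len(unit_costs) + 1)]
    ys = [math.log(c) for c in unit_costs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                         # slope of the log-log line
    a = math.exp(mean_y - b * mean_x)     # intercept, back on the original scale
    return a, b, 1 - 2 ** b

# Synthetic, noise-free data for 82 units; not the paper's measurements.
costs = [10.0 * n ** -0.3 for n in range(1, 83)]
a, b, learning_rate = fit_wrights_law(costs)
```

A negative exponent b corresponds to acceleration; the significance level reported in the entry would come from a test on that slope, which this sketch does not compute.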
A Cochran-Armitage trend test (n = 200 interactions across two chat LLMs, p < 0.01) shows first-pass acceptance rising with prompt-sophistication level.
Quantitative test reported in the paper: Cochran-Armitage trend test on 200 interactions across two chat LLMs, reported p-value < 0.01.
high positive Augment Engineering: A Methodology for Multi-Tool AI Orchest... first-pass acceptance rate of generated outputs as a function of prompt sophisti...
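The Cochran-Armitage test checks for a linear trend in a proportion across ordered groups, here acceptance rate across prompt-sophistication levels. The sketch below implements the standard asymptotic form with equally spaced scores as the default; the scores the paper used are not stated in the entry, and the counts are invented for illustration.

```python
import math

def cochran_armitage(successes, totals, scores=None):
    """Cochran-Armitage trend test for ordered groups.

    successes[i] and totals[i] are the accepted and total counts in group i.
    Returns (z, two_sided_p_value) using the normal approximation.
    """
    if scores is None:
        scores = list(range(len(successes)))  # assumed equally spaced levels
    n = sum(totals)
    p_bar = sum(successes) / n                # pooled success proportion
    # Score-weighted deviation of observed successes from the pooled expectation.
    t_stat = sum(s * (r - m * p_bar) for s, r, m in zip(scores, successes, totals))
    sum_ms = sum(m * s for s, m in zip(scores, totals))
    sum_ms2 = sum(m * s * s for s, m in zip(scores, totals))
    variance = p_bar * (1 - p_bar) * (sum_ms2 - sum_ms ** 2 / n)
    z = t_stat / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value

# Invented counts: acceptance rising across four sophistication levels.
z, p_value = cochran_armitage([8, 16, 24, 32], [40, 40, 40, 40])
```

A large positive z indicates that acceptance rises with level; when every group has the same proportion, z is zero.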
A 5-month formative case study (Nov 2025 to Mar 2026) documents a single practitioner applying Augment Engineering skills across a ten-component orchestration stack spanning seven professional domains, producing work products that would traditionally involve separate domain specialists.
Case study reported in the paper describing one practitioner's activities over five months across a 10-component stack in seven domains; sample size = 1 practitioner.
high positive Augment Engineering: A Methodology for Multi-Tool AI Orchest... ability of one practitioner to produce cross-domain work products that tradition...
The paper presents a six-phase orchestration methodology and four portability metrics for Augment Engineering.
Stated methodological contribution within the paper (description of methodology and metrics).
high positive Augment Engineering: A Methodology for Multi-Tool AI Orchest... methodology and metrics for orchestration and portability
Augment Engineering is a discipline of orchestrating multiple purpose-built AI tools across distinct professional domains, applying prompt and context engineering as portable competencies that transfer across tool boundaries.
Definition and conceptual development presented in the paper (methodological contribution).
high positive Augment Engineering: A Methodology for Multi-Tool AI Orchest... existence/definition of a new discipline (Augment Engineering)
Prompt engineering (interaction-level optimization) and context engineering (structured input pipeline design) are domain-portable meta-skills: a practitioner who masters them can apply them to any purpose-built AI tool in any domain.
Conceptual claim supported by the paper's argumentation and exemplified by a single-practitioner case study.
high positive Augment Engineering: A Methodology for Multi-Tool AI Orchest... portability of prompt and context engineering skills across tools and domains
The framework has implications for digital health, education, AI personalisation, and personal agency.
Authors' discussion in paper of potential implications across these application domains; presented qualitatively.
high positive You Are in Control of Your State: Why Human Outcomes Are Con... implications for listed application domains
The authors list six operational requirements for state-aware systems.
Explicit statement in paper that six operational requirements are listed; descriptive rather than empirically tested in abstract.
high positive You Are in Control of Your State: Why Human Outcomes Are Con... number of operational requirements
The authors derive seven testable predictions from the state-aware framework.
Explicit statement in paper that seven testable predictions are derived from the framework; no individual prediction effects quantified in abstract.
high positive You Are in Control of Your State: Why Human Outcomes Are Con... number of derived testable predictions
The paper is supported by a 24-month observational base from a deployed behavioural platform spanning more than 200,000 consented users across four occupational personas (research period 2023 to 2026).
Empirical dataset described in the paper: observational deployment over 24 months, >200,000 consented users, four occupational personas, timeframe given (2023–2026).
high positive You Are in Control of Your State: Why Human Outcomes Are Con... existence and scale of observational dataset
The framework is motivated by six strands of established evidence: causal inference, predictive processing, allostasis, attentional bottleneck, chronobiology, and computational psychiatry.
Explicit statement in paper describing the literature strands used to motivate the framework.
high positive You Are in Control of Your State: Why Human Outcomes Are Con... theoretical grounding of framework
Taken together, these claims imply that the outcome of a given event is controllable, conditional on the state trajectory at the time of intervention.
Synthesis/implication drawn by authors from the conceptual framework and the six literature strands; argued but not quantified in abstract.
high positive You Are in Control of Your State: Why Human Outcomes Are Con... conditional controllability of event outcomes
The conscious channel through which outcomes are reportable is a narrow attentional bottleneck whose contents are themselves state-dependent.
Theoretical claim supported by attentional bottleneck literature cited in the paper; presented as part of the conceptual framework.
high positive You Are in Control of Your State: Why Human Outcomes Are Con... attentional bottleneck content dependency on state
The weighting vector (state) is dynamic at sub-daily timescales.
Claim motivated by chronobiology and related literature cited in the paper; authors state the sub-daily dynamism as part of their framework.
high positive You Are in Control of Your State: Why Human Outcomes Are Con... temporal dynamics of latent state
The relationship between state, decision, and outcome is causal rather than correlational.
Argument grounded in causal inference literature cited by the authors; presented as a core theoretical claim in the paper rather than demonstrated by a specific randomized experiment in the abstract.
high positive You Are in Control of Your State: Why Human Outcomes Are Con... causal influence of state on decisions/outcomes
A state can be defined as the time-indexed weighting vector over the dimensions that govern how an individual's biology, physiology, and neuropsychology process the next event into a decision and an outcome.
Explicit definitional claim / framework component introduced by the authors; justified conceptually via multidisciplinary literature cited in the paper.
high positive You Are in Control of Your State: Why Human Outcomes Are Con... conceptual definition of latent state
Human outcomes are controllable in a precise and operational sense through interventions that target the state and its weighting at the moment a decision is being formed.
Theoretical argument in the paper, motivated by the six literature strands; supported in part by the authors' deployed behavioural platform (see separate claim about dataset) but no randomized effect sizes reported in abstract.
high positive You Are in Control of Your State: Why Human Outcomes Are Con... controllability of outcomes via state-targeted interventions
This persistent variability is best modelled as a dynamic, time-varying latent state of the person.
Conceptual claim supported by integration of six strands of established evidence (causal inference, predictive processing, allostasis, attentional bottleneck, chronobiology, computational psychiatry) cited in the paper.
high positive You Are in Control of Your State: Why Human Outcomes Are Con... attribution of outcome variance to latent state
Within-person variability persists: the same individual, presented with the same observable input, produces different outcomes on different occasions, and different individuals produce divergent outcomes that no observable covariate fully predicts.
Statement motivated by literature review across behavioural sciences; argued in paper as empirical puzzle rather than proven with new statistics in this manuscript.
high positive You Are in Control of Your State: Why Human Outcomes Are Con... variation in individual outcomes / decisions
Agents share successes and failures to reduce redundant exploration during long-running experiments.
The AutoScientists design includes mechanisms for recording and sharing experimental outcomes; the paper asserts that this reduces redundant exploration (a qualitative claim, supported by experimental comparisons).
high positive AutoScientists: Self-Organizing Agent Teams for Long-Running... redundant exploration (qualitative/system-level reduction)
Applied without modification across all 217 ProteinGym assays, the same method improves over the prior state of the art by +6.5% (Spearman correlation).
Empirical evaluation across all 217 assays in the ProteinGym benchmark; reported aggregate improvement in Spearman correlation versus prior state-of-the-art.
high positive AutoScientists: Self-Organizing Agent Teams for Long-Running... Spearman correlation averaged across 217 ProteinGym assays
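Spearman correlation measures how well predicted scores preserve the rank order of measured values, which is why it suits fitness-prediction benchmarks. The sketch below is a plain implementation (Pearson correlation on average ranks, with ties sharing a rank); how the paper aggregates per-assay values into the reported figure is not stated in the entry, and the example numbers are made up.

```python
import math

def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

# Made-up predicted and measured scores for five variants.
rho = spearman([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
```

Because only ranks matter, any monotonic rescaling of the predictions leaves the score unchanged.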