The Commonplace
Home Dashboard Papers Evidence Syntheses Digests 🎲

Evidence (13827 claims)

Adoption
8454 claims
Productivity
7544 claims
Governance
6789 claims
Human-AI Collaboration
6327 claims
Org Design
4126 claims
Innovation
4058 claims
Labor Markets
3520 claims
Skills & Training
2924 claims
Inequality
2057 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 749 195 97 889 1979
Governance & Regulation 815 391 188 121 1539
Organizational Efficiency 771 189 124 83 1177
Technology Adoption Rate 624 233 123 96 1084
Research Productivity 410 121 56 331 929
Output Quality 466 177 59 47 749
Decision Quality 320 174 75 42 618
Firm Productivity 435 55 88 20 604
AI Safety & Ethics 214 276 65 33 593
Market Structure 178 166 122 24 495
Task Allocation 206 64 70 31 376
Skill Acquisition 165 57 60 17 299
Innovation Output 201 27 41 18 288
Employment Level 105 51 107 13 278
Fiscal & Macroeconomic 131 69 43 26 276
Consumer Welfare 116 63 42 11 232
Firm Revenue 149 46 26 3 224
Inequality Measures 44 122 49 6 221
Task Completion Time 169 29 8 12 219
Worker Satisfaction 89 61 20 12 182
Error Rate 69 91 10 2 172
Regulatory Compliance 76 68 14 5 163
Training Effectiveness 92 19 13 19 145
Wages & Compensation 77 36 25 6 144
Automation Exposure 51 54 22 12 142
Team Performance 86 17 27 9 140
Developer Productivity 94 17 14 6 132
Job Displacement 12 80 20 1 113
Hiring & Recruitment 51 7 8 3 69
Skill Obsolescence 5 45 6 1 57
Creative Output 31 16 7 2 57
Social Protection 27 16 8 2 53
Labor Share of Income 17 17 17 51
Worker Turnover 11 12 3 26
Industry 1 1
A Wright's Law fit (n = 82 artifacts, p < 0.01) shows production acceleration across the artifact portfolio.
Quantitative model reported in the paper: Wright's Law fit on 82 artifacts with reported p-value < 0.01.
high positive Augment Engineering: A Methodology for Multi-Tool AI Orchest... production acceleration (learning curve effects) across produced artifacts
A Cochran-Armitage trend test (n = 200 interactions across two chat LLMs, p < 0.01) shows first-pass acceptance rising with prompt-sophistication level.
Quantitative test reported in the paper: Cochran-Armitage trend test on 200 interactions across two chat LLMs, reported p-value < 0.01.
high positive Augment Engineering: A Methodology for Multi-Tool AI Orchest... first-pass acceptance rate of generated outputs as a function of prompt sophisti...
A 5-month formative case study (Nov 2025 to Mar 2026) documents a single practitioner applying Augment Engineering skills across a ten-component orchestration stack spanning seven professional domains, producing work products that would traditionally involve separate domain specialists.
Case study reported in the paper describing one practitioner's activities over five months across a 10-component stack in seven domains; sample size = 1 practitioner.
high positive Augment Engineering: A Methodology for Multi-Tool AI Orchest... ability of one practitioner to produce cross-domain work products that tradition...
The paper presents a six-phase orchestration methodology and four portability metrics for Augment Engineering.
Stated methodological contribution within the paper (description of methodology and metrics).
high positive Augment Engineering: A Methodology for Multi-Tool AI Orchest... methodology and metrics for orchestration and portability
Augment Engineering is a discipline of orchestrating multiple purpose-built AI tools across distinct professional domains, applying prompt and context engineering as portable competencies that transfer across tool boundaries.
Definition and conceptual development presented in the paper (methodological contribution).
high positive Augment Engineering: A Methodology for Multi-Tool AI Orchest... existence/definition of a new discipline (Augment Engineering)
Prompt engineering (interaction-level optimization) and context engineering (structured input pipeline design) are domain-portable meta-skills: a practitioner who masters them can apply them to any purpose-built AI tool in any domain.
Conceptual claim supported by the paper's argumentation and exemplified by a single-practitioner case study.
high positive Augment Engineering: A Methodology for Multi-Tool AI Orchest... portability of prompt and context engineering skills across tools and domains
The framework has implications for digital health, education, AI personalisation, and personal agency.
Authors' discussion in paper of potential implications across these application domains; presented qualitatively.
high positive You Are in Control of Your State: Why Human Outcomes Are Con... implications for listed application domains
The authors list six operational requirements for state-aware systems.
Explicit statement in paper that six operational requirements are listed; descriptive rather than empirically tested in abstract.
high positive You Are in Control of Your State: Why Human Outcomes Are Con... number of operational requirements
The authors derive seven testable predictions from the state-aware framework.
Explicit statement in paper that seven testable predictions are derived from the framework; no individual prediction effects quantified in abstract.
high positive You Are in Control of Your State: Why Human Outcomes Are Con... number of derived testable predictions
The paper is supported by a 24-month observational base from a deployed behavioural platform spanning more than 200,000 consented users across four occupational personas (research period 2023 to 2026).
Empirical dataset described in the paper: observational deployment over 24 months, >200,000 consented users, four occupational personas, timeframe given (2023–2026).
high positive You Are in Control of Your State: Why Human Outcomes Are Con... existence and scale of observational dataset
The framework is motivated by six strands of established evidence: causal inference, predictive processing, allostasis, attentional bottleneck, chronobiology, and computational psychiatry.
Explicit statement in paper describing the literature strands used to motivate the framework.
high positive You Are in Control of Your State: Why Human Outcomes Are Con... theoretical grounding of framework
Taken together, these claims imply that the outcome of a given event is controllable, conditionally, on the state-trajectory at the time of intervention.
Synthesis/implication drawn by authors from the conceptual framework and the six literature strands; argued but not quantified in abstract.
high positive You Are in Control of Your State: Why Human Outcomes Are Con... conditional controllability of event outcomes
The conscious channel through which outcomes are reportable is a narrow attentional bottleneck whose contents are themselves state-dependent.
Theoretical claim supported by attentional bottleneck literature cited in the paper; presented as part of the conceptual framework.
high positive You Are in Control of Your State: Why Human Outcomes Are Con... attentional bottleneck content dependency on state
The weighting vector (state) is dynamic at sub-daily timescales.
Claim motivated by chronobiology and related literature cited in the paper; authors state the sub-daily dynamism as part of their framework.
high positive You Are in Control of Your State: Why Human Outcomes Are Con... temporal dynamics of latent state
The relationship between state, decision, and outcome is causal rather than correlational.
Argument grounded in causal inference literature cited by the authors; presented as a core theoretical claim in the paper rather than demonstrated by a specific randomized experiment in the abstract.
high positive You Are in Control of Your State: Why Human Outcomes Are Con... causal influence of state on decisions/outcomes
A state can be defined as the time-indexed weighting vector over the dimensions that govern how an individual's biology, physiology, and neuropsychology process the next event into a decision and an outcome.
Explicit definitional claim / framework component introduced by the authors; justified conceptually via multidisciplinary literature cited in the paper.
high positive You Are in Control of Your State: Why Human Outcomes Are Con... conceptual definition of latent state
Human outcomes are controllable in a precise and operational sense through interventions that target the state and its weighting at the moment a decision is being formed.
Theoretical argument in the paper, motivated by the six literature strands; supported in part by the authors' deployed behavioural platform (see separate claim about dataset) but no randomized effect sizes reported in abstract.
high positive You Are in Control of Your State: Why Human Outcomes Are Con... controllability of outcomes via state-targeted interventions
This persistent variability belongs in a dynamic latent state of the person (i.e., is best modelled as a time-varying latent state).
Conceptual claim supported by integration of six strands of established evidence (causal inference, predictive processing, allostasis, attentional bottleneck, chronobiology, computational psychiatry) cited in the paper.
high positive You Are in Control of Your State: Why Human Outcomes Are Con... attribution of outcome variance to latent state
Within-person variability persists: the same individual, presented with the same observable input, produces different outcomes on different occasions, and different individuals produce divergent outcomes that no observable covariate fully predicts.
Statement motivated by literature review across behavioural sciences; argued in paper as empirical puzzle rather than proven with new statistics in this manuscript.
high positive You Are in Control of Your State: Why Human Outcomes Are Con... variation in individual outcomes / decisions
Agents share successes and failures to reduce redundant exploration during long-running experiments.
Design of AutoScientists includes mechanisms for recording and sharing experimental outcomes; asserted benefit in paper that this reduces redundant exploration (qualitative and supported by experimental comparisons).
high positive AutoScientists: Self-Organizing Agent Teams for Long-Running... redundant exploration (qualitative/system-level reduction)
Applied without modification across all 217 ProteinGym assays, the same method improves over the prior state of the art by +6.5% (Spearman correlation).
Empirical evaluation across all 217 assays in the ProteinGym benchmark; reported aggregate improvement in Spearman correlation versus prior state-of-the-art.
high positive AutoScientists: Self-Organizing Agent Teams for Long-Running... Spearman correlation averaged across 217 ProteinGym assays
On ProteinGym fitness prediction, AutoScientists discovers a method for ACE2-Spike binding that improves over the current state-of-the-art model by +12.5% in Spearman correlation.
Empirical evaluation on the ACE2-Spike assay within the ProteinGym benchmark; reported relative improvement in Spearman correlation versus prior state-of-the-art.
high positive AutoScientists: Self-Organizing Agent Teams for Long-Running... Spearman correlation on ACE2-Spike binding fitness prediction
On GPT training optimization, AutoScientists continues discovering improvements from a starting champion where the single-agent approach finds none (7 vs. 0 accepted improvements).
Empirical comparison of discovered/accepted improvements during GPT training optimization; counts of accepted improvements for AutoScientists (7) versus single-agent approach (0).
high positive AutoScientists: Self-Organizing Agent Teams for Long-Running... count of accepted improvements discovered
On GPT training optimization, AutoScientists reaches a target validation bits-per-byte 1.9x faster than Autoresearch.
Empirical training-time comparison between AutoScientists and Autoresearch on GPT training optimization tasks; reported speedup multiplier to reach a validation bits-per-byte target.
high positive AutoScientists: Self-Organizing Agent Teams for Long-Running... time-to-target (validation bits-per-byte)
On BioML-Bench, spanning biomedical imaging, protein engineering, single-cell omics, and drug discovery, AutoScientists achieves a mean leaderboard percentile of 74.4% across 24 tasks, improving over the strongest AI agent by +8.33%.
Empirical evaluation on the BioML-Bench benchmark (24 tasks); reported mean leaderboard percentile and comparative improvement versus the strongest baseline agent.
high positive AutoScientists: Self-Organizing Agent Teams for Long-Running... leaderboard percentile across benchmark tasks
Under matched experimental budgets, AutoScientists improves over prior AI agents across biomedical machine learning, language-model training optimization, and protein fitness prediction.
Empirical comparisons reported in paper across multiple benchmark suites and tasks (BioML-Bench, GPT training optimization experiments, ProteinGym).
high positive AutoScientists: Self-Organizing Agent Teams for Long-Running... overall performance across multiple benchmarks
AutoScientists is a decentralized team of AI agents that interpret a shared experimental state, self-organize into teams around promising hypotheses, critique proposals before using experimental compute, and share successes and failures to reduce redundant exploration.
System design and implementation described in the paper (architecture and agent protocols); qualitative description of agent behaviors and coordination mechanisms; demonstrated in experiments.
high positive AutoScientists: Self-Organizing Agent Teams for Long-Running... agent coordination and information sharing (qualitative description)
We describe the benchmark design, evaluation protocol, and quality-control pipeline, and position OR-Space as a benchmark for studying the reliability, failure modes, and practical readiness of LLM agents in industrial OR workflows.
Statement of the paper's contributions and contents (methodological description of what the paper includes).
high positive OR-Space: A Full-Lifecycle Workspace Benchmark for Industria... capability to study reliability, failure modes, and readiness of LLM agents
By combining persistent workspaces with lifecycle-oriented tasks, OR-Space evaluates whether agents can perform reliable optimization work beyond end-to-end text generation.
Stated objective/claim in the paper about the benchmark's purpose and what it measures (conceptual/goal-oriented statement).
high positive OR-Space: A Full-Lifecycle Workspace Benchmark for Industria... reliability of LLM agents in performing optimization work (beyond text generatio...
OR-Space defines an Explain task mode, where agents answer grounded questions about solutions, constraints, and business implications using evidence spread across workspace artifacts.
Definition of the Explain task mode provided in the paper (design/specification).
high positive OR-Space: A Full-Lifecycle Workspace Benchmark for Industria... ability to generate grounded explanations using workspace evidence
OR-Space defines a Revise task mode, where agents modify existing models under changing requirements or solver feedback while preserving valid prior logic.
Definition of the Revise task mode in the benchmark design (descriptive claim in the paper).
high positive OR-Space: A Full-Lifecycle Workspace Benchmark for Industria... ability to revise models while preserving prior logic
OR-Space defines three task modes: Build, where agents construct solver-ready optimization models from heterogeneous artifacts.
Definition of one of the benchmark's task modes as described in the paper (method/design description).
high positive OR-Space: A Full-Lifecycle Workspace Benchmark for Industria... ability to construct solver-ready models
Each instance is an executable workspace containing business documents, structured data, optional code artifacts, solver outputs, and task-specific evaluators distributed across interdependent files.
Design specification of OR-Space provided in the paper (descriptive claim about benchmark instance structure).
high positive OR-Space: A Full-Lifecycle Workspace Benchmark for Industria... complexity and composition of benchmark instances
We introduce OR-Space, a full-lifecycle workspace benchmark for evaluating industrial optimization agents across model construction, model revision, and grounded explanation.
Paper presents and names a new benchmark (methodological contribution described directly in the text).
high positive OR-Space: A Full-Lifecycle Workspace Benchmark for Industria... capability of benchmarks to evaluate OR agents across lifecycle tasks
Large language model (LLM) agents are increasingly used to assist with operations research (OR) modeling.
Statement in the paper asserting an observed trend; likely based on literature/context motivating the work (no empirical sample or quantitative citation provided in the excerpt).
high positive OR-Space: A Full-Lifecycle Workspace Benchmark for Industria... LLM agent adoption in OR workflows
A recommended organizational design for the AI era is the 'resonance protocol enterprise' in which structures are temporary crystallizations, AI governance protects adaptive openness, and legitimacy derives from sustaining recursive renewal.
Normative/proposal in the paper outlining a new organizational design paradigm; presented as conceptual design without empirical pilot or evaluation.
high positive The Lantern in the Vault: AI, Crisis, and the Ontology of Or... organizational design aimed at sustaining adaptive renewal and legitimacy under ...
Digital transformation initially enhanced adaptability by fluidifying information flows and expanding relational connectivity, thereby improving some organizations' adaptability.
Theoretical claim supported by qualitative interpretation of digital transformation phenomena; no systematic measurement or reported sample.
high positive The Lantern in the Vault: AI, Crisis, and the Ontology of Or... organizational adaptability associated with digital transformation practices
Organizations capable of rapid relational reconfiguration, customer reconnection, and generative experimentation often proved more resilient during the pandemic.
Illustrative/theoretical interpretation of pandemic cases offered in the paper; no quantified sample or formal empirical evidence reported.
high positive The Lantern in the Vault: AI, Crisis, and the Ontology of Or... organizational resilience as a function of relational reconfiguration and experi...
Although AI creates obstacles, it also has the potential to be an important tool for creating innovative opportunities and continued growth if managed with sound practices.
Concluding statement in the paper's abstract presenting a normative/conditional conclusion based on the paper's evaluation and synthesis of evidence (no primary quantified results provided in the supplied text).
high positive Impact of Artificial Intelligence on Employment and Society innovation opportunities and continued economic/organizational growth under soun...
AI leads to the creation of new jobs.
The paper explicitly states it examines the creation of new jobs as a ramification of AI (abstract); claim presented qualitatively without reported sample sizes or quantified effect in the provided text.
high positive Impact of Artificial Intelligence on Employment and Society creation of new jobs / net employment effects
GENESIS is built on three composable primitives (agents, skills, hooks) and a knowledge layer (SYNAPSE) that doubles as the source of ground truth and the recipient of every artifact the framework produces, making capabilities compound across runs.
Architectural description in the paper; claim about knowledge base acting as ground truth and enabling capability compounding (design-level claim). No quantitative evaluation given in the abstract.
high positive GENESIS: Harnessing AI Agents for Autonomous 6G RAN Synthesi... accumulation/compounding of capabilities across runs (longitudinal improvement o...
GENESIS is an agentic AI framework that converts intents (e.g., a specification clause, a telemetry anomaly, or a research hypothesis) into solutions validated with over-the-air experiments, fed back into a persistent knowledge base.
System design / implementation claim presented in the paper (description of proposed framework). The abstract does not report empirical evaluation metrics or sample size.
high positive GENESIS: Harnessing AI Agents for Autonomous 6G RAN Synthesi... ability to produce solutions validated by over-the-air experiments (end-to-end R...
Large Language Models (LLMs) have compressed comparable R&D work in general software engineering from days to minutes.
Paper's stated comparison/claim (likely based on prior reports or authors' experience); no experimental details or sample size provided in the abstract.
high positive GENESIS: Harnessing AI Agents for Autonomous 6G RAN Synthesi... time to complete R&D/software engineering tasks
Operational reasoning paradigms such as ReasonOps may become foundational infrastructure for next-generation trustworthy AI ecosystems.
Author's forward-looking argument / conjecture about the potential future impact and adoption of operational reasoning paradigms; presented as an argument rather than demonstrated empirically in the excerpt.
high positive ReasonOps: A Unified Operational Paradigm for Trustworthy Ve... future adoption / foundational role of operational reasoning paradigms
The paper presents the ReasonOps architecture, demonstrates its workflow using an autonomous braking system analysis example, and discusses its potential role in future safety-critical autonomous AI systems.
Author statement about the paper's content and demonstration (explicitly claims an architecture and an example walkthrough); evidence is the paper's own descriptive content.
high positive ReasonOps: A Unified Operational Paradigm for Trustworthy Ve... presence of architecture and example demonstration in the paper
The proposed paradigm integrates semantic interpretation, autoformalization, symbolic reasoning, theorem proving, runtime assurance, probabilistic reliability estimation, and adaptive correction into a unified reasoning lifecycle.
Author claim about the architecture and components of ReasonOps; presented as a proposed integrated lifecycle in the paper (no empirical evaluation reported in excerpt).
high positive ReasonOps: A Unified Operational Paradigm for Trustworthy Ve... integration of multiple reasoning and assurance components
ReasonOps treats reasoning as a continuously monitored, verifiable, reliability-aware operational process rather than an isolated inference task.
Author description of the ReasonOps paradigm and its operational stance (conceptual framework described in paper).
high positive ReasonOps: A Unified Operational Paradigm for Trustworthy Ve... operationalization of reasoning processes (monitoring, verification, reliability...
This paper introduces ReasonOps, a unified operational paradigm for trustworthy verified reasoning systems.
Declarative claim about the paper's contribution (introduction of a named paradigm); supported by the paper itself (architectural description and example claimed).
high positive ReasonOps: A Unified Operational Paradigm for Trustworthy Ve... existence/introduction of an operational paradigm (ReasonOps)
Recent advances in theorem proving, autoformalization, symbolic reasoning, and tool-augmented language models demonstrate substantial progress toward machine-assisted formal reasoning.
Author statement citing multiple research directions (theorem proving, autoformalization, symbolic reasoning, tool-augmented LMs); no specific empirical results or quantitative studies provided in excerpt.
high positive ReasonOps: A Unified Operational Paradigm for Trustworthy Ve... progress toward machine-assisted formal reasoning
Large Language Models (LLMs) have transformed artificial intelligence from primarily generative systems into increasingly capable reasoning agents.
Author assertion in paper's introduction; conceptual argument referencing recent developments in LLMs (no empirical study or sample size reported in text excerpt).
high positive ReasonOps: A Unified Operational Paradigm for Trustworthy Ve... capability of LLMs to perform reasoning