The Commonplace

Evidence (6491 claims)

Adoption (8570 claims)
Productivity (7631 claims)
Governance (6869 claims)
Human-AI Collaboration (6491 claims)
Org Design (4175 claims)
Innovation (4114 claims)
Labor Markets (3566 claims)
Skills & Training (2966 claims)
Inequality (2066 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 758 199 100 900 2007
Governance & Regulation 826 400 191 122 1563
Organizational Efficiency 777 193 124 84 1189
Technology Adoption Rate 635 233 124 97 1098
Research Productivity 422 128 57 336 954
Output Quality 476 179 59 47 761
Decision Quality 328 177 81 47 640
Firm Productivity 435 57 88 20 606
AI Safety & Ethics 218 277 65 33 599
Market Structure 180 170 123 24 502
Task Allocation 213 64 72 33 387
Skill Acquisition 170 61 61 17 309
Innovation Output 203 27 43 18 292
Employment Level 105 54 107 13 281
Fiscal & Macroeconomic 131 69 43 26 276
Consumer Welfare 117 63 42 11 233
Firm Revenue 153 48 26 3 230
Task Completion Time 173 31 8 12 225
Inequality Measures 44 122 49 6 221
Worker Satisfaction 89 65 22 12 188
Error Rate 69 92 10 2 173
Regulatory Compliance 77 69 14 5 165
Automation Exposure 56 56 26 13 154
Training Effectiveness 94 21 13 19 149
Wages & Compensation 77 36 25 6 144
Team Performance 86 17 27 10 141
Developer Productivity 95 17 14 6 133
Job Displacement 12 80 20 1 113
Hiring & Recruitment 52 7 8 3 70
Creative Output 31 18 8 3 61
Skill Obsolescence 5 46 6 1 58
Social Protection 27 16 8 2 53
Labor Share of Income 17 19 17 53
Worker Turnover 11 12 3 26
Industry 1 1
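
The matrix above can be read programmatically. The following is a minimal sketch, not part of the site: the function names `parse_row` and `positive_share` are hypothetical, and the sample rows are copied from the table. Because the listed totals do not always equal the sum of the four direction columns (for example, Governance & Regulation sums to 1539 against a listed total of 1563), the share is computed over the four direction counts rather than the Total column. Rows with missing cells are skipped rather than guessed.

```python
import re

# A few rows copied from the matrix: outcome, positive, negative, mixed, null, total.
ROWS = """\
Governance & Regulation 826 400 191 122 1563
Firm Productivity 435 57 88 20 606
AI Safety & Ethics 218 277 65 33 599
Job Displacement 12 80 20 1 113
Industry 1 1
"""


def parse_row(line):
    """Split a row into (outcome, [positive, negative, mixed, null, total]).

    Returns None when the row does not end in exactly five integer cells,
    so incomplete rows are skipped instead of being filled in.
    """
    match = re.match(r"^(.*?)((?:\s+\d+){5})\s*$", line)
    if match is None:
        return None
    counts = [int(n) for n in match.group(2).split()]
    return match.group(1), counts


def positive_share(counts):
    """Share of positive findings among the four direction columns."""
    positive, negative, mixed, null, _total = counts
    return positive / (positive + negative + mixed + null)


if __name__ == "__main__":
    for line in ROWS.splitlines():
        parsed = parse_row(line)
        if parsed is None:
            print(f"skipped (incomplete row): {line}")
            continue
        outcome, counts = parsed
        print(f"{outcome}: {positive_share(counts):.1%} positive")
```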
Filter: Human-AI Collaboration
SCDPs are a useful framework for policy simulation for the digital economy, mechanism design for information systems, and digital twin modeling of cyberinfrastructure.
Paper posits these applications as prospective uses of the framework (argumentative/speculative; no empirical evaluation reported in abstract).
high positive The Design and Composition of Structural Causal Decision Pro... usefulness for policy simulation, mechanism design, and digital twin modeling
SCDPs are capable of modeling variable discounting, a tool used widely in social scientific modeling.
Paper states the capability as part of SCDP definition and examples (theoretical claim).
high positive The Design and Composition of Structural Causal Decision Pro... modeling of variable discounting
An SCDP can endogenously model the memory-formation process and is thus useful for modeling resource‑rational agents in dynamic settings.
Paper asserts SCDP can represent memory-formation endogenously and discusses application to resource-rational agents (theoretical modeling capability).
high positive The Design and Composition of Structural Causal Decision Pro... ability to model endogenous memory formation / resource-rational agents
SCDPs are strictly more expressive than POMDPs because they do not assume rational belief formation.
Comparative expressiveness claim stated in the paper; supported by theoretical argument or formal separation result (paper text states the claim explicitly).
high positive The Design and Composition of Structural Causal Decision Pro... expressiveness relative to POMDPs (ability to represent non-rational belief form...
SCDPs inherit the composition properties of SCDMs (i.e., SCDPs benefit from SCDM composability).
Logical consequence argued in the paper from SCDP being constructed from SCDMs; likely supported by formal argumentation in the text.
high positive The Design and Composition of Structural Causal Decision Pro... inheritance of composability by SCDPs
A Structural Causal Decision Process (SCDP) is defined as a recurring SCDM with a discount variable.
Formal definition introduced in the paper (theoretical definition).
high positive The Design and Composition of Structural Causal Decision Pro... definition of SCDP as recurring SCDM with discounting
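
As an illustration of what a discount variable can express, one common way to write a discounted objective with a per-step discount is given below. This is a generic sketch, not the paper's own definition: the symbols $U$, $r_t$, $\gamma_k$, and the horizon $T$ are assumptions introduced here.

```latex
U \;=\; \sum_{t=0}^{T} \Bigl( \prod_{k=0}^{t-1} \gamma_k \Bigr)\, r_t,
\qquad \gamma_k \in [0, 1]
```

With a constant discount $\gamma_k = \gamma$ this reduces to the familiar $\sum_{t} \gamma^{t} r_t$. Letting $\gamma_k$ vary from step to step is what "variable discounting" refers to in this sketch.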
SCDMs have a well-defined and computationally useful property of composability.
Paper states and demonstrates ("We show") composability property — presumably via formal proofs or constructive arguments in the text (theoretical proofs/exposition).
high positive The Design and Composition of Structural Causal Decision Pro... composability of causal decision models
SCDMs can have open root variables for which no probability distribution or structural equation is given.
Model definitions in the paper explicitly allow open root variables (theoretical description).
high positive The Design and Composition of Structural Causal Decision Pro... support for open root variables in model formalism
In SCDMs, agent decisions can be constrained by their causal antecedents (i.e., decisions can be constrained by their causal parents).
Model specification and definitions in the paper describing constraints on decisions as part of SCDM structure (theoretical construction).
high positive The Design and Composition of Structural Causal Decision Pro... decision constraints by causal antecedents
Structural Causal Decision Models (SCDMs) expand on Structural Causal Influence Models by explicitly representing the causal relationships between model variables and the payoffs of agent decisions.
Formal model development and comparison to existing SCIMs provided in the paper (theoretical definitions and arguments).
high positive The Design and Composition of Structural Causal Decision Pro... explicit representation of causal relationships between variables and payoffs
We present two new classes of causal models of decision-making agents: Structural Causal Decision Models (SCDMs) and Structural Causal Decision Processes (SCDPs).
Paper introduces formal definitions for two model classes and describes their properties in the text (theoretical exposition).
high positive The Design and Composition of Structural Causal Decision Pro... introduction of new model classes (SCDMs and SCDPs)
These findings provide insights for designing flexible yet reliable constraint-based workflows.
Synthesis and discussion of study results and technical evaluation in paper's conclusion.
high positive U-Define: Designing User Workflows for Hard and Soft Constra... design guidance for constraint-based workflows
User-defined constraint types improve user satisfaction.
Reported user study measures showing higher satisfaction for participants using U-Define compared to baselines (no sample size or numeric effects provided).
high positive U-Define: Designing User Workflows for Hard and Soft Constra... user satisfaction (self-reported)
User-defined constraint types improve performance.
Reported results from user studies and/or technical evaluation indicating better task performance when users can set hard/soft constraint types (no numeric effect size or sample size in excerpt).
high positive U-Define: Designing User Workflows for Hard and Soft Constra... performance (task success / quality of generated plans)
User-defined constraint types improve perceived usefulness.
Results from the reported user studies comparing U-Define (user-defined constraint types) to baselines; based on participant responses and measures of perceived usefulness (sample sizes/details not provided in excerpt).
high positive U-Define: Designing User Workflows for Hard and Soft Constra... perceived usefulness (user-reported)
U-Define verifies hard constraints using formal model checking and verifies soft constraints using an LLM-as-judge evaluation.
Description of the complementary verification methods employed in the U-Define system (technical design/implementation).
high positive U-Define: Designing User Workflows for Hard and Soft Constra... verification of constraint types (hard via model checking, soft via LLM evaluati...
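
The hard/soft split described above can be sketched as follows. This is a toy illustration under stated assumptions, not U-Define's implementation: the `Constraint` class, the `evaluate` function, and the dictionary plan format are hypothetical, and the plain Python callables stand in for the formal model checker (hard constraints) and the LLM-as-judge scorer (soft constraints).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Plan = Dict[str, float]  # hypothetical plan representation


@dataclass
class Constraint:
    text: str                       # natural-language statement from the user
    kind: str                       # "hard" or "soft"
    check: Callable[[Plan], float]  # hard: 1.0 pass / 0.0 fail; soft: score in [0, 1]


def evaluate(plan: Plan, constraints: List[Constraint]) -> dict:
    """Reject the plan if any hard constraint fails; average the soft scores."""
    violated = [c.text for c in constraints
                if c.kind == "hard" and c.check(plan) < 1.0]
    soft = [c.check(plan) for c in constraints if c.kind == "soft"]
    soft_score = sum(soft) / len(soft) if soft else 1.0
    return {"accepted": not violated, "violated": violated, "soft_score": soft_score}


constraints = [
    # Stand-in for a model-checked rule that must never be violated.
    Constraint("Cost must not exceed 100", "hard",
               lambda p: 1.0 if p["cost"] <= 100 else 0.0),
    # Stand-in for an LLM-judged preference that allows flexibility.
    Constraint("Prefer shorter durations", "soft",
               lambda p: max(0.0, 1 - p["hours"] / 10)),
]

if __name__ == "__main__":
    print(evaluate({"cost": 80, "hours": 4}, constraints))
    print(evaluate({"cost": 120, "hours": 2}, constraints))
```

The design point the sketch tries to capture is that a hard-constraint failure rejects a plan outright, while soft constraints only adjust a score.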
We present U-Define, a system that lets users define constraints in natural language and categorize them as either hard rules that must not be violated or soft preferences that allow flexibility.
System implementation and description in paper (design and implementation of U-Define).
high positive U-Define: Designing User Workflows for Hard and Soft Constra... ability to specify constraints (natural-language input and categorization into h...
KOs transform verification economics: what was previously too costly to verify becomes feasible, enabling accumulated human validation to improve reliability over time.
Theoretical claim about economic and cumulative effects of adopting KOs; no cost-benefit analysis, pilot results, or quantitative evidence reported in the paper.
high positive Reliable AI Needs to Externalize Implicit Knowledge: A Human... cost-effectiveness of verification and cumulative improvement in AI reliability
We propose Knowledge Objects (KOs) — structured artifacts that externalize implicit knowledge into forms humans can inspect, verify, and endorse.
Proposed solution described in the paper; conceptual design and intended properties presented, without reported deployments, trials, or empirical evaluation.
high positive Reliable AI Needs to Externalize Implicit Knowledge: A Human... externalization and human verifiability of implicit knowledge via KOs
Evaluating AI applications in actual multi-turn interactions with human users, looking at usability and satisfaction besides accuracy, provides added value compared to focusing on benchmark performance only.
Argument/interpretation in the paper based on the study's multi-turn human-in-the-loop evaluation showing differences between objective performance gains and participant perceptions.
high positive Seeking Information with RAG-Assistants: Does Model Size Mat... evaluation methodology value (usability, satisfaction, accuracy)
Hybrid systems (human + RAG assistant) are beneficial in information-seeking scenarios.
Conclusion drawn from the experiment showing human-AI collaboration outperforms model-only baselines across model sizes in a realistic multi-turn information-seeking task with N=112 participants.
high positive Seeking Information with RAG-Assistants: Does Model Size Mat... task performance in information-seeking
The performance gain of human-AI collaboration over the model-only baselines is significant, irrespective of model size.
Reported results from the experimental comparison across conditions and three model sizes (3B, 8B, 70B) with N=112 participants; paper states the performance gain is significant across sizes (no numeric effect sizes or p-values provided in the excerpt).
high positive Seeking Information with RAG-Assistants: Does Model Size Mat... task accuracy / performance
The framework addresses AI-specific challenges including model versioning, human-AI interaction dynamics, contamination and spillover effects, and equitable impact assessment.
Paper lists and provides guidance on AI-specific methodological issues (model versioning, interaction dynamics, contamination/spillover, equity). This is a descriptive claim about topics the framework covers, not an empirical evaluation of solutions.
high positive Principles and Guidelines for Randomized Controlled Trials i... coverage of AI-specific methodological challenges in evaluation guidelines
The framework implements a graded transparency and repeatability framework.
Paper extends TOP-guideline-derived transparency principle into a graded scheme for transparency and repeatability; described as an operational feature of the proposed framework.
high positive Principles and Guidelines for Randomized Controlled Trials i... graded transparency and repeatability practices for AI RCTs
The framework integrates heterogeneity analysis and practical significance assessment.
Paper reports inclusion of guidance on analyzing heterogeneous treatment effects and assessing practical significance; presented as part of guidelines rather than tested across datasets.
high positive Principles and Guidelines for Randomized Controlled Trials i... inclusion of heterogeneity and practical significance analysis in evaluation pra...
The framework formalizes causal inference through RCT methodology for AI contexts.
Paper states adoption of randomized controlled trial methods and causal inference framing for AI impact evaluation; described as methodological proposition rather than validated application.
high positive Principles and Guidelines for Randomized Controlled Trials i... use of RCTs to support causal inference in AI evaluations
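
For readers unfamiliar with why randomization supports causal claims: under random assignment, the difference in mean outcomes between treated and control groups is an unbiased estimate of the average treatment effect. The sketch below is a generic textbook estimator, not taken from the paper. The function name `difference_in_means` is hypothetical and the scores are made-up illustrative numbers, not study data.

```python
import math
import statistics


def difference_in_means(treated, control):
    """Return (estimate, standard_error) of the average treatment effect.

    Uses the unpooled standard error: sqrt(s_t^2 / n_t + s_c^2 / n_c).
    """
    estimate = statistics.mean(treated) - statistics.mean(control)
    se = math.sqrt(statistics.variance(treated) / len(treated)
                   + statistics.variance(control) / len(control))
    return estimate, se


if __name__ == "__main__":
    # Made-up task scores for participants with and without AI assistance.
    with_ai = [7, 8, 6, 9]
    without_ai = [5, 6, 4, 5]
    effect, se = difference_in_means(with_ai, without_ai)
    print(f"estimated effect = {effect:.2f}, standard error = {se:.3f}")
```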
Our framework extends prior work by centering evaluation on human performance rather than model output alone.
Paper claims a conceptual shift: focus on human performance metrics; supported by argumentative rationale and literature references rather than empirical demonstration.
high positive Principles and Guidelines for Randomized Controlled Trials i... focus of evaluation metrics (human performance vs. model output)
The principles and guidelines serve three key roles for AI evaluation RCTs: a design tool for planning studies, an evaluation rubric for assessing existing work, and a blueprint for standard setting as the field converges on norms.
Paper's stated intended uses/positioning of the framework; presented as roles in the discussion/positioning section rather than empirically validated roles.
high positive Principles and Guidelines for Randomized Controlled Trials i... utility of the framework in planning, evaluating, and standard-setting
We operationalize all five principles into 33 guidelines adapted for AI evaluation RCT contexts, expressed as requirements with rationales, implementation instructions, and evidence bases.
Paper reports a concrete output: 33 guidelines derived from the five principles, with each guideline presented as requirement + rationale + implementation instructions + evidence base (documented in paper content).
high positive Principles and Guidelines for Randomized Controlled Trials i... availability of operational guidelines for AI RCTs
The paper adopts the four-validity framework of Shadish et al. (2002) and extends it with a fifth principle on transparency, repeatability, and verification adapted from the Transparency and Openness Promotion (TOP) Guidelines (Center for Open Science, 2025).
Explicit methodological choice described in the paper: adoption of Shadish et al. four-validity framework and addition of a transparency/repeatability principle based on TOP Guidelines; documented in the text as design decision.
high positive Principles and Guidelines for Randomized Controlled Trials i... methodological framework / validity criteria
The framework draws on established experimental practices from disciplines with established RCT traditions, including software engineering, economics, clinical and health sciences, and psychology.
Paper reports literature review and cross-disciplinary synthesis as the methodological foundation for the framework (references to those disciplines). No empirical cross-disciplinary experiment reported.
high positive Principles and Guidelines for Randomized Controlled Trials i... methodological comprehensiveness / interdisciplinary grounding
This work establishes a foundational framework for standardizing AI evaluation RCTs (sometimes called human uplift studies).
Paper's stated contribution: development of a conceptual framework integrating RCT design principles for AI evaluation. Based on literature synthesis and methodological argumentation rather than empirical testing.
high positive Principles and Guidelines for Randomized Controlled Trials i... standardization of AI evaluation RCTs / evaluation methodology
The paper introduces a Specification Governance Model (SGM), grounded in Transaction Cost Economics, and provides a practical governance decision guide.
Conceptual/modeling contribution described in the paper: SGM grounded in TCE with an applied decision guide (theoretical plus prescriptive).
high positive The Productivity-Reliability Paradox: Specification-Driven G... governance decision-making for specification practices
The paper proposes the AI-Augmented Methodology Taxonomy (AAMT), classifying six methodologies under three AI integration tiers.
Conceptual contribution: taxonomy introduced and described in the paper (six methodologies, three tiers).
high positive The Productivity-Reliability Paradox: Specification-Driven G... existence and classification of methodologies (taxonomic contribution)
Telemetry across 10,000+ developers shows a 98% increase in pull requests.
Observational telemetry data aggregated across >10,000 developers reported in the paper; metric reported is percent increase in pull request count.
high positive The Productivity-Reliability Paradox: Specification-Driven G... number of pull requests (pull_request_count)
Controlled studies report 20-56% productivity gains on well-scoped tasks.
Aggregate of multiple controlled experimental studies cited in the paper (2022–2026); reported as observed productivity improvements on well-scoped tasks in those studies. Specific study-level sample sizes not reported in the claim text.
Practical properties for Bayesian control that fit modern agentic AI systems and human-AI collaboration can be articulated, and calibrated beliefs plus utility-aware policies can improve agentic AI orchestration (illustrated via concrete examples and design patterns).
Paper provides articulated properties, examples, and design patterns but no empirical validation; claims of improvement are illustrated conceptually.
high positive Position: agentic AI orchestration should be Bayes-consisten... improvement in agentic AI orchestration from calibrated beliefs and utility-awar...
Coherent decision-making requires Bayesian principles at the orchestration level of the agentic system, not necessarily in the LLM agent parameters.
Central prescriptive claim of the position paper; supported by conceptual argumentation and illustrative examples rather than empirical tests.
high positive Position: agentic AI orchestration should be Bayes-consisten... coherence of decision-making in agentic systems as a function of orchestration-l...
Bayesian decision theory provides a framework that can help agentic systems maintain beliefs over task-relevant latent quantities, update these beliefs from observed agentic and human-AI interactions, and choose actions.
Argumentative/theoretical claim in the position paper; illustrated with conceptual examples and design patterns rather than empirical evaluation.
high positive Position: agentic AI orchestration should be Bayes-consisten... decision quality of agentic control via belief maintenance and updating
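
The maintain-update-choose loop named in the claim above can be sketched with a Beta-Bernoulli belief over each action's success rate. This is a minimal illustration under assumptions, not the paper's method: `BetaBelief`, `choose_action`, the action names, the reward, and the costs are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class BetaBelief:
    """Beta(alpha, beta) belief over an action's probability of success."""
    alpha: float = 1.0  # prior pseudo-count of successes
    beta: float = 1.0   # prior pseudo-count of failures

    def update(self, success: bool) -> None:
        # Conjugate Bayesian update after observing one outcome.
        if success:
            self.alpha += 1
        else:
            self.beta += 1

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


def choose_action(beliefs: Dict[str, BetaBelief], reward: float,
                  costs: Dict[str, float]) -> str:
    """Pick the action with the highest expected utility under current beliefs."""
    return max(beliefs, key=lambda a: beliefs[a].mean * reward - costs[a])


if __name__ == "__main__":
    beliefs = {"cheap_tool": BetaBelief(), "expert": BetaBelief()}
    costs = {"cheap_tool": 0.1, "expert": 0.4}
    print(choose_action(beliefs, 1.0, costs))  # cheaper option under equal beliefs
    for _ in range(4):
        beliefs["cheap_tool"].update(success=False)  # observe repeated failures
    print(choose_action(beliefs, 1.0, costs))  # belief shift changes the choice
```

The beliefs live in the orchestration layer, outside any model's parameters, which is where the position paper locates the Bayesian requirement.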
Many high-value deployments rely on decisions under uncertainty (for example, which tool to call, which expert to consult, or how many resources to invest).
Stated as a motivating observation in the paper; no quantitative data or sample provided.
high positive Position: agentic AI orchestration should be Bayes-consisten... prevalence of decision-under-uncertainty requirements in high-value deployments
LLMs excel at predictive tasks and complex reasoning tasks.
Asserted in the paper's opening motivation; no empirical evaluation or sample reported in the paper itself.
high positive Position: agentic AI orchestration should be Bayes-consisten... LLM performance on predictive and reasoning tasks
Qiushi Engine performed thousands of LLM-mediated reasoning, measurement and revision actions during its investigations (e.g., 3,242 LLM calls, 1,242 tool calls).
Operational logs and activity counts reported in the paper: 145.9 million tokens, 3,242 LLM calls, 1,242 tool calls, 163 research notes, 44 scripts.
high positive End-to-end autonomous scientific discovery on a real optical... scale of automated research activity (counts of LLM calls, tool calls, notes, sc...
Qiushi Engine combines nonlinear research phases, Meta-Trace memory and a dual-layer architecture to maintain adaptive and stable research trajectories across long-horizon investigations.
System architecture and methods section describing nonlinear research phases, Meta-Trace memory, and dual-layer architecture; demonstrated operation across long-horizon tasks in experiments (thousands of LLM and tool calls).
high positive End-to-end autonomous scientific discovery on a real optical... ability to maintain adaptive and stable research trajectories over long-horizon ...
The AI-discovered optical bilinear mechanism suggests a route towards high-speed, energy-efficient optical hardware for pairwise computation.
Interpretive claim based on the structural analogy between the discovered optical bilinear interaction and Transformer attention; conceptual argument provided in the paper rather than measured hardware speed or energy benchmarks.
high positive End-to-end autonomous scientific discovery on a real optical... potential for high-speed, energy-efficient optical hardware (conceptual implicat...
In an open-ended study (145.9 million tokens, 3,242 LLM calls, 1,242 tool calls, 163 research notes and 44 scripts), Qiushi Engine proposes and experimentally validates an optical bilinear interaction, a physical mechanism structurally analogous to a core operation in Transformer attention.
Open-ended experimental study reported in the paper with the listed activity metrics (145.9M tokens, 3,242 LLM calls, etc.); experimental investigation and measurements presented claiming validation of optical bilinear interaction and drawing structural analogy to Transformer attention's pairwise operation.
high positive End-to-end autonomous scientific discovery on a real optical... experimental validation of an optical bilinear interaction mechanism
Qiushi Engine autonomously reproduces a published transmission-matrix experiment on a non-original platform.
Experimental reproduction reported in the paper; description of executing the published transmission-matrix experiment using the Qiushi Engine on a different (non-original) optical platform and presenting measured results comparing to published experiment.
high positive End-to-end autonomous scientific discovery on a real optical... successful reproduction of a published transmission-matrix experiment (experimen...
Qiushi Discovery Engine is an LLM-based agentic system for end-to-end autonomous scientific discovery on a real optical platform.
Description and implementation of the Qiushi Engine combining LLM-based agentic control with an optical experimental platform; system design and end-to-end experiments reported in the paper (no randomized trial; system demonstration).
high positive End-to-end autonomous scientific discovery on a real optical... existence and operation of an end-to-end autonomous LLM-driven discovery system ...
The practical aim is to help strategic leaders and system designers recognize the configuration at work, notice when it shifts, and judge whether it fits the decision before them.
Stated aim/objective of the paper (normative guidance; conceptual).
high positive Leading Across the Spectrum of Human-AI Relationships: A Con... leaders' capacity to detect configuration, detect shifts, and assess fitness of ...
The framework introduces 'co-adaptability'—the capacity of a configuration to improve as human and non-human participants adjust together—and situates it within 'heterogeneous teaming' where participants may vary by number, substrate, model architecture, capability, speed, memory, and form of participation.
Conceptual/theoretical introduction of new constructs (co-adaptability and heterogeneous teaming) in the paper; definitional rather than empirical.
high positive Leading Across the Spectrum of Human-AI Relationships: A Con... capacity for joint improvement through adaptation between human and AI participa...
The five positions serve as landmarks that help leaders recognize configurations as they layer, drift, or change in a single decision.
Normative/conceptual claim supported by the framework; no empirical validation or sample provided in the excerpt.
high positive Leading Across the Spectrum of Human-AI Relationships: A Con... leaders' ability to recognize shifting decision configurations