The Commonplace

Evidence (6491 claims)

Adoption: 8570 claims
Productivity: 7631 claims
Governance: 6869 claims
Human-AI Collaboration: 6491 claims
Org Design: 4175 claims
Innovation: 4114 claims
Labor Markets: 3566 claims
Skills & Training: 2966 claims
Inequality: 2066 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 758 199 100 900 2007
Governance & Regulation 826 400 191 122 1563
Organizational Efficiency 777 193 124 84 1189
Technology Adoption Rate 635 233 124 97 1098
Research Productivity 422 128 57 336 954
Output Quality 476 179 59 47 761
Decision Quality 328 177 81 47 640
Firm Productivity 435 57 88 20 606
AI Safety & Ethics 218 277 65 33 599
Market Structure 180 170 123 24 502
Task Allocation 213 64 72 33 387
Skill Acquisition 170 61 61 17 309
Innovation Output 203 27 43 18 292
Employment Level 105 54 107 13 281
Fiscal & Macroeconomic 131 69 43 26 276
Consumer Welfare 117 63 42 11 233
Firm Revenue 153 48 26 3 230
Task Completion Time 173 31 8 12 225
Inequality Measures 44 122 49 6 221
Worker Satisfaction 89 65 22 12 188
Error Rate 69 92 10 2 173
Regulatory Compliance 77 69 14 5 165
Automation Exposure 56 56 26 13 154
Training Effectiveness 94 21 13 19 149
Wages & Compensation 77 36 25 6 144
Team Performance 86 17 27 10 141
Developer Productivity 95 17 14 6 133
Job Displacement 12 80 20 1 113
Hiring & Recruitment 52 7 8 3 70
Creative Output 31 18 8 3 61
Skill Obsolescence 5 46 6 1 58
Social Protection 27 16 8 2 53
Labor Share of Income 17 19 17 53
Worker Turnover 11 12 3 26
Industry 1 1
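The rows above can be read programmatically to compare outcomes by share rather than raw count. The following is a minimal Python sketch, assuming a complete row ends in five integers in the order Positive, Negative, Mixed, Null, Total. The helper names `parse_row` and `positive_share` are hypothetical; the sample rows are copied from the matrix. In several rows the four direction counts sum to less than Total (for Governance & Regulation, 1539 against 1563), so the share is taken against Total.

```python
import re

# Rows copied from the Evidence Matrix (complete five-number rows only).
ROWS = [
    "Governance & Regulation 826 400 191 122 1563",
    "Firm Productivity 435 57 88 20 606",
    "AI Safety & Ethics 218 277 65 33 599",
    "Job Displacement 12 80 20 1 113",
]

KEYS = ("positive", "negative", "mixed", "null", "total")

def parse_row(line):
    """Split a flattened row into (outcome, counts). Hypothetical helper."""
    match = re.match(r"^(.*?)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)$", line)
    if match is None:
        # Rows with blank cells (fewer than five numbers) are rejected, not guessed.
        raise ValueError(f"row does not end in five integers: {line!r}")
    outcome = match.group(1)
    counts = dict(zip(KEYS, (int(g) for g in match.groups()[1:])))
    return outcome, counts

def positive_share(counts):
    """Fraction of all claims in the row that report a positive finding."""
    return counts["positive"] / counts["total"]

for row in ROWS:
    outcome, counts = parse_row(row)
    print(f"{outcome}: {positive_share(counts):.1%} positive")
```

Rows with blank cells, such as the last three in the matrix, raise `ValueError` here because the flattened text does not show which column is empty.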
Filter: Human-AI Collaboration
We introduce ImplicitMemBench, the first systematic benchmark evaluating implicit memory through three cognitively grounded constructs.
The paper claims to introduce a new benchmark, ImplicitMemBench; it asserts novelty ('first systematic benchmark') and describes a design built around three constructs (Procedural Memory, Priming, Classical Conditioning).
Above the Accountability Horizon, distributed accountability mechanisms become necessary.
Derived implication from the Accountability Incompleteness Theorem and the paper's discussion of policy responses; theoretical argument rather than empirical evidence.
high | positive | The Accountability Horizon: An Impossibility Theorem for Gov... | necessity of distributed accountability mechanisms conditional on compound auton...
Experiments on 3,000 synthetic collectives confirm all predictions with zero violations.
Reported simulation experiments: N = 3,000 synthetic Human-Agent Collectives evaluated against the theoretical predictions; reported outcome was zero violations of the predicted impossibility/conditions.
high | positive | The Accountability Horizon: An Impossibility Theorem for Gov... | number of violations of the theoretical predictions (violations of impossibility...
Below the threshold (Accountability Horizon), legitimate frameworks exist, establishing a sharp phase transition between regimes where the four properties can and cannot be satisfied.
Constructive existence results and theoretical arguments in the paper showing frameworks that satisfy the axioms when compound autonomy is below the defined threshold.
high | positive | The Accountability Horizon: An Impossibility Theorem for Gov... | existence/non-existence of legitimate accountability frameworks as a function of...
We introduce Human-Agent Collectives, a formalisation of joint human-AI systems where agents are modelled as state-policy tuples within a shared structural causal model.
Paper provides a formal model/definition called Human-Agent Collectives (mathematical formalisation and definitions).
high | positive | The Accountability Horizon: An Impossibility Theorem for Gov... | formal representation of joint human-AI systems
Existing accountability frameworks for AI systems, legal, ethical, and regulatory, rest on a shared assumption: for any consequential outcome, at least one identifiable person had enough involvement and foresight to bear meaningful responsibility.
Stated as background assumption in the paper's introduction/abstract; supported by citation to prior legal/ethical/regulatory frameworks (normative claim about literature). No empirical test reported in this paper.
high | positive | The Accountability Horizon: An Impossibility Theorem for Gov... | attributability of responsibility for consequential outcomes
Tiny sharing incentives improve models with weak cooperation.
Experimental intervention reported in the paper: adding small sharing incentives and observing improved cooperation among weakly-cooperative models (stated in abstract; no quantitative effect size or sample size provided there).
high | positive | More Capable, Less Cooperative? When LLMs Fail At Zero-Cost ... | cooperation / collective performance under small incentive intervention
Explicit protocols double performance for low-competence models.
Experimental intervention reported in the paper: introducing explicit protocols in the multi-agent setup and observing a doubling of performance for low-competence models (stated in abstract; no sample size reported there).
high | positive | More Capable, Less Cooperative? When LLMs Fail At Zero-Cost ... | model/team performance under explicit protocol intervention
OpenAI o3-mini reaches 50% of optimal collective performance.
Experimental measurement of collective performance for OpenAI o3-mini in the paper's multi-agent setup (value reported in abstract; no sample size provided there).
high | positive | More Capable, Less Cooperative? When LLMs Fail At Zero-Cost ... | collective performance (percent of optimal group revenue)
The core thesis is alignment-through-accountability: if each agent is aligned with its human owner through the accountability chain, then the collective converges on behavior aligned with human intent -- without top-down rules.
Central theoretical thesis of the paper; presented as a hypothesis to be evaluated rather than as an empirically demonstrated result in the excerpt.
high | positive | AgentCity: Constitutional Governance for Autonomous Agent Ec... | convergence of collective agent behavior to human intent via accountability chai...
We propose the Separation of Power (SoP) model, a constitutional governance architecture deployed on public blockchain that breaks this monopoly through three structural separations: agents legislate operational rules as smart contracts, deterministic software executes within those contracts, and humans adjudicate through a complete ownership chain binding every agent to a responsible principal.
Design proposal / governance architecture presented in the paper; the text asserts that the model 'breaks this monopoly' but provides no experimental results in the excerpt to validate that claim.
high | positive | AgentCity: Constitutional Governance for Autonomous Agent Ec... | reduction/elimination of 'Logic Monopoly' via structural separations
Those incentivized for originality rely on the model more selectively for brainstorming, proofreading, and targeted edits.
Behavioral/usage measures from the RCT indicating task-level patterns of AI use (described qualitatively in excerpt; no quantitative task-level usage breakdown provided).
high | positive | Incentives shape how humans co-create with generative AI | types of tasks for which AI is used (brainstorming, proofreading, targeted edits...
Participants rewarded for originality relative to peers produce collectively more diverse writing than those rewarded for quality alone.
Randomized assignment to incentive conditions (originality reward vs. quality reward) in the pre-registered RCT on a creative writing task (no sample size or numerical effect provided in excerpt).
high | positive | Incentives shape how humans co-create with generative AI | collective diversity of writing
Early evidence has shown that generative AI can increase individual-level productivity.
Statement refers to prior literature/early studies (no specific study, sample size, or method reported in the excerpt).
high | positive | Incentives shape how humans co-create with generative AI | individual-level productivity
Much of the business and management literature approaches artificial intelligence primarily as a technological capability that enhances efficiency and productivity.
Literature review / characterization of existing business and management literature cited in the paper; no quantitative synthesis or meta-analysis reported.
high | positive | Algorithmic Agency and the Posthuman Economy: Artificial Int... | portrayal of AI in business literature as a capability that enhances efficiency ...
The paper documents best practices for iteratively generating tests to capture existing system behavior before model-assisted refactoring.
Methodological contributions in the paper: recommended workflow and practices for iterative test generation to lock down behavior prior to refactoring.
high | positive | AI-Assisted Unit Test Writing and Test-Driven Code Refactori... | effectiveness of iterative test-generation workflow
The described workflow constrained refactoring changes and enabled model-assisted refactoring under developer supervision, with proposed code changes validated by passing tests.
Methodological description in the paper: iterative test generation to capture existing behavior, then model-assisted refactoring with developer oversight and test-based validation.
high | positive | AI-Assisted Unit Test Writing and Test-Driven Code Refactori... | constrained refactoring changes (safety of refactoring)
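The workflow summarized in the entries above (capture current behavior in tests, then accept a refactoring only if those tests still pass) is commonly called characterization testing. The following is a minimal Python sketch of the idea, not the paper's tooling: the functions `legacy_price_cents`, `refactored_price_cents`, and `buggy_price_cents`, the discount rule, and the input grid are all hypothetical.

```python
def legacy_price_cents(quantity, unit_cents):
    """Hypothetical legacy code whose observable behavior must be preserved."""
    total = 0
    for _ in range(quantity):
        total += unit_cents
    if quantity >= 10:
        total -= total // 10  # 10% bulk discount, rounded down
    return total

def refactored_price_cents(quantity, unit_cents):
    """Hypothetical model-proposed refactoring that keeps the same behavior."""
    total = quantity * unit_cents
    return total - total // 10 if quantity >= 10 else total

def buggy_price_cents(quantity, unit_cents):
    """Hypothetical refactoring with an off-by-one error in the discount threshold."""
    total = quantity * unit_cents
    return total - total // 10 if quantity > 10 else total

# Step 1: record what the legacy code returns today, including boundary inputs.
CASES = [(q, c) for q in (0, 1, 9, 10, 11, 50) for c in (0, 99, 250, 1995)]
GOLDEN = {case: legacy_price_cents(*case) for case in CASES}

# Step 2: a candidate is accepted only if it reproduces every recorded output.
def mismatches(candidate):
    return [case for case, expected in GOLDEN.items() if candidate(*case) != expected]

for name, candidate in [("refactored", refactored_price_cents), ("buggy", buggy_price_cents)]:
    failed = mismatches(candidate)
    print(name, "accepted" if not failed else f"rejected on {failed}")
```

Including inputs on both sides of each branch condition (here, quantities 9, 10, and 11) is what lets the recorded outputs catch the threshold error.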
The generated tests achieved up to 78% branch coverage in critical modules.
Measured branch coverage reported in the case study for critical modules after running the generated tests.
Using coding models, we generated nearly 16,000 lines of reliable unit tests in hours rather than weeks.
Single case study reported in the paper: automated unit test generation using coding models; reported aggregate output of generated tests and a qualitative time comparison (hours vs weeks).
high | positive | AI-Assisted Unit Test Writing and Test-Driven Code Refactori... | unit test lines generated
This work demonstrates how energy considerations can be embedded directly into AI-assisted coding workflows, supporting developers as they engage with energy implications through actionable feedback.
Concluding claim based on the system implementation and evaluation described (benchmarks and controlled study).
high | positive | EcoAssist: Embedding Sustainability into AI-Assisted Fronten... | feasibility of embedding energy considerations into AI-assisted coding workflows
EcoAssist reduced per-website energy by 13-16% on average.
Reported result from the benchmark evaluation of 500 websites (effect size reported as 13-16%).
high | positive | EcoAssist: Embedding Sustainability into AI-Assisted Fronten... | per-website energy consumption
We introduce EcoAssist, an energy-aware assistant integrated into an IDE that analyzes AI-generated frontend code, estimates its energy footprint, and proposes targeted optimizations.
Description of the system introduced by the authors (implementation claim).
high | positive | EcoAssist: Embedding Sustainability into AI-Assisted Fronten... | availability of an IDE-integrated, energy-aware assistant
AI assistance improves short-term performance on tasks (people do better while using the AI).
Randomized controlled trials (N = 1,222) showing better immediate task outcomes when participants used AI assistance.
high | positive | AI Assistance Reduces Persistence and Hurts Independent Perf... | short-term task performance (immediate accuracy/quality while assisted by AI)
Analyses use fixed-effects regression and structural equation modeling (SEM) on panel data from OECD countries.
Methods statement in the paper indicating use of fixed-effects and SEM applied to OECD-country panel data.
high | positive | AI-Augmented Peer Review and Scientific Productivity: A Cros... | methodological approach (fixed-effects regression and SEM)
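For readers unfamiliar with the fixed-effects approach named in the entry above, the following is a minimal Python sketch of the within (demeaning) estimator on synthetic data. Nothing in it comes from the paper: the panel dimensions, the variable names `airc` and `output`, the assumed slope of 0.2, and the noise levels are illustrative assumptions, and SEM is not shown.

```python
import random

random.seed(0)
TRUE_SLOPE = 0.2  # assumed for the synthetic data; not an estimate from the paper

# Synthetic panel: 30 units observed over 10 periods each.
panel = []
for unit in range(30):
    unit_effect = random.gauss(0, 5)  # time-invariant and unobserved
    for _ in range(10):
        # The regressor is correlated with the unit effect, which biases pooled OLS.
        airc = random.gauss(0, 1) + 0.5 * unit_effect
        output = TRUE_SLOPE * airc + unit_effect + random.gauss(0, 0.1)
        panel.append((unit, airc, output))

def mean(values):
    return sum(values) / len(values)

def slope(pairs):
    """OLS slope of y on x for a list of (x, y) pairs."""
    mx = mean([x for x, _ in pairs])
    my = mean([y for _, y in pairs])
    num = sum((x - mx) * (y - my) for x, y in pairs)
    den = sum((x - mx) ** 2 for x, _ in pairs)
    return num / den

def within_transform(rows):
    """Subtract each unit's own means, which removes the unit fixed effect."""
    by_unit = {}
    for unit, x, y in rows:
        by_unit.setdefault(unit, []).append((x, y))
    demeaned = []
    for obs in by_unit.values():
        mx = mean([x for x, _ in obs])
        my = mean([y for _, y in obs])
        demeaned.extend((x - mx, y - my) for x, y in obs)
    return demeaned

pooled = slope([(x, y) for _, x, y in panel])
within = slope(within_transform(panel))
print(f"pooled OLS slope: {pooled:.3f}")
print(f"within (fixed-effects) slope: {within:.3f}")
```

In this synthetic setup the pooled slope absorbs the correlation between the regressor and the unit effect, while the within slope stays close to the assumed value. That contrast is the usual motivation for fixed effects in country panels.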
This paper provides the first cross-country empirical validation of AI-augmented scientific evaluation systems.
Authors' stated novelty claim that prior work lacked cross-country empirical quantification and that their OECD panel study is the first such validation.
high | positive | AI-Augmented Peer Review and Scientific Productivity: A Cros... | novelty / first empirical cross-country validation
A one standard deviation increase in AIRC is associated with an 18–25% increase in scientific productivity.
Reported point estimate/range from regression/SEM results linking a 1 SD change in the constructed AIRC to productivity outcomes in the OECD panel.
high | positive | AI-Augmented Peer Review and Scientific Productivity: A Cros... | scientific productivity (percent change per 1 SD AIRC)
AI-assisted evaluation significantly enhances scientific productivity.
Fixed-effects regression and structural equation modeling (SEM) applied to panel data from OECD countries; reported association between AIRC and research output.
high | positive | AI-Augmented Peer Review and Scientific Productivity: A Cros... | scientific productivity (research output)
We construct a novel AI Review Capability Index (AIRC).
Paper reports creation of a new composite index (AIRC) to measure national-level AI capability in peer review; constructed and applied to panel data from OECD countries.
high | positive | AI-Augmented Peer Review and Scientific Productivity: A Cros... | AI Review Capability (AIRC) (index construction)
China's 'Global Community of Shared Future' white paper and Putin's 2024 Valdai address provide empirical evidence for an articulated alternative vision to the Western‑led global order.
Qualitative textual readings of the cited official documents (the white paper and the Valdai address) are used in the paper as empirical support; no quantitative content analysis or sample coding is reported.
high | positive | Theorising the Interregnum: | existence of articulated alternative geopolitical vision in official documents
Technical workers' potential for progressive transformation lies not just in their strategic importance and specialized knowledge but in their ability to build solidarity across the broader ecosystem of AI labour while operating between otherwise incommensurable philosophical and infrastructural systems.
Normative/theoretical claim combining philosophical analysis (Chinese Marxism, Bauman) with empirical literature on hidden AI labour and infrastructure competition (Muldoon et al., 2024); offered as an interpretive synthesis rather than empirically validated causal finding.
high | positive | Theorising the Interregnum: | capacity for progressive transformation via worker solidarity in AI labour ecosy...
Technical workers occupy a strategic position at the intersection of competing infrastructural systems and alternative visions of global order, making them potentially crucial actors in determining the outcome of the current interregnum.
Argumentative claim supported by secondary empirical literature cited in the paper (Muldoon, Graham, and Cant, 2024) on hidden labour supporting AI systems and on geopolitical competition over digital infrastructure; presented as qualitative/interpretive evidence rather than primary quantitative measurement.
high | positive | Theorising the Interregnum: | technical workers' strategic influence over geopolitical/technical outcomes
The semi-core's challenge to Western hegemony creates unique conditions for systemic transformation.
The paper advances this as a theoretical argument synthesizing World‑Systems theory, Demirel (2024), Bauman's philosophical work, and interpretive readings of official Chinese and Russian documents; no quantitative causal test is reported.
high | positive | Theorising the Interregnum: | potential for systemic transformation arising from semi‑core challenge
The emergence of a 'semi-core' is represented most prominently by China and Russia.
The paper cites Ege Demirel (2024) as the primary conceptual source and draws on textual evidence from China's 'Global Community of Shared Future' white paper and Putin's 2024 Valdai address; presented via World‑Systems theoretical framing and qualitative/discourse analysis.
high | positive | Theorising the Interregnum: | emergence of a semi-core led by China and Russia
AI agents autonomously plan, invoke external tools, and execute multi-step action chains with reduced human involvement.
Definitional framing provided by the authors describing the technical/functional characteristics of 'AI agents' as used in the paper.
high | positive | AI Agents Under EU Law | technical capability characteristics of AI agents (autonomous planning, tool inv...
The provider's foundational compliance task is an exhaustive inventory of the agent's external actions, data flows, connected systems, and affected persons.
Authors' recommendation/practical conclusion derived from the regulatory mapping (prescriptive guidance rather than empirical measurement).
high | positive | AI Agents Under EU Law | recommended compliance practice (exhaustive inventory of actions, data flows, sy...
We propose a twelve-step compliance architecture and a regulatory trigger mapping connecting agent actions to applicable legislation.
Paper asserts it includes a proposed 12-step compliance architecture and a mapping between agent actions and regulatory triggers (explicit step count provided).
high | positive | AI Agents Under EU Law | proposed compliance architecture (12 steps) and regulatory trigger mapping
We present a practical taxonomy of nine agent deployment categories mapping concrete actions to regulatory triggers.
Paper states it includes a taxonomy comprising nine deployment categories (explicit count provided).
high | positive | AI Agents Under EU Law | taxonomy of agent deployment categories (count = 9)
This paper provides the first systematic regulatory mapping for AI agent providers integrating (a) draft harmonised standards under Standardisation Request M/613 to CEN/CENELEC JTC 21 as of January 2026, (b) the GPAI Code of Practice published in July 2025, (c) the CRA harmonised standards programme under Mandate M/606 accepted in April 2025, and (d) the Digital Omnibus proposals of November 2025.
Author claim about the paper's contribution and scope (novelty/first-of-its-kind mapping integrating specified standards and documents).
high | positive | AI Agents Under EU Law | existence of an integrated, systematic regulatory mapping
AI agents - i.e. AI systems that autonomously plan, invoke external tools, and execute multi-step action chains with reduced human involvement - are being deployed at scale across enterprise functions ranging from customer service and recruitment to clinical decision support and critical infrastructure management.
Author assertion in the paper's introductory framing; no empirical sample size or quantified deployment statistics provided in the excerpt.
high | positive | AI Agents Under EU Law | deployment/adoption of AI agents across enterprise functions
Rather than indiscriminate collection of context-relevant data, researchers and practitioners should adopt interactional practices to embed generative AI systems more appropriately into users' contexts of use.
Normative conclusion/provocation drawn from the paper's empirical findings and analysis of failure modes; presented as a recommendation (not an empirical effect; based on qualitative synthesis).
high | positive | Context Collapse: Barriers to Adoption for Generative AI in ... | recommended design and deployment practices for contextual integration
Users deploy concrete strategies to address failures of generative AI systems to account for context.
Empirical observations from interviews describing user-devised workarounds and strategies; qualitative cases/examples (sample size not provided).
high | positive | Context Collapse: Barriers to Adoption for Generative AI in ... | user practices and strategies for mitigating system-context misalignment
We hypothesize the emergent necessity of a 'Compliance Premium,' indicating wage resilience increasingly tied to risk-absorption capacity.
Hypothesis proposed by authors based on observed institutional/business risk differentials from HITL validation and OAI patterns; framed as a forward-looking interpretation rather than demonstrated empirical result.
high | positive | Bounded by Risk, Not Capability: Quantifying AI Occupational... | wage resilience tied to compliance/risk-absorption capacity
Non-routine cognitive roles highly dependent on symbolic manipulation (e.g., Data Scientists) face unprecedented exposure, with OAI ≈ 0.70.
Reported OAI value for example occupation(s) (Data Scientists) derived from the algorithmic aggregation across DWAs; claim presented as a key empirical finding.
high | positive | Bounded by Risk, Not Capability: Quantifying AI Occupational... | Relative Occupational Automation Index (OAI) for Data Scientists
We utilize a multi-agent LLM ensemble to score both technical feasibility and business risk for DWAs.
Method description: deployment of a multi-agent LLM ensemble to produce scores on technical feasibility and business risk per DWA. Specific ensemble composition and hyperparameters not provided in the excerpt.
high | positive | Bounded by Risk, Not Capability: Quantifying AI Occupational... | LLM-derived technical feasibility and business risk scores
We introduce a Tech-Risk Dual-Factor Model that jointly scores technical feasibility and business risk to re-evaluate occupational exposure to LLMs.
Methodological contribution described in the paper (model specification). Implementation details described elsewhere in paper (see multi-agent scoring and aggregation), but claim itself is the introduction of the model.
high | positive | Bounded by Risk, Not Capability: Quantifying AI Occupational... | joint technical feasibility and business risk scores
All code, infrastructure, and benchmark data are released to facilitate future research in realistic computer-use agents.
Statement of release in paper (availability claim).
high | positive | Gym-Anything: Turn any Software into an Agent Environment | availability of code, infrastructure, and benchmark data
Applying the same auditing principle at test time — a separate VLM reviews completed trajectories and provides feedback — improves Gemini-3-Flash on CUA-World-Long from 11.5% to 14.0%.
Experimental result reported in paper: evaluation of Gemini-3-Flash with/without test-time VLM auditing on CUA-World-Long, reported scores 11.5% -> 14.0%.
high | positive | Gym-Anything: Turn any Software into an Agent Environment | benchmark score (success rate) on CUA-World-Long
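The reported change from 11.5% to 14.0% can be read either as an absolute gain in percentage points or as a relative gain over the baseline, and the two differ considerably. The following is a small Python sketch of that arithmetic; the helper name `improvement` is hypothetical, and the two scores are the ones quoted in the entry above.

```python
def improvement(before_pct, after_pct):
    """Return (absolute gain in percentage points, relative gain as a fraction)."""
    absolute_pp = after_pct - before_pct
    relative = absolute_pp / before_pct
    return absolute_pp, relative

# Scores quoted for Gemini-3-Flash on CUA-World-Long, without and with test-time auditing.
abs_pp, rel = improvement(11.5, 14.0)
print(f"+{abs_pp:.1f} percentage points, {rel:.1%} relative")
```

On these figures the gain is 2.5 percentage points, or roughly a 22% relative improvement over the unaudited baseline.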
Distilling successful trajectories from the training split into a 2B vision-language model outperforms models 2× its size.
Modeling experiments reported in paper: distilled 2B VLM evaluated against larger models (2× size). Exact evaluation metrics and baseline model sizes not specified in excerpt.
high | positive | Gym-Anything: Turn any Software into an Agent Environment | model performance on benchmark tasks (success metric unspecified in excerpt)
CUA-World-Long is a challenging long-horizon benchmark with tasks often requiring over 500 steps, far exceeding existing benchmarks.
Benchmark description in paper reporting typical task lengths ("often requiring over 500 steps") and comparison to existing benchmarks.
high | positive | Gym-Anything: Turn any Software into an Agent Environment | task horizon measured in number of steps
The result is CUA-World, a collection of over 10K long-horizon tasks spanning domains from medical science and astronomy to engineering and enterprise systems, each configured with realistic data along with train and test splits.
Dataset release / creation claim specifying >10,000 tasks and train/test splits.
high | positive | Gym-Anything: Turn any Software into an Agent Environment | number of long-horizon tasks and availability of realistic data and splits