The Commonplace

Evidence (6491 claims)

Adoption: 8570 claims
Productivity: 7631 claims
Governance: 6869 claims
Human-AI Collaboration: 6491 claims
Org Design: 4175 claims
Innovation: 4114 claims
Labor Markets: 3566 claims
Skills & Training: 2966 claims
Inequality: 2066 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 758 199 100 900 2007
Governance & Regulation 826 400 191 122 1563
Organizational Efficiency 777 193 124 84 1189
Technology Adoption Rate 635 233 124 97 1098
Research Productivity 422 128 57 336 954
Output Quality 476 179 59 47 761
Decision Quality 328 177 81 47 640
Firm Productivity 435 57 88 20 606
AI Safety & Ethics 218 277 65 33 599
Market Structure 180 170 123 24 502
Task Allocation 213 64 72 33 387
Skill Acquisition 170 61 61 17 309
Innovation Output 203 27 43 18 292
Employment Level 105 54 107 13 281
Fiscal & Macroeconomic 131 69 43 26 276
Consumer Welfare 117 63 42 11 233
Firm Revenue 153 48 26 3 230
Task Completion Time 173 31 8 12 225
Inequality Measures 44 122 49 6 221
Worker Satisfaction 89 65 22 12 188
Error Rate 69 92 10 2 173
Regulatory Compliance 77 69 14 5 165
Automation Exposure 56 56 26 13 154
Training Effectiveness 94 21 13 19 149
Wages & Compensation 77 36 25 6 144
Team Performance 86 17 27 10 141
Developer Productivity 95 17 14 6 133
Job Displacement 12 80 20 1 113
Hiring & Recruitment 52 7 8 3 70
Creative Output 31 18 8 3 61
Skill Obsolescence 5 46 6 1 58
Social Protection 27 16 8 2 53
Labor Share of Income 17 19 17 53
Worker Turnover 11 12 3 26
Industry 1 1
Filter: Human-AI Collaboration
At a 20x compression ratio, DPM improves reasoning coherence by +0.53 (Cohen's h=1.13, p=0.0034) compared to summarization-based memory (paired permutation, n=10).
Paired permutation test over 10 cases at a 20x compression ratio; reported effect +0.53 with Cohen's h=1.13 and p=0.0034.
high positive Stateless Decision Memory for Enterprise AI Agents reasoning coherence
At a 20x compression ratio, DPM improves factual precision by +0.52 (Cohen's h=1.17, p=0.0014) compared to summarization-based memory (paired permutation, n=10).
Paired permutation test over 10 cases at a 20x compression ratio; reported effect +0.52 with Cohen's h=1.17 and p=0.0014.
high positive Stateless Decision Memory for Enterprise AI Agents factual precision
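The two entries above report a paired permutation test over 10 cases together with Cohen's h. As a rough illustration of how such statistics can be computed, the sketch below implements an exact two-sided sign-flip test on paired differences and the standard Cohen's h formula for two proportions. It assumes the scores are proportions (which the use of Cohen's h suggests) and does not reproduce the paper's data or its exact test variant; the function names are hypothetical.

```python
from itertools import product
from math import asin, sqrt


def paired_permutation_p(diffs):
    """Exact two-sided sign-flip permutation test on paired differences.

    Under the null hypothesis each paired difference is equally likely to
    have either sign, so all 2**n sign assignments are enumerated.
    """
    n = len(diffs)
    observed = abs(sum(diffs) / n)
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=n):
        stat = abs(sum(s * d for s, d in zip(signs, diffs)) / n)
        # Small tolerance so float rounding does not drop exact ties.
        if stat >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


def cohens_h(p1, p2):
    """Cohen's h effect size for the difference between two proportions."""
    return 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))
```

With n=10 the enumeration covers 1,024 sign assignments, so the smallest two-sided p-value this exact variant can return is 2/1024.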
On ten regulated decisioning cases at three memory budgets, DPM matches summarization-based memory at generous budgets and substantially outperforms it when the budget binds.
Empirical evaluation on 10 decisioning cases across three memory budgets; comparison between DPM and summarization-based memory as reported in the paper (n=10).
high positive Stateless Decision Memory for Enterprise AI Agents relative performance (match/outperform) of DPM vs summarization-based memory acr...
We propose Deterministic Projection Memory (DPM): an append-only event log plus one task-conditioned projection at decision time.
Method/architectural proposal described in the paper.
high positive Stateless Decision Memory for Enterprise AI Agents architecture design (DPM specification)
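The DPM entry above describes an append-only event log plus one task-conditioned projection at decision time. A minimal sketch of that shape follows. Everything in it is an assumption for illustration: the `Event`, `EventLog`, and `project` names, the character budget, and the rule-based relevance filter standing in for whatever projection the paper actually uses.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Event:
    seq: int
    kind: str
    payload: str


class EventLog:
    """Append-only log: events can be added and read, never edited."""

    def __init__(self):
        self._events = []

    def append(self, kind, payload):
        event = Event(len(self._events), kind, payload)
        self._events.append(event)
        return event

    def events(self):
        # Return an immutable snapshot so callers cannot mutate history.
        return tuple(self._events)


def project(log, relevant: Callable[[Event], bool], budget_chars: int) -> str:
    """Build a task-specific view of the log at decision time.

    The same log, predicate, and budget always yield the same view,
    because nothing here depends on hidden or mutable state.
    """
    lines = []
    used = 0
    for event in log.events():
        if not relevant(event):
            continue
        line = f"[{event.seq}] {event.kind}: {event.payload}"
        if used + len(line) > budget_chars:
            break  # budget binds: stop adding events
        lines.append(line)
        used += len(line)
    return "\n".join(lines)
```

The design point the sketch tries to capture is that memory is never rewritten: compression happens only in the projection, so a tighter budget changes the view but not the underlying record.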
Presumptuousness in legal AI is systematic but addressable, and addressing it is a necessary step towards systems that reliably support, rather than supplant, human judgment wherever decisions must await sufficient evidence.
Synthesis conclusion in paper based on the benchmark experiments, comparisons across prompting methods, and SPEC results.
high positive Learning When Not to Decide: A Framework for Overcoming Fact... reliability of AI systems to support human judgment under insufficient evidence ...
SPEC achieves 89% overall accuracy, while appropriately deferring when evidence is insufficient.
Empirical evaluation of SPEC reported in paper: overall accuracy reported as 89% and behavior of proper deferral on insufficient-evidence cases.
high positive Learning When Not to Decide: A Framework for Overcoming Fact... overall accuracy and appropriate deferral on insufficient-evidence cases
We introduce SPEC (Structured Prompting for Evidence Checklists), a structured framework requiring explicit identification of missing information before any determination.
Methodological contribution described in paper: new prompting/framework (SPEC) that enforces explicit missing-information identification prior to decision.
high positive Learning When Not to Decide: A Framework for Overcoming Fact... framework implementation that forces evidence-checklist and missing-information ...
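The SPEC entry above describes requiring explicit identification of missing information before any determination. SPEC itself is a prompting framework; the sketch below is only a non-LLM analogue of the gating idea, with a hypothetical checklist, field names, and decision rule that are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Determination:
    decision: str
    missing: list = field(default_factory=list)


def decide(facts, checklist, rule):
    """Defer unless every checklist item is present in the facts.

    Missing items are identified first; the decision rule runs only
    when the list of missing items is empty.
    """
    missing = [item for item in checklist if facts.get(item) is None]
    if missing:
        return Determination("defer", missing)
    return Determination(rule(facts), [])


# Hypothetical checklist and rule, for illustration only.
CHECKLIST = ["separation_reason", "last_day_worked"]


def example_rule(facts):
    return "eligible" if facts["separation_reason"] == "layoff" else "ineligible"
```

For example, `decide({"separation_reason": "layoff"}, CHECKLIST, example_rule)` defers and names `last_day_worked` as missing rather than returning a determination.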
Through a collaboration with the Colorado Department of Labor and Employment, we secured access to official training materials and guidance to design a novel benchmark that systematically varies information completeness.
Methodological description in paper: collaboration with state agency and dataset/benchmark construction using official training materials and guidance.
high positive Learning When Not to Decide: A Framework for Overcoming Fact... creation of a benchmark varying information completeness
Long-term prospects of agentic AI include catalyzing accelerated innovation in physical design via autonomous algorithm discovery, continuous tool improvement, and closed-loop learning from large design corpora.
Forward-looking conclusion in the paper; framed as the authors' projection based on survey synthesis rather than as an empirically demonstrated outcome in the abstract.
high positive Invited: Agentic AI for Physical Design R&D: Status and Pros... autonomous algorithm discovery, continuous tool improvement, closed-loop learnin...
Interfaces between agentic systems and traditional EDA frameworks are a key area of focus and enable tighter integration of agent capabilities into existing design workflows.
Survey highlights interfaces between agents and EDA frameworks as a focus area; claim is descriptive of research direction rather than reporting empirical outcomes.
high positive Invited: Agentic AI for Physical Design R&D: Status and Pros... development and importance of interfaces between agents and EDA frameworks
Autonomous agents can explore heuristic spaces for placement, routing, and partitioning.
Presented as an emphasized capability/area of research in the survey; the abstract asserts this possibility but does not report empirical benchmarks or sample sizes.
high positive Invited: Agentic AI for Physical Design R&D: Status and Pros... autonomous exploration of heuristic spaces (placement, routing, partitioning)
Tool-integrated agents can be used for algorithm evolution, debugging, and workflow automation in physical design R&D.
Paper emphasizes this as a primary area of application in the survey; rationale and examples are discussed but no quantitative trial sizes are given in the abstract.
high positive Invited: Agentic AI for Physical Design R&D: Status and Pros... use of agents for algorithm evolution, debugging, and workflow automation
Agentic AI systems can comprehend user specifications, modify code, run EDA tools, analyze results, perform multi-step reasoning, and iteratively refine design heuristics—unlike earlier ML uses that focused narrowly on prediction or optimization subroutines.
Descriptive claim in the paper contrasting agentic AI capabilities with earlier ML approaches; presented as an overview of functional capabilities rather than empirical measurement.
high positive Invited: Agentic AI for Physical Design R&D: Status and Pros... breadth of tasks agentic AI systems can perform (spec comprehension, code modifi...
Recent advances in large language models (LLMs) and tool-using autonomous agents present new opportunities for accelerating research and development in physical design.
Stated as a central thesis in the paper's abstract/survey; based on the authors' synthesis of recent advances and emerging applications (no empirical sample or quantified evaluation reported in the abstract).
high positive Invited: Agentic AI for Physical Design R&D: Status and Pros... acceleration of research and development in physical design
The framework is applied to Canada's 2025-2026 national AI Strategy consultation with n = 5,253 respondents across two independent policy topics.
Empirical application reported in the paper; dataset description gives sample size and two policy topics.
high positive Participatory provenance as representational auditing for AI... sample and context for empirical evaluation
This paper introduces 'participatory provenance': a measurement framework grounded in optimal transport theory, causal inference and semantic analysis that tracks how individual public submissions are transformed, filtered or lost through AI-mediated summarization.
Methodological contribution described in the paper (framework design combining optimal transport, causal inference, semantic analysis).
high positive Participatory provenance as representational auditing for AI... ability to track transformations/filtration/loss of individual submissions
AI systems currently provide more consistent fraud warnings than lay humans in an identical advisory role.
Aggregate comparison from the preregistered experiment showing humans had nonzero endorsement and higher suppression rates while all tested LLMs showed 0% endorsements and lower suppression under pressure (human n=1,201; AI conversations n=3,360).
high positive Large Language Models Outperform Humans in Fraud Detection a... consistency of fraud warnings between advisors (LLMs vs. lay humans)
Human advisors endorsed fraudulent investments at baseline rates of 13-14%.
Human benchmark of 1,201 participants run in the preregistered experiment; reported baseline endorsement rates for fraudulent scenarios.
high positive Large Language Models Outperform Humans in Fraud Detection a... baseline endorsement rate of fraudulent investments by human advisors
Motivated investor framing did not suppress AI fraud warnings; if anything, it marginally increased them.
Preregistered experiment across seven leading LLMs and twelve investment scenarios; 3,360 AI advisory conversations analyzed comparing motivated vs. baseline investor framings.
high positive Large Language Models Outperform Humans in Fraud Detection a... frequency of AI fraud warnings under motivated investor framing
Under these conditions (alignment of forces and AI-driven ideation cost reductions), PIM offers a framework for organising governed discovery in real time and provides the methodological foundation for later applied work.
The paper presents PIM as a proposed framework and positioning statement for future applied research and implementations (theoretical proposal; no applied trials reported).
high positive Probabilistic Innovation Methodology: A Scientific Methodolo... feasibility of using PIM to organise real-time governed discovery
Organised attacks on complex problems can generate an epistemic mode transition: a shift from predominantly Knightian uncertainty toward probabilistically characterisable innovation dynamics as relevant structures become more visible, decomposed, coordinated, and testable.
The paper states and formalises this methodological claim within PIM as a central proposition (theoretical argumentation; no empirical validation reported).
high positive Probabilistic Innovation Methodology: A Scientific Methodolo... degree of uncertainty characterization (Knightian vs probabilistic)
When problem-relevant causal, informational, and coordinative forces become sufficiently aligned, the epistemic character of search changes and open-ended uncertainty can be progressively transformed into structured probabilistic search.
The claim is presented as the central theoretical argument and formalised within the PIM conceptual framework (theoretical/model-based argumentation; no empirical sample).
high positive Probabilistic Innovation Methodology: A Scientific Methodolo... epistemic character of search (shift from Knightian uncertainty to probabilistic...
The user study (n=32) also reports improvements in subjective measures, including fluency and user preference for RAPIDDS over non-adaptive systems.
User study (n=32) reporting subjective questionnaire/ratings (fluency, preference) comparing RAPIDDS vs non-adaptive baselines.
high positive Multi-Cycle Spatio-Temporal Adaptation in Human-Robot Teamin... subjective fluency and user preference
A user study (n=32) shows significant plan improvement compared to non-adaptive systems across objective metrics such as efficiency and proximity.
User study reported in paper with sample size n=32 comparing RAPIDDS to non-adaptive systems on objective metrics (efficiency, proximity); significance claimed.
high positive Multi-Cycle Spatio-Temporal Adaptation in Human-Robot Teamin... efficiency and proximity (objective plan metrics)
An ablation study in simulation and a physical robot scenario demonstrates the importance of dual (task + motion) adaptation.
Ablation experiments reported in paper (simulation and physical robot experiments comparing full RAPIDDS to ablated variants).
high positive Multi-Cycle Spatio-Temporal Adaptation in Human-Robot Teamin... plan performance when removing components (effect of dual adaptation)
RAPIDDS jointly adapts task schedules and steers diffusion models of robot motions to maximize efficiency and minimize proximity, accounting for individualized models.
Algorithmic method described in paper combining schedule optimization with motion steering (method section).
high positive Multi-Cycle Spatio-Temporal Adaptation in Human-Robot Teamin... efficiency and proximity of joint plans
At the country level, digitalisation and workplace training provision steepen the exposure–adoption gradient.
Country-level heterogeneity analysis using the 2024 EWCS (35 countries) linking national measures of digitalisation and prevalence of workplace training to stronger occupational exposure–adoption relationships.
high positive Generative AI at Work: From Exposure to Adoption across 35 E... self-reported adoption of generative AI (interaction with exposure)
Individual skills, non-routine cognitive job content within occupations, and employee say in organisational decisions steepen the exposure–adoption gradient.
Interaction and stratified analyses from the 2024 EWCS showing stronger exposure–adoption associations among workers with higher individual skills, more non-routine cognitive job content (within occupations), and greater employee influence over organisational decisions; sample >36,600 workers.
high positive Generative AI at Work: From Exposure to Adoption across 35 E... self-reported adoption of generative AI (interaction with exposure)
Occupational exposure to AI strongly predicts self-reported uptake of generative AI.
Associational/regression analysis using the 2024 EWCS linking occupation-level measures of AI exposure to individual-level self-reported adoption; sample >36,600 workers across 35 countries.
high positive Generative AI at Work: From Exposure to Adoption across 35 E... self-reported adoption of generative AI
Self-reported generative AI adoption averages 12% but ranges from under 3% to 25% across countries.
Descriptive analysis of the 2024 European Working Conditions Survey (EWCS), sample of more than 36,600 workers in 35 countries; country-level tabulations of self-reported generative AI adoption.
high positive Generative AI at Work: From Exposure to Adoption across 35 E... self-reported adoption of generative AI
ClawNet enables multiple users to collaborate securely through their respective agents.
Capability claim about the instantiated system (authors assert that ClawNet enables secure multi-user collaboration; excerpt contains no empirical security evaluation or user study).
high positive ClawNet: Human-Symbiotic Agent Network for Cross-User Autono... secure multi-user collaboration enabled by agent-mediated interactions
We instantiate this paradigm in ClawNet, an identity-governed agent collaboration framework that enforces identity binding and authorization verification through a central orchestrator.
Implementation claim: authors state they built ClawNet as an instantiation of their paradigm (paper describes framework/architecture; no experimental evaluation included in excerpt).
high positive ClawNet: Human-Symbiotic Agent Network for Cross-User Autono... existence of an implemented framework (ClawNet) enforcing identity binding and a...
Action-level accountability logs every operation against its owner's identity and authorization, ensuring full auditability.
Design claim describing an accountability primitive (paper asserts logging and auditability as a property; no audit or verification evidence shown in excerpt).
high positive ClawNet: Human-Symbiotic Agent Network for Cross-User Autono... auditability of agent actions (logging tied to owner identity/authorization)
Scoped authorization enforces per-identity access control and escalates boundary violations to the owner.
Design/specification claim describing the scoped authorization governance primitive in the proposed paradigm (no empirical or security evaluation provided in excerpt).
high positive ClawNet: Human-Symbiotic Agent Network for Cross-User Autono... access control enforcement and escalation behavior
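The two ClawNet entries above describe action-level accountability logging and scoped authorization that escalates boundary violations to the owner. The sketch below shows one way those two primitives could fit together in a central orchestrator. All class names, fields, and the scope representation are assumptions for illustration and are not ClawNet's actual interface.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class IdentityAgent:
    owner: str
    identity: str
    scopes: frozenset  # resources this identity may touch


@dataclass
class Orchestrator:
    audit_log: list = field(default_factory=list)
    escalations: list = field(default_factory=list)

    def perform(self, agent, action, resource):
        """Check scope, log the attempt, and escalate violations."""
        allowed = resource in agent.scopes
        # Every attempt is logged against the owner, allowed or not.
        self.audit_log.append({
            "owner": agent.owner,
            "identity": agent.identity,
            "action": action,
            "resource": resource,
            "allowed": allowed,
        })
        if not allowed:
            # Boundary violation: surface it to the owner instead of acting.
            self.escalations.append((agent.owner, action, resource))
        return allowed
```

Logging before the allow/deny branch means denied attempts are as auditable as permitted ones, which is the property the accountability entry emphasizes.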
The paradigm rests on three governance primitives: (1) a layered identity architecture that separates a Manager Agent from multiple context-specific Identity Agents; the Manager Agent holds global knowledge but is architecturally isolated from external communication.
Architectural/design claim describing the proposed layered identity primitive (presentation of design; no empirical validation in excerpt).
high positive ClawNet: Human-Symbiotic Agent Network for Cross-User Autono... identity architecture and information flow constraints
We propose a human-symbiotic agent paradigm in which each user owns a permanently bound agent system that collaborates on the owner's behalf, forming a network whose nodes are humans rather than agents.
Design proposal / conceptual architecture presented in the paper (no large-scale deployment or empirical evaluation described in excerpt).
high positive ClawNet: Human-Symbiotic Agent Network for Cross-User Autono... structure of agent networks (human-centric vs agent-centric) and delegation mode...
The next frontier for AI agents lies not in stronger individual capability, but in the digitization of human collaborative relationships.
Normative/strategic claim advanced by the authors as the central thesis (conceptual argument, no empirical test reported).
high positive ClawNet: Human-Symbiotic Agent Network for Cross-User Autono... focus of AI-agent development (individual capability vs collaboration digitizati...
Human productivity rests on the social and organizational relationships through which people coordinate, negotiate, and delegate.
Theoretical/argumentative claim presented as background motivation (conceptual reasoning, citation not provided in excerpt).
high positive ClawNet: Human-Symbiotic Agent Network for Cross-User Autono... human productivity as mediated by social/organizational relationships
Time Series Augmented Generation (TSAG) enables LLM agents to delegate quantitative tasks to verifiable external tools.
Description of TSAG framework in paper stating delegation mechanism to external verifiable tools for quantitative computations.
high positive Time Series Augmented Generation for Financial Applications delegation capability to external tools
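The TSAG entry above describes LLM agents delegating quantitative tasks to verifiable external tools. As a hedged sketch of that pattern, the code below routes a computation to a named tool and keeps a record that can be re-run to check the result. The tool registry, the `max_drawdown` tool, and the record format are hypothetical and are not taken from the TSAG benchmark.

```python
import statistics


def max_drawdown(prices):
    """Largest peak-to-trough decline, as a fraction of the peak."""
    peak = prices[0]
    worst = 0.0
    for price in prices:
        peak = max(peak, price)
        worst = max(worst, (peak - price) / peak)
    return worst


# Hypothetical registry of deterministic tools the agent may call.
TOOLS = {
    "mean": statistics.fmean,
    "max_drawdown": max_drawdown,
}


def call_tool(name, series):
    """Run a registered tool and return a record of the call."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    result = TOOLS[name](series)
    return {"tool": name, "input": list(series), "result": result}


def verify(record):
    """Re-run the recorded call and confirm the result matches."""
    return TOOLS[record["tool"]](record["input"]) == record["result"]
```

Because the tools are deterministic, any number the agent reports can be traced to a record and recomputed, rather than trusted as generated text.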
We publicly release the evaluation framework and empirical insights to foster standardized research on reliable financial AI.
Paper states that the framework, benchmark, and empirical results are released publicly by the authors.
high positive Time Series Augmented Generation for Financial Applications public release of resources
The results demonstrate that capable agents can achieve near-perfect tool-use accuracy with minimal hallucination, validating the tool-augmented paradigm.
Empirical results from the authors' experiments on the 100-question benchmark across multiple agents; paper states agents achieve 'near-perfect' tool-use accuracy and 'minimal' hallucination.
high positive Time Series Augmented Generation for Financial Applications tool-use accuracy; hallucination rate
We apply this methodology in a large-scale empirical study using our framework, Time Series Augmented Generation (TSAG), where an LLM agent delegates quantitative tasks to verifiable, external tools.
Paper reports applying the TSAG framework in an empirical study in which agents call external tools to perform quantitative computations; described as 'large-scale' and implemented by the authors.
high positive Time Series Augmented Generation for Financial Applications use of external/verifiable tools by LLM agents
We introduce a novel evaluation methodology and benchmark designed to rigorously measure an LLM agent's reasoning for financial time-series analysis.
Paper describes a new methodology and benchmark (Time Series Augmented Generation, TSAG) developed by the authors for evaluating LLM reasoning on financial time-series tasks.
high positive Time Series Augmented Generation for Financial Applications existence of a new evaluation methodology / benchmark
Effective evaluation-driven loop scaling is a central axis for advancing LLM-driven scientific discovery, and SimpleTES provides a simple yet practical framework for realizing these gains.
High-level claim supported by the aggregate experimental results and discussion in the paper.
high positive Evaluation-driven Scaling for Scientific Discovery impact of scaling evaluation-driven discovery loops on LLM-driven scientific dis...
When post-trained on successful trajectories, models not only improve efficiency on seen problems but also generalize to unseen problems, discovering solutions that base models fail to uncover.
Experiments in which models were post-trained on successful SimpleTES trajectories and evaluated on both seen and unseen problems (paper claim of improved efficiency and generalization).
high positive Evaluation-driven Scaling for Scientific Discovery post-training efficiency on seen problems and generalization to unseen problems ...
SimpleTES produces trajectory-level histories that naturally supervise feedback-driven learning.
Methodological claim and supporting experiments where SimpleTES generates solution trajectories that are then used as supervision for learning.
high positive Evaluation-driven Scaling for Scientific Discovery availability and usefulness of trajectory-level histories for supervision
We discovered new Erdős minimum overlap constructions that surpass the best-known results.
Reported novel combinatorial constructions (Erdős minimum overlap) in the experiments that improve on prior best-known results.
high positive Evaluation-driven Scaling for Scientific Discovery quality of Erdős minimum overlap constructions (best-known benchmarks)
We designed quantum circuit routing policies that reduce gate overhead by 24.5%.
Experimental results reported for quantum circuit routing tasks showing a 24.5% reduction in gate overhead when using SimpleTES-designed policies.
high positive Evaluation-driven Scaling for Scientific Discovery quantum circuit gate overhead
We sped up the widely used LASSO algorithm by over 2x.
Benchmarking experiment reported in the paper comparing LASSO runtime/performance with and without SimpleTES (paper states >2x speedup).
high positive Evaluation-driven Scaling for Scientific Discovery LASSO algorithm runtime / speed
SimpleTES consistently outperforms both frontier-model baselines and sophisticated optimization pipelines.
Comparative experimental evaluation vs. frontier-model baselines and optimization pipelines across the reported problems (paper claim).
high positive Evaluation-driven Scaling for Scientific Discovery performance relative to baselines (solution quality / discovery success)