The Commonplace

Evidence (13827 claims)

Adoption: 8454 claims
Productivity: 7544 claims
Governance: 6789 claims
Human-AI Collaboration: 6327 claims
Org Design: 4126 claims
Innovation: 4058 claims
Labor Markets: 3520 claims
Skills & Training: 2924 claims
Inequality: 2057 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 749 195 97 889 1979
Governance & Regulation 815 391 188 121 1539
Organizational Efficiency 771 189 124 83 1177
Technology Adoption Rate 624 233 123 96 1084
Research Productivity 410 121 56 331 929
Output Quality 466 177 59 47 749
Decision Quality 320 174 75 42 618
Firm Productivity 435 55 88 20 604
AI Safety & Ethics 214 276 65 33 593
Market Structure 178 166 122 24 495
Task Allocation 206 64 70 31 376
Skill Acquisition 165 57 60 17 299
Innovation Output 201 27 41 18 288
Employment Level 105 51 107 13 278
Fiscal & Macroeconomic 131 69 43 26 276
Consumer Welfare 116 63 42 11 232
Firm Revenue 149 46 26 3 224
Inequality Measures 44 122 49 6 221
Task Completion Time 169 29 8 12 219
Worker Satisfaction 89 61 20 12 182
Error Rate 69 91 10 2 172
Regulatory Compliance 76 68 14 5 163
Training Effectiveness 92 19 13 19 145
Wages & Compensation 77 36 25 6 144
Automation Exposure 51 54 22 12 142
Team Performance 86 17 27 9 140
Developer Productivity 94 17 14 6 132
Job Displacement 12 80 20 1 113
Hiring & Recruitment 51 7 8 3 69
Skill Obsolescence 5 45 6 1 57
Creative Output 31 16 7 2 57
Social Protection 27 16 8 2 53
Labor Share of Income 17 17 17 51
Worker Turnover 11 12 3 26
Industry 1 1
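As a reading aid for the matrix above, the following minimal sketch computes the share of claims in each direction for three rows copied from the table. The helper name `direction_shares` is illustrative and not part of the dashboard. The four direction counts do not always add up to the Total column (for Governance & Regulation they sum to 1515 against a listed total of 1539), so shares here are taken over the sum of the four listed counts; that choice is an assumption.

```python
# Rows copied from the Evidence Matrix: (positive, negative, mixed, null).
ROWS = {
    "Firm Productivity": (435, 55, 88, 20),
    "Error Rate": (69, 91, 10, 2),
    "Job Displacement": (12, 80, 20, 1),
}

LABELS = ("positive", "negative", "mixed", "null")


def direction_shares(counts):
    """Return each direction's share of the four listed counts."""
    total = sum(counts)  # assumption: ignore the table's Total column
    return {label: n / total for label, n in zip(LABELS, counts)}


for outcome, counts in ROWS.items():
    shares = direction_shares(counts)
    print(outcome, {k: round(v, 3) for k, v in shares.items()})
```

A "positive" direction describes the finding as coded by the dashboard, so shares for outcomes such as Error Rate or Job Displacement should be read with the underlying claims in view.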
Artificial intelligence (AI) has a positive but weaker impact on sustainable development relative to digital transformation, reflecting its complementary and maturity-dependent role within the digital ecosystem.
System GMM regressions on a panel of MENA economies (2010–2023) that include measures of both AI and digital transformation; the reported coefficient for AI is positive but smaller.
Digital transformation is the primary driver of sustainable development in MENA economies, exerting a stronger and more consistent effect than AI.
Dynamic panel data analysis of MENA economies (2010–2023) using System GMM; reported comparative effect sizes of digital transformation vs. AI in regression results.
In the ICT industry, Tobin's Q significantly increased following AI adoption (heterogeneous positive effect).
Subgroup/heterogeneity analysis within the main sample (KOSDAQ firms 2018–2025), estimating the post-adoption effect of AI on Tobin's Q in firms classified as ICT.
high positive The Dynamic Causal Effects of Corporate AI Adoption on Profi... Tobin's Q (market value) in ICT-industry firms
The authors propose corresponding analytical extensions to the framework to address the three structural breaks in agentic systems.
Paper presents proposed analytical extensions (methodological proposals) tied to each identified structural break.
high positive Governed Auditable Decisioning Under Uncertainty: Synthesis ... availability of proposed analytical extensions for governance framework
Cross-architecture comparison reveals a governance coverage gradient: deterministic rule engines achieve full DES-property fillability.
Analytic cross-architecture comparison reported in the paper (comparative analysis across four architectures); deterministic rule engines identified as achieving 'full' fillability of DES-properties.
high positive Governed Auditable Decisioning Under Uncertainty: Synthesis ... DES-property fillability (completeness of governance evidence coverage)
The paper synthesizes an operational governance evidence framework composed of: structural accountability collapse diagnostics, decision trace schemas, evidence sufficiency measurement, and label-free monitoring, integrated into a chain.
Methodological contribution: authors construct and present a synthesized framework from those four components (conceptual/analytical synthesis).
high positive Governed Auditable Decisioning Under Uncertainty: Synthesis ... presence and structure of an operational governance evidence framework
The Barcelona Declaration offers a promising forum for boundary governance.
Policy recommendation pointing to an existing initiative (Barcelona Declaration) as a suitable forum; stated without empirical evaluation in the excerpt.
high positive Market Dynamics, Governance and Open Research Metadata in th... suitability of the Barcelona Declaration as a forum for boundary governance
Governance should calibrate the annulus, not abolish it: thin enough to serve research efficiently, wide enough to sustain innovation.
Normative policy recommendation from the authors; based on their conceptual framework rather than on empirical policy evaluation in the excerpt.
high positive Market Dynamics, Governance and Open Research Metadata in th... optimal governance calibration of the annulus balancing research efficiency and ...
Artificial intelligence reshapes the annulus by lowering barriers to basic structuring.
Conceptual claim in the paper; asserted as an effect of AI on metadata production without empirical estimates in the excerpt.
high positive Market Dynamics, Governance and Open Research Metadata in th... barriers to basic structuring of metadata
The proposed framework is intended to serve as a practical reference for engineering teams and decision-makers navigating enterprise LLM adoption.
Author statement of intent in the paper (qualitative claim about intended audience and utility).
high positive Buy Or Build? A Practitioner’s Framework for Large Language ... practical utility for engineering teams and decision-makers
The buy-versus-build decision should be viewed as a phased continuum: initial API adoption can give way to hybrid architectures as organizational maturity and requirements evolve.
Conceptual argument in the paper, illustrated by the Bills Converter experience (single-case narrative recommending phased/hybrid progression).
high positive Buy Or Build? A Practitioner’s Framework for Large Language ... recommended adoption pathway (phased/API→hybrid)
In the end-to-end development of the Bills Converter, the authors chose a closed-source, API-based approach over self-hosted or custom-built alternatives.
Case study: the Bills Converter system (single end-to-end project described in the paper).
high positive Buy Or Build? A Practitioner’s Framework for Large Language ... adoption decision (choice of architecture: API-based closed-source vs self-hoste...
This paper presents a multi-dimensional decision framework that synthesizes technical, financial, and strategic considerations into a coherent evaluation methodology for enterprise LLM adoption.
The paper is explicitly framed as presenting a decision framework; supported by conceptual synthesis and exposition within the manuscript (no reported quantitative validation).
high positive Buy Or Build? A Practitioner’s Framework for Large Language ... quality/usefulness of decision-making framework for enterprise LLM adoption
At the country level, digitalisation and workplace training provision steepen the exposure–adoption gradient.
Country-level heterogeneity analysis using the 2024 EWCS (35 countries) linking national measures of digitalisation and prevalence of workplace training to stronger occupational exposure–adoption relationships.
high positive Generative AI at Work: From Exposure to Adoption across 35 E... self-reported adoption of generative AI (interaction with exposure)
Individual skills, non-routine cognitive job content within occupations, and employee say in organisational decisions steepen the exposure–adoption gradient.
Interaction and stratified analyses from the 2024 EWCS showing stronger exposure–adoption associations among workers with higher individual skills, more non-routine cognitive job content (within occupations), and greater employee influence over organisational decisions; sample >36,600 workers.
high positive Generative AI at Work: From Exposure to Adoption across 35 E... self-reported adoption of generative AI (interaction with exposure)
Occupational exposure strongly predicts uptake.
Associational/regression analysis using the 2024 EWCS linking occupation-level measures of AI exposure to individual-level self-reported adoption; sample >36,600 workers across 35 countries.
high positive Generative AI at Work: From Exposure to Adoption across 35 E... self-reported adoption of generative AI
Adoption averages 12% but ranges from under 3% to 25% across countries.
Descriptive analysis of the 2024 European Working Conditions Survey (EWCS), sample of more than 36,600 workers in 35 countries; country-level tabulations of self-reported generative AI adoption.
high positive Generative AI at Work: From Exposure to Adoption across 35 E... self-reported adoption of generative AI
Our baseline model finds evidence that AI is productivity enhancing.
Results from the paper's stated baseline empirical model using BEA industry-account-based measures; model specification described by authors.
States can adjust their foreign policies to this fact by focusing on resilience, technological sovereignty, strategic decoupling, and coordination through alliances.
Policy-prescriptive recommendations based on the paper's theoretical framework and analysis; no empirical testing or sample size reported in the abstract.
high positive ARTIFICIAL INTELLIGENCE AND THE WEAPONIZATION OF ECONOMIC IN... effectiveness of foreign policy adjustments (resilience, sovereignty, decoupling...
ClawNet enables multiple users to collaborate securely through their respective agents.
Capability claim about the instantiated system (authors assert that ClawNet enables secure multi-user collaboration; excerpt contains no empirical security evaluation or user study).
high positive ClawNet: Human-Symbiotic Agent Network for Cross-User Autono... secure multi-user collaboration enabled by agent-mediated interactions
We instantiate this paradigm in ClawNet, an identity-governed agent collaboration framework that enforces identity binding and authorization verification through a central orchestrator.
Implementation claim: authors state they built ClawNet as an instantiation of their paradigm (paper describes framework/architecture; no experimental evaluation included in excerpt).
high positive ClawNet: Human-Symbiotic Agent Network for Cross-User Autono... existence of an implemented framework (ClawNet) enforcing identity binding and a...
Action-level accountability logs every operation against its owner's identity and authorization, ensuring full auditability.
Design claim describing an accountability primitive (paper asserts logging and auditability as a property; no audit or verification evidence shown in excerpt).
high positive ClawNet: Human-Symbiotic Agent Network for Cross-User Autono... auditability of agent actions (logging tied to owner identity/authorization)
Scoped authorization enforces per-identity access control and escalates boundary violations to the owner.
Design/specification claim describing the scoped authorization governance primitive in the proposed paradigm (no empirical or security evaluation provided in excerpt).
high positive ClawNet: Human-Symbiotic Agent Network for Cross-User Autono... access control enforcement and escalation behavior
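The scoped-authorization and action-level accountability primitives described in the two claims above can be illustrated with a small sketch. The excerpt gives no API for ClawNet, so every name here (`IdentityAgent`, `Orchestrator`, `perform`) is hypothetical, and the logic is only one plausible reading: each identity carries a fixed set of scopes, every attempted action is logged against the owner, and out-of-scope attempts are escalated rather than executed.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class IdentityAgent:
    """Hypothetical identity agent bound to one owner with fixed scopes."""
    owner: str
    name: str
    scopes: frozenset


@dataclass
class Orchestrator:
    """Hypothetical central orchestrator: checks scope and logs every action."""
    audit_log: list = field(default_factory=list)
    escalations: list = field(default_factory=list)

    def perform(self, agent, action, resource):
        allowed = resource in agent.scopes
        # Action-level accountability: log every attempt, allowed or not.
        self.audit_log.append({
            "owner": agent.owner,
            "identity": agent.name,
            "action": action,
            "resource": resource,
            "allowed": allowed,
        })
        if not allowed:
            # Boundary violation: escalate to the owner instead of proceeding.
            self.escalations.append((agent.owner, action, resource))
        return allowed


orc = Orchestrator()
work = IdentityAgent(owner="alice", name="work", scopes=frozenset({"calendar"}))
print(orc.perform(work, "read", "calendar"))  # in scope
print(orc.perform(work, "read", "medical"))   # out of scope, escalated
print(orc.escalations)
```

Logging before the scope decision takes effect keeps denied attempts in the audit trail as well, which is what makes the log useful for after-the-fact review.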
The paradigm rests on three governance primitives: (1) a layered identity architecture that separates a Manager Agent from multiple context-specific Identity Agents; the Manager Agent holds global knowledge but is architecturally isolated from external communication.
Architectural/design claim describing the proposed layered identity primitive (presentation of design; no empirical validation in excerpt).
high positive ClawNet: Human-Symbiotic Agent Network for Cross-User Autono... identity architecture and information flow constraints
We propose a human-symbiotic agent paradigm in which each user owns a permanently bound agent system that collaborates on the owner's behalf, forming a network whose nodes are humans rather than agents.
Design proposal / conceptual architecture presented in the paper (no large-scale deployment or empirical evaluation described in excerpt).
high positive ClawNet: Human-Symbiotic Agent Network for Cross-User Autono... structure of agent networks (human-centric vs agent-centric) and delegation mode...
The next frontier for AI agents lies not in stronger individual capability, but in the digitization of human collaborative relationships.
Normative/strategic claim advanced by the authors as the central thesis (conceptual argument, no empirical test reported).
high positive ClawNet: Human-Symbiotic Agent Network for Cross-User Autono... focus of AI-agent development (individual capability vs collaboration digitizati...
Human productivity rests on the social and organizational relationships through which people coordinate, negotiate, and delegate.
Theoretical/argumentative claim presented as background motivation (conceptual reasoning, citation not provided in excerpt).
high positive ClawNet: Human-Symbiotic Agent Network for Cross-User Autono... human productivity as mediated by social/organizational relationships
Time Series Augmented Generation (TSAG) enables LLM agents to delegate quantitative tasks to verifiable external tools.
Description of TSAG framework in paper stating delegation mechanism to external verifiable tools for quantitative computations.
high positive Time Series Augmented Generation for Financial Applications delegation capability to external tools
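The delegation idea in this claim can be sketched without any model in the loop: the agent emits a structured tool call, and a deterministic function computes the number so the result can be rechecked independently. The tool names, the call format, and `run_tool_call` below are assumptions made for illustration; the excerpt does not describe TSAG's actual tool interface.

```python
import statistics


def max_drawdown(prices):
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak, worst = prices[0], 0.0
    for p in prices:
        peak = max(peak, p)
        worst = max(worst, (peak - p) / peak)
    return worst


# Hypothetical tool registry: deterministic functions over a price series.
TOOLS = {
    "mean": statistics.fmean,
    "max_drawdown": max_drawdown,
}


def run_tool_call(call, prices):
    """Execute a call of the form {"tool": name}; reject unknown tools."""
    name = call["tool"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](prices)


prices = [100.0, 120.0, 90.0, 110.0]
call = {"tool": "max_drawdown"}  # stands in for the agent's structured output
print(run_tool_call(call, prices))
```

Because the tools are pure functions, a reviewer can rerun the same call on the same series and compare results, which is one way to read "verifiable" in the claim.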
We publicly release the evaluation framework and empirical insights to foster standardized research on reliable financial AI.
Paper states that the framework, benchmark, and empirical results are released publicly by the authors.
high positive Time Series Augmented Generation for Financial Applications public release of resources
The results demonstrate that capable agents can achieve near-perfect tool-use accuracy with minimal hallucination, validating the tool-augmented paradigm.
Empirical results from the authors' experiments on the 100-question benchmark across multiple agents; paper states agents achieve 'near-perfect' tool-use accuracy and 'minimal' hallucination.
high positive Time Series Augmented Generation for Financial Applications tool-use accuracy; hallucination rate
We apply this methodology in a large-scale empirical study using our framework, Time Series Augmented Generation (TSAG), where an LLM agent delegates quantitative tasks to verifiable, external tools.
Paper reports applying the TSAG framework in an empirical study in which agents call external tools to perform quantitative computations; described as 'large-scale' and implemented by the authors.
high positive Time Series Augmented Generation for Financial Applications use of external/verifiable tools by LLM agents
We introduce a novel evaluation methodology and benchmark designed to rigorously measure an LLM agent's reasoning for financial time-series analysis.
Paper describes a new methodology and benchmark (Time Series Augmented Generation, TSAG) developed by the authors for evaluating LLM reasoning on financial time-series tasks.
high positive Time Series Augmented Generation for Financial Applications existence of a new evaluation methodology / benchmark
Effective evaluation-driven loop scaling is a central axis for advancing LLM-driven scientific discovery, and SimpleTES provides a simple yet practical framework for realizing these gains.
High-level claim supported by the aggregate experimental results and discussion in the paper.
high positive Evaluation-driven Scaling for Scientific Discovery impact of scaling evaluation-driven discovery loops on LLM-driven scientific dis...
When post-trained on successful trajectories, models not only improve efficiency on seen problems but also generalize to unseen problems, discovering solutions that base models fail to uncover.
Experiments in which models were post-trained on successful SimpleTES trajectories and evaluated on both seen and unseen problems (paper claim of improved efficiency and generalization).
high positive Evaluation-driven Scaling for Scientific Discovery post-training efficiency on seen problems and generalization to unseen problems ...
SimpleTES produces trajectory-level histories that naturally supervise feedback-driven learning.
Methodological claim and supporting experiments where SimpleTES generates solution trajectories that are then used as supervision for learning.
high positive Evaluation-driven Scaling for Scientific Discovery availability and usefulness of trajectory-level histories for supervision
We discovered new Erdős minimum overlap constructions that surpass the best-known results.
Reported novel combinatorial constructions (Erdős minimum overlap) in the experiments that improve on prior best-known results.
high positive Evaluation-driven Scaling for Scientific Discovery quality of Erdős minimum overlap constructions (best-known benchmarks)
We designed quantum circuit routing policies that reduce gate overhead by 24.5%.
Experimental results reported for quantum circuit routing tasks showing a 24.5% reduction in gate overhead when using SimpleTES-designed policies.
high positive Evaluation-driven Scaling for Scientific Discovery quantum circuit gate overhead
We sped up the widely used LASSO algorithm by over 2x.
Benchmarking experiment reported in the paper comparing LASSO runtime/performance with and without SimpleTES (paper states >2x speedup).
high positive Evaluation-driven Scaling for Scientific Discovery LASSO algorithm runtime / speed
SimpleTES consistently outperforms both frontier-model baselines and sophisticated optimization pipelines.
Comparative experimental evaluation vs. frontier-model baselines and optimization pipelines across the reported problems (paper claim).
high positive Evaluation-driven Scaling for Scientific Discovery performance relative to baselines (solution quality / discovery success)
Across 21 scientific problems spanning six domains, SimpleTES discovers state-of-the-art solutions using gpt-oss models.
Empirical experiments reported across 21 problems in six domains using gpt-oss models (paper states 21 problems).
high positive Evaluation-driven Scaling for Scientific Discovery ability to discover state-of-the-art solutions (solution quality / discovery suc...
We introduce Simple Test-time Evaluation-driven Scaling (SimpleTES), a general framework that strategically combines parallel exploration, feedback-driven refinement, and local selection.
Methodological contribution described in the paper (framework design and algorithmic description).
high positive Evaluation-driven Scaling for Scientific Discovery framework design combining parallel exploration, feedback-driven refinement, and...
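To make the three ingredients named in this claim concrete, here is a toy loop on a one-dimensional objective. It is not the authors' algorithm: the excerpt gives no algorithmic detail, so the evaluator, the random-step `refine` function (standing in for a model proposing a revision), and the reading of "local selection" as each chain keeping the better of its own current and proposed candidates are all assumptions.

```python
import random


def evaluate(x):
    """Toy evaluator standing in for a problem-specific scorer (higher is better)."""
    return -(x - 3.0) ** 2


def refine(x, score, rng):
    """Stand-in for a model revising a candidate given its score."""
    # Feedback-driven: worse scores lead to larger proposed steps.
    scale = min(2.0, 0.1 + abs(score) ** 0.5)
    return x + rng.gauss(0.0, scale)


def search(n_chains=4, n_rounds=50, seed=0):
    rng = random.Random(seed)
    # Parallel exploration: several independent starting candidates.
    chains = [rng.uniform(-10.0, 10.0) for _ in range(n_chains)]
    for _ in range(n_rounds):
        for i, current in enumerate(chains):
            candidate = refine(current, evaluate(current), rng)
            # Local selection: each chain keeps the better of its own two options.
            if evaluate(candidate) > evaluate(current):
                chains[i] = candidate
    return max(chains, key=evaluate)


best = search()
print(best, evaluate(best))
```

Since a chain only ever replaces its candidate with a strictly better one, the best score across chains cannot fall below the best starting score.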
Given historical inequities in housing placement, it is crucial to audit LLM use in this context.
Authors' policy/recommendation motivated by historical inequities in housing placement and their empirical audit findings; presented as an argument in the report rather than a quantified experimental result.
high positive Auditing LLMs for Algorithmic Fairness in Casenote-Augmented... need for auditing LLMs (policy recommendation)
Leveraging LLMs to augment tabular classification with casenote summaries can safely incorporate additional text information with low implementation burden.
Authors' reported experiments and practical assessment on augmenting tabular classifiers with LLM-derived casenote summaries from a nonprofit outreach dataset; described as having low implementation burden and being safe to use. (No sample size given in abstract.)
high positive Auditing LLMs for Algorithmic Fairness in Casenote-Augmented... feasibility/safety of augmenting tabular models with LLM casenote summaries; imp...
A fine-tuned model augmented with casenote summaries can improve accuracy while reducing algorithmic fairness disparities on the housing placement multi-class classification task.
Empirical audit of LLM-based tabular classification on a real housing placement prediction task augmented with street outreach casenotes from a nonprofit partner; authors report multi-class classification experiments comparing fine-tuned models with and without casenote summaries and auditing error disparities across groups. (Sample size not stated in the abstract.)
high positive Auditing LLMs for Algorithmic Fairness in Casenote-Augmented... multi-class classification accuracy; classification error disparities across dem...
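One simple way to quantify "classification error disparities across groups" is the gap between the highest and lowest per-group error rates. The excerpt does not state which fairness metric the authors use, so the function below is an assumed illustration, and the labels and groups are made up rather than taken from the paper.

```python
from collections import defaultdict


def error_rate_gap(y_true, y_pred, groups):
    """Return per-group error rates and the largest gap between any two groups."""
    errors, counts = defaultdict(int), defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        counts[group] += 1
        errors[group] += int(truth != pred)
    rates = {g: errors[g] / counts[g] for g in counts}
    return rates, max(rates.values()) - min(rates.values())


# Made-up multi-class labels for illustration only; not data from the paper.
y_true = ["shelter", "housing", "housing", "none", "shelter", "none"]
y_pred = ["shelter", "housing", "none", "none", "housing", "housing"]
groups = ["A", "A", "A", "B", "B", "B"]

rates, gap = error_rate_gap(y_true, y_pred, groups)
print(rates, gap)
```

Comparing this gap for a model with and without casenote summaries, alongside overall accuracy, mirrors the kind of comparison the claim describes.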
There is a positive relationship between disagreement among agents and trading volume in the simulated markets.
Observed correlation in the simulated open-call auction between measured disagreement (e.g., dispersion in beliefs) and trading volume; described as replicating classic experimental findings.
high positive Dissecting AI Trading: Behavioral Finance and Market Bubbles relationship between disagreement (belief dispersion) and trading volume
These individual-level patterns aggregate into equilibrium dynamics that replicate classic experimental findings (Smith et al., 1988), including the predictive power of excess demand for future prices.
Aggregation of simulated agent behavior in the open-call auction producing market-level time series; comparison of market dynamics to classic experimental benchmark (Smith et al., 1988) and reported finding that excess demand predicts future prices.
high positive Dissecting AI Trading: Behavioral Finance and Market Bubbles predictive power of excess demand for future prices
AI agents form recency-weighted extrapolative beliefs (i.e., overweight recent price history when forecasting future prices).
Analysis of agents' forecasts and trading behavior in the simulated open-call auction populated by autonomous LLM agents; identification of extrapolative forecasting patterns reported as a main finding.
high positive Dissecting AI Trading: Behavioral Finance and Market Bubbles recency-weighted extrapolative beliefs in price forecasts
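A recency-weighted extrapolative forecast can be written as the last price plus a weighted average of past price changes, with weights that decay geometrically into the past. The functional form and the `decay` parameter below are assumptions chosen for illustration; the excerpt does not give the specification the authors estimate.

```python
def extrapolative_forecast(prices, decay=0.5):
    """Forecast the next price as last price plus a recency-weighted mean change."""
    if len(prices) < 2:
        raise ValueError("need at least two prices to compute a change")
    changes = [b - a for a, b in zip(prices, prices[1:])]
    # Most recent change gets weight 1, the one before it `decay`, then decay**2, ...
    weights = [decay ** k for k in range(len(changes))]
    weighted = sum(w * c for w, c in zip(weights, reversed(changes)))
    return prices[-1] + weighted / sum(weights)


print(extrapolative_forecast([100.0, 102.0, 108.0]))
```

With `decay` close to 1 the forecast approaches a plain average of past changes; smaller values put more weight on the latest move, which is the overweighting of recent history the claim refers to.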
AI agents exhibit a pronounced disposition effect.
Simulated open-call auction populated by autonomous LLM agents in experimental asset-market simulations; behavioral trading data showing agents' selling/holding patterns (paper describes this as a main documented finding).
high positive Dissecting AI Trading: Behavioral Finance and Market Bubbles disposition effect (tendency to sell winners and hold losers)
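A standard way to measure the disposition effect, following Odean (1998), compares the proportion of gains realized (PGR) with the proportion of losses realized (PLR); a positive difference indicates a tendency to sell winners and hold losers. Whether the paper uses this exact measure is not stated in the excerpt, and the counts below are made up for illustration.

```python
def disposition_spread(realized_gains, paper_gains, realized_losses, paper_losses):
    """Return PGR - PLR; a positive value is consistent with a disposition effect."""
    pgr = realized_gains / (realized_gains + paper_gains)
    plr = realized_losses / (realized_losses + paper_losses)
    return pgr - plr


# Illustrative counts only: 30 of 100 gain positions sold, 10 of 100 loss positions sold.
print(disposition_spread(30, 70, 10, 90))
```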
We propose seven interface primitives operationalizing verification-centered HCI.
Design contribution: specification of seven interface primitives within the paper (conceptual/design proposal); no user-study or empirical validation reported.
high positive The Instrumental Dissolution of Typing: Why AI Challenges th... existence and specification of interface primitives for verification-centered HC...
We map synthetic literacy -- oral input generating literate output -- as the defining feature of this transition.
Conceptual mapping and theoretical framing within the paper; supported by examples from technology trends but no empirical evaluation reported.
high positive The Instrumental Dissolution of Typing: Why AI Challenges th... emergence of synthetic literacy (oral-to-literate workflows)