The Commonplace

Evidence (13661 claims)

Adoption: 8339 claims
Productivity: 7479 claims
Governance: 6715 claims
Human-AI Collaboration: 6267 claims
Org Design: 4098 claims
Innovation: 3987 claims
Labor Markets: 3488 claims
Skills & Training: 2888 claims
Inequality: 2016 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 740 192 95 871 1945
Governance & Regulation 796 388 185 119 1512
Organizational Efficiency 765 186 123 82 1166
Technology Adoption Rate 610 227 121 95 1061
Research Productivity 409 121 56 331 928
Output Quality 464 174 58 47 743
Decision Quality 318 173 75 42 615
Firm Productivity 432 55 88 20 601
AI Safety & Ethics 214 273 65 33 589
Market Structure 175 165 120 24 489
Task Allocation 206 64 70 31 376
Skill Acquisition 161 57 57 16 291
Innovation Output 201 27 41 18 288
Fiscal & Macroeconomic 130 69 43 26 275
Employment Level 104 50 105 13 274
Consumer Welfare 116 62 42 11 231
Firm Revenue 149 45 26 3 223
Inequality Measures 43 120 49 6 218
Task Completion Time 164 29 8 12 214
Worker Satisfaction 89 60 20 12 181
Error Rate 69 89 9 2 169
Regulatory Compliance 74 67 14 4 159
Training Effectiveness 91 19 13 19 144
Wages & Compensation 77 33 25 6 141
Team Performance 86 17 27 9 140
Automation Exposure 49 50 22 12 136
Developer Productivity 91 17 14 5 128
Job Displacement 12 80 19 1 112
Hiring & Recruitment 51 7 8 3 69
Creative Output 31 16 7 2 57
Skill Obsolescence 5 43 6 1 55
Social Protection 27 16 8 2 53
Labor Share of Income 17 17 17 51
Worker Turnover 11 12 3 26
Industry 1 1
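
The matrix rows above put a multi-word outcome name in front of the counts, so one way to read them programmatically is to peel integers off the right-hand side. The sketch below is illustrative only: the function names `parse_row` and `positive_share` are hypothetical, and it assumes a complete row ends with exactly five integers (Positive, Negative, Mixed, Null, Total). Rows with fewer numbers, such as "Labor Share of Income", are reported as incomplete rather than guessed at, because the blank cell cannot be assigned to a column from the text alone.

```python
# Sketch: parse evidence-matrix rows and compute the share of positive findings.
# Assumption: a complete row ends with five integers
# (Positive, Negative, Mixed, Null, Total). Function names are hypothetical.

def parse_row(line):
    """Split a matrix row into (outcome, counts); counts is None if a cell is missing."""
    tokens = line.split()
    numbers = []
    # Peel integers off the right-hand side; what remains is the outcome name.
    while tokens and tokens[-1].isdigit():
        numbers.insert(0, int(tokens.pop()))
    outcome = " ".join(tokens)
    if len(numbers) != 5:
        # A blank cell makes the column assignment ambiguous, so do not guess.
        return outcome, None
    keys = ("positive", "negative", "mixed", "null", "total")
    return outcome, dict(zip(keys, numbers))


def positive_share(counts):
    """Fraction of an outcome's claims that report a positive finding."""
    return counts["positive"] / counts["total"]


rows = [
    "Firm Productivity 432 55 88 20 601",
    "Job Displacement 12 80 19 1 112",
    "Labor Share of Income 17 17 17 51",
]
for row in rows:
    outcome, counts = parse_row(row)
    if counts is None:
        print(f"{outcome}: incomplete row, skipped")
    else:
        print(f"{outcome}: {positive_share(counts):.1%} positive")
```

The share is taken against the reported Total column rather than the sum of the four direction columns, since the two do not always agree in the table.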
Commonly reported gains include the automation of trivial and repetitive tasks.
Multiple studies in the review report that LLM-assistants automate mundane programming tasks.
high positive The Impact of LLM-Assistants on Software Developer Productiv... automation of low-complexity tasks / developer time freed
Commonly reported gains include minimized code search due to LLM assistance.
Synthesis of study findings noting reductions in developer time spent searching for code or answers.
high positive The Impact of LLM-Assistants on Software Developer Productiv... time/effort spent searching for code or information
Commonly reported gains from LLM-assistants include accelerated development (faster task completion).
Multiple included studies report faster development workflows and reduced time-to-complete tasks, as synthesized in the review.
high positive The Impact of LLM-Assistants on Software Developer Productiv... task completion time / development speed
The majority of reviewed studies report considerable benefits from LLM-assistants.
Synthesis of findings across the 39 included peer-reviewed studies as reported in the review.
high positive The Impact of LLM-Assistants on Software Developer Productiv... overall reported impact on developer productivity
Comparing the verbal-profile setting to a numeric-budget condition with confidentiality instructions cleanly isolates role coherence as distinct from instruction-following failure.
Experimental comparison between verbal-profile condition and numeric-budget condition with confidentiality instructions; result claimed to isolate mechanism (role coherence) from mere instruction-following failure.
high positive When Agents Shop for You: Role Coherence in AI-Mediated Mark... mechanism attribution (role coherence vs. instruction-following failure) for obs...
In an experiment where a language-model buyer agent shops on behalf of a verbal consumer profile, seller-side inference from dialogue alone recovers willingness to pay nearly one-for-one.
Reported experimental result using a language-model buyer agent interacting on behalf of a verbal consumer profile; experimental comparison described in paper excerpt (specific sample size and statistical details not provided in the excerpt).
high positive When Agents Shop for You: Role Coherence in AI-Mediated Mark... accuracy of seller-side inference of willingness to pay (recovery of WTP)
Consumers are increasingly delegating purchase decisions to AI agents, providing natural-language descriptions of their preferences and identity.
Asserted in paper's introduction/abstract as a background trend; no empirical sample or citation provided in the excerpt.
high positive When Agents Shop for You: Role Coherence in AI-Mediated Mark... use of AI agents for purchase delegation / prevalence of natural-language prefer...
Capital-managing agents should be evaluated across the full path from user mandate to prompt, validated action, and settlement.
Recommendation based on the authors' empirical deployment and analysis of failure modes and mitigation effectiveness across the end-to-end pipeline.
high positive Operating-Layer Controls for Onchain Language-Model Agents U... evaluation scope for capital-managing agents
Targeted harness changes increased capital deployment from 42.9% to 78.0% in an affected test population.
A/B or pre/post testing in an affected test population measuring percentage of capital deployed before and after harness changes.
Targeted harness changes reduced fee-led observations from 32.5% to below 10% in an affected test population.
A/B or pre/post testing in an affected test population measuring incidence of fee-led observations before and after harness changes.
high positive Operating-Layer Controls for Onchain Language-Model Agents U... incidence of fee-led observations
Targeted harness changes reduced fabricated sell rules from 57% to 3% in an affected test population.
A/B or pre/post testing in an affected test population measuring incidence of fabricated sell-rule observations before and after harness changes (percentage rates reported).
high positive Operating-Layer Controls for Onchain Language-Model Agents U... incidence of fabricated sell rules
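
The before/after rates reported for the harness changes can be read either as percentage-point changes or as relative changes, and the two give quite different impressions. A minimal sketch of that arithmetic follows, using only the figures quoted above; the helper names are hypothetical, and the fee-led figure is left out because the excerpt gives only an upper bound ("below 10%").

```python
# Sketch: percentage-point versus relative change for the quoted before/after rates.
# Helper names are hypothetical; the input figures come from the claims above.

def percentage_point_change(before, after):
    """Absolute difference between two rates expressed in percent."""
    return after - before


def relative_change(before, after):
    """Change as a fraction of the starting rate."""
    return (after - before) / before


reported = {
    "capital deployment": (42.9, 78.0),
    "fabricated sell rules": (57.0, 3.0),
}
for label, (before, after) in reported.items():
    pp = percentage_point_change(before, after)
    rel = relative_change(before, after)
    print(f"{label}: {pp:+.1f} pp, {rel:+.1%} relative")
```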
Policy-valid submitted transactions settled with 99.9% success.
Settlement logs comparing policy-valid submitted transactions to successful onchain settlements.
high positive Operating-Layer Controls for Onchain Language-Model Agents U... settlement success rate for policy-valid submissions
Expert validation established strong relevance and practical utility for the framework, with a mean score of 4.6/5.
Structured validation exercise with five domain experts in AI ethics, corporate governance, and fintech regulation; paper reports the mean validation score as 4.6/5.
high positive Corporate-Governance-Driven Algorithmic Fairness in SME Fint... perceived relevance and practical utility of the framework (expert validation sc...
Analysis revealed four foundational governance pillars: Accountability, Transparency, Fairness, and Compliance.
Theme extraction from the SLR of 45 peer-reviewed publications (2022-2025) reported in the paper; these four pillars are presented as the core components of the proposed framework.
high positive Corporate-Governance-Driven Algorithmic Fairness in SME Fint... identification of governance pillars for algorithmic fairness
The study develops and validates an integrated conceptual framework that incorporates corporate governance principles with mechanisms for algorithmic fairness to foster ethical outcomes in SME fintech lending.
Two-phase research approach described in paper: (1) systematic literature review (45 peer-reviewed publications, 2022-2025) and (2) structured validation with five domain experts in AI ethics, corporate governance, and fintech regulation.
high positive Corporate-Governance-Driven Algorithmic Fairness in SME Fint... existence and validated relevance of an integrated governance-fairness framework
AI-driven credit assessment platforms promise greater efficiency in fintech lending.
Statement in paper (conceptual claim); supported by related literature cited in the SLR of 45 papers but no empirical efficiency metric reported in this paper.
high positive Corporate-Governance-Driven Algorithmic Fairness in SME Fint... efficiency of credit assessment processes
The rapid growth of fintech lending has reshaped financial access for SMEs through AI-driven credit assessment platforms.
Assertion in paper's background; positioned as established context for study (no specific empirical estimate given). The paper's SLR (45 peer-reviewed publications, 2022-2025) is presented as the literature basis for context.
high positive Corporate-Governance-Driven Algorithmic Fairness in SME Fint... financial access for SMEs
To a lesser extent, fears of AI automation drive demand for schemes that guarantee income regardless of employment status.
Findings from the 2024 OECD 'Risks that Matter' survey reported in the paper (survey-based measure of support for income-guarantee schemes conditional on fear of automation).
high positive AI, the Future of Work, and the Politics of the Welfare Stat... public support for income-guarantee schemes (e.g., universal basic income)
Rather than increasing support for traditional interventions such as unemployment benefits and training programs, these fears primarily drive demand for measures that preserve the social role of work and protect it from automation, such as robot taxes.
Results from the 2024 OECD 'Risks that Matter' public opinion survey analyzed in the paper (survey-based association between fear and policy preferences).
high positive AI, the Future of Work, and the Politics of the Welfare Stat... public support for policies that protect the social role of work (e.g., robot ta...
Framework, metrics, baselines, and collection scripts will be released open-source on acceptance.
Author statement of intent to release code and assets upon paper acceptance.
high positive Benchmarking Complex Multimodal Document Processing Pipeline... open-source release of materials
The paper describes three reference architectures (ColPali, ColQwen2, agentic complexity-based routing) which are not yet integrated end-to-end.
Author statement listing three proposed reference architectures and noting they are not yet integrated end-to-end.
high positive Benchmarking Complex Multimodal Document Processing Pipeline... proposed system architectures (descriptive)
Factual accuracy on stated claims is 85.5%.
Reported accuracy measurement on 'stated claims' in generated outputs from systems evaluated on EnterpriseDocBench. Details on annotator process and sample size not included in excerpt.
high positive Benchmarking Complex Multimodal Document Processing Pipeline... factual accuracy (fraction of stated claims judged factually correct)
Both hybrid and BM25 beat dense embedding (dense embedding nDCG@5 = 0.83).
Reported nDCG@5 values for three retrieval approaches on the benchmark (values quoted in paper).
high positive Benchmarking Complex Multimodal Document Processing Pipeline... retrieval relevance (nDCG@5)
Hybrid retrieval narrowly beats BM25 (nDCG@5 of 0.92 vs. 0.91).
Empirical evaluation on EnterpriseDocBench using nDCG@5 as reported metric in paper. Exact query count or folds not provided in the excerpt.
high positive Benchmarking Complex Multimodal Document Processing Pipeline... retrieval relevance (nDCG@5)
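
For readers unfamiliar with the metric behind these retrieval scores, the sketch below computes nDCG@k in its common form, with linear gain and a log2 rank discount. This is an assumption: the excerpt does not say which nDCG variant the paper uses, and the function names and the optional `all_relevances` parameter are hypothetical.

```python
import math

# Sketch of nDCG@k with linear gain and a log2 rank discount.
# Assumption: the paper's exact nDCG variant is not stated in the excerpt.

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k graded relevance labels."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(ranked_relevances, k, all_relevances=None):
    """DCG of the ranking divided by the DCG of an ideal ordering.

    all_relevances holds the labels of every judged document for the query;
    if omitted, the ideal ordering is built from the ranked list itself.
    """
    pool = ranked_relevances if all_relevances is None else all_relevances
    ideal = dcg_at_k(sorted(pool, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal if ideal > 0 else 0.0


# A relevant document placed second instead of first is penalized by the discount.
print(ndcg_at_k([0, 1, 0, 0, 0], k=5))
print(ndcg_at_k([1, 0, 0, 0, 0], k=5))
```

A benchmark score such as 0.92 would then be the mean of this per-query value over all evaluation queries.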
We ran three pipelines through EnterpriseDocBench: BM25, dense embedding, and a hybrid, all using the same GPT-5 generator.
Method statement describing experimental pipelines evaluated on EnterpriseDocBench (three retrieval variants combined with a shared generator).
high positive Benchmarking Complex Multimodal Document Processing Pipeline... evaluation of retrieval pipelines with a shared generator
The corpus is built from public, permissively licensed documents across six enterprise domains (five represented in the current pilot).
Author description of corpus composition (number of domains and pilot coverage). No document-count supplied in provided text.
We built EnterpriseDocBench to evaluate parsing fidelity, indexing efficiency, retrieval relevance, and generation groundedness on the same corpus.
Description of dataset/benchmark creation and stated design goals in paper (author-developed benchmark covering four stages).
high positive Benchmarking Complex Multimodal Document Processing Pipeline... system-level evaluation across parse/index/retrieve/generate stages
Most enterprise document AI today is a pipeline: parse, index, retrieve, generate.
Author assertion about prevailing architecture of enterprise document-AI systems (introductory observation in paper). No empirical sample size or systematic survey reported in text provided.
high positive Benchmarking Complex Multimodal Document Processing Pipeline... prevalence of pipeline architecture
We release the full code base and a richly annotated dataset to support reproducible research on adaptive VCAs.
Paper statement announcing release of code and dataset.
high positive SecMate: Multi-Agent Adaptive Cybersecurity Troubleshooting ... availability of codebase and annotated dataset
The recommender achieved high relevance (MRR@1=0.75).
Reported offline/online recommender evaluation in the paper using Mean Reciprocal Rank at 1 (MRR@1) metric; presumably computed over recommendations in the study (711 conversations).
high positive SecMate: Multi-Agent Adaptive Cybersecurity Troubleshooting ... recommendation relevance (MRR@1)
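
Mean Reciprocal Rank truncated at rank 1 reduces to the fraction of queries whose top-ranked item is relevant, so an MRR@1 of 0.75 corresponds to three in four top recommendations being judged relevant. The sketch below shows the general MRR@k computation; the function name and the boolean-list input format are assumptions for illustration, not the paper's evaluation code.

```python
# Sketch of MRR@k. Input format (one list of relevance flags per query,
# in ranked order) is an assumption for illustration.

def mrr_at_k(ranked_hits, k):
    """Mean over queries of 1/rank of the first relevant item within the top k."""
    if not ranked_hits:
        return 0.0
    total = 0.0
    for hits in ranked_hits:
        for rank, hit in enumerate(hits[:k], start=1):
            if hit:
                total += 1.0 / rank
                break  # only the first relevant item counts
    return total / len(ranked_hits)


# Four queries; the top recommendation is relevant in three of them.
queries = [[True, False], [True, True], [False, True], [True, False]]
print(mrr_at_k(queries, k=1))
print(mrr_at_k(queries, k=2))
```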
Step-by-step guidance improved pleasantness and reduced user burden.
User-reported measures collected in the controlled study (likely subjective ratings across participants/conversations).
high positive SecMate: Multi-Agent Adaptive Cybersecurity Troubleshooting ... pleasantness (user satisfaction) and user burden
Device-level evidence increased correct resolutions from about 50% to over 90% relative to an LLM-only baseline.
Controlled study comparing SecMate with device-level diagnostic evidence to an LLM-only baseline; reported results across 144 participants / 711 conversations.
high positive SecMate: Multi-Agent Adaptive Cybersecurity Troubleshooting ... correct resolutions (successful troubleshooting)
Service specificity is achieved through a proactive, context-aware recommender.
System description and recommender component evaluation in the paper.
high positive SecMate: Multi-Agent Adaptive Cybersecurity Troubleshooting ... use of a proactive, context-aware recommender for service specificity
User specificity relies on implicit proficiency inference and profile-aware troubleshooting.
System design and algorithmic description in the paper explaining user-proficiency inference and profile-aware components.
high positive SecMate: Multi-Agent Adaptive Cybersecurity Troubleshooting ... ability to infer user proficiency and use profiles for troubleshooting
Device specificity is provided by a lightweight local diagnostic utility.
System design and implementation details reported in the paper describing the diagnostic utility component.
high positive SecMate: Multi-Agent Adaptive Cybersecurity Troubleshooting ... presence and role of a local diagnostic utility for device specificity
We present SecMate, a multi-agent VCA for cybersecurity troubleshooting that integrates device, user, and service specificity from conversational and device-level signals.
System description and architecture presented in the paper (design and implementation of SecMate).
high positive SecMate: Multi-Agent Adaptive Cybersecurity Troubleshooting ... system capability to integrate device, user, and service specificity
Make is most compelling for commodity utilities and for differentiating custom applications in the AI era.
Paper's typology and normative recommendation derived from conceptual analysis (no empirical validation reported).
high positive The Buy-or-Build Decision, Revisited: How Agentic AI Changes... relative attractiveness of in-house development (Make) across application catego...
AI fundamentally transforms the governance properties of the Make option, shifting it from Williamson's pure hierarchy to a hybrid governance form that combines code ownership with external AI infrastructure dependency.
Conceptual argument combining transaction cost economics, resource-based view, and assessment of AI infrastructure characteristics (no empirical testing reported).
high positive The Buy-or-Build Decision, Revisited: How Agentic AI Changes... governance form of in-house software development (Make)
The 'SaaSocalypse' narrative predicts that AI will render large segments of the Software-as-a-Service market obsolete by enabling firms to build software in-house at a fraction of historical cost.
Statement summarizing an extant narrative in industry and literature (paper cites/describes this narrative; no empirical test in the paper).
high positive The Buy-or-Build Decision, Revisited: How Agentic AI Changes... obsolescence of SaaS offerings / shift from buy to make
Advances in generative artificial intelligence, particularly agentic coding systems capable of autonomous software development, are disrupting the economics of the make-or-buy decision for enterprise applications.
Paper's conceptual analysis combining transaction cost economics, resource-based view, and assessment of current AI capabilities (no empirical sample reported).
high positive The Buy-or-Build Decision, Revisited: How Agentic AI Changes... economics of the make-or-buy decision for enterprise applications
Empirically, the decomposition confirms a speculative peak confined to December 1999–March 2000 in the dot-com episode.
Empirical application of the proposed decomposition and bubble test to historical asset price data covering the dot-com episode (data analysis reported in the paper).
high positive General-Purpose Technology and Speculative Bubble Detection timing and presence of a speculative peak during the dot-com episode
A fundamental-versus-speculative decomposition that projects prices onto observable technology proxies and applies the bubble test to the residual corrects for the contamination.
Methodological proposal described in the paper; presented as a corrective procedure (projection of prices on technology proxies and testing residuals).
high positive General-Purpose Technology and Speculative Bubble Detection ability to separate fundamental-driven price movements from speculative componen...
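
The two steps named in this claim, projecting prices onto technology proxies and then testing the residual, can be sketched as follows. Everything specific here is an assumption: the excerpt does not name the paper's bubble test or its proxies, so the sketch uses a single proxy with ordinary least squares and substitutes a plain AR(1) coefficient as a crude stand-in for an explosive-root test (a value above 1 suggests explosive behavior). Function names are hypothetical.

```python
# Sketch: fundamental-versus-speculative decomposition with one technology proxy.
# Assumptions: single proxy, OLS projection, and an AR(1) coefficient standing in
# for the paper's (unnamed) bubble test. Function names are hypothetical.

def ols_residuals(prices, proxy):
    """Residuals from regressing prices on an intercept and one proxy series."""
    n = len(prices)
    mean_p = sum(prices) / n
    mean_x = sum(proxy) / n
    var_x = sum((x - mean_x) ** 2 for x in proxy)
    cov = sum((x - mean_x) * (p - mean_p) for x, p in zip(proxy, prices))
    slope = cov / var_x
    intercept = mean_p - slope * mean_x
    return [p - (intercept + slope * x) for p, x in zip(prices, proxy)]


def ar1_coefficient(series):
    """No-intercept least-squares estimate of rho in s[t] = rho * s[t-1] + e[t]."""
    num = sum(curr * prev for prev, curr in zip(series, series[1:]))
    den = sum(prev * prev for prev in series[:-1])
    return num / den


# Synthetic example: the price is a linear function of the proxy plus a
# component that grows geometrically, standing in for speculation.
proxy = [float(t) for t in range(1, 21)]
speculative = [0.01 * 1.3 ** t for t in range(20)]
prices = [1.0 + 2.0 * x + s for x, s in zip(proxy, speculative)]

residual = ols_residuals(prices, proxy)
print("AR(1) coefficient of raw prices:", ar1_coefficient(prices))
print("AR(1) coefficient of residual:  ", ar1_coefficient(residual))
```

In applied work the residual would go to a formal right-tailed unit-root test with proper critical values; the AR(1) coefficient here only illustrates where that test plugs in.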
When a hump-shaped technology shock is embedded in the Campbell-Shiller present-value model, the fundamental price becomes locally explosive during adoption.
Analytical/theoretical proof derived from the modified Campbell-Shiller present-value model with a hump-shaped technology shock (model-based derivation).
high positive General-Purpose Technology and Speculative Bubble Detection explosiveness of the fundamental price (local explosiveness during adoption)
The framework produces a list of testable empirical questions that we leave as open problems.
Statement in the paper that it derives testable empirical questions from the theoretical framework; no empirical tests are executed in the paper itself.
high positive Leverage Laws: A Per-Task Framework for Human-Agent Collabor... set of testable empirical research questions derived from the framework
The framework operationalizes aspects of earlier qualitative work on supervisory control (Sheridan, 1992), common ground (Clark & Brennan, 1991), and mixed-initiative interaction (Horvitz, 1999) within a single normative ratio.
Conceptual synthesis and mapping of prior qualitative literature into the new per-task leverage formalism presented in the paper; this is a theoretical linkage rather than empirical validation.
high positive Leverage Laws: A Per-Task Framework for Human-Agent Collabor... conceptual operationalization of supervisory control/common ground/mixed-initiat...
The per-task ceiling does not bind the windowed measure, though both remain bounded: L_task by per-task novelty, L_window by the stock of accumulated planning investment that pays out within the window.
Theoretical derivation/argument in the paper distinguishing bounds on per-task leverage (L_task) and windowed leverage (L_window) and identifying their respective limiting factors; no empirical evidence provided.
high positive Leverage Laws: A Per-Task Framework for Human-Agent Collabor... bounds on L_task and L_window (per-task novelty and accumulated planning investm...
We extend this per-task analysis to a windowed leverage measure that accommodates recurring tasks, spawned subtasks, and amortized system-design investment.
Conceptual/theoretical extension in the paper defining a windowed leverage metric and describing how it accounts for recurring tasks, subtasks, and amortized design investments; no empirical tests reported.
high positive Leverage Laws: A Per-Task Framework for Human-Agent Collabor... windowed leverage (aggregated leverage over a time window accounting for amortiz...
The asymptotic behavior of leverage decomposes into two scaling axes (capability and memory) with a non-zero floor on the planning term set by irreducible task novelty bounded by human throughput.
Mathematical/theoretical asymptotic analysis within the paper; conceptual derivation linking capability and memory as scaling axes and asserting a lower bound on planning cost due to task novelty and human throughput.
high positive Leverage Laws: A Per-Task Framework for Human-Agent Collabor... leverage scaling behavior and lower bound on planning term
Information density itself is directional and bounded by separate ceilings on human-to-agent and agent-to-human flow.
Theoretical argument/derivation in the paper establishing directional information-density and distinct upper bounds for each flow direction; no empirical validation reported.
high positive Leverage Laws: A Per-Task Framework for Human-Agent Collabor... directional information flow bounds between human and agent
The denominator decomposes into three channels (specifying the task, resolving mid-run interrupts, and reviewing the result) through which a conserved per-task information requirement must flow, each with its own time-cost scalar.
Analytic decomposition within the paper's theoretical framework; conceptual argument rather than empirical measurement.
high positive Leverage Laws: A Per-Task Framework for Human-Agent Collabor... components of human time cost (specification, interrupt resolution, review)
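
One possible reading of this decomposition is that human time is a weighted sum over the three channels, with the information routed through them adding up to a fixed per-task requirement. The sketch below encodes that reading only; the linear form, the units (bits, seconds per bit), and every name in it are assumptions, since the excerpt gives neither the paper's notation nor the numerator of the leverage ratio.

```python
# Sketch: human time cost as a sum over three channels.
# Assumptions: linear form, units (bits, seconds per bit), and all names are
# hypothetical; the excerpt does not give the paper's actual formula.

def human_time_cost(info, cost_per_unit):
    """Total human time given information per channel and a time cost per unit."""
    channels = ("specify", "interrupt", "review")
    return sum(info[c] * cost_per_unit[c] for c in channels)


# Hypothetical time-cost scalars, in seconds per bit.
cost_per_unit = {"specify": 1.0, "interrupt": 3.0, "review": 2.0}

# Two ways of routing the same 100-bit requirement through the channels.
front_loaded = {"specify": 80, "interrupt": 5, "review": 15}
interrupt_heavy = {"specify": 40, "interrupt": 45, "review": 15}

for name, info in (("front-loaded", front_loaded), ("interrupt-heavy", interrupt_heavy)):
    assert sum(info.values()) == 100  # the per-task requirement is conserved
    print(name, human_time_cost(info, cost_per_unit))
```

Under these made-up scalars the same information requirement costs more human time when more of it is resolved through interrupts, which is the kind of trade-off the three-channel decomposition makes visible.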