The Commonplace

Evidence (13827 claims)

Adoption: 8454 claims
Productivity: 7544 claims
Governance: 6789 claims
Human-AI Collaboration: 6327 claims
Org Design: 4126 claims
Innovation: 4058 claims
Labor Markets: 3520 claims
Skills & Training: 2924 claims
Inequality: 2057 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 749 195 97 889 1979
Governance & Regulation 815 391 188 121 1539
Organizational Efficiency 771 189 124 83 1177
Technology Adoption Rate 624 233 123 96 1084
Research Productivity 410 121 56 331 929
Output Quality 466 177 59 47 749
Decision Quality 320 174 75 42 618
Firm Productivity 435 55 88 20 604
AI Safety & Ethics 214 276 65 33 593
Market Structure 178 166 122 24 495
Task Allocation 206 64 70 31 376
Skill Acquisition 165 57 60 17 299
Innovation Output 201 27 41 18 288
Employment Level 105 51 107 13 278
Fiscal & Macroeconomic 131 69 43 26 276
Consumer Welfare 116 63 42 11 232
Firm Revenue 149 46 26 3 224
Inequality Measures 44 122 49 6 221
Task Completion Time 169 29 8 12 219
Worker Satisfaction 89 61 20 12 182
Error Rate 69 91 10 2 172
Regulatory Compliance 76 68 14 5 163
Training Effectiveness 92 19 13 19 145
Wages & Compensation 77 36 25 6 144
Automation Exposure 51 54 22 12 142
Team Performance 86 17 27 9 140
Developer Productivity 94 17 14 6 132
Job Displacement 12 80 20 1 113
Hiring & Recruitment 51 7 8 3 69
Skill Obsolescence 5 45 6 1 57
Creative Output 31 16 7 2 57
Social Protection 27 16 8 2 53
Labor Share of Income 17 17 17 51
Worker Turnover 11 12 3 26
Industry 1 1
We implement an Adversarial Multi-Agent Quality Control (QC) loop in which evaluator agents iteratively critique generated frames and prompt generators to refine outputs until a deterministic consensus is reached.
Method description of a multi-agent adversarial QC loop used in the pipeline; no experimental protocol, number of agents, or sample sizes provided in this sentence.
high positive Genflow Ad Studio: A Compound AI Architecture for Brand-Alig... iterative refinement / consensus-driven quality control
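The critique-and-regenerate loop described in this claim can be sketched as a bounded loop in Python. This is a minimal illustration, not Genflow's implementation: the Evaluator and Generator interfaces, the string-valued "frames", the toy_generate and requires helpers, and the rule that consensus means every evaluator approves in the same round are all hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical interfaces: an evaluator returns None to approve or a critique string;
# the generator sees the prompt, its previous frame, and the latest critiques.
Evaluator = Callable[[str], Optional[str]]
Generator = Callable[[str, Optional[str], List[str]], str]


def qc_loop(prompt: str, generate: Generator, evaluators: List[Evaluator],
            max_rounds: int = 5) -> Tuple[str, int]:
    """Regenerate until every evaluator approves in the same round (assumed consensus rule)."""
    frame: Optional[str] = None
    critiques: List[str] = []
    for round_no in range(1, max_rounds + 1):
        frame = generate(prompt, frame, critiques)
        critiques = [c for c in (ev(frame) for ev in evaluators) if c is not None]
        if not critiques:  # unanimous approval counts as consensus
            return frame, round_no
    raise RuntimeError("no consensus within max_rounds")


def toy_generate(prompt: str, previous: Optional[str], critiques: List[str]) -> str:
    # Toy generator: keep the previous frame and append each requested element.
    base = previous if previous is not None else prompt
    return " ".join([base] + critiques)


def requires(token: str) -> Evaluator:
    # Toy evaluator: approve only if the frame contains the required token.
    return lambda frame: None if token in frame.split() else token


frame, rounds = qc_loop("ad", toy_generate, [requires("logo"), requires("palette")])
print(frame, rounds)  # ad logo palette 2
```

Capping the number of rounds (the hypothetical max_rounds) keeps the loop from cycling indefinitely when evaluators keep disagreeing; a real system would need its own policy for that case.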
Our architecture integrates a retrieval-based 'Brand DNA' extraction module to parameterize generation according to established corporate identity guidelines.
Methodological description in paper indicating a retrieval-based module for extracting Brand DNA used to condition generation; no evaluation metrics or sample sizes provided in this statement.
high positive Genflow Ad Studio: A Compound AI Architecture for Brand-Alig... parameterization of generation by brand guidelines
We introduce Genflow, a Compound AI System designed to enforce brand consistency in generative media production.
Paper describes the proposed system architecture (Genflow) as a methodological contribution; description of modules and pipeline provided but no external validation details in this sentence.
high positive Genflow Ad Studio: A Compound AI Architecture for Brand-Alig... ability to enforce brand consistency
Recent advancements in generative video models demonstrate high visual fidelity.
Asserted in paper as a background observation about recent generative video models; no specific dataset, benchmark, or sample size reported.
Benchmark comparisons of multiple LLM backends (Granite-Docling, Mistral-Small, DeepSeek-OCR) were performed to provide practical insights for production deployment.
Authors state they performed benchmark comparisons of multiple LLM backends (listed in abstract); specifics of metrics and sample sizes not given in abstract.
high positive MADP: A Multi-Agent Pipeline for Sustainable Document Proces... comparative performance of LLM backends for deployment
A comprehensive sustainability analysis shows that the hybrid AI+HITL approach reduces CO2 emissions by 69%, energy consumption by 69%, and water usage by 63% compared to traditional manual processing.
Authors report a sustainability analysis comparing hybrid AI+HITL approach to traditional manual processing (details not provided in abstract).
high positive MADP: A Multi-Agent Pipeline for Sustainable Document Proces... CO2 emissions, energy consumption, water usage
Prompt Fine Tuning with Feedback Inheritance (PFTFI) is a novel approach introduced in this work.
Authors explicitly introduce PFTFI as part of their approach (stated in abstract).
The system integrates five specialized agents—Classificator, Splitter, Parser, Extraction, and Validator—together with a Human-in-the-Loop mechanism and a Prompt Fine Tuning with Feedback Inheritance (PFTFI) approach.
Authors' architectural description in the abstract specifying the five agents, HITL mechanism, and PFTFI approach.
high positive MADP: A Multi-Agent Pipeline for Sustainable Document Proces... system architecture components (agent integration)
MADP combines deep learning-based classification and parsing with large language model extraction while maintaining accuracy through selective human validation.
System description in paper asserting integration of DL classification/parsing, LLM extraction, and selective human validation; supported by system evaluations reported elsewhere in the abstract.
high positive MADP: A Multi-Agent Pipeline for Sustainable Document Proces... maintenance of accuracy via selective human validation
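Selective human validation can be pictured as a confidence-gated router over the five stages named in the MADP abstract. Only the stage names come from the source; the Doc record, the toy stages, the 0.9 threshold, and the three outcome labels are hypothetical simplifications, not MADP's actual routing logic.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Stage names from the MADP abstract; their behaviour here is a toy stand-in.
STAGES = ["Classificator", "Splitter", "Parser", "Extraction", "Validator"]


@dataclass
class Doc:
    doc_id: str
    confidence: float = 1.0   # hypothetical aggregate confidence score
    failed: bool = False      # hypothetical flag: AI stages could not process the document


Stage = Callable[[Doc], Doc]


def route(doc: Doc, stages: Dict[str, Stage], threshold: float = 0.9) -> str:
    """Run the stages in order, then decide who handles the result (assumed policy)."""
    for name in STAGES:
        doc = stages[name](doc)
        if doc.failed:
            return "non-ai-fallback"
    return "auto" if doc.confidence >= threshold else "human-review"


def toy_stages(extract_conf: float, parse_ok: bool = True) -> Dict[str, Stage]:
    """Pass-through stages, except Parser (may fail) and Extraction (sets confidence)."""
    def identity(doc: Doc) -> Doc:
        return doc

    def parser(doc: Doc) -> Doc:
        doc.failed = not parse_ok
        return doc

    def extraction(doc: Doc) -> Doc:
        doc.confidence = extract_conf
        return doc

    stages: Dict[str, Stage] = {name: identity for name in STAGES}
    stages["Parser"] = parser
    stages["Extraction"] = extraction
    return stages


print(route(Doc("inv-001"), toy_stages(0.97)))                  # auto
print(route(Doc("inv-002"), toy_stages(0.60)))                  # human-review
print(route(Doc("inv-003"), toy_stages(0.99, parse_ok=False)))  # non-ai-fallback
```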
Ablation evaluation on a stratified 100-document subset demonstrates that the full MADP configuration with Human-in-the-Loop supervision attains 98.5% document-level accuracy.
Ablation evaluation on a stratified subset of 100 documents (5 documents for each of 20 supplier/document-type categories) reported by the authors.
Only 3% of documents required non-AI fallback in the production deployment.
From the production deployment on 955 real-world documents (stated in abstract).
high positive MADP: A Multi-Agent Pipeline for Sustainable Document Proces... proportion requiring non-AI fallback
Production deployment on 955 real-world documents processed through January 2026 achieves a 97.0% full-pipeline automation rate.
Reported production deployment on 955 real-world documents processed through January 2026 (stated in abstract).
high positive MADP: A Multi-Agent Pipeline for Sustainable Document Proces... full-pipeline automation rate
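Converting the two reported deployment percentages back into document counts is a quick consistency check. This is our arithmetic, assuming both rates are computed over all 955 documents and that the reported percentages are rounded.

```python
TOTAL = 955  # production documents reported in the abstract

automated = round(0.970 * TOTAL)  # 926.35 rounds to 926
fallback = round(0.03 * TOTAL)    # 28.65 rounds to 29

print(automated, fallback, automated + fallback)         # 926 29 955
print(f"{automated / TOTAL:.1%} {fallback / TOTAL:.1%}")  # 97.0% 3.0%
```

Roughly 926 automated documents and 29 fallbacks account for all 955, so the 97.0% automation rate and the 3% fallback share are consistent with each other.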
Operational analysis on a production use-case scenario of 100,000 invoices per year indicates a potential reduction of Full-Time Equivalent (FTE) requirements by approximately 70%.
Operational analysis reported by authors on a production use-case scenario involving 100,000 invoices per year (stated in abstract).
high positive MADP: A Multi-Agent Pipeline for Sustainable Document Proces... Full-Time Equivalent (FTE) requirements
Trace-Prior RL adds bounded adaptation under capacity asymmetry.
Experiments contrasting Trace-Prior RL versus behavior cloning and reward-only approaches in settings with capacity asymmetry, showing Trace-Prior RL permits limited/adaptive deviation while preserving trace alignment.
high positive When Outcome Looks Right But Discipline Fails: Trace-Based E... bounded adaptation (ability to adapt under capacity asymmetry while preserving t...
Pure behavior cloning is nearly enough for symmetric imitation.
Empirical results in symmetric imitation settings (presumably in the two-hotel or bidding benchmarks) showing behavior cloning achieves close imitation without additional RL.
high positive When Outcome Looks Right But Discipline Fails: Trace-Based E... imitation fidelity in symmetric settings
Trace-prior or corrected-history policies better preserve price or bid distributions.
Comparative experiments and ablations across the two-hotel benchmark and hidden-budget bidding task showing trace-prior and corrected-history policies retain price/bid distribution characteristics better than reward-only variants.
high positive When Outcome Looks Right But Discipline Fails: Trace-Based E... preservation of price or bid distributions
Revealing hidden state reduces label uncertainty.
Experiments (hidden-state ablations) in the compact hidden-budget bidding task and/or two-hotel benchmark where providing hidden state information to the learner reduced uncertainty in inferred labels.
A year-long pilot across three clinical sites executed 8,728 cohort-enrolled workflow runs with a 97.08% completion rate under an early prototype without the verified-core subsystem.
Reported evaluation: year-long pilot conducted across three clinical sites, total workflow runs = 8,728, reported completion rate = 97.08%; prototype lacked verified-core subsystem.
high positive GraphFlow: An Architecture for Formally Verifiable Visual Wo... workflow run completion rate
Swimlanes make trust boundaries explicit, separating verified logic from external systems, human judgment, and AI decisions.
Design description in the paper explaining swimlane use to delineate trust boundaries between system components and humans/external systems/AI.
high positive GraphFlow: An Architecture for Formally Verifiable Visual Wo... clear trust boundaries / separation of concerns (design feature)
At runtime a durable engine records outcomes in an append-only event log and can enforce contracts at system boundaries, supporting replay, retries, and audit.
System architecture description in the paper describing runtime engine features (append-only log, enforcement, replay/retry, audit support).
high positive GraphFlow: An Architecture for Formally Verifiable Visual Wo... runtime durability, auditability, and recoverability (design features)
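A minimal sketch of how an append-only event log supports replay, retries, and audit, assuming a hypothetical Event schema of (run_id, step, outcome) and the simplification that a step counts as done once its outcome is logged. GraphFlow's actual engine, log format, and contract enforcement are not described in this claim.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Event:
    run_id: str
    step: str
    outcome: str


class EventLog:
    """Append-only: events can be added and read, never modified or removed."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, event: Event) -> None:
        self._events.append(event)

    def events(self, run_id: str) -> Tuple[Event, ...]:
        return tuple(e for e in self._events if e.run_id == run_id)


def run_workflow(log: EventLog, run_id: str,
                 steps: Dict[str, Callable[[], str]]) -> Dict[str, str]:
    """Execute steps in order; steps already recorded for this run are replayed, not re-run."""
    done = {e.step: e.outcome for e in log.events(run_id)}
    for name, action in steps.items():
        if name not in done:
            outcome = action()  # may raise; nothing is logged for this step then
            log.append(Event(run_id, name, outcome))
            done[name] = outcome
    return done


# Toy usage: step "b" fails once, and the retry does not re-execute step "a".
calls = {"a": 0, "b": 0}


def step_a() -> str:
    calls["a"] += 1
    return "ok-a"


def step_b() -> str:
    calls["b"] += 1
    if calls["b"] == 1:
        raise RuntimeError("transient failure")
    return "ok-b"


log = EventLog()
steps = {"a": step_a, "b": step_b}
try:
    run_workflow(log, "run-1", steps)
except RuntimeError:
    pass  # crash after "a" was recorded
result = run_workflow(log, "run-1", steps)  # retry: "a" is replayed from the log
print(result, calls)  # {'a': 'ok-a', 'b': 'ok-b'} {'a': 1, 'b': 2}
```

Because completed outcomes are only ever appended, the retry resumes where the crash happened, and the log itself doubles as an audit trail of what ran.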
At compile time GraphFlow restricts diagrams to produce reusable automations whose contracts (preconditions, postconditions, and composition obligations) are intended to be proof-checked before admission to a shared library.
Design/specification claim in the paper describing compile-time restrictions and proof-checked admission model (implementation/design detail).
high positive GraphFlow: An Architecture for Formally Verifiable Visual Wo... contract verification / reusability (design intention)
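Composition obligations can be illustrated with a much weaker, set-based stand-in for proof checking: each automation names the facts it requires (preconditions) and the facts it ensures (postconditions), and a chain is admissible only if every precondition is established by the initial facts or by an earlier step. The Automation type, first_unmet, and the fact names are hypothetical; proof-checked admission as described for GraphFlow would discharge such obligations formally rather than by set inclusion.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Automation:
    name: str
    requires: FrozenSet[str]  # preconditions, as named facts
    ensures: FrozenSet[str]   # postconditions, as named facts


def first_unmet(chain: List[Automation],
                initial: FrozenSet[str] = frozenset()) -> Optional[str]:
    """Return the first unmet composition obligation, or None if the chain composes."""
    established = set(initial)
    for step in chain:
        missing = step.requires - established
        if missing:
            return f"{step.name} needs {sorted(missing)}"
        established |= step.ensures  # later steps may rely on these facts
    return None


fetch = Automation("fetch_record", frozenset({"record_id"}), frozenset({"record"}))
review = Automation("review", frozenset({"record"}), frozenset({"approved"}))
notify = Automation("notify", frozenset({"approved"}), frozenset())

print(first_unmet([fetch, review, notify], frozenset({"record_id"})))  # None
print(first_unmet([fetch, notify], frozenset({"record_id"})))          # notify needs ['approved']
```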
GraphFlow treats workflow diagrams as the executable specification — a single artifact defining data scope, execution semantics, and monitoring — to address the gap between durable execution and semantic correctness.
System design description in the paper explaining GraphFlow's design philosophy and intended role of diagrams as executable specifications.
high positive GraphFlow: An Architecture for Formally Verifiable Visual Wo... specification completeness / clarity (design intent)
Existing workflow platforms provide durable execution and observability.
Author statement in background/motivation describing properties of existing workflow platforms.
high positive GraphFlow: An Architecture for Formally Verifiable Visual Wo... platform durability and observability (feature presence)
Context engineering (programmatic state abstraction and clean task decomposition) is generally more cost-effective than deeper per-agent deliberation.
Cost-effectiveness measured as returns per token spent (RPTS) across configurations that vary context representation and deliberation; results from the 3,475-episode controlled study indicate context changes yielded larger returns per token than adding deliberation tools.
high positive Context, Reasoning, and Hierarchy: A Cost-Performance Study ... returns per token spent (RPTS)
Programmatic state abstraction delivers the largest returns per token spent (RPTS), improving mean return by up to 76% over raw observations.
Controlled empirical study in the CybORG CAGE-2 POMDP environment comparing context representations (raw observations vs. deterministic state-tracking layer with compressed history) across five model families, six models, and twelve configurations with token-level cost accounting (3,475 episodes).
high positive Context, Reasoning, and Hierarchy: A Cost-Performance Study ... mean return (and returns per token spent, RPTS)
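Programmatic state abstraction replaces the raw observation stream with a deterministic summary plus a short, compressed history. The sketch assumes a hypothetical observation format of (host, status) pairs and a fixed-length action window; StateTracker and the host names are illustrative, and the study's state-tracking layer for CybORG CAGE-2 is not specified in this claim.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


class StateTracker:
    """Fold raw (host, status) observations into a compact, deterministic summary."""

    def __init__(self, history_len: int = 3) -> None:
        self.hosts: Dict[str, str] = {}
        # Compressed history: only the last `history_len` actions are kept.
        self.history: Deque[str] = deque(maxlen=history_len)

    def update(self, action: str, observations: List[Tuple[str, str]]) -> None:
        self.history.append(action)
        for host, status in observations:
            self.hosts[host] = status  # latest status wins

    def render(self) -> str:
        # Sorted output keeps the summary deterministic across runs.
        hosts = ", ".join(f"{h}={s}" for h, s in sorted(self.hosts.items()))
        return f"hosts: {hosts} | recent: {'; '.join(self.history)}"


tracker = StateTracker(history_len=2)
tracker.update("monitor", [("user1", "clean"), ("op_server", "clean")])
tracker.update("analyse user1", [("user1", "compromised")])
tracker.update("restore user1", [("user1", "clean")])
print(tracker.render())
# hosts: op_server=clean, user1=clean | recent: analyse user1; restore user1
```

The rendered summary grows with the number of hosts, not with the number of steps taken, which is the intuition for why such a layer can cut tokens per decision.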
Companies that train workers outperform those that simply cut them.
Claim presented as one of the five lessons, based on historical analogy and emerging workplace evidence (chapter asserts firms that invest in training do better).
high positive 7. AI and the Future of Work firm performance (outperformance of training firms relative to cutting firms)
For AI datacenter design, the relevant planning objective is not installed megawatts, but deployable capacity over time.
Conclusion/recommendation drawn from the paper's modeling results and analysis (argument that installed MW is a poor planning metric compared to time-varying deployable capacity).
high positive Designing Datacenter Power Delivery Hierarchies for the AI E... planning objective (deployed capacity over time vs installed MW)
The framework combines projection models for GPU, compute, and storage deployments with operational factors grounded in production data from Microsoft Azure.
Method claim: framework integrates projection models and operational data from Microsoft Azure (production data grounding); stated in the paper's methods summary.
high positive Designing Datacenter Power Delivery Hierarchies for the AI E... realism/grounding of projection models (use of Azure production data)
We develop a framework for evaluating datacenter power delivery designs using throughput, power, and cost metrics over realistic arrival, oversubscription, and decommissioning sequences.
Methodological claim describing the paper's core contribution: a simulation/evaluation framework combining throughput, power, and cost metrics with arrival/oversubscription/decommissioning sequences; based on the authors' implementation (details and data referenced in the paper).
high positive Designing Datacenter Power Delivery Hierarchies for the AI E... throughput, power utilization, cost metrics over deployment sequences
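The gap between installed megawatts and deployable capacity over time can be shown with a toy yearly simulation: rack requests arrive with rising per-rack density, older racks are decommissioned after a fixed lifetime, and admitted load is capped by installed power times an oversubscription factor. The function, the request list, and all numbers are hypothetical, and the sketch omits the throughput and cost metrics and the Azure-grounded projections that the paper's framework uses.

```python
from typing import List, Tuple


def deployed_load(installed_mw: float, oversubscription: float,
                  arrivals: List[Tuple[int, int, float]],
                  years: int, lifetime: int) -> List[float]:
    """Deployed IT load (MW) at the end of each year under a fixed power budget.

    arrivals: (year, rack_count, kw_per_rack) requests, admitted whole if they fit.
    Admitted racks are decommissioned `lifetime` years after admission.
    """
    budget = installed_mw * oversubscription
    deployed: List[Tuple[int, float]] = []  # (admission year, MW)
    series: List[float] = []
    for year in range(years):
        # Decommission racks that have reached the end of their lifetime.
        deployed = [(y, mw) for y, mw in deployed if year - y < lifetime]
        load = sum((mw for _, mw in deployed), 0.0)
        for y, racks, kw in arrivals:
            mw = racks * kw / 1000
            if y == year and load + mw <= budget:
                deployed.append((year, mw))
                load += mw
        series.append(load)
    return series


# Hypothetical requests with rising rack density (kW per rack) across generations.
requests = [(0, 100, 40.0), (1, 50, 100.0), (2, 8, 1000.0), (3, 8, 1000.0)]
series = deployed_load(installed_mw=10.0, oversubscription=1.0,
                       arrivals=requests, years=4, lifetime=2)
print(series)                     # [4.0, 9.0, 5.0, 8.0]
print(sum(series) / len(series))  # 6.5
```

In this toy run a 10 MW facility averages 6.5 MW deployed, and the year-2 request of 8 MW is turned away while the previous generation still holds 5 MW, which is the kind of time-dependent loss that installed capacity alone does not reveal.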
Designs must remain efficient over long datacenter lifetimes and multiple hardware generations.
Normative/design recommendation motivated by long asset lifetimes and evolving hardware density; stated as a requirement in the paper.
high positive Designing Datacenter Power Delivery Hierarchies for the AI E... design efficiency over time
Demand for AI accelerators is rapidly increasing rack power density, with projections approaching 1 MW per deployment by 2027.
Based on the paper's GPU deployment projection models combined with industry deployment assumptions; the provenance is referenced in the abstract, but no sample size is reported.
high positive Designing Datacenter Power Delivery Hierarchies for the AI E... rack power density (MW per deployment)
For countries to adapt to the structural transformation driven by artificial intelligence, coordinated and long-term policy frameworks are needed; trade policy, industrial policy, and digital regulation should be addressed within an integrated strategy.
Conclusion and policy-recommendations section of the study; a normative recommendation and an argument about the need for coordination; no empirical evidence or implementation examples are provided.
high positive Yapay Zekâ ve Küresel Değer Zincirleri: Ticaret Politikası v... necessity of national and international policy coordination and an integrated strat...
For developing countries, early investment in digital infrastructure can open new windows of competitiveness.
Conceptual argument; policy orientation and strategic recommendation; no empirical test or quantitative evidence is presented.
high positive Yapay Zekâ ve Küresel Değer Zincirleri: Ticaret Politikası v... competitiveness of countries that invest early in digital infrastructure (new opportunities...
With the spread of automation and smart manufacturing systems, comparative advantages based on cheap labor are expected to erode, and trends of returning production to advanced economies or relocating it to allied countries (reshoring and friendshoring) are expected to gain momentum.
Conceptual analysis and logical inferences about the expected technology→consumption/production mechanisms; the study presents no empirical test or quantitative data.
high positive Yapay Zekâ ve Küresel Değer Zincirleri: Ticaret Politikası v... rise of reshoring and friendshoring trends (geographic relocation of production...
Higher sectoral digitalization potential strongly increased remote work: DiD estimate 40.74 percentage points (p < 0.001); remote work rose from 17.6% to 82.1% in highly digitalized sectors versus 1.3% to 6.6% in less digitalized sectors.
Difference-in-differences (DiD) analysis using the COVID-19 shock as quasi-natural experiment on quarterly panel data for 27 EU Member States (2018–2024), N = 36,685; reported DiD estimate = 40.74 percentage points, p < 0.001; descriptive pre/post shares reported for both groups.
high positive Digital transformation and labor market indicators in the EU... share of remote work (percent of work done remotely)
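As a rough cross-check, the canonical 2×2 difference-in-differences computed directly from the descriptive shares (change in highly digitalized sectors minus change in less digitalized sectors) is:

```latex
\widehat{\mathrm{DiD}}_{\text{raw}}
  = (82.1 - 17.6) - (6.6 - 1.3)
  = 64.5 - 5.3
  = 59.2 \ \text{percentage points}
```

This raw contrast of 59.2 percentage points is larger than the reported estimate of 40.74, which comes from the authors' regression on the full quarterly panel (N = 36,685) and presumably includes controls or fixed effects; the descriptive calculation is an illustration, not a replication.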
Higher sectoral digitalization potential has a statistically significant positive effect on wages (hourly wages).
Difference-in-differences (DiD) analysis using the COVID-19 shock as quasi-natural experiment on the same quarterly panel (27 EU Member States, 2018–2024), N = 36,685; reported DiD coefficient = 0.52 €/hour, p < 0.001; authors state this corresponds to ≈4.6% increase in the wage gap between highly and less digitalized activities.
The study's findings offer actionable insights for managers and policymakers to leverage AI for sustainable organizational growth while safeguarding employee well-being.
Authors' concluding statement based on survey findings and analytical results.
high positive Opportunities and Challenges of Human- AI Collaboration in W... practical relevance of findings for management and policy decisions
Successful human–AI collaboration requires a human-centric approach that balances technological advancement with workforce development, ethical governance, and organizational support.
Study conclusion/recommendation based on survey findings (perceptions of opportunities and challenges) and analytical results (correlation/regression).
high positive Opportunities and Challenges of Human- AI Collaboration in W... effective implementation of human–AI collaboration (organizational success facto...
Human–AI collaboration reduces employees' routine workload.
Respondent perceptions collected via the structured questionnaire and analyzed with descriptive statistics and regression in SPSS.
high positive Opportunities and Challenges of Human- AI Collaboration in W... amount of routine work assigned to employees
AI-based systems support better decision-making by providing data-driven insights, allowing employees to focus on higher-level cognitive and strategic activities.
Survey responses (structured questionnaire) analyzed with SPSS (correlation and regression analyses) reporting perceived support for decision-making.
high positive Opportunities and Challenges of Human- AI Collaboration in W... decision-making quality / decision-support
Human–AI collaboration significantly enhances workplace efficiency and productivity by reducing routine workload and improving accuracy and speed in task execution.
Primary data from employees in AI-enabled organizations collected via a structured questionnaire (5-point Likert); analyzed with SPSS using descriptive statistics and regression analysis.
high positive Opportunities and Challenges of Human- AI Collaboration in W... workplace efficiency and productivity (reduction in routine workload, improved a...
Simulations calibrated to a real multifamily rental market confirm that supra-competitive outcomes arise robustly beyond the theoretical assumptions, including under finite horizons, heterogeneous products, and nonlinear logit demand.
Simulation experiments calibrated to a real multifamily rental market; simulations test finite-horizon settings, product heterogeneity, and nonlinear logit demand formulations.
high positive Misspecified Explore-then-Exploit Leads to Supra-Competitive... occurrence of supra-competitive prices in simulated market environments
Under symmetric exploration, prices can reach monopoly levels.
Theoretical result derived in the ODE analysis showing convergence to monopoly-level prices in symmetric exploration scenarios.
high positive Misspecified Explore-then-Exploit Leads to Supra-Competitive... price level (specifically reaching the monopoly price)
Supra-competitive prices arise when firms explore within similar price ranges on the same side of the Nash price.
Analytical characterization from the fluid-limit ordinary differential equation (ODE) analysis of the explore-then-exploit pipeline with misspecified monopoly-style estimation.
high positive Misspecified Explore-then-Exploit Leads to Supra-Competitive... whether equilibrium prices are supra-competitive (above Nash)
Simple algorithmic pricing systems can systematically produce collusive-like (supra-competitive) prices in multi-firm markets.
Theoretical model of multi-firm pricing with an explore-then-exploit pipeline and misspecified monopoly-style demand estimation; fluid-limit ODE analysis characterizing convergence; supporting simulations calibrated to a real multifamily rental market.
high positive Misspecified Explore-then-Exploit Leads to Supra-Competitive... price level relative to the Nash equilibrium
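The mechanism can be reproduced in a toy setting. Each firm explores a grid of prices, regresses its own sales on its own price alone (the monopoly-style misspecification), and then posts the price that is optimal for the fitted line. This is not the paper's model: it assumes linear rather than logit demand, noise-free sales, zero costs, a single explore-then-exploit pass, and hypothetical parameters A, B, C.

```python
from typing import List, Tuple

# Toy linear duopoly (hypothetical parameters): q_i = A - B*p_i + C*p_j, zero marginal cost.
A, B, C = 10.0, 2.0, 1.0


def demand(p_own: float, p_other: float) -> float:
    return A - B * p_own + C * p_other


def ols(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def explore_then_exploit(prices_1: List[float], prices_2: List[float]) -> Tuple[float, float]:
    """Each firm regresses its own sales on its own price only (misspecified),
    then posts the price maximizing p * (intercept + slope * p)."""
    q1 = [demand(p1, p2) for p1, p2 in zip(prices_1, prices_2)]
    q2 = [demand(p2, p1) for p1, p2 in zip(prices_1, prices_2)]
    (a1, s1), (a2, s2) = ols(prices_1, q1), ols(prices_2, q2)
    return a1 / (-2 * s1), a2 / (-2 * s2)


nash = A / (2 * B - C)        # symmetric Nash price
monopoly = A / (2 * (B - C))  # joint-profit-maximizing price

grid = [1.0 + 0.5 * k for k in range(9)]          # exploration prices 1.0 .. 5.0
same = explore_then_exploit(grid, grid)            # both firms explore in lockstep
opposite = explore_then_exploit(grid, grid[::-1])  # firms explore in opposite directions
print(round(nash, 3), monopoly)                    # 3.333 5.0
print(same, tuple(round(p, 3) for p in opposite))  # (5.0, 5.0) (2.667, 2.667)
```

When both firms explore in lockstep, the rival's price moves together with one's own, so the regression attributes part of the rival's effect to one's own price (fitted slope -(B - C) instead of -B). Each firm then prices as if demand were less elastic, which in this symmetric linear case lands exactly on the joint-monopoly price. Exploring in opposite directions biases the slope the other way and pushes prices below Nash. Lockstep exploration is an extreme instance of firms exploring on the same side of the Nash price.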
Continuous, simulation-driven prompt optimization is both tractable and necessary for reliable enterprise conversational AI at scale.
Concluding claim in abstract: 'Our results suggest that continuous, simulation-driven prompt optimization is both tractable and necessary...'.
high positive PRISM: Prompt Reliability via Iterative Simulation and Monit... feasibility and necessity of continuous simulation-driven prompt optimization
PRISM is designed to run on a scheduled basis (daily), treating LLM behavioral drift as a first-class reliability concern.
Design statement in abstract describing scheduled daily runs to monitor behavioral drift.
high positive PRISM: Prompt Reliability via Iterative Simulation and Monit... scheduled (daily) monitoring frequency
PRISM diagnoses root causes of failures and surgically repairs the prompt, iterating until all tests pass.
Methodological description in abstract stating diagnosis and iterative repair loop until tests pass.
high positive PRISM: Prompt Reliability via Iterative Simulation and Monit... automated failure diagnosis and prompt repair (iteration to pass all tests)
PRISM simulates full multi-turn conversations against a platform-faithful LLM environment and evaluates pass/fail using an LLM-as-judge.
Method/architecture claim in abstract describing simulation of multi-turn conversations and LLM-based judging.
high positive PRISM: Prompt Reliability via Iterative Simulation and Monit... simulation and automated evaluation of conversations
PRISM automatically generates test cases from plain-language agent requirements.
Methodological description in abstract stating PRISM takes plain-language requirements and automatically generates test cases.
high positive PRISM: Prompt Reliability via Iterative Simulation and Monit... test-case generation from requirements
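The generate, simulate, judge, and repair cycle described in these claims can be sketched as a bounded loop. Every component here is a toy stand-in: a prompt is a set of rule strings, generate_tests splits requirements into sentences, toy_judge checks for the rule instead of simulating a conversation and querying an LLM judge, and toy_repair adds the missing rule instead of diagnosing a root cause. None of these names come from PRISM.

```python
from typing import Callable, FrozenSet, List, Tuple

Prompt = FrozenSet[str]  # toy: a prompt is a set of rule strings
Test = str               # toy: a test names the rule the agent must follow


def generate_tests(requirements: str) -> List[Test]:
    # Toy stand-in for test generation: one test per sentence of the requirements.
    return [s.strip().lower() for s in requirements.split(".") if s.strip()]


def repair_until_pass(prompt: Prompt, tests: List[Test],
                      judge: Callable[[Prompt, Test], bool],
                      repair: Callable[[Prompt, List[Test]], Prompt],
                      max_iters: int = 10) -> Tuple[Prompt, int]:
    """Judge every test; while any fail, repair the prompt and re-run the full suite."""
    for iteration in range(max_iters):
        failures = [t for t in tests if not judge(prompt, t)]
        if not failures:
            return prompt, iteration  # iteration = number of repairs applied
        prompt = repair(prompt, failures)
    raise RuntimeError("no passing prompt within max_iters rounds")


def toy_judge(prompt: Prompt, test: Test) -> bool:
    # Stand-in for simulating a conversation and asking an LLM judge for pass/fail.
    return test in prompt


def toy_repair(prompt: Prompt, failures: List[Test]) -> Prompt:
    # "Surgical" repair: add only the rule behind the first failure, keep everything else.
    return prompt | {failures[0]}


tests = generate_tests("Greet the user. Never quote prices. Escalate refund requests.")
fixed, repairs = repair_until_pass(frozenset({"greet the user"}), tests, toy_judge, toy_repair)
print(sorted(fixed), repairs)
# ['escalate refund requests', 'greet the user', 'never quote prices'] 2
```

Re-running the full suite after each repair, rather than only the failing test, guards against a fix for one requirement silently breaking another; a scheduled daily run would simply invoke such a loop again.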