The Commonplace

Evidence (13827 claims)

Adoption: 8454 claims
Productivity: 7544 claims
Governance: 6789 claims
Human-AI Collaboration: 6327 claims
Org Design: 4126 claims
Innovation: 4058 claims
Labor Markets: 3520 claims
Skills & Training: 2924 claims
Inequality: 2057 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 749 195 97 889 1979
Governance & Regulation 815 391 188 121 1539
Organizational Efficiency 771 189 124 83 1177
Technology Adoption Rate 624 233 123 96 1084
Research Productivity 410 121 56 331 929
Output Quality 466 177 59 47 749
Decision Quality 320 174 75 42 618
Firm Productivity 435 55 88 20 604
AI Safety & Ethics 214 276 65 33 593
Market Structure 178 166 122 24 495
Task Allocation 206 64 70 31 376
Skill Acquisition 165 57 60 17 299
Innovation Output 201 27 41 18 288
Employment Level 105 51 107 13 278
Fiscal & Macroeconomic 131 69 43 26 276
Consumer Welfare 116 63 42 11 232
Firm Revenue 149 46 26 3 224
Inequality Measures 44 122 49 6 221
Task Completion Time 169 29 8 12 219
Worker Satisfaction 89 61 20 12 182
Error Rate 69 91 10 2 172
Regulatory Compliance 76 68 14 5 163
Training Effectiveness 92 19 13 19 145
Wages & Compensation 77 36 25 6 144
Automation Exposure 51 54 22 12 142
Team Performance 86 17 27 9 140
Developer Productivity 94 17 14 6 132
Job Displacement 12 80 20 1 113
Hiring & Recruitment 51 7 8 3 69
Skill Obsolescence 5 45 6 1 57
Creative Output 31 16 7 2 57
Social Protection 27 16 8 2 53
Labor Share of Income 17 17 17 51
Worker Turnover 11 12 3 26
Industry 1 1
The Claude family leads the benchmark and produces the most professional-looking outputs in our qualitative review.
Empirical result reported from the paper's benchmark and qualitative review of agent outputs (specific metrics, number of agents/tasks, and quantitative scores not provided in the excerpt).
high positive WorkstreamBench: Evaluating LLM Agents on End-to-End Spreads... output professionalism/quality
We develop an evaluation taxonomy comprising three dimensions: Accuracy, Formula, and Format, each comprising fine-grained criteria that reflect professional standards.
Methodological contribution stated in paper; described taxonomy elements (Accuracy, Formula, Format) as part of the evaluation design.
high positive WorkstreamBench: Evaluating LLM Agents on End-to-End Spreads... evaluation criteria/taxonomy
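A taxonomy of this shape can be represented as dimensions that each hold weighted criteria. Each dimension is then scored as the weighted pass rate of its criteria. This is a minimal sketch. The criterion names, their weights, and the equal-weight overall score are hypothetical, since the excerpt does not list the paper's fine-grained criteria or its aggregation rule.

```python
from dataclasses import dataclass, field

@dataclass
class Criterion:
    name: str
    weight: float = 1.0

@dataclass
class Dimension:
    name: str
    criteria: list[Criterion] = field(default_factory=list)

    def score(self, results: dict[str, bool]) -> float:
        """Weighted share of this dimension's criteria that passed (0.0 to 1.0)."""
        total = sum(c.weight for c in self.criteria)
        passed = sum(c.weight for c in self.criteria if results.get(c.name, False))
        return passed / total if total else 0.0

# Hypothetical criteria and weights, for illustration only.
TAXONOMY = [
    Dimension("Accuracy", [Criterion("values_match_reference", 2.0), Criterion("totals_reconcile")]),
    Dimension("Formula", [Criterion("no_hardcoded_outputs"), Criterion("references_resolve")]),
    Dimension("Format", [Criterion("number_formats_consistent"), Criterion("headers_labeled")]),
]

def evaluate(results: dict[str, bool]) -> dict[str, float]:
    """Score each dimension, then average the dimensions equally (an assumption)."""
    scores = {d.name: d.score(results) for d in TAXONOMY}
    scores["Overall"] = sum(scores.values()) / len(TAXONOMY)
    return scores
```

Keeping the dimensions separate lets a report show *where* an agent falls short. For example, a spreadsheet can have correct values but hard-coded outputs, which the Accuracy and Formula dimensions would expose separately.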
We provide one of the first evaluations of agents on end-to-end spreadsheet tasks, focusing on economically critical financial workflows such as modeling and scenario analysis.
Claim of contribution in the paper; refers to the authors' own evaluation study (details like number of tasks/agents not provided in the excerpt).
high positive WorkstreamBench: Evaluating LLM Agents on End-to-End Spreads... existence of evaluation on end-to-end spreadsheet tasks
Frontier AI labs have developed agents that can construct entire spreadsheets from scratch.
Asserted in paper as background/context; no specific models, numbers, or experimental details provided in the excerpt.
high positive WorkstreamBench: Evaluating LLM Agents on End-to-End Spreads... agent capability to construct spreadsheets
LLM agents are increasingly expected to carry out end-to-end workflows, producing complete artifacts from high-level user instructions.
Framing statement in paper; no empirical data or sample size reported to support the trend claim within the excerpt.
high positive WorkstreamBench: Evaluating LLM Agents on End-to-End Spreads... expectations of agent capabilities (trend)
Adoption under higher communicative standards and institutional norms can mitigate suboptimal collective equilibria by imposing social commitments on individual users.
Theoretical argument and model-based analysis proposing communicative and institutional interventions as mitigating mechanisms (conceptual and formal reasoning).
high positive The Human-AI Delegation Dilemma: Individual Strategies, Coll... reduction of suboptimal collective equilibria / improvement in collective outcom...
Individually stable strategies can be scaled to collective equilibria using three extrapolation principles: (a) non-communicative aggregation, (b) local social signaling, and (c) institutional norms setting.
Theoretical extrapolation/principled modeling presented in the paper (conceptual and formal extension from individual to collective level).
high positive The Human-AI Delegation Dilemma: Individual Strategies, Coll... mechanisms for aggregation from individual strategies to collective equilibria
Canonical decision-theoretic strategies that account for adaptive user trajectories can be mapped so that agents transition between strategies in response to interaction feedback and reach stable equilibria.
Analytical results from the decision-theoretic modeling in the paper showing adaptive trajectories and stable equilibria (theoretical model derivation).
high positive The Human-AI Delegation Dilemma: Individual Strategies, Coll... stability of agent strategies / attainment of equilibria
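One way to picture an adaptive strategy in the delegation-verification setting is as follows. A user delegates each task to an AI and verifies the output only when the expected loss from an unchecked error exceeds the cost of checking. The user also updates a belief about the AI's error rate from feedback. This is a hypothetical illustration, not the paper's model. It assumes Beta-Bernoulli updating, a fixed error loss and verification cost, and feedback that is observed whether or not the user verified.

```python
from dataclasses import dataclass

@dataclass
class AdaptiveDelegator:
    """Hypothetical user who delegates every task and chooses whether to verify it."""
    error_loss: float      # cost incurred if an AI error goes unchecked
    verify_cost: float     # effort cost of verifying one output
    alpha: float = 1.0     # Beta prior pseudo-count of AI errors
    beta: float = 1.0      # Beta prior pseudo-count of AI successes

    def error_rate(self) -> float:
        # Posterior mean of the AI's error probability.
        return self.alpha / (self.alpha + self.beta)

    def strategy(self) -> str:
        # Verify only when the expected loss of trusting exceeds the verification cost.
        return "verify" if self.error_rate() * self.error_loss > self.verify_cost else "trust"

    def observe(self, ai_was_wrong: bool) -> None:
        # Assumption: the outcome of every interaction eventually becomes known.
        if ai_was_wrong:
            self.alpha += 1
        else:
            self.beta += 1

def simulate(user: AdaptiveDelegator, outcomes: list[bool]) -> list[str]:
    """Record the strategy chosen before each interaction's feedback arrives."""
    history = []
    for wrong in outcomes:
        history.append(user.strategy())
        user.observe(wrong)
    return history
```

With a run of correct AI outputs, the user moves from verifying to trusting and then stays there. That settled pattern is a simple analogue of a stable individual strategy. A new streak of errors would push the belief back toward verification.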
The paper develops a decision- and game-theoretic approach to the human-AI delegation-verification dilemma.
Methodological contribution: construction of decision- and game-theoretic models described in the paper (modeling/theoretical development).
high positive The Human-AI Delegation Dilemma: Individual Strategies, Coll... availability of a formal modeling framework for the delegation-verification dile...
Emerging models of human-AI interaction predominantly advance the complementarity thesis, variously dubbed "human-AI collaboration" and "human-AI hybrid intelligence."
Literature characterization / conceptual review reported in the paper (no empirical sample or quantitative analysis cited).
high positive The Human-AI Delegation Dilemma: Individual Strategies, Coll... prevalent theoretical framing in human-AI interaction literature (complementarit...
These effects (the negative associations between AIO and carbon emission intensity) are linked to improvements in green innovation quality.
Authors report that the observed negative associations between AIO and carbon emission intensity are connected to measures of green innovation quality (suggesting a mediating mechanism) in their empirical analyses.
A six-phase, stepwise implementation framework (ABC-XYZ segmentation, forecast model selection, safety stock calibration, replenishment policy assignment, simulation-based parameter tuning, KPI governance) enables enterprises to achieve 9–16% reductions in inventory costs within existing WMS and ERP architectures.
Practical implication presented in the paper: a six-phase implementation framework that, when deployed within existing WMS/ERP systems, is expected to reduce inventory costs by 9–16%.
high positive Equitable railway corridor investment under demand uncertain... expected inventory cost reduction achievable by implementing the proposed framew...
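The first phase, ABC-XYZ segmentation, is a standard inventory technique. ABC ranks SKUs by their share of total consumption value. XYZ ranks them by demand variability, measured as the coefficient of variation. This sketch uses common textbook cut-offs (80% and 95% cumulative value; CV of 0.5 and 1.0). These cut-offs are assumptions, not values taken from the paper.

```python
from statistics import mean, pstdev

def abc_classes(values: dict[str, float], a_cut: float = 0.8, b_cut: float = 0.95) -> dict[str, str]:
    """Rank SKUs by consumption value; class depends on the value share held by higher-ranked SKUs."""
    total = sum(values.values())
    classes: dict[str, str] = {}
    cumulative = 0.0
    for sku, value in sorted(values.items(), key=lambda kv: kv[1], reverse=True):
        share_before = cumulative / total   # value share of all SKUs ranked above this one
        classes[sku] = "A" if share_before < a_cut else "B" if share_before < b_cut else "C"
        cumulative += value
    return classes

def xyz_class(demand: list[float], x_cut: float = 0.5, y_cut: float = 1.0) -> str:
    """Classify demand variability by coefficient of variation (population std / mean)."""
    avg = mean(demand)
    if avg == 0:
        return "Z"                          # no demand signal: treat as least predictable
    cv = pstdev(demand) / avg
    return "X" if cv <= x_cut else "Y" if cv <= y_cut else "Z"

def abc_xyz(values: dict[str, float], demand: dict[str, list[float]]) -> dict[str, str]:
    """Combine both axes into nine segments such as 'AX' or 'CZ'."""
    abc = abc_classes(values)
    return {sku: abc[sku] + xyz_class(demand[sku]) for sku in values}
```

The later phases (forecast model selection, safety stock calibration, replenishment policy) would then be assigned per segment. For example, a stable high-value "AX" item and an erratic "CZ" item would be handled differently.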
Learning-based control methods deliver up to 16% cost reductions under complex network conditions but require substantial data and governance infrastructure.
Findings from included studies (narrative and/or quantitative results) reporting maximum observed reductions 'up to 16%' and qualitative synthesis noting data/governance requirements.
high positive Equitable railway corridor investment under demand uncertain... inventory cost reduction achieved by learning-based control methods; infrastruct...
The cost reduction from multi-echelon coordination increases significantly with network complexity and lead-time variability.
Pre-specified moderator analyses reported in the paper showing effect size growth with network complexity and lead-time variability.
high positive Equitable railway corridor investment under demand uncertain... magnitude of multi-echelon coordination cost reduction as a function of network ...
Multi-echelon coordination yields a pooled mean cost reduction of 11.4% (95% CI: 6.9–15.9%).
Random-effects meta-analysis pooling percentage cost-reduction effect sizes (reported pooled mean and 95% CI).
high positive Equitable railway corridor investment under demand uncertain... inventory cost reduction from multi-echelon coordination
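A random-effects pooled mean of this kind is commonly computed with the DerSimonian-Laird estimator. This sketch assumes that each study contributes a percentage cost reduction and its sampling variance. The excerpt does not state which estimator the paper used. As a consistency check, a normal (Wald) 95% interval of 6.9% to 15.9% around 11.4% implies a standard error of about 4.5 / 1.96 ≈ 2.3 percentage points.

```python
import math

def random_effects_pool(effects: list[float], variances: list[float], z: float = 1.96):
    """DerSimonian-Laird random-effects pooled mean, Wald confidence interval, and tau^2."""
    k = len(effects)
    w = [1.0 / v for v in variances]                       # fixed-effect (inverse-variance) weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))   # Cochran's Q
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0   # between-study variance
    w_star = [1.0 / (v + tau2) for v in variances]         # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, (pooled - z * se, pooled + z * se), tau2
```

When studies disagree more than their sampling error allows, tau² grows. Studies are then weighted more evenly and the interval widens. This is why a random-effects interval is usually wider than a fixed-effect one.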
The advantage of distributional safety stock methods is largest for high-variability SKU segments.
Pre-specified subgroup and moderator analyses reported in the paper indicating greater pooled effects in high-variability SKU segments.
high positive Equitable railway corridor investment under demand uncertain... relative cost-reduction advantage of distributional safety-stock vs normal appro...
Distributional safety stock methods outperform classical normal approximations by a pooled mean of 9.3% (95% CI: 5.8–12.7%) at equivalent service levels.
Random-effects meta-analysis pooling percentage cost-reduction effect sizes (reported pooled mean and 95% CI).
high positive Equitable railway corridor investment under demand uncertain... inventory cost reduction at equivalent service levels
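The contrast between the two families of methods can be illustrated with reorder points. The classical method assumes normally distributed demand and uses z·σ·√L. A distributional method instead reads the service-level quantile off the observed lead-time demand. This is a minimal sketch. It assumes fixed lead times, overlapping demand windows, and a nearest-rank empirical quantile; the distributional methods pooled in the paper may fit parametric or bootstrapped distributions instead.

```python
import math
from statistics import NormalDist, mean, stdev

def normal_reorder_point(demand: list[float], lead_time: int, service_level: float) -> float:
    """Classical approach: mean lead-time demand + z * sigma * sqrt(L), assuming i.i.d. normal demand."""
    z = NormalDist().inv_cdf(service_level)
    return lead_time * mean(demand) + z * stdev(demand) * math.sqrt(lead_time)

def empirical_reorder_point(demand: list[float], lead_time: int, service_level: float) -> float:
    """Distributional approach: nearest-rank quantile of observed lead-time demand."""
    # Lead-time demand samples from overlapping windows of `lead_time` consecutive periods.
    windows = [sum(demand[i:i + lead_time]) for i in range(len(demand) - lead_time + 1)]
    ordered = sorted(windows)
    rank = max(0, math.ceil(service_level * len(ordered)) - 1)
    return ordered[rank]

# Safety stock under either method is the reorder point minus mean lead-time demand.
```

Take a high-variability SKU whose demand is mostly 10 with one spike of 100 (`[10] * 19 + [100]`, L = 1). At a 90% service level, the normal approximation sets the reorder point at about 40 while the empirical quantile is 10. At 99%, the normal approximation gives about 61 while the empirical quantile is 100. This over- and under-stocking on skewed demand is the kind of gap that would make distributional methods pay off most in high-variability segments.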
Policy recommendation: Tax deduction rates for firms operating in artificial intelligence technologies could be increased.
Policy implication of the research findings (a positive relationship between R&D tax incentives and the number of AI patents); a recommendation, not a direct empirical test.
high positive AR-GE HARCAMALARININ VE VERGİ TEŞVİKLERİNİN YAPAY ZEKAYA ETK... increase in tax deductions (recommendation) and, indirectly, AI patent output
Policy recommendation: The state could provide performance- and project-based support to increase the efficiency of R&D spending.
Applied policy measure proposed by the authors on the basis of the study's findings; a recommendation that has not been empirically tested.
high positive AR-GE HARCAMALARININ VE VERGİ TEŞVİKLERİNİN YAPAY ZEKAYA ETK... R&D efficiency (recommendation/interpretation)
Policy recommendation: States that value technological progress and innovation should encourage private-sector R&D investment through instruments such as subsidies and low-interest loans.
Policy recommendation based on the study's regression findings; an implementation recommendation derived from the study's results, not a direct empirical test.
high positive AR-GE HARCAMALARININ VE VERGİ TEŞVİKLERİNİN YAPAY ZEKAYA ETK... incentives for private-sector R&D investment (recommendation) and, indirectly, AI patent output
The findings above indicate that private-sector R&D spending and R&D tax incentives are being used efficiently.
Interpretive inference drawn from the study's regression results showing positive relationships (G8 + Türkiye, 2010-2020, random-effects regression).
high positive AR-GE HARCAMALARININ VE VERGİ TEŞVİKLERİNİN YAPAY ZEKAYA ETK... effectiveness/efficiency (interpretive inference, not directly measured)
As R&D tax incentives increase, the number of AI patents increases (positive relationship).
Same panel dataset and random-effects regression (G8 + Türkiye, 2010-2020); the coefficient of the tax-incentive variable on the number of AI patents was found to be positive.
high positive AR-GE HARCAMALARININ VE VERGİ TEŞVİKLERİNİN YAPAY ZEKAYA ETK... number of AI patents
There is a positive relationship between private-sector R&D spending and the number of artificial intelligence (AI) patents.
Panel data analysis: G8 countries + Türkiye, 2010-2020; random-effects regression model; country-year data (9 countries × 11 years = 99 observations). The private-sector R&D spending variable is reported to have a statistically significant positive relationship with the number of AI patents.
high positive AR-GE HARCAMALARININ VE VERGİ TEŞVİKLERİNİN YAPAY ZEKAYA ETK... number of AI patents
Given the mixed outcomes (some improvements, some new lint/security issues), stronger tool-in-the-loop quality and security gating is motivated for AI-driven development workflows.
Interpretation/recommendation based on observed mix of improvements and introduced issues from the empirical results (PyQu, Pylint, Bandit analyses) and high merge rates.
high positive Quality and Security Signals in AI-Generated Python Refactor... policy/process recommendation (quality/security gating)
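A tool-in-the-loop gate could compare static-analysis results before and after an agent's change. It would block the merge when the change introduces new findings. This sketch works on already-parsed JSON from `pylint --output-format=json` (a list of messages with a `type` field) and `bandit -f json` (a `results` list with an `issue_severity` field). The thresholds are illustrative assumptions, not recommendations from the paper: any new Bandit finding at MEDIUM or above, and any new Pylint error or warning.

```python
from collections import Counter

SEVERITY_RANK = {"LOW": 1, "MEDIUM": 2, "HIGH": 3}

def bandit_count(report: dict, min_severity: str = "MEDIUM") -> int:
    """Count findings at or above `min_severity` in a parsed `bandit -f json` report."""
    threshold = SEVERITY_RANK[min_severity]
    return sum(
        1 for r in report.get("results", [])
        if SEVERITY_RANK.get(r.get("issue_severity"), 0) >= threshold
    )

def pylint_counts(messages: list[dict]) -> Counter:
    """Tally parsed `pylint --output-format=json` messages by type (error, warning, ...)."""
    return Counter(m.get("type") for m in messages)

def gate(before_lint: list[dict], after_lint: list[dict],
         before_bandit: dict, after_bandit: dict,
         lint_tolerance: int = 0) -> tuple[bool, list[str]]:
    """Return (passes, reasons); the change fails if it adds security or lint findings."""
    reasons = []
    new_security = bandit_count(after_bandit) - bandit_count(before_bandit)
    if new_security > 0:
        reasons.append(f"{new_security} new Bandit finding(s) at MEDIUM or above")
    before, after = pylint_counts(before_lint), pylint_counts(after_lint)
    for kind in ("error", "warning"):
        delta = after[kind] - before[kind]
        if delta > lint_tolerance:
            reasons.append(f"{delta} new Pylint {kind} message(s)")
    return (not reasons, reasons)
```

The gate compares deltas rather than absolute counts. An agent's refactoring is therefore judged only on what it changes, not on pre-existing issues in the repository.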
73.5% of the analyzed PRs are merged (developer acceptance is high).
Empirical measurement of PR outcomes (merged vs. not merged) in the AIDev dataset of Python refactoring PRs.
high positive Quality and Security Signals in AI-Generated Python Refactor... PR merge rate (acceptance)
Usability is the quality attribute that improves most frequently, improving in 36.5% of the studied changes.
PyQu-based before-and-after analysis of quality attributes on Python refactoring PRs from the AIDev dataset; reported frequency for the 'usability' attribute.
high positive Quality and Security Signals in AI-Generated Python Refactor... usability (one of PyQu's quality attributes)
Agentic commits improve a quality attribute in 22.5% of the studied changes.
Empirical analysis of Python refactoring pull requests from the AIDev dataset using PyQu (an ML-based Python quality assessment tool) to compare quality attributes before and after each change.
high positive Quality and Security Signals in AI-Generated Python Refactor... improvement in any measured code quality attribute (per change)
The proposed taxonomy advances understanding and provides a structured framework for studying emerging human–algorithmic supervisory arrangements in organizations.
Authors' asserted contribution based on literature synthesis and their taxonomy derived from analysis of 14 real-world settings; intended to guide future research.
high positive A Taxonomy Of Algorithmic Co-Supervision governance_and_regulation
We demonstrate the taxonomy’s applicability through three ACoS examples.
Authors state they applied the taxonomy to three examples (case applications) to show applicability; abstract reports N=3 examples.
high positive A Taxonomy Of Algorithmic Co-Supervision governance_and_regulation
We identify two meta-dimensions, control collaboration and control enactment, and six dimensions that enable researchers to categorize and compare ACoS across organizations.
Taxonomy derived from the authors' analysis (14 real-world settings) and literature synthesis; specific dimensions enumerated in paper (as summarized in abstract).
high positive A Taxonomy Of Algorithmic Co-Supervision governance_and_regulation
Building on prior literature and an analysis of 14 real-world ACoS settings, we propose a taxonomy that conceptualizes the phenomenon.
Method stated in abstract: literature review plus qualitative/empirical analysis of 14 real-world ACoS settings; taxonomy presented as an output.
high positive A Taxonomy Of Algorithmic Co-Supervision governance_and_regulation
Organizations increasingly weave algorithmic systems into control processes.
Statement supported by prior literature review and the paper's motivating statements (no specific empirical trend data reported in abstract).
high positive A Taxonomy Of Algorithmic Co-Supervision adoption_rate
AI is a knowledge-intensive field that is particularly shaped by the flow of knowledge from scientific research to technological development.
Framing/background claim in the introduction describing the nature of AI and its dependence on science-to-technology knowledge flow.
high positive Knowledge flows from science to AI technology: Identifying c... role of scientific knowledge flow in AI development
The analysis covers AI-related patents filed from 2002 to 2021.
Paper states the temporal scope of the patent dataset analyzed (2002–2021).
high positive Knowledge flows from science to AI technology: Identifying c... temporal coverage of analyzed patents
Abstracts from patents and their cited scientific publications were extracted and BERTopic modelling was applied; topic labels were generated using generative AI.
Method description: data extraction of patent abstracts and cited scientific publication abstracts, application of BERTopic for topic modeling, and use of generative AI to create topic labels.
high positive Knowledge flows from science to AI technology: Identifying c... semantic topics derived from patent and cited-publication abstracts
AI patents are classified into four categories using centrality measures derived from a CPC co-occurrence network.
Method section describing construction of a CPC (Cooperative Patent Classification) co-occurrence network and use of centrality measures to partition patents into four categories.
high positive Knowledge flows from science to AI technology: Identifying c... patent classification into four categories
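The classification step can be sketched in three parts. First, build a co-occurrence graph in which CPC codes are nodes and an edge joins two codes that appear on the same patent. Second, compute node centralities. Third, split patents at the median of two centralities into a 2×2 grid. The excerpt does not say which centrality measures, cut-offs, or category names the paper uses. Degree and closeness centrality, the median split, and the four labels below are all hypothetical.

```python
from collections import deque
from itertools import combinations
from statistics import median

def cooccurrence_graph(patents: dict[str, list[str]]) -> dict[str, set[str]]:
    """Undirected graph whose nodes are CPC codes; an edge joins codes listed on the same patent."""
    graph: dict[str, set[str]] = {}
    for codes in patents.values():
        for code in codes:
            graph.setdefault(code, set())
        for a, b in combinations(sorted(set(codes)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

def degree_centrality(graph: dict[str, set[str]]) -> dict[str, float]:
    n = len(graph)
    return {v: len(nbrs) / (n - 1) if n > 1 else 0.0 for v, nbrs in graph.items()}

def closeness_centrality(graph: dict[str, set[str]]) -> dict[str, float]:
    """BFS closeness within each node's component: (reachable - 1) / sum of distances."""
    result = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result

# Hypothetical category labels for (high degree?, high closeness?).
LABELS = {(True, True): "core", (True, False): "local hub",
          (False, True): "bridging", (False, False): "peripheral"}

def classify_patents(patents: dict[str, list[str]]) -> dict[str, str]:
    """Assumes every patent lists at least one CPC code."""
    graph = cooccurrence_graph(patents)
    deg, clo = degree_centrality(graph), closeness_centrality(graph)
    mean_deg = {p: sum(deg[c] for c in codes) / len(codes) for p, codes in patents.items()}
    mean_clo = {p: sum(clo[c] for c in codes) / len(codes) for p, codes in patents.items()}
    d_med, c_med = median(mean_deg.values()), median(mean_clo.values())
    return {p: LABELS[(mean_deg[p] > d_med, mean_clo[p] > c_med)] for p in patents}
```

On a real patent corpus, a graph library such as NetworkX would normally supply these centralities. The pure-Python version only keeps the sketch self-contained.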
This study proposes a semantic science-technology exploration framework specifically designed for the AI domain, consisting of two stages: technology classification and semantic topic exploration.
Paper description of the proposed framework and its two-stage design (methodological contribution).
high positive Knowledge flows from science to AI technology: Identifying c... existence and design of a two-stage semantic science-technology exploration fram...
Current models demonstrate promising spatial grounding, multimodal alignment, and coordinated action execution.
Qualitative and/or quantitative evaluation results in paper indicating strengths in spatial grounding, multimodal alignment, and coordinated action execution.
high positive CutVerse: A Compositional GUI Agents Benchmark for Media Pos... spatial grounding, multimodal alignment, coordinated action execution
We develop a lightweight parser that transforms raw screen recordings and low-level interaction logs into structured, compositional GUI action trajectories with precise grounding.
Methodological contribution described in paper: parser implementation that converts recordings and logs into structured GUI action trajectories.
high positive CutVerse: A Compositional GUI Agents Benchmark for Media Pos... ability to produce structured, grounded GUI action trajectories from recordings/...
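Part of such a parser can be approximated by folding raw, time-ordered input events into higher-level actions. This sketch assumes a hypothetical event schema (`t`, `type`, `x`, `y`, `key`) and simple heuristics:

- A press and release within a few pixels is a click; otherwise it is a drag.
- Printable keys typed in quick succession merge into one `type` action.
- Named keys become hotkeys.

It does not cover the screen-recording stream or the mapping of coordinates to UI elements (grounding), both of which the paper's parser is described as handling.

```python
from dataclasses import dataclass

@dataclass
class Action:
    kind: str      # "click", "drag", "type", or "hotkey"
    start: float   # timestamp of the first raw event in the action
    args: dict

def parse_events(events: list[dict], move_tol: float = 5.0, type_gap: float = 1.0) -> list[Action]:
    """Fold low-level mouse/keyboard events (hypothetical schema) into compositional actions."""
    actions: list[Action] = []
    pending_down = None
    for ev in sorted(events, key=lambda e: e["t"]):
        if ev["type"] == "mouse_down":
            pending_down = ev
        elif ev["type"] == "mouse_up" and pending_down is not None:
            dx, dy = ev["x"] - pending_down["x"], ev["y"] - pending_down["y"]
            if (dx * dx + dy * dy) ** 0.5 <= move_tol:
                actions.append(Action("click", pending_down["t"],
                                      {"x": pending_down["x"], "y": pending_down["y"]}))
            else:
                actions.append(Action("drag", pending_down["t"],
                                      {"from": (pending_down["x"], pending_down["y"]),
                                       "to": (ev["x"], ev["y"])}))
            pending_down = None
        elif ev["type"] == "key":
            key = ev["key"]
            last = actions[-1] if actions else None
            if len(key) == 1 and last and last.kind == "type" and ev["t"] - last.args["end"] <= type_gap:
                last.args["text"] += key          # continue the current typing burst
                last.args["end"] = ev["t"]
            elif len(key) == 1:
                actions.append(Action("type", ev["t"], {"text": key, "end": ev["t"]}))
            else:
                # Named keys such as "enter" or "ctrl+s" become hotkey actions.
                actions.append(Action("hotkey", ev["t"], {"keys": key}))
    return actions
```
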
The tasks involve dense multimodal interfaces and tightly coupled interaction sequences.
Task descriptions and dataset characteristics in paper stating tasks are complex, long-horizon, multimodal, and tightly coupled.
high positive CutVerse: A Compositional GUI Agents Benchmark for Media Pos... interface complexity and interaction coupling in tasks
We curate expert demonstrations across 7 professional applications (e.g., Premiere Pro, Photoshop), covering 186 complex, long-horizon tasks grounded in authentic editing workflows.
Dataset construction reported in paper: curated expert demonstrations spanning 7 applications and 186 tasks (numbers provided in text).
high positive CutVerse: A Compositional GUI Agents Benchmark for Media Pos... size and scope of demonstration dataset (number of applications and tasks)
We introduce CutVerse, a benchmark designed to systematically evaluate autonomous GUI agents in realistic media post-production environments.
Paper describes the creation of the Cutverse benchmark as a central contribution (design and implementation described in methods).
high positive CutVerse: A Compositional GUI Agents Benchmark for Media Pos... existence and design of a benchmark for GUI agents in media post-production
GUI agents have made significant progress in web navigation and basic operating system tasks.
Background claim stated in paper referencing prior work on GUI agents applied to web navigation and OS tasks (no specific experiments in this paper to support it).
high positive CutVerse: A Compositional GUI Agents Benchmark for Media Pos... capability progress on web navigation and OS tasks
We develop a unified taxonomy mapping diverging terminology to a shared framework of measured signals based on what benchmark authors claim to measure.
Methodological contribution described in the paper: creation of a taxonomy to harmonize labels and claimed measurement targets across benchmarks (details and mapping provided in paper/tool).
high positive Unsteady Metrics and Benchmarking Cultures of AI Model Build... harmonization/taxonomy of benchmark labels
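At its simplest, harmonizing diverging benchmark labels means two things. First, normalize spelling variants so they collide. Second, map each canonical benchmark to the signal it claims to measure. This sketch uses a handful of well-known benchmarks with obvious targets. The signal names and the lookup table are hypothetical, not the paper's taxonomy. Unknown labels are flagged rather than guessed.

```python
import re

def normalize(label: str) -> str:
    """Lowercase and drop punctuation/whitespace so spelling variants collide."""
    return re.sub(r"[^a-z0-9]", "", label.lower())

# Hypothetical signal taxonomy: normalized key -> (canonical benchmark name, measured signal).
CANONICAL = {
    "humaneval": ("HumanEval", "code generation"),
    "swebenchverified": ("SWE-bench Verified", "software engineering"),
    "gpqadiamond": ("GPQA Diamond", "scientific reasoning"),
    "aime2025": ("AIME 2025", "mathematics"),
    "mmlu": ("MMLU", "general knowledge"),
}

def harmonize(labels: list[str]) -> dict[str, tuple[str, str]]:
    """Map raw release-note labels to (canonical name, signal); unknown labels are marked unmapped."""
    return {raw: CANONICAL.get(normalize(raw), (raw, "unmapped")) for raw in labels}
```
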
We introduce and open-source Benchmarking-Cultures-25, a dataset of 231 benchmarks highlighted across 139 model releases in 2025 from 11 major AI builders, alongside an interactive tool to explore the data.
Empirical contribution: the paper publishes the dataset and tool (links provided). Counts reported in the paper metadata (231 benchmarks, 139 model releases, 11 builders).
high positive Unsteady Metrics and Benchmarking Cultures of AI Model Build... size and coverage of the released dataset
The architecture successfully manages profiles with 14,000+ scientific facts (125k tokens), enabling sustained operation beyond full-context limits.
Reported stress test / capability demonstration in paper: profile size stated as 14,000+ facts and 125k tokens stored and managed by the system.
high positive Episodic-Semantic Memory Architecture for Long-Horizon Scien... number of scientific facts and token footprint the system can manage (profile ca...
The Dual Process system maintains 70-85% accuracy with 1-2 second latency while using 62% fewer tokens (45,434 vs 120,000+ limit) compared to full-context approaches.
Reported empirical results from the large-scale evaluation (1,440 queries / 15,000 messages) comparing Dual Process to full-context models; exact accuracy, latency, and token-count figures provided in the paper.
high positive Episodic-Semantic Memory Architecture for Long-Horizon Scien... accuracy; latency (seconds); token usage
The Dual Process Memory Architecture decouples immediate episodic needs (constant 10-message window) from long-term consolidated knowledge (growing at approximately 3 tokens/message).
System design description and measured consolidation growth rate reported in the paper; empirical observation of growth rate stated.
high positive Episodic-Semantic Memory Architecture for Long-Horizon Scien... episodic window size; long-term memory growth rate (tokens/message)
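The reported figures appear mutually consistent. At roughly 3 tokens per message, 15,000 messages would consolidate into about 45,000 tokens, close to the reported 45,434. Also, 45,434 / 120,000 ≈ 0.38, which matches "62% fewer tokens." The decoupling itself can be sketched as a fixed-size episodic window plus an append-only consolidated store that receives a summary of each evicted message. Consolidating on eviction and the placeholder summarizer (which keeps the first three words) are assumptions that stand in for the paper's consolidation step, whose details are not given here.

```python
from collections import deque
from typing import Callable, Optional

class DualProcessMemory:
    """Hypothetical sketch: constant episodic window plus a growing consolidated store."""

    def __init__(self, window: int = 10, summarize: Optional[Callable[[str], list[str]]] = None):
        self.episodic: deque[str] = deque(maxlen=window)   # oldest messages fall out automatically
        self.semantic: list[str] = []                       # consolidated long-term facts
        # Placeholder consolidator; a real system would extract facts, e.g. with an LLM.
        self.summarize = summarize or (lambda msg: [" ".join(msg.split()[:3])])

    def add(self, message: str) -> None:
        # Consolidate the message that is about to be evicted from the episodic window.
        if len(self.episodic) == self.episodic.maxlen:
            self.semantic.extend(self.summarize(self.episodic[0]))
        self.episodic.append(message)

    def context(self) -> str:
        """Prompt context: consolidated facts followed by the recent episodic window."""
        return "\n".join(self.semantic + list(self.episodic))
```

The prompt cost of the episodic part stays constant no matter how long the session runs. Only the compact consolidated store grows, which is what allows operation beyond full-context limits.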
Experiments on real-world and synthetic tabular datasets show that SPN consistently improves robustness and predictive performance under strategic manipulation compared with both tabular foundation models and classical tabular methods.
Empirical experiments reported in the paper (on unspecified real-world and synthetic tabular datasets) comparing SPN to PFN-style tabular foundation models and classical tabular methods; the abstract claims consistent improvements but does not report sample sizes, dataset names, or quantitative effect sizes.
high positive When Tabular Foundation Models Meet Strategic Tabular Data: ... robustness and predictive performance under strategic manipulation
SPN constructs strategic in-context examples to approximate post-manipulation inputs and aligns PFN predictions with the induced strategic distribution.
Description of SPN's mechanism in the paper (methodological detail). Presented as the approach used to approximate strategic post-manipulation inputs and align predictions; no quantitative details or sample sizes in the abstract.
high positive When Tabular Foundation Models Meet Strategic Tabular Data: ... alignment of PFN predictions with induced strategic distribution
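Strategic in-context examples can be illustrated with a textbook strategic-classification model. Each individual facing a linear decision rule moves just far enough to be accepted if the quadratic cost of moving is at most a fixed benefit. The manipulated inputs are then paired with the original labels to form the context a PFN would condition on. The linear rule, quadratic cost, and unit benefit are assumptions. The excerpt does not describe SPN's manipulation model, and the PFN itself is not shown.

```python
def best_response(x: list[float], w: list[float], b: float,
                  cost: float = 1.0, benefit: float = 1.0) -> list[float]:
    """Minimal move that makes w.x + b >= 0, taken only if cost * ||delta||^2 <= benefit.

    Assumes w is nonzero.
    """
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    if score >= 0:
        return list(x)                          # already accepted; no reason to manipulate
    norm_sq = sum(wi * wi for wi in w)
    step = -score / norm_sq                     # delta = step * w reaches the boundary
    move_cost = cost * step ** 2 * norm_sq      # cost * ||delta||^2
    if move_cost > benefit:
        return list(x)                          # too expensive; report truthfully
    return [xi + step * wi for xi, wi in zip(x, w)]

def strategic_context(X: list[list[float]], y: list[int], w: list[float], b: float,
                      cost: float = 1.0) -> list[tuple[list[float], int]]:
    """Pair each manipulated input with its original label to mimic the post-manipulation distribution."""
    return [(best_response(x, w, b, cost), label) for x, label in zip(X, y)]
```

Because the context now contains inputs that have already been shifted toward the boundary, a model conditioned on it sees labels attached to manipulated features. This is the distribution it will face at test time.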