The Commonplace

Evidence (13827 claims)

Adoption: 8454 claims
Productivity: 7544 claims
Governance: 6789 claims
Human-AI Collaboration: 6327 claims
Org Design: 4126 claims
Innovation: 4058 claims
Labor Markets: 3520 claims
Skills & Training: 2924 claims
Inequality: 2057 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome                    Positive  Negative  Mixed  Null  Total
Other                           749       195     97   889   1979
Governance & Regulation         815       391    188   121   1539
Organizational Efficiency       771       189    124    83   1177
Technology Adoption Rate        624       233    123    96   1084
Research Productivity           410       121     56   331    929
Output Quality                  466       177     59    47    749
Decision Quality                320       174     75    42    618
Firm Productivity               435        55     88    20    604
AI Safety & Ethics              214       276     65    33    593
Market Structure                178       166    122    24    495
Task Allocation                 206        64     70    31    376
Skill Acquisition               165        57     60    17    299
Innovation Output               201        27     41    18    288
Employment Level                105        51    107    13    278
Fiscal & Macroeconomic          131        69     43    26    276
Consumer Welfare                116        63     42    11    232
Firm Revenue                    149        46     26     3    224
Inequality Measures              44       122     49     6    221
Task Completion Time            169        29      8    12    219
Worker Satisfaction              89        61     20    12    182
Error Rate                       69        91     10     2    172
Regulatory Compliance            76        68     14     5    163
Training Effectiveness           92        19     13    19    145
Wages & Compensation             77        36     25     6    144
Automation Exposure              51        54     22    12    142
Team Performance                 86        17     27     9    140
Developer Productivity           94        17     14     6    132
Job Displacement                 12        80     20     1    113
Hiring & Recruitment             51         7      8     3     69
Skill Obsolescence                5        45      6     1     57
Creative Output                  31        16      7     2     57
Social Protection                27        16      8     2     53
Labor Share of Income 17 17 17 51
Worker Turnover 11 12 3 26
Industry 1 1
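
As a reading aid for the matrix, the sketch below computes each direction's share of an outcome's Total for a few rows copied from it. The `rows` dictionary and the `direction_shares` helper are illustrative names, not part of the site. Shares are taken against the Total column, which for some rows is larger than the sum of the four listed directions.

```python
# Rows copied from the Evidence Matrix: (positive, negative, mixed, null, total).
rows = {
    "Firm Productivity": (435, 55, 88, 20, 604),
    "Job Displacement": (12, 80, 20, 1, 113),
    "Skill Obsolescence": (5, 45, 6, 1, 57),
    "Error Rate": (69, 91, 10, 2, 172),
}

DIRECTIONS = ("positive", "negative", "mixed", "null")


def direction_shares(counts):
    """Return each direction's fraction of the row's Total column."""
    *by_direction, total = counts
    return {name: n / total for name, n in zip(DIRECTIONS, by_direction)}


for outcome, counts in rows.items():
    shares = direction_shares(counts)
    dominant = max(shares, key=shares.get)  # direction with the largest share
    print(f"{outcome}: mostly {dominant} ({shares[dominant]:.1%} of claims)")
```

Dividing by the reported Total rather than by the sum of the four columns keeps the shares comparable with the counts shown on the page, at the cost of shares that may not add up to 100%.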
IDS jointly and incrementally synthesizes implementation and proof, and learns from failed attempts to systematically try promising strategies.
Description of the IDS method and architecture presented in the paper (system design and algorithmic loop).
This paper presents Inductive Deductive Synthesis (IDS), the first effective approach to closing the gap between LLM coding agents and mechanized formal verification for distributed systems.
Statement of novelty supported by the empirical claim that IDS succeeds on all 7 benchmark specs while prior SOTA agents did not; methodological description of IDS as a joint, incremental synthesis and learning system.
IDS further incorporates performance feedback into the same loop, yielding implementations up to 3x faster than published verified systems.
Empirical benchmarking of IDS-produced implementations against published verified systems, with performance (runtime) comparisons reporting up to a 3x speedup.
high positive Inductive Deductive Synthesis: Enabling AI to Generate Forma... organizational_efficiency
IDS is 17% cheaper than SOTA agents.
Cost comparison reported in the paper between IDS and the evaluated SOTA coding agents across the same 7 specs, yielding a 17% cost reduction for IDS.
high positive Inductive Deductive Synthesis: Enabling AI to Generate Forma... organizational_efficiency
IDS is roughly 200x faster than expert effort.
Comparison in the paper between IDS runtime (hours) and the typical expert effort (described as months to years) required for mechanized formal verification of similar distributed-system specifications; reported multiplicative speedup (~200x).
IDS costs $106 per spec on average.
Reported monetary cost computed for IDS runs averaged across the 7 specs in the evaluation.
high positive Inductive Deductive Synthesis: Enabling AI to Generate Forma... organizational_efficiency
IDS succeeds on all 7 specs (7/7), taking about 6.8 hours per spec on average.
Empirical evaluation of IDS on the same suite of 7 distributed key-value-store specifications, with runtime (wall-clock) measured and averaged over the 7 specs.
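
The IDS figures above can be cross-checked against each other with a little arithmetic. The sketch below is only a consistency check, not a calculation from the paper. It assumes the "roughly 200x" speedup is measured against the 6.8-hour average, that a working week is 40 hours, and that "17% cheaper" means the IDS cost is 83% of the SOTA agents' cost. None of these assumptions is stated in the excerpt.

```python
# Figures reported in the claims above.
HOURS_PER_SPEC = 6.8       # average IDS wall-clock time per spec
COST_PER_SPEC = 106.0      # average IDS cost per spec, in USD
NUM_SPECS = 7
SPEEDUP_VS_EXPERT = 200    # "roughly 200x faster than expert effort"
COST_REDUCTION = 0.17      # "17% cheaper than SOTA agents"

# Assumption: a 40-hour working week, used only to express hours as weeks.
WORK_HOURS_PER_WEEK = 40

# Expert effort implied by the speedup, if it is relative to the 6.8 h average.
implied_expert_hours = HOURS_PER_SPEC * SPEEDUP_VS_EXPERT
implied_expert_weeks = implied_expert_hours / WORK_HOURS_PER_WEEK

# Total IDS spend across the benchmark suite.
total_ids_cost = COST_PER_SPEC * NUM_SPECS

# Assumption: IDS cost = (1 - 0.17) * SOTA cost, so SOTA cost = IDS cost / 0.83.
implied_sota_cost = COST_PER_SPEC / (1 - COST_REDUCTION)

print(f"implied expert effort: {implied_expert_hours:.0f} h "
      f"(about {implied_expert_weeks:.0f} working weeks)")
print(f"total IDS cost over {NUM_SPECS} specs: ${total_ids_cost:.0f}")
print(f"implied SOTA agent cost per spec: ${implied_sota_cost:.2f}")
```

Under these assumptions the implied expert effort is about 1,360 hours, or roughly 34 working weeks, which sits inside the "months to years" range quoted in the evidence description.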
The paper presents a comprehensive empirical study of key design choices — including alignment objectives, embedding dimensionality, model scale, architecture, and optimization strategies — to identify configurations that are most effective in production settings.
Authors report an empirical study covering multiple design axes; details of experiments, datasets, and sample sizes are not included in the excerpt.
high positive HARNESS-LM: A Three-Phase Training Recipe for Harnessing SLM... design configuration effectiveness for production deployment
HARNESS-LM (HLM) is a three-phase training framework for transferring the capabilities of large-scale retrievers into compact, cost-efficient models: (1) train a high-performance reference ('teacher') retriever by fine-tuning a billion-parameter-scale SLM; (2) align query representations via an L2 objective to distill knowledge into a sub-600M parameter student encoder; (3) apply a final contrastive refinement stage to optimize the student for retrieval performance.
Methodological description of the HLM training recipe and model sizes provided in the paper; supported by subsequent empirical evaluations reported in the paper.
high positive HARNESS-LM: A Three-Phase Training Recipe for Harnessing SLM... model compression / knowledge transfer into compact retriever
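
Phases 2 and 3 of the recipe above can be illustrated with toy loss functions. This is a minimal sketch, not the paper's implementation. It assumes the student and teacher embeddings share a dimensionality, and it uses an InfoNCE loss with in-batch negatives as a stand-in for the "contrastive refinement" stage, whose exact form the excerpt does not give. All function names and the temperature value are hypothetical.

```python
import math
import random


def l2_alignment_loss(student, teacher):
    """Phase-2-style objective: mean squared L2 distance between paired embeddings."""
    per_pair = [sum((s - t) ** 2 for s, t in zip(sv, tv))
                for sv, tv in zip(student, teacher)]
    return sum(per_pair) / len(per_pair)


def _normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def in_batch_contrastive_loss(queries, docs, temperature=0.05):
    """Phase-3 stand-in: InfoNCE where query i's positive is document i."""
    q = [_normalize(v) for v in queries]
    d = [_normalize(v) for v in docs]
    losses = []
    for i, qv in enumerate(q):
        logits = [sum(a * b for a, b in zip(qv, dv)) / temperature for dv in d]
        top = max(logits)  # subtract the max before exponentiating, for stability
        log_denominator = top + math.log(sum(math.exp(x - top) for x in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy data: random "teacher" query embeddings, a noisy student copy, and
# documents that sit close to their matching query.
random.seed(0)
teacher_q = [[random.gauss(0, 1) for _ in range(16)] for _ in range(8)]
student_q = [[x + random.gauss(0, 0.1) for x in row] for row in teacher_q]
docs = [[x + random.gauss(0, 0.1) for x in row] for row in teacher_q]

print("alignment loss:", l2_alignment_loss(student_q, teacher_q))
print("contrastive loss:", in_batch_contrastive_loss(student_q, docs))
```

The alignment loss only needs teacher and student query embeddings, while the contrastive loss needs query-document pairs, which is one plausible reason to run the two objectives as separate stages.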
Online A/B testing on Bing Ads shows a +0.4% Click uplift over the current ensemble of retrievers running in production with the deployed 190M parameter model.
Live online A/B testing on Bing Ads comparing HLM deployment to the production ensemble using the 190M parameter model; exact experiment details not provided in excerpt.
Online A/B testing on Bing Ads shows a +0.6% Impression uplift over the current ensemble of retrievers running in production with the deployed 190M parameter model.
Live online A/B testing on Bing Ads comparing HLM deployment to the production ensemble using the 190M parameter model; exact experiment details not provided in excerpt.
Online A/B testing on Bing Ads shows a +1% Revenue uplift over the current ensemble of retrievers running in production with the deployed 190M parameter model.
Live online A/B testing on Bing Ads comparing HLM deployment to the production ensemble using the 190M parameter model; exact experiment duration, traffic allocation, and statistical significance not provided in excerpt.
HLM delivers up to 20x higher throughput on NVIDIA A100 GPUs.
Throughput benchmarking on NVIDIA A100 GPUs comparing HLM student encoder to baseline/reference encoders; exact workload and measurement details not provided in excerpt.
HLM delivers up to 27x lower online query-encoder latency on NVIDIA A100 GPUs.
Measured inference latency on NVIDIA A100 GPUs comparing HLM student encoder to baseline/reference encoders; exact measurement procedure and number of runs not provided in excerpt.
high positive HARNESS-LM: A Three-Phase Training Recipe for Harnessing SLM... online query-encoder latency
On a real-world Bing Ads evaluation benchmark, HLM recovers over 98% of the reference retriever's precision across multiple settings.
Empirical evaluation on a real-world Bing Ads retrieval benchmark comparing HLM student retriever to a high-performance reference (teacher) retriever; exact benchmark dataset and number of test queries not reported in excerpt.
Accountability assets are complementary assets that make AI-supported outputs legitimate, auditable, reviewable, and assignable to a responsible party.
Conceptual definition and development in the paper; supported by illustrative domain examples but no empirical validation.
high positive Redrawing the AI Map: A Theory of Accountability Boundaries ... legitimacy/auditability/assignability of AI outputs (regulatory/compliance readi...
Agentic AI orchestrators reduce the interface and assembly costs of composing information systems capabilities across organizational boundaries, seemingly accelerating modularization and organizational disaggregation.
Conceptual/theoretical argument in the paper; theory development and illustrative examples across domains (document processing, legal services, audit, clinical decision support, procurement). No empirical sample or quantitative test reported.
high positive Redrawing the AI Map: A Theory of Accountability Boundaries ... organizational disaggregation / modularization
The paper's contribution is to clarify the trade-offs that infrastructure decisions often obscure, distinguish deliberate triad governance from default allocation by market power or regulatory inertia, and propose a Deliberate Triad Choice Framework for policymakers considering AI infrastructure decisions of significant scale.
Stated contributions in the abstract: conceptual clarification, normative distinction between deliberate governance and default allocation, and proposal of a policy framework (Deliberate Triad Choice Framework).
high positive The AI Infrastructure Triad in Regional Governance: How Regi... availability and design of a policy framework (Deliberate Triad Choice Framework...
This article develops the AI Infrastructure Triad as a conceptual framework for analyzing three competing priorities in regional AI infrastructure governance: Progress, Sustainability, and Equity.
Theoretical/conceptual development presented in the paper; synthesis of prior work on economic, physical, and moral limits of AI development.
high positive The AI Infrastructure Triad in Regional Governance: How Regi... conceptual clarity of governance priorities (Progress, Sustainability, Equity)
Together, the capability profile and the jaggedness measure give a deployment-relevant diagnostic that the overall ranking alone cannot provide.
Argument supported by observed cases in the experiments where models with similar overall ranks differed on capability axes and jaggedness, implying additional diagnostic value.
high positive GENSTRAT: Toward a Science of Strategic Reasoning in Large L... diagnostic usefulness for deployment decisions
Newer frontier-tier models score higher on average.
Aggregate results from the head-to-head tournament comparing nine models across sampled games (>36k matches).
high positive GENSTRAT: Toward a Science of Strategic Reasoning in Large L... average model score / overall strength
We introduce a jaggedness measure of within-distribution smoothness that detects when a model's advantage jumps unpredictably between strategically similar games.
Methodological contribution described in paper (jaggedness metric).
high positive GENSTRAT: Toward a Science of Strategic Reasoning in Large L... within-distribution smoothness / local volatility (jaggedness)
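
To make the idea of jaggedness concrete, here is one hypothetical way to score it: the mean absolute gap in a model's advantage between each game and its most similar game. This is an illustration only. The excerpt does not define the paper's metric, and the feature vectors, the Euclidean similarity, and the name `nearest_neighbor_jaggedness` are all assumptions.

```python
import math


def nearest_neighbor_jaggedness(games):
    """Mean |advantage gap| between each game and its nearest neighbor.

    `games` is a list of (feature_vector, advantage) pairs, where the feature
    vector is a hypothetical description of the game's strategic structure.
    """
    gaps = []
    for i, (feat_i, adv_i) in enumerate(games):
        # Distance to, and advantage on, every other game.
        others = [(math.dist(feat_i, feat_j), adv_j)
                  for j, (feat_j, adv_j) in enumerate(games) if j != i]
        _, nearest_adv = min(others, key=lambda pair: pair[0])
        gaps.append(abs(adv_i - nearest_adv))
    return sum(gaps) / len(gaps)


# Two toy profiles over the same four games, laid out on a 1-D similarity axis.
smooth = [((0.0,), 0.10), ((1.0,), 0.12), ((2.0,), 0.15), ((3.0,), 0.17)]
jagged = [((0.0,), 0.10), ((1.0,), -0.30), ((2.0,), 0.25), ((3.0,), -0.20)]

print("smooth profile:", nearest_neighbor_jaggedness(smooth))
print("jagged profile:", nearest_neighbor_jaggedness(jagged))
```

A profile whose advantage drifts gradually across similar games scores low, while one that flips sign between neighboring games scores high, even if both have a similar average advantage.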
We pair the game distribution with a capability-profile methodology that decomposes model competence across six axes (state space, temporal depth, information sensitivity, opponent modeling, risk, and brittleness).
Methodological description in paper introducing the capability-profile decomposition.
high positive GENSTRAT: Toward a Science of Strategic Reasoning in Large L... decomposed capability profile across six axes
The generator can draw fresh games on demand, allowing for evergreen evaluation and resistance to contamination.
Method claim about generator capability described in the paper.
high positive GENSTRAT: Toward a Science of Strategic Reasoning in Large L... freshness/resistance-to-contamination of benchmarks
We introduce GENSTRAT, which uses procedurally generated strategic environments to address the limitations of fixed benchmarks.
Methodological contribution described in paper: design and implementation of GENSTRAT.
high positive GENSTRAT: Toward a Science of Strategic Reasoning in Large L... availability of procedurally generated strategic environments for evaluation
Large language models (LLMs) are increasingly deployed as economic agents in marketplaces, auctions, and bidding settings.
Introductory statement in the paper situating motivation; no empirical data reported in the abstract to quantify the increase.
high positive GENSTRAT: Toward a Science of Strategic Reasoning in Large L... deployment of LLMs as economic agents
FastKernels is released (code available) as a stepping stone toward kernel agents whose benchmark gains translate directly into production throughput improvements.
Statement of code release with GitHub URL provided in abstract.
high positive FastKernels: Benchmarking GPU Kernel Generation in Productio... availability of code / reproducibility (release of benchmark and framework)
The FastKernels kernels collectively subsume those of 96.2% (409/425) of HuggingFace Transformers architectures.
Coverage analysis comparing FastKernels' kernels to HuggingFace Transformers architectures (numbers given in abstract: 409/425).
high positive FastKernels: Benchmarking GPU Kernel Generation in Productio... architecture coverage (proportion subsumed)
FastKernels is built around a minimal set of 46 representative architectures spanning 8 categories.
Design of the benchmark as described in the paper; explicit counts provided in abstract.
high positive FastKernels: Benchmarking GPU Kernel Generation in Productio... benchmark coverage (number of representative architectures and categories)
This paper provides new evidence on AI adoption from a non-US context by leveraging German firm-level data (ifo Business Survey).
Use of a large German business survey (ifo Business Survey) and analysis of AI adoption patterns among German firms.
high positive AI adoption among German firms Empirical evidence on AI adoption in Germany (contribution to literature)
AI is expected to have positive long-term productivity impacts for different sectors of the German economy.
Assessment of potential productivity impacts using firm-level survey responses about expected long-term benefits of AI (forward-looking/expectation-based analysis).
high positive AI adoption among German firms Expected firm-level productivity / anticipated long-term productivity benefits
The increase in AI usage from 2023 to 2024 was particularly pronounced in manufacturing and services sectors.
Sectoral breakdown of ifo Business Survey firm-level data showing higher increases in reported AI usage for manufacturing and services compared with other sectors.
high positive AI adoption among German firms AI usage / AI adoption rate by sector
There was a significant increase in AI usage among German firms from 2023 to 2024.
Firm-level responses from the ifo Business Survey comparing reported AI usage in 2023 versus 2024 (cross-sectional/descriptive trend analysis).
high positive AI adoption among German firms AI usage / AI adoption rate (reported by firms)
We propose efforts that individuals and leaders can take to support their colleagues through AI transformation while preserving healthy company cultures that support diverse thinking, collaboration, and informal interactions.
Authors' prescriptive recommendations derived from interview insights; recommendations are not empirically validated in the study.
high positive Beyond the Org Chart: AI and the Transformation of Invisible... leadership and individual practices to preserve culture during AI adoption
We propose steps that AI companies can take to make the invisible work more visible.
Authors' normative recommendations based on synthesis of the qualitative interview findings; not empirically tested within the paper.
high positive Beyond the Org Chart: AI and the Transformation of Invisible... organizational practices to surface invisible work
Some of these changes are positive, such as smoother collaboration between peers.
Interviewee accounts from the 24-participant qualitative study reporting perceived improvements in peer collaboration due to AI tools.
high positive Beyond the Org Chart: AI and the Transformation of Invisible... peer collaboration / team coordination
To support sustainable human–AI collaboration, the authors emphasize adopting a human-centered approach that prioritizes transparency, explainability, and user autonomy.
Authors' policy/research/practice recommendation grounded in the review synthesis of the interdisciplinary literature.
high positive Yapay Zeka Sistemleri ve İnsan İşbirliğinin Psikolojik, Sosy... adoption of human-centered design practices (transparency, explainability, user ...
Well-designed AI systems have the potential to increase cognitive efficiency and job satisfaction.
Synthesis of findings across reviewed studies indicating positive associations between human-centered AI design and outcomes like cognitive efficiency and job satisfaction.
high positive Yapay Zeka Sistemleri ve İnsan İşbirliğinin Psikolojik, Sosy... cognitive efficiency (and job satisfaction, secondary)
The successful integration of AI-driven EPM systems relies on the synergy between AI technologies and human judgment, allowing healthcare organizations to cultivate a more dynamic, innovative and responsive workforce.
Normative/concluding statement in the scoping review based on synthesis of included studies (n=29).
high positive The influence of AI-Driven Employee Performance Management (... integration success conditional on human-AI synergy; workforce dynamism and resp...
AI-driven EPM systems mark a significant advance in access to real-time performance data and deliver considerable improvements when used within appropriate guidelines.
Conclusion drawn in the paper from the scoping review of 29 empirical studies; phrased as an overall assessment.
high positive The influence of AI-Driven Employee Performance Management (... availability/access to real-time performance data and improvement in HR processe...
Predictive analytics help manage high rates of burnout.
Reported in the scoping review as a benefit across included studies (n=29).
high positive The influence of AI-Driven Employee Performance Management (... burnout management / mitigation
Predictive analytics optimize operations.
Stated as an operational benefit in the scoping review (29 studies).
high positive The influence of AI-Driven Employee Performance Management (... operational optimization (scheduling, resource allocation, workflows)
Predictive analytics assist in assessing labor shortages.
Reported use-case in the scoping review synthesizing empirical studies (n=29).
high positive The influence of AI-Driven Employee Performance Management (... ability to assess/predict labor shortages
Predictive analytics are vital in orchestrating healthcare organizations’ strategic and operational activities.
Claim derived from the scoping review's conclusions across included studies (n=29).
high positive The influence of AI-Driven Employee Performance Management (... usefulness of predictive analytics for strategic/operational decision-making
AI-powered EPM produces significant time savings for managers.
Reported as a benefit in the scoping review synthesis (29 studies); no numerical magnitude given in the excerpt.
high positive The influence of AI-Driven Employee Performance Management (... manager time spent on EPM tasks / administrative burden
AI-powered EPM helps identify potential leaders.
Summarized outcome across empirical studies in the scoping review (n=29).
high positive The influence of AI-Driven Employee Performance Management (... identification of leadership potential / talent spotting
AI-powered EPM heightens employee engagement.
Reported as an aggregated finding in the scoping review of 29 empirical studies.
AI-powered EPM increases the frequency of feedback to employees.
Stated as a benefit in the scoping review synthesis across included studies (n=29).
AI-powered EPM platforms result in considerable improvements in efficiency, including more frequent feedback, heightened employee engagement, identification of potential leaders, and significant time savings for managers.
Synthesis claim from the scoping review of 29 empirical studies; no quantitative effects reported in the excerpt.
high positive The influence of AI-Driven Employee Performance Management (... efficiency gains and specific HR outcomes (feedback frequency, engagement, leade...
The delivery of high-quality healthcare depends essentially on the effective functioning of personnel, who are the vital resource for maintaining reputation, fostering a culture of continuous improvement, and ensuring the overall effective operation of the healthcare sector.
Conceptual assertion in the paper supported by literature synthesis in the scoping review (29 studies).
high positive The influence of AI-Driven Employee Performance Management (... relationship between personnel functioning and healthcare quality/delivery