The Commonplace

Evidence (8570 claims)

Adoption
8570 claims
Productivity
7631 claims
Governance
6869 claims
Human-AI Collaboration
6491 claims
Org Design
4175 claims
Innovation
4114 claims
Labor Markets
3566 claims
Skills & Training
2966 claims
Inequality
2066 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 758 199 100 900 2007
Governance & Regulation 826 400 191 122 1563
Organizational Efficiency 777 193 124 84 1189
Technology Adoption Rate 635 233 124 97 1098
Research Productivity 422 128 57 336 954
Output Quality 476 179 59 47 761
Decision Quality 328 177 81 47 640
Firm Productivity 435 57 88 20 606
AI Safety & Ethics 218 277 65 33 599
Market Structure 180 170 123 24 502
Task Allocation 213 64 72 33 387
Skill Acquisition 170 61 61 17 309
Innovation Output 203 27 43 18 292
Employment Level 105 54 107 13 281
Fiscal & Macroeconomic 131 69 43 26 276
Consumer Welfare 117 63 42 11 233
Firm Revenue 153 48 26 3 230
Task Completion Time 173 31 8 12 225
Inequality Measures 44 122 49 6 221
Worker Satisfaction 89 65 22 12 188
Error Rate 69 92 10 2 173
Regulatory Compliance 77 69 14 5 165
Automation Exposure 56 56 26 13 154
Training Effectiveness 94 21 13 19 149
Wages & Compensation 77 36 25 6 144
Team Performance 86 17 27 10 141
Developer Productivity 95 17 14 6 133
Job Displacement 12 80 20 1 113
Hiring & Recruitment 52 7 8 3 70
Creative Output 31 18 8 3 61
Skill Obsolescence 5 46 6 1 58
Social Protection 27 16 8 2 53
Labor Share of Income 17 19 17 53
Worker Turnover 11 12 3 26
Industry 1 1
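The matrix is easier to compare across outcomes when each row is read as shares rather than raw counts. The following is a minimal sketch of that arithmetic for a handful of rows copied from the table above. The names `ROWS` and `direction_shares` are illustrative and not part of the site. The four direction counts do not always add up to the stated Total (Firm Productivity sums to 600 against a Total of 606), so the sketch divides by the stated Total and reports whatever is left over as `unlisted`.

```python
# Rows copied from the Evidence Matrix: (positive, negative, mixed, null, total).
ROWS = {
    "Firm Productivity": (435, 57, 88, 20, 606),
    "AI Safety & Ethics": (218, 277, 65, 33, 599),
    "Inequality Measures": (44, 122, 49, 6, 221),
    "Job Displacement": (12, 80, 20, 1, 113),
    "Skill Obsolescence": (5, 46, 6, 1, 58),
}


def direction_shares(counts):
    """Return each direction's share of the stated total for one row."""
    positive, negative, mixed, null, total = counts
    listed = positive + negative + mixed + null
    return {
        "positive": positive / total,
        "negative": negative / total,
        "mixed": mixed / total,
        "null": null / total,
        # Claims counted in Total but not in any of the four columns.
        "unlisted": (total - listed) / total,
    }


SHARES = {name: direction_shares(counts) for name, counts in ROWS.items()}

for name, shares in SHARES.items():
    print(f"{name}: positive {shares['positive']:.1%}, negative {shares['negative']:.1%}")
```

Read this way, the rows for Job Displacement and Skill Obsolescence are dominated by negative findings, while Firm Productivity is dominated by positive ones.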
Filter: Adoption
Standard health system digital transformation policy, which typically addresses only the threshold failure through individual incentives, is predicted to systematically produce the partial adoption trap.
Model prediction contrasting full policy architecture vs. conventional policies that focus solely on individual incentives; analytical conclusion that such limited policies leave other failure modes unaddressed and therefore lead to stable partial adoption. Theoretical model; no empirical sample.
high negative The partial adoption trap: Coordination failure, trust, and ... policy-induced equilibrium (partial adoption trap likelihood) under conventional...
The barrier-lowering benefit of failed attempts is offset when trust erosion is rapid.
Model analysis combining cost-ratchet dynamics and trust erosion parameters; results showing interaction where fast trust erosion negates barrier reductions. Theoretical simulations/derivations; no empirical sample.
high negative The partial adoption trap: Coordination failure, trust, and ... net effect on adoption barriers given interplay of cost ratchet and trust erosio...
These failure modes are most severe precisely for the technologies with the greatest systemic value: the Value-Adoption Paradox.
Analytical result from the model showing failure-mode severity as a function of systemic value; theoretical identification of a paradox where higher systemic-value technologies face stronger coordination/trust/cultural barriers. Theoretical derivation; no empirical sample.
high negative The partial adoption trap: Coordination failure, trust, and ... relationship between systemic value of technology and severity of adoption failu...
The basin of attraction of the partial adoption trap is enlarged by a cultural failure arising from negative coordination norms among doctors.
Model analysis including cultural coordination norms; theoretical demonstration that negative norms exacerbate partial adoption equilibria. Theoretical model; no empirical sample.
high negative The partial adoption trap: Coordination failure, trust, and ... size of basin of attraction for partial adoption (effect of cultural/coordinatio...
The basin of attraction of the partial adoption trap is enlarged by a trust failure arising from the organisation's inability to credibly commit to sharing productivity gains.
Model extension incorporating organisational commitment/transfer of gains; analytical results showing trust/commitment constraints increase stability of partial adoption. Theoretical model; no empirical sample.
high negative The partial adoption trap: Coordination failure, trust, and ... size of basin of attraction for partial adoption (effect of trust/commitment con...
The basin of attraction of the partial adoption trap is enlarged by a threshold coordination failure arising from the non-appropriable nature of systemic benefits.
Model analysis showing how non-appropriable systemic benefits (externalities) change payoff structure and enlarge the basin of attraction for partial adoption. Theoretical derivation; no empirical sample.
high negative The partial adoption trap: Coordination failure, trust, and ... size of basin of attraction for partial adoption (likelihood of landing in parti...
Current monolithic architectures struggle to enforce rigid brand constraints, frequently hallucinating unapproved visual assets.
Asserted critique of existing architectures in paper; no specific empirical metrics, datasets, or sample sizes provided.
high negative Genflow Ad Studio: A Compound AI Architecture for Brand-Alig... hallucination of unapproved assets / brand compliance
Integration of generative video models into enterprise environments is restricted by temporal inconsistencies and severe brand misalignment.
Statement in paper describing deployment limitations; no empirical study, dataset, or sample size provided to quantify these restrictions.
high negative Genflow Ad Studio: A Compound AI Architecture for Brand-Alig... brand alignment / temporal consistency
Deterministic copy collapses uncertainty (i.e., copying deterministically collapses the learner's uncertainty over actions).
Ablation/diagnostic comparisons reported in the paper showing deterministic-copy policies reduce or collapse uncertainty compared to stochastic or trace-informed policies in the benchmark tasks.
high negative When Outcome Looks Right But Discipline Fails: Trace-Based E... uncertainty over action distributions (uncertainty collapse)
Reward-only PPO variants miss trace alignment (they achieve reward/KPIs but do not align with benchmark trace/behavior).
Empirical comparison across the two-hotel benchmark and a compact hidden-budget bidding task showing reward-only PPO variants fail to match trace-based diagnostics.
high negative When Outcome Looks Right But Discipline Fails: Trace-Based E... trace alignment (agreement between agent trace and benchmark behavior)
Our results show that multi-resource stranding materially changes deployable capacity, effective capital expenditure, and delivered performance.
Empirical/modeling results from the paper's framework (simulation results using projection models + Azure operational data); the abstract claims material effects but does not report numeric sample sizes or effect sizes in the excerpt provided.
high negative Designing Datacenter Power Delivery Hierarchies for the AI E... deployable capacity / effective capex / delivered performance (primary: deployab...
Designing an efficient power delivery hierarchy for the long run is difficult because rack placement feasibility, workload impact, and cost depend jointly on electrical topology, deployment granularity, placement policy, power oversubscription, and workload mix.
Analytic/methodological claim enumerating interacting factors; stated as a complexity motivating the modeling framework.
high negative Designing Datacenter Power Delivery Hierarchies for the AI E... difficulty/complexity of designing efficient power delivery hierarchies
Power utilization is particularly important as grid power capacity is a scarce resource in the AI era.
Contextual claim in the paper linking increased AI demand to constrained grid power capacity; supported by the paper's framing rather than reported empirical measurements in the abstract.
high negative Designing Datacenter Power Delivery Hierarchies for the AI E... grid power scarcity/importance of power utilization
As power densities increase, a datacenter designed for a different target density may strand power, i.e., may be unable to use all the power that its delivery hierarchy has provisioned.
Conceptual/mechanistic claim supported by the paper's modeling framework that examines mismatches between provisioned power and deployed demand; no numeric sample size provided in the abstract.
high negative Designing Datacenter Power Delivery Hierarchies for the AI E... power stranding (unused provisioned power)
This poses a major challenge for datacenter power delivery designers.
Argument based on the projected rise in rack power density and resulting engineering constraints; asserted in the paper's introduction/contextual framing rather than an experimental result.
high negative Designing Datacenter Power Delivery Hierarchies for the AI E... difficulty/challenge for datacenter power delivery design
Artificial intelligence creates both opportunities and threats for developing economies: AI may erode the labor cost advantage.
Conceptual assessment; mechanism-based argument (automation reduces the labor cost advantage); no empirical data or sample is specified.
high negative Yapay Zekâ ve Küresel Değer Zincirleri: Ticaret Politikası v... erosion of developing countries' labor cost advantage
This transformation directly calls into question existing global value chain structures and countries' positions within those chains.
Conceptual discussion; the author's analytical framework argues that global value chain (GVC) structures can be re-evaluated in light of AI; no empirical sample.
high negative Yapay Zekâ ve Küresel Değer Zincirleri: Ticaret Politikası v... growing uncertainty of countries' positions in global value chains / re...
Monte Carlo simulations illustrate that standard DID estimators that ignore spillovers can miss the total effect.
Monte Carlo simulation results reported in the paper comparing standard DID estimators (which ignore spillovers) to the proposed approach; simulations show standard DID can fail to capture the total effect under spillovers.
high negative Identification and Estimation of Staggered Difference-in-Dif... accuracy of total effect estimation (bias/omission by standard DID)
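The mechanism behind the claim above can be illustrated with a toy Monte Carlo. This is a sketch under simplifying assumptions, not the paper's design or estimator. It uses two periods, and assumes every unit is exposed to the same spillover while treated units also receive a direct effect. The "total effect" here means direct plus spillover on treated units. All names and parameter values (`simulate_did`, `direct=2.0`, `spillover=1.0`, the common trend of 0.5) are hypothetical.

```python
import random
import statistics


def simulate_did(n_units=500, direct=2.0, spillover=1.0, seed=0):
    """One replication: return the standard two-group, two-period DID estimate."""
    rng = random.Random(seed)
    treated_changes, control_changes = [], []
    for i in range(n_units):
        treated = i % 2 == 0  # half of the units are treated
        trend = 0.5           # common time trend, removed by differencing
        noise = rng.gauss(0.0, 1.0)
        # Assumption: spillover reaches treated and control units alike.
        change = trend + spillover + noise + (direct if treated else 0.0)
        (treated_changes if treated else control_changes).append(change)
    # Standard DID: difference in mean pre/post changes between the groups.
    return statistics.mean(treated_changes) - statistics.mean(control_changes)


def monte_carlo(reps=200, **kwargs):
    """Average the DID estimate over independent replications."""
    return statistics.mean(simulate_did(seed=r, **kwargs) for r in range(reps))


DIRECT, SPILLOVER = 2.0, 1.0
ESTIMATE = monte_carlo(direct=DIRECT, spillover=SPILLOVER)
TOTAL_EFFECT = DIRECT + SPILLOVER

print(f"mean DID estimate: {ESTIMATE:.2f}")
print(f"total effect on treated units: {TOTAL_EFFECT:.2f}")
```

Because the spillover shifts both groups equally, it cancels in the difference. The DID estimate centres on the direct effect and omits the spillover component of the total.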
No existing AI system replicates this: conversational recommenders treat recommendation as a terminal act, while general-purpose LLMs hallucinate product claims and default to generic promotional templates that fail to engage or persuade.
Author assertion/diagnosis comparing existing conversational recommenders and general-purpose LLMs; no empirical comparisons or quantified evaluation provided in the excerpt.
high negative VerbalValue: A Socially Intelligent Virtual Host for Sales-D... quality of recommendations / engagement and persuasion
The paper examines recent data on the diffusion of AI in the G7 economies, which point to large and persistent differences between SMEs and large firms.
Empirical examination of recent diffusion/adoption data across G7 economies as described in the paper; no sample size or specific datasets provided in the excerpt.
high negative Einführung von KI in kleinen und mittleren Unternehmen differences in AI diffusion between SMEs and large firms
Despite recent technological advances in AI tools, SMEs are more hesitant to adopt AI than to adopt other digital technologies, and more hesitant than larger firms.
Statement referencing 'recent data on the diffusion of AI in the G7 economies' showing differences between SMEs and large firms; implies empirical analysis of diffusion/adoption data (no sample size given in excerpt).
high negative Einführung von KI in kleinen und mittleren Unternehmen adoption/diffusion of AI technologies in SMEs versus large firms
The analysis also identifies risks linked to exclusion, symbolic compliance, and concentration of control over compliance processes.
Theoretical risk mapping produced by the integrative review and interpretive synthesis; no primary empirical evidence presented.
high negative RegTech-enabled governance of sanctions-safe enterprise ecos... risks of RegTech governance (exclusion, symbolic compliance, concentration of co...
Uncertainty around compliance and excessive risk avoidance reduce the space for lawful business activity.
Interpretive synthesis of evidence and arguments across the reviewed literatures (sanctions compliance, institutional voids); no original empirical test.
high negative RegTech-enabled governance of sanctions-safe enterprise ecos... extent of lawful business activity (regulatory-compliance-driven market particip...
Firms working under such conditions often experience limited access to finance and markets.
Claim derived from literature on firm constraints in weak institutional/sanctioned contexts as reviewed in the paper; no primary empirical data reported.
high negative RegTech-enabled governance of sanctions-safe enterprise ecos... access to finance and markets for firms
Post-conflict and sanctions-affected environments are strongly affected by sanctions pressure, weak rule enforcement, and high levels of corruption risk.
Synthesis of literature on sanctions, weak institutions, and corruption risk presented in the integrative review; no new empirical sample reported.
high negative RegTech-enabled governance of sanctions-safe enterprise ecos... institutional environment quality (sanctions pressure, rule enforcement, corrupt...
Currently, systematic assessment errors cause owners of lower-valued properties to face disproportionately high tax burdens, creating regressivity in the property tax system.
Empirical analysis of property assessments and tax burdens using 26 million property sales across ~95% of U.S. counties, showing systematic errors that bias tax burdens toward lower-valued properties.
high negative Tradeoffs are Domain Dependent: Improving Accuracy and Fairn... distributional tax burden (regressivity across property value quintiles)
There are limits to technology‑led growth strategies in labor‑abundant contexts; such strategies do not reliably deliver inclusive employment gains.
Argument based on synthesis of theory and comparative field evidence demonstrating weak employment outcomes from technology‑led growth in labor‑abundant settings (no quantitative effect sizes reported).
high negative Automation, Migration, and Development: Geography of Job Pre... effectiveness of technology-led growth strategies for employment generation
Digital media play a significant role in shaping youth mobilization and political unrest in migrants' countries of origin.
Empirical observations and regional field evidence reported in the paper linking digital media use to youth mobilization and political outcomes (qualitative/comparative evidence; no numeric sample size provided).
high negative Automation, Migration, and Development: Geography of Job Pre... youth mobilization and political unrest
Developing countries face macroeconomic vulnerabilities because of dependence on remittances, which are exposed by automation-driven changes in migrant labor demand.
Analytical linkage developed in the paper supported by comparative field evidence and macroeconomic reasoning; remittance dependence highlighted as a vulnerability (no quantitative estimates or sample sizes reported).
high negative Automation, Migration, and Development: Geography of Job Pre... macroeconomic vulnerability arising from remittance dependence
Technology adoption in core industries in advanced economies is linked with labor displacement, rising youth unemployment, and urban labor saturation in South Asia and North Africa.
Geographically grounded framework combined with comparative regional field evidence focused on South Asia and North Africa (qualitative/comparative field data referenced; no numeric sample sizes provided).
high negative Automation, Migration, and Development: Geography of Job Pre... labor displacement / youth unemployment / urban labor saturation
AI adoption and accelerating automation amplify employment precarity in labor‑surplus economies.
Conceptual synthesis grounded in economic geography and labor economics, supported by comparative field evidence cited for labor‑surplus contexts (no quantitative sample size reported).
high negative Automation, Migration, and Development: Geography of Job Pre... employment precarity (job quality and stability)
Automation functions as a transnational shock that contracts demand for migrant labor in advanced economies.
Theoretical argument drawing on economic geography, labor economics, and development studies; comparative/regional field evidence referenced in the paper (no numerical sample size reported).
Shifts persist in even the newest AI models despite remarkable progress in AI modeling, post-training alignment and safeguards.
Asserted in paper; supported by later empirical validation across multiple models and production chatbots (see other claims), but no explicit sample size in this sentence.
high negative Fusion-fission forecasts when AI will shift to undesirable b... persistence of undesirable behavioral shifts despite alignment/safeguards
ChatGPT-like AI behavior can shift, unnoticed, from desirable to undesirable (e.g., encouraging self-harm, extremist acts, financial losses, or costly medical and military mistakes), and no one can yet predict when.
Statement in paper framing the problem; qualitative observations and motivating examples (no numeric sample size provided in the excerpt).
high negative Fusion-fission forecasts when AI will shift to undesirable b... occurrence of unnoticed shifts from desirable to undesirable outputs
An analysis of a 21-instrument inventory identifies an incentive gradient where geopolitical and industrial pressures systematically reward surface-level behavioral proxies over deep structural verification.
Empirical/qualitative analysis of an inventory of 21 governance instruments compiled and analysed in the paper (n=21 instruments).
high negative Position: Behavioural Assurance Cannot Verify the Safety Cla... governance_and_regulation
Behavioural assurance, even when carefully designed, is being asked to carry safety claims it cannot verify.
The paper's normative and conceptual argument synthesising governance requirements and the epistemic limits of behavioural testing.
Current assurance methodologies (primarily behavioural evaluations and red-teaming) are epistemically limited to observable model outputs and cannot verify latent representations or long-horizon agentic behaviours.
Conceptual/analytic argument and review of existing assurance methodologies presented in the paper.
Policy responses in Europe are fragmented across the EU and Member State levels and do not match the potential scale of disruption from AGI.
Paper's policy analysis of EU- and Member-State-level responses (stated in abstract); no quantitative metrics provided in the abstract.
high negative Europe and the Geopolitics of AGI: The Need for a Preparedne... governance_and_regulation
Europe has low rates of industrial AI adoption.
Paper's empirical/policy review claiming low industrial AI adoption in Europe (as stated in abstract); the abstract does not provide numeric adoption rates or sample sizes.
Europe exhibits structural weaknesses in compute infrastructure and talent retention.
Paper's structural assessment of Europe's AI value-chain capabilities (stated in abstract); no numerical measures provided in the abstract.
Europe has limited strategic awareness of frontier AI progress.
Paper's assessment of Europe's positioning based on policy analysis and review of capabilities monitoring (as stated in abstract); no supporting metrics or sample sizes provided in the abstract.
high negative Europe and the Geopolitics of AGI: The Need for a Preparedne... governance_and_regulation
AGI could strain existing governance frameworks.
Paper's policy analysis describing potential mismatches between governance capacity and AGI-induced disruptions (as stated in abstract); no empirical tests or quantification reported in the abstract.
high negative Europe and the Geopolitics of AGI: The Need for a Preparedne... governance_and_regulation
AGI could intensify interstate competition.
Paper's geopolitical analysis and scenario-based reasoning informed by trends in AI capabilities (stated in abstract); no quantitative measures reported in the abstract.
high negative Europe and the Geopolitics of AGI: The Need for a Preparedne... governance_and_regulation
AGI could fundamentally alter the global distribution of economic and military power.
Paper's geopolitical analysis drawing on capability trends and scenario reasoning (as stated in abstract); no empirical quantification provided in the abstract.
high negative Europe and the Geopolitics of AGI: The Need for a Preparedne... governance_and_regulation
Simulated users produce feedback dynamics that diverge from humans.
Temporal/interaction analysis in the replication showing differences in how simulators provide feedback across multi-turn interactions compared to humans.
high negative PRISM-X: Experiments on Personalised Fine-Tuning with Human ... feedback/interaction dynamics over multi-turn conversations (simulator vs human)
Simulated users exhibit amplified position biases relative to human participants.
Behavioral comparison in the simulator replication showing stronger position biases in simulated responses than in human responses.
high negative PRISM-X: Experiments on Personalised Fine-Tuning with Human ... magnitude of position bias in simulated vs human responses
Simulated users discuss different topics compared to the human participants.
Analysis of conversation content in the simulator replication showing differences in topical distribution between simulators and humans.
high negative PRISM-X: Experiments on Personalised Fine-Tuning with Human ... topic distribution of conversations produced by simulators versus humans
Simulators perform far below human self-consistency baselines for individual judgements.
Comparison in the replication study between simulator consistency and human self-consistency on individual-level judgments; reported large performance gap (simulators far below humans).
high negative PRISM-X: Experiments on Personalised Fine-Tuning with Human ... individual-level judgment consistency (simulator vs human self-consistency)
Amplified sycophancy and relationship-seeking behaviours may introduce deleterious long-term consequences.
Authors' interpretation and cautionary note based on observed behavioral amplification after fine-tuning; presented as potential long-term risk rather than an empirically measured long-term outcome.
high negative PRISM-X: Experiments on Personalised Fine-Tuning with Human ... long-term social/consequential harms from amplified model behaviours (hypothesiz...
In a controlled experiment across six industry configurations (72 tool invocations using Qwen3-32B), unconstrained tool parameters produced a 43% hallucination rate for domain identifiers.
Controlled experiment reported in the paper: six industry configurations, 72 tool invocations, model used: Qwen3-32B; reported unconstrained parameter condition resulted in 43% hallucination rate for domain identifiers.
high negative The Semantic Training Gap: Ontology-Grounded Tool Architectu... hallucination rate for domain identifiers
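A rate measured on a few dozen invocations carries wide sampling uncertainty, which a quick calculation makes concrete. The sketch below assumes the 43% figure is computed over all 72 invocations. The excerpt does not say how many invocations fell in the unconstrained condition, so the denominator is a loudly labelled assumption. It then computes a Wilson score interval. `wilson_interval` is an illustrative helper, not something taken from the paper.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return center - half, center + half


TRIALS = 72           # ASSUMPTION: all 72 invocations form the denominator
REPORTED_RATE = 0.43  # hallucination rate stated in the claim

# Whole number of hallucinated identifiers closest to the reported rate.
IMPLIED = round(REPORTED_RATE * TRIALS)
LOW, HIGH = wilson_interval(IMPLIED, TRIALS)

print(f"implied count: {IMPLIED} of {TRIALS}")
print(f"approximate 95% interval: {LOW:.1%} to {HIGH:.1%}")
```

If the unconstrained condition covered only a subset of the 72 invocations, the interval would be wider still.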