The Commonplace

Evidence (13827 claims)

Adoption: 8454 claims
Productivity: 7544 claims
Governance: 6789 claims
Human-AI Collaboration: 6327 claims
Org Design: 4126 claims
Innovation: 4058 claims
Labor Markets: 3520 claims
Skills & Training: 2924 claims
Inequality: 2057 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 749 195 97 889 1979
Governance & Regulation 815 391 188 121 1539
Organizational Efficiency 771 189 124 83 1177
Technology Adoption Rate 624 233 123 96 1084
Research Productivity 410 121 56 331 929
Output Quality 466 177 59 47 749
Decision Quality 320 174 75 42 618
Firm Productivity 435 55 88 20 604
AI Safety & Ethics 214 276 65 33 593
Market Structure 178 166 122 24 495
Task Allocation 206 64 70 31 376
Skill Acquisition 165 57 60 17 299
Innovation Output 201 27 41 18 288
Employment Level 105 51 107 13 278
Fiscal & Macroeconomic 131 69 43 26 276
Consumer Welfare 116 63 42 11 232
Firm Revenue 149 46 26 3 224
Inequality Measures 44 122 49 6 221
Task Completion Time 169 29 8 12 219
Worker Satisfaction 89 61 20 12 182
Error Rate 69 91 10 2 172
Regulatory Compliance 76 68 14 5 163
Training Effectiveness 92 19 13 19 145
Wages & Compensation 77 36 25 6 144
Automation Exposure 51 54 22 12 142
Team Performance 86 17 27 9 140
Developer Productivity 94 17 14 6 132
Job Displacement 12 80 20 1 113
Hiring & Recruitment 51 7 8 3 69
Skill Obsolescence 5 45 6 1 57
Creative Output 31 16 7 2 57
Social Protection 27 16 8 2 53
Labor Share of Income 17 17 17 51
Worker Turnover 11 12 3 26
Industry 1 1
Agentic Technical Debt and Stochastic Tax are related but distinct: debt can amplify the tax.
Theoretical relationship asserted in the structural model; the note states debt can amplify the recurring Stochastic Tax and provides model expressions and discussion (and illustrative simulation) to substantiate the relationship.
high positive Modeling Agentic Technical Debt and Stochastic Tax: A Standa... impact of accumulated Agentic Technical Debt on the magnitude of Stochastic Tax ...
Combining both levers yields a 502% improvement on single-cell RNA denoising over the initial baseline.
Reported experimental result in the paper comparing SIA to the initial baseline on the single-cell RNA denoising task (denoising metric unspecified in abstract).
high positive SIA: Self Improving AI with Harness & Weight Updates denoising performance for single-cell RNA data
Combining both levers yields a 91.9% runtime reduction on GPU kernels over the initial baseline.
Reported experimental result in the paper comparing SIA to the initial baseline on the low-level GPU kernel optimisation task (runtime measured).
high positive SIA: Self Improving AI with Harness & Weight Updates runtime for GPU kernels
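A runtime reduction is easy to misread as a speedup factor, so the conversion is worth spelling out. The following is a minimal sketch; the function name is illustrative and does not come from the paper.

```python
def speedup_from_reduction(reduction_pct: float) -> float:
    """Convert a percentage runtime reduction into a speedup factor.

    A reduction of r percent leaves (1 - r/100) of the original runtime,
    so the speedup is the reciprocal of that fraction.
    """
    if not 0.0 <= reduction_pct < 100.0:
        raise ValueError("reduction_pct must be in [0, 100)")
    return 1.0 / (1.0 - reduction_pct / 100.0)


if __name__ == "__main__":
    # The 91.9% figure is the reduction reported in the claim above.
    print(f"{speedup_from_reduction(91.9):.1f}x")
```

Under this reading, a 91.9% runtime reduction corresponds to roughly a 12x speedup over the baseline kernel.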
Combining both levers yields a 56.6% gain on LawBench (Chinese legal charge classification) over the initial baseline.
Reported experimental result in the paper comparing SIA to the initial baseline on the LawBench task.
high positive SIA: Self Improving AI with Harness & Weight Updates task performance on LawBench (unspecified metric in abstract)
Combining both levers (harness updates and weight updates) outperforms scaffold iteration alone on all three benchmarks.
Empirical comparison reported in the paper: experiments across the three domains comparing SIA (combined harness+weight updates) against scaffold-iteration-only baseline.
high positive SIA: Self Improving AI with Harness & Weight Updates overall task performance relative to scaffold-only baseline
These results show that per-query configuration of the full retrieval pipeline is a practical alternative to static workload-level tuning.
Authors' conclusion drawn from the reported empirical results on MuSiQue, BrowseComp-Plus, and FinanceBench demonstrating BRANE's performance advantages.
high positive Natural Language Query to Configuration for Retrieval Agents practicality of per-query configuration vs static tuning
BRANE outperforms LLM-routing, rule-based, and fine-tuned Qwen3-4B baselines.
Empirical comparisons against specified baselines (LLM-routing, rule-based approaches, and fine-tuned Qwen3-4B) across the reported benchmark sets. The text does not provide numeric performance metrics or sample sizes in the excerpt.
high positive Natural Language Query to Configuration for Retrieval Agents answer quality (accuracy) and/or cost-quality tradeoff relative to baselines
BRANE matches the best fixed configuration's accuracy at up to 89% lower cost.
Empirical result reported by the authors based on experiments on the named benchmarks (MuSiQue, BrowseComp-Plus, FinanceBench). The provided text states the magnitude ('up to 89% lower cost') but does not give sample sizes or confidence intervals.
high positive Natural Language Query to Configuration for Retrieval Agents inference serving cost while maintaining accuracy
Across MuSiQue, BrowseComp-Plus, and FinanceBench, BRANE consistently pushes the cost-quality Pareto frontier.
Empirical evaluation reported on three benchmark suites: MuSiQue, BrowseComp-Plus, and FinanceBench. The claim is based on experimental comparisons across these datasets; the paper does not state numeric sample sizes in the provided text.
high positive Natural Language Query to Configuration for Retrieval Agents cost-quality tradeoff / Pareto frontier position
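To make "pushing the cost-quality Pareto frontier" concrete, here is a minimal sketch of how a frontier can be computed from (cost, quality) pairs. The function and the example points are hypothetical illustrations, not data or code from the paper.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float]  # (cost, quality); lower cost and higher quality are better


def pareto_frontier(points: Iterable[Point]) -> List[Point]:
    """Return the points not dominated by any other point.

    A point is dominated if another point costs no more and has quality
    at least as high, with at least one of the two strictly better.
    """
    frontier: List[Point] = []
    best_quality = float("-inf")
    # Sort by cost ascending; for equal cost, highest quality first.
    for cost, quality in sorted(set(points), key=lambda p: (p[0], -p[1])):
        # Keep a point only if it beats every cheaper-or-equal point on quality.
        if quality > best_quality:
            frontier.append((cost, quality))
            best_quality = quality
    return frontier


if __name__ == "__main__":
    # Hypothetical configurations, for illustration only.
    configs = [(1.0, 0.5), (2.0, 0.7), (2.0, 0.6), (3.0, 0.65), (4.0, 0.9)]
    print(pareto_frontier(configs))
```

A method "pushes" the frontier when its operating points dominate points on the frontier formed by the baselines.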
There exists a data supply chain that runs from individual translators through language service providers (LSPs) and platforms to model developers.
Mapping and descriptive analysis of industry supply chains and intermediary roles provided in the paper; conceptual and empirical examples of flows of translation data from translators to model developers. No numerical sample reported.
high positive Translators as Invisible Teachers of AI: Copyright, Translat... structure and flow of translation data across actors
Article 30-4 of the Japanese Copyright Act legitimates a mode of use the paper terms 'appropriation without consumption'—i.e., mining works for statistical features rather than reading or experiencing them.
Textual/legal analysis of Article 30-4 of the Japanese Copyright Act and its interpretation; comparative legal reading presented in the paper. No numerical sample reported.
high positive Translators as Invisible Teachers of AI: Copyright, Translat... legal legitimation of non-experiential mining of copyrighted works
The development of statistical machine translation (SMT), neural machine translation (NMT), the Transformer architecture, and multilingual large language models (LLMs) cannot be disentangled from the accumulation of translation data (TM/parallel corpora).
Historical and technical literature review linking MT/NLP methodological advances to the availability and use of parallel corpora and TM; comparative analysis of model development histories described in the paper. No numerical sample reported.
high positive Translators as Invisible Teachers of AI: Copyright, Translat... dependence of major MT/LLM advances on accumulated translation data
Translation memories (TM) and parallel corpora preserve a one-to-one correspondence between source and target text and therefore constitute extraordinarily valuable supervised training data for machine translation.
Conceptual argument and literature review of machine translation practice (discussion of TM/parallel corpora as supervised training data); examples and descriptive evidence from MT research and industry practice presented in the paper. No numerical sample reported.
high positive Translators as Invisible Teachers of AI: Copyright, Translat... value of translation data as supervised training inputs for MT
To balance promotion of innovation with preservation of human creativity, it is essential to revise existing laws and introduce novel approaches such as defining a specific intellectual property right for AI-generated works or designating ownership among associated human agents.
Normative recommendation derived from the paper's comparative legal analysis and discussion of enforcement challenges (no empirical sample size).
high positive Examining the Challenges of Intellectual Property in AI-Gene... policy/legislative reforms to IP law for AI-generated works
Artificial intelligence systems are capable of autonomously generating artistic, literary, musical works, and even inventions without direct human intervention.
Stated as part of the paper's premise and supported by the paper's literature/theoretical review of advances in AI creative and inventive capabilities (no empirical sample size reported).
high positive Examining the Challenges of Intellectual Property in AI-Gene... existence/capability of AI to autonomously generate creative works and invention...
EmoDistill learns skills from offline agent-to-agent interactions, avoiding costly online negotiation during training.
Methodological claim that training is performed offline using recorded agent-to-agent interaction data rather than online interactions; described as part of framework benefits.
high positive EmoDistill: Offline Emotion Skill Distillation for Language ... training approach (offline learning) and its cost-avoidance benefit
Transfer studies demonstrate generalization across domains, unseen counterparties, and trained-vs-trained tournaments.
Reported transfer experiments in which EmoDistill-trained policies were evaluated on different negotiation domains, with unseen counterparties, and in tournaments between trained agents; results reportedly show generalization. (Exact metrics and sample sizes not provided in the excerpt.)
high positive EmoDistill: Offline Emotion Skill Distillation for Language ... generalization of policy performance (utility) across domains and opponents
Ablations show that emotion conditioning is essential.
Ablation experiments reported in the paper removing or altering emotion conditioning, which reportedly degrade performance relative to the full EmoDistill model. (No numeric results provided in the excerpt.)
high positive EmoDistill: Offline Emotion Skill Distillation for Language ... performance/utility difference when emotion conditioning is removed
Across four emotion-sensitive, high-stakes negotiation domains, SLM policies trained under the EmoDistill framework achieve the highest utility, outperforming vanilla SLM/LLM baselines and IQL-only emotion selection.
Empirical evaluation across four negotiation domains comparing EmoDistill-trained SLM policies to vanilla SLM/LLM baselines and an ablated IQL-only emotion selector. (Paper reports comparative utility results, but exact sample sizes and numeric effect sizes are not provided in the excerpt.)
high positive EmoDistill: Offline Emotion Skill Distillation for Language ... utility (negotiation reward/outcome)
EmoDistill decomposes emotional strategy into emotion selection and emotion expression: an Implicit Q-Learning (IQL) selector learns which emotion to express, while a Low-Rank Adaptation (LoRA)-based policy learns how to express it through Supervised Fine-Tuning (SFT) and Judge Policy Optimization (JPO).
Description of model architecture and training approach: IQL used as selector; LoRA-based policy trained with SFT and JPO for expression. (Design/implementation claim from methods section.)
high positive EmoDistill: Offline Emotion Skill Distillation for Language ... ability to select and express emotion (method decomposition)
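The selection/expression split can be illustrated with the expectile loss that standard Implicit Q-Learning uses to fit its value function, plus a greedy selector over learned Q-values. This is a sketch under stated assumptions: the excerpt does not give EmoDistill's exact objective, and the emotion labels, Q-values, and function names below are hypothetical.

```python
from typing import Dict


def expectile_loss(diff: float, tau: float = 0.7) -> float:
    """Asymmetric squared loss from standard IQL: |tau - 1[diff < 0]| * diff**2.

    diff is Q(s, a) - V(s). With tau > 0.5, positive errors are weighted
    more heavily, which pushes V(s) toward an upper expectile of Q.
    """
    weight = tau if diff >= 0 else 1.0 - tau
    return weight * diff * diff


def select_emotion(q_values: Dict[str, float]) -> str:
    """Greedy selector (an assumption): pick the emotion with the highest Q-value."""
    return max(q_values, key=q_values.get)


if __name__ == "__main__":
    # Hypothetical Q-values for one negotiation state.
    q = {"calm": 0.42, "firm": 0.57, "empathetic": 0.51}
    print(select_emotion(q))
    print(expectile_loss(2.0), expectile_loss(-2.0))
```

The expression stage (the LoRA-based policy trained with SFT and JPO) would then generate the utterance conditioned on the selected emotion; that stage is not sketched here.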
We introduce EmoDistill, an offline framework for distilling emotional negotiation skills into language model agents.
Methodological contribution described in the paper: design and presentation of the EmoDistill framework (decomposition, training pipeline). This is a description of a proposed method rather than an empirical result.
high positive EmoDistill: Offline Emotion Skill Distillation for Language ... method/framework existence and capability to distill emotional negotiation skill...
The present paper states the primitive contract, the toll identity, the within-boundary no-arbitrage result, and the budget guarantee that the later empirical, mechanism-design, and dynamic-underwriting companion papers depend on.
Paper's stated scope and organization asserting that these formal primitives and theorems are provided as foundations for follow-on empirical and companion studies.
high positive Foundations of a Time-Consistent Counterfactual Actuarial Ru... presence/statement of specific formal primitives and theorems (primitive contrac...
(iv) A conservative runtime gating theorem translates high-probability toll envelopes into an executed-action budget guarantee.
Mathematical theorem in the paper proving that given high-probability bounds (toll envelopes), one can derive a guarantee on executed-action budget consumption (runtime gating).
high positive Foundations of a Time-Consistent Counterfactual Actuarial Ru... budget guarantee on executed actions derived from probabilistic toll envelopes
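One way such a conservative gate could work is sketched below. It assumes each action comes with an upper toll envelope that holds with probability at least 1 - delta, and that the gate admits an action only while the summed envelopes stay within the budget; by a union bound, realized tolls then stay within the budget with probability at least 1 minus the summed deltas. The class, its fields, and the union-bound argument are assumptions for illustration, not the paper's theorem or notation.

```python
from dataclasses import dataclass


@dataclass
class ConservativeGate:
    """Hypothetical runtime gate over per-action toll envelopes."""

    budget: float
    spent_envelope: float = 0.0  # sum of envelopes of admitted actions
    failure_mass: float = 0.0    # sum of per-action failure probabilities (deltas)

    def try_execute(self, envelope: float, delta: float) -> bool:
        """Admit the action only if its envelope still fits in the budget."""
        if self.spent_envelope + envelope > self.budget:
            return False  # blocked: fall back to the safe default
        self.spent_envelope += envelope
        self.failure_mass += delta
        return True

    def guarantee(self) -> float:
        """Lower bound (union bound) on P(realized tolls <= budget)."""
        return max(0.0, 1.0 - self.failure_mass)


if __name__ == "__main__":
    gate = ConservativeGate(budget=10.0)
    for envelope in (4.0, 5.0, 2.0):
        print(envelope, gate.try_execute(envelope, delta=0.01))
    print(gate.guarantee())
```

The gate is conservative because it charges the full envelope up front rather than the realized toll, so it may block actions that would have fit.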
(iii) An irreversible-authority premium is characterized and splits into a strictly positive action-level component plus an if-and-only-if characterization of the set-level robust capital increase.
Formal decomposition/theorem in the paper proving existence of the irreversible-authority premium, showing the action-level component is strictly positive, and providing an iff condition for set-level robust capital increases.
high positive Foundations of a Time-Consistent Counterfactual Actuarial Ru... irreversible-authority premium decomposition and positivity of action-level comp...
(ii, corollary) Gaming-resistance of the system is tied to the design of the underwriting boundary.
Corollary derived from the no-splitting theorem that links strategic gaming-resistance properties to specific features of the underwriting boundary.
high positive Foundations of a Time-Consistent Counterfactual Actuarial Ru... relationship between underwriting-boundary design and resistance to gaming/manip...
(ii) A no-splitting property holds within an underwriting boundary that telescopes path-decomposed actions into a boundary potential.
Formal theorem in the paper proving a no-splitting property and showing how path-decomposed action contributions aggregate (telescoping) into a boundary potential.
high positive Foundations of a Time-Consistent Counterfactual Actuarial Ru... no-splitting aggregation property (telescoping into boundary potential)
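The telescoping step can be written out as follows. This is a sketch under an assumption: each step's toll inside the boundary is taken to be a difference of a potential $\Phi$ between consecutive states. The symbols $\tau$, $\Phi$, and $s_k$ are illustrative and are not the paper's notation.

```latex
\[
\sum_{k=0}^{n-1} \tau(s_k \to s_{k+1})
  = \sum_{k=0}^{n-1} \bigl( \Phi(s_{k+1}) - \Phi(s_k) \bigr)
  = \Phi(s_n) - \Phi(s_0).
\]
```

Under that assumption the total depends only on the endpoints $s_0$ and $s_n$, so splitting one action into many smaller steps inside the boundary cannot change the total toll.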
(i) There exists a well-defined counterfactual toll under a chosen safe-default mapping and continuation policy.
Theoretical derivation / formal proof presented in the paper establishing existence of the toll under specified mappings and policies.
high positive Foundations of a Time-Consistent Counterfactual Actuarial Ru... well-definedness/existence of a counterfactual toll
The framework treats per-action insurance as the primary unit of analysis and replaces post-hoc annual liability cover with a pre-action transaction layer.
Conceptual and design claim supported by the paper's theoretical argumentation and proposed contract primitives; no empirical validation reported.
high positive Foundations of a Time-Consistent Counterfactual Actuarial Ru... shift from annual liability models to per-action pre-action insurance (design/op...
We propose a foundational runtime actuarial layer for autonomous AI agents in which every side-effect-bearing action carries a time-consistent, counterfactual risk toll computed against a contractually fixed safe default, inside an explicit underwriting boundary.
Theoretical proposal and formal description of an actuarial framework presented in the paper (architectural/axiomatic exposition). No empirical sample or experiment reported.
high positive Foundations of a Time-Consistent Counterfactual Actuarial Ru... existence of a runtime actuarial layer assigning counterfactual risk tolls per a...
Hybrid Fusion significantly accelerated the recovery of smaller Slow AI teams (+6.9% at N=4).
Reported intervention result: Hybrid Fusion produced a +6.9% acceleration in recovery for smaller Slow AI teams, reported at N=4.
high positive The Timing Dependencies of Trust: Speed, Accuracy, and cBCI ... team recovery acceleration (performance improvement) after Hybrid Fusion
Integrating these isolated veridical signals via Hybrid Fusion successfully rescued the Fast AI team (+7.6% at N=8).
Reported intervention result: application of Hybrid Fusion integration produced a +7.6% improvement in Fast AI team performance, reported at N=8.
high positive The Timing Dependencies of Trust: Speed, Accuracy, and cBCI ... team performance improvement after Hybrid Fusion
The Riemannian Oracle adapted to task states by heavily restricting temporal windows (< 0.8s) to intercept fast reflexive compliance and widening windows (> 1.2s) to capture delayed cognitive conflict.
Reported algorithmic behavior of the 2D Adaptive Riemannian Oracle in response to measured spatial covariance: window sizes described as <0.8s for fast states and >1.2s for slow states.
high positive The Timing Dependencies of Trust: Speed, Accuracy, and cBCI ... temporal gating/window size of the Riemannian Oracle
In the Slow AI condition, behavioural teams (N=8) eventually recovered to 100.0%.
Reported team performance metric for behavioural teams in Slow AI condition with N=8; team performance reported to reach 100.0%.
high positive The Timing Dependencies of Trust: Speed, Accuracy, and cBCI ... team accuracy/recovery over time
Policy makers and education/training organizations should jointly consider AI and economic policy uncertainty (EPU) to cope with market uncertainty and ensure the stability and sustainability of China’s education and training market (ETM).
Policy recommendation derived from the paper's empirical findings on causality, quantile dependence, and asymmetric risk spillovers (argumentative/conclusion statement rather than a direct empirical result).
high positive Quantile-based Nonlinear Impact of Artificial Intelligence a... Stability and sustainability of the education and training market (ETM)
There is an interaction between AI and EPU: EPU promotes AI during periods of economic stability.
Cross-quantilogram analysis indicating quantile-specific causality/interactions, with EPU predicting AI in stable-period quantiles (method reported; sample size not stated).
high positive Quantile-based Nonlinear Impact of Artificial Intelligence a... AI (as the dependent/predicted variable)
There is an interaction between AI and EPU: AI promotes EPU in bullish markets.
Cross-quantilogram analysis showing quantile-dependent interaction (method reported; sample size not stated); specific result described for bullish-market quantiles.
high positive Quantile-based Nonlinear Impact of Artificial Intelligence a... Economic Policy Uncertainty (EPU)
The cross-quantilogram indicates quantile dependence among AI, EPU and ETM: the positive predictive effect of AI on ETM is mainly concentrated in bullish markets.
Cross-quantilogram analysis (quantile cross-dependence test) applied to AI and ETM time-series in China (method reported; sample size not stated).
high positive Quantile-based Nonlinear Impact of Artificial Intelligence a... Education and training market (ETM) (predictive effect)
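For readers unfamiliar with the cross-quantilogram, the sketch below shows the usual sample statistic: the correlation between "quantile hits" of one series and lagged quantile hits of another. It is a simplified illustration with a plain empirical quantile and no inference step; the function names and toy data are assumptions, not the paper's code or data.

```python
import math
from typing import List, Sequence


def empirical_quantile(xs: Sequence[float], alpha: float) -> float:
    """Simple order-statistic quantile (no interpolation)."""
    s = sorted(xs)
    idx = min(len(s) - 1, max(0, math.ceil(alpha * len(s)) - 1))
    return s[idx]


def quantile_hits(xs: Sequence[float], alpha: float) -> List[float]:
    """Hit process 1[x <= q_alpha] - alpha (<= is used so small samples have hits)."""
    q = empirical_quantile(xs, alpha)
    return [(1.0 if x <= q else 0.0) - alpha for x in xs]


def cross_quantilogram(y: Sequence[float], x: Sequence[float],
                       alpha_y: float, alpha_x: float, lag: int) -> float:
    """Correlation between hits of y at time t and hits of x at time t - lag."""
    if len(y) != len(x) or not 0 <= lag < len(y):
        raise ValueError("series must have equal length and 0 <= lag < length")
    hy, hx = quantile_hits(y, alpha_y), quantile_hits(x, alpha_x)
    pairs = [(hy[t], hx[t - lag]) for t in range(lag, len(y))]
    num = sum(a * b for a, b in pairs)
    den = math.sqrt(sum(a * a for a, _ in pairs) * sum(b * b for _, b in pairs))
    if den == 0.0:
        raise ValueError("degenerate hit process; choose alpha strictly inside (0, 1)")
    return num / den


if __name__ == "__main__":
    series = [float(v) for v in range(1, 11)]  # toy data
    print(cross_quantilogram(series, series, 0.5, 0.5, lag=0))
```

A value near +1 or -1 at a given pair of quantiles and lag indicates strong directional predictability between those quantile events; values near 0 indicate none.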
The nonparametric quantile causality test shows a unidirectional causal relationship from AI to China’s education and training market (ETM).
Nonparametric quantile causality test applied to time-series data on AI and ETM in China (method reported; sample size not stated).
high positive Quantile-based Nonlinear Impact of Artificial Intelligence a... Education and training market (ETM)
The nonparametric quantile causality test shows a unidirectional causal relationship from AI to EPU.
Nonparametric quantile causality test applied to time-series data on AI and Economic Policy Uncertainty (EPU) in China (method reported; sample size not stated in the provided text).
high positive Quantile-based Nonlinear Impact of Artificial Intelligence a... Economic Policy Uncertainty (EPU)
The proposed policy framework contributes to establishing a foundation for Vietnam to proactively embrace the Agent Economy safely and effectively.
Claim in abstract about the intended contribution/impact of the proposed framework; no empirical evaluation or measured outcomes presented.
high positive Regulatory Policy for the Agent Economy in the Digital Age: ... capacity of Vietnam to embrace Agent Economy safely/effectively (foundation-buil...
The Agent Economy promises substantial gains in productivity and innovation.
Asserted in paper abstract as an anticipated outcome; no empirical measurement, sample size, or quantified effect provided.
high positive Regulatory Policy for the Agent Economy in the Digital Age: ... productivity and innovation gains
We hope JobBench shifts the community's target labour-market effect from replacement to enhancement: building agents that do what humans actually want delegated, not only what is most economically valuable.
Authors' stated aim/goal for the benchmark (normative/aspirational statement in the paper).
high positive JobBench: Aligning Agent Work With Human Will intended shift in community priorities / framing of labour-market effects (repla...
Each task is packaged as a workspace of heterogeneous reference files, requiring the agent to reason through the cluttered information streams of real professional work.
Design description of task packaging in JobBench (benchmark construction/methodological detail).
high positive JobBench: Aligning Agent Work With Human Will realism of task inputs (heterogeneous reference files; information clutter)
We introduce JobBench, which evaluates AI agents on the workflows that experts identify as high-priority for delegation, empowering humans based on their needs instead of replacing them with GDP value.
Description of a new benchmark (JobBench) presented by the authors; methodological design claim about target tasks and intent (expert-identified workflows prioritized for delegation).
high positive JobBench: Aligning Agent Work With Human Will evaluation of AI agents on expert-identified high-priority workflows for delegat...
This study proposes a Workforce Resilience Governance Framework (WRGF) that includes task-level exposure assessment, human augmentation design, reskilling, redeployment, transparent communication, psychological safety, workforce impact accountability, and policy alignment.
Conceptual framework proposed by the authors in the paper (design/proposal; no empirical test described in the excerpt).
high positive From Automation Panic to Workforce Resilience: A Governance ... components of a governance framework for AI workforce transitions
The paper concludes with policy recommendations for accelerating human-centred AI integration in public-sector HRM.
Stated conclusion and policy recommendations section in the paper; recommendations derived from empirical findings.
high positive Determinants of Artificial Intelligence Adoption in Public S... policy recommendations for AI integration
Access to modern digital tools positively moderates AI uptake.
Reported moderation/interaction effects in regression/path analysis indicating that access to modern digital tools is associated with higher AI adoption/uptake; exact effect size not specified in summary.
Holding a managerial position is the strongest predictor of active AI adoption (OR = 1.609).
Reported odds ratio from the binary logistic regression for role/position predictor (managerial status) predicting active AI adoption; OR = 1.609.
high positive Determinants of Artificial Intelligence Adoption in Public S... active AI adoption (binary)
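An odds ratio is not a probability ratio, so a short worked conversion may help when reading OR = 1.609. The sketch below recovers the implied logistic coefficient and shows the effect on a baseline probability; the 30% baseline is a hypothetical value chosen for illustration, not a figure from the paper.

```python
import math


def log_odds_coefficient(odds_ratio: float) -> float:
    """Logistic regression coefficient implied by an odds ratio: beta = ln(OR)."""
    return math.log(odds_ratio)


def apply_odds_ratio(baseline_prob: float, odds_ratio: float) -> float:
    """Probability after multiplying the baseline odds by the odds ratio."""
    if not 0.0 < baseline_prob < 1.0:
        raise ValueError("baseline_prob must be strictly between 0 and 1")
    new_odds = baseline_prob / (1.0 - baseline_prob) * odds_ratio
    return new_odds / (1.0 + new_odds)


if __name__ == "__main__":
    print(round(log_odds_coefficient(1.609), 3))
    # Hypothetical baseline: 30% of non-managers actively adopt AI.
    print(round(apply_odds_ratio(0.30, 1.609), 3))
```

With that hypothetical 30% baseline, the odds ratio moves the adoption probability to about 41%, an increase of roughly 11 percentage points rather than 61%.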
Internal HR factors exert a stronger influence on perceived HR effectiveness (β = 0.463) than external factors (β = 0.227).
Reported path/regression coefficients (the summary does not state whether they are standardized) from OLS/path analysis linking internal and external HR quality indices to perceived HR effectiveness; coefficients β = 0.463 and β = 0.227 respectively.
high positive Determinants of Artificial Intelligence Adoption in Public S... perceived HR effectiveness
Future evaluations should use artifact-level denominators, reproducible parsing rules, correction taxonomies, and independent coding of governance events.
Authors' recommendations based on methodological lessons from this structured self-observed implementation case study and observed parsing/governance challenges.
high positive Persistent AI Agents in Academic Research: A Single-Investig... recommended methodological practices for future evaluations (artifact-level deno...