The Commonplace

Evidence (11633 claims)

Adoption: 7395 claims
Productivity: 6507 claims
Governance: 5877 claims
Human-AI Collaboration: 5157 claims
Innovation: 3492 claims
Org Design: 3470 claims
Labor Markets: 3224 claims
Skills & Training: 2608 claims
Inequality: 1835 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 609 159 77 736 1615
Governance & Regulation 664 329 160 99 1273
Organizational Efficiency 624 143 105 70 949
Technology Adoption Rate 502 176 98 78 861
Research Productivity 348 109 48 322 836
Output Quality 391 120 44 40 595
Firm Productivity 385 46 85 17 539
Decision Quality 275 143 62 34 521
AI Safety & Ethics 183 241 59 30 517
Market Structure 152 154 109 20 440
Task Allocation 158 50 56 26 295
Innovation Output 178 23 38 17 257
Skill Acquisition 137 52 50 13 252
Fiscal & Macroeconomic 120 64 38 23 252
Employment Level 93 46 96 12 249
Firm Revenue 130 43 26 3 202
Consumer Welfare 99 51 40 11 201
Inequality Measures 36 105 40 6 187
Task Completion Time 134 18 6 5 163
Worker Satisfaction 79 54 16 11 160
Error Rate 64 78 8 1 151
Regulatory Compliance 69 64 14 3 150
Training Effectiveness 81 15 13 18 129
Wages & Compensation 70 25 22 6 123
Team Performance 74 16 21 9 121
Automation Exposure 41 48 19 9 120
Job Displacement 11 71 16 1 99
Developer Productivity 71 14 9 3 98
Hiring & Recruitment 49 7 8 3 67
Social Protection 26 14 8 2 50
Creative Output 26 14 6 2 49
Skill Obsolescence 5 37 5 1 48
Labor Share of Income 12 13 12 37
Worker Turnover 11 12 3 26
Industry 1 1
The negative quadratic term confirms a concave (inverted-U) relationship between AI and economic growth (diminishing marginal returns to AI).
Panel data for 19 G20 countries (2005–2023) estimated with a quadratic specification in GMM; the paper reports a negative and statistically significant coefficient on the AI-squared term.
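In a quadratic specification y = b0 + b1·AI + b2·AI², the marginal effect of AI is b1 + 2·b2·AI, so a negative b2 makes it fall as AI rises and, when b1 > 0, the curve peaks at AI* = −b1 / (2·b2). A minimal sketch of that arithmetic; the coefficient values are hypothetical placeholders, not the paper's estimates.

```python
def turning_point(b1: float, b2: float) -> float:
    """AI level at which the marginal effect b1 + 2*b2*AI equals zero.

    Assumes y = b0 + b1*AI + b2*AI**2 with b2 < 0 (concave curve).
    """
    if b2 >= 0:
        raise ValueError("b2 must be negative for a concave (inverted-U) curve")
    return -b1 / (2.0 * b2)


def marginal_effect(b1: float, b2: float, ai: float) -> float:
    """Derivative dy/dAI of the quadratic specification at a given AI level."""
    return b1 + 2.0 * b2 * ai


# Hypothetical coefficients for illustration only.
b1, b2 = 0.8, -0.2
peak = turning_point(b1, b2)  # 0.8 / 0.4 = 2.0
print(peak)
print(marginal_effect(b1, b2, 1.0), marginal_effect(b1, b2, 3.0))
```

Below the turning point the marginal effect is positive; above it, additional AI lowers growth, which is what "diminishing marginal returns" means in this specification. Whether the peak lies inside the observed data range has to be checked separately.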
Case studies from the G7 countries illustrate various ways of deploying AI and identify the key success factors: network connectivity, AI inputs, skills, and financing.
Evidence comes from G7 country case studies reported in the paper; method = qualitative case studies identifying key success factors (no number of case studies or sample size provided in excerpt).
high mixed Adoption of AI in small and medium-sized enterprises Key factors for successful AI adoption in SMEs (connectivity, inputs,...
This lack of focus creates uncertainty about whether regulatory technology helps legitimate economic recovery or instead strengthens exclusion and informality.
Interpretive observation from gaps identified in the reviewed literature; no empirical resolution provided.
high mixed RegTech-enabled governance of sanctions-safe enterprise ecos... impact of RegTech on legitimacy of economic recovery vs. exclusion/informality
There is a governance–task decoupling: under structural stress, text-only governance degrades on both governance and task dimensions simultaneously, whereas mechanical enforcement preserves governance quality even as task performance drops.
Experimental stress tests or structural-stress scenarios applied to both governance architectures in the paper's synthetic experiments; observed differential behavior across governance and task metrics. Abstract does not provide numeric details.
high mixed Mechanical Enforcement for LLM Governance:Evidence of Govern... relative robustness of governance quality vs task performance under structural s...
The improvement from mechanical enforcement is driven by architectural separation: LLM-generated rationales under mechanical enforcement show CDL comparable to that of text-only governance, so the gain comes from removing clear-cut decisions from the model's control.
Analysis comparing LLM-generated rationales and a metric called CDL across governance architectures in the synthetic banking experiments; authors attribute improvement to removing certain decisions from the model's control. Specific statistics and CDL definition not provided in abstract.
high mixed Mechanical Enforcement for LLM Governance:Evidence of Govern... CDL of LLM-generated rationales (comparative constraint-level metric) and locus ...
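The architectural separation described above can be sketched as a deterministic rule layer that settles clear-cut cases before the model is consulted, whereas text-only governance would presumably state the same rules in the prompt and rely on the model to follow them. This is a minimal sketch under that reading; `Request`, `mechanical_rules`, `decide`, and `stub_llm` are hypothetical names, the banking-style rules are invented, and the LLM call is a stub.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Request:
    """Hypothetical transaction request (illustrative fields only)."""
    amount: float
    account_frozen: bool
    daily_limit: float


def mechanical_rules(req: Request) -> Optional[str]:
    """Deterministic checks for clear-cut cases; returns None if no rule fires."""
    if req.account_frozen:
        return "DENY: account frozen"
    if req.amount > req.daily_limit:
        return "DENY: exceeds daily limit"
    return None


def decide(req: Request, llm_decide: Callable[[Request], str]) -> str:
    """Route clear-cut cases to rules; only the remainder reaches the model.

    The model cannot override a rule-based decision because the rule
    layer runs first and returns early.
    """
    ruled = mechanical_rules(req)
    if ruled is not None:
        return ruled
    return llm_decide(req)


def stub_llm(req: Request) -> str:
    """Stand-in for an LLM call (an assumption, not a real API)."""
    return "APPROVE: model judgement"


print(decide(Request(500.0, False, 1000.0), stub_llm))
print(decide(Request(5000.0, False, 1000.0), stub_llm))
```

Under this design, governance quality on clear-cut cases does not depend on model behaviour at all, which is consistent with the claim that the gain comes from the separation rather than from better rationales.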
The results vary across the 10 selected countries: the magnitude and significance of AI’s effects differ due to varying technological readiness and differing industrial structures.
Paper statement that results vary across the 10 selected countries and that nuances differ across countries due to varying industrial structures and technological readiness. Implied heterogeneity analysis across countries using the firm-level dataset and regression approaches; no country-level sample counts provided in the excerpt.
high mixed Estimation of Firm Labour Productivity and Sales Growth from... country-level heterogeneity in AI impact on labour productivity and sales growth
Digital transformation reconfigures development patterns across regions and countries, altering established trajectories of regional development.
Theoretical integration of a technology–labor–space framework together with comparative regional field evidence illustrating changing development patterns (no quantified effect sizes or sample sizes reported).
high mixed Automation, Migration, and Development: Geography of Job Pre... regional development patterns (spatial-economic reconfiguration)
Differences in human intervention effectiveness across escalation types are partly explained by variation in workers' post-escalation intervention effort.
Observed correlations (and subgroup comparisons) in the randomized experiment showing that measures of post-escalation effort (e.g., message counts, share of chat rounds, proactivity) vary across escalation types and relate to outcome differences.
high mixed Agentic AI and Human-in-the-Loop Interventions: Field Experi... post-escalation intervention effort and its mediating role on service outcomes
Artificial intelligence (AI) is rapidly reshaping knowledge-intensive work by automating, augmenting, and reconfiguring core professional activities.
Paper asserts this as a motivating observation based on prior literature and descriptive claims; no original empirical sample or quantified data reported.
high mixed AI-driven skill volatility and the emergence of re-skilling ... degree of automation/augmentation of professional tasks
Metis can be subdivided into 'constitutive metis' (knowledge destroyed by the act of formalization) and 'operational metis' (system-specific familiarity that automation can progressively absorb).
Conceptual taxonomy proposed by the authors; definitions and distinctions are theoretical and illustrated via argumentation and prior literature rather than quantified empirical measurement.
high mixed Metis AI: The Overlooked Middle Zone Between AI-Native and W... types of tacit/practical knowledge affecting automation
There is a fundamental reward-coverage tradeoff: concentrating probability mass on high-reward actions reduces variance but risks missing signal on actions the target policy may take.
Explicit characterization in abstract; claimed theoretical analysis/derivation of the tradeoff between variance reduction and coverage when designing logging policies.
high mixed Logging Policy Design for Off-Policy Evaluation variance of OPE estimators and coverage of actions relevant to the target policy
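The reward-coverage tradeoff can be illustrated with the standard inverse propensity scoring (IPS) estimator in a non-contextual bandit: each logged action a is reweighted by π(a)/μ(a), where π is the target policy and μ the logging policy. This is a textbook illustration, not the paper's derivation; the three actions, rewards, and policies below are hypothetical, and the mean and variance are computed exactly rather than by simulation.

```python
def ips_moments(target, logging, reward):
    """Exact mean and variance of the one-sample IPS estimate.

    The estimate is pi(a) / mu(a) * r(a) with a drawn from the logging
    policy mu. Actions with mu(a) == 0 are never logged, so their share of
    the target policy's value is silently lost: the coverage failure.
    """
    mean = 0.0
    second_moment = 0.0
    for action, mu in logging.items():
        if mu == 0.0:
            continue
        weight = target.get(action, 0.0) / mu  # importance weight pi/mu
        estimate = weight * reward[action]
        mean += mu * estimate
        second_moment += mu * estimate * estimate
    return mean, second_moment - mean * mean


# Hypothetical three-action bandit with deterministic rewards.
reward = {"a": 1.0, "b": 0.2, "c": 0.5}
target = {"a": 0.5, "b": 0.0, "c": 0.5}
true_value = sum(target[k] * reward[k] for k in reward)  # 0.75

logging_policies = {
    "uniform": {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3},  # full coverage
    "balanced": {"a": 0.7, "b": 0.1, "c": 0.2},       # skewed, still covers c
    "greedy": {"a": 0.9, "b": 0.1, "c": 0.0},         # never logs c
}

for name, mu in logging_policies.items():
    m, v = ips_moments(target, mu, reward)
    print(f"{name:8s} mean={m:.3f} (true {true_value:.3f}) var={v:.3f}")
```

Here the greedy logger has the smallest variance, but its mean is 0.5 rather than 0.75 because the target policy's mass on `c` is never observed; the uniform logger is unbiased but noisiest, and the skewed-but-covering logger sits between the two.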
Perceived procedural improvement (participants preferring facilitation and higher reported trust) can coexist with measurable steering of outcomes and unchanged participation inequality, motivating evaluation practices treating outcomes, interaction dynamics, and perceptions as distinct governance targets.
Synthesis of the experimental findings: null effect on consensus and participation equity, positive effects on participant preference/trust, and measurable allocation shifts (up to 5.5 percentage points) across facilitation conditions in the two experiments (total N=879).
high mixed Real-Time Group Dynamics with LLM Facilitation: Evidence fro... co-occurrence of perceived procedural improvement, allocation steering, and unch...
Facilitators shifted select charity-level allocations by up to 5.5 percentage points, directly affecting the final charitable payout.
Analysis of final group allocation outcomes across experimental conditions showing shifts in allocation to specific charities; reported maximum observed shift of 5.5 percentage points attributable to facilitator condition(s). (Study-level sample covering the two experiments; participants organized in groups of three.)
high mixed Real-Time Group Dynamics with LLM Facilitation: Evidence fro... charity-level allocation percentages (final payout shares)
Augmented work agency is shaped by whether applications are generative or non-generative, by employees' experiences of anxiety and technostress, and by micro-politics through which teams negotiate AI use and AI ethics.
Thematic findings from semistructured interviews (28 participants) and document review identifying these factors as shaping agency in practice.
high mixed Reimagining work in the age of intelligent automation: a qua... determinants shaping augmented work agency
The analysis uncovers three central tensions shaping AI-mediated work: autonomy versus orchestration; capability versus dependency; and experimentation versus ethics.
Recurring themes identified through qualitative interviews (28 participants) and document review; interpretive synthesis presented in findings.
high mixed Reimagining work in the age of intelligent automation: a qua... tensions influencing dynamics of AI-mediated work
AI integration transforms managerial practices, workforce identities and organizational coordination.
Thematic and interpretive analysis of semistructured interviews with 28 managers/professionals across 12 organizations and review of organizational documents.
high mixed Reimagining work in the age of intelligent automation: a qua... managerial practices, workforce identities, organizational coordination
These AIECI benefits were contingent on complementary conditions—particularly data quality, governance, managerial interpretation, and integration of intelligence outputs into operating decisions.
Cross-case pattern-matching across five analytical dimensions (intelligence source, AI mechanism, decision domain, economic implication, boundary condition) identifying recurring contingencies in the four firms' archival evidence.
high mixed Artificial Intelligence Enabled Competitive Intelligence as ... conditionality of benefits on complementary organizational factors (data quality...
Accounting for heterogeneity in AI literacy (agents' ability to identify and adapt to inaccurate AI outputs) can produce skill polarization in the long-run steady state.
Analytical/theoretical steady-state distribution analysis of agent skill dynamics with heterogeneous AI literacy parameters; paper reports conditions under which polarization emerges (theoretical, no empirical sample).
high mixed Human-AI Productivity Paradoxes: Modeling the Interplay of S... distribution of agent skill levels (skill polarization across population)
Beyond length biases, fine-tuning amplifies sycophancy and relationship-seeking behaviours in models.
Behavioral analysis of model outputs in the within-subject experiment (530 participants) showing increased incidence/intensity of sycophantic and relationship-seeking responses after preference fine-tuning compared to baseline models.
high mixed PRISM-X: Experiments on Personalised Fine-Tuning with Human ... frequency/intensity of sycophantic and relationship-seeking behaviours in model ...
Adapting to individual preference data yields only marginal gains over training on pooled preferences from a diverse population.
Comparison within the same within-subject experiment (530 participants) between models fine-tuned on individual preferences versus models trained on pooled preferences across participants; reported as 'marginal gains'.
high mixed PRISM-X: Experiments on Personalised Fine-Tuning with Human ... incremental improvement in human-judged preference alignment when using individu...
Specialized detectors generally perform better but remain inconsistent across generators and can produce false positives on genuine images that are damaged.
Experimental comparison showing specialized AI-generated image detectors outperform MLLMs on some generator subsets, yet show variability across generators and some false positives on genuine damaged images.
high mixed FraudBench: A Multimodal Benchmark for Detecting AI-Generate... detection accuracy and false positive rate of specialized detectors across gener...
The intervention serves as a middle ground in the trade-off between higher costs (from more granular demographic targeting) and skew (from ignoring demographics entirely).
Authors' comparative claim about cost–skew trade-offs observed in their intervention versus alternatives; no quantitative cost or skew figures provided in the excerpt.
high mixed Into the Unknown: Accounting for Missing Demographic Data wh... trade-off between advertising cost and magnitude of ad delivery skew
The dominant explanation locates the gap in model capability; the paper argues instead that software-engineering capability emerges from a model-harness-environment system, in which a runtime substrate (the harness) mediates how an agent observes a project, acts on it, receives feedback, and establishes that a change is complete.
Conceptual argument and reframing presented in the paper (abstract). The paper formalizes this perspective rather than reporting a large-scale empirical test in the abstract.
high mixed AI Harness Engineering: A Runtime Substrate for Foundation-M... effect of runtime harness design on the emergence of software-engineering capabi...
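The four roles attributed to the harness (observation, action, feedback, completion) can be sketched as a small control loop in which the harness, not the model, decides when a change is done. This is a minimal sketch; `Harness`, `run`, and `toy_agent` are hypothetical, the "project" is a dict of files, and the completion check is a string test standing in for running a real test suite.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Harness:
    """Hypothetical runtime substrate between an agent and a project."""
    files: Dict[str, str]
    is_complete: Callable[[Dict[str, str]], bool]  # stands in for "tests pass"
    log: List[str] = field(default_factory=list)

    def observe(self) -> Dict[str, str]:
        # Hand the agent a copy so it cannot mutate project state directly.
        return dict(self.files)

    def act(self, path: str, content: str) -> str:
        # Apply the agent's edit, then report the harness's own check result.
        self.files[path] = content
        feedback = "checks passed" if self.is_complete(self.files) else "checks failed"
        self.log.append(f"write {path}: {feedback}")
        return feedback


def run(harness: Harness,
        propose: Callable[[Dict[str, str], str], Tuple[str, str]],
        max_steps: int = 5) -> bool:
    """Observe-act-feedback loop; completion is decided by the harness."""
    feedback = ""
    for _ in range(max_steps):
        if harness.is_complete(harness.files):
            return True
        path, content = propose(harness.observe(), feedback)
        feedback = harness.act(path, content)
    return harness.is_complete(harness.files)


def toy_agent(view: Dict[str, str], feedback: str) -> Tuple[str, str]:
    """Stand-in for a model: corrects its edit only after failing feedback."""
    if feedback == "checks failed":
        return "add.py", "def add(a, b):\n    return a + b\n"
    return "add.py", "def add(a, b):\n    return a - b\n"


h = Harness(files={"add.py": ""}, is_complete=lambda f: "a + b" in f["add.py"])
print(run(h, toy_agent), h.log)
```

The same agent with no feedback channel would never correct its first edit, which is one concrete sense in which capability depends on the harness rather than the model alone.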
There is a quality–motivation dissociation in AI-assisted goal-setting: AI-authored goals are objectively higher quality but produce lower motivation and worse behavioral follow-through.
Synthesis of experimental findings from the preregistered trial: higher SMART scores for LLM goals (d = 2.26) combined with lower self-reported motivation measures and lower two-week follow-up action rates.
high mixed Optimized but Unowned: How AI-Authored Goals Undermine the M... divergence between objective goal quality (SMART) and motivational/behavioral ou...
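The reported d = 2.26 is a standardized mean difference (Cohen's d): the gap between group means divided by a pooled standard deviation, so d = 2.26 means the groups' means differ by more than two pooled SDs. A minimal sketch of the usual pooled-SD form; the trial may have used a different variant, and the scores below are hypothetical, not data from the trial.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(x, y):
    """Standardized mean difference (mean(x) - mean(y)) / pooled SD.

    Sample variances (n - 1 denominator) are pooled by degrees of freedom.
    """
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(pooled_var)


# Hypothetical SMART scores for illustration only.
llm_goals = [8, 9, 7, 9, 8]
own_goals = [5, 6, 4, 6, 5]
print(round(cohens_d(llm_goals, own_goals), 2))
```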
The research challenges for this vision stem from a broader flexibility–robustness tension; navigating it effectively requires moving beyond the on-the-fly paradigm.
Analytical claim in paper identifying a design trade-off (flexibility vs. robustness) as the core challenge motivating the proposed shift; no empirical demonstration provided.
high mixed Engineering Robustness into Personal Agents with the AI Work... trade-off between flexibility and robustness in agent design
Current LLM agents are proficient at calling isolated APIs but struggle with the "last mile" of commercial software automation.
Authors' comparative characterization based on literature context and their benchmark motivation; stated in introduction rather than a quantified experiment in the excerpt.
high mixed ComplexMCP: Evaluation of LLM Agents in Dynamic, Interdepend... ability to successfully perform end-to-end software automation tasks (vs. isolat...
The aggregate labor-market effects of AI are geographically uneven.
Synthesis across studies observing variation by geography and noting non-Anglophone markets and developing economies as under-studied and differentially affected.
high mixed Creation, validation, obsolescence: observed evidence of AI-... geographic heterogeneity in labor market impacts
Wage polarization characterizes the aggregate pattern of labor market change associated with recent AI advances.
Aggregate characterization from synthesized studies reporting divergent wage outcomes (higher wages for AI-augmented workers, pressures on junior/routine roles) consistent with polarization.
high mixed Creation, validation, obsolescence: observed evidence of AI-... wage distribution changes (polarization)
Sectoral effects are heterogeneous: infrastructure, security, and quality-assurance roles have expanded while developer roles have contracted.
Qualitative and quantitative results aggregated across the included studies noting role-level expansions and contractions; no single pooled effect size provided.
high mixed Creation, validation, obsolescence: observed evidence of AI-... changes in employment/posting volumes by occupational role (infrastructure, secu...
Non-routine employment and wages exhibit a crossing pattern: initially higher under fast adoption, then lower — so faster adoption can simultaneously raise long-run wages for survivors while permanently reducing participation.
Comparative dynamic trajectories in the model showing time paths for non-routine employment and wages under fast vs. slow adoption scenarios (analytical and/or simulated model paths).
high mixed Too Fast to Adjust: Adoption Speed and the Permanent Cost of... non-routine employment and non-routine wages (time-path / crossing pattern)
Even when two economies share the same long-run automation level, adoption speed alone determines transition welfare.
Comparative-welfare analysis in the dynamic theoretical model holding long-run automation level fixed while varying adoption speed (analytical comparative statics).
Under open-ended prompts, trust drops to 3-55%, confirming prompt framing as a confound; we report both conditions.
Experimental comparison reported by authors between directed queries and open-ended prompts; observed trust rates under open-ended prompts ranged from 3% to 55% (no explicit per-model sample sizes reported in the summary).
high mixed Oracle Poisoning: Corrupting Knowledge Graphs to Weaponise A... model trust rate in accepting poisoned data under open-ended prompts
Generative AI lowers barriers to solo entrepreneurship while reinforcing team-based advantages.
Synthesis of the observed patterns in the Product Hunt data: sharp increase in solo launches after ChatGPT-3.5 (barrier lowering) combined with persistent team dominance among top-quality outcomes (reinforcing team advantages).
high mixed Generative AI Fuels Solo Entrepreneurship, but Teams Still L... barriers to entry for solo entrepreneurship (proxied by solo launch rates) and c...
Fine-tuning and reinforcement learning improve in-distribution performance, but generalization to unseen part families remains limited.
Experiments reported in the paper/abstract applying fine-tuning and reinforcement learning to models evaluated on BenchCAD; observed improvements on in-distribution data and limited generalization to unseen families.
high mixed BenchCAD: A Comprehensive, Industry-Standard Benchmark for P... in-distribution_performance_and_out-of-distribution_generalization
Across 10+ frontier models, current systems often recover coarse outer geometry but fail to produce faithful parametric CAD programs.
Empirical evaluation reported in the paper/abstract across more than ten contemporary multimodal / large language models on the BenchCAD dataset; observed pattern that coarse outer geometry is often recovered while faithful parametric program synthesis fails.
high mixed BenchCAD: A Comprehensive, Industry-Standard Benchmark for P... faithfulness_of_generated_parametric_CAD_programs
AI exhibits a significant U-shaped spatial effect on Lae.
Spatial econometric analysis (spatial Durbin model) on panel data for 30 Chinese provincial regions (2012–2022); kernel density estimation used for distributional analysis.
high mixed A study of the impact of artificial intelligence on the low-... low-altitude economic growth (Lae) across space
AI has a significant inverted U-shaped impact on the low-altitude economy (Lae), with diminishing marginal returns after a certain turning point.
Panel data from 2012–2022 for 30 Chinese provincial regions; composite AI and Lae indices constructed via the entropy method; estimated using spatial Durbin models and non-linear specification to detect inverted U-shape.
high mixed A study of the impact of artificial intelligence on the low-... low-altitude economic growth (Lae)
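For reference, the textbook panel form of the spatial Durbin model adds spatially lagged outcomes and regressors to a fixed-effects regression; including a squared AI term is one common way to test for an inverted U. The equation below is that generic form, with control variables omitted, and is not necessarily the paper's exact specification.

```latex
\mathit{Lae}_{it} = \rho \sum_{j} w_{ij}\,\mathit{Lae}_{jt}
  + \beta_1 \mathit{AI}_{it} + \beta_2 \mathit{AI}_{it}^{2}
  + \theta_1 \sum_{j} w_{ij}\,\mathit{AI}_{jt}
  + \theta_2 \sum_{j} w_{ij}\,\mathit{AI}_{jt}^{2}
  + \mu_i + \lambda_t + \varepsilon_{it}
```

Here w_ij are elements of a row-standardized spatial weight matrix W over the provincial regions, and μ_i and λ_t are region and year fixed effects. Because of the ρ·W·Lae term, β and θ are not themselves marginal effects; direct and indirect (spillover) effects are usually computed from (I − ρW)⁻¹ applied to the coefficient terms.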
Evidence suggests both top-down and bottom-up diffusion: worker use can occur without firm adoption, and vice versa.
Cross-tabulation of firm-level adoption indicators and reports of worker-level use in the BTOS AI supplement (Nov 2025–Jan 2026) indicating non-perfect overlap between firm-declared adoption and reported worker use; analytic approach descriptive (no sample size in excerpt).
high mixed The Microstructure of AI Diffusion: Evidence from Firms, Bus... co-occurrence (or lack thereof) of firm-wide adoption and worker-level AI use
Depending on the fairness metric used, the Pareto frontier may include upper-bound threshold rules, thus preferring individuals with lower success probabilities.
Analytical derivations showing that for certain fairness metrics the set of Pareto-optimal rules includes rules that impose upper-bound thresholds; theoretical examples and arguments in the paper.
high mixed Fairness vs Performance: Characterizing the Pareto Frontier ... presence of upper-bound threshold rules on Pareto frontier (preference toward lo...
The study reframes VTech adoption as legitimacy-seeking rather than efficiency-driven.
Thematic analysis using Rogers' diffusion of innovations and institutional theory, resulting in the institutionally mediated diffusion of innovations (IDOI) framework which emphasizes legitimacy concerns.
high mixed Exploring barriers to valuation technology adoption in prope... primary motivations for VTech adoption (legitimacy vs efficiency)
Practitioners stress that human judgement remains indispensable, positioning technology as an aid rather than a replacement.
Interview responses from valuers and firm leaders emphasizing the continued role of human judgement; thematic analysis framed by the IDOI model.
high mixed Exploring barriers to valuation technology adoption in prope... role of human judgement vs automation in valuation practice
Responses [about AI's effects] vary by cohort and by survey framing.
Paper asserts heterogeneity in survey responses across demographic cohorts and due to framing effects (no subgroup sample sizes or framing experiment details in excerpt).
high mixed AI’s Economy and Its Political and Institutional Consequence... variation in survey responses by cohort and framing
This [model divergence] may explain why public opinion is not settled about the effects of AI.
Paper's interpretive claim linking model divergence to unsettled public opinion (presented as a plausible explanation; no causal test or survey linkage provided in excerpt).
high mixed AI’s Economy and Its Political and Institutional Consequence... public opinion about AI's effects
Current models of the vulnerability of occupations and economic sectors differ widely in their forecasts.
Paper's comparative statement about existing models and their forecasts (no specific models, quantitative comparisons, or sample sizes provided in the excerpt).
high mixed AI’s Economy and Its Political and Institutional Consequence... disagreement across model forecasts of occupational/sector vulnerability
Message for AI alignment: smooth scoring-based oversight cannot elicit truthful reports from a strategic agent; sharp thresholds (step functions) are the calibration-preserving design.
Synthesis of the paper's theoretical impossibility and constructive results applied to AI oversight setting (argument plus the step-function constructive escape).
high mixed The Endogeneity of Miscalibration: Impossibility and Escape ... ability of oversight designs (smooth scoring vs. sharp thresholds) to preserve c...
Screening and algorithmic targeting can act as complements or substitutes; the paper empirically characterizes when they do so.
Empirical and theoretical analysis in the paper that identifies conditions (notably levels of aleatoric uncertainty) under which screening increases or decreases the marginal value of algorithmic targeting.
high mixed The Limits of AI-Driven Allocation: Optimal Screening under ... interaction between screening and algorithmic targeting (complementarity vs subs...
Governance machinery from energy systems and critical infrastructure offers a partial template for governing automated web actors, but only some dimensions transfer.
Comparative governance argument drawing on adjacent-sector governance literature; conceptual mapping rather than empirical governance trial reported.
high mixed The Vanishing User: Web Analytics in an Agent-Dominated Inte... applicability of governance frameworks from energy/critical infrastructure to AI...
Public discussion of generative AI in accounting swings between the allure of full automation and job-displacement anxiety, yet the most immediate reality in organizations is human + AI work.
Paper's background/intro synthesizing recent research and practitioner commentary (2023–2025); conceptual observation rather than empirical test.
Integrating Generative AI into agile development processes has potential benefits and limitations for planning efficiency.
High-level conclusion based on the controlled experiment with GitLab Duo and qualitative participant feedback discussed in the paper.
high mixed Splitting User Stories Into Tasks with AI -- A Foe or an All... planning efficiency (benefits and limitations)
Larger models do not consistently outperform smaller ones on tool-use tasks.
Empirical observations from the paper's evaluations across the five function-calling benchmarks.
high mixed Switchcraft: AI Model Router for Agentic Tool Calling relative performance of larger vs smaller models on tool-use tasks
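If larger models do not consistently win on tool-use tasks, a router can pick a model per task type instead of defaulting to the largest one. A minimal sketch of one such rule, choosing the cheapest model whose measured accuracy is within a tolerance of the best; this is not Switchcraft's method, and `route`, the model names, accuracies, and costs are all hypothetical.

```python
from typing import Dict


def route(task_type: str,
          accuracy: Dict[str, Dict[str, float]],
          cost: Dict[str, float],
          tolerance: float = 0.02) -> str:
    """Return the cheapest model within `tolerance` of the best accuracy
    observed for this task type."""
    scores = {model: acc[task_type] for model, acc in accuracy.items()}
    best = max(scores.values())
    eligible = [model for model, s in scores.items() if s >= best - tolerance]
    return min(eligible, key=lambda model: cost[model])


# Hypothetical per-task-type accuracies and relative costs.
accuracy = {
    "small": {"lookup": 0.91, "multi_step": 0.62},
    "large": {"lookup": 0.90, "multi_step": 0.78},
}
cost = {"small": 1.0, "large": 8.0}

for task in ("lookup", "multi_step"):
    print(task, "->", route(task, accuracy, cost))
```

The tolerance trades a small accuracy loss for cost savings; setting it to zero recovers "always pick the most accurate model for this task type".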