The Commonplace

Evidence (13827 claims)

Adoption: 8454 claims
Productivity: 7544 claims
Governance: 6789 claims
Human-AI Collaboration: 6327 claims
Org Design: 4126 claims
Innovation: 4058 claims
Labor Markets: 3520 claims
Skills & Training: 2924 claims
Inequality: 2057 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 749 195 97 889 1979
Governance & Regulation 815 391 188 121 1539
Organizational Efficiency 771 189 124 83 1177
Technology Adoption Rate 624 233 123 96 1084
Research Productivity 410 121 56 331 929
Output Quality 466 177 59 47 749
Decision Quality 320 174 75 42 618
Firm Productivity 435 55 88 20 604
AI Safety & Ethics 214 276 65 33 593
Market Structure 178 166 122 24 495
Task Allocation 206 64 70 31 376
Skill Acquisition 165 57 60 17 299
Innovation Output 201 27 41 18 288
Employment Level 105 51 107 13 278
Fiscal & Macroeconomic 131 69 43 26 276
Consumer Welfare 116 63 42 11 232
Firm Revenue 149 46 26 3 224
Inequality Measures 44 122 49 6 221
Task Completion Time 169 29 8 12 219
Worker Satisfaction 89 61 20 12 182
Error Rate 69 91 10 2 172
Regulatory Compliance 76 68 14 5 163
Training Effectiveness 92 19 13 19 145
Wages & Compensation 77 36 25 6 144
Automation Exposure 51 54 22 12 142
Team Performance 86 17 27 9 140
Developer Productivity 94 17 14 6 132
Job Displacement 12 80 20 1 113
Hiring & Recruitment 51 7 8 3 69
Skill Obsolescence 5 45 6 1 57
Creative Output 31 16 7 2 57
Social Protection 27 16 8 2 53
Labor Share of Income 17 17 17 51
Worker Turnover 11 12 3 26
Industry 1 1
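To read the matrix as proportions rather than raw counts, the sketch below takes a few complete rows and computes each direction's share of the reported total, plus a net-direction score. The net score, (positive - negative) / total, is an assumed summary metric that the page itself does not define. The Total column does not always equal the sum of the four direction columns (Other: 749 + 195 + 97 + 889 = 1930 against a reported 1979), so the sketch divides by the reported total and reports the remainder as "unassigned". The Labor Share of Income, Worker Turnover, and Industry rows have blank cells that cannot be placed from the flattened text, so they are left out.

```python
# Rows copied from the evidence matrix: (positive, negative, mixed, null, total).
MATRIX = {
    "Firm Productivity": (435, 55, 88, 20, 604),
    "Research Productivity": (410, 121, 56, 331, 929),
    "AI Safety & Ethics": (214, 276, 65, 33, 593),
    "Job Displacement": (12, 80, 20, 1, 113),
}


def direction_shares(row):
    """Share of the reported total that falls in each direction column."""
    positive, negative, mixed, null, total = row
    # Claims counted in Total but in none of the four direction columns.
    unassigned = total - (positive + negative + mixed + null)
    counts = {"positive": positive, "negative": negative, "mixed": mixed,
              "null": null, "unassigned": unassigned}
    return {name: count / total for name, count in counts.items()}


def net_direction(row):
    """Assumed summary metric: (positive - negative) / total, in [-1, 1]."""
    positive, negative, _mixed, _null, total = row
    return (positive - negative) / total


# Print outcomes from most to least net-positive.
for outcome, row in sorted(MATRIX.items(), key=lambda kv: net_direction(kv[1]), reverse=True):
    shares = direction_shares(row)
    print(f"{outcome:<22} net={net_direction(row):+.2f} "
          f"positive={shares['positive']:.0%} unassigned={shares['unassigned']:.0%}")
```

A net score near zero can hide a large mixed or null share (Research Productivity has 331 null findings), so the per-direction shares are worth reading alongside it.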
Collaboration between humans and AI enhances decision-making, efficiency, and innovation.
Reported result from thematic evaluation of literature and secondary data (qualitative synthesis). No sample size or quantified effect provided.
high positive Human–AI Collaboration in the Indian IT Industry: A Qualitat... decision-making quality (and related efficiency and innovation)
AI improves overall organisational productivity.
Authors' synthesis of peer-reviewed studies and secondary data indicating productivity impacts (qualitative literature review). No quantitative sample size reported.
high positive Human–AI Collaboration in the Indian IT Industry: A Qualitat... organisational productivity
AI increases human capacities.
Conclusion from comprehensive analysis of peer-reviewed literature and thematic evaluation of secondary data (literature review). No primary sample size reported.
high positive Human–AI Collaboration in the Indian IT Industry: A Qualitat... human capacities / capabilities
Policy responses should prioritise governance frameworks that emphasise equity, accountability, and inclusive distribution of value to address concentrated digital power.
Normative policy recommendations derived from the paper's conceptual analysis and synthesis of recent literature (policy prescription, no empirical evaluation reported).
high positive Beyond Access: Rethinking Digital Power in Data-Driven Indus... policy orientation toward equity, accountability, and inclusive value distributi...
Time and effort dissociate: participants reported lower subjective effort with AI despite equivalent completion times.
Empirical result reported in the abstract: subjective effort ratings were lower for AI-assisted conditions even though measured completion times were equivalent (preregistered study, N = 1237).
high positive Cognitive offloading and the speedup illusion in human-AI in... subjective effort (self-reported); actual completion time also measured
Participants predicted AI to be significantly faster.
Empirical result reported in the abstract: participants' predicted completion times indicated AI-assisted completion would be faster than independent completion (statistical significance claimed). Sample from preregistered study (N = 1237).
high positive Cognitive offloading and the speedup illusion in human-AI in... predicted completion time
Large language models (LLMs) have the potential to boost human productivity by speeding up task completion, provided users know when to offload cognitive work to them.
Framing/introductory claim in the paper (theoretical/argumentative), no direct empirical evidence reported in the abstract.
high positive Cognitive offloading and the speedup illusion in human-AI in... task completion speed (potential)
Together, these results bring individual-level LLM-based resident simulation within reach of resource-constrained local administrations, enabling community-governance decisions to be systematically pre-evaluated in silico before real-world deployment.
Draws on the dataset construction, benchmark results, curriculum-LoRA efficiency gains, and system integration reported in the paper; the claim is the authors' stated implication about practical feasibility for local administrations.
high positive Benchmarking LLMs for Community Governance Simulation with L... feasibility of in-silico pre-evaluation of community-governance decisions by res...
The system integrates curriculum-LoRA into a closed-loop policy-evaluation pipeline.
System-level description and implementation in the paper that embeds curriculum-LoRA within a closed-loop pipeline for policy evaluation and iteration.
high positive Benchmarking LLMs for Community Governance Simulation with L... system integration of personalization algorithm into a policy-evaluation workflo...
Curriculum-LoRA Pareto-dominates every configuration tested.
Empirical comparisons across the configurations tested in the paper's experiments; curriculum-LoRA is at least as good as every other configuration on both fidelity and per-call cost, and strictly better on at least one, placing it alone on the fidelity-versus-cost Pareto frontier.
high positive Benchmarking LLMs for Community Governance Simulation with L... Pareto frontier position with respect to fidelity and cost metrics
Curriculum-LoRA is a parameter-efficient personalization framework that, by closing the fidelity-cost gap, matches the strongest baseline's fidelity at roughly 10x lower per-call cost.
Experimental evaluation comparing curriculum-LoRA to baselines on fidelity and per-call cost metrics; reported result that curriculum-LoRA attains comparable fidelity while reducing per-call cost by about a factor of ten.
high positive Benchmarking LLMs for Community Governance Simulation with L... tradeoff between simulation fidelity and per-call cost (input tokens / cost per ...
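The Pareto-dominance claim has a precise meaning: one configuration dominates another if it is no worse on fidelity and per-call cost and strictly better on at least one of them. A minimal sketch of that check follows. The configuration names and all numbers are hypothetical, chosen only to mirror the stated "same fidelity at roughly 10x lower cost" pattern; they are not results from the paper, and `Config`, `dominates`, `dominates_all`, and `pareto_front` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    name: str
    fidelity: float       # higher is better
    cost_per_call: float  # lower is better


def dominates(a: Config, b: Config) -> bool:
    """True if a is no worse than b on both axes and strictly better on one."""
    no_worse = a.fidelity >= b.fidelity and a.cost_per_call <= b.cost_per_call
    strictly_better = a.fidelity > b.fidelity or a.cost_per_call < b.cost_per_call
    return no_worse and strictly_better


def dominates_all(candidate: Config, configs) -> bool:
    """True if candidate Pareto-dominates every other configuration."""
    return all(dominates(candidate, other) for other in configs if other is not candidate)


def pareto_front(configs):
    """Configurations that no other configuration dominates."""
    return [c for c in configs if not any(dominates(o, c) for o in configs if o is not c)]


# Hypothetical numbers for illustration only.
configs = [
    Config("prompt, no profile", fidelity=0.55, cost_per_call=1.0),
    Config("prompt + rich profile", fidelity=0.70, cost_per_call=10.0),
    Config("curriculum-LoRA", fidelity=0.70, cost_per_call=1.0),
]
print([c.name for c in pareto_front(configs)])
```

Matching the strongest baseline's fidelity at lower cost is enough for dominance; no fidelity gain is required.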
Adding rich life-history profiles meaningfully raises fidelity above the no-profile baseline.
Benchmark comparisons between prompting strategies that include rich life-history profiles versus a no-profile baseline across the evaluated LLMs, using the interview-derived dataset to assess fidelity.
high positive Benchmarking LLMs for Community Governance Simulation with L... simulation fidelity (how well LLM outputs match expected resident responses)
The dataset comprises approximately 1.2 million characters of first-person narrative collected through two-hour semi-structured interviews with each of 92 residents in an urban community, organized around nine community-governance domains.
Reported dataset construction: two-hour semi-structured interviews with each of 92 residents (92 interviews), organized around nine governance domains; reported total text volume ~1.2 million characters.
high positive Benchmarking LLMs for Community Governance Simulation with L... size and composition of dataset (characters of first-person narrative, number of...
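A quick scale check on the reported figures, assuming the 1.2 million characters are spread across all 92 interviews and each interview ran the full two hours:

```latex
\frac{1.2\times 10^{6}\ \text{characters}}{92\ \text{residents}} \approx 13{,}043\ \text{characters per resident},
\qquad
\frac{13{,}043\ \text{characters}}{120\ \text{min}} \approx 109\ \text{characters per minute}.
```

Both numbers are averages implied by the stated totals, not figures reported by the paper.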
Wage gains coincide with an increase in within-firm wage dispersion in small firms, with wage variance rising by around 7.5%.
Within-firm wage variance analysis (likely computed from worker-level wages aggregated to firm-level dispersion) showing a ~7.5% increase in wage variance in small firms after automation adoption.
high positive Firm size and the automation wage premium within-firm wage dispersion (wage variance)
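The evidence note only guesses at how dispersion is measured, so the sketch below assumes one plausible measure: the unweighted mean, across firms, of each firm's population variance of worker wages, compared before and after adoption. The wage figures and the function names `within_firm_variance` and `pct_change` are hypothetical; the paper may weight firms, use log wages, or use a different dispersion statistic.

```python
from statistics import pvariance


def within_firm_variance(wages_by_firm):
    """Unweighted mean, across firms, of the population variance of worker wages.

    Firms with a single worker are skipped because their variance is trivially zero.
    """
    variances = [pvariance(wages) for wages in wages_by_firm.values() if len(wages) > 1]
    if not variances:
        raise ValueError("need at least one firm with two or more workers")
    return sum(variances) / len(variances)


def pct_change(before, after):
    """Percentage change from before to after."""
    return 100.0 * (after - before) / before


# Hypothetical wages for two small firms, before and after an automation spike.
before = {"firm_a": [20.0, 24.0], "firm_b": [18.0, 22.0, 26.0]}
after = {"firm_a": [21.0, 25.5], "firm_b": [18.5, 23.0, 27.5]}

change = pct_change(within_firm_variance(before), within_firm_variance(after))
print(f"within-firm wage variance change: {change:+.1f}%")
```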
Using a difference-in-differences framework exploiting import lumpiness in product categories linked to automation technologies, we find a positive average adoption effect on adopters’ average wages, which stabilizes at around 4% five years after an automation spike.
Difference-in-differences (DiD) estimation exploiting time variation in import 'spikes' in automation-related product categories (including robots) on the integrated panel of Italian importing firms (2011–2019).
high positive Firm size and the automation wage premium adopters' average wages (average within-firm wages over time)
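The core of a difference-in-differences estimate is the change in the adopters' mean outcome minus the change for comparison firms over the same period. The sketch below shows only that canonical 2x2 building block on hypothetical log wages; `did_2x2` is an illustrative name. The paper's design is richer: adoption is timed by firm-specific import spikes and effects are traced over event time (stabilizing near 4% after five years), and with staggered timing a naive two-way fixed-effects regression can be biased, which is why event-study estimators are typically used instead.

```python
from math import exp
from statistics import fmean


def did_2x2(treated_pre, treated_post, control_pre, control_post):
    """Canonical 2x2 difference-in-differences on group means of log wages."""
    treated_change = fmean(treated_post) - fmean(treated_pre)
    control_change = fmean(control_post) - fmean(control_pre)
    return treated_change - control_change


# Hypothetical log wages for illustration only.
effect = did_2x2(
    treated_pre=[3.00, 3.10], treated_post=[3.10, 3.20],
    control_pre=[3.00, 3.05], control_post=[3.06, 3.11],
)
# A difference in log wages converts to a percentage effect via exp(effect) - 1.
print(f"DiD (log points): {effect:.3f}; implied wage effect: {100 * (exp(effect) - 1):.1f}%")
```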
Mincer-type wage regressions reveal that automation adopters pay approximately 3% higher wages after controlling for worker sorting.
Mincer-style wage regressions with controls for worker sorting (individual-level regression analysis on the integrated dataset).
high positive Firm size and the automation wage premium individual wages (conditional on worker characteristics)
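For reference, a generic Mincer-type specification with an adopter indicator might look as follows. This is an assumed form, not the paper's exact equation: $w_{it}$ is the wage of worker $i$ in year $t$, $\mathrm{Adopt}_{f(i,t)}$ equals 1 if the employing firm $f(i,t)$ is an automation adopter, $X_{it}$ is experience, $\tau_t$ are year effects, and $\theta_i$ is a worker fixed effect, one common way of controlling for sorting. Because $\theta_i$ absorbs time-invariant traits such as schooling, the schooling term of the classic Mincer equation drops out.

```latex
\ln w_{it} = \alpha + \beta\,\mathrm{Adopt}_{f(i,t)} + \phi_1 X_{it} + \phi_2 X_{it}^{2} + \theta_i + \tau_t + \varepsilon_{it},
\qquad
\text{premium} = 100\,\bigl(e^{\beta} - 1\bigr)\%.
```

If the reported 3% is $\beta$ in log points, the implied percentage premium is $100(e^{0.03} - 1) \approx 3.05\%$, so the distinction is negligible at this size.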
The automation wage premium for adopting firms stands at approximately 10%.
Descriptive comparison of wages between automation-adopting firms and others using integrated firm-worker-trade data for Italian importing firms (2011–2019).
high positive Firm size and the automation wage premium wages (adopters vs non-adopters)
The design isolates the platform algorithm's contribution to the outcome, separating it from the contribution of the creative content.
Methodological claim supported by the proposed three-arm design and its empirical demonstration in the live campaign.
high positive Algorithm or Creative? A Three-Arm Experimental Design for D... isolated contribution of algorithm to outcome
Roughly three-quarters of the absolute reallocation is algorithmic.
Empirical decomposition from the live Meta campaign reported in the paper (proportion of total reallocation attributed to algorithmic channel).
high positive Algorithm or Creative? A Three-Arm Experimental Design for D... share of total impression reallocation attributable to algorithm
In a live Meta campaign with a women-targeted text fragment, the algorithmic channel raises female impression share by +2.07 ppt.
Empirical result from a live Meta campaign reported in the paper; conveys a measured effect size (+2.07 percentage points).
high positive Algorithm or Creative? A Three-Arm Experimental Design for D... female impression share (change attributable to algorithmic channel)
We propose a three-arm design that adds an arm exposing the algorithm to the treatment metadata while holding the user-facing creative identical to control, point-identifying the natural indirect (algorithmic) and direct (creative) effects without sequential ignorability.
Methodological proposal in the paper (design description and identification claim); presumably supported by theoretical derivation/proof in the paper.
high positive Algorithm or Creative? A Three-Arm Experimental Design for D... identification of natural indirect and direct effects
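Because all three arms are randomized, the two channels described above can be read off as differences in arm means: the middle arm changes only what the algorithm sees, so its gap from control is the algorithmic (indirect) effect, and the remaining gap to the full treatment arm is the creative (direct) effect. The sketch below computes these point estimates on hypothetical arm means. `ArmMeans` and `decompose` are illustrative names, no standard errors are computed, and defining the "algorithmic share" as |indirect| / (|indirect| + |direct|) is an assumed reading of "share of absolute reallocation"; the paper's estimator may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArmMeans:
    """Mean outcome (e.g. female impression share) in each randomized arm."""
    control: float    # control creative, algorithm sees control metadata
    algo_only: float  # control creative shown to users, algorithm sees treatment metadata
    treatment: float  # treatment creative, algorithm sees treatment metadata


def decompose(m: ArmMeans) -> dict:
    indirect = m.algo_only - m.control   # algorithmic channel (natural indirect effect)
    direct = m.treatment - m.algo_only   # creative channel (natural direct effect)
    moved = abs(indirect) + abs(direct)
    return {
        "indirect": indirect,
        "direct": direct,
        "total": m.treatment - m.control,  # equals indirect + direct by construction
        # Assumed reading of "share of absolute reallocation".
        "algorithmic_share": abs(indirect) / moved if moved else float("nan"),
    }


# Hypothetical arm means, not the paper's data.
print(decompose(ArmMeans(control=0.40, algo_only=0.43, treatment=0.44)))
```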
The platform's delivery algorithm routes each creative to the audience it predicts will engage.
Descriptive claim in paper about algorithmic delivery behavior; likely supported by platform operational details and the motivating discussion.
high positive Algorithm or Creative? A Three-Arm Experimental Design for D... audience routing by delivery algorithm
Online advertising platforms host hundreds of thousands of A/B tests.
Statement in paper (assertion about industry scale); no sample size or citation provided in excerpt.
high positive Algorithm or Creative? A Three-Arm Experimental Design for D... count of A/B tests hosted on platforms
The authors propose recommendations for adapting employment policy to the conditions of AI-driven transformation.
Policy recommendations derived from the paper's analysis of statistical data, industry reviews, and regulatory/legal documents; recommendations are proposed by the authors (not empirically validated within the paper).
high positive The Impact of Artificial Intelligence During the Transformat... proposed employment policy adaptations
In 2024-2025, the labor market of Uzbekistan is characterized by duality: there is an increasing demand for IT specialists and workers with digital skills.
Analysis of 2024–2025 labor market statistics and industry reviews cited in the paper (no numerical sample size or survey sampling reported).
high positive The Impact of Artificial Intelligence During the Transformat... demand for IT specialists and workers with digital skills
The aim is to keep autonomous agency composable while keeping accountability non-negotiable, so that coordination itself can become shared infrastructure for a human-AI society that is open, pluralistic, and governable.
Stated design/ethical objective in the paper; normative claim about intended social and governance outcomes rather than an empirically validated result.
high positive Foundation Protocol: A Coordination Layer for Agentic Societ... feasibility of composable autonomous agency combined with enforceable accountabi...
FP is designed to wrap and bridge existing protocols rather than replace them, enabling incremental adoption while reducing integration and governance overhead.
Design rationale/claim in the paper about interoperability and incremental adoption strategy; no empirical deployment, integration case studies, or measured overhead reductions presented.
high positive Foundation Protocol: A Coordination Layer for Agentic Societ... ability to interoperate with existing protocols and reduce integration/governanc...
FP treats policy, provenance, and audit as first-class concerns.
Design/architectural claim in the paper stating that policy, provenance, and audit are prioritized within FP; no empirical compliance or audit trials presented.
high positive Foundation Protocol: A Coordination Layer for Agentic Societ... integration of policy, provenance, and audit mechanisms into the protocol
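The paper states that policy, provenance, and audit are first-class in FP but, per the evidence note, presents no mechanism or trial. As one generic illustration of what a first-class audit trail can mean, the sketch below swaps in a hash-chained log, a standard tamper-evidence technique; it is not FP's design. `AuditLog`, `entry_hash`, `GENESIS`, and the record fields (`actor`, `action`, `policy`, `source`) are all hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash preceding the first entry


def entry_hash(prev_hash: str, record: dict) -> str:
    """SHA-256 over the previous hash plus a canonical JSON encoding of the record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


class AuditLog:
    def __init__(self):
        self.entries = []  # list of (record, hash) pairs, oldest first

    def append(self, record: dict) -> str:
        prev = self.entries[-1][1] if self.entries else GENESIS
        stored = dict(record)  # shallow copy so later edits by the caller don't alter the log
        digest = entry_hash(prev, stored)
        self.entries.append((stored, digest))
        return digest

    def verify(self) -> bool:
        """Recompute the chain; an edited or reordered entry breaks it.

        Truncating the tail is only detectable if the latest hash is anchored elsewhere.
        """
        prev = GENESIS
        for record, digest in self.entries:
            if entry_hash(prev, record) != digest:
                return False
            prev = digest
        return True


log = AuditLog()
log.append({"actor": "agent:scheduler", "action": "book_meeting",
            "policy": "calendar-v1", "source": "request-17"})
log.append({"actor": "human:dana", "action": "approve",
            "policy": "calendar-v1", "source": "request-17"})
print(log.verify())
```

The `policy` and `source` fields show how policy references and provenance can travel with each audited action, so an auditor can check both from one record.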
FP provides economic primitives for metering, receipts, and settlement.
Design claim in the paper listing economic primitives as part of FP; no deployment or economic experiments reported.
high positive Foundation Protocol: A Coordination Layer for Agentic Societ... availability of built-in primitives for metering usage, issuing receipts, and pe...
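The three economic primitives named here compose naturally: metering records usage, each metered event yields a receipt, and settlement nets receipts into balances. The sketch below illustrates that pipeline under stated assumptions; it is not FP's interface, and `Receipt`, `Meter`, `settle`, and the `agent:` identifiers are hypothetical. Amounts use integer minor currency units to avoid floating-point rounding.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Receipt:
    receipt_id: int
    payer: str
    payee: str
    units: int           # metered usage, e.g. number of tool calls
    price_per_unit: int  # integer minor currency units

    @property
    def amount(self) -> int:
        return self.units * self.price_per_unit


class Meter:
    """Records metered usage and issues one receipt per usage event."""

    def __init__(self):
        self._next_id = 1
        self.receipts = []

    def record(self, payer: str, payee: str, units: int, price_per_unit: int) -> Receipt:
        if payer == payee:
            raise ValueError("payer and payee must differ")
        if units <= 0 or price_per_unit < 0:
            raise ValueError("units must be positive and price non-negative")
        receipt = Receipt(self._next_id, payer, payee, units, price_per_unit)
        self._next_id += 1
        self.receipts.append(receipt)
        return receipt


def settle(receipts):
    """Net receipts into bilateral balances keyed by (a, b) with a < b.

    A positive balance means a owes b; a negative balance means b owes a.
    """
    balances = defaultdict(int)
    for r in receipts:
        a, b = sorted((r.payer, r.payee))
        balances[(a, b)] += r.amount if r.payer == a else -r.amount
    return {pair: amount for pair, amount in balances.items() if amount != 0}


meter = Meter()
meter.record("agent:alice", "agent:bob", units=3, price_per_unit=5)
meter.record("agent:bob", "agent:alice", units=1, price_per_unit=4)
print(settle(meter.receipts))
```

Netting bilaterally before paying out means only one transfer per pair of parties is needed, however many metered events occurred.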
FP supports native multi-party organization and event-based collaboration.
Feature/architecture claim in the paper describing native support for multi-party organization and event-driven collaboration; no empirical evaluation or user studies provided.
high positive Foundation Protocol: A Coordination Layer for Agentic Societ... support for multi-party organizational constructs and event-based collaboration ...
FP unifies heterogeneous entities, including agents, tools, resources, humans, institutions, and organizations.
Design specification/feature claim in the paper describing FP's data and entity model; no empirical interoperability study reported.
high positive Foundation Protocol: A Coordination Layer for Agentic Societ... ability to represent and integrate diverse entity types within the protocol
This paper introduces the Foundation Protocol (FP), a graph-first coordination layer for an emerging human-AI society.
Claim of authorship/introduction in the paper; architectural/design proposal rather than an evaluated system.
high positive Foundation Protocol: A Coordination Layer for Agentic Societ... existence of a proposed coordination layer (Foundation Protocol)
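A "graph-first" layer that unifies heterogeneous entities can be pictured as typed nodes joined by labeled relations. The sketch below is a hypothetical illustration of that idea using the six entity types the paper lists; it is not FP's data model, and `Kind`, `Entity`, `CoordinationGraph`, and the relation labels (`acts_for`, `member_of`, `uses`) are invented for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    AGENT = "agent"
    TOOL = "tool"
    RESOURCE = "resource"
    HUMAN = "human"
    INSTITUTION = "institution"
    ORGANIZATION = "organization"


@dataclass(frozen=True)
class Entity:
    id: str
    kind: Kind


class CoordinationGraph:
    def __init__(self):
        self.entities = {}
        self._edges = defaultdict(set)  # (source_id, relation) -> {target_id, ...}

    def add(self, entity: Entity) -> None:
        self.entities[entity.id] = entity

    def relate(self, src: str, relation: str, dst: str) -> None:
        if src not in self.entities or dst not in self.entities:
            raise KeyError("both endpoints must be registered entities")
        self._edges[(src, relation)].add(dst)

    def targets(self, src: str, relation: str, kind: Kind = None) -> list:
        """Sorted target ids for src via relation, optionally filtered by kind."""
        ids = self._edges.get((src, relation), set())
        return sorted(i for i in ids if kind is None or self.entities[i].kind is kind)


g = CoordinationGraph()
for eid, kind in [("a:scheduler", Kind.AGENT), ("h:dana", Kind.HUMAN),
                  ("o:acme", Kind.ORGANIZATION), ("t:calendar", Kind.TOOL)]:
    g.add(Entity(eid, kind))
g.relate("a:scheduler", "acts_for", "h:dana")
g.relate("h:dana", "member_of", "o:acme")
g.relate("a:scheduler", "uses", "t:calendar")
print(g.targets("a:scheduler", "acts_for", Kind.HUMAN))
```

Keeping one node type with a `kind` tag, rather than a separate store per entity type, is what lets an agent, a human, and an organization take part in the same relation queries.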
Agents need to form reliable relationships, organize multi-agent work, exchange value, support an AI economy, and stay safe and accountable under real-world oversight.
Normative/requirements statement in the paper describing necessary capabilities for scaled multi-agent systems; no empirical validation or experimental data provided.
high positive Foundation Protocol: A Coordination Layer for Agentic Societ... requirements for multi-agent operation (reliability of relationships, work organ...
Autonomous agents are moving from tools into a layer of social infrastructure: they browse, purchase, deploy software, manage systems, and increasingly interact with one another.
Statement in the paper's introductory/abstract text presenting an observed trend; conceptual/qualitative claim without empirical data or measured sample.
high positive Foundation Protocol: A Coordination Layer for Agentic Societ... degree of autonomous agent activity across social and economic functions (browsi...
Prior work has demonstrated that people generally find AI narrative explanations to be understandable, trustworthy, and convincing for changing beliefs and opinions.
Citation to prior literature reported in the paper (background literature review claiming general findings about perceptions of AI narrative explanations).
high positive Human Decision-Making with Persuasive and Narrative LLM Expl... perceived understandability/trustworthiness/convincingness of narrative explanat...
Narrative explanations increased reliance on the AI, both when the AI prediction was correct and when it was incorrect.
Findings from the paper's human behavioral experiment reporting increased reliance on AI with accompanying narratives under both correct and incorrect AI prediction conditions.
The development of LLM agents has led to a growing body of work on knowledge-work AI, including coding, research, and healthcare.
Statement grounded in observation of recent literature trends and the cited body of work on LLM agents applied to coding, research, and healthcare domains.
high positive Design and Report Benchmarks for Knowledge Work growth of literature/work on knowledge-work AI enabled by LLM agents in specifie...
These cases show how benchmark design choices shape the strongest work claim a score can support, and where gaps arise between the benchmarked task, tested setting, scored product, and broader work claim.
Qualitative findings from the three case analyses demonstrating how different design choices limit or enable particular work claims and exposing gaps between task, setting, and scored product.
high positive Design and Report Benchmarks for Knowledge Work degree to which benchmark scores can support work claims; identification of gaps...
APEX-SWE [is] a software-engineering benchmark with executable scored products.
Description of the APEX-SWE benchmark in the paper's case analysis.
high positive Design and Report Benchmarks for Knowledge Work nature of APEX-SWE benchmark (software-engineering, executable product scoring)
OfficeQA Pro [is] a grounded document-analysis benchmark scored by final answers.
Description of the OfficeQA Pro benchmark in the paper's case analysis.
high positive Design and Report Benchmarks for Knowledge Work scoring methodology and nature of OfficeQA Pro (grounded document-analysis, fina...
GDPval [is] a non-code occupational deliverable benchmark.
Description of the GDPval benchmark in the paper's case analysis.
high positive Design and Report Benchmarks for Knowledge Work nature of GDPval benchmark (non-code occupational deliverable)
We demonstrate the approach through three benchmark case analyses: GDPval, OfficeQA Pro, and APEX-SWE.
Empirical/methodological demonstration reported in paper via three case analyses of existing benchmarks; the paper applies its three-step approach to each case.
high positive Design and Report Benchmarks for Knowledge Work demonstration of approach via case analyses (number of cases = 3)
To name the work activity being evaluated and distinguish it from common benchmark tasks, we derive an inventory of 18 work activities from the O*NET occupational task database.
Method described in paper: mapping/derivation from the O*NET occupational task database to produce an inventory of 18 work activities.
high positive Design and Report Benchmarks for Knowledge Work inventory size and coverage (18 work activities derived)
We translate these concerns into benchmark design and reporting guidance, covering how tasks should be mapped to work activities, how tested settings should specify materials, tools, roles, and constraints, and how scoring should focus on the work product left by the system.
Paper provides prescriptive guidance derived from conceptual analysis and the reviewed literature; guidance illustrated via application to case benchmarks.
high positive Design and Report Benchmarks for Knowledge Work quality of benchmark design and reporting (alignment with real-world work concer...
We review work studies showing that knowledge work is organized through roles and responsibilities, local materials and tools, and artifacts that must remain usable in downstream workflows.
Literature review of work studies cited in the paper; synthesis of organizational features of knowledge work.
high positive Design and Report Benchmarks for Knowledge Work organizational characteristics of knowledge work (roles, materials, tools, artif...
This paper contributes a three-step approach for making explicit how benchmarked tasks represent the work claims attached to their scores: defining the work activity under evaluation, specifying the tested setting, and scoring the appropriate work product.
Methodological contribution described in paper; approach presented and motivated, and later applied in case analyses (three benchmark case studies).
high positive Design and Report Benchmarks for Knowledge Work quality of benchmark-to-work claim mapping (explicitness of representation)
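The three steps and the reporting guidance (materials, tools, roles, constraints, scored work product) can be treated as a checklist that a benchmark report either fills in or leaves open. The sketch below is a hypothetical record for that checklist; `TestedSetting`, `WorkClaimCard`, and `missing` are illustrative names, not the paper's artifact. The example fills in only what this page states about APEX-SWE (executable scored products) and leaves the rest blank, so the gaps show up explicitly rather than being guessed.

```python
from dataclasses import dataclass, field


@dataclass
class TestedSetting:
    materials: list[str] = field(default_factory=list)
    tools: list[str] = field(default_factory=list)
    roles: list[str] = field(default_factory=list)
    constraints: list[str] = field(default_factory=list)


@dataclass
class WorkClaimCard:
    benchmark: str
    work_activity: str = ""  # step 1: the work activity under evaluation
    setting: TestedSetting = field(default_factory=TestedSetting)  # step 2: tested setting
    scored_product: str = ""  # step 3: the work product the score is computed on

    def missing(self) -> list[str]:
        """Fields a report leaves unspecified, in step order."""
        gaps = []
        if not self.work_activity:
            gaps.append("work_activity")
        for name in ("materials", "tools", "roles", "constraints"):
            if not getattr(self.setting, name):
                gaps.append(f"setting.{name}")
        if not self.scored_product:
            gaps.append("scored_product")
        return gaps


card = WorkClaimCard("APEX-SWE", scored_product="executable software artifact")
print(card.missing())
```

Each unfilled field marks a place where a reported score cannot yet support a broader work claim.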
European AI companies increasingly face differing regulatory expectations across global markets, and European institutions should provide structured support (advisory mechanisms, regulatory guidance, dialogue with partner jurisdictions) to help companies navigate emerging compliance requirements abroad.
Combined descriptive claim and policy recommendation; the text asserts increasing regulatory asymmetry faced by firms but provides no empirical data or firm-level survey evidence.
high positive Governing Relational AI: China’s Regulation of Anthropomorph... need for institutional support for European firms operating under asymmetric reg...
Systematic monitoring of global regulatory developments (for example through foresight functions within the European Commission or the AI Office) would help anticipate regulatory divergence and support future adjustments to European governance frameworks.
Policy recommendation advocating institutional monitoring mechanisms; argumentative justification rather than empirical demonstration in the text.
high positive Governing Relational AI: China’s Regulation of Anthropomorph... implementation of systematic monitoring/foresight functions and their utility in...
European regulators should monitor whether conversational systems begin to assume intermediary or gatekeeping roles within digital ecosystems and consider how existing platform governance frameworks might apply.
Policy recommendation advocating monitoring and potential regulatory application; no empirical study in text demonstrating current gatekeeping behavior.
high positive Governing Relational AI: China’s Regulation of Anthropomorph... regulatory monitoring of intermediary/gatekeeping roles by conversational system...
Risk assessments and auditing standards should explicitly examine interaction design, including engagement optimisation mechanisms, recommendation loops, and other features that may encourage behavioural influence or dependency.
Normative recommendation arguing current frameworks focus mainly on outputs; no empirical evaluation or sample reported.
high positive Governing Relational AI: China’s Regulation of Anthropomorph... inclusion of interaction design elements in risk assessments and audits