The Commonplace

Evidence (6574 claims)

Adoption (8625 claims)
Productivity (7686 claims)
Governance (6917 claims)
Human-AI Collaboration (6574 claims)
Org Design (4189 claims)
Innovation (4131 claims)
Labor Markets (3588 claims)
Skills & Training (2985 claims)
Inequality (2066 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 761 200 101 904 2020
Governance & Regulation 829 400 191 122 1566
Organizational Efficiency 784 193 125 84 1197
Technology Adoption Rate 637 236 124 97 1103
Research Productivity 431 131 58 340 972
Output Quality 481 183 59 47 770
Decision Quality 332 177 82 49 647
Firm Productivity 439 57 88 20 610
AI Safety & Ethics 218 279 66 33 602
Market Structure 181 170 123 24 503
Task Allocation 214 64 72 33 388
Skill Acquisition 174 62 62 17 315
Innovation Output 204 27 45 18 295
Employment Level 105 54 108 13 282
Fiscal & Macroeconomic 132 69 43 26 277
Consumer Welfare 117 63 42 11 233
Firm Revenue 154 48 26 3 231
Task Completion Time 173 31 8 12 225
Inequality Measures 44 123 50 6 223
Worker Satisfaction 89 65 22 12 188
Error Rate 71 92 10 2 175
Regulatory Compliance 77 69 14 5 165
Automation Exposure 58 56 26 13 156
Training Effectiveness 96 21 14 19 152
Wages & Compensation 77 37 25 6 145
Team Performance 86 17 27 10 141
Developer Productivity 95 17 14 6 133
Job Displacement 12 81 21 1 115
Hiring & Recruitment 52 7 8 3 70
Creative Output 32 20 8 3 64
Skill Obsolescence 5 47 6 1 59
Social Protection 28 16 8 2 54
Labor Share of Income 17 19 17 53
Worker Turnover 11 12 3 26
Industry 1 1
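
The direction counts in the matrix can be turned into shares with simple arithmetic. The sketch below is illustrative only: the `Row` class and its method names are hypothetical, and the three rows are copied from the table above. In some rows the Total is larger than the sum of the four direction columns (Firm Productivity: 604 against 610); the sketch assumes, without confirmation from the page, that the difference is claims with no recorded direction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """One row of the evidence matrix (hypothetical helper, not a site API)."""
    outcome: str
    positive: int
    negative: int
    mixed: int
    null: int
    total: int

    def classified(self) -> int:
        # Claims that carry one of the four listed directions.
        return self.positive + self.negative + self.mixed + self.null

    def unlabelled(self) -> int:
        # ASSUMPTION: Total minus the four columns = claims with no direction.
        return self.total - self.classified()

    def positive_share(self) -> float:
        # Share of positive findings among the classified claims.
        return self.positive / self.classified()


# Values copied from the matrix above.
ROWS = [
    Row("Firm Productivity", 439, 57, 88, 20, 610),
    Row("Job Displacement", 12, 81, 21, 1, 115),
    Row("Error Rate", 71, 92, 10, 2, 175),
]

if __name__ == "__main__":
    for row in ROWS:
        print(f"{row.outcome}: {row.positive_share():.1%} positive, "
              f"{row.unlabelled()} without a listed direction")
```

Dividing by the classified count rather than by Total keeps the four shares summing to one even when a row's Total includes claims outside the four columns.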
Filter: Human-AI Collaboration
The platform's delivery algorithm routes each creative to the audience it predicts will engage.
Descriptive claim in paper about algorithmic delivery behavior; likely supported by platform operational details and the motivating discussion.
high positive Algorithm or Creative? A Three-Arm Experimental Design for D... audience routing by delivery algorithm
Online advertising platforms host hundreds of thousands of A/B tests.
Statement in paper (assertion about industry scale); no sample size or citation provided in excerpt.
high positive Algorithm or Creative? A Three-Arm Experimental Design for D... count of A/B tests hosted on platforms
The aim is to keep autonomous agency composable while keeping accountability non-negotiable, so that coordination itself can become shared infrastructure for a human-AI society that is open, pluralistic, and governable.
Stated design/ethical objective in the paper; normative claim about intended social and governance outcomes rather than an empirically validated result.
high positive Foundation Protocol: A Coordination Layer for Agentic Societ... feasibility of composable autonomous agency combined with enforceable accountabi...
FP is designed to wrap and bridge existing protocols rather than replace them, enabling incremental adoption while reducing integration and governance overhead.
Design rationale/claim in the paper about interoperability and incremental adoption strategy; no empirical deployment, integration case studies, or measured overhead reductions presented.
high positive Foundation Protocol: A Coordination Layer for Agentic Societ... ability to interoperate with existing protocols and reduce integration/governanc...
FP treats policy, provenance, and audit as first-class concerns.
Design/architectural claim in the paper stating that policy, provenance, and audit are prioritized within FP; no empirical compliance or audit trials presented.
high positive Foundation Protocol: A Coordination Layer for Agentic Societ... integration of policy, provenance, and audit mechanisms into the protocol
FP provides economic primitives for metering, receipts, and settlement.
Design claim in the paper listing economic primitives as part of FP; no deployment or economic experiments reported.
high positive Foundation Protocol: A Coordination Layer for Agentic Societ... availability of built-in primitives for metering usage, issuing receipts, and pe...
FP supports native multi-party organization and event-based collaboration.
Feature/architecture claim in the paper describing native support for multi-party organization and event-driven collaboration; no empirical evaluation or user studies provided.
high positive Foundation Protocol: A Coordination Layer for Agentic Societ... support for multi-party organizational constructs and event-based collaboration ...
FP unifies heterogeneous entities, including agents, tools, resources, humans, institutions, and organizations.
Design specification/feature claim in the paper describing FP's data and entity model; no empirical interoperability study reported.
high positive Foundation Protocol: A Coordination Layer for Agentic Societ... ability to represent and integrate diverse entity types within the protocol
This paper introduces the Foundation Protocol (FP), a graph-first coordination layer for an emerging human-AI society.
Claim of authorship/introduction in the paper; architectural/design proposal rather than an evaluated system.
high positive Foundation Protocol: A Coordination Layer for Agentic Societ... existence of a proposed coordination layer (Foundation Protocol)
Agents need to form reliable relationships, organize multi-agent work, exchange value, support an AI economy, and stay safe and accountable under real-world oversight.
Normative/requirements statement in the paper describing necessary capabilities for scaled multi-agent systems; no empirical validation or experimental data provided.
high positive Foundation Protocol: A Coordination Layer for Agentic Societ... requirements for multi-agent operation (reliability of relationships, work organ...
Autonomous agents are moving from tools into a layer of social infrastructure: they browse, purchase, deploy software, manage systems, and increasingly interact with one another.
Statement in the paper's introductory/abstract text presenting an observed trend; conceptual/qualitative claim without empirical data or measured sample.
high positive Foundation Protocol: A Coordination Layer for Agentic Societ... degree of autonomous agent activity across social and economic functions (browsi...
Prior work has demonstrated that people generally find AI narrative explanations to be understandable, trustworthy, and convincing for changing beliefs and opinions.
Citation to prior literature reported in the paper (background literature review claiming general findings about perceptions of AI narrative explanations).
high positive Human Decision-Making with Persuasive and Narrative LLM Expl... perceived understandability/trustworthiness/convincingness of narrative explanat...
Narrative explanations increased reliance on the AI, both when the AI prediction was correct and when it was incorrect.
Findings from the paper's human behavioral experiment reporting increased reliance on AI with accompanying narratives under both correct and incorrect AI prediction conditions.
The development of LLM agents has led to a growing body of work on knowledge-work AI, including coding, research, and healthcare.
Statement grounded in observation of recent literature trends and the cited body of work on LLM agents applied to coding, research, and healthcare domains.
high positive Design and Report Benchmarks for Knowledge Work growth of literature/work on knowledge-work AI enabled by LLM agents in specifie...
These cases show how benchmark design choices shape the strongest work claim a score can support, and where gaps arise between the benchmarked task, tested setting, scored product, and broader work claim.
Qualitative findings from the three case analyses demonstrating how different design choices limit or enable particular work claims and exposing gaps between task, setting, and scored product.
high positive Design and Report Benchmarks for Knowledge Work degree to which benchmark scores can support work claims; identification of gaps...
APEX-SWE [is] a software-engineering benchmark with executable scored products.
Description of the APEX-SWE benchmark in the paper's case analysis.
high positive Design and Report Benchmarks for Knowledge Work nature of APEX-SWE benchmark (software-engineering, executable product scoring)
OfficeQA Pro [is] a grounded document-analysis benchmark scored by final answers.
Description of the OfficeQA Pro benchmark in the paper's case analysis.
high positive Design and Report Benchmarks for Knowledge Work scoring methodology and nature of OfficeQA Pro (grounded document-analysis, fina...
GDPval [is] a non-code occupational deliverable benchmark.
Description of the GDPval benchmark in the paper's case analysis.
high positive Design and Report Benchmarks for Knowledge Work nature of GDPval benchmark (non-code occupational deliverable)
We demonstrate the approach through three benchmark case analyses: GDPval, OfficeQA Pro, and APEX-SWE.
Empirical/methodological demonstration reported in paper via three case analyses of existing benchmarks; the paper applies its three-step approach to each case.
high positive Design and Report Benchmarks for Knowledge Work demonstration of approach via case analyses (number of cases = 3)
To name the work activity being evaluated and distinguish it from common benchmark tasks, we derive an inventory of 18 work activities from the O*NET occupational task database.
Method described in paper: mapping/derivation from the O*NET occupational task database to produce an inventory of 18 work activities.
high positive Design and Report Benchmarks for Knowledge Work inventory size and coverage (18 work activities derived)
We translate these concerns into benchmark design and reporting guidance, covering how tasks should be mapped to work activities, how tested settings should specify materials, tools, roles, and constraints, and how scoring should focus on the work product left by the system.
Paper provides prescriptive guidance derived from conceptual analysis and the reviewed literature; guidance illustrated via application to case benchmarks.
high positive Design and Report Benchmarks for Knowledge Work quality of benchmark design and reporting (alignment with real-world work concer...
We review work studies showing that knowledge work is organized through roles and responsibilities, local materials and tools, and artifacts that must remain usable in downstream workflows.
Literature review of work studies cited in the paper; synthesis of organizational features of knowledge work.
high positive Design and Report Benchmarks for Knowledge Work organizational characteristics of knowledge work (roles, materials, tools, artif...
This paper contributes a three-step approach for making explicit how benchmarked tasks represent the work claims attached to their scores: defining the work activity under evaluation, specifying the tested setting, and scoring the appropriate work product.
Methodological contribution described in paper; approach presented and motivated, and later applied in case analyses (three benchmark case studies).
high positive Design and Report Benchmarks for Knowledge Work quality of benchmark-to-work claim mapping (explicitness of representation)
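
The three steps named in this claim (work activity, tested setting, scored work product) can be pictured as a small reporting record. This is a minimal sketch and not the paper's own schema: the class names `SettingSpec` and `BenchmarkReport`, the `missing_fields` check, and the example values are all hypothetical. The four setting fields follow the guidance quoted above (materials, tools, roles, constraints).

```python
from dataclasses import dataclass, field


@dataclass
class SettingSpec:
    """Tested setting: materials, tools, roles, constraints (hypothetical schema)."""
    materials: list[str] = field(default_factory=list)
    tools: list[str] = field(default_factory=list)
    roles: list[str] = field(default_factory=list)
    constraints: list[str] = field(default_factory=list)


@dataclass
class BenchmarkReport:
    """One benchmark described along the three steps (hypothetical schema)."""
    work_activity: str      # step 1: the work activity under evaluation
    setting: SettingSpec    # step 2: the tested setting
    scored_product: str     # step 3: the work product that is scored

    def missing_fields(self) -> list[str]:
        # List every part of the report that was left empty.
        missing = []
        if not self.work_activity.strip():
            missing.append("work_activity")
        for name in ("materials", "tools", "roles", "constraints"):
            if not getattr(self.setting, name):
                missing.append(f"setting.{name}")
        if not self.scored_product.strip():
            missing.append("scored_product")
        return missing


# Placeholder values for illustration; not taken from any real benchmark.
report = BenchmarkReport(
    work_activity="drafting a summary memo",
    setting=SettingSpec(
        materials=["source documents"],
        tools=["spreadsheet"],
        roles=[],  # left empty on purpose
        constraints=["time limit"],
    ),
    scored_product="final memo",
)

if __name__ == "__main__":
    print(report.missing_fields())
```

A report with an empty list from `missing_fields()` states all three steps explicitly; any entry in the list marks a gap between the benchmarked task and the work claim attached to its score.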
European AI companies increasingly face differing regulatory expectations across global markets, and European institutions should provide structured support (advisory mechanisms, regulatory guidance, dialogue with partner jurisdictions) to help companies navigate emerging compliance requirements abroad.
Combined descriptive claim and policy recommendation; the text asserts increasing regulatory asymmetry faced by firms but provides no empirical data or firm-level survey evidence.
high positive Governing Relational AI: China’s Regulation of Anthropomorph... need for institutional support for European firms operating under asymmetric reg...
Systematic monitoring of global regulatory developments (for example through foresight functions within the European Commission or the AI Office) would help anticipate regulatory divergence and support future adjustments to European governance frameworks.
Policy recommendation advocating institutional monitoring mechanisms; argumentative justification rather than empirical demonstration in the text.
high positive Governing Relational AI: China’s Regulation of Anthropomorph... implementation of systematic monitoring/foresight functions and their utility in...
European regulators should monitor whether conversational systems begin to assume intermediary or gatekeeping roles within digital ecosystems and consider how existing platform governance frameworks might apply.
Policy recommendation advocating monitoring and potential regulatory application; no empirical study in text demonstrating current gatekeeping behavior.
high positive Governing Relational AI: China’s Regulation of Anthropomorph... regulatory monitoring of intermediary/gatekeeping roles by conversational system...
Risk assessments and auditing standards should explicitly examine interaction design, including engagement optimisation mechanisms, recommendation loops, and other features that may encourage behavioural influence or dependency.
Normative recommendation arguing current frameworks focus mainly on outputs; no empirical evaluation or sample reported.
high positive Governing Relational AI: China’s Regulation of Anthropomorph... inclusion of interaction design elements in risk assessments and audits
European institutions (in particular the European AI Office) should issue guidance on how systems designed for sustained social or emotional interaction should be assessed in the implementation of the AI Act.
Policy recommendation contained in the text; prescriptive argument rather than an empirical finding; no supporting data or empirical evaluation provided.
high positive Governing Relational AI: China’s Regulation of Anthropomorph... issuance of regulatory guidance by European institutions
Existing regulatory frameworks will need to consider risks that arise not only from system outputs but also from longer-term patterns of human–AI interaction.
Normative recommendation based on the document's argument that conversational AI generates risks through sustained interaction; no empirical method or data reported.
high positive Governing Relational AI: China’s Regulation of Anthropomorph... scope of regulatory risk assessment (outputs vs. long-term interaction patterns)
The paper proposes five evaluation dimensions for AutoResearch systems: novelty, validity, impact, reliability, and provenance.
Paper explicitly proposes these five dimensions as an evaluation rubric; conceptual proposal.
high positive AutoResearch AI: Towards AI-Powered Research Automation for ... n/a (evaluation framework)
The field can be organized around five workflow conditions: literature and research grounding; hypothesis formation and planning; experimentation and tool use; feedback, validation, and review; and reporting and knowledge communication.
Authors propose this five-condition organizational framework as part of their survey and synthesis; conceptual contribution.
high positive AutoResearch AI: Towards AI-Powered Research Automation for ... n/a (framework/organizational taxonomy)
Vibe Research denotes the human-steered region of prompt-based assistance and human-verified execution within AutoResearch.
Paper-introduced terminology and conceptual delineation of a sub-region of the AutoResearch spectrum; definitional statement.
high positive AutoResearch AI: Towards AI-Powered Research Automation for ... n/a (terminology/definition)
AutoResearch is defined as the developmental spectrum of AI-powered scientific workflow automation.
Paper provides an explicit definitional framing (terminology introduced by authors); conceptual contribution rather than empirical finding.
high positive AutoResearch AI: Towards AI-Powered Research Automation for ... n/a (terminology/definition)
This shift marks a transition from task-level AI for science to workflow-level research automation.
Conceptual argument backed by literature survey and examples of systems that coordinate multiple research tasks; no single quantitative study reported.
high positive AutoResearch AI: Towards AI-Powered Research Automation for ... degree of automation along research workflows (task-level vs workflow-level)
Scientific research is being reshaped by AI systems that move beyond isolated assistance toward longer-horizon workflows spanning literature grounding, hypothesis generation, experimentation, validation, reporting, and revision.
Survey / conceptual synthesis of recent AI research systems and literature; paper presents this as an observed trend rather than reporting original empirical measurements.
high positive AutoResearch AI: Towards AI-Powered Research Automation for ... extent of AI integration across research workflows (literature grounding, hypoth...
The study advances multilevel propositions and outlines a research agenda for examining legitimacy in hybrid human–AI decision systems.
Paper presents multilevel theoretical propositions and a suggested agenda for future empirical research (conceptual contribution; no empirical validation reported).
high positive Decision Legitimacy in AI-Enabled Organizations: A Multileve... presence of multilevel propositions and proposed research directions
Human judgment remains essential for contextual interpretation and accountability in hybrid human–AI decision systems.
Conceptual claim advanced through theoretical argumentation and literature references in the paper (no empirical sample reported).
high positive Decision Legitimacy in AI-Enabled Organizations: A Multileve... role of human judgment in contextual interpretation and accountability
Legitimacy of AI-enabled decisions depends on transparency, explainability, and perceived fairness.
Conceptual argument and literature synthesis in the paper emphasizing transparency, explainability, and fairness as determinants (no empirical sample reported).
high positive Decision Legitimacy in AI-Enabled Organizations: A Multileve... decision legitimacy as a function of transparency, explainability, perceived fai...
AI enhances efficiency and consistency in organizational decision-making.
Theoretical claim supported by referenced literature and conceptual argumentation within the paper (no empirical test or sample reported).
high positive Decision Legitimacy in AI-Enabled Organizations: A Multileve... efficiency and consistency of decisions
Procedural, distributive, and cognitive legitimacy are key dimensions of decision legitimacy in AI-enabled organizations.
Conceptual development in the paper drawing on institutional theory, socio-technical systems, and behavioral decision-making; literature synthesis and theoretical argumentation (no empirical sample reported).
high positive Decision Legitimacy in AI-Enabled Organizations: A Multileve... procedural legitimacy; distributive legitimacy; cognitive legitimacy
Together, the capability profile and the jaggedness measure give a deployment-relevant diagnostic that the overall ranking alone cannot provide.
Argument supported by observed cases in the experiments where models with similar overall ranks differed on capability axes and jaggedness, implying additional diagnostic value.
high positive GENSTRAT: Toward a Science of Strategic Reasoning in Large L... diagnostic usefulness for deployment decisions
Newer frontier-tier models score higher on average.
Aggregate results from the head-to-head tournament comparing nine models across sampled games (>36k matches).
high positive GENSTRAT: Toward a Science of Strategic Reasoning in Large L... average model score / overall strength
We introduce a jaggedness measure of within-distribution smoothness that detects when a model's advantage jumps unpredictably between strategically similar games.
Methodological contribution described in paper (jaggedness metric).
high positive GENSTRAT: Toward a Science of Strategic Reasoning in Large L... within-distribution smoothness / local volatility (jaggedness)
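
The listing does not give the paper's formula for jaggedness, so the following is only an illustrative stand-in for the idea of "advantage jumping between strategically similar games". It assumes the games have already been ordered so that neighbours in the list are similar, and it uses the mean absolute change in advantage between neighbours. The function name and the sample numbers are hypothetical.

```python
def jaggedness(advantages: list[float]) -> float:
    """Mean absolute change in advantage between adjacent games.

    ASSUMPTION: `advantages` is ordered so that adjacent entries come from
    strategically similar games. This is a stand-in measure, not the
    definition used in the GENSTRAT paper.
    """
    if len(advantages) < 2:
        return 0.0
    diffs = [abs(b - a) for a, b in zip(advantages, advantages[1:])]
    return sum(diffs) / len(diffs)


# Made-up advantage profiles for two models over four neighbouring games.
smooth_model = [0.10, 0.12, 0.11, 0.13]
jagged_model = [0.10, -0.30, 0.40, -0.20]

if __name__ == "__main__":
    print(f"smooth: {jaggedness(smooth_model):.3f}")
    print(f"jagged: {jaggedness(jagged_model):.3f}")
```

Under this stand-in, a model whose advantage is steady across neighbouring games scores near zero, while one whose advantage swings sign from game to game scores high, even if the two have similar averages.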
We pair the game distribution with a capability-profile methodology that decomposes model competence across six axes (state space, temporal depth, information sensitivity, opponent modeling, risk, and brittleness).
Methodological description in paper introducing the capability-profile decomposition.
high positive GENSTRAT: Toward a Science of Strategic Reasoning in Large L... decomposed capability profile across six axes
The generator can draw fresh games on demand, allowing for evergreen evaluation and resistance to contamination.
Method claim about generator capability described in the paper.
high positive GENSTRAT: Toward a Science of Strategic Reasoning in Large L... freshness/resistance-to-contamination of benchmarks
We introduce GENSTRAT, which uses procedurally generated strategic environments to address the limitations of fixed benchmarks.
Methodological contribution described in paper: design and implementation of GENSTRAT.
high positive GENSTRAT: Toward a Science of Strategic Reasoning in Large L... availability of procedurally generated strategic environments for evaluation
Large language models (LLMs) are increasingly deployed as economic agents in marketplaces, auctions, and bidding settings.
Introductory statement in the paper situating motivation; no empirical data reported in the abstract to quantify the increase.
high positive GENSTRAT: Toward a Science of Strategic Reasoning in Large L... deployment of LLMs as economic agents
We propose efforts that individuals and leaders can take to support their colleagues through AI transformation while preserving healthy company cultures that support diverse thinking, collaboration, and informal interactions.
Authors' prescriptive recommendations derived from interview insights; recommendations are not empirically validated in the study.
high positive Beyond the Org Chart: AI and the Transformation of Invisible... leadership and individual practices to preserve culture during AI adoption
We propose steps that AI companies can take to make the invisible work more visible.
Authors' normative recommendations based on synthesis of the qualitative interview findings; not empirically tested within the paper.
high positive Beyond the Org Chart: AI and the Transformation of Invisible... organizational practices to surface invisible work
Some of these changes are positive, such as smoother collaboration between peers.
Interviewee accounts from the 24-participant qualitative study reporting perceived improvements in peer collaboration due to AI tools.
high positive Beyond the Org Chart: AI and the Transformation of Invisible... peer collaboration / team coordination