The Commonplace

Evidence (6491 claims)

Adoption
8570 claims
Productivity
7631 claims
Governance
6869 claims
Human-AI Collaboration
6491 claims
Org Design
4175 claims
Innovation
4114 claims
Labor Markets
3566 claims
Skills & Training
2966 claims
Inequality
2066 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 758 199 100 900 2007
Governance & Regulation 826 400 191 122 1563
Organizational Efficiency 777 193 124 84 1189
Technology Adoption Rate 635 233 124 97 1098
Research Productivity 422 128 57 336 954
Output Quality 476 179 59 47 761
Decision Quality 328 177 81 47 640
Firm Productivity 435 57 88 20 606
AI Safety & Ethics 218 277 65 33 599
Market Structure 180 170 123 24 502
Task Allocation 213 64 72 33 387
Skill Acquisition 170 61 61 17 309
Innovation Output 203 27 43 18 292
Employment Level 105 54 107 13 281
Fiscal & Macroeconomic 131 69 43 26 276
Consumer Welfare 117 63 42 11 233
Firm Revenue 153 48 26 3 230
Task Completion Time 173 31 8 12 225
Inequality Measures 44 122 49 6 221
Worker Satisfaction 89 65 22 12 188
Error Rate 69 92 10 2 173
Regulatory Compliance 77 69 14 5 165
Automation Exposure 56 56 26 13 154
Training Effectiveness 94 21 13 19 149
Wages & Compensation 77 36 25 6 144
Team Performance 86 17 27 10 141
Developer Productivity 95 17 14 6 133
Job Displacement 12 80 20 1 113
Hiring & Recruitment 52 7 8 3 70
Creative Output 31 18 8 3 61
Skill Obsolescence 5 46 6 1 58
Social Protection 27 16 8 2 53
Labor Share of Income 17 19 17 53
Worker Turnover 11 12 3 26
Industry 1 1
Filter: Human-AI Collaboration
This reframes the question from whether the model can think to whether the human-AI system can reason.
Conceptual reframing stated in the paper; no empirical evidence required as it is a change of perspective.
high positive Governing Reflective Human-AI Collaboration: A Framework for... system_level_reasoning_evaluation (human-AI system reasoning instead of model-on...
We introduce 'The Architect's Pen' as a practical method where the human uses the model as an external medium for structured reflection by embedding phases of articulation, critique, and revision into human-AI interaction.
Method description / practical proposal included in the paper; no experimental evaluation, user study, or quantitative validation reported.
high positive Governing Reflective Human-AI Collaboration: A Framework for... structured_reflection_via_interaction_protocol (articulation/critique/revision l...
This perspective emphasizes collaborative intelligence, combining human judgment and contextual understanding with machine speed, memory, and associative capacity.
Theoretical claim about complementary strengths of humans and models within the proposed framework; presented without empirical tests.
high positive Governing Reflective Human-AI Collaboration: A Framework for... collaborative_intelligence (integration of human judgment and machine capabiliti...
Building on recent work on 'System-2' learning, reflective reasoning can be relocated to the interaction layer and framed as a cognitive protocol that can be structured, measured, and governed using existing systems.
Conceptual extension of prior literature ('System-2' learning) into an interaction-layer protocol; no empirical protocol testing or measurement evidence provided.
high positive Governing Reflective Human-AI Collaboration: A Framework for... measurability_and_governability_of_reasoning (via interaction protocols)
Reasoning should be treated as a relational process distributed between human and model rather than an internal capability of either.
Methodological proposal / theoretical framing presented by the authors; no empirical validation reported.
high positive Governing Reflective Human-AI Collaboration: A Framework for... system_level_reasoning_capability (human-AI distributed reasoning)
Large language models have advanced rapidly, from pattern recognition to emerging forms of reasoning.
Stated as an observational claim in the paper's introduction; no empirical evaluation or dataset provided.
high positive Governing Reflective Human-AI Collaboration: A Framework for... model_capability (advancement from pattern recognition to emerging reasoning)
This approach aligns with emerging compliance expectations, including the EU AI Act and ISO/IEC 42001, by making reasoning processes traceable under real conditions of use.
Claim of regulatory alignment made by the authors; presented as an interpretive argument about law and standards rather than supported by empirical analysis or legal review data in this excerpt.
high positive The Missing Knowledge Layer in AI: A Framework for Stable Hu... alignment with regulatory/compliance requirements (traceability of reasoning)
Stabilising interaction makes uncertainty and drift visible before enforcement is applied, enabling more precise capability governance.
Normative/operational claim in the paper about the anticipated effect of the proposed interventions; no empirical test or measurement reported in this excerpt.
high positive The Missing Knowledge Layer in AI: A Framework for Stable Hu... visibility of uncertainty/drift and precision of capability governance
Together, these layers form a missing operational substrate for governance by increasing signal-to-noise at the point of use.
Argumentative claim from the paper proposing that the combined interventions improve the information available at the decision point; no empirical validation or sample size provided here.
high positive The Missing Knowledge Layer in AI: A Framework for Stable Hu... signal-to-noise ratio of reasoning outputs at point of use (informational qualit...
This paper is the first in a five-paper research series on stabilising human-AI reasoning that proposes a two-layer approach: Parts II–IV introduce human-side mechanisms (uncertainty cues, conflict surfacing, auditable reasoning traces) and Part V develops a model-side Epistemic Control Loop (ECL) that detects instability and modulates generation.
Descriptive claim about the structure and scope of the paper series as stated by the authors; internal to the publication (no external dataset).
high positive The Missing Knowledge Layer in AI: A Framework for Stable Hu... proposal of methodological architecture for stabilising human-AI reasoning
Large language models are increasingly integrated into decision-making in areas such as healthcare, law, finance, engineering, and government.
Statement in the paper describing an observed adoption trend; no empirical dataset, sample size, or quantitative analysis reported in the text.
high positive The Missing Knowledge Layer in AI: A Framework for Stable Hu... integration/adoption of LLMs into decision-making
For settings with multiple interventions, a tractable approximation that prioritizes interventions based on the magnitude of the policy-value discrepancy is effective.
Proposed algorithm/approximation in the paper (methodological contribution); evaluated empirically in simulations and experiments described in the paper.
high positive Improving Human Performance with Value-Aware Interventions: ... effectiveness of intervention prioritization under intervention budget constrain...
In the single-intervention regime, the optimal strategy is to recommend the action that maximizes the human value function.
Theoretical result derived in the paper within a Markov decision process model for single-intervention settings.
high positive Improving Human Performance with Value-Aware Interventions: ... optimality of single-intervention recommendation (maximizing human value functio...
Policy-value inconsistencies naturally identify opportunities for intervention.
Analytical/formal argument within a Markov decision process framework showing that when human policy-value consistency fails, discrepancies indicate intervention opportunities.
high positive Improving Human Performance with Value-Aware Interventions: ... identification of states/actions where intervention is beneficial (policy-value ...
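
The three claims above can be illustrated with a minimal sketch. It assumes a tabular setting in which the human's own action values and the human's chosen actions are known; the names `q_human`, `policy_human`, `discrepancy`, and `prioritize` are hypothetical, and the gap used here (best human-valued action minus the value of the chosen action) is one plausible reading of "policy-value discrepancy", not necessarily the paper's exact definition.

```python
from typing import Dict, List, Tuple

# Hypothetical toy inputs: q_human[state][action] is the human's own value
# estimate, and policy_human[state] is the action the human would actually take.
QTable = Dict[str, Dict[str, float]]


def discrepancy(q_human: QTable, policy_human: Dict[str, str], state: str) -> float:
    """Gap between the best action under the human's values and the chosen action."""
    values = q_human[state]
    return max(values.values()) - values[policy_human[state]]


def prioritize(
    q_human: QTable, policy_human: Dict[str, str], budget: int
) -> List[Tuple[str, str, float]]:
    """Return up to `budget` (state, recommended_action, gap) triples, largest gap first."""
    ranked = []
    for state, values in q_human.items():
        gap = discrepancy(q_human, policy_human, state)
        if gap > 0:  # the human's policy disagrees with the human's own values here
            best_action = max(values, key=values.get)
            ranked.append((state, best_action, gap))
    ranked.sort(key=lambda item: item[2], reverse=True)
    return ranked[:budget]


q_human = {
    "s0": {"left": 1.0, "right": 3.0},
    "s1": {"left": 2.0, "right": 2.5},
    "s2": {"left": 4.0, "right": 1.0},
}
policy_human = {"s0": "left", "s1": "left", "s2": "left"}

# With a budget of two interventions, s0 (gap 2.0) is served before s1 (gap 0.5);
# s2 needs no intervention because the human already picks the best-valued action.
plan = prioritize(q_human, policy_human, budget=2)
```

With `budget=1` the same function reduces to the single-intervention case: recommend the action that maximizes the human value function in the state with the largest gap.
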
These cooperation mechanisms become more effective under evolutionary pressures to maximize individual payoffs.
Authors report results from experiments or simulations applying evolutionary-pressure dynamics (selection for payoff-maximizing agents) and observing increased effectiveness of mechanisms; no numeric results or sample sizes in excerpt.
high positive CoopEval: Benchmarking Cooperation-Sustaining Mechanisms and... mechanism effectiveness (cooperation outcomes) under evolutionary pressure
Contracting and mediation are most effective in achieving cooperative outcomes between capable LLMs.
Empirical results from the authors' experiments across four social dilemmas comparing mechanism performance; specifics (which models, quantitative cooperation rates) are not included in the excerpt.
high positive CoopEval: Benchmarking Cooperation-Sustaining Mechanisms and... effectiveness of mechanisms at producing cooperative outcomes
Continuous learning and diversity of ideas are essential if AI is to play a meaningful role in original scientific discovery.
Normative/conditional claim supported by conceptual reasoning in the article; no empirical evidence or measured sample provided.
high positive The Agentification of Scientific Research: A Physicist's Per... AI's effectiveness in contributing to original scientific discovery
AI is likely to fundamentally reshape scientific publication.
Author's argument and discussion of implications for publishing and evaluation; no reported empirical study.
high positive The Agentification of Scientific Research: A Physicist's Per... structure and practice of scientific publication
There is a gradual path from AI as a research tool to AI as a scientific collaborator.
Narrative/theoretical progression outlined in the article; conceptual roadmap rather than empirical demonstration.
high positive The Agentification of Scientific Research: A Physicist's Per... role of AI in research from tool to collaborator
AI for Science is especially important because it may transform not only the efficiency of research, but also the structure of scientific collaboration, discovery, publishing, and evaluation.
Argumentative/theoretical analysis in the article; forward-looking claim without reported empirical data or experimental sample.
high positive The Agentification of Scientific Research: A Physicist's Per... efficiency of research and the structure of scientific collaboration, discovery,...
The most important significance of the AI revolution, especially the rise of large language models, lies not simply in automation, but in a fundamental change in how complex information and human know-how are carried, replicated, and shared.
Conceptual argument presented in the article (theoretical/essayistic reasoning); no empirical sample or quantitative study reported.
high positive The Agentification of Scientific Research: A Physicist's Per... how complex information and human know-how are carried, replicated, and shared
The paper proposes a conceptual framework of the underlying mechanisms of the LLM fallacy and a typology of its manifestations across computational, linguistic, analytical, and creative domains.
Author(s) contribution described in the paper (framework and typology); no empirical testing reported in the abstract.
high positive The LLM Fallacy: Misattribution in AI-Assisted Cognitive Wor... formal framework and typology coverage across domains
The rapid integration of large language models (LLMs) into everyday workflows has transformed how individuals perform cognitive tasks such as writing, programming, analysis, and multilingual communication.
Author(s) assertion based on literature review and conceptual overview; no empirical sample or experiment reported in the abstract.
high positive The LLM Fallacy: Misattribution in AI-Assisted Cognitive Wor... how individuals perform cognitive tasks (writing, programming, analysis, multili...
A hybrid AI-human sprint planning framework should assign algorithmic tools to estimation and backlog formatting while mandating human deliberation for risk assessment and ambiguity resolution.
Theoretical framework proposed by the authors, motivated by the experimental findings (trade-offs observed between efficiency and risk capture/rework) and qualitative analysis.
high positive Cognitive Offloading in Agile Teams: How Artificial Intellig... task allocation between AI and humans / recommended planning process
Human-only planning excels at adaptability.
Controlled experiment comparing human-only, AI-only, and hybrid models; qualitative indicators of planning robustness and adaptability showed superior adaptability for human-only planning.
high positive Cognitive Offloading in Agile Teams: How Artificial Intellig... adaptability / planning robustness
AI-only planning minimizes time and cost.
Controlled, three-condition experiment (AI-only, human-only, hybrid) conducted on a live client deliverable at a mid-sized digital agency; quantitative metrics included time and cost measures (reported alongside estimation accuracy, rework rates, and scope change recovery time).
The bounded-autonomy architecture is a practical, deployed approach for making imperfect language models operationally useful in enterprise systems.
Deployment and reported performance in the described multi-tenant enterprise application evaluation (completion rates, safety interceptions, speedups); the paper synthesizes these empirical results to support the practical claim.
high positive Bounded Autonomy for Enterprise AI: Typed Action Contracts a... operational usefulness of LLMs in enterprise context
The enterprise application remains the source of truth for business logic and authorization, while the orchestration engine operates over an explicit published actions manifest.
Architectural proposal and implementation details described in the paper; asserted as part of the bounded-autonomy design deployed in the enterprise application.
high positive Bounded Autonomy for Enterprise AI: Typed Action Contracts a... system design property (source-of-truth and orchestration behavior)
Several safety properties are structurally enforced by code and intercepted all targeted violations regardless of model output.
Design and deployment of bounded-autonomy architecture with typed action contracts, permission-aware capability exposure, scoped context, validation before side effects, and consumer-side execution boundaries; empirical claim that these code-enforced properties intercepted targeted violations during evaluation.
high positive Bounded Autonomy for Enterprise AI: Typed Action Contracts a... interception of targeted violations / enforcement of safety properties
Both AI conditions delivered 13–18x speedup over manual operation.
Timing/performance comparison across the three experimental conditions (manual operation, unconstrained AI, full bounded autonomy) within the deployed evaluation; reported speedup range 13–18x relative to manual operation.
high positive Bounded Autonomy for Enterprise AI: Typed Action Contracts a... task completion time (speedup vs. manual)
The bounded-autonomy system completed 23 of 25 tasks with zero unsafe executions.
Evaluation in a deployed multi-tenant enterprise application across 25 scenario trials spanning seven failure families; comparison across three conditions (manual, unconstrained AI with safety layers disabled, full bounded autonomy).
high positive Bounded Autonomy for Enterprise AI: Typed Action Contracts a... tasks completed / unsafe executions
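
The design properties named in these entries (typed action contracts, permission-aware capability exposure, validation before side effects) can be sketched as follows. This is an illustrative toy, not the paper's implementation: `ActionContract`, `Rejected`, `exposed_actions`, `execute`, and the `refund` action are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Set


@dataclass(frozen=True)
class ActionContract:
    """Hypothetical manifest entry: name, typed parameters, required permission."""
    name: str
    params: Dict[str, type]
    permission: str
    handler: Callable[..., Any]


class Rejected(Exception):
    """Raised when a proposed action fails a check; nothing has been executed."""


def exposed_actions(manifest: Dict[str, ActionContract], user_permissions: Set[str]):
    """Permission-aware exposure: the model only sees actions the user may run."""
    return {n: c for n, c in manifest.items() if c.permission in user_permissions}


def execute(manifest: Dict[str, ActionContract], user_permissions: Set[str], proposal: dict):
    """Validate a model-proposed action against its contract, then run it."""
    contract = manifest.get(proposal.get("action"))
    if contract is None:
        raise Rejected("unknown action")
    if contract.permission not in user_permissions:
        raise Rejected("permission denied")
    args = proposal.get("args", {})
    if set(args) != set(contract.params):
        raise Rejected("unexpected or missing parameters")
    for key, expected in contract.params.items():
        if not isinstance(args[key], expected):
            raise Rejected(f"{key} must be {expected.__name__}")
    return contract.handler(**args)  # the side effect happens only after all checks


ledger = []  # stands in for the enterprise application's own state
manifest = {
    "refund": ActionContract(
        name="refund",
        params={"order_id": str, "amount": float},
        permission="billing:write",
        handler=lambda order_id, amount: ledger.append((order_id, amount)) or "ok",
    ),
}

result = execute(
    manifest,
    {"billing:write"},
    {"action": "refund", "args": {"order_id": "A1", "amount": 5.0}},
)
```

Because every check runs in ordinary code before the handler is called, a malformed or unauthorized proposal is rejected regardless of what the model produced, which is the sense in which such properties are structurally enforced.
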
Overall, GAI provides a principled and scalable approach to integrating AI-generated information.
Summary claim in the abstract based on the combination of the theoretical properties and empirical results reported in the paper.
high positive Generative Augmented Inference scalability and principled integration of AI-generated information
Across applications, GAI improves confidence interval coverage without inflating width.
Empirical claim reported across the multiple application studies in the paper (the abstract states that CI coverage improves without inflating width); the quantitative analysis is presumably in the main text or appendix.
high positive Generative Augmented Inference confidence interval coverage and width (statistical inference quality)
In health insurance choice, GAI cuts labeling requirements by over 90% while maintaining decision accuracy.
Reported empirical result from the paper's health insurance choice experiment; the abstract gives the >90% reduction claim but does not include sample size or exact metrics.
high positive Generative Augmented Inference human labeling requirements; decision accuracy
In retail pricing, where all methods access the same auxiliary inputs, GAI consistently outperforms alternative estimators, highlighting the value of its construction rather than differences in information.
Empirical experiment in a retail pricing application comparing multiple estimators given identical auxiliary inputs; stated as consistent outperformance in the abstract (no numerical effect sizes or sample sizes provided there).
high positive Generative Augmented Inference estimator performance in retail pricing (e.g., predictive or decision accuracy /...
In conjoint analysis with weak auxiliary signals, GAI reduces estimation error by about 50% and lowers human labeling requirements by over 75%.
Reported empirical result from the paper's conjoint analysis experiment(s); exact sample size and experimental details are not stated in the abstract.
high positive Generative Augmented Inference estimation error; human labeling requirements
Empirically, GAI outperforms benchmarks across diverse settings.
Empirical experiments reported across multiple application settings (conjoint analysis, retail pricing, health insurance choice) comparing GAI to alternative estimators/benchmarks.
high positive Generative Augmented Inference overall performance relative to benchmarks (estimation error / predictive perfor...
The authors establish asymptotic normality for the GAI estimator and show a 'safe default' property: relative to human-data-only estimators, GAI weakly improves estimation efficiency under arbitrary auxiliary signals and yields strict gains whenever the auxiliary information is predictive.
The paper claims formal theoretical results (asymptotic normality and efficiency comparisons) — supported by analytic derivations/proofs in the manuscript as referenced in the abstract.
high positive Generative Augmented Inference estimation efficiency (asymptotic variance / efficiency relative to baseline)
GAI uses an orthogonal moment construction that enables consistent estimation and valid inference with a flexible, nonparametric relationship between LLM-generated outputs and human labels.
The paper presents a methodological proposal (Generative Augmented Inference) and states theoretical properties (orthogonal moment construction, consistency, valid inference) — supported by formal asymptotic analysis/proofs in the paper (the abstract references establishing asymptotic normality).
high positive Generative Augmented Inference consistent estimation and valid inference (statistical estimation properties)
This work takes a foundational step toward dignified human-AI interaction futures by balancing productivity with the preservation of human expertise.
Author-stated contribution and goal of the paper (conceptual + empirical work). Abstract claims contribution but does not present quantified validation of 'foundational' status.
high positive From Future of Work to Future of Workers: Addressing Asympto... balance between productivity and preservation of expertise
AI delivers initial operational/productivity gains in high-stakes work settings.
Claimed empirical observation from the year-long study (abstract: 'Initial operational gains'). No quantitative productivity metrics are reported in the abstract.
high positive From Future of Work to Future of Workers: Addressing Asympto... operational gains / productivity
The framework operationalizes 'sociotechnical immunity' via dual-purpose mechanisms that both serve institutional quality goals and build worker power to detect, contain, and recover from skill erosion while preserving human identity.
Descriptive claim about the proposed framework as stated in the abstract; no empirical performance metrics are provided there.
high positive From Future of Work to Future of Workers: Addressing Asympto... mechanisms for detection/containment/recovery from skill erosion and preservatio...
We offer a framework for dignified Human-AI interaction co-constructed with professional knowledge workers facing AI-induced skill erosion without traditional labor protections.
Paper contribution: proposed framework described as co-constructed with knowledge workers; the abstract states the aim and intended beneficiaries but does not report empirical validation details.
high positive From Future of Work to Future of Workers: Addressing Asympto... design of human-AI interaction frameworks to mitigate skill erosion and protect ...
Clear specifications, explicit governance, and ongoing human-AI collaboration are critical for successful scaling of regression automation.
Conclusions and recommendations derived from the case study's lessons and mixed-method evaluation.
high positive Human-AI Collaboration for Scaling Agile Regression Testing:... success of scaling regression automation / effectiveness of human-AI teaming
The Copilot achieves 30-50% code reuse when generating candidate test scripts.
Quantitative result reported in the paper's evaluation (stated 30-50% code reuse in the abstract/summary).
high positive Human-AI Collaboration for Scaling Agile Regression Testing:... code reuse in generated test scripts
Mixed-method evaluation shows the AI accelerates script authoring and increases throughput.
Empirical claim based on the paper's mixed-method evaluation (qualitative and quantitative data reported in the case study); specific sample sizes not provided in the summary.
high positive Human-AI Collaboration for Scaling Agile Regression Testing:... script authoring speed and throughput
Automated regression testing is essential for maintaining rapid, high-quality delivery in Agile and Scrum organizations.
Introductory/position statement in the paper; general premise motivating the case study (no specific empirical test reported).
high positive Human-AI Collaboration for Scaling Agile Regression Testing:... ability to maintain rapid, high-quality delivery
AIBuildAI ranks first on MLE-Bench with a medal rate of 63.1%, outperforming all existing baseline methods and matching the capability of highly experienced AI engineers.
Empirical evaluation on MLE-Bench reported in the paper (benchmark ranking, metric = medal rate).
high positive AIBuildAI: An AI Agent for Automatically Building AI Models medal rate (task success rate) on MLE-Bench
AIBuildAI adopts a hierarchical agent architecture in which a manager agent coordinates three specialized sub-agents: a designer for modeling strategy, a coder for implementation and debugging, and a tuner for training and performance optimization; each sub-agent is itself an LLM-based agent capable of multi-step reasoning and tool use, enabling end-to-end automation of the AI model development process that goes beyond the scope of existing AutoML approaches.
System architecture description in the paper (methods/architecture section).
high positive AIBuildAI: An AI Agent for Automatically Building AI Models system architecture and claimed capabilities (multistep reasoning, tool use, end...
We introduce AIBuildAI, an AI agent that automatically builds AI models from a task description and training data.
Methodological contribution: system design and implementation described in the paper (introduction/methods).
high positive AIBuildAI: An AI Agent for Automatically Building AI Models ability to produce AI models from task descriptions and training data