The Commonplace

Evidence (1902 claims)

Adoption: 5126 claims
Productivity: 4409 claims
Governance: 4049 claims
Human-AI Collaboration: 2954 claims
Labor Markets: 2432 claims
Org Design: 2273 claims
Innovation: 2215 claims
Skills & Training: 1902 claims
Inequality: 1286 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome                      Positive   Negative   Mixed   Null   Total
Other                             369        105      58    432     972
Governance & Regulation           365        171     113     54     713
Research Productivity             229         95      33    294     655
Organizational Efficiency         354         82      58     34     531
Technology Adoption Rate          277        115      63     27     486
Firm Productivity                 273         33      68     10     389
AI Safety & Ethics                112        177      43     24     358
Output Quality                    228         61      23     25     337
Market Structure                  105        118      81     14     323
Decision Quality                  154         68      33     17     275
Employment Level                   68         32      74      8     184
Fiscal & Macroeconomic             74         52      32     21     183
Skill Acquisition                  85         31      38      9     163
Firm Revenue                       96         30      22      –     148
Innovation Output                 100         11      20     11     143
Consumer Welfare                   66         29      35      7     137
Regulatory Compliance              51         61      13      3     128
Inequality Measures                24         66      31      4     125
Task Allocation                    64          6      28      6     104
Error Rate                         42         47       6      –      95
Training Effectiveness             55         12      10     16      93
Worker Satisfaction                42         32      11      6      91
Task Completion Time               71          5       3      1      80
Wages & Compensation               38         13      19      4      74
Team Performance                   41          8      15      7      72
Hiring & Recruitment               39          4       6      3      52
Automation Exposure                17         15       9      5      46
Job Displacement                    5         28      12      –      45
Social Protection                  18          8       6      1      33
Developer Productivity             25          1       2      1      29
Worker Turnover                    10         12       3      –      25
Creative Output                    15          5       3      1      24
Skill Obsolescence                  3         18       2      –      23
Labor Share of Income               7          4       9      –      20
(– = cell blank in the source; the three listed values sum to the row total.)
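A convenient way to read the matrix is the share of each outcome's claims coded positive. A minimal sketch (the values are copied from the rows above; `positive_share` is a helper name introduced here, not something from the site):

```python
def positive_share(positive: int, total: int) -> float:
    """Fraction of an outcome's claims coded as positive."""
    return positive / total

# Values copied from the matrix rows above:
firm_productivity = positive_share(273, 389)    # ~0.70
inequality_measures = positive_share(24, 125)   # ~0.19
```

The contrast illustrates how the matrix supports quick comparisons: Firm Productivity skews heavily positive, while Inequality Measures skews negative.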
Active filter: Skills & Training
Participants who received the training delegated a higher percentage of tasks to the agent than participants who did not receive teamwork training.
Between-subjects comparison in KeyWe testbed with a scripted agent; measured percentage of tasks delegated by participants in trained vs. untrained groups.
Confidence: high · Direction: positive · Paper: Teaming Up With an AI Agent: Training Humans to Develop Huma... · Outcome: percentage_of_tasks_delegated_to_agent
A HAT training intervention that took less than 30 minutes was developed to train humans on seven teamwork competencies.
Study description: developed a training intervention under 30 minutes targeting seven teamwork competencies; implemented as part of the experiment.
Confidence: high · Direction: positive · Paper: Teaming Up With an AI Agent: Training Humans to Develop Huma... · Outcome: training_duration_and_content (existence of <30 min training on seven competenci...
Because instructional signals are usable only when the learner has acquired the prerequisites needed to parse them, the effective communication channel depends on the learner's current state of knowledge and becomes more informative as learning progresses.
Theoretical consequence derived from the model's prerequisite-structure assumption and sequential teaching formalization (as described in the abstract).
Confidence: high · Direction: positive · Paper: A Mathematical Theory of Understanding · Outcome: informativeness of communication / effectiveness of instruction over time
Generative AI has transformed the economics of information production, making explanations, proofs, examples, and analyses available at very low cost.
Statement in paper (intro/abstract) asserting an empirical/observational fact about generative AI; no empirical sample or data reported in the abstract.
Confidence: high · Direction: positive · Paper: A Mathematical Theory of Understanding · Outcome: cost of information production / availability of informational artifacts
An approach is needed that focuses on the emerging and future interdependencies between professionals and generative machine learning, which implies extending, and also reimagining, theoretical perspectives on expertise, work and organizations.
Paper's central argument based on theoretical reasoning and literature synthesis about generative ML characteristics and their implications for professionals; method: conceptual/theoretical development; no empirical sample.
Confidence: high · Direction: positive · Paper: Generative machine learning in professional work and profess... · Outcome: interdependencies between professionals and generative ML; implications for theo...
Existing theories need to be extended whilst also responding to the distinctive characteristics of generative machine learning and the implications for how we theorize change.
Argumentative/theoretical claim in the paper based on comparison of features of generative ML with prior digital/algorithmic technologies; method: conceptual analysis and literature engagement; no empirical sample.
Confidence: high · Direction: positive · Paper: Generative machine learning in professional work and profess... · Outcome: scope and adequacy of theoretical perspectives on organizational change
We develop an approach using insights from existing literature on digital, algorithmic and artificial intelligence technologies.
Paper's stated contribution: theoretical development based on synthesis of existing literature (digital, algorithmic, AI). Method: conceptual synthesis; no empirical testing or sample reported.
Confidence: high · Direction: positive · Paper: Generative machine learning in professional work and profess... · Outcome: development of a theoretical approach/framework
There is a need for an approach to theorizing professional work and professional service firms in the generative machine learning age.
Conceptual argument presented in the paper (literature-based rationale); method is theoretical/literature review and argumentation; no empirical sample reported.
Confidence: high · Direction: positive · Paper: Generative machine learning in professional work and profess... · Outcome: theorizing professional work / existence of a required theoretical approach
The technology particularly benefits less experienced practitioners by providing comprehensive starting points for legal research, while experienced attorneys can use it for quality control and initial drafts.
Authors' interpretation of AI outputs from the experiment and reasoning about how those outputs map onto different practitioner needs (qualitative judgment).
Confidence: high · Direction: positive · Paper: Robot Wingman: Using AI to Assess an Employment Termination · Outcome: benefit to practitioners (training/assistance, drafting, quality control)
The analysis reveals AI’s potential to transform law firm economics by dramatically reducing research time while maintaining analytical quality, though careful attorney oversight remains essential.
Inference from the experimental finding that four AI systems produced substantive analysis comparable to junior-associate work on one transcript and the stated observation about traditional research time (8–40 hours); authors' qualitative judgment about economic implications and need for oversight.
Confidence: high · Direction: positive · Paper: Robot Wingman: Using AI to Assess an Employment Termination · Outcome: law firm economics (research time reduction and analytical quality)
Statutory and regulatory citations proved generally accurate and useful.
Authors' examination of statutory and regulatory references produced by the four AI engines in the experiment, judged to be generally correct and helpful.
Confidence: high · Direction: positive · Paper: Robot Wingman: Using AI to Assess an Employment Termination · Outcome: accuracy/usability of statutory and regulatory citations
All four engines successfully spotted legal issues, assessed claim strengths and weaknesses, and suggested follow-up investigation—tasks that traditionally required eight to forty hours of junior attorney research time.
Observed outputs from the four AI engines on the single transcript showing issue-spotting, strengths/weaknesses assessment, and suggested follow-ups; comparison to typical junior attorney research time (stated as 8–40 hours).
Confidence: high · Direction: positive · Paper: Robot Wingman: Using AI to Assess an Employment Termination · Outcome: issue-spotting and assessment quality; implied time savings relative to traditio...
Contemporary generative AI performs sophisticated legal analysis comparable to experienced associates, correctly identifying major employment law claims including ADA violations, Title VII discrimination, OSHA retaliation, FMLA interference, and workers’ compensation retaliation.
Qualitative assessment of outputs from the four AI engines applied to the single hypothetical transcript; comparison against expected legal claims (authors' judgment that outputs matched those an experienced associate would produce).
Confidence: high · Direction: positive · Paper: Robot Wingman: Using AI to Assess an Employment Termination · Outcome: ability to identify relevant legal claims and assess them
Four major generative AI engines—DeepSeek, Claude, ChatGPT, and Grok—are useful legal analysis tools for employment law practitioners.
Experimental evaluation in which a single hypothetical client interview transcript was submitted to each of the four AI systems and their outputs were assessed by the authors.
Confidence: high · Direction: positive · Paper: Robot Wingman: Using AI to Assess an Employment Termination · Outcome: usefulness of AI as legal analysis tools (quality of analysis/output)
Organizational support and continuous learning are important to maximize the benefits of AI integration in startup environments.
Conclusions drawn from thematic analysis of interviews with 12 startup employees emphasizing need for organizational support and ongoing learning.
Confidence: high · Direction: positive · Paper: AI-AUGMENTED WORKFORCE: THE IMPACT OF ARTIFICIAL INTELLIGENC... · Outcome: role of organizational support and continuous learning in realizing AI benefits
AI functions as a workforce augmentation tool that enhances human capabilities rather than replacing employees.
Reported perceptions from 12 startup employees in semi-structured interviews; thematic coding indicated view of AI as augmentation rather than replacement.
Confidence: high · Direction: positive · Paper: AI-AUGMENTED WORKFORCE: THE IMPACT OF ARTIFICIAL INTELLIGENC... · Outcome: AI role relative to job displacement (augmentation vs replacement)
Most employees demonstrated progressive adjustment and competency improvement over time after initial adaptation.
Interview data from 12 startup employees with thematic analysis indicating progressive adjustment and competency gains over time.
Confidence: high · Direction: positive · Paper: AI-AUGMENTED WORKFORCE: THE IMPACT OF ARTIFICIAL INTELLIGENC... · Outcome: progressive adjustment and competency improvement over time
AI improves employee performance by supporting more accurate decision-making and increasing work effectiveness and output quality.
Findings from semi-structured interviews of 12 startup employees, analyzed via thematic coding and frequency scoring, reporting improved decision accuracy and output quality with AI support.
Confidence: high · Direction: positive · Paper: AI-AUGMENTED WORKFORCE: THE IMPACT OF ARTIFICIAL INTELLIGENC... · Outcome: decision-making accuracy, work effectiveness, output quality
AI integration contributes to competency development, particularly in digital literacy, analytical thinking, and adaptive learning.
Qualitative semi-structured interviews with 12 startup employees; thematic coding highlighted competencies (digital literacy, analytical thinking, adaptive learning).
Confidence: high · Direction: positive · Paper: AI-AUGMENTED WORKFORCE: THE IMPACT OF ARTIFICIAL INTELLIGENC... · Outcome: competency development (digital literacy, analytical thinking, adaptive learning...
AI significantly enhances employee productivity by accelerating task completion, reducing manual workload, and improving workflow efficiency.
Qualitative study using semi-structured interviews with 12 startup employees; data analyzed with thematic coding, frequency scoring, and visualized analysis.
Confidence: high · Direction: positive · Paper: AI-AUGMENTED WORKFORCE: THE IMPACT OF ARTIFICIAL INTELLIGENC... · Outcome: employee productivity (task completion speed, manual workload, workflow efficien...
Human-AI systems should be designed under a cognitive sustainability constraint so that gains in hybrid performance do not come at the cost of degradation in human expertise.
Normative recommendation in the paper based on the conceptual/mathematical framework and the identified trade-off; presented as an argument rather than empirically validated policy outcome in the excerpt.
Confidence: high · Direction: positive · Paper: Cognitive Amplification vs Cognitive Delegation in Human-AI ... · Outcome: preservation of human expertise under human-AI design choices
Together, these quantities provide a low-dimensional metric space for evaluating whether human-AI systems achieve genuine synergistic performance and whether such performance is cognitively sustainable for the human component over time.
Claim about the utility of the defined metrics, supported within the paper by the conceptual/mathematical framework and the proposed metric definitions (theoretical demonstration rather than reported empirical validation in the excerpt).
Confidence: high · Direction: positive · Paper: Cognitive Amplification vs Cognitive Delegation in Human-AI ... · Outcome: hybrid human-AI performance and cognitive sustainability
The paper defines a set of operational metrics: the Cognitive Amplification Index (CAI*), the Dependency Ratio (D), the Human Reliance Index (HRI), and the Human Cognitive Drift Rate (HCDR).
Explicit listing of newly proposed operational metrics in the paper; this is a descriptive claim about the paper's content (theoretical definitions), no sample size or empirical estimation provided in the excerpt.
Confidence: high · Direction: positive · Paper: Cognitive Amplification vs Cognitive Delegation in Human-AI ... · Outcome: operational metrics for human-AI cognitive interaction (CAI*, D, HRI, HCDR)
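The excerpt does not reproduce the formal definitions of CAI*, D, HRI, or HCDR, so only a toy illustration is possible. The sketch below assumes, purely for illustration, that a dependency ratio is the fraction of task steps resolved by the AI rather than the human; the paper's actual formula may differ:

```python
def dependency_ratio(ai_steps: int, human_steps: int) -> float:
    """Toy stand-in for a Dependency Ratio (D): the fraction of task steps
    resolved by the AI. Illustrative only; NOT the paper's actual
    definition, which is not given in this excerpt."""
    total = ai_steps + human_steps
    if total == 0:
        raise ValueError("no steps recorded")
    return ai_steps / total

# A session where the AI handled 6 of 10 steps:
assert dependency_ratio(6, 4) == 0.6
```

Under this reading, a rising D over repeated sessions would signal delegation rather than amplification.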
The paper introduces a conceptual and mathematical framework to distinguish cognitive amplification (AI improves hybrid human-AI performance while preserving human expertise) from cognitive delegation (reasoning is progressively outsourced to AI).
Explicit contribution claim in the paper (description of a conceptual and mathematical framework); evidence consists of the model and formal definitions presented in the paper (no external empirical validation reported in the excerpt).
Confidence: high · Direction: positive · Paper: Cognitive Amplification vs Cognitive Delegation in Human-AI ... · Outcome: mode of human-AI interaction (amplification vs delegation)
Given these findings, policymakers should favor 'strategic forbearance'—apply existing laws rather than create new regulations that could stifle innovation and diffusion of AI.
Authors' normative policy recommendation based on their interpretation of the reviewed empirical literature (risk–benefit assessment); this is a prescriptive conclusion rather than an empirical finding, so no sample size applies.
Confidence: high · Direction: positive · Paper: AI, Productivity, and Labor Markets: A Review of the Empiric... · Outcome: regulatory approach to AI governance (strategy of forbearance vs. new regulation...
Generative AI lowers entry costs for startups, facilitating new firm entry and product development.
Cited empirical and descriptive evidence in the literature review indicating reduced development costs and faster product prototyping enabled by AI tools; the brief does not provide a pooled sample size or a single quantitative estimate.
Confidence: high · Direction: positive · Paper: AI, Productivity, and Labor Markets: A Review of the Empiric... · Outcome: barriers to entry / startup costs and rate of new product development
Generative AI significantly boosts productivity in specific tasks like coding, writing, and customer service—often by 15% to 50%.
Synthesis/review of empirical literature through 2025 (multiple empirical studies of task-level impacts, including field and lab studies and observational analyses); the brief reports aggregate effect ranges but does not list a single pooled sample size.
Confidence: high · Direction: positive · Paper: AI, Productivity, and Labor Markets: A Review of the Empiric... · Outcome: task-level productivity in coding, writing, and customer service
End-to-end verified pipelines can produce provably correct code from informal specifications.
The paper surveys early research demonstrating pipelines that go from informal specifications to formally verified code; the provided text does not include experimental sample sizes or benchmarks.
Confidence: high · Direction: positive · Paper: Intent Formalization: A Grand Challenge for Reliable Coding ... · Outcome: provable correctness of generated code
AI-generated postconditions catch real-world bugs missed by prior methods.
Surveyed early research asserted by the paper indicating empirical instances where AI-generated postconditions found bugs that other methods missed; no numeric details provided in the excerpt.
Confidence: high · Direction: positive · Paper: Intent Formalization: A Grand Challenge for Reliable Coding ... · Outcome: bugs detected / error detection rate
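To make the idea of a postcondition concrete, here is a hypothetical illustration (not drawn from the paper): a postcondition states a property every correct output must satisfy, so it can flag a bug that a handful of example-based tests might miss.

```python
# Hypothetical illustration of a postcondition check (not from the paper).
def buggy_dedupe(xs):
    # Intended to remove duplicates while preserving order, but drops the
    # final element by mistake.
    seen, out = set(), []
    for x in xs[:-1]:          # bug: xs[:-1] skips the last item
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out

def postcondition(inp, out):
    # Every distinct input element must appear in the output exactly once.
    return sorted(set(inp)) == sorted(out)

result = buggy_dedupe([1, 2, 2, 3])
assert result == [1, 2]
assert not postcondition([1, 2, 2, 3], result)  # the postcondition flags the bug
```

Because the property is stated over all inputs rather than specific examples, a checker that generates such postconditions can catch mistakes the original test suite never exercised.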
Interactive test-driven formalization improves program correctness.
Paper surveys early research that reportedly demonstrates this effect (described as 'interactive test-driven formalization that improves program correctness'); the excerpt does not include specific study details or sample sizes.
The central bottleneck is validating specifications: since there is no oracle for specification correctness other than the user, we need semi-automated metrics that can assess specification quality with or without code, through lightweight user interaction and proxy artifacts such as tests.
Analytical claim and research agenda item in the paper; motivates need for new metrics and interaction designs. No empirical validation or sample size reported in the excerpt.
Confidence: high · Direction: positive · Paper: Intent Formalization: A Grand Challenge for Reliable Coding ... · Outcome: ability to validate specification correctness / specification quality
Intent formalization offers a tradeoff spectrum suitable to the reliability needs of different contexts: from lightweight tests that disambiguate likely misinterpretations, through full functional specifications for formal verification, to domain-specific languages from which correct code is synthesized automatically.
Conceptual framework proposed in the paper describing a spectrum of specification formality; presented as an argument rather than an empirical finding, with no sample sizes provided in the excerpt.
Confidence: high · Direction: positive · Paper: Intent Formalization: A Grand Challenge for Reliable Coding ... · Outcome: suitability of specification approaches for reliability requirements
Intent formalization — translating informal user intent into checkable formal specifications — is the key challenge that will determine whether AI makes software more reliable or merely more abundant.
Normative argument presented by the authors as the central thesis of the paper; no empirical study or sample size cited in the provided text.
Confidence: high · Direction: positive · Paper: Intent Formalization: A Grand Challenge for Reliable Coding ... · Outcome: software reliability (correctness relative to user intent)
Agentic AI systems can now generate code with remarkable fluency.
Authoritative assertion in the paper based on contemporary observations of large code-generating models; no empirical sample size or benchmark numbers reported in the text provided.
Confidence: high · Direction: positive · Paper: Intent Formalization: A Grand Challenge for Reliable Coding ... · Outcome: code generation fluency / ability to produce code
In a preregistered mediation model, perceived accountability mediated the AI-over-questionnaire effect on goal progress (indirect effect = 0.15, 95% CI [0.04, 0.31]).
Mediation analysis preregistered and reported in the paper using data from the RCT (N = 517); indirect effect estimate 0.15 with 95% confidence interval [0.04, 0.31].
Confidence: high · Direction: positive · Paper: AI-Assisted Goal Setting Improves Goal Progress Through Soci... · Outcome: goal progress (mediated by perceived social accountability)
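Indirect effects with percentile-bootstrap intervals, like the reported 0.15, 95% CI [0.04, 0.31], are commonly computed along the following lines. The sketch uses synthetic data (the true effects 0.5 and 0.4 are chosen here), not the study's data:

```python
# Bootstrap sketch of an indirect effect (a*b) in a simple mediation model.
# Synthetic data only -- NOT the study's data.
import random

def slope(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var

def indirect_effect(treat, mediator, outcome):
    # a: treatment -> mediator (difference in group means; treat is 0/1).
    m1 = [m for t, m in zip(treat, mediator) if t == 1]
    m0 = [m for t, m in zip(treat, mediator) if t == 0]
    a = sum(m1) / len(m1) - sum(m0) / len(m0)

    # b: mediator -> outcome controlling for treatment, obtained by
    # demeaning both variables within treatment groups (Frisch-Waugh).
    def demean(vals):
        g1 = [v for t, v in zip(treat, vals) if t == 1]
        g0 = [v for t, v in zip(treat, vals) if t == 0]
        mu1, mu0 = sum(g1) / len(g1), sum(g0) / len(g0)
        return [v - (mu1 if t else mu0) for t, v in zip(treat, vals)]

    b = slope(demean(mediator), demean(outcome))
    return a * b

random.seed(0)
n = 200
treat = [i % 2 for i in range(n)]
mediator = [0.5 * t + random.gauss(0, 1) for t in treat]
outcome = [0.4 * m + 0.2 * t + random.gauss(0, 1)
           for m, t in zip(mediator, treat)]

point = indirect_effect(treat, mediator, outcome)

# Percentile bootstrap: resample rows, recompute a*b, take percentiles.
data = list(zip(treat, mediator, outcome))
boot = []
for _ in range(1000):
    sample = [random.choice(data) for _ in range(n)]
    t, m, y = zip(*sample)
    boot.append(indirect_effect(t, m, y))
boot.sort()
lo, hi = boot[25], boot[974]   # 95% percentile interval
```

An interval excluding zero, as in the paper's [0.04, 0.31], is what supports the mediation claim.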
The AI chatbot produced significantly higher goal progress than the no-support control at two-week follow-up.
Between-groups comparison in the preregistered RCT (N = 517); reported effect size d = 0.33 and p = .016 for AI vs control on goal progress measured at two-week follow-up.
Confidence: high · Direction: positive · Paper: AI-Assisted Goal Setting Improves Goal Progress Through Soci... · Outcome: goal progress (self-reported goal progress at two-week follow-up)
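The reported d = 0.33 is a standardized mean difference. A minimal sketch of the usual pooled-SD computation (toy numbers, not the study's data):

```python
import math

def cohens_d(group_a, group_b):
    """Cohen's d: standardized mean difference with pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    ma, mb = sum(group_a) / na, sum(group_b) / nb
    va = sum((x - ma) ** 2 for x in group_a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in group_b) / (nb - 1)
    pooled = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (ma - mb) / pooled

# Toy check: identical spread, mean gap of one SD -> d = 1.
assert abs(cohens_d([1, 2, 3], [0, 1, 2]) - 1.0) < 1e-9
```

By this convention d = 0.33 is a small-to-moderate effect.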
Rigorous research priorities include randomized controlled trials with long-run follow-ups, cost-effectiveness studies, structural adoption models, and validated metrics for feedback quality and learning durability.
Actionable research recommendations produced by the 50-scholar interdisciplinary meeting; prescriptive synthesis rather than empirical results.
Confidence: high · Direction: positive · Paper: The Future of Feedback: How Can AI Help Transform Feedback t... · Outcome: existence and quality of RCTs and long-run studies; availability of validated me...
Observations span multiple agent platforms (Moltbook, The Colony, 4claw) with more than 167,000 agents interacting as peers.
Author-reported coverage from naturalistic observations across the named platforms during the one-month observation window; count reported as ≈167k agents.
Confidence: high · Direction: positive · Paper: When Openclaw Agents Learn from Each Other: Insights from Em... · Outcome: number of agents observed interacting as peers
Evaluation metrics for the benchmark include task-specific metrics such as win-rate for battling and completion time for speedruns, as well as strategic robustness measures.
Paper's evaluation section lists metrics used: win-rate, completion time, strategic robustness; describes how they are computed and used to compare agents.
Confidence: high · Direction: positive · Paper: The PokeAgent Challenge: Competitive and Long-Context Learni... · Outcome: evaluation metrics used (win-rate, completion time, strategic robustness)
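The two task-specific metrics named above are straightforward aggregates. A hedged sketch (field names are assumptions for illustration, not the challenge's actual data format):

```python
# Illustrative computation of win-rate and mean completion time.
# The record layouts below are assumed, not the challenge's real schema.
battles = [{"won": True}, {"won": False}, {"won": True}, {"won": True}]
runs = [{"completed": True, "seconds": 5400},
        {"completed": True, "seconds": 6100},
        {"completed": False, "seconds": None}]

win_rate = sum(b["won"] for b in battles) / len(battles)

finished = [r["seconds"] for r in runs if r["completed"]]
mean_completion_time = sum(finished) / len(finished)
```

Strategic-robustness measures would layer on top of these, e.g. win-rate evaluated against a pool of adversarial opponents rather than a fixed one.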
Speedrunning Track includes an open-source multi-agent orchestration system and standardized evaluation scenarios for reproducible multi-agent comparisons.
Paper describes and releases an open-source harness for orchestrating LLMs/agents and provides standardized scenarios and evaluation tools meant for reproducibility.
Confidence: high · Direction: positive · Paper: The PokeAgent Challenge: Competitive and Long-Context Learni... · Outcome: availability of open-source orchestration code and standardized evaluation scena...
Community interest in the benchmark was validated by a NeurIPS 2025 competition with 100+ teams and published analyses of winning submissions.
Paper reports organization/validation via a NeurIPS 2025 competition, states participation of 100+ teams, and includes documentation/analyses of top submissions.
Confidence: high · Direction: positive · Paper: The PokeAgent Challenge: Competitive and Long-Context Learni... · Outcome: number of competing teams (100+), availability of competition analyses/winning s...
The project is a living benchmark: the Battling Track has a live leaderboard and the Speedrunning Track uses self-contained evaluation to ensure reproducibility.
Paper/documentation notes a live leaderboard for Battling and provides self-contained evaluation pipelines/orchestration for Speedrunning intended to support reproducible runs.
Confidence: high · Direction: positive · Paper: The PokeAgent Challenge: Competitive and Long-Context Learni... · Outcome: presence of live leaderboard and self-contained evaluation pipelines
Baselines include heuristic rule-based agents, reinforcement-learning (RL) agents trained for specialist play, and LLM-based agents/harnesses for generalist approaches.
Paper presents baseline implementations and experiments spanning heuristic, RL, and LLM-based agents and describes training procedures and architectures used for each baseline category.
Confidence: high · Direction: positive · Paper: The PokeAgent Challenge: Competitive and Long-Context Learni... · Outcome: presence and types of baseline agents (heuristic, RL, LLM)
The benchmark is split into two complementary tracks: a Battling Track (competitive, partial-observability battles) and a Speedrunning Track (long-horizon RPG tasks with a multi-agent orchestration harness).
Paper structure and dataset descriptions specify two tracks, their scopes, and the inclusion of a multi-agent orchestration system for the Speedrunning Track.
Confidence: high · Direction: positive · Paper: The PokeAgent Challenge: Competitive and Long-Context Learni... · Outcome: benchmark partitioning (presence of Battling and Speedrunning tracks)
The Battling Track dataset contains more than 20 million recorded battle trajectories.
Paper reports a Battling Track dataset of >20M recorded battle trajectories collected from simulated/match play; size reported explicitly in dataset and methods section.
Confidence: high · Direction: positive · Paper: The PokeAgent Challenge: Competitive and Long-Context Learni... · Outcome: number of recorded battle trajectories (>20,000,000)
PokeAgent Challenge is a large, realistic multi-agent benchmark built on Pokemon that stresses partial observability, game-theoretic reasoning, and long-horizon planning simultaneously.
Paper describes design and motivation of the benchmark, detailing two tracks (Battling and Speedrunning) intended to capture partial observability, adversarial/game-theoretic interactions, and long-horizon sequential planning; benchmark implementation built on Pokemon simulator and described task specifications.
Confidence: high · Direction: positive · Paper: The PokeAgent Challenge: Competitive and Long-Context Learni... · Outcome: benchmark task characteristics (partial observability, game-theoretic complexity...
iDaVIE's modular architecture supports extensibility (planned features include subcube loading, advanced render modes, video scripting, and collaborative VR sessions).
Paper describes modular architecture and lists planned/possible future features; this is a software design claim rather than an empirical result.
Confidence: high · Direction: positive · Paper: iDaVIE v1.0: A virtual reality tool for interactive analysis... · Outcome: software extensibility and planned feature set
Because iDaVIE is open-source and extensible, software licensing costs are low and marginal adoption costs fall over time.
Paper states iDaVIE is open-source and designed for community-driven enhancements; economic claim based on general properties of open-source software rather than empirical cost accounting.
Confidence: high · Direction: positive · Paper: iDaVIE v1.0: A virtual reality tool for interactive analysis... · Outcome: licensing cost implication and marginal adoption costs
iDaVIE includes interaction features such as selection, cropping/subcube tools, catalogue overlays, and export back to existing pipelines.
Feature list in paper describing selection, cropping, overlays, in-VR metrics and export functionality; demonstrated integration to export edited masks/subcubes.
Confidence: high · Direction: positive · Paper: iDaVIE v1.0: A virtual reality tool for interactive analysis... · Outcome: availability and functionality of in-VR interaction and export tools
Streaming and downsampling pipelines implemented as Unity plug-ins make large volumes interactively viewable in VR while preserving needed detail for inspection.
Technical description of custom Unity plug-ins for streaming/downsampling and on-the-fly statistics; tested on HI cubes (telescopes listed) per the paper.
Confidence: high · Direction: positive · Paper: iDaVIE v1.0: A virtual reality tool for interactive analysis... · Outcome: interactive rendering performance and retention of inspection-relevant detail
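Block-mean downsampling of a data cube, the core idea behind such pipelines, can be sketched as follows. This is a pure-Python stand-in for illustration; iDaVIE's actual plug-ins are native Unity code, and this function name and layout are assumptions:

```python
# Hedged sketch of block-mean downsampling for a 3D data cube
# (nested lists). Not iDaVIE's implementation.
def downsample(cube, factor):
    """Average non-overlapping factor^3 blocks of a 3D cube.
    Assumes each dimension is divisible by `factor`."""
    nz, ny, nx = len(cube), len(cube[0]), len(cube[0][0])
    out = []
    for z in range(0, nz, factor):
        plane = []
        for y in range(0, ny, factor):
            row = []
            for x in range(0, nx, factor):
                block = [cube[z + dz][y + dy][x + dx]
                         for dz in range(factor)
                         for dy in range(factor)
                         for dx in range(factor)]
                row.append(sum(block) / len(block))
            plane.append(row)
        out.append(plane)
    return out

# A 2x2x2 cube collapses to a single averaged voxel:
cube = [[[1.0, 3.0], [5.0, 7.0]],
        [[2.0, 4.0], [6.0, 8.0]]]
assert downsample(cube, 2) == [[[4.5]]]
```

A streaming pipeline would apply this per chunk as subcubes are loaded, trading resolution for interactive frame rates while keeping the statistics needed for inspection.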