The Commonplace

Evidence (2432 claims)

Adoption: 5126 claims
Productivity: 4409 claims
Governance: 4049 claims
Human-AI Collaboration: 2954 claims
Labor Markets: 2432 claims
Org Design: 2273 claims
Innovation: 2215 claims
Skills & Training: 1902 claims
Inequality: 1286 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome | Positive | Negative | Mixed | Null | Total
Other | 369 | 105 | 58 | 432 | 972
Governance & Regulation | 365 | 171 | 113 | 54 | 713
Research Productivity | 229 | 95 | 33 | 294 | 655
Organizational Efficiency | 354 | 82 | 58 | 34 | 531
Technology Adoption Rate | 277 | 115 | 63 | 27 | 486
Firm Productivity | 273 | 33 | 68 | 10 | 389
AI Safety & Ethics | 112 | 177 | 43 | 24 | 358
Output Quality | 228 | 61 | 23 | 25 | 337
Market Structure | 105 | 118 | 81 | 14 | 323
Decision Quality | 154 | 68 | 33 | 17 | 275
Employment Level | 68 | 32 | 74 | 8 | 184
Fiscal & Macroeconomic | 74 | 52 | 32 | 21 | 183
Skill Acquisition | 85 | 31 | 38 | 9 | 163
Firm Revenue | 96 | 30 | 22 | | 148
Innovation Output | 100 | 11 | 20 | 11 | 143
Consumer Welfare | 66 | 29 | 35 | 7 | 137
Regulatory Compliance | 51 | 61 | 13 | 3 | 128
Inequality Measures | 24 | 66 | 31 | 4 | 125
Task Allocation | 64 | 6 | 28 | 6 | 104
Error Rate | 42 | 47 | 6 | | 95
Training Effectiveness | 55 | 12 | 10 | 16 | 93
Worker Satisfaction | 42 | 32 | 11 | 6 | 91
Task Completion Time | 71 | 5 | 3 | 1 | 80
Wages & Compensation | 38 | 13 | 19 | 4 | 74
Team Performance | 41 | 8 | 15 | 7 | 72
Hiring & Recruitment | 39 | 4 | 6 | 3 | 52
Automation Exposure | 17 | 15 | 9 | 5 | 46
Job Displacement | 5 | 28 | 12 | | 45
Social Protection | 18 | 8 | 6 | 1 | 33
Developer Productivity | 25 | 1 | 2 | 1 | 29
Worker Turnover | 10 | 12 | 3 | | 25
Creative Output | 15 | 5 | 3 | 1 | 24
Skill Obsolescence | 3 | 18 | 2 | | 23
Labor Share of Income | 7 | 4 | 9 | | 20
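Read as data, the matrix supports simple sanity checks. A minimal sketch follows; the dictionary holds a few rows hand-transcribed from the table above (so the subset itself is illustrative, not an official export), and it computes each outcome's share of positive findings:

```python
# A few rows hand-transcribed from the evidence matrix above (illustrative subset).
# Values are (positive, negative, mixed, null) claim counts.
MATRIX = {
    "Firm Productivity": (273, 33, 68, 10),
    "AI Safety & Ethics": (112, 177, 43, 24),
    "Inequality Measures": (24, 66, 31, 4),
}

def positive_share(counts):
    """Fraction of a row's claims whose direction of finding is positive.

    Shares are computed over the four listed directions; the table's
    Total column can exceed this sum, so it is not used here.
    """
    total = sum(counts)
    return counts[0] / total if total else 0.0

for outcome, counts in sorted(MATRIX.items(), key=lambda kv: -positive_share(kv[1])):
    print(f"{outcome:20s} {positive_share(counts):.2f}")
```

Sorting by share makes the contrast visible at a glance: firm-productivity claims skew positive while AI-safety and inequality claims skew negative.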
Active filter: Labor Markets
Framing decisions as contestable and revisable (via dialectical challenge and update) increases robustness and trust in AI-supported decision-making.
Conceptual claim arguing that contestability/revision improve robustness and trust; no experimental evidence or user studies provided.
medium · positive · Paper: Argumentative Human-AI Decision-Making: Toward AI Agents Tha... · Outcome: measures of robustness (resilience to error) and human trust in decisions
Running formal dialectical/acceptability semantics and dialogue protocols over AFs enables agents that reason with humans through structured debates and revisions.
Conceptual integration of formal semantics (Dung-style, bipolar, weighted) and dialogue protocols; no human-subject studies or system evaluations reported.
medium · positive · Paper: Argumentative Human-AI Decision-Making: Toward AI Agents Tha... · Outcome: capacity for structured debate/revision (dialogue performance, acceptability out...
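As a concrete instance of the Dung-style machinery these entries invoke, here is a minimal sketch of grounded-semantics evaluation over an abstract argumentation framework; the three toy arguments are invented for illustration and do not come from the paper:

```python
def grounded_extension(arguments, attacks):
    """Least fixed point of the characteristic function of a Dung AF.

    arguments: iterable of argument labels.
    attacks: set of (attacker, target) pairs.
    Starting from the empty set, repeatedly accept every argument whose
    attackers are all counter-attacked by already-accepted arguments.
    """
    accepted = set()
    while True:
        defended = set()
        for a in arguments:
            attackers = {x for (x, y) in attacks if y == a}
            if all(any((d, att) in attacks for d in accepted) for att in attackers):
                defended.add(a)
        if defended == accepted:
            return accepted
        accepted = defended

# Toy framework: a attacks b, b attacks c, so a is unattacked and c is
# defended by a; the grounded extension is {a, c}.
print(grounded_extension({"a", "b", "c"}, {("a", "b"), ("b", "c")}))
```

The fixed-point loop is what makes acceptability verifiable: whether an argument survives is a mechanical consequence of the attack graph, not a judgment call.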
Argumentation Framework Synthesis: mined fragments can be combined into coherent formal argumentation frameworks (AFs) with explicit semantics enabling verification and automated inference.
Conceptual algorithmic proposal (graph synthesis, canonicalization, formal semantics); no empirical synthesis results or benchmarks presented.
medium · positive · Paper: Argumentative Human-AI Decision-Making: Toward AI Agents Tha... · Outcome: coherence and correctness of synthesized AFs and verifiability of derived infere...
Argumentation Framework Mining: LLMs and NLP pipelines can be used to extract claims, premises, relations (attack/support), and provenance from text corpora.
Proposed methodological pipeline (fine-tuning/prompting LLMs and IE pipelines); conceptual proposal without implementation details or experimental results.
medium · positive · Paper: Argumentative Human-AI Decision-Making: Toward AI Agents Tha... · Outcome: accuracy/fidelity of extracted argument elements (claims, premises, relations, p...
Combining formal argument structures with LLMs’ ability to mine and generate rich, contextual arguments from unstructured text promises human-aware, verifiable, and trustworthy AI for high‑stakes domains.
Conceptual synthesis of computational argumentation (formal AFs) and LLM capabilities; no empirical validation or quantified metrics provided.
medium · positive · Paper: Argumentative Human-AI Decision-Making: Toward AI Agents Tha... · Outcome: trustworthiness/verifiability of AI outputs in high-stakes decision contexts
Integrating computational argumentation with large language models (LLMs) creates a new paradigm—Argumentative Human-AI Decision‑Making—where AI agents participate in dialectical, contestable, and revisable decision processes with humans.
Conceptual / design argument presented in the paper; no empirical implementation or sample; draws on prior work in computational argumentation and capabilities of LLMs.
medium · positive · Paper: Argumentative Human-AI Decision-Making: Toward AI Agents Tha... · Outcome: degree of human-AI dialectical participation (ability to engage in contestable, ...
There will likely be growth in complementary markets for model verification, provenance tracking, legal-AI audits, and human-in-the-loop workflow services.
Market foresight based on identified unmet needs (explainability, verification) and illustrative examples; no market-sizing data.
medium · positive · Paper: Why Avoid Generative Legal AI Systems? Hallucination, Overre... · Outcome: market size and growth rates for verification/audit and related services
Open-source orchestration and evaluation harnesses plus a self-contained evaluation pipeline improve reproducibility for the Speedrunning Track.
Paper claims and documents the release of orchestration and evaluation code and describes the self-contained pipeline designed for deterministic reproducible evaluation.
medium · positive · Paper: The PokeAgent Challenge: Competitive and Long-Context Learni... · Outcome: reproducibility capability via released code and self-contained pipelines
There is a need for policies supporting workforce transitions (retraining, portability of skills) and safety/regulation for embodied agents operating in public spaces.
Policy recommendation grounded in anticipated labor and safety risks; proposed but not empirically evaluated.
medium · positive · Paper: Why AI systems don't learn and what to do about it: Lessons ... · Outcome: policy adoption; retraining program coverage; safety/regulatory frameworks imple...
Benchmarks and tasks that mix observation and intervention (imitation with sparse feedback, active imitation, transfer under domain shift, continual learning streams) are required to evaluate the architecture.
Proposal for evaluation tasks and benchmarks; not empirically validated in the paper.
medium · positive · Paper: Why AI systems don't learn and what to do about it: Lessons ... · Outcome: benchmark performance on mixed observation-intervention tasks
Embodied robotics experiments are necessary to evaluate real-world constraints such as sample efficiency, physical affordances, and motor learning.
Methodological recommendation recognizing simulation-to-real gaps; no experiments reported.
medium · positive · Paper: Why AI systems don't learn and what to do about it: Lessons ... · Outcome: sample efficiency and performance in real-world embodied tasks
Simulated environments (procedural, nonstationary), multi-agent social domains, and open-world 3D simulators are appropriate for scalable iteration to test the proposed architecture.
Methodological recommendation and suggested experimental approaches; not tested in the paper.
medium · positive · Paper: Why AI systems don't learn and what to do about it: Lessons ... · Outcome: suitability and scalability of simulation platforms for architecture evaluation
Neuromodulatory systems and meta-decision circuits in animals provide analogies for implementing meta-control (M) in artificial systems.
Neuroscience analogy cited to motivate architectural choices; not empirically instantiated in the paper.
medium · positive · Paper: Why AI systems don't learn and what to do about it: Lessons ... · Outcome: effectiveness of biologically inspired gating/plasticity mechanisms on learning ...
Developmental trajectories can scaffold gradual competence (from observation to exploratory action) and should be reflected in training curricula.
Argument from developmental biology and learning theory; proposed as a design principle rather than empirically tested here.
medium · positive · Paper: Why AI systems don't learn and what to do about it: Lessons ... · Outcome: learning progression speed; final competence given staged curricula
Evolution supplies inductive biases and slow structural priors that can be leveraged in artificial learners.
Biological analogy and theoretical suggestion; no empirical experiments presented to quantify effect in AI systems.
medium · positive · Paper: Why AI systems don't learn and what to do about it: Lessons ... · Outcome: effect of structural priors on learning speed and generalization
LLMs are more likely to complement human tacit skills than to replace explicit rule‑following jobs; value accrues to workers and firms that integrate model outputs with human judgment and tacit expertise.
Labor‑economics style argument and theoretical reasoning; no empirical labor market analysis provided.
medium · positive · Paper: Why the Valuable Capabilities of LLMs Are Precisely the Unex... · Outcome: complementarity vs substitution of human labor (especially tacit-skill jobs)
Commoditization via rule extraction is limited; firms that can harness and deploy tacit LLM capabilities will retain economic rents.
Theoretical economic argument based on non‑rule‑encodability; no empirical firm‑level data included.
medium · positive · Paper: Why the Valuable Capabilities of LLMs Are Precisely the Unex... · Outcome: ability to commoditize/replicate LLM capabilities via rule extraction
The highest‑value attributes of LLMs may be inherently non‑decomposable into simple, auditable rules, which increases the value of proprietary, black‑box models and strengthens economies of scale and scope for large model providers.
Economic reasoning and theoretical implications drawn from the central thesis; no empirical market analyses provided.
medium · positive · Paper: Why the Valuable Capabilities of LLMs Are Precisely the Unex... · Outcome: value capture by model providers (proprietary rents/economies of scale)
Some LLM capabilities are tacit, practice‑derived, or 'insight'‑like, akin to the Chinese concept of Wu (sudden insight through practiced skill).
Philosophical framing and analogy to the concept of tacit knowledge (Wu); argumentative rather than empirical support.
medium · positive · Paper: Why the Valuable Capabilities of LLMs Are Precisely the Unex... · Outcome: characterization of LLM competence as tacit/insight-like
The economically valuable capabilities of large language models are precisely those that cannot be fully encoded as a complete, human‑readable set of discrete rules.
Formal, conceptual argument (proof by contradiction) plus qualitative historical case analysis comparing expert systems and LLMs; no new empirical datasets or experiments reported.
medium · positive · Paper: Why the Valuable Capabilities of LLMs Are Precisely the Unex... · Outcome: economic value / capability of LLMs (degree of rule‑encodability vs tacitness)
Distilling corrected decision trajectories into the model via supervised fine-tuning produces better recovery behavior than relying solely on reward signals or final-outcome optimization.
Comparative training setup where LEAFE uses supervised fine-tuning on corrected trajectories and is empirically compared to outcome-driven methods (e.g., GRPO) that optimize rewards; improved Pass@k reported.
medium · positive · Paper: Internalizing Agency from Reflective Experience · Outcome: recovery behavior performance reflected in Pass@k (success rates) after training
LEAFE's gains occur across diverse interactive coding and agentic tasks with limited interaction budget.
Reported evaluation across a suite of long-horizon tasks (examples include multi-step coding problems and agentic tasks with rich feedback channels) with consistent improvements claimed.
medium · positive · Paper: Internalizing Agency from Reflective Experience · Outcome: Pass@k across multiple task types (interactive coding and agentic tasks)
LEAFE uses the same environmental interactions more effectively, improving sample efficiency under fixed interaction budgets.
Experimental regime with fixed interaction budgets demonstrating higher Pass@k for LEAFE relative to baselines given the same number of environment interactions; paper argues LEAFE converts richer feedback into targeted training signals rather than only final rewards.
medium · positive · Paper: Internalizing Agency from Reflective Experience · Outcome: sample efficiency operationalized as Pass@k achieved under fixed interaction bud...
LEAFE converts rich environment feedback into actionable corrective supervision rather than optimizing only final success signals, which drives performance gains.
Algorithmic description: LEAFE summarizes error messages/intermediate observations into experience items, backtracks to causal decision points, explores corrective branches, and distills corrected trajectories via supervised fine-tuning. Empirical comparisons show improved Pass@k relative to reward-only/outcome-driven baselines.
medium · positive · Paper: Internalizing Agency from Reflective Experience · Outcome: Pass@k performance; also qualitative measure of learned recovery behavior (impli...
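The corrective-supervision step these entries describe can be pictured with a small sketch. The data structures and function below are hypothetical illustrations of the described flow (summarize feedback, backtrack to the causal decision point, supervise on the corrected branch), not the paper's implementation:

```python
from dataclasses import dataclass

@dataclass
class Step:
    state: str   # observation presented to the agent
    action: str  # action the agent emitted

def distill_corrections(corrected, divergence_index):
    """Turn a corrected trajectory into supervised fine-tuning pairs.

    `corrected` is the repaired trajectory and `divergence_index` the
    causal decision point found by backtracking; only the corrected
    branch is supervised, so the model internalizes the recovery
    behavior instead of receiving a single end-of-episode reward.
    """
    return [{"input": s.state, "target": s.action}
            for s in corrected[divergence_index:]]

# Example: the agent went wrong at step 1; steps 1-2 of the corrected
# trajectory become training pairs, while step 0 is an unchanged prefix.
trajectory = [Step("s0", "open file"), Step("s1", "fix import"), Step("s2", "rerun tests")]
print(distill_corrections(trajectory, divergence_index=1))
```

The contrast with reward-only training is the granularity of the signal: every post-divergence step becomes a labeled example rather than contributing to one scalar outcome.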
Overall conclusion: forecast-then-execute (anticipatory trajectory reasoning) is an effective principle for building multimodal agents capable of reasoning, planning, and acting in complex environments.
Paper's Conclusion in the provided summary asserts this, based on the reported experimental comparisons and the two-stage TraceR1 framework.
medium · positive · Paper: Anticipatory Planning for Multimodal AI Agents · Outcome: agent capability on complex, multi-step multimodal tasks (planning, reasoning, a...
The paper reports improvements in planning stability (consistency of multi-step plans), execution robustness (success under environment/tool variability), and generalization (out-of-distribution tasks and unseen tool/environment states).
Reported outcomes in the summary explicitly list these three improvement categories; the specific metrics and magnitudes are not provided in the summary.
medium · positive · Paper: Anticipatory Planning for Multimodal AI Agents · Outcome: planning stability, execution robustness, generalization
Compared to reactive agents that optimize actions stepwise without trajectory anticipation, TraceR1 yields better multi-step planning and execution.
Baselines & comparisons described in the summary include reactive agents; the paper reports improvements of TraceR1 relative to these baselines across the benchmarks (no numeric values in the provided text).
medium · positive · Paper: Anticipatory Planning for Multimodal AI Agents · Outcome: multi-step planning stability, execution success rate
Explicit anticipatory (trajectory-level) reasoning is a crucial design principle for reliable multi-step task performance in complex real-world environments.
Paper reports comparisons between anticipatory (trajectory-forecasting) agents and reactive / single-stage baselines, concluding the anticipatory design yields better multi-step reliability; exact experimental details and statistics not included in the provided summary.
medium · positive · Paper: Anticipatory Planning for Multimodal AI Agents · Outcome: multi-step task reliability (task success over sequences), plan coherence
TraceR1 materially improves planning coherence, execution robustness, and generalization in multimodal, tool-using agents versus reactive or single-stage baselines.
Reported evaluation across seven benchmarks (online and offline computer-use, multimodal tool-use reasoning) comparing TraceR1 to reactive agents and single-stage RL baselines; summary states 'substantial gains' though no numerical results are provided in the provided text.
medium · positive · Paper: Anticipatory Planning for Multimodal AI Agents · Outcome: planning coherence (stability), execution robustness (success rate under variabi...
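The forecast-then-execute principle can be sketched as a control loop. In the sketch below, `forecast` and `act` are stand-ins for a trajectory predictor and the environment; the whole function is an assumption-laden illustration of the principle, not TraceR1 itself:

```python
def forecast_then_execute(forecast, act, goal, max_replans=3):
    """Execute an anticipated trajectory, re-forecasting on divergence.

    forecast(goal, history) -> list of (action, expected_observation)
    act(action) -> actual observation
    The agent commits to a whole anticipated trajectory (rather than
    choosing actions stepwise) and only re-plans when the environment
    departs from what it predicted.
    """
    history = []
    for _ in range(max_replans):
        for action, expected in forecast(goal, history):
            observed = act(action)
            history.append((action, observed))
            if observed != expected:
                break  # anticipated state diverged: re-forecast from here
        else:
            return history  # the whole forecast played out as predicted
    return history
```

A purely reactive baseline would collapse the inner loop to one step with no `expected` value, which is exactly the contrast the comparisons above draw.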
Policy instruments that can support shorter workweeks include tax incentives for firms that maintain pay while reducing hours, regulatory transition frameworks, and conditionality on AI subsidies or public procurement tied to job-preservation or reduced hours.
Policy-analytic argument drawing on standard policy toolkits and selected prior examples; no new policy pilot results presented.
medium · positive · Paper: A Shorter Workweek as a Policy Response to AI-Driven Labor D... · Outcome: adoption rate of shorter workweeks, preservation of pay, conditionality complian...
Shorter workweeks help sustain consumer purchasing power by reducing aggregate labor supply and thereby distributing automation gains more equitably.
Theoretical labour-supply reasoning plus historical case studies of work-time reductions; argumentative and normative rather than demonstrated with new macroeconomic empirical tests in AI-rich settings.
medium · positive · Paper: A Shorter Workweek as a Policy Response to AI-Driven Labor D... · Outcome: consumer purchasing power, distribution of productivity/earnings gains
A gradual, policy-driven reduction in the standard workweek can absorb labor displaced by automation, help maintain employment levels, and preserve wages per hour.
Synthesis of prior empirical findings on work-hour reductions and historical precedents (e.g., six-day to five-day transition); no new randomized or large-scale contemporary trials presented.
medium · positive · Paper: A Shorter Workweek as a Policy Response to AI-Driven Labor D... · Outcome: employment levels, hours worked per worker, hourly wages
Firms use layoffs strategically to signal efficiency and boost short-term stock prices, even when automation is not fully substitutive.
Organizational- and finance-literature synthesis on signaling and market reactions to cost-cutting; historical/case examples referenced rather than new econometric estimates.
medium · positive · Paper: A Shorter Workweek as a Policy Response to AI-Driven Labor D... · Outcome: short-term stock price/market reaction following layoffs; incidence of layoffs u...
Employers are increasingly demanding digital literacy, basic data competencies, and stronger communication and interpersonal skills.
Employer survey analysis tracking changes in required skills; descriptive summary of survey frequencies and employer-reported skill priorities. Survey sample size and representativeness not specified in summary.
medium · positive · Paper: The AI Transition: Assessing Vulnerability and Structural Re... · Outcome: frequency/intensity of employer-reported demand for specific skills (digital lit...
Some occupations experience efficiency and productivity gains where AI complements tasks, implying complementarity effects for those jobs.
Qualitative case studies of firms and employer survey reports documenting productivity/efficiency improvements in certain roles following AI adoption; descriptive analysis of sectoral/occupational outcomes. Quantitative magnitude not specified.
medium · positive · Paper: The AI Transition: Assessing Vulnerability and Structural Re... · Outcome: productivity or efficiency gains at job/occupation level (firm-reported producti...
Policymakers should prioritize retraining programs, strengthened social protection, and redistributive policies to mitigate automation-induced unemployment and inequality.
Policy recommendation based on the author's synthesis of risks and expert judgment; not based on an empirical intervention study in the paper.
medium · positive · Paper: DIGITAL TRANSFORMATION OF THE RUSSIAN FEDERATION’S SOCIOECON... · Outcome: mitigation of technological unemployment and inequality (employment rates, incom...
There has been progress in software import substitution, contributing to partial technological sovereignty in Russia.
Use of statistics on software import substitution (authors reference national statistics but do not report detailed numbers or methodology).
medium · positive · Paper: DIGITAL TRANSFORMATION OF THE RUSSIAN FEDERATION’S SOCIOECON... · Outcome: software import substitution rate / domestic share of software supply
Digitalization enables management optimization (improved management processes and decision-making) in Russian enterprises and public administration.
Qualitative analysis of policy documents and expert assessment by the author; no empirical evaluation or quantified effect sizes provided.
medium · positive · Paper: DIGITAL TRANSFORMATION OF THE RUSSIAN FEDERATION’S SOCIOECON... · Outcome: management efficiency/optimization (process improvements, decision-making qualit...
Digitalization has produced measurable labor productivity growth in segments of the Russian economy.
Author's interpretation drawing on national statistics and strategic documents; statistical details (period, sectors, sample sizes) not specified in the paper.
medium · positive · Paper: DIGITAL TRANSFORMATION OF THE RUSSIAN FEDERATION’S SOCIOECON... · Outcome: labor productivity (aggregate or sectoral productivity indicators)
Policy implication: prioritize large-scale, targeted reskilling and lifelong learning programs to enable workforce adaptability and capture AI complementarity gains.
Policy recommendations derived from the paper's findings (association between AI adoption and skill shifts, heterogeneous sectoral impacts) and the literature synthesis that links reskilling interventions to better labor outcomes; recommendation is prescriptive rather than empirically tested within the study.
medium · positive · Paper: AI-Driven Transformation of Labor Markets: Skill Shifts, Hyb... · Outcome: policy effect is recommended but not empirically measured in the study (intended...
The paper provides empirical support for the complementarity hypothesis: AI tends to reconfigure jobs and create hybrid roles rather than eliminate employment wholesale.
Convergence of simulated sectoral employment patterns (some sectors showing net gains and hybrid-role growth), the strong correlation between AI adoption and skill shifts (r = 0.71), and corroborating studies from the literature synthesis emphasizing augmentation and hybridization mechanisms.
medium · positive · Paper: AI-Driven Transformation of Labor Markets: Skill Shifts, Hyb... · Outcome: employment change and hybrid job share (evidence for complementarity vs. substit...
Institutional reskilling programs and governance frameworks markedly moderate labor-market outcomes: better frameworks correlate with more complementarities and lower net job loss.
Integration of literature-derived mechanisms with simulated empirical patterns; paper reports correlations/moderation-style comparisons across simulated sector-year cases incorporating policy/institutional variables (described in methods), supported by studies in the systematic review linking policy interventions to labor outcomes.
medium · positive · Paper: AI-Driven Transformation of Labor Markets: Skill Shifts, Hyb... · Outcome: net employment change; measures of complementarity (e.g., hybrid share) conditio...
Healthcare and IT Services experienced net employment gains consistent with AI complementarity (augmented tasks and creation of new hybrid roles).
Simulated sectoral employment trends and net-change metrics for Healthcare and IT Services (2020–2024) presented in the paper, supported by literature synthesis examples showing human–AI complementarities in these sectors.
medium · positive · Paper: AI-Driven Transformation of Labor Markets: Skill Shifts, Hyb... · Outcome: employment levels and net change by sector (Healthcare, IT Services)
The largest rises in hybrid jobs occurred in IT Services and Healthcare.
Sectoral decomposition of hybrid job share trends in the simulated dataset across the seven industries (2020–2024) and supporting qualitative/quantitative findings from the literature synthesis focused on IT Services and Healthcare.
medium · positive · Paper: AI-Driven Transformation of Labor Markets: Skill Shifts, Hyb... · Outcome: hybrid job share by sector (IT Services, Healthcare)
Hybrid human–AI jobs increased substantially across all seven analyzed sectors between 2020 and 2024.
Descriptive trend analysis of the simulated dataset's hybrid job share metric (fraction of roles reclassified as human–AI hybrid) for the seven industries over 2020–2024, combined with corroborating examples from the literature synthesis (selected ACM/IEEE/Springer studies 2020–2024).
medium · positive · Paper: AI-Driven Transformation of Labor Markets: Skill Shifts, Hyb... · Outcome: hybrid job share (sector-level, 2020–2024)
A matching/ranking algorithm that scores candidate-job pairs by skill fit and predicted remuneration (and proximity) improves the alignment of workers to short-term gigs.
System incorporates a ranking algorithm combining inferred-skill fit, predicted wages, and proximity constraints; pilot comparison reported improved matches, but quantitative algorithmic performance metrics are not provided in the summary.
medium · positive · Paper: AI-Driven Skill Mapping and Gig Economy Matching Algorithm f... · Outcome: match alignment/fit metrics; placement rates
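A scoring rule of the kind described might look like the following sketch; the field names, weights, and normalizations are invented for illustration and are not the system's actual algorithm:

```python
def match_score(candidate, gig, w_skill=0.5, w_wage=0.3, w_near=0.2):
    """Score a candidate-gig pair by skill fit, wage fit, and proximity.

    candidate: {"skills": set, "target_wage": float}
    gig: {"required_skills": set, "predicted_wage": float, "distance_km": float}
    Each component is normalized to [0, 1] and blended with fixed weights,
    so gigs can be ranked for a candidate by sorting on this score.
    """
    required = gig["required_skills"]
    skill_fit = len(candidate["skills"] & required) / len(required) if required else 0.0
    wage_fit = min(gig["predicted_wage"] / candidate["target_wage"], 1.0)
    nearness = 1.0 / (1.0 + gig["distance_km"])  # decays with distance
    return w_skill * skill_fit + w_wage * wage_fit + w_near * nearness

candidate = {"skills": {"phone repair", "sales"}, "target_wage": 20.0}
gig = {"required_skills": {"sales", "stocking"}, "predicted_wage": 18.0, "distance_km": 3.0}
print(round(match_score(candidate, gig), 3))
```

Ranking candidate-gig pairs then reduces to sorting gigs by `match_score` for each candidate, which matches the claim that alignment (not just availability) drives placements.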
ML models can continuously derive available gigs and demand signals from marketplace activity, producing up-to-date opportunity lists and predicted wages.
Implemented ML models ingest real-time market activity/platform signals in the pilot to generate opportunity lists and wage predictions; no reported out-of-sample accuracy or prediction error metrics in the summary.
medium · positive · Paper: AI-Driven Skill Mapping and Gig Economy Matching Algorithm f... · Outcome: availability/recency of opportunity lists; accuracy of predicted wages
Skills can be inferred from multiple nontraditional inputs—self-reported information, short-term work histories, and community recommendations—creating richer profiles beyond formal work experience.
System design uses NLP to normalize and extract skills from profiles, short-term work records, and community recommendations; claim is supported by the implemented data integration approach rather than by quantified external validation in the summary.
medium · positive · Paper: AI-Driven Skill Mapping and Gig Economy Matching Algorithm f... · Outcome: inferred skill coverage/quality or profile richness
The pilot implementation produced higher reported wages for youth matched through the system relative to baseline informal methods.
Pilot comparison reported higher reported wages for matched youth; summary lacks sample size, measurement protocol, and statistical inference.
medium · positive · Paper: AI-Driven Skill Mapping and Gig Economy Matching Algorithm f... · Outcome: reported wages (self-reported earnings)
The pilot implementation led to higher correct matches compared to existing informal search methods.
Pilot deployment compared matching accuracy versus baseline informal job-search approaches; the paper summary reports a 'marked increase' but provides no numerical details, sample size, or significance levels.
medium · positive · Paper: AI-Driven Skill Mapping and Gig Economy Matching Algorithm f... · Outcome: matching accuracy / proportion of correct matches