The Commonplace
Direction, evidence grade, and study type are AI-generated labels (gpt-5-mini), not human-verified. Syntheses are LLM-written. "Tensions" are machine-detected candidates, not confirmed contradictions. A research-acceleration tool, not peer review.

Evidence (14922 claims)

Search and filter individual claims pulled from the papers. If you are looking for a specific finding ("what's the effect on wages?"), you're in the right place. To compare whole outcome categories against each other instead, use the Evidence Explorer.

The board below groups claims in two ways: by broad theme (nine paper-level topics) and by outcome category (the 34 claim-level outcomes that the Explorer and Syntheses also use).

Browse by theme

Nine broad, paper-level topics.

Adoption (9047 claims)
Productivity (8066 claims)
Governance (7278 claims)
Human-AI Collaboration (6912 claims)
Org Design (4439 claims)
Innovation (4359 claims)
Labor Markets (3652 claims)
Skills & Training (3018 claims)
Inequality (2160 claims)

Claims by outcome category

Counts by direction of finding. These are the same 34 outcome categories the Explorer compares and the Syntheses are written for. A linked row has a published synthesis.

Outcome Positive Negative Mixed Null Total
Other 795 210 105 955 2131
Governance & Regulation 886 414 197 126 1654
Organizational Efficiency 826 204 129 87 1257
Technology Adoption Rate 681 259 128 110 1189
Research Productivity 464 138 65 349 1028
Output Quality 503 196 61 53 813
Decision Quality 351 180 84 51 673
AI Safety & Ethics 238 288 71 34 637
Firm Productivity 455 58 92 20 631
Market Structure 186 172 123 25 511
Task Allocation 222 70 76 34 407
Innovation Output 238 28 48 18 334
Skill Acquisition 177 62 62 17 318
Employment Level 107 57 108 13 287
Fiscal & Macroeconomic 135 72 44 26 284
Firm Revenue 172 50 28 5 256
Consumer Welfare 121 68 45 12 246
Task Completion Time 183 33 10 13 240
Inequality Measures 45 126 50 6 227
Worker Satisfaction 95 74 23 12 204
Error Rate 77 98 11 4 190
Regulatory Compliance 84 73 17 7 181
Automation Exposure 61 61 27 14 166
Training Effectiveness 98 21 14 19 154
Wages & Compensation 78 37 25 6 146
Developer Productivity 105 18 14 6 144
Team Performance 87 17 28 10 143
Job Displacement 12 83 23 1 119
Hiring & Recruitment 53 8 8 3 72
Social Protection 39 17 8 2 66
Creative Output 32 20 8 3 64
Skill Obsolescence 5 50 6 1 62
Labor Share of Income 17 20 17 54
Worker Turnover 15 15 3 33
Industry 1 1
The reasoning preset (1:5 input:output) elevates frontier closed models that the chat preset penalizes on price.
Observed leaderboard reordering when using a reasoning workload preset (1:5) compared to the chat preset; specific elevation of frontier closed models noted (no numeric counts provided in excerpt).
medium positive Token Arena: A Continuous Benchmark Unifying Energy and Cogn... leaderboard ranking changes for frontier closed models under the reasoning prese...
To our knowledge, this is the first demonstration of an AI agentic system autonomously identifying and experimentally validating a nontrivial, previously unreported physical mechanism.
Authors' novelty claim, supported by the reported autonomous proposal and experimental validation of the optical bilinear interaction in their study.
medium positive End-to-end autonomous scientific discovery on a real optical... novelty of AI-driven autonomous experimental discovery (identification + experim...
Qiushi Engine converts an abstract coherence-order theory into experimental observables, providing the first observation of this class of coherence-order structure.
Reported experimental procedure translating coherence-order theory into measurable observables and claiming the first observation of that class of structure; experimental data and analysis presented in the paper supporting the observation.
medium positive End-to-end autonomous scientific discovery on a real optical... observation of coherence-order structure predicted by theory
Gradient attribution is established as a computationally validated signal for model-informed reward allocation in participatory weather sensing.
Synthesis/conclusion in paper based on the computational experiments and evaluations (results across >400 configurations demonstrating fidelity and limitations).
medium positive Calibrating Attribution Proxies for Reward Allocation in Par... validity of gradient attribution as a reward allocation signal
Attribution captures near-optimal sensor placement utility with monotonically faithful payments.
Comparative experiments in the paper showing that gradient attribution corresponds closely to near-optimal sensor placement utility and yields monotonically faithful payment signals (experimental comparisons to optimal/benchmark placements).
medium positive Calibrating Attribution Proxies for Reward Allocation in Par... sensor placement utility captured by attribution; monotonicity/faithfulness of p...
Principled symbolic abstraction bridges generative AI and the numerical precision required for engineering design.
Broader conclusion drawn by the authors based on the reported modular architecture, symbolic lifting operator, and experimental improvements in geometric error and structural validity.
medium positive Language Models Refine Mechanical Linkage Designs Through Sy... ability to combine generative models with numerical precision for engineering ta...
These simulations produce rich experiential learning signals, whose effectiveness is validated by significant improvements in agent performance on both in-domain and out-of-domain productivity evaluations.
Evaluation experiments reported in the paper claiming statistically/qualitatively significant improvements in agent performance on in-domain and out-of-domain productivity benchmarks after training on simulation-generated signals.
medium positive Synthetic Computers at Scale for Long-Horizon Productivity S... agent performance on productivity evaluations (in-domain and out-of-domain)
Embedding governance into agent reasoning produces more consistent, explainable, and auditable compliance than external enforcement.
Comparative claim asserted in the paper, apparently supported by the reported production deployment results (95% compliance, zero false escalations); explicit experimental comparison details are not provided in the abstract.
medium positive Think Before You Act -- A Neurocognitive Governance Model fo... consistency, explainability, and auditability of compliance
Technology has increased efficiency in organisations based in large cities in India.
Review result statement claiming observed efficiency gains in urban organisations according to the literature summarized; based on reviewed studies (no single sample size reported in excerpt).
medium positive A Comprehensive Review of Technology Adoption and Its Impact... organizational efficiency gains in urban organisations
Controversial questions frequently result in an AIO.
Analysis of the 11,500-query benchmark with annotation/identification of 'controversial' queries and observed higher incidence of AIO generation for those queries.
medium positive How Generative AI Disrupts Search: An Empirical Study of Goo... likelihood of AIO generation for controversial queries
AI-enabled process capability contributes to sustained enterprise value growth.
Authors report empirical associations between PI (AI-enabled process capability) and measures tied to enterprise value (Feltham–Ohlson based abnormal earnings / profitability) across the panel sample.
medium positive A Data-Driven Evaluation Framework for Quantifying the Impac... enterprise value / sustained value growth
Prompt modifications, Chain-of-Thought (CoT) reasoning, and visual token reduction can mitigate visual-priming effects on VLM behavior (with varying effectiveness across models).
Intervention experiments applying prompt engineering, CoT-style prompts, and reducing the number of visual tokens to observe whether these interventions reduce the influence of image content and color cues on IPD choices across several VLMs. (Abstract states these mitigation strategies were explored and their effectiveness varied by model; precise quantitative mitigation effects not provided in abstract.)
medium positive The Effects of Visual Priming on Cooperative Behavior in Vis... reduction in priming-induced changes to cooperation/defection choices after appl...
Generative AI is increasingly embedded in China's short-video production.
Authors' background claim supported by qualitative data collection with 16 in-depth interviews of short-video creators active on Xiaohongshu and Douyin; observational grounding in participant reports.
medium positive AI passing and invisible authenticity labor: trust vulnerabi... degree of AI adoption in short-video production
Reliability did not come from the base model alone; it emerged from the operating layer around the model (prompt compilation, typed controls, policy validation, execution guards, memory design, and trace-level observability).
Comparative analysis of system design and observed failure modes during pre-launch testing and deployment; qualitative/operational reasoning linking operating-layer components to improved reliability.
medium positive Operating-Layer Controls for Onchain Language-Model Agents U... system reliability attributable to operating-layer components
The proposed, validated model can equip fintech managers and regulators with a governance-based approach to tackling algorithmic bias and better position them to engender trust and financial inclusion.
Concluding assertion based on the integrated framework developed from the SLR (45 papers) and the structured five-expert validation; positioned as the intended practical utility of the model rather than an empirically measured outcome.
medium positive Corporate-Governance-Driven Algorithmic Fairness in SME Fint... trust and financial inclusion outcomes resulting from governance-based mitigatio...
Participants showed strong willingness to substitute human IT support at costs well below human benchmarks.
Participant responses and willingness-to-pay / substitution questions collected in the controlled study (reported qualitatively in the paper); comparison to unspecified human-cost benchmarks.
medium positive SecMate: Multi-Agent Adaptive Cybersecurity Troubleshooting ... willingness to substitute human IT support (cost thresholds / preferences)
Claude 3.5 Sonnet aligns with a narrative funder profile, showing greater responsiveness to qualitative aspects of the pitch, somewhat higher funding levels, and strong cross-run reliability.
Comparative observations across the experiment: Claude 3.5 Sonnet was more responsive to qualitative information in pitch decks, tended to recommend higher funding levels, and demonstrated strong reliability across runs.
medium positive Algorithmic personalities and the myth of neutrality: financ... responsiveness to qualitative aspects; funding levels; reliability
Agentic Architect is the first end-to-end open-source framework for agentic AI architecture exploration and optimization.
Authors' claim in the abstract and paper asserting novelty and open-source release. No independent verification provided in the abstract.
Our results establish C2C as a testbed for studying and building LM-based agents that can navigate the sophisticated coordination required for real-world deployments.
Authors' interpretation/implication based on the experiments and dataset produced (conclusion statement).
medium positive Cooperate to Compete: Strategic Coordination in Multi-Agent ... suitability of C2C as a research testbed
The system provides infrastructure supporting interactive search, cohort generation, and downstream LLM-powered clinical applications without requiring specialized informatics expertise.
Conclusion asserts the deployed system can support interactive search, cohort generation, and downstream LLM applications and that it does not require specialized informatics expertise for these uses.
medium positive Health System Scale Semantic Search Across Unstructured Clin... need_for_specialized_informatics_expertise / applicability_to_downstream_applica...
Health-system-scale semantic search is both technically and operationally feasible.
Authors conclude feasibility based on successful deployment, measured latency, cost, retrieval quality, and clinical utility experiments.
medium positive Health System Scale Semantic Search Across Unstructured Clin... feasibility_of_health_system_scale_deployment
The paper's findings provide practical guidance for selecting between joint and modular training modalities based on environmental conditions to optimize reinforcement learning–based scheduling performance.
Authors' stated implication/conclusion based on their sensitivity analysis and comparative evaluations across environmental regimes.
medium positive An Analysis of the Coordination Gap between Joint and Modula... guidance effectiveness for selecting training modality to optimize performance
Foundation models are strong potential solutions for scalable and generalizable forecasting in the energy domain, particularly in data-constrained and privacy-sensitive settings.
Synthesis and interpretation of benchmark results showing generalization across datasets and better performance in scenarios with limited data; argument made in paper conclusions.
medium positive FETS Benchmark: Foundation Models Outperform Dataset-specifi... suitability of foundation models for data-constrained and privacy-sensitive fore...
These results establish a practical pathway for extending industrial automation with learning-based methods.
Authors' concluding claim based on the reported deployment results (interpretation/implication rather than a new empirical measurement).
medium positive Learning-augmented robotic automation for real-world manufac... practical applicability/adoption potential of learning-based automation methods
The paper proposes a safety-oriented inductive bias for rational AI decision-makers whose desiderata align with implementable policy constraints in high-stakes, low-signal situations.
Theoretical proposal and normative argument in the paper linking the proposed inductive bias (negligibility threshold and associated norms) to policy-implementable constraints; argued rather than empirically demonstrated.
medium positive Bounding the Long Tail: Ai Norms for Decision-Making Under N... alignment of a proposed inductive bias with implementable policy constraints; im...
These patterns are consistent with transfer emerging through accumulated interaction between owners (or owners' computer environments) and their agents in everyday use.
Interpretation offered by the authors based on observed alignment patterns and robustness checks; the paper argues consistency with an interaction-driven transfer mechanism rather than providing a direct experimental causal test.
medium positive Behavioral Transfer in AI Agents: Evidence and Privacy Impli... inferred_mechanism_of_transfer (accumulated_interaction)
This transfer persists among agents without explicit configuration.
Subgroup analyses (described in paper) isolating agents lacking explicit configuration settings and comparing behavioral alignment to owners; reported persistence of alignment in that subgroup.
medium positive Behavioral Transfer in AI Agents: Evidence and Privacy Impli... behavioral_alignment_in_unconfigured_agents
Trade unions have increasingly pursued algorithmic transparency and stronger technology governance rights through collective bargaining, and governments are accelerating legislative initiatives to establish and protect workplace technology rights.
Descriptive review of labor-movement responses and recent government legislative initiatives reported in the literature (case studies and policy reviews).
medium positive From Technological Substitution to Institutional Response: A... union bargaining activity and government legislative action on workplace technol...
Using these artifacts shifts human effort toward higher-level design and validation activities.
Reported as a preliminary finding from the exploratory evaluation; the abstract states that human effort shifted from low-level implementation to higher-level design/validation when artifacts were embedded (no sample size or time-allocation metrics provided).
medium positive Shift-Up: A Framework for Software Engineering Guardrails in... allocation of human effort to design and validation
Embedding machine-readable requirements and architectural artifacts stabilizes agent behavior.
Reported as a preliminary finding from the exploratory evaluation comparing approaches; the abstract states that embedding such artifacts stabilizes agent behavior (no numeric metrics or sample size reported).
medium positive Shift-Up: A Framework for Software Engineering Guardrails in... agent behavior stability
The central obstacle to agent self-improvement is not what to remember but how to use what has been remembered (which retrieval policy to apply, how to interpret prior outcomes, and when the current strategy itself must change).
Conceptual claim supported by authors' argumentation and by the experimental results (ablation showing gains from reflection/use mechanisms rather than added architectural complexity).
medium positive AEL: Agent Evolving Learning for Open-Ended Environments bottleneck characterization for agent self-improvement
Visibility mechanisms, such as public algorithm registers or role-sensitive explainability, can be effective tools in regaining citizen trust.
Review examines studies on transparency/visibility mechanisms; abstract states these mechanisms are examined for effectiveness but does not report definitive quantitative results or study counts.
medium positive Artificial Intelligence, Public Policy and Governance - impl... citizen trust in algorithmic governance
By capturing complete interaction traces with human vs. agent code authorship attribution, SWE-chat provides an empirical foundation for moving beyond curated benchmarks towards an evidence-based understanding of how AI agents perform in real developer workflows.
Claims about dataset capabilities and intended use: the dataset contains interaction traces and authorship labels enabling empirical research; asserted by authors as an implication of the dataset contents.
medium positive SWE-chat: Coding Agent Interactions From Real Users in the W... utility of SWE-chat for empirical research and benchmark improvement
Enterprise deployment of long-horizon decision agents in regulated domains (underwriting, claims adjudication, tax examination) is dominated by retrieval-augmented pipelines despite a decade of increasingly sophisticated stateful memory architectures.
Stated observation/argument in the paper's introduction; no empirical sample size or systematic industry survey reported in the abstract.
medium positive Stateless Decision Memory for Enterprise AI Agents prevalence of retrieval-augmented pipelines in enterprise deployment
An accompanying open-source interactive tool, the Co-creation Provenance Lab, enables policymakers to audit and iteratively improve summaries, establishing genuine human-in-the-loop oversight at scale.
Statement in the paper about an open-source tool released alongside the research; likely demonstration or software repository provided.
medium positive Participatory provenance as representational auditing for AI... availability and claimed capability of the Co-creation Provenance Lab to support...
AI adoption enhances the reliability of financial reporting and the effectiveness of audits by reducing information asymmetry and strengthening internal monitoring processes.
Argument grounded in theory and supported empirically via SEM showing AI adoption associated with greater reporting transparency and internal control quality, which are linked to higher audit quality.
medium positive Artificial Intelligence Adoption in Financial Reporting and ... financial reporting reliability and audit effectiveness (via reduced information...
AI-enabled reporting systems strengthen firm-level governance mechanisms (e.g., reporting transparency and internal controls), which enhances audit quality (governance substitution perspective complemented by institutional and technology diffusion theories).
Theoretical framing (governance substitution, institutional and technology diffusion theories) combined with empirical SEM results linking AI adoption to proxies for governance (reporting transparency, internal control quality) and to audit quality.
medium positive Artificial Intelligence Adoption in Financial Reporting and ... firm-level governance mechanisms (reporting transparency, internal control quali...
Differences in institutional quality, digital infrastructure, and absorptive capacity explain the disparity in technology impacts between GCC and non-GCC countries.
Exploratory/mediation or interaction analysis linking institutional quality, measures of digital infrastructure, and absorptive capacity to heterogeneity in estimated technology effects across countries in the panel.
medium positive Digital Transformation, AI Efficiency, and Sustainable Devel... heterogeneity in the effect of digital transformation/AI on sustainable developm...
The capital market evaluates AI investment as a future 'growth option' selectively in industries with strong data infrastructure, digital workforce readiness, and absorptive capacity.
Inference from heterogeneous positive Tobin's Q effect found in the ICT industry and null average effect across all firms; authors argue market valuation responds to industry-specific complementary assets and ecosystem conditions.
medium positive The Dynamic Causal Effects of Corporate AI Adoption on Profi... market valuation response to AI investment (interpreted as growth-option pricing...
Pair programming between students is well studied and known to be beneficial to self-efficacy and academic achievement.
Background literature claim presented in the paper's introduction (cites existing research on pair programming benefits).
medium positive Fast and Forgettable: A Controlled Study of Novices' Perform... self-efficacy and academic achievement associated with pair programming
Developing and further developed countries only integrate with China, signaling China's expanding influence over the international AI research landscape.
Observed integration patterns in the publication-based collaboration and citation networks showing that (some) developing and further developed countries connect primarily with China rather than the US; comparison to randomized networks.
medium positive Polarization and Integration in Global AI Research international research integration of developing and further developed countries...
The calibration mapping suggests Google and OpenAI face conditions most conducive to foreclosure.
Outcomes of the paper's stylized calibration/comparative mapping across four providers (April 2026 data); authors' interpretation.
medium positive The Inference Bottleneck: A Formal Model of Vertical Foreclo... conduciveness to foreclosure
Artificial intelligence algorithms are increasingly used by firms to set prices.
Statement in paper's introduction/abstract referencing prior adoption trends; no specific empirical study or sample reported in the excerpt.
medium positive Convergence to collusion in algorithmic pricing use/adoption of AI algorithms for pricing by firms
The proposed approach aligns machine learning with actuarial portfolio optimization by explicitly integrating profit-driven objectives and operational constraints, offering two practical and scalable solutions for risk-based decision-making in real-world insurance settings.
Conceptual claim supported by the combination of methodological design and empirical results presented in the paper (method descriptions + experimental validation).
medium positive Advanced Insurance Risk Modeling for Pseudo-New Customers Us... risk-based_decision-making_effectiveness
The balanced ensemble provides the most favourable trade-off between predictive performance, robustness, interpretability, and computational efficiency, making it suitable for deployment in regulated insurance environments.
Authors' synthesis of experimental results (performance, robustness tests, interpretability considerations, and computational efficiency measurements) and discussion regarding regulatory deployment suitability.
medium positive Advanced Insurance Risk Modeling for Pseudo-New Customers Us... suitability_for_deployment / trade-off_between_metrics
These variables (education, gender inclusiveness, digital literacy, perceived fairness) are mutually dependent and the use of AI combined with inclusive policies is necessary to sustainably realize financial inclusion.
Paper asserts mutual dependence based on SEM results and provides a policy recommendation that AI plus inclusive policies are necessary, citing prior literature (Salami et al., 2025; Berg et al., 2019; Fuster et al., 2021).
medium positive A Machine Learning Perspective on FinTech-Driven Inclusion: ... sustainable financial inclusion
Synthetic experiments complement the theoretical results and showcase the benefits of collective action across different market regimes.
Simulation-based experiments described in the paper (synthetic experiments across market regimes). Paper does not report a real-world sample size; results are from computational experiments.
medium positive Stochastic wage suppression on gig platforms and how to orga... benefit of collective action (improvements in wages/total spending across simula...
Spatial heterogeneity: Eastern regions are driven by knowledge recombination opportunities.
Reported spatial heterogeneity findings indicating Eastern China’s diffusion is driven more by recombination/opportunity measures than by reliance on core hubs.
medium positive Mapping China’s digital transformation: a multilayer network... drivers of diffusion in Eastern regions (knowledge recombination)
Spatial heterogeneity: Western regions rely heavily on core technological hubs.
Spatial analysis / heterogeneity results reported by region indicating Western China depends on core technological hubs as diffusion sources or anchors.
medium positive Mapping China’s digital transformation: a multilayer network... regional dependence on core technological hubs (Western regions)
Heterogeneity analysis: market-driven enterprises heavily rely on high-value core technologies.
Reported heterogeneity results indicating enterprises (market-driven actors) concentrate on and depend upon core, high-value technologies within identified diffusion paths.
medium positive Mapping China’s digital transformation: a multilayer network... enterprises' reliance on core high-value technologies