The Commonplace

Evidence (13827 claims)

Adoption
8454 claims
Productivity
7544 claims
Governance
6789 claims
Human-AI Collaboration
6327 claims
Org Design
4126 claims
Innovation
4058 claims
Labor Markets
3520 claims
Skills & Training
2924 claims
Inequality
2057 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome                   Positive  Negative  Mixed  Null  Total
Other                          749       195     97   889   1979
Governance & Regulation        815       391    188   121   1539
Organizational Efficiency      771       189    124    83   1177
Technology Adoption Rate       624       233    123    96   1084
Research Productivity          410       121     56   331    929
Output Quality                 466       177     59    47    749
Decision Quality               320       174     75    42    618
Firm Productivity              435        55     88    20    604
AI Safety & Ethics             214       276     65    33    593
Market Structure               178       166    122    24    495
Task Allocation                206        64     70    31    376
Skill Acquisition              165        57     60    17    299
Innovation Output              201        27     41    18    288
Employment Level               105        51    107    13    278
Fiscal & Macroeconomic         131        69     43    26    276
Consumer Welfare               116        63     42    11    232
Firm Revenue                   149        46     26     3    224
Inequality Measures             44       122     49     6    221
Task Completion Time           169        29      8    12    219
Worker Satisfaction             89        61     20    12    182
Error Rate                      69        91     10     2    172
Regulatory Compliance           76        68     14     5    163
Training Effectiveness          92        19     13    19    145
Wages & Compensation            77        36     25     6    144
Automation Exposure             51        54     22    12    142
Team Performance                86        17     27     9    140
Developer Productivity          94        17     14     6    132
Job Displacement                12        80     20     1    113
Hiring & Recruitment            51         7      8     3     69
Skill Obsolescence               5        45      6     1     57
Creative Output                 31        16      7     2     57
Social Protection               27        16      8     2     53
Labor Share of Income 17 17 17 51
Worker Turnover 11 12 3 26
Industry 1 1
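The matrix rows can be read as shares of each outcome's Total. The sketch below is a hypothetical illustration in Python, not part of the dashboard: the `Row` type and both helper functions are assumed names, and the three rows are copied from the matrix above. It also shows that the four direction columns do not always add up to the listed Total; the page does not say what the remainder represents.

```python
from typing import NamedTuple


class Row(NamedTuple):
    """One row of the evidence matrix (names are illustrative)."""
    outcome: str
    positive: int
    negative: int
    mixed: int
    null: int
    total: int


# Three rows copied from the matrix above.
ROWS = [
    Row("Firm Productivity", 435, 55, 88, 20, 604),
    Row("Error Rate", 69, 91, 10, 2, 172),
    Row("Job Displacement", 12, 80, 20, 1, 113),
]


def positive_share(row: Row) -> float:
    """Fraction of the listed Total that is tagged Positive."""
    return row.positive / row.total


def remainder(row: Row) -> int:
    """Listed Total minus the sum of the four direction columns."""
    return row.total - (row.positive + row.negative + row.mixed + row.null)


for row in ROWS:
    print(f"{row.outcome}: {positive_share(row):.1%} positive, "
          f"{remainder(row)} claims outside the four columns")
```

For Firm Productivity the four columns sum to 598 against a Total of 604, so shares computed against Total and against the column sum differ slightly.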
Having a diverse team broadens the search for solutions, delays premature consensus, and allows for the pursuit of unconventional approaches.
Theoretical/argumentative claim referencing literature in complex systems and organizational behavior as support; no quantitative evidence or sample reported in the excerpt.
high positive The Future of AI is Many, Not One search breadth, timing of consensus formation, and pursuit of unconventional sol...
Deep intellectual breakthroughs should be expected to come from epistemically diverse groups of AI agents working together rather than singular superintelligent agents.
Predictive/theoretical claim motivated by referenced research and formal results in complex systems, organizational behavior, and philosophy of science; no empirical experiment or sample size given in the excerpt.
high positive The Future of AI is Many, Not One occurrence of deep intellectual breakthroughs (scientific/innovative discoveries...
We should abandon the individual approach if we're hoping for AI to support groundbreaking innovation and scientific discovery.
Normative prescription based on theoretical argument and synthesis of literature from complex systems, organizational behavior, and philosophy of science; no empirical trial or quantified evaluation reported in the excerpt.
high positive The Future of AI is Many, Not One ability of AI to support groundbreaking innovation and scientific discovery
AI innovation achieves corporate low-carbon development by reorienting investment toward green assets.
Mechanism analysis reported in the paper (mediation/path analysis) using the same 21,428 firm-year observations; investment reorientation toward green assets identified as a mediation path.
high positive Artificial Intelligence Innovation, Internal Structure Optim... corporate carbon emission intensity (mediated via investment reorientation towar...
AI innovation achieves corporate low-carbon development by upgrading emission-reducing production processes.
Mechanism analysis reported in the paper (mediation/path analysis) on the 21,428 firm-year sample; production-process upgrades identified as a mediation path.
high positive Artificial Intelligence Innovation, Internal Structure Optim... corporate carbon emission intensity (mediated via production process upgrades)
AI innovation achieves corporate low-carbon development by optimizing low-carbon organizational governance.
Mechanism analysis reported in the paper (mediation/path analysis) using the same sample of 21,428 firm-year observations; paper identifies organizational governance optimization as one of three mediation paths.
high positive Artificial Intelligence Innovation, Internal Structure Optim... corporate carbon emission intensity (mediated via organizational governance chan...
With further development, this approach may exceed traditional methods regarding risk accuracy and help drive innovation in the insurance industry.
Forward-looking claim by the authors extrapolating from current prototype results and potential improvements; no empirical evidence provided that it already exceeds traditional methods.
high positive AI in Insurance: Adaptive Questionnaires for Improved Risk P... risk assessment accuracy and industry innovation
ARQuest shows great potential to improve user satisfaction and streamline insurance processes.
Interpretation based on experimental findings (fewer questions, user preference) and the proposed framework; forward-looking claim rather than a fully established empirical result.
high positive AI in Insurance: Adaptive Questionnaires for Improved Risk P... user satisfaction and process streamlining
Adaptive versions were preferred by users for their more fluid and engaging experience.
User preference reported from the experiments (qualitative/user feedback or preference metric); specific measures and sample size not provided in excerpt.
high positive AI in Insurance: Adaptive Questionnaires for Improved Risk P... user preference / perceived fluidity and engagement
Adaptive versions powered by GPT models required fewer questions.
Experimental result reported in paper comparing question counts between adaptive GPT-powered questionnaires and traditional questionnaires; no numeric counts or sample sizes provided in the excerpt.
high positive AI in Insurance: Adaptive Questionnaires for Improved Risk P... number of questions required (survey length / task completion effort)
Techniques such as social media image analysis, geographic data categorization, and Retrieval Augmented Generation (RAG) are used to extract meaningful user insights and guide targeted follow-up questions.
Described methods/techniques used within the ARQuest system implementation in the paper.
high positive AI in Insurance: Adaptive Questionnaires for Improved Risk P... ability to extract user insights and guide follow-up questions
The ARQuest framework introduces a new approach to underwriting by using Large Language Models (LLMs) and alternative data sources to create personalized and adaptive questionnaires.
Methodological contribution described in the paper (framework design); description of components and intended function rather than a quantified outcome.
high positive AI in Insurance: Adaptive Questionnaires for Improved Risk P... personalization and adaptiveness of questionnaires
Achieving near-perfect success rates at this minimally sufficient quality level or comparable success rates at superior quality would require several additional years.
Authors' forecast/commentary on timeline beyond the 2029 projection; conditional expectation based on historical pace of improvements.
high positive Crashing Waves vs. Rising Tides: Preliminary Findings on AI ... time-to-reach near-perfect or superior-quality success rates
If recent trends in AI capability growth persist, LLMs will be able to complete most text-related tasks with success rates of, on average, 80%-95% by 2029 at a minimally sufficient quality level.
Longer-term projection contingent on continuation of recent capability growth trends (model-based forecast stated by the authors).
high positive Crashing Waves vs. Rising Tides: Preliminary Findings on AI ... projected average task success rate for most text-related tasks by 2029 (minimal...
AI success rates for tasks that take humans approximately 3-4 hours increase to about 65% by 2025-Q3.
Short-term projection / trend extrapolation reported in the paper (from the ongoing evaluation data).
high positive Crashing Waves vs. Rising Tides: Preliminary Findings on AI ... projected task success rate by 2025-Q3
In 2024-Q2, AI models successfully complete tasks that take humans approximately 3-4 hours with about a 50% success rate.
Empirical measurement/estimate from the ongoing evaluation (reported temporal snapshot for 2024-Q2); based on tasks mapped to human completion time and observed model success rates from the >17,000 evaluations.
high positive Crashing Waves vs. Rising Tides: Preliminary Findings on AI ... task success rate for tasks taking humans ~3–4 hours
AI performance is high and improving rapidly across a wide range of tasks.
Empirical results from the ongoing evaluation of >3,000 tasks and >17,000 evaluations showing high and increasing success/performance metrics.
high positive Crashing Waves vs. Rising Tides: Preliminary Findings on AI ... AI success/performance on tasks (performance level and trend)
There is substantial evidence that rising tides are the primary form of AI automation.
Patterns observed in the same large-scale evaluation across tasks and human judgments indicating broad-based, continuous capability improvements across many tasks.
high positive Crashing Waves vs. Rising Tides: Preliminary Findings on AI ... breadth and continuity of AI capability improvements across tasks ('rising tides...
Only interventions that reshape risk allocation can plausibly shift stable system-level behaviour.
Argument based on the paper's game-theoretic reasoning and stylised example (theoretical claim; no empirical testing reported in the abstract).
high positive Incentives, Equilibria, and the Limits of Healthcare AI: A G... ability of interventions to shift stable system-level behaviour
Artificial intelligence (AI) is widely promoted as a promising technological response to healthcare capacity and productivity pressures.
Author assertion in the paper's introduction/abstract, based on literature/policy discourse (no empirical sample or quantitative analysis reported in the abstract).
high positive Incentives, Equilibria, and the Limits of Healthcare AI: A G... promotion of AI as a solution to healthcare capacity and productivity pressures
We open-source the complete benchmark, including scenario specifications, ground truth templates, tool implementations, and evaluation scripts.
Paper statement committing to open-sourcing the benchmark components and artifacts.
high positive PHMForge: A Scenario-Driven Agentic Benchmark for Industrial... availability of open-source benchmark artifacts
We evaluated leading agent frameworks (ReAct, Cursor Agent, Claude Code) paired with frontier LLMs (Claude Sonnet 4.0, GPT-4o, Granite-3.0-8B).
Paper reports extensive evaluations using the listed agent frameworks and LLM models paired together to run the benchmark scenarios.
high positive PHMForge: A Scenario-Driven Agentic Benchmark for Industrial... evaluation coverage across agent frameworks and LLMs
Execution-based evaluators were implemented with task-commensurate metrics: MAE/RMSE for regression, F1-score for classification, and categorical matching for health assessments.
Paper statement describing the evaluation methodology and the specific metrics used for regression, classification and health-assessment tasks.
high positive PHMForge: A Scenario-Driven Agentic Benchmark for Industrial... metricized evaluation of model outputs (MAE/RMSE, F1, categorical matching)
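For readers unfamiliar with the metrics named in the claim above, the following is a minimal sketch of each one in plain Python. It is not the PHMForge evaluator code: the function names, the binary form of F1, and the case-insensitive label comparison are all assumptions made for illustration.

```python
import math
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error, e.g. for remaining-useful-life predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error; penalizes large misses more than MAE."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def f1_score(y_true: Sequence[str], y_pred: Sequence[str], positive: str) -> float:
    """Binary F1 for one class label (assumed form; the paper may average over classes)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def categorical_match(expected: str, actual: str) -> bool:
    """Exact label match, ignoring case and surrounding whitespace (an assumption)."""
    return expected.strip().lower() == actual.strip().lower()
```

A quick check with made-up values: `mae([100, 80], [90, 85])` averages the absolute errors 10 and 5, and `rmse` on the same inputs takes the square root of the mean of 100 and 25.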
We construct 65 specialized tools across two MCP servers to enable interactions for the benchmark.
Paper statement reporting the number of specialized tools (65) and that they are deployed across two MCP servers as part of the benchmark implementation.
high positive PHMForge: A Scenario-Driven Agentic Benchmark for Industrial... number of specialized tools and server deployment
The benchmark encompasses 75 expert-curated scenarios spanning 7 industrial asset classes (turbofan engines, bearings, electric motors, gearboxes, aero-engines) across 5 core task categories: Remaining Useful Life (RUL) Prediction, Fault Classification, Engine Health Analysis, Cost-Benefit Analysis, and Safety/Policy Evaluation.
Explicit statement in paper listing the number of scenarios (75), number of asset classes (7) and enumerating the 5 task categories; benchmark construction described by authors.
high positive PHMForge: A Scenario-Driven Agentic Benchmark for Industrial... count and coverage of benchmark scenarios, asset classes, and task categories
PHMForge is the first comprehensive benchmark specifically designed to evaluate LLM agents on Prognostics and Health Management (PHM) tasks through realistic interactions with domain-specific MCP servers.
Paper statement introducing PHMForge as a benchmark and describing its construction to evaluate LLM agents via MCP servers; benchmark implementation is presented in the manuscript.
high positive PHMForge: A Scenario-Driven Agentic Benchmark for Industrial... availability of a domain-specific benchmark for LLM agents
Improvements in operational resilience enhance firms' capacity for sustainable development.
Further analysis in the paper showing a positive relationship between OR improvements and indicators of firms' sustainable development capacity.
high positive Does Artificial Intelligence Improve the Operational Resilie... capacity for sustainable development
The enabling effect of AI on operational resilience is more pronounced for capital-intensive enterprises.
Heterogeneity/subsample analysis showing larger AI effects on OR for capital-intensive firms.
high positive Does Artificial Intelligence Improve the Operational Resilie... operational resilience (OR) — heterogeneous treatment effect by capital intensit...
The enabling effect of AI on operational resilience is more pronounced for technology-intensive enterprises.
Heterogeneity/subsample tests reported in the paper indicating stronger AI effects on OR for technology-intensive firms.
high positive Does Artificial Intelligence Improve the Operational Resilie... operational resilience (OR) — heterogeneous treatment effect by technology inten...
The enabling effect of AI on operational resilience is more pronounced for enterprises in the growth stage.
Heterogeneity/subsample analysis showing larger AI-induced OR gains among firms classified as in the growth stage.
high positive Does Artificial Intelligence Improve the Operational Resilie... operational resilience (OR) — heterogeneous treatment effect by firm life-cycle ...
The enabling effect of AI on operational resilience is more pronounced for enterprises located in the coastal eastern region.
Heterogeneity/subsample analysis reported in the paper showing larger AI effects for firms in the coastal eastern region compared to other regions.
high positive Does Artificial Intelligence Improve the Operational Resilie... operational resilience (OR) — heterogeneous treatment effect by region
AI promotes operational resilience by optimizing supply chain allocation performance.
Mechanism tests in the paper linking AI adoption to improved supply chain allocation/performance metrics, which are associated with higher OR.
high positive Does Artificial Intelligence Improve the Operational Resilie... supply chain allocation performance
Application of AI significantly enhances corporate operational resilience (OR).
Staggered DID estimation exploiting AIIAPZ policy as quasi-natural experiment on Chinese A-share listed manufacturing firms (2012–2023); main regression results reported as significant.
high positive Does Artificial Intelligence Improve the Operational Resilie... operational resilience (OR)
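A staggered DID design of the kind named above is commonly estimated with a two-way fixed effects regression. The form below is a generic sketch, not the paper's reported specification; the symbols, the treatment indicator, and the control vector are assumptions, since the excerpt does not give the estimating equation.

```latex
\[
OR_{it} = \alpha + \beta\,\mathrm{Treat}_{it} + \gamma^{\top} X_{it} + \mu_i + \lambda_t + \varepsilon_{it}
\]
% OR_{it}      : operational resilience of firm i in year t
% Treat_{it}   : 1 once firm i is covered by the AIIAPZ policy in year t, 0 otherwise
%                (coverage starts in different years for different firms, hence "staggered")
% X_{it}       : firm-level controls (assumed; not listed in the excerpt)
% \mu_i, \lambda_t : firm and year fixed effects
% \beta        : the DID estimate of the policy effect on OR
```

Under this reading, the claim that AI "significantly enhances" OR corresponds to a positive and statistically significant estimate of \beta.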
Voluntary safety commitments can sustain cooperative (higher-quality) outcomes when they are observable and credible.
Theoretical analysis of an equilibrium with voluntary, observable commitments: when commitments are binding/credible and observable, firms can coordinate to avoid preemption and achieve cooperative outcomes.
high positive Optimal Release Timing of AI Systems: A Strategic Analysis w... sustaining cooperative (higher-quality) release outcomes via voluntary safety co...
Minimum quality standards can implement the first-best outcome.
Theoretical policy analysis within the model: imposing a minimum quality threshold for release is shown to align private incentives with the social optimum, implementing the first-best.
high positive Optimal Release Timing of AI Systems: A Strategic Analysis w... achievement of the social optimum (first-best) via regulatory minimum quality st...
Employment reallocation exerted a narrowing influence on the gender wage gap, particularly in 2005–2010.
Dynamic shift-share decomposition attributing a portion of changes in the gender wage gap to employment reallocation effects, with a notable equalizing contribution in 2005–2010.
high positive Routine-Biased Technological Change and the Gender Wage Gap ... contribution of employment reallocation to change in the gender wage gap
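A shift-share decomposition splits the change in a group's average wage into a part due to workers moving between occupations and a part due to wage changes within occupations. The identity below is a textbook form given as a sketch; the paper's exact "dynamic" variant is not shown in the excerpt, and the notation is assumed.

```latex
\[
\Delta \bar{w}^{g}
= \underbrace{\sum_{j} \Delta s^{g}_{j}\, w^{g}_{j,0}}_{\text{employment reallocation}}
+ \underbrace{\sum_{j} s^{g}_{j,0}\, \Delta w^{g}_{j}}_{\text{within-occupation wage change}}
+ \underbrace{\sum_{j} \Delta s^{g}_{j}\, \Delta w^{g}_{j}}_{\text{interaction}}
\]
% g       : gender group (women or men)
% j       : occupation group
% s_{j,0} : base-period employment share of occupation j within group g
% w_{j,0} : base-period mean wage of group g in occupation j
% \Delta  : change between the base period and the end period
```

Applying the identity to men and women separately and differencing gives each term's contribution to the change in the gender wage gap; a narrowing influence from reallocation means the reallocation term moved in women's favor.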
Displaced women reallocated substantially toward non-routine interpersonal roles (occupational upgrading).
Observed occupational transition patterns in decomposition results showing female movement into non-routine interpersonal occupations; authors interpret this as occupational upgrading.
high positive Routine-Biased Technological Change and the Gender Wage Gap ... occupational reallocation toward non-routine interpersonal roles
Design implication: adaptive AI coaching systems should align support intensity with individual readiness, rather than assuming universal effectiveness.
Authors' design recommendation derived from experimental results showing heterogeneous effects by personality profile.
high positive Not My Truce: Personality Differences in AI-Mediated Workpla... appropriateness of intervention intensity (design recommendation)
The system is in production, serving 21 industry verticals with 650+ agents.
Deployment claim reported in paper (production system metrics: number of verticals and agents).
high positive Ontology-Constrained Neural Reasoning in Enterprise Agentic ... production deployment scale (industry verticals served, agent count)
We propose a framework for output-side ontological validation (response validation, reasoning verification, compliance checking).
Proposed framework described in paper (conceptual/procedural proposal; not described as empirically validated in abstract).
high positive Ontology-Constrained Neural Reasoning in Enterprise Agentic ... output-side ontological validation capability
We introduce ontology-constrained tool discovery via SQL-pushdown scoring.
Methodological/implementation contribution described in the paper (technical mechanism introduced).
high positive Ontology-Constrained Neural Reasoning in Enterprise Agentic ... tool discovery constrained by ontology using SQL-pushdown scoring
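"SQL pushdown" generally means computing a filter or score inside the database query instead of fetching every row and scoring in application code. The sketch below illustrates that idea with Python's built-in `sqlite3`. It is not the paper's implementation: the `tools` table, its columns, the sample rows, and the scoring weights are all hypothetical.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory catalogue of tools tagged with ontology terms (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE tools (name TEXT PRIMARY KEY, role TEXT, domain TEXT);
        INSERT INTO tools VALUES
            ('invoice_lookup', 'accountant', 'finance'),
            ('tax_rate_table', 'accountant', 'tax'),
            ('shipment_tracker', 'dispatcher', 'logistics');
        """
    )
    return conn


def discover_tools(conn: sqlite3.Connection, role: str, domain: str, limit: int = 5):
    """Rank tools for an agent's role and domain; scoring and filtering run inside SQL."""
    query = """
        SELECT name,
               (role = ?) * 2 + (domain = ?) AS score  -- assumed weights: role 2, domain 1
        FROM tools
        WHERE role = ? OR domain = ?                   -- drop tools with no ontology overlap
        ORDER BY score DESC, name
        LIMIT ?
    """
    return conn.execute(query, (role, domain, role, domain, limit)).fetchall()


conn = build_demo_db()
print(discover_tools(conn, "accountant", "finance"))
```

Only the top-ranked rows leave the database, which is the point of pushing the scoring down: the application never sees tools that fall outside the agent's role and domain.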
Improvements from ontology coupling are greatest where LLM parametric knowledge is weakest—particularly in Vietnam-localized domains.
Observed pattern reported from the controlled experiment across the five industries, with stronger improvements in Vietnam-localized domains (no per-industry sample sizes reported in abstract).
high positive Ontology-Constrained Neural Reasoning in Enterprise Agentic ... relative improvement magnitude by domain / localization
Ontology-coupled agents significantly outperform ungrounded agents on Role Consistency (p < .001, W = .614).
Controlled experiment with 600 runs; statistical test reported (p-value and W statistic provided in abstract).
Ontology-coupled agents significantly outperform ungrounded agents on Regulatory Compliance (p = .003, W = .318).
Controlled experiment with 600 runs; statistical test reported (p-value and W statistic provided in abstract).
Ontology-coupled agents significantly outperform ungrounded agents on Metric Accuracy (p < .001, W = .460).
Controlled experiment with 600 runs; statistical test reported (p-value and W statistic provided in abstract).
We formalize the concept of asymmetric neurosymbolic coupling, wherein symbolic ontological knowledge constrains agent inputs (context assembly, tool discovery, governance thresholds) while proposing mechanisms for extending this coupling to constrain agent outputs (response validation, reasoning verification, compliance checking).
Theoretical/formalization contribution described in the paper (conceptual and methodological development).
high positive Ontology-Constrained Neural Reasoning in Enterprise Agentic ... asymmetric neurosymbolic coupling formalization and proposed mechanisms
Our approach introduces a three-layer ontological framework--Role, Domain, and Interaction ontologies--that provides formal semantic grounding for LLM-based enterprise agents.
Design contribution described in the paper (formal model specification).
high positive Ontology-Constrained Neural Reasoning in Enterprise Agentic ... existence of a formal three-layer ontology for semantic grounding
We present a neurosymbolic architecture implemented within the Foundation AgenticOS (FAOS) platform that addresses these limitations through ontology-constrained neural reasoning.
System design and implementation claim: description of architecture and its implementation in the FAOS platform (technical/design evidence reported in paper).
high positive Ontology-Constrained Neural Reasoning in Enterprise Agentic ... ability to constrain LLM reasoning (reduce hallucination, domain drift, improve ...
The empirical results are robust across parallel trend analysis, placebo tests, propensity score matching (PSM), and alternative measures of sustainable performance.
Reported battery of robustness checks listed in the abstract (parallel trend, placebo, PSM, alternative outcome measures).
high positive The impact of R&D innovation strategy on the sustainable... robustness of estimated policy effect on sustainable development performance
The R&D deduction policy has stronger effects on larger-scale firms.
Heterogeneity analysis reported in the paper showing larger estimated effects for firms of larger scale.
high positive The impact of R&D innovation strategy on the sustainable... heterogeneous treatment effect on sustainable development performance (by firm s...