The Commonplace

Evidence (4560 claims)

Adoption: 5267 claims
Productivity: 4560 claims
Governance: 4137 claims
Human-AI Collaboration: 3103 claims
Labor Markets: 2506 claims
Innovation: 2354 claims
Org Design: 2340 claims
Skills & Training: 1945 claims
Inequality: 1322 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 378 106 59 455 1007
Governance & Regulation 379 176 116 58 739
Research Productivity 240 96 34 294 668
Organizational Efficiency 370 82 63 35 553
Technology Adoption Rate 296 118 66 29 513
Firm Productivity 277 34 68 10 394
AI Safety & Ethics 117 177 44 24 364
Output Quality 244 61 23 26 354
Market Structure 107 123 85 14 334
Decision Quality 168 74 37 19 301
Fiscal & Macroeconomic 75 52 32 21 187
Employment Level 70 32 74 8 186
Skill Acquisition 89 32 39 9 169
Firm Revenue 96 34 22 152
Innovation Output 106 12 21 11 151
Consumer Welfare 70 30 37 7 144
Regulatory Compliance 52 61 13 3 129
Inequality Measures 24 68 31 4 127
Task Allocation 75 11 29 6 121
Training Effectiveness 55 12 12 16 96
Error Rate 42 48 6 96
Worker Satisfaction 45 32 11 6 94
Task Completion Time 78 5 4 2 89
Wages & Compensation 46 13 19 5 83
Team Performance 44 9 15 7 76
Hiring & Recruitment 39 4 6 3 52
Automation Exposure 18 17 9 5 50
Job Displacement 5 31 12 48
Social Protection 21 10 6 2 39
Developer Productivity 29 3 3 1 36
Worker Turnover 10 12 3 25
Skill Obsolescence 3 19 2 24
Creative Output 15 5 3 1 24
Labor Share of Income 10 4 9 23
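
As a quick consistency check on the matrix above, the following minimal Python sketch computes each outcome's share of positive findings and the gap between the stated Total and the sum of the four direction columns. The three rows are copied from the table; the class and function names are illustrative, and the sketch makes no assumption about what a nonzero gap means, since the page does not say.

```python
from typing import NamedTuple


class MatrixRow(NamedTuple):
    outcome: str
    positive: int
    negative: int
    mixed: int
    null: int
    total: int


# Three complete rows copied from the Evidence Matrix above.
ROWS = [
    MatrixRow("Firm Productivity", 277, 34, 68, 10, 394),
    MatrixRow("Task Completion Time", 78, 5, 4, 2, 89),
    MatrixRow("Developer Productivity", 29, 3, 3, 1, 36),
]


def positive_share(row: MatrixRow) -> float:
    """Fraction of the row's stated total that is tagged Positive."""
    return row.positive / row.total


def unlisted_count(row: MatrixRow) -> int:
    """Stated total minus the sum of the four direction columns."""
    return row.total - (row.positive + row.negative + row.mixed + row.null)


if __name__ == "__main__":
    for row in ROWS:
        print(f"{row.outcome}: positive share {positive_share(row):.1%}, "
              f"unlisted {unlisted_count(row)}")
```

For these rows, the four direction columns of Firm Productivity sum to 389 against a stated total of 394, while the other two rows sum exactly to their totals.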
Claims filtered by topic: Productivity
The paper's primary approach is conceptual/theoretical development and agenda-setting; it does not report large-scale empirical or experimental data.
Explicit methods statement in the paper: synthesis, illustrative examples, framework development; absence of reported empirical sample or experiments.
high null result AI as a universal collaboration layer: Eliminating language ... presence/absence of empirical/experimental data in the paper
The study's empirical base consists of 40 semi-structured interviews with cross-industry project practitioners in the UK, analyzed using thematic qualitative methods.
Stated data and methods in the paper: sample size (40), interview method, cross-industry sampling, and thematic analysis.
high null result AI in project teams: how trust calibration reconfigures team... study sample and methodology (empirical basis)
Limitation: Implementation heterogeneity — the costs and feasibility of the recommended HR changes vary by context and may affect generalisability.
Explicit limitation acknowledged in the paper; drawn from theoretical reasoning about contextual heterogeneity and practitioner variability.
high null result Symbiarchic leadership: leading integrated human and AI cybe... implementation costs; feasibility; effect on generalisability
Limitation: The framework is conceptual and requires empirical validation across sectors, firm sizes and AI‑intensity levels.
Explicit limitation acknowledged by the authors; based on the paper's method (theoretical synthesis, no original data).
high null result Symbiarchic leadership: leading integrated human and AI cybe... generalizability and empirical validity across contexts
The paper generates empirically testable propositions (e.g., how leader practices affect AI adoption speed, task reallocation, productivity, error rates, employee well‑being and turnover) and suggests natural‑experiment settings for evaluation.
Stated methodological output of the conceptual synthesis; the paper lists candidate empirical tests and research opportunities but contains no original empirical tests.
high null result Symbiarchic leadership: leading integrated human and AI cybe... AI adoption speed; task reallocation; productivity; error rates; employee well‑b...
Typical methods used are deep learning for property prediction and representation learning, protein-structure modelling tools, generative models for de novo design, NLP for knowledge extraction, and ADME/Tox in silico models integrated with traditional computational chemistry.
Methodological survey in the paper listing these approaches and examples of their application.
high null result Has AI Reshaped Drug Discovery, or Is There Still a Long Way... methods deployed in AI-driven drug discovery workflows
Commonly used data types in AI-driven drug discovery include biochemical/binding assay data, protein structural data, HTS results, ADME/Tox and PK datasets, omics/phenotypic readouts, and scientific literature/patents.
Cataloguing of data sources used across studies and company pipelines described in the paper.
high null result Has AI Reshaped Drug Discovery, or Is There Still a Long Way... types of datasets employed in model training and discovery workflows
AI became widely adopted in pharmaceutical discovery during the 2010s, driven by greater compute, larger datasets, and advances in deep learning.
Historical overview and trend analysis in the paper referencing increased compute availability, growth in public and proprietary datasets, and the rise of deep-learning publications and tools over the 2010s.
high null result Has AI Reshaped Drug Discovery, or Is There Still a Long Way... timeline and adoption rate of AI methods in pharmaceutical discovery
The available evidence consists mainly of promising empirical studies and case studies, but there are few long-run, generalized ROI or productivity estimates; results are heterogeneous across therapeutic areas.
Self-described limitation of the narrative review: heterogeneity of study designs and outcomes precluded pooled quantitative estimates and long-run ROI assessment.
high null result From Algorithm to Medicine: AI in the Discovery and Developm... evidence quality (availability of long-run ROI/productivity estimates) and heter...
AI applications span the full drug development pipeline, including target discovery, in silico screening and de novo design, preclinical safety models, clinical trial design and patient selection/monitoring, and post-marketing surveillance.
Comprehensive literature synthesis across preclinical, clinical, and post-marketing sources in the narrative review summarizing documented uses across these stages.
high null result From Algorithm to Medicine: AI in the Discovery and Developm... coverage of pipeline stages by AI applications (scope)
Current evidence is illustrative rather than systematic; there is a lack of long-run, quantitative measures of AI’s effect on late-stage clinical outcomes in the literature reviewed.
Explicit methodological statement in the paper: study is an expert/opinion synthesis and narrative review with no new causal econometric estimates or primary experimental data.
high null result Learning from the successes and failures of early artificial... existence/availability of long-run quantitative measures linking AI adoption to ...
Suggested metrics for researchers and investors to monitor include R&D cycle time, cost per IND/NDA, proportion of projects using AI, success rates at development stages, market concentration measures, and investment flows into AI-enabled biotech vs incumbents.
Recommendations made in the Implications section as metrics to watch; no empirical tracking or baseline measures provided.
high null result AI as the Catalyst for a New Paradigm in Biomedical Research recommended monitoring metrics for AI impact in pharma/biotech
Limitations of the analysis include limited empirical validation of archetypes or impacts and potential selection bias toward prominent firms and technologies.
Explicit limitations stated in the Data & Methods section of the paper.
high null result AI as the Catalyst for a New Paradigm in Biomedical Research generalizability and representativeness of the paper's claims
The paper is an editorial/conceptual synthesis rather than a primary empirical study: it uses qualitative analysis and illustrative examples, and reports no new quantitative estimates.
Explicit statement in the Data & Methods section of the paper describing document type, approach, evidence base, and limitations.
high null result AI as the Catalyst for a New Paradigm in Biomedical Research empirical evidence provision (absence of new quantitative data)
Ethical oversight and governance (addressing bias, consent, downstream risks) are critical constraints that must be addressed for AI to generate sustained benefits.
Normative synthesis referencing common ethical concerns; no empirical evaluation of oversight mechanisms in the paper.
high null result AI as the Catalyst for a New Paradigm in Biomedical Research ethical acceptability and downstream risk mitigation
Transparency and auditability for model behavior, provenance, and decisions are essential for trustworthy deployment and regulatory acceptance.
Policy and governance synthesis drawing on regulatory dynamics; no empirical study of regulatory outcomes included.
high null result AI as the Catalyst for a New Paradigm in Biomedical Research trustworthiness/regulatory acceptability of models
Rigorous model validation and reproducibility across datasets and settings are necessary constraints for successful AI deployment.
Normative claim in the editorial based on reproducibility concerns in ML and biomedical research; no reported validation trials within the paper.
high null result AI as the Catalyst for a New Paradigm in Biomedical Research reliability and generalizability of AI models across settings
Recommendation (research): Future research should link AI adoption to objective performance metrics (profitability, default rates, processing times) and use longitudinal or quasi-experimental designs to identify causal effects.
Authors' suggested research directions noted in the summary, motivated by limitations of cross-sectional, self-reported data.
high null result From Data to Decisions: Harnessing Artificial Intelligence f... research design and outcome measurement (recommendation)
The summary omits important reporting details: p-values, standard errors, model control variables, and exact variable operationalizations are not provided.
Explicit reporting gap noted in the paper summary (absence of p-values, SEs, controls, and operationalization details).
high null result From Data to Decisions: Harnessing Artificial Intelligence f... statistical reporting completeness
Because the data are cross-sectional and self-reported, the design limits causal inference about AI adoption causing the observed outcomes.
Study design (cross-sectional survey, self-reported measures) and explicit limitation noted in the paper summary.
high null result From Data to Decisions: Harnessing Artificial Intelligence f... ability to infer causality
Key measures are self-reported Likert scales for AI adoption/usage and the dependent outcomes (financial decision-making efficiency, operational efficiency, financial resilience, and AI-based analytics effectiveness).
Measurement description in Methods: independent and dependent variables reported as self-reported Likert measures collected in the cross-sectional survey.
high null result From Data to Decisions: Harnessing Artificial Intelligence f... measurement type (self-reported Likert scales)
The study is a cross-sectional quantitative survey of 312 professionals in banks, fintechs, and financial service firms.
Study design and sample description reported in Data & Methods; sample size explicitly given as N = 312 and composition described as professionals across financial institutions, fintech organizations, and financial service companies.
The SKILL.md used in the with-skill condition encodes workflow logic, API patterns, and business rules as portable domain guidance for agents.
Paper description of the with-skill intervention specifying the content and intended role of SKILL.md.
high null result SKILLS: Structured Knowledge Injection for LLM-Driven Teleco... presence and content type of injected domain guidance (workflow logic, API patte...
We evaluated open-weight models under two conditions: baseline (generic agent with tool access but no domain guidance) and with-skill (agent augmented with a portable SKILL.md document encoding workflow logic, API patterns, and business rules).
Experimental design in paper describing the two agent conditions; SKILL.md described as the injected domain guidance artifact.
high null result SKILLS: Structured Knowledge Injection for LLM-Driven Teleco... experimental condition (baseline vs with-skill)
Each scenario is grounded in live mock API servers with seeded production-representative data, MCP tool interfaces, and deterministic evaluation rubrics combining response content checks, tool-call verification, and database state assertions.
Methods/benchmark design described in paper specifying environment: live mock APIs, seeded data, MCP tool interfaces, and deterministic evaluation combining content checks, tool-call verification, and DB assertions.
high null result SKILLS: Structured Knowledge Injection for LLM-Driven Teleco... evaluation environment fidelity and evaluation criteria (content checks, tool-ca...
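
A deterministic rubric of the kind described above (content checks, tool-call verification, and database state assertions) might be sketched as follows. This is an illustration under stated assumptions, not the benchmark's actual code: `AgentRun`, `Rubric`, `evaluate`, and every field name are hypothetical, and the real SKILLS rubrics may weight or combine checks differently.

```python
from dataclasses import dataclass, field


@dataclass
class AgentRun:
    """Hypothetical record of one agent run on one scenario."""
    response_text: str
    tool_calls: list[str]        # names of the tools the agent called
    db_state: dict[str, str]     # final key/value state of the mock database


@dataclass
class Rubric:
    """Hypothetical rubric: every listed check must pass."""
    required_phrases: list[str] = field(default_factory=list)
    required_tool_calls: list[str] = field(default_factory=list)
    expected_db_state: dict[str, str] = field(default_factory=dict)


def evaluate(run: AgentRun, rubric: Rubric) -> dict[str, bool]:
    """Apply the three check families and report each result plus the verdict."""
    text = run.response_text.lower()
    content_ok = all(p.lower() in text for p in rubric.required_phrases)
    tools_ok = all(t in run.tool_calls for t in rubric.required_tool_calls)
    db_ok = all(run.db_state.get(k) == v
                for k, v in rubric.expected_db_state.items())
    return {
        "content": content_ok,
        "tool_calls": tools_ok,
        "db_state": db_ok,
        "passed": content_ok and tools_ok and db_ok,
    }
```

Because each check is a pure function of the recorded run, the same run always receives the same verdict, which is what makes paired comparisons between the baseline and with-skill conditions meaningful.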
SKILLS comprises 37 telecom operations scenarios spanning 8 TM Forum Open API domains (TMF620, TMF621, TMF622, TMF628, TMF629, TMF637, TMF639, TMF724).
Framework specification in the paper; explicit statement of scenario count (37) and list of 8 TMF Open API domains.
high null result SKILLS: Structured Knowledge Injection for LLM-Driven Teleco... coverage: number of scenarios (37) and number of API domains (8) included
We introduce SKILLS (Structured Knowledge Injection for LLM-driven Service Lifecycle operations), a benchmark framework for telecom operations.
Paper describes the design and release of the SKILLS benchmark framework as the contribution; methods section outlines framework components and usage.
high null result SKILLS: Structured Knowledge Injection for LLM-Driven Teleco... existence and definition of the SKILLS benchmark framework
The paper identifies three core mechanisms underlying calibrated trust and complementarity: (1) calibrated trust balancing reliance and oversight, (2) complementarity–trust interaction for optimal performance, and (3) dynamic feedback loops producing reinforcing learning cycles.
Explicit identification of mechanisms claimed in the paper's synthesis; this is a descriptive claim about the paper's content rather than an empirical finding—no sample or empirical test reported in the abstract.
high null result Optimising Human–AI Decision Performance: A Trust and Cap... n/a (identification of theoretical mechanisms)
AI-adopting firms do not increase capital expenditures following adoption.
Firm-level capex analysis showing no significant change in capital expenditures for adopters versus nonadopters post-adoption in the paper's empirical framework.
high null result AI and Productivity: The Role of Innovation capital expenditures (capex)
SWE-Skills-Bench is available at https://github.com/GeniusHTX/SWE-Skills-Bench.
Repository URL provided in the paper for the benchmark's code/data.
high null result SWE-Skills-Bench: Do Agent Skills Actually Help in Real-Worl... public availability (URL) of the benchmark
SWE-Skills-Bench provides a testbed for evaluating the design, selection, and deployment of skills in software engineering agents.
Benchmark design pairs skills, repositories, and deterministic verification tests; intended use stated by authors as a testbed for evaluation of skills.
high null result SWE-Skills-Bench: Do Agent Skills Actually Help in Real-Worl... availability of a benchmarking testbed for evaluating agent skills
39 of 49 skills yield zero pass-rate improvement.
Empirical evaluation over 49 skills and ~565 task instances reporting that 39 skills produced no improvement in test pass rate when injected.
high null result SWE-Skills-Bench: Do Agent Skills Actually Help in Real-Worl... change in task acceptance-test pass rate (zero improvement)
The authors introduce a deterministic verification framework that maps each task's acceptance criteria to execution-based tests, enabling controlled paired evaluation with and without the skill.
Method: creation of a deterministic verification framework that converts acceptance criteria into executable tests; used to perform paired evaluations (with skill vs. without skill).
high null result SWE-Skills-Bench: Do Agent Skills Actually Help in Real-Worl... ability to deterministically verify task acceptance criteria via execution-based...
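
The paired evaluation described above can be illustrated with a short Python sketch that compares pass rates on the same tasks with and without a skill. The function names and the data layout are assumptions made for illustration; the benchmark's own harness is not shown on this page.

```python
def pass_rate(results: list[bool]) -> float:
    """Share of tasks whose acceptance tests passed."""
    return sum(results) / len(results)


def pass_rate_delta(paired: dict[str, tuple[bool, bool]]) -> float:
    """Pass-rate change from injecting a skill.

    `paired` maps a task id to (passed_without_skill, passed_with_skill),
    so both conditions are scored on exactly the same tasks.
    """
    without = [w for w, _ in paired.values()]
    with_skill = [s for _, s in paired.values()]
    return pass_rate(with_skill) - pass_rate(without)


def count_zero_improvement(by_skill: dict[str, dict[str, tuple[bool, bool]]]) -> int:
    """Number of skills whose pass-rate delta is exactly zero."""
    return sum(1 for paired in by_skill.values() if pass_rate_delta(paired) == 0)


if __name__ == "__main__":
    # Synthetic outcomes for two made-up skills, not data from the paper.
    example = {
        "skill-a": {"t1": (True, True), "t2": (False, False)},
        "skill-b": {"t1": (True, True), "t2": (False, True)},
    }
    for name, paired in example.items():
        print(name, pass_rate_delta(paired))
    print("zero-improvement skills:", count_zero_improvement(example))
```

Pairing on task id keeps task difficulty constant across conditions, so a zero delta reflects the skill rather than a change in the task mix.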
SWE-Skills-Bench pairs 49 public SWE skills with authentic GitHub repositories pinned at fixed commits and requirement documents with explicit acceptance criteria, yielding approximately 565 task instances across six SWE subdomains.
Benchmark construction: 49 public skills, repositories pinned to fixed commits, requirement documents with acceptance criteria, producing ~565 task instances spanning six SWE subdomains (as reported by the paper).
high null result SWE-Skills-Bench: Do Agent Skills Actually Help in Real-Worl... number of skill-repo-task instances (~565) and coverage across six subdomains
The article introduces a novel Bayesian Item Response Theory framework that quantifies human–AI synergy by separately estimating individual ability, collaborative ability, and AI model capability while controlling for task difficulty.
Methodological contribution described in the paper: development and application of a Bayesian Item Response Theory model that includes separate parameters for individual ability, collaborative ability, AI model capability, and task difficulty (method section of the paper).
high null result Quantifying and Optimizing Human-AI Synergy: Evidence-Based ... estimated parameters for individual ability, collaborative ability, AI model cap...
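
To make the separation of parameters concrete, here is a minimal sketch of a logistic item response function extended with additive terms for collaborative ability and AI capability. The additive form, the parameter names (`theta`, `kappa`, `beta`, `difficulty`), and the way the AI terms switch on are all assumptions for illustration; the page does not give the paper's model specification, and the Bayesian priors and inference step are omitted here.

```python
import math


def p_correct(theta: float, difficulty: float,
              kappa: float = 0.0, beta: float = 0.0,
              with_ai: bool = False) -> float:
    """Probability of a correct response under an assumed additive logit.

    theta      : individual ability (working alone)
    difficulty : task difficulty
    kappa      : collaborative ability, applied only when working with AI
    beta       : AI model capability, applied only when working with AI
    """
    logit = theta - difficulty
    if with_ai:
        logit += kappa + beta
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    solo = p_correct(theta=0.0, difficulty=0.5)
    assisted = p_correct(theta=0.0, difficulty=0.5, kappa=0.3, beta=1.0, with_ai=True)
    print(f"solo: {solo:.3f}, with AI: {assisted:.3f}")
```

Under this assumed form, two people with the same `theta` can differ in how much they gain from the same model, because `kappa` is estimated separately from individual ability.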
The Planner is trained via Supervised Fine-Tuning (SFT) to internalize diagnostic capabilities and then aligned with business outcomes (conversion rate) via Reinforcement Learning (RL).
Method description in the paper specifying SFT initialization followed by RL alignment targeting conversion rate (UCVR) as reward signal.
high null result Probe-then-Plan: Environment-Aware Planning for Industrial E... Planner diagnostic behavior and policy alignment with conversion rate (model tra...
EASP's Offline Data Synthesis stage: a Teacher Agent synthesizes diverse, execution-validated plans by diagnosing the probed environment.
Method description in the paper detailing the Teacher Agent's role in synthesizing execution-validated plans during offline data synthesis.
high null result Probe-then-Plan: Environment-Aware Planning for Industrial E... synthesized execution-validated search plans (data generation outcome)
The Probe-then-Plan mechanism uses a lightweight Retrieval Probe to expose the retrieval snapshot, enabling the Planner to diagnose execution gaps and generate grounded search plans.
Methodological description in the paper: design and implementation of Retrieval Probe and Planner; validated through synthesized data and downstream evaluations (offline and online).
high null result Probe-then-Plan: Environment-Aware Planning for Industrial E... retrieval snapshot exposure and Planner diagnostic output (implementation/functi...
A quantitative methodology was employed, utilizing a structured questionnaire administered to 400 small business owners.
Explicit methodological statement in the paper: structured questionnaire survey with sample size N=400 small business owners.
high null result The role of artificial intelligence in enhancing financial l... method / sample (use of structured questionnaire; sample size = 400)
An SCF-oriented organizational ethnography was conducted, with a structured protocol and triangulation of evidence.
Qualitative method disclosed in the abstract: organizational ethnography with a protocol and triangulation; the abstract does not give the number of organizations, the duration, or the sampling.
high null result A FRICÇÃO PSICOANTROPOLÓGICA (SCF - Symbolic-Cognitive Frict... qualitative evidence of the existence and manifestation of psychoanthropological friction...
A psychometric instrument (the SCF-30 scale) was constructed and validated, and a 0–100 index was computed, using Structural Equation Modeling (SEM) and reliability/validity tests.
Explicit methodological description in the abstract: construction and validation of the SCF-30 scale, use of SEM, and reliability and validity tests. The abstract does not detail statistics, sample, or numerical results.
high null result A FRICÇÃO PSICOANTROPOLÓGICA (SCF - Symbolic-Cognitive Frict... SCF score (0–100 index) and psychometric properties of the SCF-30 scale (reliability...
SCF is operationalized through three core vectors: Perception of Complexity (Percepção de Complexidade, PC), Institutional Risk Aversion (Aversão ao Risco Institucional, AR), and Cultural Inertia (Inércia Cultural, IC).
Conceptual and operational structure presented in the article; explicit specification of the three vectors as components of the SCF construct.
high null result A FRICÇÃO PSICOANTROPOLÓGICA (SCF - Symbolic-Cognitive Frict... constituent components of the SCF construct (PC, AR, IC)
This research conducts a critical analysis of the ethical implications of artificial intelligence in terms of job displacement during the fifth industrial revolution.
Author-declared methodology: a literature-based critical analysis drawing on novel studies and the existing body of literature; no further methodological details (e.g., inclusion criteria, databases searched) provided in the excerpt.
high null result A Study on Work-Life Balance of Women Employees in the IT Se... ethical implications of AI-related job displacement
This study uses panel data on agricultural firms listed on the Shanghai and Shenzhen A-share markets from 2007 to 2023 and applies a multidimensional fixed-effects model to estimate the impact of AI on firms’ total factor productivity (TFP).
Methodological statement in the paper: dataset = panel of listed agricultural firms (Shanghai and Shenzhen A-share markets), time period 2007–2023; empirical approach = multidimensional fixed-effects model.
high null result Artificial intelligence and the sustainable development of a... study design / estimation of AI impact on total factor productivity (TFP)
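
A fixed-effects estimate of this kind can be sketched with the two-way within transformation, which removes firm and year effects by demeaning. The sketch below assumes a balanced panel, a single regressor, and firm and year as the two fixed-effect dimensions; the paper's actual dimensions, controls, and variable definitions are not given on this page, and the data in the example are synthetic.

```python
def two_way_within_slope(x: list[list[float]], y: list[list[float]]) -> float:
    """OLS slope of y on x after removing firm and year fixed effects.

    x and y are balanced panels: one row per firm, one column per year.
    """
    n, t = len(x), len(x[0])

    def demean(panel: list[list[float]]) -> list[list[float]]:
        grand = sum(sum(row) for row in panel) / (n * t)
        firm_mean = [sum(row) / t for row in panel]
        year_mean = [sum(panel[i][j] for i in range(n)) / n for j in range(t)]
        return [[panel[i][j] - firm_mean[i] - year_mean[j] + grand
                 for j in range(t)] for i in range(n)]

    xd, yd = demean(x), demean(y)
    sxy = sum(xd[i][j] * yd[i][j] for i in range(n) for j in range(t))
    sxx = sum(xd[i][j] ** 2 for i in range(n) for j in range(t))
    return sxy / sxx


if __name__ == "__main__":
    # Synthetic panel: 3 firms, 4 years, true slope 2, no noise.
    ai_use = [[0, 1, 0, 2], [1, 0, 3, 1], [2, 2, 1, 0]]
    firm_effect = [5.0, -1.0, 2.0]
    year_effect = [0.0, 1.0, 2.0, 3.0]
    tfp = [[2 * ai_use[i][j] + firm_effect[i] + year_effect[j]
            for j in range(4)] for i in range(3)]
    print(two_way_within_slope(ai_use, tfp))
```

The demeaning step is exact only for balanced panels; with unbalanced data, a dummy-variable regression or an iterative demeaning routine would be needed instead.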
The paper explores risk frameworks, ethical constraints, and policy imperatives related to AI.
Descriptive claim about the paper's analytic content (thematic/policy analysis); no empirical details or measurement approach are given in the abstract.
high null result AI for Good: Societal Impact and Public Policy analysis of risk frameworks, ethical constraints, and policy imperatives
This paper investigates societal applications of AI across domains such as healthcare, education, accessibility, environmental management, emergency response, and civic administration.
Descriptive statement of the paper's scope and methods (literature review / cross-domain analysis implied); the abstract lists the domains but does not specify empirical procedures or sample sizes.
high null result AI for Good: Societal Impact and Public Policy coverage of AI applications in specified domains (healthcare, education, accessi...
Chatbot suggestions were artificially varied in aggregate accuracy across treatment conditions from low (53%) to high (100%).
Paper describes experimental manipulation of chatbot suggestion accuracy with aggregate accuracies ranging from 53% to 100%; manipulation method (how suggestions were generated or sampled) described in methods (not fully detailed in excerpt).
high null result LLMs in social services: How does chatbot accuracy affect hu... manipulated chatbot suggestion accuracy (range 53%–100%)
Caseworkers in the control condition (no chatbot suggestions) had a mean accuracy of 49%.
Reported experimental outcome: mean accuracy for control group = 49%; based on the randomized experiment using the 770-question benchmark.
high null result LLMs in social services: How does chatbot accuracy affect hu... caseworker accuracy (mean percent correct in control condition = 49%)
We conducted a randomized experiment with caseworkers recruited from nonprofit outreach organizations in Los Angeles.
Paper describes a randomized experiment recruiting caseworkers from nonprofit outreach organizations in Los Angeles; sample size and recruitment details not given in the excerpt.
high null result LLMs in social services: How does chatbot accuracy affect hu... execution of a randomized experiment with nonprofit caseworker participants (loc...
The benchmark questions have corresponding expert-verified answers.
Paper states benchmark questions have expert-verified answers; verification method and number/credentials of experts not specified in the excerpt.
high null result LLMs in social services: How does chatbot accuracy affect hu... availability of expert-verified reference answers for benchmark questions