The Commonplace

Evidence (2954 claims)

Adoption: 5126 claims
Productivity: 4409 claims
Governance: 4049 claims
Human-AI Collaboration: 2954 claims
Labor Markets: 2432 claims
Org Design: 2273 claims
Innovation: 2215 claims
Skills & Training: 1902 claims
Inequality: 1286 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 369 105 58 432 972
Governance & Regulation 365 171 113 54 713
Research Productivity 229 95 33 294 655
Organizational Efficiency 354 82 58 34 531
Technology Adoption Rate 277 115 63 27 486
Firm Productivity 273 33 68 10 389
AI Safety & Ethics 112 177 43 24 358
Output Quality 228 61 23 25 337
Market Structure 105 118 81 14 323
Decision Quality 154 68 33 17 275
Employment Level 68 32 74 8 184
Fiscal & Macroeconomic 74 52 32 21 183
Skill Acquisition 85 31 38 9 163
Firm Revenue 96 30 22 148
Innovation Output 100 11 20 11 143
Consumer Welfare 66 29 35 7 137
Regulatory Compliance 51 61 13 3 128
Inequality Measures 24 66 31 4 125
Task Allocation 64 6 28 6 104
Error Rate 42 47 6 95
Training Effectiveness 55 12 10 16 93
Worker Satisfaction 42 32 11 6 91
Task Completion Time 71 5 3 1 80
Wages & Compensation 38 13 19 4 74
Team Performance 41 8 15 7 72
Hiring & Recruitment 39 4 6 3 52
Automation Exposure 17 15 9 5 46
Job Displacement 5 28 12 45
Social Protection 18 8 6 1 33
Developer Productivity 25 1 2 1 29
Worker Turnover 10 12 3 25
Creative Output 15 5 3 1 24
Skill Obsolescence 3 18 2 23
Labor Share of Income 7 4 9 20
Filter: Human-AI Collaboration
Limitation: Implementation heterogeneity — the costs and feasibility of the recommended HR changes vary by context and may affect generalisability.
Explicit limitation acknowledged in the paper; drawn from theoretical reasoning about contextual heterogeneity and practitioner variability.
high null result Symbiarchic leadership: leading integrated human and AI cybe... implementation costs; feasibility; effect on generalisability
Limitation: The framework is conceptual and requires empirical validation across sectors, firm sizes and AI‑intensity levels.
Explicit limitation acknowledged by the authors; based on the paper's method (theoretical synthesis, no original data).
high null result Symbiarchic leadership: leading integrated human and AI cybe... generalizability and empirical validity across contexts
The paper generates empirically testable propositions (e.g., how leader practices affect AI adoption speed, task reallocation, productivity, error rates, employee well‑being and turnover) and suggests natural‑experiment settings for evaluation.
Stated methodological output of the conceptual synthesis; the paper lists candidate empirical tests and research opportunities but contains no original empirical tests.
high null result Symbiarchic leadership: leading integrated human and AI cybe... AI adoption speed; task reallocation; productivity; error rates; employee well‑b...
The paper is primarily discursive and invitational: it opens a dialogue and proposes a research agenda rather than providing definitive empirical answers.
Stated methodological stance and limits: conceptual/philosophical analysis, interdisciplinary literature synthesis, qualitative/illustrative examples, and explicit note of no systematic empirical evaluation.
high null result At the table with Wittgenstein: How language shapes taste an... presence/absence of new empirical datasets or systematic experimental validation...
The collection includes a mix of methodological papers, empirical applications demonstrating ecological insight, and translational work focused on policy or conservation practice.
Study-types categorization provided in the paper (descriptive tally/characterization of the kinds of contributions in the collection).
high null result Towards ‘digital ecology’: Advances in integrating artificia... types of studies present in the collection
Methods in the collection span from automated image and signal processing for routine tasks to integrated modelling that couples ecological theory with data‑driven methods.
Methods-scope summary in the paper describing the range of AI/ML approaches used across the collection (descriptive across studies).
high null result Towards ‘digital ecology’: Advances in integrating artificia... range of methodological approaches used
The collection uses large ecological observational datasets such as camera‑trap imagery, sensor streams, biodiversity surveys, and other high‑volume ecological monitoring data.
Data & methods section listing the data types represented across the reviewed papers (descriptive inventory of dataset types used in the collection).
high null result Towards ‘digital ecology’: Advances in integrating artificia... types of data used in ecological AI research
The SKILL.md used in the with-skill condition encodes workflow logic, API patterns, and business rules as portable domain guidance for agents.
Paper description of the with-skill intervention specifying the content and intended role of SKILL.md.
high null result SKILLS: Structured Knowledge Injection for LLM-Driven Teleco... presence and content type of injected domain guidance (workflow logic, API patte...
We evaluated open-weight models under two conditions: baseline (generic agent with tool access but no domain guidance) and with-skill (agent augmented with a portable SKILL.md document encoding workflow logic, API patterns, and business rules).
Experimental design in paper describing the two agent conditions; SKILL.md described as the injected domain guidance artifact.
high null result SKILLS: Structured Knowledge Injection for LLM-Driven Teleco... experimental condition (baseline vs with-skill)
Each scenario is grounded in live mock API servers with seeded production-representative data, MCP tool interfaces, and deterministic evaluation rubrics combining response content checks, tool-call verification, and database state assertions.
Methods/benchmark design described in paper specifying environment: live mock APIs, seeded data, MCP tool interfaces, and deterministic evaluation combining content checks, tool-call verification, and DB assertions.
high null result SKILLS: Structured Knowledge Injection for LLM-Driven Teleco... evaluation environment fidelity and evaluation criteria (content checks, tool-ca...
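The claim above describes a deterministic rubric that combines response content checks, tool-call verification, and database state assertions. A minimal sketch of how such a three-part check could be composed follows. The `Rubric` fields, the tool names, and the in-order subsequence rule for tool calls are assumptions made for illustration, not details taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Rubric:
    # All field names are hypothetical; the paper's rubric schema is not given.
    required_phrases: list[str] = field(default_factory=list)
    expected_tool_calls: list[str] = field(default_factory=list)
    expected_db_state: dict[str, object] = field(default_factory=dict)


def evaluate(response: str, tool_calls: list[str], db_state: dict,
             rubric: Rubric) -> dict[str, bool]:
    """Return the outcome of each check plus an overall pass flag."""
    # Content check: every required phrase appears (case-insensitive).
    text = response.lower()
    content_ok = all(p.lower() in text for p in rubric.required_phrases)

    # Tool-call check (assumed rule): expected calls occur in order,
    # possibly with other calls in between.
    remaining = iter(tool_calls)
    tools_ok = all(call in remaining for call in rubric.expected_tool_calls)

    # Database check: each expected key holds the expected value.
    db_ok = all(db_state.get(k) == v
                for k, v in rubric.expected_db_state.items())

    return {
        "content": content_ok,
        "tool_calls": tools_ok,
        "db_state": db_ok,
        "passed": content_ok and tools_ok and db_ok,
    }


rubric = Rubric(
    required_phrases=["ticket created"],
    expected_tool_calls=["get_customer", "create_ticket"],
    expected_db_state={"ticket_status": "open"},
)
result = evaluate(
    "Ticket created for the customer.",
    ["get_customer", "create_ticket"],
    {"ticket_status": "open"},
    rubric,
)
```

Because every check is a pure function of the recorded response, tool trace, and database snapshot, the same run always receives the same verdict, which is the property the claim calls deterministic.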
SKILLS comprises 37 telecom operations scenarios spanning 8 TM Forum Open API domains (TMF620, TMF621, TMF622, TMF628, TMF629, TMF637, TMF639, TMF724).
Framework specification in the paper; explicit statement of scenario count (37) and list of 8 TMF Open API domains.
high null result SKILLS: Structured Knowledge Injection for LLM-Driven Teleco... coverage: number of scenarios (37) and number of API domains (8) included
We introduce SKILLS (Structured Knowledge Injection for LLM-driven Service Lifecycle operations), a benchmark framework for telecom operations.
Paper describes the design and release of the SKILLS benchmark framework as the contribution; methods section outlines framework components and usage.
high null result SKILLS: Structured Knowledge Injection for LLM-Driven Teleco... existence and definition of the SKILLS benchmark framework
The paper identifies three core mechanisms underlying calibrated trust and complementarity: (1) calibrated trust balancing reliance and oversight, (2) complementarity–trust interaction for optimal performance, and (3) dynamic feedback loops producing reinforcing learning cycles.
Explicit identification of mechanisms claimed in the paper's synthesis; this is a descriptive claim about the paper's content rather than an empirical finding—no sample or empirical test reported in the abstract.
high null result Optimising Human–AI Decision Performance: A Trust and Cap... n/a (identification of theoretical mechanisms)
AI-adopting firms do not increase capital expenditures following adoption.
Firm-level capex analysis showing no significant change in capital expenditures for adopters versus nonadopters post-adoption in the paper's empirical framework.
high null result AI and Productivity: The Role of Innovation capital expenditures (capex)
It remains unclear how developers' general programming and security-specific experience, and the type of AI tool used (free vs. paid), affect the security of the resulting software — motivating this study.
Paper's stated research gap/motivation: the authors identify uncertainty in the literature regarding interactions between developer experience, AI tool tier (free vs. paid), and resulting code security.
high null result The Impact of AI-Assisted Development on Software Security: ... the combined effect of developer experience and AI tool type on code security (i...
Participants were assigned a security-related programming task using either no AI tools, the free version, or the paid version of Gemini.
Experimental design described in the paper: random/conditional assignment of participants into three groups (no AI, free Gemini, paid Gemini) performing the same security-related programming task.
high null result The Impact of AI-Assisted Development on Software Security: ... experimental condition (tool used) as it relates to subsequent code security out...
We conducted a quantitative programming study with software developers (n = 159) exploring the impact of Google's AI tool Gemini on code security.
Explicit methodological statement in the paper: a quantitative study with 159 participating software developers assigned to experimental conditions to evaluate Gemini's impact on security-related programming tasks.
high null result The Impact of AI-Assisted Development on Software Security: ... impact of Gemini on code security (security of code produced in the study)
The authors surveyed workers and developers on a representative sample of 171 tasks and used language models (LMs) to scale ratings to 10,131 computer-assisted tasks across all U.S. occupations.
Study methodology reported in the paper: surveys of 'workers and developers' on 171 tasks, plus LM-based scaling to 10,131 tasks (coverage claims across U.S. occupations).
high null result Are We Automating the Joy Out of Work? Designing AI to Augme... coverage and scaling of task-level ratings (number of tasks surveyed and number ...
SWE-Skills-Bench is available at https://github.com/GeniusHTX/SWE-Skills-Bench.
Repository URL provided in the paper for the benchmark's code/data.
high null result SWE-Skills-Bench: Do Agent Skills Actually Help in Real-Worl... public availability (URL) of the benchmark
SWE-Skills-Bench provides a testbed for evaluating the design, selection, and deployment of skills in software engineering agents.
Benchmark design pairs skills, repositories, and deterministic verification tests; intended use stated by authors as a testbed for evaluation of skills.
high null result SWE-Skills-Bench: Do Agent Skills Actually Help in Real-Worl... availability of a benchmarking testbed for evaluating agent skills
39 of 49 skills yield zero pass-rate improvement.
Empirical evaluation over 49 skills and ~565 task instances reporting that 39 skills produced no improvement in test pass rate when injected.
high null result SWE-Skills-Bench: Do Agent Skills Actually Help in Real-Worl... change in task acceptance-test pass rate (zero improvement)
The authors introduce a deterministic verification framework that maps each task's acceptance criteria to execution-based tests, enabling controlled paired evaluation with and without the skill.
Method: creation of a deterministic verification framework that converts acceptance criteria into executable tests; used to perform paired evaluations (with skill vs. without skill).
high null result SWE-Skills-Bench: Do Agent Skills Actually Help in Real-Worl... ability to deterministically verify task acceptance criteria via execution-based...
SWE-Skills-Bench pairs 49 public SWE skills with authentic GitHub repositories pinned at fixed commits and requirement documents with explicit acceptance criteria, yielding approximately 565 task instances across six SWE subdomains.
Benchmark construction: 49 public skills, repositories pinned to fixed commits, requirement documents with acceptance criteria, producing ~565 task instances spanning six SWE subdomains (as reported by the paper).
high null result SWE-Skills-Bench: Do Agent Skills Actually Help in Real-Worl... number of skill-repo-task instances (~565) and coverage across six subdomains
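The SWE-Skills-Bench claims above rest on a paired comparison: each task is run with and without the skill, and the change in pass rate is reported per skill. The sketch below shows one way such a tally could be computed. The tuple layout, the skill and task names, and the reading of "zero improvement" as a pass-rate delta of exactly zero are assumptions, not the benchmark's actual code.

```python
from collections import defaultdict


def pass_rate_deltas(results):
    """Map each skill to (pass rate with skill) - (pass rate without).

    `results` is an iterable of (skill, task_id, passed_without, passed_with)
    tuples; this record layout is hypothetical.
    """
    totals = defaultdict(lambda: [0, 0, 0])  # [n_tasks, passes_without, passes_with]
    for skill, _task_id, passed_without, passed_with in results:
        entry = totals[skill]
        entry[0] += 1
        entry[1] += int(passed_without)
        entry[2] += int(passed_with)
    return {skill: (with_ - without) / n
            for skill, (n, without, with_) in totals.items()}


def count_zero_delta(deltas):
    """Number of skills whose pass rate did not change at all."""
    return sum(1 for delta in deltas.values() if delta == 0)


# Toy data, not taken from the benchmark.
toy_results = [
    ("skill_a", "task_1", True, True),
    ("skill_a", "task_2", False, False),
    ("skill_b", "task_1", False, True),
    ("skill_b", "task_2", False, False),
]
deltas = pass_rate_deltas(toy_results)
```

On the toy data, `skill_a` has a delta of 0.0 and `skill_b` a delta of 0.5, so `count_zero_delta(deltas)` returns 1.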
The article introduces a novel Bayesian Item Response Theory framework that quantifies human–AI synergy by separately estimating individual ability, collaborative ability, and AI model capability while controlling for task difficulty.
Methodological contribution described in the paper: development and application of a Bayesian Item Response Theory model that includes separate parameters for individual ability, collaborative ability, AI model capability, and task difficulty (method section of the paper).
high null result Quantifying and Optimizing Human-AI Synergy: Evidence-Based ... estimated parameters for individual ability, collaborative ability, AI model cap...
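To make the parameter structure in the claim above concrete, here is a minimal sketch of a Rasch-style (one-parameter logistic) response function extended with separate terms for collaborative ability and AI capability. The additive form and all function and parameter names are assumptions for illustration. The excerpt does not state the paper's actual likelihood, and the Bayesian part (priors and posterior inference) is omitted entirely.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def p_correct_solo(individual_ability: float, task_difficulty: float) -> float:
    """Standard Rasch form: success depends on ability minus difficulty."""
    return sigmoid(individual_ability - task_difficulty)


def p_correct_with_ai(collaborative_ability: float, ai_capability: float,
                      task_difficulty: float) -> float:
    """Assumed additive extension for a human working with an AI model."""
    return sigmoid(collaborative_ability + ai_capability - task_difficulty)


# Hypothetical parameter values on the logit scale.
solo = p_correct_solo(individual_ability=0.0, task_difficulty=0.0)
paired = p_correct_with_ai(collaborative_ability=0.5, ai_capability=0.5,
                           task_difficulty=1.0)
```

Keeping individual and collaborative ability as separate parameters is what lets a model of this kind attribute a gain to the person's skill at working with AI rather than to the person's solo skill or to the model's capability.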
Descriptive statistics, reliability tests, regression analysis, and structural equation modelling (SEM) were employed to analyse the relationships between AI adoption and entrepreneurial outcomes.
Methods section reporting use of descriptive statistics, reliability tests, regression analysis, and SEM to evaluate relationships between AI adoption and measured outcomes.
high null result Entrepreneurship in the Era of Artificial Intelligence: Rede... not applicable (methodological detail)
The study used a quantitative research design and collected data from 350 entrepreneurs and managers of small and medium-sized enterprises (SMEs) who had adopted AI in their business operations.
Methods section of the paper specifying a quantitative design and a sample size of 350 AI-adopting SME entrepreneurs/managers.
high null result Entrepreneurship in the Era of Artificial Intelligence: Rede... not applicable (methodological detail)
An SCF-oriented organizational ethnography was conducted, with a protocol and triangulation of evidence.
Qualitative method disclosed in the abstract: organizational ethnography with a protocol and triangulation; the abstract does not give the number of organizations, duration, or sampling.
high null result A FRICÇÃO PSICOANTROPOLÓGICA (SCF - Symbolic-Cognitive Frict... qualitative evidence of the existence and manifestation of psychoanthropological friction...
A psychometric instrument (the SCF-30 scale) was constructed and validated and a 0–100 index was computed, with structural equation modelling (SEM) and reliability/validity tests.
Explicit methodological description in the abstract: construction and validation of the SCF-30 scale, use of SEM, and reliability and validity tests. The abstract does not detail statistics, sample, or numerical results.
high null result A FRICÇÃO PSICOANTROPOLÓGICA (SCF - Symbolic-Cognitive Frict... SCF score (0–100 index) and psychometric properties of the SCF-30 scale (conf...
The SCF is operationalized by three central vectors: Perceived Complexity (PC), Institutional Risk Aversion (AR), and Cultural Inertia (IC).
Conceptual and operational structure presented in the article; explicit specification of the three vectors as components of the SCF construct.
high null result A FRICÇÃO PSICOANTROPOLÓGICA (SCF - Symbolic-Cognitive Frict... constituent components of the SCF construct (PC, AR, IC)
Distinguishing between base models and fine-tuned systems is important for researchers using LLMs to study cultural patterns, because fine-tuning and alignment can change the behaviors relevant to behavioral research.
Analytical distinction and methodological guidance in the paper; claim grounded in conceptual reasoning about model development workflows rather than a specific experimental demonstration in the excerpt.
high null result The Third Ambition: Artificial Intelligence and the Science ... impact of model provenance (base vs fine-tuned) on suitability for behavioral/cu...
Contemporary artificial intelligence research has been organized around two dominant ambitions: productivity (treating AI systems as tools for accelerating work and economic output) and alignment (ensuring increasingly capable systems behave safely and in accordance with human values).
Literature synthesis and conceptual framing within the paper (review of prevailing research agendas and priorities in AI literature). No original empirical sample or experiment reported for this claim in the provided text.
high null result The Third Ambition: Artificial Intelligence and the Science ... categorization of dominant research ambitions in contemporary AI (productivity v...
The study contributes to the literature by integrating evidence across higher education, vocational training, and lifelong learning to emphasize the need for balanced policy approaches to skill formation.
Stated contribution in the paper: cross-pathway synthesis of existing empirical evidence and secondary data (methods described as comparative synthesis; no primary empirical contribution reported in the summary).
high null result Balancing Higher Education, Vocational Training, and Lifelon... scholarly contribution / integrative synthesis
The study uses secondary data and comparative evidence from prior empirical studies to analyze relationships between higher education, vocational education, and lifelong learning.
Stated methodology in the paper: analysis of secondary data and synthesis of prior empirical/comparative studies (no primary data collection; no sample sizes reported).
high null result Balancing Higher Education, Vocational Training, and Lifelon... methodological approach / data sources
This study analyzed survey data from 466 Chinese food delivery riders using structural equation modeling and bootstrapping procedures, modeling work pressure as a mediator and perceived autonomy as a moderator.
Statement in abstract describing sample size (466 Chinese food delivery riders) and analytic approach (SEM and bootstrapping) and modeled variables (work pressure mediator, perceived autonomy moderator).
high null result Not all algorithmic controls are equal: the double-edged imp... methodology / analysis approach
The proposed framework draws on leadership theory, emotional intelligence research and AI ethics.
Methodological/design statement in the paper describing its intellectual grounding; indicates literature-based synthesis rather than primary data collection.
high null result Deconstructing success: why being human still matters sources informing the framework (theoretical influences)
Chatbot suggestions were artificially varied in aggregate accuracy across treatment conditions from low (53%) to high (100%).
Paper describes experimental manipulation of chatbot suggestion accuracy with aggregate accuracies ranging from 53% to 100%; manipulation method (how suggestions were generated or sampled) described in methods (not fully detailed in excerpt).
high null result LLMs in social services: How does chatbot accuracy affect hu... manipulated chatbot suggestion accuracy (range 53%–100%)
Caseworkers in the control condition (no chatbot suggestions) had a mean accuracy of 49%.
Reported experimental outcome: mean accuracy for control group = 49%; based on the randomized experiment using the 770-question benchmark.
high null result LLMs in social services: How does chatbot accuracy affect hu... caseworker accuracy (mean percent correct in control condition = 49%)
We conducted a randomized experiment with caseworkers recruited from nonprofit outreach organizations in Los Angeles.
Paper describes a randomized experiment recruiting caseworkers from nonprofit outreach organizations in Los Angeles; sample size and recruitment details not given in the excerpt.
high null result LLMs in social services: How does chatbot accuracy affect hu... execution of a randomized experiment with nonprofit caseworker participants (loc...
The benchmark questions have corresponding expert-verified answers.
Paper states benchmark questions have expert-verified answers; verification method and number/credentials of experts not specified in the excerpt.
high null result LLMs in social services: How does chatbot accuracy affect hu... availability of expert-verified reference answers for benchmark questions
We created a 770-question multiple-choice benchmark dataset of difficult but realistic questions that a caseworker might receive.
Paper reports creation of a benchmark dataset containing 770 multiple-choice questions described as difficult and realistic; questions and dataset construction described in methods (no sample-of-questions or external validation details provided in the excerpt).
high null result LLMs in social services: How does chatbot accuracy affect hu... benchmark dataset size and content (770 multiple-choice questions)
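The claims in this group describe chatbot suggestions whose aggregate accuracy was varied from 53% to 100% over a multiple-choice benchmark. The excerpt does not say how the suggestions were generated, so the sketch below shows only one plausible way to hit a target aggregate accuracy: pick a fixed number of questions to answer correctly and give a wrong option on the rest. The function name, the data layout, and the seeding are assumptions.

```python
import random


def make_suggestions(answer_key, choices, target_accuracy, seed=0):
    """Build one suggestion per question with a fixed share of correct ones.

    `answer_key` maps question id -> correct choice; `choices` lists all
    options. Both structures are hypothetical.
    """
    rng = random.Random(seed)  # seeded so the condition is reproducible
    question_ids = list(answer_key)
    n_correct = round(target_accuracy * len(question_ids))
    correct_ids = set(rng.sample(question_ids, n_correct))

    suggestions = {}
    for qid, truth in answer_key.items():
        if qid in correct_ids:
            suggestions[qid] = truth
        else:
            # Suggest a uniformly chosen wrong option.
            suggestions[qid] = rng.choice([c for c in choices if c != truth])
    return suggestions


# Toy benchmark of 100 questions; the real benchmark has 770.
toy_key = {qid: "A" for qid in range(100)}
low = make_suggestions(toy_key, ["A", "B", "C", "D"], target_accuracy=0.53)
n_right = sum(low[qid] == toy_key[qid] for qid in toy_key)
```

Fixing the number of correct suggestions, rather than flipping an independent coin per question, makes the aggregate accuracy of a condition exact instead of approximate.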
Extensive experiments were conducted using both synthetic and real hospital datasets to evaluate the framework.
Statement in the paper indicating experiments on synthetic and real datasets; exact sizes, sources, and composition of these datasets are not provided in the excerpt.
high null result Enhancing hospital workforce planning, scheduling, and perfo... breadth of experimental evaluation (use of synthetic and real datasets)
The machine-learning based analytical approach used in the study captures complex, nonlinear relationships among emotional, psychological and economic variables.
Methodological claim: authors used machine learning (including ensembles) to model nonlinear and complex relationships. The excerpt does not provide algorithmic details, tuning, validation strategy, or sample size.
high null result Emotional Intelligence as Human Capital: A Behavioral Econom... relationships among emotional, psychological, and economic variables (nonlinear ...
Work environment and digital/AI intensity were incorporated as contextual moderators in the analysis to reflect contemporary labor market conditions.
Methodological description in the excerpt states these variables were included as moderators; no details on measurement, operationalization, or sample size are provided.
high null result Emotional Intelligence as Human Capital: A Behavioral Econom... moderation by work environment and digital/AI intensity (contextual moderation)
Coordination is treated as a structural property of the coupled dynamics (agents + incentives + persistent environment) rather than as the solution to a centralized global optimization objective or purely agent-centric learning problem.
Conceptual framing supported by the formal dynamical model and theorems showing properties of the closed-loop dynamics that do not rely on an underlying global objective.
high null result How Intelligence Emerges: A Minimal Theory of Dynamic Adapti... conceptual characterization of 'coordination' as a structural dynamical property
The persistent environment component of the model stores accumulated coordination signals, and a distributed incentive field transmits those signals locally to adaptive agents, which update their states in response.
Model construction and definitions in the paper describing (i) an environmental state variable with persistent dynamics that accumulates signals, (ii) a spatially/distributed incentive field mapping environmental memory to local agent inputs, and (iii) adaptive update rules for agents.
high null result How Intelligence Emerges: A Minimal Theory of Dynamic Adapti... model components: environmental memory, incentive field, and agent update mappin...
The paper formalizes agents, incentives, and the environment as a recursively closed feedback architecture (i.e., a coupled dynamical system in which agents adapt to incentive signals that themselves depend on a persistent environmental memory produced by agent actions).
Mathematical model and definitions presented in the paper (formal system specification of agent states, incentive field, and persistent environment; no empirical data).
high null result How Intelligence Emerges: A Minimal Theory of Dynamic Adapti... existence and specification of a recursively closed feedback architecture (model...
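The three claims above describe a closed loop: agents act, a persistent environment accumulates their signals, and a distributed incentive field feeds that memory back to agents locally. The toy discrete-time simulation below illustrates that loop only. The ring topology, the decay factor, the `tanh` squashing, and the update rule are all assumptions chosen for brevity and are not the paper's formal model.

```python
import math


def step(agent_states, env_memory, decay=0.9, learning_rate=0.5):
    """Advance the coupled system by one step; returns (states, memory)."""
    n = len(agent_states)

    # 1. Persistent environment: old memory decays, new agent signals add up.
    new_memory = [decay * m + s for m, s in zip(env_memory, agent_states)]

    # 2. Incentive field: each agent reads the mean memory of itself and its
    #    two ring neighbours, squashed by tanh to stay bounded.
    incentives = [
        math.tanh((new_memory[i - 1] + new_memory[i] + new_memory[(i + 1) % n]) / 3)
        for i in range(n)
    ]

    # 3. Adaptive agents: each state moves part of the way toward its incentive.
    new_states = [s + learning_rate * (inc - s)
                  for s, inc in zip(agent_states, incentives)]
    return new_states, new_memory


# One active agent on a ring of four, with empty environmental memory.
states, memory = step([1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0])
```

After one step, the neighbours of the active agent have moved while the agent opposite it has not. The influence travels only through the environment and the local incentive field, with no global objective anywhere in the loop.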
The study used a mixed-methods design incorporating surveys from 150 LEP immigrants, interviews with 50 employers, and interviews with 20 translation service providers in various linguistically diverse U.S. cities, with quantitative analysis performed in SPSS Version 28 and qualitative thematic coding in NVivo 14.
Reported study design and sample: survey n=150 LEP immigrants; employer interviews n=50; translation provider interviews n=20; analytic software specified as SPSS v28 (quantitative) and NVivo 14 (qualitative).
high null result Translation Models Empowering Immigrant Workforce Integratio... study design / data collection (sample composition and analytic methods)
In a field experiment on the DiagnosUs medical crowdsourcing platform, the authors held the true prevalence in the unlabeled stream fixed at 20% (blasts) while varying the prevalence of positives in the gold-standard feedback stream (20% vs. 50%) and the response interface (binary labels vs. elicited probabilities).
Field experiment conducted on the DiagnosUs platform with experimental manipulations: (i) true prevalence in unlabeled stream fixed at 20% blasts, (ii) feedback-stream prevalence manipulated to 20% vs 50%, (iii) response interface manipulated between binary labels and elicited probabilities. (Sample size and number of workers not specified in the provided excerpt.)
high null result Managing Cognitive Bias in Human Labeling Operations for Rar... experimental manipulations (true prevalence, feedback prevalence, response inter...
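The design in the claim above holds true prevalence fixed and varies two factors, feedback-stream prevalence and response interface. Assuming the two factors were fully crossed, which the excerpt does not state outright, the conditions can be enumerated as follows. The dictionary keys and interface labels are hypothetical names.

```python
from itertools import product

TRUE_PREVALENCE = 0.20                  # fixed in the unlabeled stream
FEEDBACK_PREVALENCES = (0.20, 0.50)     # share of positives in gold-standard feedback
INTERFACES = ("binary_label", "elicited_probability")

# Assumed full crossing of the two manipulated factors.
conditions = [
    {
        "true_prevalence": TRUE_PREVALENCE,
        "feedback_prevalence": feedback_prevalence,
        "interface": interface,
    }
    for feedback_prevalence, interface in product(FEEDBACK_PREVALENCES, INTERFACES)
]
```

Only the 50% feedback conditions create a mismatch between what workers see in feedback and what they face in the unlabeled stream, so comparing them with the 20% conditions isolates the effect of that mismatch under each interface.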
The study is limited by the scope of available industry data and the generalisability of case study findings.
Explicit limitation reported in the paper summary stating constraints related to industry data availability and generalisability of case studies.
high null result Artificial intelligence and organisational transformation: t... generalizability / external validity
The research adopts a mixed-method approach, combining theoretical analysis with empirical insights, and uses data gathered from the 'AI-driven transformation' Scopus database.
Explicit methodological statement in the paper summary: mixed-method design and Scopus database as the data source. (No further methodological details or sample counts provided in the summary.)
high null result Artificial intelligence and organisational transformation: t... N/A (methodological description)