The Commonplace

Evidence (14055 claims)

Adoption: 8570 claims
Productivity: 7631 claims
Governance: 6869 claims
Human-AI Collaboration: 6491 claims
Org Design: 4175 claims
Innovation: 4114 claims
Labor Markets: 3566 claims
Skills & Training: 2966 claims
Inequality: 2066 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 758 199 100 900 2007
Governance & Regulation 826 400 191 122 1563
Organizational Efficiency 777 193 124 84 1189
Technology Adoption Rate 635 233 124 97 1098
Research Productivity 422 128 57 336 954
Output Quality 476 179 59 47 761
Decision Quality 328 177 81 47 640
Firm Productivity 435 57 88 20 606
AI Safety & Ethics 218 277 65 33 599
Market Structure 180 170 123 24 502
Task Allocation 213 64 72 33 387
Skill Acquisition 170 61 61 17 309
Innovation Output 203 27 43 18 292
Employment Level 105 54 107 13 281
Fiscal & Macroeconomic 131 69 43 26 276
Consumer Welfare 117 63 42 11 233
Firm Revenue 153 48 26 3 230
Task Completion Time 173 31 8 12 225
Inequality Measures 44 122 49 6 221
Worker Satisfaction 89 65 22 12 188
Error Rate 69 92 10 2 173
Regulatory Compliance 77 69 14 5 165
Automation Exposure 56 56 26 13 154
Training Effectiveness 94 21 13 19 149
Wages & Compensation 77 36 25 6 144
Team Performance 86 17 27 10 141
Developer Productivity 95 17 14 6 133
Job Displacement 12 80 20 1 113
Hiring & Recruitment 52 7 8 3 70
Creative Output 31 18 8 3 61
Skill Obsolescence 5 46 6 1 58
Social Protection 27 16 8 2 53
Labor Share of Income 17 19 17 53
Worker Turnover 11 12 3 26
Industry 1 1
The study coded 500 adaptation events.
Explicit statement: 'and 500 coded adaptation events.'
high null result Research on the adaptation path of corporate strategy based ... adaptation_event_count
The qualitative dataset included 48 executive and technical informants.
Explicit statement: 'including 48 executive and technical informants'.
The study uses a comparative multi-case dataset of 12 multinational firms (4 tri-jurisdictional, 4 Atlantic, 4 China-primary).
Explicit dataset description in the paper: 'A comparative multi-case dataset of 12 multinational firms (4 tri-jurisdictional, 4 Atlantic, 4 China-primary) was analyzed.'
We employ the Gemini API to generate reward function logic and weights across three refinement rounds rather than performing per-step inference.
Methodological description in abstract: use of Gemini API to generate reward logic and weights; three rounds of refinement.
high null result OccuReward: LLM-Guided Occupant-Centric Reward Shaping for D... LLM-mediated reward generation method
We deploy a Soft Actor-Critic (SAC) agent in CityLearn v2 for experiments.
Methodological description in abstract: SAC agent used within CityLearn v2 environment.
high null result OccuReward: LLM-Guided Occupant-Centric Reward Shaping for D... DRL agent deployment (method)
We use four empirically grounded occupant profiles from the ASHRAE Global Thermal Comfort Database II (13,440 votes).
Dataset citation and sample size reported in abstract: ASHRAE Global Thermal Comfort Database II with 13,440 votes; four occupant profiles derived from it.
high null result OccuReward: LLM-Guided Occupant-Centric Reward Shaping for D... occupant profile representation (number of votes in dataset)
We ran 24 matches pairing 23 expert humans with 16 AI agents, capturing 387 delegation and 1440 adoption decisions.
Author-reported experimental setup and counts from the study (24 matches; 23 human experts; 16 AI agents; counts of delegation and adoption decisions).
high null result AI, Take the Wheel: What Drives Delegation and Trust in Huma... delegation and adoption decisions
The model introduces the 'Sciencepreneur' as the central human archetype in agentic R&D.
Conceptual/design claim within the HARMONY artifact presented in the paper.
high null result From Replacement to Orchestration: A Socio-Technical Archite... role definition and skill profile for human operators in agentic R&D
Evidence also includes pattern matching with documented agentic R&D deployments.
Methodological statement in the paper claiming pattern matching with documented agentic R&D deployments (unspecified number/source).
high null result From Replacement to Orchestration: A Socio-Technical Archite... similarity between proposed design and existing agentic R&D deployments
The study includes a foresight scenario analysis projecting four plausible 2040 R&D futures to stress-test design choices.
Methodological statement in the paper describing a four-scenario foresight analysis.
high null result From Replacement to Orchestration: A Socio-Technical Archite... plausibility and robustness of design across future scenarios
Empirical evidence for the design is triangulated from four semi-structured expert interviews with senior R&D leaders across industrial, healthcare, and academic settings.
Methodological statement in the paper specifying four semi-structured expert interviews.
high null result From Replacement to Orchestration: A Socio-Technical Archite... qualitative expert insights informing design
This discrimination was invisible to standard action-log audits: bias operated entirely through who received each action, not what actions were chosen, with action-type distributions showing no increase in negative actions across conditions.
Comparison of action-recipient patterns vs action-type distributions across the experimental conditions in the simulation; reported observation that action-type distributions did not show increased negative actions and that audits of action logs (action types) failed to reveal the bias.
high null result Human-like in-group bias in instruction-tuned language model... action-type distribution (no increase in negative actions) and detectability of ...
Because all observations come from a single practitioner, the inferential statistics are exploratory and hypothesis-generating rather than confirmatory; portability across the full portfolio awaits multi-practitioner replication.
Explicit limitation stated in the paper about the single-practitioner design and its implications for inference.
high null result Augment Engineering: A Methodology for Multi-Tool AI Orchest... generalizability/replicability of the findings
The framework is illustrated with an accounts-payable simulation and a companion spreadsheet.
Empirical illustration: the paper includes (or is accompanied by) an accounts-payable simulation and a spreadsheet that demonstrate the model and estimation approach.
high null result Modeling Agentic Technical Debt and Stochastic Tax: A Standa... practical illustration of framework through accounts-payable simulation and spre...
The note starts from a compact dashboard expression, expands it into a fuller structural model, defines all variables and parameters, and shows how each cost category can be estimated from operational data.
Methodological description in the paper: construction of dashboard, expansion to structural model, full variable/parameter definitions, and stated procedures for estimating cost categories from operational data; accompanied by worked examples.
high null result Modeling Agentic Technical Debt and Stochastic Tax: A Standa... methodological capacity to estimate agentic costs from operational data
Agentic Technical Debt is a stock of accumulated design and governance liability.
Definition provided in the paper as part of the conceptual framework that labels Agentic Technical Debt as a stock (accumulated) liability tied to design and governance.
high null result Modeling Agentic Technical Debt and Stochastic Tax: A Standa... conceptual characterization of Agentic Technical Debt (stock of design and gover...
This note develops a formal and managerially usable model that distinguishes Agentic Technical Debt from Stochastic Tax.
Author states development of a formal, managerially usable model and explicit distinction between the two constructs; supported by model construction in the paper (structural model and dashboard).
high null result Modeling Agentic Technical Debt and Stochastic Tax: A Standa... ability to distinguish Agentic Technical Debt from Stochastic Tax via a formal m...
Agentic AI systems combine probabilistic reasoning with delegated action through tools, context, memory, orchestration, and external workflow integration.
Conceptual/definitional statement in the paper; presented as the working characterization of 'Agentic AI systems' within the model specification.
high null result Modeling Agentic Technical Debt and Stochastic Tax: A Standa... structural composition of agentic AI systems (probabilistic reasoning + delegate...
We evaluate SIA across three contrasting domains: Chinese legal charge classification (LawBench), low-level GPU kernel optimisation, and single-cell RNA denoising.
Experimental design described in the paper (three benchmark domains used for evaluation).
high null result SIA: Self Improving AI with Harness & Weight Updates domains/tasks used for evaluation
We propose SIA, a self-improving loop in which a language-model agent (the Feedback-Agent) updates both the harness and the weights of a task-specific agent.
Methodological contribution described in the paper (proposal of a new combined approach; implementation details presumably in methods).
high null result SIA: Self Improving AI with Harness & Weight Updates capability of an agent to update both harness and weights
These two silos (harness-update and test-time training) operate in isolation.
Authors' characterization of the research landscape presented in the paper (conceptual claim/literature observation).
high null result SIA: Self Improving AI with Harness & Weight Updates degree of integration between research lines
Two largely disjoint research lines attack this bottleneck: the harness-update school (a meta-agent rewrites the scaffold while model weights are fixed) and the test-time training school (hand-written RL pipelines update model weights while the harness is fixed).
Paper's literature/positioning claim classifying prior work into two categories (conceptual/literature summary).
high null result SIA: Self Improving AI with Harness & Weight Updates classification of prior research approaches
(i, continued) The counterfactual toll is explicitly non-unique: the paper demonstrates that more than one valid toll can exist.
Mathematical argument in the paper identifying conditions or constructions that lead to multiple valid tolls (formal counterexample or theorem on non-uniqueness).
high null result Foundations of a Time-Consistent Counterfactual Actuarial Ru... non-uniqueness property of the counterfactual toll
Seventeen operators completed continuous search tasks under high cognitive workload while their spatial covariance was mapped using a 2D Adaptive Riemannian Oracle.
Methodological description in the paper: 17 human operators performed continuous search tasks in a Virtual Reality drone task; spatial covariance recorded using a 2D Adaptive Riemannian Oracle.
high null result The Timing Dependencies of Trust: Speed, Accuracy, and cBCI ... experiment sample and measurement modality (operators; spatial covariance mappin...
The paper proposes a policy framework consisting of six groups of solutions for Vietnam to both promote AI development and control risks in the digital age.
Declared in abstract: the paper presents a six-group policy framework for Vietnam; the framework itself is the paper's output (proposal), not empirically tested in the paper.
high null result Regulatory Policy for the Agent Economy in the Digital Age: ... existence of a six-group policy framework aimed at promoting AI development and ...
This study employs document synthesis and comparative analysis of international policies.
Methodological statement in the paper abstract describing the research approach; no sample size specified beyond document sources.
high null result Regulatory Policy for the Agent Economy in the Digital Age: ... research method used (document synthesis and comparative policy analysis)
The rise of artificial intelligence (AI) is shaping a new Agent Economy (AE), in which autonomous AI agents represent humans in performing a wide range of complex tasks.
Statement in paper abstract/intro (conceptual definition); no empirical data or sample reported.
high null result Regulatory Policy for the Agent Economy in the Digital Age: ... existence/definition of Agent Economy (autonomous AI agents representing humans ...
Outputs are graded by a fact-anchored chain of rubrics, averaging 35.6 binary criteria per task.
Benchmark grading methodology reported by the authors, with a reported average of 35.6 binary criteria per task (presumably calculated across the benchmark tasks).
high null result JobBench: Aligning Agent Work With Human Will granularity of evaluation (number of binary rubric criteria per task)
JobBench covers 130 agentic tasks across 35 occupations.
Dataset/benchmark composition reported by the authors (explicit counts provided in the paper).
high null result JobBench: Aligning Agent Work With Human Will scope/coverage of the benchmark (number of tasks and occupations)
The study contributes a taxonomy of AI workforce impact, a Workforce Resilience Readiness Score (WRRS), an AI Workforce Trust Index (AWTI), an Ethical Automation Boundary concept, and a pilot empirical validation design.
Declared methodological and conceptual contributions in the paper (these are presented as deliverables of the study; no validated results reported in the excerpt).
high null result From Automation Panic to Workforce Resilience: A Governance ... new measurement/conceptual tools (taxonomy, WRRS, AWTI, Ethical Automation Bound...
The International Labour Organization's 2025 update highlights the need to assess exposure to generative AI at the task level using task data, expert input, and AI model predictions.
Reference to ILO 2025 update recommendation described in the paper (policy/technical guidance rather than primary empirical data in the excerpt).
high null result From Automation Panic to Workforce Resilience: A Governance ... recommended assessment methods for AI exposure (task-level approach)
A path analysis was used to trace structural relationships between HR quality, effectiveness perceptions, and AI readiness.
The paper reports a path analysis linking composite HR quality indices, perceived HR effectiveness, and AI readiness measures; it uses the same survey sample.
high null result Determinants of Artificial Intelligence Adoption in Public S... AI readiness and perceived HR effectiveness
A binary logistic regression modelling active AI adoption was estimated with McFadden R² = 0.032.
Reported logistic regression model fit (McFadden R² = 0.032) for AI adoption outcome using the survey data.
high null result Determinants of Artificial Intelligence Adoption in Public S... active AI adoption (binary)
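For readers unfamiliar with the fit statistic in the claim above, McFadden's pseudo-R² compares the log-likelihood of the fitted model with that of an intercept-only model: R² = 1 - lnL(model) / lnL(null). The sketch below is illustrative only; the toy outcomes and predicted probabilities are made up and are not from the paper, and the function names are hypothetical.

```python
import math

def log_likelihood(y, p):
    """Bernoulli log-likelihood of binary outcomes y under predicted probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi) for yi, pi in zip(y, p))

def mcfadden_r2(y, p_model):
    """McFadden pseudo-R^2: 1 - lnL(model) / lnL(intercept-only model)."""
    base_rate = sum(y) / len(y)  # the intercept-only model predicts the sample mean
    ll_null = log_likelihood(y, [base_rate] * len(y))
    ll_model = log_likelihood(y, p_model)
    return 1.0 - ll_model / ll_null

# Toy data (assumed for illustration, not the survey data).
y = [1, 0, 0, 1, 0, 1, 0, 0]
p = [0.45, 0.35, 0.30, 0.50, 0.35, 0.40, 0.30, 0.35]
print(round(mcfadden_r2(y, p), 3))
```

A value near zero, such as the reported 0.032, means the fitted model's log-likelihood is only slightly better than that of a model predicting the base rate for everyone.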
An OLS regression explaining perceived HR effectiveness was estimated, with R² = 0.446.
Reported OLS model fit statistics in the paper (R-squared = 0.446); model explains perceived HR effectiveness using survey data.
high null result Determinants of Artificial Intelligence Adoption in Public S... perceived HR effectiveness
Constructed and validated a composite index of external HR quality factors with Cronbach's α = 0.959.
Measurement validation reported in the paper; Cronbach's alpha reported for external HR factors.
high null result Determinants of Artificial Intelligence Adoption in Public S... external HR quality index reliability
Constructed and validated a composite index of internal HR quality factors with Cronbach's α = 0.924.
Measurement validation reported in the paper; Cronbach's alpha reported for internal HR factors.
high null result Determinants of Artificial Intelligence Adoption in Public S... internal HR quality index reliability
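The two reliability figures above are Cronbach's α values. As a reminder of what that statistic measures, the following sketch computes α = k/(k-1) · (1 - Σ item variances / variance of total score) for a small made-up score matrix. The data and the function name are assumptions for illustration; the paper's item-level data are not shown in this listing.

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for `items`: a list of k lists, one per item,
    each holding that item's scores for the same respondents in the same order."""
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    totals = [sum(scores) for scores in zip(*items)]   # per-respondent total score
    item_var_sum = sum(variance(col) for col in items)  # sample variances of each item
    return (k / (k - 1)) * (1.0 - item_var_sum / variance(totals))

# Toy example (assumed): three items answered by five respondents.
toy_items = [
    [4, 5, 3, 2, 4],
    [4, 4, 3, 2, 5],
    [5, 5, 2, 3, 4],
]
print(round(cronbach_alpha(toy_items), 3))
```

Values close to 1, like the reported 0.959 and 0.924, indicate that the items in each index move together closely across respondents.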
A large-scale empirical survey of 12,562 public servants was conducted in June 2025 in Kazakhstan.
Statement in paper specifying survey sample and date; sample of public servants N = 12,562, June 2025.
high null result Determinants of Artificial Intelligence Adoption in Public S... AI adoption determinants (survey data collection)
A strict May 2026 trajectory subset captured 627 model-completed events and 73.95 million recorded tokens, of which 82.9% were cache reads.
Subset analysis of telemetry for a May 2026 trajectory reported by authors; counts of model-completed events and token logs, with cache-read classification.
high null result Persistent AI Agents in Academic Research: A Single-Investig... model-completed events, total recorded tokens, proportion of tokens served from ...
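The figures in the claim above imply some simple derived quantities. The sketch below works them out from the rounded values as reported, so the results are approximate; the variable names are illustrative.

```python
# Values as reported in the claim (rounded in the source).
total_tokens = 73_950_000   # 73.95 million recorded tokens
cache_read_share = 0.829    # 82.9% of tokens were cache reads
events = 627                # model-completed events

cache_read_tokens = total_tokens * cache_read_share
other_tokens = total_tokens - cache_read_tokens
tokens_per_event = total_tokens / events

print(f"cache-read tokens: {cache_read_tokens / 1e6:.1f} million")
print(f"all other tokens:  {other_tokens / 1e6:.1f} million")
print(f"mean tokens per model-completed event: {tokens_per_event:,.0f}")
```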
Memory-derived records identified 482 output-proxy events and 889 failure, verification, correction, or protocol-proxy events.
Analysis/parsing of memory-derived records from the persistent environment yielding categorized event counts.
high null result Persistent AI Agents in Academic Research: A Single-Investig... counts of output-proxy events and counts of failure/verification/correction/prot...
Active system time was 579.7 hours (30-minute capped-gap estimate).
Computed runtime activity metric from system telemetry/logs over the study period; authors report a 30-minute capped-gap estimate to compute active system time.
high null result Persistent AI Agents in Academic Research: A Single-Investig... active system runtime (hours)
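The listing does not spell out how the 30-minute capped-gap estimate was computed. One common reading, sketched here as an assumption rather than the authors' procedure, is to sort event timestamps and sum the gaps between consecutive events, counting any gap longer than the cap as exactly the cap. The function name and toy timestamps are hypothetical.

```python
from datetime import datetime, timedelta

def active_hours(timestamps, cap=timedelta(minutes=30)):
    """Estimate active time by summing gaps between consecutive events,
    with each gap counted as at most `cap`."""
    ordered = sorted(timestamps)
    total = timedelta()
    for earlier, later in zip(ordered, ordered[1:]):
        total += min(later - earlier, cap)
    return total.total_seconds() / 3600.0

# Toy log (assumed): gaps of 10, 40 and 190 minutes.
toy_events = [
    datetime(2026, 2, 1, 9, 0),
    datetime(2026, 2, 1, 9, 10),
    datetime(2026, 2, 1, 9, 50),
    datetime(2026, 2, 1, 13, 0),
]
# Capped gaps are 10 + 30 + 30 = 70 minutes.
print(active_hours(toy_events))
```

Under this reading, long idle stretches contribute at most 30 minutes each, so the estimate is sensitive to the choice of cap.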
The workspace included 502 memory-related files, 17 configured agent directories, and 57 skill files.
Inventory of the implemented persistent agent workspace reported by authors as part of the case study (counts extracted from workspace metadata/filesystem).
high null result Persistent AI Agents in Academic Research: A Single-Investig... counts of workspace memory files, agent directories, and skill files
Recoverable main-agent telemetry contained 75,671 de-duplicated records across 96 active days, with 8,059 user-role and 23,710 assistant-role messages.
Structured self-observed implementation case study (unit: a single persistent human-agent environment) conducted Jan 31–May 25, 2026; authors report recoverable telemetry logs totaling these counts.
high null result Persistent AI Agents in Academic Research: A Single-Investig... number of telemetry records and role-specific messages
We compare and benchmark strategy profiles adopted by open and proprietary state-of-the-art language models deployed in AgentSociety against best response.
Empirical benchmarking experiments comparing multiple language models' strategy profiles to best-response strategies (experimental evaluation / benchmarking).
high null result AgentSociety: Incentivizing Agentic Social Intelligence strategy profiles of open and proprietary language models versus best-response
Historically, the most visible high-end bugonomics was offense-priced because production-grade zero-days and exploit chains were expensive specialist outputs for governments, brokers, and offensive vendors.
Historical observation corroborated by reference to public exploit-market price anchors (market price data referenced; no specific figures included in the abstract).
high null result Demystifying the Mythos or Disrupting Bugonomics? From Zero-... price/scarcity of production-grade zero-days and exploit chains in exploit marke...
Identification limits prevent a strict causal claim; the paper outlines an agenda for cleaner tests.
Authors' explicit caveat in the abstract noting limits to identification and stating they outline future cleaner tests.
high null result Coding Beyond Your Training: Claude Code and the Technologic... causal identification credibility / limitations
The analysis exploits the staggered rollout of Claude Code across GitHub between May 2025 and January 2026, using a panel of 5,838 developers observed monthly over 28 months, with treatment defined by a developer's first Claude-co-authored commit and not-yet-treated developers as controls, and estimates obtained via the doubly robust Callaway and Sant'Anna (2021) estimator.
Methods and data description as stated in the abstract: staggered rollout timing, sample size (5,838), observation window (28 months), treatment definition (first Claude-co-authored commit), estimator (Callaway & Sant'Anna 2021).
high null result Coding Beyond Your Training: Claude Code and the Technologic... study design / identification strategy
Results are robust to two stricter activity filters.
Robustness checks reported in the paper applying two stricter activity filters to the sample; claim refers to consistency of estimated effects under these alternate sample definitions.
high null result Coding Beyond Your Training: Claude Code and the Technologic... sensitivity/robustness of estimated treatment effects to stricter activity filte...
The actual water footprint of a specific load varies dynamically with generation dispatch and network conditions.
Conceptual claim presented in the paper motivating the need for dynamic attribution (discussion/analysis rather than a reported empirical sample).
high null result From Accounting to Coordination: A Virtual Water-Aware Elect... water footprint variability of specific electricity loads as a function of dispa...
Water withdrawals associated with electricity consumption occur at generation sites and are virtually allocated to demand based on network power flows.
Conceptual statement about how water withdrawals are attributed to loads via network power flow accounting (methodological description in paper).
high null result From Accounting to Coordination: A Virtual Water-Aware Elect... virtual allocation of generation-site water withdrawals to electricity demand
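The two claims above describe allocating generation-site water withdrawals to loads according to network power flows. A minimal sketch of that accounting idea follows, assuming the flow tracing has already been done and is available as a table of shares (the fraction of each generator's output that serves each load). The names `withdrawals`, `flow_share`, and `allocate_water`, and all numbers, are hypothetical and not taken from the paper.

```python
def allocate_water(withdrawals, flow_share):
    """Attribute each generator's water withdrawal to loads in proportion
    to the share of that generator's output each load consumes.

    withdrawals: {generator: water withdrawn at the generation site}
    flow_share:  {generator: {load: fraction of generator output serving that load}}
    """
    footprint = {}
    for gen, water in withdrawals.items():
        for load, share in flow_share[gen].items():
            footprint[load] = footprint.get(load, 0.0) + water * share
    return footprint

# Toy system (assumed): two generators, two loads, shares sum to 1 per generator.
withdrawals = {"thermal": 100.0, "hydro": 20.0}
flow_share = {
    "thermal": {"city": 0.75, "datacenter": 0.25},
    "hydro": {"city": 0.5, "datacenter": 0.5},
}
print(allocate_water(withdrawals, flow_share))
```

Because the shares depend on dispatch and network conditions, recomputing `flow_share` for each time step changes each load's footprint, which is the dynamic variation the first claim refers to.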
The analysis is structured across past, present, and future phases using an integrative socio-technical political economy framework and validated secondary sources (OECD, ILO, UNDP, WTO, WEF) alongside official Indian statistics and sector evidence.
Methodological claim stated in abstract describing the approach and data sources used in the paper (OECD, ILO, UNDP, WTO, WEF, MoSPI/NSO, PLFS, HCES, Reuters, Nasscom).
high null result ARTIFICIAL INTELLIGENCE, INEQUALITIES OF KNOWLEDGE AND RESOU... methodological approach and data sources