The Commonplace
Home Dashboard Papers Evidence Syntheses Digests 🎲

Evidence (13870 claims)

Adoption
8467 claims
Productivity
7558 claims
Governance
6805 claims
Human-AI Collaboration
6363 claims
Org Design
4132 claims
Innovation
4065 claims
Labor Markets
3526 claims
Skills & Training
2945 claims
Inequality
2066 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 749 196 98 892 1984
Governance & Regulation 817 394 188 121 1544
Organizational Efficiency 771 189 124 83 1177
Technology Adoption Rate 627 233 123 96 1088
Research Productivity 411 123 56 332 933
Output Quality 467 178 59 47 751
Decision Quality 320 174 75 42 618
Firm Productivity 435 55 88 20 604
AI Safety & Ethics 214 276 65 33 593
Market Structure 178 167 122 24 496
Task Allocation 207 64 71 32 379
Skill Acquisition 165 59 60 17 301
Innovation Output 203 27 43 18 292
Employment Level 105 52 107 13 279
Fiscal & Macroeconomic 131 69 43 26 276
Consumer Welfare 116 63 42 11 232
Firm Revenue 150 48 26 3 227
Inequality Measures 44 122 49 6 221
Task Completion Time 169 29 8 12 219
Worker Satisfaction 89 63 20 12 184
Error Rate 69 92 10 2 173
Regulatory Compliance 76 68 14 5 163
Training Effectiveness 93 21 13 19 148
Wages & Compensation 77 36 25 6 144
Automation Exposure 51 54 22 12 142
Team Performance 86 17 27 9 140
Developer Productivity 94 17 14 6 132
Job Displacement 12 80 20 1 113
Hiring & Recruitment 51 7 8 3 69
Creative Output 31 17 7 3 59
Skill Obsolescence 5 46 6 1 58
Social Protection 27 16 8 2 53
Labor Share of Income 17 17 17 51
Worker Turnover 11 12 3 26
Industry 1 1
Industrial robot penetration is used as a proxy measure for AI adoption in Chinese provinces.
Paper explicitly states industrial robot penetration was used as the proxy for AI adoption in the empirical analysis.
high null result Nonlinear effects of ageing population and AI on China’s GDP... AI adoption (proxied by industrial robot penetration)
The study uses panel data on 31 Chinese provinces for the period 2000–2022 and employs panel threshold regression models with ageing and AI adoption as threshold variables.
Paper description: panel data from 31 provinces (2000–2022); use of panel threshold regression models; threshold variables specified as ageing and AI adoption (industrial robot penetration).
high null result Nonlinear effects of ageing population and AI on China’s GDP... methodological approach (panel threshold regression)
Specification and implementation are available at https://github.com/chelof100/acp-framework-en
Repository URL provided in the specification text; points to the stated implementation and documentation artifacts.
high null result Agent Control Protocol: Admission Control for Agent Actions availability of specification and implementation at the given URL
The specification defines more than 62 verifiable requirements and 12 prohibited behaviors.
Quantitative claims stated in the specification about requirement and prohibited-behavior counts.
high null result Agent Control Protocol: Admission Control for Agent Actions number of verifiable requirements and prohibited behaviors
The v1.13 release includes an OpenAPI 3.1.0 specification for all HTTP endpoints.
Specification/repository statement indicating an OpenAPI 3.1.0 specification is provided for HTTP endpoints.
high null result Agent Control Protocol: Admission Control for Agent Actions presence of OpenAPI 3.1.0 specification covering HTTP endpoints
The v1.13 release includes 51 signed conformance test vectors (Ed25519 + SHA-256).
Repository/specification statement listing 51 signed conformance test vectors and the signature/hash algorithms used.
high null result Agent Control Protocol: Admission Control for Agent Actions count and cryptographic scheme of conformance test vectors
The v1.13 release includes a Go reference implementation of 22 packages covering all L1-L4 capabilities.
Repository statement describing a Go reference implementation comprising 22 packages and coverage claim for L1-L4.
high null result Agent Control Protocol: Admission Control for Agent Actions number of Go packages in the reference implementation and claimed coverage of co...
The v1.13 specification comprises 36 technical documents organized into five conformance levels (L1-L5).
Explicit quantitative statement in the specification/repository describing document count and organization.
high null result Agent Control Protocol: Admission Control for Agent Actions number of technical documents and conformance-level organization
The experiment compared three prompt conditions: (A) simple prompts, (B) raw PPS JSON, and (C) natural-language-rendered PPS.
Method description of the three prompt conditions used in the controlled experiment.
The study used three specific LLMs: DeepSeek-V3, Qwen-Max, and Kimi.
Method section listing the three models evaluated in the experiment.
We ran a controlled three-condition study across 60 tasks in three domains (business, technical, and travel), three large language models (DeepSeek-V3, Qwen-Max, and Kimi), and three prompt conditions, collecting 540 AI-generated outputs evaluated by an LLM judge.
Authors report an experimental study design: 60 tasks × 3 models × 3 prompt conditions = 540 outputs, with outputs evaluated by an LLM judge (methodological description in the paper).
high null result Evaluating 5W3H Structured Prompting for Intent Alignment in... experimental_data_collection (AI outputs evaluated by LLM judge)
The paper presents a formal evolutionary taxonomy of generative AI spanning five eras (1943–present) and analyzes frontier lab dynamics, sovereign AI emergence, and post-training alignment evolution from RLHF through GRPO.
Conceptual taxonomy and historical/organizational analysis provided in the paper. No empirical sample size reported in the excerpt.
high null result The Institutional Scaling Law: Non-Monotonic Fitness, Capabi... evolutionary taxonomy and contextual analysis of generative AI eras and dynamics
The framework extends the Sustainability Index of Han et al. (2025) from hardware-level analysis to ecosystem-level analysis.
Conceptual / methodological extension claimed by the authors referencing Han et al. (2025). No empirical sample size reported in the excerpt.
high null result The Institutional Scaling Law: Non-Monotonic Fitness, Capabi... scope/level of the Sustainability Index (hardware-level → ecosystem-level)
Classical scaling laws model AI performance as monotonically improving with model size.
Statement about prior literature / modeling assumptions (classical scaling laws). No empirical sample size reported in the excerpt.
high null result The Institutional Scaling Law: Non-Monotonic Fitness, Capabi... AI performance as a function of model size
Existing financial question answering benchmarks primarily focus on company balance sheet data and rarely evaluate reasoning over how company stocks trade in the market or their interactions with fundamentals.
Literature/background claim made in the paper motivating the new benchmark; authors contrast prior benchmarks' focus on balance sheet data with the lack of market/trading-signal evaluation.
high null result FinTradeBench: A Financial Reasoning Benchmark for LLMs scope of existing financial QA benchmarks (focus on balance sheet data vs. tradi...
Retrieval provides limited benefit for trading-signal reasoning.
Experimental comparison reported in the paper showing that retrieval-augmentation had little impact on performance for trading-signal-focused questions.
high null result FinTradeBench: A Financial Reasoning Benchmark for LLMs change in performance on trading-signal-focused questions with retrieval
To ensure reliability at scale, we adopt a calibration-then-scaling framework that combines expert seed questions, multi-model response generation, intra-model self-filtering, numerical auditing, and human-LLM judge alignment.
Methodological claim in the paper describing the QA and annotation pipeline; the paper reports using these components as part of their reliability framework.
high null result FinTradeBench: A Financial Reasoning Benchmark for LLMs benchmark annotation and validation procedure
The benchmark is organized into three reasoning categories: fundamentals-focused, trading-signal-focused, and hybrid questions requiring cross-signal reasoning.
Direct description of the benchmark's taxonomy in the paper; the authors specify these three categories as the organizational structure for the 1,400 questions.
high null result FinTradeBench: A Financial Reasoning Benchmark for LLMs benchmark organization / task taxonomy
FinTradeBench contains 1,400 questions grounded in NASDAQ-100 companies over a ten-year historical window.
Statement in the paper describing the benchmark construction and scope; the paper reports the benchmark size (1,400 questions) and the dataset grounding (NASDAQ-100 over ten years).
high null result FinTradeBench: A Financial Reasoning Benchmark for LLMs benchmark size and scope (number of questions; data grounding)
The paper derives formal conditions under which the inversion (smaller, orchestrated models outperforming frontier models) holds.
Mathematical derivations and stated sufficient/necessary conditions presented in the paper.
high null result Punctuated Equilibria in Artificial Intelligence: The Instit... parameter conditions for comparative performance inversion
We develop the Institutional Fitness Manifold, a mathematical framework that evaluates AI systems along four dimensions: capability, institutional trust, affordability, and sovereign compliance.
Theoretical/model development presented in the paper (formal definition of the manifold and its four dimensions).
high null result Punctuated Equilibria in Artificial Intelligence: The Instit... institutional fitness evaluated across four dimensions
There have been five eras of AI development since 1943, and within the current Generative AI Era there are four distinct epochs, each initiated by a discontinuous event.
Descriptive/historical classification within the paper (counts of eras and epochs; named initiating events such as the transformer and the 'DeepSeek Moment').
high null result Punctuated Equilibria in Artificial Intelligence: The Instit... count and classification of historical AI eras/epochs
The study uses panel data for 30 Chinese provinces from 2013–2022 to measure urban circular economy efficiency (UCEE) with a Super-SBM model including undesirable outputs, track dynamics via the Global Malmquist–Luenberger index, and estimate spatial effects with a spatial Durbin model.
Methodological description in the abstract: explicit statement of data (30 provinces, 2013–2022) and the three methods used (Super-SBM with undesirable outputs, GML index, spatial Durbin model).
high null result How artificial intelligence and environmental regulation inf... use of Super-SBM measurement, GML dynamics, and spatial Durbin estimation (metho...
Despite fears of mass unemployment, aggregate labor-market data through 2025 show limited labor-market disruption from generative AI.
Review of aggregate employment and labor-market studies and macro-level data through 2025 cited in the brief; methods include analyses of employment statistics and macro labor indicators (no single sample size reported).
high null result AI, Productivity, and Labor Markets: A Review of the Empiric... aggregate employment / labor-market disruption
We scored rule-breaking and abuse outcomes with an independent rubric-based judge across 28,112 transcript segments from multi-agent governance simulations.
Reported methodology: multi-agent governance simulations with agents in formal governmental roles, outcomes evaluated by an independent rubric-based judge; explicit sample count of 28,112 transcript segments.
high null result I Can't Believe It's Corrupt: Evaluating Corruption in Multi... rule-breaking and abuse outcomes (as assessed by rubric-based judge)
We conducted an open competition involving 29 teams and 80 participants, enabling systematic comparison between human-AI collaborative approaches and AI-only baselines.
Empirical study design described in the paper: open competition with reported counts of teams and participants (29 teams, 80 participants); comparison between participant submissions and AI-only baselines.
high null result AgentDS Technical Report: Benchmarking the Future of Human-A... competition participation enabling comparison
AgentDS consists of 17 challenges across six industries: commerce, food production, healthcare, insurance, manufacturing, and retail banking.
Descriptive dataset/benchmark specification in the paper stating task count and industry coverage.
high null result AgentDS Technical Report: Benchmarking the Future of Human-A... number of challenges and industry coverage
Open research challenges that define the research agenda include scaling beyond benchmarks, achieving compositionality over changes, metrics for validating specifications, handling rich logics, and designing human-AI specification interactions.
Authors' explicit enumeration of open problems and a proposed multi-disciplinary research agenda; presented as expert opinion rather than empirical finding.
high null result Intent Formalization: A Grand Challenge for Reliable Coding ... progress on research questions (research agenda advancement)
The interaction between selection and recourse generates a closed-loop dynamical system linking candidate selection and strategic recourse.
Formalization in the paper showing feedback dynamics between selection outcomes and candidate adjustments (modeling/result claim).
high null result Actionable Recourse in Competitive Environments: A Dynamic G... closed-loop dynamics between selection and recourse (system state over time)
This setting produces endogenous selection, in which both the decision rule and the selection threshold are determined by the population's current feature state.
Derived implication of the framework and model dynamics described in the paper (theoretical consequence of the model).
high null result Actionable Recourse in Competitive Environments: A Dynamic G... dependence of decision rule and threshold on population feature distribution
The success benchmark evolves endogenously as many candidates adjust simultaneously.
Analytical property of the proposed model: simultaneous adjustments by candidates change the effective benchmark (theoretical result asserted by authors).
high null result Actionable Recourse in Competitive Environments: A Dynamic G... endogenous evolution of the selection benchmark/threshold
The study proposes a framework that models recourse as a strategic interaction among candidates under a risk-based selection rule.
The paper introduces a formal/modeling framework (methodological contribution described by the authors).
high null result Actionable Recourse in Competitive Environments: A Dynamic G... structure of the formal model (strategic interactions under a risk-based rule)
Actionable recourse studies whether individuals can modify feasible features to overturn unfavorable outcomes produced by AI-assisted decision-support systems.
Definition and framing stated by the authors in the paper's introduction/background (conceptual claim).
high null result Actionable Recourse in Competitive Environments: A Dynamic G... ability of individuals to change features to reverse AI-produced outcomes (quali...
A GNN graph is constructed from reasoning embeddings and trading decisions are made using a PPO-DSR policy.
Method description: the paper reports embedding agents' reasoning, building a graph neural network (GNN) from those embeddings, and using a PPO-DSR reinforcement learning policy to trade. Specific GNN/PPO-DSR hyperparameters and architecture are not provided in the excerpt.
high null result Can Blindfolded LLMs Still Trade? An Anonymization-First Fra... use of GNN on reasoning embeddings and use of PPO-DSR policy to produce trading ...
Four LLM agents output scores along with reasoning.
Method description: the paper states that four LLM agents produce numeric scores and associated textual reasoning. The number of agents is explicitly given as four; no further architecture or model-family details included in the excerpt.
high null result Can Blindfolded LLMs Still Trade? An Anonymization-First Fra... agent outputs: numeric scores and textual reasoning
BlindTrade anonymizes tickers and company names (blindfolding agents by anonymizing all identifiers).
Methodological description in the paper: the system design explicitly replaces tickers and company names with anonymized identifiers. Implementation details and examples not provided in the excerpt.
high null result Can Blindfolded LLMs Still Trade? An Anonymization-First Fra... presence/absence of identifier anonymization (anonymization applied to input dat...
Data ethics, as a central pillar of digital ethics, emphasizes the responsible use and protection of personal information.
Conceptual/definitional statement in the paper situating data ethics within digital ethics and highlighting protection of personal information as a core concern.
Big data usage is proxied by keyword frequency in firms' annual reports.
Operationalization described in the paper: frequency/count of big-data-related keywords in annual reports used as the proxy for firms' big data application.
high null result How Big Data Enhances Firm Value Under Data Privacy Regulati... big data usage (proxy)
The empirical analysis uses a fixed-effects regression approach to measure the impact of big data application on firm value.
Methodological statement in the paper specifying fixed-effects regression as the primary econometric approach.
The study analyzes panel data covering Chinese A-share listed companies from 2007 to 2021.
Description of dataset in the paper: panel of Chinese A-share listed companies spanning the years 2007–2021 (sample period stated).
The analysis extends the dynamic taxation setup of Slavik and Yazici (2014).
Methodological claim: the model and solution approach build on and modify the framework from Slavik and Yazici (2014) (reference to prior theoretical framework rather than empirical data).
high null result Workers' Incentives and the Optimal Taxation of AI scope and structure of the theoretical model (extension of the referenced dynami...
We characterize the optimal tax policy in an economy with human manual and cognitive labor, physical capital, and artificial intelligence (AI).
Theoretical/analytical work: the paper develops and analyzes a dynamic general-equilibrium model that includes manual and cognitive human labor, physical capital, and AI. (No empirical sample; model-based characterization.)
high null result Workers' Incentives and the Optimal Taxation of AI form and properties of the optimal tax policy in the specified theoretical econo...
Self-concordance did not mediate the AI-over-questionnaire effect on goal progress.
Preplanned mediation model reported in the paper found no evidence that self-concordance mediated the AI vs questionnaire effect on goal progress; reported as non-significant in the preregistered analysis.
high null result AI-Assisted Goal Setting Improves Goal Progress Through Soci... goal progress (mediator tested: self-concordance, self-report)
Compared with the matched written-reflection questionnaire, the AI did not significantly improve overall goal progress.
Preplanned comparison within the preregistered RCT; reported non-significant difference between AI and written-reflection condition on overall goal progress at two-week follow-up (no significant p-value reported in the summary).
high null result AI-Assisted Goal Setting Improves Goal Progress Through Soci... goal progress (self-reported goal progress at two-week follow-up)
We conducted a preregistered three-arm randomized controlled trial (RCT) comparing an AI career coach ('Leon,' powered by Claude Sonnet), a matched structured written questionnaire, and a no-support control.
Preregistered RCT reported in the paper; three arms as described; total sample size N = 517; participants randomized to AI coach, written-reflection questionnaire, or no-support control; outcomes assessed at two-week follow-up.
high null result AI-Assisted Goal Setting Improves Goal Progress Through Soci... trial design / allocation and follow-up measurement of goal-related outcomes at ...
All code, data, and logs are publicly available at https://github.com/pepealonso95/TDAD.
Provision of a public GitHub repository URL in the paper.
high null result TDAD: Test-Driven Agentic Development - Reducing Code Regres... availability of code, data, and logs (public repository)
Evaluation was performed on SWE-bench Verified with two local models: Qwen3-Coder 30B on 100 instances and Qwen3.5-35B-A3B on 25 instances.
Experimental setup reported in the paper specifying benchmark (SWE-bench Verified) and model-instance counts.
high null result TDAD: Test-Driven Agentic Development - Reducing Code Regres... evaluation sample size / benchmark coverage (number of instances per model)
Controlled experiments were run with N = 250 across five content types to validate the mechanisms.
Experimental methods reported in the paper: controlled experiments with specified sample size and content-type breakdown.
high null result Governed Memory: A Production Architecture for Multi-Agent W... experimental sample size and content-type breadth (N=250, 5 content types)
The dependent variable is the Market Opportunity Index, which is a combination of indicators of innovation activity, the share of firms with new products, and the share of opportunity-oriented entrepreneurs.
Paper provides the construction/definition of the dependent variable (components listed in the excerpt).
high null result Innovative Cognitive Tools for Studying Market Opportunities... Market Opportunity Index (composite of innovation activity, share of firms with ...
The model used lags of the dependent variable to take into account inertia in the development of entrepreneurial opportunities, and the stability of the impact of cognitive tools was tested.
Paper states the model specification included lagged dependent variables and that stability tests for the impact of cognitive tools were performed (no further details on lag length or test statistics in the excerpt).
high null result Innovative Cognitive Tools for Studying Market Opportunities... Market Opportunity Index (lagged dynamics)