The Commonplace

Evidence (13870 claims)

Adoption: 8467 claims
Productivity: 7558 claims
Governance: 6805 claims
Human-AI Collaboration: 6363 claims
Org Design: 4132 claims
Innovation: 4065 claims
Labor Markets: 3526 claims
Skills & Training: 2945 claims
Inequality: 2066 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 749 196 98 892 1984
Governance & Regulation 817 394 188 121 1544
Organizational Efficiency 771 189 124 83 1177
Technology Adoption Rate 627 233 123 96 1088
Research Productivity 411 123 56 332 933
Output Quality 467 178 59 47 751
Decision Quality 320 174 75 42 618
Firm Productivity 435 55 88 20 604
AI Safety & Ethics 214 276 65 33 593
Market Structure 178 167 122 24 496
Task Allocation 207 64 71 32 379
Skill Acquisition 165 59 60 17 301
Innovation Output 203 27 43 18 292
Employment Level 105 52 107 13 279
Fiscal & Macroeconomic 131 69 43 26 276
Consumer Welfare 116 63 42 11 232
Firm Revenue 150 48 26 3 227
Inequality Measures 44 122 49 6 221
Task Completion Time 169 29 8 12 219
Worker Satisfaction 89 63 20 12 184
Error Rate 69 92 10 2 173
Regulatory Compliance 76 68 14 5 163
Training Effectiveness 93 21 13 19 148
Wages & Compensation 77 36 25 6 144
Automation Exposure 51 54 22 12 142
Team Performance 86 17 27 9 140
Developer Productivity 94 17 14 6 132
Job Displacement 12 80 20 1 113
Hiring & Recruitment 51 7 8 3 69
Creative Output 31 17 7 3 59
Skill Obsolescence 5 46 6 1 58
Social Protection 27 16 8 2 53
Labor Share of Income 17 17 17 51
Worker Turnover 11 12 3 26
Industry 1 1
AI-generated postconditions catch real-world bugs missed by prior methods.
The paper surveys early research reporting empirical cases in which AI-generated postconditions found bugs that other methods missed; the excerpt provides no numeric details.
high positive Intent Formalization: A Grand Challenge for Reliable Coding ... bugs detected / error detection rate
Interactive test-driven formalization improves program correctness.
Paper surveys early research that reportedly demonstrates this effect (described as 'interactive test-driven formalization that improves program correctness'); the excerpt does not include specific study details or sample sizes.
The central bottleneck is validating specifications: since there is no oracle for specification correctness other than the user, we need semi-automated metrics that can assess specification quality with or without code, through lightweight user interaction and proxy artifacts such as tests.
Analytical claim and research agenda item in the paper; motivates need for new metrics and interaction designs. No empirical validation or sample size reported in the excerpt.
high positive Intent Formalization: A Grand Challenge for Reliable Coding ... ability to validate specification correctness / specification quality
Intent formalization offers a tradeoff spectrum suitable to the reliability needs of different contexts: from lightweight tests that disambiguate likely misinterpretations, through full functional specifications for formal verification, to domain-specific languages from which correct code is synthesized automatically.
Conceptual framework proposed in the paper describing a spectrum of specification formality; presented as an argument rather than an empirical finding, with no sample sizes provided in the excerpt.
high positive Intent Formalization: A Grand Challenge for Reliable Coding ... suitability of specification approaches for reliability requirements
Intent formalization — translating informal user intent into checkable formal specifications — is the key challenge that will determine whether AI makes software more reliable or merely more abundant.
Normative argument presented by the authors as the central thesis of the paper; no empirical study or sample size cited in the provided text.
high positive Intent Formalization: A Grand Challenge for Reliable Coding ... software reliability (correctness relative to user intent)
Agentic AI systems can now generate code with remarkable fluency.
Authoritative assertion in the paper based on contemporary observations of large code-generating models; no empirical sample size or benchmark numbers reported in the text provided.
high positive Intent Formalization: A Grand Challenge for Reliable Coding ... code generation fluency / ability to produce code
This paper employs large language models to conduct semantic analysis on the text of annual reports from Chinese A-share listed companies from 2006 to 2024.
Methodological statement in the abstract describing use of LLM-based semantic analysis on annual report texts spanning 2006–2024.
high positive The Spillover Effects of Peer AI Rinsing on Corporate Green ... methodological approach (use of LLMs for semantic analysis)
The paper recommends that the government design targeted support tools to 'enhance market returns and alleviate financing constraints', adopt a differentiated regulatory strategy, and establish a disclosure mechanism combining 'professional identification and reputational sanctions' to curb peer AI washing behaviour.
Policy prescriptions derived from empirical findings and simulation results reported in the paper; presented as recommendations in the abstract.
high positive The Spillover Effects of Peer AI Rinsing on Corporate Green ... effectiveness of policy interventions in curbing AI washing and supporting green...
Simulation results indicate that a combination of policy tools can effectively improve market equilibrium (mitigating the negative effects of AI washing).
Simulation exercises reported in the paper (model specification not provided in abstract) testing policy tool combinations and their effects on market equilibrium.
high positive The Spillover Effects of Peer AI Rinsing on Corporate Green ... market equilibrium (improvement in market outcomes related to AI washing and gre...
The study implies policy actions to promote high-quality development based on the finding that innovation and the digital economy now play larger roles in growth.
Authors' discussion/conclusion drawing policy implications from empirical findings (declining capital elasticity, rising TFP and digital economy contribution).
high positive Analysis of China's Economic Growth Drivers: An Empirical St... policy implication for promoting high-quality development
Overall, China's growth model shifted over 2010–2022 from being investment-driven to being innovation-driven.
Synthesis of results: declining capital elasticity, rising TFP contribution, substantial share of digital economy in TFP, and regional patterns reported by the study.
high positive Analysis of China's Economic Growth Drivers: An Empirical St... structural shift in the growth model (investment-driven → innovation-driven)
The study's method is novel because it uses both migrant worker monitoring data and digital-economy proxy indicators, giving a more accurate picture of how labor quality and technological progress affect each other.
Author-reported methodological description: extended Cobb–Douglas approach combined with quality-adjusted labor measures derived from migrant worker monitoring data and proxy indicators for the digital economy.
high positive Analysis of China's Economic Growth Drivers: An Empirical St... measurement accuracy of labor quality and technology interaction (methodological...
Regional analysis shows that growth in coastal regions has been driven by innovation, with an estimated innovation coefficient of approximately 0.31.
Regional decomposition/estimation reported in the paper's analysis of coastal vs inland regions using the extended production function and digital/labour-quality measures.
high positive Analysis of China's Economic Growth Drivers: An Empirical St... innovation-related elasticity/coefficient in coastal regions (≈0.31)
The digital economy accounted for 40% of the observed increase in TFP (i.e., made up 40% of the TFP contribution).
Attribution within the growth decomposition from the extended production function, where digital economy indicators are included and their contribution to TFP is estimated.
high positive Analysis of China's Economic Growth Drivers: An Empirical St... share of TFP contribution attributable to the digital economy
The contribution rate of total factor productivity (TFP) rose from 18% to 26% between the earlier and later periods.
Decomposition of growth using the extended Cobb–Douglas production function for China over 2010–2022, reporting TFP contribution rates for the two periods.
high positive Analysis of China's Economic Growth Drivers: An Empirical St... TFP contribution rate to economic growth
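A TFP contribution rate of this kind is usually the share of output growth left over after weighted input growth is subtracted (the Solow residual). The sketch below assumes a plain two-factor Cobb-Douglas form, Y = A * K^alpha * L^(1 - alpha); the paper's extended specification (quality-adjusted labor, digital-economy proxies) is not given in the excerpt, and the function name, growth rates, and elasticity are hypothetical values chosen for illustration only.

```python
def tfp_contribution(output_growth, capital_growth, labor_growth, alpha):
    """Return (TFP growth, TFP share of output growth) via the Solow residual.

    Assumes Y = A * K**alpha * L**(1 - alpha), so in growth rates
    g_Y = g_A + alpha * g_K + (1 - alpha) * g_L.
    """
    tfp_growth = (
        output_growth - alpha * capital_growth - (1 - alpha) * labor_growth
    )
    return tfp_growth, tfp_growth / output_growth


# Hypothetical inputs, not taken from the paper.
tfp_growth, tfp_share = tfp_contribution(
    output_growth=0.06, capital_growth=0.08, labor_growth=0.01, alpha=0.5
)
print(f"TFP growth: {tfp_growth:.3f}, contribution rate: {tfp_share:.0%}")
```

With a lower capital elasticity (alpha), the same input growth explains less of output growth, so the residual attributed to TFP rises. That is the mechanism behind a shift from investment-driven to innovation-driven growth in this kind of decomposition.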
The initially selected candidates determine both the benchmark of success and the direction of improvement.
Theoretical result asserted by the authors based on analysis of the closed-loop system (paper's analytical finding).
high positive Actionable Recourse in Competitive Environments: A Dynamic G... influence of initially selected group on subsequent benchmark and improvement di...
Rejected individuals exert effort to improve actionable features along directions implied by the decision rule.
Model assumption and dynamic behavior encoded in the proposed framework (assumption/behavioral mechanism in the model).
high positive Actionable Recourse in Competitive Environments: A Dynamic G... effort or change in actionable features by rejected candidates
The paper proposes design principles for effective, accountable, and adaptive sandboxes to contribute to debates on experimentalism in AI governance.
Stated contribution of the paper (descriptive claim about content; abstract does not list the principles or empirical testing).
high positive Experimentalism beyond ex ante regulation: A law and economi... existence and articulation of design principles for RSs
Regulatory sandboxes (RSs) have emerged as a potential solution to AI regulatory challenges.
Descriptive observation and normative framing within the paper; contextual reference to the EU AI Act's treatment of sandboxes (no empirical sample reported in the abstract).
high positive Experimentalism beyond ex ante regulation: A law and economi... adoption/emergence of RSs as a governance mechanism for AI
External inputs that bypass internal filtering shorten recognition delays (i.e., speed up detection of regime shifts).
Model extensions/analysis showing that when some inputs are allowed to bypass internal exclusion mechanisms, the dynamics of anchor updating detect regime changes faster; result comes from theoretical model manipulations, not empirical testing.
high positive Cohesion as Concentration: Exclusion-Driven Fragility in Fin... time to recognize regime shift (recognition delay)
In a preregistered mediation model, perceived accountability mediated the AI-over-questionnaire effect on goal progress (indirect effect = 0.15, 95% CI [0.04, 0.31]).
Mediation analysis preregistered and reported in the paper using data from the RCT (N = 517); indirect effect estimate 0.15 with 95% confidence interval [0.04, 0.31].
high positive AI-Assisted Goal Setting Improves Goal Progress Through Soci... goal progress (mediated by perceived social accountability)
The AI chatbot produced significantly higher goal progress than the no-support control at two-week follow-up.
Between-groups comparison in the preregistered RCT (N = 517); reported effect size d = 0.33 and p = .016 for AI vs control on goal progress measured at two-week follow-up.
high positive AI-Assisted Goal Setting Improves Goal Progress Through Soci... goal progress (self-reported goal progress at two-week follow-up)
The authors provide a demo video, a hosted website, and an installable package demonstrating JobMatchAI.
Paper explicitly states availability of a demo video, a hosted website, and an installable package. No links, access dates, or artifact verification details are provided in the excerpt.
high positive JobMatchAI An Intelligent Job Matching Platform Using Knowle... availability of demonstration artifacts (video, hosted website, installable pack...
The authors provide a hybrid retrieval stack combining BM25, a skill knowledge graph, and semantic components to evaluate skill generalization.
Paper describes a hybrid retrieval stack composed of BM25, a knowledge graph, and semantic retrieval components intended for evaluation of skill generalization. No evaluation metrics or comparisons are included in the excerpt.
high positive JobMatchAI An Intelligent Job Matching Platform Using Knowle... retrieval stack composition (BM25 + knowledge graph + semantic components) inten...
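One way to picture a hybrid stack like this is as a weighted fusion of a lexical score and a graph-expansion score. The sketch below is not JobMatchAI's implementation: the skill graph, weights, and function names are hypothetical, and the semantic (embedding) component is left out because it would need an external model.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized document for a tokenized query."""
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical skill graph: each skill maps to related skills.
SKILL_GRAPH = {"pytorch": {"deep-learning", "python"}, "sql": {"databases"}}


def graph_score(query_skills, doc):
    """Fraction of the graph-expanded query skills that appear in the document."""
    expanded = set(query_skills)
    for skill in query_skills:
        expanded |= SKILL_GRAPH.get(skill, set())
    return len(expanded & set(doc)) / len(expanded)


def hybrid_rank(query, docs, w_lexical=0.6, w_graph=0.4):
    """Return document indices ordered by the fused lexical + graph score."""
    lexical = bm25_scores(query, docs)
    top = max(lexical) or 1.0  # avoid dividing by zero when nothing matches
    combined = [
        w_lexical * lex / top + w_graph * graph_score(query, d)
        for lex, d in zip(lexical, docs)
    ]
    return sorted(range(len(docs)), key=lambda i: -combined[i])


jobs = [
    ["python", "deep-learning", "engineer"],
    ["sql", "analyst"],
    ["pytorch", "researcher"],
]
ranking = hybrid_rank(["pytorch"], jobs)
```

In this toy data the first posting never mentions "pytorch", so BM25 alone scores it zero; the graph expansion is what lifts it above the unrelated SQL posting, which is the kind of skill generalization such a stack is meant to probe.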
The authors release JobSearch-XS benchmark.
Paper explicitly states release of the JobSearch-XS benchmark. No dataset size, annotation protocol, or access URL provided in the excerpt.
high positive JobMatchAI An Intelligent Job Matching Platform Using Knowle... availability of JobSearch-XS benchmark (artifact release)
JobMatchAI integrates Transformer embeddings, skill knowledge graphs, and interpretable reranking.
Statement in paper describing system architecture and components (implementation claim). No quantitative implementation details or component-level ablation results provided in the supplied excerpt.
high positive JobMatchAI An Intelligent Job Matching Platform Using Knowle... system design / component integration (presence of Transformer embeddings, knowl...
TDAD (Test-Driven Agentic Development) combines abstract-syntax-tree (AST) based code-test graph construction with weighted impact analysis to surface the tests most likely affected by a proposed change.
Description of the tool/methodology and its implementation (TDAD is presented as an open-source tool in the paper).
high positive TDAD: Test-Driven Agentic Development - Reducing Code Regres... identification/surfacing of tests likely impacted by code changes (test prioriti...
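The general idea of an AST-derived code-test graph with weighted impact analysis can be sketched with Python's standard `ast` module. This is an illustration under simplifying assumptions, not TDAD's actual implementation: it only follows direct calls by bare name within one source string, and the distance-decay weighting (`decay`) is a hypothetical choice.

```python
import ast
from collections import defaultdict

SOURCE = '''
def parse(x):
    return int(x)

def total(xs):
    return sum(parse(x) for x in xs)

def test_parse():
    assert parse("3") == 3

def test_total():
    assert total(["1", "2"]) == 3

def test_unrelated():
    assert 1 + 1 == 2
'''


def call_graph(source):
    """Map each top-level function name to the names it calls directly."""
    graph = defaultdict(set)
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef):
            for sub in ast.walk(node):
                if isinstance(sub, ast.Call) and isinstance(sub.func, ast.Name):
                    graph[node.name].add(sub.func.id)
    return graph


def impacted_tests(source, changed, decay=0.5):
    """Rank tests by a weight that halves with each extra call hop to `changed`."""
    graph = call_graph(source)
    tests = [name for name in graph if name.startswith("test_")]
    scores = {}
    for test in tests:
        # Breadth-first search from the test through the call graph.
        frontier, seen, depth = {test}, {test}, 0
        while frontier:
            if changed in frontier:
                scores[test] = decay ** max(depth - 1, 0)  # direct call -> 1.0
                break
            frontier = {c for f in frontier for c in graph.get(f, ())} - seen
            seen |= frontier
            depth += 1
    return sorted(scores.items(), key=lambda item: -item[1])


ranked = impacted_tests(SOURCE, "parse")
```

Here a change to `parse` surfaces `test_parse` first (it calls `parse` directly) and `test_total` second (it reaches `parse` through `total`), while `test_unrelated` is not listed at all.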
PIER is an offline reinforcement learning framework that learns fuel‑efficient, safety‑aware routing policies from physics‑calibrated environments grounded in historical vessel tracking data and ocean reanalysis products, requiring no online simulator.
Methodological description of PIER in the paper: offline RL trained on environments constructed from AIS and reanalysis data; no online simulator used for policy learning (implementation details provided).
high positive Physics-informed offline reinforcement learning eliminates c... requirement for online simulator (method characteristic)
Bootstrap 95% confidence interval for PIER mean CO2 savings relative to great-circle routing is [2.9%, 15.7%].
Bootstrap analysis applied to the 2023 AIS validation results (840 episodes per method) producing the stated 95% CI for mean percent savings.
high positive Physics-informed offline reinforcement learning eliminates c... 95% bootstrap confidence interval for mean percent CO2 savings
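An interval like this is commonly obtained with a percentile bootstrap: resample the per-episode savings with replacement, recompute the mean each time, and read off the 2.5th and 97.5th percentiles. The sketch below assumes that procedure; the excerpt does not say which bootstrap variant the paper used, and the `savings` values are hypothetical, not the paper's data.

```python
import random
import statistics


def bootstrap_ci(values, n_resamples=2000, level=0.95, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lo_idx = int((1 - level) / 2 * n_resamples)
    hi_idx = int((1 + level) / 2 * n_resamples) - 1
    return means[lo_idx], means[hi_idx]


# Hypothetical per-episode percent CO2 savings versus a baseline route.
savings = [12.0, 3.5, 9.1, -2.0, 15.4, 7.7, 10.2, 4.8, 8.9, 6.3]
low, high = bootstrap_ci(savings)
print(f"mean = {statistics.fmean(savings):.2f}%, 95% CI = [{low:.2f}%, {high:.2f}%]")
```

A lower bound above zero, as in the reported [2.9%, 15.7%], indicates that the mean saving is unlikely to be a resampling artifact, although the width of the interval shows how much per-voyage outcomes vary.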
PIER reduces per‑voyage fuel consumption variance by a factor of 3.5 (p < 0.001).
Statistical comparison of per-voyage fuel variance between PIER and baseline routing on 840 episodes per method from 2023 AIS data; significance reported with p < 0.001.
high positive Physics-informed offline reinforcement learning eliminates c... variance of per-voyage fuel consumption
On the LoCoMo benchmark, the architecture achieves 74.8% overall accuracy.
Benchmark evaluation reported in the paper using the LoCoMo benchmark with a reported overall accuracy of 74.8%.
high positive Governed Memory: A Production Architecture for Multi-Agent W... overall accuracy on the LoCoMo benchmark (percentage)
Adversarial governance compliance was 100%.
Adversarial compliance testing reported in the paper (linked to the adversarial query experiments); reported compliance = 100%.
high positive Governed Memory: A Production Architecture for Multi-Agent W... governance compliance under adversarial queries (percentage)
There was zero cross-entity leakage across 500 adversarial queries.
Adversarial testing reported in the paper: 500 adversarial queries used to test cross-entity leakage; result = zero leakage.
high positive Governed Memory: A Production Architecture for Multi-Agent W... cross-entity information leakage (count/occurrence across 500 queries)
Progressive context delivery yielded a 50% token reduction.
Reported experimental result in the controlled experiments indicating token usage reduction from progressive delivery = 50%.
high positive Governed Memory: A Production Architecture for Multi-Agent W... token usage reduction (percentage)
Governance routing precision was 92% in the experiments.
Reported experimental metric from the controlled experiments (N=250, five content types) showing governance routing precision = 92%.
high positive Governed Memory: A Production Architecture for Multi-Agent W... governance routing precision (percentage)
The system achieved 99.6% fact recall (with complementary dual-modality coverage) in the controlled experiments.
Reported experimental result from the controlled experiments (N=250, five content types) as stated in the paper.
high positive Governed Memory: A Production Architecture for Multi-Agent W... fact recall (percentage recall of facts)
Immediate practical steps include improved documentation, stakeholder audits, and multi‑metric evaluation; medium‑term steps include standards for participatory evaluation and tooling for transparency and monitoring; long‑term steps include institutional governance, interoperable safety APIs, and public‑interest evaluation infrastructure.
Prescriptive roadmap in the paper based on conceptual analysis and prior literature; these are recommended policy/program milestones rather than empirically validated interventions.
high positive LLM Alignment should go beyond Harmlessness–Helpfulness and ... implementation status of the recommended immediate, medium‑term, and long‑term a...
Transparency (detailed documentation of data, objectives, evaluation processes, and deployment constraints; audit and contest mechanisms) is a necessary mechanism for accountable alignment.
Normative and practical argumentation supported by prior work on model cards, documentation standards, and auditing; no new audits are presented in the paper.
high positive LLM Alignment should go beyond Harmlessness–Helpfulness and ... availability and granularity of documentation and auditability of model developm...
Pluralistic evaluation—using multiple, diverse evaluation criteria and stakeholder‑informed metrics rather than single aggregated alignment scores—will better capture the values and harms at stake.
Argumentative rationale and literature synthesis advocating multi‑metric evaluation approaches; examples from prior evaluation critiques are referenced rather than new empirical comparison.
high positive LLM Alignment should go beyond Harmlessness–Helpfulness and ... evaluation coverage of diverse values, harms, and stakeholder perspectives
The Flourishing–Justice–Autonomy (FJA) framework should guide alignment efforts, emphasizing (1) Flourishing (human well‑being and meaningful opportunities), (2) Justice (distributional fairness and protection of vulnerable groups), and (3) Autonomy (informed choice and user control).
Prescriptive proposal grounded in conceptual analysis and synthesis of ethical and technical literature; the paper defines and motivates the three principles as its core normative contribution.
high positive LLM Alignment should go beyond Harmlessness–Helpfulness and ... alignment criteria operationalized as Flourishing, Justice, and Autonomy metrics...
The positive spillover effects of CAFTA on third‑country agricultural imports are concentrated in medium and large firms.
Heterogeneity analysis using firm‑size subgroup DID estimates derived from the China Industrial Enterprise Database (2000–2014) showing stronger effects for medium and large enterprises.
high positive How regional trade policy uncertainty affects agricultural i... firm‑level import increases from third countries, by firm size (medium/large vs ...
CAFTA induced spillovers that significantly increased China's agricultural imports from non‑ASEAN (third) countries.
Difference‑in‑differences (DID) estimation exploiting CAFTA as an exogenous shock; import outcomes drawn from China Customs Database 2000–2014; robustness checks reported (mediator tests and subgroup analyses).
high positive How regional trade policy uncertainty affects agricultural i... China's agricultural imports from non‑ASEAN countries (import volumes/values)
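The core of a difference-in-differences estimate is a comparison of before/after changes between an exposed group and a comparison group. The sketch below shows only the basic 2x2 version on hypothetical numbers; the paper's actual specification (fixed effects, controls, how exposure to CAFTA is defined) is not described in the excerpt.

```python
from statistics import fmean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """2x2 difference-in-differences: change in treated minus change in control."""
    treated_change = fmean(treated_post) - fmean(treated_pre)
    control_change = fmean(control_post) - fmean(control_pre)
    return treated_change - control_change


# Hypothetical import values (arbitrary units), not taken from the paper.
effect = did_estimate(
    treated_pre=[10, 12],
    treated_post=[15, 17],
    control_pre=[8, 10],
    control_post=[9, 11],
)
print(f"DID estimate: {effect}")
```

The estimate is interpretable as a causal effect only under the parallel-trends assumption, namely that the two groups would have changed by the same amount had the policy not occurred.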
The report issues seven policy recommendations grouped into three goals: (1) improve understanding of the emerging threat, (2) strengthen defenses, and (3) ensure responsible development and deployment.
Policy synthesis based on threat analysis and governance review (report-authored recommendations; descriptive).
high positive Highly Autonomous Cyber-Capable Agents: Anticipating Capabil... adoption and implementation of the seven recommended policy actions
Total effect of trust on brand loyalty is approximately 0.800 (total β ≈ 0.800 = direct β 0.410 + indirect β ≈ 0.390), all reported as statistically significant (p < .001 for direct effects; p = .001 for indirect).
Path coefficients reported from SEM (n = 450) and arithmetic combination of direct and indirect standardized effects as reported in the paper.
high positive Trust in AI-Driven Marketing and its Impact on Brand Loyalty... Brand Loyalty (total effect of Trust)
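The arithmetic behind the total effect can be checked directly. The sketch uses only the standardized coefficients quoted in the entry above; the variable names and the "share mediated" figure are added here for illustration and are not reported in the paper.

```python
# Standardized effects as quoted in the entry above.
direct = 0.410
indirect = 0.390

# Total effect is the sum of the direct and indirect paths.
total = direct + indirect

# Share of the total effect that runs through the indirect path (derived here).
share_mediated = indirect / total

print(f"total = {total:.3f}, share mediated = {share_mediated:.1%}")
```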
Adoption intention for AI marketing strongly predicts brand loyalty (Adoption Intention → Brand Loyalty: standardized β = 0.717, p < .001).
Cross-sectional survey (n = 450 Gen Z); SEM (SPSS AMOS); reported standardized path coefficient β = 0.717 with p < .001.
Trust in AI-driven marketing directly increases Generation Z consumers' brand loyalty (Trust → Brand Loyalty: standardized β = 0.410, p < .001).
Cross-sectional survey (n = 450 Gen Z); SEM (SPSS AMOS); reported standardized path coefficient β = 0.410 with p < .001.
Trust in AI-driven marketing has a strong positive effect on Generation Z consumers' intention to adopt AI marketing (Trust → Adoption Intention: standardized β = 0.718, p < .001).
Cross-sectional survey (n = 450 Generation Z respondents); analysis via Structural Equation Modeling (SPSS AMOS); reported standardized path coefficient β = 0.718 with p < .001.
The study's strengths include multimethod triangulation, a very large behavioral dataset (150 million interactions), and controlled simulation experiments informed by empirical observation.
Methods reported: mixed‑methods sequential design with (1) 6‑month lab ethnography (n = 23), (2) computational analysis of 150 million customer interactions, and (3) empirically grounded agent‑based simulation experiments.
high positive The Algorithmic Canvas: On the Autopoietic Redefinition of S... study validity/robustness (methodological strength)
The Algorithmic Canvas is an operational medium where segmentation, targeting, and positioning parameters co‑evolve through iterative human–AI collaboration.
Design and implementation described in the study; observation of Canvas‑mediated interactions during a 6‑month lab ethnography inside a Fortune 500 company (n = 23).
high positive The Algorithmic Canvas: On the Autopoietic Redefinition of S... co‑evolution of STP parameters (qualitative and operational behavior observed vi...
The autopoietic STP approach combined with the Algorithmic Canvas is 44% more resilient to market shocks than traditional, process‑based STP (p < 0.01).
Agent‑based simulations and comparative analyses informed by empirical calibration; supported by large‑scale behavioral data (150 million customer interactions) and simulation experiments. Statistical test reported with p < 0.01. Exact number of simulation runs and full test details not specified in the summary.
high positive The Algorithmic Canvas: On the Autopoietic Redefinition of S... resilience to market shocks (comparative resilience between autopoietic vs. trad...