The Commonplace

Evidence (8486 claims)

Adoption: 5821 claims
Productivity: 5033 claims
Governance: 4561 claims
Human-AI Collaboration: 3600 claims
Labor Markets: 2749 claims
Innovation: 2687 claims
Org Design: 2648 claims
Skills & Training: 2107 claims
Inequality: 1429 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 440 117 68 507 1148
Governance & Regulation 458 216 125 67 883
Research Productivity 270 101 34 303 713
Organizational Efficiency 441 105 76 43 669
Technology Adoption Rate 346 130 76 45 602
Firm Productivity 322 38 72 13 450
Output Quality 272 75 27 30 404
AI Safety & Ethics 122 188 46 27 385
Market Structure 119 134 86 14 358
Decision Quality 182 79 41 20 326
Fiscal & Macroeconomic 95 58 34 22 216
Employment Level 78 37 80 9 206
Skill Acquisition 102 37 41 9 189
Innovation Output 124 12 26 13 176
Firm Revenue 99 37 24 160
Consumer Welfare 77 38 37 7 159
Task Allocation 93 17 36 8 156
Inequality Measures 29 81 33 6 149
Regulatory Compliance 54 61 13 3 131
Task Completion Time 92 8 4 3 107
Error Rate 45 53 6 104
Worker Satisfaction 48 36 12 8 104
Training Effectiveness 59 13 12 16 101
Wages & Compensation 56 16 20 5 97
Team Performance 50 13 15 8 87
Automation Exposure 28 29 12 7 79
Job Displacement 7 45 13 65
Hiring & Recruitment 40 4 7 3 54
Developer Productivity 38 4 4 3 49
Social Protection 22 12 7 2 43
Creative Output 17 8 6 1 32
Skill Obsolescence 3 25 2 30
Labor Share of Income 12 7 10 29
Worker Turnover 10 12 3 25
Data science plays a critical role in transforming complex data into actionable insights across numerous domains.
Background statement in the paper (no empirical test or dataset provided to support this claim).
high positive AgentDS Technical Report: Benchmarking the Future of Human-A... transforming complex data into actionable insights
LLM-generated peer reviews assign scores that, on average, are a full point higher than human reviews.
Analysis of scores in the conference peer review dataset comparing LLM-generated vs human reviews; the excerpt states an average increase of one full point but does not include sample size or scale range.
high positive How LLMs Distort Our Written Language assigned review scores
About 21% of scientific peer reviews at a recent top AI conference were AI-generated (LLM-generated) in the wild.
Analysis of peer reviews from a recent top AI conference reported in the paper; the excerpt reports the 21% figure but does not give total number of reviews in the excerpt.
high positive How LLMs Distort Our Written Language share/proportion of peer reviews that were AI-generated
Even when LLMs are prompted with expert feedback and asked to only make grammar edits, they still change the text in a way that significantly alters its semantic meaning.
Experiment in which LLMs were given expert feedback and explicit instructions to perform only grammar edits; comparisons show significant semantic alteration despite constrained instructions; sample size not provided.
high positive How LLMs Distort Our Written Language semantic alteration of text despite constrained grammar-only prompt
Using a dataset of human-written essays (collected in 2021 before widespread LLM release), asking an LLM to revise essays based on human-written feedback induces large changes in the resulting content and meaning.
Controlled experiments applying LLM revision to a pre-LLM essay dataset and comparing pre- and post-revision content/semantics; dataset described as collected in 2021 but sample size not stated in the excerpt.
high positive How LLMs Distort Our Written Language magnitude of content and semantic changes after LLM revision
In a human user study, extensive LLM use led to a nearly 70% increase in essays that remained neutral in answering the topic question.
Human user study reported in the paper; the excerpt gives the quantified result (nearly 70% increase) but does not report sample size here.
high positive How LLMs Distort Our Written Language proportion of essays judged as neutral in answering the topic question
LLMs consistently alter the intended meaning of human writing.
Experiments in which human-written essays were revised by LLMs (including prompts asking only for grammar edits) and comparison of pre- and post-LLM text semantics; exact sample sizes not stated in the excerpt.
high positive How LLMs Distort Our Written Language degree of semantic change / alteration of intended meaning
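The comparisons described in this claim hinge on quantifying how far an LLM revision drifts from the original text's meaning. A minimal sketch of one standard way to do this, using sentence-embedding cosine similarity, is below; the embedding model and the drift score are illustrative assumptions, not the paper's own metric.
```python
# Quantify semantic drift between an original essay and its LLM revision via
# sentence-embedding cosine similarity. Model name and score are assumptions,
# not the paper's metric.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

def semantic_drift(original: str, revised: str) -> float:
    """Return 1 - cosine similarity: higher means a larger change in meaning."""
    emb = model.encode([original, revised], convert_to_tensor=True)
    return 1.0 - util.cos_sim(emb[0], emb[1]).item()

# Example: compare a sentence before and after a "grammar-only" edit.
print(semantic_drift("The policy clearly helped students.",
                     "The policy may have helped students."))
```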
LLMs alter the voice and tone of human writing.
Reported results from a human user study and subsequent experiments comparing original human-written text to LLM-assisted/LLM-revised text; sample sizes not provided in the excerpt.
high positive How LLMs Distort Our Written Language change in voice and tone of writing
Large language models (LLMs) are used by over a billion people globally, most often to assist with writing.
Statement in paper (likely based on external usage statistics or surveys cited by authors); no sample size reported in the provided text.
high positive How LLMs Distort Our Written Language LLM adoption and primary use case (writing assistance)
The code and data used in the study are publicly available at the referenced repository.
Paper statement that code and data are publicly available at a repository (link provided in paper).
high positive Unmasking Algorithmic Bias in Predictive Policing: A GAN-Bas... availability of replication materials (code and data)
A sensitivity analysis over patrol radius, officer count, and citizen reporting probability reveals outcomes are most sensitive to officer deployment levels.
Reported sensitivity analysis across patrol radius, officer count, and reporting probability showing officer count as the most influential parameter in the simulation outcomes.
high positive Unmasking Algorithmic Bias in Predictive Policing: A GAN-Bas... sensitivity of bias/detection outcomes to simulation parameters (patrol radius, ...
Persistent Gini coefficients of 0.43 to 0.62 across all conditions indicate concentrated detection inequality.
Reported range of Gini coefficients from simulation experiments across conditions.
high positive Unmasking Algorithmic Bias in Predictive Policing: A GAN-Bas... Gini Coefficient (detection distribution inequality)
Experiments reveal extreme and year-variant bias in Baltimore's detected mode, with mean annual DIR up to 15,714 in 2019.
Reported experimental result from simulations on Baltimore data giving mean annual DIR up to 15,714 for 2019.
high positive Unmasking Algorithmic Bias in Predictive Policing: A GAN-Bas... Disparate Impact Ratio (DIR)
We compute four monthly bias metrics across 264 city-year-mode observations: the Disparate Impact Ratio (DIR), Demographic Parity Gap, Gini Coefficient, and a composite Bias Amplification Score.
Statement of metrics computed and the number of observations (264 city-year-mode observations) reported in the paper.
high positive Unmasking Algorithmic Bias in Predictive Policing: A GAN-Bas... monthly bias metrics (DIR, Demographic Parity Gap, Gini, Bias Amplification Scor...
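Three of the four metrics named in this claim have standard definitions, which the sketch below follows. The paper's exact group definitions, monthly aggregation, and the composite Bias Amplification Score are not given in the excerpt, so the code is illustrative only.
```python
# Standard definitions of three of the named bias metrics; group choices and the
# composite Bias Amplification Score are not specified in the excerpt.
import numpy as np

def disparate_impact_ratio(rate_protected: float, rate_reference: float) -> float:
    """DIR: detection rate of the protected group over the reference group."""
    return rate_protected / rate_reference

def demographic_parity_gap(rate_protected: float, rate_reference: float) -> float:
    """Absolute difference in detection rates between groups."""
    return abs(rate_protected - rate_reference)

def gini(x: np.ndarray) -> float:
    """Gini coefficient of a non-negative distribution (e.g., detections per tract)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    cum = np.cumsum(x)
    return (n + 1 - 2 * np.sum(cum) / cum[-1]) / n

# Example with made-up per-group detection rates and per-tract detection counts.
print(disparate_impact_ratio(0.30, 0.02),
      demographic_parity_gap(0.30, 0.02),
      gini(np.array([0, 1, 1, 3, 10])))
```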
The study uses 145,000+ Part 1 crime records from Baltimore (2017–2019) and 233,000+ records from Chicago (2022), augmented with US Census ACS demographic data.
Reported dataset sizes and data sources in the paper (crime records from Baltimore and Chicago; ACS demographic augmentation).
high positive Unmasking Algorithmic Bias in Predictive Policing: A GAN-Bas... data sample size / dataset composition
We present a reproducible simulation framework that couples a Generative Adversarial Network (GAN) with a Noisy OR patrol detection model to measure how racial bias propagates through the full enforcement pipeline from crime occurrence to police contact.
Description of methods in paper: coupling a GAN (CTGAN) for synthetic crime generation with a Noisy OR detection/patrol model; method-level claim rather than a numerical result.
high positive Unmasking Algorithmic Bias in Predictive Policing: A GAN-Bas... bias propagation through enforcement pipeline (simulation framework)
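The Noisy-OR component mentioned in this claim has a standard combination rule: an event is detected unless every covering patrol independently misses it. A minimal sketch follows; the per-patrol probabilities are illustrative, not the paper's calibrated values.
```python
# Noisy-OR detection rule: a crime is detected unless all covering patrols miss it.
# Per-patrol probabilities below are illustrative, not the paper's calibration.
import numpy as np

def noisy_or_detection(p_detect_per_patrol) -> float:
    """P(detected) = 1 - prod(1 - p_i) over the patrols that could observe the event."""
    p = np.asarray(p_detect_per_patrol, dtype=float)
    return 1.0 - np.prod(1.0 - p)

# Example: three overlapping patrols with independent 10%, 20%, and 5% detection chances.
print(noisy_or_detection([0.10, 0.20, 0.05]))  # ≈ 0.316
```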
Empirical simulations of five game scenarios (ranging from repeated prisoner's dilemma to stylized repeated marketing promotion games) validate the theoretical predictions: AI agents naturally exhibit the proposed reasoning patterns and attain stable equilibrium behaviors intrinsically.
Simulation experiments reported in the paper across five distinct game scenarios; these simulations are presented as empirical validation of the theoretical results.
high positive Reasonably reasoning AI agents can avoid game-theoretic fail... frequency/occurrence of stable equilibrium behaviors (Nash-like play) in simulat...
Relaxing the common-knowledge payoff assumption—allowing stage payoffs to be unknown and each agent to observe only its own privately realized stochastic payoffs—still yields the same on-path Nash convergence guarantee.
Theoretical extension/proof in the paper showing convergence results hold under private, stochastic stage payoffs (no common-knowledge of payoffs).
high positive Reasonably reasoning AI agents can avoid game-theoretic fail... on-path Nash convergence under private, stochastic payoffs
We prove that 'reasonably reasoning' agents—agents capable of forming beliefs about others' strategies from previous observation and learning to best respond to these beliefs—eventually behave along almost every realized play path in a way that is weakly close to a Nash equilibrium of the continuation game.
Formal theoretical proof provided in the paper (mathematical analysis of agent belief-formation and best-response learning leading to on-path closeness to Nash equilibria).
high positive Reasonably reasoning AI agents can avoid game-theoretic fail... on-path proximity (weak closeness) to Nash equilibrium of the continuation game
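One concrete instance of a "reasonably reasoning" agent in this sense is a fictitious-play-style learner: it tracks the empirical frequency of the opponent's past actions and best responds to that belief. The sketch below uses an illustrative prisoner's-dilemma payoff matrix and is not the paper's formal construction.
```python
# Belief-forming, best-responding agent in the spirit of fictitious play.
# The payoff matrix is an illustrative prisoner's dilemma, not the paper's model.
import numpy as np

# Row player's stage payoffs: rows = own action (0=cooperate, 1=defect),
# columns = opponent's action.
PAYOFF = np.array([[3.0, 0.0],
                   [5.0, 1.0]])

def best_response(opponent_action_counts: np.ndarray) -> int:
    """Best respond to the empirical distribution of the opponent's past actions."""
    belief = opponent_action_counts / opponent_action_counts.sum()
    expected = PAYOFF @ belief  # expected payoff of each own action under the belief
    return int(np.argmax(expected))

# Example: after observing the opponent cooperate 7 times and defect 3 times.
print(best_response(np.array([7.0, 3.0])))  # -> 1 (defect), the stage-game best reply
```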
Off-the-shelf reasoning AI agents can achieve Nash-like play zero-shot, without explicit post-training.
Stated claim in the paper supported by a combination of theoretical results (formal proofs about convergence properties of 'reasonably reasoning' agents) and empirical simulations across five game scenarios (including repeated prisoner's dilemma and stylized repeated marketing promotion games).
high positive Reasonably reasoning AI agents can avoid game-theoretic fail... attainment of Nash-like play / strategic equilibrium (zero-shot)
End-to-end verified pipelines can produce provably correct code from informal specifications.
The paper surveys early research demonstrating pipelines that go from informal specifications to formally verified code; the provided text does not include experimental sample sizes or benchmarks.
high positive Intent Formalization: A Grand Challenge for Reliable Coding ... provable correctness of generated code
AI-generated postconditions catch real-world bugs missed by prior methods.
Surveyed early research asserted by the paper indicating empirical instances where AI-generated postconditions found bugs that other methods missed; no numeric details provided in the excerpt.
high positive Intent Formalization: A Grand Challenge for Reliable Coding ... bugs detected / error detection rate
Interactive test-driven formalization improves program correctness.
Paper surveys early research that reportedly demonstrates this effect (described as 'interactive test-driven formalization that improves program correctness'); the excerpt does not include specific study details or sample sizes.
The central bottleneck is validating specifications: since there is no oracle for specification correctness other than the user, we need semi-automated metrics that can assess specification quality with or without code, through lightweight user interaction and proxy artifacts such as tests.
Analytical claim and research agenda item in the paper; motivates need for new metrics and interaction designs. No empirical validation or sample size reported in the excerpt.
high positive Intent Formalization: A Grand Challenge for Reliable Coding ... ability to validate specification correctness / specification quality
Intent formalization offers a tradeoff spectrum suitable to the reliability needs of different contexts: from lightweight tests that disambiguate likely misinterpretations, through full functional specifications for formal verification, to domain-specific languages from which correct code is synthesized automatically.
Conceptual framework proposed in the paper describing a spectrum of specification formality; presented as an argument rather than an empirical finding, with no sample sizes provided in the excerpt.
high positive Intent Formalization: A Grand Challenge for Reliable Coding ... suitability of specification approaches for reliability requirements
Intent formalization — translating informal user intent into checkable formal specifications — is the key challenge that will determine whether AI makes software more reliable or merely more abundant.
Normative argument presented by the authors as the central thesis of the paper; no empirical study or sample size cited in the provided text.
high positive Intent Formalization: A Grand Challenge for Reliable Coding ... software reliability (correctness relative to user intent)
Agentic AI systems can now generate code with remarkable fluency.
Authoritative assertion in the paper based on contemporary observations of large code-generating models; no empirical sample size or benchmark numbers reported in the text provided.
high positive Intent Formalization: A Grand Challenge for Reliable Coding ... code generation fluency / ability to produce code
This paper employs large language models to conduct semantic analysis on the text of annual reports from Chinese A-share listed companies from 2006 to 2024.
Methodological statement in the abstract describing use of LLM-based semantic analysis on annual report texts spanning 2006–2024.
high positive The Spillover Effects of Peer AI Rinsing on Corporate Green ... methodological approach (use of LLMs for semantic analysis)
The paper recommends that the government design targeted support tools to 'enhance market returns and alleviate financing constraints', adopt a differentiated regulatory strategy, and establish a disclosure mechanism combining 'professional identification and reputational sanctions' to curb peer AI washing behaviour.
Policy prescriptions derived from empirical findings and simulation results reported in the paper; presented as recommendations in the abstract.
high positive The Spillover Effects of Peer AI Rinsing on Corporate Green ... effectiveness of policy interventions in curbing AI washing and supporting green...
Simulation results indicate that a combination of policy tools can effectively improve market equilibrium (mitigating the negative effects of AI washing).
Simulation exercises reported in the paper (model specification not provided in abstract) testing policy tool combinations and their effects on market equilibrium.
high positive The Spillover Effects of Peer AI Rinsing on Corporate Green ... market equilibrium (improvement in market outcomes related to AI washing and gre...
The study draws policy implications for promoting high-quality development from its finding that innovation and the digital economy now play larger roles in growth.
Authors' discussion/conclusion drawing policy implications from empirical findings (declining capital elasticity, rising TFP and digital economy contribution).
high positive Analysis of China's Economic Growth Drivers: An Empirical St... policy implication for promoting high-quality development
Overall, China's growth model shifted over 2010–2022 from being investment-driven to being innovation-driven.
Synthesis of results: declining capital elasticity, rising TFP contribution, substantial share of digital economy in TFP, and regional patterns reported by the study.
high positive Analysis of China's Economic Growth Drivers: An Empirical St... structural shift in the growth model (investment-driven → innovation-driven)
The study's method is novel because it uses both migrant worker monitoring data and digital-economy proxy indicators, giving a more accurate picture of how labor quality and technological progress affect each other.
Author-reported methodological description: extended Cobb–Douglas approach combined with quality-adjusted labor measures derived from migrant worker monitoring data and proxy indicators for the digital economy.
high positive Analysis of China's Economic Growth Drivers: An Empirical St... measurement accuracy of labor quality and technology interaction (methodological...
Regional analysis shows coastal regions have been driven by innovation, with an estimated (innovation) coefficient of approximately 0.31.
Regional decomposition/estimation reported in the paper's analysis of coastal vs inland regions using the extended production function and digital/labour-quality measures.
high positive Analysis of China's Economic Growth Drivers: An Empirical St... innovation-related elasticity/coefficient in coastal regions (≈0.31)
The digital economy accounted for 40% of the TFP contribution (i.e., 40% of the growth contribution attributed to TFP is traceable to the digital economy).
Attribution within the growth decomposition from the extended production function, where digital economy indicators are included and their contribution to TFP is estimated.
high positive Analysis of China's Economic Growth Drivers: An Empirical St... share of TFP contribution attributable to the digital economy
The contribution rate of total factor productivity (TFP) rose from 18% to 26% between the earlier and later periods.
Decomposition of growth using the extended Cobb–Douglas production function for China over 2010–2022, reporting TFP contribution rates for the two periods.
high positive Analysis of China's Economic Growth Drivers: An Empirical St... TFP contribution rate to economic growth
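The TFP figures in this and the preceding claims come from a growth-accounting decomposition of an extended Cobb–Douglas production function. A sketch of the standard form is below; how labor quality and the digital-economy indicators enter the paper's specification is not stated in the excerpt, so the exact functional form is an assumption.
```latex
% Standard growth-accounting sketch behind these figures (the paper's exact
% treatment of labor quality q_t and the digital-economy terms is assumed).
\begin{align*}
  Y_t &= A_t\, K_t^{\alpha}\, (q_t L_t)^{\beta}
      && \text{extended Cobb--Douglas with quality-adjusted labor } q_t L_t \\
  g_Y &= g_A + \alpha\, g_K + \beta\, (g_q + g_L)
      && \text{growth decomposition in log-differences} \\
  \text{TFP contribution rate} &= g_A / g_Y
      && \text{reported as rising from } 18\% \text{ to } 26\% \\
  \text{digital share of TFP} &= g_{A,\mathrm{digital}} / g_A
      && \text{reported as } \approx 40\%
\end{align*}
```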
The initially selected candidates determine both the benchmark of success and the direction of improvement.
Theoretical result asserted by the authors based on analysis of the closed-loop system (paper's analytical finding).
high positive Actionable Recourse in Competitive Environments: A Dynamic G... influence of initially selected group on subsequent benchmark and improvement di...
Rejected individuals exert effort to improve actionable features along directions implied by the decision rule.
Model assumption and dynamic behavior encoded in the proposed framework (assumption/behavioral mechanism in the model).
high positive Actionable Recourse in Competitive Environments: A Dynamic G... effort or change in actionable features by rejected candidates
The paper proposes design principles for effective, accountable, and adaptive sandboxes to contribute to debates on experimentalism in AI governance.
Stated contribution of the paper (descriptive claim about content; abstract does not list the principles or empirical testing).
high positive Experimentalism beyond ex ante regulation: A law and economi... existence and articulation of design principles for RSs
Regulatory sandboxes (RSs) have emerged as a potential solution to AI regulatory challenges.
Descriptive observation and normative framing within the paper; contextual reference to the EU AI Act's treatment of sandboxes (no empirical sample reported in the abstract).
high positive Experimentalism beyond ex ante regulation: A law and economi... adoption/emergence of RSs as a governance mechanism for AI
External inputs that bypass internal filtering shorten recognition delays (i.e., speed up detection of regime shifts).
Model extensions/analysis showing that when some inputs are allowed to bypass internal exclusion mechanisms, the dynamics of anchor updating detect regime changes faster; result comes from theoretical model manipulations, not empirical testing.
high positive Cohesion as Concentration: Exclusion-Driven Fragility in Fin... time to recognize regime shift (recognition delay)
In a preregistered mediation model, perceived accountability mediated the AI-over-questionnaire effect on goal progress (indirect effect = 0.15, 95% CI [0.04, 0.31]).
Mediation analysis preregistered and reported in the paper using data from the RCT (N = 517); indirect effect estimate 0.15 with 95% confidence interval [0.04, 0.31].
high positive AI-Assisted Goal Setting Improves Goal Progress Through Soci... goal progress (mediated by perceived social accountability)
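The indirect effect reported here is the product-of-coefficients quantity from a standard mediation model; a sketch of that decomposition follows, with the paper's exact estimator and covariate set unknown from the excerpt.
```latex
% Product-of-coefficients mediation sketch (the paper's exact estimator and
% covariates are not given in the excerpt).
\begin{align*}
  M_i &= i_M + a\,T_i + e_{M,i}
      && T:\ \text{AI vs.\ questionnaire},\quad M:\ \text{perceived accountability} \\
  Y_i &= i_Y + c'\,T_i + b\,M_i + e_{Y,i}
      && Y:\ \text{goal progress} \\
  ab  &= 0.15,\quad 95\%\ \text{CI } [0.04,\ 0.31]
      && \text{indirect effect; total effect } c = c' + ab
\end{align*}
```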
The AI chatbot produced significantly higher goal progress than the no-support control at two-week follow-up.
Between-groups comparison in the preregistered RCT (N = 517); reported effect size d = 0.33 and p = .016 for AI vs control on goal progress measured at two-week follow-up.
high positive AI-Assisted Goal Setting Improves Goal Progress Through Soci... goal progress (self-reported goal progress at two-week follow-up)
The authors provide a demo video, a hosted website, and an installable package demonstrating JobMatchAI.
Paper explicitly states availability of a demo video, a hosted website, and an installable package. No links, access dates, or artifact verification details are provided in the excerpt.
high positive JobMatchAI An Intelligent Job Matching Platform Using Knowle... availability of demonstration artifacts (video, hosted website, installable pack...
The authors provide a hybrid retrieval stack combining BM25, a skill knowledge graph, and semantic components to evaluate skill generalization.
Paper describes a hybrid retrieval stack composed of BM25, a knowledge graph, and semantic retrieval components intended for evaluation of skill generalization. No evaluation metrics or comparisons are included in the excerpt.
high positive JobMatchAI An Intelligent Job Matching Platform Using Knowle... retrieval stack composition (BM25 + knowledge graph + semantic components) inten...
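A common way to realize such a hybrid stack is a weighted fusion of lexical, graph-based, and semantic relevance signals. The sketch below shows that pattern; the weights, helper inputs, and library choice are assumptions for illustration, not JobMatchAI's implementation.
```python
# Weighted fusion of BM25, knowledge-graph overlap, and embedding similarity.
# Weights and inputs are illustrative, not JobMatchAI's actual configuration.
from rank_bm25 import BM25Okapi

def hybrid_score(bm25: BM25Okapi, query_tokens, doc_index: int,
                 kg_overlap: float, semantic_sim: float,
                 w_bm25: float = 0.5, w_kg: float = 0.2, w_sem: float = 0.3) -> float:
    """Combine lexical (BM25), skill-graph, and semantic signals into one ranking score."""
    bm25_score = bm25.get_scores(query_tokens)[doc_index]
    return w_bm25 * bm25_score + w_kg * kg_overlap + w_sem * semantic_sim

# Toy corpus of two tokenized job postings.
corpus = [["python", "machine", "learning", "engineer"], ["forklift", "operator"]]
bm25 = BM25Okapi(corpus)
# kg_overlap and semantic_sim would come from the skill graph and an embedding model.
print(hybrid_score(bm25, ["python", "engineer"], 0, kg_overlap=0.8, semantic_sim=0.7))
```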
The authors release JobSearch-XS benchmark.
Paper explicitly states release of the JobSearch-XS benchmark. No dataset size, annotation protocol, or access URL provided in the excerpt.
high positive JobMatchAI An Intelligent Job Matching Platform Using Knowle... availability of JobSearch-XS benchmark (artifact release)
JobMatchAI integrates Transformer embeddings, skill knowledge graphs, and interpretable reranking.
Statement in paper describing system architecture and components (implementation claim). No quantitative implementation details or component-level ablation results provided in the supplied excerpt.
high positive JobMatchAI An Intelligent Job Matching Platform Using Knowle... system design / component integration (presence of Transformer embeddings, knowl...
TDAD (Test-Driven Agentic Development) combines abstract-syntax-tree (AST) based code-test graph construction with weighted impact analysis to surface the tests most likely affected by a proposed change.
Description of the tool/methodology and its implementation (TDAD is presented as an open-source tool in the paper).
high positive TDAD: Test-Driven Agentic Development - Reducing Code Regres... identification/surfacing of tests likely impacted by code changes (test prioriti...
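One building block of an AST-based code-test graph is extracting which functions each test calls, so that a proposed change to a function can be mapped to the tests that exercise it. The sketch below shows that step with Python's ast module; TDAD's actual graph construction and impact weighting are not described in the excerpt.
```python
# Extract the function names a test module calls, one ingredient of a code-test
# graph. TDAD's graph construction and impact weights are not in the excerpt.
import ast

def called_functions(test_source: str) -> set[str]:
    """Return the set of simple function names called anywhere in the test source."""
    calls = set()
    for node in ast.walk(ast.parse(test_source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            calls.add(node.func.id)
    return calls

TEST_FILE = """
def test_parse_invoice():
    data = parse_invoice("fixtures/basic.json")
    assert total_amount(data) == 42
"""

# Tests that call a changed function would be surfaced first for re-running.
print(called_functions(TEST_FILE))  # {'parse_invoice', 'total_amount'}
```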
PIER is an offline reinforcement learning framework that learns fuel‑efficient, safety‑aware routing policies from physics‑calibrated environments grounded in historical vessel tracking data and ocean reanalysis products, requiring no online simulator.
Methodological description of PIER in the paper: offline RL trained on environments constructed from AIS and reanalysis data; no online simulator used for policy learning (implementation details provided).
high positive Physics-informed offline reinforcement learning eliminates c... requirement for online simulator (method characteristic)
Bootstrap 95% confidence interval for PIER mean CO2 savings relative to great-circle routing is [2.9%, 15.7%].
Bootstrap analysis applied to the 2023 AIS validation results (840 episodes per method) producing the stated 95% CI for mean percent savings.
high positive Physics-informed offline reinforcement learning eliminates c... 95% bootstrap confidence interval for mean percent CO2 savings
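The interval quoted here is the kind produced by a percentile bootstrap over per-episode savings. A minimal sketch of that procedure follows; the synthetic savings values and resample count are illustrative, not the paper's 840-episode validation data.
```python
# Percentile bootstrap CI for mean percent savings. Savings values and resample
# count are illustrative, not the paper's 840-episode validation data.
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_ci_mean(x: np.ndarray, n_resamples: int = 10_000, alpha: float = 0.05):
    """Percentile bootstrap confidence interval for the mean of x."""
    means = np.array([rng.choice(x, size=x.size, replace=True).mean()
                      for _ in range(n_resamples)])
    return np.quantile(means, [alpha / 2, 1 - alpha / 2])

savings_pct = rng.normal(loc=9.0, scale=12.0, size=840)  # illustrative per-episode savings
print(bootstrap_ci_mean(savings_pct))
```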