The Commonplace

Generative AI can trigger a sudden collapse in peer-review effort; when AI crosses a critical capability threshold, journals should loosen selectivity and invest in detection rather than tighten standards. Editors who continue to tighten risk amplifying wasteful author polishing without improving paper sorting.

Buying the Right to Monitor: Editorial Design in AI-Assisted Peer Review
Zaruhi Hakobyan · April 26, 2026
arXiv · theoretical · evidence: n/a · relevance: 8/10
A three-sided equilibrium model shows that once generative-AI capability passes a critical threshold, reviewer effort collapses discontinuously, creating a welfare trade-off that makes editors optimally reverse policy from tightening acceptance standards to loosening them while investing in AI detection.

Generative AI acts as a disruptive technological shock to evaluative organizations. In academic peer review, it enters both sides of the market: authors use AI to polish submissions, and reviewers use it to generate plausible reports without exerting evaluative effort. We develop a three-sided equilibrium model to analyze this dual adoption and derive a counterintuitive managerial implication for journal policy. We show that when AI capability crosses a critical threshold, reviewer effort collapses discontinuously. This transition creates a welfare misalignment: authors benefit from a weakened “rat race,” while editors suffer from degraded signal informativeness. Characterizing the editor's optimal constrained response, we identify a strict policy reversal. Before the AI transition, editors should tighten acceptance standards to curb rent-dissipating author polishing. After the transition, conventional intuition fails: editors must loosen acceptance standards while investing in AI detection, because further tightening only amplifies dissipative polishing without improving sorting. We prove analytically that this sign reversal is a structural consequence of the reviewer effort collapse under log-concave quality distributions. Ultimately, addressing AI in evaluative systems requires treating monitoring and loosened selectivity as complementary design instruments.

Summary

Main Finding

When generative AI crosses a critical capability threshold, reviewers discontinuously shift from reading carefully to submitting plausible AI-generated reports. This creates a welfare split—authors gain (the competitive “polish” rat race weakens) while editors lose (review signals become less informative). The paper shows that, under standard assumptions (log-concave quality distributions, convex polishing costs), the editor’s optimal constrained response reverses sign across this transition: before the reviewer-effort collapse, the editor should tighten acceptance to curb rent-dissipating author polishing; after the collapse, the editor should instead loosen acceptance while investing in detection of shirking. Detection and loosened selectivity are complementary—monitoring restores signal quality but increases the return to polishing, so higher acceptance compensates authors and makes monitoring politically feasible.

Key Points

  • Three-sided mechanism: authors (choose polishing), reviewers (decide whether to accept the invitation and, if so, whether to exert effort or shirk via AI), editor (chooses panel size N, acceptance rate K, and detection intensity pdet).
  • Author-side effect: polishing is a pure rat race. In symmetric equilibrium polishing increases relative acceptance chances but leaves aggregate acceptance K unchanged, so polishing is privately valuable but socially dissipative. The marginal return to polishing scales with the effort share m of conscientious reviewers.
  • Reviewer-side phase transition: as AI capability γ increases, shirking becomes attractive at a critical γ1 = −R/ψα. Below γ1 only conscientious reviewers accept and exert effort (m = 1). At or above γ1 more reviewers accept but a fraction shirk (m < 1). The change at γ1 is discontinuous: reviewer pool mass jumps and average effort drops.
  • Welfare misalignment: at the transition authors strictly benefit (weaker rat race → lower cA(a)), while editors strictly lose (weaker signals → worse sorting), so editor and author incentives diverge.
  • Editor’s constrained optimization: editor maximizes expected accepted-paper quality minus invitation and detection costs, subject to an author-welfare (participation/IR) constraint (authors must be no worse off than in the decentralized equilibrium). Holding N fixed at its decentralized post-transition value, the optimal (K, pdet) pair exhibits a sign reversal:
    • Pre-transition: tighten acceptance (lower K) to reduce wasteful polishing and improve sorting.
    • Post-transition: loosen acceptance (raise K) and invest in detection (pdet > 0). Detection increases the effective effort share M(m,pdet) but reduces the retained sample size Nret; because detection raises the value of polishing, the editor must increase K to satisfy the author-welfare constraint—thus the editor effectively “buys the right to monitor” by making acceptance less stringent.
  • Analytical generality: the sign reversal is proven analytically under log-concave quality distributions and convex polishing costs; the reversal is a structural consequence of the reviewer-effort collapse rather than an artifact of parameter choices.
  • Detection is modeled as ex-post screening: reviewers decide effort without anticipating detection; detection discards a fraction pdet of shirking reports, raising the effective effort share among retained reports: M = m / [m + (1−m)(1−pdet)]. This creates a tradeoff between composition quality and sample size (Nret = N[m + (1−m)(1−pdet)]).
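The composition-versus-sample-size trade-off in the last bullet can be made concrete with a few lines of Python. This is a minimal sketch that only evaluates the two formulas stated above; the function names and the example values (N = 10, m = 0.4) are illustrative assumptions, not taken from the paper.

```python
def retained_fraction(m: float, p_det: float) -> float:
    """Share of submitted reports that survive detection: m + (1 - m)(1 - p_det)."""
    if not (0.0 <= m <= 1.0 and 0.0 <= p_det <= 1.0):
        raise ValueError("m and p_det must both lie in [0, 1]")
    return m + (1.0 - m) * (1.0 - p_det)


def effective_effort_share(m: float, p_det: float) -> float:
    """M = m / [m + (1 - m)(1 - p_det)]: effort share among retained reports."""
    kept = retained_fraction(m, p_det)
    if kept == 0.0:
        # m = 0 and p_det = 1: every report is discarded, so M is undefined.
        raise ValueError("no reports are retained")
    return m / kept


def retained_sample_size(n: float, m: float, p_det: float) -> float:
    """N_ret = N [m + (1 - m)(1 - p_det)]: reports left after detection."""
    return n * retained_fraction(m, p_det)


# Hypothetical panel: 10 invited reviewers, 40% of whom exert effort.
N_EXAMPLE, M_EXAMPLE = 10, 0.4
for p_det in (0.0, 0.5, 1.0):
    share = effective_effort_share(M_EXAMPLE, p_det)
    n_ret = retained_sample_size(N_EXAMPLE, M_EXAMPLE, p_det)
    print(f"p_det={p_det:.1f}  M={share:.3f}  N_ret={n_ret:.1f}")
```

Raising p_det pushes M toward 1 while N_ret shrinks toward N·m, which is the trade-off the editor weighs against the detection cost.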

Data & Methods

  • Methodological approach: a tractable theoretical model of a three-sided market (authors, reviewers, editor), solved for subgame-perfect Nash equilibrium.
    • Authors: unit mass, latent quality θ ~ F (log-concave), choose polishing intensity a (cost cA convex) prior to observing θ; publication value V.
    • Reviewers: heterogeneous conscientiousness t ~ G, decide whether to accept invitation and whether to exert effort e ∈ {0,1}. AI capability γ makes shirking reports appear more plausible; appearance reward ψα affects the decision. Reviewer costs cR(t) decreasing in t.
    • Editor: commits to π = (N,K,pdet); accepts top-K fraction by aggregate signal; faces invitation cost εN and detection cost D(pdet) (convex).
    • Signals: informative reports when e=1; noisy when e=0. Aggregate retained signal precision depends on effort composition M and retained sample size Nret.
  • Key closed-form/analytic elements:
    • Critical threshold for shirking: γ1 = −R/ψα (with assumptions ensuring γ1 ∈ (0,1)).
    • Author first-order condition in symmetric equilibrium: c′A(a) = V β m hm(τK; a), so a* is increasing in m.
    • Effective effort share after detection: M(m,pdet) = m / (m + (1−m)(1−pdet)).
    • Editor objective: expected accepted-paper quality Q(Nret,K,M) − εN − D(pdet), with Q increasing in M, decreasing in K, and increasing (with diminishing returns) in Nret.
  • Assumptions used for analytic tractability and results:
    • F is log-concave; cA strictly convex, c′A(0)=0; reviewer cost cR decreasing; detection cost D convex with D′(0)=0.
    • Detection modeled as ex-post (so reviewers’ accept/effort choices do not anticipate pdet).
    • Author polishing is symmetric (independent of realized θ); N held fixed in the constrained editorial optimization (choosing K and pdet).
  • Empirical context and references: the model is motivated by empirical evidence of AI adoption:
    • Liu et al. (2025): rapid growth in AI-assisted scientific writing, especially outside English-speaking countries.
    • Liang et al. (2024): corpus evidence of AI-generated conference reviews and reduced rebuttal engagement by AI-using reviewers.
    • Lepp & Smith (2025): persistent language-related biases despite AI availability.
  • Comparative statics/calibrations: the paper reports comparative statics across calibrations (Section 8) to illustrate robustness of the qualitative results.
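The detection trade-off listed among the analytic elements above can be checked directly from the two stated formulas. The derivation below is a sketch that uses only M and Nret as given in this summary; the subscripted notation is a typesetting choice, and nothing here is claimed to reproduce the paper's own proofs.

```latex
\begin{align*}
M(m, p_{\mathrm{det}}) &= \frac{m}{m + (1-m)(1-p_{\mathrm{det}})}, &
N_{\mathrm{ret}} &= N\,\bigl[m + (1-m)(1-p_{\mathrm{det}})\bigr], \\[4pt]
\frac{\partial M}{\partial p_{\mathrm{det}}}
  &= \frac{m(1-m)}{\bigl[m + (1-m)(1-p_{\mathrm{det}})\bigr]^{2}} \;\ge\; 0, &
\frac{\partial N_{\mathrm{ret}}}{\partial p_{\mathrm{det}}}
  &= -N(1-m) \;\le\; 0, \\[4pt]
M \cdot N_{\mathrm{ret}} &= mN .
\end{align*}
```

For m strictly between 0 and 1, detection strictly raises the effort share and strictly shrinks the retained sample. The product M·Nret = mN does not depend on pdet, so in this formulation detection filters out uninformative reports rather than adding informative ones.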

Implications for AI Economics

  • Institutional design when technology affects both signaling and evaluation: when a technology simultaneously reduces the cost of costly visible signaling (author-side AI reduces polish costs) and reduces informativeness of evaluation (reviewer-side AI enables plausible shirking), optimal institutional responses can be non-monotone and counterintuitive. Policymakers and managers should consider monitoring and loosened selection as joint instruments rather than reflexively tightening standards.
  • Political economy constraint matters: editorial reforms that harm incumbent authors are hard to sustain. The paper formalizes this by imposing an author-welfare constraint; under this constraint, detection must often be paired with higher acceptance to maintain author buy-in—hence "buying the right to monitor."
  • Testable empirical predictions:
    • A discontinuous drop in reviewer substantive effort (measured by review length, citations to methodology, or interaction in rebuttals) once AI capability crosses a threshold.
    • A simultaneous increase in the number of reviewers accepting invitations and a fall in measures of review informativeness.
    • Journals that invest in detection will tend to raise acceptance rates (or otherwise compensate authors) to keep submissions, all else equal.
    • Author-level polish expenditures (editing services, AI-assisted polishing disclosures) decline, or become less privately valuable, as reviewer shirking increases.
  • Broader applications: the model applies to other evaluative organizations where agents both signal and are judged (hiring, grant review, promotion committees). The key design principle is complementarity of monitoring and loosened selectivity when evaluation informativeness is degraded by a technology.
  • Practical editorial recommendations:
    • Do not reflexively tighten acceptance rates when AI-assisted reviewing proliferates.
    • Invest in reliable detection/monitoring of reviewer shirking to restore signal quality.
    • Pair monitoring with compensatory loosening of selectivity (higher K) to satisfy author-welfare constraints and to make monitoring politically and practically feasible.
    • Consider redesigning reviewer incentives beyond ex-post detection, e.g., reputational rewards, or detection that reviewers anticipate when choosing effort (an extension beyond the ex-post model).
  • Limitations and directions for future empirical/theoretical work:
    • The model assumes ex-post detection (reviewers do not anticipate pdet). Allowing pdet to enter reviewer incentives endogenously could change participation and effort composition and is a promising extension.
    • Authors are modeled with symmetric polish (no conditioning on realized θ). Allowing endogenous quality-contingent polishing would enrich sorting insights.
    • The editor’s optimization held N fixed for tractability; allowing endogenous N jointly with (K,pdet) could change quantitative recommendations.
    • The author-welfare (IR) constraint is taken as given; endogenizing submission choices or journal competition would provide a richer political-economy microfoundation.
    • Empirical validation: use longitudinal data on review characteristics, review submission acceptance rates, editorial policies (detection investments), and author behavior before and after observable jumps in AI capability to test the model’s predicted phase transition and sign reversal.

Summary: The paper provides a compact theoretical explanation for why the right institutional response to prevalent AI-assisted shirking is often the opposite of conventional intuition: monitor reviewers, but relax selectivity to offset increased rent dissipation incentives—i.e., buy the right to monitor by loosening acceptance.

Assessment

Paper Type: theoretical
Evidence Strength: n/a. The paper is fully theoretical and provides formal proofs of mechanism and equilibrium properties, but it presents no empirical tests, experiments, or observational validation of the model's assumptions or predicted threshold behavior.
Methods Rigor: high. The analysis constructs an explicit three-sided equilibrium, identifies a critical parameter value producing a discontinuous effort collapse, and proves structural results under log-concavity assumptions; the math appears formal and internally consistent, though it relies on stylized assumptions.
Sample: No empirical sample; an analytic model of a three-sided market with representative authors, reviewers, and an editor, quality drawn from a log-concave distribution, parameters for AI capability, author polishing effort, reviewer effort cost, acceptance standards, and editor investments in AI detection.
Themes: org_design, governance
Identification: Analytical derivation of a three-sided equilibrium (authors, reviewers, editors) showing how changes in a model parameter representing AI capability produce a discontinuous drop in reviewer effort; proofs exploit properties of log-concave quality distributions to demonstrate the threshold/phase transition and derive comparative statics for editor policy.
Generalizability:
  • Relies on strong, stylized assumptions (representative agents, log-concave quality distribution) that may not hold across real evaluative settings.
  • No empirical validation: magnitude and existence of threshold effects untested in real journals or other evaluative institutions.
  • Ignores heterogeneity across reviewers, authors, disciplines, and editorial practices.
  • Abstracts from dynamic adaptation, reputational effects, multi-stage review processes, and cross-journal competition.
  • Assumes a single scalar ‘AI capability’ and simplified detection technology/cost structure, which may misrepresent real-world AI tools and monitoring.

Claims (9)

  • Claim: Generative AI acts as a disruptive technological shock to evaluative organizations.
    • Topic: Automation Exposure
    • Direction: negative
    • Confidence: high
    • Outcome: disruption to evaluative organizations (change in organizational evaluative processes/effort)
    • 0.12
  • Claim: In academic peer review, generative AI enters both sides of the market: authors use AI to polish submissions, and reviewers use it to generate plausible reports without exerting evaluative effort.
    • Topic: Task Allocation
    • Direction: mixed
    • Confidence: high
    • Outcome: adoption of AI by authors and reviewers (change in task allocation and effort)
    • 0.06
  • Claim: When AI capability crosses a critical threshold, reviewer effort collapses discontinuously.
    • Topic: Task Allocation
    • Direction: negative
    • Confidence: high
    • Outcome: reviewer effort (level of evaluative effort exerted by reviewers)
    • Details: discontinuous collapse (no numerical magnitude provided)
    • 0.12
  • Claim: The reviewer-effort collapse creates a welfare misalignment: authors benefit from a weakened 'rat race' while editors suffer from degraded signal informativeness.
    • Topic: Worker Satisfaction
    • Direction: mixed
    • Confidence: high
    • Outcome: welfare for authors (utility/payoff) and informativeness of editorial signals
    • 0.12
  • Claim: Before the AI transition, editors should tighten acceptance standards to curb rent-dissipating author polishing.
    • Topic: Governance And Regulation
    • Direction: negative
    • Confidence: high
    • Outcome: editorial acceptance standards (policy intensity) as a response to author polishing
    • 0.12
  • Claim: After the AI transition, editors must loosen acceptance standards while investing in AI detection, because further tightening only amplifies dissipative polishing without improving sorting.
    • Topic: Governance And Regulation
    • Direction: mixed
    • Confidence: high
    • Outcome: optimal editorial policy (acceptance standards and investment in AI detection) and its impact on author behavior and sorting
    • 0.12
  • Claim: There is a strict policy reversal in optimal editorial policy sign: tightening is optimal pre-transition, loosening is optimal post-transition.
    • Topic: Governance And Regulation
    • Direction: mixed
    • Confidence: high
    • Outcome: direction of optimal editorial policy change (tighten vs loosen) across regimes
    • 0.12
  • Claim: The sign reversal is a structural consequence of the reviewer effort collapse under log-concave quality distributions; this is proved analytically.
    • Topic: Other
    • Direction: null_result
    • Confidence: high
    • Outcome: existence of sign reversal as a robust structural model implication under log-concavity
    • 0.12
  • Claim: Addressing AI in evaluative systems requires treating monitoring (AI detection) and loosened selectivity as complementary design instruments.
    • Topic: Governance And Regulation
    • Direction: positive
    • Confidence: high
    • Outcome: effectiveness of combined interventions (monitoring + loosened selectivity) on evaluative system performance
    • 0.12

Notes