The Commonplace

Replicating two scholars' published reasoning produced scholar-bots that attained senior-lecturer–level evaluations across supervision, peer review and panel debate, suggesting that extractable publication records could enable low-cost functional replacement of some academic labor unless disclosure and compensation rules are adopted now.

The Relic Condition: When Published Scholarship Becomes Material for Its Own Replacement
Lin Deng, Chang-bo Liu · April 17, 2026
arxiv · quasi_experimental · low evidence · 7/10 relevance
Two scholar-bots distilled from published corpora matched appointment-level expert judgments for senior academic tasks (peer review, supervision, lecturing, panel debate), implying that publicly legible scholarly reasoning can be extracted and functionally replicated with modest engineering effort.

We extracted the scholarly reasoning systems of two internationally prominent humanities and social science scholars from their published corpora alone, converted those systems into structured inference-time constraints for a large language model, and tested whether the resulting scholar-bots could perform core academic functions at expert-assessed quality. The distillation pipeline used an eight-layer extraction method and a nine-module skill architecture grounded in local, closed-corpus analysis. The scholar-bots were then deployed across doctoral supervision, peer review, lecturing and panel-style academic exchange. Expert assessment involved three senior academics producing reports and appointment-level syntheses. Across the preserved expert record, all review and supervision reports judged the outputs benchmark-attaining, appointment-level recommendations placed both bots at or above Senior Lecturer level in the Australian university system, and recovered panel scores placed Scholar A between 7.9 and 8.9/10 and Scholar B between 8.5 and 8.9/10 under multi-turn debate conditions. A research-degree-student survey showed high performance ratings across information reliability, theoretical depth and logical rigor, with pronounced ceiling effects on a 7-point scale, despite all participants already being frontier-model users. We term this the Relic condition: when publication systems make stable reasoning architectures legible, extractable and cheaply deployable, the public record of intellectual labor becomes raw material for its own functional replacement. Because the technical threshold for this transition is already crossed at modest engineering effort, we argue that the window for protective frameworks covering disclosure, consent, compensation and deployment restriction is the present, while deployment remains optional rather than infrastructural.

Summary

Main Finding

The authors demonstrate an existence proof that stable reasoning architectures embedded in two scholars’ public publication corpora can be extracted, encoded as inference-time constraints for a general-purpose LLM, and deployed as “scholar-bots” that perform core academic functions (peer review, doctoral supervision, lecturing, panel debate) at expert-assessed, benchmark-attaining quality. They name the resulting structural vulnerability the "Relic condition": when publication systems make scholars’ reasoning systems legible, extractable, and cheaply deployable, the public record of intellectual labor becomes raw material for its own functional replacement. The authors argue the technical threshold for this transition has already been crossed with modest engineering effort, so the policy window for protections is immediate.

Key Points

  • Distillation scope and method
    • Two humanities/social-science scholars were reconstructed from their published corpora alone.
    • Scholar A: 68 analytical units (~1,742 pages). Scholar B: 35 fully processed local corpus items (papers, chapters, long-form work).
    • Pipeline: an eight-layer extraction framework and a nine-module skill architecture; local closed-corpus analysis (no hidden archives or heavy domain fine-tuning).
  • Evaluation and outcomes
    • Evaluations: 18 task-specific expert reviews + 6 appointment-level syntheses across three senior-academic reviewer groups; a 3-round panel debate including a stress-test Scholar C.
    • All preserved peer-review and supervision reports judged outputs benchmark-attaining.
    • Appointment-level syntheses placed both scholar-bots at or above Senior Lecturer (Australian system ≈ tenured Associate Professor in US).
    • Panel scores: Scholar A 7.9–8.9/10; Scholar B 8.5–8.9/10 under multi-turn debate.
    • Differential profiles: Scholar A emphasized conceptual boundary control and theoretical authority; Scholar B emphasized operationalization, mediation analysis and pedagogical scaffolding—indicating capture of scholar-specific reasoning signatures.
  • Student usability pilot (n=10)
    • Ten doctoral students who were already frontier-model users gave high ratings (7-point scale) with pronounced ceiling effects: composite performance mean 6.68; composite confidence 6.62.
    • Most-used scenarios: theory comparison, framing and writing guidance.
  • Limitations acknowledged by authors
    • This is an existence proof from two scholars in a single subfield; it does not establish prevalence or cross-field generality.
    • Heterogeneous evaluation formats (mix of rubrics and narrative), small student sample, and other experimental constraints.

Data & Methods

  • Data sources
    • Publicly available published corpora of the two source scholars (solo-authored monographs, articles, chapters).
    • No private materials, hidden archives, or proprietary training data used.
  • Distillation pipeline (summary)
    • Eight-layer extraction framework to identify recurrent analytic units, conceptual operators, evaluative thresholds, citation logic and recurrent argumentative moves.
    • Reassembly into a nine-module skill architecture that constrains a general-purpose LLM at inference time (not domain fine-tuning).
    • Local, closed-corpus analysis aimed to reconstruct an interpretable, executable reasoning constraint set rather than black-box fine-tuning.
  • Evaluation protocol
    • Task families: peer review, doctoral supervision, lecture preparation, multi-turn panel exchange.
    • Expert assessment: three senior academics produced 18 task-specific reports; six appointment-level syntheses; panels scored over multiple rounds, including order-reversal and an added third discussant (Scholar C) as stress test.
    • Student survey (n=10) measured performance and confidence across information reliability, innovation inspiration, academic knowledge, theoretical depth, logical rigor; usage scenarios and willingness-to-pay indicators collected.
  • Measurement caveats
    • Authors report quantitative scores where explicit; otherwise rely on convergent qualitative patterns.
    • Heterogeneous rubrics and small-sample student data limit inferential generality.
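
The idea of constraining a general-purpose LLM at inference time, rather than fine-tuning it, can be illustrated with a minimal sketch. The five rule categories below are the extraction targets named in the pipeline summary; the paper's eight layers and nine modules are not enumerated here, so the `ConstraintSet` class, its methods, and the placeholder rules are all hypothetical and should not be read as the authors' architecture.

```python
from dataclasses import dataclass, field

# Categories taken from the extraction targets listed in the summary.
# Everything else in this sketch is an assumption made for illustration.
CATEGORIES = (
    "analytic_units",
    "conceptual_operators",
    "evaluative_thresholds",
    "citation_logic",
    "argumentative_moves",
)


@dataclass
class ConstraintSet:
    """Hypothetical container for rules distilled from one scholar's corpus."""

    scholar_label: str
    rules: dict = field(default_factory=dict)

    def add(self, category: str, rule: str) -> None:
        """Register one rule under a known category."""
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        self.rules.setdefault(category, []).append(rule)

    def to_system_prompt(self, task: str) -> str:
        """Render the rules as plain text to prepend to a model request."""
        lines = [
            f"You are reasoning under the constraint set for {self.scholar_label}.",
            f"Task: {task}",
        ]
        # Emit rules in a fixed category order so the prompt is deterministic.
        for category in CATEGORIES:
            for rule in self.rules.get(category, []):
                lines.append(f"[{category}] {rule}")
        return "\n".join(lines)


# Placeholder rules, invented for this example only.
constraints = ConstraintSet("Scholar A")
constraints.add("evaluative_thresholds", "Flag claims that lack a stated scope condition.")
constraints.add("conceptual_operators", "Define each key concept before using it.")
prompt = constraints.to_system_prompt("peer review")
print(prompt)
```

Because the constraints live in the prompt rather than in model weights, they stay inspectable and editable, which matches the summary's description of an interpretable constraint set as opposed to black-box fine-tuning.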

Implications for AI Economics

  • Labor substitution and task reallocation
    • High-skill academic tasks previously considered resistant to automation—peer review, doctoral supervision, field-calibrated judgment—are vulnerable when scholars’ public texts are sufficiently stable and legible.
    • Expect downward pressure on demand (and possibly wages) for mid-level academic tasks that map to stable standards and structured outputs (e.g., routine supervision, reviewing, lecture drafting).
    • Complementarity effects: tasks requiring novelty, embodied practice, or opaque tacit judgment may retain rent-bearing human value; scholars may shift toward more exploratory, inventive, or institutionally embedded work.
  • Market formation and rent capture
    • New commercial niches: “scholar distillation” services, bespoke scholar-shaped LLMs, licensing/royalty markets for distilled reasoning artifacts.
    • Platform firms that operate model infra and access to distilled artifacts may capture substantial rents; institutions and scholars risk losing bargaining power if extraction is unmanaged.
    • Willingness-to-pay signals (pilot) suggest potential demand among research users for scholar-shaped systems, implying monetizable markets.
  • Distributional and institutional effects
    • Differential exposure: fields with dense, solo-authored, canonical textual outputs (many humanities and social-science subfields) are more distillable than disciplines where tacit or data-heavy practice dominates (some lab sciences, crafts).
    • Universities and publishers may internalize or externalize value differently (e.g., sale or licensing of corpora, collective bargaining over data use), affecting incentive structures for publishing and career investment.
    • Credentialing and hiring: if scholar-bots meet benchmark labor needs, institutions may recalibrate staffing (fewer roles for some functions; higher premium on novel research leadership).
  • Policy, regulation and governance implications
    • Urgency: authors argue the window for protective frameworks (disclosure, consent, compensation, deployment restriction) is immediate because extraction is already feasible with ordinary public materials and modest engineering.
    • Possible policy responses:
      • Copyright/consent regimes and licensing models for corpora that explicitly cover extraction for reasoning-distillation.
      • Mandatory disclosure when scholar-shaped systems are deployed in evaluative or pedagogical contexts.
      • Collective bargaining / pooled licensing for scholars to capture compensation or control deployment.
      • Publication-platform interventions (APIs, access controls, metadata indicating distillation risk).
    • Welfare trade-offs: regulation must balance authors’ property and labor-market protections against potential efficiency and productivity gains from scholar-bots (e.g., scaling supervision capacity, faster peer review).
  • Measurement and research priorities for economists
    • Operationalize a "distillability" index across fields (factors: text stability, solo-authorship prevalence, conceptual formalization, corpus accessibility).
    • Incorporate Relic risk into task-based automation models: estimate share of academic tasks susceptible to extraction and the elasticity of substitution between human and distilled agents.
    • Empirical data to collect: corpus accessibility metrics, licensing prevalence, uptake of scholar-shaped tools, hires and promotion patterns, compensation flows tied to distilled outputs.
    • Design market experiments: opt-in licensing pilots, revenue-sharing trials, platform-level attribution and traceability mechanisms; test behavioral responses of scholars to potential income loss or new income streams.
  • Normative and macroeconomic considerations
    • Potential negative externalities include knowledge extractivism, erosion of scholars’ bargaining power, and concentration of normative authority in platform actors that control distilled artifacts.
    • Offsetting gains may include increased throughput of pedagogical and review processes and lower transaction costs for research design and training—with distributional consequences that need countervailing policy.
    • Long-run equilibrium may involve layered markets (licensed scholar-bots for standardized tasks; premium human labor for novelty and institutional functions).
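
One way the proposed "distillability" index might be operationalized is as a weighted mean of the four factors listed under the measurement priorities. This is a rough sketch under stated assumptions: the 0-to-1 scoring scale, the equal default weights, the function name `distillability_index`, and the example scores are all hypothetical and do not come from the paper.

```python
from typing import Dict, Optional

# Factor names follow the summary; the scale and weights are assumptions.
FACTORS = (
    "text_stability",
    "solo_authorship",
    "conceptual_formalization",
    "corpus_accessibility",
)


def distillability_index(
    scores: Dict[str, float],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Return the weighted mean of factor scores, each expected in [0, 1]."""
    if weights is None:
        # Assumed default: every factor counts equally.
        weights = {name: 1.0 for name in FACTORS}
    for name in FACTORS:
        if name not in scores:
            raise ValueError(f"missing factor score: {name}")
        if not 0.0 <= scores[name] <= 1.0:
            raise ValueError(f"score out of range for {name}: {scores[name]}")
    total_weight = sum(weights[name] for name in FACTORS)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(weights[name] * scores[name] for name in FACTORS)
    return weighted_sum / total_weight


# Invented scores for an imaginary field, used only to exercise the function.
example_scores = {
    "text_stability": 1.0,
    "solo_authorship": 0.5,
    "conceptual_formalization": 0.5,
    "corpus_accessibility": 0.0,
}
equal_weighted = distillability_index(example_scores)
reweighted = distillability_index(
    example_scores,
    {
        "text_stability": 2.0,
        "solo_authorship": 1.0,
        "conceptual_formalization": 1.0,
        "corpus_accessibility": 1.0,
    },
)
print(equal_weighted, reweighted)
```

A linear index is the simplest choice; if the factors are better understood as jointly necessary (for instance, an inaccessible corpus blocks distillation regardless of the other scores), a multiplicative form may fit better. Either choice would need empirical validation.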

Brief limitations to carry forward

  • The study is an existence proof, not a prevalence study; generalization across disciplines and scales is an open empirical question.
  • Evaluations rely on expert judgments and small user samples; larger, standardized trials and market data are needed to quantify economic magnitude.

Recommended immediate actions for researchers and policymakers in AI economics

  • Start measuring distillability across disciplines and monitor early commercial deployments of scholar-shaped systems.
  • Convene stakeholders (academics, publishers, universities, platforms) to design pilot licensing and disclosure regimes.
  • Model labor-market scenarios that include scholar-bot substitution, complementarity, and redistribution outcomes to inform regulation.

Assessment

Paper Type: quasi_experimental

Evidence Strength: low. The paper presents a compelling demonstration but from a very small and non-random sample (two scholars) with a small number of expert evaluators (three senior academics) and a small student survey (n=10); evaluations are subjective, benchmarking is contextual (Australian appointment levels), and there are no longitudinal, market-level, or causal analyses to support broad claims about economic impacts or workforce displacement.

Methods Rigor: medium. The technical pipeline appears structured (an eight-layer extraction method and nine-module skill architecture) and uses closed-corpus, reproducible procedures and multi-modal expert assessment; however, important methodological details are missing or limited (sample sizes, evaluator selection and blinding, robustness checks, model details, reproducibility materials), and the evaluation lacks experimental controls and counterfactual comparisons.

Sample: Two internationally prominent humanities/social-science scholars (published corpora only) were used to distill reasoning architectures into two scholar-bots. Evaluations included three senior academic expert assessors producing reports and appointment-level syntheses, multi-turn debate panel scores (reported ranges: Scholar A 7.9–8.9/10, Scholar B 8.5–8.9/10), and a research-degree-student survey (n=10) whose participants were all already frontier-model users. The engineering effort is described as modest, but the LLM and the exact effort involved are not fully specified.

Themes: labor_markets, human_ai_collab

Identification: No formal causal identification strategy; claims are supported by an experimental demonstration in which two scholar-bots were constructed from the published corpora of two scholars and evaluated by expert assessors and users across a set of academic tasks (peer review, supervision, lecturing, panel debate). There is no randomization, control group, pre-registration, or counterfactual analysis to isolate causal effects on outcomes such as labor displacement or productivity at scale.

Generalizability

  • Very small sample (n=2 scholars); results may not generalize across disciplines, styles, or individual scholars.
  • Only humanities and social-science scholarly corpora were used; technical, quantitative, or lab-based sciences may differ.
  • Evaluation relied on a small set of expert assessors (n=3) and a small student sample (n=10), introducing potential bias and limited external validity.
  • Benchmarks tied to Australian academic appointment levels may not map to other countries' hiring standards or institutional contexts.
  • Results depend on a specific model, extraction pipeline, and closed-corpus availability; different LLMs or corpora may perform differently.
  • Outcomes reflect short-term task performance in simulated settings, not long-term labor-market effects or large-scale deployment.
  • Participants were already frontier-model users; lay or non-expert user experiences may differ.

Claims (11)

| Claim | Direction | Confidence | Outcome | Details |
| We extracted the scholarly reasoning systems of two internationally prominent humanities and social science scholars from their published corpora alone. [Other] | positive | high | successful extraction of reasoning systems from published corpora | n=2; 0.8 |
| We converted those systems into structured inference-time constraints for a large language model. [Other] | positive | high | conversion of extracted reasoning systems into inference-time constraints | n=2; 0.8 |
| The distillation pipeline used an eight-layer extraction method and a nine-module skill architecture grounded in local, closed-corpus analysis. [Other] | neutral | high | pipeline architecture (layers/modules) | 0.8 |
| The scholar-bots were deployed across doctoral supervision, peer review, lecturing and panel-style academic exchange. [Output Quality] | positive | high | ability to perform academic tasks (supervision, peer review, lecturing, panel exchange) | 0.48 |
| Expert assessment involved three senior academics producing reports and appointment-level syntheses. [Other] | neutral | high | expert assessment procedure (number and type of assessors) | n=3; 0.8 |
| Across the preserved expert record, all review and supervision reports judged the outputs benchmark-attaining. [Output Quality] | positive | high | benchmark attainment in review and supervision reports | 0.48 |
| Appointment-level recommendations placed both bots at or above Senior Lecturer level in the Australian university system. [Hiring] | positive | high | appointment/rank recommendation | n=2; 0.48 |
| Recovered panel scores placed Scholar A between 7.9 and 8.9/10 and Scholar B between 8.5 and 8.9/10 under multi-turn debate conditions. [Output Quality] | positive | high | panel evaluation scores (0-10 scale) under multi-turn debate | Scholar A between 7.9 and 8.9/10; Scholar B between 8.5 and 8.9/10; 0.48 |
| A research-degree-student survey showed high performance ratings across information reliability, theoretical depth and logical rigor, with pronounced ceiling effects on a 7-point scale, despite all participants already being frontier-model users. [Output Quality] | positive | high | student-rated performance on reliability, theoretical depth, logical rigor (7-point scale) | high performance ratings with pronounced ceiling effects on a 7-point scale; 0.48 |
| We term this the Relic condition: when publication systems make stable reasoning architectures legible, extractable and cheaply deployable, the public record of intellectual labor becomes raw material for its own functional replacement. [Job Displacement] | negative | high | conceptual risk of intellectual-labor replacement derived from extractable publication record | 0.08 |
| Because the technical threshold for this transition is already crossed at modest engineering effort, the window for protective frameworks covering disclosure, consent, compensation and deployment restriction is the present, while deployment remains optional rather than infrastructural. [Governance And Regulation] | negative | high | need for protective policy frameworks and timing | 0.08 |
