Replicating two scholars' published reasoning produced scholar-bots that attained senior-lecturer–level evaluations across supervision, peer review and panel debate, suggesting that extractable publication records could enable low-cost functional replacement of some academic labor unless disclosure and compensation rules are adopted now.
We extracted the scholarly reasoning systems of two internationally prominent humanities and social science scholars from their published corpora alone, converted those systems into structured inference-time constraints for a large language model, and tested whether the resulting scholar-bots could perform core academic functions at expert-assessed quality. The distillation pipeline used an eight-layer extraction method and a nine-module skill architecture grounded in local, closed-corpus analysis. The scholar-bots were then deployed across doctoral supervision, peer review, lecturing and panel-style academic exchange. Expert assessment involved three senior academics producing reports and appointment-level syntheses. Across the preserved expert record, all review and supervision reports judged the outputs benchmark-attaining, appointment-level recommendations placed both bots at or above Senior Lecturer level in the Australian university system, and recovered panel scores placed Scholar A between 7.9 and 8.9/10 and Scholar B between 8.5 and 8.9/10 under multi-turn debate conditions. A research-degree-student survey showed high performance ratings across information reliability, theoretical depth and logical rigor, with pronounced ceiling effects on a 7-point scale, despite all participants already being frontier-model users. We term this the Relic condition: when publication systems make stable reasoning architectures legible, extractable and cheaply deployable, the public record of intellectual labor becomes raw material for its own functional replacement. Because the technical threshold for this transition is already crossed at modest engineering effort, we argue that the window for protective frameworks covering disclosure, consent, compensation and deployment restriction is the present, while deployment remains optional rather than infrastructural.
Summary
Main Finding
The authors present an existence proof that stable reasoning architectures embedded in two scholars’ public publication corpora can be extracted, encoded as inference-time constraints for a general-purpose LLM, and deployed as “scholar-bots” that perform core academic functions (peer review, doctoral supervision, lecturing, panel debate) at expert-assessed, benchmark-attaining quality. They name the resulting structural vulnerability the “Relic condition”: when publication systems make scholars’ reasoning systems legible, extractable, and cheaply deployable, the public record of intellectual labor becomes raw material for its own functional replacement. The authors argue that the technical threshold for this transition has already been crossed with modest engineering effort, so the policy window for protections is immediate.
Key Points
- Distillation scope and method
- Two humanities/social-science scholars were reconstructed from their published corpora alone.
- Scholar A: 68 analytical units (~1,742 pages). Scholar B: 35 fully processed local corpus items (papers, chapters, long-form work).
- Pipeline: an eight-layer extraction framework and a nine-module skill architecture; local closed-corpus analysis (no hidden archives or heavy domain fine-tuning).
- Evaluation and outcomes
- Evaluations: 18 task-specific expert reviews + 6 appointment-level syntheses across three senior-academic reviewer groups; a 3-round panel debate including a stress-test Scholar C.
- All preserved peer-review and supervision reports judged outputs benchmark-attaining.
- Appointment-level syntheses placed both scholar-bots at or above Senior Lecturer (in the Australian system, roughly equivalent to a tenured Associate Professor in the US).
- Panel scores under multi-turn debate: Scholar A between 7.9 and 8.9/10; Scholar B between 8.5 and 8.9/10.
- Differential profiles: Scholar A emphasized conceptual boundary control and theoretical authority; Scholar B emphasized operationalization, mediation analysis and pedagogical scaffolding—indicating capture of scholar-specific reasoning signatures.
- Student usability pilot (n=10)
- Ten doctoral students who were already frontier-model users gave high ratings (7-point scale) with pronounced ceiling effects: composite performance mean 6.68; composite confidence 6.62.
- Most-used scenarios: theory comparison, framing and writing guidance.
- Limitations acknowledged by authors
- This is an existence proof from two scholars in a single subfield; it does not establish prevalence or cross-field generality.
- Heterogeneous evaluation formats (mix of rubrics and narrative), small student sample, and other experimental constraints.
Data & Methods
- Data sources
- Publicly available published corpora of the two source scholars (solo-authored monographs, articles, chapters).
- No private materials, hidden archives, or proprietary training data used.
- Distillation pipeline (summary)
- Eight-layer extraction framework to identify recurrent analytic units, conceptual operators, evaluative thresholds, citation logic and recurrent argumentative moves.
- Reassembly into a nine-module skill architecture that constrains a general-purpose LLM at inference time (not domain fine-tuning).
- Local, closed-corpus analysis aimed to reconstruct an interpretable, executable reasoning constraint set rather than black-box fine-tuning.
- Evaluation protocol
- Task families: peer review, doctoral supervision, lecture preparation, multi-turn panel exchange.
- Expert assessment: three senior academics produced 18 task-specific reports; six appointment-level syntheses; panels scored over multiple rounds, including order-reversal and an added third discussant (Scholar C) as stress test.
- Student survey (n=10) measured performance and confidence across information reliability, innovation inspiration, academic knowledge, theoretical depth, logical rigor; usage scenarios and willingness-to-pay indicators collected.
- Measurement caveats
- The authors report quantitative scores where these are explicit in the record and otherwise rely on convergent qualitative patterns.
- Heterogeneous rubrics and small-sample student data limit inferential generality.
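The idea of constraining a general-purpose LLM at inference time, rather than fine-tuning it, can be pictured with a minimal sketch. Everything here is an assumption made for illustration: the summary does not name the nine modules or give their contents, so `SkillModule`, `build_constraint_prompt`, the module names (borrowed from the extraction targets listed above) and the placeholder rules are hypothetical and are not the authors' implementation.

```python
from dataclasses import dataclass, field


@dataclass
class SkillModule:
    """One module of a hypothetical skill architecture (illustrative only)."""
    name: str
    rules: list[str] = field(default_factory=list)


def build_constraint_prompt(scholar_label: str, modules: list[SkillModule]) -> str:
    """Render the modules as a plain-text constraint block.

    The returned text would be prepended to a task prompt at inference time;
    no model weights are changed.
    """
    lines = [f"Reason under the constraints distilled for {scholar_label}."]
    for index, module in enumerate(modules, start=1):
        lines.append(f"[Module {index}: {module.name}]")
        lines.extend(f"- {rule}" for rule in module.rules)
    return "\n".join(lines)


# Placeholder modules: names echo the extraction targets, rules are invented.
modules = [
    SkillModule("conceptual operators", ["Define each key term before using it."]),
    SkillModule("evaluative thresholds", ["State the standard a claim must meet."]),
    SkillModule("citation logic", ["Cite only works from the closed corpus."]),
]

prompt = build_constraint_prompt("Scholar A", modules)
print(prompt)
```

The design point the sketch is meant to convey is that the constraint set stays readable and editable as text, which is what distinguishes this approach from black-box fine-tuning.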
Implications for AI Economics
- Labor substitution and task reallocation
- High-skill academic tasks previously considered resistant to automation—peer review, doctoral supervision, field-calibrated judgment—are vulnerable when scholars’ public texts are sufficiently stable and legible.
- Expect downward pressure on demand (and possibly wages) for mid-level academic tasks that map to stable standards and structured outputs (e.g., routine supervision, reviewing, lecture drafting).
- Complementarity effects: tasks requiring novelty, embodied practice, or opaque tacit judgement may retain rent-bearing human value; scholars may shift toward more exploratory, inventive, or institutionally embedded work.
- Market formation and rent capture
- New commercial niches: “scholar distillation” services, bespoke scholar-shaped LLMs, licensing/royalty markets for distilled reasoning artifacts.
- Platform firms that operate model infrastructure and control access to distilled artifacts may capture substantial rents; institutions and scholars risk losing bargaining power if extraction is unmanaged.
- Willingness-to-pay signals (pilot) suggest potential demand among research users for scholar-shaped systems, implying monetizable markets.
- Distributional and institutional effects
- Differential exposure: fields with dense, solo-authored, canonical textual outputs (many humanities and social-science subfields) are more distillable than disciplines where tacit or data-heavy practice dominates (some lab sciences, crafts).
- Universities and publishers may internalize or externalize value differently (e.g., sale or licensing of corpora, collective bargaining over data use), affecting incentive structures for publishing and career investment.
- Credentialing and hiring: if scholar-bots meet benchmark labor needs, institutions may recalibrate staffing (fewer roles for some functions; higher premium on novel research leadership).
- Policy, regulation and governance implications
- Urgency: authors argue the window for protective frameworks (disclosure, consent, compensation, deployment restriction) is immediate because extraction is already feasible with ordinary public materials and modest engineering.
- Possible policy responses:
- Copyright/consent regimes and licensing models for corpora that explicitly cover extraction for reasoning-distillation.
- Mandatory disclosure when scholar-shaped systems are deployed in evaluative or pedagogical contexts.
- Collective bargaining / pooled licensing for scholars to capture compensation or control deployment.
- Publication-platform interventions (APIs, access controls, metadata indicating distillation risk).
- Welfare trade-offs: regulation must balance authors’ property and labor-market protections against potential efficiency and productivity gains from scholar-bots (e.g., scaling supervision capacity, faster peer review).
- Measurement and research priorities for economists
- Operationalize a "distillability" index across fields (factors: text stability, solo-authorship prevalence, conceptual formalization, corpus accessibility).
- Incorporate Relic risk into task-based automation models: estimate share of academic tasks susceptible to extraction and the elasticity of substitution between human and distilled agents.
- Empirical data to collect: corpus accessibility metrics, licensing prevalence, uptake of scholar-shaped tools, hires and promotion patterns, compensation flows tied to distilled outputs.
- Design market experiments: opt-in licensing pilots, revenue-sharing trials, platform-level attribution and traceability mechanisms; test behavioral responses of scholars to potential income loss or new income streams.
- Normative and macroeconomic considerations
- Potential negative externalities include knowledge extractivism, erosion of scholars’ bargaining power, and concentration of normative authority in platform actors that control distilled artifacts.
- Offsetting gains may include increased throughput of pedagogical and review processes and lower transaction costs for research design and training—with distributional consequences that need countervailing policy.
- Long-run equilibrium may involve layered markets (licensed scholar-bots for standardized tasks; premium human labor for novelty and institutional functions).
Brief limitations to carry forward
- The study is an existence proof, not a prevalence study; generalization across disciplines and scales is an open empirical question.
- Evaluations are expert judgments and small user samples; larger, standardized trials and market data are needed to quantify economic magnitude.
Recommended immediate actions for researchers and policymakers in AI economics
- Start measuring distillability across disciplines and monitor early commercial deployments of scholar-shaped systems.
- Convene stakeholders (academics, publishers, universities, platforms) to design pilot licensing and disclosure regimes.
- Model labor-market scenarios that include scholar-bot substitution, complementarity, and redistribution outcomes to inform regulation.
Assessment
Claims (11)
| Claim | Category | Direction | Confidence | Outcome | Details |
|---|---|---|---|---|---|
| We extracted the scholarly reasoning systems of two internationally prominent humanities and social science scholars from their published corpora alone. | Other | positive | high | successful extraction of reasoning systems from published corpora | n=2; 0.8 |
| We converted those systems into structured inference-time constraints for a large language model. | Other | positive | high | conversion of extracted reasoning systems into inference-time constraints | n=2; 0.8 |
| The distillation pipeline used an eight-layer extraction method and a nine-module skill architecture grounded in local, closed-corpus analysis. | Other | neutral | high | pipeline architecture (layers/modules) | 0.8 |
| The scholar-bots were deployed across doctoral supervision, peer review, lecturing and panel-style academic exchange. | Output Quality | positive | high | ability to perform academic tasks (supervision, peer review, lecturing, panel exchange) | 0.48 |
| Expert assessment involved three senior academics producing reports and appointment-level syntheses. | Other | neutral | high | expert assessment procedure (number and type of assessors) | n=3; 0.8 |
| Across the preserved expert record, all review and supervision reports judged the outputs benchmark-attaining. | Output Quality | positive | high | benchmark attainment in review and supervision reports | 0.48 |
| Appointment-level recommendations placed both bots at or above Senior Lecturer level in the Australian university system. | Hiring | positive | high | appointment/rank recommendation | n=2; 0.48 |
| Recovered panel scores placed Scholar A between 7.9 and 8.9/10 and Scholar B between 8.5 and 8.9/10 under multi-turn debate conditions. | Output Quality | positive | high | panel evaluation scores (0-10 scale) under multi-turn debate | Scholar A between 7.9 and 8.9/10; Scholar B between 8.5 and 8.9/10; 0.48 |
| A research-degree-student survey showed high performance ratings across information reliability, theoretical depth and logical rigor, with pronounced ceiling effects on a 7-point scale, despite all participants already being frontier-model users. | Output Quality | positive | high | student-rated performance on reliability, theoretical depth, logical rigor (7-point scale) | high performance ratings with pronounced ceiling effects on a 7-point scale; 0.48 |
| We term this the Relic condition: when publication systems make stable reasoning architectures legible, extractable and cheaply deployable, the public record of intellectual labor becomes raw material for its own functional replacement. | Job Displacement | negative | high | conceptual risk of intellectual-labor replacement derived from extractable publication record | 0.08 |
| Because the technical threshold for this transition is already crossed at modest engineering effort, the window for protective frameworks covering disclosure, consent, compensation and deployment restriction is the present, while deployment remains optional rather than infrastructural. | Governance And Regulation | negative | high | need for protective policy frameworks and timing | 0.08 |