The Commonplace

Scaling giant LLMs hits hard physical and economic limits—energy, cooling, grid stress and diminishing returns—so a practical path is many small, domain-specific superintelligences that trade scale for symbolic structure and orchestration, cutting inference costs and enabling on-device deployment.

An Alternative Trajectory for Generative AI
Margarita Belova, Yuval Kansal, Yihao Liang, Jiaxin Xiao, Niraj K. Jha · March 14, 2026
arXiv · theoretical · low evidence · 8/10 relevance
Monolithic LLM scaling is increasingly constrained by physical and economic limits and yields limited general reasoning outside formally abstracted domains, so building small, domain-specific, symbolically grounded 'societies' of specialized models offers a more sustainable and economically promising alternative.

The generative artificial intelligence (AI) ecosystem is undergoing rapid transformations that threaten its sustainability. As models transition from research prototypes to high-traffic products, the energetic burden has shifted from one-time training to recurring, unbounded inference. This is exacerbated by reasoning models that inflate compute costs by orders of magnitude per query. The prevailing pursuit of artificial general intelligence through scaling of monolithic models is colliding with hard physical constraints: grid failures, water consumption, and diminishing returns on data scaling. This trajectory yields models with impressive factual recall but struggles in domains requiring in-depth reasoning, possibly due to insufficient abstractions in training data. Current large language models (LLMs) exhibit genuine reasoning depth only in domains like mathematics and coding, where rigorous, pre-existing abstractions provide structural grounding. In other fields, the current approach fails to generalize well. We propose an alternative trajectory based on domain-specific superintelligence (DSS). We argue for first constructing explicit symbolic abstractions (knowledge graphs, ontologies, and formal logic) to underpin synthetic curricula enabling small language models to master domain-specific reasoning without the model collapse problem typical of LLM-based synthetic data methods. Rather than a single generalist giant model, we envision "societies of DSS models": dynamic ecosystems where orchestration agents route tasks to distinct DSS back-ends. This paradigm shift decouples capability from size, enabling intelligence to migrate from energy-intensive data centers to secure, on-device experts. By aligning algorithmic progress with physical constraints, DSS societies move generative AI from an environmental liability to a sustainable force for economic empowerment.

Summary

Main Finding

The paper argues that the dominant “bigger-is-better” trajectory—training ever-larger generalist LLMs—faces hard physical, economic, and algorithmic limits (energy, water, grid constraints, data quality, diminishing reasoning returns). As an alternative, the authors propose a bottom-up trajectory based on domain-specific superintelligence (DSS): small, specialized models trained on high-quality, explicitly abstracted domain representations (knowledge graphs, ontologies, formal semantics, program libraries) organized into modular “societies” with orchestration agents. This approach promises deeper, verifiable compositional reasoning, far lower training and inference energy footprints, on-device deployment, and broader economic democratization.

Key Points

  • Problem diagnosis of the current trajectory

    • Cost and sustainability: training and (increasingly) inference impose large, growing energy, water, and infrastructure demands concentrated in a few actors.
    • Reasoning gap: large monolithic LLMs show strong factual recall and surface pattern matching but consistently underperform on compositional, verifiable, domain-specific reasoning tasks unless domains already have rigorous abstractions (e.g., math, code).
    • Data quality: Internet-scale corpora are heterogeneous and unstructured; abstraction-first learning is argued to be necessary for reliable generalization.
    • Diminishing returns: scaling improves perplexity but does not guarantee proportional gains on abstract reasoning tasks; benchmarks can be misleading due to contamination and gaming.
  • Core elements of the proposed DSS trajectory

    • Explicit abstractions: construct symbolic structures (KGs, ontologies, formal logics, inductive program libraries) as primary training ingredients to teach compositional rules and causal structure.
    • Synthetic curricula grounded in abstractions: generate targeted, high-quality synthetic training examples from symbolic representations, enabling small models to learn deep domain reasoning.
    • Small Language Models (SLMs) and specialist DSS models: many narrowly focused, data- and abstraction-rich models instead of one giant generalist.
    • Societies of models and orchestration: front-end routing agents decompose queries and dispatch subqueries to specialist back-ends; modular composition integrates results.
    • Edge and on-device inference: capability migrates from energy-intensive datacenters to efficient on-device experts where feasible.
    • Neurosymbolic and modular reasoning engines: combine symbolic and subsymbolic methods to bootstrap and validate reasoning, and to prevent the collapse associated with naive LLM-generated synthetic data loops.
  • Practical and empirical support cited

    • Observations that LLMs display true depth mostly in domains with rigorous abstractions (math, coding; cited successes with theorem-proving hybrids).
    • References to empirical work showing diminishing returns of scaling on multi-step reasoning, and to benchmarks that better capture abstraction (ARC-AGI).
    • Past domain-specific successes (protein folding, specialized scientific models) show specialization can outperform generalists within domains.
  • Proposed evaluation and research directions

    • Benchmarks focused on abstraction and programmatic generalization (ARC-AGI family), energy and deployment analyses, continual learning for agents, and domain case studies (medicine, engineering, education).
    • Metrics should include per-task energy/inference cost, verifiability, compositional generalization, and ability to learn from synthetic curricula grounded in abstractions.
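
The idea of synthetic curricula grounded in abstractions can be illustrated with a minimal sketch. It assumes a toy knowledge graph of (head, relation, tail) triples and generates multi-hop question/answer pairs by walking relation paths, ordered from fewer to more hops. The triples, the function names (`build_index`, `paths`, `make_example`, `curriculum`), and the question template are all hypothetical illustrations, not the authors' pipeline.

```python
# Toy knowledge graph as (head, relation, tail) triples (illustrative only).
TRIPLES = [
    ("Paris", "capital_of", "France"),
    ("France", "located_in", "Europe"),
    ("Lyon", "located_in", "France"),
]


def build_index(triples):
    """Map each head entity to its outgoing (relation, tail) edges."""
    index = {}
    for head, rel, tail in triples:
        index.setdefault(head, []).append((rel, tail))
    return index


def paths(index, start, hops):
    """Enumerate every path of exactly `hops` edges that begins at `start`."""
    if hops == 0:
        return [[]]
    found = []
    for rel, tail in index.get(start, []):
        for rest in paths(index, tail, hops - 1):
            found.append([(start, rel, tail)] + rest)
    return found


def make_example(path):
    """Turn one path into a question, its answer, and a step-by-step rationale."""
    start = path[0][0]
    answer = path[-1][2]
    chain = " -> ".join(rel for _, rel, _ in path)
    return {
        "question": f"Starting from '{start}', follow: {chain}. What do you reach?",
        "answer": answer,
        "rationale": "; ".join(f"{h} {r} {t}" for h, r, t in path),
        "hops": len(path),
    }


def curriculum(triples, max_hops):
    """Generate examples ordered from single-hop to `max_hops`-hop reasoning."""
    index = build_index(triples)
    examples = []
    for hops in range(1, max_hops + 1):
        for head in index:
            for path in paths(index, head, hops):
                examples.append(make_example(path))
    return examples


for example in curriculum(TRIPLES, max_hops=2):
    print(example["hops"], example["question"], "=>", example["answer"])
```

Because every answer is derived from an explicit path in the graph, each example carries a rationale that can be checked against the triples, which is the property the summary attributes to abstraction-grounded data.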

Data & Methods

  • Nature of the work: conceptual / theoretical roadmap rather than a single new empirical experiment. The paper synthesizes:
    • Literature review across scaling laws, benchmarks, neurosymbolic systems, and domain-specific successes.
    • Empirical references: scaling-law studies, performance analyses on reasoning tasks (chain-of-thought, ARC-AGI), and recent systems (e.g., theorem-proving hybrids).
    • Diagnostic evidence on infrastructure and environmental constraints (energy, water, grid impacts) and on inference becoming the dominant cost.
  • Methods proposed (for future instantiation and evaluation):
    • Construction and curation of symbolic abstractions (KGs, ontologies, formal semantics) for target domains.
    • Algorithmic pipelines to generate synthetic curricula from those abstractions (high-quality, targeted examples for training SLMs).
    • Modular architectures: training specialist SLMs/DSS models and designing orchestration agents for routing and composition.
    • Neurosymbolic integration: coupling symbolic engines (theorem provers, program synthesizers) with subsymbolic SLMs.
    • Evaluation frameworks: task-level energy accounting, benchmark suites prioritizing abstraction and compositionality, continual learning experiments, and domain case studies.
  • Limitations noted: the paper does not provide large-scale experimental validation of the full DSS societies concept; many elements are proposals supported by prior domain-specific successes and theoretical arguments.
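
The orchestration and task-level energy accounting ideas listed above can be sketched roughly as follows. This is a toy illustration under stated assumptions: queries are decomposed by splitting on " and ", routing uses keyword overlap, and each back-end carries a made-up `joules_per_call` figure. The `Backend` class, the helper names, and all numbers are hypothetical; a real orchestrator would presumably use a learned planner and measured energy.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Backend:
    name: str
    keywords: Tuple[str, ...]      # hypothetical routing hints
    handler: Callable[[str], str]  # stands in for a specialist model
    joules_per_call: float         # made-up energy figure for illustration


def decompose(query: str) -> List[str]:
    """Naive decomposition: split a compound query on ' and '."""
    return [part.strip() for part in query.split(" and ") if part.strip()]


def route(subquery: str, backends: List[Backend]) -> Optional[Backend]:
    """Pick the back-end with the most keyword hits, or None if nothing matches."""
    text = subquery.lower()
    best, best_score = None, 0
    for backend in backends:
        score = sum(1 for kw in backend.keywords if kw in text)
        if score > best_score:
            best, best_score = backend, score
    return best


def answer(query: str, backends: List[Backend]):
    """Dispatch each subquery and total the estimated energy of the calls made."""
    results = []
    energy = 0.0
    for sub in decompose(query):
        backend = route(sub, backends)
        if backend is None:
            results.append((sub, None, None))  # left unrouted
            continue
        results.append((sub, backend.name, backend.handler(sub)))
        energy += backend.joules_per_call
    return results, energy


BACKENDS = [
    Backend("medicine", ("drug", "dose"), lambda q: f"[medicine model] {q}", 2.0),
    Backend("circuits", ("resistor", "voltage"), lambda q: f"[circuits model] {q}", 1.5),
]

results, energy = answer("check the drug dose and compute the resistor voltage", BACKENDS)
for sub, name, output in results:
    print(name, "<-", sub)
print("estimated energy (J):", energy)
```

Keeping the energy estimate next to the routing decision makes per-task cost a first-class output of the orchestrator, which is the kind of metric the proposed evaluation frameworks call for.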

Implications for AI Economics

  • Changing capital and operating cost structure

    • Reduced centralization: moving from a capital-intensive datacenter model (large training runs, heavy inference in cloud) toward many lighter-weight specialist models lowers upfront capital barriers and ongoing energy costs.
    • Shift in cost focus: higher returns to data engineering, knowledge engineering, and abstraction construction (quality and curation become more valuable than sheer data volume).
    • Lower per-query inference costs: edge/on-device DSS reduces recurring cloud inference expenditure and externalities (electricity, cooling, water).
  • Market and competition effects

    • Democratization and entry: smaller actors and domain experts can build and monetize DSS models tailored to niches, expanding competition and innovation beyond a few hyperscalers.
    • New product and service markets: demand for tools and services for abstraction engineering (KG builders, ontology marketplaces), orchestration platforms, certifiable reasoning engines, and verification services.
    • Platformization of orchestration: business value may shift toward routing/orchestration providers that coordinate specialist models (a new layer of platform competition).
  • Labor and productivity implications

    • Augmentation of domain experts: on-device/domain DSS can raise worker productivity across trades (healthcare, technical services, education) without requiring centralized compute.
    • Reskilling incentives: demand for knowledge engineers, ontology specialists, and domain-data curators grows; less emphasis solely on large-model ML engineers.
  • Regulation, standards, and verification

    • Need for standardized evaluation, verification, and certification processes (verifiability and compositional correctness matter more for DSS deployments, especially in safety-critical domains).
    • Energy and environmental policy alignment: DSS societies align better with sustainability goals, potentially reducing regulatory friction and environmental externalities.
  • Risks and transition challenges

    • Coordination costs: building and orchestrating many specialized models requires standards, APIs, and interoperability investments.
    • Quality and trust: high-quality domain abstractions are labor-intensive; poor abstractions or badly generated synthetic curricula could produce brittle systems.
    • Strategic responses by incumbents: hyperscalers may invest in tooling for abstraction/knowledge engineering or consolidate orchestration layers, affecting competitive dynamics.
    • “Model collapse” concerns: naive synthetic-data bootstrapping (an LLM generating training data for larger LLMs) can degrade quality; the DSS approach claims to avoid this by grounding synthetic data in explicit abstractions.

Overall, the paper reframes the economics of future AI development: instead of escalating returns to scale and centralized capital intensity, it envisions a more modular, data- and abstraction-driven ecosystem where quality, verifiability, and energy-efficiency drive value—potentially redistributing economic opportunity and lowering environmental costs.

Assessment

Paper Type: theoretical

Evidence Strength: low. The paper is primarily conceptual and prescriptive, synthesizing trends and anecdotal/early empirical observations rather than presenting new causal empirical tests; claims about economic and environmental impacts are plausible but not demonstrated with rigorous data or causal identification.

Methods Rigor: low. No systematic empirical design, no pre-registered experiments, and no formal estimation or robustness checks are provided; methodological recommendations are sensible but unimplemented, so claims rest on argumentation and selective evidence rather than rigorous methods.

Sample: No original dataset or representative sample; synthesis draws on published reports and public observations about compute/energy trends, selective LLM benchmark results (not systematically analyzed here), anecdotal infrastructure constraints, and conceptual economic reasoning.

Themes: productivity, org_design

Generalizability:

  • theoretical_without_empirical_validation
  • uncertain_domain_transfer: proposals may work in some structured domains (math, code) but not in ill-structured domains
  • assumes feasibility of building high-quality ontologies/knowledge graphs across many domains
  • sensitive to future ML/compute innovations that could change cost/diminishing-return dynamics
  • institutional and market responses (regulation, firm strategy) are not modeled and may alter outcomes
  • implementation and coordination complexity of multi-agent DSS societies may limit practical scalability

Claims (15)

Claim | Direction | Confidence | Outcome | Details
Scaling monolithic LLMs toward artificial general intelligence (AGI) is colliding with hard physical and economic limits (energy, grid stress, water use, diminishing returns). [Fiscal And Macroeconomic] | negative | medium | feasibility of continued monolithic scaling measured by physical (power, water, cooling capacity) and economic (marginal returns on additional compute/data) constraints | 0.04
The energetic burden of generative AI is shifting from one-time training to recurring, potentially unbounded inference costs as models become productized and high-traffic. [Other] | negative | medium | distribution of energy consumption between training and inference (energy per inference, aggregate inference energy over time) | 0.04
Reasoning-augmented models (e.g., models using chain-of-thought, multi-step reasoning, or external retrieval/looping) can inflate per-query compute by orders of magnitude, exacerbating sustainability problems. [Other] | negative | medium | per-query compute cost and associated energy consumption (compute FLOPs or joules per query) under reasoning augmentation | 0.04
Physical constraints (power grid reliability, water consumption for cooling, and data-center capacity) together with diminishing marginal returns on scaling make continued monolithic scaling economically and environmentally risky. [Fiscal And Macroeconomic] | negative | medium | economic and environmental risk metrics (probability/impact of grid stress, water/resource usage, cost per incremental performance gain) | 0.04
Current LLMs produce deep, reliable reasoning mainly in domains with rigorous, pre-existing abstractions (mathematics, programming) and underperform in domains that lack such formal abstractions. [Output Quality] | mixed | medium | reasoning accuracy and reliability across domains (e.g., test performance on math/code benchmarks vs. open-ended/non-formal domains) | 0.04
A more sustainable and effective trajectory is to build domain-specific superintelligences (DSS) grounded in explicit symbolic abstractions (knowledge graphs, ontologies, formal logic) and trained via synthetic curricula so compact models can learn robust, domain-level reasoning. [Output Quality] | positive | speculative | domain-level reasoning robustness of compact DSS models (task accuracy, generalization) and resource metrics (model size, training/inference energy) | 0.01
Architecturally, replacing single giant generalists with 'societies' of small, specialized DSS models routed by orchestration agents yields operational benefits (routing to experts, modular upgrades, specialization). [Organizational Efficiency] | positive | speculative | end-to-end task success rate, routing efficiency, orchestration overhead, modular upgrade costs | 0.01
DSS societies can achieve much lower inference energy per task and enable easier on-device/edge deployment compared to monolithic LLM deployments. [Organizational Efficiency] | positive | speculative | energy per inference, feasibility of on-device deployment (latency, memory footprint, throughput) and aggregate infrastructure footprint | 0.01
Shifting to DSS changes the cost structure of AI: it lowers recurring OPEX per user by reducing inference energy and enabling local/device processing instead of centralized, inference-heavy cloud services. [Firm Productivity] | positive | speculative | OPEX per user, total cost of ownership, cost-per-task under DSS versus monolithic architectures | 0.01
Specialization enables many niche DSS providers rather than a small number of dominant monolithic providers, thereby lowering entry barriers for vertical experts. [Market Structure] | positive | speculative | market concentration (e.g., Herfindahl index), number of active providers per domain, barriers-to-entry indicators | 0.01
DSS reduces environmental externalities (e.g., emissions, water use) relative to continued monolithic scaling and may reduce regulatory pressure tied to those externalities. [Other] | positive | speculative | emissions (CO2e), water consumption for cooling, regulatory compliance incidents or costs | 0.01
Smaller, verifiable DSS agents are easier to audit and align per domain, potentially reducing systemic risks associated with large opaque generalist models. [Ai Safety And Ethics] | positive | speculative | auditability metrics (time/cost to audit, interpretability scores), alignment failure rates or incident counts | 0.01
Operationalizing DSS requires building domain ontologies/knowledge graphs, designing synthetic curricula, training compact domain models, benchmarking against monolithic LLMs, and measuring total cost-of-ownership (energy, latency, bandwidth, infrastructure). [Training Effectiveness] | null_result | high | validation metrics proposed by the paper (benchmark performance, energy/inference metrics, TCO comparisons) | 0.06
The paper's argument is principally theoretical and prescriptive and requires empirical validation across domains and at scale. [Other] | null_result | high | existence/absence of empirical validation (current lack of cross-domain, large-scale experimental results supporting the claims) | 0.06
Research and funding priorities should reweight toward symbolic/structured knowledge, verification, curricula design, and orchestration algorithms rather than exclusive emphasis on model scale. [Research Productivity] | positive | speculative | research funding allocations, publication trends, and development of tooling for symbolic/structured knowledge and orchestration | 0.01
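
The market-structure claim above names the Herfindahl index as a concentration measure. As a small worked sketch, the index is the sum of squared market shares, here expressed in percentage points so that a monopoly scores 10,000. The function name `hhi` and the share figures are hypothetical and chosen only to show the arithmetic; they are not estimates from the paper.

```python
def hhi(shares):
    """Herfindahl index on the 0 to 10,000 scale.

    `shares` may be any non-negative quantities (revenue, users, queries);
    they are normalized to percentage shares before squaring.
    """
    total = sum(shares)
    if total <= 0:
        raise ValueError("shares must sum to a positive number")
    return sum((100 * s / total) ** 2 for s in shares)


# Hypothetical scenarios, for illustration only.
concentrated = [60, 30, 10]  # three providers, one dominant
fragmented = [10] * 10       # ten equally sized niche providers

print("concentrated:", hhi(concentrated))
print("fragmented:", hhi(fragmented))
```

A lower value for the fragmented scenario is what the claim would predict if specialization does lower entry barriers; the paper itself offers no such measurement.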
