Current AI architectures lack the mechanisms for sustained, autonomous learning; a three-part design that integrates observation, active experimentation and an internal meta-controller could unlock more adaptable, sample-efficient agents and accelerate automation in embodied and social tasks.

Why AI systems don't learn and what to do about it: Lessons on autonomous learning from cognitive science

Emmanuel Dupoux, Yann LeCun, Jitendra Malik · March 16, 2026

arXiv · theoretical · evidence: n/a · relevance: 7/10

A biologically inspired three-part architecture — observation-driven System A, action-driven System B, and meta-control System M — could enable autonomous, sample-efficient lifelong learning that improves transfer, robustness, and adaptability in dynamic real-world environments.

We critically examine the limitations of current AI models in achieving autonomous learning and propose a learning architecture inspired by human and animal cognition. The proposed framework integrates learning from observation (System A) and learning from active behavior (System B) while flexibly switching between these learning modes as a function of internally generated meta-control signals (System M). We discuss how this could be built by taking inspiration from how organisms adapt to real-world, dynamic environments across evolutionary and developmental timescales.

Summary

Main Finding

Current AI models lack the architectures and control mechanisms required for sustained, autonomous learning in dynamic real-world settings. A biologically inspired three-part architecture — System A (learning from observation), System B (learning from active behavior), and System M (internally generated meta-control that flexibly switches between A and B) — can address these limitations. Integrating observation-driven and action-driven learning with meta-control and evolutionary/developmental priors should improve sample efficiency, robustness, transfer, and lifelong adaptation.

Key Points

Limitations of current models
- Heavy reliance on large static datasets and batch training; poor lifelong/continual learning.
- Weak integration between passive observation (supervised/representation learning) and active experimentation (reinforcement/exploratory learning).
- Limited meta-control: models do not autonomously decide when to explore, imitate, consult prior knowledge, or consolidate.
- Poor transfer across domains, brittle in nonstationary environments, and inefficient in physical/embodied tasks.
Proposed architecture
- System A: observation-driven learning (imitation, self-supervised representation learning, inverse RL). Builds models of others, social contingencies, and passive affordances.
- System B: action-driven learning (active exploration, reinforcement learning, hierarchical/skill learning). Learns through intervention, consequences, trial-and-error.
- System M: meta-control signal generator that decides when to prioritize A vs B, how to allocate attention, when to consolidate or query memory. M implements internal criteria (uncertainty, novelty, expected value of information, energy/effort costs) and can be shaped by developmental curricula and evolutionary priors.
- Cross-cutting elements: hierarchical organization, curriculum/bootstrapping, intrinsic motivation (curiosity/empowerment), uncertainty estimation, memory consolidation, and neuromodulatory analogs for gating and plasticity.
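To make the role of System M concrete, here is a minimal Python sketch of one possible switching rule. Everything in it is an assumption for illustration, not the authors' design: the names `MetaSignals` and `choose_mode`, the use of ensemble disagreement as the uncertainty signal, and the linear scoring that weighs uncertainty and novelty against an effort cost. It only shows how internal criteria like those listed above could be turned into a choice between observing (A), acting (B), or consolidating.

```python
from dataclasses import dataclass
from statistics import pvariance


@dataclass
class MetaSignals:
    """Internal signals a System M might monitor (hypothetical fields)."""
    ensemble_predictions: list[float]  # predictions from a small model ensemble for the current situation
    novelty: float                     # e.g. 1 / (1 + visit count), in [0, 1]
    observation_available: bool        # is there a demonstrator or data stream to learn from?
    action_cost: float                 # energy/effort cost of acting, same units as the scores


def uncertainty(predictions: list[float]) -> float:
    """Ensemble disagreement (population variance) as a cheap uncertainty proxy."""
    return pvariance(predictions) if len(predictions) > 1 else 0.0


def choose_mode(s: MetaSignals, observe_weight: float = 0.5) -> str:
    """Return 'A' (observe), 'B' (act) or 'consolidate'.

    Hypothetical rule: the information value of learning here grows with
    uncertainty and novelty. Acting captures all of it but pays action_cost;
    observing is free but captures only observe_weight of it, because the
    agent cannot choose which interventions it gets to see.
    """
    info_value = uncertainty(s.ensemble_predictions) + s.novelty
    score_b = info_value - s.action_cost
    score_a = observe_weight * info_value if s.observation_available else float("-inf")
    if max(score_a, score_b) <= 0.0:
        return "consolidate"  # nothing worth learning now: replay/consolidate memory instead
    return "A" if score_a >= score_b else "B"
```

For example, `choose_mode(MetaSignals([0.1, 0.9, 0.5], novelty=0.8, observation_available=True, action_cost=0.2))` returns `"B"`; raising `action_cost` to 2.0 makes observing the better option and the rule returns `"A"`. In a fuller system the hand-set scores would presumably be replaced by a learned controller such as a gating network.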
Biological inspirations
- Evolution supplies inductive biases and slow structural priors.
- Developmental trajectories scaffold gradual competence (from observation to exploratory action).
- Neuromodulatory systems and meta-decision circuits in animals provide analogies for M.
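One classic animal-learning model gives a concrete picture of surprise-gated plasticity: the Pearce-Hall rule, in which an associability term rises after surprising outcomes and decays when predictions are accurate. The sketch below uses it as a stand-in for a neuromodulatory gate; the paper does not commit to this model, and the function name and parameter values (`kappa`, `gamma`, `alpha0`) are illustrative assumptions.

```python
def pearce_hall(outcomes, kappa=0.5, gamma=0.3, alpha0=1.0):
    """Scalar predictor whose plasticity is gated by recent surprise (Pearce-Hall style).

    kappa: base learning rate; gamma: how fast associability tracks |prediction error|.
    Returns a list of (prediction, associability) pairs, one per outcome.
    """
    v, alpha = 0.0, alpha0
    history = []
    for r in outcomes:
        delta = r - v                                     # prediction error
        v += kappa * alpha * delta                        # update size gated by associability
        alpha = gamma * abs(delta) + (1 - gamma) * alpha  # surprise raises future plasticity
        history.append((v, alpha))
    return history


# Stable outcomes let plasticity decay; a sudden change (1 -> 0) re-opens it.
trace = pearce_hall([1.0] * 50 + [0.0] * 5)
print(f"before switch: alpha={trace[49][1]:.3f}; after switch: alpha={trace[50][1]:.3f}")
```

The design choice worth noting is that the gate modulates how much the learner updates, not what it predicts, which is the role the summary attributes to neuromodulatory analogs.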

Data & Methods

Nature of contribution: a conceptual/theoretical architecture paper rather than an empirical study. Empirical validation is left to future work, via the methods below.
Suggested experimental approaches
- Simulated environments (procedural, nonstationary) for scalable iteration: multi-agent social domains, open-world 3D simulators, long-horizon tasks.
- Embodied robotics experiments for real-world constraints (sample efficiency, physical affordances, motor learning).
- Benchmarks and tasks that require mixing observation and intervention: imitation with sparse feedback, active imitation, transfer under domain shift, continual learning streams.
- Algorithms and mechanism toolkit: imitation/inverse RL, offline RL with uncertainty-aware policy selection, intrinsic reward mechanisms (curiosity, information gain), hierarchical RL, meta-learning (e.g., MAML-style or learned controllers), gating networks and attention for switching, evolutionary algorithms to evolve priors/architectures.
- Evaluation metrics: sample efficiency, generalization across tasks, robustness to distribution shift, autonomy (fraction of learning decisions made internally), transfer speed to novel tasks, lifelong retention (catastrophic forgetting), and safety/constraint adherence.
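Several of these metrics have straightforward operational forms. The sketch below shows one possible version of three of them; the function names, the decision-log format (a list of `'internal'`/`'external'` labels), and the accuracy-matrix convention are assumptions for illustration. The forgetting measure follows a common continual-learning definition (best earlier accuracy on a task minus its final accuracy, averaged over all but the last task), which the paper does not itself specify.

```python
def autonomy_fraction(decision_sources):
    """Share of learning decisions (explore, imitate, consolidate, ...) made internally.

    decision_sources: list of 'internal' / 'external' labels, one per logged decision.
    """
    if not decision_sources:
        raise ValueError("no decisions logged")
    return sum(s == "internal" for s in decision_sources) / len(decision_sources)


def average_forgetting(acc):
    """acc[i][j]: accuracy on task j measured after training on task i (defined for j <= i).

    For each earlier task, forgetting = best accuracy seen before the final task
    minus accuracy after the final task; returns the mean over earlier tasks.
    """
    T = len(acc)
    if T < 2:
        return 0.0
    drops = [max(acc[i][j] for i in range(j, T - 1)) - acc[T - 1][j] for j in range(T - 1)]
    return sum(drops) / len(drops)


def samples_to_threshold(curve, threshold):
    """Sample-efficiency proxy: samples seen when performance first reaches threshold.

    curve: list of (samples_seen, performance) pairs in increasing sample order.
    Returns None if the threshold is never reached.
    """
    for samples_seen, performance in curve:
        if performance >= threshold:
            return samples_seen
    return None
```

For the accuracy matrix `[[0.9], [0.7, 0.8], [0.6, 0.75, 0.85]]`, average forgetting is ((0.9 - 0.6) + (0.8 - 0.75)) / 2 = 0.175.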
Recommended empirical methods for testing meta-control
- Ablation studies disabling M or decoupling A and B.
- Manipulating costs/benefits of observation vs acting to probe switching behavior.
- Validation against human developmental data where available.
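The ablation and cost-manipulation probes can be illustrated on a deliberately tiny toy task, which is an assumption of this sketch rather than anything in the paper: an agent must estimate a hidden value under a fixed cost budget, where observing (A) is cheap but noisy and acting (B) is more precise but has its own cost. The "M" condition picks whichever mode buys more precision per unit cost; the "A" and "B" conditions are the ablations. Sweeping `act_cost` exposes the switching threshold. The noise levels, budget, and function names are hypothetical.

```python
import random
from statistics import mean


def run_episode(mode, theta, budget, obs_cost, act_cost,
                obs_noise=1.0, act_noise=0.3, rng=None):
    """Estimate a hidden value theta from noisy samples bought with a cost budget.

    mode: 'A' (observe only), 'B' (act only) or 'M' (meta-controller picks the
    mode with more precision per unit cost). Returns (absolute error, mode used).
    """
    rng = rng or random.Random()
    if mode == "M":
        precision_per_cost_a = 1.0 / (obs_noise ** 2 * obs_cost)
        precision_per_cost_b = 1.0 / (act_noise ** 2 * act_cost)
        mode = "A" if precision_per_cost_a >= precision_per_cost_b else "B"
    cost, noise = (obs_cost, obs_noise) if mode == "A" else (act_cost, act_noise)
    n = max(1, int(budget // cost))  # samples the budget buys (at least one, even if over budget)
    samples = [theta + rng.gauss(0.0, noise) for _ in range(n)]
    return abs(mean(samples) - theta), mode


def ablation(obs_cost, act_cost, budget=20.0, seeds=range(200)):
    """Mean absolute error of the full agent (M) and the two ablations (A only, B only)."""
    results = {}
    for mode in ("A", "B", "M"):
        errors = []
        for seed in seeds:
            rng = random.Random(seed)  # same seed gives the same theta and random stream per condition
            theta = rng.uniform(-5.0, 5.0)
            err, _ = run_episode(mode, theta, budget, obs_cost, act_cost, rng=rng)
            errors.append(err)
        results[mode] = mean(errors)
    return results


for act_cost in (1.0, 5.0, 20.0, 50.0):  # manipulate the cost of acting
    r = ablation(obs_cost=1.0, act_cost=act_cost)
    print(act_cost, {k: round(v, 3) for k, v in r.items()})
```

With `act_cost=1.0` the M condition chooses acting and matches the B-only ablation; with `act_cost=50.0` it switches to observing and matches A-only. A real System M would switch within an episode and would have to estimate these noise levels rather than being given them.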

Implications for AI Economics

Automation and productivity
- More autonomous learners (able to self-experiment and learn from observation) lower the cost of deploying adaptable agents in physical and social tasks, accelerating automation across more occupations — especially those involving embodied, social, or open-ended problem solving.
- Increased sample efficiency and transfer reduce compute/data costs, lowering barriers to entry for firms and broadening the range of feasible AI applications.
Labor markets and skills
- Faster, more generalist embodied AI could substitute for routine physical and social tasks, changing the allocation of human labor toward oversight, high-level planning, creativity, and tasks requiring flexible social cognition.
- Demand for skills in supervising, integrating, and aligning autonomous learners will rise; retraining needs will shift toward meta-control, robotics, and interdisciplinary work.
Market structure and R&D
- Lower data and compute requirements could decentralize innovation, reducing incumbent advantages tied to massive data/compute. However, the high complexity of embodied systems and real-world testing environments could create new specialized incumbents (robotics platforms, simulation providers).
- Investment incentives: returns to R&D may favor firms that can combine simulation ecosystems, real-world platforms, and long-term developmental curricula.
Data, environments, and property rights
- Value may shift from passive corpora to rich interaction datasets and simulated/real environments. Ownership and control of simulation platforms and testbeds could become strategically important assets.
- A market for shared benchmarks and open environments becomes economically valuable, since such resources lower coordination costs and facilitate innovation.
Policy and regulation
- Need for policies supporting workforce transitions (retraining, portability of skills) and safety/regulation for embodied agents operating in public spaces.
- Antitrust and innovation policy should account for platform effects (control of simulators, training environments).
- Public investment in open environments, robotics testbeds, and safety research can reduce concentration risks and externalities.
Risks and distributional effects
- Rapid deployment of autonomous learners could accelerate displacement in affected sectors, widening inequality if gains concentrate among capital owners or platform providers.
- Misalignment or poor meta-control could produce persistent unsafe behaviors; governance and oversight mechanisms will be crucial.
- International competition dynamics: nations investing in embodied/autonomous learning tech could gain strategic economic advantages in manufacturing, logistics, and services.
Research & policy recommendations
- Fund embodied and developmental research, open benchmarks, and shared simulation environments to democratize access.
- Develop regulatory standards for testing and certifying autonomous learners in safety-critical domains.
- Support labor-market policies for reskilling and social safety nets tied to anticipated displacement patterns.


Assessment

Paper Type: theoretical
Evidence Strength: n/a. This is a conceptual/theoretical architecture proposal without empirical tests or causal identification; claims are plausible hypotheses but not validated by experiments or observational analysis.
Methods Rigor: n/a. The paper offers a well-structured conceptual synthesis drawing on biological and AI literatures and proposes concrete experimental directions, but it does not implement, formalize, or empirically evaluate the proposed architecture.
Sample: No empirical sample or dataset used; the contribution is a conceptual framework synthesizing findings from neuroscience, developmental psychology, and existing ML/RL/IRL literature and proposing experimental paradigms and benchmarks for future validation.
Themes: productivity, labor_markets, innovation, adoption, human_ai_collab
Generalizability:
- Claims are theoretical and not empirically validated, so practical performance and scalability in real-world systems are unknown.
- Biological analogies (evolutionary/developmental priors, neuromodulation) may not transfer directly to engineered systems.
- Effectiveness may vary strongly by domain: likely more relevant to embodied, social, and open-world tasks than to narrow perception-only tasks.
- Hardware and sample-efficiency constraints in real robots could limit applicability relative to simulated environments.
- Economic implications are contingent on adoption rates, regulatory responses, and complementarities with existing infrastructure and platforms.

Claims (30)

Claim	Category	Direction	Confidence	Outcome	Details
Current AI models lack the architectures and control mechanisms required for sustained, autonomous learning in dynamic real-world settings.	Research Productivity	negative	medium	ability to sustain autonomous learning in dynamic real-world environments	0.01
A biologically inspired three-part architecture (System A: observation-driven learning; System B: action-driven learning; System M: internally generated meta-control) can address these limitations.	Research Productivity	positive	speculative	sample efficiency; robustness; transfer; lifelong adaptation	0.0
Integrating observation-driven and action-driven learning with meta-control and evolutionary/developmental priors should improve sample efficiency, robustness, transfer, and lifelong adaptation.	Research Productivity	positive	speculative	sample efficiency; robustness to distribution shift; cross-domain transfer; lifelong retention/adaptation	0.0
Current models heavily rely on large static datasets and batch training and exhibit poor lifelong/continual learning.	Research Productivity	negative	high	continual learning performance; dependence on dataset size and batch training	0.02
There is weak integration between passive observation (supervised/representation learning) and active experimentation (reinforcement/exploratory learning) in current systems.	Research Productivity	negative	medium	performance on mixed observation-action tasks; ability to combine passive and active learning signals	0.01
Current models have limited meta-control and do not autonomously decide when to explore, imitate, consult prior knowledge, or consolidate.	Research Productivity	negative	medium	autonomy in meta-decisions (e.g., fraction of exploration/imitative acts chosen internally)	0.01
Current models transfer poorly across domains, are brittle in nonstationary environments, and are inefficient in physical/embodied tasks.	Research Productivity	negative	medium	cross-domain generalization; robustness under nonstationarity; sample efficiency in embodied tasks	0.01
System A (observation-driven learning) should build models of others, social contingencies, and passive affordances through imitation, self-supervised representation learning, and inverse RL.	Research Productivity	positive	speculative	quality of models learned from observation; accuracy of inferred social contingencies; representation quality	0.0
System B (action-driven learning) should learn through intervention, consequences, and trial-and-error, using active exploration, reinforcement learning, and hierarchical/skill learning.	Research Productivity	positive	speculative	efficacy of skills learned through action (task success rates; learning speed from intervention)	0.0
System M (meta-control) should generate internal signals that decide when to prioritize A vs B, allocate attention, consolidate memory, and trade off uncertainty, novelty, expected information value, and effort costs.	Research Productivity	positive	speculative	accuracy/effectiveness of switching decisions; overall learning efficiency when controlled by M	0.0
Cross-cutting elements (hierarchical organization, curriculum/bootstrapping, intrinsic motivation, uncertainty estimation, memory consolidation, neuromodulatory analogs) are important for improving learning in the proposed architecture.	Research Productivity	positive	speculative	improvements in sample efficiency, robustness, transfer when these elements are incorporated	0.0
Evolution supplies inductive biases and slow structural priors that can be leveraged in artificial learners.	Research Productivity	positive	medium	effect of structural priors on learning speed and generalization	0.01
Developmental trajectories can scaffold gradual competence (from observation to exploratory action) and should be reflected in training curricula.	Research Productivity	positive	medium	learning progression speed; final competence given staged curricula	0.01
Neuromodulatory systems and meta-decision circuits in animals provide analogies for implementing meta-control (M) in artificial systems.	Research Productivity	positive	medium	effectiveness of biologically inspired gating/plasticity mechanisms on learning outcomes	0.01
This paper is a conceptual/theoretical architecture proposal rather than an empirical study; empirical validation should follow via suggested experiments.	Research Productivity	null_result	high	N/A (no empirical outcomes reported)	0.02
Simulated environments (procedural, nonstationary), multi-agent social domains, and open-world 3D simulators are appropriate for scalable iteration to test the proposed architecture.	Research Productivity	positive	medium	suitability and scalability of simulation platforms for architecture evaluation	0.01
Embodied robotics experiments are necessary to evaluate real-world constraints such as sample efficiency, physical affordances, and motor learning.	Research Productivity	positive	medium	sample efficiency and performance in real-world embodied tasks	0.01
Benchmarks and tasks that mix observation and intervention (imitation with sparse feedback, active imitation, transfer under domain shift, continual learning streams) are required to evaluate the architecture.	Research Productivity	positive	medium	benchmark performance on mixed observation-intervention tasks	0.01
Evaluation metrics for the architecture should include sample efficiency, generalization across tasks, robustness to distribution shift, autonomy (fraction of learning decisions made internally), transfer speed, lifelong retention, and safety/constraint adherence.	Research Productivity	null_result	high	listed evaluation metrics (sample efficiency; generalization; robustness; autonomy; transfer speed; lifelong retention; safety compliance)	0.02
Ablation studies disabling System M or decoupling Systems A and B will help test whether meta-control provides empirical benefits.	Research Productivity	null_result	medium	performance difference with/without M; switching/adaptation behavior	0.01
Manipulating costs and benefits of observation versus action in experiments can probe the switching behavior driven by System M.	Research Productivity	null_result	medium	switching thresholds; allocation of observation vs action; resultant task performance	0.01
More autonomous learners that can self-experiment and learn from observation will lower deployment costs for adaptable agents and accelerate automation across more occupations, especially embodied and social tasks.	Automation Exposure	positive	speculative	cost of deploying adaptable agents; rate of automation adoption across occupations	0.0
Increased sample efficiency and transfer will reduce compute and data costs, lowering barriers to entry for firms and broadening feasible AI applications.	Firm Productivity	positive	speculative	compute/data cost per task; market entry rates for firms	0.0
Faster, more generalist embodied AI could substitute for routine physical and social tasks, shifting human labor toward oversight, high-level planning, creativity, and flexible social cognition roles.	Job Displacement	negative	speculative	occupational substitution rates; changes in labor demand composition	0.0
Lower data and compute requirements could decentralize innovation (reducing incumbent advantages tied to massive compute/data), but the complexity of embodied systems and real-world testing could create new specialized incumbents (robotics platforms, simulation providers).	Market Structure	mixed	speculative	market concentration metrics; emergence of specialized incumbents; level of decentralization in R&D	0.0
Value in the AI ecosystem may shift from passive text/image corpora toward rich interaction datasets and simulated/real environments; ownership and control of simulation platforms and testbeds could become strategically important assets.	Market Structure	positive	speculative	asset valuations for simulation/testbed providers; transaction volumes for interaction datasets	0.0
There is a need for policies supporting workforce transitions (retraining, portability of skills) and safety/regulation for embodied agents operating in public spaces.	Governance and Regulation	positive	medium	policy adoption; retraining program coverage; safety/regulatory frameworks implemented	0.01
Rapid deployment of autonomous learners could accelerate displacement in affected sectors and widen inequality if gains concentrate among capital owners or platform providers.	Inequality	negative	speculative	displacement rates; inequality measures (e.g., Gini); concentration of gains	0.0
Misalignment or poor meta-control could produce persistent unsafe behaviors in autonomous learners; governance and oversight mechanisms will be crucial.	AI Safety and Ethics	negative	medium	frequency and severity of unsafe behaviors; successful governance interventions	0.01
Public investment in open environments, robotics testbeds, and safety research can reduce concentration risks and externalities and democratize access to embodied AI research.	Governance and Regulation	positive	speculative	accessibility of research infrastructure; distribution of research capabilities across institutions	0.0