DRL-driven autonomous vehicles raise highway capacity by roughly 7.5% and improve fuel efficiency at speeds above 50 km/h by about 29% compared with a standard car-following model (IDM) in NGSIM-based simulations. The benefits depend strongly on drivers' time-gap distributions and the share of RL-controlled vehicles, and derive from simulated mixed-traffic experiments rather than real-world trials.
Automated Vehicle (AV) control in mixed traffic, where AVs coexist with human-driven vehicles, poses significant challenges in balancing safety, efficiency, comfort, fuel efficiency, and compliance with traffic rules while capturing heterogeneous driver behavior. Traditional car-following models, such as the Intelligent Driver Model (IDM), often struggle to generalize across diverse traffic scenarios and typically do not account for fuel efficiency, motivating the use of learning-based approaches. Although Deep Reinforcement Learning (DRL) has shown strong microscopic performance in car-following conditions, its macroscopic traffic flow characteristics remain underexplored. This study focuses on analyzing the macroscopic traffic flow characteristics and fuel efficiency of DRL-based models in mixed traffic. A Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm is implemented for AVs' control and trained using the NGSIM highway dataset, enabling realistic interaction with human-driven vehicles. Traffic performance is evaluated using the Fundamental Diagram (FD) under varying driver heterogeneity, heterogeneous time-gap penetration levels, and different shares of RL-controlled vehicles. A macroscopic level comparison of fuel efficiency between the RL-based AV model and the IDM is also conducted. Results show that traffic performance is sensitive to the distribution of safe time gaps and the proportion of RL vehicles. Transitioning from fully human-driven to fully RL-controlled traffic can increase road capacity by approximately 7.52%. Further, RL-based AVs also improve average fuel efficiency by about 28.98% at higher speeds (above 50 km/h), and by 1.86% at lower speeds (below 50 km/h) compared to the IDM. Overall, the DRL framework enhances traffic capacity and fuel efficiency without compromising safety.
Summary
Main Finding
A TD3 (Twin Delayed DDPG) deep reinforcement learning (DRL) longitudinal controller, trained on the NGSIM highway dataset and evaluated in mixed traffic with human drivers modelled by the IDM, improves macroscopic traffic performance and fuel efficiency. Transitioning from fully human-driven to fully RL-controlled traffic raised road capacity by ~7.52%. RL-based AVs increased average fuel efficiency by ≈28.98% at speeds >50 km/h and by ≈1.86% at speeds <50 km/h, while maintaining safety.
Key Points
- Model and training
  - DRL algorithm: TD3 (continuous action space). Agent action = longitudinal acceleration in [-4, 2] m/s².
  - State vector: ego speed, relative velocity, bumper-to-bumper gap, and a safe time-gap parameter T (used to encode driver heterogeneity).
  - Reward: weighted sum targeting safety (TTC + large collision penalty), comfort (jerk penalty), driving efficiency (lognormal headway reward tied to desired spacing S* = S0 + vT), speed-adherence penalty, and fuel-efficiency reward.
- Heterogeneity and scenarios
  - Driver heterogeneity modelled primarily via the distribution of safe time gaps T (aggressive → small T; conservative → large T).
  - Experiments vary: (i) distribution of safe time gaps, (ii) heterogeneous time-gap penetration levels, and (iii) share of RL-controlled vehicles in mixed traffic.
- Macroscopic evaluation
  - Primary aggregate assessment via Fundamental Diagrams (flow–density–speed relationships).
  - Sequential vehicle trajectories generated by the RL controller and IDM are used to build FDs and compute macroscopic metrics.
  - Results are sensitive to safe time-gap distribution and RL penetration; benefits scale with RL share but depend on heterogeneity.
- Fuel modelling
  - Instantaneous fuel rate estimated using VT-CPFM-1 (Virginia Tech Comprehensive Power-based Fuel Model, Type 1).
  - Vehicle parameters: mass = 2000 kg, and standard aerodynamic/rolling resistance coefficients from the VT-CPFM formulation.
- Comparative performance vs IDM
  - Capacity: full RL fleet → ~7.52% higher road capacity than full human-driven (IDM) baseline.
  - Fuel efficiency: RL yields large macroscopic fuel-efficiency gains at higher speeds (~29% improvement >50 km/h) and modest gains at lower speeds (~1.86% <50 km/h).
  - Safety: framework explicitly penalises low TTC and collisions; authors report no safety compromises in results.
- Limitations (implicit in methods/results)
  - Sensitivity to time-gap distributions and penetration rates: gains are not uniform.
  - Experiments performed in car-following / highway contexts using NGSIM-derived interactions; generalisation to multi-lane, network-level, or different geographies requires further testing.
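
One way to see why capacity is sensitive to the safe time gap T is a steady-state idealisation built on the desired spacing S* = S0 + vT. The sketch below is not the authors' FD procedure, which uses simulated trajectories; the function names, the vehicle length, the standstill gap S0, and the speed are illustrative assumptions.

```python
def equilibrium_point(speed_ms, time_gap_s, s0_m=2.0, veh_len_m=5.0):
    """Return (density in veh/km, flow in veh/h) for identical vehicles in steady state.

    Assumes every vehicle holds the desired bumper-to-bumper gap S* = S0 + v*T.
    s0_m and veh_len_m are illustrative values, not taken from the paper.
    """
    spacing_m = veh_len_m + s0_m + speed_ms * time_gap_s  # front-to-front spacing
    density_per_km = 1000.0 / spacing_m
    flow_per_h = density_per_km * speed_ms * 3.6  # q = k * v, with v converted to km/h
    return density_per_km, flow_per_h


def mixed_flow(speed_ms, time_gaps_s, s0_m=2.0, veh_len_m=5.0):
    """Steady-state flow (veh/h) for a platoon whose vehicles hold different time gaps."""
    spacings = [veh_len_m + s0_m + speed_ms * t for t in time_gaps_s]
    mean_spacing_m = sum(spacings) / len(spacings)
    return 1000.0 / mean_spacing_m * speed_ms * 3.6


if __name__ == "__main__":
    for t in (1.0, 1.5, 2.0):
        k, q = equilibrium_point(speed_ms=30.0, time_gap_s=t)
        print(f"T={t:.1f} s  density={k:.1f} veh/km  flow={q:.0f} veh/h")
```

Under these assumptions a shorter time gap raises the steady-state flow at a given speed, and a mixed platoon behaves like one with the mean time gap. This is consistent with the reported sensitivity to the time-gap distribution, but it does not reproduce the reported 7.52% figure.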
Data & Methods
- Data: NGSIM highway trajectory dataset used to train RL agents and emulate realistic human-driver dynamics.
- RL architecture and training
  - Algorithm: TD3 (an improved actor-critic method over DDPG addressing function approximation noise and overestimation).
  - Action: continuous acceleration control; state includes an explicit T parameter to permit heterogeneous driving styles.
  - Reward components and weights: combined safety (TTC with threshold + collision penalty), comfort (jerk penalisation), space-headway efficiency (lognormal PDF reward around desired spacing), speed adherence (quadratic penalty for overspeed), and fuel-efficiency (log-transformed distance-per-fuel metric).
- Vehicle dynamics and fuel
  - Kinematic updates for speed, gap, and relative velocity with simulation time step ∆T.
  - Fuel consumption model: VT-CPFM-1 using driveline power P(t) = max(0, (R(t) + 1.04 m a(t)) / (3600 ηd) × v(t)), where R(t) includes aerodynamic drag and rolling resistance; instantaneous fuel rate rFuel = 0.000341 + 0.0000583 P + 0.000001 P^2.
- Aggregate measures
  - Fundamental Diagrams constructed from simulated trajectories across scenarios to estimate flow, density, and speed regimes.
  - Macroscopic fuel-efficiency computed as distance per unit fuel aggregated over traffic streams and compared between RL and IDM scenarios.
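
The fuel model above can be sketched in a few lines. The power expression, the 2000 kg mass, and the three fuel-rate coefficients come from the summary; the resistance formula follows the commonly published VT-CPFM form (speed in km/h, acceleration in m/s²), and every drag, rolling-resistance, and driveline-efficiency value below is an illustrative placeholder rather than the paper's setting. `VehicleParams`, `resistance_n`, `power_kw`, and `fuel_rate` are hypothetical names.

```python
from dataclasses import dataclass

GRAVITY = 9.8066  # m/s^2, as used in the VT-CPFM resistance term


@dataclass(frozen=True)
class VehicleParams:
    mass_kg: float = 2000.0  # from the summary
    eta_d: float = 0.92      # driveline efficiency (assumed)
    rho: float = 1.2256      # air density, kg/m^3 (assumed)
    c_d: float = 0.30        # drag coefficient (assumed)
    c_h: float = 1.0         # altitude correction (assumed)
    a_f: float = 2.3         # frontal area, m^2 (assumed)
    c_r: float = 1.75        # rolling-resistance constant (assumed)
    c1: float = 0.0328       # rolling-resistance constant (assumed)
    c2: float = 4.575        # rolling-resistance constant (assumed)


def resistance_n(v_kmh, p, grade=0.0):
    """Total resistance force R(t): aerodynamic drag + rolling resistance + grade."""
    aero = p.rho / 25.92 * p.c_d * p.c_h * p.a_f * v_kmh ** 2
    rolling = GRAVITY * p.mass_kg * p.c_r / 1000.0 * (p.c1 * v_kmh + p.c2)
    slope = GRAVITY * p.mass_kg * grade
    return aero + rolling + slope


def power_kw(v_kmh, a_ms2, p, grade=0.0):
    """Driveline power P(t), clipped at zero as in the summary's expression."""
    raw = (resistance_n(v_kmh, p, grade) + 1.04 * p.mass_kg * a_ms2) / (3600.0 * p.eta_d) * v_kmh
    return max(0.0, raw)


def fuel_rate(power):
    """Instantaneous fuel rate using the coefficients quoted in the summary."""
    return 0.000341 + 0.0000583 * power + 0.000001 * power ** 2


if __name__ == "__main__":
    car = VehicleParams()
    for v, a in ((30.0, 0.0), (100.0, 0.0), (100.0, 1.0), (100.0, -4.0)):
        p_kw = power_kw(v, a, car)
        print(f"v={v:5.1f} km/h  a={a:+.1f} m/s^2  P={p_kw:6.2f}  rFuel={fuel_rate(p_kw):.6f}")
```

Because power is clipped at zero, hard braking falls back to the constant term of the fuel-rate polynomial. That is why smoother acceleration profiles can change aggregate fuel use even when average speed is unchanged.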
Implications for AI Economics
- Productivity and congestion externalities
  - A ~7.5% capacity increase implies reduced congestion and travel time externalities per unit of roadway, raising network throughput and potentially lowering average delay costs for users.
  - Capacity gains are penetration-dependent; partial adoption yields smaller, non-linear benefits, so diffusion dynamics matter for realized economic gains.
- Energy and environmental impacts
  - Significant fuel-efficiency gains at higher speeds (~29%) translate into measurable reductions in fuel expenditures and CO2 emissions for highway travel. Even modest gains at lower speeds reduce operating costs at scale.
  - Aggregate environmental benefits depend on mileage, penetration, and rebound effects (induced demand): improved travel conditions can increase VMT and partially offset per-vehicle fuel gains.
- Distributional and market effects
  - Benefits accrue unevenly: early adopters, routes with higher highway speeds, or regions with higher RL penetration capture more gains. Planners and policymakers should anticipate heterogeneity in who benefits.
  - Value propositions for vehicle manufacturers and fleet operators: fuel savings + capacity improvements strengthen the business case for deploying DRL-based longitudinal control, particularly for long-haul and fleet contexts.
- Regulation, standards, and induced investment
  - Safety-aware reward design demonstrates that RL controllers can optimize multiple objectives, but regulators will need performance standards (e.g., minimum TTC behavior, explainability, verifiable tests) before large-scale deployment.
  - Infrastructure and policy levers (e.g., incentives for AV adoption, managed lanes, standards for time-gap settings) could accelerate realisation of system-level gains and mitigate negative externalities.
- Modelling and policy evaluation
  - Macroscopic transport and urban economic models should incorporate RL-driven behavioral primitives (heterogeneous time-gap policies, fuel-response to acceleration profiles) rather than relying solely on classical car-following models (IDM/ACC).
  - Cost–benefit analyses of AV deployment must include realistic scaling of effects by penetration, heterogeneity, and potential rebound, and should use empirically-trained RL behaviours (as here) rather than optimistic idealized controllers.
- Research and data needs
  - Field validation and multi-lane/network experiments are necessary to quantify real-world economic impacts and to evaluate induced demand, safety outcomes, and distributional effects.
  - Standardised benchmarks and public datasets (beyond NGSIM) for macroscopic evaluation of learning-based controllers would improve comparability and policy relevance.
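
Since fuel efficiency is measured here as distance per unit fuel, a percentage gain in efficiency is not the same as a percentage cut in fuel burned. The sketch below is a simple accounting identity, assuming the gain applies uniformly to all distance travelled; the function name and the rebound input are hypothetical.

```python
def fuel_use_change(efficiency_gain, vmt_increase=0.0):
    """Relative change in total fuel burned.

    efficiency_gain: fractional gain in distance per unit fuel (0.2898 for 28.98%).
    vmt_increase: fractional increase in distance travelled (hypothetical rebound).
    """
    return (1.0 + vmt_increase) / (1.0 + efficiency_gain) - 1.0


if __name__ == "__main__":
    for label, gain in (("above 50 km/h", 0.2898), ("below 50 km/h", 0.0186)):
        print(f"{label}: fuel per km changes by {fuel_use_change(gain):+.2%}")
    # Hypothetical rebound: 10% more distance travelled at the higher-speed gain.
    print(f"with 10% more VMT: {fuel_use_change(0.2898, 0.10):+.2%}")
```

On this identity, a 28.98% efficiency gain corresponds to roughly a 22.5% reduction in fuel per kilometre, and the saving is fully offset only if distance travelled grows by the same 28.98%.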
Suggested next steps for an economist interested in follow-up work
- Quantify welfare gains from the reported capacity and fuel improvements under realistic penetration trajectories and VMT responses.
- Model induced demand/rebound to estimate net energy and emissions impacts.
- Perform distributional analysis across user groups and geographies to inform targeted deployment or subsidy strategies.
- Incorporate RL-controlled AV behavior into city-/network-level transport-economic models to assess infrastructure investment priorities and regulatory design.
Assessment
Claims (9)
| Claim | Category | Direction | Confidence | Outcome | Details |
|---|---|---|---|---|---|
| Traditional car-following models, such as the Intelligent Driver Model (IDM), often struggle to generalize across diverse traffic scenarios and typically do not account for fuel efficiency. | Other | negative | high | model generalizability and accounting for fuel efficiency | 0.18 |
| Deep Reinforcement Learning (DRL) has shown strong microscopic performance in car-following conditions, but its macroscopic traffic flow characteristics remain underexplored. | Other | null_result | high | extent of prior research on macroscopic traffic flow characteristics for DRL models | 0.18 |
| This study implements a Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm to control AVs and trains it using the NGSIM highway dataset to enable realistic interaction with human-driven vehicles. | Other | positive | high | method used for AV control (TD3 trained on NGSIM) | 0.18 |
| Traffic performance is evaluated using the Fundamental Diagram (FD) under varying driver heterogeneity, heterogeneous time-gap penetration levels, and different shares of RL-controlled vehicles. | Organizational Efficiency | neutral | high | traffic performance (via Fundamental Diagram) under varied heterogeneity and RL penetration | 0.18 |
| Traffic performance is sensitive to the distribution of safe time gaps and the proportion of RL vehicles. | Organizational Efficiency | mixed | high | traffic performance (e.g., flow, capacity) sensitivity to time-gap distribution and RL vehicle proportion | 0.18 |
| Transitioning from fully human-driven to fully RL-controlled traffic can increase road capacity by approximately 7.52%. | Organizational Efficiency | positive | high | road capacity | 7.52% increase; 0.18 |
| RL-based AVs improve average fuel efficiency by about 28.98% at higher speeds (above 50 km/h) compared to the IDM. | Organizational Efficiency | positive | high | average fuel efficiency at speeds > 50 km/h | 28.98% increase; 0.18 |
| RL-based AVs improve average fuel efficiency by about 1.86% at lower speeds (below 50 km/h) compared to the IDM. | Organizational Efficiency | positive | high | average fuel efficiency at speeds < 50 km/h | 1.86% increase; 0.18 |
| Overall, the DRL framework enhances traffic capacity and fuel efficiency without compromising safety. | Organizational Efficiency | positive | medium | traffic capacity, fuel efficiency, and safety | 0.11 |