DRL-driven autonomous vehicles raise highway capacity by roughly 7.5% and improve fuel efficiency at high speeds by about 29% compared with a standard car-following model in NGSIM-based simulations. The benefits depend strongly on drivers' time-gap distributions and the share of RL-controlled vehicles, and derive from simulated mixed-traffic experiments rather than real-world trials.

Macroscopic Characteristics of Mixed Traffic Flow with Deep Reinforcement Learning Based Automated and Human-Driven Vehicles

Pankaj Kumar, Pranamesh Chakraborty, Subrahmanya Swamy Peruru · March 26, 2026

arXiv · Descriptive · Medium evidence · Relevance 7/10

In highway mixed-traffic simulations using TD3-trained AV controllers on NGSIM data, replacing human drivers with RL-controlled vehicles raises road capacity by about 7.5% and improves average fuel efficiency substantially at higher speeds (≈29% above 50 km/h, ≈1.9% below 50 km/h) versus an IDM baseline, with outcomes sensitive to driver time-gap distributions and RL penetration rates.

Automated Vehicle (AV) control in mixed traffic, where AVs coexist with human-driven vehicles, poses significant challenges in balancing safety, efficiency, comfort, fuel efficiency, and compliance with traffic rules while capturing heterogeneous driver behavior. Traditional car-following models, such as the Intelligent Driver Model (IDM), often struggle to generalize across diverse traffic scenarios and typically do not account for fuel efficiency, motivating the use of learning-based approaches. Although Deep Reinforcement Learning (DRL) has shown strong microscopic performance in car-following conditions, its macroscopic traffic flow characteristics remain underexplored. This study focuses on analyzing the macroscopic traffic flow characteristics and fuel efficiency of DRL-based models in mixed traffic. A Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm is implemented for AVs' control and trained using the NGSIM highway dataset, enabling realistic interaction with human-driven vehicles. Traffic performance is evaluated using the Fundamental Diagram (FD) under varying driver heterogeneity, heterogeneous time-gap penetration levels, and different shares of RL-controlled vehicles. A macroscopic level comparison of fuel efficiency between the RL-based AV model and the IDM is also conducted. Results show that traffic performance is sensitive to the distribution of safe time gaps and the proportion of RL vehicles. Transitioning from fully human-driven to fully RL-controlled traffic can increase road capacity by approximately 7.52%. Further, RL-based AVs also improve average fuel efficiency by about 28.98% at higher speeds (above 50 km/h), and by 1.86% at lower speeds (below 50 km/h) compared to the IDM. Overall, the DRL framework enhances traffic capacity and fuel efficiency without compromising safety.

Summary

Main Finding

A TD3 (Twin Delayed DDPG) deep reinforcement learning (DRL) longitudinal controller, trained on the NGSIM highway dataset and evaluated in mixed traffic with human drivers modelled by the IDM, improves macroscopic traffic performance and fuel efficiency. Transitioning from fully human-driven to fully RL-controlled traffic raised road capacity by ~7.52%. RL-based AVs increased average fuel efficiency by ≈28.98% at speeds >50 km/h and by ≈1.86% at speeds <50 km/h, while maintaining safety.

Key Points

Model and training
- DRL algorithm: TD3 (continuous action space). Agent action = longitudinal acceleration in [-4, 2] m/s².
- State vector: ego speed, relative velocity, bumper-to-bumper gap, and a safe time-gap parameter T (used to encode driver heterogeneity).
- Reward: weighted sum targeting safety (TTC + large collision penalty), comfort (jerk penalty), driving efficiency (lognormal headway reward tied to desired spacing S* = S0 + vT), speed-adherence penalty, and fuel-efficiency reward.
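
The state, action, and spacing definitions above can be sketched as a single car-following step. This is a minimal illustration, not the authors' environment: the acceleration bounds, the four state variables, and S* = S0 + vT come from the summary, while the standstill gap `s0 = 2.0`, the step length `dt = 0.1`, the sign convention for relative speed, and all function names are assumptions.

```python
from dataclasses import dataclass

A_MIN, A_MAX = -4.0, 2.0  # acceleration bounds from the summary, m/s^2


@dataclass
class FollowerState:
    speed: float      # ego speed, m/s
    rel_speed: float  # leader speed minus ego speed, m/s (assumed sign convention)
    gap: float        # bumper-to-bumper gap, m
    time_gap: float   # safe time-gap parameter T, s


def desired_spacing(speed, time_gap, s0=2.0):
    """S* = S0 + v*T; s0 = 2 m is an assumed standstill gap."""
    return s0 + speed * time_gap


def time_to_collision(state):
    """TTC is finite only while the follower is closing in on the leader."""
    closing_speed = -state.rel_speed
    return state.gap / closing_speed if closing_speed > 0 else float("inf")


def step(state, accel, leader_speed, dt=0.1):
    """Advance the follower by one step of length dt (assumed 0.1 s)."""
    accel = max(A_MIN, min(A_MAX, accel))          # clip the agent's action
    new_speed = max(0.0, state.speed + accel * dt)  # no reversing
    # The gap grows by the leader's travel minus the follower's travel;
    # the leader's speed is held constant over the step for simplicity.
    mean_ego_speed = 0.5 * (state.speed + new_speed)
    new_gap = state.gap + (leader_speed - mean_ego_speed) * dt
    return FollowerState(new_speed, leader_speed - new_speed, new_gap, state.time_gap)


if __name__ == "__main__":
    s = FollowerState(speed=20.0, rel_speed=0.0, gap=30.0, time_gap=1.5)
    s = step(s, accel=5.0, leader_speed=20.0)  # 5.0 is clipped to 2.0
    print(s, desired_spacing(s.speed, s.time_gap), time_to_collision(s))
```

A smaller `time_gap` shrinks the desired spacing at a given speed, which is how the single parameter T can stand in for aggressive versus conservative driving styles.
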
Heterogeneity and scenarios
- Driver heterogeneity modelled primarily via distribution of safe time gaps T (aggressive → small T; conservative → large T).
- Experiments vary: (i) distribution of safe time gaps, (ii) heterogeneous time-gap penetration levels, and (iii) share of RL-controlled vehicles in mixed traffic.
Macroscopic evaluation
- Primary aggregate assessment via Fundamental Diagrams (flow–density–speed relationships).
- Sequential vehicle trajectories generated by the RL controller and IDM used to build FDs and compute macroscopic metrics.
- Results are sensitive to safe time-gap distribution and RL penetration; benefits scale with RL share but depend on heterogeneity.
Fuel modelling
- Instantaneous fuel rate estimated using VT-CPFM-1 (Virginia Tech Comprehensive Power-based Fuel Model, Type 1).
- Vehicle parameters: mass = 2000 kg, and standard aerodynamic/rolling resistance coefficients from the VT-CPFM formulation.
Comparative performance vs IDM
- Capacity: full RL fleet → ~7.52% higher road capacity than full human-driven (IDM) baseline.
- Fuel efficiency: RL yields large macroscopic fuel gains at higher speeds (~29% improvement >50 km/h) and modest gains at lower speeds (~1.86% <50 km/h).
- Safety: framework explicitly penalises low TTC and collisions; authors report no safety compromises in results.
Limitations (implicit in methods/results)
- Sensitivity to time-gap distributions and penetration rates — gains are not uniform.
- Experiments performed in car-following / highway contexts using NGSIM-derived interactions; generalisation to multi-lane, network-level, or different geographies requires further testing.

Data & Methods

Data: NGSIM highway trajectory dataset used to train RL agents and emulate realistic human-driver dynamics.
RL architecture and training
- Algorithm: TD3 (an improved actor-critic method over DDPG addressing function approximation noise and overestimation).
- Action: continuous acceleration control; state includes an explicit T parameter to permit heterogeneous driving styles.
- Reward components and weights: combined safety (TTC with threshold + collision penalty), comfort (jerk penalisation), space-headway efficiency (lognormal PDF reward around desired spacing), speed adherence (quadratic penalty for overspeed), and fuel-efficiency (log-transformed distance-per-fuel metric).
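
For readers unfamiliar with how TD3 addresses overestimation, the critic target can be sketched as below. TD3 combines clipped double-Q learning, target policy smoothing, and delayed actor updates; the sketch shows the first two for a single transition. The callables `target_actor`, `target_q1`, and `target_q2` are hypothetical stand-ins for the target networks, and `gamma`, `noise_std`, and `noise_clip` are common defaults rather than values reported for this paper.

```python
import random

A_MIN, A_MAX = -4.0, 2.0  # acceleration bounds from the summary, m/s^2


def clip(x, lo, hi):
    return max(lo, min(hi, x))


def td3_target(reward, next_state, done, target_actor, target_q1, target_q2,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, rng=random):
    """Critic target for one transition (hyperparameters are assumed defaults)."""
    # Target policy smoothing: perturb the target action with clipped Gaussian noise.
    noise = clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip)
    next_action = clip(target_actor(next_state) + noise, A_MIN, A_MAX)
    # Clipped double-Q: use the smaller of the two target critics' estimates.
    q_next = min(target_q1(next_state, next_action),
                 target_q2(next_state, next_action))
    # No bootstrapping from terminal states (e.g. after a collision).
    return reward + (0.0 if done else gamma * q_next)


if __name__ == "__main__":
    # Toy stand-ins for the target networks, for illustration only.
    actor = lambda s: 0.5
    q1 = lambda s, a: 1.0 - a * a
    q2 = lambda s, a: 0.8 - a * a
    print(td3_target(0.1, next_state=None, done=False,
                     target_actor=actor, target_q1=q1, target_q2=q2))
```

The third ingredient, delayed actor updates, lives in the training loop: the actor and target networks are updated once for every few critic updates.
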
Vehicle dynamics and fuel
- Kinematic updates for speed, gap, and relative velocity with simulation time step ∆T.
- Fuel consumption model: VT-CPFM-1 using driveline power P(t) = max(0, (R(t) + 1.04 m a(t)) / (3600 ηd) × v(t)), where R(t) includes aerodynamic drag and rolling resistance; instantaneous fuel rate rFuel = 0.000341 + 0.0000583 P + 0.000001 P^2.
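
The power and fuel-rate expressions above can be evaluated with a short sketch. The vehicle mass, the 1.04 rotating-mass factor, and the three polynomial coefficients come from the summary. Everything else is an assumption: the resistance terms follow the general VT-CPFM form with illustrative coefficient values (`c_d`, `a_f`, `c_r`, `c1`, `c2`, `rho`), the driveline efficiency `eta_d = 0.92` is a typical value, speed is taken in km/h with power in kW, and the output is assumed to be in litres per second.

```python
G = 9.8066  # gravitational acceleration, m/s^2


def resistance_force(v_kmh, mass_kg=2000.0, grade=0.0,
                     rho=1.2256, c_d=0.30, c_h=1.0, a_f=2.3,
                     c_r=1.75, c1=0.0328, c2=4.575):
    """R(t) in N: aerodynamic drag + rolling resistance + grade (assumed coefficients)."""
    aero = rho / 25.92 * c_d * c_h * a_f * v_kmh ** 2
    rolling = G * mass_kg * c_r / 1000.0 * (c1 * v_kmh + c2)
    return aero + rolling + G * mass_kg * grade


def driveline_power_kw(v_kmh, accel, mass_kg=2000.0, eta_d=0.92):
    """P(t) = max(0, (R + 1.04*m*a) / (3600*eta_d) * v), with v in km/h."""
    force = resistance_force(v_kmh, mass_kg) + 1.04 * mass_kg * accel
    return max(0.0, force / (3600.0 * eta_d) * v_kmh)


def fuel_rate(power_kw, a0=0.000341, a1=0.0000583, a2=0.000001):
    """Instantaneous fuel rate as a quadratic in power (coefficients from the summary)."""
    return a0 + a1 * power_kw + a2 * power_kw ** 2


if __name__ == "__main__":
    for accel in (-2.0, 0.0, 1.0):
        p = driveline_power_kw(100.0, accel)
        print(f"a={accel:+.1f} m/s^2  P={p:6.2f} kW  fuel={fuel_rate(p):.6f}")
```

Because power is floored at zero, braking phases cost only the idle rate `a0`, so a controller that avoids unnecessary acceleration bursts gains the most at higher speeds, where the quadratic term dominates.
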
Aggregate measures
- Fundamental Diagrams constructed from simulated trajectories across scenarios to estimate flow, density, and speed regimes.
- Macroscopic fuel-efficiency computed as distance per unit fuel aggregated over traffic streams and compared between RL and IDM scenarios.
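
To see why the time-gap distribution moves the Fundamental Diagram, a steady-state approximation is enough. The sketch below is not the paper's trajectory-based procedure: it assumes every vehicle holds its desired spacing S* = S0 + vT at a common speed, and the vehicle length `veh_len`, standstill gap `s0`, free-flow speed, and the example time gaps are all hypothetical.

```python
def equilibrium_flow(speed_ms, time_gaps, s0=2.0, veh_len=5.0):
    """Return (density in veh/km, flow in veh/h) for a steady-state platoon."""
    # Each vehicle occupies its own length plus its desired gap S0 + v*T.
    mean_spacing = sum(veh_len + s0 + speed_ms * t for t in time_gaps) / len(time_gaps)
    density = 1000.0 / mean_spacing
    flow = density * speed_ms * 3.6  # veh/km * km/h
    return density, flow


def capacity(time_gaps, free_speed_ms=30.0, s0=2.0, veh_len=5.0):
    """Flow at free-flow speed, where steady-state flow peaks in this simple model."""
    return equilibrium_flow(free_speed_ms, time_gaps, s0, veh_len)[1]


def percent_gain(new, base):
    return (new - base) / base * 100.0


if __name__ == "__main__":
    human_gaps = [1.2, 1.5, 1.8]   # hypothetical heterogeneous drivers
    rl_gaps = [1.3, 1.3, 1.3]      # hypothetical shorter, uniform gaps
    q_h, q_rl = capacity(human_gaps), capacity(rl_gaps)
    print(f"human {q_h:.0f} veh/h, RL {q_rl:.0f} veh/h, gain {percent_gain(q_rl, q_h):.1f}%")
```

In this approximation only the mean time gap matters, so any capacity difference between fleets with the same mean gap would have to come from dynamics that the steady-state picture ignores, which is one reason a trajectory-based evaluation is needed.
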

Implications for AI Economics

Productivity and congestion externalities
- A ~7.5% capacity increase implies reduced congestion and travel time externalities per unit of roadway, raising network throughput and potentially lowering average delay costs for users.
- Capacity gains are penetration-dependent; partial adoption yields smaller, non-linear benefits, so diffusion dynamics matter for realized economic gains.
Energy and environmental impacts
- The ~29% fuel-efficiency gain at higher speeds translates into measurable reductions in fuel expenditure and CO2 emissions for highway travel. Even the modest gain at lower speeds reduces operating costs at scale.
- Aggregate environmental benefits depend on mileage, penetration, and rebound effects (induced demand): improved travel conditions can increase VMT and partially offset per-vehicle fuel gains.
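
When turning these figures into cost or emissions estimates, note that the reported gains are in fuel efficiency (distance per unit fuel), which is not the same as the percentage drop in fuel burned. A minimal conversion, assuming distance travelled is unchanged (the function name is illustrative):

```python
def fuel_reduction_from_efficiency_gain(gain):
    """Fractional drop in fuel per km implied by a fractional rise in km per unit fuel."""
    # Fuel per km is the reciprocal of efficiency: 1 - 1/(1 + gain) = gain / (1 + gain).
    return gain / (1.0 + gain)


if __name__ == "__main__":
    for label, gain in [(">50 km/h", 0.2898), ("<50 km/h", 0.0186)]:
        cut = fuel_reduction_from_efficiency_gain(gain)
        print(f"{label}: +{gain:.2%} efficiency -> -{cut:.2%} fuel per km")
```

Under that assumption, the 28.98% efficiency gain corresponds to roughly 22.5% less fuel per kilometre, and the 1.86% gain to roughly 1.8% less. Any induced increase in VMT would reduce the net saving further.
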
Distributional and market effects
- Benefits accrue unevenly: early adopters, routes with higher highway speeds, or regions with higher RL penetration capture more gains. Planners and policymakers should anticipate heterogeneity in who benefits.
- Value proposition for vehicle manufacturers and fleet operators: fuel savings and capacity improvements together strengthen the business case for deploying DRL-based longitudinal control, particularly in long-haul and fleet contexts.
Regulation, standards, and induced investment
- Safety-aware reward design demonstrates that RL controllers can optimize multiple objectives, but regulators will need performance standards (e.g., minimum TTC behavior, explainability, verifiable tests) before large-scale deployment.
- Infrastructure and policy levers (e.g., incentives for AV adoption, managed lanes, standards for time-gap settings) could accelerate realisation of system-level gains and mitigate negative externalities.
Modelling and policy evaluation
- Macroscopic transport and urban economic models should incorporate RL-driven behavioral primitives (heterogeneous time-gap policies, fuel-response to acceleration profiles) rather than relying solely on classical car-following models (IDM/ACC).
- Cost–benefit analyses of AV deployment must include realistic scaling of effects by penetration, heterogeneity, and potential rebound, and should use empirically-trained RL behaviours (as here) rather than optimistic idealized controllers.
Research and data needs
- Field validation and multi-lane/network experiments are necessary to quantify real-world economic impacts and to evaluate induced demand, safety outcomes, and distributional effects.
- Standardised benchmarks and public datasets (beyond NGSIM) for macroscopic evaluation of learning-based controllers would improve comparability and policy relevance.

Suggested next steps for an economist interested in follow-up work
- Quantify welfare gains from the reported capacity and fuel improvements under realistic penetration trajectories and VMT responses.
- Model induced demand/rebound to estimate net energy and emissions impacts.
- Perform distributional analysis across user groups and geographies to inform targeted deployment or subsidy strategies.
- Incorporate RL-controlled AV behavior into city-/network-level transport-economic models to assess infrastructure investment priorities and regulatory design.

Assessment

Paper Type: descriptive

Evidence Strength: medium. The paper provides systematic simulation evidence comparing a TD3-based DRL controller to a canonical car-following model (IDM) using realistic trajectory data (NGSIM) and macroscopic metrics (Fundamental Diagram, fuel estimates). However, evidence is limited to simulated environments trained on one dataset, sensitive to reward design/hyperparameters, and lacks real-world deployment or causal identification from field interventions.

Methods Rigor: medium. The study uses a state-of-the-art DRL algorithm (TD3), leverages NGSIM empirical trajectories for training, and evaluates macroscopic traffic metrics and fuel consumption across parameter sweeps (driver heterogeneity, penetration rates). Rigor is reduced by likely omitted details (hyperparameter tuning, reward shaping, training stability), potential overfitting to NGSIM scenarios, limited reporting of statistical uncertainty, and simplified assumptions (e.g., vehicle dynamics, lane-change behavior, sensor noise).

Sample: Simulated mixed-traffic experiments where autonomous vehicles are controlled by a TD3 (Twin Delayed Deep Deterministic Policy Gradient) policy trained on the NGSIM highway trajectory dataset; experiments vary RL-vehicle penetration from 0% to 100% and heterogeneous driver safe time-gap distributions, with traffic performance assessed via the Fundamental Diagram and macroscopic fuel-efficiency estimates; comparison baseline is the Intelligent Driver Model (IDM). Details on sample size, training iterations, and exact fuel-consumption model specification are not provided in the summary.

Themes: productivity, adoption

Generalizability
- Trained and evaluated on NGSIM (limited geographic locations, specific traffic conditions and times), which may not reflect broader road networks or driving cultures.
- Results come from simulation, not field deployment; real-world sensing, actuation delays, and unpredictable behaviors could reduce gains.
- Potentially single-lane car-following setup (no or limited lane-change modeling) limits applicability to multi-lane highway and urban settings.
- Fuel-efficiency estimates depend on the chosen consumption model and may not capture engine/transmission heterogeneity or real-world eco-driving tradeoffs.
- Performance is sensitive to reward specification, hyperparameters, and training regime; transferability to other traffic mixes or vehicle types is uncertain.
- Baseline (IDM) calibration details matter; if IDM is not well-calibrated to the same data, relative gains may be overstated.

Claims (9)

Claim	Category	Direction	Confidence	Outcome	Details
Traditional car-following models, such as the Intelligent Driver Model (IDM), often struggle to generalize across diverse traffic scenarios and typically do not account for fuel efficiency.	Other	negative	high	model generalizability and accounting for fuel efficiency	0.18
Deep Reinforcement Learning (DRL) has shown strong microscopic performance in car-following conditions, but its macroscopic traffic flow characteristics remain underexplored.	Other	null_result	high	extent of prior research on macroscopic traffic flow characteristics for DRL models	0.18
This study implements a Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm to control AVs and trains it using the NGSIM highway dataset to enable realistic interaction with human-driven vehicles.	Other	positive	high	method used for AV control (TD3 trained on NGSIM)	0.18
Traffic performance is evaluated using the Fundamental Diagram (FD) under varying driver heterogeneity, heterogeneous time-gap penetration levels, and different shares of RL-controlled vehicles.	Organizational Efficiency	neutral	high	traffic performance (via Fundamental Diagram) under varied heterogeneity and RL penetration	0.18
Traffic performance is sensitive to the distribution of safe time gaps and the proportion of RL vehicles.	Organizational Efficiency	mixed	high	traffic performance (e.g., flow, capacity) sensitivity to time-gap distribution and RL vehicle proportion	0.18
Transitioning from fully human-driven to fully RL-controlled traffic can increase road capacity by approximately 7.52%.	Organizational Efficiency	positive	high	road capacity	7.52% increase 0.18
RL-based AVs improve average fuel efficiency by about 28.98% at higher speeds (above 50 km/h) compared to the IDM.	Organizational Efficiency	positive	high	average fuel efficiency at speeds > 50 km/h	28.98% increase 0.18
RL-based AVs improve average fuel efficiency by about 1.86% at lower speeds (below 50 km/h) compared to the IDM.	Organizational Efficiency	positive	high	average fuel efficiency at speeds < 50 km/h	1.86% increase 0.18
Overall, the DRL framework enhances traffic capacity and fuel efficiency without compromising safety.	Organizational Efficiency	positive	medium	traffic capacity, fuel efficiency, and safety	0.11