AI-generated reward functions for building controllers improve modeled comfort equity and cut energy use by 3.2%, with the largest comfort gains for elderly females after iterative refinement; however, results are simulation-based and demographic gaps remain.
Large language models (LLMs) have demonstrated promising capability in generating reward functions for deep reinforcement learning (DRL)-based building energy management. However, their potential to exhibit or exacerbate disparities in occupant comfort across heterogeneous demographic populations remains unexplored. We present OccuReward, a framework investigating how LLM-mediated reward design affects demographic equity. Our contribution is three-fold: the introduction of the Comfort Equity Index (CEI) as a novel feedback signal; a methodology for iterative, equity-aware LLM reward shaping; and a performance analysis of DRL agents under these refined objectives. Utilizing four empirically grounded occupant profiles from the ASHRAE Global Thermal Comfort Database II (13,440 votes), we deploy a Soft Actor-Critic agent in CityLearn v2. Our approach employs the Gemini API to generate reward function logic and weights--rather than performing per-step inference--across three refinement rounds. Results across 15 experimental runs reveal that elderly female occupants consistently experience the lowest satisfaction in initial rounds. By Round 3, equity-aware LLM refinement activates specific reward components that improve satisfaction for Young Males (+17.6%), Mid-aged Females (+28.2%), Health Sensitive (+53.8%), and Elderly Females (+567%), while simultaneously reducing energy costs by 3.2%. Our findings highlight that while reward-level intervention significantly improves equity, demographic disparities in AI-driven controllers persist, necessitating further research into algorithmic fairness in building systems.
Summary
Main Finding
LLM-guided, iterative reward shaping (OccuReward) can materially improve demographic equity in DRL-based building energy control while also modestly reducing energy cost. By introducing a Comfort Equity Index (CEI) and feeding per-profile satisfaction back to an LLM (Gemini 1.5), the authors re-weighted reward components (notably Solar and SoC utilization) and raised all four empirically grounded occupant profiles above a 0.5 satisfaction threshold. The largest gain was for Elderly Females (satisfaction 0.12 → 0.80, +567%), and overall energy cost fell by 3.2% in the equity-aware round. However, residual disparities remain due to structural environmental limits (e.g., HVAC/setpoint constraints), indicating reward-level intervention alone is insufficient for full parity.
Key Points
- Contributions
  - Introduces the Comfort Equity Index (CEI), an inequity metric derived as 1 − Jain’s fairness index applied to per-profile comfort satisfaction.
  - Proposes an iterative LLM-in-the-loop reward-shaping method that uses CEI and per-profile feedback to refine DRL reward functions.
  - Empirically analyzes distributional outcomes of LLM-generated rewards in CityLearn v2 and documents both successes and limits.
- Quantitative outcomes (Round 1 → Round 3)
  - Elderly Female satisfaction: 0.12 → 0.80 (+567%)
  - Young Male: 0.85 → 1.00 (+17.6%)
  - Mid-aged Female: 0.78 → 1.00 (+28.2%)
  - Health Sensitive: 0.65 → 1.00 (+53.8%)
  - Energy cost: normalized metric improved by 3.2% after equity-aware refinement.
  - CEI: 0.19 (Rounds 1–2) → 0.0082 (Round 3), i.e., much lower inequity.
- Mechanism of improvement
  - The LLM rebalanced reward weights (increasing Solar/SoC incentives), enabling the agent to leverage local generation/storage to meet tighter thermal demands without a pure cost trade-off.
- Persistent limits
  - Structural constraints of the simulation (ambient temperature ranges, HVAC capacity, lack of setpoint control) prevent full parity: reward shaping hit a “comfort ceiling.”
- Robustness & scope
  - Experiments used five random seeds per round (five seeds × three rounds = 15 runs). Results are specific to CityLearn Phase 1 schema and one LLM (Gemini 1.5); small sample size for the Elderly Female profile (n=298) is noted.
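The reported CEI values and percentage gains can be cross-checked from the per-profile satisfaction means listed above. The following is a minimal sketch, not the authors' code: the function names are hypothetical, and it applies Jain's fairness index to the rounded means quoted in this summary, whereas the paper may have computed CEI per seed before averaging.

```python
def jain_index(scores):
    """Jain's fairness index: (sum s)^2 / (n * sum s^2)."""
    sum_sq = sum(s * s for s in scores)
    if sum_sq == 0:
        raise ValueError("Jain's index is undefined when all scores are zero")
    return sum(scores) ** 2 / (len(scores) * sum_sq)


def comfort_equity_index(scores):
    """CEI = 1 - J, so 0 means perfectly equal satisfaction."""
    return 1.0 - jain_index(scores)


def relative_gain_pct(before, after):
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before


# Rounded per-profile satisfaction means quoted in the Key Points.
ROUND_1 = {"Young Male": 0.85, "Elderly Female": 0.12,
           "Mid-aged Female": 0.78, "Health Sensitive": 0.65}
ROUND_3 = {"Young Male": 1.00, "Elderly Female": 0.80,
           "Mid-aged Female": 1.00, "Health Sensitive": 1.00}

cei_r1 = comfort_equity_index(list(ROUND_1.values()))
cei_r3 = comfort_equity_index(list(ROUND_3.values()))
gains = {name: relative_gain_pct(ROUND_1[name], ROUND_3[name])
         for name in ROUND_1}

print(f"CEI Round 1: {cei_r1:.2f}")  # 0.19
print(f"CEI Round 3: {cei_r3:.4f}")  # 0.0082
for name, gain in gains.items():
    print(f"{name}: {gain:+.1f}%")
```

Under these assumptions the recomputed values agree with the figures reported in the summary, which suggests the headline CEI numbers are consistent with the per-profile means.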
Data & Methods
- Occupant data
  - Source: ASHRAE Global Thermal Comfort Database II.
  - Filtered records: 13,440 valid votes (after filtering age, sex, comfort).
  - Four empirical profiles constructed with preferred temperature ranges from IQR of comfortable records:
    - Young Male (18–35, n=4,503): 23.6–27.7°C, flexibility ±2.0°C
    - Elderly Female (65–95, n=298): 21.3–23.8°C, flexibility ±1.0°C
    - Mid-aged Female (40–55, n=1,504): 21.9–26.2°C, flexibility ±1.5°C
    - Health Sensitive (45–60, n=3,798): 22.1–27.9°C, flexibility ±0.5°C
- Comfort satisfaction function
  - Profile satisfaction S_i(T) = 1.0 if T ∈ preferred range; otherwise decreases with distance from range boundary scaled by a profile-specific flexibility parameter.
- Comfort Equity Index (CEI)
  - CEI = 1 − J, where J = (Σ_i S_i)² / (n · Σ_i S_i²) is Jain’s fairness index applied to the n per-profile satisfaction scores. CEI ∈ [0,1], with 0 = perfect equity; for non-negative scores J ≥ 1/n, so with four profiles CEI cannot exceed 0.75.
- Simulation & agent
  - Environment: CityLearn v2.1.2, 5-building residential district (2022 Phase 1).
  - Agent: Soft Actor-Critic (Stable-Baselines3), MLP 2×256, lr = 3e-4, batch = 256, trained 50,000 timesteps.
  - Baseline: Rule-Based Controller (RBC) with fixed seasonal setpoints; energy KPIs normalized to RBC.
- LLM-in-the-loop reward shaping
  - LLM: Gemini 1.5 Flash API. LLM generates Python reward logic and absolute coefficients (reward evaluated offline during training; no per-step calls).
  - Reward form: R = −Σ_j w_j · d_KPI_j (weighted sum of normalized KPIs).
  - Three rounds:
    - Round 1: LLM given energy objectives only (equity_weight = 0).
    - Round 2: LLM refines for energy KPIs only (naive refinement).
    - Round 3: Equity-aware LLM prompt includes energy KPIs + CEI + per-profile satisfaction; LLM permitted to include equity weight (equity_weight ≈ 0.15).
- Evaluation
  - Five random seeds per round; reported means ± std. Metrics: per-profile satisfaction, CEI, normalized energy cost.
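The satisfaction function and reward form described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the summary does not give the exact decay shape, so a linear decay that reaches zero at one flexibility width outside the preferred range is assumed; the way the equity term enters the reward (a CEI penalty scaled by `equity_weight`) is also an assumption; and the names `Profile`, `satisfaction`, `shaped_reward`, and the KPI keys are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    name: str
    t_low: float        # lower bound of preferred range, deg C
    t_high: float       # upper bound of preferred range, deg C
    flexibility: float  # tolerance outside the range, deg C


# Preferred ranges and flexibility values as listed in the summary.
PROFILES = [
    Profile("Young Male", 23.6, 27.7, 2.0),
    Profile("Elderly Female", 21.3, 23.8, 1.0),
    Profile("Mid-aged Female", 21.9, 26.2, 1.5),
    Profile("Health Sensitive", 22.1, 27.9, 0.5),
]


def satisfaction(profile, temp_c):
    """1.0 inside the preferred range, decaying with distance outside it."""
    if profile.t_low <= temp_c <= profile.t_high:
        return 1.0
    if temp_c < profile.t_low:
        distance = profile.t_low - temp_c
    else:
        distance = temp_c - profile.t_high
    # ASSUMPTION: linear decay, clipped at 0 (the source gives no exact form).
    return max(0.0, 1.0 - distance / profile.flexibility)


def comfort_equity_index(scores):
    """CEI = 1 - Jain's index; all-zero scores are treated as equal (0.0)."""
    sum_sq = sum(s * s for s in scores)
    if sum_sq == 0.0:
        return 0.0
    return 1.0 - sum(scores) ** 2 / (len(scores) * sum_sq)


def shaped_reward(kpis, weights, scores, equity_weight=0.0):
    """R = -sum_j w_j * KPI_j, minus an ASSUMED equity penalty on CEI."""
    energy_penalty = sum(weights[k] * kpis[k] for k in weights)
    return -energy_penalty - equity_weight * comfort_equity_index(scores)


# Usage: score one indoor temperature for every profile, then compare
# the energy-only reward with an equity-aware one.
scores = [satisfaction(p, 24.0) for p in PROFILES]
kpis = {"cost": 1.0, "carbon": 0.5}     # hypothetical normalized KPIs
weights = {"cost": 0.6, "carbon": 0.4}  # hypothetical LLM-chosen weights
print(shaped_reward(kpis, weights, scores))
print(shaped_reward(kpis, weights, scores, equity_weight=0.15))
```

With `equity_weight = 0` the function reduces to the energy-only reward of Rounds 1 and 2; a positive weight makes any inequity in per-profile satisfaction lower the reward, which is one plausible reading of the Round 3 setup.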
Implications for AI Economics
- Distributional externalities matter in building AI
  - Aggregate energy/cost optimization can generate uneven welfare outcomes across demographic groups. Economic evaluations of AI controllers should include distributional metrics (like CEI) in cost–benefit analyses.
- LLMs can discover Pareto-improving trade-offs
  - The study shows an instance where integrating equity feedback led to both better fairness and modest cost savings (3.2%). This suggests market value for equity-aware controllers: improved social welfare with limited or even negative marginal cost.
- Policy and procurement
  - Regulators and building owners should require algorithmic audits for distributional impacts and include occupant-equity KPIs (CEI or similar) in procurement/specification for intelligent building management systems (BMS).
- Product and market opportunities
  - There is a potential commercial niche for “equity-aware” control products and services (LLM-assisted reward engineering + setpoint-control integration) that can be marketed on welfare and compliance grounds.
- Limits and further investment needs
  - Reward-level interventions can’t fully substitute for physical or architectural control capabilities (e.g., zone-specific setpoint authority). Economists estimating benefits should account for required capital/retrofit investment to unlock full equity gains.
- Research and evaluation recommendations
  - Broader validation needed: multi-LLM comparisons, diverse environments, longer training and deployment horizons, and richer occupant sampling (address small-n subgroups).
  - Incorporate CEI (or comparable distributional metrics) into welfare-weighted social welfare functions when valuing energy-efficiency interventions.
- Cautions: Goodhart’s law and robustness
  - Reward engineering is susceptible to reward hacking; continuous monitoring and multi-metric governance are necessary to avoid gaming and unintended regressions in other KPIs.
Summary recommendation for AI economists: incorporate distributional measures like CEI into the economic appraisal of control AI, evaluate both reward-level and system-level interventions (including capital upgrades for setpoint control), and account for the non-negligible social welfare gains that equity-aware optimization can unlock alongside modest cost improvements.
Assessment
Claims (12)
| Claim | Category | Direction | Confidence | Outcome | Details |
|---|---|---|---|---|---|
| We introduce the Comfort Equity Index (CEI) as a novel feedback signal. | Inequality | positive | high | Comfort Equity Index (CEI) | 0.48 |
| We use four empirically grounded occupant profiles from the ASHRAE Global Thermal Comfort Database II (13,440 votes). | Other | null_result | high | occupant profile representation (number of votes in dataset) | n=13440; 0.8 |
| We deploy a Soft Actor-Critic (SAC) agent in CityLearn v2 for experiments. | Other | null_result | high | DRL agent deployment (method) | 0.8 |
| We employ the Gemini API to generate reward function logic and weights across three refinement rounds rather than performing per-step inference. | Other | null_result | high | LLM-mediated reward generation method | n=3; 0.8 |
| Results across 15 experimental runs reveal that elderly female occupants consistently experience the lowest satisfaction in initial rounds. | Consumer Welfare | negative | high | occupant satisfaction (per demographic group) | n=15; 0.48 |
| By Round 3, equity-aware LLM refinement improves satisfaction for Young Males (+17.6%). | Consumer Welfare | positive | high | satisfaction for Young Males | n=15; +17.6%; 0.48 |
| By Round 3, equity-aware LLM refinement improves satisfaction for Mid-aged Females (+28.2%). | Consumer Welfare | positive | high | satisfaction for Mid-aged Females | n=15; +28.2%; 0.48 |
| By Round 3, equity-aware LLM refinement improves satisfaction for Health Sensitive (+53.8%). | Consumer Welfare | positive | high | satisfaction for Health Sensitive occupants | n=15; +53.8%; 0.48 |
| By Round 3, equity-aware LLM refinement improves satisfaction for Elderly Females (+567%). | Consumer Welfare | positive | high | satisfaction for Elderly Females | n=15; +567%; 0.48 |
| By Round 3, equity-aware LLM refinement reduces energy costs by 3.2%. | Organizational Efficiency | positive | high | energy costs | n=15; 3.2%; 0.48 |
| Reward-level intervention (via equity-aware LLM refinement) significantly improves equity, but demographic disparities in AI-driven controllers persist. | Inequality | mixed | high | equity in occupant comfort across demographic groups | n=15; 0.48 |
| LLM-mediated reward design can affect demographic equity in occupant comfort (i.e., LLM reward shaping has the potential to exhibit or exacerbate disparities). | Inequality | mixed | medium | demographic equity in occupant comfort | n=15; 0.29 |