Telling AI exactly how to act, rather than vaguely appealing to it to 'be green', substantially cuts the energy footprint of a GenAI research workflow without changing results. The study shows that operational constraints and decision-rule prompts are a practical human-in-the-loop lever for aligning GenAI productivity with environmental efficiency.
Generative artificial intelligence (GenAI) is increasingly used to write and refactor research code, expanding computational workflows. At the same time, Green AI research has largely measured the footprint of models rather than the downstream workflows in which GenAI is a tool. We shift the unit of analysis from models to workflows and treat prompts as decision policies that allocate discretion between researcher and system, governing what is executed and when iteration stops. We contribute in two ways. First, we map the recent Green AI literature into seven themes: training footprint is the largest cluster, while inference efficiency and system-level optimisation are growing rapidly, alongside measurement protocols, green algorithms, governance, and security and efficiency trade-offs. Second, we benchmark a modern economic survey workflow, an LDA-based literature mapping implemented with GenAI-assisted coding and executed in a fixed cloud notebook, measuring runtime and estimated CO2e with CodeCarbon. Injecting generic green language into prompts has no reliable effect, whereas operational constraints and decision-rule prompts deliver large and stable footprint reductions while preserving decision-equivalent topic outputs. The results identify human-in-the-loop governance as a practical lever to align GenAI productivity with environmental efficiency.
Summary
Main Finding
Treating generative-AI (GenAI) prompts as decision policies and shifting the unit of analysis from models to researcher workflows reveals that prompt-level governance (operational constraints and decision rules) can substantially reduce runtime and CO2e of GenAI-assisted computational research while preserving decision-equivalent outputs. Generic “green” wording in prompts does not reliably reduce footprint; explicit operational constraints and stopping/decision rules do.
Key Points
- Unit of analysis shift: move from measuring model footprints to measuring downstream workflows in which models are tools; prompts are conceptualized as decision policies that allocate discretion between researcher and system and determine execution and stopping.
- Literature mapping: Green AI literature clusters into seven themes:
  - Training footprint (largest cluster)
  - Inference efficiency (rapidly growing)
  - System-level optimisation (rapidly growing)
  - Measurement protocols
  - Green algorithms
  - Governance
  - Security and efficiency trade-offs
- Empirical benchmark: a modern economic-survey workflow (LDA-based literature mapping) implemented with GenAI-assisted coding in a fixed cloud notebook.
- Measurement: runtime and estimated CO2e were measured with CodeCarbon.
- Prompt interventions tested:
  - Adding generic green language to prompts (no reliable effect).
  - Adding operational constraints (e.g., resource/time limits) and explicit decision/stopping rules (large, stable reductions).
- Output quality: constrained/decision-rule prompts maintained decision-equivalent topic outputs (i.e., substantive outputs useful for the same research decisions).
- Practical leverage: human-in-the-loop governance—designing prompts as part of workflow governance—emerges as an effective, low-friction intervention to align productivity and environmental efficiency.
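The contrast between the tested prompt conditions can be sketched with hypothetical templates. None of the wording below is taken from the paper; it is illustrative only:

```python
# Hypothetical prompt templates for the three tested conditions
# (assumed wording, for illustration; not the paper's actual prompts).

BASELINE = (
    "Write Python code that fits an LDA topic model to this corpus "
    "and reports the top words per topic."
)

# Generic green wording: an appeal with nothing checkable attached.
GENERIC_GREEN = BASELINE + " Please be environmentally friendly and energy efficient."

# Operational constraints plus an explicit stopping rule: limits the
# system can act on, and a criterion for when iteration ends.
OPERATIONAL = BASELINE + (
    " Constraints: use at most 200 LDA iterations, subsample at most "
    "10,000 documents, cache intermediate results, and stop refining "
    "once topic coherence improves by less than 1% between runs."
)
```

The point of the contrast: only the third variant encodes limits and a stopping criterion the system can actually act on; the second merely expresses intent, which matches the finding that generic wording has no reliable effect.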
Data & Methods
- Use case: an LDA-based literature mapping workflow typical for an economic survey, executed in a single, fixed cloud notebook to control the execution environment.
- GenAI assistance: used for coding and workflow automation (prompt-driven interactions).
- Interventions: compared baseline prompts, prompts with added generic “green” wording, and prompts that embed explicit operational constraints and decision/stopping rules.
- Metrics:
  - Runtime (execution time in the cloud notebook).
  - Estimated CO2e via CodeCarbon (emissions estimator tied to cloud compute usage).
  - Topic-output equivalence, assessed to determine whether reduced-run workflows produced substantively equivalent LDA topic maps.
- Analysis: benchmarked footprints across prompt conditions to quantify effect sizes and stability of reductions.
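The topic-output equivalence check could be operationalized, for example, as top-word overlap between matched topics across the full and constrained runs. The sketch below is one plausible criterion, not necessarily the paper's; the function names, threshold, and example topics are assumptions:

```python
# One plausible decision-equivalence check (illustrative; not the paper's
# exact criterion): greedily match topics across two runs by Jaccard
# overlap of their top-word sets and require a minimum average overlap.

def jaccard(a, b):
    """Jaccard similarity between two word lists."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def decision_equivalent(topics_a, topics_b, threshold=0.5):
    """topics_*: one top-word list per topic. Greedy one-to-one matching."""
    unmatched = list(topics_b)
    scores = []
    for ta in topics_a:
        best = max(unmatched, key=lambda tb: jaccard(ta, tb))
        scores.append(jaccard(ta, best))
        unmatched.remove(best)
    return sum(scores) / len(scores) >= threshold

# Toy example: two runs whose topics overlap on 2 of 4 distinct top words each.
run_full = [["labor", "wage", "employment"], ["carbon", "energy", "emission"]]
run_constrained = [["carbon", "energy", "climate"], ["labor", "wage", "market"]]
print(decision_equivalent(run_full, run_constrained))  # → True
```

A check of this kind lets "decision equivalence" mean that both runs would support the same substantive reading of the literature map, even if individual word rankings differ.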
Implications for AI Economics
- Measurement focus: economic analysis and regulation of AI should expand beyond model-level footprints to include downstream workflows, researcher behavior, and prompt design; externality accounting must incorporate workflow-level emissions.
- Policy and incentive design: low-cost emissions reductions are available through governance of researcher–system interactions (prompts). Funders, institutions, and journals can mandate or incentivize operational constraints and stopping rules as part of reproducibility/efficiency standards.
- Cost-benefit and principal–agent considerations: prompts as decision policies create a mechanism design problem—researchers’ incentives, tool providers’ defaults, and cloud pricing interact. Policies (subsidies, standards, billing transparency) can reshape these incentives toward efficiency.
- Measurement standards: adopt workflow-level carbon accounting tools (e.g., CodeCarbon or similar) and standardize reporting for GenAI-assisted research workflows.
- Adoption and productivity: because operational constraints preserved decision-equivalent outputs, efficiency interventions need not trade off research quality—this lowers the political/economic cost of adopting green governance.
- Research agenda: study heterogeneous effects across tasks, model sizes, and cloud environments; quantify long-run behavioral responses (e.g., more runs due to lower per-run cost); integrate prompt-policy interventions into formal models of research production and externalities.
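For workflow-level carbon accounting, CodeCarbon's `EmissionsTracker` can wrap an individual workflow step. The sketch below assumes that pattern and falls back to runtime-only timing when the package is not installed; the step function and parameters are illustrative:

```python
# Sketch: timing a workflow step and, when available, estimating its
# emissions with CodeCarbon (pip install codecarbon). The toy step
# function is a placeholder for e.g. an LDA fit.
import time

try:
    from codecarbon import EmissionsTracker  # reports emissions in kg CO2e
except ImportError:
    EmissionsTracker = None  # degrade gracefully to runtime-only

def run_step(step_fn):
    """Run one workflow step; return (runtime_s, co2e_kg or None)."""
    start = time.perf_counter()
    if EmissionsTracker is not None:
        tracker = EmissionsTracker(save_to_file=False, log_level="error")
        tracker.start()
        step_fn()
        co2e_kg = tracker.stop()  # estimated kg CO2e for this step
    else:
        step_fn()
        co2e_kg = None
    return time.perf_counter() - start, co2e_kg

# Placeholder workload standing in for a model-fitting step.
runtime, co2e = run_step(lambda: sum(i * i for i in range(100_000)))
```

Logging (runtime, co2e) per prompt condition and per step is the kind of standardized, workflow-level reporting the measurement-standards point calls for.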
Assessment
Claims (9)
| Claim | Direction | Confidence | Outcome | Details |
|---|---|---|---|---|
| Green AI research has largely measured the footprint of models rather than the downstream workflows in which GenAI is a tool. (Research Productivity) | negative | high | scope/emphasis of Green AI research (model-level vs. workflow-level measurement) | 0.18 |
| We map the recent Green AI literature into seven themes: training footprint is the largest cluster, while inference efficiency and system-level optimisation are growing rapidly, alongside measurement protocols, green algorithms, governance, and security and efficiency trade-offs. (Research Productivity) | positive | high | distribution of themes within Green AI literature (theme prevalence and growth) | 0.18 |
| Training footprint is the largest cluster in the mapped Green AI literature. (Research Productivity) | positive | high | relative prevalence (cluster size) of 'training footprint' theme | 0.18 |
| Inference efficiency and system-level optimisation are growing rapidly in the Green AI literature. (Research Productivity) | positive | medium | growth of specific research themes (inference efficiency, system-level optimisation) | 0.11 |
| We benchmark a modern economic survey workflow, an LDA-based literature mapping implemented with GenAI-assisted coding and executed in a fixed cloud notebook, measuring runtime and estimated CO2e with CodeCarbon. (Task Completion Time) | positive | high | runtime and estimated CO2e (carbon footprint) of the benchmarked workflow | n=1; 0.18 |
| Injecting generic green language into prompts has no reliable effect. (Task Completion Time) | null_result | high | carbon footprint / runtime of the workflow under 'green language' prompts | no reliable effect; 0.18 |
| Operational constraints and decision-rule prompts deliver large and stable footprint reductions while preserving decision-equivalent topic outputs. (Organizational Efficiency) | positive | high | carbon footprint / runtime reductions and preservation of topic output equivalence | large and stable footprint reductions (no numeric value reported); 0.18 |
| Human-in-the-loop governance is a practical lever to align GenAI productivity with environmental efficiency. (Governance And Regulation) | positive | medium | alignment between GenAI-assisted productivity and environmental efficiency via governance interventions | 0.02 |
| Prompts can be treated as decision policies that allocate discretion between researcher and system, governing what is executed and when iteration stops. (Other) | positive | high | conceptualization of prompts' role in workflow control and decision allocation | 0.03 |