Differentiable AI weather models can produce gradient-based payment signals that recover near-optimal sensor value and support monotonic rewards for participatory IoT networks; however, these signals are computationally vulnerable to adversarial input inflation and need external baselines for reliable detection.

Calibrating Attribution Proxies for Reward Allocation in Participatory Weather Sensing

Mark C. Ballandies, Michael T. C. Chiu, Claudio J. Tessone · April 30, 2026

Source: arXiv · Type: descriptive · Evidence: medium · Relevance: 7/10

Gradient-based attribution from differentiable weather models provides a computationally validated value signal that closely matches near-optimal sensor placement and yields monotonic payment incentives, but can be inflated by adversarial inputs and requires external baselines for robust detection.

Large-scale IoT weather sensing networks require incentive mechanisms to sustain participation, yet determining how much value individual data contributions bring to the network remains an open problem. Existing approaches address data quality but not data valuation; in operational meteorology, adjoint-based methods derive value from the forecast model itself but require full data assimilation infrastructure. We propose to utilise differentiable AI weather models to fill this gap and characterise gradient-based attribution on gridded GFS analysis inputs as a candidate value signal, evaluating fidelity, calibration, cost, and gaming vulnerability across more than 400 configurations. Attribution captures near-optimal sensor placement utility with monotonically faithful payments, but can be inflated by adversarial inputs, with detection requiring external baseline data. These findings establish gradient attribution as a computationally validated signal for model-informed reward allocation in participatory weather sensing.

Summary

Main Finding

Gradient-based attribution from differentiable AI weather models is a viable, computationally practical proxy for valuing individual data contributions in participatory weather sensing. Attribution (especially input-scaled variants) yields near-optimal sensor placement and monotonic, budget-balanced payments, but is model- and variable-dependent and vulnerable to certain adversarial inputs unless complemented by baseline monitoring and economic integrity mechanisms.

Key Points

Proposed signal: use gradient-based attribution from differentiable AI weather models (FourCastNet, abbreviated FCN, and Spherical FNO, abbreviated SFNO), computed at forecast time, as a value/informativeness proxy for station-level reward allocation.
Attribution proxies compared:
- Integrated Gradients (IG): path-integrated, baseline = climatology.
- Gradient × Input (GTI): single backward pass, (x_i − x̄_i) · ∂F/∂x_i, where x̄_i is the baseline value of input i.
- Vanilla Gradient (VG): ∂F/∂x_i (no input scaling).
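The three proxies can be illustrated on a toy function. The sketch below is a minimal illustration, not the paper's implementation: `toy_forecast` and its hand-written gradient `toy_gradient` are hypothetical stand-ins for a differentiable weather model and its autodiff gradient, and the `baseline` vector plays the role of climatology.

```python
from typing import Callable, List

Vector = List[float]


def toy_forecast(x: Vector) -> float:
    """Hypothetical stand-in for a differentiable forecast F(x); not a weather model."""
    return x[0] * x[1] + x[2] ** 2


def toy_gradient(x: Vector) -> Vector:
    """Analytic gradient of toy_forecast; a real system would use autodiff."""
    return [x[1], x[0], 2.0 * x[2]]


def vanilla_gradient(grad: Callable[[Vector], Vector], x: Vector) -> Vector:
    """VG: the raw partial derivatives dF/dx_i."""
    return grad(x)


def gradient_times_input(grad: Callable[[Vector], Vector], x: Vector, baseline: Vector) -> Vector:
    """GTI: (x_i - baseline_i) * dF/dx_i from a single gradient evaluation at x."""
    g = grad(x)
    return [(xi - bi) * gi for xi, bi, gi in zip(x, baseline, g)]


def integrated_gradients(grad: Callable[[Vector], Vector], x: Vector, baseline: Vector,
                         steps: int = 50) -> Vector:
    """IG: midpoint Riemann sum of gradients along the straight path baseline -> x."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, gi in enumerate(grad(point)):
            total[i] += gi
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]


x = [2.0, 3.0, 1.0]
baseline = [1.0, 1.0, 0.0]  # assumed "climatology" for the toy inputs
vg = vanilla_gradient(toy_gradient, x)
gti = gradient_times_input(toy_gradient, x, baseline)
ig = integrated_gradients(toy_gradient, x, baseline, steps=50)
```

In this toy case the IG attributions sum to F(x) − F(baseline), the completeness property of Integrated Gradients, whereas the GTI attributions do not. IG needs one gradient evaluation per path step, which is where its extra cost relative to GTI comes from.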
Fidelity and method comparison:
- IG and GTI correlate positively with ablation-based utility; VG performs poorly (often anti-correlated).
- Global variable ranking: SFNO mean Spearman ρ ≈ 0.655 vs FCN ≈ 0.385.
- Spatial attribution: IG Spearman ρ at patch size 5 ≈ 0.362 (FCN) and 0.252 (SFNO); top-5 spatial overlap is often 72–77%.
- GTI retains ~83% of IG’s fidelity at ~1/50th the computational cost; IG path integration adds modest but consistent gains.
Sensor placement and payments:
- Attribution-based selection achieves near-oracle utility (≥92% of oracle in placement) and high overlap with the oracle top stations.
- Payments proportional to |attribution| are budget-balanced and exhibit monotonic calibration (stations binned by proxy show monotonically increasing mean ablation utility).
- Overpayment (total misallocation) is lower than under simple baselines: 33–36% for attribution-based payments, versus 47–55% for distance-based and 61–72% for uniform payments.
- Payment shares are relatively stable across forecast cycles (per-station bootstrap CI shrinkage ~15–23%).
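The monotonic-calibration check described above can be sketched as follows. This is a minimal illustration under stated assumptions: equal-count bins, the bin count, and the example scores are all hypothetical choices, not values from the paper.

```python
from typing import List, Sequence


def binned_mean_utility(proxy: Sequence[float], utility: Sequence[float],
                        n_bins: int) -> List[float]:
    """Sort stations by proxy score, split them into n_bins near-equal groups,
    and return the mean ablation utility of each group (lowest proxy first)."""
    if len(proxy) != len(utility):
        raise ValueError("proxy and utility must have the same length")
    if not 1 <= n_bins <= len(proxy):
        raise ValueError("n_bins must be between 1 and the number of stations")
    order = sorted(range(len(proxy)), key=lambda i: proxy[i])
    means = []
    for b in range(n_bins):
        start = b * len(order) // n_bins
        stop = (b + 1) * len(order) // n_bins
        members = order[start:stop]
        means.append(sum(utility[i] for i in members) / len(members))
    return means


def is_monotonic_increasing(values: Sequence[float]) -> bool:
    """True if each bin mean is at least as large as the previous one."""
    return all(a <= b for a, b in zip(values, values[1:]))


# Hypothetical per-station proxy scores and audit-derived utilities.
proxy_scores = [0.1, 0.9, 0.4, 0.7, 0.2, 0.6]
audit_utility = [0.05, 0.8, 0.3, 0.5, 0.2, 0.45]
bin_means = binned_mean_utility(proxy_scores, audit_utility, n_bins=3)
calibrated = is_monotonic_increasing(bin_means)
```

A proxy passes this check when stations with higher proxy scores have, on average, higher audited utility, even if individual stations are misranked within a bin.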
Gaming and robustness:
- Systematic anomaly-inflation attacks (varying magnitude, proximity, and variable scope) were simulated (≈3,700 scenarios).
- Baseline-dependent detectors (comparing current scores to station baselines) detect inflation reliably (e.g., SFNO: 100% top-5 hit rate in experiments), but baseline-free detectors largely fail.
- Data fabrication (replacing inputs with the climatological mean) can evade attribution-based detectors; economic defenses such as staking and identity verification are required to resist such spoofing.
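Why anomaly inflation pays under input-scaled attribution can be seen in a linear toy case. The sketch below assumes a hypothetical linear forecast, so the gradient is constant and GTI scales exactly with the reported anomaly; real weather models are nonlinear, so the effect need not be proportional. All names and numbers are illustrative.

```python
from typing import List


def inflate_anomaly(x: List[float], climatology: List[float],
                    index: int, magnitude: float) -> List[float]:
    """Scale one station's anomaly (x - climatology) by (1 + magnitude)."""
    out = list(x)
    out[index] = climatology[index] + (1.0 + magnitude) * (x[index] - climatology[index])
    return out


def gti_linear(weights: List[float], x: List[float], climatology: List[float]) -> List[float]:
    """GTI for a hypothetical linear forecast F(x) = sum(w_i * x_i), whose gradient is w."""
    return [w * (xi - ci) for w, xi, ci in zip(weights, x, climatology)]


def payment_share(attributions: List[float], index: int) -> float:
    """Share of a proportional budget going to one station."""
    total = sum(abs(a) for a in attributions)
    return abs(attributions[index]) / total


weights = [0.5, 0.5, 0.5]
climatology = [10.0, 10.0, 10.0]
honest = [12.0, 12.0, 12.0]
# Station 0 doubles its reported anomaly (a 100% inflation).
attacked = inflate_anomaly(honest, climatology, index=0, magnitude=1.0)

share_honest = payment_share(gti_linear(weights, honest, climatology), 0)
share_attacked = payment_share(gti_linear(weights, attacked, climatology), 0)
```

Here the attacker's share rises from one third to one half of the budget without any change in the underlying weather, which is the kind of distortion the detectors above are meant to catch.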
Important caveats:
- Attribution quality is model- and variable-dependent: temperature (t2m) is harder to attribute reliably than pressure/wind; SFNO and FCN have complementary strengths (SFNO better at global variable ranking; FCN sometimes better spatially).
- Single-cycle (per-forecast) attribution is noisier than time-aggregated signals; however, a single cycle still captures most of the aggregate spatial pattern in strong configurations.
- Attribution measures the sensitivity of the model prediction and uses no verification data, so it does not measure forecast-error reduction directly; offline ablation audits are needed for calibration and validation.

Data & Methods

Data and models:
- Inputs: gridded GFS analysis fields at 0.25° resolution.
- Models: FourCastNet (FCN, vision-transformer) and Spherical FNO (SFNO, spectral).
- Common input set restricted to 24 variables for direct model comparisons.
Attribution proxies:
- IG computed with a climatological baseline and K = 50 path steps (K = 8 is reported to be sufficient in practice).
- GTI (single backward pass) and VG (single backward pass, no scaling).
Reference ground truth:
- Ablation-based utility: the change in absolute forecast error at the target when inputs are removed or perturbed (global variable ablation, and local spatial patch perturbations with P ∈ {1,3,5} pixels).
- Utility used in two ways: global variable importance and spatial station utility (|U_g|).
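An ablation utility of this kind can be sketched on a toy model. The sign convention (positive when removing the input makes the forecast worse), the replacement of the ablated input by its baseline value, and the function `toy_forecast` are assumptions made for illustration; the paper's exact perturbation scheme is not reproduced here.

```python
from typing import Callable, List


def toy_forecast(x: List[float]) -> float:
    """Hypothetical stand-in for a forecast model; not a weather model."""
    return x[0] * x[1] + x[2] ** 2


def ablation_utility(forecast: Callable[[List[float]], float], x: List[float],
                     baseline: List[float], truth: float, index: int) -> float:
    """Increase in absolute forecast error when input `index` is replaced
    by its baseline value. Positive means the input was helping."""
    ablated = list(x)
    ablated[index] = baseline[index]
    error_full = abs(forecast(x) - truth)
    error_ablated = abs(forecast(ablated) - truth)
    return error_ablated - error_full


x = [2.0, 3.0, 1.0]
baseline = [1.0, 1.0, 0.0]   # assumed climatological values
truth = 7.5                  # hypothetical verifying observation
utilities = [ablation_utility(toy_forecast, x, baseline, truth, i) for i in range(len(x))]
```

Unlike the gradient proxies, this quantity needs the verifying value `truth`, which is why it can only be computed after the fact and serves as an offline audit rather than a forecast-time signal.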
Experimental design:
- Evaluations across 2 models × 5 European target cities (Zurich, London, Berlin, Madrid, Oslo) × 3 forecast variables (t2m, u10m, msl) × 60 timestamps (Mar 2021–Dec 2022).
- Spatial grid: 468 European points at 2° spacing.
- Metrics: Spearman ρ for rank fidelity, top-k overlap (for small k), oracle and uniform baselines, efficiency/optimality ratios, bootstrap 95% CIs.
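The two rank-fidelity metrics can be computed with the standard library alone. The sketch below uses the usual definitions (Spearman ρ as the Pearson correlation of average ranks; top-k overlap as the fraction of shared indices among the k largest values); the example scores are hypothetical.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(ra, rb))
    va = sum((p - ma) ** 2 for p in ra)
    vb = sum((q - mb) ** 2 for q in rb)
    if va == 0.0 or vb == 0.0:
        raise ValueError("rank correlation is undefined for constant input")
    return cov / math.sqrt(va * vb)


def top_k_overlap(a: Sequence[float], b: Sequence[float], k: int) -> float:
    """Fraction of the k highest-scoring indices of `a` that are also in the top k of `b`."""
    top_a = set(sorted(range(len(a)), key=lambda i: a[i], reverse=True)[:k])
    top_b = set(sorted(range(len(b)), key=lambda i: b[i], reverse=True)[:k])
    return len(top_a & top_b) / k


proxy = [0.9, 0.1, 0.5, 0.7, 0.3]     # hypothetical attribution scores
utility = [0.8, 0.2, 0.6, 0.4, 0.1]   # hypothetical ablation utilities
rho = spearman_rho(proxy, utility)
overlap_top2 = top_k_overlap(proxy, utility, k=2)
```

The two metrics answer different questions: ρ scores the whole ordering, while top-k overlap only asks whether the highest-value stations are identified, which is what matters most for placement.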
Gaming experiments:
- Systematic anomaly-inflation scenarios: attacker counts n ∈ {1,3,5}, magnitudes from 10% to 200% in extended runs, variable scopes (t2m, u10m, surface-all), random and stratified placement (close/mid).
- 3,700 total attack scenarios evaluated across models/configurations.
- Detectors evaluated: baseline-dependent (station score history), supervised classifiers, baseline-free single-snapshot detectors.
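A baseline-dependent detector can be as simple as comparing each station's current score with its own history. The z-score rule, the threshold of 3.0, and the station names below are assumptions for illustration; the summary does not specify the detectors' exact form.

```python
import statistics
from typing import Dict, List


def inflation_flags(history: Dict[str, List[float]], current: Dict[str, float],
                    z_threshold: float = 3.0) -> Dict[str, bool]:
    """Flag stations whose current attribution score is far above their own past scores.

    Stations with fewer than two past scores cannot be judged and are not flagged,
    which is the practical limit of any history-based detector.
    """
    flags = {}
    for station, score in current.items():
        past = history.get(station, [])
        if len(past) < 2:
            flags[station] = False
            continue
        mean = statistics.mean(past)
        std = statistics.stdev(past)
        if std == 0.0:
            flags[station] = score > mean
        else:
            flags[station] = (score - mean) / std > z_threshold
    return flags


# Hypothetical per-station attribution histories and current-cycle scores.
history = {"A": [1.0, 1.1, 0.9, 1.0], "B": [2.0, 2.2, 1.8, 2.0]}
current = {"A": 1.05, "B": 4.0, "C": 9.0}
flags = inflation_flags(history, current)
```

Station "C" has a high score but no history, so this rule cannot flag it; a new or Sybil identity would slip through in the same way, which is one reason history-based monitoring is paired with identity and staking mechanisms.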
Payment rule:
- Proportional payments: p(g) = B · |A(g)| / Σ_{g'} |A(g')|, where B is the total budget and A(g) is the attribution score of station g (or a baseline/distance/uniform alternative).
- Ablation audits used offline to compute true utility p_true for calibration and overpayment measurement.
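The proportional rule and an audit comparison can be sketched as follows. The payment function follows the rule above; the overpayment measure (the share of the budget paid above the audit-derived payment) is an assumed definition for illustration, since the summary does not spell out the paper's formula. Station names and values are hypothetical.

```python
from typing import Dict


def proportional_payments(scores: Dict[str, float], budget: float) -> Dict[str, float]:
    """p(g) = budget * |score(g)| / sum of |score(g')|, so payments sum to the budget."""
    total = sum(abs(s) for s in scores.values())
    if total == 0.0:
        raise ValueError("all scores are zero; proportional payments are undefined")
    return {g: budget * abs(s) / total for g, s in scores.items()}


def overpayment_fraction(paid: Dict[str, float], paid_true: Dict[str, float],
                         budget: float) -> float:
    """Assumed metric: budget share paid in excess of the audit-derived payment."""
    excess = sum(max(0.0, paid[g] - paid_true[g]) for g in paid)
    return excess / budget


budget = 100.0
attribution = {"s1": 3.0, "s2": -1.0, "s3": 1.0}    # hypothetical forecast-time scores
audit_utility = {"s1": 2.0, "s2": 2.0, "s3": 1.0}   # hypothetical offline ablation utilities

payments = proportional_payments(attribution, budget)
payments_true = proportional_payments(audit_utility, budget)
overpaid = overpayment_fraction(payments, payments_true, budget)
```

Because both payment vectors sum to the same budget, every unit overpaid to one station is a unit underpaid to another, so this single number summarises the total misallocation.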

Implications for AI Economics

Operationalizing model-informed rewards:
- Gradient attribution provides a practical, model-derived value signal that can replace geometry/uptime-only reward heuristics in participatory sensing markets (e.g., DePIN/tokenized networks).
- Using GTI yields a favorable compute–accuracy tradeoff enabling real-time per-cycle allocations at large scale.
Efficiency and investment allocation:
- Attribution-based payments direct budget toward sensors that increase predictive utility, improving marginal return on incentive spending and enabling targeted network growth (higher social value per payment vs distance/uniform rules).
- Near-oracle placement suggests lower budgets can achieve similar utility when payments are informed by attribution.
Market design and mechanism complements:
- Attribution alone is insufficient as a full integrity solution: it must be combined with periodic ablation audits, baseline monitoring, identity/Sybil resistance, and staking/slashing mechanisms to deter data-fabrication and coordinated spoofing.
- Baseline-dependent monitoring (history-based anomalies) is effective but requires sufficient per-device history and secure provenance.
- Economic design should account for model- and variable-dependent signal strength (e.g., different treatment or priors for t2m vs pressure/wind).
Policy and governance:
- Transparency: explainable attribution-based payouts (e.g., publishing attribution shares) support fairness claims but also expose attack surfaces; careful disclosure and anti-spoofing incentives are needed.
- Auditability: periodic offline ablation audits should be institutionalized for calibration and dispute resolution; budget for audits must be included in the economic model.
Practical recommendations for implementers:
- Use input-scaled attribution (GTI or IG) rather than raw gradients; prefer GTI for low-latency systems and IG when extra compute/accuracy is affordable.
- Blend attribution with a distance prior (shrinkage) to improve stability where justified.
- Maintain per-station baselines and deploy baseline-dependent detectors; require staking/identity verification to handle fabrication attacks.
- Monitor variable-specific performance and consider variable-weighted payouts or differential quality filters for variables with weak attribution fidelity (e.g., surface temperature).
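One way to read the shrinkage recommendation is as a convex blend of attribution shares and a distance-based prior. The sketch below is an assumption-laden illustration: the exponential distance kernel, the `length_scale_km` parameter, and the `shrinkage` weight are hypothetical choices, not the paper's specification.

```python
import math
from typing import Dict


def normalise(weights: Dict[str, float]) -> Dict[str, float]:
    """Rescale non-negative weights so they sum to 1."""
    total = sum(weights.values())
    if total == 0.0:
        raise ValueError("weights sum to zero")
    return {k: v / total for k, v in weights.items()}


def blended_shares(attribution: Dict[str, float], distance_km: Dict[str, float],
                   shrinkage: float, length_scale_km: float = 500.0) -> Dict[str, float]:
    """Blend attribution shares with a distance prior.

    shrinkage = 0 gives pure attribution shares; shrinkage = 1 gives the pure prior.
    The prior exp(-distance / length_scale) is an assumed kernel.
    """
    if not 0.0 <= shrinkage <= 1.0:
        raise ValueError("shrinkage must lie in [0, 1]")
    attr = normalise({g: abs(a) for g, a in attribution.items()})
    prior = normalise({g: math.exp(-distance_km[g] / length_scale_km) for g in attribution})
    return {g: (1.0 - shrinkage) * attr[g] + shrinkage * prior[g] for g in attribution}


attribution = {"s1": 3.0, "s2": 1.0}        # hypothetical attribution scores
distance_km = {"s1": 0.0, "s2": 500.0}      # hypothetical distances to the target
shares = blended_shares(attribution, distance_km, shrinkage=0.5)
```

Since both components are normalised before blending, the result still sums to one and can be multiplied by the budget directly; a larger `shrinkage` trades responsiveness to the model signal for stability across cycles.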
Research agenda for AI economics:
- Multi-agent game-theoretic analysis of collusion and strategic reporting when attribution-based payouts are introduced.
- Cost–benefit analysis of computation vs improved allocation efficiency (how much budget saved vs compute/audit costs).
- Design of composable cryptoeconomic primitives (staking, slashing, identity) aligned with attribution-informed reward rules.

Summary: Gradient attribution from differentiable AI weather models is a validated, operationally useful proxy for reward allocation in participatory sensing. It improves allocation efficiency and enables model-informed incentives, but must be deployed with complementary monitoring, economic security (staking/identity), and periodic ablation audits to mitigate adversarial and model-dependent weaknesses.

Assessment

Paper Type: descriptive

Evidence Strength: medium. The paper provides extensive computational validation (over 400 model/configuration experiments) showing that gradient-based attribution tracks sensor utility and yields monotonic payment signals, but it stops short of real-world field experiments or causal tests of participant behavior and payment outcomes, and it reveals vulnerability to adversarial manipulation that is only detectable with external baselines.

Methods Rigor: medium. Evaluation appears systematic and thorough within a simulation/ML-experiment setting (many configurations, fidelity/calibration/cost/gaming analyses), including adversarial scenarios, but the methods rely on two specific differentiable models (FourCastNet and SFNO) and GFS inputs, lack field validation, and may be sensitive to model specification and operational constraints.

Sample: Computational experiments using differentiable AI weather models applied to gridded GFS analysis inputs; simulated participatory sensor contributions and placements; more than 400 configurations varying model settings, grid/resolution, attribution setups, cost constraints, and adversarial input scenarios; evaluation metrics include fidelity to sensor-placement utility, calibration of payments, computational cost, and vulnerability to gaming.

Themes: adoption, governance

Generalizability:
- Validated only on specific differentiable model architecture(s) and GFS analysis inputs; results may not hold for other forecast models or training regimes.
- Experiments are simulation-based (no deployed IoT network or real participant behavior), so incentive effects and strategic responses in the field are untested.
- Adversarial scenarios explored may not cover all realistic attack vectors; detection requires external baseline data, which may not be available.
- Computational cost and integration with operational assimilation systems may limit applicability to low-resource or real-time deployments.
- Performance may vary with spatial/temporal scales, sensor modalities, and local meteorological regimes not exhaustively explored.

Claims (10)

Claim	Category	Direction	Confidence	Outcome	Details
Large-scale IoT weather sensing networks require incentive mechanisms to sustain participation.	Adoption Rate	positive	high	need for incentive mechanisms to sustain participation	0.03
Determining how much value individual data contributions bring to the network remains an open problem.	Other	null_result	high	existence of methods to value individual data contributions	0.03
Existing approaches address data quality but not data valuation.	Other	negative	high	coverage of data valuation in existing approaches	0.09
In operational meteorology, adjoint-based methods derive value from the forecast model itself but require full data assimilation infrastructure.	Other	mixed	high	suitability and infrastructure requirements of adjoint-based value methods	0.18
Differentiable AI weather models can be utilised to fill the gap between data-quality methods and adjoint-based data valuation, providing a practical value signal.	Other	positive	high	feasibility of using differentiable AI models as a data valuation mechanism	0.18
Gradient-based attribution on gridded GFS analysis inputs is a viable candidate value signal for individual sensor contributions.	Other	positive	high	suitability of gradient-based attribution as a value signal	0.18
We evaluated fidelity, calibration, cost, and gaming vulnerability of the proposed attribution approach across more than 400 configurations.	Other	positive	high	fidelity, calibration, computational cost, and vulnerability to gaming of attribution	n=400 0.18
Attribution captures near-optimal sensor placement utility with monotonically faithful payments.	Task Allocation	positive	medium	sensor placement utility captured by attribution; monotonicity/faithfulness of payments	0.11
Gradient-based attribution can be inflated by adversarial inputs, and detecting such inflation requires external baseline data.	AI Safety and Ethics	negative	medium	vulnerability to adversarial manipulation and detectability of such manipulation	0.11
Gradient attribution is established as a computationally validated signal for model-informed reward allocation in participatory weather sensing.	Adoption Rate	positive	medium	validity of gradient attribution as a reward allocation signal	n=400 0.11