Unmasking Algorithmic Bias in Predictive Policing: A GAN-Based Simulation Framework with Multi-City Temporal Analysis

Predictive policing systems that direct patrol resources based on algorithmically generated crime forecasts have been widely deployed across US cities, yet their tendency to encode and amplify racial disparities remains poorly understood in quantitative terms. We present a reproducible simulation framework that couples a Generative Adversarial Network (GAN) with a Noisy-OR patrol detection model to measure how racial bias propagates through the full enforcement pipeline, from crime occurrence to police contact. Using 145,000+ Part 1 crime records from Baltimore (2017–2019) and 233,000+ records from Chicago (2022), augmented with US Census ACS demographic data, we compute four monthly bias metrics across 264 city-year-mode observations: the Disparate Impact Ratio (DIR), Demographic Parity Gap, Gini Coefficient, and a composite Bias Amplification Score. Our experiments reveal extreme and year-variant bias in Baltimore's detected mode, with mean annual DIR up to 15,714 in 2019; moderate under-detection of Black residents in Chicago (DIR = 0.22); and persistent Gini coefficients of 0.43 to 0.62 across all conditions. We further demonstrate that a Conditional Tabular GAN (CTGAN) debiasing approach partially redistributes detection rates but cannot eliminate structural disparity without accompanying policy intervention. Socioeconomic regression analysis confirms strong correlations between neighborhood racial composition and detection likelihood (Pearson r = 0.83 for percent White and r = −0.81 for percent Black). A sensitivity analysis over patrol radius, officer count, and citizen reporting probability reveals that outcomes are most sensitive to officer deployment levels. The code and data are publicly available at this repository.

Summary

Main Finding

A reproducible simulation framework coupling a spatial GAN with a Noisy-OR patrol-detection model shows that (1) predictive-policing-style deployments can amplify large, data-dependent racial disparities in who is detected by police and (2) purely algorithmic “debiasing” of training data (CTGAN rebalancing) can change the direction of disparity but cannot remove structural inequality without concurrent policy/resource changes. Results are multi-city and temporal: Baltimore (2017–2019) exhibits extreme, year-variant bias (mean annual DIR up to ~15,714 in 2019), while Chicago (2022) shows systematic under-detection of Black neighbourhoods (mean DIR ≈ 0.22). Inequality (Gini) remains persistently high (≈0.43–0.62) across conditions.

Key Points

Framework: Monthly retrained conditional GAN generates patrol deployment locations; a Noisy-OR model computes detection probability for each crime given officers within a radius.
Primary fairness metrics: Disparate Impact Ratio (DIR = P(detected|Black)/P(detected|White)), Demographic Parity Gap, Gini coefficient over group detection rates, and a composite Bias Amplification Score (parity gap × Gini).
Main quantitative highlights:
- Baltimore (detected mode): 2017 mean DIR ≈ 0.95, 2018 ≈ 0.079, 2019 ≈ 15,714 (very large instability and extreme amplification in 2019).
- Chicago (2022, detected mode): mean DIR ≈ 0.22 (under-detection of Black areas). Reported mode in Chicago had mean DIR ≈ 1.22.
- Gini across detected-mode experiments: 0.43–0.62 (indicating persistent inequality across groups).
- CTGAN rebalancing (Baltimore 2019, replace 30% of training incidents with race-balanced synthetic data): Black detection rate ↑ from 3.44% → 4.93% (+1.49 pp); White detection rate ↓ from 6.70% → 1.59% (−5.11 pp); DIR flips from 0.513 → 3.106 (direction of disparity changes rather than vanishes).
- Strong correlations between neighbourhood racial composition and detection likelihood: Pearson r ≈ +0.83 with %White, r ≈ −0.81 with %Black (pooled n = 279 neighbourhood observations).
Sensitivity analysis:
- Outcomes are most sensitive to officer count. Example: reducing officers from 60 to 30 can drive DIR from ~0.08 to ~7.71 (winner-take-all amplification).
- Increasing patrol radius increases DIR (wider zones amplify spatial concentration).
- Citizen reporting probability has a non‑monotonic influence on DIR (stochastic sampling effects).
Citizen-reported mode (calls-for-service sampling at baseline p=0.521) tends to dampen extreme feedback effects compared with GAN-driven detected mode.
Code and data are publicly available (authors provide repository).
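
The four fairness metrics listed above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the authors' implementation: the function names are hypothetical, the parity gap is assumed to be the largest minus the smallest group detection rate, and the Gini coefficient is assumed to use the standard mean-absolute-difference form. Only the DIR definition and the "parity gap × Gini" composite come from the summary itself.

```python
def disparate_impact_ratio(rate_black, rate_white):
    """DIR = P(detected | Black) / P(detected | White)."""
    if rate_white == 0:
        return float("inf")  # ratio is unbounded when no White detections occur
    return rate_black / rate_white


def demographic_parity_gap(rates):
    """ASSUMED definition: largest minus smallest group detection rate."""
    values = list(rates.values())
    return max(values) - min(values)


def gini(values):
    """ASSUMED form: sum of |x_i - x_j| over all ordered pairs / (2 * n^2 * mean)."""
    n = len(values)
    mean = sum(values) / n
    if mean == 0:
        return 0.0  # all rates are zero, so there is no inequality to measure
    abs_diff_sum = sum(abs(a - b) for a in values for b in values)
    return abs_diff_sum / (2 * n * n * mean)


def bias_amplification_score(rates):
    """Composite score: parity gap multiplied by Gini over group rates."""
    return demographic_parity_gap(rates) * gini(list(rates.values()))


# Detection rates reported for the Baltimore 2019 CTGAN experiment.
baseline = {"Black": 0.0344, "White": 0.0670}
rebalanced = {"Black": 0.0493, "White": 0.0159}

for label, rates in (("baseline", baseline), ("rebalanced", rebalanced)):
    dir_value = disparate_impact_ratio(rates["Black"], rates["White"])
    print(label, round(dir_value, 3), round(bias_amplification_score(rates), 5))
```

Applied to the rounded rates above, the ratio is about 0.513 before rebalancing and roughly 3.1 after, which illustrates the reported flip in the direction of disparity. Note that a Gini coefficient over only two groups cannot exceed 0.5, so the reported range of 0.43–0.62 suggests the paper computes it over more than two groups.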

Data & Methods

Data:
- Baltimore Part 1 crime incidents (2017–2019): 145,823 incidents after filtering; 11 months used per year (Feb–Dec).
- Chicago crime incidents (2022): 233,456 incidents after filtering; 11 months used (Feb–Dec).
- Neighborhood covariates from ACS 5-year estimates (2019 ACS for Baltimore years, 2022 ACS for Chicago). Point-in-polygon assignment maps incidents to census tracts/community areas.
- Baseline citizen reporting probability: 52.1% (Pew).
GAN for patrol locations:
- Generator: 100-d noise → FC layers (256→512→256→2) with batch-norm and LeakyReLU; output tanh → lat/lon.
- Discriminator: FC layers (512→256→128→1) with dropout/LeakyReLU → sigmoid.
- Trained per month for 200 epochs (Adam, lr=2e-4, β1=0.5); generates N_officers (default 60) patrol points per simulation step.
Noisy-OR detection model:
- For each crime event c_i, P(detected) = 1 − ∏_{j∈N(c_i,r)} (1 − p_j), where N(c_i,r) is the set of officers within radius r of c_i; defaults are r = 700 ft and per-officer p_j = 0.85.
- Each crime is assigned to a racial group by sampling from neighborhood racial proportions.
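
The Noisy-OR step and the group-assignment step can be illustrated with a minimal sketch. It assumes that crime and officer positions have already been projected to planar coordinates in feet (the GAN itself emits lat/lon), and the function names `detection_probability` and `sample_group` are hypothetical, not taken from the authors' repository.

```python
import math
import random


def detection_probability(crime_xy, officers_xy, radius_ft=700.0, p_officer=0.85):
    """Noisy-OR: 1 - product of (1 - p_j) over officers within radius_ft.

    ASSUMPTION: coordinates are planar and expressed in feet.
    """
    miss = 1.0  # probability that every nearby officer misses the crime
    for ox, oy in officers_xy:
        if math.hypot(ox - crime_xy[0], oy - crime_xy[1]) <= radius_ft:
            miss *= 1.0 - p_officer
    return 1.0 - miss


def sample_group(proportions, rng):
    """Draw one racial group with probability equal to its neighborhood share."""
    groups = list(proportions)
    weights = [proportions[g] for g in groups]
    return rng.choices(groups, weights=weights, k=1)[0]


crime = (0.0, 0.0)
officers = [(300.0, 400.0), (650.0, 0.0), (2000.0, 0.0)]  # 500 ft, 650 ft, 2000 ft away
p_detect = detection_probability(crime, officers)  # two officers in range: 1 - 0.15**2

rng = random.Random(0)
group = sample_group({"Black": 0.6, "White": 0.3, "Other": 0.1}, rng)  # made-up shares
print(round(p_detect, 4), group)
```

With the default p_j = 0.85, a single officer in range already yields a detection probability of 0.85 and two yield 0.9775, so under these defaults detection is driven mainly by whether any officer falls within the radius rather than by how many do.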
CTGAN debiasing:
- Conditional Tabular GAN (SDV library) trained on Baltimore 2019; 30% of training incidents replaced by CTGAN-sampled incidents drawn equally by racial group; retrain patrol GAN on augmented set.
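
The replacement step described for the CTGAN experiment can be sketched independently of any particular library. In the sketch below, `sample_synthetic` is a hypothetical callable standing in for a trained CTGAN sampler that returns `k` synthetic incidents for a given group; it is not the SDV API, and `rebalance_training_set` is an illustrative name. How the authors choose which real incidents to drop is not stated, so uniform random removal is assumed here.

```python
import random


def rebalance_training_set(incidents, sample_synthetic, groups,
                           replace_frac=0.30, rng=None):
    """Replace a fraction of incidents with synthetic ones split evenly by group.

    ASSUMPTIONS: real incidents are dropped uniformly at random, and
    sample_synthetic(group, k) returns a list of k synthetic incidents.
    """
    rng = rng or random.Random()
    n_replace = round(len(incidents) * replace_frac)
    kept = rng.sample(incidents, len(incidents) - n_replace)

    # Split the replacement budget as evenly as possible across groups.
    per_group, remainder = divmod(n_replace, len(groups))
    synthetic = []
    for i, group in enumerate(groups):
        k = per_group + (1 if i < remainder else 0)
        synthetic.extend(sample_synthetic(group, k))
    return kept + synthetic


def stub_sampler(group, k):
    """Placeholder for a trained generator; emits tagged dummy records."""
    return [{"group": group, "synthetic": True} for _ in range(k)]


real = [{"id": i, "synthetic": False} for i in range(100)]
augmented = rebalance_training_set(real, stub_sampler, ["Black", "White"],
                                   rng=random.Random(42))
print(len(augmented), sum(row["synthetic"] for row in augmented))
```

The patrol GAN would then be retrained on the augmented set, which keeps the training-set size fixed while changing its racial composition.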
Statistical analysis:
- Monthly and annual aggregation of fairness metrics across 264 city-year-mode-month observations.
- OLS regression of neighbourhood detection rate on %Black, median income, poverty rate; Pearson and Spearman correlations.
Sensitivity grid: patrol radius ∈ {400,700,1000,1500} ft; N_officers ∈ {30,60,90,120}; reporting p ∈ {0.30,0.40,0.521,0.60,0.70,0.80}.
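
The sensitivity grid can be enumerated as below. The summary does not say whether the authors ran the full factorial design or varied one parameter at a time around the defaults, so this sketch shows both; the names `GRID`, `BASELINE`, `full_factorial`, and `one_at_a_time` are hypothetical.

```python
from itertools import product

# Parameter values as listed in the sensitivity grid.
GRID = {
    "radius_ft": (400, 700, 1000, 1500),
    "n_officers": (30, 60, 90, 120),
    "reporting_p": (0.30, 0.40, 0.521, 0.60, 0.70, 0.80),
}

# Defaults stated elsewhere in the summary.
BASELINE = {"radius_ft": 700, "n_officers": 60, "reporting_p": 0.521}


def full_factorial(grid=GRID):
    """Every combination of the three parameters."""
    keys = list(grid)
    return [dict(zip(keys, combo)) for combo in product(*(grid[k] for k in keys))]


def one_at_a_time(grid=GRID, baseline=BASELINE):
    """Baseline plus configurations that change exactly one parameter."""
    configs = [dict(baseline)]
    for key, values in grid.items():
        for value in values:
            if value != baseline[key]:
                configs.append({**baseline, key: value})
    return configs


print(len(full_factorial()), len(one_at_a_time()))
```

Each configuration would be passed to the simulation in place of the defaults, and the resulting DIR values compared across the varied parameter.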

Implications for AI Economics

Distributional externalities and feedback loops matter economically:
- Predictive-allocation algorithms can create self-reinforcing surveillance externalities that alter the distribution of detected events in ways that depend on historical data concentration. These are distributional harms with social costs (reputational, legal, social capital) that are not captured by standard accuracy metrics.
Market and policy design considerations:
- Data-level debiasing is not a panacea: CTGAN rebalancing shifted disparities but, under fixed resource constraints, simply redistributed detection rather than equalizing it. Economically, fairness interventions that ignore resource supply and institutional incentives can produce perverse reallocations.
- Resource allocation (number and placement of officers) is a high‑leverage policy lever. Changing deployment budgets or rules can have larger fairness effects than data reweighting alone. Cost‑benefit analysis of fairness should therefore include operational resource trade-offs.
Auditing and regulation:
- Reproducible simulation frameworks (GAN + detection model + socio-demographic mapping) provide a scalable way to audit likely distributional impacts ex ante and to evaluate interventions. Regulators and procurers should require such audits and public reporting of fairness metrics across time and locales.
Contracting and procurement:
- When agencies procure predictive-allocation tools, contracts should specify fairness-sensitive performance criteria (e.g., limits on disparity amplification, required sensitivity analyses) and recognize that model outputs interact with constrained operational resources.
Incentives and governance:
- Algorithmic mitigation must be paired with governance changes (deployment rules, transparency, community oversight, complaint remedies) to avoid zero-sum reallocations that simply shift harms.
Measurement & empirical research:
- Economists studying AI impact should account for endogenous data-generation processes (the policing → data feedback loop). Standard identification strategies that treat data as exogenous may understate the social multiplier effects of algorithmic deployment.
Policy recommendations (economic framing):
- Prioritize combined interventions: (i) transparency and mandatory auditing of allocation algorithms; (ii) assessments that jointly vary data, algorithm, and resource parameters (officer counts, patrol radii); (iii) resource augmentation or redistribution as part of fairness programs; (iv) citizen reporting and independent data sources as corrective inputs where feasible.
Cost of fairness interventions:
- The paper illustrates a trade-off: equalizing detected outcomes by rebalancing data can reduce detection in previously over-monitored areas (which may be politically and operationally costly). Economics of fairness must quantify these trade-offs (crime deterrence, community trust, enforcement costs).

Caveats / limitations (relevant to economic interpretation):
- Race is inferred from neighbourhood composition rather than individual-level attributes; this aggregates and may obscure within-tract heterogeneity.
- The GAN models patrol-location distributions learned from historical incident coordinates; real-world deployment decisions also depend on non-spatial factors (policy, political constraints, officer discretion).
- Fixed per-officer detection probability and Noisy-OR assumptions simplify complex stop/arrest dynamics.
- Chicago analysis is limited to a single year; temporal generalisability beyond the studied years is uncertain.

Overall, the paper provides a transparent, reproducible simulation toolkit and multi-city empirical evidence showing that predictive allocation can produce large, data-dependent distributional effects. For AI economists, the key takeaway is that fairness interventions must treat algorithmic outputs and operational resources as jointly determined variables; modeling and policy must internalize the economic feedback loops between deployment, observed data, and future algorithmic decisions.

Assessment

Paper Type: descriptive
Evidence Strength: medium. Findings are grounded in large administrative datasets and a transparent simulation framework with multiple bias metrics and sensitivity checks, which gives credible internal consistency; however, results depend heavily on modeling assumptions (Noisy-OR detection process, GAN fidelity, reporting probability) and have limited external validation against unobserved ground truth, so they are not definitive evidence of real-world causal effects.
Methods Rigor: medium. The study applies modern generative modeling (a patrol-location GAN and CTGAN), multiple quantitative bias metrics (DIR, demographic parity gap, Gini, composite score), regression analyses, and sensitivity checks, showing methodological care; but it lacks empirical validation of key model components (e.g., true detection processes, reporting behavior), and inference is sensitive to untested assumptions about patrol behavior and input data quality.
Sample: Administrative Part 1 crime incident records from Baltimore (145,000+ incidents, 2017–2019) and Chicago (233,000+ incidents, 2022), fused with US Census ACS neighborhood demographic data; synthetic incidents generated by a CTGAN trained on the Baltimore 2019 records for the debiasing experiment; analyses aggregated monthly, yielding 264 city-year-mode observations for bias metric computation.
Themes: inequality, governance
Identification: No causal identification; the paper constructs a reproducible simulation pipeline that (1) retrains a GAN each month on historical incident coordinates to generate patrol deployment locations, (2) passes crime events through a parametric Noisy-OR patrol detection model to simulate police contacts, with a CTGAN used separately to rebalance the Baltimore 2019 training data, and (3) conducts sensitivity analyses (varying patrol radius, officer count, reporting probability) and cross-sectional regressions relating neighborhood racial composition to simulated detection rates.
Generalizability:
- Results are based on two US cities (Baltimore and Chicago) and may not generalize to other jurisdictions with different crime patterns or policing practices.
- Uses Part 1 crime incident records only; excludes other enforcement data and unreported events.
- Relies on modeled detection/reporting assumptions (Noisy-OR, patrol radius, reporting probability) that may not reflect real officer behavior.
- Synthetic data quality depends on CTGAN fidelity; the generative model may not capture complex socio-spatial dynamics.
- Temporal validity is limited to the years/datasets studied and may not hold under operational changes or different predictive policing algorithms.

Claims (10)

Claim	Category	Direction	Confidence	Outcome	Details
We present a reproducible simulation framework that couples a Generative Adversarial Network (GAN) with a Noisy-OR patrol detection model to measure how racial bias propagates through the full enforcement pipeline from crime occurrence to police contact.	Other	positive	high	bias propagation through enforcement pipeline (simulation framework)	0.3
The study uses 145,000+ Part 1 crime records from Baltimore (2017–2019) and 233,000+ records from Chicago (2022), augmented with US Census ACS demographic data.	Other	positive	high	data sample size / dataset composition	n=145000 145000+ (Baltimore); 233000+ (Chicago) 0.3
We compute four monthly bias metrics across 264 city-year-mode observations: the Disparate Impact Ratio (DIR), Demographic Parity Gap, Gini Coefficient, and a composite Bias Amplification Score.	Inequality	positive	high	monthly bias metrics (DIR, Demographic Parity Gap, Gini, Bias Amplification Score)	n=264 four metrics across 264 city-year-mode observations 0.3
Experiments reveal extreme and year-variant bias in Baltimore's detected mode, with mean annual DIR up to 15,714 in 2019.	Inequality	positive	high	Disparate Impact Ratio (DIR)	n=145000 mean annual DIR up to 15714 in 2019 0.18
In Chicago, the model shows moderate under-detection of Black residents with DIR equal to 0.22.	Inequality	negative	high	Disparate Impact Ratio (DIR) indicating under-detection of Black residents	n=233000 DIR equals 0.22 0.18
Persistent Gini coefficients of 0.43 to 0.62 across all conditions indicate concentrated detection inequality.	Inequality	positive	high	Gini Coefficient (detection distribution inequality)	n=264 Gini coefficients of 0.43 to 0.62 0.18
A Conditional Tabular GAN (CTGAN) debiasing approach partially redistributes detection rates but cannot eliminate structural disparity without accompanying policy intervention.	Inequality	mixed	high	effect of CTGAN debiasing on detection rate distribution / structural disparity	n=264 CTGAN partially redistributes detection rates but cannot eliminate structural disparity 0.03
Socioeconomic regression analysis confirms strong correlations between neighborhood racial composition and detection likelihood: Pearson r = 0.83 for percent White and r = -0.81 for percent Black.	Inequality	mixed	high	correlation between neighborhood racial composition and detection likelihood	n=264 Pearson r equals 0.83 (percent White) and r equals -0.81 (percent Black) 0.18
A sensitivity analysis over patrol radius, officer count, and citizen reporting probability reveals outcomes are most sensitive to officer deployment levels.	Task Allocation	positive	high	sensitivity of bias/detection outcomes to simulation parameters (patrol radius, officer count, reporting probability)	n=264 most sensitive to officer deployment levels (officer count) 0.18
The code and data used in the study are publicly available at the referenced repository.	Other	positive	high	availability of replication materials (code and data)	code and data publicly available at this repository 0.3