Simulated deployments of predictive-policing systems reveal extreme, city-specific racial disparities in police detection: Baltimore shows runaway amplification, while Chicago exhibits systematic under-detection of Black residents. Algorithmic debiasing via CTGAN shifts these structural imbalances but does not erase them.
Predictive policing systems that direct patrol resources based on algorithmically generated crime forecasts have been widely deployed across US cities, yet their tendency to encode and amplify racial disparities remains poorly understood in quantitative terms. We present a reproducible simulation framework that couples a Generative Adversarial Network (GAN) with a Noisy-OR patrol detection model to measure how racial bias propagates through the full enforcement pipeline, from crime occurrence to police contact. Using 145,000+ Part 1 crime records from Baltimore (2017–2019) and 233,000+ records from Chicago (2022), augmented with US Census ACS demographic data, we compute four monthly bias metrics across 264 city-year-mode observations: the Disparate Impact Ratio (DIR), Demographic Parity Gap, Gini Coefficient, and a composite Bias Amplification Score. Our experiments reveal extreme and year-variant bias in Baltimore's detected mode, with mean annual DIR up to 15,714 in 2019; moderate under-detection of Black residents in Chicago (DIR = 0.22); and persistent Gini coefficients of 0.43 to 0.62 across all conditions. We further demonstrate that a Conditional Tabular GAN (CTGAN) debiasing approach partially redistributes detection rates but cannot eliminate structural disparity without accompanying policy intervention. Socioeconomic regression analysis confirms strong correlations between neighborhood racial composition and detection likelihood (Pearson r = 0.83 for percent White and r = −0.81 for percent Black). A sensitivity analysis over patrol radius, officer count, and citizen reporting probability reveals that outcomes are most sensitive to officer deployment levels. The code and data are publicly available at this repository.
Summary
Main Finding
A reproducible simulation framework coupling a spatial GAN with a Noisy-OR patrol-detection model shows that (1) predictive-policing-style deployments can amplify large, data-dependent racial disparities in who is detected by police, and (2) purely algorithmic “debiasing” of training data (CTGAN rebalancing) can change the direction of disparity but cannot remove structural inequality without concurrent policy/resource changes. Results are multi-city and temporal: Baltimore (2017–2019) exhibits extreme, year-variant bias (mean annual DIR up to ~15,714 in 2019), while Chicago (2022) shows systematic under-detection of Black neighbourhoods (mean DIR ≈ 0.22). Inequality (Gini) remains persistently high (≈0.43–0.62) across conditions.
Key Points
- Framework: Monthly retrained conditional GAN generates patrol deployment locations; a Noisy-OR model computes detection probability for each crime given officers within a radius.
- Primary fairness metrics: Disparate Impact Ratio (DIR = P(detected|Black)/P(detected|White)), Demographic Parity Gap, Gini coefficient over group detection rates, and a composite Bias Amplification Score (parity gap × Gini).
- Main quantitative highlights:
- Baltimore (detected mode): 2017 mean DIR ≈ 0.95, 2018 ≈ 0.079, 2019 ≈ 15,714 (very large instability and extreme amplification in 2019).
- Chicago (2022, detected mode): mean DIR ≈ 0.22 (under-detection of Black areas). Reported mode in Chicago had mean DIR ≈ 1.22.
- Gini across detected-mode experiments: 0.43–0.62 (indicating persistent inequality across groups).
- CTGAN rebalancing (Baltimore 2019, replace 30% of training incidents with race-balanced synthetic data): Black detection rate ↑ from 3.44% → 4.93% (+1.49 pp); White detection rate ↓ from 6.70% → 1.59% (−5.11 pp); DIR flips from 0.513 → 3.106 (direction of disparity changes rather than vanishes).
- Strong correlations between neighbourhood racial composition and detection likelihood: Pearson r ≈ +0.83 with %White, r ≈ −0.81 with %Black (pooled n = 279 neighbourhood observations).
- Sensitivity analysis:
- Outcomes are most sensitive to officer count. Example: reducing officers from 60 to 30 can drive DIR from ~0.08 to ~7.71 (winner-take-all amplification).
- Increasing patrol radius increases DIR (wider zones amplify spatial concentration).
- Citizen reporting probability has a non‑monotonic influence on DIR (stochastic sampling effects).
- Citizen-reported mode (calls-for-service sampling at baseline p=0.521) tends to dampen extreme feedback effects compared with GAN-driven detected mode.
- Code and data are publicly available (authors provide repository).
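The fairness metrics listed above can be illustrated with a minimal Python sketch. Only the DIR definition and the "parity gap × Gini" composite come from the summary. The absolute-difference form of the parity gap and the mean-absolute-difference form of the Gini coefficient are assumptions, and the function names are hypothetical rather than taken from the authors' repository. The example reuses the rounded Baltimore 2019 detection rates quoted for the CTGAN experiment, so the ratios only approximately match the reported 0.513 and 3.106.

```python
def disparate_impact_ratio(rate_black, rate_white):
    """DIR = P(detected | Black) / P(detected | White)."""
    return rate_black / rate_white


def parity_gap(rate_a, rate_b):
    """Demographic parity gap, assumed here to be the absolute rate difference."""
    return abs(rate_a - rate_b)


def gini(values):
    """Gini coefficient via mean absolute difference (an assumed definition)."""
    n = len(values)
    mean = sum(values) / n
    if mean == 0:
        return 0.0
    total_abs_diff = sum(abs(x - y) for x in values for y in values)
    return total_abs_diff / (2 * n * n * mean)


def bias_amplification_score(rate_black, rate_white):
    """Composite score: parity gap multiplied by Gini over the group rates."""
    return parity_gap(rate_black, rate_white) * gini([rate_black, rate_white])


# Rounded Baltimore 2019 detection rates from the CTGAN experiment.
dir_before = disparate_impact_ratio(0.0344, 0.0670)  # baseline
dir_after = disparate_impact_ratio(0.0493, 0.0159)   # after CTGAN rebalancing

print(f"DIR before: {dir_before:.3f}, after: {dir_after:.3f}")
print(f"Bias amplification (before): {bias_amplification_score(0.0344, 0.0670):.5f}")
```

A DIR below 1 means Black residents are detected at a lower rate than White residents and a DIR above 1 means the reverse, which is why the rebalancing result reads as a flip in direction rather than a move toward parity (DIR = 1).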
Data & Methods
- Data:
- Baltimore Part 1 crime incidents (2017–2019): 145,823 incidents after filtering; 11 months used per year (Feb–Dec).
- Chicago crime incidents (2022): 233,456 incidents after filtering; 11 months used (Feb–Dec).
- Neighborhood covariates from ACS 5-year estimates (2019 ACS for Baltimore years, 2022 ACS for Chicago). Point-in-polygon assignment maps incidents to census tracts/community areas.
- Baseline citizen reporting probability: 52.1% (Pew).
- GAN for patrol locations:
- Generator: 100-d noise → FC layers (256→512→256→2) with batch-norm and LeakyReLU; output tanh → lat/lon.
- Discriminator: FC layers (512→256→128→1) with dropout/LeakyReLU → sigmoid.
- Trained per month for 200 epochs (Adam, lr=2e-4, β1=0.5); generates N_officers (default 60) patrol points per simulation step.
- Noisy-OR detection model:
- For each crime event c_i, P(detected) = 1 − ∏_{j∈N(c_i, r)} (1 − p_j), where N(c_i, r) is the set of officers within radius r of the crime; defaults are r = 700 ft and per-officer p_j = 0.85.
- Each crime is assigned a racial group by sampling from neighborhood racial proportions.
- CTGAN debiasing:
- Conditional Tabular GAN (SDV library) trained on Baltimore 2019 data; 30% of training incidents are replaced by CTGAN-sampled incidents drawn equally across racial groups, and the patrol GAN is then retrained on the augmented set.
- Statistical analysis:
- Monthly and annual aggregation of fairness metrics across 264 city-year-mode-month observations.
- OLS regression of neighbourhood detection rate on %Black, median income, poverty rate; Pearson and Spearman correlations.
- Sensitivity grid: patrol radius ∈ {400,700,1000,1500} ft; N_officers ∈ {30,60,90,120}; reporting p ∈ {0.30,0.40,0.521,0.60,0.70,0.80}.
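The Noisy-OR detection step and the race-assignment step described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: coordinates are taken to be planar and already in feet, "within a radius" is treated as inclusive, and the function names `noisy_or_detection` and `sample_group` are hypothetical. The defaults (r = 700 ft, p_j = 0.85) are the ones reported in the summary.

```python
import math
import random


def noisy_or_detection(crime_xy, officers_xy, radius_ft=700.0, p_officer=0.85):
    """Return P(detected) = 1 - prod(1 - p_j) over officers within radius_ft.

    Assumes planar (x, y) coordinates in feet and a single shared p_officer.
    """
    p_missed_by_all = 1.0
    for officer_xy in officers_xy:
        if math.dist(crime_xy, officer_xy) <= radius_ft:
            p_missed_by_all *= 1.0 - p_officer
    return 1.0 - p_missed_by_all


def sample_group(proportions, rng):
    """Assign a racial group by sampling from neighborhood proportions."""
    groups = list(proportions)
    weights = [proportions[g] for g in groups]
    return rng.choices(groups, weights=weights, k=1)[0]


# Illustrative inputs only: one crime, three officers at 500, 650 and 2000 ft.
crime = (0.0, 0.0)
officers = [(300.0, 400.0), (650.0, 0.0), (2000.0, 0.0)]
p_detected = noisy_or_detection(crime, officers)

rng = random.Random(0)  # seeded for reproducibility
group = sample_group({"Black": 0.6, "White": 0.4}, rng)  # made-up proportions
print(f"P(detected) = {p_detected:.4f}, sampled group = {group}")
```

With two officers in range the detection probability is 1 − 0.15² = 0.9775, so at p_j = 0.85 a single nearby officer already makes detection very likely. The outcome then depends mainly on whether any officer is nearby at all, which fits the reported sensitivity to officer count.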
Implications for AI Economics
- Distributional externalities and feedback loops matter economically:
- Predictive-allocation algorithms can create self-reinforcing surveillance externalities that alter the distribution of detected events in ways that depend on historical data concentration. These are distributional harms with social costs (reputational, legal, social capital) that are not captured by standard accuracy metrics.
- Market and policy design considerations:
- Data-level debiasing is not a panacea: CTGAN rebalancing shifted disparities but, under fixed resource constraints, simply redistributed detection rather than equalized it. Economically, fairness interventions that ignore resource supply and institutional incentives can produce perverse reallocations.
- Resource allocation (number and placement of officers) is a high‑leverage policy lever. Changing deployment budgets or rules can have larger fairness effects than data reweighting alone. Cost‑benefit analysis of fairness should therefore include operational resource trade-offs.
- Auditing and regulation:
- Reproducible simulation frameworks (GAN + detection model + socio-demographic mapping) provide a scalable way to audit likely distributional impacts ex ante and to evaluate interventions. Regulators and procurers should require such audits and public reporting of fairness metrics across time and locales.
- Contracting and procurement:
- When agencies procure predictive-allocation tools, contracts should specify fairness-sensitive performance criteria (e.g., limits on disparity amplification, required sensitivity analyses) and recognize that model outputs interact with constrained operational resources.
- Incentives and governance:
- Algorithmic mitigation must be paired with governance changes (deployment rules, transparency, community oversight, complaint remedies) to avoid zero-sum reallocations that simply shift harms.
- Measurement & empirical research:
- Economists studying AI impact should account for endogenous data-generation processes (the policing → data feedback loop). Standard identification strategies that treat data as exogenous may understate the social multiplier effects of algorithmic deployment.
- Policy recommendations (economic framing):
- Prioritize combined interventions: (i) transparency and mandatory auditing of allocation algorithms; (ii) assessments that jointly vary data, algorithm, and resource parameters (officer counts, patrol radii); (iii) consider resource augmentation or redistribution as part of fairness programs; (iv) use citizen-reporting and independent data sources as corrective inputs where feasible.
- Cost of fairness interventions:
- The paper illustrates a trade-off: equalizing detected outcomes by rebalancing data can reduce detection in previously over-monitored areas (which may be politically and operationally costly). Economics of fairness must quantify these trade-offs (crime deterrence, community trust, enforcement costs).
Caveats / limitations (relevant to economic interpretation)
- Race is inferred from neighbourhood composition rather than individual-level attributes; this aggregates and may obscure within-tract heterogeneity.
- The GAN models patrol-location distributions learned from historical incident coordinates; real-world deployment decisions also depend on non-spatial factors (policy, political constraints, officer discretion).
- Fixed per-officer detection probability and Noisy-OR assumptions simplify complex stop/arrest dynamics.
- Chicago analysis is limited to a single year; temporal generalisability beyond the studied years is uncertain.
Overall, the paper provides a transparent, reproducible simulation toolkit and multi-city empirical evidence showing that predictive allocation can produce large, data-dependent distributional effects. For AI economists, the key takeaway is that fairness interventions must treat algorithmic outputs and operational resources as jointly determined variables; modeling and policy must internalize the economic feedback loops between deployment, observed data, and future algorithmic decisions.
Assessment
Claims (10)
| Claim | Direction | Confidence | Outcome | Details |
|---|---|---|---|---|
| We present a reproducible simulation framework that couples a Generative Adversarial Network (GAN) with a Noisy OR patrol detection model to measure how racial bias propagates through the full enforcement pipeline from crime occurrence to police contact. [Other] | positive | high | bias propagation through enforcement pipeline (simulation framework) | 0.3 |
| The study uses 145,000+ Part 1 crime records from Baltimore (2017–2019) and 233,000+ records from Chicago (2022), augmented with US Census ACS demographic data. [Other] | positive | high | data sample size / dataset composition | n=145000; 145000+ (Baltimore); 233000+ (Chicago); 0.3 |
| We compute four monthly bias metrics across 264 city-year-mode observations: the Disparate Impact Ratio (DIR), Demographic Parity Gap, Gini Coefficient, and a composite Bias Amplification Score. [Inequality] | positive | high | monthly bias metrics (DIR, Demographic Parity Gap, Gini, Bias Amplification Score) | n=264; four metrics across 264 city-year-mode observations; 0.3 |
| Experiments reveal extreme and year-variant bias in Baltimore's detected mode, with mean annual DIR up to 15,714 in 2019. [Inequality] | positive | high | Disparate Impact Ratio (DIR) | n=145000; mean annual DIR up to 15714 in 2019; 0.18 |
| In Chicago, the model shows moderate under-detection of Black residents with DIR equal to 0.22. [Inequality] | negative | high | Disparate Impact Ratio (DIR) indicating under-detection of Black residents | n=233000; DIR equals 0.22; 0.18 |
| Persistent Gini coefficients of 0.43 to 0.62 across all conditions indicate concentrated detection inequality. [Inequality] | positive | high | Gini Coefficient (detection distribution inequality) | n=264; Gini coefficients of 0.43 to 0.62; 0.18 |
| A Conditional Tabular GAN (CTGAN) debiasing approach partially redistributes detection rates but cannot eliminate structural disparity without accompanying policy intervention. [Inequality] | mixed | high | effect of CTGAN debiasing on detection rate distribution / structural disparity | n=264; CTGAN partially redistributes detection rates but cannot eliminate structural disparity; 0.03 |
| Socioeconomic regression analysis confirms strong correlations between neighborhood racial composition and detection likelihood: Pearson r = 0.83 for percent White and r = -0.81 for percent Black. [Inequality] | mixed | high | correlation between neighborhood racial composition and detection likelihood | n=264; Pearson r equals 0.83 (percent White) and r equals -0.81 (percent Black); 0.18 |
| A sensitivity analysis over patrol radius, officer count, and citizen reporting probability reveals outcomes are most sensitive to officer deployment levels. [Task Allocation] | positive | high | sensitivity of bias/detection outcomes to simulation parameters (patrol radius, officer count, reporting probability) | n=264; most sensitive to officer deployment levels (officer count); 0.18 |
| The code and data used in the study are publicly available at the referenced repository. [Other] | positive | high | availability of replication materials (code and data) | code and data publicly available at this repository; 0.3 |