
Paper deep dive

Unmasking Algorithmic Bias in Predictive Policing: A GAN-Based Simulation Framework with Multi-City Temporal Analysis

Pronob Kumar Barman, Pronoy Kumar Barman

Year: 2026 · Venue: arXiv preprint · Area: cs.AI · Type: Preprint · Embeddings: 40

Intelligence

Status: succeeded | Model: google/gemini-3.1-flash-lite-preview | Prompt: intel-v1 | Confidence: 97%

Last extracted: 3/22/2026, 6:11:25 AM

Summary

This paper presents a reproducible simulation framework using Generative Adversarial Networks (GANs) and a Noisy-OR patrol model to quantify racial bias in predictive policing. By analyzing crime data from Baltimore (2017-2019) and Chicago (2022), the authors demonstrate that algorithmic patrol deployment often encodes and amplifies historical racial disparities. The study evaluates fairness through metrics like the Disparate Impact Ratio (DIR) and Gini Coefficient, finding that while CTGAN-based debiasing can redistribute detection rates, it fails to eliminate structural disparity without policy intervention.

Entities (5)

Baltimore · location · 100%
Chicago · location · 100%
Conditional Tabular GAN · algorithm · 100%
Disparate Impact Ratio · metric · 100%
Generative Adversarial Network · algorithm · 100%

Relation Signals (3)

Baltimore analyzed in Predictive Policing

confidence 95% · Using 145,000+ Part 1 crime records from Baltimore (2017–2019)

Generative Adversarial Network used to model Predictive Policing

confidence 95% · We present a reproducible simulation framework that couples a Generative Adversarial Network (GAN) with a Noisy-OR patrol detection model

Conditional Tabular GAN mitigates Algorithmic Bias

confidence 90% · We further demonstrate that a Conditional Tabular GAN (CTGAN) debiasing approach partially redistributes detection rates

Cypher Suggestions (2)

Identify algorithms used for bias mitigation · confidence 95% · unvalidated

MATCH (a:Algorithm)-[:MITIGATES]->(b:Phenomenon {name: 'Algorithmic Bias'}) RETURN a.name

Find all cities analyzed in the study · confidence 90% · unvalidated

MATCH (e:Entity {entity_type: 'Location'})-[:ANALYZED_IN]->(s:System {name: 'Predictive Policing'}) RETURN e.name

Abstract

Predictive policing systems that direct patrol resources based on algorithmically generated crime forecasts have been widely deployed across U.S. cities, yet their tendency to encode and amplify racial disparities remains poorly understood in quantitative terms. We present a reproducible simulation framework that couples a Generative Adversarial Network (GAN) with a Noisy-OR patrol-detection model to measure how racial bias propagates through the full enforcement pipeline, from crime occurrence to police contact. Using 145,000+ Part 1 crime records from Baltimore (2017–2019) and 233,000+ records from Chicago (2022), augmented with U.S. Census ACS demographic data, we compute four monthly bias metrics across 264 city-year-mode observations: the Disparate Impact Ratio (DIR), Demographic Parity Gap, Gini Coefficient, and a composite Bias Amplification Score. Our experiments reveal extreme and year-variant bias in Baltimore's detected mode (mean annual DIR up to 15,714 in 2019), moderate under-detection of Black residents in Chicago (DIR = 0.22), and persistent Gini coefficients of 0.43–0.62 across all conditions. We further demonstrate that a Conditional Tabular GAN (CTGAN) debiasing approach partially redistributes detection rates but cannot eliminate structural disparity without accompanying policy intervention. Socioeconomic regression analysis confirms strong correlations between neighborhood racial composition and detection likelihood (Pearson r = 0.83 for %White and r = −0.81 for %Black). A sensitivity analysis over patrol radius, officer count, and citizen reporting probability reveals that outcomes are most sensitive to officer deployment levels. The code and data are publicly available at this repository.

Tags

ai-safety (imported, 100%) · csai (suggested, 92%) · preprint (suggested, 88%)

Links


Full Text

39,873 characters extracted from source content.


Unmasking Algorithmic Bias in Predictive Policing: A GAN-Based Simulation Framework with Multi-City Temporal Analysis

Pronob Kumar Barman, University of Maryland, Baltimore County, Department of Information Systems, Baltimore, MD, USA
Pronoy Kumar Barman, Jagannath University, Department of Statistics, Dhaka, Bangladesh

Abstract. Predictive policing systems that direct patrol resources based on algorithmically generated crime forecasts have been widely deployed across U.S. cities, yet their tendency to encode and amplify racial disparities remains poorly understood in quantitative terms. We present a reproducible simulation framework that couples a Generative Adversarial Network (GAN) with a Noisy-OR patrol-detection model to measure how racial bias propagates through the full enforcement pipeline, from crime occurrence to police contact. Using 145,000+ Part 1 crime records from Baltimore (2017–2019) and 233,000+ records from Chicago (2022), augmented with U.S. Census ACS demographic data, we compute four monthly bias metrics across 264 city-year-mode observations: the Disparate Impact Ratio (DIR), Demographic Parity Gap, Gini Coefficient, and a composite Bias Amplification Score. Our experiments reveal extreme and year-variant bias in Baltimore's detected mode (mean annual DIR up to 15,714 in 2019), moderate under-detection of Black residents in Chicago (DIR = 0.22), and persistent Gini coefficients of 0.43–0.62 across all conditions. We further demonstrate that a Conditional Tabular GAN (CTGAN) debiasing approach partially redistributes detection rates but cannot eliminate structural disparity without accompanying policy intervention. Socioeconomic regression analysis confirms strong correlations between neighbourhood racial composition and detection likelihood (Pearson r = 0.83 for %White, r = −0.81 for %Black).
A sensitivity analysis over patrol radius, officer count, and citizen reporting probability reveals that outcomes are most sensitive to officer deployment levels. The code and data are publicly available at this repository.

Keywords: predictive policing, algorithmic bias, generative adversarial networks, fairness, disparate impact, simulation, crime data

ACM Reference Format: Pronob Kumar Barman and Pronoy Kumar Barman. 2026. Unmasking Algorithmic Bias in Predictive Policing: A GAN-Based Simulation Framework with Multi-City Temporal Analysis. In Proceedings of . ACM, New York, NY, USA, 8 pages.

1 Introduction

The deployment of algorithmic decision-support systems in law enforcement has accelerated dramatically over the past decade. Predictive policing platforms, which use historical crime data to forecast where future crimes will occur and direct patrol resources accordingly, are now in use in dozens of major U.S. cities [8, 14]. Proponents argue that data-driven patrol allocation reduces response times and deters crime; critics counter that such systems entrench historical patterns of over-policing in minority neighbourhoods, creating self-reinforcing feedback loops that generate more data on surveilled communities regardless of actual crime rates [7, 11].

Despite growing academic and public attention, the field still lacks a rigorous, reproducible, multi-city simulation framework that can quantify how racial bias enters the policing pipeline and how it compounds over time. Most existing studies rely on a single city, a single year, or aggregate arrest statistics rather than modelling the full path from crime occurrence to police detection. The scarcity of such quantitative evidence hinders both policymakers and algorithm designers who wish to audit or mitigate bias. This paper addresses that gap with three principal contributions:

(1) A GAN-based spatial patrol model.
We train a Generative Adversarial Network on real crime-incident coordinates to generate synthetic patrol deployment locations that mirror the distributional biases embedded in historical data. Our architecture couples a five-layer generator with a four-layer discriminator and a Noisy-OR contact model calibrated to realistic detection parameters.

(2) A longitudinal, multi-city bias audit. We apply the framework to 264 simulation runs spanning Baltimore (2017–2019) and Chicago (2022), computing four interpretable fairness metrics (DIR, Demographic Parity Gap, Gini Coefficient, and Bias Amplification Score) on a monthly basis and aggregating annually to capture temporal trends.

(3) CTGAN debiasing and socioeconomic analysis. We evaluate a Conditional Tabular GAN rebalancing strategy and quantify the relationship between neighbourhood socioeconomic characteristics and detection disparities using OLS regression and Pearson/Spearman correlations across 279 neighbourhood observations.

The remainder of the paper is organised as follows. Section 2 reviews related work. Section 3 describes our datasets. Section 4 details our methodology. Section 5 outlines experimental setup. Section 6 presents results. Section 7 discusses implications and limitations. Section 8 concludes.

arXiv:2603.18987v1 [cs.AI] 19 Mar 2026

2 Related Work

2.1 Bias in Predictive Policing

Foundational critiques of predictive policing established that using historical arrest data to forecast crime risk systematically over-predicts risk in communities that have historically been over-policed [11]. Ensign et al. [7] formalised this as a runaway feedback loop: increased patrol in a neighbourhood produces more detected incidents, which re-enters the training data and intensifies future patrol, irrespective of underlying crime rates. Richardson et al.
[16] extended the analysis to show that the training data itself is contaminated by decades of racially biased enforcement practices, labelling this the "dirty data" problem. More recent empirical work has reinforced and quantified these concerns. Almasoud and Idowu [1] conducted a systematic review of AI-driven policing tools and found consistent evidence of racial and socioeconomic disparities across deployment contexts. Hung and Yen [10] examined the ethical foundations of risk-score systems, arguing that statistical discrimination embedded in these tools violates principles of algorithmic justice even when mathematical fairness criteria are nominally satisfied. Ziosi and Pruss [24] proposed participatory governance frameworks as a mechanism to surface community-level impacts that aggregate metrics obscure.

2.2 Fairness in Spatial and Temporal Crime Prediction

Wu and Frias-Martinez [21] investigated fairness in short-term crime prediction at the census-tract level, demonstrating that accuracy-oriented models reliably produced higher false-negative rates in majority-Black tracts. Wang et al. [20] studied spatial bias in patrol allocation and found that officer deployment patterns diverge significantly from crime event distributions when the allocation model is trained on prior arrests rather than reported incidents. Semsar et al. [18] recently conducted a comparative simulation study specifically in the Baltimore metropolitan area, confirming temporal instability in bias metrics, a finding that our multi-year analysis corroborates and extends.

2.3 Generative Models for Fairness

The use of generative adversarial networks to audit or mitigate bias has grown substantially since the original GAN formulation [9]. Ma et al. [12] proposed a counterfactual fairness approach using a disentangled causal-effect VAE, demonstrating that generative rebalancing can reduce disparate outcomes without dramatically degrading predictive accuracy.
Xu et al.'s CTGAN [22] introduced a conditional tabular GAN specifically designed for structured, mixed-type datasets; we adopt this architecture for our debiasing experiments.

2.4 Measuring Algorithmic Fairness

Mehrabi et al. [13] provide a comprehensive taxonomy of fairness definitions, noting that no single metric satisfies all fairness criteria simultaneously, a mathematical impossibility formalised by Chouldechova [4]. Berk et al. [3] survey fairness criteria in criminal justice risk assessment, concluding that the choice of metric encodes normative assumptions about equality of treatment versus equality of outcome. Selbst et al. [17] argue that abstract fairness metrics must be grounded in sociotechnical context to have practical meaning, a principle that motivates our integration of Census demographic data. Zhang and Bareinboim [23] provide a causal formulation of fairness violations that informs our interpretation of the DIR. Dressel and Farid [6] demonstrate that even human decision-makers match or exceed recidivism prediction algorithms on fairness metrics, underlining that data-driven approaches do not automatically improve equity.

3 Datasets

3.1 Baltimore Part 1 Crime Data (2017–2019)

We obtained Baltimore City's Part 1 Crime incident reports from the Baltimore City Open Data portal [2]. The combined dataset covers 145,823 incident records across three years (2017: 49,682; 2018: 48,319; 2019: 47,822). Each record contains an incident type, GPS coordinates (latitude/longitude), date and time, and a district identifier. We retain only incidents with valid coordinates within the city bounding box (39.197°–39.372° N, 76.529°–76.712° W) and exclude the January holdout month to allow GAN burn-in, yielding 11 months of simulation data per year (February–December).

3.2 Chicago Crime Data (2022)

We obtained Chicago Police Department crime incident records from the City of Chicago Data Portal [5].
The 2022 dataset contains 233,456 incidents with IUCR crime type, GPS coordinates, date, and community area identifiers. After applying the same validity filters, 11 months of data are retained for simulation.

3.3 Demographic Data

Neighbourhood-level racial composition, median household income, and poverty rates are drawn from the U.S. Census Bureau's American Community Survey (ACS) 5-Year Estimates [19]: the 2019 ACS release for Baltimore years and the 2022 ACS release for Chicago. We join crime incidents to their containing census tract using a spatial point-in-polygon assignment, yielding demographic covariates for each neighbourhood unit. Baltimore contains 55 recognised neighbourhoods; Chicago contains 77 community areas.

3.4 Citizen Reporting Rate

Following Pew Research Center [15], we set the baseline citizen crime-reporting probability at 52.1%. This parameter governs the "reported" simulation mode, which models the subset of crimes that generate a citizen call-for-service as distinct from GAN-directed patrols.

4 Methodology

4.1 GAN Architecture for Patrol Location Generation

Our GAN learns the spatial distribution of historical crime incidents and generates synthetic patrol deployment locations. Let x = (lat, lon) ∈ R² denote a two-dimensional crime location. The generator G: R¹⁰⁰ → R² maps a latent noise vector z ∼ N(0, I) to a synthetic patrol location. The discriminator D: R² → [0, 1] distinguishes real from synthetic locations.
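A minimal PyTorch sketch of this generator/discriminator pair follows. Layer widths, batch normalisation, dropout, and output activations mirror the architecture listed in Section 4.1; the LeakyReLU negative slope of 0.2 is an assumption, as the paper does not state it, and the training loop is omitted.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Maps a 100-d latent vector to a 2-d (lat, lon) patrol location in [-1, 1]^2."""
    def __init__(self, latent_dim: int = 100):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.BatchNorm1d(256), nn.LeakyReLU(0.2),
            nn.Linear(256, 512), nn.BatchNorm1d(512), nn.LeakyReLU(0.2),
            nn.Linear(512, 256), nn.BatchNorm1d(256), nn.LeakyReLU(0.2),
            nn.Linear(256, 2), nn.Tanh(),
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z)

class Discriminator(nn.Module):
    """Scores a 2-d location as real (1) vs. synthetic (0)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, 512), nn.LeakyReLU(0.2), nn.Dropout(0.3),
            nn.Linear(512, 256), nn.LeakyReLU(0.2), nn.Dropout(0.3),
            nn.Linear(256, 128), nn.LeakyReLU(0.2),
            nn.Linear(128, 1), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)
```

In training one would pair these with the Adam settings the paper reports (lr = 0.0002, betas = (0.5, 0.999), batch size 64) and sample 60 generator outputs per simulation step.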
Generator architecture: z(100) → FC(256) → BN → LeakyReLU → FC(512) → BN → LeakyReLU → FC(256) → BN → LeakyReLU → FC(2) → tanh.

Discriminator architecture: x(2) → FC(512) → LeakyReLU → Dropout(0.3) → FC(256) → LeakyReLU → Dropout(0.3) → FC(128) → LeakyReLU → FC(1) → σ.

Training minimises the standard minimax objective [9]:

min_G max_D E_{x∼p_data}[log D(x)] + E_{z∼p_z}[log(1 − D(G(z)))]    (1)

The generator is trained for 200 epochs with the Adam optimiser (lr = 0.0002, β1 = 0.5, β2 = 0.999), batch size 64. At inference, G produces N_officers = 60 patrol locations per simulation step.

4.2 Noisy-OR Detection Model

For each simulated crime event c_i, we compute the detection probability using a Noisy-OR formulation over patrol officers within a detection radius r:

P(detected | c_i) = 1 − ∏_{j ∈ N(c_i, r)} (1 − p_j)    (2)

where N(c_i, r) is the set of officers within radius r = 700 ft of c_i and p_j = 0.85 is the per-officer detection probability. This parameterisation yields realistic sparsity: most crimes are detected only when one or more patrol units are in close proximity. Crime events are assigned to a racial group based on the racial composition of their containing neighbourhood, sampled from the Census-derived proportions (%Black, %White, %Neither).

4.3 Bias Metrics

We compute four fairness metrics monthly for each city-year-mode combination.

Disparate Impact Ratio (DIR):

DIR = P(detected | Black) / P(detected | White)    (3)

Values below 0.8 (the "four-fifths rule" threshold [4]) indicate under-detection of Black residents; values above 1 indicate over-detection relative to White residents.

Demographic Parity Gap:

Δ_parity = P(detected | Black) − P(detected | White)    (4)

Gini Coefficient: computed over the vector of per-group detection rates as a measure of overall inequality:

G = (Σ_i Σ_j |r_i − r_j|) / (2n Σ_i r_i)    (5)

where r_i is the detection rate for group i.
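The Noisy-OR contact model (Eq. 2) and the fairness metrics above (Eqs. 3–5) are straightforward to express in Python. This is an illustrative sketch, not the authors' code: coordinates are assumed planar and in feet for simplicity, and the group names are the paper's three census-derived categories.

```python
import math

def detection_prob(crime_xy, patrol_xy, radius_ft=700.0, p_officer=0.85):
    """Noisy-OR detection probability (Eq. 2): 1 - prod(1 - p_j) over
    officers within `radius_ft` of the crime location."""
    near = [o for o in patrol_xy if math.dist(crime_xy, o) <= radius_ft]
    prob_missed = 1.0
    for _ in near:
        prob_missed *= (1.0 - p_officer)
    return 1.0 - prob_missed

def bias_metrics(rates):
    """DIR (Eq. 3), Demographic Parity Gap (Eq. 4), and Gini (Eq. 5)
    over per-group detection rates; `rates` maps group name -> rate."""
    dir_ = rates["Black"] / rates["White"]
    parity_gap = rates["Black"] - rates["White"]
    r = list(rates.values())
    n = len(r)
    gini = sum(abs(ri - rj) for ri in r for rj in r) / (2 * n * sum(r))
    return dir_, parity_gap, gini
```

With two officers co-located at a crime scene, `detection_prob` gives 1 − 0.15² = 0.9775, illustrating the sparsity argument: probability rises quickly with each nearby unit but is zero when no patrol is within range.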
Bias Amplification Score:

BAS = Δ_parity × G    (6)

This composite metric penalises configurations that combine both directional disparity and high overall inequality.

4.4 CTGAN Debiasing

To evaluate an algorithmic mitigation approach, we train a Conditional Tabular GAN [22] on the Baltimore 2019 training data. CTGAN conditions the generation process on a discrete label (here, the racial group), which enables race-balanced synthetic augmentation of the training set. We replace 30% of the real training incidents with CTGAN-generated incidents drawn in equal proportions from each racial group and retrain the patrol GAN on the augmented dataset.

4.5 Socioeconomic Regression Analysis

To quantify the relationship between neighbourhood demographics and detection rates, we fit an OLS regression model at the neighbourhood level:

r̂_det = β0 + β1·%Black + β2·MedianIncome + β3·PovertyRate + ε    (7)

We also compute Pearson and Spearman correlations between detection rates and neighbourhood-level predictors: %Black, %White, Median Income, and Poverty Rate. All correlations are computed on the pooled neighbourhood dataset (n = 279 observations across all city-year combinations).

5 Experimental Setup

Simulations are executed on a per-month basis for February through December of each city-year. Each month, the GAN is retrained on that month's crime incidents and generates 60 patrol deployment locations. The Noisy-OR model evaluates each crime event against the deployed patrol locations. Two simulation modes are evaluated:

• Detected mode: patrol locations are drawn entirely from GAN-generated points, representing algorithmically directed deployment.
• Reported mode: patrol locations reflect citizen calls-for-service, with each crime independently reported with probability p = 0.521 [15].

The full experimental grid comprises: 3 Baltimore years × 2 modes × 11 months + 1 Chicago year × 2 modes × 11 months = 264 simulation observations.
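The 30% rebalancing step of Section 4.4 can be sketched as follows. Here `sample_synthetic` is a hypothetical stand-in for a trained, group-conditioned CTGAN sampler (in practice something like sdv's CTGAN conditioned on the race label); the paper does not publish this code, so treat the function and its interface as assumptions.

```python
import random

def rebalance_training_set(real_incidents, sample_synthetic, frac=0.30, seed=0):
    """Replace `frac` of real incidents with synthetic ones drawn in equal
    proportions from each racial group (the rebalancing step of Sec. 4.4).

    real_incidents   -- list of incident records (e.g. dicts)
    sample_synthetic -- callable(group) -> one synthetic incident; a
                        placeholder for a conditional CTGAN sampler
    """
    rng = random.Random(seed)
    n_replace = int(round(frac * len(real_incidents)))
    groups = ["Black", "White", "Neither"]
    # Keep a random 70% of the real data...
    kept = rng.sample(real_incidents, len(real_incidents) - n_replace)
    # ...and fill the remainder with group-balanced synthetic incidents.
    synthetic = [sample_synthetic(groups[i % len(groups)])
                 for i in range(n_replace)]
    return kept + synthetic
```

The patrol GAN would then be retrained on the returned list; the fixed dataset size makes explicit why this intervention is zero-sum, a point the results in Section 6.4 bear out.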
Sensitivity analyses vary patrol radius (r ∈ {400, 700, 1000, 1500} ft), officer count (N ∈ {30, 60, 90, 120}), and citizen reporting probability (p ∈ {0.30, 0.40, 0.521, 0.60, 0.70, 0.80}). All experiments use a fixed random seed for reproducibility. GAN training uses PyTorch 2.1; CTGAN uses the sdv library 1.9.0.

6 Results

6.1 Temporal Bias Trends in Baltimore (2017–2019)

Figure 1 shows monthly per-group detection rates across all three Baltimore years in detected mode. The most striking feature is the extreme upward spike in Black detection rates during 2019: the GAN, trained on 2019 crime data, generates patrol points that concentrate heavily in majority-Black neighbourhoods, producing detection rates for Black residents that vastly exceed those for White residents in most months.

[Figure 1: Monthly detection rates by racial group for Baltimore (2017–2019) in detected mode. The extreme spike in 2019 Black detection rates reflects the GAN learning patrol patterns concentrated in majority-Black neighbourhoods.]

[Figure 2: Monthly Disparate Impact Ratio (DIR) for Baltimore 2017–2019 across both simulation modes. The detected mode exhibits extreme year-to-year variance (DIR range: 0.04–35,582), while the reported mode remains more stable (DIR range: 0.24–1.66).]

Figure 2 plots the monthly DIR across years and modes, revealing dramatic year-to-year instability in detected mode. In 2017, the mean annual DIR is 0.95 (6 of 11 months above 1.0, indicating slight
over-detection of Black residents in some months). In 2018, the mean DIR collapses to 0.079 (0 months above 1.0, severe under-detection of Black residents). In 2019, the mean DIR explodes to 15,714 (10 of 11 months above 1.0), driven by near-zero White detection rates as the GAN concentrates patrols away from White-majority areas.

The reported mode, by contrast, exhibits substantially lower and more stable DIR values across all years (mean annual DIR range: 0.61–1.22). This confirms that citizen reporting introduces a form of "ground truth" correction that dampens the feedback loop effect. The mean Gini coefficient under detected mode ranges from 0.43 (2017) to 0.62 (2018) to 0.55 (2019), indicating persistent inequality of detection across racial groups regardless of the directional bias.

Figure 3 presents the monthly Demographic Parity Gap, which shows that 2019 detected mode is the only configuration where the gap consistently favours Black residents (positive gap), driven by the GAN's patrol concentration in Black neighbourhoods. All other configurations show negative parity gaps, meaning White residents are more likely to have their crimes detected per capita.

Table 1: Annual Bias Metrics Summary by City, Year, and Simulation Mode. DIR = Disparate Impact Ratio; PG = Demographic Parity Gap; Gini = Gini Coefficient; M>1 = Months with DIR above 1.0.

City      | Year | Mode | Avg. DIR | Max. DIR | Avg. PG | Avg. Gini | M>1
Baltimore | 2017 | det. | 0.952    | 2.013    | −0.031  | 0.425     | 6/11
Baltimore | 2017 | rep. | 0.613    | 1.058    | −0.025  | 0.283     | 1/11
Baltimore | 2018 | det. | 0.079    | 0.522    | −0.142  | 0.618     | 0/11
Baltimore | 2018 | rep. | 0.721    | 1.155    | −0.020  | 0.311     | 3/11
Baltimore | 2019 | det. | 15,714   | 35,582   | +0.016  | 0.553     | 10/11
Baltimore | 2019 | rep. | 0.653    | 1.655    | −0.029  | 0.361     | 1/11
Chicago   | 2022 | det. | 0.220    | 1.201    | −0.073  | 0.567     | 1/11
Chicago   | 2022 | rep. | 1.218    | 2.694    | −0.000  | 0.213     | 6/11

[Figure 3: Monthly Demographic Parity Gap (Black detection rate minus White detection rate) for Baltimore across all years and modes. Values above zero indicate higher per-capita detection of Black residents. The 2019 detected mode is the only configuration where the gap is consistently positive.]

6.2 Cross-City Comparison: Baltimore vs. Chicago

Figure 4 compares the annual DIR distributions for Baltimore (2017–2019) and Chicago (2022). Chicago's detected mode exhibits a mean DIR of 0.22, indicating systematic under-detection of crimes in Black neighbourhoods. This contrasts markedly with Baltimore 2019, where the GAN produces the opposite pattern. The divergence illustrates that the direction of bias is data-dependent: it is not a fixed property of the GAN architecture but reflects the specific spatial concentration of historical crime data in each city. Chicago's reported mode produces a mean DIR of 1.22, with 6 of 11 months above 1.0, meaning that citizen reports slightly over-represent crime events in Black neighbourhoods relative to White neighbourhoods. This finding is consistent with prior work showing that Black communities in Chicago report crimes at higher rates in absolute terms due to higher overall crime prevalence [20].
[Figure 4: Cross-city comparison of monthly DIR values across all city-year configurations. Baltimore 2019 detected mode (mean DIR = 15,714) is truncated for display; Chicago 2022 shows systematic under-detection of Black residents (mean DIR = 0.22).]

[Figure 5: Sensitivity analysis of DIR across patrol radius (400–1500 ft), officer count (30–120), and citizen reporting probability (0.30–0.80). Officer count has the largest effect on DIR magnitude.]

Table 2: Sensitivity Analysis Results. DIR values for Baltimore 2019 detected mode under varied parameters.

Parameter          | Value | DIR
Patrol Radius (ft) | 400   | 0.045
                   | 700   | 0.087
                   | 1000  | 0.142
                   | 1500  | 0.227
Officer Count      | 30    | 7.713
                   | 60    | 0.084
                   | 90    | 0.121
                   | 120   | 0.209
Reporting Prob.    | 0.30  | 0.612
                   | 0.40  | 0.457
                   | 0.521 | 0.659
                   | 0.60  | 0.402
                   | 0.70  | 0.734
                   | 0.80  | 0.699

6.3 Sensitivity Analysis

Figure 5 and Table 2 present the results of varying three key simulation parameters. All sensitivity experiments use Baltimore 2019 data in detected mode.
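The one-at-a-time sweep behind these sensitivity results can be sketched as a small parameter grid; each returned configuration would drive one simulation run (the simulation itself is not reproduced here, and the baseline of r = 700 ft, N = 60, p = 0.521 is taken from Sections 4–5).

```python
def sensitivity_runs():
    """One-at-a-time parameter sweep around the baseline configuration,
    mirroring the sensitivity grid of Section 5 / Table 2 rather than
    a full cross-product of all parameter values."""
    baseline = {"radius_ft": 700, "n_officers": 60, "p_report": 0.521}
    grid = {
        "radius_ft": [400, 700, 1000, 1500],
        "n_officers": [30, 60, 90, 120],
        "p_report": [0.30, 0.40, 0.521, 0.60, 0.70, 0.80],
    }
    runs = []
    for param, values in grid.items():
        for v in values:
            # Vary one parameter, hold the other two at baseline.
            runs.append({**baseline, param: v})
    return runs
```

This yields 4 + 4 + 6 = 14 configurations, which is why a one-at-a-time sweep stays tractable where a full 4 × 4 × 6 cross-product would need 96 runs.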
Table 3: CTGAN Debiasing Results for Baltimore 2019. Detection rates by racial group under biased (raw training) and debiased (CTGAN-balanced) conditions.

Condition                 | DIR   | Det. Rate (Black) | Det. Rate (White) | Parity Gap
Biased (raw training)     | 0.513 | 3.44%             | 6.70%             | −0.033
Debiased (CTGAN balanced) | 3.106 | 4.93%             | 1.59%             | +0.033

[Figure 6: Detection rates by racial group under biased and CTGAN-debiased training conditions for Baltimore 2019 (February deployment). CTGAN rebalancing increases Black detection rates (+1.49 percentage points) but reduces White detection rates (−5.11 percentage points), reversing the direction of disparity rather than equalising it.]

The most influential parameter is officer count. Reducing from 60 to 30 officers raises the DIR from 0.084 to 7.71, because with only 30 patrol units, the GAN's concentrated patrol pattern creates a winner-take-all detection environment where whichever neighbourhood receives patrol officers dominates the detection count. Increasing patrol radius monotonically increases DIR, as wider detection zones amplify the spatial concentration already embedded in GAN-generated patrol points. Citizen reporting probability has a non-monotonic relationship with DIR, reflecting the stochastic nature of the sampling process.

6.4 CTGAN Debiasing

Table 3 and Figure 6 present the results of applying CTGAN rebalancing to the Baltimore 2019 training data. CTGAN rebalancing raises the Black detection rate from 3.44% to 4.93% (+1.49 percentage points) but simultaneously reduces the White detection rate from 6.70% to 1.59% (−5.11 percentage points).
The resulting DIR swings from 0.513 (under-detection of Black) to 3.106 (over-detection of Black), with the parity gap inverting sign from −0.033 to +0.033. This result demonstrates that algorithmic debiasing at the data level can change the direction of disparity without eliminating it, because the underlying patrol resource constraint (60 officers for a city-wide area) creates a zero-sum allocation environment.

[Figure 7: Scatter plots of neighbourhood-level detection rates versus %Black residents, %White residents, median household income, poverty rate, and total crime count across all n = 279 neighbourhood observations (Baltimore 2017–2019 pooled). Pearson r = −0.81 (%Black), r = +0.83 (%White), r = +0.65 (income), r = −0.64 (poverty), r = +0.20 (crime count); all p < 0.001.]
[Figure 8: Heatmap of mean monthly DIR across city, year, and simulation mode. The 2019 Baltimore detected mode stands out as an extreme outlier, while reported modes maintain near-unity DIR.]

[Figure 9: Gini coefficient trends across months for all city-year-mode configurations. Detected mode consistently produces higher Gini values (0.43–0.62) than reported mode (0.12–0.36), indicating greater inequality in algorithmically directed patrols.]

6.5 Socioeconomic Correlates of Detection Disparity

Figures 7–9 and Tables 4–5 present the socioeconomic analysis.

Table 4: OLS Regression Results: Neighbourhood Detection Rate as a function of demographic predictors (n = 279). *** p < 0.001; * p < 0.05.

Variable      | Coefficient | Significance
Intercept     | +0.0676     | ***
%Black        | −0.0966     | ***
Median Income | −3 × 10⁻⁸   | *
Poverty Rate  | +0.0886     | ***

Table 5: Pearson and Spearman Correlation Coefficients between neighbourhood demographic predictors and detection rate (n = 279). All p-values are < 0.001.
Predictor     | Pearson r | Spearman ρ
%Black        | −0.814    | −0.428
%White        | +0.830    | +0.447
Median Income | +0.647    | +0.520
Poverty Rate  | −0.644    | −0.456

The strongest predictors of detection rate are %White (r = +0.830) and %Black (r = −0.814), both statistically significant at p < 0.001. Median income (r = +0.647) and poverty rate (r = −0.644) are moderately correlated, consistent with the structural co-linearity between race and economic status in both cities [13]. The OLS model estimates a −0.0966 coefficient on %Black: a one-percentage-point increase in a neighbourhood's Black population share is associated with a 0.097-percentage-point decrease in per-crime detection rate, holding other variables constant. The positive coefficient on poverty rate (+0.089) appears counterintuitive but reflects the collinearity structure: in high-poverty, majority-Black neighbourhoods, the GAN directs more patrols (increasing detection events per patrol unit) while simultaneously detecting a lower fraction of total crimes.

7 Discussion

7.1 Bias Amplification Mechanism

Our results expose a three-stage amplification mechanism in GAN-directed predictive policing. First, historical crime data encodes the spatial footprint of past enforcement, which reflects decades of racially targeted policing rather than the true spatial distribution of crime [16]. Second, the GAN learns and replicates this footprint, generating patrol locations that inherit the same concentration patterns. Third, the Noisy-OR detection model, operating on a fixed patrol budget, converts spatial concentration into detection rate disparity: neighbourhoods that happen to receive GAN-generated patrol points accumulate detected events, feeding back into the next training cycle. The extreme DIR values in Baltimore 2019 (mean 15,714) are not an artifact of model instability; they reflect near-complete patrol withdrawal from White-majority areas in that year's training data.
The reported mode's comparative stability (mean DIR 0.61–1.22 across all conditions) confirms that citizen-initiated contacts provide a partial corrective, since reporting is less susceptible to the spatial feedback loop. This finding is consistent with Ensign et al.'s theoretical analysis [7] and provides empirical evidence of the runaway dynamic in real city data.

7.2 Implications of the CTGAN Result

The CTGAN debiasing result carries an important policy warning. Algorithmic rebalancing at the data level can substantially change the direction and magnitude of disparity without reducing total disparity. In our case, CTGAN rebalancing flips the DIR from 0.51 to 3.11—a sixfold change in magnitude and a full reversal of direction. This occurs because CTGAN synthetic augmentation increases the representation of crime events in Black neighbourhoods in the training set, causing the GAN to generate more patrol points there, which increases Black detection rates but diverts patrols from White neighbourhoods. Under a fixed patrol resource constraint, fairness gains for one group necessarily reduce detection rates for another. This zero-sum structure implies that purely data-centric debiasing cannot substitute for increases in patrol resources or for patrol allocation policies that explicitly account for neighbourhood equity [20].

7.3 Socioeconomic Confounding

The strong Pearson correlations between racial composition and detection rate (up to |r| = 0.83) confirm that the bias we measure is not incidental but is structurally embedded in the socioeconomic geography of both cities. The OLS regression cannot disentangle causation from correlation—racial composition, income, and poverty are mutually confounding in U.S. urban geographies [17].
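Correlation coefficients of the kind reported in Table 5 can be recomputed from raw neighbourhood vectors with a short, dependency-free routine (a sketch; the variable names and toy data are ours, not the paper's):

```python
import math

def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Sample Pearson correlation coefficient between two
    equal-length series."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy check: a perfectly linear positive relationship gives r = +1,
# the direction Table 5 reports for %White vs detection rate.
pct_white = [10.0, 30.0, 50.0, 70.0]
detection_rate = [0.01, 0.03, 0.05, 0.07]
print(round(pearson_r(pct_white, detection_rate), 6))  # 1.0
```

In practice one would use scipy.stats.pearsonr (and spearmanr for ρ) to also obtain the p-values the paper reports; the hand-rolled version above only illustrates the statistic itself.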
Nevertheless, the consistency of the sign and magnitude of these associations across two cities and four years strengthens the inference that GAN-directed patrol allocation systematically disadvantages economically marginalised, racially segregated neighbourhoods.

7.4 Cross-City Variation

The divergence between Baltimore and Chicago is instructive. Baltimore 2018 and 2019 detected modes represent opposite extremes (DIR = 0.079 and 15,714 respectively), while Chicago 2022 detected mode sits at DIR = 0.22. This variation is not primarily an artefact of city size or total crime volume; it reflects the specific spatial concentration of each year's crime data and the GAN's tendency to amplify whichever spatial pattern dominates the training set. The implication for auditing is that bias metrics must be computed annually and cannot be assumed to be stable from one deployment cycle to the next—a point reinforced by Semsar et al.'s recent comparative work [18].

7.5 Limitations

Several limitations bound the generalisability of our findings. (1) Race assignment is probabilistic, derived from neighbourhood census proportions rather than individual-level data. (2) The GAN is retrained monthly, which is computationally convenient but does not precisely model the longer retraining cycles used in operational systems. (3) The Noisy-OR detection model assumes independence between officers, which may over- or under-estimate detection probability in coordinated patrol formations. (4) We do not model crime displacement—the possibility that patrol concentration in one area drives criminal activity to adjacent areas—which could affect the long-run feedback dynamics. (5) CTGAN debiasing was evaluated only on Baltimore 2019; its effects may differ in other city-years.

8 Conclusion

We have presented a reproducible, multi-city GAN simulation framework for auditing racial bias in predictive policing.
Our analysis of 264 simulation observations across Baltimore (2017–2019) and Chicago (2022) demonstrates that GAN-directed patrol allocation produces large, year-variant, and structurally embedded racial disparities—captured by DIR values ranging from near-zero to above 15,000 in detected mode. The reported mode, driven by citizen calls, consistently produces lower and more stable disparities, suggesting that citizen-initiated contact provides a partial corrective to the feedback loop that is absent in purely algorithmic allocation.

CTGAN debiasing can alter the direction and magnitude of disparity but cannot eliminate it under a fixed patrol resource budget. Socioeconomic regression confirms that neighbourhood racial composition is the strongest predictor of detection rate (r = 0.83), underscoring that the bias is structural rather than incidental.

Our findings carry direct policy implications. First, predictive policing systems should be audited annually using city-specific bias metrics rather than relying on static model validation at deployment time. Second, data-level debiasing must be accompanied by resource and policy changes to avoid simply redirecting disparity. Third, community-driven reporting channels should be strengthened as a counterweight to algorithmically directed patrol. Future work will investigate causal intervention models [12, 23] that go beyond distributional rebalancing, and will extend the simulation to model multi-round feedback across deployment cycles.

Acknowledgments

The authors thank the Baltimore City and Chicago open data portals for making crime incident data publicly available, and the U.S. Census Bureau for the ACS demographic data used in this study.

References

[1] Abdulmajeed S Almasoud and Samuel Idowu. 2024. Algorithmic bias in predictive policing: a systematic review of fairness in AI-driven law enforcement tools. AI and Ethics (2024). doi:10.1007/s43681-024-00541-3
[2] Baltimore City Open Data. 2019.
Baltimore Police Department Part 1 Crime Data, 2017–2019. https://data.baltimorecity.gov. Accessed 2024.
[3] Richard Berk, Hoda Heidari, Shahin Jabbari, Michael Kearns, and Aaron Roth. 2021. Fairness in criminal justice risk assessments: The state of the art. Sociological Methods & Research 50, 1 (2021), 3–44. doi:10.1177/0049124118782533
[4] Alexandra Chouldechova. 2017. Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big Data 5, 2 (2017), 153–163. doi:10.1089/big.2016.0047
[5] City of Chicago Data Portal. 2022. Chicago Police Department Crimes Data, 2022. https://data.cityofchicago.org. Accessed 2024.
[6] Julia Dressel and Hany Farid. 2018. The accuracy, fairness, and limits of predicting recidivism. Science Advances 4, 1 (2018), eaao5580. doi:10.1126/sciadv.aao5580
[7] Danielle Ensign, Sorelle A Friedler, Scott Neville, Carlos Scheidegger, and Suresh Venkatasubramanian. 2018. Runaway feedback loops in predictive policing. In Proceedings of the 1st Conference on Fairness, Accountability and Transparency. PMLR, 160–171.
[8] Andrew Guthrie Ferguson. 2017. Policing predictive policing. Washington University Law Review 94, 5 (2017), 1109–1189.
[9] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarial nets. In Advances in Neural Information Processing Systems, Vol. 27.
[10] Kuo-Ting Hung and Yi-Ren Yen. 2023. The ethics of predictive policing: statistical discrimination, algorithmic injustice, and the construction of risk. Synthese 201 (2023), 1–23. doi:10.1007/s11229-023-04189-0
[11] Kristian Lum and William Isaac. 2016. To predict and serve? Significance 13, 5 (2016), 14–19. doi:10.1111/j.1740-9713.2016.00960.x
[12] Ziqi Ma, Ding Guo, and Jie Jiang. 2024. Counterfactual fairness with disentangled causal effect variational autoencoder.
In Proceedings of the International Conference on Learning Representations. arXiv:2310.17687.
[13] Ninareh Mehrabi, Fred Morstatter, Nripsuta Saxena, Kristina Lerman, and Aram Galstyan. 2021. A survey on bias and fairness in machine learning. Comput. Surveys 54, 6 (2021), 1–35. doi:10.1145/3457607
[14] George O Mohler, Martin B Short, Sean Malinowski, Mark Johnson, George E Tita, Andrea L Bertozzi, and P Jeffrey Brantingham. 2015. Randomized controlled field trial of predictive policing. J. Amer. Statist. Assoc. 110, 512 (2015), 1399–1411. doi:10.1080/01621459.2015.1077710
[15] Pew Research Center. 2019. What the public knows about the political parties. https://w.pewresearch.org. Accessed 2024; citizen reporting rate cited as 52.1%.
[16] Rashida Richardson, Jason M Schultz, and Kate Crawford. 2019. Dirty data, bad predictions: How civil rights violations impact police data, predictive policing systems, and justice. New York University Law Review Online 94 (2019), 15–55.
[17] Andrew D Selbst, Danah Boyd, Sorelle A Friedler, Suresh Venkatasubramanian, and Janet Vertesi. 2019. Fairness and abstraction in sociotechnical systems. In Proceedings of the Conference on Fairness, Accountability, and Transparency. 59–68. doi:10.1145/3287560.3287598
[18] Arash Semsar et al. 2026. A comparative simulation study of predictive policing bias in urban environments. arXiv preprint arXiv:2602.02566 (2026).
[19] U.S. Census Bureau. 2022. American Community Survey 5-Year Estimates, 2019 and 2022. https://w.census.gov/programs-surveys/acs. Accessed 2024.
[20] Hanlin Wang, Cynthia Rudin, et al. 2023. Spatial bias in predictive policing and resource allocation. Journal of Quantitative Criminology (2023). doi:10.1007/s10940-022-09545-w
[21] Yiqun Wu and Vanessa Frias-Martinez. 2024. Fairness-aware spatio-temporal crime prediction. arXiv preprint arXiv:2406.04382 (2024).
[22] Lei Xu, Maria Skoularidou, Alfredo Cuesta-Infante, and Kalyan Veeramachaneni. 2019.
Modeling tabular data using conditional GAN. In Advances in Neural Information Processing Systems, Vol. 32.
[23] Junzhe Zhang and Elias Bareinboim. 2018. Fairness in decision-making — the causal explanation formula. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 32.
[24] Marta Ziosi and David Pruss. 2024. Participatory approaches to artificial intelligence governance in policing. In Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency. 1–12. doi:10.1145/3630106.3658991