Better models can make property taxes both fairer and more accurate: across nearly all U.S. counties, adding property attributes and public Census data to assessment models reduces valuation errors and the regressivity of tax burdens, undermining the assumed inevitability of a fairness–accuracy trade-off.

Tradeoffs are Domain Dependent: Improving Accuracy and Fairness in Property Tax Assessments

Evelyn Smith, Emma Harvey, Christopher Berry, Jacob Goldin, Daniel E. Ho · May 14, 2026

arXiv · correlational · medium evidence · 7/10 relevance

Using 26 million U.S. property sales, the paper shows that better predictive assessment models—especially those adding property features and Census data—tend to improve both valuation accuracy and fairness, contradicting a universal fairness–accuracy tradeoff in property-tax assessments.

Algorithmic fairness research often assumes a tradeoff between fairness and accuracy. Yet this tradeoff may not be universal. We test this assumption in the context of U.S. property tax assessment - a setting in which the output of predictive algorithms directly determines the distribution of tax obligations among homeowners. Currently, systematic assessment errors cause owners of lower-valued properties to face disproportionately high tax burdens, creating regressivity in the property tax system. Using data on 26 million property sales spanning 95% of U.S. counties, we conduct three complementary analyses. First, we find that assessment accuracy and fairness - measured using domain-relevant metrics - are strongly correlated across counties under status quo practices. Second, in simulated assessment models, we show that adding property features improves accuracy in most cases, and that when accuracy improves, fairness almost always improves as well. Third, we show that incorporating publicly available Census data into assessment models - a feasible reform in most counties - would significantly improve both accuracy and fairness relative to status quo assessments. Together, these results challenge the presumed universality of the fairness-accuracy tradeoff and demonstrate that well-designed modeling improvements can advance both fairness and accuracy in large-scale public sector systems.

Summary

Main Finding

Across U.S. property tax assessments, improving predictive accuracy generally improves vertical equity (reduces regressivity). Using 26 million single‑family home sales across ~95% of U.S. counties, the authors show that (1) status‑quo assessment accuracy and fairness are positively correlated across counties; (2) in county‑level simulated assessment models, adding features improves accuracy in most cases and, when accuracy improves, fairness improves >99% of the time; and (3) adding readily available Census (ACS) neighborhood characteristics to models would meaningfully improve both accuracy and fairness in hundreds of counties. This challenges the idea that an accuracy–fairness tradeoff is universal: in property tax assessment, better models can be Pareto improvements.

Key Points

Domain and metric matter: the canonical fairness–accuracy tradeoff from ML literature is not inevitable; in property tax assessment, accuracy gains typically align with fairness gains.
Three complementary analyses:
- Cross‑county empirical: counties with lower mean absolute percentage error (MAPE) of assessments tend to have less regressive tax outcomes (measured by multiple regressivity metrics).
- Simulation of counterfactual AVMs: county‑level LASSO and random forest models trained on richer property/neighborhood features show that adding features usually reduces error, and accuracy improvements coincide with fairness improvements in >99% of cases.
- Feasible reform test: including publicly available Census block‑group variables in assessment models substantially improves both accuracy and regressivity (including reductions in regressivity correlated with neighborhood race and income).
Regressivity metrics used (triangulation): Log Coefficient (regression‑based), Suits Index (distributional), and Price‑Related Differential (PRD, ratio‑based).
Conceptual insight: the relationship between accuracy and regressivity depends on variance vs. bias tradeoffs in predictions; under realistic data/modeling improvements, variance/bias changes tend to reduce regressivity.
Policy/practice context: most assessor offices do not use Automated Valuation Models (AVMs)—IAAO survey found ~16% use AVMs—and many do not include fine‑grained neighborhood data, despite evidence these features help.

Data & Methods

Data: 26 million arm's‑length single‑family home sales (2018–2023), merged with property characteristics, tax assessment values, location (county, block group), and ACS 5‑year block‑group socioeconomic variables. Coverage: 2,844 of 3,007 U.S. counties (~95%).
Unit of analysis: county‑year; counterfactual models are built and evaluated within counties.
Counterfactual models: LASSO and random forest AVMs predicting log sale price. Models simulate realistic assessor practices and are intended to be technically and administratively feasible.
Preprocessing:
- One‑hot encode categorical features (keep categories present ≥5%).
- Drop features missing >50%.
- Impute remaining missing values with MICE.
- Winsorize numeric features at 1st and 99th percentiles; normalize.
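The preprocessing steps listed above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: all function names are hypothetical, missing values are represented as `None`, quantiles use a simple index-based rule, and median imputation is swapped in as a simple stand-in for MICE.

```python
from statistics import mean, median, pstdev

def drop_sparse_features(rows, max_missing=0.5):
    """Drop features whose share of missing (None) values exceeds max_missing."""
    n = len(rows)
    keep = [k for k in rows[0]
            if sum(r[k] is None for r in rows) / n <= max_missing]
    return [{k: r[k] for k in keep} for r in rows]

def one_hot_frequent(values, min_share=0.05):
    """One-hot encode a categorical column, keeping categories with share >= min_share."""
    n = len(values)
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    kept = sorted(c for c, k in counts.items() if k / n >= min_share)
    return [{f"is_{c}": int(v == c) for c in kept} for v in values]

def median_impute(values):
    """Stand-in for MICE (an assumption): fill None with the median of observed values."""
    fill = median(v for v in values if v is not None)
    return [fill if v is None else v for v in values]

def winsorize(values, lower=0.01, upper=0.99):
    """Clip values to the empirical lower/upper quantiles (index-based rule)."""
    s = sorted(values)
    lo = s[round(lower * (len(s) - 1))]
    hi = s[round(upper * (len(s) - 1))]
    return [min(max(v, lo), hi) for v in values]

def normalize(values):
    """Z-score a numeric column; a constant column maps to zeros."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in values]

# Hypothetical square-footage column with one missing value and one outlier.
sqft = [1200, None, 1500, 900, 40000]
clean = normalize(winsorize(median_impute(sqft)))
```

Categories below the 5% threshold end up as all-zero rows, which effectively pools them into an implicit "other" level.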
Training/evaluation:
- Chronological train/test split — train on historical sales, test on most recent year (aligns with real assessment cycles).
- Hyperparameter tuning via Bayesian optimization with 5‑fold CV using mean absolute error on log price.
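The chronological split described above might look like the following sketch. The record layout and the `sale_year` key are assumptions for illustration; the summary does not specify how sales are stored.

```python
def chronological_split(sales, year_key="sale_year"):
    """Train on all sales before the most recent year; test on the most recent year."""
    latest = max(s[year_key] for s in sales)
    train = [s for s in sales if s[year_key] < latest]
    test = [s for s in sales if s[year_key] == latest]
    return train, test

# Hypothetical sales records for one county.
sales = [
    {"sale_year": 2021, "price": 210_000},
    {"sale_year": 2022, "price": 250_000},
    {"sale_year": 2023, "price": 265_000},
    {"sale_year": 2023, "price": 310_000},
]
train, test = chronological_split(sales)
```

Splitting by time rather than at random keeps future sales out of training, which mirrors how an assessor can only use past sales when setting values for the coming cycle.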
Primary accuracy metric: Mean Absolute Percentage Error (MAPE) between predicted assessed value and actual sale price (sale price used as ground truth market value). Robustness checks use RMSE and MAE.
Fairness/regressivity metrics: Log Coefficient (slope from regressing the log assessed‑to‑sale ratio on log sale price), Suits Index (area‑based distributional measure), PRD (mean assessed‑to‑sale ratio divided by the sale‑price‑weighted mean ratio).
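As a rough sketch, the accuracy metric and two of the regressivity metrics can be computed as follows. The function names and sample values are hypothetical; the PRD here follows the standard ratio-study definition (mean ratio over weighted mean ratio), and the Suits Index is omitted for brevity.

```python
import math

def mape(assessed, prices):
    """Mean absolute percentage error of assessed values against sale prices."""
    return sum(abs(a - p) / p for a, p in zip(assessed, prices)) / len(prices)

def log_coefficient(assessed, prices):
    """OLS slope of log(assessed / price) on log(price); negative means regressive."""
    x = [math.log(p) for p in prices]
    y = [math.log(a / p) for a, p in zip(assessed, prices)]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def prd(assessed, prices):
    """Price-Related Differential: mean ratio / weighted mean ratio; above 1 means regressive."""
    mean_ratio = sum(a / p for a, p in zip(assessed, prices)) / len(prices)
    weighted_mean_ratio = sum(assessed) / sum(prices)
    return mean_ratio / weighted_mean_ratio

# Hypothetical county where cheaper homes are assessed closer to their sale price.
prices = [100_000, 200_000, 400_000]
assessed = [90_000, 160_000, 280_000]
print(mape(assessed, prices), log_coefficient(assessed, prices), prd(assessed, prices))
```

In this toy example the assessment ratio falls as price rises, so the log coefficient is negative and the PRD exceeds 1, both indicating regressivity. A perfectly proportional assessment would give a slope of 0 and a PRD of 1.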
Evaluation framework: Pareto frontier analysis — check whether counterfactual models produce Pareto improvements (simultaneous accuracy and fairness gains) over status quo assessments and other models.
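A Pareto check of this kind could be sketched as below. The dictionary keys and numbers are hypothetical, and `regressivity` is assumed to be expressed as a distance from the neutral value (for example, the absolute log coefficient, or |PRD − 1|), so that lower is better on both dimensions.

```python
def is_pareto_improvement(candidate, baseline):
    """True when the candidate is strictly better on both error and regressivity."""
    return (candidate["mape"] < baseline["mape"]
            and candidate["regressivity"] < baseline["regressivity"])

def pareto_frontier(models):
    """Return the models that no other model dominates."""
    def dominates(a, b):
        no_worse = a["mape"] <= b["mape"] and a["regressivity"] <= b["regressivity"]
        better = a["mape"] < b["mape"] or a["regressivity"] < b["regressivity"]
        return no_worse and better
    return [m for m in models if not any(dominates(o, m) for o in models)]

# Hypothetical values for one county (not results from the paper).
status_quo = {"name": "status quo", "mape": 0.25, "regressivity": 0.12}
with_census = {"name": "AVM + Census", "mape": 0.18, "regressivity": 0.05}
mixed = {"name": "mixed", "mape": 0.16, "regressivity": 0.09}

print(is_pareto_improvement(with_census, status_quo))
print([m["name"] for m in pareto_frontier([status_quo, with_census, mixed])])
```

The strict-on-both test matches the "simultaneous accuracy and fairness gains" wording above, while the frontier uses the usual weak-dominance rule so that models trading one metric against the other both remain on it.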

Implications for AI Economics

Tradeoffs are domain dependent: economists and policymakers should not assume an inherent accuracy–fairness tradeoff. Data richness and model specification can unlock joint gains.
Value of additional data: collecting or integrating inexpensive, public, neighborhood‑level features (e.g., ACS block‑group variables) can produce measurable welfare gains by reducing regressivity in tax burdens without sacrificing accuracy.
Policy recommendations:
- Encourage wider, transparent adoption of AVMs that incorporate location/neighborhood features, subject to local legal constraints and auditability.
- Where AVMs are adopted, require evaluation on both accuracy and distributional (regressivity) metrics; use multiple fairness metrics because no single metric is definitive.
- Low‑cost reforms (use of public Census data, neighborhood features) can be prioritized before costlier interventions (inspections, appraisals).
Cautions and open issues for AI economists:
- Ground truth and measurement: sale prices are an imperfect proxy for market value (selection into sale, excluded distressed sales). Results depend on that operational choice.
- Legal and institutional constraints: state caps on reassessments, inspection mandates, or political economy factors (corruption, lobbying) can limit implementable improvements and produce heterogeneous effects across counties.
- Feature ethics and race: improvements leveraging neighborhood sociodemographics can reduce regressivity on average, but using race or income proxies raises normative and legal questions. The paper uses block‑group public data and examines neighborhood‑level regressivity, not individual protected attributes; implementers must consider anti‑discrimination law and public acceptability.
- Incentives and dynamics: assessments affect behavior, market prices, and local fiscal policy. Longitudinal and general equilibrium effects (e.g., strategic reporting, appeals, effects on mobility and investment) remain to be studied.
- Heterogeneity: some counties may experience tradeoffs depending on specific variance/bias patterns; thus local diagnostics are necessary before deployment.
Research directions: quantify political economy barriers to AVM adoption, study long‑run distributional and market feedbacks from improved assessments, formalize cost‑benefit analyses for feature collection versus inspection/appraisal investments, and test whether similar domain dependence of the fairness–accuracy relationship holds in other public‑sector settings (e.g., benefit determination, licensing).

Summary takeaway: in property tax assessment—a high‑impact, public‑sector prediction task—richer data and straightforward modeling improvements can simultaneously increase predictive accuracy and vertical equity. For AI economics, this underscores the importance of granular, domain‑specific evaluation (multiple fairness metrics, careful data choices) and suggests that data investment often yields both efficiency and equity gains.

Assessment

Paper Type: correlational

Evidence Strength: medium. The paper uses a very large, near-national dataset and triangulates findings with cross-county correlations, simulations, and counterfactual model comparisons, which together provide strong descriptive and predictive evidence that improving models can reduce both error and regressivity; however, it lacks experimental or quasi-experimental validation of implemented reforms, and results depend on modeling choices and simulation assumptions.

Methods Rigor: high. Authors employ multiple complementary analyses (empirical correlation across counties, simulated assessment models, and counterfactuals using publicly available Census data), use domain-relevant fairness and accuracy metrics, and analyze a large, comprehensive dataset covering most U.S. counties, but the work does not include a field implementation or randomized test of proposed model changes.

Sample: Administrative sales and assessment data covering 26 million property sales spanning roughly 95% of U.S. counties (county-level status-quo assessments and property-level features), supplemented by simulated predictive models and publicly available Census (neighborhood) variables; exact years/time window not specified in the summary.

Themes: inequality, governance, adoption

Generalizability:
- Focuses on U.S. property-tax assessment; results may not generalize to other public-sector prediction tasks or private-sector applications.
- Relies on properties that sold (sales-based data), which may be non-random and not fully representative of all taxed properties.
- Local legal, institutional, and administrative constraints vary across counties and may limit feasibility of adopting the recommended data or models.
- Improvements shown are from simulations and counterfactuals; real-world implementation may face data quality, political, or operational obstacles.
- Using Census or neighborhood variables could raise legal or ethical issues (e.g., proxying protected characteristics) that affect adoption and external validity.
- Time period/sample composition (e.g., housing market cycles) could affect the magnitude of estimated improvements.

Claims (6)

Claim	Category	Direction	Confidence	Outcome	Details
Assessment accuracy and fairness - measured using domain-relevant metrics - are strongly correlated across counties under status quo practices.	Output Quality	positive	high	assessment accuracy and assessment fairness (domain-relevant metrics)	n=26000000 0.3
Currently, systematic assessment errors cause owners of lower-valued properties to face disproportionately high tax burdens, creating regressivity in the property tax system.	Inequality	negative	high	distributional tax burden (regressivity across property value quintiles)	n=26000000 0.3
In simulated assessment models, adding property features improves accuracy in most cases.	Output Quality	positive	high	assessment accuracy (model predictive performance)	0.3
When accuracy improves in the simulated assessment models, fairness almost always improves as well.	Output Quality	positive	high	assessment fairness (distributional error/fairness metrics) conditional on changes in assessment accuracy	0.3
Incorporating publicly available Census data into assessment models - a feasible reform in most counties - would significantly improve both accuracy and fairness relative to status quo assessments.	Output Quality	positive	high	assessment accuracy and fairness after inclusion of Census data	n=26000000 0.3
These results challenge the presumed universality of the fairness-accuracy tradeoff and demonstrate that well-designed modeling improvements can advance both fairness and accuracy in large-scale public sector systems.	Output Quality	positive	high	co-movement of fairness and accuracy under improved modeling practices	n=26000000 0.3