An Index Autopsy

Every composite index is a theory wearing a number's clothing. The ADI's theory, inherited from Gopal Singh's 1990s formulation, includes housing values and rents among its seventeen inputs: expensive homes signal advantage. In most of America that is true. In New York, San Francisco, and Los Angeles it breaks catastrophically — a poor family paying $2,400 for a crowded apartment scores as "advantaged."

Principal component analysis lets us rebuild the index with a different theory and almost no hand-tuning: throw nine ACS measures of income, education, employment, and family structure into the machine — no housing costs — and let the data find the single axis that explains the most variation. We call the result the ZCD index. The autopsy compares the two.

What the machine chose

The first component absorbs 51 percent of the variance across nine measures, and its loadings need no massaging to interpret: poverty, public assistance, unemployment, low education, and single-parent households pull one way; income, college, homeownership, and labor-force participation pull the other. Disadvantage, the data agree, is one axis.

First-component loadings. PCA of nine standardized ACS tract measures (2019–2023), oriented so positive = more deprived. scikit-learn PCA; 84,000 tracts.
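The recipe behind the figure (standardize, take the first component, fix its sign, rank nationally) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the series' build script: it uses a NumPy SVD as a stand-in for scikit-learn's PCA, the function name `first_component_index` is hypothetical, and the three toy columns are synthetic rather than ACS data.

```python
import numpy as np

def first_component_index(X, poverty_col=0):
    """Return (loadings, explained_share, percentile) for PC1 of X.

    X: (n_tracts, n_measures) array of raw measures.
    poverty_col: column used to fix the sign so that higher = more deprived.
    """
    # Standardize each measure to mean 0 and variance 1.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # SVD of the standardized matrix; rows of Vt are the principal axes.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt[0]
    # A component is defined only up to sign: flip so poverty loads positively.
    if loadings[poverty_col] < 0:
        loadings = -loadings
    scores = Z @ loadings
    # Share of total variance carried by the first component.
    explained = s[0] ** 2 / np.sum(s ** 2)
    # Percentile rank (0-100) of each row's PC1 score within the sample.
    ranks = scores.argsort().argsort()
    percentile = 100.0 * (ranks + 0.5) / len(scores)
    return loadings, explained, percentile

# Synthetic stand-in data: one latent factor drives three toy measures.
rng = np.random.default_rng(0)
latent = rng.normal(size=2000)
X = np.column_stack([
    latent + 0.5 * rng.normal(size=2000),   # toy "poverty rate" (rises with deprivation)
    latent + 0.5 * rng.normal(size=2000),   # toy "unemployment"
    -latent + 0.5 * rng.normal(size=2000),  # toy "log median income" (falls)
])
loadings, explained, percentile = first_component_index(X, poverty_col=0)
```

Anchoring the sign on one column is what makes the loadings reproducible from run to run; without it, an equally valid solution could come back with every sign reversed.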

Two indices, one disagreement

Plot every tract's ADI percentile against its ZCD percentile. The diagonal mass is the 0.68 correlation — for most of the country the two theories of deprivation agree. The rebellious cloud above the diagonal is dominated by expensive-housing-market tracts (red: ZIP median home value over $450k): places the ZCD calls deprived and the ADI, looking at their rents, calls comfortable.

ADI vs. ZCD percentile, 4,500-tract sample. Red: tracts whose ZIP-area median home value exceeds $450k (expensive housing markets). Sources: Neighborhood Atlas; ACS PCA; ACS home values.

The map of the disagreement

Average the divergence by county and the quarrel acquires a geography. The ADI under-calls deprivation (red) in exactly the places this series kept tripping over: the Bronx (+54 percentiles), Brooklyn (+52), urban California. It over-calls deprivation (blue) across cheap-housing metros — upstate New York, Pittsburgh, suburban St. Louis — where low rents read as poverty that the labor market data don't corroborate.

Legend: blue = ADI overstates deprivation; red = ADI understates deprivation.

ZCD minus ADI percentile, county average (population-weighted; counties with ≥8 tracts). Red = our housing-free index says the county is more deprived than ADI does. Sources: as above.
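The county figure in the caption is a population-weighted mean of tract-level gaps, kept only where a county has at least eight tracts. A rough sketch of that aggregation follows; the function name, the dictionary keys (`county_fips`, `zcd_pct`, `adi_pct`, `pop`), and the toy counties "A" and "B" are all illustrative assumptions, not the series' actual schema or data.

```python
from collections import defaultdict

def county_divergence(tracts, min_tracts=8):
    """Population-weighted mean of (zcd_pct - adi_pct) per county.

    tracts: iterable of dicts with hypothetical keys
            county_fips, zcd_pct, adi_pct, pop.
    Counties with fewer than min_tracts tracts are dropped.
    """
    acc = defaultdict(lambda: [0.0, 0.0, 0])  # weighted sum, total weight, tract count
    for t in tracts:
        a = acc[t["county_fips"]]
        a[0] += t["pop"] * (t["zcd_pct"] - t["adi_pct"])
        a[1] += t["pop"]
        a[2] += 1
    return {
        county: wsum / w
        for county, (wsum, w, n) in acc.items()
        if n >= min_tracts and w > 0
    }

# Toy input: county "A" has eight tracts, county "B" only three.
toy = (
    [{"county_fips": "A", "zcd_pct": 90, "adi_pct": 30, "pop": 3000}] * 4
    + [{"county_fips": "A", "zcd_pct": 50, "adi_pct": 30, "pop": 1000}] * 4
    + [{"county_fips": "B", "zcd_pct": 20, "adi_pct": 60, "pop": 2000}] * 3
)
result = county_divergence(toy)
```

In the toy input, county "A" averages to +50 because its larger tracts carry the larger gap, and county "B" drops out for having too few tracts. A positive value corresponds to red on the map: the housing-free index ranks the county as more deprived than ADI does.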

The horse race — and the honest verdict

Which index better predicts life expectancy? Nationally it is essentially a tie (R² = 0.36 vs. 0.37). Within counties, ADI's slope per 10 percentiles is steeper, partly because nationally ranked percentiles compress differently inside metros. The fair conclusion is not that one index wins; it is that the gradient is robust to the theory of deprivation, while the ranking of specific places is not. Use either index for national gradients; use neither uncritically in expensive cities; and never let a single composite carry a city-level claim alone.

Predicting tract life expectancy. National R² and the within-county slope per 10 percentiles (95% CI), each index in turn. pyfixest, population-weighted, county-clustered SEs.
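The two quantities in the caption can be approximated without any regression package. The sketch below is a plain-NumPy stand-in, not the pyfixest specification: it computes a population-weighted R² for the national fit and a within-county slope by subtracting weighted county means, and it omits the county-clustered standard errors entirely. The function names and the synthetic data (true slope of −0.5 years per 10 percentiles) are assumptions made for illustration.

```python
import numpy as np

def weighted_r2(y, x, w):
    """R² of a weighted least-squares fit of y on x with an intercept."""
    xm, ym = np.average(x, weights=w), np.average(y, weights=w)
    beta = np.sum(w * (x - xm) * (y - ym)) / np.sum(w * (x - xm) ** 2)
    resid = (y - ym) - beta * (x - xm)
    return 1.0 - np.sum(w * resid ** 2) / np.sum(w * (y - ym) ** 2)

def within_slope(y, x, w, groups):
    """Weighted slope of y on x after removing weighted group means
    (equivalent to including a fixed effect for each group)."""
    yd, xd = y.astype(float), x.astype(float)  # astype returns copies
    for g in np.unique(groups):
        m = groups == g
        yd[m] -= np.average(y[m], weights=w[m])
        xd[m] -= np.average(x[m], weights=w[m])
    return np.sum(w * xd * yd) / np.sum(w * xd ** 2)

# Synthetic tracts: life expectancy falls 0.05 years per percentile,
# plus a county-level shift and tract-level noise.
rng = np.random.default_rng(0)
n, n_counties = 5000, 50
county = rng.integers(0, n_counties, size=n)
pop = rng.integers(1000, 8000, size=n).astype(float)
index_pct = rng.uniform(0, 100, size=n)
county_effect = rng.normal(size=n_counties)[county]
le = 80.0 - 0.05 * index_pct + county_effect + rng.normal(size=n)

r2 = weighted_r2(le, index_pct, pop)
slope_per_10 = 10 * within_slope(le, index_pct, pop, county)
```

Because the within-county slope uses only variation inside each county, an index whose nationally ranked percentiles are squeezed into a narrow band within a metro can show a steeper slope without predicting any better.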

Notes & data

Inputs. Nine ACS 2019–2023 tract measures: poverty rate, public assistance, unemployment, HS-or-less share, single-parent share, log median income, labor-force participation, homeownership, college share. Standardized; PCA via scikit-learn; PC1 percentile-ranked nationally and oriented so higher = more deprived. Loadings published above and in the variable registry.
Why no housing costs. The exclusion is the experiment: ADI's known weakness (documented in the health-services literature and visible in our story № 1, where East Harlem scored ADI 1–7) comes from rents and home values entering as advantage. Homeownership stays in (tenure ≠ price).
Divergence ≠ error. ADI is a fixed, validated, citable national standard; ZCD is one PCA on fewer inputs. Where they disagree, the truth likely sits between: expensive-city tracts are less materially deprived than ZCD implies (housing quality, services) and more than ADI implies (income left after rent). The book will report both where it matters.
Horse race details. `le ~ index` (national, weighted) and `le ~ index | county_fips` (within), per index. In-sample R² only — no claim of out-of-sample superiority either way. Within-county slope differences partly reflect differing within-county variance of nationally ranked percentiles.
Reproducibility. `articles/_build/prep_data_c.py`; seeds fixed; loadings deterministic up to sign (orientation fixed on poverty).