Do not cluster on dollars.
Raw lottery sales mostly follow population and retailer availability. The case starts by separating scale from play style.
Part 4 case study
A NYC ZIP-level segmentation built from NY Lottery baseline behavior. The model ignores demographics while it learns the behavior space, then uses income, education, race, ethnicity, population, and retailer availability to interpret the segments afterward.
PCA scores from 35 lottery behavior features
Each dot is an active NYC ZIP. Color is the final k-means segment.
PCA turns correlated product, channel, add-on, timing, entropy, and habit measures into readable coordinates.
Factor analysis names latent play modes: daily habit, rapid convenience, jackpot add-ons, and incidental variation.
K-means on behavior scores gives three segments. Demographics are brought in to interpret the segments only after the clusters are formed.
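The steps above can be sketched end to end. This is a minimal illustration under stated assumptions, not the case study's actual code: the data are synthetic, the number of retained components (4) is a guess, the source does not say whether k-means ran on PCA scores or factor scores, and `median_income` is a hypothetical stand-in for the demographic tables. Only the 35 features and the three segments come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the real table: 150 ZIPs x 35 behavior features,
# drawn around three planted centers (assumption, for illustration only).
n_per_group, n_features = 50, 35
centers = rng.normal(0.0, 3.0, size=(3, n_features))
X = np.vstack([c + rng.normal(0.0, 1.0, size=(n_per_group, n_features))
               for c in centers])

# 1. Standardize so no single feature dominates through its units.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. PCA via SVD of the standardized matrix.
U, S, Vt = np.linalg.svd(Z, full_matrices=False)
n_components = 4  # assumption: the source does not state how many were kept
scores = U[:, :n_components] * S[:n_components]  # ZIP coordinates on the PCs
explained = S ** 2 / np.sum(S ** 2)              # variance share per component


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm: assign to nearest centroid, then recompute."""
    init_rng = np.random.default_rng(seed)
    centroids = points[init_rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# 3. Cluster on behavior scores only; no demographic column is in `scores`.
labels, centroids = kmeans(scores, k=3)

# 4. Only now attach a demographic variable and summarize it by segment.
median_income = rng.normal(60_000, 15_000, size=len(X))  # hypothetical values
income_by_segment = {int(j): float(median_income[labels == j].mean())
                     for j in np.unique(labels)}
```

The design point is the ordering: `median_income` is created after `labels` exists, so it cannot influence the segments and serves only to describe them.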
The axes and clusters use product mix, channel context, add-on rates, timing, concentration, entropy, volatility, and habit signals. No demographics enter the fit.
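A small sketch of how scale can be separated from play style for two of the feature families named above, product mix and entropy. The product names and dollar amounts are hypothetical, and the concentration measure shown (sum of squared shares) is one common choice, not necessarily the one used in the case study.

```python
import math


def play_style_features(sales_by_product):
    """Convert one ZIP's raw dollar sales per product into scale-free features."""
    total = sum(sales_by_product.values())
    if total <= 0:
        raise ValueError("expected an active ZIP with positive sales")
    # Product mix: each product's share of the ZIP's total sales.
    shares = {name: dollars / total for name, dollars in sales_by_product.items()}
    # Shannon entropy (natural log): 0 when all sales sit in one product.
    entropy = -sum(p * math.log(p) for p in shares.values() if p > 0)
    # Concentration as the sum of squared shares (assumed definition).
    concentration = sum(p * p for p in shares.values())
    return {"shares": shares, "entropy": entropy, "concentration": concentration}


# Two hypothetical ZIPs with the same mix at a tenfold difference in dollars.
small_zip = {"daily_numbers": 200.0, "quick_draw": 100.0, "jackpot": 100.0}
large_zip = {name: 10 * dollars for name, dollars in small_zip.items()}

small_features = play_style_features(small_zip)
large_features = play_style_features(large_zip)
```

Both ZIPs get the same shares, entropy, and concentration, so a model fit on these features sees them as the same play style even though their raw dollar totals differ by a factor of ten.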
A ZIP placed in the daily-routine segment is behaviorally similar to other daily-routine ZIPs. That does not say why residents play that way.
PCA gives the compact map. Factor analysis gives a language for the latent play styles behind the observed variables.