D3M · Unstructured Patterns — Unsupervised Learning

The Shape of Development

Twelve numbers describe a country. It turns out you need barely one. A working note on principal components, factors, and clusters.

Method: PCA · Factor Analysis · k-means
Data: 178 nations · 12 indicators
Course: Data-Driven Decision Making

The thesis, in one axis

Feed twelve development indicators into a machine that asks only “where does the variation live?” and it hands back a single line. Each tick is a nation; colour marks the four worlds the data falls into.

Pick a country and write down twelve facts about it, among them how long its people live, how many years they spend in school, what they earn, whether they have internet, how many infants die, and how old the median citizen is. You now hold a point in twelve-dimensional space. With 178 countries you have a cloud of 178 points in that space: impossible to picture, impossible to put on a slide.

The instinct of a manager, an economist, or a machine-learning model is the same: compress it. Find the few directions that carry most of what is going on and throw away the rest. That instinct has three classic incarnations, and this note walks through all three on one dataset so the differences become concrete rather than abstract.

Principal Component Analysis rotates the cloud to find the directions of greatest spread. Factor Analysis asks instead what unobserved forces could have generated the correlations we see. Clustering ignores directions entirely and asks which countries simply belong together. They answer different questions — and on real data, the way they disagree is where the teaching is.

A note on honesty before we start. The headline result below is almost embarrassingly clean. That cleanliness is itself a lesson: when every variable measures a facet of the same underlying thing, dimension reduction looks like magic. The harder, more interesting questions live in the small print — the 23% of variance the first component throws away, and the cluster boundaries the data does not actually want.

01 The wall of correlations

Everything is tangled with everything

Before reducing anything, look at why reduction is even possible. The matrix below is every indicator correlated with every other. If the variables were independent, it would be mostly pale. It is not pale.

Fig. 1 — Correlation matrix

Two blocks, one diagonal, almost no neutral ground

Indicators are ordered by how they load on the first principal component, which sorts the “more-is-better” outcomes away from the “more-is-worse” ones. Blue cells are positive correlation, red negative, on a scale from −1 to +1.

The redundancy is the point. Life expectancy, schooling, income and connectivity move together so tightly that knowing one lets you guess the others. Symmetrically, infant mortality, maternal mortality and adolescent births form their own mutually-reinforcing block. When a table looks like this, you are not measuring twelve things — you are measuring one or two things twelve times.
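A minimal sketch of why a table like this appears, using synthetic data rather than the 178-country sample. It assumes one hidden factor drives all twelve columns, with eight hypothetical "more-is-better" indicators and four "more-is-worse" ones; the noise level of 0.5 is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in, NOT the article's dataset: 178 rows, 12 columns,
# every column driven by the same hidden factor plus independent noise.
n_countries, n_indicators = 178, 12
latent = rng.normal(size=(n_countries, 1))
signs = np.array([1.0] * 8 + [-1.0] * 4)  # assumed: 8 "good" and 4 "bad" outcomes
X = latent * signs + 0.5 * rng.normal(size=(n_countries, n_indicators))

# Pearson correlation between every pair of columns.
R = np.corrcoef(X, rowvar=False)

# Mean absolute off-diagonal correlation: close to 0 if columns were independent.
off_diag = R[~np.eye(n_indicators, dtype=bool)]
mean_abs_corr = float(np.abs(off_diag).mean())
print(R.shape, round(mean_abs_corr, 2))
```

Columns that share a sign correlate positively and columns with opposite signs correlate negatively, which is the two-block pattern described for Figure 1.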

02 Principal Component Analysis

Rotate the cloud until the variance lines up

PCA performs one geometric trick. It re-aims the coordinate axes so the first new axis points along the direction the data spreads out most, the second points along the most spread that is left over and perpendicular to the first, and so on. No information is lost — twelve indicators still give twelve components — but the variance is now front-loaded. The question is how front-loaded.

Fig. 2 — Scree plot

One component does the work of nine

Bars show the share of total variance each component captures; the line is the running total. The dashed mark is the Kaiser threshold (an eigenvalue of 1, which with twelve standardised indicators is a 1/12 ≈ 8.3% share): components above it explain more than a single original variable would.

76.8% in a single direction. The first component alone reconstructs more than three-quarters of the spread across all twelve indicators. By the second component the curve has already flattened. This is the quantitative version of the heatmap: the data is, to first approximation, one-dimensional.
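The scree-plot quantities can be sketched as follows, assuming PCA is done by eigendecomposition of the correlation matrix. The data is synthetic (one shared factor plus noise), so the shares it prints are not the 76.8% reported above.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in, NOT the article's dataset.
n = 178
latent = rng.normal(size=(n, 1))
X = latent + 0.6 * rng.normal(size=(n, 12))

# Standardise each column; Z.T @ Z / n is then the correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
R = Z.T @ Z / n

# Eigenvalues are the component variances; sort them largest first.
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained = eigvals / eigvals.sum()      # bar heights in a scree plot
cumulative = np.cumsum(explained)        # the running-total line
kaiser_keep = int((eigvals > 1.0).sum())  # components above the Kaiser threshold

print(np.round(explained[:3], 3), kaiser_keep)
```

Because the eigenvalues of a correlation matrix sum to the number of variables, an eigenvalue above 1 means the component carries more variance than any single standardised indicator.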

What is the axis actually made of?

A component is a weighted recipe of the original variables. The weights — called loadings — tell us what the new axis means. For the first component the recipe is unambiguous.

Fig. 3 — Component loadings

PC1 is “human development”; PC2 is something else entirely

Each bar is one indicator’s correlation with the component. Long bars matter; sign tells direction. The first component pulls every wealth-and-health variable to one side. The second is dominated by a single, surprising indicator.

The second axis is violence. PC1 cleanly separates living standards from mortality — call it a development score, and indeed it correlates 0.97 with the published Human Development Index. PC2 captures almost nothing except the homicide rate: a thin slice of variation, orthogonal to wealth, that turns out to carry real meaning.
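One way to read a loading as "an indicator's correlation with the component" is to scale each eigenvector by the square root of its eigenvalue. The sketch below checks that identity on synthetic data; the eight-positive, four-negative sign pattern is an assumption for illustration, not the article's indicator list.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic stand-in, NOT the article's dataset.
n = 178
latent = rng.normal(size=(n, 1))
signs = np.array([1.0] * 8 + [-1.0] * 4)  # assumed "good" vs "bad" indicators
X = latent * signs + 0.6 * rng.normal(size=(n, 12))
Z = (X - X.mean(axis=0)) / X.std(axis=0)

eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Eigenvector signs are arbitrary; orient PC1 so the first indicator loads positively.
if eigvecs[0, 0] < 0:
    eigvecs[:, 0] *= -1

# Loadings: eigenvector weights rescaled into correlations.
loadings = eigvecs * np.sqrt(eigvals)

# Cross-check PC1 by correlating each indicator with the PC1 scores directly.
scores = Z @ eigvecs
direct = np.array([np.corrcoef(Z[:, j], scores[:, 0])[0, 1] for j in range(12)])
print(np.round(loadings[:, 0], 2))
```

The sign flip matters in practice: two software packages can return the same component pointing in opposite directions, so "development" versus "underdevelopment" is a labelling choice.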

Every country on two axes

Project all 178 nations onto the first two components and the abstract becomes a map. Left-to-right is development; bottom-to-top is insecurity. Recolour the same points three ways to see what each lens reveals.

Fig. 4 — The principal-component plane · interactive

A development gradient, with a vertical surprise

Horizontal position is the development score (PC1); vertical is the insecurity score (PC2). Switch the colouring to read the same geometry as a continuous gradient, as world regions, or as the discovered clusters.

The Latin-American anomaly. Colour by development and the cloud is a smooth ramp. Colour by region and a band of Latin-American and Caribbean states floats high on the violence axis despite middling development — Venezuela, El Salvador, Jamaica, the Bahamas. PC2 is small in variance but it isolates exactly the countries a one-number index would mislabel.
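A hedged sketch of the projection behind a plot like Figure 4. The data is synthetic: eleven columns share one hidden factor and the last column is made independent of it, as a stand-in for an indicator such as homicide that does not follow the gradient. Column positions and noise levels are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic stand-in, NOT the article's dataset.
n = 178
latent = rng.normal(size=(n, 1))
X = latent + 0.6 * rng.normal(size=(n, 12))
X[:, 11] = rng.normal(size=n)  # assumed: last column ignores the shared factor

Z = (X - X.mean(axis=0)) / X.std(axis=0)
eigvals, eigvecs = np.linalg.eigh(Z.T @ Z / n)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Coordinates of every row on the first two components (the plotted plane).
scores = Z @ eigvecs[:, :2]

# The two axes are uncorrelated by construction.
pc_corr = float(np.corrcoef(scores, rowvar=False)[0, 1])

# Which original column carries the most weight on PC2?
dominant_on_pc2 = int(np.abs(eigvecs[:, 1]).argmax())
print(scores.shape, dominant_on_pc2)
```

In this toy setup the second axis is claimed by the one column that refuses to track the others, which mirrors how a small-variance component can still isolate a distinct signal.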

03 Factor Analysis

From “directions of variance” to “hidden causes”

PCA is a description of the data’s shape. Factor analysis makes a stronger, more scientific claim: that a handful of latent variables we cannot observe (call them development, demographic maturity, insecurity) are the real drivers, and that each measured indicator is just a noisy reflection of them. Mathematically the two start in the same place, the correlation matrix. The difference is what we do next.

Factor analysis rotates the extracted axes to make them interpretable, deliberately trading the “first axis hogs all the variance” property for axes that each map cleanly onto a few indicators. The giant first component splits apart.

PCA asks where the data spreads. Factor analysis asks what is doing the spreading.

Fig. 5 — Unrotated vs. rotated structure

Rotation turns one fat factor into three legible ones

Left column: the raw first principal component, where everything loads heavily. Right three columns: a varimax-rotated three-factor solution. Cell colour is the loading; darker means the indicator speaks more strongly to that factor.

Three forces, not one. Rotation redistributes the variance into Living standards (income, health, schooling, connectivity), Age structure (median age and old-age dependency — why ageing Japan and youthful Niger sit far apart even at similar safety), and Insecurity (homicide, standing almost alone). The same arithmetic that gave PCA one dominant axis gives factor analysis a story with named parts. This is also why the oil-rich Gulf states look “anomalous” to a single index: high living standards, but a demographically young age structure.
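Varimax itself can be sketched in a few lines. This is a common SVD-based iteration, not necessarily the routine used for Figure 5, and the seven-row loading matrix is a hypothetical example: a clean three-factor structure deliberately muddled by a random orthogonal mix, which varimax is then asked to untangle.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonally rotate a (p x k) loading matrix towards simple structure."""
    p, k = loadings.shape
    rotation = np.eye(k)
    objective = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient of the varimax criterion with respect to the rotation.
        col_ss = np.diag((rotated ** 2).sum(axis=0))
        grad = loadings.T @ (rotated ** 3 - rotated @ col_ss / p)
        u, s, vt = np.linalg.svd(grad)
        rotation = u @ vt  # nearest orthogonal matrix to the gradient
        if s.sum() < objective * (1 + tol):
            break
        objective = s.sum()
    return loadings @ rotation, rotation

def varimax_criterion(loadings):
    """Sum over factors of the variance of squared loadings."""
    return float(np.var(loadings ** 2, axis=0).sum())

# Hypothetical loadings: four, two and one indicator per factor.
simple = np.array([
    [0.90, 0.00, 0.00], [0.80, 0.10, 0.00], [0.85, 0.00, 0.10], [0.80, 0.00, 0.00],
    [0.10, 0.90, 0.00], [0.00, 0.85, 0.00],
    [0.00, 0.00, 0.90],
])
rng = np.random.default_rng(4)
mix, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # random orthogonal matrix
muddled = simple @ mix

rotated, rotation = varimax(muddled)
print(np.round(np.abs(rotated), 2))
```

Rotation changes how variance is shared among the factors but not how much of each indicator is explained in total: the row sums of squared loadings are identical before and after.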

04 Clustering

Stop drawing axes. Start drawing borders.

The first two methods give every country a position on continuous scales. Clustering does something categorical: it partitions the 178 nations into groups that are internally similar and externally distinct. The manager’s version of the question is “how many meaningfully different kinds of country are there, and which is which?”

k-means needs to be told how many groups to find, so the first task is choosing k honestly — and the honest answer is uncomfortable.

Fig. 6 — Choosing the number of clusters

The data prefers two groups; we will overrule it

The line is the silhouette score (higher = cleaner separation) at each candidate k. The bars show within-cluster spread, whose “elbow” is the other classic cue.

A continuum resists being cut. Silhouette peaks at k=2 — rich versus poor — because development is genuinely a smooth gradient, not a set of islands. There is no clean number of clusters because there are no clean gaps. We choose k=4 not because the geometry demands it but because a four-tier reading is more useful than a two-tier one. That trade — fidelity for usefulness — is a decision, not an output.
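To make the two cues concrete, here is a small NumPy-only sketch of k-means and the silhouette score, run on a synthetic gradient with no gaps. In practice a library such as scikit-learn (`KMeans`, `silhouette_score`) would be the usual choice; the hand-rolled versions below use a single random start and exist only to show what is being computed.

```python
import numpy as np

def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd iterations from one random start; returns labels and inertia."""
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Recompute centroids; an empty cluster keeps its previous centroid.
        new = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1), float(d2.min(axis=1).sum())

def silhouette(X, labels):
    """Mean of (b - a) / max(a, b) over points; singletons score 0."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    clusters = np.unique(labels)
    s = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        if own.sum() == 1:
            continue
        a = D[i, own].sum() / (own.sum() - 1)  # mean distance within own cluster
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return float(s.mean())

rng = np.random.default_rng(5)
# Synthetic continuum, NOT the article's dataset: a smooth gradient plus noise.
position = rng.uniform(-2, 2, size=178)
X = np.outer(position, [1.0, 0.8, -0.9]) + 0.3 * rng.normal(size=(178, 3))

results = {}
for k in range(2, 7):
    labels, inertia = kmeans(X, k, rng)
    results[k] = (silhouette(X, labels), inertia)
    print(k, round(results[k][0], 3), round(inertia, 1))
```

Inertia keeps falling as k grows, so it can never choose k on its own; the silhouette is the cue that can penalise cuts made through the middle of a continuum.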

The four worlds, placed on the same map

Here are the k-means clusters drawn back onto the principal-component plane from Figure 4. Because development is the dominant axis, the clusters line up along it like a thermometer — but watch the vertical scatter.

Fig. 7 — Clusters on the PC plane

Four bands along development — and one that floats upward

Same coordinates as Figure 4; points coloured by cluster, with shaded hulls.

Clusters echo the components. The partition is mostly a slicing of the development axis into four tiers. But the lower-middle tier rides noticeably higher on the insecurity axis than its neighbours — the homicide signal from PC2, resurfacing as a property of a group rather than a direction. Two methods, asked different questions, point at the same fact.

What defines each world

A cluster is only useful if you can say what it is. The profile below shows how each group sits, indicator by indicator, against the global average.

Fig. 8 — Cluster profiles

Each tier’s signature, in standard deviations from average

Rows are clusters, columns indicators. Blue means above the global mean, red below; intensity is the size of the gap (in z-scores). Read across a row to read a way of life.

Legible at a glance. The advanced tier is uniformly blue on the good indicators and deep red on mortality and youth-dependency; the least-developed tier is the photographic negative. The interesting cells are the off-pattern ones — the homicide column refusing to follow the development gradient — which is exactly the variance PCA parked on its second axis.
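A profile table of this kind is just the mean z-score of each indicator within each cluster. The sketch below uses synthetic data and, as a loudly stated assumption, takes the four tiers to be quartiles of the hidden factor instead of running k-means.

```python
import numpy as np

rng = np.random.default_rng(6)

# Synthetic stand-in, NOT the article's dataset.
latent = rng.normal(size=178)
signs = np.array([1.0] * 8 + [-1.0] * 4)  # assumed "good" vs "bad" indicators
X = np.outer(latent, signs) + 0.5 * rng.normal(size=(178, 12))

# Hypothetical tier labels 0..3 (lowest to highest quartile of the hidden factor).
labels = np.digitize(latent, np.quantile(latent, [0.25, 0.5, 0.75]))

# z-scores against the global mean and spread of each indicator.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# One row per tier, one column per indicator: the heatmap's cell values.
profile = np.vstack([Z[labels == c].mean(axis=0) for c in range(4)])
sizes = np.bincount(labels, minlength=4)
print(np.round(profile[:, [0, 11]], 2))
```

Because the z-scores average to zero overall, the size-weighted rows of the profile must cancel out: one tier can only be blue on an indicator if another is red.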

The same structure, built from the bottom up

k-means imposes a flat partition. Hierarchical clustering instead starts with every nation on its own and repeatedly merges the two most similar groups, recording the whole genealogy as a tree. Cutting the tree at the right height recovers the four tiers, and the branch heights show how reluctantly some groups join.

Fig. 9 — Dendrogram (Ward linkage)

A family tree of 178 nations

Each leaf on the right is a country; branches merge leftward as groups combine, the horizontal distance marking how dissimilar the merged groups were. Leaf colour is the four-cluster cut.

The first split is the widest. The final, left-most merge, where the developed world joins the rest, spans the largest distance; read from the root, it is the tree’s first split and its visual confirmation of PCA’s 77% first axis. Nearer the leaves, the colour bands stay almost perfectly contiguous, meaning the flat k-means tiers and the nested hierarchy broadly agree on who belongs with whom.
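A naive sketch of Ward agglomeration shows why the widest gap belongs to the last merge. It is written for readability, not speed, and runs on thirty synthetic points in two loose groups; a real analysis would normally use a library routine. The merge cost used here is the increase in within-cluster sum of squares.

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic stand-in, NOT the article's dataset: two loose groups of 15 points.
X = np.vstack([rng.normal(0, 1, size=(15, 3)),
               rng.normal(4, 1, size=(15, 3))])

clusters = [[i] for i in range(len(X))]  # start with every point alone
heights = []                              # cost of each successive merge

while len(clusters) > 1:
    best = None
    for a in range(len(clusters)):
        for b in range(a + 1, len(clusters)):
            ca = X[clusters[a]].mean(axis=0)
            cb = X[clusters[b]].mean(axis=0)
            na, nb = len(clusters[a]), len(clusters[b])
            # Ward criterion: growth in within-cluster sum of squares.
            cost = na * nb / (na + nb) * float(((ca - cb) ** 2).sum())
            if best is None or cost < best[0]:
                best = (cost, a, b)
    cost, a, b = best
    heights.append(cost)
    clusters[a] = clusters[a] + clusters[b]  # merge b into a (b > a)
    del clusters[b]

print(np.round(heights[-3:], 1))
```

Merge costs never decrease as the tree is built, so the final join of the two big groups is the tallest branch, and cutting just below it gives the two-group split.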

05 Choosing your lens

Three questions, not three answers

It is tempting to treat these as competing algorithms and ask which “wins.” They do not compete; they answer different questions, and a good analyst reaches for the one whose question matches the decision at hand.

PCA

How few numbers can I get away with?

Best when you want a compact, faithful summary — a development index, a risk score, inputs to another model. It is descriptive and assumption-light, but its axes chase variance, not meaning, and can blur distinct ideas together.

Use when the goal is compression or visualisation.

Factor Analysis

What unseen forces produced this?

Best when you believe in latent constructs and want interpretable, named dimensions. Rotation buys legibility at the cost of the “one dominant axis” simplicity, and it leans on stronger modelling assumptions.

Use when the goal is explanation or theory-building.

Clustering

Which things belong together?

Best when a decision needs discrete segments — tiers, personas, policy groups. Powerful for action, but it will happily invent borders inside a continuum, so the number of groups is your responsibility, not the algorithm’s.

Use when the goal is segmentation or assignment.

On this dataset the three converge on one story told three ways. PCA found that development is overwhelmingly one-dimensional and quietly flagged violence as a second, smaller theme. Factor analysis unpacked that first dimension into living standards and age structure, and confirmed insecurity as its own latent force. Clustering carved the development continuum into four working tiers and, without being asked, reproduced the violence signal as a property of the lower-middle group.

The convergence is reassuring; the disagreements are the education. PCA’s discarded 23%, factor analysis’s rotated axes, and clustering’s arbitrary fourth boundary are not flaws to be hidden — they are the places where a thoughtful analyst earns their keep, deciding how much structure the data truly supports and how much is being imposed for the sake of a cleaner slide.

The one-slide takeaway. Dimension reduction is a negotiation between fidelity and usefulness. When indicators are highly correlated — as nearly all development data is — the math will gladly hand you a single clean number. Whether that number is wisdom or oversimplification depends entirely on the 23% you agreed to throw away.

Data & Method

Source. Country-level indicators compiled from the World Bank, UNDP Human Development Reports, and related cross-sectional collections (a single recent snapshot, not a time series). Forty-one World Bank regional and income aggregates (“Sub-Saharan Africa”, “High income”, etc.) were removed so that only sovereign units enter the analysis.

Variables. Twelve indicators spanning health, education, income, demography, connectivity and safety. Three skewed monetary or rate variables — GNI per capita, the maternal-mortality ratio and the homicide rate — were log-transformed before analysis. The Human Development Index itself was deliberately excluded from the inputs (it is built from several of them) and held back only to validate the first component.

Complete cases. Of 217 sovereign units, 178 had non-missing values on all twelve indicators and form the analysis sample. Countries dropped for missingness are disproportionately very small states and a few conflict-affected nations; conclusions about those groups should be read with that caveat.

Estimation. All indicators were standardised to zero mean and unit variance, so no variable dominates by virtue of its units. PCA on the correlation matrix; factor analysis via a three-factor varimax-rotated solution (sampling adequacy KMO = 0.89, Bartlett’s test of sphericity p < 0.001, both indicating the data is well-suited to factoring). Clusters from k-means on the standardised indicators (k chosen at 4 for interpretability over the silhouette-optimal k = 2); the dendrogram uses Ward linkage on Euclidean distance.
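The preparation steps can be sketched as below, under several assumptions: the data is synthetic, the missing-value pattern is invented so that 217 rows shrink to 178, the three skewed columns sit at hypothetical positions, the natural log is used, and all logged values are strictly positive.

```python
import numpy as np

rng = np.random.default_rng(8)

# Synthetic stand-in, NOT the article's dataset: 217 units, 12 positive indicators.
raw = rng.lognormal(mean=0.0, sigma=1.0, size=(217, 12))

# Invented missingness: one missing cell in each of 39 distinct rows.
missing_rows = rng.choice(217, size=39, replace=False)
raw[missing_rows, rng.integers(0, 12, size=39)] = np.nan

# Complete cases only: keep rows with no missing value on any indicator.
complete = raw[~np.isnan(raw).any(axis=1)]

# Hypothetical column positions for GNI per capita, maternal mortality, homicide.
skewed_cols = [2, 7, 11]
transformed = complete.copy()
transformed[:, skewed_cols] = np.log(transformed[:, skewed_cols])  # assumes values > 0

# Standardise to zero mean and unit variance so no unit of measurement dominates.
Z = (transformed - transformed.mean(axis=0)) / transformed.std(axis=0)
print(complete.shape, np.round(Z.std(axis=0)[:3], 2))
```

A rate that can be exactly zero would need a different treatment before logging; the source does not say how such cases were handled.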

Honest limits. This is one cross-section: it shows how countries differ today, not how any country changes over time. The strong one-dimensionality is partly real and partly an artefact of indicators that are definitionally entangled. And cluster boundaries, as Figure 6 stresses, are imposed on a continuum — convenient fictions, not natural kinds.