Topic 2 · Quantifying Metrics · Interactive Exercise
Southwest enters a route and fares fall. But by how much, and is it really Southwest, or just that Southwest flies the cheap routes? Use regression to hold the other measured factors constant and isolate one airline's footprint on price.
01 — The tempting answer
Before any modeling, do what any manager would do first: split the routes into those Southwest flies and those it doesn't, and compare the average fare.
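The naive comparison is just a difference of two group means. Here is a minimal Python sketch. It uses five made-up routes rather than the exercise's 598, and the `routes` records and the `naive_gap` helper are hypothetical:

```python
from statistics import mean

# Hypothetical route records (illustrative numbers only, not the exercise's data):
# each route has an average fare in dollars and a flag for whether Southwest flies it.
routes = [
    {"fare": 320.0, "southwest": False},
    {"fare": 280.0, "southwest": False},
    {"fare": 360.0, "southwest": False},
    {"fare": 150.0, "southwest": True},
    {"fare": 190.0, "southwest": True},
]

def naive_gap(routes):
    """Average Southwest fare minus average non-Southwest fare, with no controls."""
    sw = [r["fare"] for r in routes if r["southwest"]]
    other = [r["fare"] for r in routes if not r["southwest"]]
    return mean(sw) - mean(other)

print(naive_gap(routes))
```

In this toy data, Southwest routes average $170 against $320 elsewhere, a naive gap of -$150. The calculation knows nothing about how long or how contested each route is, and that is the weakness discussed next.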
So are we done? Does Southwest knock roughly $142 off the fare? Not so fast. Southwest doesn't pick routes at random: it has historically favored shorter, denser, more competitive markets, and short routes are cheaper anyway. So part of that gap isn't the "Southwest effect" at all. It reflects the kind of route Southwest chooses to enter.
This contamination has a name: confounding. Distance and competition both affect fare and correlate with where Southwest flies. To recover the real Southwest effect, we have to compare like routes with like routes.
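One low-tech way to compare like with like is stratification. You split routes into bands, here a hypothetical short/long distance split, and compare Southwest and non-Southwest fares only within each band. This is a simpler stand-in for regression, not the method the exercise uses. The fares are invented so that Southwest mostly flies short routes.

```python
from statistics import mean

# Hypothetical routes: (distance band, Southwest flies it?, average fare in dollars).
# Invented numbers in which Southwest mostly flies short routes.
routes = [
    ("short", True, 160.0), ("short", True, 170.0), ("short", True, 180.0),
    ("short", False, 210.0), ("short", False, 230.0),
    ("long", True, 350.0),
    ("long", False, 400.0), ("long", False, 420.0), ("long", False, 380.0),
]

def gap(rows):
    """Mean Southwest fare minus mean non-Southwest fare within the given rows."""
    sw = [fare for _, is_sw, fare in rows if is_sw]
    other = [fare for _, is_sw, fare in rows if not is_sw]
    return mean(sw) - mean(other)

naive = gap(routes)  # pools all routes, so it mixes route type with Southwest
by_band = {band: gap([r for r in routes if r[0] == band]) for band in ("short", "long")}
print(naive, by_band)
```

The pooled gap is -$113, but within each band Southwest fares are $50 lower. Stratifying breaks down once there are several continuous controls, because splitting on distance and competitor count together leaves few routes per cell. Regression handles that case.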
02 — Build the model yourself
A regression doesn't do magic. It picks four numbers, one intercept and three slopes (for distance, number of competitors, and Southwest presence), that bring the model's predicted fares as close as possible to the actual fares across all 598 routes. "Closest" means the smallest sum of squared errors (SSE).
Drag the four sliders to hand-tune your own model. Each route's error is (actual − predicted); we square it (so over- and under-shooting both count, and big misses hurt more) and add them all up. Try to drive the SSE as low as you can — then hit Solve to see the mathematically optimal answer.
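The SSE that the sliders report takes only a few lines to compute. This sketch assumes the intercept-plus-three-slopes form described above, with distance measured in thousands of miles. The five routes and both coefficient settings are hypothetical.

```python
# Each route: (distance in thousands of miles, competitors, Southwest flag, actual fare).
# Hypothetical values for illustration only.
routes = [
    (2.0, 1, 0, 430.0),
    (1.5, 2, 0, 360.0),
    (1.8, 1, 0, 410.0),
    (0.5, 3, 1, 190.0),
    (0.7, 2, 1, 230.0),
]

def predict(b, distance, competitors, southwest):
    """Model prediction: intercept plus one slope per predictor."""
    b0, b1, b2, b3 = b
    return b0 + b1 * distance + b2 * competitors + b3 * southwest

def sse(b, routes):
    """Sum over routes of (actual - predicted)**2; squaring penalizes over- and under-shoots alike."""
    return sum((fare - predict(b, d, c, sw)) ** 2 for d, c, sw, fare in routes)

guess = (200.0, 100.0, -20.0, -50.0)   # a hand-tuned slider setting
better = (250.0, 100.0, -20.0, -50.0)  # same slopes, intercept raised by $50
print(sse(guess, routes), sse(better, routes))
```

The hand-tuned `guess` misses every route by $50, so its SSE is 5 × 50² = 12,500. Raising the intercept by $50 drives the SSE to zero on this noise-free toy data. Real fares never fit perfectly, so the best achievable SSE on the 598 routes stays above zero.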
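The "Solve" button solves the normal equations, which give the least-squares coefficients directly. lm() in R and the LINEST function in Excel return the same coefficients (lm() computes them with a QR decomposition, a numerically steadier route to the same answer), so no combination of slider settings can beat Solve on SSE. That single best-fitting set of four coefficients is the regression.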
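The Solve step can be sketched from scratch: build XᵀX and Xᵀy, then solve the resulting 4 × 4 linear system. The `solve_normal_equations` helper is hypothetical and uses Gauss-Jordan elimination, not the page's actual code. It works for a few well-scaled columns but is less numerically stable than the QR decomposition used by R's lm(). The routes are invented and noise-free. They were generated from fare = 250 + 100·distance - 20·competitors - 50·Southwest, with Southwest on the shorter routes, so the correct answer is known in advance.

```python
from statistics import mean

def solve_normal_equations(X, y):
    """Return b minimizing sum((y - X b)**2) by solving (X'X) b = X'y.

    Gauss-Jordan elimination with partial pivoting; adequate for a handful of
    well-scaled columns, less numerically stable than a QR decomposition.
    """
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    a = [xtx[i] + [xty[i]] for i in range(k)]  # augmented matrix [X'X | X'y]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]      # move the largest pivot into place
        p = a[col][col]
        a[col] = [v / p for v in a[col]]         # scale the pivot row so the pivot is 1
        for r in range(k):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]  # clear this column
    return [a[i][k] for i in range(k)]

# Hypothetical noise-free routes: (distance in thousands of miles, competitors, Southwest flag).
routes = [(2.0, 1, 0), (1.5, 2, 0), (1.8, 1, 0), (0.5, 3, 1), (0.7, 2, 1)]
fares = [430.0, 360.0, 410.0, 190.0, 230.0]

X = [[1.0, d, c, sw] for d, c, sw in routes]  # leading 1.0 is the intercept column
b = solve_normal_equations(X, fares)

naive_gap = (mean(f for f, (_, _, sw) in zip(fares, routes) if sw)
             - mean(f for f, (_, _, sw) in zip(fares, routes) if not sw))
print([round(v, 6) for v in b], naive_gap)
```

The solver recovers the generating coefficients up to floating-point rounding, including the -$50 Southwest effect. The naive group comparison on the same data reports -$190, so the extra -$140 is confounding from distance and competition.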
03 — Read the estimate
Here is the fitted model. Toggle between the dollar model and the log-log (elasticity) model and watch how the interpretation changes.
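The page does not print the two specifications, so the block below shows one plausible way to write them. The variable names are hypothetical. Logging only fare and distance, while leaving the competitor count and the Southwest indicator in levels, is an assumption. In the dollar model, β₃ is a shift in dollars. In the log-log model, γ₁ is an elasticity (a 1% longer route is associated with roughly γ₁% higher fares), and γ₃ is a proportional shift in fares.

```latex
\begin{aligned}
\text{Fare}_i &= \beta_0 + \beta_1\,\text{Distance}_i + \beta_2\,\text{Competitors}_i + \beta_3\,\text{SW}_i + \varepsilon_i \\
\ln \text{Fare}_i &= \gamma_0 + \gamma_1 \ln \text{Distance}_i + \gamma_2\,\text{Competitors}_i + \gamma_3\,\text{SW}_i + u_i \\
\gamma_1 &= \frac{\partial \ln \text{Fare}}{\partial \ln \text{Distance}} \quad \text{(distance elasticity of fare)}
\end{aligned}
```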
| Term | Estimate | What it says |
|---|---|---|
The naive gap was about $142. After holding distance and competition constant, the Southwest effect is about $49. The missing ~$93 was confounding: Southwest's short, competitive routes were already cheaper.
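The ~$93 split follows from an exact identity for least squares with an intercept. The regression includes both an intercept and the Southwest indicator, so its residuals average to zero within each group. Differencing the group averages of the fitted dollar model therefore gives the identity below. Here β̂₁, β̂₂, and β̂₃ are the fitted distance, competitor, and Southwest slopes, and bars denote group averages (notation introduced for illustration). The page reports only the totals. With the naive gap near -$142 and β̂₃ near -$49, the two confounding terms together come to about -$93. Their individual sizes are not shown.

```latex
\underbrace{\overline{\text{Fare}}_{SW} - \overline{\text{Fare}}_{\text{other}}}_{\text{naive gap}}
= \hat\beta_3
+ \underbrace{\hat\beta_1\left(\overline{\text{Distance}}_{SW} - \overline{\text{Distance}}_{\text{other}}\right)
+ \hat\beta_2\left(\overline{\text{Competitors}}_{SW} - \overline{\text{Competitors}}_{\text{other}}\right)}_{\text{confounding}}
```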
This is observational data — Southwest chose its routes. Regression controls for what we measured (distance, competition), but unmeasured factors could remain. It's a strong argument, not a clean experiment.
04 — See it on one slice
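Longer flights cost more, which produces the upward trend. Southwest routes (amber) cluster among the shorter, cheaper flights. The dollar model and the log-log model fit that pattern differently. The dollar model draws straight lines, with the same Southwest gap in dollars at every distance. The log-log model draws curves and treats the Southwest gap as a constant percentage of the fare.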
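The sketch below compares a dollar model and a log-log model at several distances, holding competitors fixed by folding them into the intercepts. All coefficients are hypothetical. The values -49 and ln(0.76) echo the magnitudes quoted on this page, but the intercepts and distance slopes are made up, and distance is in miles here.

```python
import math

# Hypothetical coefficients for illustration only (not the exercise's estimates).
# Competitors are held fixed and folded into the intercepts.
B0, B1, B3 = 150.0, 0.08, -49.0         # dollar model: fare = B0 + B1*miles + B3*SW
G0, G1, G3 = 2.5, 0.4, math.log(0.76)   # log-log: ln(fare) = G0 + G1*ln(miles) + G3*SW

def dollar_fare(miles, sw):
    """Dollar-model prediction: straight line in miles, constant Southwest shift."""
    return B0 + B1 * miles + B3 * sw

def loglog_fare(miles, sw):
    """Log-log prediction: power curve in miles, proportional Southwest shift."""
    return math.exp(G0 + G1 * math.log(miles) + G3 * sw)

for miles in (250, 500, 1000, 2000):
    dollar_gap = dollar_fare(miles, 1) - dollar_fare(miles, 0)
    loglog_gap = loglog_fare(miles, 1) - loglog_fare(miles, 0)
    print(f"{miles:>5} mi  dollar gap {dollar_gap:7.2f}  log-log gap {loglog_gap:7.2f}")
```

The dollar model's Southwest gap is -$49 at every distance. The log-log gap is 24% of the fare, so it grows in dollars on longer routes. Doubling distance multiplies the log-log fare by 2^0.4 ≈ 1.32 wherever you start, which is what an elasticity of 0.4 means.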
05 — Check your understanding
06 — The decision memo
Holding distance and competition constant, a Southwest presence is associated with fares roughly $49 lower, about a 24% reduction in the log model. Each additional competing airline is associated with fares about $41 lower, holding distance and Southwest presence constant.
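Converting a log-model coefficient into a percentage takes one line. For a 0/1 indicator with coefficient b in a ln(fare) model, the exact change is 100·(e^b - 1)%. The shortcut 100·b is accurate only for small b. The page does not say which conversion produced its 24%, so this sketch shows both directions (helper names are hypothetical).

```python
import math

def log_coef_to_pct(b):
    """Exact % change in fare implied by a 0/1 indicator with coefficient b in a ln(fare) model."""
    return 100.0 * (math.exp(b) - 1.0)

def pct_to_log_coef(pct):
    """Inverse: the ln(fare) coefficient that corresponds to a given % change."""
    return math.log(1.0 + pct / 100.0)

b = pct_to_log_coef(-24.0)
print(round(b, 3))                       # coefficient behind an exact 24% cut
print(round(100 * b, 1))                 # the quick "100 x b" reading overstates the cut
print(round(log_coef_to_pct(-0.24), 1))  # if -0.24 were the coefficient itself
```

A 24% cut corresponds to b ≈ -0.274, which the 100·b shortcut would read as a 27.4% cut. Conversely, if -0.24 were the coefficient itself, the exact reduction would be about 21.3%.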
For an incumbent on a Southwest-threatened route, the pricing pressure is real and large. For a regulator, it's evidence that low-cost entry — not just "more airlines" — drives consumer prices down.
Route entry isn't random. The estimate is an association after measured controls; a cleaner answer would track fares on the same routes before and after Southwest entered (a difference-in-differences design — Topic 2, later).
Collect a panel of routes over time with entry dates, add route and year fixed effects, and check whether fares fell after entry relative to comparable routes Southwest never touched.
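The core arithmetic of difference-in-differences shows up most clearly in the simplest case: two groups of routes, one entry date, and two periods. The fares below are invented. In a balanced panel like this one, regressing fare on a treated-after indicator plus route and year fixed effects gives the same number. When Southwest enters different routes in different years, the fixed-effects regression no longer reduces to this simple comparison and needs more care.

```python
from statistics import mean

# Hypothetical panel: average fare on each route before and after a single Southwest entry date.
# "treated" routes gained Southwest; "control" routes never did. Numbers are invented.
fares = {
    ("treated", "before"): [300.0, 280.0],
    ("treated", "after"): [230.0, 210.0],
    ("control", "before"): [310.0, 290.0],
    ("control", "after"): [300.0, 280.0],
}

def diff_in_diff(fares):
    """(treated after - treated before) - (control after - control before), using group means."""
    treated_change = mean(fares[("treated", "after")]) - mean(fares[("treated", "before")])
    control_change = mean(fares[("control", "after")]) - mean(fares[("control", "before")])
    return treated_change - control_change

print(diff_in_diff(fares))
```

Fares on treated routes fell $70, while comparable untouched routes fell $10 over the same period, so the estimated entry effect is -$60. The control group's change stands in for what would have happened to the treated routes without Southwest. That is the key assumption (parallel trends).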