Vishal Singh

Topic 2 · Quantifying Metrics · Interactive Exercise

The Southwest Effect

Southwest enters a route and fares fall. But by how much — and is it really Southwest, or just that Southwest flies the cheap routes? Use regression to hold everything else constant and isolate one airline's footprint on price.

01 — The tempting answer

Start with the simple difference

Before any modeling, do what any manager would do first: split the routes into those Southwest flies and those it doesn't, and compare the average fare.

Average fare, by Southwest presence
[Diagnostic BI chart: average fare with no Southwest vs. with Southwest present, plus the raw gap; the dollar values are filled in by the interactive page.]

So we are done — Southwest knocks roughly $142 off the fare? Not so fast. Southwest doesn't fly routes at random. It historically favored shorter, denser, more competitive markets. Short routes are cheaper anyway. So part of that gap isn't the "Southwest effect" at all — it's the kind of route Southwest chooses to enter.

"Sales fell after the campaign" is not "the campaign caused sales to fall." A difference in averages mixes the effect we want with the differences we forgot to account for.

This contamination has a name: confounding. Distance and competition both affect fare and correlate with where Southwest flies. To recover the real Southwest effect, we have to compare like routes with like routes.
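To make "compare like routes with like routes" concrete, here is a minimal Python sketch on eight made-up routes. The fares and the short/long distance bands are invented for illustration and are not taken from the exercise's dataset. It computes the raw gap first, then the gap inside each distance band.

```python
from statistics import mean

# Hypothetical toy routes, invented for illustration:
# (southwest_present, distance_band, fare_usd)
ROUTES = [
    (1, "short", 100), (1, "short", 110), (1, "short", 90), (1, "long", 250),
    (0, "short", 150), (0, "long", 300), (0, "long", 310), (0, "long", 290),
]

def gap(routes):
    """Mean fare on Southwest routes minus mean fare on all other routes."""
    southwest = [fare for flag, _, fare in routes if flag == 1]
    others = [fare for flag, _, fare in routes if flag == 0]
    return mean(southwest) - mean(others)

# Raw comparison: every route pooled together.
naive_gap = gap(ROUTES)

# Like-with-like comparison: the same gap, computed within each distance band.
band_gaps = {
    band: gap([route for route in ROUTES if route[1] == band])
    for band in ("short", "long")
}

print("naive gap:", naive_gap)
print("gap within each band:", band_gaps)
```

On this toy data the raw gap is −$125, while the gap inside each band is −$50. The remaining $75 comes purely from Southwest flying mostly short routes, which is the confounding described above.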

02 — Build the model yourself

What "best fit" actually means

A regression doesn't do magic. It picks the four numbers — one intercept and three slopes — that make the model's predicted fares come closest to the real fares across all 598 routes. "Closest" means the smallest sum of squared errors (SSE).

predicted_fare = b₀ + b₁·Southwest + b₂·Distance + b₃·OtherAirlines

Drag the four sliders to hand-tune your own model. Each route's error is (actual − predicted); we square each error (so over- and under-shooting both count, and big misses hurt more) and add the squares up. Try to drive the SSE as low as you can, then hit Solve to see the mathematically optimal answer.
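The scoring rule can be sketched in a few lines of Python. The six routes and both coefficient guesses below are hypothetical, chosen only to show how the sum of squared errors responds as the four numbers change.

```python
# Hypothetical toy routes, invented for illustration:
# (southwest, distance_miles, other_airlines, actual_fare_usd)
ROUTES = [
    (1, 400, 2, 125.0),
    (0, 400, 1, 205.0),
    (1, 900, 3, 145.0),
    (0, 900, 0, 310.0),
    (0, 1500, 1, 350.0),
    (1, 1500, 0, 340.0),
]

def predict(b, route):
    """predicted_fare = b0 + b1*Southwest + b2*Distance + b3*OtherAirlines."""
    b0, b1, b2, b3 = b
    southwest, distance, other_airlines, _fare = route
    return b0 + b1 * southwest + b2 * distance + b3 * other_airlines

def sse(b, routes):
    """Sum over routes of (actual - predicted) squared."""
    return sum((route[3] - predict(b, route)) ** 2 for route in routes)

starting_guess = (120.0, 0.0, 0.1, 0.0)       # ignores Southwest and competition
hand_tuned = (200.0, -50.0, 0.125, -40.0)     # a guess after some slider dragging

sse_start = sse(starting_guess, ROUTES)
sse_tuned = sse(hand_tuned, ROUTES)
print("SSE at starting guess:", sse_start)
print("SSE after hand-tuning:", sse_tuned)
```

Squaring means the single $100 miss in the starting guess contributes 10,000 to the total, more than all six errors of the hand-tuned guess combined (75).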

Interactive · least squares
[Widget: four sliders for b₀, b₁, b₂ and b₃ (starting values 120, 0, 0.100, 0) with a live R² and sum of squared errors (lower is better), shown against the best fit.]

The "Solve" button solves the normal equations, which yields the same least-squares coefficients that lm() in R or the LINEST function in Excel would report; no slider setting can beat it on SSE. That best-fit set of four coefficients is the regression.
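For readers who want to see what a normal-equations solve involves, here is a minimal pure-Python sketch on six hypothetical routes. It is an illustration under stated assumptions, not the page's actual code: the data, the function names and the hand-tuned guess are all invented. Production tools usually take a more numerically stable route (R's lm(), for example, uses a QR decomposition) but arrive at the same coefficients.

```python
# Hypothetical toy routes, invented for illustration:
# (southwest, distance_miles, other_airlines, actual_fare_usd)
ROUTES = [
    (1, 400, 2, 125.0),
    (0, 400, 1, 205.0),
    (1, 900, 3, 145.0),
    (0, 900, 0, 310.0),
    (0, 1500, 1, 350.0),
    (1, 1500, 0, 340.0),
]

def design_row(route):
    """[1, SW, Distance, OtherAirlines]; the leading 1 carries the intercept."""
    southwest, distance, other_airlines, _fare = route
    return [1.0, float(southwest), float(distance), float(other_airlines)]

def solve_linear_system(a, rhs):
    """Solve a.x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(a, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; no unique solution")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        known = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - known) / aug[i][i]
    return x

def fit_least_squares(routes):
    """Return [b0, b1, b2, b3] solving the normal equations (X'X) b = X'y."""
    rows = [design_row(route) for route in routes]
    fares = [route[3] for route in routes]
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * fare for row, fare in zip(rows, fares)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def sse(b, routes):
    """Sum of squared (actual - predicted) over all routes."""
    return sum(
        (route[3] - sum(bi * xi for bi, xi in zip(b, design_row(route)))) ** 2
        for route in routes
    )

best = fit_least_squares(ROUTES)
hand_tuned = [200.0, -50.0, 0.125, -40.0]
print("best-fit coefficients:", [round(value, 3) for value in best])
print("SSE best fit:", round(sse(best, ROUTES), 2))
print("SSE hand-tuned:", round(sse(hand_tuned, ROUTES), 2))
```

Nudging any one of the fitted coefficients, in either direction, raises the SSE; that is what "optimal" means here.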

03 — Read the estimate

From four numbers to a business answer

Here is the fitted model. Toggle between the dollar model and the log-log (elasticity) model and watch how the interpretation changes.

Fare = b₀ + b₁·SW + b₂·Distance + b₃·OtherAirlines

[Coefficient table: Term · Estimate · What it says. The estimates and R² are filled in by the interactive page.]
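For reference when reading the log-log version, one common way to write such a model is shown below. This is an assumed specification: the symbols c₀ to c₃ are introduced here for illustration, and the form supposes that distance enters in logs while the Southwest dummy and the airline count stay in levels.

```latex
\ln(\text{Fare}) = c_0 + c_1\,\text{SW} + c_2\,\ln(\text{Distance}) + c_3\,\text{OtherAirlines}

\%\Delta\,\text{Fare associated with Southwest} = 100\,\bigl(e^{c_1} - 1\bigr)
```

Under this form, c₂ is the distance elasticity: a 1% longer route goes with roughly a c₂% higher fare. The dummy coefficient c₁ converts to a percentage through the exponential; for small values the raw coefficient is a close approximation.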

Why the number shrank

The naive gap was about $142. After holding distance and competition constant, the Southwest effect is about $49. The missing ~$93 was confounding: Southwest's short, competitive routes were already cheaper.
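The shrinkage follows an exact accounting identity for least squares, sketched below. The notation is introduced here for illustration: F̄, D̄ and Ō are the average fare, distance and number of other airlines within each group of routes, and b₁, b₂, b₃ are the fitted coefficients from the full model.

```latex
\begin{aligned}
\underbrace{\bar{F}_{SW=1} - \bar{F}_{SW=0}}_{\text{naive gap}}
  &= \underbrace{b_1}_{\text{Southwest effect}}
   + \underbrace{b_2\,\bigl(\bar{D}_{SW=1} - \bar{D}_{SW=0}\bigr)
   + b_3\,\bigl(\bar{O}_{SW=1} - \bar{O}_{SW=0}\bigr)}_{\text{confounding}} \\[4pt]
-142 &\approx -49 + (-93)
\end{aligned}
```

The confounding term is large whenever Southwest routes differ from the rest on a variable that itself moves fares: shorter distances with a positive b₂, or more competitors with a negative b₃.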

Association, not yet proof

This is observational data — Southwest chose its routes. Regression controls for what we measured (distance, competition), but unmeasured factors could remain. It's a strong argument, not a clean experiment.

04 — See it on one slice

Distance is the biggest confounder

Longer flights cost more: that's the upward trend. Southwest routes (amber) cluster among shorter, cheaper flights. The dollar model and the log-log model fit that pattern differently.

Fare vs. distance · 598 routes (this chart follows the toggle in §03)
Legend: No Southwest · Southwest present · fitted fare (avg. 1 competitor)

05 — Check your understanding

Make the manager's call

06 — The decision memo

What a manager does with this

Finding

Holding distance and competition constant, a Southwest presence is associated with roughly $49 lower fares — about a 24% reduction in the log model. Adding one more competing airline is worth about $41 off on its own.

Implication

For an incumbent on a Southwest-threatened route, the pricing pressure is real and large. For a regulator, it's evidence that low-cost entry — not just "more airlines" — drives consumer prices down.

Limitation

Route entry isn't random. The estimate is an association after measured controls; a cleaner answer would track fares on the same routes before and after Southwest entered (a difference-in-differences design, covered later in Topic 2).

Better next test

Collect a panel of routes over time with entry dates, add route and year fixed effects, and check whether fares fell after entry relative to comparable routes Southwest never touched.
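The core of that comparison can be sketched in Python as a two-period difference-in-differences. The four routes and their fares below are hypothetical, invented for illustration. With a balanced two-period panel like this one, the simple calculation matches what a regression with route and period fixed effects would estimate.

```python
from statistics import mean

# Hypothetical panel, invented for illustration:
# (route, southwest_entered, period, fare_usd)
PANEL = [
    ("A", True, "before", 300.0), ("A", True, "after", 240.0),
    ("B", True, "before", 260.0), ("B", True, "after", 210.0),
    ("C", False, "before", 310.0), ("C", False, "after", 300.0),
    ("D", False, "before", 250.0), ("D", False, "after", 245.0),
]

def mean_fare(panel, entered, period):
    """Average fare for one group (entered or never entered) in one period."""
    return mean(
        fare for _route, flag, when, fare in panel
        if flag == entered and when == period
    )

def diff_in_diff(panel):
    """Fare change on entered routes minus fare change on comparison routes."""
    entered_change = mean_fare(panel, True, "after") - mean_fare(panel, True, "before")
    comparison_change = mean_fare(panel, False, "after") - mean_fare(panel, False, "before")
    return entered_change - comparison_change

did_estimate = diff_in_diff(PANEL)
print("difference-in-differences estimate:", did_estimate)
```

Here fares fell $55 on the entered routes and $7.50 on the comparison routes, so the estimate is −$47.50. Subtracting the comparison group's change strips out whatever moved all fares over the period, which a before-and-after look at the entered routes alone cannot do.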

A dashboard tells you where to look. Regression tells you how much one lever is worth once you stop comparing apples to airports.