I had noticed the ‘collider’ sampling bias before but never thought about how common it must be:

Sampling error? Omitted variable bias? Bah, that’s for first-year grad students. What I find really interesting is that there are some fairly basic principles for how analysis can get really screwy but which can’t be fixed by adding more control variables, increasing your sample size, or fiddling with assumptions about the distribution of the dependent variable. I’m thinking about really scary sources of model specification problems. Or actually, not model specification in and of itself, but data collection. Your typical social science graduate curriculum talks a lot about getting standard errors right, but on a day-to-day basis most of our work goes into getting the data into the proper form, and this is also where most problems come from.

But before talking math, let’s contemplate a recent overheard confession that, “Turns out those funny looking toe shoes are pretty comfortable.” As someone who feels naked without footwear that involves both socks and laces, I had never given much thought to this, and to the extent that I had, I assumed wearing these things was a costly signal of geekiness. But on reflection it makes perfect sense. After all, if something as ridiculous looking as toe shoes were not comfortable then nobody would wear them. Conversely, four inch heels are very uncomfortable (or so I am given to understand) but many women wear them because they’re attractive. So we can imagine a negative association between how attractive shoes are and how good they feel. Indeed, this describes my own collection of incredibly comfortable but informal Chucks, fairly comfortable and decent-looking dress shoes, and a second pair of dress shoes that are uncomfortable but fancy.

One interpretation of this (and bear with me as I briefly sound like a critical studies type person) would be something along the lines of a sadistic gaze wherein the perceived attractiveness of a shoe is directly derived from the discomfort we imagine it imposing on its wearer. I don’t doubt that people have made this argument, but I don’t buy it as a general argument because I can imagine shoes that are both hideous and uncomfortable, say Crocs made of gravel and epoxy. There is no ontological reason why we can’t have shoes that are both hideous and uncomfortable; rather, there is a practical reason: nobody wears shoes that are terrible in every way, and so such shoes don’t make it onto the market. That is, there is a big difference between the covariance of traits for all conceivable shoes and the covariance of traits among those shoes that actually get bought and worn.
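The shoe story is easy to check with a toy simulation. What follows is only a sketch under made-up assumptions: attractiveness and comfort are drawn as independent standard normal scores, and a shoe gets "worn" only if the sum of the two clears a threshold. The function names, the threshold, and the selection rule are all hypothetical illustrations, not anything estimated from real shoes.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate_shoes(n=20000, threshold=0.0, seed=1):
    """Return (correlation among all shoes, correlation among worn shoes)."""
    rng = random.Random(seed)
    # Assumption: the two traits are independent across all conceivable shoes.
    shoes = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    # Assumption: a shoe reaches the market only if it is good enough overall.
    worn = [(a, c) for a, c in shoes if a + c > threshold]
    r_all = pearson(*zip(*shoes))
    r_worn = pearson(*zip(*worn))
    return r_all, r_worn


if __name__ == "__main__":
    r_all, r_worn = simulate_shoes()
    print(f"all conceivable shoes: r = {r_all:.2f}")
    print(f"shoes actually worn:   r = {r_worn:.2f}")
```

Under these assumptions the correlation across all conceivable shoes sits near zero, while among the shoes that survive selection it is clearly negative, even though nothing about attractiveness causes discomfort.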


I took the 2010 wave of the General Social Survey and pulled all 395 Republicans and GOP-leaning independents (PARTYID==4/6). For these people I compared their attitudes on marijuana (GRASS) and government redistribution of wealth (EQWLTH, which I cut to a binary with responses 1/4). Among Republicans who oppose wealth redistribution, 37% favor legalizing marijuana, as opposed to 38% among those who favor wealth redistribution. This difference of one percentage point is not even remotely statistically significant (chi2 0.08, 1 df).

OK, now wait a minute, you may be saying, he promised us negative relationships but this is no trend at all. True, but let’s contrast it with the same analysis for the whole sample, regardless of party. In general, 42% of those who oppose redistribution favor legalized marijuana against 53% of those who favor redistribution. This relationship is strongly statistically significant (chi2 14.50, 1 df). So among the general population there is a positive association between marijuana legalization and wealth redistribution. Among Republicans this effect is perfectly counterbalanced by conditioning on a collider. People presumably join the GOP because they agree with it on at least some issues. Republicans who oppose both weed and redistribution we can call movement conservatives, those who oppose weed but favor redistribution we can call social conservative populists, those who favor weed but oppose redistribution we can call libertarians, and those who favor both we can call people who should probably change their party registration. This case illustrates how conditioning on a collider doesn’t necessarily result in a net negative relationship but rather can partially or completely suppress an underlying general trend.
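To see how suppression (rather than reversal) can fall out of this, here is a rough simulation. It does not use the GSS data; every number in it is invented. The assumptions are that the two attitudes share a latent disposition (so they are positively associated in the population) and that the chance of joining the party rises with each position a person shares with it. The names `simulate_party` and `share_favoring` and the joining probabilities are hypothetical.

```python
import random


def share_favoring(rows, redist_value):
    """Share favoring legalization among rows with the given redistribution stance."""
    group = [weed for weed, redist in rows if redist == redist_value]
    return sum(group) / len(group)


def simulate_party(n=100000, seed=2):
    """Return (gap in the whole population, gap among party members).

    Each gap is the share favoring legalization among those who favor
    redistribution minus the same share among those who oppose it.
    """
    rng = random.Random(seed)
    everyone, members = [], []
    for _ in range(n):
        # Assumption: a shared disposition makes the two views travel together.
        latent = rng.gauss(0, 1)
        weed = latent + rng.gauss(0, 1) > 0
        redist = latent + rng.gauss(0, 1) > 0
        everyone.append((weed, redist))
        # Hypothetical joining rule: each position shared with the party
        # (opposing weed, opposing redistribution) raises the chance of joining.
        p_join = 0.05 + 0.30 * (not weed) + 0.30 * (not redist)
        if rng.random() < p_join:
            members.append((weed, redist))
    gap_all = share_favoring(everyone, True) - share_favoring(everyone, False)
    gap_members = share_favoring(members, True) - share_favoring(members, False)
    return gap_all, gap_members


if __name__ == "__main__":
    gap_all, gap_members = simulate_party()
    print(f"population gap: {gap_all:+.3f}")
    print(f"members gap:    {gap_members:+.3f}")
```

With these particular made-up parameters the population shows a large positive gap, while among members the gap is close to zero: selection into the party roughly cancels the underlying association. Different joining probabilities would leave a smaller positive gap or push it negative, which is the sense in which a collider can partially or completely suppress a trend.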