Monday, October 31, 2011

More Propaganda by Statistics

Given:
A youth basketball league for ages 11 – 15. Every player has participated for as long as they have been eligible, that is, since age 11. The kids play basketball five days per week, for several hours per day.

Skills assessments are taken of the players at the beginning and end of each year. Not surprisingly, the mean skills score for each age group is higher than that of every younger group.

As the league matures and becomes more competitive, the mean skills score for the entire league increases. The 11-year-olds enter the league with approximately the same mean skill level every year (the benefits of league participation aren't magically conferred on kids who haven't yet participated).

Without putting any numbers to our assessments, we can make several observations about our league.

First, year-over-year, the 11-year-olds as a group enter the league at a greater skills deficit compared to the rest of the league than previous groups of 11-year-olds. That is, although the league mean increases year-over-year, the 11-year-old mean score remains static.

Second, therefore, year-over-year increases in mean skills scores are skewed towards the older kids. As the league mean increases over time, and each age group's mean is higher than all younger age groups, the 15-year-olds have to increase as much or more (on average) every year than any other age group. The 14-year-olds have to increase as much or more (on average) than the 13, 12, and 11-year-olds, and so forth.
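The arithmetic behind the first two observations can be sketched in a few lines. This is only an illustration: it assumes the five age groups are the same size (so the league mean is the plain average of the five group means), and the function name is made up for this example.

```python
def required_older_group_gain(league_gain, n_age_groups=5):
    """Average year-over-year gain the older age groups need when the
    entering group's mean is static.

    Assumption (not from the league description): all age groups are the
    same size, so the league mean is the average of the group means.
    """
    # The static entering group contributes nothing to the increase, so the
    # remaining groups must supply the whole league-wide gain between them.
    return league_gain * n_age_groups / (n_age_groups - 1)


# Hypothetical number: the league mean rises by 4 points in a year.
example_gain = required_older_group_gain(4.0)
print(example_gain)
```

Under that assumption, whatever the league-wide increase is, the 12 to 15-year-old groups must average 25 percent more than it. The sketch says nothing about how that gain is split among the older groups.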

Third, although (and because) each group of 11-year-olds enters at a greater skills deficit and the league mean is increasing, each group's mean skills at a given age are increasing at a greater rate than any older group. In other words, this year's 12-year-olds will gain more skills this year than the 13, 14, and 15-year-olds did when they were 12.

Fourth, if the entire league were divided into five groups, not by age but by skills assessment, we would expect a strong correlation between the two methods of division. Furthermore, that correlation would be strongest for the top and bottom quintiles.

Fifth, as the bottom quintile is strongly correlated with 11-year-olds, and the 11-year-olds enter the league with approximately the same skills every year, the mean skills of the bottom quintile will also be static year-over-year.

Sixth, some 11-year-olds may not start out in the bottom quintile, nor may they reach the top quintile of skills by the time they turn 16. However, for the givens to hold true, only strong outliers from the mean could enter the league in the bottom quintile and remain there for the entire five years.
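For anyone who would rather see numbers, the observations above can be reproduced with a toy simulation. Everything in it is an assumption made up for illustration: the entry skill distribution, the cohort size, and especially the rule that a season of play adds more skill in later years as the league matures. It is a sketch of the scenario, not a model of any real league.

```python
import random
from statistics import mean

ENTRY_MEAN, ENTRY_SD = 20.0, 3.0  # assumed entry skill of 11-year-olds
PLAYERS_PER_AGE = 200             # assumed size of each age group


def yearly_gain(year):
    # Assumed: a season of normal league play adds more skill in later
    # calendar years, because the league is becoming more competitive.
    return 10.0 + year


def league_snapshot(year, rng):
    """Start-of-year skill scores for every player aged 11 to 15."""
    scores = []
    for seasons_played in range(5):  # 0 for age 11, up to 4 for age 15
        gained = sum(yearly_gain(year - j) for j in range(1, seasons_played + 1))
        scores += [rng.gauss(ENTRY_MEAN, ENTRY_SD) + gained
                   for _ in range(PLAYERS_PER_AGE)]
    return scores


def quintile_means(scores):
    """Mean score of each skills quintile, lowest quintile first."""
    ranked = sorted(scores)
    size = len(ranked) // 5
    return [mean(ranked[i * size:(i + 1) * size]) for i in range(5)]


rng = random.Random(0)
early = quintile_means(league_snapshot(5, rng))
late = quintile_means(league_snapshot(15, rng))
bottom_change = late[0] - early[0]
top_change = late[4] - early[4]
print(f"bottom quintile change: {bottom_change:+.1f}")
print(f"top quintile change:    {top_change:+.1f}")
```

In this toy league every single player improves every year, yet the bottom quintile mean barely moves between the two snapshots while the top quintile mean climbs, simply because the bottom quintile is refilled with new 11-year-olds each year.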

Leaving other considerations aside, it would be nonsense to conclude from the scenario given that any of our observations are a priori bad (or a priori good). Rather, they are what they are.

That's not to say that the other considerations (whatever they are) are negligible. Rather, it's to say that the givens and observations do not support on their own the idea that the other considerations skew the skills distribution and skills increases. The observations are wholly expected and explained by normal league play.

Put another way, it is not valid from the information given to make the statement (or inference) that, because the skills of the bottom quintile are static and the skills of the top quintile are increasing at a higher rate than any other quintile:

a) the noted increases are due to some consideration other than normal league play; and
b) that is a priori bad.

Better, more granular data would be required to validly draw those conclusions. This is obvious, I believe, because no matter how large we envision our league to be, we can conceptualize the league and the individual players entering as Gumby-esque 11-year-olds and progressing to 15-year-old league veterans.

We would (rightly) recognize and denounce unsupported statements (or inferences) a and b above as feeble attempts at propaganda by statistics.

However, that hasn't stopped the propagandists from presenting the front cover of this CBO report with inferences a and b, while conveniently ignoring the weak caveats in paragraph 7 of the summary:

"The growth in average income for different groups over the 1979–2007 period reflects a comparison of average income for those groups at different points in time; it does not reflect the experience of particular households. Individual households may have moved up or down the income scale if their income rose or fell more than the average for their initial group. Thus, the population with income in the lowest 20 percent in 2007 was not necessarily the same as the population in that category in 1979."

As it turns out, not only is the population not the same, but what constitutes a household has changed. The aggregation of aggregates into household income hides other important data as well:

"American households in the top income quintile have almost five times more family members working on average than the lowest quintile, and individuals in higher-income households are far more likely than lower-income households to be well-educated, married, and working full-time in their prime earning years. In contrast, individuals in low-income households are far more likely to be less-educated, working part-time, either very young or very old, and living in single-parent households."

Please note that by making these observations I am not saying the other considerations are trivial and that there aren't real problems concerning private profits and social losses. Neither does my linking to the analyses of others constitute approbation for their possible biases (nor, certainly, any ad hominem attacks). Rather, the time I would spend recreating the work of others and presenting it here wouldn't add to the point of this exercise.

The point is, real problems aren't discovered and cannot be solved by witchcraft, propaganda, or any other mysticism. I'm not sure there is any desire to have a meaningful conversation free of preconceptions and propaganda; I just know such a conversation cannot start with unsupported conclusions drawn from meaningless aggregates.
