Blue book values, as mentioned in the last post, are derived from regression statistics. And regressions assume a normal distribution around the mean. Single values that are too far off the mean skew the “line of best fit”: the angle of the line is thrown off. These “outliers” have to be removed from the data.
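To see how much one value can tilt the line, here is a minimal sketch in Python. The car ages and prices are made up for illustration (they are not real blue book data), and fit_line is a hypothetical helper that computes an ordinary least-squares line by hand.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variance terms (the common 1/n factor cancels out).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: car age in years, price in thousands of dollars.
ages = [1, 2, 3, 4, 5, 6, 7, 8]
prices = [20.0, 18.0, 16.0, 14.0, 12.0, 10.0, 8.0, 6.0]

slope_clean, intercept_clean = fit_line(ages, prices)

# Add a single outlier: a 9-year-old car priced far above the trend.
slope_outlier, intercept_outlier = fit_line(ages + [9], prices + [30.0])

print(f"slope without outlier: {slope_clean:.2f}")
print(f"slope with outlier:    {slope_outlier:.2f}")
```

With these invented numbers, eight cars lose about two thousand dollars of value per year, yet adding the ninth car flattens the fitted slope to roughly a quarter of a thousand per year. One point out of nine changes the story the line tells.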
In academic research, this can be a problem. When we are studying human beings, we must assume that, for a given trait, people vary in a regular, predictable way about the mean. In other words, we assume that we can give a number that portrays a group of people, and that this group clusters around the mean closely enough that we can use this number to talk about them. So a researcher who notes that the average IQ of the study sample is 110 is telling us that the sample is above average, and giving us an overall view of the participants in that group. And we must be careful in reading it not to assume that every member of that group is above average, or that each member has an IQ of 110.
But when we use regression or other sophisticated techniques that compute variance, a single outlier can give us “spurious” results. I once reviewed an article for publication that had a statistically significant finding. The author reported only averages for each group, with standard deviations (the “spread-out-ness” of the scores within a group). I was able to work backwards and determine that one person had scores so far from the mean that the results and conclusions were entirely due to this one person. This is particularly problematic when a researcher uses a small sample in the study; in this case only 8 people were studied. Had the outlier stayed in, and had no one pointed out that the statistical test was significant because of a single participant, the researcher might have published the article without noting the cause of the results.
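A small sketch can show how this happens with only 8 participants. The scores below are invented and are not the data from the article I reviewed; I have also chosen a simple correlation as the test, which is an assumption for illustration and not necessarily the analysis the author ran. The helpers pearson_r and t_statistic are hypothetical names, and the cutoffs are the standard two-tailed 5% critical values of t for 6 and 5 degrees of freedom.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed by hand."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t value for testing whether a correlation differs from zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# Hypothetical scores for 8 participants; the last one is the outlier.
x_scores = [1, 2, 3, 4, 5, 6, 7, 20]
y_scores = [5, 3, 6, 4, 5, 3, 6, 20]

r_all = pearson_r(x_scores, y_scores)
t_all = t_statistic(r_all, len(x_scores))

# Same analysis with the outlying participant removed.
r_trimmed = pearson_r(x_scores[:-1], y_scores[:-1])
t_trimmed = t_statistic(r_trimmed, len(x_scores) - 1)

CRITICAL_T_DF6 = 2.447  # two-tailed, alpha = 0.05, 8 participants
CRITICAL_T_DF5 = 2.571  # two-tailed, alpha = 0.05, 7 participants

print(f"all 8:   r = {r_all:.2f}, significant: {abs(t_all) > CRITICAL_T_DF6}")
print(f"first 7: r = {r_trimmed:.2f}, significant: {abs(t_trimmed) > CRITICAL_T_DF5}")
```

With all 8 invented participants the correlation is strong and clears the significance cutoff; with the one extreme person removed, the remaining 7 show almost no relationship at all. The "finding" lives entirely in a single participant.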
But in most studies of this type, the researcher may “throw out” an outlier. Sometimes this happens for legitimate reasons, as when a participant is found to have a much different profile than the others in the study. For example, when well-conditioned athletes are recruited and one of the participants turns out not to be in good shape, the results could be flawed. Other times an outlier interferes with the finding of significant differences between two groups, and the researcher must weigh the issues involved in removing it.
But there is a more subtle, more troubling assumption with the idea of outliers. The fact that outliers must be thrown out implies that, for every trait studied, humans are really all alike. If we study joy or love or depression, we must assume that the trait can be quantified identically for everyone. A “4” on a happiness scale would have to mean the same thing for you and for me and for John Doe. In fact, it is common for happiness researchers to begin with the idea that happiness can be quantified on (for example) a five-point scale, and that each person anchors it identically.
Is it any wonder social scientists shy away from studying topics like spirituality, in which each person has a unique experience? It is in areas in which we are unique that numbers become less useful. Researchers who use quantitative measures rarely discuss the assumption behind their use of statistics.
That’s one of the key underlying themes of this blog: each of us is a unique person. If this is true, then it leads to a second theme of this blog: it will be illuminating to examine the assumptions made by statistical researchers in the study of the human condition.