It means nothing. Really.

[Note: Perhaps this is part two of “zero, naught, and nothing” because it is related to the concepts introduced in that section.]

In research, participants are divided into two (or more) groups and, in theory at least, randomly assigned to those groups. One is a control group and the other is an *intervention* or *experimental* group. Some type of intervention is applied to the experimental group (administration of the therapy or drug under investigation, for example), and the outcome measures of the two groups are compared to see if the *difference* is *significant*. Those two words are critical, because they are the key to understanding this type of research.

How do we know if there is a significant difference?

Let’s start with an actual example from the academic literature. Some well-known researchers for whom I have the highest respect (e.g., Mitchell Krucoff, Mehmet Oz, Harold Koenig, and others) wanted to study prayer to see if they could scientifically demonstrate an effect of prayer on heart patients undergoing cardiac catheterization. In all fairness, these researchers were pioneers in their efforts to scientifically validate the spiritual dimension (which I will be blogging about soon). Their design was fairly simple: divide cardiac patients into two groups, and enroll prayer groups to pray for the experimental group. However, maintaining the anonymity of the patients was a factor, so the researchers negotiated that the prayed-for patients would be identified only by their first name and last initial; “John Smith” would be “John S.” No picture or other identifying information would be transmitted to the three groups around the country who were praying daily for the experimental-group cardiac patients.

No significant differences were found between the patients who were prayed for and the control group. The researchers concluded that “prayer was not effective” in cardiac patients. This research was published in one of the most prestigious medical journals, *The Lancet*. I read the article the day after it was published, and immediately called one of my colleagues. We decided to write a letter to the editors to point out that the conclusion was not possible; the letter was published.

In other words, what was done here was the equivalent of testing batteries by putting them in a flashlight and turning on the switch. If the flashlight lights up, then we can conclude that the batteries are good. This is the equivalent of finding a statistically significant difference (the bulb was off, then it turned on).

But what if the flashlight did not light up? In that case, we could not conclude that the batteries are not good. Why not? We do not know if the flashlight works; it could be that there is not a good contact with the batteries, or the bulb is burned out, or the switch does not work. It is the equivalent of failing to find a significant difference in research; just as we cannot conclude that the batteries are not good, we cannot conclude that there is no difference between the two groups. Our only conclusion is: all possibilities remain.

[This part gets academic-boring-technical, so if you want to cut to the chase scroll on down to the brackets below.]

In research, how do we know if there’s a statistically significant difference? That’s where statistical methodology comes into play. Scientists use statistical tests to examine the differences between the two groups and ask how likely it is that those results could have arisen by chance. For example, if we flip a coin 20 times and get 12 heads, how likely is that? If two people each flip a coin 20 times and one person (let’s call her “control”) records 13 heads while another person (let’s call him “experimental”) gets 10, are those results within what we would normally expect by chance?
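The coin-flip question above can actually be computed directly. Here is a minimal sketch in Python, using the standard binomial formula for a fair coin (the function name `binom_prob` is just my label for it):

```python
from math import comb

def binom_prob(n, k):
    """Probability of exactly k heads in n fair coin flips."""
    return comb(n, k) / 2**n

# Probability of exactly 12 heads in 20 flips:
p_exact = binom_prob(20, 12)

# Probability of 12 *or more* heads (the usual "at least this extreme" question):
p_at_least = sum(binom_prob(20, k) for k in range(12, 21))

print(f"P(exactly 12 heads) = {p_exact:.3f}")    # about 0.120
print(f"P(12 or more heads) = {p_at_least:.3f}") # about 0.252
```

Getting 12 or more heads happens roughly a quarter of the time by pure chance, so a 12-heads result is nothing out of the ordinary at all.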

It’s not as simple as it looks. We can’t just say the two groups need to differ by, say, 10% in order for the difference to be significant, because raw differences ignore how the measurements are distributed. For example, suppose we compare a control group that receives a placebo with an experimental group that receives a new high-tech drug intended to increase the rate of hair growth; we can measure the hair growth in each group and compare them. If the average growth in the control group is 10 cm and the average growth in the intervention group is 12 cm, we don’t yet know whether the difference is meaningful or due to chance. One factor that comes into play is how “spread out” the growth rates are within each group: there may be substantial overlap between the actual growth rates of members of the two groups.

If the measurements in each group are spread out widely, an average of 10 cm and an average of 12 cm might look almost the same. And having 4 people in each group is very different from having 1,000 in each one.

The “spread-outness” of the data has a name: we call it *variance* or *standard deviation* (standard deviation is the square root of the variance, so the two are related closely enough statistically that we can use them interchangeably here). Statistical formulae used to determine whether the difference between two groups is statistically significant must account for three things: the number of people in each group, the variance, and the actual size of the difference between the group averages.
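To see how those three ingredients interact, here is a minimal sketch using the standard Welch formula for a two-sample t statistic (a common example of the kind of formula described above; all the numbers plugged in are hypothetical):

```python
from math import sqrt

def t_statistic(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's two-sample t statistic: the difference between the group
    averages, divided by a combined measure of spread and group size."""
    standard_error = sqrt(sd1**2 / n1 + sd2**2 / n2)
    return (mean2 - mean1) / standard_error

# The same 10 cm vs 12 cm difference in averages, three scenarios
# (numbers are hypothetical, for illustration only):
tight  = t_statistic(10, 1, 30, 12, 1, 30)      # tightly clustered groups
spread = t_statistic(10, 5, 30, 12, 5, 30)      # widely spread groups, same n
big_n  = t_statistic(10, 5, 1000, 12, 5, 1000)  # widely spread, many people

print(f"small spread:          t = {tight:.2f}")
print(f"wide spread:           t = {spread:.2f}")
print(f"wide spread, large n:  t = {big_n:.2f}")
```

The difference between the averages is identical (2 cm) in all three scenarios, yet the t statistic, and hence the strength of the evidence, changes dramatically with the spread and with the number of people in each group.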

But we have to back up just a second and talk about “hypothesis testing.” The scientist has a null hypothesis and an alternate hypothesis. The scientist begins with the null, which in this case is “there is no difference between the control group and the experimental group.” The alternate hypothesis, then, would be “there is a significant difference between the two groups” (and therefore the intervention has an effect). If the researcher finds a statistically significant difference between the two groups, he or she rejects the null and accepts the alternate hypothesis. But if the researcher fails to find a difference between the two groups, then the only conclusion that can be made is that “all possibilities remain.”

[OK. End of academic speak. What it really means when “No significant differences were found.”]

The trick is what happens when the differences between the groups aren’t large enough to find a statistically significant difference. Perhaps the two groups overlap too much, or there just aren’t enough people in the two groups to establish a significant difference; when the researcher fails to find a significant difference, only one conclusion is possible: “all possibilities remain.” In other words, failure to find a significant difference means that nothing was found. So it means nothing. Really.