[Comic: xkcd.com, CC 2.5]

I certainly believe that scientific research is important. Research uncovers new knowledge and prunes away facts that are not accurate. However, in our society, research is also a currency used to justify views of reality. A Biblical scholar might invoke a sentence from the Bible before holding forth on his own interpretation or opinions. In a similar manner, a scientific study might be cited or a scientist quoted to establish that something is real before jumping off into one’s own thoughts, opinions, theories, or justifications. If a scientific result can be invoked, we can believe that something is true. Is there an unconscious? Freud said so, but he’s out of date. Are we intrinsically social beings? Evolutionary theorists argue that we are. Does meditation really result in an altered state of consciousness? If I can present results from research, preferably using a high-tech measurement like a brain scan, or if I can come up with a theory that uses words like “neural nets” or “neurotransmitters,” then I can believe all of these things.

What’s wrong with this? Isn’t this science doing its job of uncovering truth? Two things are wrong with it. One is that not all knowledge is scientific knowledge. The second is that scientific results are often portrayed inaccurately in our society.

With regard to the first point, I’ll just give a few examples. Ludwig von Bertalanffy, a systems theorist, wrote that even a physicist will chase his (sic) hat when the wind blows it away, without knowing the mathematics that determine which way the hat will blow. Einstein famously said that not everything that is important can be measured, and not everything that can be measured is important.

But what I really want to talk about here is the second point. We are inundated with scientific results in newspapers, websites, and other places. Most often, a brief summary of research is followed by broad generalizations about what the research means. However, the outcome of research is not a set of simple facts. Experiments are complicated things that must be evaluated by readers and understood in context. When I was a graduate student in psychology, every class included practice in critiquing research.

To understand research, certain mathematical ideas are important. “Statistical significance” matters both for accurate interpretation of research and for inaccurate or misleading reports. If you’ll bear with me, I’ll run through what I mean. Suppose you have a coin. If you toss the coin 100 times, it will come up heads about 50 times; not exactly 50, but close. Why? That’s just the way the world we live in works: there are laws of probability. Since there are two possible outcomes, heads or tails, each will come up about half the time. If I toss my coin 100 times and it always comes up heads, I’ll probably conclude the coin is biased. Why? Because that just doesn’t happen; it’s extremely improbable, in the world we live in, that an honest coin would do this.
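To see those laws of probability at work, here is a small Python sketch (my own illustration, not something from the original post) that simulates tossing an honest coin:

```python
import random

# Toss a fair coin 100 times and count the heads.
def count_heads(tosses=100):
    return sum(random.random() < 0.5 for _ in range(tosses))

# Repeat the experiment 10,000 times to see how outcomes spread around 50.
results = [count_heads() for _ in range(10_000)]
print("average heads:", sum(results) / len(results))      # very close to 50
print("lowest:", min(results), "highest:", max(results))  # roughly low 30s to high 60s

# An honest coin coming up heads all 100 times has probability (1/2)**100,
# which is about 8 in 10**31 -- "it just doesn't happen."
print("chance of 100 heads:", 0.5 ** 100)
```

Run it a few times: the average hovers right around 50, while 100 straight heads is so improbable you would wait far longer than the age of the universe to see it.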

What if the coin came up heads 60 times? Is the coin honest or not? The question is this: when is an outcome still what you would expect by chance, even though the numbers are not exactly alike (since we expect approximately 50 heads, not exactly 50)? And when is the difference big enough that you would conclude the coin is probably biased? Sometimes it’s hard to tell, and in research, results are very often in the “hard to tell” category. For example, if 55 percent of the women in my research prefer chocolate ice cream, while 65 percent of the men prefer chocolate, is there a real sex difference (the gap is so improbable by chance alone that something real must be going on) or is there not (the numbers seem different, but the gap may simply reflect the range of outcomes chance produces)? Sometimes numbers that seem very different are actually what you could commonly get by chance, and sometimes numbers that don’t seem very different are very improbable. In addition, what I’m studying may produce a weak rather than a large, obvious effect, because among us humans, in all kinds of psychological, social, and biological research, what is being studied is only one factor contributing to a situation, not the only thing going on. In the example, even if men and women do have different likelihoods of preferring chocolate, there are many possible reasons for any one person’s choice: diabetes, the city you grew up in, getting rejected by a date while you were eating chocolate ice cream, and so on.
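The 60-heads question can actually be settled by arithmetic. Here is a minimal sketch, using only Python’s standard library, that computes the exact probability of landing at least that far from 50/50 by chance:

```python
from math import comb

# Probability of exactly k heads in n tosses of an honest coin.
def p_exact(k, n):
    return comb(n, k) * 0.5 ** n

# Probability of an outcome at least as far from 50/50 as the one observed.
def p_value(k, n):
    center = n / 2
    return sum(p_exact(i, n) for i in range(n + 1)
               if abs(i - center) >= abs(k - center))

print(p_value(60, 100))  # about 0.057: squarely in "hard to tell" territory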

Enter tests of statistical significance. These are mathematical procedures that assess how likely an outcome is to have occurred by chance if there were no real underlying difference. If my statistical test revealed that the difference between the percentages of men and women who prefer chocolate ice cream could have occurred purely by chance only one time in a thousand, I would conclude that my results were in the “there probably is a sex difference” category. Researchers follow an arbitrary convention: if results could have happened by chance 5% of the time or less, they are considered evidence of a real difference and are said to be “statistically significant.”
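Applied to the ice cream example, a standard two-proportion test makes the convention concrete. One caveat: the example never says how many people were surveyed, so the group sizes of 100 and 1,000 below are my assumptions, chosen for illustration:

```python
from math import sqrt, erf

# Two-proportion z-test: how likely is a gap this large between two
# percentages if there is no real underlying difference?
def two_prop_p_value(x1, n1, x2, n2):
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided p-value

# 55 of 100 women vs. 65 of 100 men: p is about 0.15, well above the
# 5% convention, so the gap is not statistically significant.
print(two_prop_p_value(55, 100, 65, 100))

# The same percentages with 1,000 people per group: p is about 0.000005,
# so with a big enough sample the identical gap becomes significant.
print(two_prop_p_value(550, 1000, 650, 1000))
```

Notice that the very same percentages can fall on either side of the 5% line depending on how many people were studied, which is one more reason a bare “significant” in a news story tells you little.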

When media reports state that results are “significant,” very often they mean “statistically significant.” However, statistically significant only means “unlikely to have occurred by chance.” Whether a result is important is a completely different question. For example, suppose I was studying a medication that worked about 70% of the time, while a sugar pill didn’t work at all. Suppose this difference was extremely unlikely to have occurred by chance; that is, the results were statistically significant, and I therefore had evidence that the pill was having an effect. If we’re talking about a cancer medication given to people who would otherwise die, and now 70% survive, this would be a powerful effect. I would be happy and excited. Suppose instead that the medication was a weight loss pill, and 70% of the people using it lost 5 pounds after 18 months while people given a placebo (a sugar pill, or perhaps a sugar substitute) didn’t. Even though this result was also statistically significant (I have evidence that those folks wouldn’t have lost the 5 pounds without the pill), the amount of weight lost is so small that I wouldn’t be happy and excited about having found a new, important weight loss pill.
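A quick simulation makes the weight-loss point vivid. This is a purely hypothetical sketch: the 2,000-person groups, the 20-pound spread in individual weight changes, and the normal distribution are all my assumptions, not details from the example:

```python
import random
from math import sqrt, erf

random.seed(1)  # so the illustration is reproducible

# Hypothetical trial: the pill's true effect is a 5-pound loss on average,
# but individual weight changes vary widely (sd = 20 pounds).
pill    = [random.gauss(-5, 20) for _ in range(2000)]
placebo = [random.gauss(0, 20) for _ in range(2000)]

def mean(xs):
    return sum(xs) / len(xs)

def sem_diff(a, b):  # standard error of the difference in means
    def var(xs):
        m = mean(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return sqrt(var(a) / len(a) + var(b) / len(b))

diff = mean(pill) - mean(placebo)
z = diff / sem_diff(pill, placebo)
p = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

print(f"average difference: {diff:.1f} pounds")  # about -5: real, but small
print(f"p-value: {p:.1e}")  # vanishingly small: statistically significant
```

The p-value is minuscule, so the 5-pound effect is almost certainly real; whether 5 pounds in 18 months matters to anyone is a separate judgment that no p-value can make.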

The research that Chris Hitchcock discusses in her March 13 re:Cycling post is a good example. Women were best at picking out a picture of a snake during the days immediately before their menstrual period. The results were extremely statistically significant (many would have occurred by chance less than one time in ten thousand) and were reported in the media. However, what these results mean and how important they are in affecting behavior are separate questions. Chris discusses the size of the effect: the response was faster by a fifth of a second. She also discusses the theoretical implications the authors chose to draw, namely that women are responding to anxiety and fear, and that this has something to do with human evolution and PMS. However, does a tiny change in reaction time indicate a meaningful change in anxiety level or in the ability to detect danger? Are the changes in reaction time necessarily due to anxiety at all? For example, the subjects were assigned to experimental groups based on the phase of their cycle. Does this mean they knew the research was about menstruation? If so, that alone could have influenced their behavior.

There are other important points to becoming a canny consumer of research reports. As when buying a used car, or even a new one, you can get a really good vehicle, but it’s wise to be knowledgeable before making a purchase. Let the buyer beware.
