Field of Science

Naomi Oreskes and false positives in climate change: Do we know enough?

Historian of science Naomi Oreskes, who has long been a campaigner against climate change denial and science denialism (her book "Merchants of Doubt" is a touchstone in this regard), has a New York Times Op-Ed titled "Playing Dumb on Climate Change" in which she essentially, to my eyes at least, asks that scientists slightly lower the standards of statistical significance when assessing the evidence for climate change.

It's not that Oreskes is saying that we should actually use sloppier standards for climate change analysis. Her argument is much more reasonable and specific, but ultimately she still seems to embrace a kind of certainty that I do not think exists in the field. She starts by reminding us of the two common errors in any kind of statistical analysis: Type I and Type II. In my field I tend to use the more informative names 'false positives' and 'false negatives', so that's what I will stick with here. False positives arise when you are too generous and allow spurious data in along with legitimate facts. False negatives arise when you are too curmudgeonly and disallow legitimate facts. In some sense false positives are better than false negatives, since you would rather run the risk of getting some noise along with all the signal than miss some of the signal.

The risk of getting false negatives versus false positives is governed in part by the confidence limit you are using. A confidence limit of 95% corresponds to a significance threshold of p < 0.05. Your standard for accepting something as a signal is then very high, although about 5% of pure-noise results will still be flagged as signal. That strictness means you are bound to reject some true signal, resulting in false negatives. When you reduce the confidence limit, say to 90% (p < 0.10), you are lowering the bar for marking something as a signal. This potentially lets some false positives creep in along with the true signal.
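The tradeoff described above can be illustrated with a small simulation. This is a minimal sketch and does not come from Oreskes's op-ed. It assumes a hypothetical two-sided z-test on the mean of `n` unit-variance measurements, with a made-up effect size standing in for a real signal. It then counts how often pure noise gets flagged (false positives) and how often the real effect is missed (false negatives) at 95% and 90% confidence.

```python
import random
from statistics import NormalDist

def error_rates(confidence, effect=0.5, n=20, trials=20_000, seed=1):
    """Estimate false positive and false negative rates of a two-sided z-test.

    Hypothetical setup: each 'study' averages n noisy measurements with unit
    standard deviation. Under the null the true mean is 0; when a real signal
    is present the true mean is `effect`.
    """
    rng = random.Random(seed)
    alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for 95%, ~1.645 for 90%
    se = 1 / n ** 0.5  # standard error of the mean

    def flagged(true_mean):
        # Draw the sample mean directly: it is Normal(true_mean, se).
        z = rng.gauss(true_mean, se) / se
        return abs(z) > z_crit

    false_pos = sum(flagged(0.0) for _ in range(trials)) / trials
    false_neg = sum(not flagged(effect) for _ in range(trials)) / trials
    return false_pos, false_neg

for conf in (0.95, 0.90):
    fp, fn = error_rates(conf)
    print(f"{conf:.0%} confidence: false positives {fp:.3f}, false negatives {fn:.3f}")
```

The false positive rate should land near 0.05 and 0.10 respectively, and the false negative rate drops when the bar is lowered. Both runs reuse the same random draws, so the 90% test flags everything the 95% test flags, plus some additional results.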

The crux of Oreskes's argument is that until now climate scientists have largely used a very high confidence limit for accepting and rejecting evidence, which in her opinion has led to some false negatives. She is right that the magic number of 95% is arbitrary and has no objective basis in natural law. But, and this is her crucial point, we now know enough about the climate to warrant slightly lower confidence limits, since any false positives emerging from this lowered limit can be identified as such and rejected. Perhaps, she suggests, we could consider using a lower confidence limit like 90% for climate change studies.

I see where Oreskes is coming from. Her general argument that increased knowledge about a field justifies lower confidence limits is valid. But I am simply not as convinced as she is that we will be able to identify false positives in climate data and reject them. First of all, 'climate data' is a humongous and staggeringly heterogeneous mass of facts from fields ranging from oceanography to soil chemistry. We may be more confident about some of these facts, but we don't know enough about others (especially those concerning the biosphere) to be as confident about them. That means we won't even know which facts are more likely than others to be false positives. Some false positives are certainly in the 'known unknown' category, but many others are in the 'unknown unknown' category. How will we identify these if they show up?
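To see why it matters that we don't know which facts are shaky, here is a rough sketch of the expected share of accepted 'signals' that are actually false positives. It assumes, hypothetically, that a fraction `prior_true` of the effects tested in a subfield are real and that real effects have a fixed standardized size. Both numbers are illustrative placeholders, not estimates for any part of climate science.

```python
from statistics import NormalDist

def false_discovery_share(prior_true, alpha, z_effect=2.0):
    """Expected share of accepted 'signals' that are actually false positives.

    prior_true: assumed fraction of tested effects that are real (hypothetical).
    alpha: significance level (0.05 for 95% confidence, 0.10 for 90%).
    z_effect: hypothetical standardized size of the real effects.
    """
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    # Power of a two-sided z-test when the true standardized effect is z_effect.
    power = (1 - nd.cdf(z_crit - z_effect)) + nd.cdf(-z_crit - z_effect)
    true_hits = prior_true * power
    false_hits = (1 - prior_true) * alpha
    return false_hits / (false_hits + true_hits)

# Hypothetical subfields: one where many tested effects are real, one where few are.
for prior in (0.5, 0.1):
    for alpha in (0.05, 0.10):
        share = false_discovery_share(prior, alpha)
        print(f"prior {prior:.0%}, alpha {alpha:.2f}: {share:.1%} of accepted signals are false")
```

With these placeholder values, the same threshold yields a far larger share of false positives in the subfield with few real effects, and loosening to 90% raises that share in both. If the base rate varies from one subfield to another and is not known, neither is the proportion of false positives among the accepted results.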

The second problem concerns cost. I said earlier that accepting false positives is sometimes better than rejecting true positives. In the case of climate change, however, that tradeoff must be weighed against the enormous cost and resources needed to pursue false positives. For instance, suppose a false positive data point concerns melting glaciers in some part of the world. Actually taking action to address that data point would mean spending tens of millions or even billions of dollars. Sadly, in the case of climate change, pursuing Type I errors is not as simple as spending a few man-hours and a few thousand dollars investigating blind alleys.
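The cost argument can be framed as a simple expected-cost comparison between the two thresholds. This is a hedged sketch of standard decision-theoretic reasoning, not a climate model. The arguments `cost_false_pos`, `cost_false_neg`, `prior_true` and the effect size are all hypothetical placeholders in arbitrary units.

```python
from statistics import NormalDist

def expected_cost(alpha, prior_true, cost_false_pos, cost_false_neg, z_effect=2.0):
    """Expected cost per tested claim for a given significance level.

    All inputs are hypothetical placeholders, not estimates from climate data:
    cost_false_pos is the cost of acting on a spurious signal,
    cost_false_neg the cost of ignoring a real one.
    """
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    # Probability that a real effect of size z_effect is detected.
    power = (1 - nd.cdf(z_crit - z_effect)) + nd.cdf(-z_crit - z_effect)
    return ((1 - prior_true) * alpha * cost_false_pos
            + prior_true * (1 - power) * cost_false_neg)

# Compare 95% vs 90% confidence when acting on a false positive is expensive.
for alpha in (0.05, 0.10):
    cost = expected_cost(alpha, prior_true=0.2, cost_false_pos=100.0, cost_false_neg=20.0)
    print(f"alpha {alpha:.2f}: expected cost {cost:.2f}")
```

Here acting on a spurious signal is five times as costly as missing a real one, and the 95% threshold comes out cheaper. Swap the two costs and the 90% threshold wins instead. Which way the balance tips depends on numbers that, for much of climate data, are themselves uncertain.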

Oreskes, like many other thinkers in the field, is well-meaning, and her suggestions apply to fields where enough is known to confidently identify and weed out false positives. But in the case of climate change we are straitjacketed by a problem that has tragically been the bane of the field since its inception: our lack of knowledge and general ignorance about a very complex, unpredictable system. Unless we bridge this gulf of ignorance significantly, a 90% confidence limit would be as arbitrary and dangerous as a 95% confidence limit.


  1. Oreskes' proposal makes a great deal of sense from a Bayesian point of view. For claims where the prior evidence is already quite strong, the new evidence for that claim does not need to be as strong as in the case where the prior evidence is lacking. This is just the flip side of the old adage, "extraordinary claims require extraordinary evidence."

    1. Yes, I am not disputing the use of Bayesian statistics in such cases. What I find less convincing is that we can put hard numbers on priors and adjust the confidence limits accordingly from 95% to 90%. I find the dataset too complex and heterogeneous and the error bars too diverse for doing that. It may, however, be possible in certain well-defined, limited domains of the data.

  2. Entirely agree with your last sentence: to think that lowering the p-value cutoff from 0.05 to 0.1 will somehow fix things is at best immensely naive and at worst deliberately silly. Ronald Fisher did not hand down a cutoff of 0.05 as a meaningful or required threshold, but rather as a way of judging, informally, whether the evidence was worthy of a second look. A really nice description of some of the pitfalls of the whole approach is given in Regina Nuzzo's highly readable column on p-values, "Statistical Errors" (Nature, Feb 2014). Scientists are best advised to spend their energy understanding the issues and limitations of the approach and combining it with other measures. Possibilities include looking at the magnitude of effect sizes, pairing a looser exploratory study with a stricter validation study, and making a genuine effort to generate some predictions. That is better than mucking around with lowering an arbitrary threshold that is already being misused.
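The Bayesian point in the first comment, and the reply's objection to it, can be made concrete with the odds form of Bayes' theorem: posterior odds equal prior odds times the Bayes factor. The sketch below assumes we can assign a hard number to the prior, which is precisely the step the reply doubts. The priors and the 0.95 target are illustrative choices only.

```python
def required_bayes_factor(prior, target_posterior=0.95):
    """Bayes factor needed to move a prior probability to a target posterior.

    Uses posterior odds = prior odds * Bayes factor. The 0.95 target is an
    arbitrary illustrative choice, not a recommended standard.
    """
    prior_odds = prior / (1 - prior)
    target_odds = target_posterior / (1 - target_posterior)
    return target_odds / prior_odds

# Hypothetical priors for a claim, from agnostic to well supported.
for prior in (0.5, 0.8, 0.9):
    print(f"prior {prior:.0%}: need Bayes factor {required_bayes_factor(prior):.2f}")
```

A stronger prior lowers the Bayes factor needed to reach the same posterior (19 at a 50% prior versus 4.75 at 80%). This is the sense in which well-supported claims need less new evidence. The catch is that the answer is only as good as the prior fed in.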


