Lots of smoke, hardly any gun. Do climatologists falsify data?
One of climate change denialists’ favorite arguments concerns the fact that not always can weather station temperature data be used as raw. Sometimes they need to be adjusted. Adjustments are necessary in order to compensate with changes the happened over time either to the station itself or to the way data were collected: if the weather station gets a new shelter or gets relocated, for instance, we have to account for that and adjust the new values; if the time of the day at which we read a certain temperature has changed from morning to afternoon, we would have to adjust for that too. Adjustments and homogenisation are necessary in order to be able to compare or pull together data coming from different stations or different times.
Some denialists have problems understanding the very need for adjustments – and they seem rather scared by the word itself. Others, like Willis Eschenbach at What’s up with that, fully understand the concept but still look at it as a somehow fishy procedure. Denialists’ bottom line is that adjustments do interfere with readings and if they are biased toward one direction they may actually create a warming that doesn’t actually exist: either by accident or as a result of fraud.
To prove this argument they recurrently show this or that probe to have weird adjustment values and if they find a warming adjustment they often conclude that data are bad – and possibly people too. Now, let’s forget for a moment that warming measurements go way beyond meteorological surface temperatures. Let’s forget satellite measurements and let’s forget that data are collected by dozens of meteorological organizations and processed in several datasets. Let’s pretend, for the sake of argument, that scientists are really trying to “heat up” measurements in order to make the planet appear warmer than it really is.
How do you prove that? Not by looking at the single probes of course but at the big picture, trying to figure out whether adjustments are used as a way to correct errors or whether they are actually a way to introduce a bias. In science, error is good, bias is bad. If we think that a bias is introduced, we should expect the majority of probes to have a warming adjustment. If the error correction is genuine, on the other hand, you’d expect a normal distribution.
So, let’s have look. I took the GHCN dataset available here and compared all the adjusted data (v2.mean_adj) to their raw counterpart (v2.mean). The GHCN raw dataset consists of more than 13000 station data, but of these only about half (6737) pass the initial quality control and end up in the final (adjusted) dataset. I calculated the difference for each pair of raw vs adj data and quantified the adjustment as the trend of warming or cooling in degC per decade. I got in this way a set of 6533 adjustments (that is, 97% of the total – a couple of hundreds were lost in the way due to the quality of the readings). Did I find the smoking gun? Nope.
Not surprisingly, the distribution of adjustment trends2 is a quasi-normal3 distribution with a peak pretty much around 0 (0 is the median adjustment and 0.017 C/decade is the average adjustment – the planet-warming trend in the last century has been about 0.2 C/decade). In other words, most adjustments hardly modify the reading, and the warming and cooling adjustments end up compensating each other1,5. I am sure this is no big surprise. The point of this analysis is not to check the good faith of people handling the data: that is not under scrutiny (and not because I trust the scientists but because I trust the scientific method).
The point is actually to show the denialists that going probe after probe cherry-picking those with a “weird” adjustment is a waste of time. Please stop the nonsense.
Edit December 13.
Following the interesting input in the comments, I added a few notes to clarify what I did. I also feel like I should explain better what we learn from all this, so I add a new paragraph here (in fact, it’s just a comment promoted to paragraph).
How do you evaluate whether adjustments are a good thing?
To start, you have to think about why you want to adjust data in the first place. The goal of the adjustments is to modify your reading so that they could be easily compared (a) inter-probes and (b) intra-probes. In other words: you do it because you want to (a) be able to compare the measures you take today with the ones you took 10 years ago at the same spot and (b) be able to compare the measures you take with the ones your next-door neighbor is taking.
So, in short, you do want your adjustment to siginificatively modify your data – this is the whole point of it! Now, how do you make sure you do it properly? If I were to be in charge of the adjustment I would do two things. 1) Find another dataset – one that possibly doesn’t need adjustments at all – to compare my stuff with: it doesn’t have to cover the entire period, it just has to overlap enough to be used as a test for my system. The satellite measurements are good for this. If we see that our adjusted data go along well with the satellite measurements from 1980 to 2000, then we can be pretty confident that our way of adjusting data is going to be good also before 1980. There are limits, but it’s pretty damn good. Alternatively, you can use a dataset from a completely different source. If the two datasets arise from different stations, go through different processings and yet yield the same results, you can go home happy.
Another way of doing it is to remember that a mathematical adjustment is just a trick to overcome a lack of information on our side. We can take a random sample of probes and do a statistical adjustment. Then go back and look at the history of the station. For instance: our statistical adjustment is telling us that a certain probe needs to be shifted +1 in 1941 but of course it will not tell us why. So we go back to the metadata and we find that in 1941 there was a major change in the history of our weather station, for instance, war and subsequent move of the probe. Bingo! It means our statistical tools were very good in reconstructing the actual events of history. Another strong argument that our adjustments are doing a good job.
Did we do any of those things here? Nope. Neither I, nor you, nor Willis Eschenbach nor anyone else on this page actually tested whether adjustments were good! Not even remotely so.
What did we do? We tried to answer a different question, that is: are these adjustments “suspicious”? Do we have enough information to think that scientists are cooking the data? How did we test so?
Willis picked a random probe and decided that the adjustment he saw were suspicious. End of the story. If you think about it, all his post is entirely concentrated around figure 8, which simply is a plot of the difference between adjusted data and raw data. So, there is no value whatsoever in doing that. I am sorry to go blunt on Willis like this – but that is what he did and I cannot hide it. No information at all.
What did I do? I just went a step back and asked myself: is there actually a reason in the first place to think that scientists are cooking data? I did what is called a unilaterally informative experiment. Experiments can be bilaterally informative when you learn something no matter what the outcome of the experiment is (these are the best); unilaterally informative when you learn something only if you get a specific outcome and in the other case you cannot draw conclusions; not informative experiments.
My test was to look for a bias in the dataset. If I were to find that the adjustments are introducing a strong bias then I would know that maybe scientists were cooking the data. I cannot be sure about it, though, because (remember!) the whole point of doing adjustments is to change data in the first place!. It is possible that most stations suffer of the same flaws and therefore need adjustments going in the same direction. That is why if my experiment were to lead to a biased outcome, it would not have been informative.
On the other hand, I found instead that the adjustments themselves hardly change the value of readings at all and that means I can be pretty positive that scientists are not cooking data. This is why my experiment was unilaterally informative. I was lucky.
This is not a perfect experiment though because, as someone pointed out, there could be a caveat. One caveat is that in former times the distributions of probes was not as dense as it is today and since the global temperature is calculated doing spatial averages, you may overrepresent warming or cooling adjustments in a few areas while still maintaining a pretty symmetrical distribution. So, to test this you would have to check the distribution not for the entire sample as I did but grid by grid. (I am not going to do this because I believe is a waste of time but if someone wants to, be my guest).
Finding the right relationship between the experiment you are doing and the claim you make is crucial in science.
2) What I consider here is the trend of the adjustment not the average of the adjustment. Considering the average would be methodologically wrong. This graph and this graph have both averages of adjustment 0, yet the first one has trend 0 (and does not produce warming) while the second one has trend 0.4C/decade and produces 0.4C decade warming. If we were to consider average we would erroneously place the latter graph in the wrong category.
3) Not mathematically normal as pointed out by dt in the comments – don’t do parametric statistics on it.
5) RealClimate.org found something very similar but with a more elegant approach and on a different dataset. Again, their goal (like mine) is not to add pieces of scientific evidence to the discussion, because these tests are actually simple and nice but, let’s face it, quite trivial. The goal is really to show to the blogosphere what kind of analysis should be done in order to properly address this kind of issue, if one really wants to.