Paul Krugman has a brilliant and sobering column on what he calls the "Excel Depression."
Core idea: the austerity policies adopted worldwide were driven by academic work purporting to show the benefits of such policies.
But:
Finally, Ms. Reinhart and Mr. Rogoff allowed researchers at the University of Massachusetts to look at their original spreadsheet — and the mystery of the irreproducible results was solved. First, they omitted some data; second, they used unusual and highly questionable statistical procedures; and finally, yes, they made an Excel coding error. Correct these oddities and errors, and you get what other researchers have found: some correlation between high debt and slow growth, with no indication of which is causing which, but no sign at all of that 90 percent “threshold [claimed in the earlier work].” [Bold added.]
In response, Ms. Reinhart and Mr. Rogoff have acknowledged the coding error, defended their other decisions and claimed that they never asserted that debt necessarily causes slow growth. That’s a bit disingenuous because they repeatedly insinuated that proposition even if they avoided saying it outright. But, in any case, what really matters isn’t what they meant to say, it’s how their work was read: Austerity enthusiasts trumpeted that supposed 90 percent tipping point as a proven fact and a reason to slash government spending even in the face of mass unemployment.
It's scary, and it should lead all of us to check and recheck, particularly when our data are used in policy debates. It should also lead us to be careful to rebut excessive claims made from ambiguous or uncertain data.
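As a toy illustration of how easy this kind of slip is, here is a sketch in Python. The country names and growth figures are made up for illustration, not the Reinhart-Rogoff data: an average computed over an accidentally truncated range can change the answer substantially, and a cheap full-range recomputation catches it.

```python
# Toy illustration of a spreadsheet-style range error.
# The figures below are hypothetical; they are NOT the Reinhart-Rogoff data.

growth_by_country = {
    "Country A": -1.2,
    "Country B": 0.8,
    "Country C": 2.1,
    "Country D": 1.5,
    "Country E": -0.3,
    "Country F": 2.4,
    "Country G": 1.9,
}

values = list(growth_by_country.values())

# The intended calculation: average over ALL countries.
full_mean = sum(values) / len(values)

# The slip: the averaging range stops two rows short,
# silently dropping the last two countries.
truncated = values[:-2]
truncated_mean = sum(truncated) / len(truncated)

print(f"Average over all {len(values)} countries: {full_mean:.2f}")   # 1.03
print(f"Average over the truncated range:       {truncated_mean:.2f}")  # 0.58

# A cheap checking mechanism: recompute from the raw data and compare
# against the figure you are about to publish.
if abs(full_mean - truncated_mean) > 1e-9:
    print("Mismatch: the published figure omits part of the data.")
```

The point is not the particular numbers but the discipline: any figure headed for a policy debate deserves at least one independent recomputation from the raw data.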
Moreover, as we move to dynamic triage and other more complex systems, the risks become greater and harder to monitor. We need to build in a wide variety of checking mechanisms. At the same time, we must remember that current systems are making errors all the time, and not become paralyzed.
This reminded me of Reaganomics and the period of 1980-1982, when Volcker at the Fed and Stockman at OMB made assumptions about inflation and set monetary and fiscal policy that sent the country into a severe recession that hurt many low-income communities.
On the triage point: the best way to ensure the data collected and shared are relevant is to allow for peer review, and also to let all of the parties involved in the project share information about "pre-triage" and "post-triage" data, and to make sure the measuring tools are fine-tuned on a quarterly basis over a period of 2 to 3 years, to learn what the new "average" and new normal might be after a triage system rolls out. It is not something that can be done in 6 months. It will take time for all involved to figure out whether they are measuring the right data points, and using them the right way, to fine-tune the operations of the new system. Projects should therefore be requesting 2 to 2 1/2 years to fine-tune the system and its tools, so that the empirical data can be factored in once the project rolls out and its effects can be verified and the system modified as needed.
We have a responsibility for our data. We have to be responsible for our own use of them, but can we be responsible for how others use them? I think not.
Changes in the past ten years have made scientists responsible for making their data public.
That means anyone can use the data. It does not mean that they must use them responsibly; probably more important, it does not mean that they must use them intelligently.
I remember, many years ago, when someone using our public-use data from the NHANES testified before Congress about all the people in the United States going to bed hungry. We checked; the thousands he cited were based on one sample person.
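This kind of blow-up is easy to produce with any weighted survey. A minimal sketch follows; the respondent records and weights are hypothetical, not actual NHANES values. Each respondent carries a sampling weight representing thousands of people, so a national estimate resting on a single weighted record can look impressively large while its effective sample size is one.

```python
# Hypothetical weighted-survey records; the weights and flags are made
# up for illustration, not actual NHANES values. Each weight is the
# number of people in the population this one respondent represents.

respondents = [
    {"id": 1, "weight": 48_000, "goes_to_bed_hungry": False},
    {"id": 2, "weight": 52_000, "goes_to_bed_hungry": False},
    {"id": 3, "weight": 45_000, "goes_to_bed_hungry": True},   # one person
    {"id": 4, "weight": 50_000, "goes_to_bed_hungry": False},
]

# Weighted national estimate: sum the weights of respondents
# reporting the condition.
estimate = sum(r["weight"] for r in respondents if r["goes_to_bed_hungry"])

# Unweighted count: how many actual sample persons back that estimate.
n_cases = sum(1 for r in respondents if r["goes_to_bed_hungry"])

print(f"Estimated people affected nationally: {estimate:,}")  # 45,000
print(f"Sample persons behind that estimate:  {n_cases}")     # 1
```

The lesson is to report, or at least check, the unweighted cell count before quoting a weighted total; a headline number built on one record is noise, not evidence.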