A book excerpt


We are pleased to present an excerpt from Distrust: Big Data, Data-Torturing, and the Assault on Science, a new book by Pomona College economics professor Gary Smith. The Washington Post said the book’s lessons “are very much needed.”

The fact that changes in bitcoin prices are driven by fear, greed, and manipulation has not stopped people from trying to crack their secret. Empirical models of bitcoin prices are a wonderful example of data torturing because bitcoins have no intrinsic value and so cannot be explained credibly by economic data.

Undaunted by this reality, a National Bureau of Economic Research (NBER) paper reported the mind-boggling efforts made by Yale University economics professor Aleh Tsyvinski and a graduate student, Yukun Liu, to find empirical patterns in bitcoin prices. 

Tsyvinski currently holds an endowed chair named after Arthur M. Okun, who was a professor at Yale from 1961 to 1969, though he spent six of those eight years on leave working in Washington at the Council of Economic Advisers as a staff economist, council member, and then chair, advising Presidents John F. Kennedy and Lyndon Johnson on their economic policies. He is best known for Okun’s law, which states that a 1 percentage-point reduction in unemployment will increase U.S. output by roughly 2 percent, an argument that helped persuade President Kennedy that using tax cuts to reduce unemployment from 7 to 4 percent (by that rule, a boost of roughly 6 percent in output) would have an enormous economic payoff. 

After Okun’s death, an anonymous donor endowed a lecture series at Yale named after Okun, explaining that 

Arthur Okun combined his special gifts as an analytical and theoretical economist with his great concern for the well-being of his fellow citizens into a thoughtful, pragmatic, and sustaining contribution to his nation’s public policy. 

The contrast between Okun’s focus on meaningful economic policies and Tsyvinski’s far-fetched bitcoin calculations is striking. 

Liu and Tsyvinski report correlations between the number of weekly Google searches for the word bitcoin (compared to the average over the past four weeks) and the percentage changes in bitcoin prices one to seven weeks later. They also looked at the correlation between the weekly ratio of bitcoin hack searches to bitcoin searches and the percentage changes in bitcoin prices one to seven weeks later. The fact that they reported bitcoin search results looking back four weeks and forward seven weeks should alert us to the possibility that they tried other backward-and-forward combinations that did not work as well. Ditto with the fact that they did not look back four weeks with bitcoin hack searches. They evidently tortured the data in their quest for correlations. 
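To make the mechanics concrete, here is a minimal sketch of that kind of lagged-correlation search, written against simulated weekly data with hypothetical column names; it is not the authors’ code, and the four-week lookback and seven-week lead are simply the choices described above.

```python
# Sketch of a lagged-correlation search (illustrative only, not the authors' code).
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
weeks = 200
df = pd.DataFrame({
    "search_interest": rng.normal(size=weeks),  # stand-in for weekly Google searches for "bitcoin"
    "btc_return": rng.normal(size=weeks),       # stand-in for weekly percentage changes in bitcoin prices
})

# Search interest relative to its average over the previous four weeks.
df["search_dev"] = df["search_interest"] - df["search_interest"].rolling(4).mean().shift(1)

# Correlate this week's search deviation with returns one to seven weeks later.
for lead in range(1, 8):
    corr = df["search_dev"].corr(df["btc_return"].shift(-lead))
    print(f"lead {lead} week(s): correlation = {corr:+.3f}")
```

Every extra lookback window or lead horizon that is tried but not reported multiplies the chances that one of the surviving correlations is a fluke.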

Even so, only seven of their fourteen correlations seemed promising for predicting bitcoin prices. Owen Rosebeck and I looked at the predictions made by these correlations during the year following their study and found that they were useless. They might as well have flipped coins to predict bitcoin prices. 

Liu and Tsyvinski also calculated the correlations between the number of weekly Twitter bitcoin posts and bitcoin returns one to seven weeks later. Unlike with the Google Trends data, they did not report results for bitcoin hack posts. Three of the seven correlations seemed useful, though two were positive and one was negative. With fresh data, none were useful. 

The only thing that their data abuse yielded was coincidental statistical correlations. Even though the research was done by an eminent Yale professor and published by the prestigious NBER, the idea that bitcoin prices can be predicted reliably from Google searches and Twitter posts was a fantasy fueled by data torturing. 

The irony here is that scientists created statistical tools that were intended to ensure the credibility of scientific research but have had the perverse effect of encouraging researchers to torture data—which makes their research untrustworthy and undermines the credibility of all scientific research. 


Traditionally, empirical research begins by specifying a theory and then collecting appropriate data for testing the theory. Many now take the shortcut of looking for patterns in data unencumbered by theory. This is called data mining because researchers rummage through data, not knowing what they will find. 

Way back in 2009, Marc Prensky, a writer and speaker with degrees from Yale and Harvard Business School, claimed that 

In many cases, scientists no longer have to make educated guesses, construct hypotheses and models, and test them with data-based experiments and examples. Instead, they can mine the complete set of data for patterns that reveal effects, producing scientific conclusions without further experimentation. 

We are hard-wired to seek patterns, but the data deluge means that the vast majority of patterns waiting to be discovered are illusory and useless. Bitcoin is again a good example. Since there is no logical theory (other than greed and market manipulation) that explains fluctuations in bitcoin prices, it is tempting to look for correlations between bitcoin prices and other variables without thinking too hard about whether the correlations make sense. In addition to torturing data, Liu and Tsyvinski mined their data. 

They calculated correlations between bitcoin prices and 810 other variables, including such whimsical items as the Canadian dollar–U.S. dollar exchange rate, the price of crude oil, and stock returns in the automobile, book, and beer industries. You might think I am making this up. Sadly, I am not. 
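A back-of-the-envelope calculation shows why screening that many variables is almost guaranteed to turn up “significant” correlations by luck alone; the 5 percent threshold below is an illustrative assumption, not a figure taken from their paper.

```python
# Expected number of nominally "significant" correlations if all 810 candidate
# variables were pure noise, tested independently at a conventional 5% level.
# (The threshold is an illustrative assumption, not taken from the paper.)
n_variables = 810
alpha = 0.05
print(n_variables * alpha)  # roughly 40 spurious "discoveries" expected by chance
```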

They reported finding that bitcoin returns were positively correlated with stock returns in the consumer goods and health care industries and negatively correlated with stock returns in the fabricated products and metal mining industries. These correlations don’t make any sense, and Liu and Tsyvinski admitted that they had no idea why these data were correlated: “We don’t give explanations . . . . We just document this behavior.” A skeptic might ask: What is the point of documenting coincidental correlations? 

And that is all they found. The Achilles heel of data mining is that large data sets inevitably contain an enormous number of coincidental correlations that are just fool’s gold in that they are no more useful than correlations among random numbers. Most fortuitous correlations do not hold up with fresh data, though some, coincidentally, will for a while. One statistical relationship that continued to hold during the period they studied and the year afterward was a negative correlation between bitcoin returns and stock returns in the paperboard-containers-and-boxes industry. This is surely serendipitous—and pointless. 
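A small simulation makes the point, assuming nothing but random numbers: generate one random “target” series, screen hundreds of equally random candidate series for the best in-sample correlation, and then check that winner on fresh data. The sample sizes below are arbitrary.

```python
# Fool's gold demo: the best of many correlations among pure noise looks
# impressive in-sample and evaporates on fresh data. All numbers are arbitrary.
import numpy as np

rng = np.random.default_rng(1)
n_weeks, n_candidates = 150, 810

target_in, target_out = rng.normal(size=n_weeks), rng.normal(size=n_weeks)

best_corr, best_cand_out = 0.0, None
for _ in range(n_candidates):
    cand_in, cand_out = rng.normal(size=n_weeks), rng.normal(size=n_weeks)
    corr = np.corrcoef(target_in, cand_in)[0, 1]
    if abs(corr) > abs(best_corr):
        best_corr, best_cand_out = corr, cand_out

fresh_corr = np.corrcoef(target_out, best_cand_out)[0, 1]
print(f"best in-sample correlation: {best_corr:+.3f}")  # typically around 0.3 in magnitude
print(f"same pair on fresh data:    {fresh_corr:+.3f}")  # typically near zero
```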

Scientists have assembled enormous databases and created powerful computers and algorithms for analyzing data. The irony is that these resources make it very easy to use data mining to discover chance patterns that are fleeting. Results are reported and then discredited, and we become increasingly skeptical of scientists.





