When the numbers game is played for fun the results can be misleading
More words of wisdom from the Soccer Boffin
In a 1910 edition of The American Magazine a sports reporter called Hugh Fullerton wrote an article on baseball statistics. It gave lots of colourful facts and figures, accompanied by a warning.
“Given the speed and direction of the ball and the speed of the player it is possible to figure out where his hands will meet the ball,” Fullerton said. “But just as you start to write Q.E.D. the ball will take a bad bound.”
Fullerton was one of the first sportswriters to mix words with numbers. He acknowledged what many who came after him have not: statistics often entertain us more than they enlighten us.
Here are some of the reasons why.
Three out of three means three out of four
Tobias Moskowitz and Jon Wertheim put this well in a 2011 book called Scorecasting. They used other examples from baseball.
“When we’re told that a player has reached base in ‘four of his last five at-bats’, we should assume right away that it’s four of his last six. Otherwise, rest assured, we’d have been told that the streak was five out of six. Clearly, a team that has ‘lost three in a row’ has dropped only three of its last four – and possibly three of five or three of six or… otherwise it would have been reported as a four-game losing streak.”
“Those of us in the sports media have an interest in selling the most extreme scenario. Collectively, we pick and choose data accordingly.”
Many good schools are small. So are many bad schools
Bill Gates, founder of Microsoft, is clever. Even he can be fooled by numbers, however. One of his mistakes cost nearly $2 billion.
Daniel Kahneman told the story in a book called Thinking, Fast and Slow.
“In a survey of 1,662 schools in Pennsylvania, six of the top 50 were small, which is an overrepresentation by a factor of four. These data encouraged the Gates Foundation to make a substantial investment in the creation of small schools, sometimes by splitting large schools into smaller units.
“If the statisticians who reported to the Gates Foundation had asked about the characteristics of the worst schools, they would have found that bad schools also tend to be smaller than average.
“The truth is that small schools are simply more variable.”
Small schools constitute small samples. Small samples are more likely than large samples to give extreme readings – in any direction.
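A quick simulation makes the point. The sketch below is an illustration only and does not use the Pennsylvania data. It assumes 500 small schools of 20 pupils and 500 large schools of 500 pupils, with every pupil's score drawn from the same distribution, so no school is truly better than any other. The school sizes, the score distribution and the function names are all assumptions made for the example.

```python
import random

def school_means(n_schools, pupils_per_school, rng):
    """Average score per school. Every pupil's score comes from the same
    assumed distribution (mean 50, standard deviation 10)."""
    means = []
    for _ in range(n_schools):
        scores = [rng.gauss(50, 10) for _ in range(pupils_per_school)]
        means.append(sum(scores) / len(scores))
    return means

def count_small(schools):
    """Number of schools in the list that are labelled 'small'."""
    return sum(1 for size, _ in schools if size == "small")

rng = random.Random(1)  # fixed seed so the run is repeatable

# Assumed sizes: 20 pupils in a small school, 500 in a large one.
schools = [("small", m) for m in school_means(500, 20, rng)]
schools += [("large", m) for m in school_means(500, 500, rng)]

# Rank all 1,000 schools from best average score to worst.
ranked = sorted(schools, key=lambda school: school[1], reverse=True)

small_in_top_50 = count_small(ranked[:50])
small_in_bottom_50 = count_small(ranked[-50:])

print("Small schools in the top 50:", small_in_top_50)
print("Small schools in the bottom 50:", small_in_bottom_50)
```

Under these assumptions small schools should crowd out large ones at both ends of the table, even though only half of all the schools are small and none is genuinely better or worse than the rest.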
It is not unusual to be unusual
As Tom Jones did not quite sing.
Small samples can deceive you but so can large samples.
You might be surprised that the person you met at the bus stop was in Bali when you were. You should not be surprised that you had something in common. There are so many potential connections between you.
The more data there is, the greater the chance of finding links in it that sound astonishing but should not.
Nassim Nicholas Taleb, whom I often quote, wrote: “More data means more information, perhaps, but it also means more false information.
“The fooled-by-data effect is accelerating. There is a nasty phenomenon called ‘Big Data’. Modernity provides too many variables and the spurious relationships grow much, much faster than real information.”
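One way to see how quickly spurious relationships multiply is to generate variables that have nothing to do with each other and count how many pairs look related anyway. The sketch below is an illustration under assumptions of my own choosing: 20 observations per variable, and a correlation stronger than 0.5 in either direction counted as "looking related". Neither figure comes from Taleb.

```python
import itertools
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def count_spurious(n_variables, n_observations, threshold, rng):
    """Generate independent random variables and return
    (number of pairs, number of pairs with |correlation| above threshold)."""
    data = [[rng.gauss(0, 1) for _ in range(n_observations)]
            for _ in range(n_variables)]
    pairs = list(itertools.combinations(data, 2))
    strong = sum(1 for xs, ys in pairs if abs(pearson(xs, ys)) > threshold)
    return len(pairs), strong

rng = random.Random(7)  # fixed seed so the run is repeatable

# Assumed settings: 20 observations each, |r| > 0.5 counts as "related".
pairs_5, strong_5 = count_spurious(5, 20, 0.5, rng)
pairs_50, strong_50 = count_spurious(50, 20, 0.5, rng)

print("5 variables: ", pairs_5, "pairs,", strong_5, "look related")
print("50 variables:", pairs_50, "pairs,", strong_50, "look related")
```

Ten times as many variables gives more than a hundred times as many pairs to compare (1,225 instead of 10), and every one of those pairs is another chance for noise to look like a pattern.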
Records are there to be broken
Something that never happened in the past might happen in the future.
Pierre-Simon Laplace was a scientist in the late 18th and early 19th centuries. He was aware that on every morning of his life, to use a common expression, the sun had risen. He was prepared to accept that it had risen every morning before he was born, since the formation of the earth.
“What is the chance it will rise again tomorrow?” he wondered.
Laplace came up with a formula for estimating the chance of something happening now if you knew only how often it had happened before. The formula was:
(S + 1) ÷ (T + 2)
S was how many times the thing had happened before and T was how many times the thing could have happened before. In scientific language T was the number of trials and S was the number of successes. For sunrises T and S are the same. We do not know on any given day what their number is but we do know it is big.
Laplace calculated the chance of the sun rising tomorrow as greater than 99.99 per cent. The point is, it was not 100 per cent.
Only in narrow circumstances does Laplace’s formula give the best answer. Never, however, does it give an answer that must be completely wrong. It never says the chance of an event occurring is 100 per cent or 0 per cent. There can be a first time for anything.
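Laplace's formula, usually called the rule of succession, is (S + 1) divided by (T + 2). A minimal sketch of it follows. The function name is my own, and the count of one million sunrises is an arbitrary stand-in for "big", not a figure Laplace used.

```python
from fractions import Fraction

def rule_of_succession(successes, trials):
    """Laplace's estimate of the chance of success on the next trial:
    (S + 1) / (T + 2)."""
    return Fraction(successes + 1, trials + 2)

# Sunrises: S and T are the same. One million is an assumed stand-in.
sunrise = rule_of_succession(1_000_000, 1_000_000)

# Something that has failed to happen in all ten of ten chances.
never_yet = rule_of_succession(0, 10)

# No evidence at all: the estimate is an even chance.
no_data = rule_of_succession(0, 0)

print(float(sunrise))  # very close to 1, but not 1
print(never_yet)       # 1/12, small but not zero
print(no_data)         # 1/2
```

Adding one to the top and two to the bottom is what keeps the answer away from 0 and 1, however lopsided the record so far.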
Does that mean they are due a win or a loss?
This is the sort of question that should always be asked but hardly ever is. What does a statistic tell us about what might happen next?
In a 2012 book called The Success Equation, Michael Mauboussin wrote: “Statistics are widely used in a range of fields. But rarely do the people who use statistics stop and ask how useful they really are.”
To get an idea of the chance of different outcomes in a football team’s next game I look at what happened to other teams who went into a game with the same stats. In such circumstances how often has this happened, how often has that happened?
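The idea can be sketched in a few lines. Everything in the example below is invented for illustration: the records, the choice of "wins in the previous six games" as the stat, and the function name. It is not my actual data or model, only the shape of the method, which is to find past teams with the same stat and count what happened to them next.

```python
from collections import Counter

# Invented records: (wins in previous six games, result of next game).
history = [
    (4, "win"), (4, "draw"), (4, "win"), (4, "loss"), (4, "win"),
    (2, "loss"), (2, "draw"), (2, "win"), (2, "loss"), (2, "loss"),
]

def outcome_frequencies(records, profile):
    """Share of each next-game result among past teams whose stat
    matched the given profile. Returns an empty dict if none matched."""
    outcomes = Counter(result for stat, result in records if stat == profile)
    total = sum(outcomes.values())
    if total == 0:
        return {}
    return {result: count / total for result, count in outcomes.items()}

four_wins = outcome_frequencies(history, 4)
two_wins = outcome_frequencies(history, 2)

print("After four wins in six:", four_wins)
print("After two wins in six: ", two_wins)
```

With only five matching teams in each group this toy example is itself a small sample, so in practice the history would need to be far larger before the frequencies meant much.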
Nate Silver used a similar method to predict possible development paths for baseball players. He is famous now as a predictor of US elections. He started out as a baseball analyst then became a poker player before getting mixed up in politics.
We want to be enlightened by statistics – not just entertained, and never bewildered. Reginald Perrin was a character in a 1970s sitcom. He gave a computer data on the tastes of Sunshine Desserts customers. He wanted to know which three flavours to choose for a new range of ice creams. The computer said “bookends, pumice stone and West Germany”.
More by Kevin Pullein