Sunday, 20 May 2012

Statistics

I'm planning another post on a recent research paper, but in preparation I want to talk about statistics.  Specifically, I want to talk about Frequentist versus Bayesian perspectives on the interpretation of experiments.  I won't get very technical, or go into the details of how the calculations are done in practice, but will rather talk about the philosophy.

My casual impression, based on reading papers to stay abreast of the field, is that most experiments use Frequentist methods in analysing their data.  In this approach, discovery of new phenomena is based on disproving the null hypothesis, the assumption that there is nothing to discover.  In this sense, Frequentist methods are very Popperian.  Frequentists will argue that this ensures their methods are objective, which is more or less true.[1]

The problem with Frequentism is that it tends to be misinterpreted.  For example, let's say that in a particular experiment we can exclude the null hypothesis at the 95% confidence level.  What does that mean?  It is tempting to interpret it as saying that there is a 95% probability that the null hypothesis is false.  However, this is wrong.  The strictly correct statement is: if the null hypothesis is true, the probability of getting a result at least as extreme as the one observed is 5% or less.
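To make the correct statement concrete, here is a minimal sketch of how such a number might be computed for a simple counting experiment.  Everything specific in it is my own invention for illustration: I assume the null hypothesis predicts an average of 10 background events, that 17 events were observed, and that the counts follow a Poisson distribution.  The function names are hypothetical, not taken from any library.

```python
import math

def poisson_pmf(k, mean):
    """Probability of seeing exactly k events when `mean` are expected."""
    return math.exp(-mean) * mean**k / math.factorial(k)

def p_value(n_obs, background):
    """P(N >= n_obs | null hypothesis), with the null being Poisson(background).

    This is the probability of the data (or something more extreme) given
    the model, NOT the probability of the model given the data.
    """
    return 1.0 - sum(poisson_pmf(k, background) for k in range(n_obs))

# Invented numbers: 10 events expected from background, 17 observed.
p = p_value(17, 10.0)
print(f"p-value = {p:.3f}")
```

With these made-up numbers the p-value falls below 0.05, so the null hypothesis would be excluded at the 95% confidence level.  Note that nothing in the calculation refers to how plausible the null hypothesis was to begin with.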

To understand the difference, there is a common analogy that I will steal.  Consider the following two different probabilities:
1. The probability that it is raining, given that it is cloudy outside.
2. The probability that it is cloudy outside, given that it is raining.
The latter probability is close to one, while the former is much lower.  This is the problem with Frequentist results: the two are easily confused.  The question most people ask is
• What is the probability that a particular model is true, given the data?
Whereas the question a Frequentist measurement actually answers is
• What is the probability of the data, given that a particular model is true?
Hopefully we see the problem here.
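The rain and cloud analogy can be checked with a few lines of arithmetic.  The day counts below are invented purely for illustration (they are not real weather data), and the variable names are my own.

```python
# Invented counts of days in one year, split by weather.
cloudy_and_rain = 60
cloudy_no_rain = 140
clear_and_rain = 2
clear_no_rain = 163

cloudy_days = cloudy_and_rain + cloudy_no_rain
rainy_days = cloudy_and_rain + clear_and_rain

# 1. Probability that it is raining, given that it is cloudy.
p_rain_given_cloudy = cloudy_and_rain / cloudy_days

# 2. Probability that it is cloudy, given that it is raining.
p_cloudy_given_rain = cloudy_and_rain / rainy_days

print(f"P(rain | cloudy) = {p_rain_given_cloudy:.2f}")
print(f"P(cloudy | rain) = {p_cloudy_given_rain:.2f}")
```

Both probabilities share the same numerator, the number of days that are both cloudy and rainy.  They differ only in what we divide by, and that is enough to make one close to one and the other much smaller.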

This is where Bayesian analyses come in.  Bayes' Theorem gives us the means to answer the question we actually want to ask.  The result of applying this rule can always be interpreted as a probability about what is true.  The ultimate reason is philosophical: to a Bayesian, probabilities are statements of belief about reality, whereas to a Frequentist, probabilities are the long-run frequencies of outcomes in repeated experiments.  In particular, the question "which theory is more probable" is meaningless from a strict Frequentist perspective.
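For reference, Bayes' Theorem can be written as follows, where the symbols M (for the model) and D (for the data) are my own notation:

```latex
P(M \mid D) = \frac{P(D \mid M)\, P(M)}{P(D)}
```

Here P(D | M) is the probability of the data given the model, P(M) is the prior belief in the model before seeing the data, and P(D) is a normalisation that makes the probabilities over all models sum to one.  The left-hand side, P(M | D), is the probability of the model given the data.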

As can probably be guessed, I prefer the Bayesian approach.  As such, it would be remiss of me not to mention its flaws.  The one which is most discussed is the explicit subjectivity that Bayes' Theorem introduces, in the form of the Prior statement of belief.  Two different experimenters can come to different conclusions from the same experimental data, if they have different prior beliefs.  I contend that this is not as severe a problem as some critics make out.  For one thing, all experiments are driven by prior beliefs, in their design for example.  Making this explicit cannot be a bad thing.  Additionally, by varying our choice of prior we can see how it affects our results.  If our conclusions depend heavily on our priors, it simply tells us that we do not have enough data to draw solid conclusions, which is again useful knowledge.  Finally, Harold Jeffreys identified what are now called Jeffreys priors, which are intended to be as uninformative as possible, so that the data rather than the prior dominate our conclusions.
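As a rough sketch of what varying the prior looks like, consider estimating a success probability from a number of successes in a number of trials.  I am assuming a Beta prior here because it makes the posterior mean a one-line formula.  The data and the "opinionated" prior are invented, and the function names are hypothetical.  The Beta(0.5, 0.5) entry is the Jeffreys prior for this particular problem.

```python
def posterior_mean(successes, trials, a, b):
    """Posterior mean of a binomial success probability under a Beta(a, b) prior."""
    return (a + successes) / (a + b + trials)

# Three different prior beliefs about the success probability.
priors = {
    "flat Beta(1, 1)": (1.0, 1.0),
    "Jeffreys Beta(0.5, 0.5)": (0.5, 0.5),
    "opinionated Beta(20, 20)": (20.0, 20.0),  # strongly expects about 0.5
}

def spread(successes, trials):
    """Largest disagreement between the priors about the posterior mean."""
    means = [posterior_mean(successes, trials, a, b) for a, b in priors.values()]
    return max(means) - min(means)

# Invented data: the same observed success rate (75%), with little and with lots of data.
small = spread(3, 4)
large = spread(300, 400)
print(f"disagreement with 4 trials:   {small:.3f}")
print(f"disagreement with 400 trials: {large:.3f}")
```

With only four trials the three experimenters disagree noticeably, which is the signal that there is not enough data.  With four hundred trials the disagreement largely disappears.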

A more serious criticism of Bayesian approaches is that they are computationally intensive.  The reason is the need to integrate over probability distributions, specifically the probability of getting the observed data as a function of the model parameters.  This is in contrast to Frequentist methods, which typically only care about the largest value of that same function.  It is for this reason that, in my opinion, Frequentist methods will always remain more popular, especially in the early analysis of data.  And despite my biases, there's nothing wrong with that provided we interpret things correctly!
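The difference between integrating and maximising can be sketched on a toy problem with a single parameter.  I am assuming invented data (7 successes in 10 trials), a flat prior, and a brute-force grid.  Real analyses would use more sophisticated methods, and the names below are my own.

```python
import math

def likelihood(p, k, n):
    """Probability of k successes in n trials, as a function of the parameter p."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)

k, n = 7, 10          # invented data
steps = 10_000        # number of grid cells across 0 <= p <= 1

# Evaluate the likelihood at the midpoint of each grid cell.
grid = [(i + 0.5) / steps for i in range(steps)]
values = [likelihood(p, k, n) for p in grid]

# Frequentist-style: only the location of the largest value matters.
p_hat = grid[values.index(max(values))]

# Bayesian-style: integrate over the whole parameter range (midpoint rule,
# flat prior), which needs the likelihood everywhere, not just at its peak.
evidence = sum(values) / steps

print(f"maximum-likelihood estimate: {p_hat:.3f}")
print(f"integrated likelihood:       {evidence:.4f}")
```

In one dimension both are cheap, and here the maximum could even be found by hand as k/n.  The trouble is that a grid like this grows exponentially with the number of parameters, so the integral becomes expensive much faster than the search for a maximum.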

1. There is arguably some subjectivity in the choice of test statistic, but this is pretty minor.