## Tuesday, May 3, 2016

### Numerical pitfalls in computing variance

One of the most common tasks in statistical computing is computation of sample variance. This would seem to be straightforward; there are a number of algebraically equivalent ways of representing the sum of squares $S$, such as $S = \sum_{k=1}^n ( x_k - \bar{x})^2$ or $S = \sum_{k=1}^n x_k^2 + \frac{1}{n}\bar{x}^2$ and the sample variance is simply $S/(n-1)$.

What is straightforward algebraically, however, is sometimes not so straightforward in the floating-point arithmetic used by computers. Computers cannot represent numbers to infinite precision, and arithmetic operations can affect the precision of floating-point numbers in unexpected ways.