Errors and residuals
The expected value, being the mean of the entire population, is typically unobservable, and hence the statistical error cannot be observed either. A residual (or fitting deviation), on the other hand, is an observable estimate of the unobservable statistical error.

If we assume a normally distributed population with mean μ and standard deviation σ, and choose individuals independently, then we have

    X_1, \dots, X_n \sim N(\mu, \sigma^2)

and the sample mean

    \overline{X} = \frac{X_1 + \cdots + X_n}{n}

is a random variable distributed such that:

    \overline{X} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right).

The statistical errors are then

    e_i = X_i - \mu,

with expected values of zero,[4] whereas the residuals are

    r_i = X_i - \overline{X}.

The sum of squares of the statistical errors, divided by σ², has a chi-squared distribution with n degrees of freedom:

    \frac{1}{\sigma^2} \sum_{i=1}^{n} e_i^2 \sim \chi^2_n.

However, this quantity is not observable, as the population mean is unknown. The sum of squares of the residuals, on the other hand, is observable; divided by σ², it has a chi-squared distribution with only n − 1 degrees of freedom:

    \frac{1}{\sigma^2} \sum_{i=1}^{n} r_i^2 \sim \chi^2_{n-1}.

It is remarkable that the sum of squares of the residuals and the sample mean can be shown to be independent of each other, using, e.g., Basu's theorem. That fact, and the normal and chi-squared distributions given above, form the basis of calculations involving the t-statistic:

    T = \frac{\overline{X}_n - \mu_0}{S_n / \sqrt{n}},

where \overline{X}_n − μ_0 represents the errors and S_n is the sample standard deviation, computed from the residuals. That is fortunate because it means that even though we do not know σ, we know the probability distribution of this quotient: it has a Student's t-distribution with n − 1 degrees of freedom.

If one runs a regression on some data, then the deviations of the dependent variable observations from the fitted function are the residuals.[5] If the residuals exhibit a trend, the regression model is likely incorrect; for example, the true function may be a quadratic or higher-order polynomial.

Dividing the sum of squares of the residuals by n, the number of observations, gives the mean of the squared residuals. Since this is a biased estimate of the variance of the unobserved errors, the bias is removed by dividing the sum of the squared residuals by df = n − p − 1 instead of n, where df is the number of degrees of freedom: n minus the number of parameters p being estimated (excluding the intercept), minus 1.[7] An equivalent way to calculate the mean square of error arises when analyzing the variance of a linear regression with a technique like that used in ANOVA (they are the same because ANOVA is a type of regression): the sum of squares of the residuals (SSR, also known as the sum of squares of the error) is divided by the degrees of freedom n − p − 1, where p is the number of parameters estimated in the model (one for each variable in the regression equation, not including the intercept).

Because of the way the regression is fitted, the distribution of the residuals can differ across data points even when the errors themselves are identically distributed. Concretely, in a linear regression where the errors are identically distributed, the variability of residuals of inputs in the middle of the domain will be higher than the variability of residuals at the ends of the domain:[9] linear regressions fit endpoints better than the middle. The SSR is also the basis for the least squares estimate: the regression coefficients are chosen so that the SSR is minimal (i.e., its derivative with respect to the coefficients is zero).
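To make the distinction concrete, the following sketch (a minimal, illustrative example using NumPy and SciPy; the variable names are invented for the demonstration) simulates a sample from N(μ, σ²), contrasts the unobservable statistical errors X_i − μ with the observable residuals X_i − X̄, and computes the t-statistic by hand, cross-checking it against SciPy's one-sample t-test.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

mu, sigma, n = 10.0, 2.0, 50           # population parameters (known only because we simulate)
x = rng.normal(mu, sigma, size=n)      # independent draws from N(mu, sigma^2)

x_bar = x.mean()                       # sample mean, the observable estimate of mu

errors = x - mu                        # statistical errors: unobservable in practice (mu unknown)
residuals = x - x_bar                  # residuals: observable estimates of the errors

# The residuals sum to zero by construction (up to floating-point rounding);
# the errors have expected value zero, but their sample sum is generally nonzero.
print("sum of residuals:", residuals.sum())
print("sum of errors:   ", errors.sum())

# t-statistic: (x_bar - mu_0) / (s / sqrt(n)), Student's t with n - 1 degrees of freedom.
mu_0 = 10.0                            # hypothesized mean (here the true mean, for illustration)
s = x.std(ddof=1)                      # sample standard deviation, computed from the residuals
t_by_hand = (x_bar - mu_0) / (s / np.sqrt(n))

t_scipy = stats.ttest_1samp(x, popmean=mu_0).statistic
print("t by hand:", t_by_hand, " t from SciPy:", t_scipy)
```

Note that σ never appears in the t-statistic: the unknown population standard deviation is replaced by the sample standard deviation s, which is exactly why the quotient follows a t-distribution with n − 1 degrees of freedom rather than a normal distribution.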
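The degrees-of-freedom correction for regression residuals can be sketched the same way. The example below (again illustrative; the data and names are invented for the demonstration) fits a straight line by least squares, computes the residuals and their sum of squares (SSR), and divides the SSR by df = n − p − 1, with p = 1 non-intercept parameter, to obtain the bias-corrected estimate of the error variance.

```python
import numpy as np

rng = np.random.default_rng(1)

n = 100
x = np.linspace(0.0, 10.0, n)
true_intercept, true_slope, sigma = 1.0, 2.5, 1.5
y = true_intercept + true_slope * x + rng.normal(0.0, sigma, size=n)  # errors ~ N(0, sigma^2)

# Least squares: the coefficients are chosen so that the SSR is minimal.
slope, intercept = np.polyfit(x, y, deg=1)
fitted = intercept + slope * x

residuals = y - fitted                  # deviations of the observations from the fitted line
ssr = np.sum(residuals ** 2)            # sum of squares of the residuals (SSR)

p = 1                                   # parameters excluding the intercept (here: the slope)
df = n - p - 1                          # degrees of freedom

print("biased estimate of sigma^2 (SSR / n):   ", ssr / n)
print("unbiased estimate of sigma^2 (SSR / df):", ssr / df)
print("true sigma^2:                           ", sigma ** 2)
```

With only a slope and an intercept estimated, df = n − 2, the familiar denominator for simple linear regression; with more regressors, p grows and the correction becomes larger.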