Bias, Variance and Model Complexity
Test error (generalization error): the prediction error over an independent test sample, $\mathrm{Err}_{\mathcal{T}} = E[L(Y, \hat f(X)) \mid \mathcal{T}]$. Here the training set $\mathcal{T}$ is fixed, and test error refers to the error for this specific training set.
Expected test error: $\mathrm{Err} = E[L(Y, \hat f(X))] = E[\mathrm{Err}_{\mathcal{T}}]$
This expectation averages over everything that is random, including the randomness in the training set that produced $\hat f$.
Training error: the average loss over the training sample, $\overline{\mathrm{err}} = \frac{1}{N}\sum_{i=1}^{N} L(y_i, \hat f(x_i))$
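To make the distinction concrete, here is a minimal numpy sketch that computes the training error $\overline{\mathrm{err}}$ and estimates $\mathrm{Err}_{\mathcal{T}}$ on a large independent test sample; the sine data, noise level, and degree-5 polynomial fit are illustrative assumptions, not from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data-generating process: Y = f(X) + eps, with f(x) = sin(x)
def make_data(n):
    x = rng.uniform(0, 3, n)
    y = np.sin(x) + rng.normal(0, 0.3, n)
    return x, y

x_train, y_train = make_data(50)      # the fixed training set T
x_test, y_test = make_data(10_000)    # large independent test sample

# Fit a (deliberately flexible) degree-5 polynomial on the training set
coefs = np.polyfit(x_train, y_train, deg=5)

# Training error: average squared loss over the training sample
err_bar = np.mean((y_train - np.polyval(coefs, x_train)) ** 2)

# Test error Err_T for this specific training set, estimated on the
# independent test sample
err_T = np.mean((y_test - np.polyval(coefs, x_test)) ** 2)

print(f"training error = {err_bar:.4f}, test error = {err_T:.4f}")
```

Typically $\overline{\mathrm{err}}$ underestimates the test error, since the fit is evaluated on the very sample it was tuned to.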
Model selection: estimating the performance of different models in order to choose the best one.
Model assessment: having chosen a final model, estimating its prediction error (generalization error) on new data.
Randomly divide the dataset into three parts:
- a training set: fit the models
- a validation set: estimate prediction error for model selection
- a test set: assess the generalization error of the final chosen model
A typical split might be 50% for training and 25% each for validation and testing, as in the sketch below.
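A minimal sketch of this workflow, assuming numpy; the data-generating process and the candidate polynomial degrees are hypothetical choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 3, 200)
y = np.sin(x) + rng.normal(0, 0.3, 200)

# Shuffle, then split 50% / 25% / 25%
idx = rng.permutation(len(x))
tr, val, te = idx[:100], idx[100:150], idx[150:]

# Model selection: fit each candidate on the training set and
# keep the one with the lowest validation error
best_deg, best_val_err = None, np.inf
for deg in (1, 3, 5, 9):
    coefs = np.polyfit(x[tr], y[tr], deg)
    val_err = np.mean((y[val] - np.polyval(coefs, x[val])) ** 2)
    if val_err < best_val_err:
        best_deg, best_val_err = deg, val_err

# Model assessment: estimate the generalization error of the
# final chosen model on the untouched test set
coefs = np.polyfit(x[tr], y[tr], best_deg)
test_err = np.mean((y[te] - np.polyval(coefs, x[te])) ** 2)
print(f"chosen degree = {best_deg}, estimated test error = {test_err:.4f}")
```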
The Bias Variance Decomposition
General Model
If we assume that $Y = f(X) + \varepsilon$, where $E(\varepsilon) = 0$ and $\mathrm{Var}(\varepsilon) = \sigma^2_\varepsilon$, we can derive an expression for the expected prediction error of a regression fit $\hat f(X)$ at an input point $X = x_0$, using squared-error loss:
$$
\begin{aligned}
\mathrm{Err}(x_0) &= E[(Y - \hat f(x_0))^2 \mid X = x_0] \\
&= E[(f(x_0) + \varepsilon - \hat f(x_0))^2] \\
&= E[\varepsilon^2] + E[(f(x_0) - \hat f(x_0))^2] + 2E[\varepsilon(f(x_0) - \hat f(x_0))] \\
&= \sigma^2_\varepsilon + E[f(x_0)^2 + \hat f(x_0)^2 - 2 f(x_0)\hat f(x_0)] \\
&= \sigma^2_\varepsilon + E[\hat f(x_0)^2] + f(x_0)^2 - 2 f(x_0)\, E[\hat f(x_0)] \\
&= \sigma^2_\varepsilon + \bigl(E[\hat f(x_0)]\bigr)^2 + f(x_0)^2 - 2 f(x_0)\, E[\hat f(x_0)] + E[\hat f(x_0)^2] - \bigl(E[\hat f(x_0)]\bigr)^2 \\
&= \sigma^2_\varepsilon + \bigl(E[\hat f(x_0)] - f(x_0)\bigr)^2 + \mathrm{Var}(\hat f(x_0)) \\
&= \sigma^2_\varepsilon + \mathrm{Bias}^2(\hat f(x_0)) + \mathrm{Var}(\hat f(x_0)) \\
&= \text{Irreducible Error} + \text{Bias}^2 + \text{Variance}
\end{aligned}
$$
The cross term vanishes because the noise $\varepsilon$ at the test point has mean zero and is independent of $\hat f(x_0)$; the step from the fifth to the sixth line adds and subtracts $\bigl(E[\hat f(x_0)]\bigr)^2$ to complete the variance.
- The first term is the variance of the target around its true mean $f(x_0)$, and cannot be avoided no matter how well we estimate $f(x_0)$, unless $\sigma^2_\varepsilon = 0$.
- The second term is the squared bias: the amount by which the average of our estimate differs from the true mean.
- The last term is the variance: the expected squared deviation of $\hat f(x_0)$ around its mean.
Typically, the more complex we make the model $\hat f$, the lower the (squared) bias but the higher the variance.
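The decomposition can be verified by simulation: draw many training sets, refit the model on each, and compare a direct Monte Carlo estimate of $\mathrm{Err}(x_0)$ with $\sigma^2_\varepsilon + \mathrm{Bias}^2 + \mathrm{Var}$. A sketch, with $f(x) = \sin(x)$, the noise level, and the polynomial degree all chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
f = np.sin              # true mean function (illustrative)
sigma_eps = 0.3         # noise standard deviation
x0 = 1.5                # evaluation point
n, n_sims, deg = 30, 2000, 5

# Refit the model on many independent training sets and record
# its prediction at x0
preds = np.empty(n_sims)
for s in range(n_sims):
    x = rng.uniform(0, 3, n)
    y = f(x) + rng.normal(0, sigma_eps, n)
    preds[s] = np.polyval(np.polyfit(x, y, deg), x0)

bias2 = (preds.mean() - f(x0)) ** 2   # squared bias at x0
var = preds.var()                     # variance of f_hat(x0)

# Direct Monte Carlo estimate of Err(x0): a fresh Y at x0 for each fit
y0 = f(x0) + rng.normal(0, sigma_eps, n_sims)
err_x0 = np.mean((y0 - preds) ** 2)

print(f"Err(x0)                      ~ {err_x0:.4f}")
print(f"sigma^2 + Bias^2 + Variance  = {sigma_eps**2 + bias2 + var:.4f}")
```

Increasing `deg` in this sketch drives the squared bias down and the variance up, while the irreducible error stays fixed.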
KNN regression
For the k-nearest-neighbor regression fit, these expressions have the simple form
$$
\mathrm{Err}(x_0) = E[(Y - \hat f_k(x_0))^2 \mid X = x_0] = \sigma^2_\varepsilon + \Bigl[f(x_0) - \frac{1}{k}\sum_{\ell=1}^{k} f(x_{(\ell)})\Bigr]^2 + \frac{\sigma^2_\varepsilon}{k}
$$
Here we assume for simplicity that the training inputs $x_i$ are fixed and the randomness arises from the $y_i$; $x_{(\ell)}$ denotes the $\ell$-th nearest neighbor of $x_0$. The number of neighbors $k$ is inversely related to model complexity: for small $k$ the bias is low but the variance $\sigma^2_\varepsilon / k$ is large, and increasing $k$ shrinks the variance while the bias term typically grows.
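This is easy to reproduce empirically. A sketch using scikit-learn's KNeighborsRegressor, where the data-generating process is an illustrative assumption and the training inputs are held fixed across simulations so that the variance matches $\sigma^2_\varepsilon / k$:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(3)
f = np.sin
sigma_eps = 0.3
x0 = np.array([[1.5]])
n, n_sims = 50, 2000

# Fix the training inputs, as the formula assumes; only the
# responses y_i are redrawn on each simulation
x = rng.uniform(0, 3, (n, 1))

for k in (1, 5, 20):
    preds = np.empty(n_sims)
    for s in range(n_sims):
        y = f(x).ravel() + rng.normal(0, sigma_eps, n)
        preds[s] = KNeighborsRegressor(n_neighbors=k).fit(x, y).predict(x0)[0]
    bias2 = (preds.mean() - f(1.5)) ** 2
    print(f"k={k:2d}: Bias^2={bias2:.4f}, Var={preds.var():.4f}, "
          f"sigma^2/k={sigma_eps**2 / k:.4f}")
```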
Ref:
- James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. An Introduction to Statistical Learning. Vol. 112. New York: Springer, 2013.
- Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. New York: Springer, 2009.