Use a bagged tree classifier: in MATLAB, 'fitensemble' with method 'Bag' and option 'Type','classification'. First use cross-validation on the training data to select good values for the tree size and the number of trees. Cross-validation can be invoked with the 'KFold' option of 'fitensemble'; I suggest 5-fold cross-validation. Trees of different sizes can be built with the 'MaxNumSplits' option of the 'templateTree' function; use the values 1, 5, 10, and 20. For the number of trees, try values from 1 to 50. Running cross-validation separately for numTrees = 1, 2, ..., 50 would take too long. Instead, run 'fitensemble' once with the number of trees (the 'NLearn' argument) set to 50. You will get a cross-validated classifier 'ens'. Then use loss = kfoldLoss(ens,'Mode','Cumulative'). The 'Cumulative' mode reports the loss (loss is just another name for error) for numTrees = 1, 2, ..., 50, which saves a lot of time. On the same graph, plot the number of trees vs. loss for each tree size (i.e., for 'MaxNumSplits' = 1, 5, 10, 20) in a different color. Discuss the plot.
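A minimal sketch of the sweep described above, assuming the training predictors and labels are already in variables Xtrain and Ytrain (these names, and the color choices, are placeholders):

```matlab
% Sketch: 5-fold CV error vs. number of trees for several tree depths.
maxSplits = [1 5 10 20];
numTrees  = 50;
colors    = {'r','g','b','k'};
figure; hold on;
for i = 1:numel(maxSplits)
    t   = templateTree('MaxNumSplits', maxSplits(i));
    ens = fitensemble(Xtrain, Ytrain, 'Bag', numTrees, t, ...
                      'Type', 'classification', 'KFold', 5);
    cvLoss = kfoldLoss(ens, 'Mode', 'Cumulative');  % error for 1,2,...,50 trees
    plot(1:numTrees, cvLoss, colors{i});
end
xlabel('Number of trees'); ylabel('5-fold cross-validation error');
legend('MaxNumSplits = 1', 'MaxNumSplits = 5', ...
       'MaxNumSplits = 10', 'MaxNumSplits = 20');
hold off;
```

Note that each call to 'fitensemble' with 'KFold' trains one cross-validated ensemble, and the single 'Cumulative' call to kfoldLoss replaces 50 separate cross-validation runs.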
Now retrain the bagged classifier on all of the training data with the number of trees and the 'MaxNumSplits' value that give the smallest cross-validation error. Report and discuss the cross-validation, training, and test errors.
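A sketch of the retraining step, where bestSplits and bestTrees stand in for the values selected by cross-validation, and Xtest/Ytest for the test set:

```matlab
% Sketch: retrain on all training data with the best CV settings,
% then compute the training (resubstitution) and test errors.
t   = templateTree('MaxNumSplits', bestSplits);
ens = fitensemble(Xtrain, Ytrain, 'Bag', bestTrees, t, 'Type', 'classification');
trainErr = resubLoss(ens);            % error on the training data
testErr  = loss(ens, Xtest, Ytest);   % error on the held-out test data
fprintf('training error %.4f, test error %.4f\n', trainErr, testErr);
```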
Repeat Problem 1, now with AdaBoost: method 'AdaBoostM2' in 'fitensemble'. Report all the same errors and plots as for Problem 1. It is instructive to plot the cross-validation errors on the same plot as in Problem 1, to compare boosting and bagging. Discuss the difference in performance from Problem 1.
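Only the method string changes from Problem 1; a sketch for one tree size (repeat for 'MaxNumSplits' = 1, 5, 10, 20, with Xtrain and Ytrain as before):

```matlab
% Sketch: same CV sweep as Problem 1, but boosting instead of bagging.
t   = templateTree('MaxNumSplits', 5);
ens = fitensemble(Xtrain, Ytrain, 'AdaBoostM2', 50, t, ...
                  'Type', 'classification', 'KFold', 5);
boostLoss = kfoldLoss(ens, 'Mode', 'Cumulative');
plot(1:50, boostLoss, '--');  % dashed, to distinguish from the bagging curves
```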
This problem explores the relationship between cross-validation, training, and test errors. Choose 'MaxNumSplits' > 5 and a number of trees > 100, and find the cumulative cross-validation error (with more than 4 folds) for an ensemble classifier (either 'Bag' or AdaBoost). Then retrain the classifier on all of the training data and find the cumulative training and test errors. Plot the cross-validation, training, and test errors vs. the number of trees on the same plot in different colors. Discuss your plot.
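A sketch of the three curves, using example choices that satisfy the constraints (MaxNumSplits = 10, 150 trees, 5 folds; Xtrain, Ytrain, Xtest, Ytest as before):

```matlab
% Sketch: cumulative cross-validation, training, and test error curves.
t     = templateTree('MaxNumSplits', 10);
cvEns = fitensemble(Xtrain, Ytrain, 'Bag', 150, t, ...
                    'Type', 'classification', 'KFold', 5);
cvErr = kfoldLoss(cvEns, 'Mode', 'Cumulative');

ens      = fitensemble(Xtrain, Ytrain, 'Bag', 150, t, 'Type', 'classification');
trainErr = resubLoss(ens, 'Mode', 'Cumulative');
testErr  = loss(ens, Xtest, Ytest, 'Mode', 'Cumulative');

figure;
plot(1:150, cvErr, 'b', 1:150, trainErr, 'g', 1:150, testErr, 'r');
xlabel('Number of trees'); ylabel('Classification error');
legend('cross-validation', 'training', 'test');
```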
Try to develop a bagged or boosted classifier that performs better than those you developed in the previous problems. Report the cross-validation, training, and test errors, and explain what you did. Things you can try: adding more features, using larger trees, using more trees, etc.
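One possible starting point, not a prescribed solution: deeper trees and more boosting rounds than in the earlier problems, optionally with a smaller learning rate ('LearnRate' is a valid 'fitensemble' option for boosting); the specific numbers below are arbitrary and should be tuned by cross-validation as before:

```matlab
% Sketch: one improvement attempt -- deeper trees, more rounds, shrinkage.
t   = templateTree('MaxNumSplits', 50);
ens = fitensemble(Xtrain, Ytrain, 'AdaBoostM2', 300, t, ...
                  'Type', 'classification', 'LearnRate', 0.1);
```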