Assignment 3, due November 28, 5% of class score
Instructions
- Submit through OWL by midnight on the due date
- You may discuss the assignment with other students, but all code/report must be your own work
- The assignment is to be done in MATLAB
- Deliverables: MATLAB code that you write yourself, and the assignment write-up
- You are allowed to use any MATLAB functions and the VLFeat library. Make sure you run vl_setup before using VLFeat.
- Indoor scene classification data is provided for the assignment. There are ten different scene classes. Each scene class is in its own subdirectory, and each subdirectory contains 100 scenes (examples).
- Useful MATLAB commands: predict, resubLoss, templateTree, fitensemble, kfoldLoss
- Useful VLFeat commands: vl_ikmeans, vl_ikmeanspush, vl_hog, vl_covdet
- Problem 1 (50%): Use a bagged tree classifier, in MATLAB 'fitensemble' with options 'Bag', 'type','classification'. First use cross-validation on the training data to select good values for the tree size and the number of trees. Cross-validation can be invoked with the option 'kfold' in 'fitensemble'; I suggest using 5-fold cross-validation. Trees of different sizes can be built with the option 'MaxNumSplits' in the 'templateTree' function. Use values 1, 5, 10, and 20 for 'MaxNumSplits'. For the number of trees, use values from 1 to 50. You do not want to run cross-validation separately for numTrees = 1, 2, ..., 50; it would take too long. Instead, run 'fitensemble' once with the number of trees set to 50. You will get a cross-validated classifier 'ens'. Then use loss = kfoldLoss(ens,'mode','cumulative'). The 'cumulative' mode gives you the loss (loss is just another name for error) for numTrees = 1, 2, ..., 50, which saves a lot of time. On the same graph, plot the number of trees vs. loss for each tree size (i.e. for 'MaxNumSplits' = 1, 5, 10, 20) in a different color. Discuss the plot.
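The cross-validation loop described above might be sketched as follows. This is only an illustration: Xtrain (one row of features per example) and Ytrain (class labels) are placeholder names for the data you build yourself.

```matlab
% Sketch of the cross-validation procedure; Xtrain/Ytrain are assumed
% to already hold your feature matrix and labels.
maxNumTrees = 50;
splitVals   = [1 5 10 20];
colors      = {'r','g','b','k'};
figure; hold on;
for i = 1:numel(splitVals)
    t = templateTree('MaxNumSplits', splitVals(i));
    ens = fitensemble(Xtrain, Ytrain, 'Bag', maxNumTrees, t, ...
                      'type', 'classification', 'kfold', 5);
    loss = kfoldLoss(ens, 'mode', 'cumulative');  % error for numTrees = 1..50
    plot(1:maxNumTrees, loss, colors{i});
end
legend('MaxNumSplits = 1', 'MaxNumSplits = 5', ...
       'MaxNumSplits = 10', 'MaxNumSplits = 20');
xlabel('number of trees'); ylabel('cross-validation loss');
```

Each call to fitensemble with 'kfold' trains all 50 trees once per fold, and the single kfoldLoss call then reads off the error for every ensemble size, which is why this is much faster than re-running cross-validation per tree count.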
Now retrain the bagged classifier on all of the training data using the values of the number of trees and 'MaxNumSplits' that give the smallest cross-validation error. Report the training error.
Then test on the 20 test samples and report the test error.
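The retraining and evaluation step could look like this sketch, where bestSplits and bestNumTrees stand for the values you selected by cross-validation, and the labels are assumed to be a cell array of strings:

```matlab
% Retrain on all training data with the cross-validated best parameters
% (bestSplits and bestNumTrees are placeholder names).
t   = templateTree('MaxNumSplits', bestSplits);
ens = fitensemble(Xtrain, Ytrain, 'Bag', bestNumTrees, t, ...
                  'type', 'classification');
trainErr = resubLoss(ens);              % training (resubstitution) error
Ypred    = predict(ens, Xtest);         % predicted labels for the test samples
testErr  = mean(~strcmp(Ypred, Ytest)); % fraction misclassified
```

If your labels are numeric rather than strings, replace the strcmp comparison with mean(Ypred ~= Ytest).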
- Do the above using local color histogram features. You should try several different values for the number of bins and the region size.
- Do the above using HOG features. Use several values for the cell size and the number of orientations.
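One possible way to compute the two feature types is sketched below. The grid-based binning for the color histograms is just one choice of "region"; the filename, grid size, bin count, cell size, and orientation count are all placeholders you should vary in your experiments. vl_hog is from VLFeat, so run vl_setup first.

```matlab
% --- Local color histograms (illustrative): split the image into a grid
% of regions and concatenate a per-channel histogram from each region.
img = im2double(imread('scene.jpg'));   % placeholder filename
numBins = 8; gridSize = 4;              % values to vary
[h, w, ~] = size(img);
feat = [];
for r = 1:gridSize
    for c = 1:gridSize
        rows = floor((r-1)*h/gridSize)+1 : floor(r*h/gridSize);
        cols = floor((c-1)*w/gridSize)+1 : floor(c*w/gridSize);
        for ch = 1:3
            patch = img(rows, cols, ch);
            feat = [feat, histcounts(patch(:), linspace(0, 1, numBins+1))]; %#ok<AGROW>
        end
    end
end

% --- HOG features via VLFeat.
cellSize = 8; numOrientations = 9;      % values to vary
hog = vl_hog(im2single(img), cellSize, 'NumOrientations', numOrientations);
hogFeat = hog(:)';                      % flatten to one row per image
```

Stacking one such row per training image gives the feature matrix Xtrain used in the ensemble-training sketches above. (histcounts requires R2014b or later; on older MATLAB versions histc can be used instead.)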
- Problem 2 (15%): Repeat Problem 1, now with AdaBoost, option 'AdaBoostM2' in 'fitensemble'. You should report all the same values/plots as for Problem 1. Discuss the difference in performance from Problem 1.
- Problem 3 (15%): Repeat Problem 1, now combining the histogram and HOG features together, both with bagging and AdaBoost. You should report all the same values/plots as for Problem 1. Discuss the difference in performance from Problems 1 and 2.
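Combining the two feature types can be as simple as concatenating the per-image feature vectors before training. In this sketch, histFeats and hogFeats are placeholder names for the feature matrices (one row per example) you built earlier:

```matlab
% Each row is one example; concatenating columns combines the features.
Xcombined = [histFeats, hogFeats];
% Run the same cross-validation / training / testing pipeline on Xcombined,
% once with 'Bag' and once with 'AdaBoostM2' as the ensemble method.
```

Since the two feature types may have very different scales, you might also consider normalizing each block before concatenation.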
- Problem 4 (20%): Develop a bagged or boosted classifier that performs better than those you developed in the previous problems. Report the cross-validation, training, and test errors. Explain what you did. Things you can try: add more features, use larger trees, use more trees, etc.