Assignment 2, due November 15, 5% of class score
Instructions
- Submit through OWL by midnight on the due date
- You may discuss the assignment with other students, but all code/report must be your own work
- The assignment is to be done in MATLAB
- Deliverables: MATLAB code that you write yourself, and the assignment write-up
- You are allowed to use any MATLAB functions and the VLFeat library. Make sure you run vl_setup before using VLFeat.
- Indoor scene classification data is provided for the assignment. There are ten different scene classes. Each scene class is in its own subdirectory, and each subdirectory contains 100 scenes (examples).
- Useful MATLAB commands: pca, fitcdiscr, predict, resubLoss, crossval, kfoldLoss, hist
- Useful VLFeat commands: vl_ikmeans, vl_ikmeanspush, vl_hog, vl_covdet
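For example, here is a minimal sketch of loading the scene images and building a HOG feature matrix with vl_hog; the 'scenes' directory name, the 256x256 resize, and the 8-pixel cell size are assumptions, so adjust them to the actual data:

```matlab
% Minimal sketch: load the data and extract HOG features with VLFeat.
% Assumes one subdirectory per class under 'scenes/' (an assumption).
run('vlfeat/toolbox/vl_setup');                      % assumed VLFeat path
classes = dir('scenes');
classes = classes([classes.isdir] & ~ismember({classes.name}, {'.', '..'}));
X = []; y = [];
for c = 1:numel(classes)
    files = dir(fullfile('scenes', classes(c).name, '*.jpg'));
    for i = 1:numel(files)
        im = imread(fullfile('scenes', classes(c).name, files(i).name));
        if size(im, 3) == 3, im = rgb2gray(im); end
        im  = imresize(im2single(im), [256 256]);    % fixed size so feature
        hog = vl_hog(im, 8);                         % vectors match; 8x8 cells
        X   = [X; hog(:)'];                          % one feature row per image
        y   = [y; c];                                % numeric class label
    end
end
```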
- Download the LIBSVM library and the guide for using it
- Tips for the LIBSVM library
- Run the 'make' script in the 'matlab' subdirectory before using the library
- Add the 'matlab' subdirectory to the MATLAB path
- For cross-validation or validation, you need to run 'svmtrain' with different parameter options specified as a string, such as '-t 2 -c 1 -g 0.1' for a Gaussian kernel with beta = c = 1 and Gaussian width 0.1. If you use strcat to concatenate strings, the output can be of type 'cell', which is not the format 'svmtrain' expects; it will not crash, but it will give wrong results. Convert the options to 'char' if you use strcat.
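For example, a sketch of building the option string safely; sprintf always returns a char array, which sidesteps the strcat pitfall (Xtrain and ytrain are assumed to exist already):

```matlab
% Build the svmtrain option string with sprintf rather than strcat;
% sprintf returns a char array, the format svmtrain expects.
c = 1; g = 0.1;
opts  = sprintf('-t 2 -c %g -g %g', c, g);   % Gaussian kernel, beta = 1
model = svmtrain(ytrain, Xtrain, opts);      % Xtrain/ytrain assumed to exist
% If you do use strcat, convert the result explicitly:
% opts = char(strcat({'-t 2 -c '}, num2str(c), {' -g '}, num2str(g)));
```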
- Problem 1 (20%): Use 100 samples per class and a linear classifier (MATLAB "fitcdiscr"). Report the 10-fold cross-validation error using HOG features and two other feature types of your choice from Assignment 1. A minimal sketch follows.
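A minimal sketch of this step, assuming a feature matrix X (one row per image) and a label vector y as built above:

```matlab
% 10-fold cross-validation error for the linear discriminant classifier.
mdl   = fitcdiscr(X, y);                 % linear discriminant by default
cvmdl = crossval(mdl, 'KFold', 10);      % 10-fold partition
fprintf('10-fold CV error: %.3f\n', kfoldLoss(cvmdl));
```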
- Problem 2 (20%): Use 100 samples per class and a linear classifier (MATLAB "fitcdiscr"). Reduce dimensionality using PCA, then train the linear classifier. Report the training error and the 10-fold cross-validation error for several numbers of principal components; that is, reduce dimensionality to, say, 10, 50, and 100 dimensions and report the results in each case. Use the same features as in Problem 1. See the sketch below.
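A sketch under the same assumptions (X, y); the candidate dimensions come from the problem statement:

```matlab
% PCA reduction followed by a linear discriminant, for several
% choices of the number of principal components.
[~, score] = pca(X);                     % score: data in PCA coordinates
for d = [10 50 100]
    mdl   = fitcdiscr(score(:, 1:d), y);
    trErr = resubLoss(mdl);                           % training error
    cvErr = kfoldLoss(crossval(mdl, 'KFold', 10));    % 10-fold CV error
    fprintf('d = %3d: train %.3f, CV %.3f\n', d, trErr, cvErr);
end
```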
- Problem 3 (20%): Reduce dimensionality using PCA and use a quadratic classifier (MATLAB "fitcdiscr"). Report the training error ("resubLoss" function in MATLAB) and the 10-fold cross-validation error for several choices of the number of principal components (i.e., reduce dimensionality to several different dimensions). Use the same features as in Problem 1. Notice that if you use too high a dimension, the quadratic fit may not work and MATLAB will complain, so the dimensionality should be small enough for the quadratic fit to be reasonable. You should see the training error decrease as dimensionality increases, while the cross-validation error first decreases and then starts to increase; this is because at some point the quadratic classifier starts to overfit. A sketch follows.
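A sketch with the discriminant switched to quadratic; the candidate dimensions are an assumption and should stay small enough for stable per-class covariance estimates:

```matlab
% Quadratic discriminant on PCA-reduced features. Training error should
% fall as d grows, while CV error eventually rises (overfitting).
[~, score] = pca(X);
for d = [5 10 20 40]                     % assumed candidate dimensions
    mdl = fitcdiscr(score(:, 1:d), y, 'DiscrimType', 'quadratic');
    fprintf('d = %3d: train %.3f, CV %.3f\n', d, ...
            resubLoss(mdl), kfoldLoss(crossval(mdl, 'KFold', 10)));
end
```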
- Problem 4 (20%): Use a linear SVM and HOG features for this problem. Use 80 samples per class for training and 20 per class for testing. For the linear SVM, you need to determine the best parameter 'beta' with validation or cross-validation, both performed on the training data. You have 80 samples per class for training: if using cross-validation, break each class into k folds; if using validation, take 50 samples for training and 30 for validation in each class. After you determine the best 'beta', retrain on the training data (all 80 samples per class) and report performance on the test data. Compare with your earlier results on HOG features. Note that in the SVM library, 'beta' is the parameter 'c', and a linear SVM is '-t 0'. A sketch of the cross-validation route follows.
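A sketch assuming Xtrain/ytrain (80 per class) and Xtest/ytest (20 per class) have already been split; the candidate grid for c is an assumption:

```matlab
% Select c by LIBSVM's built-in 10-fold cross-validation ('-v 10'),
% then retrain on all training data and evaluate on the test set.
Xtrain = double(Xtrain); ytrain = double(ytrain);   % LIBSVM expects doubles
bestAcc = 0; bestC = 1;
for c = [0.01 0.1 1 10 100]              % assumed candidate grid
    acc = svmtrain(ytrain, Xtrain, sprintf('-t 0 -c %g -v 10 -q', c));
    if acc > bestAcc, bestAcc = acc; bestC = c; end
end
model = svmtrain(ytrain, Xtrain, sprintf('-t 0 -c %g -q', bestC));
[~, testAcc, ~] = svmpredict(double(ytest), double(Xtest), model);
fprintf('best c = %g, test accuracy = %.2f%%\n', bestC, testAcc(1));
```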
- Problem 5 (20%): Repeat Problem 4, but now with a Gaussian kernel (option '-t 2'). In addition to beta ('c' in the library), you have to find a good kernel parameter (the '-g' option in the library). Use validation or cross-validation exactly as you did in Problem 4. After you find the optimal beta and kernel parameter, retrain on all the training data. Test on the test data and report your performance. Compare your results to Problem 4. For SVMs, normalizing features is supposed to help: normalize the features to some range (say from 0 to 1) and retrain the classifier. Note that you have to do the validation or cross-validation again. Report your results on the test data with normalized features. A sketch with normalization follows.
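A sketch of the grid search with [0, 1] feature scaling; both parameter grids are assumptions, and the test set is scaled with the training-set min/max so that no test statistics leak into training:

```matlab
% Scale features to [0, 1] column-wise using training-set statistics.
mn   = min(Xtrain, [], 1);  mx = max(Xtrain, [], 1);
rng_ = max(mx - mn, eps);                % guard against constant columns
Xtr  = bsxfun(@rdivide, bsxfun(@minus, double(Xtrain), mn), rng_);
Xte  = bsxfun(@rdivide, bsxfun(@minus, double(Xtest),  mn), rng_);
% Joint grid search over c and the Gaussian width g via 10-fold CV.
bestAcc = 0; bestC = 1; bestG = 0.1;     % grids below are assumptions
for c = [0.1 1 10 100]
    for g = [0.001 0.01 0.1 1]
        acc = svmtrain(ytrain, Xtr, sprintf('-t 2 -c %g -g %g -v 10 -q', c, g));
        if acc > bestAcc, bestAcc = acc; bestC = c; bestG = g; end
    end
end
model = svmtrain(ytrain, Xtr, sprintf('-t 2 -c %g -g %g -q', bestC, bestG));
[~, testAcc, ~] = svmpredict(ytest, Xte, model);
fprintf('c = %g, g = %g, test accuracy = %.2f%%\n', bestC, bestG, testAcc(1));
```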