Assignment 1, due October 19, 5% of class score
Instructions
- Submit through OWL by midnight on the due date
- You may discuss the assignment with other students, but all code/report must be your own work
- Assignment is to be done in Matlab
- Deliverables: Matlab code that you write yourself, and an assignment write-up
Submit 2 files for your assignment: a PDF file with the report, and the code you wrote yourself (and only the code you wrote yourself) in one zipped file.
- You are allowed to use any Matlab functions and the VLFeat library. Make sure you run vl_setup before using the VLFeat library.
- Use the indoor scene classification data for the assignment. There are ten different scene classes, each in its own subdirectory. Each subdirectory contains 100 scenes (examples).
- Useful Matlab commands: fitcknn, predict, crossval, kfoldLoss, hist
- Useful VLFeat commands: vl_ikmeans, vl_ikmeanspush, vl_hog, vl_covdet
- For all problems, use the first 80 samples per class for training, and the last 20 samples for testing
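The split above can be expressed with index masks. The following is a minimal sketch, assuming the samples are stored class by class in file order; the variable names are illustrative and not part of the assignment.

```matlab
numClasses = 10;     % ten scene classes
perClass   = 100;    % 100 examples per class
numTrain   = 80;     % first 80 per class go to training

% Assumed layout: all samples of class 1, then class 2, and so on.
labels   = repelem((1:numClasses)', perClass);     % 1000-by-1 class labels
posInCls = repmat((1:perClass)', numClasses, 1);   % position within its class

isTrain = posInCls <= numTrain;    % logical mask, 800 true entries
isTest  = ~isTrain;                % logical mask, 200 true entries
```

With a feature matrix X holding one row per sample, X(isTrain, :) and labels(isTrain) would then give the training set, and likewise for the test set.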
- Problem 1 (20%):
(a) Use pixelwise grayscale representation and kNN classifier. You need to rescale all images to the same size, otherwise the extracted feature vectors have different sizes. I suggest rescaling images to size 50 by 50. Report training and test error for several values of k of your choice. Discuss training and test errors.
(b) Repeat part (a), but now with pixelwise color representation.
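A possible shape for the pixelwise features and the error computation is sketched below. It is only an illustration: the image and the two-class feature matrices are random stand-ins (not the scene data), and it assumes imresize and fitcknn are available in your Matlab installation.

```matlab
rng(0);                                       % reproducible synthetic data

% Feature extraction for one image (random stand-in for an imread result)
img       = uint8(randi([0 255], 120, 160, 3));
gray      = imresize(rgb2gray(img), [50 50]);
featGray  = double(gray(:))';                 % 1-by-2500 grayscale feature
color     = imresize(img, [50 50]);
featColor = double(color(:))';                % 1-by-7500 color feature

% kNN errors on synthetic two-class features (stand-in data)
Xtrain = [randn(40, 5); randn(40, 5) + 3];
ytrain = [ones(40, 1); 2 * ones(40, 1)];
Xtest  = [randn(10, 5); randn(10, 5) + 3];
ytest  = [ones(10, 1); 2 * ones(10, 1)];

kValues  = [1 3 5];
trainErr = zeros(size(kValues));
testErr  = zeros(size(kValues));
for i = 1:numel(kValues)
    mdl         = fitcknn(Xtrain, ytrain, 'NumNeighbors', kValues(i));
    trainErr(i) = mean(predict(mdl, Xtrain) ~= ytrain);   % error on training set
    testErr(i)  = mean(predict(mdl, Xtest)  ~= ytest);    % error on test set
    fprintf('k = %d: train error %.3f, test error %.3f\n', ...
            kValues(i), trainErr(i), testErr(i));
end
```

Note that with k = 1 every training point is its own nearest neighbor, so the training error is zero unless identical points carry different labels.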
- Problem 2 (20%): Use global color histogram and kNN classifier. Bin each color channel into 50 bins ('hist' in Matlab). Report training and test error for several values of k of your choice. Discuss training and test errors. Compare to performance in Problem 1.
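A global color histogram can be sketched as follows. This is one possible reading, not the required method: it uses histcounts with fixed bin edges in place of 'hist' so that bins line up across images, and it normalizes each channel's counts by the pixel count, which is an assumption made here so that images of different sizes are comparable. The image is a random stand-in.

```matlab
rng(0);
img     = uint8(randi([0 255], 60, 80, 3));   % random stand-in image
numBins = 50;                                 % bins per color channel
edges   = linspace(0, 256, numBins + 1);      % same edges for every image

feat = zeros(1, 3 * numBins);                 % concatenated R, G, B histograms
for c = 1:3
    channel = double(img(:, :, c));
    counts  = histcounts(channel(:), edges);
    % Normalize by pixel count (an assumption, not stated in the assignment)
    feat((c - 1) * numBins + (1:numBins)) = counts / numel(channel);
end
```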
- Problem 3 (20%): Use local color histogram with kNN classifier (i.e. divide the image into several local regions, compute a color histogram in each region, and then concatenate all local histograms into one feature vector). Use 10-fold cross-validation on training data to choose the number of bins per channel, the number of local regions to use for the local histogram, and the value of k for the kNN classifier. I suggest trying from 10 to 40 bins per color channel in steps of 10, and splitting into local regions using from 1 by 1 to 5 by 5 grids in steps of 1. Report the best choice of number of bins, grid size, and k according to the cross-validation error, as well as the cross-validation error itself. Then retrain the classifier with the chosen values and report error on test data. Compare/discuss test error vs. cross-validation error. Discuss difference in performance from previous problems.
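The local histogram construction described above might look like the sketch below. The way the grid boundaries are rounded and the per-region normalization are choices made here for illustration, and the image is a random stand-in.

```matlab
rng(0);
img      = uint8(randi([0 255], 60, 80, 3));  % random stand-in image
numBins  = 10;                                % bins per color channel
gridSize = 3;                                 % 3 by 3 grid of local regions
edges    = linspace(0, 256, numBins + 1);

[H, W, ~] = size(img);
rowCuts = round(linspace(0, H, gridSize + 1));   % region boundaries (rows)
colCuts = round(linspace(0, W, gridSize + 1));   % region boundaries (columns)

feat = [];
for i = 1:gridSize
    for j = 1:gridSize
        region = img(rowCuts(i) + 1:rowCuts(i + 1), ...
                     colCuts(j) + 1:colCuts(j + 1), :);
        for c = 1:3
            vals = double(region(:, :, c));
            % Per-region normalization is an assumption of this sketch
            feat = [feat, histcounts(vals(:), edges) / numel(vals)]; %#ok<AGROW>
        end
    end
end
```

The feature length is gridSize^2 * 3 * numBins, so it grows quickly with the grid size; a 1 by 1 grid reduces to the global histogram.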
- Problem 4 (20%): Use HOG features with kNN classifier. You need to rescale all images to the same size. Use 10-fold cross-validation on training data to find a good value for the number of orientations (try from 4 to 14 in steps of 2), cell size (try from 10 to 40 in steps of 20), image rescaling (try from 50 by 50 to 250 by 250 in steps of 100) and k for the kNN classifier. Report cross-validation error for each case you try. Report the chosen values for k, number of orientations, cell size. Then retrain the classifier with the chosen values and report error on test data. Compare/discuss test error vs. cross-validation error. Discuss difference in performance from previous problems.
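The cross-validation loop over k could be organized roughly as below. This is a sketch on synthetic two-class features, not on HOG features: in the assignment, the feature matrix would be recomputed inside outer loops over the feature parameters (orientations, cell size, image size). Sharing one cvpartition across all values of k is a design choice made here so that every candidate is scored on the same folds.

```matlab
rng(0);
X = [randn(60, 4); randn(60, 4) + 2];        % synthetic stand-in features
y = [ones(60, 1); 2 * ones(60, 1)];          % synthetic class labels

folds   = cvpartition(y, 'KFold', 10);       % same 10 folds for every k
kValues = [1 3 5 7];
cvErr   = zeros(size(kValues));
for i = 1:numel(kValues)
    mdl      = fitcknn(X, y, 'NumNeighbors', kValues(i));
    cvMdl    = crossval(mdl, 'CVPartition', folds);
    cvErr(i) = kfoldLoss(cvMdl);             % mean misclassification rate
    fprintf('k = %d: cross-validation error %.3f\n', kValues(i), cvErr(i));
end

[bestErr, idx] = min(cvErr);
bestK    = kValues(idx);
finalMdl = fitcknn(X, y, 'NumNeighbors', bestK);   % retrain on all training data
```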
- Problem 5 (20%): Use patch-based features and kNN classifier. Use cross-validation on training data to find the patch size and vocabulary size. To build a dictionary, I recommend using at most 10% of all possible training data patches, and resizing all images to, say, 250 by 250 (or even smaller, but performance may worsen), otherwise patch extraction can take too long. During cross-validation, I recommend trying patch sizes from 10 by 10 to 40 by 40 (in steps of 10), and vocabulary sizes from 50 to 200 (in steps of 50). It works better if patches are normalized, for example to have mean 0 and variance 1. After you build the dictionary, use a global histogram over vocabulary words (i.e. codewords). Report cross-validation error for each case you try. Report the chosen values for k, patch size, vocabulary size. Then retrain the classifier with the chosen values and report error on the training and test data. Display the dictionary you "learn" (like slide 44, lecture 4) for the chosen patch size and vocabulary size. Compare/discuss test error vs. cross-validation error. Discuss difference in performance from previous problems.
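The patch normalization and the histogram over codewords can be sketched as follows. Several things here are assumptions for illustration only: the image is a random stand-in, patches are taken on a non-overlapping grid, and the dictionary is simply a few randomly chosen patches standing in for the cluster centers a k-means step would produce.

```matlab
rng(0);
img = rand(60, 80);                 % random stand-in grayscale image
p   = 10;                           % patch side length

% Extract non-overlapping p-by-p patches, one per row
patches = zeros(0, p * p);
for r = 1:p:size(img, 1) - p + 1
    for c = 1:p:size(img, 2) - p + 1
        patch = img(r:r + p - 1, c:c + p - 1);
        patches(end + 1, :) = patch(:)'; %#ok<SAGROW>
    end
end

% Normalize each patch to mean 0 and variance 1
mu = mean(patches, 2);
sd = std(patches, 0, 2);
sd(sd < 1e-8) = 1;                  % guard against constant patches
patchesN = (patches - mu) ./ sd;    % implicit expansion (R2016b or later)

% Stand-in dictionary: V random patches instead of k-means centers
V    = 5;
dict = patchesN(randperm(size(patchesN, 1), V), :);

% Assign each patch to its nearest codeword (squared Euclidean distance)
D = sum(patchesN.^2, 2) - 2 * patchesN * dict' + sum(dict.^2, 2)';
[~, words] = min(D, [], 2);

% Global histogram over codewords, normalized to sum to 1
bow = accumarray(words, 1, [V 1])' / numel(words);
```

The vector bow is the image's feature; with a real dictionary, V would be the vocabulary size chosen by cross-validation.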
Side Note: Ideally, during cross-validation, one should build the vocabulary only from the "training" folds, i.e. not using the "test" fold, since the test data should never be used for training. However, for this assignment, it is OK to build the dictionary from all the training data, ignoring the folds, since rebuilding it for every fold would be too slow.