Fit a linear classifier (matlab "fitcdiscr"). Report the training and test errors for all feature types (except pixelwise color) from Assignment 1; a minimal training/evaluation sketch is given after the table below. If you did not do Assignment 1, you can get my features from
Summarize your results in a table of the following form:
Feature Type | Training Error | Test Error | Num dims |
Hist | 0.6 | 0.8 | 75 |
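A minimal sketch of the training and evaluation, assuming Xtrain/Ytrain/Xtest/Ytest are placeholder names for one feature type with numeric labels:

    % Train the default (linear) discriminant and measure error rates
    mdl = fitcdiscr(Xtrain, Ytrain);
    trainErr = mean(predict(mdl, Xtrain) ~= Ytrain);
    testErr  = mean(predict(mdl, Xtest)  ~= Ytest);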
For this problem, we will reduce the number of features with PCA. Use a linear classifier and cross-validation on the training data to determine the best dimension (i.e. number of principal components) d to reduce to. I suggest trying 10, 20, ..., 130 components. Do this separately for each feature type, as well as for the feature vector which is a combination of all features. For the combined features, plot the training error and the cross-validation error vs. the number of dimensions on the same figure. Discuss the plot.
After you determine the best d, retrain the linear classifier on all training data and report the training and test errors. Summarize your cross-validation, training, and test errors in a table. Compare your results to those in Problem 1, i.e. to the linear classifier without dimensionality reduction.
Optional: Also test how kNN works on the reduced-dimensionality data and include it in the table summarizing errors. Note that we searched for the best dimension d using the linear classifier, not kNN. Still, the results for kNN should also mostly improve; a minimal sketch is given below.
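For the optional kNN experiment, a possible check using matlab's built-in fitcknn; Ztrain/Ztest are assumed to be the PCA-reduced training and test features, and the neighbor count is just an example:

    % kNN on the reduced features; 5 neighbors is an arbitrary example value
    mdl = fitcknn(Ztrain, Ytrain, 'NumNeighbors', 5);
    knnTestErr = mean(predict(mdl, Ztest) ~= Ytest);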
Implementation Note: During cross-validation, you should reduce the dimensionality of the training fold only, and project the test fold into the same space as the training fold. Thus you should implement cross-validation yourself, rather than using the built-in one for the linear classifier. The Matlab function crossvalind is useful for implementing cross-validation.
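A minimal sketch of this procedure, assuming X (N x D) and Y (N x 1, numeric) hold the full training set; variable names are placeholders, and d must not exceed the number of principal components available for a given feature type:

    k = 5;                                   % number of folds
    dims = 10:10:130;                        % candidate dimensions
    trnErr = zeros(size(dims));
    cvErr  = zeros(size(dims));
    folds = crossvalind('Kfold', size(X,1), k);
    for di = 1:numel(dims)
        d = dims(di);
        e = zeros(k,2);
        for f = 1:k
            trn = (folds ~= f);  tst = (folds == f);
            mu = mean(X(trn,:), 1);
            coeff = pca(X(trn,:));           % PCA on the training fold only
            Ztrn = (X(trn,:) - mu) * coeff(:,1:d);
            Ztst = (X(tst,:) - mu) * coeff(:,1:d);  % project test fold to the same space
            mdl = fitcdiscr(Ztrn, Y(trn));
            e(f,1) = mean(predict(mdl, Ztrn) ~= Y(trn));
            e(f,2) = mean(predict(mdl, Ztst) ~= Y(tst));
        end
        trnErr(di) = mean(e(:,1));
        cvErr(di)  = mean(e(:,2));
    end
    plot(dims, trnErr, dims, cvErr);
    legend('training error', 'cross-validation error');
    xlabel('number of principal components d');  ylabel('error');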
Repeat Problem 2 but now with a quadratic classifier (matlab "fitcdiscr" with option "discrimType" set to "quadratic"). Notice that if the dimension is too high, the quadratic fit may not work and matlab will complain (the per-class covariance estimates become singular). So the dimensionality should be small enough for the quadratic fit not to crash.
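A minimal sketch of the quadratic variant; only the name-value pair changes relative to the linear case, with Ztrain/Ztest again placeholders for the reduced features:

    % Quadratic discriminant; needs enough samples per class relative to d
    mdl = fitcdiscr(Ztrain, Ytrain, 'DiscrimType', 'quadratic');
    testErr = mean(predict(mdl, Ztest) ~= Ytest);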
Use a linear SVM for this problem, and the same feature types as above. Use cross-validation on the training data to determine the best parameter 'beta'. After determining 'beta', retrain on all training data. Report cross-validation and test errors for all feature types, summarized in a table. Compare performance to that in the previous problems. Note that in the SVM library, 'beta' is the parameter 'c', and a linear SVM is selected with '-t 0'.
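A minimal sketch of the 'beta' search with the libsvm matlab interface (svmtrain/svmpredict); Y must be a double column vector of labels, X a double matrix, and the candidate grid below is illustrative. libsvm's '-v 5' option runs its own 5-fold cross-validation and returns accuracy in percent; if you reduce dimensionality per fold as in Problem 2, reuse the manual loop from that problem and call svmtrain/svmpredict inside it instead.

    Cs = 2.^(-5:2:15);                       % illustrative grid of beta values
    cvErr = zeros(size(Cs));
    for i = 1:numel(Cs)
        acc = svmtrain(Y, X, sprintf('-t 0 -c %g -v 5 -q', Cs(i)));
        cvErr(i) = 1 - acc/100;              % accuracy (percent) -> error rate
    end
    [~, best] = min(cvErr);
    model = svmtrain(Y, X, sprintf('-t 0 -c %g -q', Cs(best)));  % retrain on all training data
    pred = svmpredict(Ytest, Xtest, model);
    testErr = mean(pred ~= Ytest);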
Repeat Problem 4 but now with the Gaussian kernel (option '-t 2'). In addition to beta ('c' in the library), you have to find a good kernel width parameter ('-g' option in the library). Report cross-validation and test errors for all feature types, summarized in a table. Compare performance to that in the previous problems.
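A minimal sketch of the joint grid search over '-c' and '-g'; the powers-of-two grids are a common illustrative choice, not prescribed by the assignment:

    bestErr = inf;
    for c = 2.^(-5:2:15)
        for g = 2.^(-15:2:3)
            acc = svmtrain(Y, X, sprintf('-t 2 -c %g -g %g -v 5 -q', c, g));
            if 1 - acc/100 < bestErr
                bestErr = 1 - acc/100;  bestC = c;  bestG = g;
            end
        end
    end
    model = svmtrain(Y, X, sprintf('-t 2 -c %g -g %g -q', bestC, bestG));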
For SVM, normalizing the features usually helps. Normalize the features to lie in some range (say from 0 to 1) and retrain the classifier. Note that you have to redo the cross-validation. Report cross-validation and test errors for all feature types (normalized), summarized in a table.
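A minimal sketch of min-max normalization to [0,1]; the scaling constants must be computed on the training data and then reused unchanged on the test data:

    lo = min(Xtrain, [], 1);
    hi = max(Xtrain, [], 1);
    span = max(hi - lo, eps);                % guard against constant features
    XtrainN = (Xtrain - lo) ./ span;
    XtestN  = (Xtest  - lo) ./ span;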