Fit a linear classifier (matlab "fitcdiscr"). Report the training and test errors for all feature types (except pixelwise color) from Assignment 1; a minimal training/evaluation sketch is given after the table below. If you did not do Assignment 1, you can get my features from
Summarize your results in a table of the following form:
Feature Type | Training Error | Test Error | Num dims |
Hist | 0.6 | 0.8 | 75 |
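A minimal sketch of the training and evaluation, assuming Xtrain/Ytrain/Xtest/Ytest are placeholder names for one feature type with numeric labels:

    % Train the default (linear) discriminant and measure error rates
    mdl = fitcdiscr(Xtrain, Ytrain);
    trainErr = mean(predict(mdl, Xtrain) ~= Ytrain);
    testErr  = mean(predict(mdl, Xtest)  ~= Ytest);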
For this problem, we will reduce the number of features with PCA. Use a linear classifier and cross-validation on the training data to determine the best dimension (i.e. number of principal components) d to reduce to. I suggest trying 10, 20, ..., 130 components. Do this separately for each feature type, as well as for the feature vector which is a combination of all features. For the combined features, plot the training error and the cross-validation error vs. the number of dimensions on the same figure. Discuss the plot.
After you determine the best d, retrain the linear classifier on all training data and report the training and test errors. Summarize your cross-validation, training, and test errors in a table. Compare your results to those in Problem 1, i.e. to the linear classifier without dimensionality reduction.
Optional: Also test how kNN works on the reduced-dimensionality data and include it in the table summarizing errors. Note that we searched for the best dimension d using the linear classifier, not kNN. Still, the results for kNN should also mostly improve; a minimal sketch is given below.
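For the optional kNN experiment, a possible check using matlab's built-in fitcknn; Ztrain/Ztest are assumed to be the PCA-reduced training and test features, and the neighbor count is just an example:

    % kNN on the reduced features; 5 neighbors is an arbitrary example value
    mdl = fitcknn(Ztrain, Ytrain, 'NumNeighbors', 5);
    knnTestErr = mean(predict(mdl, Ztest) ~= Ytest);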
Implementation Note: During cross-validation, you should reduce the dimensionality of the training fold only, and project the test fold into the same space as the training fold. Thus you should implement cross-validation yourself, rather than using the built-in one for the linear classifier. The Matlab function crossvalind is useful for implementing cross-validation.
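A minimal sketch of this procedure, assuming X (N x D) and Y (N x 1, numeric) hold the full training set; variable names are placeholders, and d must not exceed the number of principal components available for a given feature type:

    k = 5;                                   % number of folds
    dims = 10:10:130;                        % candidate dimensions
    trnErr = zeros(size(dims));
    cvErr  = zeros(size(dims));
    folds = crossvalind('Kfold', size(X,1), k);
    for di = 1:numel(dims)
        d = dims(di);
        e = zeros(k,2);
        for f = 1:k
            trn = (folds ~= f);  tst = (folds == f);
            mu = mean(X(trn,:), 1);
            coeff = pca(X(trn,:));           % PCA on the training fold only
            Ztrn = (X(trn,:) - mu) * coeff(:,1:d);
            Ztst = (X(tst,:) - mu) * coeff(:,1:d);  % project test fold to the same space
            mdl = fitcdiscr(Ztrn, Y(trn));
            e(f,1) = mean(predict(mdl, Ztrn) ~= Y(trn));
            e(f,2) = mean(predict(mdl, Ztst) ~= Y(tst));
        end
        trnErr(di) = mean(e(:,1));
        cvErr(di)  = mean(e(:,2));
    end
    plot(dims, trnErr, dims, cvErr);
    legend('training error', 'cross-validation error');
    xlabel('number of principal components d');  ylabel('error');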
Repeat Problem 2 but now with a quadratic classifier (matlab "fitcdiscr" with option "discrimType" set to "quadratic"). Notice that if the dimension is too high, the quadratic fit may not work and matlab will complain (the per-class covariance estimates become singular). So the dimensionality should be small enough for the quadratic fit not to crash.
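A minimal sketch of the quadratic variant; only the name-value pair changes relative to the linear case, with Ztrain/Ztest again placeholders for the reduced features:

    % Quadratic discriminant; needs enough samples per class relative to d
    mdl = fitcdiscr(Ztrain, Ytrain, 'DiscrimType', 'quadratic');
    testErr = mean(predict(mdl, Ztest) ~= Ytest);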
Use a linear SVM for this problem, and the same feature types as above. Use cross-validation on the training data to determine the best parameter 'beta'. After determining 'beta', retrain on all training data. Report cross-validation and test errors for all feature types, summarized in a table. Compare performance to that in the previous problems. Note that in the SVM library, 'beta' is the parameter 'c', and a linear SVM is selected with '-t 0'.
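A minimal sketch of the 'beta' search with the libsvm matlab interface (svmtrain/svmpredict); Y must be a double column vector of labels, X a double matrix, and the candidate grid below is illustrative. libsvm's '-v 5' option runs its own 5-fold cross-validation and returns accuracy in percent; if you reduce dimensionality per fold as in Problem 2, reuse the manual loop from that problem and call svmtrain/svmpredict inside it instead.

    Cs = 2.^(-5:2:15);                       % illustrative grid of beta values
    cvErr = zeros(size(Cs));
    for i = 1:numel(Cs)
        acc = svmtrain(Y, X, sprintf('-t 0 -c %g -v 5 -q', Cs(i)));
        cvErr(i) = 1 - acc/100;              % accuracy (percent) -> error rate
    end
    [~, best] = min(cvErr);
    model = svmtrain(Y, X, sprintf('-t 0 -c %g -q', Cs(best)));  % retrain on all training data
    pred = svmpredict(Ytest, Xtest, model);
    testErr = mean(pred ~= Ytest);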
Repeat Problem 4 but now with the Gaussian kernel (option '-t 2'). In addition to beta ('c' in the library), you have to find a good kernel width parameter ('-g' option in the library). Report cross-validation and test errors for all feature types, summarized in a table. Compare performance to that in the previous problems.
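A minimal sketch of the joint grid search over '-c' and '-g'; the powers-of-two grids are a common illustrative choice, not prescribed by the assignment:

    bestErr = inf;
    for c = 2.^(-5:2:15)
        for g = 2.^(-15:2:3)
            acc = svmtrain(Y, X, sprintf('-t 2 -c %g -g %g -v 5 -q', c, g));
            if 1 - acc/100 < bestErr
                bestErr = 1 - acc/100;  bestC = c;  bestG = g;
            end
        end
    end
    model = svmtrain(Y, X, sprintf('-t 2 -c %g -g %g -q', bestC, bestG));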
For SVM, normalizing the features usually helps. Normalize the features to lie in some range (say from 0 to 1) and retrain the classifier. Note that you have to redo the cross-validation. Report cross-validation and test errors for all feature types (normalized), summarized in a table.
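A minimal sketch of min-max normalization to [0,1]; the scaling constants must be computed on the training data and then reused unchanged on the test data:

    lo = min(Xtrain, [], 1);
    hi = max(Xtrain, [], 1);
    span = max(hi - lo, eps);                % guard against constant features
    XtrainN = (Xtrain - lo) ./ span;
    XtestN  = (Xtest  - lo) ./ span;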