I give you complete code for setting up the MatConvNet library, reading data, normalizing data, defining the neural network structure, training the neural network, and, finally, applying it to the test data. The main file is "p1.m". Before running "p1.m", edit "setup.m" so that the directories for the MatConvNet library and the data point to the correct locations on your machine.
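For reference, a minimal sketch of what "setup.m" might contain is shown below; the two directory paths are placeholders that you must replace with the locations on your machine, and the exact variable names in the provided file may differ.

    % setup.m -- minimal sketch; replace the placeholder paths below.
    matconvnetDir = '/path/to/matconvnet';   % placeholder: MatConvNet root directory
    dataDir       = '/path/to/data';         % placeholder: directory containing the image data

    run(fullfile(matconvnetDir, 'matlab', 'vl_setupnn.m'));  % add MatConvNet to the MATLAB path
    addpath(dataDir);                                        % make the data directory visible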
When you run "p1.m", it outputs:
The plots are useful to understand whether you underfit/overfit to the training data.
What to do for this problem:
Open file "initializeCNN.m". This file initializes the structure of convNet. There are two stages, each stage consists of a convolutional layer, ReLu layer (linear rectification), spatial normalization layer, and a pooling layer. After two stages, there is a fully connected layer and finally the softmax layer.
Try to get better performance by experimenting with the network structure. You can add or remove layers, change the filter sizes, and so on. In order to keep the network consistent, make sure that the output dimensions of each layer match the input dimensions expected by the next layer; in particular, the final fully connected layer must produce one output per class.
Ideally, a DNN should be trained on a large amount of data, and our dataset is quite small. In this problem, you will experiment with increasing the training data by a factor of 10. For each training image of size R x C, take 4 different random subcrops of size, say, (R-M) x (C-M), where M is a fraction of the image width (for example, M = (1/8)*C). Together with the original, this gives you five images. Flipping each of them left to right (fliplr in MATLAB) gives 10 images in total. Resize them all to the size expected by our network, namely 200 by 200. This increases the amount of training data tenfold. Choose the best performing network from Problems 1 and 2 and retrain it on this larger dataset. Report the energy/training error/validation error plots as well as the test error, and compare with training on the smaller dataset.
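A minimal sketch of how one training image could be expanded into 10 images is shown below; the variable names are illustrative and not taken from the provided code, and the sketch assumes the Image Processing Toolbox function imresize is available.

    % Expand a single image I (R x C x 3) into 10 images of size 200 x 200 x 3:
    % the original plus 4 random subcrops, each also flipped left-right.
    I = single(imread('train_image.jpg'));       % placeholder: one training image
    [R, C, ~] = size(I);
    M = round(C/8);                              % crop margin, e.g. 1/8 of the image width
    augmented = zeros(200, 200, 3, 10, 'like', I);

    crops = {I};                                 % the original image is the first "crop"
    for k = 1:4
        r0 = randi(M+1);                         % random top-left corner of the subcrop
        c0 = randi(M+1);
        crops{end+1} = I(r0:r0+R-M-1, c0:c0+C-M-1, :);   % subcrop of size (R-M) x (C-M)
    end

    for k = 1:5
        augmented(:,:,:,2*k-1) = imresize(crops{k}, [200 200]);          % resized crop
        augmented(:,:,:,2*k)   = imresize(fliplr(crops{k}), [200 200]);  % its left-right flip
    end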
In order to do this problem, you will need to add code to the MATLAB script readDataForCNN.m. This script creates a dataset of images to be used for training and validation, namely imdb.images.data (size 200 x 200 x 3 x 800): there are 800 images, each of size 200 x 200 x 3. In addition, it creates imdb.images.labels (size 1 x 800), which stores the labels (correct classes, ranging over 1, 2, ..., 10) for imdb.images.data. It also creates imdb.images.set (size 1 x 800), which stores "1" for a sample to be used for training and "2" for a sample to be used for validation.
The new versions of imdb.images.data, imdb.images.labels, and imdb.images.set should be ten times larger along the dimension that indexes the training samples. That is, the new imdb.images.data should be of size 200 x 200 x 3 x 8000, the new imdb.images.labels of size 1 x 8000, and the new imdb.images.set of size 1 x 8000.
When you create a new image, its "labels" and "set" values should remain the same as those of the original image. To avoid confusion, call the old dataset imdb_Old and the new one imdb. Suppose you are creating a new image from sample number 20 in the old dataset, and suppose the index of this new sample in the new dataset is 200. Then you should set imdb.images.set(200) = imdb_Old.images.set(20) and imdb.images.labels(200) = imdb_Old.images.labels(20).
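Putting it together, a sketch of the loop that builds the enlarged dataset could look like the following; augmentImage is a hypothetical helper that returns the 10 augmented 200 x 200 x 3 images for one input image (as in the earlier sketch), and imdb_Old is assumed to hold the original 800-sample dataset.

    % Build imdb (8000 samples) from imdb_Old (800 samples).
    imdb.images.data   = zeros(200, 200, 3, 8000, 'single');
    imdb.images.labels = zeros(1, 8000);
    imdb.images.set    = zeros(1, 8000);

    for i = 1:800
        I   = imdb_Old.images.data(:,:,:,i);
        aug = augmentImage(I);                   % hypothetical helper: 200 x 200 x 3 x 10
        idx = (i-1)*10 + (1:10);                 % indices of the 10 new samples
        imdb.images.data(:,:,:,idx) = aug;
        imdb.images.labels(idx) = imdb_Old.images.labels(i);  % same class as the original
        imdb.images.set(idx)    = imdb_Old.images.set(i);     % same train/validation flag
    end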
The requirement on the labels is obvious: the new image belongs to the same class as the image it was derived from. The requirement on the set values ensures that if the original image was a validation image, all the images derived from it are also validation images. This keeps the validation images from being too similar to the training images; otherwise, our validation error would be too optimistic.
Deep neural networks have to be trained on large amounts of data, and we do not have nearly enough. However, we can use a network trained on a massive amount of data for a different task. Recall that a neural network can be viewed as learning features. Visual features learned for one recognition task are likely to be useful for another task. In this problem, we will take a network trained for the ImageNet competition (1000 classes), extract features from it, and use them for our task.
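As a rough illustration of the idea, the sketch below loads a pretrained MatConvNet ImageNet model and uses the activations of a late layer as a feature vector. The model file name, the example image, and the exact layer index are assumptions, and the normalization field names (net.meta.normalization) follow recent MatConvNet releases and may differ for older model files.

    % Feature extraction with a pretrained ImageNet network (sketch).
    net = load('imagenet-vgg-f.mat');                    % assumed pretrained model file

    im = single(imread('example.jpg'));                  % placeholder image
    im = imresize(im, net.meta.normalization.imageSize(1:2));
    im = im - net.meta.normalization.averageImage;       % subtract the mean image

    res      = vl_simplenn(net, im);                     % forward pass through all layers
    features = squeeze(res(end-2).x);                    % activations of a late layer as features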