I give you complete code for setting up the MatConvNet library, reading data, normalizing data, defining the neural network structure, training the neural network, and, finally, applying it to the test data. The main file is "p1.m". Before running "p1.m", edit "setup.m" so that the directories for the MatConvNet library and the data point to the correct locations on your machine.
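For reference, a minimal sketch of what "setup.m" might contain is shown below; the two directory paths are placeholders that you must replace with the locations on your machine, and the exact variable names in the provided file may differ.

    % setup.m -- minimal sketch; replace the placeholder paths below.
    matconvnetDir = '/path/to/matconvnet';   % placeholder: MatConvNet root directory
    dataDir       = '/path/to/data';         % placeholder: directory containing the image data

    run(fullfile(matconvnetDir, 'matlab', 'vl_setupnn.m'));  % add MatConvNet to the MATLAB path
    addpath(dataDir);                                        % make the data directory visible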
When you run "p1.m", it outputs:
The plots are useful to understand whether you underfit/overfit to the training data.
What to do for this problem:
Open file "initializeCNN.m". This file initializes the structure of convNet. There are two stages, each stage consists of a convolutional layer, ReLu layer (linear rectification), spatial normalization layer, and a pooling layer. After two stages, there is a fully connected layer and finally the softmax layer.
Try to get better performance by experimenting with the network structure. You can add or remove layers, change the filter sizes, and so on. In order to keep the network consistent, make sure that the output dimensions of each layer match the input dimensions expected by the next layer; in particular, the final fully connected layer must produce one output per class.
Ideally, a DNN should be trained on a large amount of data, and our dataset is quite small. In this problem, you will experiment with increasing the training data by a factor of 10. For each training image of size R x C, take 4 different random subcrops of size, say, (R-M) x (C-M), where M is a fraction of the image width (for example, M = (1/8)*C). Together with the original, this gives you five images. Flipping each of them left to right (fliplr in MATLAB) gives 10 images in total. Resize them all to the size expected by our network, namely 200 by 200. This increases the amount of training data tenfold. Choose the best performing network from Problems 1 and 2 and retrain it on this larger dataset. Report the energy/training error/validation error plots as well as the test error, and compare with training on the smaller dataset.
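A minimal sketch of how one training image could be expanded into 10 images is shown below; the variable names are illustrative and not taken from the provided code, and the sketch assumes the Image Processing Toolbox function imresize is available.

    % Expand a single image I (R x C x 3) into 10 images of size 200 x 200 x 3:
    % the original plus 4 random subcrops, each also flipped left-right.
    I = single(imread('train_image.jpg'));       % placeholder: one training image
    [R, C, ~] = size(I);
    M = round(C/8);                              % crop margin, e.g. 1/8 of the image width
    augmented = zeros(200, 200, 3, 10, 'like', I);

    crops = {I};                                 % the original image is the first "crop"
    for k = 1:4
        r0 = randi(M+1);                         % random top-left corner of the subcrop
        c0 = randi(M+1);
        crops{end+1} = I(r0:r0+R-M-1, c0:c0+C-M-1, :);   % subcrop of size (R-M) x (C-M)
    end

    for k = 1:5
        augmented(:,:,:,2*k-1) = imresize(crops{k}, [200 200]);          % resized crop
        augmented(:,:,:,2*k)   = imresize(fliplr(crops{k}), [200 200]);  % its left-right flip
    end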
In order to do this problem, you will need to add code to the MATLAB script readDataForCNN.m. This script creates a dataset of images to be used for training and validation, namely imdb.images.data (size 200 x 200 x 3 x 800): there are 800 images, each of size 200 x 200 x 3. In addition, it creates imdb.images.labels (size 1 x 800), which stores the labels (correct classes, ranging over 1, 2, ..., 10) for imdb.images.data. It also creates imdb.images.set (size 1 x 800), which stores "1" for a sample to be used for training and "2" for a sample to be used for validation.
The new versions of imdb.images.data, imdb.images.labels, and imdb.images.set should be ten times larger along the dimension that indexes the training samples. That is, the new imdb.images.data should be of size 200 x 200 x 3 x 8000, the new imdb.images.labels of size 1 x 8000, and the new imdb.images.set of size 1 x 8000.
When you create a new image, its "labels" and "set" values should remain the same as those of the original image. To avoid confusion, call the old dataset imdb_Old and the new one imdb. Suppose you are creating a new image from sample number 20 in the old dataset, and suppose the index of this new sample in the new dataset is 200. Then you should set imdb.images.set(200) = imdb_Old.images.set(20) and imdb.images.labels(200) = imdb_Old.images.labels(20).
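Putting it together, a sketch of the loop that builds the enlarged dataset could look like the following; augmentImage is a hypothetical helper that returns the 10 augmented 200 x 200 x 3 images for one input image (as in the earlier sketch), and imdb_Old is assumed to hold the original 800-sample dataset.

    % Build imdb (8000 samples) from imdb_Old (800 samples).
    imdb.images.data   = zeros(200, 200, 3, 8000, 'single');
    imdb.images.labels = zeros(1, 8000);
    imdb.images.set    = zeros(1, 8000);

    for i = 1:800
        I   = imdb_Old.images.data(:,:,:,i);
        aug = augmentImage(I);                   % hypothetical helper: 200 x 200 x 3 x 10
        idx = (i-1)*10 + (1:10);                 % indices of the 10 new samples
        imdb.images.data(:,:,:,idx) = aug;
        imdb.images.labels(idx) = imdb_Old.images.labels(i);  % same class as the original
        imdb.images.set(idx)    = imdb_Old.images.set(i);     % same train/validation flag
    end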
The requirement on the labels is obvious: the new image belongs to the same class as the image it was derived from. The requirement on the set values ensures that if the original image was a validation image, all the images derived from it are also validation images. This keeps the validation images from being too similar to the training images; otherwise, our validation error would be too optimistic.
Deep neural networks have to be trained on large amounts of data, and we do not have nearly enough. However, we can use a network trained on a massive amount of data for a different task. Recall that a neural network can be viewed as learning features. Visual features learned for one recognition task are likely to be useful for another task. In this problem, we will take a network trained for the ImageNet competition (1000 classes), extract features from it, and use them for our task.
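As a rough illustration of the idea, the sketch below loads a pretrained MatConvNet ImageNet model and uses the activations of a late layer as a feature vector. The model file name, the example image, and the exact layer index are assumptions, and the normalization field names (net.meta.normalization) follow recent MatConvNet releases and may differ for older model files.

    % Feature extraction with a pretrained ImageNet network (sketch).
    net = load('imagenet-vgg-f.mat');                    % assumed pretrained model file

    im = single(imread('example.jpg'));                  % placeholder image
    im = imresize(im, net.meta.normalization.imageSize(1:2));
    im = im - net.meta.normalization.averageImage;       % subtract the mean image

    res      = vl_simplenn(net, im);                     % forward pass through all layers
    features = squeeze(res(end-2).x);                    % activations of a late layer as features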