Abdulwahab Kabani, Winter 2017,
"Improving Deep Learning Image Recognition Performance
Using Region of Interest Localization Networks",
Computer Science Department, University of
Western Ontario, Canada.
Ph.D. Thesis Abstract
Deep learning has been gaining momentum and achieving state-of-the-art
results on many visual recognition problems. The roots of this field can be
traced back to the 1940s, but only recently has it started delivering strong
results on many image understanding problems. This
is mainly due to the availability of powerful hardware that can accelerate the
training process. In addition, the growth of the Internet and imaging devices
such as mobile phones and cameras has contributed to the increase in the amount
of data that can be used to train neural networks. All of these factors have
contributed to the success of deep learning on large scale image understanding
tasks.
Many image understanding problems, however, lack large training datasets. This is
especially true of special-purpose datasets such as medical images,
astronomical images, and environmental images. These applications do not have
large training datasets because, unlike natural images, such images are not
typically taken by everyday users and uploaded to the web. In addition, some of these
applications, such as medical imaging, have many restrictions on sharing the
data in order to protect the privacy of the patients. Finally, natural images
can be labeled by almost anyone, whereas special-purpose datasets require
domain expertise. For example, in medical imaging, the images must be
labeled by medical or clinical experts in the field. This results in datasets
that are normally much smaller than natural image datasets, as these experts
have limited time to invest in the creation of the training sets. Fortunately, in
many of these applications, the most discriminative features may be present in
a small region of interest.
In this work, we present a method for training deep learning models on problems
with a small number of training images. We do so by localizing a region of
interest in these images, which helps reduce overfitting. In
this thesis, two localization architectures are introduced: the naive
localization network and the wide localization network (wide net). The latter
has several advantages, which we explain thoroughly. The first problem we
study is right whale recognition. The problem involves
recognizing whales from aerial images by analyzing the callosity patterns on
their heads. We study how localizing the region of interest can be used to
make deep learning work on such a small dataset. The second problem we
study is the estimation of the ejection fraction and left ventricle volume by
analyzing cardiac MRI images. Automatically estimating the ejection fraction
and volume of the heart can help in identifying and diagnosing several cardiac
health issues. As with the whale dataset, this dataset contains only a small number of training
subjects.
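
For readers unfamiliar with the quantity being estimated, the ejection
fraction can be illustrated with its standard clinical definition: the
fraction of the end-diastolic volume (EDV) that the left ventricle ejects
by end-systole (ESV). The following is a minimal sketch of that definition
only, not the estimation method of the thesis; the function name, the
parameter names, and the example volumes are hypothetical.

```python
def ejection_fraction(edv_ml: float, esv_ml: float) -> float:
    """Return the ejection fraction as a percentage.

    edv_ml: end-diastolic left ventricle volume in millilitres.
    esv_ml: end-systolic left ventricle volume in millilitres.
    Both names are illustrative, not taken from the thesis.
    """
    if edv_ml <= 0:
        raise ValueError("end-diastolic volume must be positive")
    if not 0 <= esv_ml <= edv_ml:
        raise ValueError("end-systolic volume must lie in [0, EDV]")
    # EF = (EDV - ESV) / EDV, expressed as a percentage.
    return 100.0 * (edv_ml - esv_ml) / edv_ml


if __name__ == "__main__":
    # Hypothetical volumes chosen only to exercise the formula.
    print(f"{ejection_fraction(120.0, 50.0):.1f}%")
```

In this framing, a model that predicts the two volumes from cardiac MRI
yields the ejection fraction directly through the formula.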