Abstract
Recent research has shown that deep learning methods perform well on supervised image classification tasks. The purpose of this study is to apply deep learning methods to classify brain images containing different tumor types: meningioma, glioma, and pituitary. A publicly released dataset contains 3,064 T1-weighted contrast-enhanced MRI (CE-MRI) brain images from 233 patients with either meningioma, glioma, or pituitary tumors, split across axial, coronal, and sagittal planes. This research focuses on the 989 axial images from 191 patients in order to avoid confusing the neural networks with three different planes containing the same diagnosis. Two types of neural networks were used for classification: fully connected and convolutional neural networks. Within these two categories, further tests were run using augmentation of the original 512×512 axial images. Through rotating, shifting, and mirroring, dataset sizes could be increased at the cost of image clarity. Training neural networks on the axial data has proven accurate, with an average five-fold cross-validation accuracy of 91.43\\% for the best trained neural network. This result demonstrates that a more general method (i.e. deep learning) can outperform specialized methods that require image dilation and ring-forming subregions on tumors.
INTRODUCTION
Treatment for challenging and severe patient diagnoses is often delayed due to a lack of automated diagnostic tools and a limited number of people on the medical team. Medical staff rely on manual evaluation of each patient and his or her test results, and, without automated tools, there is not only a higher risk of misdiagnosis but also difficulty in quickly completing simple cases so that more time can be focused on challenging cases. Doctors and radiologists must take the time to manually review all test results and images rather than concentrating on reviewing and treating complex diagnoses. In order to improve patient care, enhanced medical technology in the form of automated tools is necessary to increase efficiency. The purpose of this research is to develop automated methods that aid doctors in diagnosis in order to prevent misdiagnosis and prioritize difficult patient diagnoses. In particular, this research achieves this automation through the classification of brain tumor types from patient brain images. Brain images require a radiologist to examine multiple image slices to determine health issues, which takes time. Our goal is to confidently identify brain cancer types to reduce this burden, leaving the most complex diagnoses to medical specialists.
Previous research has developed specialized methods for automated brain tumor classification. Cheng et al.\\cite{Cheng15} created a public brain tumor dataset containing images from 233 patients with one of three brain tumor types: meningioma, glioma, and pituitary. Additionally, the dataset has images categorized into three sets: axial, coronal, and sagittal images. These sets represent the planes in which the brain was scanned; they correspond to the transverse, frontal, and lateral planes respectively. The images originated from 233 patients, so many of the images are from the same patient. Examples of these images can be seen in Figure 1. Cheng et al. used image dilation and ring-forming subregions on tumor regions to increase brain tumor classification accuracy to 91.28\\% using a Bag of Words (BoW) model.
Our research improves on previously presented results using a general method of neural networks without extensive specialized processing. The generalizability of neural networks has been demonstrated in a variety of fields, outperforming other specialized methods\\cite{Krizhevsky12, LeCun98, Dieleman15}. Furthermore, the combination of neural networks and medical research has shown promising results\\cite{Sklan15, Lasko13}, even in its infancy, as larger medical datasets are released. The main contributions of this paper are as follows: (1) developing a generalized method for brain tumor classification using deep learning and (2) empirically evaluating convolutional neural networks on axial images, reporting both per image and per patient accuracy.
Three main types of neural networks (NNs) have been researched: fully connected NNs (FCNNs), convolutional NNs (CNNs), and recurrent NNs (RNNs). For this study, CNNs are primarily used given that the inputs are images, though FCNNs are also examined. Though there have been prior attempts to apply machine learning to medical data, there is a lack of tools utilizing modern advances in neural networks. While extensive research has successfully applied these techniques to recognizing patterns in non-medical images\\cite{Krizhevsky12}, the proposed research applies them to medical images. Furthermore, applying neural networks to medical images holds the promise of faster and more precise diagnoses.
RELATED WORK
A public brain tumor dataset was created from Nanfang
Hospital, Guangzhou, China, and General Hospital, Tianjing
Medical University, China from 2005 to 2012 and was
used in Cheng et al.\\cite{Cheng15} to classify brain tumors in these
images. Three approaches were used to analyze
this dataset: intensity histogram, gray level co-occurrence
matrix (GLCM), and bag-of-words (BoW). In these approaches, Cheng et al. augmented the tumor
region through image dilation in order to enhance the surrounding
tissue and provide insights into the tumor type.
Furthermore, Cheng et al. created increasing ring formations around the tumor at normalized Euclidean distances in order to use spatial pyramid matching (SPM) to discover local features. In BoW, the local features are then
extracted through dictionary construction and histogram
representation, which are then fed into a feature vector to
be trained on a classifier. Out of all three methods, BoW
gave the highest classification accuracy with 91.28\\%. Yet,
this classification method is highly specialized, requiring
zooming into the tumor or region of interest and knowledge
of tumor existence. In contrast, neural networks
are generalizable and can discover local features from image
input alone.
Neural networks and their generalizability have only risen to prominence in recent years. After falling out of favor in the late 1990s, deep learning resurfaced when Hinton et al.\\cite{Hinton06} in 2006 introduced
the method of pre-training hidden layers one at
a time through unsupervised learning of restricted Boltzmann
machines (RBMs). Hinton demonstrated an effective
method of training neural networks through greedily stacking
RBMs. Since then, the field of deep learning has expanded, producing more efficient methods of training neural networks, and has quickly become the state of the art.
Examples of modern neural networks are shown in Figure
2.
While originally introduced in 1998 by LeCun et al.\\cite{LeCun98}, convolutional neural networks gained popularity when, in 2012, Krizhevsky et al.\\cite{Krizhevsky12} designed a winning convolutional neural network for the ImageNet competition that performed considerably better than the previous state-of-the-art model. The computer vision community adopted neural networks as the state of the art after the competition, realizing the potential of convolutional neural networks for image classification. Since 2012, convolutional neural networks have dominated other classification competitions, including the Galaxy Zoo Challenge that ran from 2013 to 2014. Dieleman et al.\\cite{Dieleman15} demonstrated how data augmentation can greatly increase dataset size through transformations, rotations, and translations of images. Adding image transformations prevented overfitting and led to more generalized learning.
Preventing overfitting in neural networks has been a main focus of much research, and in 2014 Srivastava et al.\\cite{Srivastava14} introduced dropout as a simple way to prevent co-adaptation of neurons. Dropout randomly drops neuron connections with a given probability, causing neuron units to become more independent rather than relying on other neurons to detect features. Similarly, Goodfellow et al.\\cite{Goodfellow13} created a neural network layer called the maxout layer, which was designed to work in conjunction with dropout. Maxout layers are equivalent to the fully connected layers found in a standard feed-forward multilayer perceptron, but they use a new activation function, the maxout unit, which takes the maximum of its linear activations.
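To make the idea of dropout concrete, the following Python sketch applies (inverted) dropout to a layer's activations; the function and variable names are illustrative rather than taken from any particular library, and the probability of 0.5 is only an example value.

\begin{verbatim}
import numpy as np

def dropout(activations, p_drop=0.5, train=True):
    # Drop each unit independently with probability p_drop during training.
    if not train:
        return activations  # at test time every unit is kept
    mask = (np.random.rand(*activations.shape) >= p_drop).astype(activations.dtype)
    # Inverted dropout: rescale kept units so expected activations match test time.
    return activations * mask / (1.0 - p_drop)

# Example: a batch of 4 examples with 800 hidden units, half dropped on average.
hidden = np.random.randn(4, 800).astype(np.float32)
dropped = dropout(hidden, p_drop=0.5)
\end{verbatim}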
In addition to preventing overfitting, deep learning research has produced faster ways to train neural networks. Glorot et al.\\cite{Glorot11} showed that rectified linear units (ReLUs) perform much faster in supervised training of deep neural networks than logistic sigmoid neurons and perform as well as, if not better than, the hyperbolic tangent. This is due to the ReLU's nonlinear nature, which creates sparse representations that work well for naturally sparse data. To further improve training time, a form of momentum update called Nesterov's momentum\\cite{Nesterov83} was adopted as the neural network's update formula. Nesterov's momentum takes the gradient at a future location, following the momentum from previous updates that has been directing updates in a particular direction. This differs from standard momentum, where the gradient is taken at the current location.
Convolutional Neural Network
Convolutional neural networks are a type of neural network designed specifically for images and have been shown to perform well on various supervised classification tasks\\cite{LeCun98}.
There have been several variations of convolutional neural
networks created with commonalities in structure, including
the use and ordering of convolutional, pooling, and
dense layers.
Convolutional neural networks were created with the assumption that nearby inputs are highly related to one another. In the case of images, the inputs are pixels, and pixels next to each other have a stronger correlation with each other than with pixels farther away. With this assumption in mind, convolutional neural networks focus on local regions of the images in order to extract local features. The extraction of local features is performed in the convolutional layers. As the number of convolutional layers increases, these local features build upon one another to form higher-order features, combining to understand the image in its entirety. Extracting local features starts by selecting a filter size, or local receptive field, where a neuron in the convolutional layer takes in a particular k x j subregion of the image. In order for each neuron in the convolutional layer to take in a different block of k x j pixels, convolutional layers can add a stride, which shifts the k x j window over by the given amount. This implies that k x j pixel subregions can overlap with each other, which, depending on the size of the stride, typically helps the convolutional neural network extract local features since overlapping subregions contain related pixels.
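As a concrete illustration of filter size and stride, the standard formula for the spatial size of a convolutional (or pooling) layer's output is sketched below in Python; the padding value is an assumption, since the text does not specify one.

\begin{verbatim}
def conv_output_size(input_size, filter_size, stride, padding=0):
    # Standard output-size formula along one spatial axis.
    return (input_size + 2 * padding - filter_size) // stride + 1

# A 5 x 5 filter with stride 1 over a 256 x 256 image: neighboring receptive
# fields overlap in 4 of their 5 columns, and the output is 252 x 252.
print(conv_output_size(256, 5, 1))  # 252
# A 2 x 2 max-pool with stride 2 then halves each spatial dimension.
print(conv_output_size(252, 2, 2))  # 126
\end{verbatim}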
Pooling layers are often paired with convolutional layers in order to reduce dimensionality in the neural network and to aggregate pixel values so as to make the layer insensitive to small changes in pixel values. Pooling is a form of subsampling which produces a smaller output from its input by applying a function such as max or average over a k x j subregion to represent the entire subregion in the output. Common values of k and j are 2
and 3. Convolutional and pooling layers in neural networks
are often fed into fully connected or dense layers (i.e. all
neurons in a layer are connected to each neuron in the following
layer). Since fully connected layers have full connections
to all activations in the previous layer, fully connected
layers perform high-level reasoning in the neural
network. For each neuron in every layer besides pooling layers, a nonlinearity function is applied in the convolutional neural network; otherwise, consecutive layers could be collapsed into one layer, since a composition of linear functions can be replaced by a single linear function. The last layer in a convolutional neural network is often a softmax layer, which is used to determine classification probabilities. The softmax layer's neurons represent the probabilities of an image belonging to each particular category.
Neural network research has recently started to combine with medical research, in conjunction with the recent rise in large quantities of accessible medical data. Many of the concepts and ideas from past neural network research are applied directly in this study.
MODEL
Convolutional neural networks are the focus of the presented research and are used in conjunction with the brain images to produce tumor class probabilities. For the best performing convolutional neural networks, convolutional layers have a filter size of 5 x 5 and a stride of 1, which creates overlapping receptive fields. The best convolutional neural networks use max-pooling layers, extracting local features from 2 x 2 subregions in order to cut the dimensionality to one-fourth of its size at each use. Each layer in the convolutional neural networks, excluding max-pooling and softmax layers, applies ReLUs for nonlinearity. The last layer of the convolutional neural networks is the softmax layer containing 3 neurons representing the probabilities of the brain image containing each of the three types of brain tumors.
ALGORITHMS AND IMPLEMENTATION
Several algorithms are utilized in the construction of a neural
network ranging from updating weights to calculating
loss or error. This section will review the various algorithms
incorporated into convolutional neural network and
the specifics on implementing the layers mentioned in the
previous section.
Forward Pass
A convolutional neural network is a type of feedforward neural network containing convolutional layers. Feedforward neural networks are neural networks in which the forward pass in training is computed with no loops in neuron connections; each layer may only be connected to previous layers. When moving to a convolutional or fully connected layer, a set of weights and a bias are applied to all of the connected neurons from the previous layer in order to sum them together. This can be seen as applying a certain weight to a certain pixel and adding a bias. This formula can be seen below for a certain neuron i in a certain convolutional or fully connected layer l receiving input.
\\[a_i^l = \\sum_{j=1}^{n} W_{ij}^{l} x_j + b_i \\]
In this formula, j indexes the inputs into neuron i. The nonlinearity ReLU is then applied to layer $l$ neuron i's sum (or activation) $a_i^l$ to produce a new value $z_i^l$.
\\[z_i^l = max(0, a_i^l) \\]
These two formulas are applied to every neuron in a convolutional or fully connected layer in order to obtain each neuron's nonlinear activation. For max-pooling layers, the max function is applied over k x j subregions in order to output the maximum value of each subregion, and this is applied over the entire input while respecting the given stride. The last layer uses the softmax function instead of the ReLU function in order to assign probabilities of the image being a certain type of tumor.
\\[ z_i^l = \\frac{e^{a_i^l}}{\\sum_k e^{a_k^l}} \\]
The denominator sums over the exponentiated activations of all output neurons. The prediction for an image is obtained by choosing the class with the highest probability. In order to learn from these probabilities, first the loss or error of the predictions is calculated. To calculate loss, these convolutional neural networks use the categorical cross-entropy loss function.
\\[ L = -\\sum_j t_j \\log(p_j) \\]
In the above formula, $t_j$ is the target label (1 for the correct class and 0 otherwise), and $p_j$ is the predicted probability for that class from the neural network's softmax layer. Given this summed error, an average categorical cross-entropy loss is calculated by dividing by the total number of training examples m.
\\[ \\frac{1}{m} L \\]
In addition to categorical cross-entropy, it is common to add regularization to the loss in order to prevent weights from growing too large in magnitude, which is prone to overfitting. In this neural network, weight decay uses L1 regularization.
\\[R = \\frac{\\lambda}{m} \\sum_w |w| \\]
In the above formula, w represents the weights in the neural network, m is the number of training examples, and $\\lambda$ is the regularization constant, or strength. The regularization constant is a hyperparameter which can vary based on the design of the convolutional neural network. In this convolutional neural network, a regularization strength of $10^{-4}$ is used. This regularization is combined with the categorical cross-entropy to give the overall cost function.
\\[ C = \\frac{1}{m} L + R = -\\frac{1}{m} \\sum_j t_j \\log(p_j) + \\frac{\\lambda}{m} \\sum_w |w| \\]
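A minimal NumPy sketch of this cost computation is given below; the variable names are illustrative, targets are assumed to be integer class labels, and the weights are passed in as a list of arrays.

\begin{verbatim}
import numpy as np

def softmax(a):
    # Subtract the row-wise max for numerical stability; rows are examples.
    e = np.exp(a - a.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def overall_cost(last_layer_activations, targets, weights, lam=1e-4):
    # Average categorical cross-entropy plus L1 weight decay, as in the formula above.
    m = last_layer_activations.shape[0]
    p = softmax(last_layer_activations)
    cross_entropy = -np.log(p[np.arange(m), targets]).sum() / m
    weight_decay = lam / m * sum(np.abs(w).sum() for w in weights)
    return cross_entropy + weight_decay
\end{verbatim}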
Backward Pass
The neural network can now use backpropagation to update its weights and biases by propagating this error backwards through the network. The error is propagated back until the inputs are reached and backpropagation has visited all parameter weights W and biases b, which are updated along the way in order to minimize the overall cost. In order to change the parameters in a direction that minimizes the cost function, partial derivatives are used with respect to each parameter, starting with the partial derivatives of the cost function with respect to the weights and biases.
\\[\\frac{\\partial C}{\\partial W^l}, \\frac{\\partial C}{\\partial b^l} \\]
In the above formula, $l$ is the current layer, starting with the last layer. The partial derivatives are used to update the weights connected to the last layer containing the softmax function. In order to continue updating previous layers' weights and biases, the chain rule is applied from the current layer to the previous layer. This is done by finding the partial derivative of the cost with respect to the current layer's output $z^l$ and multiplying it by the partial derivatives of $z^l$ with respect to the previous layer's weights, biases, and outputs, as shown below.
\\begin{gather*}
\\frac{\\partial C}{\\partial W^{l - 1}} = \\frac{\\partial C}{\\partial z^l}\\frac{\\partial z^l}{\\partial W^{l - 1}} \\\\
\\frac{\\partial C}{\\partial b^{l - 1}} = \\frac{\\partial C}{\\partial z^l}\\frac{\\partial z^l}{\\partial b^{l - 1}} \\\\
\\frac{\\partial C}{\\partial z^{l - 1}} = \\frac{\\partial C}{\\partial z^l}\\frac{\\partial z^l}{\\partial z^{l - 1}}
\\end{gather*}
This can be computed for any layer $l$ by continuing backpropagation. Now the gradient for each parameter can be used to update the parameters using Nesterov's momentum\\cite{Nesterov83}.
\\begin{gather*}
\\hat{v}^l = v^l \\\\
v^l = \\mu v^l - \\eta \\frac{\\partial C}{\\partial p^l} \\\\
p^l = p^l - \\mu\\hat{v}^l + (1 + \\mu)v^l
\\end{gather*}
In the above equations, p represents a parameter, l is the layer, $\\hat{v}$ is the current velocity, v is the updated (lookahead) velocity, $\\eta$ is the learning rate, and $\\mu$ is a hyperparameter momentum constant in Nesterov's momentum whose common values include 0.5, 0.9, 0.95, and 0.99. In this research $\\mu$ is 0.9. In the second equation above, the partial derivative is the gradient for p. With this new set of weights and biases, the neural network has completed one epoch, which consists of one forward and one backward pass. Neural networks train through multiple epochs, and, for this research, 100 and 500 epochs are used to train the neural networks.
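The update rule above can be sketched in NumPy as follows, with $\\eta$ written as lr and $\\mu$ as momentum; the function and argument names are illustrative.

\begin{verbatim}
import numpy as np

def nesterov_update(param, velocity, grad, lr=1e-4, momentum=0.9):
    # One Nesterov momentum step following the three equations above.
    v_prev = velocity.copy()                    # v_hat = v
    velocity = momentum * velocity - lr * grad  # v = mu * v - eta * dC/dp
    param = param - momentum * v_prev + (1.0 + momentum) * velocity
    return param, velocity
\end{verbatim}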
Approach
In this section, we describe the brain image dataset and our approach to practically training our convolutional neural networks. We first describe the images in the brain tumor dataset. We then describe processing and augmentation of images in order to gain more training data. Lastly, we discuss the various models that were trained from the transformed images.
Data
In order to focus on the axial images alone and avoid confusing the neural networks with three different planes containing the same diagnosis, 989 axial images from 191 patients were separated from the original dataset created by Cheng et al. Out of these T1-weighted contrast-enhanced images, we have 208 meningioma, 492 glioma, and 289 pituitary tumor images. These images originated from 191 patients, so several of the images are from the same patient.
Preprocessing
Image data were based on 2D slices originally acquired at a size of 512×512. In order to train our neural networks, downsizing the images was required due to memory constraints. The slices were then preprocessed into three categories: vanilla data, tumor location, and tumor zoom. Each form of preprocessing is described below.
Vanilla Data Preprocessing
Vanilla data used images as the only input to the neural networks.
Tumor Location
Tumor location used both images and the provided tumor locations. The brain tumor dataset provided the tumor location for each image as a set of points that described the tumor boundaries. In order to provide this to a neural network, the maximum and minimum boundary points in the width direction x and height direction y were determined.
Tumor Zoom
Rather than provide the neural network the maximum and minimum boundary points in the width and height directions, these values were used to zoom into the tumor region of each brain scan. In order for each image to have a consistent size, the minimum box needed to contain every tumor was determined. To find this box, we found the minimum width and height needed to contain each tumor. The width was determined via the difference between the minimum x and maximum x, and the height was determined via the difference between the minimum y and maximum y. This preprocessing was based on the note from Cheng et al.\\cite{Cheng15} stating that the tissue surrounding the tumor can give insight into tumor type.
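A Python sketch of this preprocessing step is shown below; the dataset's boundary points are assumed to be an array of (x, y) coordinates, and centering the fixed-size box on the tumor is an assumption, since the exact placement of the box is not described.

\begin{verbatim}
import numpy as np

def zoom_to_tumor(image, boundary_points, box_w, box_h):
    # Bounding extents of the tumor from its boundary points (x = width, y = height).
    xs, ys = boundary_points[:, 0], boundary_points[:, 1]
    x_min, x_max, y_min, y_max = xs.min(), xs.max(), ys.min(), ys.max()
    # Place a fixed box_w x box_h window around the tumor (assumed centered here)
    # so that every zoomed image has a consistent size.
    cx, cy = int((x_min + x_max) // 2), int((y_min + y_max) // 2)
    x0 = int(np.clip(cx - box_w // 2, 0, image.shape[1] - box_w))
    y0 = int(np.clip(cy - box_h // 2, 0, image.shape[0] - box_h))
    return image[y0:y0 + box_h, x0:x0 + box_w]
\end{verbatim}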
Image Augmentation
Within each preprocessing category, various image transformations were applied. Large image sizes have implications for not only neural network training time but also memory usage. Original brain tumor images were scaled down through bilinear interpolation to several smaller image sizes due to memory constraints. While several image sizes were tested, we mainly discuss the two extremes of the downsizing since they performed best in terms of accuracy and training time.
Large Image Size
Large image sizes consisted of images scaled to 256 x 256 pixels, which required downscaling the brain tumor images from 512 x 512.
Small Image Size
Similar to Dieleman et al.\\cite{Dieleman15}, brain tumor images were provided at a large size of 512 x 512. While scaling images to 256 x 256 solved the memory issues, the time to train our models was still very high. In order to speed up training, images were cropped and shrunk. Images were first cropped to 412 x 412 by removing 50 pixels from each side. These border pixels were mostly unimportant, containing no information about the brain itself and holding constant pixel values of 0. Since all brain images were nearly centered in their images, only minor portions of the edges of a few brain images were affected. Images were then reduced to 69 x 69 in size by downscaling. This increased training speed by a factor of 10. A small image size of 64 x 64 was created as well by downscaling from the original size of 512 x 512, with similar increases in training speed.
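A sketch of this crop-and-shrink step using Pillow for the bilinear downscaling is shown below; the library choice is an assumption, as the paper does not state which tool was used.

\begin{verbatim}
import numpy as np
from PIL import Image

def crop_and_shrink(image_512, border=50, out_size=69):
    # Remove the mostly-empty 50-pixel border (512 -> 412), then downscale
    # with bilinear interpolation to a small square image.
    h, w = image_512.shape
    cropped = image_512[border:h - border, border:w - border]
    small = Image.fromarray(cropped.astype(np.float32)).resize(
        (out_size, out_size), Image.BILINEAR)
    return np.asarray(small, dtype=np.float32)
\end{verbatim}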
Counteracting Overfitting
Convolutional neural networks have a high number of learnable parameters; cutting-edge neural networks have millions of learnable parameters and rely on a large number of images to train. With the limited dataset of brain tumor images, our neural networks were at high risk of overfitting. Overfitting can occur when neural networks' weights memorize training data rather than generalize the input to learn patterns in the data. This often happens with small datasets. We applied several methods to prevent overfitting, including data augmentation, regularization through dropout, and the parameter sharing implied by the rotations and transformations of images mentioned below.
Like many images, brain tumor image classifications are invariant under translations, transformations, and rotations. This allows for several forms of data augmentation to be exploited. Data augmentation has proven useful in expanding small datasets\\cite{Dieleman15} to prevent overfitting. In a set of the tests run on the images, several forms of data augmentation were applied.
\\begin{enumerate}
\\item \\textbf{Rotation}: Images were rotated with an angle between 0$\\degree$ and 360$\\degree$ that was randomly taken from a normal distribution.
\\item \\textbf{Shift}: Images were randomly shifted -4 to 4 pixels left or right and up or down. These minor shifts were taken from a normal distribution and kept brains in the center of the image but changed the location of the brains enough to avoid memorization of a location in the image rather than relative to the brain itself.
\\item \\textbf{Mirror}: Each image was mirrored across its y-axis (horizontally) with a probability of 0.5.
\\end{enumerate}
After these initial transformations, further augmentation was performed in order to increase the size of the training set each round. Each image was rotated 0$\\degree$ and 45$\\degree$. These images were then cropped to a size of 45 x 45 taking the four corners of the images as edges to produce 16 possible images. The above data augmentation was run on the training data every epoch of training in order to constantly introduce new images to the neural network every iteration. This augmentation affected training time very little. We will call this processing step CO for counteracting overfitting.
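A per-epoch sketch of the rotation, shift, and mirror transformations using scipy.ndimage is shown below; for simplicity the random values are drawn uniformly here, whereas the study describes drawing them from a normal distribution.

\begin{verbatim}
import numpy as np
from scipy import ndimage

def augment(image, rng=np.random):
    # Rotate by a random angle between 0 and 360 degrees, keeping the image size.
    angle = rng.uniform(0.0, 360.0)
    out = ndimage.rotate(image, angle, reshape=False, mode='constant', cval=0.0)
    # Shift by -4 to 4 pixels vertically and horizontally.
    dy, dx = rng.randint(-4, 5), rng.randint(-4, 5)
    out = ndimage.shift(out, (dy, dx), mode='constant', cval=0.0)
    # Mirror horizontally with probability 0.5.
    if rng.rand() < 0.5:
        out = out[:, ::-1]
    return out
\end{verbatim}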
Crop Averaging
Following Krizhevsky et al.\\cite{Krizhevsky12}, another form of data augmentation was implemented during training for 256 x 256 images, in which images were downscaled to 224 x 224 and five random patches of 196 x 196 were extracted from each of these training images in order to increase training data. When testing occurred, five 196 x 196 patches of each test image downscaled to 224 x 224 were extracted, one for each corner of the image and one for the center. The softmax probabilities for each of these patches were then averaged together to give averaged softmax probabilities.
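A sketch of this test-time crop averaging is shown below; predict_proba stands in for the trained network's forward pass and is a hypothetical callable.

\begin{verbatim}
import numpy as np

def five_patches(image_224, patch=196):
    # Four corner crops plus the center crop of a 224 x 224 image.
    h, w = image_224.shape
    cy, cx = (h - patch) // 2, (w - patch) // 2
    offsets = [(0, 0), (0, w - patch), (h - patch, 0), (h - patch, w - patch), (cy, cx)]
    return [image_224[y:y + patch, x:x + patch] for y, x in offsets]

def averaged_prediction(image_224, predict_proba):
    # Average the softmax probabilities over the five patches.
    probs = [predict_proba(p) for p in five_patches(image_224)]
    return np.mean(probs, axis=0)
\end{verbatim}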
Network construction
A variety of neural networks were constructed using the neural network library Lasagne\\cite{Lasagne}, which is backed by Theano\\cite{Theano}. The networks were based on the preprocessing of the image data, and each is described in detail in this section. For each neural network, every convolutional and fully connected layer applied the nonlinearity ReLU and dropout to help with regularization and counteract overfitting.
Convolutional Neural Network
This neural network represents taking only images as input. While many combinations of layers were tested, the best combination for this neural network was the following.
\\begin{itemize}
\\item Convolutional Layer with 64 filters of size 5 x 5 and stride of 1
\\item Max-pooling Layer with pool and stride size 2 x 2
\\item Convolutional Layer with 64 filters of size 5 x 5 and stride of 1
\\item Max-pooling Layer with pool and stride size 2 x 2
\\item Fully Connected Layer with 800 neurons
\\item Fully Connected Layer with 800 neurons
\\item Softmax Layer with 3 neurons
\\end{itemize}
We will refer to this neural network as CNN from now on in this paper.
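For reference, a sketch of this architecture in Lasagne (the library used in this work) is given below; the padding, dropout probabilities, and batch handling are assumptions rather than reported settings.

\begin{verbatim}
from lasagne.layers import (InputLayer, Conv2DLayer, MaxPool2DLayer,
                            DenseLayer, DropoutLayer)
from lasagne.nonlinearities import rectify, softmax

def build_cnn(input_var=None, image_size=256):
    # Grayscale input images: (batch, channels, height, width).
    net = InputLayer((None, 1, image_size, image_size), input_var=input_var)
    net = Conv2DLayer(net, num_filters=64, filter_size=(5, 5), stride=1,
                      nonlinearity=rectify)
    net = MaxPool2DLayer(net, pool_size=(2, 2), stride=2)
    net = Conv2DLayer(net, num_filters=64, filter_size=(5, 5), stride=1,
                      nonlinearity=rectify)
    net = MaxPool2DLayer(net, pool_size=(2, 2), stride=2)
    net = DenseLayer(DropoutLayer(net, p=0.5), num_units=800, nonlinearity=rectify)
    net = DenseLayer(DropoutLayer(net, p=0.5), num_units=800, nonlinearity=rectify)
    # Softmax layer with 3 neurons, one per tumor type.
    net = DenseLayer(DropoutLayer(net, p=0.5), num_units=3, nonlinearity=softmax)
    return net
\end{verbatim}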
Fully Connected Neural Network
This neural network represents taking only images as input as well, but it does not utilize any convolutional or max-pooling layers. This network consisted of the following layers.
\\begin{itemize}
\\item Fully Connected Layer with 800 neurons
\\item Fully Connected Layer with 800 neurons
\\item Softmax Layer with 3 neurons
\\end{itemize}
We will refer to this neural network as FCNN from now on in this paper.
Concatenation of Convolutional and Fully Connected Input Layers
This neural network represents providing more information than one image input. There are two versions of this neural network. Each version has a neural network synonymous to CNN from above. However, a second input layer exists representing (1) the same image input or (2) the maximum and minimum x and y representing the location of the tumor. This second input has its own neural network path that eventually concatenates with the CNN from before. This second neural network path consists of the following layers:
\\begin{itemize}
\\item Fully Connected Layer with (1) 800 neurons or (2) 4 neurons
\\item Fully Connected Layer with (1) 800 neurons or (2) 4 neurons
\\end{itemize}
The last layer of this path and the last fully connected layer from CNN were then concatenated together and connected to two final fully connected layers with 800 neurons each before reaching the softmax layer from CNN. We will refer to this neural network as ConcatNN from now on in this paper.
Random Forests
Random Forests were created in 2001 by Breiman$^{11}$, and they are a combination of tree predictors where each tree depends on a randomly sampled independent vector. Each tree is given features with minor amounts of perturbation in order to inject noise into the data, and noise is further injected at the model level through randomization of the attributes on which to split decisions. While random forests are not neural networks, they have become a common machine learning technique in medical research. To compare against the neural network models, a test was conducted using random forests with the best performing neural network's image dataset. The random forests used 10 trees, a max depth of 10, and 65,536 features from the 256 x 256 images.
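A sketch of this baseline with scikit-learn is shown below; the library choice and the flattening of the 256 x 256 images into 65,536 pixel features per example are assumptions about how the comparison was set up.

\begin{verbatim}
from sklearn.ensemble import RandomForestClassifier

def train_random_forest(train_images, train_labels):
    # Flatten each 256 x 256 image into a 65,536-dimensional feature vector.
    X = train_images.reshape(len(train_images), -1)
    # 10 trees with a maximum depth of 10, as described above.
    clf = RandomForestClassifier(n_estimators=10, max_depth=10)
    clf.fit(X, train_labels)
    return clf
\end{verbatim}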
Training
For each of the preprocessed datasets, patients were randomly placed into three sets for training, validation, and test with 149, 21, and 21 patients respectively. A patient represents all of that patient's images; this avoids mixing a patient's data between training and test, which would make predictions easier since images from the same patient are similar in structure. The mean picture from training was subtracted from the train, validation, and test sets in order to center the data. An example mean picture can be seen in Figure 3. This was found to produce higher accuracies than cases without subtraction of the mean picture. During the training of the neural networks, training data was used to update weights while validation data gave a glimpse into how the neural network was improving over time. After the training phase was completed, the test data was then used to see how well the neural networks predicted types of tumors from new images.
A variety of hyperparameters are available to alter. We list the hyperparameters that produced the highest accuracies.
\\begin{itemize}
\\item \\textbf{Regularization constant: } 0.014
\\item \\textbf{Learning rate: } 0.0001
\\item \\textbf{Momentum constant: } 0.9
\\item \\textbf{Batch size: } 4 for non-augmented datasets, 128 for augmented datasets
\\item \\textbf{Epochs: } 100 (and in one case 500), which was a compromise between accuracy and training time
\\end{itemize}
Decaying Learning Rate
Rather than maintaining a constant learning rate, a decaying learning rate was attempted in order to increase accuracies by decreasing the learning rate over time. However, each case of the decaying learning rate produced significantly worse accuracies than a constant rate.
Accuracy Metrics
Three different models were computed during validation in order to evaluate model performance on test data.
\\begin{itemize}
\\item
Last: The model after the last training epoch.
\\item
Best: The model at the epoch of the best validation accuracy for per image accuracies.
\\item
Best-PD: The model at the epoch of the best patient diagnosis validation accuracy for per patient accuracies.
\\end{itemize}
For each of the above models, two metrics were used to measure prediction quality: per image and per patient accuracy. Per image accuracies were calculated by dividing the number of correctly predicted images by the total number of images tested. Per patient accuracies were calculated by averaging the softmax probabilities from all images of the same patient before classification. Since doctors examine several brain slices to evaluate a patient, the motivation for the per patient metric was to evaluate patients by utilizing all available information about each patient. For each of these models, per image and per patient accuracies were used to evaluate test performance.
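A sketch of the per patient metric is shown below: softmax probabilities from all of a patient's images are averaged before taking the most probable class. Variable names are illustrative.

\begin{verbatim}
import numpy as np

def per_patient_accuracy(probs, patient_ids, labels):
    # probs: (n_images, 3) softmax outputs; patient_ids, labels: one entry per image.
    patients = np.unique(patient_ids)
    correct = 0
    for pid in patients:
        idx = patient_ids == pid
        avg = probs[idx].mean(axis=0)                    # average probabilities per patient
        correct += int(avg.argmax() == labels[idx][0])   # all images share one label
    return correct / len(patients)
\end{verbatim}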
Results
The accuracies for the conducted tests can be seen in Table 1. From these accuracies, we can see per patient accuracies were consistent with per image accuracies. This implies brain tumor accuracies varied little across a patient's images and were classified nearly the same. Training on larger images predicted tumors more accurately, producing 10\\% higher results than the neural networks that counteracted overfitting, even with the increase in epochs from 100 to 500. In order to further compare these models, the loss and accuracy histories for the training and validation sets were plotted for each model with average five-fold cross validation, and representative examples are given in Figure 4. While the loss histories show the 256 x 256 images overfit over time due to a lack of examples, their accuracies were consistently higher than those of smaller images. When looking at the precision at k (Figure 5), nearly all models remained above 90\\%. This metric uses the top k predictions with the highest probabilities over all images. This top-k result demonstrates that the CNN accurately classifies the images the classifier has high confidence about. Consistently having 90\\% precision implies the predictions with the highest probabilities were often correct. Of particular note, any model using 256 x 256 images had a precision of 1.0 from k = 1 to 20. Larger images produced better classification for a model's top predictions. From Table 1, we can see the best performing neural network was the convolutional neural network with image size 256 x 256 using images only (Vanilla CNN 256 x 256), which has the highest accuracy at 91.43\\%. The weights from the best neural network's first convolutional layer can be seen in Figure 6. Minor structures representing low-level features can be seen in each of these 5 x 5 weights.
We analyzed the best performing neural network further in Tables 2-5. As seen from Figure 4, the best performing neural network, Vanilla CNN 256 x 256, earned a perfect score for average precision at k from 1 to 20. In Tables 2-3, we increase k until there is an incorrect prediction for Vanilla CNN 256 x 256. For per image and per patient accuracy, the neural network on average reaches well over half of the images and patients before predicting an incorrect tumor type, with the best cross validation reaching 90\\% and 100\\% accuracy respectively. To see how the best neural network performs on each tumor type, we evaluate the precision and recall of the Best model for per image accuracy and the Last model for per patient accuracy (Tables 4-5). These two models performed the best in their respective accuracy measures. In Table 4, we can see meningioma tumors were the most difficult to predict, with an average of 0.84 precision and 0.74 recall, while glioma and pituitary had precision and recall in the mid-90\\%s. In Table 5, tumor type precision and recall are approximately equal, with averages of 93\\%, 93\\%, and 91\\% for meningioma, glioma, and pituitary tumors respectively.
Lastly, a random forest was run on the vanilla data in order to compare its results to the best performing neural network. The trained model consistently achieved averages close to 90\\% on test data for both per image and per patient accuracies, with considerable speedup compared to training neural networks.
Conclusion and Future Work
Convolutional neural networks demonstrate that a general method can outperform specialized methods using image dilation and ring-forming subregions when classifying brain tumors. Training convolutional neural networks to detect tumor types in brain images improves classification accuracy, requires only images to understand brain tumor types, and provides initial steps toward introducing deep learning into medicine. Furthermore, the per patient accuracy metric consistently remained at the level of the per image accuracy results, implying the neural network provides both consistent brain tumor predictions and similar accuracies across images of the same patient. Future work can build upon this research by exploring neural networks that train on coronal and sagittal images. Combining patient images across planes can increase dataset size and provide insights into tumor types that are difficult to view from only one plane. Furthermore, adding brain images without tumors may help further distinguish tumors in classification. Lastly, decreasing image size greatly improved the efficiency of training neural networks. Improving performance on smaller images can have great benefits in training and in assisting doctors in patient treatment. Dealing with noisy, smaller images can help neural networks understand more complex brain images.