Crop disease detection through Deep Convolutional Neural Network

INTRODUCTION
1.1 OVERVIEW
Agricultural land is more than just a food source in today’s world. The Indian economy is highly dependent on agricultural productivity, so the detection of plant diseases plays an important role in agriculture. Automatic disease detection techniques are beneficial for detecting a plant disease at a very early stage. For instance, little leaf disease is a hazardous disease found in pine trees in the United States. An affected tree shows stunted growth and dies within 6 years. Its impact has been observed in parts of the southern US such as Alabama and Georgia. In such scenarios, early detection could have been fruitful.
The existing method of plant disease detection is simply naked-eye observation by experts, who identify and detect plant diseases by sight. This requires a large team of experts and continuous monitoring of the plants, which is very costly for large farms. At the same time, in some countries farmers do not have proper facilities, or are not even aware that they can contact experts, so consulting experts is both costly and time-consuming. In such conditions, the suggested technique is beneficial for monitoring large fields of crops. Automatic detection of diseases from the symptoms visible on plant leaves makes detection easier as well as cheaper.
Visual identification of plant disease is a laborious task. It is also less accurate and can be done only in limited areas. An automatic detection technique takes less effort and less time, and it is more accurate. Common plant diseases include brown and yellow spots, early and late scorch, and other fungal, viral and bacterial diseases. A model for classifying crop diseases is proposed: the existing Deep Convolutional Neural Network architecture MobileNet is adapted for crop disease detection. The model recognizes 16 classes of leaf images covering both diseased and healthy leaves. A Deep Convolutional Neural Network takes considerable time to compute its layers and to train on the entire image dataset. To achieve fast performance, the system should be parallelized on a Graphics Processing Unit using Compute Unified Device Architecture (CUDA) programming. When this model is applied in practice, training takes 22 hours.
Many attempts have been made to understand why and how deep learning achieves such impressive performance. A full understanding of how to choose structural features and how to tune model hyperparameters efficiently is still far from reality. Currently, deep learning models need a significant amount of computation to reach state-of-the-art performance on large datasets in an offline environment. Adapting deep learning to precision agriculture poses the challenge of processing the large number of available images at high speed. In the field, robots can be employed to scan leaves across the whole field, and these scans become the inputs for training. Real-time images of leaves will contain noise and distortions. Thus, there is a need for a deep learning model for diagnosing crop diseases that trains efficiently and uses parallel computing to reach high accuracy in a short time.
1.2 OBJECTIVE
• To diagnose diseases in crops such as corn, banana, apple and grape through image classification using a deep learning technique.
• To enhance the performance of the system using parallel computing on a Graphics Processing Unit (GPU) and by using various classifiers.
• To compare the efficiency of plant disease detection using classifiers such as SVM, Decision tree and Extreme Learning Machine.
1.3 DEEP LEARNING
Deep Learning is a new area of Machine Learning research. It was introduced to move Machine Learning closer to one of its original goals: Artificial Intelligence. The Convolutional Neural Network (CNN), a popular Deep Learning architecture, has been used in a wide variety of computer vision applications. A CNN consists of one or more convolutional layers followed by one or more fully connected layers, as in a standard multi-layer neural network.
Figure 1.1 Convolutional Neural Network
The architecture of a CNN is designed to take advantage of the 2D structure of an input image. It achieves this with local connections and tied (shared) weights, followed by some form of pooling, which produces translation-invariant features. CNNs are also easier to train than fully connected networks with the same number of hidden units, and they have far fewer parameters to learn. The deep architecture comes from converting the image into a 3-dimensional volume and forming filters at every layer. The use of filters reduces the number of parameters to be learned. This project uses the concept of retraining the existing MobileNet network.
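The following Python sketch makes the parameter-saving argument concrete. It counts the learnable weights of a 3×3 convolutional layer and of a fully connected layer with the same number of hidden units. The input size (224×224×3) and the 32 filters are illustrative assumptions, not settings taken from this project.

```python
def conv_params(kernel_h, kernel_w, in_channels, out_channels):
    # Each filter has kernel_h * kernel_w * in_channels weights plus one bias,
    # and the same filter weights are reused at every image position.
    return (kernel_h * kernel_w * in_channels + 1) * out_channels


def dense_params(in_units, out_units):
    # Every output unit connects to every input unit, plus one bias per unit.
    return (in_units + 1) * out_units


# Illustrative sizes (assumed): 224x224 RGB input, 32 output feature maps.
h, w, c, k = 224, 224, 3, 32
conv = conv_params(3, 3, c, k)
# A fully connected layer with the same number of hidden units (h * w * k).
dense = dense_params(h * w * c, h * w * k)
print(conv)           # 896
print(dense // conv)  # the dense layer needs hundreds of millions of times more
```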
1.4 GRAPHICS PROCESSING UNIT
A general-purpose graphics processing unit (GPGPU) is a graphics processing unit (GPU) that performs ordinary mathematical and logical calculations usually performed by the CPU (Central Processing Unit). GPUs are normally used to render graphics for the application layer. GPGPUs take on tasks that were once performed on high-performance CPUs, such as encryption/decryption, bitcoin mining and other scientific computation. Graphics cards are built for massively parallel applications, which lets them outperform more powerful CPUs on many parallel tasks. A GPU consists of shader cores that render multiple pixels simultaneously. The same capability can process multiple images or data items simultaneously.
A high-end GPU may contain many shader cores, while a multi-core CPU might have just 8-10 cores. GPGPU computing has received increased attention since the advent of DirectX 10, which introduced unified shaders in its shader core specification. GPU companies such as AMD/ATI and Nvidia have their own APIs for GPGPU: AMD uses OpenCL, while Nvidia uses Compute Unified Device Architecture (CUDA) for programming GPUs.
The modules that can be parallelized must be identified. With CUDA programming, the model should be implemented so that it keeps the same accuracy as on the CPU but executes much faster. According to the NVIDIA site, GPUs suit deep learning operations such as processing thousands of images, computing many intermediate layers and extracting large sets of features.
Figure 1.2 CPU and GPU
1.5 NVIDIA GEFORCE GPU
GPU-accelerated computing is the use of a graphics processing unit (GPU) together with a CPU to accelerate deep learning, analytics, and engineering applications. Pioneered in 2007 by NVIDIA, GPU accelerators now power energy-efficient data centers in government labs, universities, enterprises, and small-and-medium businesses around the world. They play a huge role in accelerating applications in platforms ranging from artificial intelligence to cars, drones, and robots.
GPU-accelerated computing offloads compute-intensive portions of the application to the GPU, while the remainder of the code still runs on the CPU. From a user’s perspective, applications simply run much faster.
A graphics processing unit (GPU) is a specialized electronic circuit designed to rapidly manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. GPUs are used in embedded systems, mobile phones, personal computers, workstations, and game consoles.
Modern GPUs are very efficient at manipulating computer graphics and image processing, and their highly parallel structure makes them more efficient than general-purpose CPUs for algorithms where the processing of large blocks of data is done in parallel. In a personal computer, a GPU can be present on a video card, or it can be embedded on the motherboard or – in certain CPUs – on the CPU die.
A simple way to understand the difference between a GPU and a CPU is to compare how they process tasks. A CPU consists of a few cores optimized for sequential serial processing. A GPU, by contrast, has a massively parallel architecture of thousands of smaller, more efficient cores designed to handle multiple tasks simultaneously. The NVIDIA GeForce 940M and GT 610 are the two GPUs used in this work.
1.5.1 NVIDIA GeForce 940M
The NVIDIA GeForce 940M is a mid-range DirectX 11-compatible graphics card for laptops unveiled in March 2015. It is based on Nvidia’s Maxwell architecture (GM108 chip) and manufactured in 28 nm. The 940M offers 384 shader units as well as 2 GB of DDR3 memory (64-bit, 2000 MHz effective). The GM108 integrates the sixth generation of the PureVideo HD video engine (VP6), offering better decoding performance for H.264 and MPEG-2 videos. VP6 also supports all features of previous generations (4K support, PIP, video encoding via the NVENC API). However, HDMI 2.0 is not supported.
1.5.2 NVIDIA GeForce GT 610
The NVIDIA GeForce GT 610 is a dedicated graphics card for PCs. Its features include NVIDIA PureVideo HD Technology, Blu-ray 3D support, TrueHD and DTS-HD audio bitstreaming, Microsoft DirectX 11 support, NVIDIA CUDA Technology, NVIDIA PhysX Technology, NVIDIA FXAA Technology, NVIDIA Adaptive Vertical Sync, HDMI, dual-link DVI and PCI Express 2.0 support.
1.6 TENSOR FLOW
The crop disease detection model is written in Python using the TensorFlow framework. TensorFlow is an open-source library for numerical computation using data flow graphs. In these graphs, nodes represent mathematical operations and edges represent the tensors passed between them. TensorFlow's flexible architecture lets a single API deploy computation to one or more CPUs or GPUs in a desktop. Its features include deep flexibility, true portability, automatic differentiation and a choice of languages.
TensorFlow was studied as a tool for extracting features from leaves using deep learning. It provides the basic building blocks for a CNN, such as convolutional layers, pooling layers and optimization algorithms. The pre-trained CNN is fine-tuned for crop disease detection.
1.7 MATLAB
MATLAB (matrix laboratory) is a multi-paradigm numerical computing environment. A proprietary programming language developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, C#, Java, Fortran and Python. Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing abilities. An additional package, Simulink, adds graphical multi-domain simulation and model-based design for dynamic and embedded systems.
CHAPTER 2
LITERATURE REVIEW
There are many works related to crop disease diagnosis where features of diseased leaves are extracted using image processing techniques and classified using machine learning algorithms. The works focusing on deep learning architecture for computer vision applications are also reviewed.
[1] proposes a plant disease detection algorithm based on a genetic algorithm. The disease is identified from color differences, and the images in the dataset are segmented using the Genetic Algorithm. Disease samples from four plants were used: banana, beans, lemon and rose. The algorithm is fully automatic, so no user input is needed during segmentation. The results of the proposed algorithm are compared across different classifiers, and SVM produces the highest accuracy.
Fully connected and convolutional neural networks have been trained [2] to achieve state-of-the-art performance on a wide variety of tasks such as speech recognition, image classification, natural language processing and bioinformatics. For classification tasks, most of these “deep learning” models use the softmax activation function for prediction and minimize cross-entropy loss. The support vector machine is a widely used alternative to softmax for classification. In [2], deep neural networks are trained for classification using an L2-SVM. The lower-layer weights are learned by backpropagating the gradients of the top-layer linear SVM, and overfitting is reduced by adding Gaussian noise. The work compares softmax with a linear SVM in the last layer. Switching from softmax to a linear SVM is simple, and the reported test error rates are 0.99% for softmax and 0.87% for the linear SVM.
[3] discusses and designs various models for precision agriculture. The techniques range from rule-based approaches, deductive logic, inductive logic, fuzzy logic and decision trees to neural networks. It concludes that the need of the hour is an expert system that can handle many diseases, including the rarest ones, with high speed and accuracy.
[7] introduces a new approach to disease recognition in which a CNN is trained and fine-tuned to fit the plant disease dataset accurately. To avoid overfitting, the model is trained with high-resolution images focused on regions of interest (diseased spots). These images are well augmented with affine transformations, perspective transformations, rotation and scaling. The CNN is implemented in Caffe using the CaffeNet architecture, whose layers progressively compute features: 5 convolutional layers, 3 pooling layers, 2 normalization layers and 1 fully connected layer. CNN training is performed in GPU mode, with an overall accuracy of 96.3%.
[8] proposes a deep-learning-based classification method that combines convolutional neural networks (CNNs) and an extreme learning machine (ELM) to improve classification performance. A pretrained CNN is first used to learn deep and robust features. However, a traditional CNN uses fully connected layers as its classifier, so its generalization ability is limited and suboptimal. The method therefore feeds the CNN-learned features to an ELM classifier in place of the fully connected layers. Experimental results show that the proposed CNN-ELM classification method achieves satisfactory results.
[5] discusses a deep-learning-based classification method based on vein morphology. Vein features were extracted with PLANTGUI, and three machine learning algorithms were tested: Support Vector Machine, Penalised Discriminant Analysis and Random Forests. The convolutional neural network performs three transforms in each layer, and networks from two to six layers deep are trained. The accuracy is 92.6% for setup 1 and 96.9% for setup 2. The main result is that a standard deep learning model achieves high accuracy.
[6] identifies different types of plant disease using a deep convolutional neural network that can distinguish plant leaves from their surroundings. The model distinguishes 13 different types of plant disease from healthy leaves. The CNN is trained with Caffe, a deep learning framework, and the model achieves a precision of 96.3%.
[4] describes the large-scale image collection process of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) and provides information about the most successful algorithms. The goal of the paper is to take a closer look at the current state of categorical object recognition. ILSVRC has two categories of annotation, image-level and object-level. Its tasks are image classification, single-object localization and object detection. The paper also explores an improved convolutional neural network that combines the multi-scale idea with intuition gained from the Hebbian principle.
[9] trains a deep convolutional neural network to classify 1.2 million images into 1000 classes. The network consists of five convolutional layers, some followed by max-pooling layers, and three fully connected layers. A GPU is used to make training faster, and dropout is used to avoid overfitting in the fully connected layers.
CHAPTER 3
DEEP LEARNING APPLICATION FOR PLANT DISEASE DETECTION USING GPU
Images of crop leaves such as apple, banana, corn and grape are given as inputs. The images are augmented to produce a large training dataset. The number of layers and the characteristics of each layer are defined. The CNN includes convolutional, pooling and fully connected layers. As in the TensorFlow tutorial, images are converted to NumPy arrays of pixels for further processing. The CNN is fine-tuned to the crop disease dataset by retraining the MobileNet architecture. After retraining and testing, images are classified using a fully connected layer with softmax. The loss, precision and accuracy of the linear model and of the proposed convolutional model are calculated.
3.1 ARCHITECTURE
The architecture diagram as shown in Figure 3.1 depicts the modules present in this project.
Figure 3.1 Architecture diagram
The crop disease dataset is acquired from various sources and augmented to increase the size of the training set. The images are then fed to feature extraction. Classification is performed on the features generated by the CNN to predict the correct classes. The model should then be parallelized on the GPU. Figure 3.2 shows the feature extraction architecture based on the CNN.
Figure 3.2 Convolutional Neural Network Architecture for Classification
This project uses transfer learning, which retrains or fine-tunes an already trained model on a different image classification task instead of training from scratch. The training images and their class labels are the inputs for retraining. Each image is passed through the pretrained network. The activations of the layer just before the final fully connected layer (the bottleneck) are stored for every image in bottleneck files. Classification is then performed on the bottleneck values using softmax regression. The model outputs a categorical probability distribution, i.e. the probability that a test image belongs to each class label.
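The bottleneck idea can be sketched in Python as a cache. The expensive frozen layers run once per image, and every later training step reuses the stored vector. `extract_features` and the image file names are hypothetical stand-ins. In the actual pipeline the vectors come from MobileNet and are written to bottleneck files on disk.

```python
from collections import Counter

extract_calls = Counter()


def extract_features(image_name):
    """Hypothetical stand-in for the frozen pretrained layers.

    In the real pipeline this would run the image through MobileNet up to the
    bottleneck layer; here it returns a deterministic toy vector instead.
    """
    extract_calls[image_name] += 1
    return [float(len(image_name)), float(sum(map(ord, image_name)) % 7)]


bottleneck_cache = {}


def get_bottleneck(image_name):
    # Run the expensive frozen layers once per image and reuse the stored
    # vector afterwards (the role the bottleneck files play on disk).
    if image_name not in bottleneck_cache:
        bottleneck_cache[image_name] = extract_features(image_name)
    return bottleneck_cache[image_name]


images = ["apple_scab_01.jpg", "corn_rust_07.jpg", "grape_healthy_03.jpg"]
for epoch in range(5):
    for name in images:
        features = get_bottleneck(name)
        # ... update the final softmax layer using `features` here ...

print(dict(extract_calls))  # each image was passed through the network once
```

This design has one consequence. If a fresh random augmentation is applied to an image at every step, the stored vector no longer matches the distorted input. Caching of this kind therefore only helps when the frozen layers see identical inputs.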
3.2 IMAGE ACQUISITION
First, a plant disease dataset containing very common diseases is collected. The PlantVillage dataset is an open-access database of more than 50,000 images. It is maintained and expanded to support open-access machine learning algorithms that accurately classify crop diseases in a smartphone application. It provides images of leaves of various crops such as apple, maize (corn), banana and blueberry. This model is tested with images of maize leaves obtained from the PlantVillage dataset.
3.3 IMAGE AUGMENTATION
Deep networks need a large amount of training data to achieve good performance. To build a powerful image classifier from very little training data, image augmentation is usually required to boost the performance of deep networks. Image augmentation creates artificial training images by applying one or more processing operations, such as random rotation, shifts, shear and flips.
The following augmentation steps are used:
• Random scaling of the image - Having the object of interest at different scales is the most important aspect of image diversity. When the network is used by real users, the object in the image can be tiny or large. Sometimes the object even covers the entire image and is still not fully present (i.e. it is cropped at its edges).
• Rotating the image - Depending on the requirement, the object may need to be rotated by small angles. The problem with this approach is that it adds background noise. If the background of the image is a fixed color (say white or black), the newly added background can blend with the image. If it does not blend, however, the network may treat it as a feature and learn unnecessary features.
• Flipping left and right - This removes the network's bias towards assuming that certain features of the object appear only on a particular side. For example, the network should not learn that a banana tilts only to the right just because it does so in the base image. Note also that flipping produces a different set of images from rotation by multiples of 90 degrees.
• Adjusting brightness - This is a very important type of diversity. It helps the network learn the object of interest properly, and it also simulates the practical conditions under which users take images. The lighting conditions of the images are varied by adding Gaussian noise to the image.
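The four augmentations can be illustrated on a tiny grayscale image stored as nested Python lists. This is a minimal sketch, not the project's TensorFlow pipeline. Rotation by small arbitrary angles needs interpolation, so it is omitted. A 90-degree rotation is included only to show that flipping yields a different image from any rotation by a multiple of 90 degrees.

```python
import random


def flip_left_right(img):
    # Mirror each row so a feature on the left side appears on the right.
    return [row[::-1] for row in img]


def rotate_90(img):
    # Rotate clockwise by 90 degrees; shown only to compare with flipping.
    return [list(row) for row in zip(*img[::-1])]


def adjust_brightness(img, delta, noise_std=0.0, rng=random):
    # Shift every pixel by `delta`, optionally add Gaussian noise, and clip
    # the result to the valid 0..255 range.
    return [[min(255, max(0, round(p + delta + rng.gauss(0.0, noise_std))))
             for p in row] for row in img]


def random_scale_crop(img, scale, rng=random):
    # Nearest-neighbour upscale by `scale` (>= 1), then crop back to the
    # original size at a random offset, so the object appears larger.
    h, w = len(img), len(img[0])
    sh, sw = int(h * scale), int(w * scale)
    scaled = [[img[int(r / scale)][int(c / scale)] for c in range(sw)]
              for r in range(sh)]
    top, left = rng.randint(0, sh - h), rng.randint(0, sw - w)
    return [row[left:left + w] for row in scaled[top:top + h]]


leaf = [[10, 20, 30],
        [40, 50, 60],
        [70, 80, 90]]
print(flip_left_right(leaf))  # [[30, 20, 10], [60, 50, 40], [90, 80, 70]]
print(rotate_90(leaf))        # [[70, 40, 10], [80, 50, 20], [90, 60, 30]]
```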
3.4 CONVOLUTIONAL NEURAL NETWORK
The Convolutional Neural Network (CNN), a popular Deep Learning architecture, has been used in a wide variety of computer vision applications. A CNN is constructed with 1 input layer, many hidden layers and 1 output layer. Its layers include convolutional, max pooling and dropout layers. A CNN typically has one or more convolutional layers followed by one or more fully connected layers, as in a standard multi-layer neural network.
TYPES OF LAYERS:
• Convolutional Layer
• Max Pooling Layer
• Fully Connected Layer
3.4.1 CONVOLUTIONAL LAYER
The convolutional layer is the core building block of a CNN. It connects each hidden-layer neuron only to a local region of the input volume, instead of to all neurons of the previous layer. These local regions, called “receptive fields”, are produced by sliding filters over the whole image. In this model, following the TensorFlow tutorial, convolutions use a stride of one and zero padding so that the output is the same size as the input.
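The following minimal pure-Python sketch shows the stride-1, zero-padded convolution described above on a single-channel image. TensorFlow's `tf.nn.conv2d` computes the same thing with a stride of 1 and `padding="SAME"`. Real layers also sum over input channels and add a bias, but both are omitted here for brevity.

```python
def conv2d_same(image, kernel):
    """Stride-1 2D convolution with zero padding ("same" output size).

    Like CNN libraries, this computes cross-correlation (the kernel is not
    flipped). Assumes a square kernel with an odd side length.
    """
    h, w = len(image), len(image[0])
    k = len(kernel)
    pad = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total = 0.0
            for di in range(k):
                for dj in range(k):
                    r, c = i + di - pad, j + dj - pad
                    # Positions outside the image count as zeros (padding).
                    if 0 <= r < h and 0 <= c < w:
                        total += image[r][c] * kernel[di][dj]
            out[i][j] = total
    return out


ones = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
print(conv2d_same(ones, ones))  # [[4.0, 6.0, 4.0], [6.0, 9.0, 6.0], [4.0, 6.0, 4.0]]
```

The corner outputs are smaller because part of each 3×3 window falls on the zero padding.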
3.4.2 POOLING LAYER
The pooling layer reduces the spatial size of the representation. This in turn reduces the number of parameters to be learned and the computation in the network, and it helps control overfitting. The max pooling technique is used: a 2×2 filter slides over the feature map and keeps the maximum pixel value in each neighborhood, as shown below.
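The following sketch applies 2×2 max pooling to a small feature map. It assumes a stride of 2 (non-overlapping windows), the usual choice when the goal is to halve the spatial size. The text does not state the stride used in this project.

```python
def max_pool_2x2(fmap):
    """2x2 max pooling with stride 2: keep the largest value in each
    non-overlapping 2x2 window, halving height and width.
    Assumes the height and width are even."""
    return [
        [max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
         for j in range(0, len(fmap[0]), 2)]
        for i in range(0, len(fmap), 2)
    ]


fmap = [[1, 3, 2, 1],
        [4, 2, 0, 1],
        [5, 1, 7, 8],
        [0, 6, 3, 2]]
print(max_pool_2x2(fmap))  # [[4, 2], [6, 8]]
```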
3.4.3 FULLY CONNECTED LAYER
The pixel values produced by the series of conv_layer, pool_layer and drop_layer operations are flattened into a 1D input for the fully connected layer. The outputs of this layer are the class labels, and it computes the probability of each class label.
• SOFTMAX
The softmax function is used to classify multiple classes. It is a generalization of the sigmoid function. The softmax function provides a vector of probabilities over the class labels. To compute the cost function, however, the target labels must be in the same format, so they are one-hot encoded. The probabilities from the softmax function and the one-hot encoding matrix are then used to compute the distance function, which is the cross-entropy. The distance is smaller for the correct target class and larger for a wrong target class.
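The softmax, one-hot encoding and cross-entropy steps can be sketched in plain Python. The three scores are hypothetical logits for a three-class problem. The final check confirms that the loss is smaller when the true class is the one the model considers most likely.

```python
import math


def softmax(scores):
    # Subtract the largest score for numerical stability; outputs sum to 1.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def one_hot(label, num_classes):
    # Encode the true class index as a vector with a single 1.
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def cross_entropy(probs, target):
    # -sum_i target_i * log(probs_i); only the true-class term is non-zero.
    return -sum(t * math.log(p) for p, t in zip(probs, target) if t > 0)


scores = [2.0, 1.0, 0.1]                          # hypothetical logits
probs = softmax(scores)
loss_right = cross_entropy(probs, one_hot(0, 3))  # true class = most likely
loss_wrong = cross_entropy(probs, one_hot(2, 3))  # true class = least likely
print(loss_right < loss_wrong)                    # True
```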
• SVM
SVM is a supervised machine learning algorithm used for both classification and regression. An SVM represents the examples as points in space, mapped so that the classes are separated by a gap that is as wide as possible. A good separation is achieved by the hyperplane that has the largest distance to the nearest training data point of any class. For linearly separable data, two parallel hyperplanes are chosen, and the region bounded by them is called the margin. In multiclass SVM, labels are drawn from a finite set of several elements and assigned to every instance.
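The margin geometry can be checked numerically. The sketch below uses a hypothetical hyperplane x1 + x2 = 3 and four 2-D points placed exactly on the two supporting hyperplanes. Every point therefore satisfies t (w·x + b) = 1, and the margin width 2 / ||w|| equals twice the distance from a support vector to the separating hyperplane.

```python
import math


def decision_value(w, b, x):
    # Signed score w.x + b; its sign is the predicted side (+1 or -1).
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def norm(w):
    return math.sqrt(sum(wi * wi for wi in w))


def distance_to_hyperplane(w, b, x):
    # Geometric distance from x to the hyperplane w.x + b = 0.
    return abs(decision_value(w, b, x)) / norm(w)


def margin_width(w):
    # With the canonical scaling |w.x + b| = 1 at the support vectors, the
    # band between the two supporting hyperplanes is 2 / ||w|| wide.
    return 2.0 / norm(w)


w, b = [1.0, 1.0], -3.0   # hypothetical hyperplane x1 + x2 = 3
points = {(1.0, 1.0): -1, (2.0, 0.0): -1, (2.0, 2.0): +1, (3.0, 1.0): +1}
for x, t in points.items():
    print(x, t, t * decision_value(w, b, x), distance_to_hyperplane(w, b, x))
print(margin_width(w))
```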
• KNN
KNN (k-nearest neighbours) is one of the simplest non-parametric methods and is used for both classification and regression. Its input consists of the k closest training samples in the feature space. In classification, the output is a class membership. The object is assigned to the class most common among its k nearest neighbours (i.e. by majority vote of the neighbours). In regression, the output is the average of the values of the k nearest neighbours. KNN is an instance-based learning method: the function is only approximated locally, and all computation is deferred until classification.
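The following is a minimal KNN sketch in Python. There is no training step, because the samples are simply stored. All the work happens at query time, which is what "deferred" computation means. The 2-D feature vectors and labels are hypothetical.

```python
import math
from collections import Counter


def k_nearest(train_X, train_y, query, k):
    # Sort training samples by Euclidean distance to the query and keep k.
    return sorted(zip(train_X, train_y),
                  key=lambda pair: math.dist(pair[0], query))[:k]


def knn_classify(train_X, train_y, query, k=3):
    # Majority vote among the labels of the k nearest neighbours.
    votes = Counter(label for _, label in k_nearest(train_X, train_y, query, k))
    return votes.most_common(1)[0][0]


def knn_regress(train_X, train_y, query, k=3):
    # Regression variant: average of the k nearest neighbours' values.
    return sum(v for _, v in k_nearest(train_X, train_y, query, k)) / k


# Hypothetical 2-D feature vectors (e.g. reduced CNN features).
X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["healthy"] * 3 + ["diseased"] * 3
print(knn_classify(X, labels, (0.5, 0.5)))  # healthy
print(knn_classify(X, labels, (5.5, 5.5)))  # diseased
```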
• NAIVE BAYES
The Naïve Bayes classifier is a probabilistic classifier based on applying Bayes' theorem with strong independence assumptions between the features. That is, the value of one feature is assumed to be independent of the other features given the class. The number of parameters it needs is linear in the number of variables (features) in the learning problem, which makes it highly scalable. It is a conditional probability model that assigns class labels to instances according to their class probabilities.
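The following is a Gaussian naive Bayes sketch. Modelling each feature in each class with one normal distribution is an assumption here, since the text does not specify the likelihood. Each class stores one mean and one variance per feature, so the parameter count grows linearly with the number of features. The data are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    # Per class: prior P(c) and one mean and variance per feature.
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(features)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A tiny constant keeps variances positive on very small datasets.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (n / len(X), means, variances)
    return model


def predict_gaussian_nb(model, x):
    # Choose the class maximising log P(c) + sum_j log P(x_j | c); adding
    # per-feature log-likelihoods is exactly the independence assumption.
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for xj, m, v in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * v) - (xj - m) ** 2 / (2 * v)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


X = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [5.0, 6.0], [5.2, 5.8], [4.8, 6.2]]
y = ["healthy"] * 3 + ["blight"] * 3
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, [1.1, 2.0]))  # healthy
print(predict_gaussian_nb(model, [5.1, 6.1]))  # blight
```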
• LINEAR DISCRIMINANT CLASSIFIER
Linear Discriminant Analysis (LDA) is a machine learning method that finds a linear combination of features separating two or more classes. It uses continuous independent variables and a categorical dependent variable (the class label). LDA is closely related to principal component analysis (PCA) and factor analysis. Unlike PCA, however, it explicitly models the difference between the classes of data, making a clear distinction between dependent and independent variables.
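For two classes, Fisher's linear discriminant gives the projection direction in closed form: w = inverse(S_W) (m1 - m0). Here m0 and m1 are the class means and S_W is the pooled within-class scatter matrix. The sketch below uses 2-D points so the matrix inverse can be written out by hand. The midpoint threshold is an assumed decision rule, and the data are hypothetical.

```python
def mean(rows):
    # Component-wise mean of a list of 2-D points.
    return [sum(col) / len(rows) for col in zip(*rows)]


def scatter(rows, m):
    # Within-class scatter matrix: sum over points of (x - m)(x - m)^T.
    s = [[0.0, 0.0], [0.0, 0.0]]
    for x in rows:
        d = (x[0] - m[0], x[1] - m[1])
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def fisher_lda_2d(class0, class1):
    """Two-class Fisher discriminant: w = S_W^{-1} (m1 - m0)."""
    m0, m1 = mean(class0), mean(class1)
    s0, s1 = scatter(class0, m0), scatter(class1, m1)
    sw = [[s0[i][j] + s1[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = (m1[0] - m0[0], m1[1] - m0[1])
    w = [dot(inv[0], diff), dot(inv[1], diff)]
    # Assumed decision rule: threshold halfway between projected class means.
    threshold = (dot(w, m0) + dot(w, m1)) / 2
    return w, threshold


def lda_predict(w, threshold, x):
    return 1 if dot(w, x) > threshold else 0


class0 = [(1.0, 2.0), (2.0, 1.0), (2.0, 3.0), (3.0, 2.0)]
class1 = [(5.0, 6.0), (6.0, 5.0), (6.0, 7.0), (7.0, 6.0)]
w, threshold = fisher_lda_2d(class0, class1)
print(w, threshold)  # [1.0, 1.0] 8.0
```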
3.5 DEEP LEARNING USING SOFTMAX LAYER
3.5.1 GOOGLENET ARCHITECTURE
GoogLeNet is an architecture designed by researchers at Google. It contains 22 layers, and its designers introduced a novel building block called the Inception module.
Figure 3.3 Pretrained Model
A single layer contains multiple types of “feature extractors”. This indirectly helps the network perform better, because during training the network has many options to choose from when solving the task. It can either convolve the input or pool it directly.
The final architecture stacks multiple inception modules on top of one another. Training is also slightly different in GoogLeNet: some intermediate layers have their own auxiliary output layers. These layers are trained jointly with the main output, which helps the model converge faster.
Figure 3.4 Layers in GoogleNet Architecture
Given 10 possible classes, the softmax layer has 10 nodes denoted by $p_i$, where $i = 1, \dots, 10$. Since $p_i$ specifies a discrete probability distribution,
$$\sum_{i=1}^{10} p_i = 1.$$
Let $h$ be the activation of the penultimate-layer nodes and $W$ the weights connecting the penultimate layer to the softmax layer. The total input into softmax node $i$, denoted $a_i$, is
$$a_i = \sum_{k} h_k W_{ki},$$
so that
$$p_i = \frac{\exp(a_i)}{\sum_{j=1}^{10} \exp(a_j)}.$$
The predicted class $\hat{i}$ is
$$\hat{i} = \arg\max_{i} p_i = \arg\max_{i} a_i.$$
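The softmax-layer equations translate directly into code. The sketch uses 3 classes instead of 10 and hypothetical values for h and W. As in the equations, there is no bias term. Because the exponential is monotonic, the arg max of p equals the arg max of a, which the last line checks.

```python
import math


def softmax_layer(h, W):
    # h: penultimate-layer activations (length K).
    # W: K x C weights; W[k][i] connects penultimate node k to softmax node i.
    num_classes = len(W[0])
    a = [sum(h[k] * W[k][i] for k in range(len(h))) for i in range(num_classes)]
    shift = max(a)  # subtracting a constant leaves p unchanged, avoids overflow
    exps = [math.exp(ai - shift) for ai in a]
    z = sum(exps)
    return a, [e / z for e in exps]


h = [0.5, -1.0, 2.0]     # hypothetical activations
W = [[0.2, -0.1, 0.4],
     [0.0, 0.3, -0.2],
     [0.1, 0.5, 0.0]]    # hypothetical weights for 3 classes
a, p = softmax_layer(h, W)
predicted = max(range(len(p)), key=p.__getitem__)
print(predicted)                                            # 1
print(predicted == max(range(len(a)), key=a.__getitem__))  # True
```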
3.6 DEEP LEARNING USING SVM AND OTHER CLASSIFIERS:
Fully connected and convolutional neural networks have been trained to achieve state-of-the-art performance on a wide variety of tasks such as speech recognition, image classification, natural language processing and bioinformatics. For classification tasks, most of these “deep learning” models use the softmax activation function for prediction and minimize cross-entropy loss. The support vector machine is a widely used alternative to softmax for classification.
3.6.1 FEATURE EXTRACTION USING PRE TRAINED MODEL
The traditional machine learning approach extracts image features using global descriptors such as Local Binary Patterns, Histogram of Oriented Gradients and color histograms, or local descriptors such as SIFT, SURF and ORB. These are hand-crafted features that require domain expertise.
Deep neural networks take a different approach. Instead of using hand-crafted features, they learn features from images automatically and hierarchically. Lower layers learn low-level features such as corners and edges, middle layers learn color, shape and so on, and higher layers learn high-level features representing the object in the image.
Thus, a Convolutional Neural Network can serve as a feature extractor by taking the activations just before its last fully connected layer (i.e. before the final softmax classifier). These activations act as the feature vector for a separate machine learning classifier, which learns to classify them. This approach suits image classification problems well. Instead of training a CNN from scratch, which is time-consuming and tedious, a pre-trained CNN can be used as a feature extractor.
The following pre-trained models are available, and the MobileNet model is used here for feature extraction:
• Xception
• VGG16
• VGG19
• ResNet50
• InceptionV3
• InceptionResNetV2
• MobileNet
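Linear support vector machines (SVMs) were originally formulated for binary classification. Consider training data $\mathbf{x}_n \in \mathbb{R}^D$ with labels $t_n \in \{-1, +1\}$, $n = 1, \dots, N$. SVM learning consists of the following constrained optimization:
$$\min_{\mathbf{w},\,\xi} \ \frac{1}{2}\mathbf{w}^\top\mathbf{w} + C\sum_{n=1}^{N}\xi_n \quad \text{s.t.} \quad \mathbf{w}^\top\mathbf{x}_n t_n \ge 1 - \xi_n,\ \ \xi_n \ge 0 \quad \forall n,$$
which is equivalent to the unconstrained L1-SVM problem
$$\min_{\mathbf{w}} \ \frac{1}{2}\mathbf{w}^\top\mathbf{w} + C\sum_{n=1}^{N}\max(1 - \mathbf{w}^\top\mathbf{x}_n t_n,\ 0).$$
Since the L1-SVM is not differentiable, a popular variation is the L2-SVM, which minimizes the squared hinge loss:
$$\min_{\mathbf{w}} \ \frac{1}{2}\mathbf{w}^\top\mathbf{w} + C\sum_{n=1}^{N}\max(1 - \mathbf{w}^\top\mathbf{x}_n t_n,\ 0)^2.$$
The L2-SVM is differentiable and imposes a larger (quadratic rather than linear) loss on points that violate the margin. The class label of a test point $\mathbf{x}$ is predicted as
$$\arg\max_{t \in \{-1,+1\}} (\mathbf{w}^\top\mathbf{x})\, t.$$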
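The following plain-Python sketch implements the L2-SVM objective, its gradient and the prediction rule from the equations above. There is no bias term, but a constant feature of 1 can stand in for one. The one-vs-rest function is an assumed multiclass extension, and the data are hypothetical. Because the squared hinge loss is differentiable, a small gradient step reduces the objective, which the last line checks.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def l2svm_objective(w, X, t, C):
    # 0.5 * w.w + C * sum_n max(1 - t_n * w.x_n, 0)^2, with t_n in {-1, +1}.
    return 0.5 * dot(w, w) + C * sum(
        max(1.0 - tn * dot(w, x), 0.0) ** 2 for x, tn in zip(X, t))


def l2svm_gradient(w, X, t, C):
    # Gradient w + C * sum_n 2 * max(1 - t_n w.x_n, 0) * (-t_n x_n);
    # it exists everywhere, unlike the L1-SVM hinge loss.
    g = list(w)
    for x, tn in zip(X, t):
        slack = max(1.0 - tn * dot(w, x), 0.0)
        for j, xj in enumerate(x):
            g[j] -= 2.0 * C * slack * tn * xj
    return g


def predict_binary(w, x):
    # arg max over t in {-1, +1} of (w.x) * t is simply the sign of w.x.
    return 1 if dot(w, x) >= 0 else -1


def predict_one_vs_rest(W, x):
    # Assumed multiclass extension: one weight vector per class.
    scores = [dot(w, x) for w in W]
    return max(range(len(W)), key=scores.__getitem__)


X = [[1.0, 2.0, 1.0], [-1.0, -0.5, 1.0], [0.5, -1.0, 1.0]]  # last column = 1
t = [1, -1, 1]
w = [0.1, -0.2, 0.05]
w_next = [wi - 0.01 * gi for wi, gi in zip(w, l2svm_gradient(w, X, t, C=1.0))]
print(l2svm_objective(w_next, X, t, 1.0) < l2svm_objective(w, X, t, 1.0))  # True
```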
3.7 ALGORITHM
3.7.1 SOFTMAX:
• Images of diseased and healthy leaves of crops such as apple, banana, corn and grape are taken as input.
• The learning rate, the number of training steps and the initial weights are given as input to the model.
• Values are set for image augmentation parameters such as random_brightness, random_crop, random_scale and random_flip to augment the images.
• The convolutional, pooling and fully connected layers are defined with their tensor sizes and number of slices.
• The model is trained and evaluated on the training dataset.
• The probability of each test image belonging to each class label is computed.
• The model is tested.
3.7.2 SVM AND OTHER CLASSIFIERS:
• The features extracted by the neural network are stored as text files.
• Labels are added to the text files, which are then converted to CSV files.
• The data are shuffled; 80% of the data are used for training and the remaining 20% for testing.
• Classification using SVM, Naïve Bayes, KNN and Linear Discriminant classifiers is performed in MATLAB.
• After training, the remaining 20% of the data are used for testing and the predicted labels are displayed.
• The predicted labels are then compared with the original labels.
• The accuracy, confusion matrix, precision and recall are calculated.
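The project carries out these steps in MATLAB. The following Python sketch only illustrates the bookkeeping behind them: an 80/20 shuffled split, the confusion matrix, accuracy, and per-class precision and recall. The labels are hypothetical.

```python
import random


def train_test_split(rows, train_fraction=0.8, seed=0):
    # Shuffle a copy of the data, then cut it into training and test parts.
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def confusion_matrix(actual, predicted, classes):
    # cm[i][j] counts samples whose true class is i and predicted class is j.
    index = {c: i for i, c in enumerate(classes)}
    cm = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        cm[index[a]][index[p]] += 1
    return cm


def accuracy_precision_recall(cm):
    # precision_k = TP_k / (column k sum), recall_k = TP_k / (row k sum).
    n = len(cm)
    per_class = []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))
        actual_k = sum(cm[k])
        per_class.append((tp / predicted_k if predicted_k else 0.0,
                          tp / actual_k if actual_k else 0.0))
    accuracy = sum(cm[k][k] for k in range(n)) / sum(map(sum, cm))
    return accuracy, per_class


train, test = train_test_split(range(10))
print(len(train), len(test))  # 8 2

classes = ["healthy", "rust", "blight"]
actual = ["healthy", "healthy", "rust", "rust", "blight", "blight"]
pred = ["healthy", "rust", "rust", "rust", "blight", "healthy"]
cm = confusion_matrix(actual, pred, classes)
print(cm)  # [[1, 1, 0], [0, 2, 0], [1, 0, 1]]
print(accuracy_precision_recall(cm))
```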
3.8 PROCEDURE
3.8.1 BUILDING CONVOLUTION LAYER IN TENSORFLOW:
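The tf.nn.conv2d function, which takes the input tensor, the filter (kernel) tensor, the strides and the padding mode, is used to build the convolutional layers of the CNN.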
3.8.1.1 READING INPUT:
Training data: 80% of the images are used for training.
Validation data: 20% of the images are used for validation. These images are held out from the training data to measure accuracy independently during training.
Test set: Because of overfitting, a trained network may work very well on the training data (and very similar images), i.e. the cost becomes very small, yet fail on other images. It may also work very well on the validation set but fail on an image with a cluttered background rather than a plain one. A separate test set is therefore used to check how the network generalizes.
The Convolutional Neural Network (CNN) is constructed with 1 input layer, many hidden layers and 1 output layer. Its layers include convolutional, max pooling and dropout layers. The model is constructed similarly to the MobileNet architecture. The feature map is given as input to the last fully connected layer, whose outputs are the class labels. A gradient descent optimizer computes and minimizes the cross-entropy loss at every iteration.
3.8.1.2 Image Augmentation
As described in Section 3.3, image augmentation artificially creates additional training images, which deep networks need in order to achieve good performance.
The following augmentation process is followed
• Random scaling of the image
• Rotating the image
• Flipping left and right
• Adjusting brightness
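A minimal sketch of the four augmentations listed above, assuming a grayscale image stored as a list of pixel rows with 8-bit values. The 90-degree rotation stands in for arbitrary-angle rotation, nearest-neighbour sampling is used for scaling, and the parameter ranges in augment are assumptions rather than the values used in this work.

```python
import random

def flip_left_right(img):
    """Mirror each row of a 2-D image (a list of pixel rows)."""
    return [row[::-1] for row in img]

def rotate_90(img):
    """Rotate 90 degrees clockwise (stand-in for arbitrary-angle rotation)."""
    return [list(row) for row in zip(*img[::-1])]

def adjust_brightness(img, delta):
    """Add `delta` to every pixel and clip the result to the 8-bit range [0, 255]."""
    return [[min(255, max(0, p + delta)) for p in row] for row in img]

def scale_nearest(img, factor):
    """Resize by `factor` using nearest-neighbour sampling."""
    h, w = len(img), len(img[0])
    new_h, new_w = max(1, round(h * factor)), max(1, round(w * factor))
    return [[img[min(h - 1, int(r / factor))][min(w - 1, int(c / factor))]
             for c in range(new_w)]
            for r in range(new_h)]

def augment(img, rng):
    """Random scaling, optional rotation and flip, then a random brightness shift."""
    img = scale_nearest(img, rng.uniform(0.9, 1.1))
    if rng.random() < 0.5:
        img = rotate_90(img)
    if rng.random() < 0.5:
        img = flip_left_right(img)
    return adjust_brightness(img, rng.randint(-30, 30))

leaf = [[10, 20],
        [30, 40]]
print(flip_left_right(leaf))         # [[20, 10], [40, 30]]
print(rotate_90(leaf))               # [[30, 10], [40, 20]]
print(adjust_brightness(leaf, 230))  # [[240, 250], [255, 255]]
print(augment(leaf, random.Random(42)))
```

Each call to augment with a different random state yields a different variant of the same leaf, which is how a small dataset is stretched into many distinct training images.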
3.8.1.3 Training the network
The following configuration options are set:
• Input image resolution: 224 px. The other supported values are 128, 160 and 192 px. Unsurprisingly, a higher resolution image takes more processing time but results in better classification accuracy.
• Relative size of the model as a fraction of the largest MobileNet: 0.5. The other supported values are 1.0, 0.75 and 0.25. The smaller models run significantly faster, at a cost of accuracy.
Figure 3.5 Set the Architecture
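The effect of these two options on computation can be estimated with the cost model from the MobileNet paper: a depthwise separable layer with kernel size D_K, M input channels, N output channels and a D_F x D_F feature map costs D_K·D_K·αM·ρD_F·ρD_F + αM·αN·ρD_F·ρD_F multiply-adds, where α is the width multiplier (the "relative size" option) and ρ the resolution multiplier. The sketch below evaluates this for one layer; the layer dimensions are an illustrative assumption, not measurements of the trained model.

```python
def separable_conv_mult_adds(kernel, in_ch, out_ch, feature_size, alpha=1.0, rho=1.0):
    """Multiply-adds of one depthwise separable layer under MobileNet's cost model.

    alpha: width multiplier (the 'relative size' option: 1.0, 0.75, 0.5, 0.25).
    rho:   resolution multiplier, e.g. 160 / 224 for a 160 px input.
    """
    m = int(alpha * in_ch)          # thinned input channels
    n = int(alpha * out_ch)         # thinned output channels
    f = round(rho * feature_size)   # reduced feature-map side length
    depthwise = kernel * kernel * m * f * f
    pointwise = m * n * f * f
    return depthwise + pointwise

# Illustrative layer: 3x3 kernel, 512 -> 512 channels, 14x14 feature map at a 224 px input.
base = separable_conv_mult_adds(3, 512, 512, 14)
print(base)  # 52283392
for alpha in (1.0, 0.75, 0.5, 0.25):
    for res in (224, 192, 160, 128):
        cost = separable_conv_mult_adds(3, 512, 512, 14, alpha, res / 224)
        print(f"alpha={alpha:<5} input={res}px relative cost={cost / base:.3f}")
```

Because the pointwise term dominates, the cost falls roughly with α² and exactly with ρ², which is consistent with the 0.5 model running significantly faster than the full-size one.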
3.8.1.4 Training the MobileNet model
As noted in the introduction, ImageNet models are networks with millions of parameters that can differentiate a large number of classes. Only the final layer of that network is retrained here; including augmentation, training takes 22 hours.
The following command is used for training.
Figure 3.6 Training command
The images are randomly split into a training set with 80% of the data and an evaluation set with 20% of the data. The training images are fed into the feature extraction part, which converts each image into a feature vector of 2048 float values. A feature vector represents the features of the image in an abstract manner.
The datasets are then run through the pretrained model and classified with a softmax layer. The extracted features are also written to CSV files with the labels included. The following MATLAB functions are used to classify these features with the other classifiers.
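A minimal sketch of writing the extracted feature vectors to CSV with the label in the last column, and reading them back, using Python's standard csv module. The column layout (features first, label last) and the helper names are assumptions; the document does not specify the exact file format.

```python
import csv
import io

def write_feature_csv(vectors, labels, stream):
    """Write one row per image: the feature values followed by the class label."""
    writer = csv.writer(stream)
    for features, label in zip(vectors, labels):
        writer.writerow([*features, label])

def read_feature_csv(stream):
    """Read the rows back into (feature list, label) pairs."""
    data = []
    for record in csv.reader(stream):
        *values, label = record
        data.append(([float(v) for v in values], label))
    return data

# An in-memory buffer stands in for a real file here.
buffer = io.StringIO()
write_feature_csv([[0.12, 0.5, 0.0], [0.9, 0.1, 0.3]], ["Corn Healthy", "Apple Scab"], buffer)
buffer.seek(0)
print(read_feature_csv(buffer))
# [([0.12, 0.5, 0.0], 'Corn Healthy'), ([0.9, 0.1, 0.3], 'Apple Scab')]
```

With real files, open them with newline="" as the csv module documentation recommends. Keeping the label in a fixed last column makes it easy to separate predictors from responses when the file is loaded in MATLAB.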
3.8.2 ADDING CLASSIFIERS USING MATLAB
3.8.2.1 SVM
t = templateSVM() returns a support vector machine (SVM) learner template suitable for training error-correcting output code (ECOC) multiclass models. If a default template is specified, the software uses default values for all input arguments during training. t is then specified as a binary learner, or one in a set of binary learners, in fitcecoc to train an ECOC multiclass classifier.
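fitcecoc combines several binary learners into one multiclass model; with its default one-versus-one coding, one learner is trained for every pair of classes. The sketch below illustrates that idea in Python rather than MATLAB: a linear SVM trained with the Pegasos subgradient method serves as the binary learner, and the pairwise outputs are combined by majority vote. This is a simplification (fitcecoc's default decoding is loss-weighted), and all names, hyperparameters and toy data are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations

def train_linear_svm(xs, ys, lam=0.1, epochs=1000, seed=0):
    """Binary linear SVM (ys are +1/-1) trained with the Pegasos subgradient method.
    A constant 1.0 is appended to every sample so the last weight acts as a bias."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], y) for x, y in zip(xs, ys)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)                         # decreasing step size
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]      # regularisation shrink
            if margin < 1.0:                              # hinge loss is active
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w

def decision(w, x):
    """Signed score of one binary SVM (positive means the first class of the pair)."""
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))

def fit_one_vs_one(xs, labels, **svm_options):
    """Train one binary SVM per pair of classes (class a -> +1, class b -> -1)."""
    models = {}
    for a, b in combinations(sorted(set(labels)), 2):
        pair = [(x, 1 if lab == a else -1) for x, lab in zip(xs, labels) if lab in (a, b)]
        models[(a, b)] = train_linear_svm([p[0] for p in pair], [p[1] for p in pair],
                                          **svm_options)
    return models

def predict_one_vs_one(models, x):
    """Each pairwise SVM votes for one class; the class with the most votes wins."""
    votes = Counter()
    for (a, b), w in models.items():
        votes[a if decision(w, x) >= 0 else b] += 1
    return votes.most_common(1)[0][0]

# Toy 2-D "feature vectors" for three hypothetical classes.
xs = [[0, 0], [0.2, 0.1], [0.1, 0.3],      # A
      [5, 5], [5.2, 4.9], [4.8, 5.1],      # B
      [0, 5], [0.2, 5.1], [-0.1, 4.8]]     # C
labels = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
models = fit_one_vs_one(xs, labels)
print([predict_one_vs_one(models, p) for p in ([0.1, 0.1], [5, 5.1], [0, 4.9])])
```

With 16 crop disease classes, the one-versus-one design trains 16 x 15 / 2 = 120 binary learners.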
3.8.2.2 Naïve bayes
ClassificationNaiveBayes is a naive Bayes classifier for multiclass learning. fitcnb is used with the training data to train a ClassificationNaiveBayes classifier. Trained ClassificationNaiveBayes classifiers store the training data, parameter values, data distribution and prior probabilities. These classifiers can be used to estimate resubstitution predictions (see resubPredict) and to predict labels or posterior probabilities for new data (see predict).
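For numeric predictors, fitcnb by default models each feature within each class as a normal (Gaussian) distribution. A plain-Python sketch of that Gaussian naive Bayes rule follows. It is not MATLAB's implementation; the helper names, the toy data and the variance floor that guards against zero-variance features are assumptions.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(xs, labels, var_floor=1e-9):
    """Estimate each class's prior plus per-feature mean and variance."""
    groups = defaultdict(list)
    for x, label in zip(xs, labels):
        groups[label].append(x)
    model = {}
    for label, rows in groups.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        variances = [max(var_floor, sum((v - m) ** 2 for v in col) / n)
                     for col, m in zip(columns, means)]
        model[label] = (n / len(xs), means, variances)
    return model

def predict_gaussian_nb(model, x):
    """Return the label with the highest log posterior (up to a shared constant)."""
    def log_posterior(params):
        prior, means, variances = params
        # Naive assumption: features are independent given the class, so log densities add.
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * var) - (xi - m) ** 2 / (2 * var)
            for xi, m, var in zip(x, means, variances))
    return max(model, key=lambda label: log_posterior(model[label]))

xs = [[0.9, 0.1], [0.8, 0.2], [1.0, 0.15],
      [0.2, 0.8], [0.3, 0.9], [0.25, 0.7]]
labels = ["healthy"] * 3 + ["rust"] * 3
model = fit_gaussian_nb(xs, labels)
print(predict_gaussian_nb(model, [0.85, 0.15]))  # healthy
print(predict_gaussian_nb(model, [0.3, 0.85]))   # rust
```

Working with log probabilities avoids numerical underflow when many small per-feature densities are multiplied, which matters for feature vectors with thousands of values.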
3.8.2.3 KNN
t = templateKNN() returns a k-nearest neighbor (KNN) learner template suitable for training ensembles or error-correcting output code (ECOC) multiclass models. If a default template is specified, the software uses default values for all input arguments during training. t is specified as a learner in fitcensemble or fitcecoc.
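The k-nearest-neighbour rule itself is short. The sketch below uses Euclidean distance and a majority vote, with k = 1 by default, which matches the default number of neighbours in MATLAB's KNN learners. The function name and toy data are hypothetical, and this is not MATLAB's implementation.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, x, k=1):
    """Label `x` by majority vote among its k nearest training samples (Euclidean)."""
    nearest = sorted(zip(train_x, train_y),
                     key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train_x = [[0, 0], [0, 1], [5, 5], [6, 5], [5, 6]]
train_y = ["healthy", "healthy", "rust", "rust", "rust"]
print(knn_predict(train_x, train_y, [0.2, 0.4]))   # healthy
print(knn_predict(train_x, train_y, [1, 1], k=3))  # healthy
```

KNN has no training phase beyond storing the data, so all the cost is paid at prediction time, when the distance to every stored feature vector is computed.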
3.8.2.4 LINEAR DISCRIMINANT CLASSIFIER
t = templateDiscriminant() returns a discriminant analysis learner template suitable for training ensembles or error-correcting output code (ECOC) multiclass models. If a default template is specified, the software uses default values for all input arguments during training. t is specified as a learner in fitcensemble or fitcecoc. The confusion matrix, precision and recall are then calculated.
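Linear discriminant analysis assumes that every class shares one covariance matrix Σ and scores class k with δ_k(x) = xᵀΣ⁻¹μ_k - ½ μ_kᵀΣ⁻¹μ_k + log π_k, where μ_k is the class mean and π_k its prior. The plain-Python sketch below estimates these quantities and picks the highest-scoring class. The small ridge added to Σ, the helper names and the toy data are assumptions, and this is not MATLAB's implementation.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_lda(xs, labels, ridge=1e-6):
    """Class means, priors and a pooled covariance, turned into per-class weights
    w_k = inv(Sigma) mu_k and offsets -0.5 mu_k . w_k + log(prior_k)."""
    classes = sorted(set(labels))
    d = len(xs[0])
    means, priors = {}, {}
    for c in classes:
        rows = [x for x, lab in zip(xs, labels) if lab == c]
        means[c] = [sum(col) / len(rows) for col in zip(*rows)]
        priors[c] = len(rows) / len(xs)
    cov = [[0.0] * d for _ in range(d)]
    for x, lab in zip(xs, labels):
        diff = [xi - mi for xi, mi in zip(x, means[lab])]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j]
    dof = len(xs) - len(classes)  # pooled within-class estimate
    cov = [[cov[i][j] / dof + (ridge if i == j else 0.0) for j in range(d)]
           for i in range(d)]
    model = {}
    for c in classes:
        w = solve(cov, means[c])
        offset = -0.5 * sum(mu * wi for mu, wi in zip(means[c], w)) + math.log(priors[c])
        model[c] = (w, offset)
    return model

def predict_lda(model, x):
    """Pick the class with the largest discriminant score x . w_k + offset_k."""
    return max(model, key=lambda c: sum(wi * xi for wi, xi in zip(model[c][0], x)) + model[c][1])

xs = [[0, 0], [1, 0], [0, 1], [1, 1], [4, 4], [5, 4], [4, 5], [5, 5]]
labels = ["healthy"] * 4 + ["rust"] * 4
model = fit_lda(xs, labels)
print(predict_lda(model, [1, 1]), predict_lda(model, [4, 4]))  # healthy rust
```

Because the covariance is shared, the quadratic term xᵀΣ⁻¹x is the same for every class and cancels, which is what makes the decision boundaries linear.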
3.8.3 CALCULATING CONFUSION MATRIX, PRECISION AND RECALL
3.8.3.1 CONFUSION MATRIX
In the field of machine learning and specifically the problem of statistical classification, a confusion matrix, also known as an error matrix,[4] is a specific table layout that allows visualization of the performance of an algorithm, typically a supervised learning one (in unsupervised learning it is usually called a matching matrix). Each row of the matrix represents the instances in a predicted class while each column represents the instances in an actual class (or vice versa).[2] The name stems from the fact that it makes it easy to see if the system is confusing two classes (i.e. commonly mislabelling one as another).
A confusion matrix is a table that is often used to describe the performance of a classification model (or “classifier”) on a set of test data for which the true values are known. The confusion matrix itself is relatively simple to understand, but the related terminology can be confusing.
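A confusion matrix can be built directly from the lists of actual and predicted labels. The sketch below uses rows for actual classes and columns for predicted classes, the same orientation as MATLAB's confusionmat; the function name and class names are toy values chosen for illustration.

```python
def confusion_matrix(actual, predicted, classes):
    """Count how often each actual class (row) was predicted as each class (column)."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix

classes = ["Healthy", "Rust", "Blight"]
actual = ["Healthy", "Healthy", "Rust", "Rust", "Blight", "Blight"]
predicted = ["Healthy", "Rust", "Rust", "Rust", "Blight", "Healthy"]
for name, row in zip(classes, confusion_matrix(actual, predicted, classes)):
    print(f"{name:<8}", row)
# Healthy  [1, 1, 0]
# Rust     [0, 2, 0]
# Blight   [1, 0, 1]
```

Correct predictions lie on the diagonal; an off-diagonal entry such as the Blight row, Healthy column shows exactly which pair of classes the model confuses.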
3.8.3.2 PRECISION AND RECALL
In pattern recognition, information retrieval and binary classification, precision (also called positive predictive value) is the fraction of relevant instances among the retrieved instances, while recall (also known as sensitivity) is the fraction of the total number of relevant instances that have been retrieved. Both precision and recall are therefore based on an understanding and measure of relevance.
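For each class, precision = TP / (TP + FP) and recall = TP / (TP + FN). In a confusion matrix with actual classes as rows and predicted classes as columns, TP is the diagonal entry, TP + FP is the column sum and TP + FN is the row sum, as in the sketch below. The function name and toy matrix are illustrative, and reporting 0.0 when a denominator is zero is an assumed convention.

```python
def precision_recall(matrix, classes):
    """Per-class (precision, recall) from a confusion matrix whose rows are
    actual classes and whose columns are predicted classes."""
    results = {}
    for i, c in enumerate(classes):
        tp = matrix[i][i]
        predicted_as_c = sum(row[i] for row in matrix)  # column sum: TP + FP
        actually_c = sum(matrix[i])                     # row sum: TP + FN
        precision = tp / predicted_as_c if predicted_as_c else 0.0  # assumed convention
        recall = tp / actually_c if actually_c else 0.0
        results[c] = (precision, recall)
    return results

classes = ["Healthy", "Rust", "Blight"]
matrix = [[1, 1, 0],
          [0, 2, 0],
          [1, 0, 1]]
for c, (p, r) in precision_recall(matrix, classes).items():
    print(f"{c}: precision={p:.2f} recall={r:.2f}")
# Healthy: precision=0.50 recall=0.50
# Rust: precision=0.67 recall=1.00
# Blight: precision=1.00 recall=0.50
```

Rust here has perfect recall but lower precision: every rust leaf is found, but one healthy leaf is also labelled as rust.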
3.9 EVALUATION
The MobileNet image recognition model used for training consists of two parts:
• Feature extraction part with a convolutional neural network.
• Classification part with classifiers like softmax, SVM, Naïve Bayes, KNN, Linear Discriminant.
The feature extraction part is reused for the crop disease dataset, and only the classification part is retrained on it. Hence the model needs fewer computational resources and less training time. The bottleneck is the layer just before the final output layer that actually performs the classification, and the bottleneck files store each image's bottleneck values as a compact summary of the image.
Training of the model runs for 500 steps. Each step chooses ten images randomly from the training set, retrieves their bottlenecks from the cache and feeds them into the final layer to get predictions. Those predictions are then compared against the target labels, and the weights of the final layer are updated through back-propagation. The reported accuracy improves as the process continues. Every 10 steps, training accuracy is calculated on the 80% training data, and validation accuracy is calculated over a randomly selected 10% of the images in the training set. Test accuracy is calculated at the end of training with the 20% of images that are kept for testing. This part uses a softmax layer for classification, whereas the SVM, Naïve Bayes, KNN and Linear Discriminant classifiers are run in MATLAB with the same ratio, i.e., 80% for training and 20% for testing. The predicted labels are compared with the actual labels to find the accuracy, confusion matrix, precision and recall.
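The training loop described above can be sketched as follows. Only a softmax layer is trained, on bottleneck vectors that were computed once and cached, using mini-batches of ten randomly chosen samples, cross-entropy loss and plain gradient descent. It is a simplified plain-Python illustration, not the TensorFlow retraining script used in this work. The learning rate, zero initialisation, function names and toy bottlenecks are assumptions, and the log records batch accuracy rather than separate training and validation accuracy.

```python
import math
import random

def softmax(logits):
    """Numerically stable softmax: subtract the largest logit before exponentiating."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(weights, biases, x):
    """Index of the class with the highest score."""
    scores = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    return scores.index(max(scores))

def train_final_layer(bottlenecks, num_classes, steps=500, batch_size=10, lr=0.01, seed=0):
    """Train only a softmax layer on cached (bottleneck_vector, class_index) pairs.

    Every 10 steps, logs (step, batch accuracy, mean batch cross-entropy).
    """
    rng = random.Random(seed)
    dim = len(bottlenecks[0][0])
    weights = [[0.0] * dim for _ in range(num_classes)]
    biases = [0.0] * num_classes
    log = []
    for step in range(1, steps + 1):
        batch = [rng.choice(bottlenecks) for _ in range(batch_size)]
        grad_w = [[0.0] * dim for _ in range(num_classes)]
        grad_b = [0.0] * num_classes
        loss, correct = 0.0, 0
        for x, y in batch:
            logits = [sum(w * xi for w, xi in zip(weights[k], x)) + biases[k]
                      for k in range(num_classes)]
            probs = softmax(logits)
            loss -= math.log(max(probs[y], 1e-12))  # cross-entropy for this sample
            correct += int(probs.index(max(probs)) == y)
            for k in range(num_classes):
                err = probs[k] - (1.0 if k == y else 0.0)  # d(cross-entropy)/d(logit k)
                grad_b[k] += err
                for j, xi in enumerate(x):
                    grad_w[k][j] += err * xi
        for k in range(num_classes):  # gradient descent update, averaged over the batch
            biases[k] -= lr * grad_b[k] / batch_size
            for j in range(dim):
                weights[k][j] -= lr * grad_w[k][j] / batch_size
        if step % 10 == 0:
            log.append((step, correct / batch_size, loss / batch_size))
    return weights, biases, log

toy = [([1.0, 0.1, 0.0], 0), ([0.9, 0.0, 0.1], 0),
       ([0.0, 1.0, 0.1], 1), ([0.1, 0.9, 0.0], 1),
       ([0.0, 0.1, 1.0], 2), ([0.1, 0.0, 0.9], 2)]
weights, biases, log = train_final_layer(toy, num_classes=3, lr=0.5)
print(log[0])
print(log[-1])
print([predict(weights, biases, x) for x, _ in toy])
```

Caching the bottlenecks is what keeps each step cheap: the expensive convolutional feature extraction runs once per image, and every training step only touches the small final layer.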
3.10 VALIDATING WITH TEST IMAGES
The new set of test images, excluded from the training set, is given as input to the model. The graph containing the extracted features, i.e. the bottleneck files, is used for classification. The final layer contains 16 neurons, one for each of the 16 class labels. Softmax regression between the bottleneck layer and the last fully connected layer calculates a score for each test image, i.e. the probability of the image belonging to each class label. For the SVM and the other classifiers, the CSV files are shuffled, and the 20% of the data reserved for testing are predicted and compared with the actual labels.
CHAPTER 4
RESULTS AND DISCUSSION
The crop disease dataset contains leaf images of apple, banana, corn and grape. The MobileNet CNN architecture is implemented using the TensorFlow framework in Python. The training mechanism is transfer learning rather than training from scratch. The model creates training, testing and validation sets from the input dataset by hashing the name of each image and using the resulting value as a probability to assign it to one of these sets. The training accuracy, cross entropy and validation accuracy are calculated and displayed every 10 steps. During training, a bottleneck file is created for each image containing its feature set for classification; this feature set is the output of the bottleneck layer that feeds the fully connected layer, computed once and saved per image. The model is then retrained with the bottleneck files as inputs.
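Assigning each image to a set from a hash of its file name keeps the split stable across runs, so a given image never moves between the training, validation and testing sets when new images are added. The sketch below shows one way to do this. The use of SHA-1, the percentage defaults (10% validation, 20% testing, leaving 70% for training) and the function name are assumptions chosen for illustration.

```python
import hashlib

def assign_split(file_name, validation_pct=10, testing_pct=20):
    """Deterministically assign an image to 'training', 'validation' or 'testing'
    from a hash of its file name."""
    digest = hashlib.sha1(file_name.encode("utf-8")).hexdigest()
    percentage = int(digest, 16) % 10000 / 100.0  # pseudo-random value in [0, 100)
    if percentage < validation_pct:
        return "validation"
    if percentage < validation_pct + testing_pct:
        return "testing"
    return "training"

names = [f"corn_common_rust_{i:04d}.jpg" for i in range(1000)]
counts = {"training": 0, "validation": 0, "testing": 0}
for name in names:
    counts[assign_split(name)] += 1
print(counts)
```

Unlike a random shuffle, this assignment needs no stored seed or index file: recomputing the hash always gives the same set for the same file name.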
Table 4.1 Crop Diseases Dataset Description
Crop      Condition          Number of original images
Corn      Healthy            392
Corn      Burn               373
Corn      Leaf Blight        340
Corn      Common Rust        368
Corn      GLS                391
Banana    Healthy            105
Banana    Black Sigatoka     240
Banana    Speckle            347
Apple     Healthy            68
Apple     Black Rot          254
Apple     Cedar Rust         251
Apple     Scab               264
Grape     Healthy            305
Grape     Black Rot          316
Grape     Esca               104
Grape     Leaf Blight        365
TOTAL                        4483
