Chapter 1
INTRODUCTION
India is an agricultural country in which about 70% of the population depends on agriculture. Farmers have a wide range of fruit and vegetable crops to choose from. However, cultivating these crops for optimum yield and quality produce is highly technical, and the management of perennial fruit crops requires close monitoring, especially of diseases that can significantly affect production and post-harvest life. In plants, a disease is defined as any impairment of normal physiological function that produces characteristic symptoms; a symptom is a phenomenon accompanying the disease and is regarded as evidence of its existence. Disease is caused by a pathogen, which is any agent capable of causing disease [6]. Rubber trees are one of the earth's greatest natural resources. Malaysia is the leading producer of rubber in the world. The consumption of rubber increased with the advent of the 20th century, when industrialization took place. The first rubber plantation in India was started in 1895 in Kerala. In 2008 the IRSG estimated that the world consumed 22.18 million metric tons of rubber.
Natural rubber, also called India rubber or caoutchouc, consists, as initially produced, of polymers of the organic compound isoprene with minor impurities of other organic compounds plus water. Forms of polyisoprene used as natural rubber are classified as elastomers. Currently rubber is harvested mainly in the form of latex from trees. The latex is a sticky, milky colloid drawn off by making incisions in the bark and collecting the fluid in vessels, in a process called 'tapping'. The latex is then refined into rubber ready for commercial processing. Natural rubber is used extensively in many applications and products, either alone or in combination with other materials. In most of its useful forms it has a large stretch ratio and high resilience, and is extremely waterproof. Rubber is used in a variety of applications ranging from latex gloves, textile manufacturing and valves in machinery and automobiles to shoes. Rubber plantations provide jobs to the local economy. The rubber tree is not just a great source of latex; it is also used as bio-fuel, as raw material for manufacturing and for furniture production.
In today's market environment, going 'green' is the trend, and the rubber production industry is recognized as a green industry. The amount of carbon sequestered in one hectare of rubber plantation amounts to 680 Mt. With these benefits of natural rubber, it is disheartening that the glove industry is moving towards synthetic rubbers made from carbon-based raw materials. These synthetic rubbers are non-biodegradable compared to natural rubber, and when they are disposed of in furnaces, harmful chemicals are released into the environment. After about 30 years of tapping, the efficiency of a rubber tree in producing latex diminishes. The tree is then felled and its wood is used as bio-fuel, furniture wood or light manufacturing wood. Each time a tree is felled, a new tree takes its place.
The greatest problem now faced by growers is rubber tree disease. It is essential to identify the disease correctly and take the necessary control measures before the tree dies. Hence this project aims to identify diseases using features of rubber leaves extracted from their images.
1.1 Objective
• To develop an effective software solution to classify diseases in rubber leaves.
• To achieve accurate classification with a minimal number of decisions.
• To design, implement and evaluate an image processing based software solution for automatic detection and classification of plant leaf diseases.
• To identify the various diseases of the rubber leaf.
1.2 Literature survey
To implement the project work and to understand the problems faced in implementation, an exhaustive literature survey is necessary.
Smita Naikwadi and Niket Amoda [4] consider the management of disease in the leaves or stems of a plant. Precise quantification of these visually observed diseases, pests and traits has not been studied because of the complexity of the visual patterns, so there is increasing demand for more specific and sophisticated image pattern understanding. In biological science, thousands of images are sometimes generated in a single experiment. These images are required for further studies such as classifying lesions, scoring quantitative traits and calculating the area eaten by insects. Almost all of these tasks are performed manually or with distinct software packages, which is not only a tremendous amount of work but also suffers from two major issues: excessive processing time and the subjectivity arising from different individuals. Hence, to conduct high-throughput experiments, plant biologists need efficient computer software to automatically extract and analyze significant content. Here image processing plays an important role.
The basic procedures of the proposed image processing based disease detection are:
• Image acquisition
• Image preprocessing
• Image segmentation
• Feature extraction
• Statistical analysis
• Classification based on a classifier
Prof. Sanjay B. Dhaygude and Mr. Nitin P. Kumbhar [5] considered a vision-based detection algorithm. The basic procedure of the proposed vision-based detection algorithm is as follows:
• RGB image acquisition
• Convert the input image from RGB to HSV format
• Masking of the green pixels
• Removal of the masked green pixels
• Segment the components
• Obtain the useful segments
• Compute the features using the colour co-occurrence methodology
• Evaluation of texture statistics
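The green-pixel masking step in this pipeline can be illustrated with a minimal sketch; the HSV conversion uses the standard library, and the hue window treated as 'green' is an assumption for illustration, not a value taken from the cited paper:

```python
import colorsys

def mask_green_pixels(rgb, hue_lo=0.20, hue_hi=0.45):
    """Zero out pixels whose hue falls in an assumed 'green' range.

    rgb: nested list (rows of [r, g, b] triples, floats in [0, 1]).
    Returns a new image in which healthy green tissue is black, so
    that only the remaining (possibly diseased) regions survive.
    """
    out = []
    for row in rgb:
        new_row = []
        for r, g, b in row:
            hue, _, _ = colorsys.rgb_to_hsv(r, g, b)
            # The hue window for "green" is an illustrative assumption.
            new_row.append([0.0, 0.0, 0.0] if hue_lo <= hue <= hue_hi
                           else [r, g, b])
        out.append(new_row)
    return out

# One green pixel (masked out) and one brown, lesion-like pixel (kept)
img = [[[0.1, 0.8, 0.1], [0.5, 0.3, 0.1]]]
masked = mask_green_pixels(img)
```

After this step, the non-black pixels are the candidates for segmentation and feature extraction.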
H. Al-Hiary, S. Bani-Ahmed, M. Reyalat, M. Braik and Z. Al-Rahamneh [7] discussed a method for fast and accurate detection and classification of plant diseases. They used Otsu segmentation, K-means clustering and a back-propagation feed-forward neural network for clustering and classification of the diseases that affect plant leaves. The developed processing scheme consists of four main phases: a clustering technique, masking of the green pixels and the pixels on the boundaries, feature extraction, and neural networks.
Hashim, H., Haron, M.A., Osman, F.N. and Al Junid, S.A.M. [8] explain the classification of five types of rubber leaf disease using a spectrometer and data mining software such as the Software Package for Statistical Analysis (SPSS). The leaf disease images used as samples include Oidium secondary leaf fall, Fusicoccum leaf blight, Bird's eye spot and Anthracnose. Further analysis and justification are carried out using appropriate statistical tools from SPSS. The results obtained show strong evidence that these diseases can be discriminated from each other using a spectrometer.
S.A. Ali, Sulaiman, A. Mustapha and N. Mustapha [9, 10] discussed a decision tree learning method for the accurate identification of diseases, in which the learned function is represented by a decision tree. Learned trees are represented as sets of if-then rules to improve human readability. In this study the authors presented a response classification experiment based on user intentions using a decision tree.
S.S. Abu-Naser, K.A. Kashkash and M. Fayyad [11] developed an expert system for plant disease diagnosis. Identifying plant diseases is usually a difficult task that needs an agricultural engineer to describe the case accurately. Moreover, some diseases have similar symptoms, so it is difficult for a non-expert to distinguish between the types of disease. An expert system can help enormously in overcoming these difficulties. They designed and developed an expert system with two different methods for diagnosing plant diseases:
• Step-by-step descriptive method
• Graphical representation method
The proposed system saved a lot of time and effort in identifying plant diseases.
S. Arivazhagan, R. Newlin Shebiah, S. Anathi and S. Vishnu Varthini [12] discussed a software solution for the automatic detection and classification of plant leaf diseases. The step-by-step procedure of the system is as follows:
• RGB image acquisition
• Convert the input image from RGB to HSI format
• Masking of the green pixels
• Removal of the masked green pixels
• Segment the components
• Computing the texture features using the colour co-occurrence methodology
• Configuring the neural networks for recognition
Sabah Bashir and Navdeep Sharma [14] explained that the conventional method of disease detection in plants, naked-eye observation, is cumbersome and ineffective. Using a computer vision toolbox, disease detection in plants is efficient and not time consuming. Detection and recognition of diseases in plants using machine learning is very fruitful in identifying disease symptoms at the earliest stage. An expert system called CLASE (Central Lab of Agricultural Expert System) identifies different disorders and diseases in plant leaves. This system consists of a vision system, a classifier and an image processing algorithm. A back-propagation neural network is used for the recognition of leaves.
1.3 Problem Formulation
The rubber tree is an important source of natural rubber, and industries depend heavily on rubber plants. It is the world's most important economic source of natural rubber and is constantly under threat from pathogens that cause diseases such as Corynespora leaf fall, Powdery mildew and Bird's eye spot. Controlling these diseases by applying chemical fungicides may have adverse effects on the environment and also spoils soil fertility, and disease reduces rubber production; hence efficient identification of disease is essential.
The different methodologies used for the detection of diseases in plant leaves include ANN (Artificial Neural Network), SVM (Support Vector Machine) and DTC (Decision Tree Classification). ANN and SVM exhibit high accuracy and performance, but they are relatively slow compared to DTC, and as the size of the data increases, the required training period also increases. Training a decision tree is much faster, and the resulting model is easier to analyze than those of the other methods. In this project work the decision tree approach is used to classify diseases in rubber leaves, since decision trees are more interpretable than classifiers such as ANN and SVM. Decision trees have significant advantages for classification problems because of their flexibility and their ability to handle non-linear classes and features, which improves accuracy to a great extent.
A DT follows a hierarchical structure in which, at each level, a test is applied to one or more attribute values and may have one of two outcomes. To classify an object we start at the root node of the tree, evaluate the test and take the branch appropriate to the outcome. The process continues until a leaf is encountered. The tree is expanded until every training instance is correctly classified; over-fitting is then avoided by pruning the tree.
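A toy illustration of this root-to-leaf traversal, with hypothetical attributes, thresholds and class labels (not those of the actual rubber-leaf classifier):

```python
# Each internal node is (attribute, threshold, left_subtree, right_subtree);
# a leaf is simply a class-label string. All names/values are made up.
tree = ("mean_red", 120,
        ("lesion_area", 0.05, "healthy", "birds_eye_spot"),
        "powdery_mildew")

def classify(node, sample):
    """Walk from the root, taking the branch given by each test,
    until a leaf (class label) is reached."""
    while isinstance(node, tuple):
        attr, threshold, left, right = node
        node = left if sample[attr] <= threshold else right
    return node

print(classify(tree, {"mean_red": 90, "lesion_area": 0.10}))
# prints: birds_eye_spot
```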
1.4 Scope and Relevance of the Project
The rubber tree is one of the earth's greatest natural resources, and the management of such resources is one of the most studied fields at present. The earth's resources are limited; hence their proper utilization is a challenging task.
1.5 Limitations of the Project
• Leaf colour varies between leaves collected at the periphery of the plantation and those collected deep inside it.
• Data samples must be collected in different seasons:
i) Rainy: Abnormal leaf fall
ii) Winter: Colletotrichum
iii) Summer: Bird's eye spot and Powdery mildew
• The incidence of disease is very low during the rainy season.
• However, at the onset of winter and in early summer the presence of disease in the plantation is dominant.
1.6 Organization of the Report
This thesis is structured into six chapters.
Chapter 1: Provides a brief introduction to the rubber tree and its importance. The problem definition, the scope and relevance of the project, and the literature survey are discussed briefly in this chapter.
Chapter 2: Presents an overview of the types of diseases in rubber leaves; the causative agent, occurrence, symptoms and control measures of the various diseases are discussed in this chapter.
Chapter 3: Gives a detailed description of the Decision Tree (DT) classifier. The steps involved in this process and the advantages of the DT are discussed in this chapter.
Chapter 4: Provides a brief introduction to image processing techniques. Histograms of the leaf images are discussed in this chapter.
Chapter 5: Gives a brief introduction to the software and hardware requirements. The WEKA classifiers, the ERDAS IMAGINE image processing tool, the camera and MATLAB are discussed in this chapter.
Chapter 6: Provides detailed information about the results and discussions. The J48 classifier tree and the decision tree rules are discussed in this chapter.
Chapter 2
DISEASES IN RUBBER LEAF
The rubber tree may live for a hundred years or even more, but its economic life period in plantations is, on general considerations, only around 7 years of immature phase and 25 years of productive phase. If it is affected by diseases, the production rate drastically reduces, and severe incidence of disease may cause trees to die. Diseases in rubber plants are season bound. Some commonly found diseases are described below:
2.1 Powdery Mildew (Oidium heveae)
Figure 2.1 Powdery mildew affected leaf
Powdery mildew is a fungal disease that affects a wide range of plants. Powdery mildew diseases are caused by many different species of fungi in the order Erysiphales.
Causative Agent: Oidium heveae Steinm.
Occurrence: Predominantly noticed on newly formed tender flush during the re-foliation period of January to March. The disease is severe in the Kanyakumari, Idukki and Wynad districts of South India and in the North Eastern States. Cloudy days with light rains and/or misty nights with dew formation during re-foliation favour serious disease outbreaks.
Symptoms: Tender leaves with an ashy coating curl and crinkle, their edges roll inwards, and they fall, leaving the petioles attached to the twigs like a broom-stick. After a few days the petioles also fall, and die-back of twigs follows. On older leaves, white patches that later cause necrotic spots reduce photosynthetic efficiency. Infected flowers and tender fruits are shed, affecting seed production [16].
2.2 Corynespora Disease
Figure 2.2 Corynespora affected leaf
Corynespora disease (caused by the fungus Corynespora cassiicola) is more severe during re-foliation in the months from December to April. Though it affects leaves of all stages, young leaves in the light green stage appear to be the most susceptible. Symptoms vary with the clone and the locality. Circular lesions of varying sizes with a papery centre, brown margin and yellow halo are the common symptom. The central region of the lesions may disintegrate, leaving holes. Usually several lesions join together and form a large blighted area. The disease spreads along the leaf veins, turning them dark brown. When veins and veinlets are affected, they appear like 'brown railway track' markings as seen on geographical maps.
The leaf tissues surrounding infected veins turn yellow and later to brown, and then the leaf falls off. Even a single lesion near the base of leaflet or on mid-rib causes the leaf to fall off. Trees on the border of the plantations and branches exposed to sunlight are more severely affected. Severe incidence leads to shoot dieback. The new flushes formed after one round of defoliation may be affected again. Such repeat infections and defoliation cause drying up of affected trees.
High temperature and humidity during refoliation period are found to favour the disease incidence. Unlike powdery mildew disease, in which the leaflets crinkle and fall off leaving petioles on the trees, in this case, the partially blighted leaflets are seen to remain on the tree for a while, giving the canopy a burnt appearance[16].
2.3 Bird's Eye Spot
Figure 2.3 Bird's eye spot affected leaf
Causative Agent: Drechslera heveae (Petch) M.B. Ellis
Occurrence: A hot weather disease, serious and damaging in the nursery. Weaker plants and plants growing under exposed situations are more susceptible.
Symptoms: Symptoms appear as small necrotic spots with dark/brown margins and pale centre. Severe infection leads to premature defoliation and die back.
Clonal Susceptibility: Nursery seedlings are susceptible.
Control measures: Repeated spraying with Bordeaux mixture 1%, mancozeb 0.2% (Mancozeb/Indofil M-45 2.5 g/l) or carbendazim 0.02% (Bavistin 0.4 ml/l). Shading the nursery plants reduces the disease incidence. Maintain seedlings in vigorous condition through adequate balanced nutrition [16].
2.4 Colletotrichum
Figure 2.4 Colletotrichum affected leaf
Causative agent: Colletotrichum acutatum, C. gloeosporioides Sacc.
Occurrence: Observed during April to December. In North East India, the disease is prevalent throughout the year except during winter.
Symptoms: Infects tender leaves, mostly at the leaf tip region. Spots are small, brown in colour and surrounded by a yellow halo. Numerous spots coalesce and dry up, leading to defoliation. The infected leaves often crinkle and become distorted before shedding.
Control Measures: Spraying with Bordeaux mixture 1%, copper oxychloride 0.125% (Fytolan 2.5 g/l), mancozeb 0.2% (Dithane/Indofil M-45 2.66 g/l) or carbendazim 0.05% (Bavistin 1 g/l) at 10–15 day intervals is effective.
Chapter 3
DECISION TREE CLASSIFIER
3.1 Introduction
A decision tree is a classifier in the form of a tree structure. It is a graphical model describing decisions and their possible outcomes, and is used as a model for sequential decision problems under uncertainty. A decision tree describes the decisions to be made, the events that may occur, and the outcomes associated with combinations of decisions and events. Decision trees classify instances by starting at the root node of the tree and moving through it until a leaf node is reached. The major goal of the analysis is to find the best decision. A decision tree consists of three types of nodes:
• Decision node
• Leaf node
• Terminal node
Decision node: It is usually represented by squares showing decisions that can be made. Lines emanating from this node show all the distinct options available at that node. It also specifies a test on a single attribute.
Leaf node: It is usually represented by circles or squares showing chance outcomes. Chance outcomes are events that can occur but are outside the ability of the decision maker to control.
Terminal node: It is usually represented by triangles or by lines having no further decision nodes or chance nodes. Terminal nodes depict the final outcomes of the decision making process.
The decision tree classifier (DTC) is an inductive machine learning algorithm based on a divide-and-conquer strategy. This approach was proposed by R. Quinlan [17]. The main idea of the algorithm is to choose, in each tree branch, the variable that provides the most information for realizing the appropriate partition, in order to classify the training set. In their simplest form, decision trees successively partition the input training data into more and more homogeneous subsets by producing optimal rules or decisions, called nodes, which maximize the information gained and thus minimize the error rates in the branches of the tree. Depending on the number of variables used at each stage, they can be categorized as univariate or multivariate decision trees.
A sample decision tree model is shown in Figure 3.1. At each stage the outcome may be a terminal node, which allocates a class, or a decision node, which specifies a further test on the attribute values and forms a branch or sub-tree. Classification is performed by moving down the tree until a leaf is reached. The method for constructing a decision tree is summarized as follows:
• If there are k classes denoted {C1, C2, …, Ck} and a training set T, then:
• If T contains one or more objects which all belong to a single class Cj, the decision tree is a leaf identifying class Cj.
• If T contains no objects, the decision tree is a leaf determined from information other than T.
• If T contains objects that belong to a mixture of classes, a test is chosen, based on a single attribute, that has one or more mutually exclusive outcomes {O1, O2, …, On}. T is partitioned into subsets T1, T2, …, Tn, where Ti contains all the objects in T that have outcome Oi of the chosen test. The same method is applied recursively to each subset of training objects to build the decision tree.
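The recursive construction above can be sketched in Python. The training rows, attribute names and class labels below are hypothetical, and the test on each attribute is a simple equality test on categorical values:

```python
from collections import Counter
import math

def entropy(labels):
    """Expected information (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def build_tree(rows, labels, attributes):
    """Recursive construction following the cases above: a single class
    gives a leaf; otherwise split on the attribute with highest gain."""
    if len(set(labels)) == 1:           # all objects in one class -> leaf
        return labels[0]
    if not attributes:                  # no tests left -> majority leaf
        return Counter(labels).most_common(1)[0][0]
    base = entropy(labels)
    def gain(attr):
        rem = 0.0
        for v in set(r[attr] for r in rows):
            sub = [l for r, l in zip(rows, labels) if r[attr] == v]
            rem += len(sub) / len(labels) * entropy(sub)
        return base - rem
    best = max(attributes, key=gain)
    children = {}
    for v in set(r[best] for r in rows):
        sub_rows = [r for r in rows if r[best] == v]
        sub_labels = [l for r, l in zip(rows, labels) if r[best] == v]
        children[v] = build_tree(sub_rows, sub_labels,
                                 [a for a in attributes if a != best])
    return (best, children)

# Hypothetical training set: lesion colour and halo presence
rows = [{"spot": "brown", "halo": "yes"}, {"spot": "brown", "halo": "no"},
        {"spot": "white", "halo": "no"}]
labels = ["colletotrichum", "birds_eye", "powdery_mildew"]
tree = build_tree(rows, labels, ["spot", "halo"])
```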
Figure 3.1 A decision tree model
Decision trees provide an effective method of decision making because they:
• Clearly lay out the problem so that all options can be challenged.
• Allow us to analyze fully the possible consequences of a decision.
• Provide a framework to quantify the values of outcomes and the probabilities of achieving them.
• Help us to make the best decisions on the basis of existing information and best guesses.
In order to classify an object, we start at the root of the tree, evaluate the test, and take the branch appropriate to the outcome. The process continues until a terminal node is encountered, at which time the object is asserted to belong to the class named by that node. If the attributes are appropriate, it is always possible to construct a decision tree that correctly classifies each object in the training set. Typically, the decision tree is expanded until every training instance is correctly classified and then pruned to avoid over-fitting the training data.
3.2 Information Theory
Different decision tree algorithms have different criteria for splitting the training samples, based on information theory. Information theory defines a statistical property called information gain that measures how well a given attribute separates the training samples according to their target classification. The decision tree uses this information gain measure to select among the candidate attributes at each step while growing the tree.
For any subset S of X, where X is the population, let freq(Ci, S) be the number of objects in S that belong to class Ci. Consider the 'message' that a randomly selected object belongs to class Ci. This message has probability freq(Ci, S) / |S|, where |S| is the total number of objects in subset S. The information conveyed by the message (in bits) is given by -log2(freq(Ci, S) / |S|). Summing over the classes gives the expected information (in bits) from such a message:
Info(S) = - Σi (freq(Ci, S) / |S|) × log2(freq(Ci, S) / |S|)                    (3.1)
When applied to a set of training objects, Info(T) gives the average amount of information needed to identify the class of an object in T. This quantity is also known as the entropy of the set T. Consider a similar measurement after T has been partitioned in accordance with the n outcomes of a test X. The expected information requirement can be found as a weighted sum over the subsets {Ti}:
InfoX(T) = Σi (|Ti| / |T|) × Info(Ti)                    (3.2)

gain(X) = Info(T) - InfoX(T)                    (3.3)
The quantity gain(X) measures the information that is gained by partitioning T in accordance with the test X. The gain criterion selects a test to maximize this information gain. The gain criterion has one significant disadvantage in that it is biased towards tests with many outcomes. The gain ratio criterion was developed to avoid this bias. The information generated by dividing T into n subsets is given by
split info(X) = - Σi (|Ti| / |T|) × log2(|Ti| / |T|)                    (3.4)
The proportion of information generated by the split that is useful for classification is
gain ratio(X) = gain(X) / split info(X)                    (3.5)
If the split is near trivial, split information will be small and this ratio will be unstable. Hence, the gain ratio criterion selects a test to maximize the gain ratio subject to the constraint that the information gain is large.
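The gain and gain ratio criteria of equations (3.1)–(3.5) can be computed as follows; the class counts in the example are illustrative only:

```python
import math

def info(class_counts):
    """Eq. (3.1): expected information (entropy) of a class distribution,
    given as a list of counts per class."""
    n = sum(class_counts)
    return -sum((c / n) * math.log2(c / n) for c in class_counts if c > 0)

def gain_ratio(parent_counts, subset_counts):
    """Eqs. (3.2)-(3.5) for a test splitting T into subsets.

    parent_counts: class counts in T, e.g. [9, 5]
    subset_counts: class counts per outcome, e.g. [[2, 3], [4, 0], [3, 2]]
    """
    n = sum(parent_counts)
    info_x = sum(sum(s) / n * info(s) for s in subset_counts)   # (3.2)
    gain = info(parent_counts) - info_x                         # (3.3)
    # Split info is the entropy of the subset-size distribution:  (3.4)
    split = info([sum(s) for s in subset_counts])
    return gain / split                                         # (3.5)
```

For instance, a two-class set with counts [9, 5] split into subsets [[2, 3], [4, 0], [3, 2]] yields a gain of about 0.25 bits and a gain ratio of about 0.16.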
3.3 Characteristics of Decision Tree Induction
• Decision tree induction is a non-parametric approach to building classification models. It does not require any prior assumptions about the type of probability distribution satisfied by the class and other attributes.
• Techniques developed for constructing decision trees are computationally inexpensive, making it possible to construct models even when the training set is large.
• Decision trees, especially smaller-sized trees, are relatively easy to interpret. The accuracies of the trees are also comparable to those of other classification techniques for many simple data sets.
• Decision trees provide an expressive representation for learning discrete-valued functions.
• A sub-tree can be replicated multiple times in a decision tree.
3.4 Pruning
Decision tree classifiers aim to refine the training sample T into subsets which contain only a single class. However, training samples may not be representative of the population they are intended to represent. In most cases, fitting a decision tree until all leaves contain data for a single class causes over-fitting. Generally there are two types of pruning methods:
• Stopping or pre-pruning
• Post-pruning
Pre-pruning looks at the best way of splitting a subset and assesses the split in terms of the information gain or gain ratio criteria. If this assessment falls below some threshold, the division is rejected. In this way tree building and pruning work simultaneously at each node of the tree.
Post-pruning, on the other hand, first grows the full over-fitted tree and then prunes it. Though growing and then pruning is time consuming, it gives more reliable results than pre-pruning. Post-pruning calculates the error at each node and then discards the sub-tree that gives the maximum error.
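A minimal sketch of post-pruning in this grow-then-prune style; it uses simple reduced-error pruning against a held-out validation set rather than C4.5's pessimistic estimate, and the tree and data below are hypothetical:

```python
from collections import Counter

def classify(node, row):
    """Follow attribute tests down to a leaf label."""
    while isinstance(node, dict):
        node = node["children"][row[node["attr"]]]
    return node

def prune(node, rows, labels):
    """Bottom-up post-pruning: a node is {"attr": ..., "children": {...}},
    a leaf is a class label. A sub-tree is collapsed to its majority-class
    leaf whenever the leaf misclassifies no more validation rows than the
    sub-tree does."""
    if not isinstance(node, dict) or not rows:
        return node
    for v, sub in node["children"].items():      # prune children first
        idx = [i for i, r in enumerate(rows) if r[node["attr"]] == v]
        node["children"][v] = prune(sub, [rows[i] for i in idx],
                                    [labels[i] for i in idx])
    majority = Counter(labels).most_common(1)[0][0]
    leaf_errors = sum(1 for l in labels if l != majority)
    tree_errors = sum(1 for r, l in zip(rows, labels)
                      if classify(node, r) != l)
    return majority if leaf_errors <= tree_errors else node

# An over-fitted stump whose "no" branch is never right on validation data
stump = {"attr": "halo",
         "children": {"yes": "colletotrichum", "no": "birds_eye"}}
val_rows = [{"halo": "yes"}, {"halo": "no"}]
val_labels = ["colletotrichum", "colletotrichum"]
pruned = prune(stump, val_rows, val_labels)   # collapses to a single leaf
```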
3.5 Decision tree rules
To simplify a decision tree we can convert it into rules, which are easier to understand and to implement. Every path from the root to a leaf is converted into an initial rule by regarding all the test conditions appearing in the path as conjunctive rule antecedents and the class label held by the leaf as the rule consequence.
After that, each initial rule is generalized by removing antecedents that do not seem helpful for distinguishing a specific class from other classes, which is performed by a pessimistic estimate of the accuracy of the rule. In detail, the accuracy of the initial rule and that of its variant where an antecedent is removed are estimated. If the latter is not worse than the former then the initial rule is replaced by the variant of it.
It is worth noting that usually there are several rule antecedents that could be removed. In such cases a greedy elimination is carried out: the removal that produces the lowest pessimistic error rate of the generalized rule is kept, and such removals are repeated until the rule cannot be generalized further.
After all the initial rules are generalized, they are grouped into rule sets corresponding to the classes respectively. All rule sets are polished with the help of the Minimum Description Length (MDL)[18] Principle so that rules that do not contribute to the accuracy of a rule set are removed. Then, the rule sets are sorted according to the ascending order of their false positive error rates. Finally, a default rule is created for dealing with instances that are not covered by any of the generated rules. The default rule has no antecedent and its consequence is the class that contains the most training instances not covered by any rule.
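The initial path-to-rule conversion can be sketched as follows; the tree, attribute names and labels below are hypothetical:

```python
def tree_to_rules(node, path=()):
    """Convert every root-to-leaf path into an if-then rule: the tests
    along the path become the antecedents, the leaf's class label becomes
    the consequence. A node is {"attr": ..., "children": {...}}; a leaf
    is a label string."""
    if not isinstance(node, dict):
        return [(path, node)]
    rules = []
    for value, subtree in node["children"].items():
        rules += tree_to_rules(subtree, path + ((node["attr"], value),))
    return rules

# Hypothetical tree for illustration
tree = {"attr": "halo", "children": {
    "yes": "colletotrichum",
    "no": {"attr": "spot_colour",
           "children": {"brown": "birds_eye", "white": "powdery_mildew"}}}}

for antecedents, label in tree_to_rules(tree):
    conds = " AND ".join(f"{a} = {v}" for a, v in antecedents)
    print(f"IF {conds} THEN {label}")
# e.g. IF halo = yes THEN colletotrichum
```

The generalization and MDL-based polishing steps described above would then operate on the rule list this function produces.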
3.6 Advantages of Decision Trees
• Simple to understand and interpret: people are able to understand decision tree models after a brief explanation.
• Have value even with little hard data: important insights can be generated from experts describing a situation (its alternatives, probabilities and costs) and their preferences for outcomes.
• Possible scenarios can be added: decision trees allow the addition of new possible scenarios.
• Worst, best and expected values can be determined for different scenarios.
• Efficient learning and classification.
• Decision trees are powerful and popular tools for classification and prediction.
• Perform well with large data sets: large amounts of data can be analyzed using standard computing resources in reasonable time.
Chapter 4
IMAGE PROCESSING TECHNIQUES
4.1 Introduction
Digital images play an important role, both in daily-life applications such as satellite television, magnetic resonance imaging and computed tomography, and in areas of research and technology such as geographical information systems and astronomy. An image is a two-dimensional representation of a three-dimensional scene. An image may be defined as a two-dimensional function, f(x, y), where x and y are spatial coordinates and the amplitude of f at any pair of coordinates (x, y) is called the intensity or gray level of the image at that point. A digital image is composed of picture elements called pixels. A pixel is the smallest sample of an image and represents the brightness at one point. Conversion of an analog image into a digital image involves two important operations, namely sampling and quantization. A digital image is basically a numerical representation of an object. The term digital image processing refers to the manipulation of an image by means of a processor. The different elements of an image processing system include image acquisition, image storage, image processing and display [2]. The processing of an image by means of a computer is generally termed digital image processing. The advantages of using computers for the processing of images are summarized below:
• Flexibility and Adaptability: The main advantage of digital computers when compared to analog electronic and optical information processing devices is that no hardware modifications are necessary in order to reprogram digital computers to solve different tasks. This feature makes digital computers an ideal device for processing image signals adaptively.
• Data Storage and Transmission: With the development of different image compression algorithms, digital data can be stored effectively. The digital data within the computer can be easily transmitted from one place to another.
The only limitations of digital imaging and digital image processing are the memory and processing speed capabilities of computers. Different image processing techniques include image enhancement, image restoration, image fusion and image watermarking [2].
4.2 Histogram
In an image processing context, the histogram of an image normally refers to a histogram of the pixel intensity values. This histogram is a graph showing the number of pixels in an image at each different intensity value found in that image. For an 8-bit gray scale image there are 256 possible intensities, so the histogram will graphically display 256 numbers showing the distribution of pixels among those grayscale values. Histograms can also be taken of colour images: either individual histograms of the red, green and blue channels can be taken, or a 3-D histogram can be produced, with the three axes representing the red, green and blue channels and the brightness at each point representing the pixel count. The exact output of the operation depends on the implementation. It may simply be a picture of the required histogram in a suitable image format, or it may be a data file of some sort representing the histogram statistics.
The operation is very simple. The image is scanned in a single pass and a running count of the number of pixels found at each intensity value is kept. This is then used to construct a suitable histogram.
The horizontal axis of the graph represents the tonal variations, while the vertical axis represents the number of pixels in each particular tone. The left side of the horizontal axis represents the black and dark areas, the middle represents medium gray and the right side represents light and pure white areas. The vertical axis represents the size of the area captured in each of these zones. Thus the histogram for a very dark image will have the majority of its data points on the left side and centre of the graph. Conversely, the histogram for a very bright image with few dark areas and shadows will have most of its data points on the right side and centre of the graph.
Histograms are the basis for numerous spatial domain processing techniques. Histogram manipulation can be used effectively for image enhancement. Histograms can be used to provide useful image statistics. Information derived from histograms is quite useful in other image processing applications, such as image compression and segmentation.
4.3 Histogram processing
- The histogram of a digital image with gray levels in the range [0, L-1] is a discrete function h(r_k) = n_k, where r_k is the kth gray level and n_k is the number of pixels in the image having gray level r_k.
- It is common practice to normalize a histogram by dividing each of its values by the total number of pixels in the image, denoted by n. Thus, a normalized histogram is given by p(r_k) = n_k / n, for k = 0, 1, …, L-1.
- Thus, p(r_k) gives an estimate of the probability of occurrence of gray level r_k. Note that the sum of all components of a normalized histogram is equal to 1.
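The normalization above can be sketched with NumPy (the sample image is invented for illustration); the components of the result sum to 1, as noted:

```python
import numpy as np

def normalized_histogram(image, levels=256):
    """Normalized histogram p(r_k) = n_k / n of an 8-bit grayscale image."""
    counts = np.bincount(image.ravel(), minlength=levels)  # n_k per level
    return counts / image.size                             # divide by n

# Tiny 2x3 example image (values invented for illustration):
img = np.array([[0, 0, 255], [128, 128, 128]], dtype=np.uint8)
p = normalized_histogram(img)
print(p[128])             # 0.5 — three of the six pixels have level 128
print(round(p.sum(), 6))  # 1.0 — probabilities sum to one
```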
Figure 4.1: Histogram analysis
Chapter 5
SOFTWARE/HARDWARE REQUIREMENTS
5.1 Software configuration
- Operating system: Windows 7/8/XP
- Tool: ERDAS IMAGINE 9.2
- Tool: WEKA 3.7 Classifier
- MATLAB
5.2 Hardware configuration
A digital camera captures an image directly and stores it in its memory device. Image acquisition in image processing can be defined as the action of retrieving an image from some source, usually a hardware-based source, so that it can be passed through whatever processes need to occur afterward. Image acquisition is always the first step in the implementation; without an image, no processing is possible. The camera specifications are as follows:
Camera:
Camera maker: Sony
Camera model: DSC-H70
Exposure time: 1/30 sec
Focal length: 5 mm
Contrast: Normal
Image:
Image size: 4608 × 2592
Width: 4608 pixels
Height: 2592 pixels
Horizontal resolution: 72 dpi
Vertical resolution: 72 dpi
Bit depth: 24
Color representation: RGB
Compressed bits/pixel: 4
5.3 ERDAS IMAGINE 9.2
In image processing, feature extraction is a special form of dimensionality reduction. The main goal is to emphasize certain features of interest in an image for further analysis or display. When the input data to an algorithm is too large to be processed and is suspected to be highly redundant, it is transformed into a reduced representation called a set of features. Transforming the input data into a set of features is called feature extraction. If the features are carefully chosen, the feature set is expected to capture the relevant information from the input data, so that the desired task can be performed using this reduced representation instead of the full-size input. Feature extraction is used in image processing to detect and isolate desired portions or features of a digitized image. In this work, the ERDAS IMAGINE software tool is used to extract the RGB (red, green, blue) values of leaf images. The viewer window of ERDAS IMAGINE is used for viewing the images. The image displayed is the multispectral image of the diseased leaf. A CKB file is created from the RGB values of the multispectral image; these represent the spectral reflectance values of the pixels. The window used for noting down the RGB values is shown in Figure 5.1.
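The RGB extraction step performed with ERDAS IMAGINE can be illustrated with a rough Python analogue (the helper name and the sample pixel values are invented for illustration; this is a sketch of the idea, not the tool itself):

```python
import numpy as np

def rgb_features(pixels):
    """pixels: H x W x 3 array of RGB values.
    Reduces each channel to its mean, giving one feature per band."""
    arr = np.asarray(pixels, dtype=float)
    return {"R": arr[..., 0].mean(),
            "G": arr[..., 1].mean(),
            "B": arr[..., 2].mean()}

# A hypothetical 2x2 leaf patch (values invented for illustration):
patch = [[[120, 180, 60], [110, 170, 50]],
         [[130, 190, 70], [100, 160, 40]]]
print(rgb_features(patch))  # {'R': 115.0, 'G': 175.0, 'B': 55.0}
```

Reducing each leaf image to a few per-band statistics like this is what turns raw pixels into a fixed-length feature vector a classifier can consume.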
Figure 5.1: RGB values of a pixel of the multispectral image
5.4 WEKA 3.7 Classifier
WEKA (Waikato Environment for Knowledge Analysis) is an open-source data mining tool written in Java. It was developed at the University of Waikato, New Zealand. It contains a large number of libraries and machine learning algorithms and is an excellent system for learning about machine learning techniques. The WEKA workbench contains a collection of visualization tools and algorithms for data analysis and predictive modeling, together with graphical user interfaces for easy access to this functionality. WEKA supports several standard data mining tasks: data preprocessing, clustering, classification, regression, visualization and feature selection. All of WEKA's techniques are predicated on the assumption that the data is available as a single flat file or relation, where each data point is described by a fixed number of attributes [15].
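The flat-file assumption can be illustrated by writing leaf RGB features to WEKA's ARFF format (the relation name, attribute names, class labels and values here are all invented for illustration):

```python
# Each instance is a fixed-length row: three numeric band values plus a
# nominal class label, exactly the shape WEKA expects.
rows = [
    (115.0, 175.0, 55.0, "healthy"),
    (140.0, 120.0, 60.0, "diseased"),
]

lines = ["@relation rubber_leaf",
         "@attribute R numeric",
         "@attribute G numeric",
         "@attribute B numeric",
         "@attribute class {healthy,diseased}",
         "@data"]
lines += ["%s,%s,%s,%s" % row for row in rows]
arff = "\n".join(lines)
print(arff.splitlines()[0])  # @relation rubber_leaf
```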
5.5 MATLAB
MATLAB is a high-performance language for technical computing. It is a programming environment for algorithm development, data analysis, visualization and numerical computation. Using MATLAB, one can solve technical computing problems faster than with traditional programming languages such as C, C++ and FORTRAN. The name MATLAB stands for matrix laboratory. MATLAB was originally written to provide easy access to matrix software developed by the LINPACK and EISPACK projects. Today, MATLAB engines incorporate the LAPACK and BLAS libraries, embedding the state of the art in software for matrix computation. MATLAB has evolved over a period of years with input from many users.
In university environments, it is the standard instructional tool for introductory and advanced courses in mathematics, engineering, and science. One can use MATLAB in a wide range of applications, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis and computational biology. For a million engineers and scientists in industry and academia, MATLAB is the language of technical computing.
Chapter 6
RESULTS AND DISCUSSIONS
This chapter describes the details of the project work carried out so far. The study areas of the work and the software used have been described, and the results obtained so far are considered for the analysis. The classification of the diseased leaf images is carried out, and the result obtained with the Decision Tree classifier is discussed in this chapter. The WEKA Classifier software is used to generate the Decision Tree and to perform the accuracy assessment.
6.1 WEKA Classifier Procedure
A database containing the RGB band information of the leaf images is first created using MS Excel; this file is stored in the .data format. The file is then imported into the WEKA classifier interface for the classification procedure. The user interface of the WEKA classifier is shown in Figure 6.1.
Figure 6.1: WEKA user interface
The data file is imported using the Open file button of the Preprocess panel. Figure 6.2 shows the WEKA preprocessor window. The file is then subjected to the appropriate classification technique using the Classify tab.
Figure 6.2: WEKA file preprocessing window
At the top of the Classify window there is a Classifier box, which gives the name of the currently selected classifier. Here the J48 algorithm is selected. The result of the classifier is tested according to the test option selected.
6.1.1 Test options
There are four test options:
- Use training set: The classifier is evaluated on the training data itself, i.e. on how well it predicts the class of the instances it was trained on.
- Supplied test set: The classifier is evaluated on how well it predicts the class of a set of instances loaded from a file. Clicking the Set button brings up a dialog allowing you to choose the file to test on.
- Cross-validation: The classifier is evaluated by cross-validation, using the number of folds entered in the Folds text field.
- Percentage split: The classifier is evaluated on how well it predicts a certain percentage of the data which is held out for testing. The amount of data held out depends on the value entered in the % field.
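The percentage-split option, for example, can be sketched in plain Python (a simplified illustration with stand-in data; WEKA additionally randomizes the order of instances before splitting unless told otherwise):

```python
# Hold out a percentage of the instances for testing; train on the rest.
def percentage_split(instances, train_pct=66):
    cut = int(len(instances) * train_pct / 100)
    return instances[:cut], instances[cut:]

data = list(range(10))          # ten stand-in instances
train, held_out = percentage_split(data, train_pct=66)
print(len(train), len(held_out))  # 6 4
```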
Once the classifier, the test option and the class have all been set, the learning process starts by clicking the Start button. The classifier output box is then filled with text describing the results of testing and training.
6.1.2 Classifier output box
Classifier output box describes the result of training and testing. The output is split into several sections:
- Run information: A list of information giving the learning scheme options, relation name, instances, attributes and test mode involved in the process.
- Classifier model (full training set): A textual representation of the classification model produced on the full training data.
- Summary: A list of statistics summarizing how accurately the classifier was able to predict the true class of the instances under the chosen test mode.
- Detailed Accuracy By Class: A more detailed per-class breakdown of the classifier's prediction accuracy.
- Confusion Matrix: Shows how many instances have been assigned to each class. Each element shows the number of test examples whose actual class is the row and whose predicted class is the column.
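The confusion matrix layout described above (actual class as the row, predicted class as the column) can be sketched as follows (class names and predictions are invented for illustration):

```python
# Build a confusion matrix: rows are actual classes, columns are predicted.
def confusion_matrix(actual, predicted, classes):
    idx = {c: i for i, c in enumerate(classes)}
    m = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        m[idx[a]][idx[p]] += 1
    return m

classes = ["healthy", "diseased"]
actual    = ["healthy", "healthy", "diseased", "diseased"]
predicted = ["healthy", "diseased", "diseased", "diseased"]
print(confusion_matrix(actual, predicted, classes))  # [[1, 1], [0, 2]]
```

Here one healthy instance was misclassified as diseased (row "healthy", column "diseased"), while both diseased instances were classified correctly.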