
Essay: Improving aspects of Naive Bayes Classifier


Abstract

The Naive Bayes (also known as Simple Bayes or Independence Bayes) classifier is a probabilistic classifier based on Bayes' Theorem that assumes independence between the features at play. It assigns an object to the class it finds most probable. Its performance and relevance across many fields remain intact despite its strong independence assumption and oversimplified hypothesis; some information is inevitably lost because of the independence assumption that is innate to the Naive Bayes algorithm. In this paper, we show that for a wide range of applications spanning several domains, the Naive Bayes classifier performs exceptionally well in spite of the drawbacks associated with it. Most importantly, many aspects of the Naive Bayes classifier can be improved upon, and we present some of these improvements in this paper.

INTRODUCTION

Naive Bayes is a simple technique for constructing classifiers: models that assign class labels to problem instances, represented as vectors of feature values, where the class labels are drawn from some finite set. For example, an animal may be considered to be an elephant if it has a long trunk, floppy, thick ears, and weighs about 2 to 7 tons. A naive Bayes classifier assumes that each of these features contributes equally and independently to the probability that the animal is an elephant. This combination of independent features with a most-probable-class decision rule is what characterises the naive Bayes classifier.

Naive Bayes models are so named for this very assumption: all variables Xi are mutually independent given the class variable C. The success of naive Bayes in the presence of feature dependencies can be explained as follows: optimality in terms of zero-one loss (classification error) is not necessarily related to the quality of the fit to a probability distribution (i.e., the appropriateness of the independence assumption). Rather, an optimal classifier is obtained as long as both the actual and estimated distributions agree on the most probable class. For example, naive Bayes is optimal for some problem classes that have a high degree of feature dependencies, such as disjunctive and conjunctive concepts. Although the assumption is very naive, the naive Bayes classifier has good generalisation ability and was voted one of the top 10 algorithms in data mining at the IEEE International Conference on Data Mining (ICDM) 2006 (Wu et al., 2008).

DEFINITION AND BACKGROUND

Let $\mathbf{X} = (X_1, \ldots, X_n)$ be a vector of observed random variables, called features, where each feature $X_i$ takes values from its domain $D_i$. The set of all feature vectors (examples, or states) is denoted $\Omega = D_1 \times \cdots \times D_n$. Let $C$ be an unobserved random variable denoting the class of an example, where $C$ can take one of $m$ values $c \in \{0, \ldots, m-1\}$. Capital letters, such as $X_i$, will denote variables, while lower-case letters, such as $x_i$, will denote their values; boldface letters will denote vectors.

A function $g: \Omega \to \{0, \ldots, m-1\}$, where $g(\mathbf{x}) = c$, denotes a concept to be learned. A deterministic $g(\mathbf{x})$ corresponds to a concept without noise, which always assigns the same class to a given example (e.g., disjunctive and conjunctive concepts are deterministic). In general, however, a concept can be noisy, yielding a random function $g(\mathbf{x})$.

A classifier is defined by a (deterministic) function $h: \Omega \to \{0, \ldots, m-1\}$ (a hypothesis) that assigns a class to any given example. A common approach is to associate each class $i$ with a discriminant function $f_i(\mathbf{x})$, $i = 0, \ldots, m-1$, and let the classifier select the class with the maximum discriminant function on a given example: $h(\mathbf{x}) = \arg\max_{i} f_i(\mathbf{x})$. Naive Bayes is a conditional probability model: given a problem instance to be classified, represented by a vector $\mathbf{x} = (x_1, \ldots, x_n)$ of $n$ features (independent variables), it assigns to this instance probabilities $P(C_k \mid x_1, \ldots, x_n)$ for each of $K$ possible outcomes or classes $C_k$.

Using Bayes’ theorem, the conditional probability can be decomposed as:

$$P(C_k \mid \mathbf{x}) = \frac{P(C_k)\, P(\mathbf{x} \mid C_k)}{P(\mathbf{x})}$$

In plain English, using Bayesian probability terminology, the above equation can be written as:

“The probability of the data belonging to a particular class (posterior) is equal to the likelihood of the data given that class, multiplied by the probability of the class (prior), and divided by the probability of the data across all classes (evidence).”
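To make the rule concrete, here is a minimal numeric illustration; the prior and likelihood values are made up for a hypothetical two-class spam filter and are not taken from any experiment in this paper:

```python
# Minimal numeric illustration of the rule above (all numbers are made up).
# Posterior = likelihood * prior / evidence, for a two-class spam filter.

prior = {"spam": 0.4, "ham": 0.6}        # P(class)
likelihood = {"spam": 0.8, "ham": 0.1}   # P(data | class) for one observed feature vector

# Evidence P(data) = sum over classes of P(data | class) * P(class)
evidence = sum(likelihood[c] * prior[c] for c in prior)

posterior = {c: likelihood[c] * prior[c] / evidence for c in prior}
print(posterior)  # {'spam': 0.842..., 'ham': 0.157...} -> the instance is assigned to 'spam'
```

Because the evidence term is the same for every class, the class with the largest likelihood-times-prior product always wins, which is why implementations usually skip the division entirely.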

NAIVE BAYES CLASSIFIER APPLICATIONS

1) TEXT CLASSIFICATION

The data sets used here are two mail collections from CCERT [18], containing 45,396 junk mails and 18,314 normal mails. From these, 30,000 junk mails and 10,000 normal mails were chosen as the data sets for an experiment conducted by Wei Zhang et al.

There are three cases here (a minimal sketch of such a setup follows the list):

(a) choose 1,000 features to represent each document; 96 of them turn out to have auxiliary features;

(b) choose 1,500 features to represent each document; 152 of them turn out to have auxiliary features;

(c) choose 2,000 features to represent each document; 217 of them turn out to have auxiliary features.
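As a rough illustration of this kind of experiment (not the original code of Wei Zhang et al., and with a tiny hypothetical corpus standing in for the CCERT mail sets), a multinomial Naive Bayes spam filter limited to a fixed number of features might be set up as follows:

```python
# A minimal sketch (not the experiment's original code) of case (a): a junk/normal
# mail classifier limited to a fixed number of features. The toy messages below
# stand in for the CCERT corpus described above.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

mails = [
    "cheap pills buy now limited offer",      # junk
    "win a free prize click this link",       # junk
    "meeting moved to three pm tomorrow",     # normal
    "please review the attached report",      # normal
]
labels = [1, 1, 0, 0]  # 1 = junk mail, 0 = normal mail

# max_features plays the role of "choose 1,000 features to represent the document"
vectorizer = CountVectorizer(max_features=1000)
X = vectorizer.fit_transform(mails)

model = MultinomialNB()
model.fit(X, labels)

print(model.predict(vectorizer.transform(["free pills prize offer"])))  # -> [1]
```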

2) E-CATALOG CLASSIFICATION

An e-catalog (electronic catalog) holds information about products and services in an e-commerce system. E-catalog classification is the task of assigning an input catalog entry to one of the predefined categories (or classes).

The main problem with applying our text-classification approach here is matching the attribute values. Since the values are texts composed of many words and are often noisy, accepting only exact matches is misleading. It is clearly wrong to distinguish between ‘Desktop’ and ‘Desktop Computer’. The problem becomes worse when we try to use an attribute like ‘product description’, which is sometimes composed of full sentences. So, with a parser, we use partial matches between attribute values.

Given a set of classes C, the attributes <a1, a2, ..., an> and the values <v1, v2, ..., vn> that describe an input instance, the Naïve Bayes Classifier assigns the most probable category according to the following formula:

$$c^{*} = \arg\max_{c \in C} P(c) \prod_{i=1}^{n} P(v_i \mid c)$$

A model described by this equation can be represented as Fig(1).

Fig. 1. (a) represents the structure of our classifier and (b) represents that of the Naïve Bayes Classifier for flat-text classification
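A rough sketch of how such an attribute-aware classifier could be implemented is given below. The catalog entries are hypothetical, probabilities use add-one smoothing, and attribute values are split into words so that ‘Desktop’ and ‘Desktop Computer’ share partial matches rather than requiring exact equality; this illustrates the idea and is not the cited system's code:

```python
# Hypothetical e-catalog naive Bayes classifier with word-level (partial) matching
# of attribute values and add-one smoothing.
import math
from collections import defaultdict

# Hypothetical training catalog: (category, {attribute: value})
catalog = [
    ("Computers", {"name": "Desktop Computer", "description": "tower pc with monitor"}),
    ("Computers", {"name": "Desktop", "description": "compact office pc"}),
    ("Phones",    {"name": "Smartphone", "description": "touchscreen mobile phone"}),
]

# Count word occurrences per (category, attribute)
word_counts = defaultdict(lambda: defaultdict(int))
class_counts = defaultdict(int)
vocab = set()
for category, attrs in catalog:
    class_counts[category] += 1
    for attr, value in attrs.items():
        for word in value.lower().split():
            word_counts[(category, attr)][word] += 1
            vocab.add(word)

def classify(attrs):
    """Return argmax_c P(c) * prod_i P(v_i | c), computed in log space."""
    total = sum(class_counts.values())
    best, best_score = None, float("-inf")
    for category in class_counts:
        score = math.log(class_counts[category] / total)          # log P(c)
        for attr, value in attrs.items():
            counts = word_counts[(category, attr)]
            denom = sum(counts.values()) + len(vocab)
            for word in value.lower().split():                    # partial, word-level match
                score += math.log((counts[word] + 1) / denom)      # smoothed log P(word | c, attr)
        if score > best_score:
            best, best_score = category, score
    return best

print(classify({"name": "Desktop", "description": "office tower pc"}))  # -> 'Computers'
```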

3) FAKE NEWS CLASSIFICATION

Classification is one way of organising Twitter messages. SVM and Naive Bayes classifiers are the most popular classification methods, and both are often used for text classification. Theoretically, Naive Bayes trains and classifies faster than most other classifiers while keeping the error low.

A comparison was done between a Support Vector Machine (SVM) classifier and a Naive Bayes classifier, and the following results were obtained when a single measure, the weighted harmonic mean F, was used. It shows the number of features that turned out to be relevant. [2]
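The snippet below is a hedged sketch of this kind of evaluation, using a small made-up tweet set rather than the data of the cited study; it scores both classifiers with the weighted harmonic mean F (F1):

```python
# Toy SVM-vs-Naive-Bayes comparison on fabricated tweets, scored with F1.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

tweets = [
    "breaking celebrity secretly an alien sources say",   # fake
    "miracle cure doctors hate this one trick",           # fake
    "parliament passes new budget after long debate",     # real
    "local team wins the regional football final",        # real
] * 10                       # repeat the toy set so a train/test split is possible
labels = [1, 1, 0, 0] * 10   # 1 = fake, 0 = real

X_train, X_test, y_train, y_test = train_test_split(
    tweets, labels, test_size=0.25, random_state=0)

vec = TfidfVectorizer()
Xtr, Xte = vec.fit_transform(X_train), vec.transform(X_test)

for name, model in [("Naive Bayes", MultinomialNB()), ("SVM", LinearSVC())]:
    model.fit(Xtr, y_train)
    print(name, "F1 =", f1_score(y_test, model.predict(Xte)))
```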

4) MEDICAL SCIENCES

The transfusion of allogeneic blood products is an essential option in heart surgery. Although major improvements have been made in preoperative blood-conservation strategies, transfusion rates remain high after surgery. The principal benefits of blood transfusion include enhanced oxygen-carrying capacity, improved haemostasis and increased intravascular volume to support cardiac output. Despite these benefits, blood transfusion may cause various problems, and large transfusions are increasingly recognised as a risk factor for adverse outcomes after heart surgery. [3]

5) SENTIMENT ANALYSIS

A typical method to obtain valuable information is to extract the sentiment or opinion from a message. Machine learning technologies are widely used in sentiment classification because of their ability to “learn” from the training dataset to predict or support decision making with relatively high accuracy.

Fig. 2. Program pseudocode for sentiment analysis: (a) mapper/reducer training job; (b) classify job
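The original pseudocode figure is not reproduced here; the snippet below is a rough, hypothetical emulation of what the training job could look like in MapReduce style, with a mapper that emits ((sentiment, word), 1) pairs and a reducer that sums them into the counts Naive Bayes needs for P(word | sentiment):

```python
# Hypothetical MapReduce-style emulation of a naive Bayes training job for sentiment.
from collections import defaultdict

def mapper(record):
    """record = (sentiment_label, message_text); emit ((label, word), 1) pairs."""
    label, text = record
    for word in text.lower().split():
        yield (label, word), 1

def reducer(pairs):
    """Sum the 1s for each (label, word) key, as the reduce phase would."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

training = [("pos", "great phone love it"), ("neg", "terrible battery hate it")]
emitted = [pair for record in training for pair in mapper(record)]
print(reducer(emitted))  # e.g. {('pos', 'great'): 1, ..., ('neg', 'battery'): 1}
```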

IMPROVEMENTS IN THE NAIVE BAYES CLASSIFIER ALGORITHM

Properly trained Naive Bayes classifiers are usually astonishingly accurate and very fast to train, noticeably faster than many other classifiers. Still, there is some scope for improvement in its implementation, and we list the specifics here.

Data Parsing (pre-processing)

Feeding raw data to the NBC as input without preprocessing it results in poor performance. To mitigate the effect of poor data sets, steps should be taken to ensure that data sets are not filled with redundancy. This can be achieved by stemming and synonym finding.
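For example, a minimal preprocessing step might look like the sketch below (assuming NLTK is installed and its 'stopwords' corpus has been downloaded; synonym finding is omitted for brevity):

```python
# Minimal preprocessing sketch: lowercase, drop stop words, and stem so that
# redundant word forms ('running', 'runs' -> 'run') collapse before classification.
from nltk.stem import PorterStemmer
from nltk.corpus import stopwords

stemmer = PorterStemmer()
stop_words = set(stopwords.words("english"))

def preprocess(text):
    tokens = text.lower().split()
    return [stemmer.stem(t) for t in tokens if t not in stop_words]

print(preprocess("The runners were running quickly through the runs"))
# expected: ['runner', 'run', 'quickli', 'run']
```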

Feature Selection

Some features carry more weight than others, and hence play a more important role in defining the properties of an entity. Steps should be taken to ensure such features are seen as more prominent by the classifier in its implementation.
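One common way to realise this is explicit feature selection before training. The sketch below (with hypothetical texts and labels) uses chi-squared scoring to keep only the k terms most associated with the class labels:

```python
# Chi-squared feature selection ahead of a multinomial naive Bayes classifier.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["free offer win prize", "project meeting agenda",
         "win free cash now", "weekly status report"]
labels = [1, 0, 1, 0]  # 1 = spam-like, 0 = work-like

pipeline = make_pipeline(
    CountVectorizer(),
    SelectKBest(chi2, k=5),   # keep the 5 most class-informative features
    MultinomialNB(),
)
pipeline.fit(texts, labels)
print(pipeline.predict(["free prize now"]))  # -> [1]
```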

Specific Classifier Optimizations

Generally, we begin with a two-class classifier (Class A and ‘all else’); the results in the ‘all else’ class are then returned to the algorithm for classification into Class B and ‘all else’, and so on. A better way of doing this is to use the Fisher method, which can be seen as a way of normalising the input probabilities. An NBC uses the feature probabilities to construct a ‘whole-document’ probability. The Fisher method instead calculates the probability of a category for each feature of the document, then combines these feature probabilities and compares that combined probability with the probability of a random set of features.
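A rough sketch of the Fisher method, in the style popularised by Segaran's "Programming Collective Intelligence", is shown below; the per-feature category probabilities are hypothetical inputs rather than values estimated from real data:

```python
# Fisher method sketch: combine per-feature category probabilities and compare
# the combined statistic against what a random set of features would produce.
import math

def inv_chi2(chi, df):
    """Inverse chi-squared function (even df) used to turn the statistic into a probability."""
    m = chi / 2.0
    total = term = math.exp(-m)
    for i in range(1, df // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def fisher_prob(feature_probs):
    """feature_probs[i] = P(category | feature_i), normalised across categories."""
    p = sum(math.log(fp) for fp in feature_probs)
    return inv_chi2(-2 * p, 2 * len(feature_probs))

# Hypothetical normalised probabilities of the 'junk' category for three features
print(fisher_prob([0.9, 0.8, 0.7]))   # close to 1 -> strong evidence for 'junk'
print(fisher_prob([0.2, 0.3, 0.4]))   # much lower -> weak evidence
```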

CONCLUSION

Despite its unrealistic independence assumption, the naive Bayes classifier is surprisingly effective in practice since its classification decision may often be correct even if its probability estimates are inaccurate. Although some optimality conditions of naive Bayes have been already identified in the past, a deeper understanding of data characteristics that affect the performance of naive Bayes is still required.

Our broad goal is to understand the data characteristics which affect the performance of naive Bayes. We also discuss some major applications of the NB classifier algorithm. Given its wide range of use, it’s surely a candidate for the top slot of classifier algorithms. We also provide some suggestions to improve upon the existing working framework of the algorithm which may help in enhancing its processing and output capabilities.

Finally, a better understanding of the impact of independence assumption on classification can be used to devise better approximation techniques for learning efficient Bayesian net classifiers, and for probabilistic inference, e.g., for finding maximum-likelihood assignments.

REFERENCES:

  1. I. Rish, T. J. Watson Research Center, 30 Saw Mill River Road, Hawthorne, New York. An empirical study of the naive Bayes classifier.
  2. Inoshika Dilrukshi and Kasun De Zoysa, University of Colombo School of Computing, Colombo, Sri Lanka. Twitter News Classification: Theoretical and Practical Comparison of SVM against Naive Bayes Algorithms.
  3. Gabriele Cevenini and Emanuela Barbini. A naïve Bayes classifier for planning transfusion requirements in heart surgery.

