INTRODUCTION
Brief Introduction
Software quality is of great importance in the software engineering field, yet building high-quality software is an expensive task. Consequently, in order to increase the efficiency and usefulness of quality assurance and testing, software defect prediction is used to discover defect-prone modules in a forthcoming version of a software system and to help focus the testing effort on those modules.
Chapter wise contents
Chapter 1 gives a brief introduction to the overall project and briefly outlines the contents of each chapter.
Chapter 2 gives a brief description of the literature survey and the existing system.
Chapter 3 reflects the analysis of the proposed system.
Chapter 4 gives details about the software and hardware requirements, system design and implementation.
Chapter 5 gives details about the testing, test cases and test results.
Chapter 6 explains the results obtained in the project.
Chapter 7 gives the conclusions and future work of the proposed system.
LITERATURE SURVEY
Related Work
Software Defect prediction based on association rule classification.
In software defect prediction, predictive models are estimated based on various code attributes to assess the likelihood of software modules containing errors. Many classification methods have been suggested to accomplish this task; however, association-based classification methods [2] had not been investigated in this context. Data experiments were conducted to compare the CBA2 classifier with two other rule/tree based classifiers (C4.5 and RIPPER), showing that the CBA2 method obtained satisfactory performance compared to C4.5 and RIPPER, without losing comprehensibility. Discrete classifiers (i.e. classifiers with dichotomous outcomes) are routinely assessed using a confusion matrix, which summarizes the number of modules correctly or incorrectly classified as error prone (EP) or not error prone (NEP) by the classifier. TP, TN, FP, and FN represent the number of true positives, true negatives, false positives, and false negatives, respectively, from which a number of metrics can be defined: accuracy, sensitivity, and specificity.
An Algorithm for the discovery of arbitrary length ordinal association rules.
Association rule mining techniques are used to search for attribute-value pairs that occur frequently together in a data set. Ordinal association rules are a particular type of association rules that describe orderings between attributes that commonly occur over a data set. Although ordinal association rules can be defined between any number of attributes, only discovery algorithms for binary ordinal association rules (i.e., rules between two attributes) existed. The DOAR [6] algorithm was introduced to efficiently find all ordinal association rules of interest to the user, of any length, that hold over a data set. Association rule mining aims to find interesting associations or correlations that exist between items in large data sets. Association rule discovery was first introduced in the context of market basket analysis, where customer buying habits or patterns are to be uncovered. Since then, many research efforts in the area of association rule mining have been made, mainly in two directions:
• To improve old algorithms or develop new ones in order to ensure scalability with respect to data size.
• To extend the Boolean association rules concept to adapt it to new applications. Han and Kamber present an extensive overview of the types of association rules that can be discovered in data (e.g., Boolean vs. quantitative, single vs. multidimensional, single vs. multi-level, constraint-based rules, etc.) and of their utility and discovery methods.
The authors introduced a novel algorithm for the discovery of interesting any-length ordinal association rules in data sets. The proposed algorithm, named DOAR, was formally proved to be complete, and a case study showed that it efficiently explores the search space of the possible rules. The research results described in the paper are being extended and improved towards:
• Validating the scalability of the DOAR algorithm by conducting experiments on large real data sets.
• Defining ordinal association rules that contain repeating attributes, and adapting the proposed technique in order to discover such interesting rules.
• Using ordinal association rule detection together with supervised learning for medical diagnosis prediction. Preliminary work in this direction is reported.
• Extending ordinal association rules towards relational association rules, i.e., rules between attributes with different data domains and relations between attributes that are not only ordinal.
• Using ordinal association rules of arbitrary length together with other data mining techniques, such as classification or regression, to increase the accuracy of predictive models. Binary association rules are currently used in building predictive models in e-banking services.
Fast Algorithms for mining association rules in large databases.
This work considers the problem of discovering association rules between items in a large database of sales transactions. Two new algorithms are presented for solving this problem that are fundamentally different from the known algorithms. Empirical evaluation shows that these algorithms outperform the known algorithms by factors ranging from three for small problems to more than an order of magnitude for large problems. The best features of the two proposed algorithms can also be combined into a hybrid algorithm, called AprioriHybrid. Scale-up experiments show that AprioriHybrid scales linearly with the number of transactions; it also has excellent scale-up properties with respect to the transaction size and the number of items in the database.
Progress in barcode technology has made it possible for retail organizations to collect and store massive amounts of sales data, referred to as basket data. A record in such data typically consists of the transaction date and the items bought in the transaction. Successful organizations view such databases as important pieces of the marketing infrastructure. They are interested in instituting information-driven marketing processes, managed by database technology, that enable marketers to develop and implement customized marketing programs and strategies. The problem of mining association rules over basket data was introduced in this context. An example of such a rule might be that 98% of customers that purchase tires and auto accessories also get automotive services done. Finding all such rules is valuable for cross-marketing and attached mailing applications. Other applications include catalog design, add-on sales, store layout and customer segmentation based on buying patterns. The databases involved in these applications are very large, so it is imperative to have fast algorithms for this task.
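As an illustration of the candidate-generation and support-counting idea on which the Apriori family of algorithms is built, the following is a minimal Java sketch that mines only frequent 1- and 2-itemsets from in-memory basket data; it omits the AprioriTid and AprioriHybrid optimizations discussed above, and the class and method names are illustrative choices rather than the original implementation.

import java.util.*;

/** Minimal sketch: frequent 1- and 2-itemsets over basket data. */
public class AprioriSketch {

    /** Returns itemsets (of size 1 and 2) whose support count >= minSupport. */
    public static Map<Set<String>, Integer> frequentItemsets(
            List<Set<String>> transactions, int minSupport) {

        // Pass 1: count single items.
        Map<String, Integer> itemCounts = new HashMap<String, Integer>();
        for (Set<String> t : transactions) {
            for (String item : t) {
                Integer c = itemCounts.get(item);
                itemCounts.put(item, c == null ? 1 : c + 1);
            }
        }

        Map<Set<String>, Integer> frequent = new LinkedHashMap<Set<String>, Integer>();
        List<String> frequentItems = new ArrayList<String>();
        for (Map.Entry<String, Integer> e : itemCounts.entrySet()) {
            if (e.getValue() >= minSupport) {
                frequent.put(Collections.singleton(e.getKey()), e.getValue());
                frequentItems.add(e.getKey());
            }
        }

        // Candidate generation: pairs of frequent items (Apriori property: every
        // subset of a frequent itemset must itself be frequent).
        // Pass 2: count the candidate pairs against the transactions.
        for (int i = 0; i < frequentItems.size(); i++) {
            for (int j = i + 1; j < frequentItems.size(); j++) {
                Set<String> pair = new TreeSet<String>(
                        Arrays.asList(frequentItems.get(i), frequentItems.get(j)));
                int count = 0;
                for (Set<String> t : transactions) {
                    if (t.containsAll(pair)) {
                        count++;
                    }
                }
                if (count >= minSupport) {
                    frequent.put(pair, count);
                }
            }
        }
        return frequent;
    }
}

An association rule such as {tires, auto accessories} => {automotive services} would then be reported only if the support count of the combined itemset divided by the support count of {tires, auto accessories} exceeds the chosen minimum confidence.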
A Systematic Literature Review on Fault Prediction Performance in Software Engineering.
Accurate prediction of where faults are likely to occur in code can help direct test effort, reduce costs, and improve the quality of software. This study investigates how the context of models, the independent variables used, and the modelling techniques applied influence the performance of fault prediction models. A systematic literature review is used to identify 208 fault prediction studies published from January 2000 to December 2010, and the quantitative and qualitative results of 36 studies that report sufficient contextual and methodological information, according to the criteria developed and applied, are synthesized.
The models that perform well tend to be based on simple modelling techniques such as Naive Bayes or Logistic Regression. Combinations of independent variables have been used by models that perform well. Feature selection has been applied to these combinations when models are performing particularly well. The methodology used to build models seems to be influential to predictive performance. Although there are a set of fault prediction studies in which confidence is possible, more studies are needed that use a reliable methodology and which report their context, methodology, and performance comprehensively.
Detecting Software design defects using relational association rule mining.
From a machine learning perspective, this work addresses the problem of automatically detecting defective software entities (classes and methods) in existing software systems, a problem of major importance during software maintenance and evolution. In order to improve the internal quality of a software system, identifying faulty entities such as classes, modules, and methods is essential for software developers. As defective software entities are hard to identify, machine learning based classification models continue to be developed to approach the problem of detecting software design defects. The new method is based on relational association rule mining for detecting faulty entities in existing software systems. Relational association rules are a particular type of association rules and describe numerical orderings between attributes that commonly occur over a dataset. The method is based on the discovery of relational association rules for identifying design defects in software. Experiments on open source software are conducted in order to detect defective classes in object-oriented software systems, and a comparison of the approach with similar existing approaches is provided. The obtained results show that the method is effective for software design defect detection and confirm the potential of the proposal.
Nowadays, software systems have become increasingly complex and versatile; in order to keep them simple to maintain and evolve, it is very important to continuously identify and correct software design defects. Thus, accurately predicting whether a software entity contains design defects can help improve the quality of software systems. There is a continuous interest in applying association rule mining in order to discover relevant patterns and rules in large volumes of data. Moreover, applying data mining methods in software engineering is becoming increasingly important, as mining techniques can support several aspects of the software development life cycle, such as software quality. An example is a Subgroup Discovery (SD) algorithm named EDER-SD (Evolutionary Decision Rules for Subgroup Discovery) that is based on evolutionary computation and generates rules describing only fault-prone modules in order to detect defects in software systems.
Association rule mining means searching for attribute-value conditions that occur frequently together in a dataset. Ordinal association rules are a particular type of association rules: given a set of records described by a set of attributes or features, ordinal association rules specify ordinal relationships between record attributes that hold for a certain percentage of the records. However, in real-world datasets, attributes with different domains, and relationships between them other than ordinal, exist. In such situations, ordinal association rules are not powerful enough to describe data regularities. Consequently, relational association rules were introduced in order to capture various kinds of relationships between record attributes. Relational association rule mining is used in solving problems from a variety of domains, such as data cleaning, natural language processing, databases, and health care. This paper exploits the effectiveness of using relational association rules in mining software engineering data and also investigates the benefit of data mining techniques to uncover hidden patterns in software systems architecture.
Starting from the fact that software metrics are essential in measuring software quality, using a metric-based high dimensional representation of the entities from a software system, and based on the idea of discovering relational association rules within a dataset, the paper introduces a novel method for detecting defective software entities. The results obtained by evaluating the proposed technique on open source software systems confirm that applying relational association rule mining for software design defect detection is promising and indicate the potential of the proposal.
Existing System
In the existing system, ordinal association rules are used for predicting software defects. Ordinal association rules are a particular type of association rules: given a set of records described by a set of attributes, ordinal association rules specify ordinal relationships between record attributes that hold for a certain percentage of the records.
Ordinal Association Rules
The objective here is to find ordinal relationships between record attributes that tend to hold over a large percentage of records. If attribute A is less than B most of the time, then a record that contains a B that is less than A may be in error. One flag on B may not mean much, but if a number of such rules that deal with B are broken, the likelihood of error goes up. These considerations lead to a new extension of association rules: ordinal association rules, or simply ordinal rules. The following defines this concept more formally.
Let R = {r1, r2, ..., rn} be a set of records, where each record is a set of k attributes (a1, ..., ak). Each attribute ai in a particular record rj has a value φ(rj, ai) from a domain D; the value of an attribute may also be empty, and the empty value is therefore included in D. The following relations (partial orderings) are defined over D: less than or equal (<=), equal (=) and greater than or equal (>=), all having the standard meaning. Then (a1, a2, ..., am) => (a1 µ1 a2 µ2 a3 ... µm-1 am), where each µi ∈ {<=, =, >=}, is an ordinal association rule if:
a1, ..., am occur together (are non-empty) in at least s% of the n records, where s is the support of the rule; and
the relations µ1, ..., µm-1 hold in at least c% of the records in which a1, ..., am occur together, where c is the confidence of the rule.
The error-detection method based on ordinal association rules consists of two main steps:
Find ordinal rules with a minimum confidence c.
Identify data attributes that break the rules and can be considered potential errors.
Here, the manner in which the support of a rule matters differs from the typical data mining problem. All discovered rules that hold for more than two records are assumed to represent valid possible partial orderings; future work will investigate user-specified minimum support and rules involving multiple attributes. The method first normalizes the data if necessary and then computes comparisons between each pair of attributes for every record. Only one scan of the data set is required, and an array with the results of these comparisons is maintained in memory. Figure 1 contains the algorithm for this step. The complexity of this step is only O(N*M^2), where N is the number of records in the data set and M is the number of fields/attributes; usually M is much smaller than N. The results of this algorithm are written to a temporary file for use in the next step of processing.
In the second step, the ordinal rules are identified based on the chosen minimum confidence. There are several researched methods to determine the strength, including the interestingness and statistical significance, of a rule (minimum support and minimum confidence, chi-square test, etc.). Using confidence intervals to determine the minimum confidence is currently under investigation; however, previous work on the data set used in the experiment showed that the distribution of the data was not normal. Therefore, the minimum confidence was chosen empirically: several values were considered and the algorithm was executed. The results indicated that a minimum confidence between 98.8 and 99.7 provides the best results (the lowest number of false negatives and false positives). The second component extracts from the temporary file, and stores in memory, the data associated with the rules. This is done with a single scan of the comparisons file (complexity O(C(M,2))). Then, for each record in the dataset, each pair of attributes that corresponds to a pattern is checked to see whether the values in those fields respect the relationship indicated by the pattern. If they do not, each field is marked as a possible error; of course, in most cases only one of the two values will actually be an error. Once every pair of fields that corresponds to a rule is analyzed, the average number of possible-error marks per marked field is computed. Only those fields that are marked as possible errors more times than the average are finally marked as containing high-probability errors. Again, the average value was empirically chosen as the threshold to prune the set of possible errors. Other methods to find such a threshold, without using domain knowledge or multiple experiments, are under investigation.
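The following is a minimal Java sketch of the first step described above, under the assumption that the data has already been normalized: a single scan over the records updates, for every pair of attributes, the counts of the three relations <=, = and >= (O(N*M^2) comparisons in total). The class and method names are illustrative, not the original implementation.

/**
 * Single-pass pairwise comparison counts for ordinal association rules.
 * data[n][m] holds the (already normalized) value of attribute m in record n;
 * Double.NaN marks an empty value.
 */
public class OrdinalRuleCounts {

    // counts[i][j][0] = number of records with a_i <= a_j
    // counts[i][j][1] = number of records with a_i == a_j
    // counts[i][j][2] = number of records with a_i >= a_j
    // counts[i][j][3] = number of records where both a_i and a_j are non-empty
    public static long[][][] compare(double[][] data) {
        int m = data[0].length;
        long[][][] counts = new long[m][m][4];
        for (double[] record : data) {                 // one scan of the data set
            for (int i = 0; i < m; i++) {
                for (int j = i + 1; j < m; j++) {
                    double a = record[i], b = record[j];
                    if (Double.isNaN(a) || Double.isNaN(b)) continue;
                    counts[i][j][3]++;
                    if (a <= b) counts[i][j][0]++;
                    if (a == b) counts[i][j][1]++;
                    if (a >= b) counts[i][j][2]++;
                }
            }
        }
        return counts;
    }

    /** Confidence of the rule "a_i rel a_j"; rel = 0 for <=, 1 for =, 2 for >=. */
    public static double confidence(long[][][] counts, int i, int j, int rel) {
        long base = counts[i][j][3];
        return base == 0 ? 0.0 : counts[i][j][rel] / (double) base;
    }
}

Pairs whose confidence exceeds the empirically chosen threshold (between 98.8% and 99.7% in the experiments above) become ordinal rules, and records that violate several retained rules involving the same attribute are then flagged as probable errors.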
Association rule mining proves to be useful in identifying not only interesting patterns for fields such as market basket analysis or census data but also, through the extension to ordinal association rules, patterns that uncover errors in other kinds of data sets. The classical notion of association rules has been extended to include ordinal relationships between pairs of numerical attributes, thus defining ordinal association rules. This extension allows the uncovering of stronger rules that yield potential errors in the data set, while keeping the computation simple and efficient. Ordinal association rules bear some similarity to the above mentioned extensions of Boolean association rules; however, they are better suited to the problem of identifying possible errors in the type of data sets being analyzed, for the following reasons.
They are easier and faster to compute than quantitative association rules or ratio-rules.
Although they are weaker than quantitative association rules or ratio-rules, they give very good results in finding (partial) ordering trends.
Distance-based association rules (over interval data) could also be used for this problem, but it is inherently hard to find the right intervals in the absence of specific domain knowledge, and the methods tend to be rather expensive.
The results of the current experiments are promising, and new ones are in progress to extend the use of ordinal rules to cope with attributes of different types and to find rules that involve more than two attributes.
Limitations
In real world datasets, attributes with different domains and relationships between them, other than ordinal, do actually exist. In such situations, ordinal association rules are not strong enough to describe data regularities.
SYSTEM ANALYSIS
In the existing system, ordinal association rules are used for predicting software defects. However, in real-world datasets, attributes with different domains, and relationships between them other than ordinal, do actually exist. In such situations, ordinal association rules are not powerful enough to describe data regularities. Consequently, relational association rules were introduced in order to be able to capture various kinds of relationships between record attributes. In the proposed system we use a novel classification model for the problem of defect prediction, based on the idea of discovering relational association rules within a dataset.
Relational Association Rules
The training data is split into two datasets: DS+, containing the defective software entities, and DS-, containing the non-defective ones. These datasets are used in the training step of the DPRAR classifier, and a classification model consisting of the discovered relational association rules is built. At classification time, when a new instance (software entity) e has to be classified, the model learned during the training step is used for computing the similarity degrees of the instance e to the positive and negative classes, i.e. to predict whether the query instance is defective or not.
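As a rough illustration only, the sketch below assumes that the similarity of a query entity to a class is the confidence-weighted fraction of that class's discovered rules that the entity verifies, and that the class with the larger similarity degree is predicted; the exact aggregation used by the DPRAR classifier may differ, and all names here are illustrative.

import java.util.List;

/** Illustrative relational rule between two attributes (software metrics) of an entity. */
class RelationalRule {
    final int attrA, attrB;      // attribute indexes in the metric vector
    final char relation;         // '<' for <=, '=' for =, '>' for >=
    final double confidence;     // confidence of the rule in its training dataset

    RelationalRule(int attrA, int attrB, char relation, double confidence) {
        this.attrA = attrA;
        this.attrB = attrB;
        this.relation = relation;
        this.confidence = confidence;
    }

    boolean verifiedBy(double[] entity) {
        double a = entity[attrA], b = entity[attrB];
        switch (relation) {
            case '<': return a <= b;
            case '=': return a == b;
            default:  return a >= b;
        }
    }
}

/** Assumed similarity-based classification; not the paper's exact formula. */
public class DprarClassifierSketch {

    static double similarity(double[] entity, List<RelationalRule> rules) {
        double verified = 0.0, total = 0.0;
        for (RelationalRule r : rules) {
            total += r.confidence;
            if (r.verifiedBy(entity)) verified += r.confidence;
        }
        return total == 0.0 ? 0.0 : verified / total;
    }

    /** Returns true if the entity is predicted to be defective. */
    static boolean predictDefective(double[] entity,
                                    List<RelationalRule> rarPlus,
                                    List<RelationalRule> rarMinus) {
        return similarity(entity, rarPlus) >= similarity(entity, rarMinus);
    }
}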
SYSTEM DESIGN AND IMPLEMENTATION
System design is the process or art of defining the architecture, components, modules, interfaces, and data for a system to satisfy specified requirements. One could see it as the application of system theory to product development. There is some overlap with the disciplines of systems analysis, systems architecture and systems engineering.
Requirement specification plays an important part in the analysis of a system. Only when the requirement specifications are properly given is it possible to design a system that will fit into the required environment. The requirements have to be known during the initial stages so that the system can be designed according to them. It is very difficult to change the system once it has been designed; on the other hand, a system designed without catering to the requirements of the user is of no use.
Requirements
It deals with both the hardware and software requirements for the project. The requirements are given below.
Hardware Requirements
These include the basic hardware specifications needed for the system to run the application.
RAM : 4GB
HARDDISK : 500 GB
Processor : Intel core-i3
Software Requirements
These include the software essential for running the project, including the operating system, programming language, etc. For this project we require the following software.
O.S : windows 7
Language : Java(jdk 1.7.0)
SYSTEM DESIGN USING UML DIAGRAMS
The Unified Modeling Language (UML) is a standard language for specifying, visualizing, constructing and documenting the artifacts of software systems, as well as for business modeling and other non-software systems. The UML represents a collection of best engineering practices that have proven successful in the modeling of large and complex systems. The UML is a very important part of developing object-oriented software and the software development process. The UML uses mostly graphical notations to express the design of software projects. In software there are several ways to approach a model; the two most common are the algorithmic perspective and the object-oriented perspective. The contemporary view of software development takes an object-oriented perspective: in this approach, the main building block of all software is the object or class. Every object has identity (we can name it or otherwise distinguish it from other objects), state (there is generally some data associated with it), and behavior (we can do things to an object, and it can do things for other objects as well).
The object-oriented approach to software development is decidedly part of the mainstream simply because it has proven to be of value in building systems in all sorts of problem domains, encompassing all degrees of size and complexity. Visualizing, specifying, constructing and documenting object-oriented systems is exactly the purpose of the Unified Modeling Language.
Class Diagram
A class diagram shows the classes of the system with their attributes and operations and the relationships between them.
Fig.4.1: Class Diagram
Explanation
The class diagram of the defect prediction system contains two classes: the user and the evaluator. The user has attributes like name and performs operations like loading the dataset, selecting the pre-processing technique, applying the DPRAR algorithm, selecting the measures to calculate, comparing results, selecting a chart to plot, and viewing details of a new entity. The other class, evaluator, has attributes like version and name. It performs operations like pre-processing the dataset, displaying pre-processing results, classifying data, displaying classification results, calculating measures, plotting the chart, and displaying the new entity details.
Use-Case Diagram
A use-case describes a sequence of actions that provides something of measurable value to an actor and is drawn as a horizontal ellipse.
Fig.4.2: Use-case Diagram
Explanation
The use-case diagram of the defect prediction system contains two actors: the user and the evaluator. The defect prediction system contains use-cases like loading the dataset, selecting the pre-processing technique, selecting the DPRAR classifier, selecting the measures to calculate, and plotting the chart; these use-cases are associated with the user. The remaining use-cases, like performing pre-processing, classifying data, displaying the classification results, calculating measures, and classifying the new entity, are associated with the evaluator.
Activity Diagram
An activity diagram represents the business and operational workflows of a system. It is a dynamic diagram that shows the activities and the events that cause an object to be in a particular state. It describes the workflow behavior of a system.
Fig.4.3: Activity Diagram
Explanation
The activity diagram of the defect prediction system contains two swim lanes: the user and the evaluator. First the user loads the dataset and selects the pre-processing technique, then the evaluator pre-processes the data; next the user selects the measures to calculate and the evaluator calculates the measures. Finally the user selects the chart button, and the evaluator plots the graph and displays whether the entity is defective or not.
Sequence Diagram
A sequence diagram in the Unified Modeling Language (UML) is a kind of interaction diagram that shows how processes operate with one another and in what order. Sequence diagrams are sometimes called event diagrams, event scenarios or timing diagrams.
Fig.4.4: Sequence Diagram
A sequence diagram shows, as parallel vertical lines (lifelines), the different processes or objects that live simultaneously and, as horizontal arrows, the messages exchanged between them in the order in which they occur. This allows the specification of simple runtime scenarios in a graphical manner.
Explanation
The sequence diagram of the defect prediction system contains one actor, the user, and one object, the evaluator. First the user loads the dataset and selects the pre-processing technique, and the evaluator performs the classification. Then the user selects the measures to calculate and the evaluator calculates the measures. Finally the user selects the chart button, and the evaluator plots the graph and displays whether the entity is defective or not.
Project Implementation Details
For classifying a software entity as being or not defective, the following steps will be performed:
Data pre-processing
Training/building the DPRAR
Testing/classification
Data Pre-Processing
During this step, the training data are scaled to [0,1] and a statistical analysis is carried out on the training datasets DS+ and DS- in order to find a subset of features that are correlated with the target output. The statistical analysis on the features is performed in order to reduce the dimensionality of the input data by eliminating features that do not significantly influence the output value. To determine the dependencies between features and the target output, Spearman's rank correlation coefficient is used. A Spearman correlation of 0 between two variables X and Y indicates that there is no tendency for Y to either increase or decrease when X increases; a Spearman correlation of 1 or -1 results when the two variables are monotonically related, even if their relationship is not linear. At the statistical analysis step we remove from the feature set those features that have no significant influence on the target output, i.e. are only slightly correlated with it. In order to decide which feature(s) to remove, we reason as follows. For each feature (software metric) smi in SM we compute the Spearman correlation cor(smi, target) between the feature and the target output (defect or non-defect). Let m denote the average value and stdev the standard deviation of the correlations between all features and the target output. We consider that a feature smi is only slightly correlated with the target classification output, and will be removed from the feature set, if the absolute value of the correlation is less than m - stdev, i.e. abs(cor(smi, target)) < m - stdev. The dataset pre-processed in this way can then be used for building the relational association rule based classification model.
Fig.4.5 Correlations for the CM1 dataset
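A minimal Java sketch of this statistical filtering is given below, under the assumptions that the data has already been scaled to [0,1], that Spearman's coefficient is computed as the Pearson correlation of tie-averaged ranks, and that a feature is kept when abs(cor(smi, target)) >= m - stdev; the class and method names are illustrative.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SpearmanFeatureSelection {

    /** 1-based ranks of the values in x; tied values share the average of their positions. */
    static double[] ranks(final double[] x) {
        int n = x.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        Arrays.sort(order, new Comparator<Integer>() {
            public int compare(Integer a, Integer b) { return Double.compare(x[a], x[b]); }
        });
        double[] r = new double[n];
        int i = 0;
        while (i < n) {
            int j = i;
            while (j + 1 < n && x[order[j + 1]] == x[order[i]]) j++;
            double averageRank = (i + j) / 2.0 + 1.0;   // mean rank of the tie group
            for (int k = i; k <= j; k++) r[order[k]] = averageRank;
            i = j + 1;
        }
        return r;
    }

    /** Spearman's rho = Pearson correlation of the two rank vectors. */
    static double spearman(double[] a, double[] b) {
        double[] ra = ranks(a), rb = ranks(b);
        double meanA = 0, meanB = 0;
        for (int i = 0; i < ra.length; i++) { meanA += ra[i]; meanB += rb[i]; }
        meanA /= ra.length;
        meanB /= rb.length;
        double cov = 0, varA = 0, varB = 0;
        for (int i = 0; i < ra.length; i++) {
            cov  += (ra[i] - meanA) * (rb[i] - meanB);
            varA += (ra[i] - meanA) * (ra[i] - meanA);
            varB += (rb[i] - meanB) * (rb[i] - meanB);
        }
        return cov / Math.sqrt(varA * varB);
    }

    /** Indexes of the features kept: abs(cor(smi, target)) >= m - stdev. */
    static List<Integer> selectFeatures(double[][] featureColumns, double[] target) {
        int m = featureColumns.length;                  // featureColumns[i] = values of metric i
        double[] cor = new double[m];
        double mean = 0;
        for (int i = 0; i < m; i++) { cor[i] = spearman(featureColumns[i], target); mean += cor[i]; }
        mean /= m;
        double var = 0;
        for (int i = 0; i < m; i++) var += (cor[i] - mean) * (cor[i] - mean);
        double stdev = Math.sqrt(var / m);
        List<Integer> kept = new ArrayList<Integer>();
        for (int i = 0; i < m; i++) {
            if (Math.abs(cor[i]) >= mean - stdev) kept.add(i);
        }
        return kept;
    }
}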
Training
In training, we define a set of relations between the feature values that will be used in the relational association rule mining process. More exactly, we are focusing on identifying relations between two software metrics (features), relations that would be relevant for deciding if a software entity is or not defective, and consequently would be useful in the mining process. After the relations were defined, the interesting relational association rules are discovered in the training datasets. More exactly, the training consists of the following steps:
Determine from DS+, using the DRAR algorithm, the set RAR+ of relational association rules having a minimum support and confidence.
Determine from DS-, using the DRAR algorithm, the set RAR- of relational association rules having a minimum support and confidence.
For each rule r from the sets RAR+ and RAR- determined as indicated above, the support (denoted by supp(r)) and the confidence (denoted by conf(r)) of the rule are computed. We denote in the following by ratio(r) the value obtained by dividing the confidence of the rule by its support, i.e.
ratio(r) = conf(r) / supp(r).
Classification
Fig. 4.6: Confusion matrix and performance metrics for discrete classifiers.
The performance of the classifier is assessed using the confusion matrix in Fig. 4.6. The overall accuracy of the classifier (denoted by Acc) computes the proportion of instances that are correctly classified,
i.e., Acc = (TP+TN)/(TP+TN+FP+FN).
The probability of detection (denoted by Pd), or the recall/sensitivity of the classifier computes the proportion of actual positives which are predicted positive,
i.e., Pd = TP/( TP+FN) .
The specificity of the classifier (denoted by Spec) computes the proportion of actual negatives which are predicted negative,
i.e., Spec =TN/(TN+FP) .
The classification precision (denoted by Prec) computes the proportion of predicted positives which are actual positive,
i.e., Prec =TP/(TP+FP) .
The area under the ROC curve (AUC) is indicated as one of the best evaluation measures for comparing different classifiers, and it is recommended as the primary accuracy indicator for comparative studies in software defect prediction. The ROC (Receiver Operating Characteristic) curve is a two-dimensional plot of sensitivity vs. (1 - specificity). ROC curves are usually constructed for classifiers which, instead of directly returning the class of an instance, return a score that is transformed into a label using a threshold. In such cases, a different (sensitivity, 1 - specificity) pair is obtained for each threshold, and these pairs are represented on the ROC curve.
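The following small Java sketch computes the performance measures defined above directly from the confusion-matrix counts; the class name is an illustrative choice.

/** Performance measures for a discrete (binary) defect classifier. */
public class ClassifierMetrics {

    /** Acc = (TP+TN)/(TP+TN+FP+FN): proportion of correctly classified instances. */
    public static double accuracy(int tp, int tn, int fp, int fn) {
        return (tp + tn) / (double) (tp + tn + fp + fn);
    }

    /** Pd = TP/(TP+FN): probability of detection (recall / sensitivity). */
    public static double probabilityOfDetection(int tp, int fn) {
        return tp / (double) (tp + fn);
    }

    /** Spec = TN/(TN+FP): proportion of actual negatives predicted negative. */
    public static double specificity(int tn, int fp) {
        return tn / (double) (tn + fp);
    }

    /** Prec = TP/(TP+FP): proportion of predicted positives that are actual positives. */
    public static double precision(int tp, int fp) {
        return tp / (double) (tp + fp);
    }
}

Each decision threshold of a scoring classifier yields one (1 - specificity, Pd) point; sweeping the threshold traces the ROC curve, and the area under that curve is the AUC used for comparison.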
TESTING
Testing is a process of executing a program with the intention of finding errors. A good test case is one that has a high probability of finding an undiscovered error. The overall objective is to design tests that systematically uncover different classes of errors with a minimum amount of time and effort. Software testing is an investigation conducted to provide stakeholders with information about the quality of the product or service under test. Software testing can also provide an objective, independent view of the software to allow the business to appreciate and understand the risks of software implementation.
Software Testing
Software testing is a critical element of software quality assurance and represents the ultimate review of specification, design and code. Testing represents an interesting anomaly for software: during the earlier definition and development phases, the attempt is to build the software from an abstract concept into a tangible implementation, whereas the aim of the testing process is to identify defects in the product. It is not possible to guarantee that the software is error free, because the input data domain of most software projects is very large. We can safely conclude that testing provides a way of reducing defects in a system and increasing the user's confidence in the developed system.
Testing a program consists of subjecting the program to set of test inputs and observing if the program behaves as expected. If the program fails to behave as expected, then the conditions under which failure occurs are noted for later debugging and correction. The following are some commonly used terms associated with testing.
Testing Objective
Testing is a process of executing a program with the intent of finding an error.
A good test case is one that has a high probability of finding an as yet undiscovered error.
A successful test is one that uncovers an undiscovered error.
Testing Principles
All tests should be traceable to end user requirements.
Tests should be planned long before testing begins.
Testing should begin on a small scale and progress towards testing in the large.
Testing Strategies
A strategy for software testing integrates software test cases into a series of well planned steps that result in the successful construction of software. Software testing is part of a broader topic often referred to as Verification and Validation. Verification refers to the set of activities that ensure that the software correctly implements a specific function; validation refers to the set of activities that ensure that the software that has been built is traceable to customer requirements.
Levels of Testing
Functional testing is centered on the following items:
Valid Input : identified classes of valid input must be accepted.
Invalid Input : identified classes of invalid input must be rejected.
Functions : identified functions must be exercised.
Output : identified classes of application outputs must be exercised.
Systems/Procedures: interfacing systems or procedures must be invoked.
Organization and preparation of functional tests is focused on requirements, key functions, and special test cases. In addition, systematic coverage pertaining to identified business process flows, data fields, predefined processes, and successive processes must be considered for testing. Before functional testing is complete, additional tests are identified and the effective value of current tests is determined.
Test Cases and Test Results
Table 5.1: Test Case – Uploading File.
S. No | Test Case | Expected output | Actual output | Result
1 | When a valid file is uploaded | File uploaded | File uploaded | Pass
2 | When an invalid file is uploaded | File not uploaded | File not uploaded | Pass
Table 5.2: Test Case – Pre-processing Data.
S. No | Test Case | Expected output | Actual output | Result
1 | When the pre-processing technique is performed correctly | Shows pre-processed data | Shows pre-processed data | Pass
2 | When the pre-processing technique is not performed correctly | Shows wrong data | Shows wrong data | Pass
Table 5.3: Test Case – Classifying Data.
S. No | Test Case | Expected output | Actual output | Result
1 | When a valid dataset is uploaded for classification | File uploaded | File uploaded | Pass
2 | When an invalid dataset is uploaded for classification | Upload valid file | Upload valid file | Pass
Table 5.4: Test Case – Testing Data.
S. No | Test Case | Expected output | Actual output | Result
1 | When valid input data is given for testing | Shows actual results | Shows actual results | Pass
2 | When invalid input data is given for testing | Shows wrong results | Shows wrong results | Pass
RESULTS
The software dataset is uploaded, and using its metrics and association rules a classifier is built. By applying that classifier to new input, we can predict whether the given input software metrics correspond to a defective entity or not.
Positive or true: the software entity has defects.
Negative or false: the software entity has no defects.
First screen
Fig. 6.1: Run the command in the command prompt.
After running the command in the command prompt, the main screen of the project is displayed. That screen contains four modules: Upload Dataset, Preprocessing and Training, Classification (load test file) and Testing. First we select Upload Dataset.
Fig. 6.2: Main screen for software defect prediction using relational association rules.
First we click on the 'Upload Dataset' button; after clicking this button we load one of the datasets available in the project.
Fig. 6.3: Loading the required dataset
Through this page the user can upload files by clicking on the Upload Dataset button and selecting any .csv or .txt file as input to the system. After loading the dataset, a 'file loaded' message is displayed.
Fig. 6.4: After loading the dataset
After the file upload, click on the 'Preprocess & Training' button to clean the dataset using the Spearman correlation and to build the training data using association rules. In the preprocessing module, feature selection is performed (the features to remove are determined from the Spearman correlation results). Preprocessing also reports the dataset size and the positive and negative features removed through the Spearman correlation analysis, which are shown in the data pre-processing module.
In the training module, the positive features and negative features remaining after training are calculated. After preprocessing we select the classification module, in which the input file is given.
Fig.6.5: Selecting Preprocess and training.
After training, all information related to the dataset is displayed.
This page displays the output after clicking the pre-process button. The application then asks you to load an input file; this input file contains software metrics without the true or false label, which the application determines automatically after classification. If you want, you can copy one record from the dataset, paste it into the input file and run the classification to predict its label; do not forget to remove the true or false value from the end of the input record. The application then compares the given input with the classification output and reports whether the software entity is defective or not.
Fig. 6.6: Give the input value from the dataset.
Fig.6.7: Display the classification results.
Now click on the Testing button; it automatically calculates the accuracy, probability of detection, specificity and classification precision for whichever dataset was selected, such as CM1, KC1 or MC1. The accuracies are also compared, and from these results it can be judged whether the classifier's performance is better or not.
Fig.6.8: Generate table based on given dataset.
Now click on the Chart button; a chart is then generated based on the values and properties of the loaded dataset. The chart contains several different values, and it is generated automatically from whichever dataset has been loaded.
Fig.6.9: Generate chart based on dataset values.
This is the required chart for our dataset. In this graph, two datasets are compared based on the generated values: accuracy, probability of detection, specificity and precision.
CONCLUSIONS AND FUTURE WORK
We have presented a classification model based on relational association rule discovery for detecting, in software systems, the software entities that are likely to be defective. Experiments were conducted in order to detect defective software modules, and the obtained results have shown that our classifier (DPRAR) is better than, or comparable to, the classifiers already applied for software defect detection, indicating the potential of our proposal.
Further work in the relational association rules discovery will be made in order to identify and consider different types of relations between the software metrics, relations that may be relevant in the mining process. We will also investigate how the length of the rules and the confidence of the relational association rules discovered in the training data may influence the accuracy of the classification task. Directions to hybridize our classification model, by combining it with other machine learning based predictive models will be considered too. We also plan to extend our model considering fuzzy relational association rules and investigating their usefulness in software defect detection.
REFERENCES
[1] G. Czibula, Z. Marian, I.G. Czibula, "Software defect prediction using relational association rule mining," Information Sciences, 2014.
[2] M. Baojun, K. Dejaeger, J. Vanthienen, B. Baesens, "Software defect prediction based on association rule classification," Open Access publications from Katholieke Universiteit Leuven, February 2011.
[3] E. Baralis, L. Cagliero, T. Cerquitelli, P. Garza, "Generalized association rule mining with constraints," Information Sciences, vol. 194, pp. 68–84, 2012.
[4] G.D. Boetticher, "Advances in Machine Learning Applications in Software Engineering" (Ch. Improving the Credibility of Machine Learner Models in Software Engineering), IGI Global, 2007.
[5] L.C. Briand, W.L. Melo, J. Wust, "Assessing the applicability of fault-proneness models across object-oriented software projects," IEEE Transactions on Software Engineering, vol. 28, no. 7, pp. 706–720, 2002.
[6] A. Campan, G. Serban, T.M. Truta, A. Marcus, "An algorithm for the discovery of arbitrary length ordinal association rules," in Proceedings of DMIN, pp. 107–113, 2006.
[7] V.U.B. Challagulla, F.B. Bastani, I.-L. Yen, R.A. Paul, "Empirical assessment of machine learning based software defect prediction techniques," in Proceedings of the 10th IEEE International Workshop on Object-Oriented Real-Time Dependable Systems (WORDS '05), IEEE Computer Society, Washington, DC, USA, pp. 263–270, 2005.
[8] R. Hua Chang, X. Mu, L. Zhang, "Software defect prediction using non-negative matrix factorization," Journal of Software, vol. 6, no. 11, pp. 2114–2120, 2011.
[9] S.R. Chidamber, C.F. Kemerer, "Towards a metrics suite for object-oriented design," in Proceedings of the Conference on Object-Oriented Programming Systems, Languages, and Applications, pp. 197–211, 1991.
[10] M. D'Ambros, M. Lanza, R. Robbes, "Evaluating defect prediction approaches: a benchmark and an extensive comparison," Empirical Software Engineering, pp. 1–47, 2011.
[11] T. Fawcett, "An introduction to ROC analysis," Pattern Recognition Letters, vol. 27, no. 8, pp. 861–874, 2006.