Software Engineering Research Report
Introduction
“Software Engineering” is the application of engineering to all aspects of software production. The term was first used officially in the 1968 report of the world’s first conference on Software Engineering, held in Garmisch, Germany. The conference, sponsored by the NATO Science Committee, was meant to cover all aspects of software, from its conception all the way to distribution and service, with the final goal of devising a set of best practices for software production based on engineering. This new set of norms was meant to close the “Software Gap”, that is, the void between what was hoped for from a complex software system and what was typically achieved [1]. This gap, caused by the inferior quality of the software, not only led to technical problems such as system failures and performance issues, but also prevented projects from being delivered to clients on time and on budget. Moreover, the ever-increasing presence of software in embedded and (in more recent times) non-embedded systems made one question a top priority: how to build good software?
Ever since then the question has remained the same. Software has become ubiquitous (it’s in our hands, in our glasses, in a space station miles above our heads), and once it was known how to make good software, we moved on to asking how to make it better: more robust, reliable, scalable, and time and cost effective. The latest answers to these questions are collected and internationally published in the Guide to the Software Engineering Body of Knowledge (SWEBOK), the latest edition of which dates back to 2014.
Research in software engineering has been trying to give new, optimised answers to these questions, in some cases by applying modern techniques like Machine Learning or Artificial Intelligence. Here are four current areas of interest in Software Engineering research, each with a brief discussion:
0. A few words on automation
Automation has been an ever-present theme in most of the research done in the last decade. In the search for a better, more efficient way of producing software, automating some of the sub-tasks has been a very popular answer and has unsurprisingly been applied to most Software Engineering fields, from requirements elicitation to the development process. One reason for this is the overall inefficiency of human work, both time-wise and error-wise: manually carrying out a task not only generally takes more time than an automated process specifically designed for that purpose would, but it also implies either a greater workforce or taking time away from some other task. This makes any kind of automation, as long as it is valid, a very significant step forward for Software Engineering.
1. Automation in Automated Testing
1.1 Contextualisation and Importance
Automated testing has been a central theme in Software Engineering over the last decade. Software testing, even if sometimes neglected by the most inexperienced developers, is crucial to the successful deployment of any piece of software. First, it is responsible for finding bugs in the code which could lead to system failures, failures that according to early studies by the NIST (National Institute of Standards and Technology) cost the U.S. economy, an economy largely based on software, from $22 to $59 billion a year [2]. Second, it is responsible for ensuring that all the stakeholders’ requirements are met at the time of delivery, a process specifically called “acceptance testing”. Testing is usually carried out either by the software developers themselves or by a dedicated team of testers, and some types of software testing can be very repetitive and tedious to do manually. This makes testing a perfect fit for automation.
One application of automation in testing is GUI testing, a process in which user interaction events are replicated and the system’s responses evaluated to make sure that the visible behaviour of the software is correct. A very recent application of this kind of testing, which is more data and keyword driven, is on websites, through the use of headless browsers such as PhantomJS or tools like Selenium WebDriver. Both make use of the DOM tree structure of web pages to locate precise UI elements and fire events on them. Headless browsers such as PhantomJS are not themselves testing frameworks, but testing frameworks are often built on top of them. In the case of PhantomJS, related projects like CasperJS, Lotte or WebSpecter serve the purpose of testing [3]. Writing test automation code can in turn be very repetitive and tedious, due to the number of conditions that need to be checked and the poor maintainability of tests. Below is some interesting research on attempts to automate test automation itself:
1.2 Current Research
Interesting research has been done on generating UAT (User Acceptance Testing) test automation code by applying natural language processing techniques, relying on the assumption that test cases are imperative in nature [4]. Currently, test automation experts manually convert UAT test cases into functioning testing code using automation tools like the Robot Selenium Framework. The goal of the research is to automate this process by creating a mapping between test cases written in English in a specified spreadsheet format and the test automation code run by the Selenium framework. This is a step forward from what frameworks like Cucumber [6] already do, since Cucumber requires user stories to be written in a language called Gherkin and mapped to the code by hand. Of course, test cases written in plain English must be tagged in order to be understood and potentially mapped to their corresponding action, and this is done through a learning approach that uses machine learning techniques (statistical taggers) trained on a pre-tagged “test cases corpus”. The second part consists of creating a dictionary of mappings from actions (i.e. verbs) to keywords taken from the Selenium framework being used, each linked to the generated code that uses that keyword. Once this is done, the system should be ready to map natural language automatically to testing code and capable of automating the “time-consuming and cumbersome manual testing process”, significantly increasing efficiency and minimising error. Similar research has been done by IBM Research India [5], in which the scope of automating test automation has been extended to all types of testing.
The mapping method used in [5] does not only make use of machine learning but also uses a backtracking-based search to resolve ambiguities in the tester’s instructions, and it produces “a sequence of automatically interpretable action-target-data (ATD) tuples, where each tuple is a triple (a, t, d) consisting of an action, a target user-interface (UI) element, and a data value”. The ambiguities are due to the inescapable limitations of the tester’s natural language.
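The two ideas above, ATD tuples and a dictionary from verbs to framework keywords, can be illustrated with a minimal Python sketch. It is not the method of [4] or [5]: hand-written regular expressions stand in for the statistical tagger and the backtracking search, the `ATD` type, the patterns and the `VERB_TO_KEYWORD` dictionary are hypothetical, and the keyword names are assumed to be those of the Robot Framework SeleniumLibrary. Resolving a target phrase into a real UI locator is left out.

```python
import re
from typing import List, NamedTuple, Optional


class ATD(NamedTuple):
    """One action-target-data triple extracted from an imperative test step."""
    action: str
    target: Optional[str]
    data: Optional[str]


# Hypothetical verb-to-keyword dictionary (keyword names are an assumption).
VERB_TO_KEYWORD = {
    "enter": "Input Text",
    "type": "Input Text",
    "click": "Click Element",
    "press": "Click Element",
    "open": "Go To",
}

# Hand-written patterns standing in for a trained tagger (illustration only).
PATTERNS = [
    # "enter <data> in/into [the] <target>"
    re.compile(r"^(?P<action>enter|type)\s+(?P<data>.+?)\s+(?:in|into)\s+"
               r"(?:the\s+)?(?P<target>.+)$", re.IGNORECASE),
    # "click/press [on] [the] <target>"
    re.compile(r"^(?P<action>click|press)\s+(?:on\s+)?(?:the\s+)?"
               r"(?P<target>.+)$", re.IGNORECASE),
    # "open <data>"
    re.compile(r"^(?P<action>open)\s+(?P<data>.+)$", re.IGNORECASE),
]


def parse_step(step: str) -> ATD:
    """Turn one English test step into an ATD tuple, or fail loudly."""
    text = step.strip().rstrip(".")
    for pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            groups = match.groupdict()
            return ATD(groups["action"].lower(),
                       groups.get("target"),
                       groups.get("data"))
    raise ValueError(f"cannot interpret step: {step!r}")


def to_keyword_row(atd: ATD) -> List[str]:
    """Map an ATD tuple to a keyword followed by its arguments."""
    row = [VERB_TO_KEYWORD[atd.action]]
    if atd.target is not None:
        row.append(atd.target)
    if atd.data is not None:
        row.append(atd.data)
    return row


if __name__ == "__main__":
    for step in ["Open http://example.com",
                 "Enter alice in the username field.",
                 "Click on the Login button"]:
        print(to_keyword_row(parse_step(step)))
```

A step that matches no pattern raises an error instead of guessing, which is exactly the kind of case where the approaches described above rely on learning or on backtracking over alternative interpretations.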
1.3 Contributions and Future Developments
Key researchers in this field are [4] Arvind Madhavan from Iowa State University, and [5] Saurabh Sinha, Suresh Thummalapenta (now at Microsoft Research), Nimit Singhania and Satish Chandra of the IBM T. J. Research Centre, India. A key conference for test automation is GTAC (the Google Test Automation Conference), held annually.
There is a growing trend in software engineering research that deals with automating the transition between natural language and its structured implementation. Some research is being done on extending automated program synthesis from testing to other SE artefacts, such as models derived from natural language descriptions. Closer to testing, there already are systems like CoTester that let you use an English-like test-scripting language, called ClearScript, to write tests. Nevertheless, in order to do that, the language must be artificially restricted, the tests must be manually segmented, and there is no guarantee against ambiguity. Although [5] addresses the ambiguity issue through backtracking, there is still a long way to go to create a tool that can correctly convert any natural language test instruction into its equivalent test automation code.
2. Model Driven Engineering
2.1 Contextualisation and Importance
2.2 Current Research
2.3 Contributions and Future Developments
3. Higher Order Mutation Testing
3.1 Contextualisation and Importance
The idea behind mutation testing, which dates back to the 1970s, is to mutate the code covered by tests in a confined way and check whether or not the existing test suite detects and rejects the change. If this doesn’t happen, it means either that the piece of code changed by the mutation was never executed, in which case we have identified a piece of dead code to be removed, or that the current tests are not comprehensive enough and leave one or more aspects of the software untested [7]. Mutations are typically representative of common programmer mistakes or edge cases. Every piece of code generated by a mutation is called a mutant, and the detection of a mutant by a well-formed test set is called killing the mutant. Were the test set in question perfect, no mutant would survive and the effectiveness of the test suite would be validated.
A metaphor that properly conveys the importance of this type of testing can be found in Juvenal’s famous dilemma “Who will guard the guardians?”, which translated into software engineering jargon means that in order for software to be validated by testing, the testing itself should be valid first. To this we must add the increasing importance of testing in the development process, both time-wise and cost-wise: early studies suggest that testing can take up to 50% of the software development budget [8], and billions of dollars are wasted in the modern industry due to the inadequacy of testing [9].
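As a minimal illustration of a mutant being killed or surviving, consider the Python sketch below. The function `is_adult`, its mutant and the two test suites are hypothetical examples made up for this purpose; the mutation replaces `>=` with `>`, a typical off-by-one mistake.

```python
from typing import Callable, List, Tuple

# A test case is an input paired with the output the tester expects.
TestCase = Tuple[int, bool]


def is_adult(age: int) -> bool:
    """Original program under test (hypothetical example)."""
    return age >= 18


def mutant_is_adult(age: int) -> bool:
    """First order mutant: the operator >= has been replaced by >."""
    return age > 18


def kills(tests: List[TestCase], program: Callable[[int], bool]) -> bool:
    """A suite kills a program version if at least one test fails on it."""
    return any(program(value) != expected for value, expected in tests)


# This suite never exercises the boundary value 18, so the mutant survives.
weak_suite: List[TestCase] = [(30, True), (5, False)]

# Adding the boundary case exposes the mutated operator.
strong_suite: List[TestCase] = weak_suite + [(18, True)]

if __name__ == "__main__":
    print("weak suite kills mutant:", kills(weak_suite, mutant_is_adult))
    print("strong suite kills mutant:", kills(strong_suite, mutant_is_adult))
```

The surviving mutant points at a missing test (the boundary value), which is the feedback mutation testing is meant to provide about the test suite rather than about the program.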
3.2 Current Research
Current research on this theme focuses on Higher Order Mutation Testing. While First Order Mutants (FOMs) are generated by a single mutation to the original program, and therefore represent only simple errors, Higher Order Mutants (HOMs) are generated by combining multiple first order mutations in order to create more complex faults. HOMs were neglected at first because the number of possible combinations, and therefore the cost of generating them, grows exponentially.
Recent empirical research [10] suggests that almost 90% [20] of faults at release can be regarded as complex faults, the kind that only HOMs can replicate and help prevent. Moreover, Yue Jia in his PhD dissertation [11] points out how applying HOMs could actually solve some of the current issues with mutation testing. First, since most HOMs are equivalent to their FOM components (meaning that the FOMs can be replaced by a HOM without loss of test effectiveness), the number of mutants to be tested could be considerably reduced, effectively lowering computational costs. Second, trivial mutations are also significantly reduced by behaviour like “fault-shifting”, in which compositions of FOMs result in HOMs with completely new, non-equivalent, faulty behaviours. Third, reducing the number of mutants to be tested also reduces the time spent comparing the original program’s output for each test case with the expected result (the human oracle problem). However, the HOMs considered here are only a subset of all possible HOMs, whose number is exponentially larger than the number of FOMs from which they are generated. To identify this subset, a meta-heuristic search algorithm is proposed (greedy and hill-climbing are two of the suggested approaches), whose fitness function is defined as “the ratio of the fragility of this higher order mutant to the fragility of its constituent first order mutants”, where fragility measures how easily a mutant is killed on a scale from 0 to 1, 0 being impossible to kill. Therefore, as fitness decreases from 1 towards 0, the higher order mutant becomes gradually stronger than its constituent first order mutants. These stronger HOMs are called subsuming higher order mutants because they can absorb most of the test effectiveness of their components. Other algorithms for identifying subsuming HOMs are described in [18].
Other research [19] suggests the use of a genetic-algorithm-based technique to “aid the automatic generation of test inputs for killing higher-order mutants” more efficiently than random test generation techniques. This potentially allows stronger HOMs and test data to be co-evolved.
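The fitness ratio quoted above can be made concrete with a small Python sketch. It rests on assumptions that are not spelled out in this report: fragility is taken to be the fraction of test inputs that kill a mutant, the fragility of the constituent FOMs is computed over the union of their kill sets, and the original program serves as the oracle. The function `original`, the mutants and `TEST_INPUTS` are hypothetical.

```python
from typing import Callable, FrozenSet, Sequence

Program = Callable[[int], bool]


def original(x: int) -> bool:
    """Original program (hypothetical): is x in the range [0, 10)?"""
    return x >= 0 and x < 10


def fom_and_to_or(x: int) -> bool:
    """FOM 1: logical 'and' replaced by 'or'."""
    return x >= 0 or x < 10


def fom_flip_first(x: int) -> bool:
    """FOM 2: 'x >= 0' replaced by 'x < 0'."""
    return x < 0 and x < 10


def hom(x: int) -> bool:
    """HOM: both first order mutations applied together."""
    return x < 0 or x < 10


TEST_INPUTS = [-5, 0, 5, 10, 15]


def kill_set(mutant: Program, inputs: Sequence[int]) -> FrozenSet[int]:
    """Inputs on which the mutant's output differs from the original's."""
    return frozenset(x for x in inputs if mutant(x) != original(x))


def fragility(killing: FrozenSet[int], inputs: Sequence[int]) -> float:
    """Assumed definition: fraction of test inputs that kill the mutant."""
    return len(killing) / len(inputs)


def fitness(higher: Program, foms: Sequence[Program],
            inputs: Sequence[int]) -> float:
    """Fragility of the HOM divided by the fragility of its FOMs."""
    fom_kills = frozenset().union(*(kill_set(f, inputs) for f in foms))
    fom_fragility = fragility(fom_kills, inputs)
    if fom_fragility == 0:
        raise ValueError("no test kills any constituent FOM")
    return fragility(kill_set(higher, inputs), inputs) / fom_fragility


if __name__ == "__main__":
    print(fitness(hom, [fom_and_to_or, fom_flip_first], TEST_INPUTS))
```

In this example every test input kills at least one of the two FOMs, but only the input -5 kills the HOM, so the ratio is well below 1: the two mutations partially mask each other and the combined fault is harder to detect than either of its parts.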
3.3 Contributions and Future Developments
Key researchers in this field are [11] Yue Jia, Mark Harman and William B. Langdon of University College London, [19] Ahmed S. Ghiduk of Taif University, Saudi Arabia, and [18] Quang Vu Nguyen and Lech Madeyski of the Wroclaw University of Technology, Poland. A key conference for mutation testing is the IEEE ICST (International Conference on Software Testing, Verification and Validation), held annually; a key journal that deals with mutation testing related themes is the International Journal of Computer Science Issues (IJCSI); and a key publication on higher order mutation testing is A Manifesto for Higher Order Mutation Testing, by Yue Jia, Mark Harman and William B. Langdon.
Further research is being done on how to co-evolve higher order mutants and test data, and on the application of genetic programming to the generation of interesting HOMs. Another topic of interest for future research is mitigating, through some classification strategy, the effects of equivalent mutants, which are mutants that behave in exactly the same way as the original program.
4. Modern Techniques in Requirements Engineering
4.1 Contextualisation and Importance
Requirements Engineering (RE) is generally considered the most important part of the Software Development Life Cycle. In this phase the needs of stakeholders or potential users are established and formalised. This is a crucial step because it represents the very foundation on which the whole software is going to be built: correction of shortcomings during requirements has been reported to account for up to 75% of all error removal costs [12]. If the gathered requirements are judged complete and consistent after analysis, then the next step is requirements specification; otherwise the process goes back to elicitation to gather further information. Analysis and elicitation are essential to RE and strongly determine the quality of the finished requirements on which the software is going to be built, and therefore need extreme attention and precision. Finding a way to reduce error, increase efficiency and consequently increase quality in this stage of development can result in a significant gain in both time and cost efficiency: the first because of the long and tiring nature of requirements analysis, the second because of potentially better error prevention.
4.2 Current Research
Some of the latest research has looked at the application of modern intelligent techniques, like Machine Learning (ML) or Bayesian Networks (BN), to the quality assessment of requirements. Parra et al. [13] propose an automated assessment of requirements quality that aims to emulate the assessment of a quality expert, using learning based on the standard features that an expert takes into consideration when manually assessing the quality of requirements. The training set is a list of requirements previously classified (labelled) by a domain expert, and three main clusters of metrics (the features used in the learning process) are introduced: correctness, consistency and completeness. Correctness is the one considered in this research. A set of low-level quality metrics is taken from [14]; these metrics are then used together with the assessed corpus so that it is possible to associate “the set of metric that represent each requirement and the quality value provided by the expert”. Two ways of creating training instances are proposed: in the first, each instance corresponds to the set of metric values of a single requirement and its assigned quality; in the second, the metrics of two requirements are joined together into a single training instance, along with an indicator of which requirement has the better quality. The model is then trained over the corpus on the specified metrics (features) and is ready to generalise the classification to new cases, obtaining in the researchers’ experiments “an accuracy in a range of 83.27 as a minimum accuracy and 87.72 as the maximum percentage of accuracy” [13]. Other research [15] concerns the use of Bayesian Networks, formalisation and active learning techniques to model and process the knowledge derived from requirements engineering, in order to improve the productivity of the process and deal properly with the potential incompleteness, inconsistency and ambiguity of the requirements. In much the same way, del Sagrado et al. [16] have applied Bayesian Networks to requirements risk assessment. Another very interesting frontier in requirements engineering is [17], where a method for “automatically retrieving functional requirements from stakeholders using agile processes” is proposed. The method is based on a combination of machine learning, knowledge acquisition and belief revision, with the aim of collecting information from stakeholders and manipulating it so as to obtain a list of essential requirements for the software system.
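The two ways of building training instances described for [13] can be sketched in Python as follows. This only illustrates the data layout, not the authors’ implementation: the `AssessedRequirement` type, the metric values, the integer quality scale (higher meaning better) and the decision to skip pairs of equal quality are all assumptions.

```python
from itertools import combinations
from typing import List, NamedTuple, Tuple


class AssessedRequirement(NamedTuple):
    """A requirement with its metric values and the expert's quality label."""
    text: str
    metrics: Tuple[float, ...]  # hypothetical low-level quality metrics
    quality: int                # assumed scale: higher means better


Instance = Tuple[Tuple[float, ...], int]


def single_instances(corpus: List[AssessedRequirement]) -> List[Instance]:
    """First scheme: one instance per requirement (metrics, quality)."""
    return [(req.metrics, req.quality) for req in corpus]


def pairwise_instances(corpus: List[AssessedRequirement]) -> List[Instance]:
    """Second scheme: the metrics of two requirements joined together,
    labelled 0 if the first has better quality and 1 if the second does."""
    instances = []
    for first, second in combinations(corpus, 2):
        if first.quality == second.quality:
            continue  # assumption: ties carry no ordering information
        label = 0 if first.quality > second.quality else 1
        instances.append((first.metrics + second.metrics, label))
    return instances


if __name__ == "__main__":
    corpus = [
        AssessedRequirement("The system shall log every login.", (12.0, 0.0), 3),
        AssessedRequirement("It should be fast and stuff.", (6.0, 2.0), 1),
        AssessedRequirement("The system may export reports.", (9.0, 1.0), 2),
    ]
    print(single_instances(corpus))
    print(pairwise_instances(corpus))
```

The pairwise scheme turns quality assessment into a comparison problem and produces many more instances from the same labelled corpus, at the cost of a feature vector twice as long.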
4.3 Contributions and Future Developments
Key researchers in this field are [13] Eugenio Parra, Christos Dimou, Juan Llorens, Valentín Moreno and Anabel Fraga of the Knowledge Reuse Group of the University Carlos III de Madrid, [15] Isabel M. del Águila and José del Sagrado of the University of Almería, and [14] Ronit Ankori of Bar Ilan University, Israel. The key conference for Requirements Engineering is the IEEE International Requirements Engineering Conference (RE), held annually, and a key journal is Requirements Engineering, where the second reported research was published.
Parra et al. [13] suggest as further work probing the inclusion of the consistency and completeness clusters of metrics in the learning process, so as to “cover completely the assessment of the quality of requirement written in natural language”, and focusing on new ways of constructing the learning instances in order to increase the accuracy of the classifiers.