«Software Fault Reporting Processes in Business-Critical Systems Jon Arvid Børretzen Doctoral Thesis Submitted for the partial fulfilment of the ...»
4.4 Study 4: Fault report analysis (Paper P4)
P4. Jon Arvid Børretzen and Jostein Dyre-Hansen: “Investigating the Software Fault Profile of Industrial Projects to Determine Process Improvement Areas: An Empirical Study”
Abstract to P4: Improving software processes relies on the ability to analyze previous projects and derive which parts of the process that should be focused on for improvement. All software projects encounter software faults during development and have to put much effort into locating and fixing these. A lot of information is produced when handling faults, through fault reports. This paper reports a study of fault reports from industrial projects, where we seek a better understanding of faults that have been reported during development and how this may affect the quality of the system. We investigated the fault profiles of five business-critical industrial projects by data mining to explore if there were significant trends in the way faults appear in these systems. We wanted to see if any types of faults dominate, and whether some types of faults were reported as being more severe than others. Our findings show that one specific fault type is generally dominant across reports from all projects, and that some fault types are rated as more severe than others. From this we could propose that the organization studied should increase effort in the design phase in order to improve software quality.
The results from P4 were the following:
• We found that "function" faults, closely followed by "GUI" faults, are the fault types that occur most frequently in the projects, as shown in Table 4-3. To reduce the number of faults introduced in the systems, the organization should focus on improving the processes that are most likely to contribute to these types of faults, namely the specification and design phases of development.
• The most severe fault types were "relationship" and "timing/serialization" faults, while the fault types "GUI" and "documentation" were considered the least severe. This is illustrated in Figure 4-4. Although "function" faults were not rated as the most severe, this fault type still dominates when looking at the distribution of highly severe faults only.
• We also observed that the organization’s fault reporting process could be improved by adding additional information to the fault reports, e.g. fault location (name of program module) and fault repair effort. This would facilitate more effective targeting of fault types and locations in order to better focus future efforts for improvement.
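The kind of fault profile analysis summarized above can be sketched in a few lines. The following is a minimal illustration only, not the tooling used in P4: the sample records, the function names and the severity scale (1 = most severe) are assumptions made for this example.

```python
from collections import Counter

# Hypothetical sample records: (fault_type, severity), where 1 = most severe.
# These values are invented for illustration and are NOT data from the study.
REPORTS = [
    ("function", 2),
    ("function", 1),
    ("GUI", 4),
    ("GUI", 3),
    ("relationship", 1),
    ("function", 3),
    ("documentation", 4),
]


def type_distribution(reports):
    """Return (fault_type, share) pairs, most frequent fault type first."""
    counts = Counter(fault_type for fault_type, _ in reports)
    total = sum(counts.values())
    return [(t, n / total) for t, n in counts.most_common()]


def mean_severity(reports):
    """Return the mean severity rating per fault type."""
    sums, counts = Counter(), Counter()
    for fault_type, severity in reports:
        sums[fault_type] += severity
        counts[fault_type] += 1
    return {t: sums[t] / counts[t] for t in counts}


def severe_only_distribution(reports, threshold=2):
    """Fault type distribution restricted to severity ratings <= threshold."""
    return type_distribution([r for r in reports if r[1] <= threshold])
```

With data of this shape, a fault type can have a moderate mean severity and still be the most frequent type among highly severe reports, which is the pattern reported for "function" faults.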
Contribution of Study 4:
In paper P4 we describe findings on fault types and fault origins in commercial projects.
We also identified some issues that are common shortcomings in fault reporting. These contributions relate to the main contributions C2 and C3.
4.5 Study 5: Interviewing practitioners about fault management (Paper P6)
P6. Jon Arvid Børretzen: “Fault classification and fault management: Experiences from a software developer perspective”
Abstract to P6: In most software development projects, faults are unintentionally injected in the software, and are later found through inspection, testing or field use and reported in order to be fixed later. The associated fault reports can have uses that go beyond just fixing discovered faults. This paper presents the findings from interviews performed with representatives involved in fault reporting and correcting processes in different software projects. The main topics of the interviews were fault management and fault reporting processes. The objective was to present practitioners’ view on fault reporting, and in particular fault classification, as well as to expand and deepen the knowledge gained from a previous study on the same projects. Through interviews and use of Grounded Theory we wanted to find the potential weaknesses in a current fault reporting process and elicit improvement areas and their motivation. The results show that fault management could and should include steps to improve product quality. The interviews also supported our quantitative findings in previous studies on the same development projects, where much rework through fault fixing need to be done after testing because areas of work in early stages of projects have been neglected.
The interviews were conducted by one interviewer, using an interview guide and a digital voice recorder. These interviews were later transcribed and coded by the same person.
The main results of P6 were the following:
• The interviewees agreed with our conclusions from the previous quantitative study in P4, i.e. that the early phases of their development process had weaknesses that led to a high number of software faults originating in early development phases.
• They also expressed a need for better fault categorization in their fault reports, in order to analyze previous projects with the intention of improving their work processes.
• The proposed ODC fault types were seen as a useful basis for introducing a better fault classification scheme, although simplicity was important.
• They were positive towards using fault report analysis feedback to improve development processes, although introducing such analysis for regular use would have to be done carefully in the organization.
• Finally, they revealed some areas in their fault reporting scheme that could be improved to make analysis more useful, for instance by including attributes such as fault-finding effort, correction effort and the component location of the fault. The knowledge was present; it was just not recorded formally.
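To make the suggested attributes concrete, a fault report record could be extended roughly as follows. This is a hypothetical sketch rather than the scheme used by the studied organization: all class and field names are assumptions, and the enumeration lists only the fault types named in this chapter, not the full classification.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FaultType(Enum):
    # Only the fault types named in this chapter; the full scheme has more.
    FUNCTION = "function"
    GUI = "GUI"
    RELATIONSHIP = "relationship"
    TIMING_SERIALIZATION = "timing/serialization"
    DOCUMENTATION = "documentation"


@dataclass
class FaultReport:
    report_id: str
    description: str
    fault_type: FaultType
    severity: int
    # Attributes the interviewees said were known but not formally recorded.
    # Field names and units (hours) are assumptions for this sketch.
    component: Optional[str] = None
    finding_effort_hours: Optional[float] = None
    correction_effort_hours: Optional[float] = None

    def missing_analysis_fields(self):
        """Names of the optional analysis attributes left unrecorded."""
        optional = ("component", "finding_effort_hours", "correction_effort_hours")
        return [name for name in optional if getattr(self, name) is None]
```

Keeping the extra attributes optional reflects the interviewees' concern for simplicity: a report can still be filed quickly, while a gap in the analysis data remains visible.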
Contributions of Study 5:
Our main contribution is showing that practitioners are motivated to use their existing knowledge of software faults in a more extensive manner to improve their work practices. These findings support our main contributions C2 and C3.
4.6 Study 6: Using hazard identification to identify faults (Paper P7)
Abstract to P7: When designing a business-critical software system, early analysis with correction of software faults and hazards (commonly called anomalies) may improve the system’s reliability and safety, respectively. We wanted to investigate if safety hazards, identified by Preliminary Hazard Analysis, could also be related to the actual system faults that had been discovered and documented in existing fault reports from testing and field use. A research method for this is the main contribution of this paper. For validation, a small web-based database for management of student theses was studied, using both Preliminary Hazard Analysis and analysis of fault reports. Our findings showed that Preliminary Hazard Analysis was suited to find potential specification and design faults in software.
P7 presented the description and an implementation of a novel method for identifying software faults using the PHA technique. This method identified 6 faults that were actually found in the system as well as 20 potential faults that may be in the system. We also showed that there are certain types of faults that analysis techniques such as PHA can help to uncover in an early process phase. Performing the PHA elicited many hazards that could have been found in the system as “function” faults, as shown in Figure 4-5, that is, faults which originate from early phases of system development and are related to the specification and design of the system. From this we conclude that PHA can be useful for identifying hazards that are related to faults introduced early in software development.
As for finding direct ties between hazards found in PHA and faults reported in fault reports, we were not very successful. This, we feel, was mainly due to the studied system’s particular fault type profile which was very different from fault distribution profiles we had found in earlier studies. Some weak links were found, but the data did not support any systematic links.
Contributions of Study 6:
The main contribution of this paper was the description and implementation of the method for identifying software faults using the PHA technique. The contributions of this study are related to the main contributions C1 and C3.
4.7 Study 7: Experiences from fault report studies (Technical Report P8)
This section will describe, sum up and reflect upon our experiences from several fault reporting studies. It has not yet been written as a final paper, but this is planned in the near future. See the technical report P8 in Appendix A.
P8. Jon Arvid Børretzen: “Diverse Fault management – a prestudy of industrial practice”
Abstract to P8: This report describes our experiences with fault reports and fault reporting from working with fault reports from several different organizations. Data from projects we have studied is presented in order to show the variance and at times lack of information in the reports used. Also we show that although useful process information is readily available, it is seldom used or analyzed with process improvement in mind. An important challenge is to describe to practitioners why using a common description of faults is advantageous and also to propose a way to better use the knowledge gained in collecting data about faults. The main contribution is to explain why more effort should be put into the production of fault reports, and how this information can be used to improve the software development process. We explain how fault reports can become more useful just by including information that is already available in development projects.
P8 presents an overview of studies performed concerning fault reports, and shows the type of information that exists and is lacking from such reports. Our learnings include that fault data is in some cases under-reported, and in most cases under-analyzed. By including some of the information that the organization already has, more focused analyses could be made possible. One possibility is to introduce a standard for fault reporting, where the most important and useful fault information is mandatory.
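One way to picture a reporting standard with mandatory fields is a simple completeness audit over existing reports. The sketch below is illustrative only: the field names in `MANDATORY_FIELDS` and the dictionary-based report format are assumptions, since P8 does not prescribe a concrete standard here.

```python
# Hypothetical mandatory fields; the source does not define a concrete standard.
MANDATORY_FIELDS = ("fault_type", "severity", "location", "repair_effort")


def field_coverage(reports, fields=MANDATORY_FIELDS):
    """Fraction of reports in which each field is present and non-empty."""
    if not reports:
        return {f: 0.0 for f in fields}
    return {
        f: sum(1 for r in reports if r.get(f) not in (None, "")) / len(reports)
        for f in fields
    }


def incomplete_reports(reports, fields=MANDATORY_FIELDS):
    """Identifiers of reports missing at least one mandatory field."""
    return [
        r.get("id")
        for r in reports
        if any(r.get(f) in (None, "") for f in fields)
    ]
```

An audit of this kind shows which attributes are under-reported before any deeper analysis is attempted.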
Furthermore, we have learnt that the effort spent by external researchers to produce useful results based on the available data is quite small compared to the collective effort spent by developers recording this data. This shows that very little effort may give substantial effects for many software developing organizations.
Finally, there are two main points we want to convey as a result of our studies:
• It is important to approach fault data analysis bottom-up, at least in the early phases of such research and analysis initiatives. The data is readily available; the work that remains is designing and performing a study of these data.
• Much of the recorded fault data is of poor quality. This is most likely because of the lack of interest in use of the data.
We are planning to write a final paper P8 to combine lessons learned from Studies 3 and 5, cf. Sections 4.3 and 4.5. This is partly in response to very positive review comments on paper P3. The preliminary paper is presented as a Technical Report in Appendix A.
Contributions of Study 7:
This study directly identifies issues that are common shortcomings in fault reporting, and suggests actions to improve and support the use of fault report analysis as a tool for process improvement. These findings support our main contribution C2.
5 Evaluation and Discussion
This chapter answers the four research questions RQ1-RQ4 based on the results, and discusses the relations between the thesis contributions and the research questions. The research context, papers and BUCS goals are also discussed.
There is also a discussion of validity threats and experience from industrial cooperation.
From Section 1.4 we reiterate the main contributions of this thesis and elaborate on them:
C1. “Describing how to utilize safety criticality techniques to improve the development process for business-critical software.”
• We have described ways of integrating safety criticality techniques with regular development practices to improve the development process for business-critical software. We have proposed integrating safety techniques like PHA and Hazop into early development phases in order to help improve safety and reliability of the resulting software [P1], although this has not been validated industrially. In addition, we have shown that the PHA technique is useful in eliciting hazards that are related to faults that are introduced in early development process phases [P7].
C2. “Identification of typical shortcomings in fault reporting.”
• Through our studies on fault reports, we have described several issues concerning shortcomings in fault reporting. The most striking is that commercial organizations generally do not exploit the fault report data they possess for more than day-to-day fault logging or at most shallow analysis.
Additionally, it is clear that fault reporting is treated more as a necessary chore than as a potential source for process improvement. Fault reports are often inaccurate, incomplete or incomprehensible, which makes them hard to reuse for analysis. In addition, fault data that could easily have been recorded for process improvement gains, e.g. correction effort or location of fault, are not even considered in fault reports.
C3. “Improved model of fault origins and types for business-critical software.”
• We have described studies that give insight into which fault types are most common or severe in business-critical software. We found that the most common faults were ones that originated from early process phases, namely specification and design. We have also shown that certain fault types tend to be more severe than others [P2][P4].
These contributions were described more briefly in Section 1.4. Table 5-1 shows the relationship between the contributions C1-C3 and research questions RQ1-RQ4.