«Software Fault Reporting Processes in Business-Critical Systems Jon Arvid Børretzen Doctoral Thesis Submitted for the partial fulfilment of the ...»
Since hazard analysis, e.g. by PHA, is typically performed in the earlier phases of system development, we were motivated to investigate whether the PHA technique can be used to reveal faults early in the development process of a given system. This paper describes a method and a study in which we analyzed fault reports from system testing and field use and compared them with the results of a hazard analysis of the same software system. Comparing the PHA results with the fault report analysis lets us see whether some faults could have been identified and removed earlier.
Measuring quality and effects on quality in a software system is not a trivial matter.
One of the means Avizienis et al. suggest for attaining dependability in a system is fault removal and fault prevention, in order to reduce the number and severity of faults. By identifying common fault types, developers can reduce the number of critical faults by focusing their efforts on preventing such faults. Identifying the most severe fault types also lets developers focus on preventing the faults that have the biggest detrimental impact on the system. This concurs with Boehm’s concept of “value-based” software engineering and value-based testing, as presented in [3, 4]. Fault report analysis can thereby help identify the most important fault types, so that quality improvement work can focus on these in later projects.
Basili et al. have presented several experiments where inspection techniques are compared to testing, for example in . In a related article, Shull et al. present Perspective-Based Reading and how this technique can improve requirements inspections. Wagner has made a survey of the quality economics of defect-detection techniques in , where he presents some numbers on the costs of removing faults at different stages of development. In , Ciolkowski et al. state that the software review is a popular quality assurance method, and present a survey concluding that reviews should be integrated in the development process, performed systematically rather than ad hoc, and optimized for their target system.
The PHA method is used in the early life cycle stages to identify critical system functions and broad system hazards. The identified hazards are assessed and prioritized, and safety design criteria and requirements are identified. As Rausand states, a PHA is started early in the concept exploration phase, so that safety considerations are included in trade-off studies and design alternatives. This process is iterative, with the PHA being updated as more information about the design is obtained and as changes are made. PHA is a relatively light-weight method: its information requirements are low, since high-level documentation such as concept and requirements specifications is sufficient for an early PHA. The method is also not very training-intensive, and practitioners can start using it fairly quickly. The PHA sessions are performed as semi-structured brainstorming, using the available documentation as the source of information. The results are sets of PHA sheets containing the identified hazards and further information about them, e.g. the cause and effect of each hazard and proposed measures for removing it. This serves as a baseline for later analysis and is used in developing system safety requirements and in preparing performance and design specifications. Since PHA starts at the concept formation stage of a project, little detail is available, and the assessments of hazard and risk levels are therefore qualitative. A PHA should be performed by a small group with some knowledge of the system requirements. PHA is usually performed in order to identify system hazards, translate system hazards into high-level system safety design constraints, assess hazards if necessary, and establish a hazard log. These system hazards are not equivalent to faults or failures.
Failures (incorrect behaviour with respect to the requirement specifications) may contribute to hazards, but hazards are system states that, combined with certain environmental conditions, cause accidents regardless of whether requirement specifications are violated.
More commonly used alternatives to the PHA method are the various inspection techniques for specification, design and code. Table 1 shows in which development phases the different techniques are used to identify faults.
Table 1. Different techniques identify faults in different development phases.
Development phase: fault identifying techniques (Inspections, PHA, Program execution)
Requirements: ● ●
Design: ● ●
Coding: (●) ●
Testing: ●
Field use: ●
2.2 The DAIM context
DAIM is a web-based database for the delivery and processing of master's theses. It is a small system developed internally at the Department of Computer and Information Science at the Norwegian University of Science and Technology (NTNU). The development process was small-scale, with a strong user orientation. The specification and design process involved system users and administrators, and used interviews and paper prototyping to produce specification and design documents. The implementation was carried out by a small team, and consists mainly of a database implementation and a PHP-based web presentation application.
The system description contains 14 distinct use cases, with description of functionality for the different user types.
3. Research Method
This work proposes a method which combines two different analysis techniques: PHA is applied in the early stages of software development, and fault analysis based on testing or field use is performed late in the development process, typically after system testing or when the system is put in production. By comparing the results from a PHA performed on the available documentation of system concepts and specifications with the results from an analysis of fault reports from late testing and field use, we want to investigate how the PHA helps us identify hazards that are relevant to the faults actually found in the system.
The dotted lines in Figure 1 show the common view that faults are related to reliability and hazards are related to safety. The full line shows what our work proposes: that findings from hazard analysis may also be related to reliability.
This results in a method, described in Section 4.1, that uses safety reviews (like PHA) on requirements and design documentation to find not only hazards but also faults. The converse could also be considered: using reliability reviews and inspections of requirements, design and code documents to find not only faults but also hazards. In Figure 1, this would be represented by an arrow linking faults to safety.
3.1 Research questions
The research questions we wanted to explore in this study were the following:
RQ1: What kind of faults in terms of Orthogonal Defect Classification (ODC) fault types does the PHA technique help elicit?
RQ2: How does the distribution of fault types found in the fault analysis compare to the one found in the PHA?
RQ3: Does the PHA technique identify potential hazards that also actually appear as faults in the software?
3.2 Hazard analysis by PHA
The hazard analysis was to be performed prior to studying the fault reports, so that we would not be influenced by the faults that had actually been reported. This is also the order the analyses would follow in a practical project: the hazard analysis would be performed at an early stage of development, while the fault report analysis would be performed at the very end of the development process.
To be able to compare the results from fault report analysis with those from hazard analysis, we assigned one or more fault types to each of the hazards identified in the PHA. We had to assign several fault types for some hazards that were somewhat generic in nature and which could correspond to several different fault types. Some hazards were not possible to relate to a fault type, for instance hazards related to human error or manual routines not directly related to the software under study. These were then classed as “Not fault” in accordance with our classification scheme.
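The assignment of fault types to hazards can be thought of as a one-to-many mapping. The following Python sketch is only an illustration under stated assumptions: the hazard identifiers and the fault types assigned to them are hypothetical placeholders and not data from the DAIM study; only the “Not fault” class and the idea of several fault types per hazard are taken from the description above.

```python
from collections import Counter

# Hypothetical hazards (identifiers invented for illustration).
# Each hazard maps to one or more fault types; hazards unrelated to the
# software itself are classed as "Not fault".
hazard_fault_types = {
    "H1": ["function"],
    "H2": ["GUI", "documentation"],  # generic hazard, several fault types
    "H3": ["Not fault"],             # e.g. a manual routine outside the software
}

def potential_fault_counts(mapping):
    """Count potential faults per fault type, one per (hazard, fault type) pair."""
    return Counter(ft for types in mapping.values() for ft in types)

counts = potential_fault_counts(hazard_fault_types)
total = sum(counts.values())
software_related = total - counts["Not fault"]
print(dict(counts), total, software_related)
```

Counting one potential fault per (hazard, fault type) pair is what allows the number of potential faults to exceed the number of hazards.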
3.3 Fault analysis
One property of the ODC fault types is that they can be associated with different process phases, as stated by Chillarege et al. in  and also by Zheng et al. in .
Table 3 shows the associations as presented in . This division of fault types into process phases cannot be considered definitive, but it gives a good indication of where a fault of a certain type is most likely to have originated.
3.4 Research execution
This PHA was performed not as a part of analyzing specifications, but rather after the DAIM system had been developed and had been in use for some time. Usually a PHA would be performed much earlier, but we chose to analyze a completed system in order to compare PHA results with fault analysis results. The hazard analysis was performed in four sessions, each session concentrating on the use-cases for certain user types.
These sessions were attended by five to six persons, of whom one was a system expert and the others had experience in performing PHA. One person was responsible for leading the sessions, and one person acted as scribe, recording the PHA elicitations on PHA sheets. In total, the PHA sessions required 38 staff-hours of effort.
The fault analysis of the DAIM system was done by two researchers individually categorizing the fault reports using a fault categorizing scheme based on that used in the Orthogonal Defect Classification (ODC) scheme [11, 12]. We used fault descriptions in the fault reports to categorize the faults into the fault types shown in Table 2.
Afterwards, we compared our categorization results and came to a consensus on the reports where our initial categorization was dissimilar.
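The step of finding the reports that needed a consensus discussion can be sketched as follows. This is a minimal illustration, not the procedure actually used in the study: the report identifiers, the assigned fault types and the helper name `find_disagreements` are all hypothetical.

```python
def find_disagreements(rater_a, rater_b):
    """Return the ids of reports that the two raters categorized differently."""
    return sorted(rid for rid in rater_a if rater_a[rid] != rater_b.get(rid))

# Hypothetical categorizations by two researchers working independently.
rater_a = {"R1": "function", "R2": "GUI", "R3": "documentation"}
rater_b = {"R1": "function", "R2": "function", "R3": "documentation"}

to_discuss = find_disagreements(rater_a, rater_b)
agreement = 1 - len(to_discuss) / len(rater_a)  # simple share of matching reports
print(to_discuss, agreement)
```

The share of matching reports is only a rough indicator; it does not correct for agreement by chance.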
4. Results
The results are presented as a description of the method we used for evaluating the use of PHA to identify faults, in Section 4.1, followed by the results of this evaluation in Sections 4.2 to 4.5.
4.1 Method description: Using PHA to identify faults
1) Initially, we define and delimit the system to be studied. Information required is the same as that of a PHA analysis; a system description and requirements and design documentation like use-cases, high level class-diagrams, or similar documentation. It is important to make clear the system context and the roles of the members of the PHA group: Are they to be independent of development, are they part of the development team, are they domain experts?
2) Executing the PHA session(s). This involves making sure the group understands the use of the PHA technique and that they have some knowledge of the system to be analyzed, like its main functionality and the actors involved in system use. This group meets and performs a systematic walkthrough of the available use-cases and system descriptions to identify possible hazards. These are decided upon through discussion and consensus and recorded in PHA tables, an example of which is shown in Table 5.
3) Next, the resulting hazards are considered in terms of which fault types they potentially may cause. This is not necessarily a one-to-one relation; a hazard can be the potential origin for several faults. The fault types used should be the same that are used in the categorization task in step 4.
4) A collection of fault reports (from testing and field use) is compiled from the system. If the fault reports are not already categorized, the faults are categorized using the same fault type categorization scheme as in step 3. This categorization should be performed by persons who understand the fault type categories well. It also requires that the fault reports are descriptive enough to be properly categorized.
5) Finally, we perform a comparison of the fault reports with the possible faults from the PHA session, helped by the categorization of faults.
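The comparison in step 5 can be sketched as a side-by-side view of the two fault type distributions. The code below is a minimal sketch under stated assumptions: the input lists are hypothetical, the function names are invented for illustration, and excluding “Not fault” entries before computing percentages is one possible choice rather than a prescribed part of the method.

```python
from collections import Counter

def percent_distribution(fault_types, exclude=("Not fault",)):
    """Percentage of each fault type among the entries that are real faults."""
    kept = [ft for ft in fault_types if ft not in exclude]
    if not kept:
        return {}
    counts = Counter(kept)
    return {ft: 100.0 * n / len(kept) for ft, n in counts.items()}

def compare(pha_types, report_types):
    """Map each fault type to (share in PHA results, share in fault reports)."""
    pha = percent_distribution(pha_types)
    rep = percent_distribution(report_types)
    return {ft: (pha.get(ft, 0.0), rep.get(ft, 0.0))
            for ft in sorted(set(pha) | set(rep))}

# Hypothetical inputs: fault types assigned to hazards (step 3) and
# fault types assigned to fault reports (step 4).
pha_types = ["function", "function", "GUI", "Not fault"]
report_types = ["function", "GUI", "GUI", "documentation"]

for fault_type, (pha_pct, rep_pct) in compare(pha_types, report_types).items():
    print(f"{fault_type}: PHA {pha_pct:.1f} %, reports {rep_pct:.1f} %")
```

Fault types that appear in only one of the two sources get a share of zero in the other, which makes gaps in either analysis easy to spot.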
We sum up some attributes for this method in Table 4, which is based on a similar description of review methods by Laitenberger et al. in .
4.2 Hazard analysis (PHA)
The result of the PHA sessions was a set of PHA sheets containing potential hazards the group had elicited from the system description and documentation. Table 5 shows two short examples of hazard results from the PHA sessions.
In total the PHA identified 33 hazards in the DAIM system. By assigning fault types to the hazards, with some hazards potentially causing several types of faults, we identified 43 potential faults in total. Six of these potential faults were later classified as “Not fault”, bringing the actual number of identified potential faults to 37. Figure 3 shows the distribution of hazards represented as fault types.
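The bookkeeping behind these counts can be checked with a few lines of Python. The three input numbers are taken from the text; the variable names are our own, and the two ratios at the end are derived here for illustration and are not figures reported in the study.

```python
hazards = 33           # hazards identified by the PHA
potential_faults = 43  # fault types assigned to hazards (some hazards get several)
not_fault = 6          # potential faults later classified as "Not fault"

software_faults = potential_faults - not_fault
print(software_faults)  # 37, matching the number given in the text

# Derived ratios (illustrative only, not reported in the study).
faults_per_hazard = potential_faults / hazards
not_fault_share = 100 * not_fault / potential_faults
print(round(faults_per_hazard, 2), round(not_fault_share, 1))
```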
Figure 3. Distribution of hazards represented as fault types (%).
4.3 Fault analysis
In total, 117 fault reports, collected both by human reporting during system testing and by automatic failure log generation during system use, were categorized using the ODC fault types shown in Table 2. Figure 4 shows the distribution of fault types. Of these 117 faults, 25 were categorized as “Not fault”, giving us 92 actual faults found in the system.
The collected fault reports from the DAIM system were split into two groups: one from system testing and one from the first months of field use. The two groups had different distributions of fault types. Figure 5 shows the difference between the distribution of the faults reported in field use and that of the faults reported in system testing.
We see that there are certain fault types that were only reported at system test level and not in field use, such as “documentation”, “function” and “GUI” type faults.
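Finding the fault types that occur in one group of reports but not in the other amounts to a simple set comparison. The sketch below assumes per-group counts of fault types; all counts are hypothetical placeholders chosen only to reproduce the pattern described above, and the function name `only_in_first` is invented for illustration.

```python
def only_in_first(first, second):
    """Fault types with at least one report in `first` and none in `second`."""
    return sorted(
        (ft for ft, n in first.items() if n > 0 and second.get(ft, 0) == 0),
        key=str.lower,
    )

# Hypothetical counts per fault type (not the DAIM figures).
system_test = {"documentation": 2, "function": 5, "GUI": 4, "checking": 3}
field_use = {"checking": 6, "assignment": 1}

print(only_in_first(system_test, field_use))
print(only_in_first(field_use, system_test))
```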