Software Fault Reporting Processes in Business-Critical Systems. Jon Arvid Børretzen, Doctoral Thesis.
Figure 5. Distribution of fault reports in the two DAIM fault report collections (%).
4.4 Comparing hazards and faults The combination of hazard analysis and fault report analysis used here can be described as a triangulation of techniques, showing how hazard analysis can be used to identify possible faults in software.
Combining the distributions from Figures 3 and 4 in one graph, we get Figure 6, which shows that the distributions of hazards and faults are quite different from each other.
Figure 6. Distribution of hazards (%) and faults (%).
With only 6 of the 92 reported faults also being identified by the PHA, many faults were not identified by the PHA. As many of the reported faults were pure coding faults, such as those of the “assignment” fault type, this was to be expected. A typical fault report description of this type was “ThesisID missing for document”.
4.5 Efficiency of PHA for fault identification Looking at staff-hours spent per fault identified, we get two figures: one for the faults actually found in system testing and field use, and one for the total number of potential faults in the system (including the actual faults). Table 7 shows these figures and compares them with mean numbers on inspection efficiency from Wagner’s literature survey.
This result is based on four PHA sessions with five or six participants, but PHA can also be performed with as few as two participants and still produce good results in terms of the number of anomalies identified. This would certainly have reduced the ratio of staff-hours per fault considerably.
5.1 The results in terms of our Research Questions Our main findings related to RQ1 were that PHA was most useful in eliciting hazards that were related to “function” faults. These types of faults are related to specification and design, as shown in Table 3 and stated in [11, 13]. Hazards related to the “checking” and “algorithm” fault types were also common. Our reasoning about this result is that when performing a PHA, you are mostly basing your analysis on documentation and artefacts for the early stages of development. This means that you are more likely to be able to elicit possible hazards that are related to more general design and specification. Other types of hazards are found as well, but as the system details are unclear, it is more difficult in the PHA to specify exactly what can go wrong technically.
For RQ2, we did not find any correlation between hazards elicited through PHA and faults found in the fault analysis. As for finding direct matches between PHA findings and fault analysis, as stated in RQ3, there was a very low match rate. Of the 92 fault reports, only 6 of them could be said to have been specifically elicited as hazards in the PHA.
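The match rate discussed for RQ3 is a simple proportion; as a quick illustrative check of the figures reported above:

```python
# Illustrative check of the match rate reported in the study:
# 6 of 92 fault reports were also elicited as hazards in the PHA.
matched_faults = 6      # faults also identified as hazards in the PHA
total_faults = 92       # fault reports analyzed

match_rate = matched_faults / total_faults
print(f"Match rate: {match_rate:.1%}")  # roughly 6.5%
```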
In this instance we believe that an important reason for the lack of match between elicited hazards and reported faults is the nature of the system under study. Compared to other systems we have performed fault analyses of, the DAIM system has a very different fault distribution profile. We have previously performed similar analyses of fault reports, and these have had a distribution where “function” and “GUI” faults have been the most frequent [15, 16].
5.2 Comparing fault distribution with previous studies When comparing the fault distribution for DAIM with fault distributions we have found in previous studies, we see that the distribution for DAIM seems atypical. As an example, we compare the fault distribution of DAIM with that of another fault report study in which five industrial projects were analyzed, covering system testing and field use. Figure 8 shows the difference between the distributions.
Figure 8. Comparison of DAIM fault distribution with previous fault study (%).
This is one example of why we think the DAIM system has an atypical fault distribution, and this is also supported by findings from our earlier work and by Vinter et al. As Figure 8 shows, the fault types “function” and “GUI”, which were very numerous for the previously analyzed systems, are not at all numerous in the fault reports for the DAIM system. This naturally affects the ability to compare the fault reports in the DAIM system to the hazards found, among which “function” and “GUI” were more numerous. It might have been more appropriate to perform a “post-mortem” hazard analysis on the previously studied systems, as they had a fault profile more similar to the profile that hazard analyses are likely to produce.
5.3 Method evaluation: Improving specification and design inspection
The comparison of possible hazards identified during PHA sessions with the faults found after system testing and field use is a novel approach for exploring how the PHA method can be used to elicit possible faults in a software system. PHA is a very lightweight and easy-to-learn method, suited for use in very early phases of development, as shown in Table 1. If we compare with the Perspective-Based Reading technique, using PHA results in inspections where the readers’ role places an emphasis on system safety.
Compared to the economics of other inspection methods, the efficiency results depend on whether we count the number of faults actually found or the number of potential faults found. According to Wagner’s literature survey, the mean inspection efficiency is 1.06 staff-hours per defect found for requirements and 2.31 staff-hours per defect found for design. Our study showed an efficiency of 6.33 staff-hours per defect for the actual faults found in fault reports, and 1.46 staff-hours per defect for the potential faults, as shown in Table 6. As the DAIM system under study had not been injected with known faults, but was used because it was an accessible real-life system with available documentation and fault reports, it is difficult to say how many actual faults the system had.
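The efficiency arithmetic above can be reconstructed as follows. Note that the total effort figure of roughly 38 staff-hours is an assumption inferred here from the reported ratios (the paper gives the ratios and the session setup, not the raw hours):

```python
# Sketch of the inspection-efficiency arithmetic.
# total_staff_hours is inferred from the reported ratios, not stated
# directly in the study (four PHA sessions, five or six participants).
total_staff_hours = 38
actual_faults = 6        # hazards that matched actually reported faults
potential_faults = 26    # actual faults (6) plus additional potential faults (20)

eff_actual = total_staff_hours / actual_faults        # staff-hours per actual fault
eff_potential = total_staff_hours / potential_faults  # staff-hours per potential fault

print(f"{eff_actual:.2f} staff-hours per actual fault")      # ~6.33
print(f"{eff_potential:.2f} staff-hours per potential fault")  # ~1.46
```

This also makes the relationship between the two reported figures explicit: they share the same numerator and differ only in whether the 20 additional potential faults are counted.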
It should also be noted that the fault distribution of the DAIM system was very different from that of several other systems we have studied previously. Another remark is that since PHA is a safety review technique, it will also catch potential safety hazards, as the technique was originally designed to do. When using the proposed method, one could therefore combine the results for safety and fault review, which would give another efficiency number: staff-hours per caught anomaly (hazards and faults).
5.4 Validity threats
The main validity threats in this study are:
The main threat to construct validity is the difference between hazards and faults.
Hazard analysis and fault report analysis do not produce the same type of reports. By converting the hazards found into fault types, we were able to make a comparison between the two. As the results show, most of the hazards identified did not show up in the fault reports as actual faults. Still, a great deal of the hazards identified through the PHA could have manifested themselves as faults if, over time and across diverse users, these hazards had in fact occurred in the system. Whether those faults would then have manifested themselves as observable failures in some future execution context is another matter.
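The conversion step described above amounts to mapping each artifact onto a common fault-type vocabulary and comparing the resulting sets. A minimal sketch, where the fault-type names come from the ODC-style scheme used in the study but the particular set contents are invented for illustration:

```python
# Hypothetical sketch of comparing hazard-derived fault types with
# reported fault types. The type names ("function", "checking", etc.)
# follow the paper's scheme; the set contents are invented examples.
hazard_types = {"function", "checking", "algorithm"}     # derived from PHA
reported_types = {"assignment", "checking", "function"}  # from fault reports

overlap = hazard_types & reported_types      # types found by both analyses
only_hazards = hazard_types - reported_types # anticipated but not reported
only_reports = reported_types - hazard_types # reported but not anticipated

print("Types found by both:", sorted(overlap))
print("Hazard-only types:", sorted(only_hazards))
print("Report-only types:", sorted(only_reports))
```

Set intersection captures the construct-validity caveat directly: the comparison is only as good as the mapping from free-text hazards to fault types.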
One threat to internal validity is that the fault categorization was performed by us.
Since such categorization is a subjective task, this can affect the reliability of measures.
By having two persons independently categorize and then compare results, we feel we reduced this threat.
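One standard way to quantify the agreement between two independent categorizers (not reported in the study, which compared results informally) is Cohen's kappa. A self-contained sketch with invented example labels:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items both raters labeled the same.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from each rater's label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[l] * counts_b[l]
                   for l in set(rater_a) | set(rater_b)) / (n * n)
    return (observed - expected) / (1 - expected)

# Invented example: two raters categorizing five fault reports.
rater_1 = ["function", "function", "checking", "assignment", "function"]
rater_2 = ["function", "checking", "checking", "assignment", "function"]
print(f"kappa = {cohens_kappa(rater_1, rater_2):.2f}")  # kappa = 0.69
```

A kappa near 1 would indicate that the categorization threat discussed above is small; values near 0 would suggest the categories are being applied inconsistently.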
Similarly, PHA sessions are also based on subjective views of the description of a system. This threat is not possible to circumvent, as the PHA technique is based on personal ideas and collective “brainstorming”. The quality of the PHA results is dependent on the participants’ experience and knowledge.
Another threat is the issue of unconscious bias or “fishing”. In comparing the results from the PHA and the fault analysis, we were looking for connections between the results, and this could have led us to find “weak” connections that others would not have found.
The group of hazards that were compared to the reported faults may cause a certain degree of validity threat in the form of selection. The PHA sessions were time limited, so only the most obvious hazards were taken into account. Also, the PHA sessions were performed over a period of time, so some maturation in the form of better understanding of the actual system may have occurred.
Another issue here is the time span of the study. It is possible that by studying too short a time span of fault report collection, some fault types were underreported in the collected fault reports.
In our work we have analyzed data from only one, and possibly atypical, software system, DAIM, and this limits the ability to generalize the results. Another issue is the size and simplicity of the software system studied, which may be smaller than many other web-based projects of a similar type. The development process of this system has also been rather small-scale, with few people involved in design, implementation and testing. This may influence the distribution of faults found, as the developers have had a less complex system to develop. The reason the DAIM system was chosen for study was that it was a system developed close to us, which gave us a lot of freedom with respect to documentation accessibility and the possibility for data collection and clarification with developers.
6. Conclusion and further work This paper has presented the description and an implementation of a novel method for identifying software faults using the PHA technique. Because of the nature of the system, the results did not turn out as clear as we had hoped. The fault reports were few and mostly limited to certain types. On the other hand, we did identify 6 faults that were actually found in the system, as well as 20 potential faults that may be in the system. The hazard analysis also showed that there are certain types of faults that analysis techniques such as PHA can help uncover in an early process phase. Performing the PHA elicited many hazards that could have been found in the system as “function” faults.
That is, faults that originate from early phases of system development and are related to the specification and design of the system. From this we conclude that PHA can be useful for identifying hazards that are related to faults introduced early in software development.
As for finding direct ties between hazards found in PHA and faults reported in fault reports, we were not very successful. This, we feel, is mainly due to the studied system’s particular fault type profile which was very different from fault distribution profiles we had found in earlier studies. Some weak links were found, but the data did not support any systematic links.
The method we have proposed in this paper should be validated by performing future similar studies. Because of the circumstances and type of system we analyzed here, interesting further work would be to execute a similar study on a larger system where the fault distribution would be more similar to other systems we have conducted fault report analyses of.
Acknowledgements The author wishes to thank Professor Reidar Conradi for valuable input and reviewing. I would also like to thank Jostein Dyre-Hansen, Professor Tor Stålhane, Kai Torgeir Dragland, Torgrim Lauritsen and Per Trygve Myhrer for their assistance during the execution of this study.
References
N. Leveson: Safeware: System Safety and Computers, Addison-Wesley, Boston, 1995.
A. Avizienis, J.-C. Laprie, B. Randell, C. Landwehr: “Basic Concepts and Taxonomy of Dependable and Secure Computing”, IEEE Transactions on Dependable and Secure Computing, (1)1, January-March 2004.
L. Huang, B. Boehm: “How Much Software Quality Investment Is Enough: A Value-Based Approach”, IEEE Software, (23)5, pp. 88-95, Sept.-Oct. 2006.
S. Biffl, A. Aurum, B. Boehm, H. Erdogmus, P. Grünbacher: Value-Based Software Engineering, Springer, Berlin Heidelberg, 2006.
V.R. Basili, R.W. Selby: “Comparing the Effectiveness of Software Testing Strategies”, IEEE Transactions on Software Engineering, (13)12, pp. 1278-1296, Dec. 1987.
F. Shull, I. Rus, V. Basili: “How Perspective-Based Reading Can Improve Requirements Inspections”, IEEE Computer, (33)7, pp. 73-79, July 2000.
S. Wagner: “A Literature Survey of the Quality Economics of Defect-Detection Techniques”, Proceedings of the 2006 ACM/IEEE International Symposium on Empirical Software Engineering, Rio de Janeiro, Brazil, September 21-22, 2006.
M. Ciolkowski, O. Laitenberger, S. Biffl: “Software Reviews, the State of the Practice”, IEEE Software, (20)6, pp. 46-51, Nov.-Dec. 2003.
M. Rausand: Risikoanalyse, Tapir Forlag, Trondheim, 1991.
J.A. Børretzen, T. Stålhane, T. Lauritsen, P.T. Myhrer: “Safety Activities During Early Software Project Phases”, Proceedings of the Norwegian Informatics Conference, pp. 180-191, Stavanger, Norway, 2004.
R. Chillarege, I.S. Bhandari, J.K. Chaar, M.J. Halliday, D.S. Moebus, B.K. Ray, M.-Y. Wong: “Orthogonal Defect Classification - A Concept for In-Process Measurements”, IEEE Transactions on Software Engineering, (18)11, pp. 943-956, Nov. 1992.
K. El Emam, I. Wieczorek: “The Repeatability of Code Defect Classifications”, Proceedings of the Ninth International Symposium on Software Reliability Engineering, pp. 322-333, 4-7 Nov. 1998.
J. Zheng, L. Williams, N. Nagappan, W. Snipes, J.P. Hudepohl, M.A. Vouk: “On the Value of Static Analysis for Fault Detection in Software”, IEEE Transactions on Software Engineering, (32)4, pp. 240-253, April 2006.
O. Laitenberger, S. Vegas, M. Ciolkowski: “The State of the Practice of Review and Inspection Technologies in Germany”, Technical Report ViSEK/010/E, ViSEK, 2002.