«Software Fault Reporting Processes in Business-Critical Systems Jon Arvid Børretzen Doctoral Thesis Submitted for the partial fulfilment of the ...»
• Proposals on adaptation of methods from safety development for business-critical system development.
• Guides and advice on business-critical system development.
• Tools supporting development of business-critical systems.
• Investigations on use of component-based development in the development of business-critical systems.
Abstract. Faults introduced into systems during development are costly to fix, especially for business-critical systems. These systems are developed using common development practices, but have high dependability requirements. This paper reports on an ongoing investigation of fault reports from Norwegian IT companies, where the aim is to gain a better understanding of the faults found during development and how they may affect the quality of the system. Our objective in this paper is to investigate the fault profiles of four business-critical commercial projects to explore whether faults appear differently in different systems. We have conducted an empirical study by collecting fault reports from several industrial projects, comparing findings from projects where components and reuse have been core strategies with findings from more traditional development projects. Findings show that some specific fault types are generally dominant across reports from all projects, and that some fault types are rated as more severe than others.
1. Introduction
Producing high-quality software is an important goal for most software developers. The notion of software quality is not trivial, since different stakeholders will have different views on what software quality is. In the Business-Critical Software (BUCS) project we are seeking to develop a set of methods to improve support for analysis, development, operation, and maintenance of business-critical systems. These are systems that we expect and hope will run correctly because of the possibly severe effects of failure, even if the consequences are mainly of an economic nature. In these systems, software quality is important, and the main target for developers will be to make systems that operate correctly all the time. One important issue in developing these kinds of systems is to remove any possible causes of failure, which may lead to incorrect operation of the system.
The study presented here investigated fault reports from two software projects using components and reuse strategies, and two projects using a more traditional development process. It compares the fault profiles of the reuse-intensive projects with those of the other two along several dimensions: fault type, fault severity and fault location.
2. Previous studies on software faults and fault implications
Software quality is a notion that encompasses a great number of attributes. When speaking about business-critical systems, the critical quality attribute is often experienced as the dependability of the system. According to Littlewood et al. [2], dependability is a software quality attribute that encompasses several other attributes, the most important of which are reliability, availability, safety and security.
Faults in the software lessen its quality, so reducing the number of faults introduced during development improves the quality of the software. Faults are potential flaws in a software system that may later be activated to produce an error.
An error is the execution of a fault, leading to a failure. A failure results in erroneous external behaviour, system state or data state. The known remedies for errors and failures are to limit the consequences of a failure in order to resume service, but studies have shown that this kind of late protection is more expensive than preventing the faults from being introduced into the code. Faults are also known as defects or bugs, and a more extensive concept is anomalies, which is used in the IEEE 1044 standard [4]. Orthogonal Defect Classification (ODC) is a way of studying defects in software systems [5, 6, 7, 8]. ODC is a scheme for quickly capturing the semantics of each software fault.
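The distinction between fault, error and failure can be illustrated with a minimal Python sketch. The function names and the scenario are hypothetical and chosen only for illustration; they do not come from the projects studied. The fault (a missing empty-input check) lies dormant until an empty list activates it, and the wrapper shows the "late protection" style of remedy that limits the consequences instead of removing the fault.

```python
def mean_response_time(samples):
    """Average of a list of response times in milliseconds.

    Latent fault: the empty-list case is not handled. The flaw is
    harmless until an empty list is actually passed in.
    """
    return sum(samples) / len(samples)


def report(samples):
    """Late protection: contain the failure so that service can resume."""
    try:
        return f"mean = {mean_response_time(samples):.1f} ms"
    except ZeroDivisionError:
        # The fault was activated (an error). Without this handler the
        # caller would see a crash (a failure).
        return "mean = n/a (no samples)"


print(report([10, 20]))  # fault stays dormant
print(report([]))        # fault is activated, failure is contained
```

Removing the fault itself would mean handling the empty case inside `mean_response_time`, which is the cheaper option the text argues for.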
It has been debated whether faults can be tied to reliability in a cause-effect relationship. Some papers, such as [6, 8], indicate that this is valid, while others are more critical. Still, reducing the number of faults will make the system less prone to failure, so if faults are removed without adding new ones, there is a good case that system reliability increases. This is the idea behind "reliability-growth models", which are discussed by Hamlet in [9].
Avizienis et al. [10] state that fault prevention aims to provide the ability to deliver a service that can be trusted. Hence, by preventing faults and reducing their number and severity in a system, the quality of the system can be improved in the area of dependability.
3. Research design
Research questions. Initially we want to find which types of faults are most frequent, and whether some parts of the systems have more faults than others:
RQ1: Which types of faults are most typical for the different software parts?
When we know which types of faults dominate and where these faults appear in the systems, we can concentrate on the most serious ones in order to identify the most important issues to target in improvement work:
RQ2: Are certain types of faults considered to be more severe than others by the developers?
Research method. This study is based on data mining, where the data consists of fault reports we have received from four commercial projects. The investigation has mostly been a bottom-up process, because of the initial uncertainty about the available data from potential participants. After establishing a dialogue with the projects, and acquiring the fault reports, our initial research questions and goals were altered accordingly.
The metrics used. The metrics have been chosen based on what we wanted to investigate and on what data turned out to be available from the projects participating in the study. The frequency of detected faults is an indirect metric, obtained by counting the number of faults of a given type, in a given system part, and so on. The metrics taken directly from the data in the reports are the type, severity and location of the fault.
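As a minimal sketch of how the indirect frequency metric follows from the direct ones, the Python fragment below counts reports per fault type and per location. The `FaultReport` class, the `fault_frequency` helper and the sample reports are assumptions made for illustration; they are not the study's tooling or data.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FaultReport:
    """The three attributes read directly from each report."""
    fault_type: str
    severity: str
    location: Optional[str]  # may be missing in a real report


def fault_frequency(reports, attribute):
    """Count reports per value of `attribute`, skipping missing values."""
    values = (getattr(r, attribute) for r in reports)
    return Counter(v for v in values if v is not None)


# Made-up reports for illustration only; not data from the study.
reports = [
    FaultReport("functional logic", "high", "module_x"),
    FaultReport("functional logic", "low", "module_y"),
    FaultReport("GUI", "low", None),
    FaultReport("memory", "high", "module_x"),
]

by_type = fault_frequency(reports, "fault_type")
by_location = fault_frequency(reports, "location")
print(by_type.most_common())
print(by_location.most_common())
```

Reports with a missing location are left out of the location count instead of being assigned a guessed value, so the per-location total can be smaller than the number of reports.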
3.1 Fault categories
Note that projects C and D have been developed using modern practices, including component-based development, while projects A and B have been developed using more traditional development practices.
4. Research results
RQ1 – Which types of faults are most typical?
To answer RQ1, we look at the distribution of the fault type categories for the projects, shown in Table 3. For projects C and D, we see that functional logic faults are dominant, with 49% and 58% of the faults for those projects. Functional logic faults are also a large part of the faults in projects A and B.
In the same manner, the distribution of faults with a severity rating of “high” is shown in Table 4. Functional logic faults are still dominant in projects C and D, with 45% and 69% of the faults, respectively. Project A is a special case here, as only one single fault was reported to be of high severity.
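A per-project distribution of this kind, over all faults or over high-severity faults only, could be computed along the following lines. This is a sketch under stated assumptions: the `type_distribution` function and the sample reports are hypothetical, and the numbers are not those of Tables 3 and 4.

```python
from collections import Counter


def type_distribution(reports, severity=None):
    """Percentage of faults per fault type, optionally for one severity only.

    `reports` is an iterable of (fault_type, severity) pairs.
    """
    selected = [t for t, s in reports if severity is None or s == severity]
    if not selected:
        return {}
    counts = Counter(selected)
    return {t: 100.0 * n / len(selected) for t, n in counts.items()}


# Made-up reports for one hypothetical project; not data from the study.
reports = [
    ("functional logic", "high"),
    ("functional logic", "low"),
    ("functional logic", "medium"),
    ("GUI", "low"),
]

all_faults = type_distribution(reports)
high_only = type_distribution(reports, severity="high")
print(all_faults)
print(high_only)
```

With very few high-severity reports, as in Project A, such percentages say little, which is why a single report is better read as a count than as a share.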
RQ2 – Are certain types of faults considered to be more severe?
To answer RQ2, we need to look at the number of faults rated "high" in severity for the different fault categories. Figure 1 shows the percentage of high severity faults found in some fault categories for three of the projects. Project A is left out because it has only one high severity fault reported.
From Figure 1, we see that some fault types seem to be judged as more severe than others. In the projects that do report them, "Memory fault" stands out as a high severity type of fault. For Projects C and D, "GUI faults" are not judged to be very severe, while Project B rates them in line with other fault types. We also see that Project B has generally rated more of its faults as highly severe than Projects C and D have.
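The quantity plotted in Figure 1, the share of faults within each category that were rated "high", can be sketched in Python as follows. The function name and the sample reports are assumptions for illustration and do not reproduce the study's data.

```python
from collections import defaultdict


def high_severity_share(reports):
    """For each fault type, the percentage of its faults rated 'high'.

    `reports` is an iterable of (fault_type, severity) pairs.
    """
    totals = defaultdict(int)
    high = defaultdict(int)
    for fault_type, severity in reports:
        totals[fault_type] += 1
        if severity == "high":
            high[fault_type] += 1
    return {t: 100.0 * high[t] / totals[t] for t in totals}


# Made-up reports for one hypothetical project; not data from the study.
reports = [
    ("memory", "high"),
    ("memory", "high"),
    ("GUI", "low"),
    ("GUI", "high"),
    ("GUI", "low"),
    ("GUI", "low"),
]

shares = high_severity_share(reports)
print(shares)
```

This conditions on the fault type, so it answers a different question than the distribution of fault types among high severity faults shown in Table 4.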
5. Discussion
A major issue when analysing the collected data was its heterogeneity. The data come from four different companies whose data collection was not coordinated beforehand, and as each company used its own proprietary fault report system, no common reporting standard was followed. Another issue was missing data in reports, e.g. missing information about fault location. Because the reports were used for development rather than for research purposes, the developers did not always enter all data into the reports. A final issue was incompatibility between the fault reports for one of the projects and other information concerning that project. No satisfactory link between the functional and structural modules was available in project D. This prevented us from separating the reused parts from the rest of the system, and at this time hindered a valid comparison of reused and non-reused system parts.
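One way to handle such heterogeneity is to map each project's report format onto a shared schema before analysis, keeping missing values explicit. The sketch below is only an illustration: the field names in `FIELD_MAPS` and the labels in `SEVERITY_SCALE` are hypothetical, since the actual report formats of the four projects are not described here.

```python
# Hypothetical per-project field names; real fault report systems differ.
FIELD_MAPS = {
    "A": {"type": "category", "severity": "priority", "location": "module"},
    "B": {"type": "fault_class", "severity": "sev", "location": "component"},
}

# Hypothetical mapping from project-specific labels to a shared scale.
SEVERITY_SCALE = {
    "1": "high", "critical": "high",
    "2": "medium", "major": "medium",
    "3": "low", "minor": "low",
}


def normalize(project, raw):
    """Map one raw report (a dict) onto the shared schema.

    Missing or empty fields become None instead of being guessed.
    """
    fields = FIELD_MAPS[project]

    def pick(name):
        value = raw.get(fields[name])
        if value is None or str(value).strip() == "":
            return None
        return str(value).strip()

    severity = pick("severity")
    return {
        "project": project,
        "type": pick("type"),
        "severity": SEVERITY_SCALE.get(severity.lower()) if severity else None,
        "location": pick("location"),
    }


print(normalize("A", {"category": "GUI", "priority": "1", "module": ""}))
print(normalize("B", {"fault_class": "memory", "sev": "Critical"}))
```

Keeping `None` for absent fields makes the extent of missing data visible in later counts, which matters for the conclusion validity threat discussed next.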
Concerning validity, the most serious threats to external validity are the small number of projects under investigation and the possibility that the chosen projects are not the most typical ones. As for conclusion validity, one possible threat is low reliability of measures, because some of the data are missing or incomplete.
6. Conclusion and future work
This paper has presented some preliminary results of an investigation of fault reports in industrial projects. The results answer our two questions:
RQ1: Which types of faults are most typical for the different software parts? Looking at all faults in all projects, "functional logic" faults were the dominant fault type. For high severity faults, "functional logic" and "functional state" faults were dominant.
RQ2: Are certain types of faults considered to be more severe than others? We have seen that some fault types are rated as more severe than others, for instance "Memory fault", while the fault type "GUI fault" was rated as less severe in the two projects employing reuse in development.
Results from this study are preliminary, and the next step is to focus on the differences between reuse-based development projects and non-reuse projects. We will also try to incorporate fault report data from 2-3 other projects into the investigation in order to increase the validity of the study.
Later, the BUCS project wants to focus on the most typical and serious faults, and describe how we can identify and prevent these at an earlier development stage. This may be in the form of a checklist for some hazard analysis scheme.
References
1. J. A. Børretzen; T. Stålhane; T. Lauritsen; P. T. Myhrer, “Safety activities during early software project phases”, Proceedings, Norwegian Informatics Conference, 2004
2. B. Littlewood; L. Strigini, “Software reliability and dependability: a roadmap”, Proceedings of the Conference on The Future of Software Engineering, Limerick, Ireland, 2000, Pages: 175 - 188
3. N. Leveson, Safeware: System safety and computers, Addison-Wesley, Boston, 1995
4. IEEE Standard Classification for Software Anomalies, IEEE Std 1044-1993, December 2,
5. K. Bassin; P. Santhanam, “Managing the maintenance of ported, outsourced, and legacy software via orthogonal defect classification”, Proceedings. IEEE International Conference on Software Maintenance, 2001, 7-9 Nov. 2001
6. K. El Emam; I. Wieczorek, “The repeatability of code defect classifications”, Proceedings. The Ninth International Symposium on Software Reliability Engineering, 1998, 4-7 Nov. 1998 Page(s):322 - 333
7. R. Chillarege; I.S. Bhandari; J.K. Chaar; M.J. Halliday; D.S. Moebus; B.K. Ray; M.-Y. Wong, “Orthogonal defect classification-a concept for in-process measurements”, IEEE Transactions on Software Engineering, Volume 18, Issue 11, Nov. 1992 Page(s):943 - 956
8. R.R. Lutz; I.C. Mikulski, “Empirical analysis of safety-critical anomalies during operations”, IEEE Transactions on Software Engineering, 30(3):172-180, March 2004
9. D. Hamlet, “What is software reliability?”, Proceedings of the Ninth Annual Conference on Computer Assurance, 1994. COMPASS '94 'Safety, Reliability, Fault Tolerance, Concurrency and Real Time, Security', 27 June-1 July 1994 Page(s):169 - 170
10. A. Avizienis; J.-C. Laprie; B. Randell; C. Landwehr, “Basic Concepts and Taxonomy of Dependable and Secure Computing”, IEEE Transactions on Dependable and Secure Computing, vol. 1, no. 1, January-March 2004