Categories and Subject Descriptors
D.2.8 [Software Engineering]: Metrics - product metrics, process metrics; D.2.4 [Software Engineering]: Software/Program Verification - reliability, validation.
General Terms Measurement, Reliability.
Keywords Quality, defect density, validity.
Data collected on defects or faults (or problems in general) are used to evaluate software quality in several empirical studies. For example, our review of the literature on industrial software reuse experiments and case studies showed that 70% of the reviewed papers used problem-related measures, either to compare the quality of reused software components with non-reused ones, or to compare development with systematic reuse to development without it. However, the studies report several concerns about using data from problem reports, and we identified some common concerns as well. The purpose of this paper is to reflect on these concerns and generalize the experience, to get feedback from other researchers on the problems in using problem reports, and to discuss how such problems are, or should be, handled.
In this paper, we use data from six large commercial systems, all developed by Norwegian industry. Although most quantitative results of the studies have already been published [4, 12, 18], we saw a need to summarize the experience of using problem reports, to identify common questions and concerns, and to raise the level of discussion by answering them. Examples from similar research are provided to further illustrate the points. The main goal is to improve the quality of future research on product or process quality using problem reports.
The remainder of this paper is organized as follows. Section 2 partly builds on the work of others; e.g., one study has integrated IEEE standards with the Software Engineering Institute (SEI)'s framework and knowledge from four industrial companies to build an entity-relationship model of problem report concepts, and another has compared attributes of a number of problem classification schemes (the Orthogonal Defect Classification (ODC), the IEEE Standard Classification for Software Anomalies (IEEE Std. 1044) and a classification used by Hewlett-Packard). We have identified three dimensions that may be used to clarify the vagueness in defining and applying terms such as problem, anomaly, failure, fault or defect. In Section 3 we discuss why analyzing data from problem reports is interesting for quality assessment and who the users of such data are. Section 4 discusses practical problems in defining goals and metrics, collecting and analyzing data, and reporting the results, through some examples.
Finally, Section 5 contains discussion and conclusion.
There is great diversity in the literature on the terminology used to report software or system related problems. The possible differences between problems, troubles, bugs, anomalies, defects, errors, faults or failures are discussed in books, in standards and classification schemes such as IEEE Std. 1044-1993, IEEE Std. 982.1-1988 and 982.2-1988, the United Kingdom Software Metrics Association (UKSMA)'s scheme and the SEI's scheme, and in papers; e.g., [2, 9, 14]. The intention of this section is not to provide a comparison and draw conclusions, but to classify the differences and discuss their practical impact on research. We have identified the following three questions that should be answered to distinguish the above terms from one another, and call these the problem dimensions:
What (appearance or cause): The terms may refer to the manifestation of a problem (e.g., to users or testers), to its actual cause, or to the human encounter with the software. While there is consensus on "failure" as the manifestation of a problem and "fault" as its cause, other terms are used interchangeably. For example, "error" is sometimes used for the execution of a passive fault, and sometimes for the human encounter with software. Fenton uses "defect" collectively for faults and failures, while Kajko-Mattsson defines "defect" as a particular class of cause that is related to software.
Where (software, executable or not, or system): The reported problem may be related to the software or to the whole system, including system configuration, hardware or network problems, tools, misuse of the system etc. Some definitions exclude non-software related problems while others include them. For example, the UKSMA's defect classification scheme is designed for software-related problems, while the SEI uses two terms: "defects" are related to the software under execution or examination, while "problems" may be caused by misunderstanding, misuse, hardware problems or a number of other factors that are not related to software. Software-related problems may also be recorded for executable software only or for all types of artefacts: "fault" is often used for an incorrect step, logic or data definition in a computer program (IEEE Std. 982.1-1988), while a "defect" or "anomaly" may also be related to documentation, requirement specifications, test cases etc. One scheme divides problems into static and dynamic ones (failures), where the dynamic ones are related to executable software.
When (detection phase): Sometimes problems are recorded in all life cycle phases, while in other cases they are recorded only in later phases, such as system testing or field use. Fenton gives examples where "defect" refers to faults found prior to coding, while according to IEEE Std. 982.1-1988, a "defect" may be found during early life cycle phases or in software mature for testing and operation [from 14]. The SEI distinguishes the static finding mode, which does not involve executing the software (e.g., reviews and inspections), from the dynamic one.
Until there is agreement on the terminology used in reporting problems, we must be aware of these differences and answer the above questions when using a term.
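To make this concrete, a research team can record an explicit answer to the what, where and when questions for each term a company uses. The following is a minimal sketch in Python; the value sets are illustrative rather than standardized, and the names are our own, not taken from any of the cited schemes:

    from dataclasses import dataclass
    from enum import Enum

    class What(Enum):               # appearance or cause
        MANIFESTATION = "failure observed by users or testers"
        CAUSE = "underlying fault"
        HUMAN_ENCOUNTER = "human encounter with the software"

    class Where(Enum):              # software or whole system
        EXECUTABLE_SOFTWARE = "computer program only"
        ANY_ARTEFACT = "code, documentation, requirements, test cases"
        WHOLE_SYSTEM = "incl. configuration, hardware, network, misuse"

    class When(Enum):               # detection phase
        ALL_PHASES = "any life cycle phase, incl. reviews/inspections"
        LATE_PHASES = "system testing or field use only"

    @dataclass
    class TermDefinition:
        """Pins down what a reporting term means in one organization."""
        term: str
        what: What
        where: Where
        when: When

    # Example: "fault" in the sense of IEEE Std. 982.1-1988.
    fault = TermDefinition("fault", What.CAUSE,
                           Where.EXECUTABLE_SOFTWARE, When.ALL_PHASES)

Recording such a tuple per term makes it explicit which definitions two studies share before their results are compared.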
Some problem reporting systems cover enhancements in addition to corrective changes.
For example, an "anomaly" in IEEE Std. 1044-1993 may be a problem or an enhancement request, and the same is true for a "bug" as defined by OSS (Open Source Software) bug reporting tools such as Bugzilla or Trac. An example of the ambiguity in separating change categories is given by Ostrand et al. in their study of 17 releases of an AT&T system. In this case, there was generally no identification in the database of whether a change was initiated because of a fault, an enhancement, or some other reason such as a change in the specifications. The researchers defined a rule of thumb that if only one or two files were changed by a modification request, it was likely a fault, while if more than two files were affected, it was likely not a fault; this heuristic is sketched below.
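The heuristic reduces to a single predicate over change data. A minimal sketch, assuming each modification request can be joined with the list of files it changed (the field names are hypothetical):

    def is_likely_fault(mr):
        """Ostrand et al.'s rule of thumb: a modification request touching
        at most two files is likely a fault fix; larger changes are likely
        enhancements or specification changes."""
        return len(mr["changed_files"]) <= 2

    # Illustrative modification requests.
    mrs = [
        {"id": "MR-101", "changed_files": ["parser.c"]},
        {"id": "MR-102", "changed_files": ["ui.c", "ui.h", "menu.c", "help.c"]},
    ]
    print([mr["id"] for mr in mrs if is_likely_fault(mr)])   # ['MR-101']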
We have seen examples where minor enhancements were registered as problems to accelerate their implementation, and where major problems were classified as enhancement requests (S5 and S6 in Section 4).
In addition to the diversity in definitions of a problem, problem report fields such as Severity or Priority are also defined in multiple ways as discussed in Section 4.
3. QUALITY VIEWS AND DEFECT DATA
In this section, we use the term "problem report" to cover all recorded problems related to software or other parts of a system offering a service, to executable or non-executable artefacts, and detected in phases specified by an organization, and "defect" for the cause of a problem.
Kitchenham and Pfleeger refer to David Garvin's study on quality in different application domains. It shows that quality is a complex and multifaceted concept that can be described from five perspectives: the user view (quality as fitness for use), the manufacturing view (conformance to specification), the product view (quality tied to inherent characteristics of the product), the value-based view (quality dependent on what a customer is willing to pay) and the transcendental view (quality can be recognized but not defined).
Figure 1. Quality views associated with defect data, and relations between them.

Q1. Evaluating product quality from a user's view. What truly represents software quality in the user's view can be elusive. Nevertheless, the number and frequency of defects associated with a product (especially those reported during use) are taken to be inversely proportional to the quality of the product, or more specifically to its reliability. Some problems are also more severe than others from the user's point of view.
Q2. Evaluating product quality from the organization's (developers') view. Product quality can be studied from the organization's view by assuming that improved internal quality indicators, such as defect density, will result in improved external behavior or quality in use. One example is the ISO 9126 definition of internal, external and quality-in-use metrics. Problem reports may be used to identify defect-prone parts and to take actions to correct them and prevent similar defects (a defect-density sketch follows this list).
Q3. Evaluating software process quality. Problem reports may be used to identify when most defects are injected, e.g., in requirements analysis or coding. The efficiency of Verification and Validation (V&V) activities in identifying defects, and the organization's efficiency in removing them, are also measurable by defining proper metrics over defect data.
Q4. Planning resources. Unsolved problems represent work to be done. The cost of rework is related to the organization's efficiency in detecting and solving defects, and to the maintainability of the software. A problem database may be used to evaluate whether the product is ready for roll-out, to follow project progress, and to assign resources for maintenance and evolution.
Q5. Value-based decision support. There should be a trade-off between the cost of repairing a defect and its presumed customer value. The number of problems and their criticality for users may also be used as a quality indicator for purchased or reused software.
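As referenced under Q2, defect density per module can be computed by joining problem report counts with module sizes. A minimal sketch; the module names, counts and sizes are invented:

    def defect_density(defect_counts, module_sizes_kloc):
        """Defects per thousand lines of code (KLOC) for each module."""
        return {m: defect_counts.get(m, 0) / module_sizes_kloc[m]
                for m in module_sizes_kloc}

    counts = {"billing": 42, "auth": 7}        # defects attributed per module
    sizes  = {"billing": 12.5, "auth": 8.0}    # module size in KLOC
    densities = defect_density(counts, sizes)
    print(densities)                            # {'billing': 3.36, 'auth': 0.875}
    print(max(densities, key=densities.get))    # most defect-prone: 'billing'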
We conclude that the contents of problem reports should be adjusted to the quality views they are meant to serve. In the next section we discuss the problems we faced in our own use of problem reports.
4. INDUSTRIAL CASES
Our own and others' experience from using problem reports in assessment, control or prediction of software quality (the three quality functions defined in the literature) shows problems in defining measurement goals and metrics, collecting data from problem reporting systems, analyzing the data, and finally reporting the results. An overview of our case studies is shown in Table 2.
4.1 Research Questions and Metrics
The most common purpose of a problem reporting system is to record problems and follow their status (this maps to Q1, Q4 and Q5). However, as discussed in Section 3, the reports may serve other views as well if the proper data is collected. Sometimes quality views and measurement goals are defined top-down when initiating a measurement program (e.g., by using the Goal-Question-Metric (GQM) paradigm), while in most cases the top-down approach is complemented by a bottom-up approach such as data mining or Attribute Focusing (AF) to identify useful metrics once some data is available; e.g., [17, 19, 22].
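For illustration, a top-down GQM definition can be written down as a simple nested structure before any data is collected. The goal, questions and metrics below are invented, not taken from our cases:

    # A Goal-Question-Metric tree, defined top-down before data collection.
    gqm = {
        "goal": "Assess reliability of release 2.0 from the user's view (Q1)",
        "questions": {
            "How many field problems are reported per month?": [
                "problem reports per month (field use only)",
            ],
            "How severe are they?": [
                "distribution of Severity values",
                "share of problems marked critical",
            ],
        },
    }
    print("Goal:", gqm["goal"])
    for question, metrics in gqm["questions"].items():
        print(" ", question, "->", metrics)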
We do not intend to elaborate on the goals beyond what is discussed in Section 3, and refer to the literature on that subject. However, we have encountered the same problem in several industrial cases: the difficulty of collecting data across several tools to answer a single question. Our experience suggests that questions that need measures from different tools are difficult to answer unless effort is spent to integrate the tools or their data.
− In S6, problems for systems not based on the reusable framework were not recorded in the same way as for those based on it. It was therefore not possible to evaluate whether defect density improved when the reusable framework was introduced.
− In S5, correction effort was recorded in an effort reporting tool, and modified modules could be identified by analyzing change logs in the configuration management tool, but there was little interoperability between these tools and the problem reporting tool. We have observed this in several studies: although problem reporting systems often included fields for reporting correction effort and modifications, these data were not reliable or consistent with other data. Evaluating correction effort or the number of modified modules per defect or per type of defect was thus not possible.
Graves gives another example of the difficulty of integrating data. The difference between two organizations' problem reporting systems within the same company led to a large discrepancy in the fault rates of modules developed by the two organizations, because the international organization would report an average of four faults for a problem that would prompt one fault in the domestic organization.
To solve the problem, researchers often collect or mine industrial data, transform it, and save it in a common database for further analysis; a small sketch of this approach follows, and further examples are given in the next section.
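A minimal sketch of such a common database, using SQLite and assuming each tool can export rows keyed by a shared problem id (the schema, field names and data are hypothetical):

    import sqlite3

    # Illustrative exports from three separate tools, keyed by problem id.
    problems   = [("P-1", "crash on save", "critical")]   # problem tracker
    effort_log = [("P-1", 6.5)]                           # effort tool (hours)
    change_log = [("P-1", "editor/save.c")]               # CM change log

    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE problem    (problem_id TEXT PRIMARY KEY,
                                 title TEXT, severity TEXT);
        CREATE TABLE effort_log (problem_id TEXT, hours REAL);
        CREATE TABLE change_log (problem_id TEXT, file TEXT);
    """)
    con.executemany("INSERT INTO problem VALUES (?,?,?)", problems)
    con.executemany("INSERT INTO effort_log VALUES (?,?)", effort_log)
    con.executemany("INSERT INTO change_log VALUES (?,?)", change_log)

    # Correction effort and modified files per defect become a single query.
    rows = con.execute("""
        SELECT p.problem_id, p.severity,
               (SELECT SUM(hours) FROM effort_log e
                 WHERE e.problem_id = p.problem_id) AS correction_hours,
               (SELECT COUNT(*) FROM change_log c
                 WHERE c.problem_id = p.problem_id) AS files_changed
        FROM problem p
    """).fetchall()
    print(rows)   # [('P-1', 'critical', 6.5, 1)]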
4.2 Collecting and Analyzing Data
Four problems are discussed in this section:
1. Ambiguity in defining problem report fields even when the discussion on terminology is settled. A good example is the impact of a problem:
− The impact of a problem on the reporter (user, customer, tester etc.) is called Severity in one scheme, Criticality in another, or even Product status in IEEE Std. 1044. This field should be set when reporting a problem; a sketch mapping such differently named fields onto a common scale follows.
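When such differently named and scaled fields must be compared across sources, a common step is to map each scheme's values onto one shared ordinal scale before analysis. A minimal sketch, with invented tool names and value sets:

    # Map each tool's reporter-impact field onto one common ordinal scale.
    # Tool names, field names and value sets are invented for illustration.
    SEVERITY_MAP = {
        ("tool_a", "Severity"):    {"S1": 3, "S2": 2, "S3": 1},
        ("tool_b", "Criticality"): {"high": 3, "medium": 2, "low": 1},
    }

    def normalize(tool, field, value):
        """Return the common 1-3 impact level, or None if unmapped."""
        return SEVERITY_MAP.get((tool, field), {}).get(value)

    print(normalize("tool_a", "Severity", "S2"))       # 2
    print(normalize("tool_b", "Criticality", "high"))  # 3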