«Software Fault Reporting Processes in Business-Critical Systems Jon Arvid Børretzen Doctoral Thesis Submitted for the partial fulfilment of the ...»
Still, we choose to refer to a report of a software failure as a fault report, even if the fault has not yet been identified, since it is stored with the other fault reports, and work is usually being done to identify the fault that caused the failure.
2.4.1 Reflection and challenges
As stated in Section 1.1, the terminology from the literature, although clear and concise within each individual field and source, becomes confusing and conflicting when definitions are compared across sources. In our work, we have not tried to redefine the terms and definitions to make them fit smoothly together; we merely want to explain our understanding of faults and fault reporting, to the degree it is relevant for the thesis.
We still see a need for work on unifying concepts, especially in the reliability area. There is great diversity in the literature on the terminology used to report software or system related problems. The possible differences between problems, troubles, bugs, anomalies, defects, errors, faults or failures are discussed in books (e.g., [Fenton97]), in standards and classification schemes such as IEEE Std. 1044-1993 [IEEE 1044] and the United Kingdom Software Metrics Association (UKSMA)’s scheme [UKSMA], and in papers (e.g., [Freimut01]). Until there is agreement on the terminology used in reporting problems, we must be aware of these differences and answer the above questions when using a term.
2.5 Current methods and practices
2.5.1 General software engineering paradigms
In software engineering there have been many different paradigms or life-cycle models. The most common and well-known paradigms are presented in the following.
The traditional software process (waterfall): The waterfall model was the first widely used software development model. It was first proposed in 1970 by W. W. Royce [Royce70], who described software development as flowing steadily through the phases of requirements analysis, design, implementation, testing (validation), integration and maintenance. In the original article, Royce advocated using the model repeatedly, in an iterative way. However, many people are unaware of this, and some have unjustly discredited the paradigm for real use. In practice, the process rarely proceeds in a purely linear fashion. Iterations, in which the results of previous stages are revisited or adapted, are common.
The spiral model: The spiral model was defined by Barry Boehm [Boehm88]. It combines elements of both design and prototyping in stages, and is thus a mix of top-down and bottom-up concepts. This model was not the first to discuss iteration, but it was the first to explain why iteration is important. As originally envisioned, the iterations were typically 6 months to 2 years long. This persisted until around 2000.
Increasingly, development has turned towards shorter iteration periods, because of higher time-to-market demand. In her doctoral thesis, Parastoo Mohagheghi reports iterations of 2-3 months being common [Mohagheghi04b].
Prototyping, iterative and incremental development: The prototyping model is a software development process that starts with (incomplete) requirements gathering, followed by prototyping and user evaluation. Often the customer/user may not be able to provide a complete set of application objectives, detailed input, processing, or output requirements at the start. After user evaluation, another prototype will be built based on feedback from users, and again the cycle returns to customer evaluation.
Agile methods: The benefits of agile methods for small teams working with rapidly changing requirements have been documented [Beck99]. However, the applicability of agile methods to larger projects is hotly debated by both proponents and critics. Large-scale projects with high QA requirements have traditionally been seen as the home ground for plan-driven software development methods. Deciding when to use agile methods also depends on the values and principles that developers wish to see reflected in their work. Extreme Programming (XP) [Beck99], one of the more popular agile methods, is explicit in its demand that developers follow a "code of software conduct" that transmits these values and principles to the project at hand. In keeping with the philosophy of agile methods, there is no rigid structure defining when to use any particular feature of these approaches.
2.5.2 Software Reuse
Reuse in software development is a term describing development that includes systematic activities for creation and later incorporation ("reuse") of common, domain-specific artifacts. Reuse can encounter profound technological, practical, economic, and legal obstacles, but the benefits may be substantial. It mostly concerns program artifacts in the form of components. In the SEI’s report [Bachmann00] on technical aspects of component-based development (CBD), a component is defined as:
• An opaque implementation of functionality.
• Subject to third-party composition.
• Conformant to a component model.
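The three properties above can be illustrated with a minimal Python sketch. It is not taken from [Bachmann00]; the names SpellChecker, DictionaryChecker and count_errors are hypothetical, and a typing.Protocol stands in for a component model in a very simplified way.

```python
from typing import Protocol


class SpellChecker(Protocol):
    """Hypothetical component model: the contract every component must satisfy."""

    def check(self, text: str) -> list[str]:
        """Return the words in `text` that are considered misspelled."""
        ...


class DictionaryChecker:
    """One opaque implementation; clients rely only on `check`."""

    def __init__(self, known_words: set[str]) -> None:
        self._known = {w.lower() for w in known_words}  # hidden internal state

    def check(self, text: str) -> list[str]:
        return [w for w in text.split() if w.lower() not in self._known]


def count_errors(checker: SpellChecker, document: str) -> int:
    """Third-party composition: written without knowledge of any concrete checker."""
    return len(checker.check(document))


checker = DictionaryChecker({"fault", "report"})
errors = count_errors(checker, "Fault reprot")
```

Any other class offering a compatible check method could be passed to count_errors without changing it, which is the sense in which such components are meant to be interchangeable.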
Software development that systematically develops domain-specific and generalized software artifacts for possible, later reuse is called software development for reuse.
Software development that systematically makes use of such pre-made, reusable artifacts is called software development with reuse.
Component-based software engineering (CBSE): Component-based software engineering is a field of study within software engineering, building on prior theories of software objects, software architectures, software frameworks and software design patterns, and on the extensive theory of object-oriented (OO) programming and design that underlies them. It claims that software components, like hardware components used e.g. in telecommunication systems, can ultimately be made interchangeable and reliable. CBSE is often said to be mostly software development with reuse, with an emphasis on reusing components developed outside the actual project.
Commercial Off-The-Shelf (COTS): COTS components are external executable software components that are sold, leased, or licensed to the general public; offered by a vendor trying to profit from them; supported and evolved by the vendor; and used by the customers normally without source code access or modification ("black box"). Different ways of incorporating COTS-based activities are described by Li et al. in [Li06].
Open Source Software (OSS): Open Source Software is software released following the principles of the open source movement. In particular, it must be released under an Open Source license as defined by the Open Source Definition; there are over 50 such license types. The Open Source movement grew out of the free software movement. It advocates the term "Open Source Software" as an alternative to "free software", and primarily makes its arguments on pragmatic rather than philosophical grounds. Nearly all Open Source Software is also "Free Software". An OSS component is an external component for which the source code is available ("white box"). The source code can be acquired either free of charge or for a nominal fee, possibly with an obligation to report back any changes made.
2.5.3 Specific software development methods
The two following methods are well-known and commonly used in software development.
Rational Unified Process (RUP): The Rational Unified Process (RUP) is a software process, design and development method created by the Rational Software Corporation [Rational], and is described in [Kruchten00] and [Kroll03]. It describes how to effectively deploy software using commercially proven techniques. It is a heavyweight process, and therefore particularly applicable to larger software development teams working on large projects.
It is essentially an incremental development process which centers on the Unified Modelling Language (UML) [Fowler04]. It divides a project into four distinct phases: Inception, Elaboration, Construction and Transition. Figure 2-4 shows the overall architecture of the RUP.
Figure 2-4 The Rational Unified Process
Patterns and Architecture-driven methods: Design patterns are recurring solutions to problems in object-oriented design. The phrase was introduced to computer science in the 1990s by the text “Design Patterns: Elements of Reusable Object-Oriented Software” [Gamma95]. The scope of the term remained a matter of dispute into the next decade. Algorithms are not thought of as design patterns, since they solve implementation problems rather than design problems. Typically, a design pattern is thought to encompass a tight interaction of a few classes and objects. Three major terms have been proposed: pattern languages, pattern catalogs and pattern systems [Riehle96].
The architect Christopher Alexander's work on a pattern language for designing buildings and communities was the inspiration for the design patterns of software [Price99]. Interest in sharing patterns in the software community has led to a number of books and symposia. The goal of the pattern literature is to make the experience of past designers accessible to beginners and others in the field. Design patterns thus present different solutions in a common format, to provide a language for discussing design issues.
2.5.4 Techniques for increasing trust in software systems
In addition to the general practices of QA and SPI for improving quality in software systems, there are also some specific verification techniques that are commonly used in software development to increase the trust in software. Software verification is a discipline whose goal is to assure that software fully satisfies all the expected requirements, and the following are some well-known techniques in use:
Testing: Dynamic verification is performed during the execution of software, and dynamically checks its behaviour; it is commonly known as testing. Testing is part of more or less all software development processes, and can be performed at many levels, for instance unit level, interface level and system level.
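As an illustration of testing at the unit level, the following minimal sketch uses Python's standard unittest module. The function severity_rank and its three-level severity scale are hypothetical examples introduced only for this sketch; they are not part of any system studied in the thesis.

```python
import unittest


def severity_rank(severity: str) -> int:
    """Map a fault-report severity label to a number (hypothetical scale)."""
    ranks = {"low": 1, "medium": 2, "high": 3}
    if severity not in ranks:
        raise ValueError(f"unknown severity: {severity!r}")
    return ranks[severity]


class SeverityRankTest(unittest.TestCase):
    def test_known_labels_are_ordered(self):
        # The unit is executed and its observed behaviour is checked.
        self.assertLess(severity_rank("low"), severity_rank("high"))

    def test_unknown_label_is_rejected(self):
        with self.assertRaises(ValueError):
            severity_rank("catastrophic")


# Load and run the two test cases programmatically.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(SeverityRankTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Interface-level and system-level tests follow the same principle of executing the software and checking its behaviour, but exercise several units or the complete system together.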
Inspections: An inspection is a very common type of review used in software development projects. The goal of the inspection is for all of the inspectors to reach consensus on a work product and approve it for use in the project. Commonly inspected work products include software requirements specifications, design documentation and test plans. In an inspection, a work product is selected for review and a team is gathered for an inspection meeting to review it. A defect is any part of the work product that will keep an inspector from approving it. For example, if the team is inspecting a software requirements specification, each defect will be text in the document which an inspector disagrees with. Basili et al. describe an investigation of an inspection technique called perspective-based testing in [Basili00].
Formal methods: Formal methods are mathematically-based techniques for the specification, development and verification of software and hardware systems. The use of formal methods for software and hardware design is motivated by the expectation that, as in other engineering disciplines, performing appropriate mathematical analyses can contribute to the reliability and robustness of a design. However, the high cost of using formal methods means that they are usually only used in the development of high-integrity systems, where safety or security is important. Heimdahl and Heitmeyer present some issues concerning formal methods in [Heimdahl98].
2.5.5 Business-Critical computing and related terms
At first glance, a search of the literature turns up little evidence of work on business-critical computing. The term “mission-critical” is much more commonly used, and can be interpreted to include many of the characteristics of “business-critical”. The key similarity is that both terms are related to the core activity of an organization, and that the computer systems supporting this activity should not fail. Another term, which comes from the Software Engineering Institute (SEI), is “performance-critical” [SEI]; it has much the same meaning as “business-critical”.
“Safety-critical” systems are closely connected to these former terms, but this term has a more severe meaning. Nonetheless, most of the main characteristics of these terms are the same; i.e. that reliability, availability and similar quality attributes are deemed very important. Safety-critical systems have been much more thoroughly researched than the other types of “-critical” systems, simply because of the seriousness of failure and the potential effects of failure in safety-critical systems.
2.6 Business-critical software
As mentioned, our societies’ dependency on timely and well-functioning software systems is increasing. Banking systems, train control systems, airport landing systems, automatic teller machines and industrial process control systems are but a few examples of the systems many of us are directly or indirectly critically dependent on. Of these, some are highly critical to our safety (e.g. traffic control), while others are critical only in the sense that we depend on them to perform operations that we want or need in order to carry out our work or business (for instance cinema ticket sales).
That a software-intensive system is business-critical means that:
If and when a system failure occurs, the consequences are restricted to financial or financially related negative implications, not including physical harm to humans, animals or physical objects. The consequences are severe enough to mean a considerable loss of money if the fault or failure is not corrected or averted swiftly enough.
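The definition above can be read as a simple predicate over the consequences of a failure. The sketch below is only one possible reading: the type FailureConsequence, the function is_business_critical, the threshold parameter considerable_loss and all monetary amounts are hypothetical and introduced purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureConsequence:
    physical_harm: bool    # harm to humans, animals or physical objects
    financial_loss: float  # estimated loss if the failure is not corrected swiftly


def is_business_critical(c: FailureConsequence, considerable_loss: float) -> bool:
    """True when consequences are purely financial and the loss is considerable.

    `considerable_loss` is an assumed, organization-specific threshold.
    """
    return (not c.physical_harm) and c.financial_loss >= considerable_loss


# Hypothetical figures, for illustration only.
ticket_sales_outage = FailureConsequence(physical_harm=False, financial_loss=50_000.0)
traffic_control_failure = FailureConsequence(physical_harm=True, financial_loss=50_000.0)

ticket_is_bc = is_business_critical(ticket_sales_outage, considerable_loss=10_000.0)
traffic_is_bc = is_business_critical(traffic_control_failure, considerable_loss=10_000.0)
```

Under these assumptions the ticket sales outage counts as business-critical, whereas the traffic control failure does not, because it involves physical harm and therefore falls outside the definition.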
2.6.1 Criticality definitions
Business-critical software systems have a lot in common with safety-critical systems, but there are also quite telling differences. A simplistic way to distinguish them is to put them into classes according to the effects that software anomalies (faults or hazards) may have on the environment. The classes are safety-critical, mission-critical, performance-critical, business-critical, and non-critical: