«Software Fault Reporting Processes in Business-Critical Systems Jon Arvid Børretzen Doctoral Thesis Submitted for the partial fulfilment of the ...»
That a system is business-safe does not mean that the system is error free. What it means is that the system will have a low probability of causing losses for the users. In this respect, the system characteristic is close to the term “safe”. This term is, however, wider, since it is concerned with all activities that can cause damage to people, equipment or the environment or severe economic losses. Just as with general safety, business-safety is not a characteristic of the system alone – it is a characteristic of the system’s interactions with its environment.
BUCS considers two groups of stakeholders and aims to help them both.
• The customers and their users. They need methods that enable them to:
o Understand the dangers that can occur when they start to use the system as part of their business.
o Write or state requirements to the developers so that they can take care of the risks incurred when operating the system – product risk.
• The developers. They need help to implement the system so that:
o It is business-safe.
o They can create confidence by supporting their claims with analysis and documentation.
o It is possible to change the systems so that when the environment changes, the systems are still business-safe.
This will not make it cheaper to develop the system. It will, however, help the developers to build a business-safe system without large increases in the development costs.
Why should development companies do something that costs extra – is this a smart business proposition? We firmly believe that the answer is “Yes”, for the following reasons:
• The only solution most companies have to offer to customers with business-safety concerns today is that the developers will be more careful and test more – which is not a good enough solution.
• By building a business-safe system the developers will help the customer achieve efficient operation of their business and thus build an image of a company that has its customers’ interests in focus. Applying new methods to increase the products’ business-safety must thus be viewed as an investment. The return on the investment will come as more business from large, important customers.
2. The Rational Unified Process
The Rational Unified Process (RUP) is a software engineering process. It provides a disciplined approach to assigning tasks and responsibilities within a development organization. Its goal is to ensure the production of high-quality software that meets the needs of its end users within a predictable schedule and budget.
RUP is developed and supported by Rational Software [Rational]. The framework is based on popular development methods used by leading actors in the software industry.
RUP consists of four phases: inception, elaboration, construction and transition. The BUCS project has identified the first three phases as most relevant to our work, and will make proposals for the introduction of safety methods in these phases. In this paper, we will concentrate on the inception phase.
Figure 1 - Rational Unified Process; © IBM [Rational]
Figure 1 shows the overall architecture of the RUP, and its two dimensions:
• The horizontal axis, which represents time and shows the lifecycle aspects of the process as it unfolds.
• The vertical axis, which represents disciplines that group the activities to be performed in each phase.
The first dimension represents the dynamic aspect of the process as it is enacted, and is expressed in terms of phases, iterations, and milestones. The second dimension represents the static aspect of the process: how it is described in terms of process components, disciplines, activities, workflows, artefacts, and roles [Kroll03] [Krutchen00]. The graph shows how the emphasis varies over time. For example, in early iterations, we spend more time on requirements, and in later iterations we spend more time on implementation.
The ideas presented in this paper are valid even if RUP is not used. In broad terms, an iterative software development process will in most cases be quite similar to RUP, with phases in which certain events, artefacts and actions exist.
Some companies also use other process frameworks that in principle differ from RUP mostly in name. Therefore, it is possible and beneficial to include and integrate the safety methods we propose into any iterative development process.
Early in a software development project, system requirements will always be at the top of the agenda. In the same way as well thought-out plans are important for a system in general, well thought-out plans for system safety are important when trying to build a correctly functioning, safe system. Our goal is to introduce methods that are helpful for producing a safety requirements specification, which can largely be seen as one type of non-functional requirements. However, safety requirements also force us to include the system’s environment. In RUP, with its use-case driven approach, this process can be seen as analogous to the process of defining general non-functional requirements, since use-case driven processes are not well suited for non-functional requirements specification. Because the RUP process itself does not explicitly demand safety requirements, just as it does not explicitly demand non-functional requirements, other methods have to be introduced for this purpose. On the other hand, the architecture-centric approach in RUP is helpful for producing non-functional requirements, as these requirements are strongly linked to a system’s architecture. Considerations about system architecture will therefore influence non-functional and safety requirements.
Although designing safety into the system from the beginning (upstream protection) may incur some design trade-offs, eliminating or controlling hazards may result in lower costs during both development and overall system lifetime, due to fewer delays and less need for redesign [Leveson95]. Working in the opposite direction, adding protection features to a completed design (downstream protection) may cut costs early in the design process, but will increase system costs, delays and risk to a much greater extent than the costs owing to early safety design.
The main goal of the inception phase is to achieve a common understanding among the stakeholders on the lifecycle objectives for the development project [Krutchen00]. You should decide exactly what to build, and from a financial perspective, whether you should start building it at all. Key functionality should be identified early. The inception phase is important, primarily for new development efforts, in which there are significant project risks which must be addressed before the project can proceed.
The primary objectives of the inception phase include (from [Kroll03] [Krutchen00]):
• Establishing the project's software scope and boundary conditions, including an operational vision, acceptance criteria and what is intended to be included in the product and what is not.
• Identifying the critical use cases of the system, the primary scenarios of operation that will drive the major design trade-offs. This also includes deciding which use cases are the most critical ones.
• Exhibiting, and maybe demonstrating, at least one candidate architecture against some of the primary scenarios.
• Estimating the overall cost and schedule for the entire project (and more detailed estimates for the elaboration phase that will immediately follow).
• Assessing risks and the sources of unpredictability.
• Preparing the supporting environment for the project.
3. Safety methods introduced by BUCS
Early in a project’s life-cycle, many decisions have not yet been made, and we have to deal with a conceptual view or even just ideas for the forthcoming system. Therefore, much of the information we have to base our safety-related work on is at a conceptual level. The methods we can use will therefore be those that can use this kind of high-level information, and the ones that are suited to the early phases of software development projects.
We have identified five safety methods that are suitable for the inception phase of a development project. Two of them, Safety Case and Intent Specification, are methods that are well suited for use throughout the development project [Adelard98] [Leveson00], as they focus on storing and combining information relevant to safety through the product’s life-cycle. The other three, Preliminary Hazard Analysis, Hazard and Operability Analysis and Event Tree Analysis, are focused methods [Rausand91] [Leveson95], well suited for use in the inception phase, as they can be used on a project where many details are yet to be defined. In this paper, the Safety Case, Preliminary Hazard Analysis and Hazard and Operability Analysis methods are used as examples of how such methods can be used in a RUP context.
When introducing safety-related development methods into an environment where the aim is to build a business-safe system, but not necessarily an error-free and completely safe one, we have to accept that usage of these methods will not be as stringent and effort-demanding as in a safety-critical system. This entails that the safety methods used in business-critical system development will be adapted and simplified versions, in order to save time and resources.
3.1 Safety Case
A safety case is a documented body of evidence that provides a convincing and valid argument that a system is adequately safe for a given application in a given environment [Adelard98] [Bishop98]. The safety case method is a tool for managing safety claims, containing a reasoned argument that a system is or will be safe. It is manifested as a collection of data, metadata and logical arguments. The main elements of a safety case are shown in Figure 2:
• Claims about a property of the system or a subsystem (Usually about safety requirements for the system)
• Evidence which is used as basis for the safety argument (Facts, assumptions, subclaims)
• Arguments linking the evidence to the claim
• Inference rules for the argument
The arguments can be:
• Deterministic: Application of predetermined rules to derive a true/false claim, or demonstration of a safety requirement.
• Probabilistic: Quantitative statistical reasoning, to establish a numerical level.
• Qualitative: Compliance with rules that have an indirect link to the desired attributes.
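The elements listed above can be read as a small data model. The following is a minimal sketch in Python, not part of the safety case method itself: the class names, the `is_supported` rule (a claim counts as supported when at least one argument cites evidence) and the example claims are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ArgumentKind(Enum):
    """The three argument types named in the text."""
    DETERMINISTIC = "deterministic"
    PROBABILISTIC = "probabilistic"
    QUALITATIVE = "qualitative"


@dataclass
class Evidence:
    description: str
    kind: str  # "fact", "assumption" or "subclaim"


@dataclass
class Argument:
    kind: ArgumentKind
    inference_rule: str  # the rule that links the evidence to the claim
    evidence: List[Evidence] = field(default_factory=list)


@dataclass
class Claim:
    statement: str
    arguments: List[Argument] = field(default_factory=list)

    def is_supported(self) -> bool:
        # Illustrative rule (an assumption): at least one argument cites evidence.
        return any(arg.evidence for arg in self.arguments)


def unsupported_claims(claims: List[Claim]) -> List[str]:
    """Return the statements of all claims that still lack a backed argument."""
    return [c.statement for c in claims if not c.is_supported()]


# Hypothetical example claims for a business-critical ordering system.
login_claim = Claim(
    statement="Unauthorised users cannot place orders",
    arguments=[
        Argument(
            kind=ArgumentKind.DETERMINISTIC,
            inference_rule="All order entry points require an authenticated session",
            evidence=[Evidence("Access-control review of order entry points", "fact")],
        )
    ],
)
backup_claim = Claim(statement="Order data can be restored after a disk failure")

claims = [login_claim, backup_claim]
print(unsupported_claims(claims))
```

A structure of this kind makes it easy to list, at any point in the project, which claims still lack evidence, which is the sort of bookkeeping a preliminary safety case is meant to support.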
The safety case method can be used throughout a system’s life-cycle, and divides a project into four phases: Preliminary, Architectural, Implementation, and Operation and Installation. This is similar to the phases of RUP, and makes it reasonable to tie a preliminary safety case to the inception phase of a development project. The development of a safety case does not follow a simple step by step process. The main activities interact with each other and iterate as the design proceeds and as the level of detail in the system design increases. This also fits well with the RUP process.
The question the safety case documents will answer is in our case “How will we argue that this system can be trusted?” The safety case shows how safety requirements are decomposed and addressed, and will provide an appropriate answer to the question. The characteristics of the safety case elements in the inception phase are:
1. Establish the system context, whether the safety case is for a complete system or a component within a system.
2. Establish safety requirements and attributes for the current level of the design, and how these requirements and attributes are related to the system’s safety analysis.
3. Define important operational requirements and constraints such as maintenance levels, time to repair and issues related to the operating environment.
3.2 Preliminary Hazard Analysis and Hazard and Operability Analysis
Preliminary Hazard Analysis (PHA) is used in the early life-cycle stages to identify critical system functions and broad system hazards. The identified hazards are assessed and prioritized, and safety design criteria and requirements are identified. A PHA is started early in the concept exploration phase so that safety considerations are included in trade-off studies and design alternatives. This process is iterative, with the PHA being updated as more information about the design is obtained and as changes are made. The results serve as a baseline for later analysis and are used in developing system safety requirements and in the preparation of performance and design specifications. Since PHA starts at the concept formation stage of a project, little detail is available, and the assessments of hazard and risk levels are therefore qualitative. A PHA should be performed by a small group with good knowledge of the system specifications.
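The qualitative assessment and prioritization of hazards described above could be recorded as in the following sketch. The four-level severity and likelihood scales, the additive ranking rule and the example hazards are all assumptions chosen for illustration; a real PHA would use the scales agreed on by the analysis group.

```python
from dataclasses import dataclass
from typing import List

# Assumed qualitative scales, ordered from least to most serious.
SEVERITY = ["negligible", "marginal", "critical", "catastrophic"]
LIKELIHOOD = ["improbable", "remote", "occasional", "frequent"]


@dataclass(frozen=True)
class Hazard:
    description: str
    severity: str
    likelihood: str


def risk_rank(hazard: Hazard) -> int:
    """Combine the two ordinal scales into one rank (illustrative rule only)."""
    return SEVERITY.index(hazard.severity) + LIKELIHOOD.index(hazard.likelihood)


def prioritize(hazards: List[Hazard]) -> List[Hazard]:
    """Return the hazards sorted from highest to lowest rank."""
    return sorted(hazards, key=risk_rank, reverse=True)


# Hypothetical hazards for a business-critical invoicing system.
hazards = [
    Hazard("Report layout is garbled", "negligible", "occasional"),
    Hazard("Customer database is lost", "catastrophic", "remote"),
    Hazard("Invoice is sent with the wrong amount", "critical", "frequent"),
]

for h in prioritize(hazards):
    print(risk_rank(h), h.description)
```

The rank is only an ordering aid. Because the underlying assessments are qualitative, the numbers should not be read as probabilities or costs.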
Both Preliminary Hazard Analysis and Hazard and Operability Analysis (HazOp) are performed to identify hazards and potential problems that the stakeholders see at the conceptual stage, and that could be created by the system after being put into operation.
A HazOp study is a more systematic analysis of how deviations from the design specifications in a system can arise, and whether these deviations can result in hazards.
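One common way of making a HazOp study systematic is to combine each design element with a fixed set of guide words and to ask, for every combination, whether the resulting deviation can occur and whether it can lead to a hazard. The sketch below illustrates this under stated assumptions: the guide-word list follows common HazOp practice rather than anything specified in this paper, and the design elements are hypothetical.

```python
from itertools import product
from typing import List, Sequence

# Guide words as commonly used in HazOp studies (an assumption here).
GUIDE_WORDS = ["NO", "MORE", "LESS", "AS WELL AS", "PART OF", "REVERSE", "OTHER THAN"]


def deviation_prompts(
    elements: Sequence[str], guide_words: Sequence[str] = GUIDE_WORDS
) -> List[str]:
    """Pair every design element with every guide word.

    Each prompt is a question for the study group, not a finding:
    the group decides whether the deviation is possible and hazardous.
    """
    return [f"{gw}: {element}" for element, gw in product(elements, guide_words)]


# Hypothetical design elements taken from an early use-case description.
elements = ["payment message", "stock level update"]

prompts = deviation_prompts(elements)
for prompt in prompts:
    print(prompt)
```

Generating the prompts mechanically helps ensure that no combination is skipped. The judgement about each deviation still rests with the people performing the study.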
Both analysis methods build on information that is available at an early stage of the project. This information can be used to reduce the severity of the identified hazards or to build safeguards against their effects.