Software Fault Reporting Processes in Business-Critical Systems
Jon Arvid Børretzen
Doctoral Thesis
Submitted for the partial fulfilment of the requirements for the degree of
Department of Computer and Information Science
Faculty of Information Technology, Mathematics and Electrical Engineering
Norwegian University of Science and Technology
Copyright © 2007 Jon Arvid Børretzen
ISBN 82-471-xxxx-x (printed)
ISBN 82-471-xxxx-x (electronic)
NTNU 2007:xx (local report series)
Printed in Norway by NTNU Trykk, Trondheim
Abstract
Today’s society is crucially dependent on software systems. The number of areas where functioning software is at the core of operation is growing steadily. Both financial systems and e-business systems rely on increasingly large and complex computer and software systems. To increase, for example, the reliability and performance of such systems, we rely on a plethora of methods, techniques and processes specifically aimed at improving the development, operation and maintenance of such software. The BUCS project (BUsiness-Critical Systems) is seeking to develop and evaluate methods to improve the support for development, operation and maintenance of business-critical software and systems. Improving software processes relies on the ability to analyze previous projects and derive concrete improvement proposals. The research in this thesis is based on empirical studies performed in several Norwegian companies that develop business-critical software. The work specifically aims to assess the use of fault reporting approaches, and to describe how improvement in this area can benefit process and product quality.
Some specific software methods will be adopted from safety-critical software engineering practices, while others will be taken from general software engineering.
Together they will be tuned and refined for this particular context. A specific goal in the BUCS project has been to facilitate the use of traditional Software Criticality Analysis techniques for the development of business-critical software. This encompasses techniques used to evaluate and explore potential risks and hazards in a system. The thesis describes six studies of software development technology for business-critical systems. The main goal is to attain a better understanding of business-critical systems, as well as to adapt and improve relevant methods and processes. Through data mining of historical software project data and other studies of relevant projects, we have gathered information to be evaluated with the goal of improving business-critical systems development. The BUCS project has investigated development projects for business-critical systems, and these investigations have been continued in the EVISOFT user-driven project. The main goal was to study the effects of revised development methods for business-critical software, in order to improve important quality aspects of these systems.
The main research questions in this work are:
• RQ1. What is the role of fault reporting in existing industrial software development?
• RQ2. How can we improve existing fault reporting processes?
• RQ3. What are the most common and severe fault types, and how can we reduce them in number and severity?
• RQ4. How can we use safety analysis techniques together with failure report analysis to improve the development process?
The main contributions of this thesis are:
• C1. Describing how to utilize safety criticality techniques to improve the development process for business-critical software.
• C2. Identification of typical shortcomings in fault reporting.
• C3. Improved model of fault origins and types for business-critical software.
The work has been performed at the Department of Computer and Information Science, NTNU, Trondheim, with Professor Reidar Conradi as the main advisor, and Professor Tor Stålhane and Professor Torbjørn Skramstad as co-advisors.
The thesis is part of the BUCS project (BUsiness-Critical Systems) and has been financed for three years by the Norwegian Research Council through the IKT’2010 basic IT Programme under NFR grant number 152923/V30. In addition, one year as a teaching assistant has been paid by NTNU. The BUCS project has been led by Professor Tor Stålhane. Some of the work in this thesis has also been partly financed by the EVISOFT user-driven R&D project under NFR grant number 174390/I40.
Acknowledgements
During the work on this thesis, I have been lucky to have been in contact with many people who have provided help, inspiration and motivation. First of all, I want to thank my supervisor, Professor Reidar Conradi, for giving valuable feedback and comments on many drafts and ideas during the last four years. Also, I want to thank Professor Tor Stålhane, my co-advisor, for being the source of a lot of good advice and many bad jokes. I also want to thank the present and former members of the software engineering group at IDI, NTNU for giving me a good working environment. A special thanks to my BUCS colleagues Torgrim Lauritsen and Per Trygve Myhrer for collaboration in our research and daily work.
Parts of the work for this thesis have been done in collaboration with people from several industrial organizations. I am very grateful to these companies and the people I have been in touch with from these organizations who have been helpful and accommodating when sharing their information and experience with me. Also I want to thank master student Jostein Dyre-Hansen for helping me analyze a great deal of data material.
Finally, I want to thank my family and friends for their encouragement and inspiration, and I would especially like to express my thanks to Ingvild for her love and patience.
Table of contents
1 Introduction
1.2 Research Context
1.3 Research design
1.4 Research questions and contributions
1.5 Included research papers
1.6 Thesis structure
2.2 Software engineering
2.3 Software Quality
2.4 Anomalies: Faults, errors, failures and hazards
2.5 Current methods and practices
2.6 Business-critical software
2.6.1 Criticality definitions
2.7 Techniques and methods used to develop safety-critical systems
2.8 Empirical Software Engineering
2.9 Main challenges in business-critical software engineering
3 Research Context and Design
3.1 BUCS Context
3.2 Research Focus
3.3 Research approach and research design
3.4 Overview of the studies
4.1 Study 1: Preliminary Interviews with company representatives (used in P1)
4.2 Study 2: Combining safety methods in the BUCS project (Paper P1)
4.3 Study 3: Fault report analysis (Papers P2, P3, P5)
4.4 Study 4: Fault report analysis (Paper P4)
4.5 Study 5: Interviewing practitioners about fault management (Paper P6)
4.6 Study 6: Using hazard identification to identify faults (Paper P7)
4.7 Study 7: Experiences from fault report studies (Technical Report P8)
5 Evaluation and Discussion
5.2 Contribution of this thesis vs. literature
5.3 Revisiting the Thesis Research Questions, RQ1-RQ4
5.4 Evaluation of validity
5.5 Industrial relevance of results
5.6 Reflection: Research cooperation with industry
6 Conclusions and future work
6.2 Future Work
Appendix A: Papers
Appendix B: Interview guide
Abbreviations
BUCS     Business-Critical Software (project)
CBD      Component-Based Development
CBSE     Component-Based Software Engineering
CCA      Cause-Consequence Analysis
COTS     Commercial Off The Shelf
DBMS     Data Base Management System
GQM      Goal Question Metric
GUI      Graphical User Interface
ETA      Event Tree Analysis
EVISOFT  EVidence based Improvement of SOFTware engineering (project)
FMEA     Failure Mode and Effects Analysis
FMECA    Failure Mode Effects and Criticality Analysis
FTA      Fault Tree Analysis
HAZOP    Hazard and Operability Analysis
IEEE     Institute of Electrical and Electronics Engineers
INCO     Incremental and component-based software development (project)
ISO      International Organization for Standardization
NFR      Norwegian Research Council
NS-ISO   Norwegian Standard
NTNU     Norwegian University of Science and Technology
OMG      Object Management Group
OS       Operating System
OSS      Open Source Software
PHA      Preliminary Hazard Analysis
QA       Quality Assurance
RUP      Rational Unified Process (by Rational)
SPI      Software Process Improvement
UML      Unified Modelling Language (by Rational, later OMG)
XP       Extreme Programming
1 Introduction
In this chapter the background and research context for this thesis are presented. The chapter also introduces the research design, the research questions and the contributions. Finally, the list of papers and the outline of the thesis are presented.
1.1 Motivation
The technological development in our society has led to software systems being introduced into an increasing number of different business domains. In many of these areas we become more or less dependent on these systems, and their potential weaknesses could have grave consequences. In this respect, we can coarsely divide software products into three categories: safety-critical software (e.g. controlling traffic signals), business-critical software (e.g. for banking) and non-critical software (e.g. for word processing).
Evidently, the definition of business-critical versus the other two categories may be difficult to state precisely, and would in many cases depend on the particular viewpoint of the business and users. To clarify the distinction between business-critical and safety-critical, we can consider what consequences operational failure (observable and erroneous behaviour of the system compared to the requirements) will have in the two different cases. For safety-critical applications, the result of a failure could easily be a physical accident or an action leading to physical harm for one or more human beings. In the case of business-critical systems, the consequences of failures are less grave in the sense that they do not involve real physical damage; instead, the negative implications are of a more financial or trust-threatening nature.
Ian Sommerville states that business-criticality signifies the ability of core computer and other support systems of a business to have sufficient QoS to preserve the stability of the business [Sommerville04]. Thus business-critical systems are those whose failure could threaten the stability of a business.
The overall goal for the BUCS project is to better understand and thus sensibly improve software technologies, including processes used for developing business-critical software. In order to do this, empirical studies of projects have been performed in cooperation with the Norwegian ICT industry.
Specific BUCS goals as presented in the BUCS project proposal [BUCS02] are the following:
BG1 To obtain a better understanding of the problems encountered by Norwegian industry during development, operation and maintenance of business-critical software.
BG2 Study the effects of introducing safety-critical methods and techniques into the development of business-critical software, to reduce the number of system failures (increased reliability).
BG3 Provide adapted and annotated methods and processes for development of business-critical software.
BG4 Package and disseminate the effective methods into Norwegian software industry.
In this thesis, we aim to study how software faults and software fault reporting practices affect business-critical software, and also whether techniques (e.g. PHA, HAZOP, etc.) from the area of safety-critical systems development can have a positive effect on other quality attributes (e.g. reliability) than safety. The relation between faults and failures is explained in Section 2.4.
1.2 Research Context
This thesis is a part of the work done in the BUCS basic research and development project (BUsiness-Critical Software). The BUCS project was funded by the Norwegian Research Council as a basic R&D project in IT, and ran from 2003 to 2007. Some parts of the work in this thesis were also financed by the EVISOFT project, a national, user-driven R&D project on software process improvement funded by the Norwegian Research Council [EVISOFT06].
Within the BUCS project, this thesis will focus on fault reporting processes in business-critical systems. Some important research issues we want to study are the following:
• How do software faults affect the reliability and safety of business-critical systems?
• What are the common fault types in business-critical systems?
• How can we use system safety methods in business-critical application development?
1.2.2 The BUCS project
The goal of the BUCS project is not to help developers to finish their development on schedule and within budget. We are not particularly interested in the delivered functionality or how to identify or avoid process and project risk. This is not because we think that these properties are unimportant; we have simply defined them as outside the scope of the BUCS project.
The goal of the BUCS project is to help developers, users and other stakeholders to develop software whose later use is less prone to critical problems, i.e. has sufficient reliability and safety. In a business environment this means that the system seldom behaves in such a way that it causes the customer or his users to lose money, important information, or both. We will use the term business-critical for this characteristic.
Another term is business-safe, which means that a system fulfils the criteria for business-safety in a business-critical system.