Introduction. The transfer of the concepts of fault tolerance to computer software, which is discussed in this paper, began about 20 years after the first systematic discussion of fault. The adoption of software fault tolerance techniques based on design diversity has been advocated as a means of coping with residual software design faults in operational software (Lee and Anderson). Hardware-implemented fault tolerance can reduce operating system size, minimise systems software and increase processing speed, which keeps the design simple for the end user. The root cause of software design errors is the complexity of the systems.
Its function is to prevent system accidents and to mask faults where possible. Fault tolerance is the way in which an operating system (OS) responds to a hardware or software failure. Most system designers go to great lengths to limit the impact of a hardware failure on system performance. Several software fault tolerance schemes, such as those proposed in [46, 47, 48, 49, 50], are based on software design diversity in order to tolerate software design bugs.
This section covers fault-tolerant design principles and guidelines. We aim to support the software architect in fault-tolerant design. If design fault detection is required, design diversity in the software has to be used as well. During the development of software it is infeasible to find all of its bugs, some of which reach as far back as the design phase. The term fault tolerance essentially refers to a system's ability to allow for failures or malfunctions, and this ability may be provided by software, hardware or a combination of both. It is widely held that all software faults are design faults.
Software fault tolerance techniques are employed during the procurement, or development, of the software. Compounding the problems in building correct software is the difficulty in assessing the correctness of software for highly complex systems. Each channel is designed to provide the same function, and a method is provided to identify if one channel deviates unacceptably from the others. To tolerate faults, both of these techniques rely on design diversity. Software fault tolerance techniques include the recovery block scheme (RBS), consensus recovery block programming, N-version programming (NVP), N self-checking programming (NSCP) and data diversity. As digital computers developed, diversity became a key consideration when seeking fault tolerance, and throughout the history of computers, fault tolerance was often coupled with diversity for added assurance [23]. In order to prevent software failure caused by unpredicted conditions, different (alternative) programs are developed.
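The channel-comparison idea behind N-version programming can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the three `square_*` functions are hypothetical stand-ins for independently developed versions, and the voter assumes results can be compared for exact equality.

```python
from collections import Counter
from typing import Callable, Sequence


def n_version_execute(versions: Sequence[Callable[[int], int]], x: int) -> int:
    """Run every version on the same input and return the majority result."""
    results = []
    for version in versions:
        try:
            results.append(version(x))
        except Exception:
            # A crashing version simply contributes no vote.
            continue
    if not results:
        raise RuntimeError("all versions failed")
    value, votes = Counter(results).most_common(1)[0]
    # Require a strict majority of all versions, not only of the survivors.
    if votes * 2 <= len(versions):
        raise RuntimeError("no majority agreement among versions")
    return value


# Hypothetical versions written from the same specification: square a number.
def square_by_multiplication(x: int) -> int:
    return x * x


def square_by_power(x: int) -> int:
    return x ** 2


def square_with_bug(x: int) -> int:
    return x + x  # deliberate design fault, wrong for most inputs


VERSIONS = [square_by_multiplication, square_by_power, square_with_bug]
```

Calling `n_version_execute(VERSIONS, 5)` outvotes the faulty version because the two correct versions agree. If all three versions disagreed, the voter would raise an error rather than guess.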
Design diversity is the generation of different implementations. Design philosophy: the basic principle of a fail-safe design is to identify the fault and mask its effect until recovery measures are taken. This course has been developed by the Centre for Software Reliability with funding from the Engineering and Physical Sciences Research Council (grant number 00711eng95) as part of their. Sc High Integrity System, University of Applied Sciences, Frankfurt am Main.
The cost of software fault tolerance: fault tolerance introduces additional costs. The intended audience is software designers or system integrators who want an introduction to the problems found in designing for fault tolerance and to the range of design solutions. Software fault tolerance is primarily concerned with design faults in the computer system. Design fault tolerance by means of design diversity is a concept that traces back to the very early age of informatics. These principles deal with desktop applications, server applications and/or SOA. This chapter concentrates on software fault tolerance based on design diversity. The two best-known methods of building fault-tolerant software are N-version programming [3] and recovery blocks [7]. Fault elimination and fault prevention are parts of fault avoidance. Failures of versions for each component are assumed to be statistically independent.
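To see what the independence assumption buys, the following sketch computes the probability that a strict majority of n versions is correct when each version is correct with the same probability p. The function name and the equal-probability, perfect-voter setting are assumptions made for illustration; correlated failures between versions would lower the figure.

```python
from math import comb


def majority_reliability(n: int, p: float) -> float:
    """Probability that a strict majority of n independent versions is correct.

    Assumes every version is correct with the same probability p and that
    version failures are statistically independent.
    """
    if n < 1 or not 0.0 <= p <= 1.0:
        raise ValueError("need n >= 1 and 0 <= p <= 1")
    needed = n // 2 + 1  # smallest number of correct versions forming a majority
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(needed, n + 1)
    )
```

For three versions this reduces to 3p^2(1 - p) + p^3, so with p = 0.9 the majority is correct with probability 0.972.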
The goal of fault tolerance methods is to include safety features in the software design or source code to ensure that the software will respond correctly to input data errors and prevent output and control errors. Software faults are what we commonly call bugs. As more and more complex systems get designed and built, especially safety-critical systems, software fault tolerance and the next generation of hardware fault tolerance will need to evolve to be able to solve the design fault problem. To complement design diversity in the quest for fault-tolerant software, there exist several data diversity techniques, which are similar to those mentioned above for the design diversity approach. Since design diversity affects costs differently according to the life-cycle phases, we start with the cost distribution among the various life-cycle activities for classical, non-fault-tolerant software. Despite continuing improvements in fault prevention techniques, faults remain in every complex software system.
In data diversity, fault tolerance is achieved by using diversity in the data space. Design diversity is the generation of different implementations (codes) from a common specification. Related mechanisms include audits, rollback and exception handling. This is certainly more true of software systems than of almost any other phenomenon, and not all software changes in the same way. Software fault tolerance methods are designed to overcome execution errors by modifying variable values to create an acceptable program state. It is therefore reasonable to deal with the remaining software faults (bugs) at run time to increase the overall reliability. The software faults that are particularly significant in a real-time concurrent system are identified, and the use of design diversity to prevent their.
For a redundant system to function properly in the presence of a fault, the redundancy must be. Data diversity fault tolerance design: the software fault tolerance (FT) architecture in this research uses data diversity (DD), a complementary approach to design diversity. Nowadays the reliability of software is often the main goal in the software development process. Limitations of some design-diverse techniques led to the development of data-diverse software fault tolerance techniques. Data-diverse techniques are meant to complement, rather than replace, design-diverse techniques. The steps are: obtain a related set of points in the program data space, then execute the same software on those points.
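The data diversity steps can be sketched as follows. Everything here is hypothetical: `buggy_total` stands in for the single program being protected, `reexpress` produces related input points by rotating a list (a sum does not depend on order), and a majority vote serves as the decision step.

```python
from collections import Counter
from typing import Callable, List, Sequence


def buggy_total(values: Sequence[int]) -> int:
    """Hypothetical program: sums a list, but drops a negative first element."""
    if values and values[0] < 0:
        return sum(values[1:])  # deliberate fault, triggered by some inputs only
    return sum(values)


def reexpress(values: Sequence[int]) -> List[List[int]]:
    """Return logically equivalent inputs: every rotation of the list."""
    items = list(values)
    return [items[i:] + items[:i] for i in range(len(items))] or [items]


def n_copy_execute(program: Callable[[Sequence[int]], int],
                   values: Sequence[int]) -> int:
    """Run the same program on each re-expressed input and vote on the results."""
    results = [program(point) for point in reexpress(values)]
    value, votes = Counter(results).most_common(1)[0]
    if votes * 2 <= len(results):
        raise RuntimeError("no majority among re-expressed inputs")
    return value
```

With the input `[-1, 2, 3]` the original ordering triggers the fault, but the two other rotations avoid it, so the vote recovers the correct sum. The approach only helps when the fault is sensitive to how the input is expressed.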
Fault avoidance and the development of fault-free software rely on. In FTFT we used fault tolerance and diversity to address the more contemporary concern of cyber defense. Design diversity is the provision of software components called variants, which have the same or an equivalent specification but different designs.
It would be very difficult to sum it up in one article, since there are multiple ways to achieve fault tolerance in software. An approach called design diversity combines hardware and software fault tolerance by implementing a fault-tolerant computer system using different hardware and software in redundant channels. Software engineers assume that the different implementations use different designs. Hardware fault tolerance is based on replication, on the grounds that the hardware may eventually wear out but does not contain permanent design flaws. The versions are used as alternatives with a separate means of.
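The use of versions as alternatives can be illustrated with a recovery-block style sketch. It assumes the separate check is an acceptance test applied to each result, which is how recovery blocks are usually described. The integer square root functions are hypothetical examples, and because they are pure functions the sketch omits the state restoration a real recovery block would perform before trying the next alternate.

```python
from math import isqrt
from typing import Callable, Sequence


def recovery_block(alternates: Sequence[Callable[[int], int]],
                   acceptance_test: Callable[[int, int], bool],
                   x: int) -> int:
    """Try each alternate in order and return the first accepted result."""
    for alternate in alternates:
        try:
            result = alternate(x)
        except Exception:
            continue  # treat a crash like a failed acceptance test
        if acceptance_test(x, result):
            return result
    raise RuntimeError("all alternates failed the acceptance test")


def faulty_isqrt(x: int) -> int:
    return x // 2  # hypothetical primary with a design fault


def reliable_isqrt(x: int) -> int:
    return isqrt(x)  # hypothetical secondary alternate


def accepts_isqrt(x: int, r: int) -> bool:
    """Accept r only if it is the integer square root of x."""
    return r >= 0 and r * r <= x < (r + 1) * (r + 1)
```

For `x = 16` the primary returns 8, the acceptance test rejects it, and the secondary alternate supplies 4. Unlike N-version programming, the alternates run one after another and only until a result passes.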
Software fault tolerance is a way of handling unknown and unpredictable software and hardware failures, and it is a necessary part of a system with high reliability. Independent of the software used to increase availability, a system should be redundantly cabled, preferably at both the board level and the link level. Levitin presents a detailed reliability and performance analysis of a class of fault-tolerant programs.
When a fault occurs, these techniques provide mechanisms to. Software fault tolerance is an immature area of research. Over recent years, software developers have been evaluating the benefits of both service-oriented architecture and software fault tolerance techniques based on design diversity. Data diversity (DD) has been said to be orthogonal to design diversity [8]. A characteristic of the software fault tolerance techniques is that they can, in principle, be applied at any level in a.
To handle faults gracefully, some computer systems have two or more. We suggest the combined utilization of so-called systematic diversity and design diversity in a time-redundant system instead of the structurally redundant duplex system. Data-diverse software fault tolerance techniques complement design diversity by compensating for its limitations. They involve obtaining a related set of points in the program data space, executing the same software on those points, and then using a decision algorithm to determine the resulting output. In previous work, we conducted a software project with a real-world application to investigate software testing and fault tolerance for design diversity. Implement a software fault tolerance scheme (distributed or concurrent) as a library or framework for a programming language of your choice, or study a specific software fault tolerance scheme, middleware or application using software fault tolerance.
Reliability and fault correlation are two main concerns for design diversity, yet the empirical data available for investigating them are limited. Redundancy alone does not guarantee fault tolerance. All the results of the paper seem to depend upon an early assumption. There are also multiple methodologies, a few of which we already follow without knowing it.
Fault tolerance is a function of computing systems that serves to as. N-version programming (NVP) is one of the software fault tolerance techniques based on design diversity. Software fault tolerance techniques are designed to allow a system to tolerate software faults that remain in the system after its development. The assumption is the design diversity of software, which is itself difficult to achieve. In fact there exist sophisticated computing systems, designed for environments requiring near-continuous service, which contain ad hoc checks and checkpointing facilities that provide a measure of tolerance against some software errors as well as hardware failures [11]. Index terms: design diversity, fault tolerance, multiple computation, N-version programming, N-version software, software reliability, tolerance of design faults. The need to control software faults is one of the fastest-rising challenges facing.