Trivedi KS, Grottke M, Lopez JA (2024)
Publication Type: Journal article
Publication year: 2024
Volume: 73
Page Range: 67-72
Journal Issue: 1
Traditional software fault tolerance makes use of design-diversity-based redundancy. While proven to be effective, the independent development of multiple versions of a program or component is connected with high costs. This article shows that failures caused by so-called Mandelbugs (i.e., software faults whose activation and/or error propagation depends on the system environment) can often be treated by generating or forcing a new or modified execution environment. In the case of aging-related bugs, a subtype of Mandelbugs, failures can be postponed/prevented via a proactive technique known as software rejuvenation. Indeed, techniques based on environmental diversity, such as retry, reboot, or failover to an identical replica, are successfully used in practice. We discuss two such real-case examples, the IBM Session Initiation Protocol (SIP) Application Server cluster and Avaya gateway servers.
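The environmental-diversity techniques named in the abstract (retry, then failover to an identical replica) can be illustrated with a minimal sketch. This is not code from the article: the names `TransientFailure`, `make_flaky`, and `run_with_environmental_diversity` are hypothetical, and the simulated fault simply fails a fixed number of times to stand in for a Mandelbug whose activation depends on the execution environment.

```python
from typing import Callable, Sequence


class TransientFailure(Exception):
    """Hypothetical stand-in for a failure caused by a Mandelbug."""


def make_flaky(failures_before_success: int, name: str) -> Callable[[], str]:
    """Build a simulated replica that fails a fixed number of times, then succeeds."""
    remaining = failures_before_success

    def call() -> str:
        nonlocal remaining
        if remaining > 0:
            remaining -= 1
            raise TransientFailure(f"{name}: transient failure")
        return f"served by {name}"

    return call


def run_with_environmental_diversity(
    replicas: Sequence[Callable[[], str]],
    attempts_per_replica: int = 2,
) -> str:
    """Try the same operation repeatedly on one replica, then fail over to the next.

    Every replica runs identical code; only the execution environment differs
    between attempts, which is the assumption behind environmental diversity.
    """
    last_error = None
    for replica in replicas:
        for _ in range(attempts_per_replica):
            try:
                return replica()
            except TransientFailure as err:
                last_error = err  # remember the failure and try again
    raise RuntimeError("all replicas failed") from last_error


if __name__ == "__main__":
    primary = make_flaky(3, "primary")   # fails on both of its attempts
    standby = make_flaky(0, "standby")   # healthy identical replica
    print(run_with_environmental_diversity([primary, standby]))
```

In this sketch a single retry masks a short-lived fault, while a fault that outlasts the retry budget is handled by failing over. No second, independently developed version of the program is involved, which is the cost argument the abstract makes against design diversity.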
APA:
Trivedi, K.S., Grottke, M., & Lopez, J.A. (2024). Rethinking Software Fault Tolerance. IEEE Transactions on Reliability, 73(1), 67-72. https://dx.doi.org/10.1109/TR.2023.3330787
MLA:
Trivedi, Kishor S., Michael Grottke, and Javier Alonso Lopez. "Rethinking Software Fault Tolerance." IEEE Transactions on Reliability 73.1 (2024): 67-72.