Book contents
- Frontmatter
- Contents
- List of Figures
- Preface
- Acknowledgments
- Glossary of Notations
- 1 Introduction
- 2 Information Dispersal
- 3 Interconnection Networks
- 4 Introduction to Parallel Routing
- 5 Fault-Tolerant Routing Schemes and Analysis
- 6 Simulation of the PRAM
- 7 Asynchronism and Sensitivity
- 8 On-Line Maintenance
- 9 A Fault-Tolerant Parallel Computer
- Bibliography
- Index
8 - On-Line Maintenance
Published online by Cambridge University Press: 03 October 2009
Summary
The living clockwork of the State must be repaired
while it is in motion, and here it is a case of
changing the wheels as they revolve.
—Friedrich Schiller
We demonstrate in this chapter that, if FSRA is used, a constant fraction of the wires in the hypercube network can be disabled simultaneously without disrupting the ongoing computation or degrading the routing performance. This general result can lead to efficient on-line maintenance procedures. This seems to be the first time that the important issue of on-line maintenance has been addressed analytically.
Introduction
The fact that hardware deteriorates, together with the demand that machines be more available to the user, makes on-line maintenance without performance penalty a desirable design goal. For example, in the Tandem/16 computer system [173], modular design allows some components to be replaced on-line. Periodic maintenance of the hardware is also key to ensuring consistent system performance; without it, one cannot safely say that a particular component retains roughly the same failure rate at different times.
In this chapter we address the issue of on-line wire maintenance on the hypercube network with FSRA as the routing algorithm. We show that the set of edges can be partitioned into a constant number, 352, of disjoint edge sets of roughly equal size such that, if any one edge set in the partition is disabled, the probability of unsuccessful routing is exponentially small (Theorem 8.3). This implies little performance penalty, since re-routing is extremely unlikely. The partition is also easily and locally computable.
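To make the idea of a locally computable edge partition concrete, here is a minimal Python sketch. It does not reproduce the partition behind Theorem 8.3, which this summary does not specify. The rule in `edge_class`, the function names, and the small parameters (a 6-dimensional hypercube split into 4 classes rather than 352) are all assumptions chosen for illustration. The sketch only checks that the network stays connected when one class is disabled, which is a much weaker property than the exponentially small routing-failure probability claimed in the chapter.

```python
from collections import deque


def hypercube_edges(n):
    """Yield each edge of the n-dimensional hypercube once, as (low, dim).

    `low` is the endpoint whose bit `dim` is 0; the other endpoint is
    low | (1 << dim).
    """
    for u in range(1 << n):
        for dim in range(n):
            if not (u >> dim) & 1:
                yield u, dim


def edge_class(low, dim, k):
    """Hypothetical local rule (an assumption, not the book's partition).

    The class depends only on the edge's dimension and its lower endpoint,
    so either endpoint can compute it without global information.
    """
    return (dim + bin(low).count("1")) % k


def partition_edges(n, k):
    """Split all hypercube edges into k disjoint classes."""
    classes = [[] for _ in range(k)]
    for low, dim in hypercube_edges(n):
        classes[edge_class(low, dim, k)].append((low, low | (1 << dim)))
    return classes


def connected_without(n, disabled):
    """Breadth-first search from node 0 that skips the disabled edges."""
    dead = set(disabled)
    seen = {0}
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for dim in range(n):
            v = u ^ (1 << dim)
            edge = (min(u, v), max(u, v))
            if edge not in dead and v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == 1 << n


if __name__ == "__main__":
    n, k = 6, 4  # illustrative sizes only
    classes = partition_edges(n, k)
    for index, edges in enumerate(classes):
        status = connected_without(n, edges)
        print(f"class {index}: {len(edges)} edges, still connected: {status}")
```

An n-dimensional hypercube has n·2^(n-1) edges, so a partition into 352 sets of roughly equal size takes about 1/352 of the wires out of service at a time; the sketch uses a much smaller class count so that it can run on a small cube.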
- Type: Chapter
- Information: Information Dispersal and Parallel Computation, pp. 123-129
- Publisher: Cambridge University Press
- Print publication year: 1993