Optimal Fault-Tolerant Routing in Hypercubes Using Extended Safety Vectors

Similar documents
MANY experimental, and commercial multicomputers

Problem Set 3 Solutions

Parallelism for Nested Loops with Non-uniform and Flow Dependences

An Optimal Algorithm for Prufer Codes *

Constructing Minimum Connected Dominating Set: Algorithmic approach

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Related-Mode Attacks on CTR Encryption Mode

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers

Non-Split Restrained Dominating Set of an Interval Graph Using an Algorithm

CHAPTER 2 DECOMPOSITION OF GRAPHS

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1)

A Binarization Algorithm specialized on Document Images and Photos

Learning the Kernel Parameters in Kernel Minimum Distance Classifier

Routing in Degree-constrained FSO Mesh Networks

Bridges and cut-vertices of Intuitionistic Fuzzy Graph Structure

Load-Balanced Anycast Routing

Learning-Based Top-N Selection Query Evaluation over Relational Databases

Load Balancing for Hex-Cell Interconnection Network

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr)

Efficient Distributed File System (EDFS)

F Geometric Mean Graphs

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data

A New Approach For the Ranking of Fuzzy Sets With Different Heights

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation

Distributed Middlebox Placement Based on Potential Game

the nber of vertces n the graph. spannng tree T beng part of a par of maxmally dstant trees s called extremal. Extremal trees are useful n the mxed an

A NOTE ON FUZZY CLOSURE OF A FUZZY SET

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur

Loop-Free Multipath Routing Using Generalized Diffusing Computations

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS

CMPS 10 Introduction to Computer Science Lecture Notes

Cordial and 3-Equitable Labeling for Some Star Related Graphs

On Fault-Tolerant Embedding of Meshes and Tori in a Flexible Hypercube with Unbounded Expansion

A Topology-aware Random Walk

CSCI 104 Sorting Algorithms. Mark Redekopp David Kempe

Deadlock-Free Path-Based Fault-Tolerant Multicast Communications on 2-D Mesh Networks-on-Chip

Module Management Tool in Software Development Organizations

Transaction-Consistent Global Checkpoints in a Distributed Database System

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points;

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS

Simulation Based Analysis of FAST TCP using OMNET++

A Semi-Distributed Load Balancing Architecture and Algorithm for Heterogeneous Wireless Networks

Report on On-line Graph Coloring

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration

All-Pairs Shortest Paths. Approximate All-Pairs shortest paths Approximate distance oracles Spanners and Emulators. Uri Zwick Tel Aviv University

A Low-Overhead Routing Protocol for Ad Hoc Networks with selfish nodes

Positive Semi-definite Programming Localization in Wireless Sensor Networks

Parallel matrix-vector multiplication

Minimum Cost Optimization of Multicast Wireless Networks with Network Coding

AADL : about scheduling analysis

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices

A Hybrid Genetic Algorithm for Routing Optimization in IP Networks Utilizing Bandwidth and Delay Metrics

Solving two-person zero-sum game by Matlab

Cluster Analysis of Electrical Behavior

Evaluation of an Enhanced Scheme for High-level Nested Network Mobility

Efficient Load-Balanced IP Routing Scheme Based on Shortest Paths in Hose Model. Eiji Oki May 28, 2009 The University of Electro-Communications

Analysis of Continuous Beams in General

R s s f. m y s. SPH3UW Unit 7.3 Spherical Concave Mirrors Page 1 of 12. Notes

Connection-information-based connection rerouting for connection-oriented mobile communication networks

Lobachevsky State University of Nizhni Novgorod. Polyhedron. Quick Start Guide

Classifier Selection Based on Data Complexity Measures *

Reliable Unicasting in Faulty Hypercubes Using Safety Levels

Fast Computation of Shortest Path for Visiting Segments in the Plane

A fault tree analysis strategy using binary decision diagrams

Brave New World Pseudocode Reference

Querying by sketch geographical databases. Yu Han 1, a *

Programming in Fortran 90 : 2017/2018

LocalTree: An Efficient Algorithm for Mobile Peer-to-Peer Live Streaming

Sequential search. Building Java Programs Chapter 13. Sequential search. Sequential search

Lecture 5: Multilayer Perceptrons

Virtual Memory. Background. No. 10. Virtual Memory: concept. Logical Memory Space (review) Demand Paging(1) Virtual Memory

DESIGNING TRANSMISSION SCHEDULES FOR WIRELESS AD HOC NETWORKS TO MAXIMIZE NETWORK THROUGHPUT

Type-2 Fuzzy Non-uniform Rational B-spline Model with Type-2 Fuzzy Data

Virtual Machine Migration based on Trust Measurement of Computer Node

Kent State University CS 4/ Design and Analysis of Algorithms. Dept. of Math & Computer Science LECT-16. Dynamic Programming

The Shortest Path of Touring Lines given in the Plane

Cell Count Method on a Network with SANET

A New Token Allocation Algorithm for TCP Traffic in Diffserv Network

Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach

Collaboratively Regularized Nearest Points for Set Based Recognition

S1 Note. Basis functions.

Meta-heuristics for Multidimensional Knapsack Problems

kccvoip.com basic voip training NAT/PAT extract 2008

DEAR: A DEVICE AND ENERGY AWARE ROUTING PROTOCOL FOR MOBILE AD HOC NETWORKS

Priority-Based Scheduling Algorithm for Downlink Traffics in IEEE Networks

Analysis on the Workspace of Six-degrees-of-freedom Industrial Robot Based on AutoCAD

Sum of Linear and Fractional Multiobjective Programming Problem under Fuzzy Rules Constraints

Overview. Basic Setup [9] Motivation and Tasks. Modularization 2008/2/20 IMPROVED COVERAGE CONTROL USING ONLY LOCAL INFORMATION

Efficient Content Distribution in Wireless P2P Networks

Optimal Workload-based Weighted Wavelet Synopses

BIN XIA et al: AN IMPROVED K-MEANS ALGORITHM BASED ON CLOUD PLATFORM FOR DATA MINING

Circuit Analysis I (ENGR 2405) Chapter 3 Method of Analysis Nodal(KCL) and Mesh(KVL)

Routability Driven Modification Method of Monotonic Via Assignment for 2-layer Ball Grid Array Packages

On Some Entertaining Applications of the Concept of Set in Computer Science Course

Transcription:

Optmal Fault-Tolerant Routng n Hypercubes Usng Extended Safety Vectors Je Wu Department of Computer Scence and Engneerng Florda Atlantc Unversty Boca Raton, FL 3343 Feng Gao, Zhongcheng L, and Ynghua Mn Center for Fault-Tolerant Computng, CAD Lab Insttute of Computng Technology, Chnese Academy of Scences Bejng, P.R. Chna Abstract Relable communcaton n cube-based multcomputers usng the extended safety vector concept s studed n ths paper. In our approach, each node n a cube-based multcomputer of dmenson n s assocated wth an extended safety vector of n bts, whch s an approxmated measure of the number and dstrbuton of faults n the neghborhood. In the extended safety vector model, each node knows fault nformaton wthn dstance- and fault nformaton outsde dstance- s coded n a specal way based on the coded nformaton of ts neghbors. The extended safety vector of each node can be easly calculated through n, rounds of nformaton exchanges among neghborng nodes. Optmal uncastng between two nodes s guaranteed f the kth bt of the safety vector of the source node s one, where k s the Hammng dstance between the source and destnaton nodes. In addton, the extended safety vector can be used as a navgaton tool to drect a message to ts destnaton through a mnmal path. Smulaton results show a sgnfcant mprovement n terms of optmal routng capablty n a hypercube wth faulty lnks usng the proposed model, compared wth the one usng the orgnal safety vector model.. Introducton Many recent expermental and commercal multcomputers use drect-connected networks wth the grd topology. The bnary hypercube [8] s one of the popular grd structures. Several research prototypes and systems have been bult n the past two decades, ncludng NCUBE-, In- Ths work was supported n part by NSFC of P. R. Chna under grant No. 6970300 and by NSF grant CCR 9900646. tel PSC, and the Connecton Machne. The more recently bult SGI Orgn 000 uses a varaton of the hypercube topologes. Effcent nterprocessor communcaton s a key to the performance of a multcomputer. Uncastng s a one-toone communcaton between two nodes, one s called the source node and the other the destnaton node. Wth the rapd progress n VLSI and hardware technologes, the sze of computer systems has ncreased tremendously and the probablty of processor falure has also ncreased. As a result, buldng a relable multcomputer has become one of the central ssues, especally n the communcaton subsystem whch handles all nterprocessor communcatons. Among dfferent routng (uncast) schemes, the classcal e-cube routng s smple to mplement and provdes hgh throughput for unform traffc; however, t cannot handle even smple node or lnk faults due to ts nonadaptve routng. Adaptve and fault-tolerant routng protocols have been the subject of extensve research n recent years ([3], [4], [7]). A general theory of fault-tolerant routng s dscussed n []. Lmted-global-nformaton-based routng s a compromse between local-nformaton-based and globalnformaton-based approaches. A routng algorthm of ths type normally obtans an optmal or suboptmal soluton and requres a relatvely smple process to collect and mantan fault nformaton n the neghborhood (such nformaton s called lmted global nformaton). Therefore, an approach of ths type can be more cost effectve than the ones based on global nformaton [9] or local nformaton ([], [5]). One smple but neffectve approach s to use dstance-k nformaton n whch each node knows the status of all components wthn Hammng-dstance-k (or smple dstancek). However, optmalty cannot be guaranteed, as a routng process could possbly go to ether a state where all mn-

mal paths are blocked by faulty components or a dead end where backtrackng s requred. In addton, each node has to mantan a relatvely large table contanng dstance-k nformaton. Another approach s based on the coded fault nformaton, where each node has the exact nformaton of adjacent nodes and nformaton of other nodes are coded n a specal way. Then an optmal/suboptmal routng algorthm s proposed based on the coded nformaton assocated wth each node. The followng s a summary of dfferent codng methods n an n-cube, all of them are prmarly desgned to cover node faults. Lee and Hayes [6] safe and unsafe node concept. A nonfaulty node s unsafe f and only f there are at least two unsafe or faulty neghbors. Therefore, each node s labeled (coded) faulty, unsafe, or safe. Wu and Fernandez [] extended safe node concept by relaxng certan condtons of Lee and Hayes defnton. Each node s stll labeled faulty, unsafe, or safe. However, a dfferent defnton s gven: A nonfaulty node s unsafe f and only f there are two faulty neghbors or there are at least three unsafe or faulty neghbors. Wu s safety level [] concept where each node s assgned wth a safety level k, 0 k n. A node wth a safety level k = n s called safe and a faulty node s assgned wth the lowest level 0. Therefore, there are n +possble labels for a node n the safety level model. Wu s safety vector [0] concept where each node s assocated wth a bnary vector. The bt value of the kth bt corresponds to the routng capablty to nodes that are dstance-k away. The safety vector s a refnement of the safety level model. The effectveness of a codng method s measured by the followng: () How fast fault nformaton can be collected (coded) at each node. () How accurate the coded fault nformaton representng the real fault dstrbuton n terms of optmal routng capablty. Both safe/unsafe and extended safe/unsafe models requre O(n ) rounds of nformaton exchanges n an n-cube to label (code) all the nodes. Both safety level and safety vector need only O(n) rounds of nformaton exchanges. The order, n terms of accurately representng fault nformaton, s the followng: safe/unsafe, extended safe/unsafe, safety level, and safety vector. The safety vector s the latest model that has a mert of smplcty and wde-range of fault coverage. However, ths model s stll relatvely neffcent n handlng of lnk faults. Bascally, a lnk fault s consdered by other nodes as node fault(s) by treatng two end nodes of the lnk faulty. Each end node of a faulty lnk treats the other one faulty, but t does not consder tself faulty. Ths overly conservatve approach generates many faulty nodes that severely dmnshes the routng capablty of the system. In ths paper, we propose a new codng methodology. It s assumed that each node has precse nformaton of fault dstrbuton wthn a gven dstance d. Fault nformaton outsde dstance d s coded, lke the one used n safety vector. We select d =as an example and the correspondng model s called extended safety vector. Smulaton results show a sgnfcant mprovement usng the proposed model n terms of optmal routng capablty n a hypercube wth faulty lnks, compared wth the one usng the orgnal safety vector model. We also show that the selecton of d =s a rght one and ts results stay very close to the one usng global fault nformaton,.e., d = n.. Prelmnares Hypercubes. An n-cube (Q n ) s a graph havng n nodes labeled from 0 to n,. Two nodes are joned by a lnk f ther addresses, as bnary numbers, dffer n exactly one bt poston. More specfcally, every node u has an address u(n)u(n, ) u() wth u() f0,g, n, and u() s called the th bt (dmenson) of the address. We denote node u () the neghbor of u along dmenson. u () s calculated by settng or resettng the th bt of u. For example, 0 (3) = 00. Ths notaton can be used to set or reset the th bt of any bnary strng. A faulty n-cube ncludes faulty nodes and/or lnks. A faulty n-cube may or may not be connected dependng on the number and locaton of faults. A path connectng two nodes s and d s called a mnmal path (also called a Hammng dstance path) f ts length s equal to the Hammng dstance between these two nodes. An optmal (or mnmal) routng s one whch always generates a mnmal path. In general, optmal routng has a broader meanng whch always generates a shortest path, not necessarly a mnmal one, among the avalable ones. It s possble that all mnmal paths are blocked by faults. In ths case, a shortest (avalable) path s not a mnmal one. In ths paper, the above stuaton wll never occur and we use the terms shortest and mnmal nterchangeably. The dstance between two nodes s and d s equal to the Hammng dstance between ther bnary addresses, denoted by H(s; d). Symbol denotes the btwse exclusve OR operaton on bnary addresses of two nodes. Clearly, s d has value at H(s; d) bt postons correspondng to H(s; d) dstnct dmensons. These H(s; d) dmensons are called preferred dmensons and the correspondng nodes are termed preferred neghbors. The remanng n, H(s; d) dmensons are called spare dmensons and the correspondng nodes are spare neghbors. A mnmal path can be ob-

taned by usng lnks at each of these H(s; d) preferred dmensons n some order. Safety Level and Safety Vector. Let us frst revew the concepts of safety level and safety vector. Safety level and safety vector are scalar and vector numbers assocated wth each node n a gven n-cube, respectvely. They provde coded nformaton about fault nformaton n the neghborhood. Defnton []: The safety level of a faulty node s 0. For a nonfaulty node u, let(sl 0 ;sl ;sl ; :::; sl n,), 0 sl n, be the nondescendng safety level sequence of node u s n neghborng nodes n an n-cube, such that sl sl +, 0 <n,. The safety level of node u, sl(u), s defned as: f (sl 0 ;sl ;sl ; :::; sl n,) (0; ; ; :::; n, ),then sl(u) =n,elsef(sl 0 ;sl ;sl ; :::; sl k,) (0; ; ; :::; k, ) ^ (sl k = k, ) then sl(u) =k. In the above defnton, t s assumed that all faults are node faults. To extend ths defnton to cover lnk faults, both end nodes of a faulty lnk have to be assgned a safety level of 0 n order to be consstent wth the orgnal safety level defnton. The safety vector concept s a refnement of the safety level concept by provdng routng capablty to destnatons at dfferent dstances. More specfcally, each node u n an n-cube s assgned wth a safety vector sv(u) =(u ;u ; :::; u n ). Defnton [0]: The safety vector of a faulty node s (0; 0; :::; 0). If node u s an end node of a faulty lnk, the other end node wll be regstered wth a safety vector of (0; 0; :::; 0) at node u. Base for the frst bt: ( 0 f node u s an end-node of a faulty lnk u = otherwse Inductve defnton for the kth bt: u k = 8 < P : 0 f u n otherwse () k, n, k In the safety level (vector) model, a node n an n-cube s sad to be safe f ts safety level s n ((; ; :::; )); otherwse, t s unsafe. Two propertes related to safety levels and safety vectors are as follows: Property : If the safety level of a node s k (0 <kn), then there s at least one Hammng dstance path from ths node to any node wthn dstance-k. Property : Assume that (u ;u ; :::; u n ) s the safety vector assocated wth node u n a faulty n-cube. If u k =then seq seq f and only f each element n seq s larger than or equal to the correspondng element n seq. (0,,0) 0 0 (0,0,0) 00 (0,0,0) 0 (,0,) 00 (,0,) (0,0,0) 0 0 0 (,0,0) 000 (0,0,0) Fgure. An example of a faulty 3-cube wth ts safety level and safety vector assgnments. there exsts at least one Hammng dstance path from node u to any node that s exactly dstance-k away. Based on the above two propertes, t s clear that a safe node n both models can reach any destnaton node (whch s wthn Hammng dstance n, the dameter of the cube) through a mnmal path. The safety vector model s an mprovement based on the safety level model. It can provde more and accurate nformaton about the number and dstrbuton of faults n an n-cube. However, the safety vector concept stll cannot effectvely present faulty lnk nformaton. Actually, t s not clear that an effcent codng method exsts under the assumpton that each node has only neghbor nformaton. Fg. shows an example of a 3-cube wth one faulty node and two faulty lnks. In ths example, the safety level of each node s ether 0 or,.e., a source node can only send a message to ts neghbors. Clearly, by nspecton, the safety level nformaton s not accurate. For example, nodes 0, 0, and can send a message to any nodes through a mnmal path that are dstance- or -3 away. Ths problem s partally resolved n the safety vector model, where the safety vectors assocated wth nodes 0 and are (0,,0) and (,0,), respectvely. Nodes 00, 00, and 0 stll have the lowest safety level (0,0,0). The reason that node 0 has a 0-bt at the nd bt of ts safety vector s that t has two neghbors 00 and 00 wth both 0-bt as the st bt of ther safety vectors. However, the two correspondng faulty lnks (00, 0) and (0, 00) do not span on the same dmenson, and hence, these two faulty nodes wll not block all the mnmal paths ntated from node 0 to a node that s dstance- away. Based on the above analyss, the drecton of each fault (especally lnk fault) s needed to provde accurate nformaton about fault dstrbuton. However, ths approach wll dramatcally ncrease the memory space requrement and the codng complexty. A compromse s therefore needed. 00 0

3. Extended Safety Vectors Proposed Model. In ths secton, we propose a new approach to code fault nformaton. In general, a good codng method s generated on the soundness of ts base. In all exstng approaches, the base s based on neghbor nformaton only. For both safety level and safety vector models, the above method s proved to be effectve for node faults, but not for lnk faults. Our approach here conssts of the followng two steps (see Fg. ):. Each node knows the exact fault nformaton wthn dstance-d.. Fault nformaton about nodes that are outsde dstance-d s coded n a specal way. In ths paper, we show an applcaton of the proposed model on d = ; that s, each node knows fault nformaton wthn dstance-. Informaton about faults that are more than dstance- away are coded. We show that d = s suffcent to handle lnk faults and there s no need to select a large d (ths wll be confrmed by our smulaton results later). Ths model s called extended safety vector, where the frst two bts of an extended safety vector (for a node) represent accurate fault nformaton and other bts k are coded based on the (k, )th bt of ts neghbors. Note that the regular safety vector model s a specal case of ths approach where d =. Results of our smulaton show a dramatc mprovement of ths approach over the safety vector model n handlng lnk faults. Extended Safety Vectors. Let esv(u) = (u ;u ; :::; u n ) be the extended safety vector of node u and esv(u) = (u () ;u() ; :::; u() n ) be the extended safety vector of node u (), u s neghbor along dmenson. We have the followng nductve defnton of extended safety vector esv(u). Defnton 3: The safety vector of a faulty node s (0; 0; :::; 0). If node u s an end node of a faulty lnk, the other end node wll be regstered wth a safety vector of (0; 0; :::; 0) at node u. Base for the frst bt: u =fnode u can reach any neghbor,.e., ( 0 f node u s an end-node of a faulty lnk u = otherwse Base for the second bt: u = f node u can reach any nonfaulty and faulty nodes that are two hops away through a mnmal path (a decson process wll be dscussed later); otherwse, u =0. Inductve defnton for the kth bt, where k 3: u k = 8 < P : 0 f u n otherwse () k, n, k u d precse nfo. coded nfo. Fgure. A new approach to code nformaton. In the extended safety vector model, n addton to the nformaton of adjacent lnks and nodes, each node has complete nformaton about adjacent lnks of ts neghbors. u = (or u = ) ndcates the exstence of a mnmal path to a nonfaulty nodes that are one hop (or two hops) away. For the case of u =0(or u =0), such a mnmal path may or may not exst, but for a gven source-destnaton par (that are one or two hops away) ts exstence can be easly verfed based on dstance- nformaton. We use coded nformaton for destnatons that are more than two hops away. Therefore, for the case of u k = 0, wth k >, the actual exstence of a mnmal path for a gven sourcedestnaton par cannot be verfed, that s, u k =provdes a suffcent condton for the exstence of a mnmal path to a dstance-k node, but t s not a necessary condton. To determne u, node u needs to keep faulty paths of length ntated from node u,.e., a path along whch there exsts at least one faulty lnk or node. A faulty path (u; u () ; (u () ) (j) ) can be smply represented by a dmenson sequence (; j). Clearly, an adjacent faulty lnk or node along dmenson can be represented as (; c ),where c = f; ; :::; ng,fg. That s, any path (u; u () ; (u () ) (k) ), where k c, s a faulty path of length. If both adjacent lnk and node along dmenson are healthy, but there are adjacent faulty lnks along dmensons n c 0 s a,wherec0 subset (ncludng empty set) of c, the correspondng faulty paths can be represented as (; c 0 ). Note that nformaton about (; c 0 ) s passed from node u() to node u. In general, each node u n an n-cube has exactly n pars of (; c 0 ),denoted as f (u) =f(; c 0 ):f; ; :::; ngg. Some(; ) s c0 correspond to healthy paths, where c 0 = fg s an empty set. Theorem : u = 0 for node u f and only f there exst (; c 0 ) and (j; ) n f (u) such that 6= and fjg\ c0 j fg\c0 j c 0 6=, where = fg s an empty set. Proof: There are two node-dsjont paths from node u to another node v that s dstance- away. Suppose these two nodes span on dmensons and j. Clearly, a path of length from node u to node v s (; j) and another s (j; ).

Calculatng extended safety vectors:. In the frst round, node u determnes u based on the status of ts adjacent lnks, and then, exchanges adjacent lnk and node status wth all ts neghbors.. In the second round, node u constructs f (u) whch s a lst of faulty paths of length based on the nformaton collected n the frst round. u s determned based on f (u), and then, exchanges u wth u 0 s of all ts neghbors. (0,,,) (0,,0,) 00 (,,,) (,0,,) (,,,) (,0,,) 0 00 (0,,,) (0,,0,) 0 (,,,) (0,0,0,0) (,0,,) (0,0,0,0) 0 (,,,) (,0,0,) 000 (,,,) (,0,0,) 00 (,,,) 00 (,,,) 0 4 4 (,,,) (,0,,) 000 (,,,) (,0,,) 00 4 (0,,,) (0,,0,) 000 (,,,) (,0,,) (,0,,) 00 (0,0,,) (0,0,0,) 0000 (0,0,0,0) (0,0,0,0) 000 3. In the kth round (3 k <n), node u determnes u k based on u k, s collected n the prevous round, and then, exchanges u k wth u 0 s of all ts neghbors. k 4. In the nth round, node u determnes u n based on u n, s collected n the prevous round. u cannot reach v f and only f (; j) s n (; c 0 ) (.e., that path s faulty) and (j; ) s n (j; c 0 ). In ths case, we have j fg \c 0 6= and fjg 6=. j \c0 In the example of Fg., f u =, then f (u) = f(; fg); (; f3g); (3; f; g)g. Based on the above theorem, the second bt u of the safety vector assocated wth node s. Clearly, bts u and u can be determned through rounds of nformaton exchanges among neghborng nodes and u 3, u 4,..., u n each needs one round. Therefore, the extended safety vector (u ;u ; :::; u n ) of node u n an n-cube can be calculated through n, rounds of nformaton exchanges among neghborng nodes. Theorem : The extended safety vector of each node n an n-cube can be determned through n, rounds of nformaton exchanges between adjacent nodes. 4. Examples and Propertes Fg. 3 shows an example of a faulty 4-cube wth two faulty nodes 000 and 0 and two faulty lnks (0000, 000) and (00, 0). The safety vector of each node s shown under both the safety vector and extended safety vector (on top) models. The safety level of each node s placed nsde each node. Nodes 000 and 0 have an extended safety vector (0; 0; 0; 0), nodes 000, 00, and 0 have an extended safety vector (0; ; ; ), 00 has (,0,,) and 0000 has (0,0,,), and the remanng nodes have a safety vector (; ; ; ). In the followng we show several propertes related to extended safety vectors. Theorem 3: Assume that (u ;u ; :::; u n ) s the extended safety vector assocated wth node u n a faulty n-cube. If Fgure 3. A faulty 4-cube wth two faulty nodes and two faulty lnks. u k = then there exsts at least one Hammng dstance path from node u to any node whch s exactly dstance-k away. Proof: We prove ths theorem by nducton on k. Ifu = (where k =), there s no adjacent faulty lnk. Clearly node u can reach all the neghborng nodes, faulty and nonfaulty. If u = (where k = ), based on the defnton of u, there s a mnmal path to any destnaton that s dstance- away from the source. Assume that ths theorem holds for k = l,.e., f u l = there exsts at least one Hammng dstance path from node u to any node whch s exactly dstance-l away. When k = l +,fu l+ =then Pn u() >n, (l +), whch means that there are at l most l neghbors whch have 0 at the lth bt of ther safety vectors. Therefore, among l +preferred neghbors, there s at least one neghbor, say node v, that has ts lth safety bt set. Based on the nducton assumpton, there s at least one Hammngdstance pathfrom node v to any destnaton node, say w, whch s dstance-l away. Connectng the lnk from node u to node v to the path orgnated from node v to destnaton node w, we construct a Hammng dstance path from node u to destnaton node w whch s dstance-(l +) away. Next we show that the extended safety vector s better than the regular safety vector n terms of accurately representng fault nformaton (Fgure 4 s such an example). Consder a vertex u n an n-cube wth safety vector sv(u) = (u ;u ; :::; u n ) and extended safety vector esv(u) =(u 0 ;u0 ; :::; u0 n ), the extended safety vector s sad to cover the safety vector at node u, fu 0 u k k for all k n. Intutvely, f ev(u) covers v(u) at the kth bt, then the routng based on the extended safety vector has at least the same routng capablty as one based on the safety vector to all destnatons that are dstance-k away. Theorem 4: For any gven faulty n-cube, esv(u) covers sv(u) for any node u n the cube.

Proof: We assume that a general node n a gven cube s represented as u, wth esv(u) = (u 0 ;u0 ; :::; ) and u0 n sv(u) = (u ;u ; :::; u n ) as extended safety vector and safety vector, respectvely. We prove the theorem by nducton on k n bt u k for all nodes n the cube. When k =, u has the same defnton as u 0 for all nodes. Clearly, u 0 u. When k =, based on Property, u = means that there s a mnmal path to any node that s dstance- away. On the other hand, u 0 =f and only f there s a mnmal path to any node that s dstance- away. Hence, f u = then u 0 =. The reverse condton normally does not hold. Therefore, u 0 covers u. Assume that the theorem holds for k = l >,.e., u 0 () u () l Pn u() l for all l. When k = l +, u l l+ = f >n,(l+) and =fp u 0 () l+ > n u0 l n, (l +). Based on P the fact that u 0 () u () for all, l l () where f; ; :::; ng, > n, (l +)mples n u0 l Pn u() >n,(l+),.e., u 0 =mples l l+ u l+ =. 5. Fault-Tolerant Routng Basc Idea. The routng algorthm s smlar to the one n [0]. Suppose that source node s, wth safety vector (s ;s ; :::; s n ), ntends to forward a message to a node that s dstance-k away. (s () ;s() ; :::; s() n ) s the safety vector of neghbor s (). The optmalty s guaranteed f the kth bt of ts safety vector s (s k =) or one of ts preferred neghbors (along dmenson ) (k, )th bt s,.e., s () =, k, n. Routng starts by forwardng the message to a preferred neghbor where the (k, )th bt of ts safety vector s one, and ths node n turn forwards the message to one of ts preferred neghbors that has n the (k,)th bt of ts safety vector, and so on. If the optmalty condton fals but there exsts a spare neghbor that has one n the (k +)th bt of ts safety vector, the message s frst forwarded to ths neghbor and then the optmal routng algorthm s appled. In ths case, the length of the resultant path s the Hammng dstance plus two. We call ths result suboptmal. Routng Algorthms. The routng process conssts of two parts: uncastng at source node s appled at the source node to decde the type of the routng algorthm and to perform the frst routng step. uncastng at ntermedate node s used at an ntermedate node. In the proposed routng process, a navgaton vector, N = s d, susedwhch s the relatve address between the source and destnaton nodes. Ths vector s determned at the source node and t s passed to a selected neghbor after resettng or settng (usng H () ) the correspondng bt n N. Upon recevng a routng message, each ntermedate node frst calculates ts preferred and spare neghbors based on the navgaton vec- Algorthm uncastng at source node begn N = s d; k = js dj; f s k =_9(s () k, =^N () =)_ (k =(or ) ^ a healthy -hop (or -hop) path along dmenson exsts) then optmal uncastng: send (m; N () ) to s (),wheres () k, =^N () = else f 9 (s () k+ =^N () =0) then suboptmal uncastng: send (m; N () ) to s (),wheres () k+ = else falure end. Algorthm uncastng at ntermedate node begn fat any ntermedate node u wth message m and navgaton vector N g f N =0 then stop else send (m; N () ) to u (),whereu () k, =^N () = end. tor assocated wth the message. If ths ntermedate node s dstance-(k +) away from the destnaton node (ths dstance can be determned based on the number of s n the navgaton vector), a preferred neghbor whch has n the kth bt of ts safety vector s selected. When a node receves a message wth an empty navgaton vector, t dentfes tself as the destnaton node by termnatng the routng process and by keepng a copy of ths message. Note that, at the source node, f both condtons for optmal and suboptmal routng fal, the proposed algorthm cannot be appled. Ths falure state can be easly detected at the source node. The cause of falure could be ether too many faults n the neghborhood or a network partton. 6. Performance Evaluaton The smulaton study focuses on the followng three aspects. () Percentage of optmal/suboptmal routng. () Comparson of safety vector and extended safety vector n terms of routng capablty. (3) Performance results when d = k (other than k =),.e., each node knows the exact fault nformaton that s wthn dstance-k. Percentage of optmal routng s measured by the probablty of an optmal routng usng the proposed approach for two randomly selected source and destnaton nodes. Agan, an optmal routng to a dstance-k destnaton s possble f

s k = for the source node or s () k, = for the source node s preferred neghbor along dmenson. In addton, suboptmal routng s feasble, f s () k+ = for the source node s spare neghbor along dmenson. When the source and destnaton nodes are separated by - or -hop, optmal routng can be decded drectly from the dstance- and dstance- nformaton at the source node. Note that a mnmal path may exst even when s and s are both zero. Routng capablty of safety vector and extended safety vector s compared manly under the above two measures. Tables and show smulaton results for 8-cubes and 0- cubes, respectvely. Each table contans three subtables for three dfferent dstrbutons of faults: (a) represents cases of all faults beng node faults; (b) for half faults beng node faults and the other half beng lnk faults; and (c) for all faults beng lnk faults. Wthn each cube, for a gven number of faults, these faults are randomly generated based on the specfced dstrbuton of lnk and node faults. We selected 00 dfferent fault dstrbutons for each case. For a gven fault dstrbuton, we randomly selected 00,000 source and destnaton pars. Percentage of the actual optmal routng s also reported (n the second column of each subtable). Ths also corresponds to cases when global fault nformaton s gven,.e., d = n. Percentage of optmal routng, when dstance-3 nformaton s gven, s shown n the thrd column of each subtable. Based on results n Tables and, the percentage of optmal routng under the extended safety vector model (when d =) stays very close to the one wth global nformaton (when d = n) for all cases. That s, the model for d = s suffcent. Note that when all faults are node faults, the safety vector and the extended safety vector models are the same. However, as the percentage of lnk faults ncreases, the results for the safety vector model deterorate quckly, especally for large numbers of faults. For example, when there are 75 lnk faults (no node faults) n a 0-cube, the percentage of optmal routng s only 35.8 percentage. For the extended safety vector model, the percentage of optmal routng remans hgh, especally when there s a hgh percentage of lnk faults. The summaton of percentages of optmal (Op) and suboptmal (SubOp) routng corresponds to the percentage of the applcablty of the corrspondng approach, whch s ether safety vector (d =) or extended safety vector (d =). 7. Conclusons We have proposed a new codng method of lmted global fault nformaton n an n-cube. Frst each node collects precse fault nformaton wthn dstance-d, andthen,fault nformaton about nodes that are more than dstance-d s coded n a specal way. A model, called extended safety vector, has been proposed whch s extended from the safety level and safety vector models to better handle lnk faults. The extended safety level model has been used to acheve optmal and suboptmal routng n an n-cube. A smulaton study has been conducted based on dfferent selectons of d and results have shown a sgnfcant mprovement under the proposed model over the safety vector model n handlng lnk faults, even for a small value of d such as d =. References [] M. S. Chen and K. G. Shn. Adaptve fault-tolerant routng n hypercube multcomputers. IEEE Trans. on Computers. 39, (), Dec. 990, 406-46. [] J. Duato. A theory of fault-tolerant routng n wormhole networks. IEEE Trans. Parallal and Dstrbuted Syst. 8, (8), Aug. 998, 790-80. [3] P. T. Gaughan and S. Yalamanchl. A famly of fault-tolerant routng protocols for drect multprocessor networks. IEEE Transactons on Parallel and Dstrbuted Systems. 6, (5), May 995, 48-497. [4] Q. P. Gu and S. Peng. Uncast n hypercubes wth large number of faulty nodes. IEEE Trans. Parallal and Dstrbuted Syst. to appear. [5] Y. Lan. A fault-tolerant routng algorthm n hypercubes. Proc. of the 994 Internatonal Conference on Parallel Processng. August 994, III 63 - III 66. [6] T.C. Lee and J.P. Hayes. A fault-tolerant communcaton scheme for hypercube computers. IEEE Trans. on Computers. 4, (0), Oct. 99, 4-56. [7] C. S. Raghavendra, P. J. Yang, and S. B. Ten. Free dmensons - an effectve approach to achevng fault tolerance n hypercubes. Proc. of the nd Internatonal Symposum on Fault-Tolerant Computng. 99, 70-77. [8] Y. Saad and M. H. Schultz. Topologcal propertes of hypercubes. IEEE Transactons on Computers. 37, (7), July 988, 867-87. [9] S. B. Ten and C. S. Raghavendra. Algorthms and bounds for shortest paths and dameter n faulty hypercubes. IEEE Transactons on Parallel and Dstrbuted Systems. Vol. 4, No. 6, June 993, 73-78. [0] J. Wu. Relable communcaton n cube-based multcomputers usng safety vectors. IEEE Transactons on Parallel and Dstrbuted Systems. 9, (4), Aprl 998, 3-334. [] J. Wu. Uncastng n faulty hypercubes usng safety levels. IEEE Transactons on Computers. 46, (), Feb. 997, 4-47. [] J. Wu and E. B. Fernandez. Broadcastng n faulty hypercubes. Proc. of th Symposum on Relable Dstrbuted Systems. Oct. 99, -9.

#of Op Op Safety Vec. Ext. Safety Vec. 6 99.99 99.99 99.99 0.006 99.99 0.006 0 99.98 99.98 99.97 0.030 99.97 0.030 5 99.96 99.90 99.8 0.87 99.8 0.87 0 99.9 99.75 99.03 0.83 99.03 0.83 99.89 99.65 99.3.37 98.3.37 5 99.86 99.43 96.37.583 96.37.853 8 99.80 99.09 93.05 3.987 93.05 3.987 30 99.77 98.75 90.74 4.950 90.74 4.950 (a) When all faults are node faults n 8-cubes #of Op Op Safety Vec. Ext. Safety Vec. 6 99.99 99.98 99.97 0.030 99.98 0.00 0 99.98 99.95 99.8 0.76 99.95 0.050 5 99.96 99.89 99.6 0.78 99.88 0. 0 99.93 99.80 95.8 3.005 99.70 0.89 99.9 99.74 93.4 4.353 99.6 0.380 5 99.90 99.66 87.58 6.467 99.30 0.66 8 99.87 99.56 79.8 8.500 98.9.003 30 99.85 99.46 7.49 9.33 98.45.344 (b) When half faults are node faults n 8-cubes #of Op Op Safety Vec. Ext. Safety Vec. 6 99.98 99.96 99.90 0.097 99.96 0.039 0 99.97 99.9 99.50 0.487 99.9 0.090 5 99.95 99.8 97.30.36 99.8 0.75 0 99.9 99.7 88.30 6.64 99.70 0.99 99.9 99.65 8.4 8.43 99.65 0.347 5 99.90 99.56 68.76 0. 99.55 0.450 8 99.88 99.47 58.38 0.39 99.45 0.55 30 99.87 99.39 5.79 0.63 99.35 0.645 (c) When all faults are lnk faults n 8-cubes Table. Percentage of optmal and suboptmal routng under dfferent models. #of Op Op Safety Vec. Ext. SafetyVec. 8 99.99 99.99 99.99 0.0003 99.99 0.0003 5 99.99 99.99 99.99 0.00 99.99 0.00 30 99.99 99.99 99.98 0.09 99.98 0.09 40 99.99 99.98 99.9 0.087 99.9 0.087 50 99.99 99.95 99.55 0.4 99.55 0.4 55 99.98 99.93 99.3 0.736 99.3 0.736 60 99.99 99.99 98..386 98..386 65 99.98 99.87 96.9.7 96.9.7 70 99.97 99.83 93.83 3.685 93.83 3.685 75 99.97 99.77 90.08 5.80 90.08 5.80 (a) When all faults are node faults n 0-cubes #of Op Op Safety Vec. Ext. SafetyVec. 8 99.99 99.99 99.99 0.00 99.99 0.00 5 99.99 99.99 99.99 0.008 99.99 0.003 30 99.99 99.98 99.88 0.7 99.98 0.04 40 99.99 99.98 99.33 0.6 99.97 0.06 50 99.99 99.97 96.9.337 99.94 0.056 55 99.99 99.96 94.05 3.87 99.9 0.08 60 99.99 99.95 88.34 6.03 99.88 0.4 65 99.98 99.94 8.76 7.73 99.83 0.7 70 99.98 99.93 7.86 9.035 99.74 0.48 75 99.98 99.9 6.3 9.635 99.54 0.46 (b) When half faults are node faults n 0-cubes #of Op Op Safety Vec. Ext. SafetyVec. 8 99.99 99.99 99.99 0.004 99.99 0.00 5 99.99 99.99 99.97 0.0 99.99 0.006 30 99.99 99.99 99.6 0.367 99.98 0.09 40 99.99 99.97 97.0.34 99.97 0.03 50 99.99 99.95 86.0 6.977 99.95 0.049 55 99.99 99.94 75.59 9.078 99.94 0.057 60 99.99 99.93 63.4 9.98 99.93 0.066 65 99.98 99.9 50.8 9.885 99.9 0.078 70 99.98 99.9 4.67 9.47 99.9 0.089 75 99.98 99.90 35.8 8.79 99.90 0.099 (c) When all faults are lnk faults n 0-cubes Table. Percentage of optmal and suboptmal routng under dfferent models.