Fault-Tolerant Hierarchical Networks for Shared Memory Multiprocessors and their Bandwidth Analysis


© British Computer Society 2002

SYED MASUD MAHMUD, L. TISSA SAMARATUNGA AND SHILPA KOMMIDI
Department of Electrical and Computer Engineering, Wayne State University, Detroit, MI 48202, USA

Many researchers have paid significant attention to the design of cluster-based systems, because such systems need far less expensive networks than non-cluster-based systems. A number of hierarchical interconnection networks (HINs) have also been proposed in the literature for building large cluster-based systems. Most of the existing HINs are not fault tolerant. It is very desirable that a HIN be fault tolerant, because even a single fault in the network can completely disconnect a large number of processors and/or memory modules from the rest of the processors and memory modules of the system; as a result, the performance of the system decreases significantly. In this paper, we propose two types of hierarchical interconnection networks which are fault tolerant and can be used to build large cluster-based multiprocessor systems. We also develop analytical models to determine the performance of the proposed fault-tolerant HINs under fault-free and faulty conditions, and simulation models to verify the accuracy of the analytical models. The results obtained from the analytical models were found to be very close to those obtained from the simulation models. The technique used to develop the models in this paper can also be used to develop models for other hierarchical systems.

Received 1997; revised 6 September

1. INTRODUCTION

Recently a great deal of attention has been paid to the design of cluster-based multiprocessor systems [1-23]. Cluster-based design is very appealing when a system is to be built with a very large number of processors and memory modules.
A cluster-based multiprocessor system needs a less expensive interconnection network than a non-cluster-based system. A number of cluster-based designs are available in the literature. The Cm* [1] is made up of 50 processor-memory pairs called compute modules, grouped into clusters. Communication within a cluster is via a parallel bus controlled by an address-mapping processor termed a Kmap. There are five clusters and these communicate via an intercluster bus. The CEDAR system [2, 3] uses a bus interconnection between the processors within a cluster and the cluster memory they share, and a multistage interconnection network between all processors and the global memory shared among all clusters. The DASH multiprocessor [4] is also a cluster-based system. The processors and memory modules of a cluster are connected by a bus. This multiprocessor system can have as many clusters as needed, and all the clusters can be connected by a general interconnection network. (A short version of this paper was presented at the IEEE International Conference on Algorithms and Architectures for Parallel Processing, Brisbane, Australia, April 19-21.) A cluster structure using shared buses as the basic interconnection media has been proposed by Wu and Liu [5]. Multiple levels of clustering may be present in their organization. Shared buses are used to interconnect the units within a cluster, and the entire system is built using a hierarchy of buses. The Fat Tree network [24] provides uniform bandwidth between any two end-points on a net. It does this by doubling the number of paths as one goes up the tree; the cost of the paths therefore increases significantly compared to that of the other hierarchical systems described below. The CM-5 [25] is a message-passing system. Its internal networks include two components, a data network and a control network; the topology of the data network is a fat tree. KSR-1 [26] is a shared memory system and it consists of a hierarchy of rings.
Agrawal and Mahgoub [6, 7] proposed a cluster-based multiprocessor system where a hierarchical interconnection network (HIN) is used for communication. Conflict-free access within each cluster is provided by relatively small crossbar switches. They showed that a cluster-based scheme gives results close to those of a fully connected crossbar system if every processor accesses memory modules within its own cluster more frequently than other memory modules. Mahgoub and Elmagarmid [8] proposed a generalized class of cluster-based multiprocessor systems. They proposed

a multilevel hierarchical network for their systems, which consists of a large number of small crossbar switches. The performance of their network is very close to that of a full crossbar connection if a processor accesses its nearer memory modules more frequently than remote memory modules. Potlapalli and Agrawal [9] proposed a HIN called the Hierarchical Multistage Interconnection Network. This network consists of many levels, and the network at each level is built using multistage interconnection networks. A number of other hierarchical interconnection networks have been proposed in the literature [10-29] which can be used for multiprocessor and multicomputer systems. The motivation behind designing HINs is to exploit the inherent locality that exists in many general and parallel computations. The success of all cluster-based systems with reduced or limited interconnection depends on the locality of computations. This means that a processor must access the memory modules within its own cluster more frequently than those in other clusters. In fact, for the analysis of all hierarchical networks it is assumed that the probability that a processor generates a reference for one of its ith-level memory modules is p_i, where p_i > p_j for all i < j. The performance of a HIN is very sensitive to network faults. Sometimes a single fault in the network can degrade the performance of the system very significantly, depending upon the location of the fault. For example, if any one of the HINs presented in [5, 6, 8, 9] has a faulty link, then that faulty link will isolate a number of devices (processors and/or memory modules) from the rest of the system. The number of devices that will be isolated depends on the location of the fault: a fault at a higher level isolates more devices than a fault at a lower level.
Since all the devices of a hierarchical system cannot be used together in the presence of a fault in the HIN, the performance of the system will degrade, and the amount of degradation will depend on the location of the fault. The performance will degrade significantly if the fault occurs at or near the highest level of the system. Moreover, if multiple faults exist in a HIN, they may divide the entire system into many small isolated subsystems; in the presence of multiple faults, the system may not be usable at all. In this paper we present two fault-tolerant HINs. Both HINs are designed using many small crossbar switches. In the first type of HIN, multiple links are used at every input and output port of the crossbar switches. The bandwidth available from a port of a crossbar depends on the number of links present at that port. Thus, when a link becomes faulty, the bandwidth of the corresponding port decreases, rather than a number of devices becoming disconnected from the rest of the system. Hence, all the devices of the entire system can still be used together, with only a slight degradation in performance. In the second type of HIN we use only one link at every input and output port of a crossbar, but we add a small backup circuit to every crossbar in order to tolerate one or more faults within the crossbar. We have developed analytical models to determine the memory bandwidth of both types of HINs under fault-free and faulty conditions, and we have verified the analytical models using extensive simulations. Most of the results from the analytical models match those from the simulation models very closely (within 5%). Section 2 describes our fault-tolerant HINs. The analytical models are presented in Section 3.
Results from the analytical and simulation models are presented and discussed in Section 4, and the conclusions are presented in Section 5.

2. DESCRIPTION OF FAULT-TOLERANT HIERARCHICAL INTERCONNECTION NETWORKS

The performance of a hierarchical system is very sensitive to network faults. If a link cannot be used, either because the link itself is faulty or because a fault elsewhere in the network makes the link unusable, then a set of processors and memory modules becomes disconnected from the rest of the processors and memory modules of the system. As a result, the performance of the system may degrade significantly, depending upon what fraction of the processors and memory modules lies within the disconnected set. Thus, it is very desirable that a hierarchical interconnection network be fault tolerant.

2.1. A multiple-link-based HIN

The multiple-link-based HIN presented in this paper has many levels of hierarchy. The processors and memory modules of the system are grouped into a number of processor-memory clusters (PMCs), called the local-level or zeroth-level clusters. Every zeroth-level cluster has n_0 processors and m_0 memory modules. Every zeroth-level cluster also has an inlet with b_1 links coming from its first-level parent IN, and an outlet with a_1 links going to the parent IN. The interconnection network inside a zeroth-level cluster, called the zeroth-level IN, is built using an (n_0 + b_1) × (m_0 + a_1) crossbar switch. A first-level IN is connected to k_1 zeroth-level INs (child INs) on one side and to a second-level IN (parent IN) on the other side. A first-level IN has k_1 + 1 input ports (inlets) and k_1 + 1 output ports (outlets). A first-level IN is connected to its second-level parent IN using an inlet and an outlet containing b_2 and a_2 links, respectively.
Each of the other inlets and outlets of a first-level IN has a_1 and b_1 links, respectively, and these inlets and outlets are used to make connections between the first-level IN and the k_1 zeroth-level child INs. A first-level IN is built using a (k_1 a_1 + b_2) × (k_1 b_1 + a_2) crossbar switch. In general, if a hierarchical system has L levels, then an ith-level (1 ≤ i ≤ L − 2) IN is connected to k_i (i − 1)th-level INs on one side and to an (i + 1)th-level IN on the other side. An ith-level IN has k_i + 1 inlets and k_i + 1 outlets. One inlet has b_{i+1} links coming from the (i + 1)th-level parent IN, and each of the other k_i inlets has a_i links coming from an (i − 1)th-level child IN.

FIGURE 1. A four-level multiple-link-based hierarchical interconnection network.

One outlet has a_{i+1} links going to the (i + 1)th-level parent IN, and each of the other k_i outlets has b_i links going to an (i − 1)th-level child IN. This network is built using a (k_i a_i + b_{i+1}) × (k_i b_i + a_{i+1}) crossbar switch. The highest-level IN of an L-level hierarchical system has k_{L−1} inlets and k_{L−1} outlets. Every inlet has a_{L−1} links coming from an (L − 2)th-level child IN, and every outlet has b_{L−1} links going to an (L − 2)th-level child IN. This network is built using a (k_{L−1} a_{L−1}) × (k_{L−1} b_{L−1}) crossbar switch. From the above description it is clear that k_1 zeroth-level clusters are connected to a first-level IN to form a first-level cluster, k_2 first-level clusters are connected to a second-level IN to form a second-level cluster, k_3 second-level clusters are connected to a third-level IN to form a third-level cluster, and so on. Thus, the total number of zeroth-level clusters in an L-level hierarchical system is k_1 k_2 k_3 … k_{L−1}. Since there are n_0 processors and m_0 memory modules in each zeroth-level cluster, the total numbers of processors and memory modules in the system are N = n_0 k_1 k_2 k_3 … k_{L−1} and M = m_0 k_1 k_2 k_3 … k_{L−1}, respectively. Figure 1 shows a four-level multiple-link-based hierarchical system. If a processor generates a memory reference for one of its local (zeroth-level) memory modules, then that reference goes to the memory module through the local interconnection network. However, if a processor generates a reference for one of its ith-level (i > 0) memory modules, then that reference first keeps moving up through the parent outlets of successive INs until it reaches the ith-level IN of the ith-level cluster in which the processor is located.
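The system size and crossbar dimensions above follow directly from the branching factors. The following minimal sketch mirrors those formulas; the function names and the example parameter values are ours, not the paper's:

```python
from math import prod

def system_size(n0, m0, k):
    """Total processors N and memory modules M for an L-level HIN.

    k[i] holds the branching factor k_i for i = 1 .. L-1 (k[0] is unused padding)."""
    c0 = prod(k[1:])                    # number of zeroth-level clusters, k_1 k_2 ... k_{L-1}
    return n0 * c0, m0 * c0

def in_crossbar_dims(i, L, k, a, b):
    """Dimensions (inputs, outputs) of the crossbar inside an ith-level IN,
    1 <= i <= L-1, following the port structure described in the text."""
    if i == L - 1:                      # highest-level IN: no parent inlet/outlet
        return k[i] * a[i], k[i] * b[i]
    return k[i] * a[i] + b[i + 1], k[i] * b[i] + a[i + 1]

# Hypothetical 4-level example: n0 = m0 = 4, k_1 = k_2 = k_3 = 4
N, M = system_size(4, 4, [None, 4, 4, 4])
```

With a_i and b_i chosen per level, `in_crossbar_dims` reproduces the (k_i a_i + b_{i+1}) × (k_i b_i + a_{i+1}) sizing rule for interior INs and the (k_{L−1} a_{L−1}) × (k_{L−1} b_{L−1}) rule at the top.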
The reference then starts moving down through the child outlets of successive INs until it reaches the referenced memory module.

2.2. A HIN with fault-tolerant INs

Here we propose another type of fault-tolerant HIN. This type of HIN is designed using only one link at every inlet and outlet port. However, every IN has a backup circuit, as shown in Figure 2, to tolerate faults within the crossbar (the main crossbar). If a reference cannot move through the main crossbar due to the presence of faults in the main crossbar, then that reference tries to move through the backup circuit. For an ith-level fault-tolerant IN, the backup circuit is composed of two small crossbars: one (k_i + 1) × z_i crossbar and one z_i × (k_i + 1) crossbar. All the k_i + 1 inlets of an ith-level fault-tolerant IN are connected to both the (k_i + 1) × (k_i + 1) main crossbar and the (k_i + 1) × z_i backup crossbar. The z_i outlets of the (k_i + 1) × z_i backup crossbar are in turn connected to the z_i inlets of the other, z_i × (k_i + 1), backup crossbar. Then, all the k_i + 1 outlets of the z_i × (k_i + 1) backup crossbar are connected to the k_i + 1 outlets of the main crossbar. It is assumed that the main crossbar has a built-in fault detection circuit, which generates control signals to route memory references through the backup circuit when necessary. The backup circuit allows a maximum of z_i references to move through it, if these references cannot move through the main crossbar due to the presence of faults in the main crossbar. Thus, if z_i or fewer references cannot move through the main crossbar due to the presence of faults, the performance of the system will not degrade, because all these references can move through the backup circuit. However, the performance of the system will degrade if more than z_i references need to be moved through the backup circuit.
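The overflow behaviour of the backup circuit can be illustrated with a small sketch. The request representation and function name are our own; a real IN would make these routing decisions in hardware, not in a loop:

```python
def route_with_backup(requests, faulty_points, z):
    """Route (inlet, outlet) requests through one fault-tolerant IN.

    requests      : list of (inlet, outlet) pairs, at most one per inlet
    faulty_points : set of (inlet, outlet) crosspoints unusable in the main crossbar
    z             : capacity z_i of the backup circuit
    Returns (#routed via main, #routed via backup, #blocked)."""
    main = backup = blocked = 0
    for req in requests:
        if req not in faulty_points:
            main += 1          # the main crossbar handles the reference
        elif backup < z:
            backup += 1        # overflow path through the two backup crossbars
        else:
            blocked += 1       # more than z_i faulty-path references: degradation
    return main, backup, blocked
```

With z_i = 1 and two faulty crosspoints on requested paths, one reference takes the backup path and one is blocked, matching the text's observation that only more than z_i displaced references degrade performance.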

FIGURE 2. An ith-level IN with a backup circuit.

3. PERFORMANCE ANALYSIS

In this section we present analytical models to determine the memory bandwidths of the HINs presented in the previous section. Different analytical models are developed to determine the bandwidths of the HINs in the presence of different types of faults. The following main notation is used to describe the parameters of the system. In addition to this notation, some other parameters are also used in this paper; they are defined just before they are used in the text.

3.1. Notation

N          total number of processors in the system
M          total number of memory modules in the system
L          total number of levels in the system, including the local (zeroth) level
N_i        number of processors in an ith-level cluster
M_i        total number of memory modules in an ith-level cluster
n_i        number of ith-level processors of a memory module
m_i        number of ith-level memory modules of a processor
k_i        number of (i − 1)th-level clusters used to make an ith-level cluster
C_i        total number of ith-level clusters in the entire system
a_i        number of links in a child inlet of an ith-level IN
b_i        number of links in a child outlet of an ith-level IN
s_i        probability that a processor's generated reference is an ith-level reference
pu_{i+1}   rate of reference at a link of the parent outlet of an ith-level IN
pd_i       rate of reference at a link of a child outlet of an ith-level IN
du_i       number of distinct references competing for the parent outlet of an ith-level IN
dd_i       number of distinct references competing for a child outlet of an ith-level IN
BW         total bandwidth of the HIN
BW_i       bandwidth contribution from all the ith-level references of the system
bw_i       bandwidth contribution from all the processors of an ith-level cluster (note that bw_{L−1} is the same as BW)
bw_{i,j}   bandwidth contribution from the jth-level references of an ith-level cluster
ΔBW(F)     loss in bandwidth due to the presence of the set of faulty outlets F
bw_i(F)    bandwidth contribution from an ith-level cluster in the presence of the set of faulty outlets F
BW(F)      total bandwidth of the HIN in the presence of the set of faulty outlets F.

The number of processors in an ith-level cluster is

    N_i = { n_0,          for i = 0,
          { k_i N_{i−1},  for 1 ≤ i ≤ L − 1.                                  (1)

The number of memory modules in an ith-level cluster is

    M_i = { m_0,          for i = 0,
          { k_i M_{i−1},  for 1 ≤ i ≤ L − 1.                                  (2)

The number of ith-level processors of a memory module is

    n_i = { n_0,              for i = 0,
          { N_i − N_{i−1},    for 1 ≤ i ≤ L − 1.                              (3)

The number of ith-level memory modules of a processor is

    m_i = { m_0,              for i = 0,
          { M_i − M_{i−1},    for 1 ≤ i ≤ L − 1.                              (4)

The total number of ith-level clusters in the entire system is

    C_i = { ∏_{j=i+1}^{L−1} k_j,   for 0 ≤ i ≤ L − 2,
          { 1,                     for i = L − 1.

The models presented in this paper are developed based on the following assumptions:

1. the multiprocessor system is synchronous, with N processors and M memory modules;
2. the new references generated in a cycle are random and independent of each other;
3. the references which are not accepted during a memory cycle are resubmitted for the same memory modules in the next cycle;
4. ψ is the probability with which an active (unblocked) processor generates a new reference in a memory cycle.

Assumptions 1 and 2 are used by almost all the bandwidth analysis models available in the literature. Assumption 3 is used to make our model more realistic.

3.2. Bandwidth analysis of a multiple-link-based HIN

3.2.1. Model for a fault-free multiple-link-based system
Let f be the fraction of the processors which are active at steady state. Note that normally f is less than 1, because some of the processors may be blocked because their references were not accepted during previous cycles. A blocked processor remains inactive until its reference is accepted by the memory module, and it then returns to the active state. Since s_i is the fraction of a processor's references which are directed to its ith-level memory modules and a processor has m_i ith-level memory modules, using the empirical expression developed by Yen et al. [30], one can show that the average number of distinct references competing for the parent outlet of a local cluster is

    du_0 = Σ_{i=1}^{L−1} m_i [1 − (1 − ψ f s_i / m_i − f_i / m_i)^{n_0}],      (5)

where f_i is the fraction of processors that are blocked waiting for an ith-level memory module.
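As a sketch (helper names are ours), the recurrences (1)-(4), the cluster count C_i, and the distinct-reference expression (5) can be coded directly:

```python
from math import prod

def cluster_params(n0, m0, k, L):
    """Lists N[i], M[i], n[i], m[i], C[i] per equations (1)-(4) and the C_i rule.

    k[i] holds k_i for 1 <= i <= L-1 (k[0] is unused padding)."""
    N = [n0] + [0] * (L - 1)
    M = [m0] + [0] * (L - 1)
    for i in range(1, L):
        N[i] = k[i] * N[i - 1]                              # (1)
        M[i] = k[i] * M[i - 1]                              # (2)
    n = [n0] + [N[i] - N[i - 1] for i in range(1, L)]       # (3)
    m = [m0] + [M[i] - M[i - 1] for i in range(1, L)]       # (4)
    C = [prod(k[i + 1:L]) for i in range(L)]                # empty product = 1 = C_{L-1}
    return N, M, n, m, C

def du0(psi, f, s, fi, m, n0, L):
    """Equation (5): distinct references competing for a local cluster's parent outlet."""
    return sum(m[i] * (1.0 - (1.0 - psi * f * s[i] / m[i] - fi[i] / m[i]) ** n0)
               for i in range(1, L))
```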
Computation of the rate of reference at a link of the parent outlet of an IN. Since the parent outlet of a local cluster has a_1 links, during a memory cycle the average number of distinct references arriving at a link of the parent outlet of a local cluster is

    du_0 / a_1.                                                               (6)

Since pu_1 is the rate of reference at a link of the parent outlet of a zeroth-level IN and the rate of reference at a link cannot be greater than unity, the value of pu_1 can be determined as

    pu_1 = { du_0 / a_1,   if du_0 / a_1 < 1,
           { 1,            if du_0 / a_1 ≥ 1.                                 (7)

During a memory cycle, the probability that a processor either generates a new reference for an ith-level memory module or is already blocked for an ith-level memory module is ψ f s_i + f_i. An ith-level IN can receive only ith- and higher-level references from its child INs. Let u_{i,j} be the portion of these references which is directed to the jth-level memory modules. The value of u_{i,j} can be expressed as

    u_{i,j} = (ψ f s_j + f_j) / Σ_{k=i}^{L−1} (ψ f s_k + f_k),
              for 1 ≤ i ≤ L − 1 and i ≤ j ≤ L − 1.                            (8)

The average number of distinct references competing for the parent outlet of an ith-level IN is

    du_i = Σ_{j=i+1}^{L−1} m_j [1 − (1 − pu_i u_{i,j} a_i / m_j)^{k_i}],
           for 1 ≤ i ≤ L − 2.                                                 (9)

The rate of reference at a link of the parent outlet of an ith-level IN can be expressed as follows:

    pu_{i+1} = { du_i / a_{i+1},   if du_i / a_{i+1} < 1,
               { 1,                if du_i / a_{i+1} ≥ 1,
               for 1 ≤ i ≤ L − 2.                                             (10)

Computation of the rate of reference at a link of a child outlet of an IN. The references which come to an ith-level IN through the parent inlet are uniformly distributed over all the memory modules of the ith-level cluster which lies underneath the ith-level IN. The average number of distinct references competing for a child outlet of an ith-level IN is

    dd_i = M_{i−1} [1 − (1 − pd_{i+1} b_{i+1} / M_i)(1 − pu_i u_{i,i} a_i / m_i)^{k_i − 1}],
           for 1 ≤ i ≤ L − 1,                                                 (11)

where pd_L = 0 and u_{L−1,L−1} = 1. The rate of reference at a link of a child outlet of an ith-level IN can be expressed as

    pd_i = { dd_i / b_i,   if dd_i / b_i < 1,
           { 1,            if dd_i / b_i ≥ 1,
           for 1 ≤ i ≤ L − 1.                                                 (12)
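The upward pass (6)-(10) followed by the downward pass (11)-(12) can be sketched as one function, given a current estimate of f and of the blocked fractions f_i. List layouts and names are our own assumptions:

```python
def rates(psi, f, fi, s, a, b, k, m, M, n0, L):
    """One evaluation of the link rates, eqs (5)-(12).

    Per-level lists are indexed by level; index-0 entries of a, b, k are
    unused padding. fi[i] is the blocked fraction f_i."""
    du = [0.0] * L
    pu = [0.0] * (L + 1)
    du[0] = sum(m[i] * (1 - (1 - psi * f * s[i] / m[i] - fi[i] / m[i]) ** n0)
                for i in range(1, L))                                   # (5)
    pu[1] = min(du[0] / a[1], 1.0)                                      # (6)-(7)
    tot = lambda i: sum(psi * f * s[j] + fi[j] for j in range(i, L))
    u = [[0.0] * L for _ in range(L)]
    for i in range(1, L):
        for j in range(i, L):
            u[i][j] = (psi * f * s[j] + fi[j]) / tot(i)                 # (8)
    for i in range(1, L - 1):
        du[i] = sum(m[j] * (1 - (1 - pu[i] * u[i][j] * a[i] / m[j]) ** k[i])
                    for j in range(i + 1, L))                           # (9)
        pu[i + 1] = min(du[i] / a[i + 1], 1.0)                          # (10)
    pd = [0.0] * (L + 1)                                                # pd[L] = 0
    for i in range(L - 1, 0, -1):
        uii = 1.0 if i == L - 1 else u[i][i]                            # u_{L-1,L-1} = 1
        parent = pd[i + 1] * b[i + 1] if i < L - 1 else 0.0
        dd = M[i - 1] * (1 - (1 - parent / M[i])
                         * (1 - pu[i] * uii * a[i] / m[i]) ** (k[i] - 1))   # (11)
        pd[i] = min(dd / b[i], 1.0)                                     # (12)
    return pu, pd
```

For a two-level toy system (n_0 = 2, m_0 = 2, k_1 = 2, single-link ports, ψ = 1, f = 1, s_0 = s_1 = 0.5, no blocking) both pu_1 and pd_1 evaluate to 0.875.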

1. f := 1.0; n := 0; For i := 0 to L − 1 do f_i(n) := 0; Done := False;
REPEAT
2.   n := n + 1;
3.   Do the analysis shown by (6) through (13), and get the value of BW using (13);
4.   If |BW − f N ψ| < ε (where ε is a very small number) Then
         Done := True;
     Else Begin
         Determine f_i(n + 1), 0 ≤ i ≤ L − 1, using (18);
         f := 1.0;
         For i := 0 to L − 1 do f := f − f_i(n + 1);
     End;
UNTIL Done;
5. Accept BW as the bandwidth of the system.

ALGORITHM 1. Bandwidth computation algorithm.

The total memory bandwidth of the hierarchical system is

    BW = M [1 − (1 − pd_1 b_1 / m_0)(1 − ψ f s_0 / m_0 − f_0 / m_0)^{n_0}].    (13)

Computation of the bandwidth contribution from the ith-level (0 ≤ i ≤ L − 1) references. An ith-level IN receives (i + 1)th- and higher-level references through its parent inlet. Let v_{i+1,j} be the portion of these references which is directed to the jth-level (i + 1 ≤ j ≤ L − 1) memory modules. The value of v_{i,j} (j ≥ i) can be determined as follows:

    v_{i,i} = M_i [1 − (1 − pu_i u_{i,i} a_i / m_i)^{k_i − 1}]
              / {pd_{i+1} b_{i+1} + M_i [1 − (1 − pu_i u_{i,i} a_i / m_i)^{k_i − 1}]},
              for 1 ≤ i ≤ L − 2,                                              (14a)

and

    v_{i,j} = pd_{i+1} b_{i+1} v_{i+1,j}
              / {pd_{i+1} b_{i+1} + M_i [1 − (1 − pu_i u_{i,i} a_i / m_i)^{k_i − 1}]},
              for 1 ≤ i ≤ L − 2 and i + 1 ≤ j ≤ L − 1,                        (14b)

where v_{L−1,L−1} = 1. The bandwidth contributions from different types of references will be proportional to the numbers of the corresponding references which arrive at a local cluster. Let d_z be the average number of distinct zeroth-level references generated by the active and blocked processors of a local cluster. The value of d_z can be expressed as

    d_z = m_0 [1 − (1 − ψ f s_0 / m_0 − f_0 / m_0)^{n_0}].                     (15)

The average number of ith-level references which arrive at a local cluster from the first-level parent IN is pd_1 b_1 v_{1,i}. Hence,

    BW_0 = [d_z / (d_z + pd_1 b_1)] BW                                         (16)

and

    BW_i = [pd_1 b_1 v_{1,i} / (d_z + pd_1 b_1)] BW,   for 1 ≤ i ≤ L − 1.      (17)

The fraction of processors which attempt to access their ith-level memory modules during a memory cycle is ψ f s_i + f_i (0 ≤ i ≤ L − 1).
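Algorithm 1 can be sketched most compactly for the simplest case, a two-level system (L = 2), where equations (5)-(18) collapse to a few lines. The function name, the parameter values in the example, and the clamp of f_i at zero are our own practical choices, not part of the paper's statement:

```python
def two_level_bandwidth(psi, s0, s1, n0, m0, k1, a1, b1,
                        eps=1e-7, max_iter=100_000):
    """Algorithm 1 specialised to L = 2: iterate until BW matches f N psi."""
    N, M = n0 * k1, m0 * k1
    m1 = (k1 - 1) * m0                   # m_1, level-1 modules of a processor
    f, f0, f1 = 1.0, 0.0, 0.0
    for _ in range(max_iter):
        # upward rate, eqs (5)-(7): level-1 references leaving a local cluster
        du0 = m1 * (1 - (1 - psi * f * s1 / m1 - f1 / m1) ** n0)
        pu1 = min(du0 / a1, 1.0)
        # downward rate, eqs (11)-(12), with pd_2 = 0 and u_{1,1} = 1
        dd1 = m0 * (1 - (1 - pu1 * a1 / m1) ** (k1 - 1))
        pd1 = min(dd1 / b1, 1.0)
        # total bandwidth (13) and its split (15)-(17)
        BW = M * (1 - (1 - pd1 * b1 / m0)
                  * (1 - psi * f * s0 / m0 - f0 / m0) ** n0)
        dz = m0 * (1 - (1 - psi * f * s0 / m0 - f0 / m0) ** n0)
        BW0 = dz / (dz + pd1 * b1) * BW
        BW1 = BW - BW0                   # v_{1,1} = 1, so the rest is level 1
        if abs(BW - f * N * psi) < eps:  # steady state: BW = f N psi
            return BW, f
        # blocked fractions for the next iteration, eq (18), clamped at 0
        f0 = max(0.0, psi * f * s0 + f0 - BW0 / N)
        f1 = max(0.0, psi * f * s1 + f1 - BW1 / N)
        f = 1.0 - f0 - f1
    raise RuntimeError("fixed point did not converge")
```

Under a light load (ψ = 0.2, strong locality s_0 = 0.9) the iteration converges quickly to a bandwidth very close to N ψ, with only a small blocked fraction.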
The value of f_i for the next iteration of the bandwidth computation can be determined as

    f_i(n + 1) = ψ f s_i + f_i(n) − BW_i / N,   for 0 ≤ i ≤ L − 1,            (18)

where n is the iteration number. The iterative procedure shown in Algorithm 1 can be used to determine the bandwidth of the hierarchical system. Note that, at steady state, the bandwidth of the system must be equal to f N ψ.

3.2.2. Models for multiple-link-based HINs with faulty links
In this subsection we present analytical models for different types of link faults. By link faults we mean that some links cannot be used due to the presence of faults in the network. A link may be unusable either because there is a fault on the link itself, or because there is a fault in an IN which makes the link unusable. First we present a few lemmas and then we show the analytical models for different types of faulty HINs.

LEMMA 1. At steady state, the bandwidth contribution bw_i from the processors of an ith-level (0 ≤ i ≤ L − 1) cluster is proportional to the bandwidth contribution bw_{i,j} from the jth-level (0 ≤ j ≤ L − 1) references of those processors.

LEMMA 2. At steady state, the bandwidth contribution bw_i from the processors of an ith-level (0 ≤ i ≤ L − 2) cluster is proportional to the bandwidth available from the parent outlet of the corresponding ith-level IN.

The bandwidth available from an outlet means the total bandwidth contribution from all the references which go through that outlet.

Analytical model for the HIN with one faulty parent outlet. Let the faulty parent outlet be u_i, the parent outlet of an ith-level IN, and let the number of faulty links in u_i be x

U_0 := {u_i | u_i ∈ U and u_i is neither an ancestor nor a descendant of u_j ∈ U for all j ≠ i};
t := 0;
If U_0 ≠ U Then
Repeat
    Pick u_k such that u_k ∈ U and u_k ∉ U_0 ∪ … ∪ U_t;
    t := t + 1;
    U_t := {u_k} ∪ {u_i | u_i ∈ U and u_i is either an ancestor or a descendant of u_k};
Until U_0 ∪ … ∪ U_t = U

ALGORITHM 2.

(x < a_{i+1}). Due to the presence of the faulty links in an ith-level parent outlet, the bandwidth contribution from the corresponding faulty ith-level cluster will be less than that from other good ith-level clusters. In this subsection, we present an approximate analytical model for bandwidth analysis of a faulty HIN. Since the results obtained from this approximate model were found to be very close to those obtained from the simulation model, we did not consider it worthwhile to develop the exact probabilistic model for the faulty HIN, which is very complex. Let pu′_{i+1} be the rate of reference at a link of the faulty parent outlet of an ith-level IN. Note that the prime superscript is used to indicate that the corresponding value is for a faulty cluster. The value of pu′_{i+1} can be expressed as

    pu′_{i+1} = { du_i / (a_{i+1} − x),   if du_i / (a_{i+1} − x) < 1,
                { 1,                      if du_i / (a_{i+1} − x) ≥ 1,        (19)

where du_i is given by (9). However, the rate of reference at a link of the parent outlet of an ith-level IN of a good ith-level cluster is pu_{i+1}, as given by (10).
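The single-faulty-outlet approximation, equations (19)-(21), amounts to comparing the carried rate of the degraded outlet with that of a good one. A minimal sketch (function name and example values are ours):

```python
def bw_with_faulty_parent_outlet(BW, du_i, a_next, x, C_i):
    """Approximate total bandwidth with one faulty ith-level parent outlet.

    BW     : fault-free bandwidth of the HIN
    du_i   : distinct references competing for the outlet, eq (9)
    a_next : a_{i+1}, links in the outlet; x of them are faulty (x < a_{i+1})
    C_i    : number of ith-level clusters in the system."""
    pu_good = min(du_i / a_next, 1.0)               # (10)
    pu_faulty = min(du_i / (a_next - x), 1.0)       # (19)
    delta_u = 1.0 - pu_faulty * (a_next - x) / (pu_good * a_next)   # (20)
    return BW * (1.0 - delta_u / C_i)               # (21)
```

Note the model's behaviour: if the surviving links can still carry all du_i competing references (du_i ≤ a_{i+1} − x), the degradation term Δu(i, x) is zero and no bandwidth is lost; loss appears only once the faulty outlet saturates.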
In our approximate model we assume that the bandwidth available from the ith-level parent outlet of a good ith-level cluster is proportional to pu_{i+1} a_{i+1}, and that available from the faulty ith-level parent outlet is proportional to pu′_{i+1} (a_{i+1} − x). Now, using Lemma 2, we can say that the bandwidth contribution from the processors of a good ith-level cluster is proportional to pu_{i+1} a_{i+1}, and that from the processors of the faulty ith-level cluster is proportional to pu′_{i+1} (a_{i+1} − x). Let us use the term Δu(i, x), as shown below, to indicate the degradation due to the presence of x faulty links in an ith-level parent outlet:

    Δu(i, x) = 1 − pu′_{i+1} (a_{i+1} − x) / (pu_{i+1} a_{i+1}).              (20)

Hence, the bandwidth loss due to the presence of x faulty links in an ith-level parent outlet is given by bw_i Δu(i, x). Note that bw_i = BW / C_i, where BW is the total bandwidth of a good HIN and C_i is the total number of ith-level clusters in the entire system. Hence, the total bandwidth of the HIN in the presence of the faulty parent outlet u_i is given by

    BW({u_i}) = BW [1 − Δu(i, x) / C_i].                                       (21)

Models for multiple faulty parent outlets. Let the set of faulty parent outlets be U = {u_1, u_2, u_3, …, u_r}, where r is the number of faulty outlets. Let the faulty parent outlet u_i ∈ U be at level h_i and let the number of faulty links in that outlet be x_i. First we use Algorithm 2 to generate a number of disjoint sets from U. From Algorithm 2 it is clear that U_i ∩ U_j = ∅ for 0 ≤ i, j ≤ t and i ≠ j. Now we determine the loss in bandwidth due to the presence of the faulty outlets of the set U_0. Since the outlet u_i ∈ U_0 is neither an ancestor nor a descendant of any other outlet u_j ∈ U, the references which move through u_i ∈ U_0 cannot move through any other outlet of the set U, and vice versa. Thus, in our approximate model we assume that the outlet u_i ∈ U_0 does not have any significant effect on any other outlet of the set U, and vice versa.
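Algorithm 2's grouping of faulty outlets can be sketched by identifying each outlet with its path of child indices from the root IN, so that the ancestor/descendant test becomes a prefix test. This representation is our own, and we enforce disjointness of the groups explicitly:

```python
def is_related(p, q):
    """True if outlet p is an ancestor or descendant of outlet q.
    Outlets are tuples of child indices from the root; one path being a
    prefix of the other means ancestry."""
    shorter = min(len(p), len(q))
    return p[:shorter] == q[:shorter]

def partition_faulty_outlets(U):
    """Algorithm 2 (sketch): split U into U_0, U_1, ..., U_t, where U_0 holds
    the mutually unrelated outlets and each later set groups a chain of
    related outlets around a picked seed."""
    U = list(U)
    U0 = [u for u in U if all(not is_related(u, v) for v in U if v != u)]
    sets, placed = [U0], set(U0)
    for u in U:                      # pick the next outlet not yet placed
        if u in placed:
            continue
        group = [u] + [v for v in U
                       if v != u and v not in placed and is_related(u, v)]
        placed.update(group)
        sets.append(group)
    return sets
```

For example, with faulty outlets at paths (0, 1), (0,) and (1, 0), the outlet (1, 0) is unrelated to the other two and lands in U_0, while (0,) and (0, 1) form an ancestor chain in U_1.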
Hence, we can still assume that the bandwidth contribution from the h_i th-level faulty cluster, which became faulty due to the presence of the faulty outlet u_i ∈ U_0, is proportional to the bandwidth available from u_i. Therefore, the loss in bandwidth due to the presence of the faulty outlet u_i ∈ U_0 is bw_{h_i} Δu(h_i, x_i), where bw_{h_i} is the bandwidth contribution from a good h_i th-level cluster and Δu(h_i, x_i) is given by (20). Hence, the total loss in bandwidth due to the presence of all the faulty outlets of the set U_0 is

    ΔBW(U_0) = Σ_{u_i ∈ U_0} bw_{h_i} Δu(h_i, x_i)
             = BW Σ_{u_i ∈ U_0} Δu(h_i, x_i) / C_{h_i}.                        (22)

If U_0 ≠ U, then there are more sets of faulty outlets. Since U_i ∩ U_j = ∅ for i ≠ j, it is clear that the references which move through one outlet of a set cannot move through an outlet of another set. Thus, in our model we assume that the outlets of one set do not have any significant effect on the outlets of another set. Let us now determine the loss in bandwidth due to the presence of the faulty outlets of the set U_1. Without loss of generality, we can assume that U_1 = {u_1, u_2, u_3, …, u_v}, where v is the number of faulty outlets in U_1. It is clear that, for every pair of faulty outlets u_i, u_j ∈ U_1, the outlet u_i is either an ancestor or a descendant of the outlet u_j. Recall that the faulty parent outlet u_i ∈ U is at level h_i. Without loss of generality, we can assume that h_i < h_j for all i < j. Thus, u_j is an ancestor of u_i for all u_i, u_j ∈ U_1 and j > i. Since u_j is an ancestor of u_i, the

references which move through u_i also move through u_j. As a result, there will be a direct effect of one faulty outlet on another. The loss in bandwidth due to the presence of the faulty outlet u_i is bw_{h_i} Δu(h_i, x_i). Hence, the bandwidth contribution from the faulty h_j th-level (h_j > h_i) cluster, due to the presence of the faulty outlet u_i, is given by bw_{h_j} − bw_{h_i} Δu(h_i, x_i). Since the references which move through u_i also move through u_j, in our approximate model we assume that the bandwidth contribution from the faulty h_j th-level cluster, due to the presence of the faulty outlets u_i and u_j, is [bw_{h_j} − bw_{h_i} Δu(h_i, x_i)] [1 − Δu(h_j, x_j)]. Hence, the total loss in bandwidth due to the presence of all the faulty parent outlets of the set U_1 can be expressed as

    ΔBW(U_1) = BW [ Δu(h_v, x_v) / C_{h_v}
               + Σ_{i=1}^{v−1} (Δu(h_i, x_i) / C_{h_i}) ∏_{j=i+1}^{v} (1 − Δu(h_j, x_j)) ].   (23)

The loss in bandwidth due to the presence of the faulty parent outlets of a set U_i (2 ≤ i ≤ t) can be determined in a similar way to that for the faulty parent outlets of the set U_1. Thus, the total loss in bandwidth due to the presence of all the faulty parent outlets of the set U is

    ΔBW(U) = Σ_{i=0}^{t} ΔBW(U_i).                                             (24)

Hence, the bandwidth of the HIN in the presence of all the faulty parent outlets of the set U is

    BW(U) = BW − ΔBW(U).                                                       (25)

Now we develop analytical models for the multiple-link-based HIN with faulty child outlets. First we present three more lemmas and then we show the analytical model for a system with faulty child outlets.

LEMMA 3. An ith-level child outlet carries the jth-level (i ≤ j ≤ L − 1) references of k_j − 1 (j − 1)th-level clusters.

LEMMA 4. The ith-level references of a processor move through Nd_j jth-level (j ≤ i) child outlets, where

    Nd_j = { k_i − 1,                        for j = i,
           { (k_i − 1) ∏_{y=j}^{i−1} k_y,    for j < i.                        (26)

LEMMA 5. Assume that D is a set of child outlets such that, for every pair of outlets d_m, d_n ∈ D, d_m is either an ancestor or a descendant of d_n, and all the outlets of D carry some references of a given processor.
If d_m ∈ D carries the ith-level references of the given processor, then every d_n ∈ D also carries the ith-level references of that processor, and no outlet of D can carry any other type of references, say jth-level (j ≠ i) references, from that given processor.

A cluster is called an affected cluster if the bandwidth contribution from every processor of that cluster is going to be affected due to the presence of faults in the system.

Analytical model for the HIN with one faulty child outlet. Let the faulty child outlet be d_i, a child outlet of an ith-level IN, and let the number of faulty links in d_i be x (x < b_i). Let pd′_i be the rate of reference at a link of the faulty child outlet. The value of pd′_i can be expressed as

    pd′_i = { dd_i / (b_i − x),   if dd_i / (b_i − x) < 1,
            { 1,                  if dd_i / (b_i − x) ≥ 1,                     (27)

where dd_i is given by (11). However, the rate of reference at a link of a good ith-level child outlet is pd_i, as given by (12). In our approximate model we assume that the bandwidth available from the faulty ith-level child outlet is proportional to pd′_i (b_i − x) and that available from a good ith-level child outlet is proportional to pd_i b_i. Lemma 3 shows that an ith-level child outlet carries the jth-level (i ≤ j ≤ L − 1) references of k_j − 1 (j − 1)th-level clusters. Thus, a faulty child outlet will affect the bandwidth contributions of those clusters whose references move through the faulty outlet. If an ith-level child outlet is faulty, then this faulty outlet will affect the bandwidth contributions of many (i − 1)th- and higher-level clusters. Now let us determine the loss in bandwidth contribution from the different types of affected clusters. From Lemma 3 it is clear that the bandwidth contribution from k_i − 1 (i − 1)th-level clusters will be affected by a faulty ith-level child outlet, because the ith-level references from these (i − 1)th-level clusters move through the faulty ith-level child outlet.
From Lemma 4 we see that the ith-level references of a processor move through (k_i − 1) ith-level child outlets. Thus, the ith-level references of an (i−1)th-level affected cluster move through (k_i − 2) good ith-level child outlets and the faulty child outlet. Hence, in our approximate model we assume that the bandwidth contribution from the ith-level references of an (i−1)th-level affected cluster is proportional to (k_i − 2) p(d_i) b_i + p′(d_i)(b_i − x), and that from a non-affected (i−1)th-level cluster is proportional to (k_i − 1) p(d_i) b_i. Let us use the term d(i, x), as shown below, to indicate the degradation due to the presence of x faulty links in an ith-level child outlet:

    d(i, x) = 1 − p′(d_i)(b_i − x) / (p(d_i) b_i).   (28)

Let us use the term δ(j, i, x) to indicate the fraction of the bandwidth contribution which will be lost from a jth-level affected cluster due to the presence of x faulty links in an ith-level child outlet. In general, the value of δ(j, i, x) can be expressed as

    δ(j, i, x) = d(i, x)/(k_i − 1), for j = i − 1;
    δ(j, i, x) = d(i, x)/[(k_{j+1} − 1) k_j k_{j−1} ... k_{i+1} k_i], for i ≤ j ≤ L − 2.   (29)
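The single-faulty-child-outlet model can be sketched as below. This is a sketch under the reconstructed forms of equations (27)-(29) and of the closed form (31)/(32) derived in the following page; p(d_i) from equation (12) and d(d_i) from equation (11) are taken as given inputs, since those equations lie outside this section.

```python
# Sketch of the one-faulty-child-outlet model.  k maps a level to its
# cluster size k_i and C maps a level to its cluster count C_i; both
# are assumed inputs from the fault-free model.

def faulty_link_rate(d_di, b_i, x):
    # equation (27): traffic spread over the b_i - x surviving links,
    # capped at one reference per link per cycle
    return min(d_di / (b_i - x), 1.0)

def degradation(p_good, p_faulty, b_i, x):
    # equation (28): fraction of the outlet's bandwidth that is lost
    return 1.0 - (p_faulty * (b_i - x)) / (p_good * b_i)

def delta(j, i, d_ix, k):
    # equation (29): fraction lost from a jth-level affected cluster
    # due to x faulty links in an ith-level child outlet (d_ix = d(i, x))
    if j == i - 1:
        return d_ix / (k[i] - 1)
    prod = k[j + 1] - 1
    for y in range(i, j + 1):           # k_i * k_{i+1} * ... * k_j
        prod *= k[y]
    return d_ix / prod

def bandwidth_one_child_fault(BW, C, L, i, d_ix):
    # closed form (31)/(32): the per-level losses of (30) telescope to
    # (L - i) equal terms of BW * d(i, x) / C_{i-1}
    return BW * (1.0 - (L - i) * d_ix / C[i - 1])
```

With these definitions, each term bw_j (k_{j+1} − 1) δ(j, i, x) of the sum in (30) reduces to bw_{i−1} d(i, x), which is what makes the closed form above possible.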

For i := 1 to C_0 do
Begin
    D_{i,0} := {d_j | d_j ∈ D_i and d_j is neither an ancestor nor a descendant of d_k ∈ D_i, for all j ≠ k};
    m := 0;
    If D_{i,0} ≠ D_i Then
        Repeat
            Pick d_k such that d_k ∈ D_i and d_k ∉ D_{i,0} ∪ ... ∪ D_{i,m};
            m := m + 1;
            D_{i,m} := {d_k} ∪ {d_j | d_j ∈ D_i and d_j is either an ancestor or a descendant of d_k};
        Until D_{i,0} ∪ ... ∪ D_{i,m} = D_i
End;

ALGORITHM 3.

For j := 1 to L−1 do Δbw_i(j) := 0;
For all d_j ∈ D_{i,0} do
    Δbw_i(g_j) := Δbw_i(g_j) + bw_0 δ(g_j − 1, h_j, x_j);
If D_{i,0} ≠ D_i Then
    For n := 1 to m do
        Δbw_i(g_n) := Δbw_i(g_n) + bw_0 [1 − Π_{d_j ∈ D_{i,n}} (1 − δ(g_n − 1, h_j, x_j))];

ALGORITHM 4.

The total loss in bandwidth from all the affected clusters in the system can be expressed as

    ΔBW({d_i}) = Σ_{j=i−1}^{L−2} bw_j (k_{j+1} − 1) δ(j, i, x).   (30)

Since bw_j = k_i k_{i+1} ... k_j bw_{i−1} and bw_{i−1} = BW/C_{i−1}, (30) can be reduced to the following closed-form expression:

    ΔBW({d_i}) = (L − i) BW d(i, x)/C_{i−1}.   (31)

Hence, the bandwidth of the HIN in the presence of the faulty child outlet is

    BW_{{d_i}} = BW [1 − (L − i) d(i, x)/C_{i−1}].   (32)

Model for multiple faulty child outlets. Let the set of faulty child outlets be D = {d_1, d_2, d_3, ..., d_s}, where s is the number of faulty outlets. Let the faulty outlet d_i ∈ D be at level h_i and let the number of faulty links in the outlet be x_i. We know that the total number of zeroth-level local clusters in the system is C_0. Let us assume that the zeroth-level clusters are numbered 1, 2, 3, ..., C_0. Now we form the sets D_i (1 ≤ i ≤ C_0) as

    D_i = {d_j | d_j ∈ D and d_j carries some references generated by the processors of the zeroth-level cluster #i}, for 1 ≤ i ≤ C_0.

We then use Algorithm 3 to generate a number of disjoint sets from every D_i (1 ≤ i ≤ C_0). Now we determine the loss in bandwidth from the zeroth-level cluster #i (1 ≤ i ≤ C_0) due to the presence of the faulty outlets of the sets D_{i,n} (0 ≤ n ≤ m). Since an outlet d_j ∈ D_{i,0} is neither an ancestor nor a descendant of any outlet d_k ∈ D_i (j ≠ k), we assume that d_j ∈ D_{i,0} does not have any significant effect on d_k ∈ D_i, and vice versa, for all j ≠ k.
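Algorithm 3 partitions each set D_i of faulty outlets into the set D_{i,0} of outlets unrelated to every other outlet of D_i, plus chains D_{i,1}, ..., D_{i,m}, each built around one picked outlet. A minimal sketch, assuming a helper predicate related(a, b) that is true when a is an ancestor or a descendant of b (the predicate itself depends on the HIN's addressing scheme and is hypothetical here):

```python
# Sketch of Algorithm 3: split D_i into D_{i,0} plus ancestor/descendant
# chains.  `related(a, b)` is an assumed helper: True iff a is an
# ancestor or a descendant of b.

def partition(D_i, related):
    # D_{i,0}: outlets unrelated to every other outlet in D_i
    D_i0 = {d for d in D_i
            if not any(related(d, e) for e in D_i if e != d)}
    parts = [D_i0]
    remaining = set(D_i) - D_i0
    while remaining:                   # the Repeat ... Until loop
        d_k = next(iter(remaining))    # pick an outlet not yet covered
        chain = {d_k} | {d for d in D_i if related(d, d_k)}
        parts.append(chain)
        remaining -= chain
    return parts
```

As a toy illustration, outlets can be named by their paths from the root, so that one outlet is an ancestor of another exactly when its name is a prefix of the other's; partition({"0", "01", "2"}, related) then yields D_{i,0} = {"2"} and one chain {"0", "01"}.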
Assume that the outlet d_j ∈ D_{i,0} carries some g_j th-level references of the zeroth-level cluster #i. The loss in bandwidth from the zeroth-level cluster #i due to the presence of the faulty outlet d_j ∈ D_{i,0} is then given by bw_0 δ(g_j − 1, h_j, x_j). For every pair of outlets d_j, d_k ∈ D_{i,n} (1 ≤ n ≤ m), d_j is either an ancestor or a descendant of d_k. Hence, any two outlets of D_{i,n} have a direct effect on each other. Lemma 5 shows that the outlets of D_{i,n} carry only one type of reference (say, g_n th-level references) generated by the processors of the zeroth-level cluster #i. If the faulty outlets of the set D_{i,n} were the only faulty outlets in the system, then the loss in bandwidth from the zeroth-level cluster #i could be expressed as

    bw_0 [1 − Π_{d_j ∈ D_{i,n}} (1 − δ(g_n − 1, h_j, x_j))].

Since, at steady state, the total bandwidth available from a cluster is proportional to the bandwidth available from the ith-level (0 ≤ i ≤ L−1) references of the cluster, the maximum bandwidth which can be obtained from the zeroth-level cluster #i is limited by the references which cause the maximum loss in bandwidth. Let Δbw_i(j) be the loss in bandwidth from the zeroth-level cluster #i caused by the jth-level references, and let bw_0(i) be the total bandwidth contribution from the zeroth-level cluster #i. Then we can write

    bw_0(i) = bw_0 − max{Δbw_i(1), Δbw_i(2), ..., Δbw_i(L−1)}.   (33)

The values of Δbw_i(j) (1 ≤ j ≤ L−1) can be determined using Algorithm 4. Thus, the bandwidth of the HIN in the presence of the faulty child outlets of D is

    BW_D = Σ_{i=1}^{C_0} bw_0(i),   (34)

where bw_0(i) is given by (33).

Bandwidth analysis of a HIN with fault-tolerant INs

When there is no fault in this type of HIN, the analytical model is the same as that of the multiple-link-based HIN. The only difference is that for the HIN with fault-tolerant INs a_i = b_i = 1 for 1 ≤ i ≤ L−1. Since an ith-level backup circuit can move z_i references through it, the performance of the HIN does not degrade if z_i or fewer references cannot move through the main crossbar. Thus, the analytical models are developed for the case when more than z_i references cannot move through the main crossbar. Here we present an analytical model for only one faulty IN in the system. The model can easily be extended to multiple faulty INs, in the same way as was done for multiple faults in the other type of HIN, described in the previous subsection.

Let the faulty IN be an ith-level IN. Some of the inlets of the fault-tolerant IN will not be able to send their references through the main crossbar when there are faults in the main crossbar. Let us call these inlets faulty inlets. The model depends on whether or not the parent inlet is one of the faulty inlets.

The parent inlet is not one of the faulty inlets. Let the number of faulty inlets be g (g > z_i). The references from these g faulty inlets will move to the backup circuit. The average number of distinct references which will try to move through the backup circuit is given by

    d_b(i) = Σ_{j=i}^{L−1} m_j [1 − (1 − p(u_i) u(i, j)/m_j)^g].   (35)

Hence, the probability that there is a reference on any output line of the (k_i + 1) × z_i backup crossbar is given by

    q(u_i) = d_b(i)/z_i, if d_b(i)/z_i < 1;
    q(u_i) = 1, if d_b(i)/z_i ≥ 1.   (36)

The probability that the parent outlet of the faulty IN is going to be accessed by at least one reference is

    p′(u_{i+1}) = 1 − (1 − p(u_i)(1 − u(i, i)))^{k_i − g} (1 − q(u_i)(1 − u(i, i)))^{z_i}.   (37)

In our approximate model we assume that the bandwidth contribution from the (i+1)th- and higher-level references of the faulty ith-level cluster is proportional to p′(u_{i+1}) and that of a good ith-level cluster is proportional to p(u_{i+1}). Let d′(d_i, i) be the average number of distinct ith-level references competing for all the child outlets of the faulty ith-level IN. The value of d′(d_i, i) can be expressed as

    d′(d_i, i) = (k_i − g)[1 − (1 − p(u_i) u(i, i))^{k_i − 1 − g} (1 − q(u_i) u(i, i))^{z_i}]
               + g[1 − (1 − p(u_i) u(i, i))^{k_i − g} (1 − q(u_i) u(i, i))^{z_i − 1}].   (38)

Let d(d_i, i) be the average number of distinct ith-level references competing for all the child outlets of a good ith-level IN. The value of d(d_i, i) can be expressed as

    d(d_i, i) = k_i [1 − (1 − p(u_i) u(i, i))^{k_i − 1}].   (39)

The total bandwidth of the HIN with one ith-level faulty IN can then be expressed as

    BW′ = BW [1 − (1/C_i)(1 − min{p′(u_{i+1})/p(u_{i+1}), d′(d_i, i)/d(d_i, i)})].   (40)

The parent inlet is one of the faulty inlets. The average number of distinct references which will try to move through the backup circuit is given by

    d_b(i) = p(d_{i+1}) + Σ_{j=i}^{L−1} m_j [1 − (1 − p(u_i) u(i, j)/m_j)^{g−1}].   (41)

Now the value of q(u_i) can be determined using (36). Let x be the fraction of d_b(i) consisting of in-cluster references (the references which came through the g − 1 faulty child inlets). The value of x can be expressed as

    x = (1/d_b(i)) Σ_{j=i}^{L−1} m_j [1 − (1 − p(u_i) u(i, j)/m_j)^{g−1}].   (42)

The probability that the parent outlet of the faulty IN is going to be accessed by at least one reference is

    p′(u_{i+1}) = 1 − (1 − p(u_i)(1 − u(i, i)))^{k_i + 1 − g} (1 − x q(u_i)(1 − u(i, i)))^{z_i}.   (43)

The total number of distinct ith-level references competing for all the child outlets of the faulty ith-level IN can be expressed as

    d′(d_i, i) = (k_i + 1 − g)[1 − (1 − p(u_i) u(i, i))^{k_i − g} (1 − x q(u_i) u(i, i))^{z_i}]
               + (g − 1)[1 − (1 − p(u_i) u(i, i))^{k_i + 1 − g} (1 − x q(u_i) u(i, i))^{z_i − 1}].   (44)
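The backup-circuit quantities for the case where the parent inlet is good can be sketched as follows, assuming the reconstructed forms of equations (35)-(37). The inputs p_u = p(u_i), the reference-distribution fractions u_ij = u(i, j) and the module counts m_j all come from the fault-free model and are assumed here.

```python
# Rough sketch of equations (35)-(37): g faulty inlets divert their
# references to the (k_i + 1) x z_i backup crossbar.

def backup_load(p_u, u_ij, m, i, L, g):
    # equation (35): average number of distinct references offered to
    # the backup circuit, summed over the reference levels j = i..L-1
    return sum(m[j] * (1.0 - (1.0 - p_u * u_ij[(i, j)] / m[j]) ** g)
               for j in range(i, L))

def backup_rate(d_b, z_i):
    # equation (36): reference rate on one backup-crossbar output line,
    # capped at one reference per cycle
    return min(d_b / z_i, 1.0)

def parent_outlet_rate(p_u, q_u, u_ii, k_i, g, z_i):
    # equation (37): probability that the parent outlet of the faulty
    # IN is accessed by at least one out-of-cluster reference, arriving
    # either from a good child inlet or from a backup-crossbar output
    out = 1.0 - u_ii        # fraction of references leaving the cluster
    return 1.0 - (1.0 - p_u * out) ** (k_i - g) * (1.0 - q_u * out) ** z_i
```

The same three steps, with the parent reference p(d_{i+1}) added to the load and the in-cluster fraction x applied to q(u_i), give the second case of the model.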

The probability that an out-of-cluster reference (the reference which comes from the parent IN) will pass through the (k_i + 1) × z_i crossbar of the backup circuit is

    p′(d_{i+1}) = q(u_i) z_i p(d_{i+1})/d_b(i).   (45)

Let us use the term c(i+1, g), as shown below, to indicate the degradation due to the fact that references from g inlets of an ith-level IN cannot move through the main crossbar and one of these g inlets is the parent inlet:

    c(i + 1, g) = 1 − p′(d_{i+1})/p(d_{i+1}).   (46)

The effect of this degradation is similar to that of a faulty (i+1)th-level child outlet of the multiple-link-based HIN. Thus, the total loss in bandwidth can be expressed as

    (BW/C_i)[1 − min{p′(u_{i+1})/p(u_{i+1}), d′(d_i, i)/d(d_i, i)}] + (L − i − 1) BW c(i + 1, g)/C_i.

Hence, the total bandwidth of the HIN with one ith-level faulty IN is

    BW′ = BW [1 − (1/C_i)(1 − min{p′(u_{i+1})/p(u_{i+1}), d′(d_i, i)/d(d_i, i)} + (L − i − 1) c(i + 1, g))].   (47)

4. NUMERICAL RESULTS AND DISCUSSIONS

We have analyzed a number of four-level hierarchical systems. In our analysis, the memory references of a processor were distributed among the memory modules in such a way that the INs at no particular level became the bottleneck; that is, we tried to have the same utilization for all the INs. Note that when the INs at a particular level become the bottleneck, the performance of a hierarchical system degrades severely, because most of the processors will be blocked waiting for the memory modules at the corresponding level.

We have developed simulation models to verify the accuracy of the analytical models presented in this paper. A simulation program was written to simulate the synchronous behavior of the hierarchical multiprocessors. Queues were maintained in the simulation model in order to keep track of the blocked processors. The simulation program was driven by a linear congruential random number generator.
In order to determine the bandwidth of a particular system, with a 95% confidence interval, for a given set of parameters, the program was run ten times with different seeds, and each run lasted 400,000 memory cycles. The first 50,000 memory cycles of each run were ignored in order to avoid the initial transients.

Results for the multiple-link-based system

In order to have the same utilization for all the INs of a multiple-link-based system, we determined the values of s_i (0 ≤ i ≤ L−1) as follows:

    s_0 = m_0/(m_0 + a_1)   (48)
    w_1 = a_1/(m_0 + a_1)   (49)
    s_i = w_i k_i b_i/(k_i b_i + a_{i+1}), for 1 ≤ i ≤ L−2   (50)
    w_{i+1} = w_i a_{i+1}/(k_i b_i + a_{i+1}), for 1 ≤ i ≤ L−2   (51)
    s_{L−1} = w_{L−1}.   (52)

Since the bandwidth available from the parent outlet of an ith-level IN is the same as that from the parent inlet of the ith-level IN, for a real system we can assume that a_i = b_i (1 ≤ i ≤ L−1).

Table 1 shows the parameters of five different four-level HINs and their bandwidths under fault-free conditions. For all these systems a_i = b_i = 2 (1 ≤ i ≤ L−1). Both the analytical and the simulation results shown in Table 1 were obtained for ψ = 1. The simulation results are shown with a 95% confidence interval. From Table 1 it is seen that, under fault-free conditions, the results from the analytical model are very close to those from the simulation model: the error column of Table 1 shows that the error in the analytical results is under 5%.

Each of the above five systems was analyzed under six different types of faulty conditions, and we assumed that a faulty outlet has only one faulty link. Table 2 shows the types of faults that were investigated for these five HINs. Tables 3 and 4 show the performance of the systems HIN1-HIN5 in the presence of the different types of faulty outlets. These tables also show that the results from the analytical models are very close to those from the simulation models; for most of the cases, the analytical results are within 5% of the simulation results.
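The reference split of equations (48)-(52) can be sketched as below. A useful sanity check, implied by the definitions (s_0 + w_1 = 1 and w_i = s_i + w_{i+1}), is that the fractions telescope to 1.

```python
# Sketch of equations (48)-(52): splitting a processor's references
# across the L levels so that every IN sees the same utilization.
# m0 is the local module count; k, a, b map level i to k_i, a_i, b_i.

def reference_split(m0, k, a, b, L):
    s = [m0 / (m0 + a[1])]                      # equation (48)
    w = a[1] / (m0 + a[1])                      # equation (49)
    for i in range(1, L - 1):
        denom = k[i] * b[i] + a[i + 1]
        s.append(w * k[i] * b[i] / denom)       # equation (50)
        w = w * a[i + 1] / denom                # equation (51)
    s.append(w)                                 # equation (52)
    return s
```

For example, with L = 4, m_0 = 4 and k_i = a_i = b_i = 2 this yields s = (2/3, 2/9, 2/27, 1/27), which sums to 1.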
Comparing the results of Table 1 with those of Tables 3 and 4, we see that the performance of a system degrades in the presence of faults. The degradation depends on the position of the faults as well as on the number of faults. The performance degradation of the different hierarchical systems in the presence of different types of faulty parent outlets is summarized in Table 5. This table shows that the performance of a system is more sensitive to the position of a fault than to the number of faults. For example, the degradation due to one third-level faulty parent outlet (fault F1) is greater than that due to two faulty parent outlets at the first and second levels (fault F2).

Since the performance degradation of a HIN with a large number of processors is very significant when the highest-level outlets become faulty, we further analyzed a HIN with a large number of processors and with the faults at the highest level, in order to determine the accuracy of our analytical model at this high degradation. The parameters of the HIN which we investigated are k_1 = k_2 = k_3 = 8.

TABLE 1. Some HINs and their bandwidths under fault-free conditions (ψ = 1). Columns: parameters of the HINs (a_i = b_i = 2 for 1 ≤ i ≤ L−1): n_0 = m_0, k_1, k_2, k_3, N = M; bandwidth of the fault-free HINs: analytical, simulation (± 95% confidence interval) and error in %, for the systems HIN1-HIN5. (Numerical entries not preserved in this transcription.)

TABLE 2. Types of faults investigated for the systems HIN1 through HIN5 (note that a faulty outlet has only one faulty link).
    F1: One third-level parent outlet is faulty.
    F2: Two parent outlets (a first-level and a second-level outlet) are faulty. Neither faulty outlet is the ancestor/descendant of the other.
    F3: Three parent outlets (a first-level, a second-level and a third-level outlet) are faulty. No faulty outlet is the ancestor/descendant of the other faulty outlets.
    F4: One third-level child outlet is faulty.
    F5: Two child outlets (a first-level and a second-level outlet) are faulty. Neither faulty outlet is the ancestor/descendant of the other.
    F6: Three child outlets (a first-level, a second-level and a third-level outlet) are faulty. No faulty outlet is the ancestor/descendant of the other faulty outlets.

TABLE 3. Bandwidths of the systems HIN1-HIN5 in the presence of faults F1 and F2 (ψ = 1): analytical and simulation results with error in %. (Numerical entries not preserved in this transcription.)

TABLE 4. Bandwidths of the systems HIN1-HIN5 in the presence of faults F3-F6 (ψ = 1). (Numerical entries not preserved in this transcription.)

TABLE 5. Loss in bandwidth (in %) of the systems HIN1-HIN5 under the different types of faults (ψ = 1). (Numerical entries not preserved in this transcription.)

FIGURE 3. Bandwidth versus the number of processors of a HIN in the presence of different types of faulty outlets (k_1 = k_2 = k_3 = 8 and a_i = b_i = 2 for 1 ≤ i ≤ 3).

TABLE 6. Types of faults investigated for the HIN with k_1 = k_2 = k_3 = 8.
    S1: Fault-free HIN.
    S2: Four highest-level parent outlets are faulty.
    S3: Four highest-level child outlets are faulty.
    S4: Two highest-level parent outlets and two highest-level child outlets are faulty.

TABLE 7. Types of faults investigated for the HINs with fault-tolerant INs.
    X1: Fault-free HIN.
    X2: A second-level crossbar has four faulty inlets.
    X3: A third-level crossbar has four faulty inlets.
    X4: A second-level crossbar has four faulty inlets and one of the faulty inlets is the parent inlet.

The HIN was analyzed under four different conditions, as shown in Table 6. The bandwidth of the HIN was determined for ψ = 1, and the number of processors in the system was varied, starting from 1024. The results from the simulation model are shown in Figure 3. This figure shows that for all three types of faults (S2, S3 and S4) the degradation is almost the same. The system saturates after 3072 processors. At the saturation point, the degradation due to each of the three types of faults is about 20%, which is significant. Since the degradation due to all three types of faults is the same, we can conclude that the degradation due to a faulty parent outlet and that due to a faulty child outlet are the same, as long as the fault occurs at the same level and a_i = b_i for all i.
Since in practice it is unlikely that a given system will have many faults at the same time, and since the simulation is very time consuming, we did not try to determine how accurate our analytical model is for a system with a very large number of simultaneous faults.

Results for the HINs with fault-tolerant INs

Even for the HINs with fault-tolerant INs, we determined the values of s_i using (48)-(52). The parameters of the HINs which we investigated are the same as those shown in Table 1, except for the values of a_i and b_i for 1 ≤ i ≤ 3; note that for the HINs with fault-tolerant INs a_i = b_i = 1 for 1 ≤ i ≤ 3. We assumed that for all the fault-tolerant INs z_i = 1 for 1 ≤ i ≤ 3. This means that the maximum number of references which can move through the backup circuit of any IN is one. Since the parameters k_1, k_2 and k_3 of the HINs with fault-tolerant INs which we investigated are the same as those of the HINs shown in Table 1, we continue to use the notation HIN1, HIN2, HIN3, ..., HIN5 to denote these HINs. Each of these five HINs has been analyzed under the four conditions shown in Table 7. Figure 4 shows the bandwidth of the HINs under the different conditions. This figure also shows that the degradation in


More information

Course Snapshot. The Next Few Classes

Course Snapshot. The Next Few Classes Course Snapshot We have covered all the fundamental OS components: Architecture and OS interactions Processes and threads Synchronization and deadlock Process scheduling Memory management File systems

More information

Chapter 9 Multiprocessors

Chapter 9 Multiprocessors ECE200 Computer Organization Chapter 9 Multiprocessors David H. lbonesi and the University of Rochester Henk Corporaal, TU Eindhoven, Netherlands Jari Nurmi, Tampere University of Technology, Finland University

More information

Removing Belady s Anomaly from Caches with Prefetch Data

Removing Belady s Anomaly from Caches with Prefetch Data Removing Belady s Anomaly from Caches with Prefetch Data Elizabeth Varki University of New Hampshire varki@cs.unh.edu Abstract Belady s anomaly occurs when a small cache gets more hits than a larger cache,

More information

Max-Flow Protection using Network Coding

Max-Flow Protection using Network Coding Max-Flow Protection using Network Coding Osameh M. Al-Kofahi Department of Computer Engineering Yarmouk University, Irbid, Jordan Ahmed E. Kamal Department of Electrical and Computer Engineering Iowa State

More information

FB(9,3) Figure 1(a). A 4-by-4 Benes network. Figure 1(b). An FB(4, 2) network. Figure 2. An FB(27, 3) network

FB(9,3) Figure 1(a). A 4-by-4 Benes network. Figure 1(b). An FB(4, 2) network. Figure 2. An FB(27, 3) network Congestion-free Routing of Streaming Multimedia Content in BMIN-based Parallel Systems Harish Sethu Department of Electrical and Computer Engineering Drexel University Philadelphia, PA 19104, USA sethu@ece.drexel.edu

More information

A Path Decomposition Approach for Computing Blocking Probabilities in Wavelength-Routing Networks

A Path Decomposition Approach for Computing Blocking Probabilities in Wavelength-Routing Networks IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 8, NO. 6, DECEMBER 2000 747 A Path Decomposition Approach for Computing Blocking Probabilities in Wavelength-Routing Networks Yuhong Zhu, George N. Rouskas, Member,

More information

Caching video contents in IPTV systems with hierarchical architecture

Caching video contents in IPTV systems with hierarchical architecture Caching video contents in IPTV systems with hierarchical architecture Lydia Chen 1, Michela Meo 2 and Alessandra Scicchitano 1 1. IBM Zurich Research Lab email: {yic,als}@zurich.ibm.com 2. Politecnico

More information

Dr e v prasad Dt

Dr e v prasad Dt Dr e v prasad Dt. 12.10.17 Contents Characteristics of Multiprocessors Interconnection Structures Inter Processor Arbitration Inter Processor communication and synchronization Cache Coherence Introduction

More information

A Distributed Formation of Orthogonal Convex Polygons in Mesh-Connected Multicomputers

A Distributed Formation of Orthogonal Convex Polygons in Mesh-Connected Multicomputers A Distributed Formation of Orthogonal Convex Polygons in Mesh-Connected Multicomputers Jie Wu Department of Computer Science and Engineering Florida Atlantic University Boca Raton, FL 3343 Abstract The

More information

Mathematical and Algorithmic Foundations Linear Programming and Matchings

Mathematical and Algorithmic Foundations Linear Programming and Matchings Adavnced Algorithms Lectures Mathematical and Algorithmic Foundations Linear Programming and Matchings Paul G. Spirakis Department of Computer Science University of Patras and Liverpool Paul G. Spirakis

More information

VLSI Test Technology and Reliability (ET4076)

VLSI Test Technology and Reliability (ET4076) VLSI Test Technology and Reliability (ET4076) Lecture 4(part 2) Testability Measurements (Chapter 6) Said Hamdioui Computer Engineering Lab Delft University of Technology 2009-2010 1 Previous lecture What

More information

Dalimir Orfanus (IFI UiO + ABB CRC), , Cyber Physical Systems Clustering in Wireless Sensor Networks 2 nd part : Examples

Dalimir Orfanus (IFI UiO + ABB CRC), , Cyber Physical Systems Clustering in Wireless Sensor Networks 2 nd part : Examples Dalimir Orfanus (IFI UiO + ABB CRC), 27.10.2011, Cyber Physical Systems Clustering in Wireless Sensor Networks 2 nd part : Examples Clustering in Wireless Sensor Networks Agenda LEACH Energy efficient

More information

3 No-Wait Job Shops with Variable Processing Times

3 No-Wait Job Shops with Variable Processing Times 3 No-Wait Job Shops with Variable Processing Times In this chapter we assume that, on top of the classical no-wait job shop setting, we are given a set of processing times for each operation. We may select

More information

Lecture 19. Lecturer: Aleksander Mądry Scribes: Chidambaram Annamalai and Carsten Moldenhauer

Lecture 19. Lecturer: Aleksander Mądry Scribes: Chidambaram Annamalai and Carsten Moldenhauer CS-621 Theory Gems November 21, 2012 Lecture 19 Lecturer: Aleksander Mądry Scribes: Chidambaram Annamalai and Carsten Moldenhauer 1 Introduction We continue our exploration of streaming algorithms. First,

More information

Non-Uniform Memory Access (NUMA) Architecture and Multicomputers

Non-Uniform Memory Access (NUMA) Architecture and Multicomputers Non-Uniform Memory Access (NUMA) Architecture and Multicomputers Parallel and Distributed Computing Department of Computer Science and Engineering (DEI) Instituto Superior Técnico September 26, 2011 CPD

More information

Introduction to Multiprocessors (Part I) Prof. Cristina Silvano Politecnico di Milano

Introduction to Multiprocessors (Part I) Prof. Cristina Silvano Politecnico di Milano Introduction to Multiprocessors (Part I) Prof. Cristina Silvano Politecnico di Milano Outline Key issues to design multiprocessors Interconnection network Centralized shared-memory architectures Distributed

More information

Cache Coherence. Todd C. Mowry CS 740 November 10, Topics. The Cache Coherence Problem Snoopy Protocols Directory Protocols

Cache Coherence. Todd C. Mowry CS 740 November 10, Topics. The Cache Coherence Problem Snoopy Protocols Directory Protocols Cache Coherence Todd C. Mowry CS 740 November 10, 1998 Topics The Cache Coherence roblem Snoopy rotocols Directory rotocols The Cache Coherence roblem Caches are critical to modern high-speed processors

More information

Determining the Number of CPUs for Query Processing

Determining the Number of CPUs for Query Processing Determining the Number of CPUs for Query Processing Fatemah Panahi Elizabeth Soechting CS747 Advanced Computer Systems Analysis Techniques The University of Wisconsin-Madison fatemeh@cs.wisc.edu, eas@cs.wisc.edu

More information

The Google File System

The Google File System The Google File System Sanjay Ghemawat, Howard Gobioff and Shun Tak Leung Google* Shivesh Kumar Sharma fl4164@wayne.edu Fall 2015 004395771 Overview Google file system is a scalable distributed file system

More information

A Survey of Techniques for Power Aware On-Chip Networks.

A Survey of Techniques for Power Aware On-Chip Networks. A Survey of Techniques for Power Aware On-Chip Networks. Samir Chopra Ji Young Park May 2, 2005 1. Introduction On-chip networks have been proposed as a solution for challenges from process technology

More information

On Pairwise Connectivity of Wireless Multihop Networks

On Pairwise Connectivity of Wireless Multihop Networks On Pairwise Connectivity of Wireless Multihop Networks Fangting Sun and Mark Shayman Department of Electrical and Computer Engineering University of Maryland, College Park, MD 2742 {ftsun, shayman}@eng.umd.edu

More information

Tutorial. Question There are no forward edges. 4. For each back edge u, v, we have 0 d[v] d[u].

Tutorial. Question There are no forward edges. 4. For each back edge u, v, we have 0 d[v] d[u]. Tutorial Question 1 A depth-first forest classifies the edges of a graph into tree, back, forward, and cross edges. A breadth-first tree can also be used to classify the edges reachable from the source

More information

Data Mining with Oracle 10g using Clustering and Classification Algorithms Nhamo Mdzingwa September 25, 2005

Data Mining with Oracle 10g using Clustering and Classification Algorithms Nhamo Mdzingwa September 25, 2005 Data Mining with Oracle 10g using Clustering and Classification Algorithms Nhamo Mdzingwa September 25, 2005 Abstract Deciding on which algorithm to use, in terms of which is the most effective and accurate

More information

Fault-tolerant Distributed-Shared-Memory on a Broadcast-based Interconnection Network

Fault-tolerant Distributed-Shared-Memory on a Broadcast-based Interconnection Network Fault-tolerant Distributed-Shared-Memory on a Broadcast-based Interconnection Network Diana Hecht 1 and Constantine Katsinis 2 1 Electrical and Computer Engineering, University of Alabama in Huntsville,

More information

Warm-up as you walk in

Warm-up as you walk in arm-up as you walk in Given these N=10 observations of the world: hat is the approximate value for P c a, +b? A. 1/10 B. 5/10. 1/4 D. 1/5 E. I m not sure a, b, +c +a, b, +c a, b, +c a, +b, +c +a, b, +c

More information

Eulerian disjoint paths problem in grid graphs is NP-complete

Eulerian disjoint paths problem in grid graphs is NP-complete Discrete Applied Mathematics 143 (2004) 336 341 Notes Eulerian disjoint paths problem in grid graphs is NP-complete Daniel Marx www.elsevier.com/locate/dam Department of Computer Science and Information

More information

Unsupervised Learning. Presenter: Anil Sharma, PhD Scholar, IIIT-Delhi

Unsupervised Learning. Presenter: Anil Sharma, PhD Scholar, IIIT-Delhi Unsupervised Learning Presenter: Anil Sharma, PhD Scholar, IIIT-Delhi Content Motivation Introduction Applications Types of clustering Clustering criterion functions Distance functions Normalization Which

More information

Replication in Mirrored Disk Systems

Replication in Mirrored Disk Systems Replication in Mirrored Disk Systems Athena Vakali and Yannis Manolopoulos Department of Informatics, Aristotle University 54006 Thessaloniki, Greece {avakali,manolopo}@athena.auth.gr Abstract. In this

More information

CHAPTER 1 INTRODUCTION

CHAPTER 1 INTRODUCTION CHAPTER 1 INTRODUCTION Rapid advances in integrated circuit technology have made it possible to fabricate digital circuits with large number of devices on a single chip. The advantages of integrated circuits

More information

CERIAS Tech Report Autonomous Transaction Processing Using Data Dependency in Mobile Environments by I Chung, B Bhargava, M Mahoui, L Lilien

CERIAS Tech Report Autonomous Transaction Processing Using Data Dependency in Mobile Environments by I Chung, B Bhargava, M Mahoui, L Lilien CERIAS Tech Report 2003-56 Autonomous Transaction Processing Using Data Dependency in Mobile Environments by I Chung, B Bhargava, M Mahoui, L Lilien Center for Education and Research Information Assurance

More information

Operating Systems. Week 9 Recitation: Exam 2 Preview Review of Exam 2, Spring Paul Krzyzanowski. Rutgers University.

Operating Systems. Week 9 Recitation: Exam 2 Preview Review of Exam 2, Spring Paul Krzyzanowski. Rutgers University. Operating Systems Week 9 Recitation: Exam 2 Preview Review of Exam 2, Spring 2014 Paul Krzyzanowski Rutgers University Spring 2015 March 27, 2015 2015 Paul Krzyzanowski 1 Exam 2 2012 Question 2a One of

More information

CS 204 Lecture Notes on Elementary Network Analysis

CS 204 Lecture Notes on Elementary Network Analysis CS 204 Lecture Notes on Elementary Network Analysis Mart Molle Department of Computer Science and Engineering University of California, Riverside CA 92521 mart@cs.ucr.edu October 18, 2006 1 First-Order

More information

Seminar on Algorithms and Data Structures: Multiple-Edge-Fault-Tolerant Approximate Shortest-Path Trees [1]

Seminar on Algorithms and Data Structures: Multiple-Edge-Fault-Tolerant Approximate Shortest-Path Trees [1] Seminar on Algorithms and Data Structures: Multiple-Edge-Fault-Tolerant Approximate Shortest-Path Trees [1] Philipp Rimle April 14, 2018 1 Introduction Given a graph with positively real-weighted undirected

More information

Byzantine Consensus in Directed Graphs

Byzantine Consensus in Directed Graphs Byzantine Consensus in Directed Graphs Lewis Tseng 1,3, and Nitin Vaidya 2,3 1 Department of Computer Science, 2 Department of Electrical and Computer Engineering, and 3 Coordinated Science Laboratory

More information

Zonal based Deterministic Energy Efficient Clustering Protocol for WSNs

Zonal based Deterministic Energy Efficient Clustering Protocol for WSNs Zonal based Deterministic Energy Efficient Clustering Protocol for WSNs Prabhleen Kaur Punjab Institute of Technology, Kapurthala (PTU Main Campus), Punjab India ABSTRACT Wireless Sensor Network has gained

More information

An Algorithm for k-pairwise Cluster-fault-tolerant Disjoint Paths in a Burnt Pancake Graph

An Algorithm for k-pairwise Cluster-fault-tolerant Disjoint Paths in a Burnt Pancake Graph 2015 International Conference on Computational Science and Computational Intelligence An Algorithm for k-pairwise Cluster-fault-tolerant Disjoint Paths in a Burnt Pancake Graph Masato Tokuda, Yuki Hirai,

More information

Binning of Devices with X-IDDQ

Binning of Devices with X-IDDQ Binning of Devices with X-IDDQ Prasanna M. Ramakrishna Masters Thesis Graduate Committee Dr. Anura P. Jayasumana Adviser Dr. Yashwant K. Malaiya Co-Adviser Dr. Steven C. Reising Member Dept. of Electrical

More information

Multiprocessors & Thread Level Parallelism

Multiprocessors & Thread Level Parallelism Multiprocessors & Thread Level Parallelism COE 403 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals Presentation Outline Introduction

More information

Metodologie di progetto HW Il test di circuiti digitali

Metodologie di progetto HW Il test di circuiti digitali Metodologie di progetto HW Il test di circuiti digitali Introduzione Versione del 9/4/8 Metodologie di progetto HW Il test di circuiti digitali Introduction VLSI Realization Process Customer s need Determine

More information

Advanced Topics UNIT 2 PERFORMANCE EVALUATIONS

Advanced Topics UNIT 2 PERFORMANCE EVALUATIONS Advanced Topics UNIT 2 PERFORMANCE EVALUATIONS Structure Page Nos. 2.0 Introduction 4 2. Objectives 5 2.2 Metrics for Performance Evaluation 5 2.2. Running Time 2.2.2 Speed Up 2.2.3 Efficiency 2.3 Factors

More information

Distributed Systems. Lecture 4 Othon Michail COMP 212 1/27

Distributed Systems. Lecture 4 Othon Michail COMP 212 1/27 Distributed Systems COMP 212 Lecture 4 Othon Michail 1/27 What is a Distributed System? A distributed system is: A collection of independent computers that appears to its users as a single coherent system

More information

Non-Uniform Memory Access (NUMA) Architecture and Multicomputers

Non-Uniform Memory Access (NUMA) Architecture and Multicomputers Non-Uniform Memory Access (NUMA) Architecture and Multicomputers Parallel and Distributed Computing MSc in Information Systems and Computer Engineering DEA in Computational Engineering Department of Computer

More information

Characteristics of Mult l ip i ro r ce c ssors r

Characteristics of Mult l ip i ro r ce c ssors r Characteristics of Multiprocessors A multiprocessor system is an interconnection of two or more CPUs with memory and input output equipment. The term processor in multiprocessor can mean either a central

More information

Scheduling Periodic and Aperiodic. John P. Lehoczky and Sandra R. Thuel. and both hard and soft deadline aperiodic tasks using xed-priority methods.

Scheduling Periodic and Aperiodic. John P. Lehoczky and Sandra R. Thuel. and both hard and soft deadline aperiodic tasks using xed-priority methods. Chapter 8 Scheduling Periodic and Aperiodic Tasks Using the Slack Stealing Algorithm John P. Lehoczky and Sandra R. Thuel This chapter discusses the problem of jointly scheduling hard deadline periodic

More information

Metodologie di progetto HW Il test di circuiti digitali

Metodologie di progetto HW Il test di circuiti digitali Metodologie di progetto HW Il test di circuiti digitali Introduzione Versione del 9/4/8 Metodologie di progetto HW Il test di circuiti digitali Introduction Pag. 2 VLSI Realization Process Customer s need

More information

Dynamic Wavelength Assignment for WDM All-Optical Tree Networks

Dynamic Wavelength Assignment for WDM All-Optical Tree Networks Dynamic Wavelength Assignment for WDM All-Optical Tree Networks Poompat Saengudomlert, Eytan H. Modiano, and Robert G. Gallager Laboratory for Information and Decision Systems Massachusetts Institute of

More information

Probabilistic Worst-Case Response-Time Analysis for the Controller Area Network

Probabilistic Worst-Case Response-Time Analysis for the Controller Area Network Probabilistic Worst-Case Response-Time Analysis for the Controller Area Network Thomas Nolte, Hans Hansson, and Christer Norström Mälardalen Real-Time Research Centre Department of Computer Engineering

More information

ERROR RECOVERY IN MULTICOMPUTERS USING GLOBAL CHECKPOINTS

ERROR RECOVERY IN MULTICOMPUTERS USING GLOBAL CHECKPOINTS Proceedings of the 13th International Conference on Parallel Processing, Bellaire, Michigan, pp. 32-41, August 1984. ERROR RECOVERY I MULTICOMPUTERS USIG GLOBAL CHECKPOITS Yuval Tamir and Carlo H. Séquin

More information

Algorithms: Lecture 10. Chalmers University of Technology

Algorithms: Lecture 10. Chalmers University of Technology Algorithms: Lecture 10 Chalmers University of Technology Today s Topics Basic Definitions Path, Cycle, Tree, Connectivity, etc. Graph Traversal Depth First Search Breadth First Search Testing Bipartatiness

More information

RED behavior with different packet sizes

RED behavior with different packet sizes RED behavior with different packet sizes Stefaan De Cnodder, Omar Elloumi *, Kenny Pauwels Traffic and Routing Technologies project Alcatel Corporate Research Center, Francis Wellesplein, 1-18 Antwerp,

More information

Two-Stage Fault-Tolerant k-ary Tree Multiprocessors

Two-Stage Fault-Tolerant k-ary Tree Multiprocessors Two-Stage Fault-Tolerant k-ary Tree Multiprocessors Baback A. Izadi Department of Electrical and Computer Engineering State University of New York 75 South Manheim Blvd. New Paltz, NY 1561 U.S.A. bai@engr.newpaltz.edu

More information

Parallel Computer Architecture Spring Shared Memory Multiprocessors Memory Coherence

Parallel Computer Architecture Spring Shared Memory Multiprocessors Memory Coherence Parallel Computer Architecture Spring 2018 Shared Memory Multiprocessors Memory Coherence Nikos Bellas Computer and Communications Engineering Department University of Thessaly Parallel Computer Architecture

More information