Chapter 1 Introduction


The advent of synthesis systems for Very Large Scale Integrated Circuits (VLSI) and automated design environments for Application Specific Integrated Circuits (ASICs) has allowed digital systems designers to place large numbers of gates on a single IC in record time. Generation of test patterns for these circuits to ensure that they are fault-free, however, still consumes considerable time. Currently, up to one third of the design time for ASICs is spent generating tests [1]. Many algorithms have been developed to automate the test generation process [2],[3],[4], but the test generation problem has been shown to be NP-complete [5]. This thesis deals with the application of parallel processing techniques to Automatic Test Pattern Generation (ATPG) to address this problem.

1.1 Motivation

There are two basic approaches to solving the Automatic Test Pattern Generation (ATPG) problem: algorithmic test pattern generation, and statistical or pseudorandom test pattern generation. In the algorithmic approach, a test is generated for each fault in the circuit using a specific ATPG algorithm. Most of these algorithms can be proven to be complete; that is, they are guaranteed to find a test for a fault if a test exists. However, this process may involve a search of the entire solution space, which is computationally expensive. Statistical or pseudorandom test pattern generation, on the other hand, selects test patterns at random, or using some heuristic, and determines the faults that are detected by these patterns using fault simulation. Test patterns are selected and added to the test set if they detect any previously undetected faults. This process continues until some required fault coverage or computation time limit is reached. This method finds tests for the easy-to-detect faults very quickly, but becomes less and less efficient as the easy-to-detect faults are removed from the fault list and only the hard-to-detect faults are left. In many cases, the required fault coverage cannot be achieved without excessive computation times.

An efficient combined method for solving the ATPG problem uses statistical methods to find tests for the easy-to-detect faults on the fault list and switches to an algorithmic method to find tests for the hard-to-detect faults which remain. Using either this method or the purely algorithmic method, a significant portion of the computation time will be spent generating tests for the hard-to-detect faults algorithmically. Therefore, finding a method to speed up this process should reduce the overall computation time considerably. Much research has been done on increasing the efficiency of algorithms for ATPG through heuristics [6],[7],[8]. However, the overall gains that can be achieved through these improvements are limited and will not be adequate for future needs. This statement can be justified by two facts. First, no system currently presented in the literature has been proven on circuits that contain combinational logic blocks larger than 3 or 4 thousand gates. Second, most sequential ATPG techniques are based on combinational ATPG algorithms [9],[10]. These systems require that multiple passes be made through the ATPG process in order to generate a test for a single fault. Therefore, any excessive runtimes will be multiplied by this process, and achieving fast combinational ATPG becomes even more important.

An alternative to heuristics for reducing computation times is to use parallel processing techniques. Parallel processing machines are becoming available for general use and are being used to solve other problems in Computer Aided Design [11]. Most of these readily available parallel processors are distributed memory machines due to cost and scalability factors.
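The combined statistical-then-algorithmic flow described above can be sketched in a few lines. This is a toy driver under stated assumptions: `fault_simulate` and `deterministic_tpg` are hypothetical callables standing in for a real fault simulator and a real algorithmic test generator, and faults are opaque identifiers.

```python
import random

def combined_atpg(faults, fault_simulate, deterministic_tpg,
                  coverage_goal=0.95, max_random_patterns=1000, n_inputs=8):
    """Toy sketch of the combined statistical/algorithmic ATPG flow."""
    undetected = set(faults)
    test_set = []
    total = len(faults)
    # Phase 1: pseudorandom patterns pick off the easy-to-detect faults.
    for _ in range(max_random_patterns):
        if 1 - len(undetected) / total >= coverage_goal:
            break
        pattern = tuple(random.randint(0, 1) for _ in range(n_inputs))
        newly = fault_simulate(pattern, undetected)
        if newly:                       # keep a pattern only if it detects
            test_set.append(pattern)    # previously undetected faults
            undetected -= newly
    # Phase 2: algorithmic TPG targets the remaining hard-to-detect faults.
    for fault in list(undetected):
        test = deterministic_tpg(fault)
        if test is not None:
            test_set.append(test)
            undetected.discard(fault)
    return test_set, undetected
```

The efficiency argument in the text shows up directly in this loop: as `undetected` shrinks, each random pattern detects fewer new faults, so phase 2 dominates the runtime.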
Operating systems are also being developed that allow simple networks of workstations to be used as distributed memory parallel computing environments [12]. Previous efforts to parallelize the ATPG problem can be placed in one of five categories: fault partitioning, heuristic parallelization, search space partitioning, algorithmic partitioning, and topological partitioning [13],[14]. These techniques, which will be more fully detailed in Chapter 2, usually require each processing node in a distributed memory system to contain the entire circuit description. However, the increasing size of VLSI circuits has caused the amount of memory required to process these circuits to grow rapidly. Topological partitioning techniques can be used to distribute the circuit database across several processors, thereby increasing the size of the largest circuit that can be processed on a given distributed memory configuration. For example, results from the EST system [15] have shown that the memory requirements for processing the ISCAS 85 [16] benchmark circuit C7552 can be over 9 MBytes. This circuit contains only 3512 gates, which is relatively small compared to state-of-the-art VLSI devices. At this rate, a circuit of only 10,000 gates could take as much as 25 MBytes of memory to process. Typical commercially available distributed memory multicomputers must be able to take advantage of their entire memory space across all nodes to process circuits as large as, or larger than, this. Thus, topological partitioning of the database across several processors will be required to perform ATPG on these larger circuits.

Previous research in topological partitioning for ATPG [17],[18] has focused on the D-algorithm [2]. The initial effort contained in [17] was directed toward a shared memory parallel processor; hence, the parallelism exploited was fairly fine-grained. The results of the effort to port this system to a distributed memory multicomputer were mixed [18]. Some speedup was obtained, but the large number of messages required even for simple circuits significantly limited the speedup. The parallelism exploited in these two systems was limited to a parallelized implication procedure and fault partitioning, which was used to keep idle processors busy on other faults. The fact that the D-algorithm, which has been shown to be very inefficient for some classes of circuits, was used in these systems also increased the overall runtimes and limited the speedup possible.
One of the most promising results presented in [18], however, was that topological partitioning resulted in significant reductions in the memory required in each processing node. For these reasons, research into topological partitioning with a more efficient ATPG algorithm such as PODEM [3] was undertaken.

1.2 Goals

This research focused on a system that is based on topological partitioning of the circuit-under-test across several processing nodes. The goals of this research included expanding on previous work [17],[18] and extending it to a more efficient base ATPG algorithm. Analytical models of the topologically partitioned ATPG process were developed to help predict the performance that could be expected. These models, once validated through experimentation, were then used to predict the performance of the ATPG system. The model was then used to determine the communications latency required on a multicomputer to efficiently utilize this technique to achieve speedups. Another goal of this research was to develop parallelizations of the base ATPG algorithm to increase speedup. Investigations of how these parallelization methods could be used in conjunction with other parallelization methods, such as fault or search space partitioning, were also undertaken. Finally, this research outlined the additional work that will be required to make topological partitioning a valid addition to ATPG systems for large scale designs.

1.3 Organization

This dissertation is divided into seven chapters, including this introduction. Chapter 2 contains background material which includes a brief review of serial and parallel ATPG algorithms. A discussion of the ES-KIT distributed memory multicomputer and the ES-TGS parallel ATPG system used in this research is also included in Chapter 2. Chapter 3 describes the implementation details and results of the serial Topological Partitioning Test Generation System (TOP-TGS) developed for this research. Chapter 4 details the analytical model of the serial ATPG process and topologically partitioned ATPG. Predicted results are also developed in this chapter and compared to the actual results presented in Chapter 3. Chapter 5 details the algorithmic parallelizations developed for the TOP-TGS system and presents their results.
Chapter 6 presents the results of using multiple parallelizations in the TOP-TGS system. Finally, Chapter 7 presents conclusions and future work. The future work section includes a discussion of how many of the heuristics presented in the literature could be implemented in a topologically partitioned ATPG system.

Chapter 2 Background

This chapter presents the background material for the thesis. A brief presentation of serial ATPG algorithms is included to familiarize the reader with the ATPG problem. Next, a discussion of the techniques available to parallelize ATPG is presented. Finally, the distributed memory multicomputer, ES-KIT, and the parallel test generation system, ES-TGS, that this work was based upon are presented.

2.1 Serial ATPG

Most parallel ATPG algorithms, including the ones to be presented here, are based upon widely known serial ATPG algorithms. For a detailed discussion of ATPG algorithms, the interested reader is referred to [19] and [20]. For this research, we will consider only algorithms designed to generate tests for single stuck-at faults. These are physical faults that cause a node in the circuit to behave as if it were stuck at a logic 0 or a logic 1 level. The single stuck-at fault model is a simplification of the types of faults found in real circuits, but empirical evidence shows that for most common implementation technologies, it provides very high coverage of physical faults [21].

Automatic Test Pattern Generation can be thought of as the process of searching through the entire space of possible input patterns for a circuit in an attempt to find one which causes the output to differ depending on whether or not the circuit contains a specific fault. The size of the search space is 2^n, where n is the number of inputs to the circuit. Because the search space is so large, many techniques have been developed to guide the search process. Most of the search techniques in popular use today fall into the class of algorithms called path runners. Path runners attempt to detect a fault by sensitizing it, and then sensitizing a path between the faulty node and a primary output. Sensitizing a fault consists of setting the value on the faulty node opposite the stuck-at value, i.e. setting a logic 1 on a node being tested for a stuck-at 0 fault.
Sensitizing a path consists of setting logic

values along a path in the circuit from the faulty node to the primary outputs such that a change of the logic value on the node is observable at the primary output. For example, if an AND gate is in the sensitized path, setting all of its inputs not in the path to a logic 1 will result in its output value following the value of the inputs in the path. In order for an algorithm to be complete, it must search all paths and combinations of paths in the circuit from the faulty node to the primary outputs.

The major difference between the path sensitization algorithms presented in [2],[3],[4] is the method and order by which the actual logic values are assigned to nodes in the circuit. The D-algorithm [2] attempts to set logic values on nodes in the circuit by assigning values to nodes which precede them in the circuit topology. The PODEM [3] algorithm attempts to assign values to nodes in the circuit by assigning values to the circuit's primary inputs only. Because typical circuits have fewer inputs than internal nodes, frequently by orders of magnitude, the search space enumerated by PODEM is much smaller than that of the D-algorithm. Because this reduction of the search space makes PODEM much more efficient than the D-algorithm, PODEM is the basis for most follow-on algorithms that have been developed for ATPG [5],[6],[7],[8],[15]. For this reason, PODEM was also selected as the base algorithm for this work, whereas the D-algorithm served as the basis for previous work in topological partitioning [17],[18]. A brief example of the PODEM algorithm will now be presented to familiarize the reader with this technique. Figure 2.1 contains a diagram of the circuit that will be used for the purposes of this discussion.
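The view of ATPG as a search of all 2^n input patterns can be made concrete with a brute-force sketch. The three-input circuit below is hypothetical (not the Figure 2.1 example): the good circuit computes y = (a AND b) OR c, and the faulty copy has the internal AND output stuck-at 0. A test is any pattern on which the two outputs differ.

```python
from itertools import product

def good_circuit(a, b, c):
    g = a & b          # internal node g
    return g | c

def faulty_circuit(a, b, c):
    g = 0              # same circuit with g stuck-at 0
    return g | c

def exhaustive_atpg():
    """Enumerate all 2^n input patterns; return the first one that
    distinguishes the good circuit from the faulty circuit."""
    for pattern in product((0, 1), repeat=3):
        if good_circuit(*pattern) != faulty_circuit(*pattern):
            return pattern
    return None        # no pattern detects the fault (it is redundant)

print(exhaustive_atpg())   # -> (1, 1, 0)
```

The pattern found sets a = b = 1 (sensitizing g opposite its stuck-at value) and c = 0 (propagating the difference to the output), which is exactly the sensitize-then-propagate structure the path-sensitization algorithms impose on this search.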
The PODEM algorithm consists of three major processes: selection of the next objective; backtracing that objective to an unassigned primary input to determine the value that should be assigned to it; and assigning that value to the input and implying all node values in the circuit affected by that assignment. This latter simulation-like process is called forward implication. For example, in Figure 2.1, consider the fault of line J stuck-at a logical 1. The first objective selected might be to sensitize the fault by setting node J to 0. Since both of the

[Figure 2.1 Circuit under test.]

inputs to the OR gate which drives J need to be set to 0 in order to set J to a 0, setting one of them to this value becomes the next objective. The next step would be to backtrace this objective to a primary input. Figure 2.2 illustrates this process. Assume for the sake of discussion that line G is chosen to be set first. Typically, testability measures such as controllability and observability measures are used to assist in making these types of decisions. In order to set line G to a 0, either of the inputs to the AND gate that drives it must be set to 0. Input A is selected, and it along with its value are pushed on a stack that is used to hold the input search space. The next step is to actually assign a value of 0 to input A and simulate the effect of this assignment on the rest of the circuit. This process is called forward implication. Assigning a 0 to A will cause the AND gate to drive node G to a 0. No other nodes in the circuit will be affected by this assignment. Since the objective of setting node J to a 0 has not been satisfied, it will be backtraced again. This backtrace will determine that node I must be set to 0, and this must be accomplished by setting input E to 0 and node H to 0. Node H may be set to 0 by setting input C to 0. This assignment is pushed on the stack and implication is performed. Next, input E is pushed on the stack with its value and forward implication is done. This implication will result in node J taking on the required 0 value. This 0 value represents the value node J would have in the fault-free circuit, but node J will assume a value of 1

[Figure 2.2 Backtracing objective J = 0.]

in the presence of a stuck-at 1 fault. This set of values is represented using the D notation of [2]. A node with a value of D represents a 1 in the good circuit and a 0 in the faulty circuit. A node with a value of D̄ represents a value of 0 in the good circuit and a 1 in the faulty circuit. Since node J is 0 in the good circuit and 1 in the faulty circuit, its value will be represented by a D̄. Figure 2.3 shows the state of the circuit and the input stack after the assignments A=0, C=0, and E=0.

[Figure 2.3 Circuit state and input stack.]

The final step in generating a test for node J stuck-at 1 is to make the value on node

J visible at the primary output. This process is called propagation, and it involves sensitizing a path from the node to the output. This path is sensitized by setting all inputs to gates on the path to their non-controlling values. The non-controlling value for an AND or NAND gate is 1, and for an OR or NOR gate it is 0. XOR and XNOR gates do not have a controlling value, so either a 0 or a 1 may work. All gates which have a D or D̄ on one of their inputs and an unknown, X, on their outputs are on potential paths. These gates constitute what is known as the D-frontier. In the example circuit, the OR gate that drives node L is on the D-frontier at this point. Node K must be set to the non-controlling 0 value so that the value on the output L follows the value of node J. This task may be accomplished by setting input F to a 0. The final circuit state and input stack are illustrated in Figure 2.4.

[Figure 2.4 Circuit state and input stack.]

A test vector is generated by simply removing all input assignments from the stack. All inputs not in the stack are assigned don't-cares in the test vector. In the example circuit, the vector 0X0X00 would be the test vector for fault J stuck-at 1. If, during the assignments of input values, an assignment is made that makes a test no longer possible, the last input assignment is popped off of the stack and the alternate value is tried. This process is called backtracking, and it continues until a circuit state where

a test is again possible is reached or the stack becomes empty. Situations that may cause a test to be impossible include setting the faulty node to the same value as the stuck-at fault, and the disappearance of the D-frontier. For example, consider the circuit of Figure 2.1 with a stuck-at 0 fault on node K. The first objective would be to set node K to a 1. The first backtrace could lead to the assignment of a logical 0 on input F. Then, in order to set node K to a 1, node I must be assigned a 1. This task may be accomplished by assigning E=1. However, when node I is set to 1, node J becomes 1 and node L is forced to a 1 regardless of the value on node K. This assignment makes a test impossible with this input vector because the fault cannot be observed at the primary output. Figure 2.5 illustrates this situation.

[Figure 2.5 Circuit state and input stack.]

Backtracking will occur at this point and the alternate assignment of E=0 will be tried. This assignment will also cause a test to be impossible because node K will then be set to 0, the same value as the stuck-at fault. Since assignments of both logic values to node E did not result in a test, backtracking to the previous assignment must occur. Thus, node F is now assigned a value of 1. A test can now be found by assigning a 0 value to nodes E, C, and A as shown in Figure 2.6. Backtracking results in an ordered search of the solution space and in implicit pruning of the search tree when inconsistent states are encountered, such as

[Figure 2.6 Circuit state and input stack.]

assigning a value of 0 to input F in the previous example. By checking the two alternate assignments of input E and determining that they are both inconsistent, the entire portion of the search tree below F=0 may be pruned.

2.2 Parallel ATPG

This section provides a brief discussion of the methods that have been used to parallelize the ATPG process. For a more detailed presentation of parallel ATPG techniques, the reader is referred to [14]. These techniques can be divided into five categories [13],[14]:

1) Fault Partitioning
2) Heuristic Parallelization
3) Search Space Partitioning
4) Functional (Algorithmic) Partitioning
5) Topological Partitioning

The simplest way to parallelize the ATPG problem is to divide up the fault list among multiple processors. Each processor then generates tests for each fault on its portion of the fault list until all faults have been detected. This scheme results in each processor having a completely separate task in that it performs the entire test generation procedure on its own. This method of parallelization has been termed fault partitioning. If the fault list is divided up carefully, each processor will have roughly the same amount of work to do and they will all finish in about the same time. In practice, optimal partitioning of the fault list is not easy

to do a priori, so the scheduling can be done dynamically, with each processor requesting a new fault from a master scheduler whenever it is idle. Dynamic scheduling requires increased communications overhead due to the requests from idle processors for new faults to process. The fault partitioning method is very suitable for coarse-grained parallel systems because synchronization is only necessary when a new fault is needed from the remaining fault list. The biggest disadvantage of fault partitioning is that the setup time will be large. The entire ATPG program and circuit database must be loaded into each processor's memory across the message fabric. If the total amount of work that can be divided up among the processors is large (i.e. the fault list is long), then the percentage of time spent on setup can be kept small and this scheme has promise. If the circuit has a small number of faults, or fault classes, then the speedup will be limited. In any case, this method does not scale well because of the large setup time. Also, performance of this method is poor if there are only a few hard-to-detect faults which account for most of the processing time. Because processors cannot cooperate in generating a test for the same fault, one or two processors could take hours to generate a test for these hard-to-detect faults while the others stand idle. Many typical circuits have only a few hard-to-detect faults and fall into this category. Results for systems which use this technique show that linear speedup is possible only for a small number of processors, usually less than ten [22],[23]. Clearly, this method of parallelization is less than optimal, although it has the benefit of being the simplest to implement.

Because ATPG is an NP-complete problem [5], heuristics are used to guide the search process.
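The dynamically scheduled fault partitioning described above, in which idle processors pull the next fault from a master scheduler rather than receiving a static split, can be sketched with a shared work queue. This is a toy shared-memory stand-in for the message-passing scheme (threads instead of processing nodes), and `generate_test` is a hypothetical per-fault ATPG routine.

```python
import queue
import threading

def dynamic_fault_partitioning(fault_list, generate_test, n_workers=4):
    """Sketch of dynamically scheduled fault partitioning: each idle
    worker requests the next fault from a shared queue (the 'master')."""
    work = queue.Queue()
    for fault in fault_list:
        work.put(fault)
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                fault = work.get_nowait()    # idle -> request a new fault
            except queue.Empty:
                return                       # fault list exhausted
            test = generate_test(fault)      # full ATPG run for this fault
            with lock:
                results[fault] = test

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Note how the sketch mirrors the trade-off in the text: workers never cooperate on a single fault, so one slow `generate_test` call (a hard-to-detect fault) can leave the other workers idle at the end of the run.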
Research has indicated that many heuristics will produce a test for a given fault within some computation time limit when other heuristics have failed to do so [24]. These complementary heuristics can be used in a multiprocessor system to aid in the ATPG process. There are two basic strategies for heuristic parallelization: a variation of the fault partitioning scheme discussed above, and concurrent parallel heuristics [25]. In the variation of the fault partitioning method, called uniform partitioning, the

fault list is divided up among the processors and each generates tests for the faults on its own portion of the list. In generating the tests, however, multiple heuristics are used in sequential order to attempt to generate a test. If a heuristic fails to generate a test within a time limit, that heuristic is discontinued and the next one in the list is begun. This scheme has the same advantages and disadvantages as the fault partitioning scheme discussed above. However, it will be slightly better in some cases because the multiple heuristics will shorten the test generation time for hard-to-detect faults.

In the concurrent parallel heuristic method, the system is required to have (m × n) processors, where n is the number of different heuristics available. If m is equal to one, each processor computes a test for the same fault using one of the n heuristics. Whenever a processor succeeds in generating a test for the fault, it sends a stop-work message to the other processors in the cluster and they stop processing that fault. A new fault is selected from the fault list and the process begins again. If m is greater than one, the processors are clustered into groups of n and each cluster works on a separate fault. In this case, the system is actually using a combination of the fault partitioning and heuristic parallelization schemes. The concurrent parallel heuristic method has the potential to achieve greater speedups than the uniform partitioning method due to possible anomalies in the ordering of the heuristics for different faults. The main disadvantage of the heuristic techniques discussed previously is that the processors that are working on the same fault with different heuristics are not guaranteed to be searching disjoint portions of the search space. That is, all of the heuristics may lead the ATPG program down the same path towards a non-solution.
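The concurrent parallel heuristic scheme (the m = 1 case) can be sketched as a race: one worker per heuristic attacks the same fault, and the first success cancels the rest, standing in for the stop-work message. This is a toy shared-memory sketch; each entry of `heuristics` is a hypothetical callable that returns a test or None on failure.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def concurrent_heuristics(fault, heuristics):
    """Race n heuristics on one fault; first test found wins."""
    with ThreadPoolExecutor(max_workers=len(heuristics)) as pool:
        pending = {pool.submit(h, fault) for h in heuristics}
        while pending:
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:
                test = fut.result()
                if test is not None:
                    for other in pending:    # 'stop-work' for the losers
                        other.cancel()
                    return test
    return None    # every heuristic failed on this fault
```

The m > 1 case of the text corresponds to running this function inside the fault-partitioning loop, with one such cluster of n workers per fault.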
A better way to parallelize work on a single fault is to divide up the search space into disjoint pieces and evaluate them concurrently. This approach is a parallel implementation of the branch and bound method which involves concurrent evaluation of subproblems [26],[27]. This technique is called OR parallelism and its application to ATPG is presented in detail in [28],[29]. Search space partitioning involves dividing up the search space such that subproblems skipped by one processor are evaluated by another. The search

spaces for the processors are therefore disjoint and are spread across the solution space as far as possible to maximize the area of the current search. This organization increases the chances of finding a valid solution quickly. The process of dividing up a search tree is illustrated in Figure 2.7.

[Figure 2.7 Division of search tree.]

The search space belonging to processor X is divided up into two parts for processors X and Y. Notice that the processors are in fact always working on different problems (i.e. disjoint search spaces) and that the place where each processor will backtrack to is different. If processor X finds a conflict, it will backtrack and try an alternate value for input A. Processor Y will backtrack and try an alternate value for input C in case of a conflict. This approach keeps the current search space as large as possible, which tends to make the search more efficient. A major problem with search space partitioning is that it also requires a long setup time. Each processor must have the entire circuit database and ATPG program loaded into it. On the other hand, processors are dedicated to only one task, which does not change, and the tasks are completely independent. This fact makes the overhead due to communications

very low and results in greater efficiency. Search space division is therefore most appropriate for circuits that contain a small number of hard-to-detect faults which take up a great deal of computation time. It is also ideally suited to message passing systems because of its coarse-grained parallelism.

There is another technique that can be used to allow more than one processor to work simultaneously on finding a test for a single fault. This technique is called functional partitioning. Functional partitioning refers to the process of dividing up an algorithm into independent subtasks. These independent subtasks can then be executed on separate processors in parallel. This method of parallelization is also known as algorithmic or AND parallelism. Most serial ATPG algorithms are difficult to parallelize functionally. The few subtasks that can be identified, such as fault sensitization and path sensitization, are not independent. That is, action taken to perform one of these processes may change the circuit state such that it has a side effect or causes an inconsistency in another process. Justification of two goals cannot, in general, be done simultaneously. One way to allow parallelism in justification is to perform justification for goals in different faults simultaneously. This parallelism is an adaptation of the fault partitioning scheme already discussed.

In all of the parallel algorithms discussed thus far, each processor has to have access to the entire circuit database. This requirement may be a problem for large circuits because each node may not have enough memory to hold the entire circuit database. Also, loading the database into memory in a message passing system takes time. Topological partitioning of the circuit into separate partitions and instantiating each on a different processor would help alleviate this problem. Researchers have been investigating topological partitioning for parallel logic simulation for some time.
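Returning briefly to search space partitioning: the division of a PODEM-style input stack into disjoint subtrees, as in Figure 2.7, can be sketched directly on the stack data structure. This is a simplified model under stated assumptions: each stack entry is a hypothetical (input, value, alternative_untried) triple, oldest choice first, and the split hands the untried alternative of the oldest choice point to a second processor.

```python
def split_search_space(stack):
    """Split a decision stack into two disjoint search spaces.

    The donor keeps its current path but marks the oldest untried
    alternative as taken; the recipient starts from that flipped choice,
    so the two processors explore disjoint subtrees of the input space."""
    for i, (name, value, untried) in enumerate(stack):
        if untried:
            donor = stack[:i] + [(name, value, False)] + stack[i + 1:]
            recipient = stack[:i] + [(name, 1 - value, False)]
            return donor, recipient
    return stack, None    # nothing left to give away

# Example: the input stack A=0, C=0, E=0 with all alternatives untried.
stack = [("A", 0, True), ("C", 0, True), ("E", 0, True)]
donor, recipient = split_search_space(stack)
```

After the split, the donor continues under A=0 while the recipient owns the entire A=1 subtree, matching the figure's property that the two processors backtrack to different places.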
Although logic simulation is a different problem, it has some similarities to algorithmic ATPG. A discussion of circuit partitioning for parallel logic simulation is included in [30]. The objective of the partitioning scheme is to reduce the communications necessary between partitions as much as possible while maximizing the

amount of work that can be done concurrently within the partitions. This paper analyzes several partitioning schemes: random partitioning, natural partitioning, partitioning by gate level, partitioning by element strings, and partitioning by fanin and fanout cones. Fanin cones are an attempt to place all gates connected to a single primary input (even through other gates) in the same group. Fanout cones are constructed the same way using primary outputs. The results presented in [30] indicate that for simulation, random partitioning scores best in maximizing concurrency, but worst in interprocessor communications. This condition would make random partitioning a bad choice for most systems. Partitioning by fanin and fanout cones offers the best trade-off between concurrency and interprocessor communications, with fanout cones being slightly better. This result is most likely due to the fact that fanout cones closely fit the flow of activity in the circuit during logic simulation. An analysis of circuit partitioning techniques for ATPG is the focus of Section 3.3 of this work.

Another issue in circuit partitioning for ATPG is the number of gates in each partition: the so-called block size. As the number of gates assigned to a block decreases, the amount of work that can be done between communications steps becomes smaller. Hence, the parallelism becomes more fine-grained. The minimum block size will also affect how the problem scales with increasing numbers of processors. As more processors are added to the system, the block size will get smaller and efficiency will decrease.

An investigation of the amount of parallelism theoretically available in topologically partitioned parallel ATPG was undertaken in [31]. This work attempted to find an upper bound on the amount of parallelism present in conflict-free test generation. Two phases of the test generation process using the PODEM algorithm, backtracing and forward implication, were parallelized.
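The fanout-cone construction described above, placing every gate that transitively feeds a primary output into one group, amounts to a reverse traversal of the netlist. A minimal sketch, assuming the netlist is given as a `fanin` map from each gate to the gates and inputs driving it (the connectivity below is hypothetical, loosely modeled on the Figure 2.1 node names):

```python
def fanout_cone(output, fanin):
    """Return the set of all nodes that (transitively) drive `output`:
    the fanout cone of that primary output."""
    cone, stack = set(), [output]
    while stack:
        node = stack.pop()
        if node in cone:
            continue
        cone.add(node)
        stack.extend(fanin.get(node, []))   # walk backward toward the inputs
    return cone

# Hypothetical netlist: L driven by J and K, J by G and I, and so on.
fanin = {"L": ["J", "K"], "J": ["G", "I"], "G": ["A", "B"],
         "I": ["E", "H"], "H": ["C", "D"], "K": ["I", "F"]}
print(sorted(fanout_cone("L", fanin)))   # every node here feeds output L
```

A fanin-cone partition is the mirror image: the same traversal run forward from a primary input over a fanout map. Note that cones of different outputs generally overlap (here node I lies under both J and K), which is one reason cone-based schemes still need a policy for assigning shared gates to partitions.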
Each gate was assumed to be placed on its own individual processor. The objective then was to measure the maximum number of operations that could be performed in parallel by the individual gate processors. The methods that were used to accomplish this objective are best illustrated using an

example. Consider the example circuit of Figure 2.1 again. In setting the objective of J=0, there are several paths that must be backtraced, such as the J->G->A path and the J->I->H->C path. If these paths could be backtraced at the same time, then parallelism would be present. If each backtrace operation is assumed to take place in the same amount of time, then the backtraces will propagate through the circuit on a level by level basis. At each gate, backtraces are generated on each input as required. Using this method, conflicts may arise at points of reconvergent fanout. This point is where the authors use the conflict-free assumption. The correct values to be placed on reconvergence points are precomputed offline and the conflict is avoided. Figure 2.8 illustrates the process of parallel backtracing for the objective J=0.

[Figure 2.8 Multiple parallel backtrace.]

Notice that in this case, the maximum number of backtrace operations that occur during the same period of time, or time-step, is 2. This measure would be the maximum amount of parallelism available in this step of the ATPG process. Also note that the objective values required on the individual lines are not assigned to them during the

backtrace procedure. These values must be set through forward implication, as is done in the serial PODEM case. The difference is that in this parallel implementation, the implication procedure is parallelized. Thus in the case shown in Figure 2.8, the values A = 0, C = 0, and E = 0 would be implied at the same time. Implications would then be performed back through the circuit in parallel in a manner similar to backtraces. Analysis of Figure 2.8 shows that the maximum parallelism present during parallel implication of the above input assignments would be 2 as well.

The authors of [31] use this technique to analyze the maximum and average amount of parallelism present in the ISCAS 85 [16] benchmark circuits. The analysis was done on a conventional workstation using a simulation based technique. The average amount of parallelism they found was less than they expected. For forward implication, most circuits had an average parallelism of 4 to 7, although some circuits had higher values. For backtracing, most circuits had average parallelism values of 1.5 to 3.5.

This method of analyzing the parallelism present in topologically partitioned ATPG has several drawbacks which do not allow a valid conclusion to be drawn concerning the performance of topological partitioning. First, the assumption of one gate per processor is unrealistic. Second, as shown in this thesis, there are other methods of parallelism available for use with topological partitioning. Finally, the work in [31] completely ignores the practical aspects, such as synchronization protocols and communications latency, of implementing this type of system on an actual multicomputer. The authors do acknowledge that this technique may be beneficial when used with other parallelization methods and that it has the important characteristic of allowing larger circuits to be processed on a given distributed memory multicomputer. 
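The level-by-level backtrace model of [31] can be sketched as a breadth-first walk from the objective that counts how many backtrace operations share each time-step. The netlist encoding below is an assumption for illustration:

```cpp
#include <algorithm>
#include <map>
#include <queue>
#include <string>
#include <vector>

// Netlist as a map from each gate to its fanins; primary inputs have no entry.
using Netlist = std::map<std::string, std::vector<std::string>>;

// Assign each backtraced line a time-step (its distance from the objective)
// and return the largest number of backtrace operations sharing one step:
// the available backtrace parallelism under the one-level-per-step model.
// Revisited lines are skipped, mirroring the conflict-free assumption that
// values at reconvergence points are precomputed offline.
int maxBacktraceParallelism(const Netlist& net, const std::string& objective) {
    std::map<std::string, int> step{{objective, 0}};
    std::map<int, int> opsPerStep;
    std::queue<std::string> frontier;
    frontier.push(objective);
    while (!frontier.empty()) {
        std::string g = frontier.front();
        frontier.pop();
        auto it = net.find(g);
        if (it == net.end()) continue;  // primary input: backtrace stops here
        for (const std::string& fi : it->second) {
            if (step.count(fi)) continue;     // reconvergence: already scheduled
            step[fi] = step[g] + 1;
            ++opsPerStep[step[fi]];           // one backtrace op per traversed line
            frontier.push(fi);
        }
    }
    int widest = 0;
    for (const auto& kv : opsPerStep) widest = std::max(widest, kv.second);
    return widest;
}
```

For the two paths of Figure 2.8 (J->G->A and J->I->H->C) this model reports a parallelism of 2, matching the figure.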
2.3 Hardware Considerations

This research utilized a distributed memory MIMD machine known as the ES-KIT 88K. It is assumed that the reader is familiar with the typical characteristics, such as message passing and connectivity, of parallel machines of this type. Only a brief discussion of the effect of the characteristics of the machine on the programming of the application

will be undertaken in this section. This section will be followed by a discussion of the actual hardware used in this research. Finally, the software system that formed the basis for this work will be presented.

Distributed memory machines have local memory for each processor but no globally accessible memory. Processors must send messages across some interconnection medium, also called a message fabric, to share data. It may take hundreds or even thousands of instructions to package a message for transmission, so communication costs are much higher than for shared memory. Also, message transfer time depends on the distance between communicating processors. Distance between processors is a measure of the length of the communications channel and the number of other processors which must pass the message along for it to be transferred. There are a number of interconnection strategies used on message passing systems [32]. Each one involves a trade-off between the distance between processors and the number of connections per processor.

Because communication time is distance dependent, data location in message passing systems is at least as critical as in shared memory systems. Determination of which processors perform certain tasks is much more important in distributed memory systems than in shared memory systems. Processes that must communicate frequently must be instantiated on processors that are close to each other. Therefore, algorithms must be designed for the specific communications topology of the target machine. Algorithms designed for one machine may not perform satisfactorily on another [33].

Programs on a message passing system will, in general, use built-in system calls to send and receive messages. Data must be explicitly moved from one processor to another using the send and receive mechanism. Synchronization between processors must also take place using messages and is therefore more time consuming than in shared memory systems. 
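A common first-order model of the distance-dependent cost described above charges a fixed software startup cost, a per-hop routing cost, and a per-byte transfer cost. The constants in the usage below are illustrative, not measured ES-KIT values:

```cpp
// First-order message cost model for a distributed memory machine:
// total time = software startup (packaging the message)
//            + routing delay proportional to the number of hops
//            + transfer time proportional to the message length.
double messageTime(double startup, double perHop, int hops,
                   double perByte, int bytes) {
    return startup + perHop * static_cast<double>(hops)
                   + perByte * static_cast<double>(bytes);
}
```

Under such a model, placing frequently communicating processes on adjacent nodes shrinks the hop term, which is why process placement matters so much on these machines.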
Because of these costs, algorithms for message passing machines must use more coarse-grained parallelism. Coarse-grained parallelism implies that many instructions must be processed between synchronization events. Setup time is much longer on message passing systems because all of the program code and data, such as the circuit topology information, must be loaded across the message fabric. New processes are harder to spawn

for this same reason. Therefore, setting up one processor as a master is more difficult. Proper load balancing among the processors is also harder to achieve. In general, algorithms for message passing systems are more difficult to design well, but the programs themselves are easier to implement and debug because data consistency is more easily maintained [33].

2.3.1 Experimental Systems Kit

The parallel processing machine available for use in this research is the Experimental Systems-KIT (ES-KIT) 88K processor developed by the Microelectronics and Computer Technology Corporation (MCC). The ES-KIT was developed by MCC under a Defense Advanced Research Projects Agency (DARPA) grant to facilitate experimentation with new parallel computer architectures and application specific computing nodes. The ES-KIT system includes the 88K processor, described below, and the ESP runtime system, described in the next section. The description of the ES-KIT system is brief and limited to the characteristics which influence application design. A more detailed description of the ES-KIT system can be found in [34]. Further, the ESP system and ES-KIT applications are implemented in the C++ language [35]. It is assumed that the reader is knowledgeable in this language.

2.3.2 ES-KIT Hardware

The 88K processor is a distributed memory parallel architecture based on a 16 node, 4x4 two-dimensional mesh. A Sun 3/140 running the BSD 4.1 operating system acts as a host for the 88K hardware. The Sun communicates with the 88K processor through a VME bus interface board. The message fabric of the 88K processor is based on Symult Systems' Asynchronous Message Routing Device (AMRD). Each node in the mesh has its own AMRD. The use of AMRDs eliminates the need for store and forward message passing and means that the processor on each node is not involved in passing messages that are not addressed directly to it. 
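On a 4x4 mesh like the ES-KIT's, the hop count between two nodes under minimal (dimension-order) routing is simply the Manhattan distance. This is a property of 2-D meshes in general, sketched here rather than taken from the ES-KIT documentation:

```cpp
#include <cstdlib>

// Number of router hops between two nodes of a 2-D mesh, each identified
// by (row, column): the Manhattan distance, assuming minimal routing.
// Messages between non-adjacent nodes pass through intermediate AMRDs
// without involving the intermediate processors.
int meshHops(int r1, int c1, int r2, int c2) {
    return std::abs(r1 - r2) + std::abs(c1 - c2);
}
```

On a 4x4 mesh the worst-case distance is therefore 6 hops, between opposite corners.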
The message fabric is, in general, capable of passing messages at a rate of 20 MB per second, but the software overhead of packing and

unpacking messages at either end prevents this rate from being achieved. Each node is a general purpose computing system based on Motorola's processor family. The nodes consist of four boards which communicate with each other across an internal bus based on the 88K standard. The four boards are the Message Interface Module, the Processor Board, the Memory Module, and the Bus Terminator Module. The boards are connected through a unique set of stacking connectors which allow the nodes to be built on top of each other. A typical installation consists of two nodes stacked on top of each other. The node stacks are arranged on top of Mother Board modules that provide power and ground connections and contain the AMRDs. A 16 node configuration consists of four Mother Boards mounted in a plane, each one containing two stacks with two nodes per stack.

The processor board consists of one 20 MHz Motorola RISC microprocessor and two cache modules. The two 88200 cache modules provide separate paths for instructions and data. The Memory Module provides 8MB of dynamic RAM. It is possible to have up to four Memory Modules per node for a total of 32MB of memory, but the standard configuration is only 8MB. The Message Interface Module (MIM) consists of one processor, 128KB of data memory, 128KB of instruction memory, and interface logic to provide a path from memory to the node's AMRD. The processor on the MIM takes care of all processing necessary to package a message and send it out through the AMRD. The MIM processor off-loads a significant amount of message processing from the processor in the Processor Module. Finally, the Bus Terminator Module provides the electrical termination for the high speed lines in the 88K bus, general purpose services such as the system clock, and a UART interface to the outside world for debugging and repair of the node hardware.

2.3.3 ESP Runtime System

The ESP (Extensible Software System) run-time environment is as important and complex as the 88K processor. 
The environment is written in C++ and is intended to maximize flexibility in the types of configurations that can be used in a parallel processing

system. The environment consists of four major components: the ISSD, the mail daemon, the shadow process, and the actual ESP kernels.

The Inter Service Support Daemon (ISSD) is the heart of the system in that it is the first process invoked by the user and it constructs the rest of the run-time environment. The ISSD is the major interface between the ESP environment and the outside world. It controls the starting and terminating of application programs and all communication with peripheral devices, including screen and disk IO. The ISSD begins by reading the configuration file to determine what ESP components are to be invoked. The configuration file is created by the user and contains instructions as to how many of the various components are to be constructed, where they are to run, and how they are connected. The minimum configuration file must contain the invocation instructions for a single ISSD, one mail daemon, and one or more ESP kernels. The ISSD also invokes the public service objects (PSOs), such as the application manager and the kernel librarian, which are necessary to run any application. The ISSD always runs on the Sun host machine, but the PSOs can run on any of the 88K nodes.

The mail daemon is responsible for routing messages between individual or groups of ESP kernels. Each mail daemon is connected to all of the kernels in its group, every other mail daemon in the configuration, and the ISSD. This connectivity allows messages to be passed from kernel to kernel with the minimum handling possible. The configuration must contain a minimum of one mail daemon for each type of ESP kernel in the configuration.

The shadow process runs on the host Sun where the ISSD is located. The shadow process is responsible for reading the application source code files and managing the terminal IO for the application. The shadow process is the next ESP component invoked by the user after the ISSD. 
Finally, the ESP kernel is the workhorse of the ESP system in that it actually runs the application. The kernel performs memory management, message packing and unpacking, and task switching for the applications. The kernel runs on top of a rudimentary OS in the 88K processor. The message passing portion of the kernel utilizes an MCC

developed protocol which utilizes the processor in the MIM on the 88K processor.

2.3.4 Object Oriented Programming in ESP

The ESP system uses the C++ object oriented paradigm as its abstraction for parallel processing. Applications written for the 88K processor to run in ESP must be programmed in C++. C++ incorporates the ideas of objects, data encapsulation, and inheritance. C++ objects or classes are instantiated on different nodes and communicate with each other through method invocation and return values. Each object has its own local data contained within the node, and that data can only be manipulated by method calls. There is no global or 'public' data allowed in ESP.

There were five major changes made to the C++ language to implement the distributed processing environment of ESP. These changes consisted of overloading the member access operator ->(), redefining the return values available for methods, overloading the 'new' function, eliminating the 'main' routine, and incorporating the concept of futures.

All objects that are to have methods available for remote invocation must be derived from the object remote_base. This object was developed by MCC and includes several features necessary to implement remote method invocation, the first of which is a handle. A handle is a pointer to an instance of an object and contains all of the information needed to address an object in a distributed system. This information is contained in four parts: a node number where the object actually resides, a class number for the object, an application number, and the actual instance number of the object. Handles can be passed between objects, or the address information can be passed and a new handle constructed to point to that object.

The second feature included in remote_base is the overloaded member access operator. In regular C++, the method invocation object_instance->method(arg1,arg2,...) is implemented as a subroutine call. 
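The four-part handle described above, together with the message a remote invocation would generate, might look like the following sketch. All field names and the byte layout are assumptions for illustration, not MCC's actual remote_base or protocol:

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// The four parts of a handle: enough information to address one object
// instance anywhere in the distributed system (field names are assumed).
struct Handle {
    std::int32_t node;         // node where the object actually resides
    std::int32_t classNumber;  // class number of the object
    std::int32_t application;  // application number
    std::int32_t instance;     // instance number of the object
};

// A remote invocation reduced to a message: the destination handle plus
// the argument list copied byte-for-byte, with its length appended.
struct Message {
    Handle destination;
    std::vector<std::uint8_t> bytes;
};

template <typename Args>
Message packCall(const Handle& dest, const Args& args) {
    static_assert(std::is_trivially_copyable<Args>::value,
                  "only flat argument lists can be copied byte-for-byte");
    Message m;
    m.destination = dest;
    m.bytes.resize(sizeof(Args) + sizeof(std::uint32_t));
    std::memcpy(m.bytes.data(), &args, sizeof(Args));
    std::uint32_t len = sizeof(Args);  // argument-list length in bytes
    std::memcpy(m.bytes.data() + sizeof(Args), &len, sizeof(len));
    return m;
}
```

The destination handle tells the message fabric where to deliver the call; the receiving kernel would unpack the bytes and invoke the named method locally.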
In ESP C++, the object may reside on a remote

node. Therefore, the method call must be invoked through message passing. Overloading of the ->() function for remote methods handles this process. Methods for an object derived from remote_base are defined to be remote by declaring them in the public section of the object specification. When a method on a remote object is invoked, the kernel reads the argument list to determine its length. It then copies the argument list into the message buffer with the length of the argument list in bytes appended to it. Finally, the kernel uses the handle of the object to instruct the MIM processor where to send the message. When the receiving object receives the message, it invokes the proper method with the argument list.

If a value is to be returned by the method, one of the return macros defined in the ESP programming environment must be used. The return macros instruct the kernel that the return is to a remote object and that it must be packaged as a message. Macros are available for returning most of the common data types, such as integers, doubles, characters, and strings. There is also a pointer return macro, but its functionality differs from that in regular C++. This difference is necessary because pointers on remote nodes are meaningless in the ESP environment. If a pointer return is specified, the kernel packages the entire object to which the pointer refers and sends it back to the node that invoked the method. The kernel on the invoking node then copies the returned object into its memory space and returns a pointer to this copy to the invoking object. In this way, any structure or object whose size can be determined at compile time can be returned from a remote method invocation.

In C++, the new operator is used to allocate memory space for instances of objects. In the ESP environment, the new operator is overloaded to allow arguments to be passed to it. These arguments specify which node an object is to be instantiated upon. 
The syntax for a call to the overloaded new function is: object_pointer = (object_type *) new {node, relationship} object_type(); The node, relationship pair is used to specify the location of the object. For example, if the variable homenode is set to (1,1), the call new {homenode, SAMEAS} will create the object on node (1,1). Options for the relationship variable include: SAMEAS, DIFFERENT, NEAR, FAR, and NEXT. The NEXT relationship does not need a node

specifier; it allows the kernel to select the node for the object using its own criteria. At this point, the criterion used is the amount of memory left on each node: the object is created on the node with the most free memory. Other algorithms that take into account load balancing and communications costs are under development by MCC. Until they are available, the user must be careful to take these factors into consideration and specify where each large object is to be created in order to optimize the application.

In ESP C++, there is no 'main' routine. When the shadow program loads the first object in the application, its constructor is invoked after it is loaded. This constructor must do the work necessary to start the application. This may be as simple as calling another routine within the same object to take over control, or as complex as creating all other objects and directly performing the necessary algorithm. The former approach is recommended because it is more 'correct' and it allows the kernel to complete construction of the initial object and alter the stack size for that object if necessary.

When a method on a remote object is invoked, the processing takes place on the remote node. The invoking method is then free to perform some other calculation if it does not need the result of the remote method. If the invoking method does need the result, it must block until that result is returned. Controlling when the object blocks for the return result is done by using futures. Futures were introduced as a part of Multilisp [36] and allow lazy evaluation of return values. Note that the only parallel processing that occurs is between the time that the remote method is invoked and the future is evaluated. This fact demonstrates the value of the future abstraction for methods that return values. Of course, remote methods that do not return values always run in parallel with the invoking method. 
One notable characteristic of objects in ESP is that, to ensure correctness, only one method in a specific object may be invoked at a time. This includes methods that are blocked waiting for a future or return value. Thus, if two objects each invoke a method on the other and wait for the return value, deadlock is possible. This situation is illustrated in Figure 2.9.
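ESP's futures predate the standard library, but std::future captures the same pattern described above: the caller overlaps local work with the remote invocation and blocks only when the value is finally needed. This is an analogy, not ESP's actual mechanism:

```cpp
#include <future>

// Stands in for a remote method that returns a value.
int slowSquare(int x) { return x * x; }

int overlapExample() {
    // "Invoke" the remote method; it may run concurrently with the caller.
    std::future<int> f = std::async(std::launch::async, slowSquare, 7);
    int local = 1 + 2;       // useful work overlapped with the invocation
    return local + f.get();  // evaluating the future blocks until it is ready
}
```

The parallelism lives entirely between the std::async call and the f.get(), which mirrors the window between remote invocation and future evaluation in ESP.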


More information

6. Parallel Volume Rendering Algorithms

6. Parallel Volume Rendering Algorithms 6. Parallel Volume Algorithms This chapter introduces a taxonomy of parallel volume rendering algorithms. In the thesis statement we claim that parallel algorithms may be described by "... how the tasks

More information

Chapter 8 Virtual Memory

Chapter 8 Virtual Memory Operating Systems: Internals and Design Principles Chapter 8 Virtual Memory Seventh Edition William Stallings Modified by Rana Forsati for CSE 410 Outline Principle of locality Paging - Effect of page

More information

Virtual Memory. Chapter 8

Virtual Memory. Chapter 8 Virtual Memory 1 Chapter 8 Characteristics of Paging and Segmentation Memory references are dynamically translated into physical addresses at run time E.g., process may be swapped in and out of main memory

More information

Multiple Processor Systems. Lecture 15 Multiple Processor Systems. Multiprocessor Hardware (1) Multiprocessors. Multiprocessor Hardware (2)

Multiple Processor Systems. Lecture 15 Multiple Processor Systems. Multiprocessor Hardware (1) Multiprocessors. Multiprocessor Hardware (2) Lecture 15 Multiple Processor Systems Multiple Processor Systems Multiprocessors Multicomputers Continuous need for faster computers shared memory model message passing multiprocessor wide area distributed

More information

Multiprocessor scheduling

Multiprocessor scheduling Chapter 10 Multiprocessor scheduling When a computer system contains multiple processors, a few new issues arise. Multiprocessor systems can be categorized into the following: Loosely coupled or distributed.

More information

A Simple Placement and Routing Algorithm for a Two-Dimensional Computational Origami Architecture

A Simple Placement and Routing Algorithm for a Two-Dimensional Computational Origami Architecture A Simple Placement and Routing Algorithm for a Two-Dimensional Computational Origami Architecture Robert S. French April 5, 1989 Abstract Computational origami is a parallel-processing concept in which

More information

Introduction to Multiprocessors (Part I) Prof. Cristina Silvano Politecnico di Milano

Introduction to Multiprocessors (Part I) Prof. Cristina Silvano Politecnico di Milano Introduction to Multiprocessors (Part I) Prof. Cristina Silvano Politecnico di Milano Outline Key issues to design multiprocessors Interconnection network Centralized shared-memory architectures Distributed

More information

Page 1. Outline. A Good Reference and a Caveat. Testing. ECE 254 / CPS 225 Fault Tolerant and Testable Computing Systems. Testing and Design for Test

Page 1. Outline. A Good Reference and a Caveat. Testing. ECE 254 / CPS 225 Fault Tolerant and Testable Computing Systems. Testing and Design for Test Page Outline ECE 254 / CPS 225 Fault Tolerant and Testable Computing Systems Testing and Design for Test Copyright 24 Daniel J. Sorin Duke University Introduction and Terminology Test Generation for Single

More information

Process size is independent of the main memory present in the system.

Process size is independent of the main memory present in the system. Hardware control structure Two characteristics are key to paging and segmentation: 1. All memory references are logical addresses within a process which are dynamically converted into physical at run time.

More information

Three basic multiprocessing issues

Three basic multiprocessing issues Three basic multiprocessing issues 1. artitioning. The sequential program must be partitioned into subprogram units or tasks. This is done either by the programmer or by the compiler. 2. Scheduling. Associated

More information

Scheduling with Bus Access Optimization for Distributed Embedded Systems

Scheduling with Bus Access Optimization for Distributed Embedded Systems 472 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 8, NO. 5, OCTOBER 2000 Scheduling with Bus Access Optimization for Distributed Embedded Systems Petru Eles, Member, IEEE, Alex

More information

6LPXODWLRQÃRIÃWKHÃ&RPPXQLFDWLRQÃ7LPHÃIRUÃDÃ6SDFH7LPH $GDSWLYHÃ3URFHVVLQJÃ$OJRULWKPÃRQÃDÃ3DUDOOHOÃ(PEHGGHG 6\VWHP

6LPXODWLRQÃRIÃWKHÃ&RPPXQLFDWLRQÃ7LPHÃIRUÃDÃ6SDFH7LPH $GDSWLYHÃ3URFHVVLQJÃ$OJRULWKPÃRQÃDÃ3DUDOOHOÃ(PEHGGHG 6\VWHP LPXODWLRQÃRIÃWKHÃ&RPPXQLFDWLRQÃLPHÃIRUÃDÃSDFHLPH $GDSWLYHÃURFHVVLQJÃ$OJRULWKPÃRQÃDÃDUDOOHOÃ(PEHGGHG \VWHP Jack M. West and John K. Antonio Department of Computer Science, P.O. Box, Texas Tech University,

More information

Memory. Objectives. Introduction. 6.2 Types of Memory

Memory. Objectives. Introduction. 6.2 Types of Memory Memory Objectives Master the concepts of hierarchical memory organization. Understand how each level of memory contributes to system performance, and how the performance is measured. Master the concepts

More information

Distributed Computing: PVM, MPI, and MOSIX. Multiple Processor Systems. Dr. Shaaban. Judd E.N. Jenne

Distributed Computing: PVM, MPI, and MOSIX. Multiple Processor Systems. Dr. Shaaban. Judd E.N. Jenne Distributed Computing: PVM, MPI, and MOSIX Multiple Processor Systems Dr. Shaaban Judd E.N. Jenne May 21, 1999 Abstract: Distributed computing is emerging as the preferred means of supporting parallel

More information

Memory Management. Reading: Silberschatz chapter 9 Reading: Stallings. chapter 7 EEL 358

Memory Management. Reading: Silberschatz chapter 9 Reading: Stallings. chapter 7 EEL 358 Memory Management Reading: Silberschatz chapter 9 Reading: Stallings chapter 7 1 Outline Background Issues in Memory Management Logical Vs Physical address, MMU Dynamic Loading Memory Partitioning Placement

More information

Department of Electrical and Computer Engineering University of Wisconsin Madison. Fall Midterm Examination CLOSED BOOK

Department of Electrical and Computer Engineering University of Wisconsin Madison. Fall Midterm Examination CLOSED BOOK Department of Electrical and Computer Engineering University of Wisconsin Madison ECE 553: Testing and Testable Design of Digital Systems Fall 2013-2014 Midterm Examination CLOSED BOOK Kewal K. Saluja

More information

Non-Uniform Memory Access (NUMA) Architecture and Multicomputers

Non-Uniform Memory Access (NUMA) Architecture and Multicomputers Non-Uniform Memory Access (NUMA) Architecture and Multicomputers Parallel and Distributed Computing Department of Computer Science and Engineering (DEI) Instituto Superior Técnico February 29, 2016 CPD

More information

High Performance Computing Prof. Matthew Jacob Department of Computer Science and Automation Indian Institute of Science, Bangalore

High Performance Computing Prof. Matthew Jacob Department of Computer Science and Automation Indian Institute of Science, Bangalore High Performance Computing Prof. Matthew Jacob Department of Computer Science and Automation Indian Institute of Science, Bangalore Module No # 09 Lecture No # 40 This is lecture forty of the course on

More information

Clustering and Reclustering HEP Data in Object Databases

Clustering and Reclustering HEP Data in Object Databases Clustering and Reclustering HEP Data in Object Databases Koen Holtman CERN EP division CH - Geneva 3, Switzerland We formulate principles for the clustering of data, applicable to both sequential HEP applications

More information

Lecture Topics. Announcements. Today: Advanced Scheduling (Stallings, chapter ) Next: Deadlock (Stallings, chapter

Lecture Topics. Announcements. Today: Advanced Scheduling (Stallings, chapter ) Next: Deadlock (Stallings, chapter Lecture Topics Today: Advanced Scheduling (Stallings, chapter 10.1-10.4) Next: Deadlock (Stallings, chapter 6.1-6.6) 1 Announcements Exam #2 returned today Self-Study Exercise #10 Project #8 (due 11/16)

More information

Computer-System Organization (cont.)

Computer-System Organization (cont.) Computer-System Organization (cont.) Interrupt time line for a single process doing output. Interrupts are an important part of a computer architecture. Each computer design has its own interrupt mechanism,

More information

An Introduction to Parallel Programming

An Introduction to Parallel Programming An Introduction to Parallel Programming Ing. Andrea Marongiu (a.marongiu@unibo.it) Includes slides from Multicore Programming Primer course at Massachusetts Institute of Technology (MIT) by Prof. SamanAmarasinghe

More information

CS 31: Intro to Systems Virtual Memory. Kevin Webb Swarthmore College November 15, 2018

CS 31: Intro to Systems Virtual Memory. Kevin Webb Swarthmore College November 15, 2018 CS 31: Intro to Systems Virtual Memory Kevin Webb Swarthmore College November 15, 2018 Reading Quiz Memory Abstraction goal: make every process think it has the same memory layout. MUCH simpler for compiler

More information

Non-Uniform Memory Access (NUMA) Architecture and Multicomputers

Non-Uniform Memory Access (NUMA) Architecture and Multicomputers Non-Uniform Memory Access (NUMA) Architecture and Multicomputers Parallel and Distributed Computing Department of Computer Science and Engineering (DEI) Instituto Superior Técnico September 26, 2011 CPD

More information

Memory. From Chapter 3 of High Performance Computing. c R. Leduc

Memory. From Chapter 3 of High Performance Computing. c R. Leduc Memory From Chapter 3 of High Performance Computing c 2002-2004 R. Leduc Memory Even if CPU is infinitely fast, still need to read/write data to memory. Speed of memory increasing much slower than processor

More information

A High Performance Bus Communication Architecture through Bus Splitting

A High Performance Bus Communication Architecture through Bus Splitting A High Performance Communication Architecture through Splitting Ruibing Lu and Cheng-Kok Koh School of Electrical and Computer Engineering Purdue University,West Lafayette, IN, 797, USA {lur, chengkok}@ecn.purdue.edu

More information

6.895 Final Project: Serial and Parallel execution of Funnel Sort

6.895 Final Project: Serial and Parallel execution of Funnel Sort 6.895 Final Project: Serial and Parallel execution of Funnel Sort Paul Youn December 17, 2003 Abstract The speed of a sorting algorithm is often measured based on the sheer number of calculations required

More information

Chapter 20: Database System Architectures

Chapter 20: Database System Architectures Chapter 20: Database System Architectures Chapter 20: Database System Architectures Centralized and Client-Server Systems Server System Architectures Parallel Systems Distributed Systems Network Types

More information

3/24/2014 BIT 325 PARALLEL PROCESSING ASSESSMENT. Lecture Notes:

3/24/2014 BIT 325 PARALLEL PROCESSING ASSESSMENT. Lecture Notes: BIT 325 PARALLEL PROCESSING ASSESSMENT CA 40% TESTS 30% PRESENTATIONS 10% EXAM 60% CLASS TIME TABLE SYLLUBUS & RECOMMENDED BOOKS Parallel processing Overview Clarification of parallel machines Some General

More information

Rule partitioning versus task sharing in parallel processing of universal production systems

Rule partitioning versus task sharing in parallel processing of universal production systems Rule partitioning versus task sharing in parallel processing of universal production systems byhee WON SUNY at Buffalo Amherst, New York ABSTRACT Most research efforts in parallel processing of production

More information

Feasibility of Testing to Code. Feasibility of Testing to Code. Feasibility of Testing to Code. Feasibility of Testing to Code (contd)

Feasibility of Testing to Code. Feasibility of Testing to Code. Feasibility of Testing to Code. Feasibility of Testing to Code (contd) Feasibility of Testing to Code (contd) Feasibility of Testing to Code (contd) An incorrect code fragment for determining if three integers are equal, together with two test cases Flowchart has over 10

More information

a process may be swapped in and out of main memory such that it occupies different regions

a process may be swapped in and out of main memory such that it occupies different regions Virtual Memory Characteristics of Paging and Segmentation A process may be broken up into pieces (pages or segments) that do not need to be located contiguously in main memory Memory references are dynamically

More information

Operating Systems Unit 6. Memory Management

Operating Systems Unit 6. Memory Management Unit 6 Memory Management Structure 6.1 Introduction Objectives 6.2 Logical versus Physical Address Space 6.3 Swapping 6.4 Contiguous Allocation Single partition Allocation Multiple Partition Allocation

More information

For a long time, programming languages such as FORTRAN, PASCAL, and C Were being used to describe computer programs that were

For a long time, programming languages such as FORTRAN, PASCAL, and C Were being used to describe computer programs that were CHAPTER-2 HARDWARE DESCRIPTION LANGUAGES 2.1 Overview of HDLs : For a long time, programming languages such as FORTRAN, PASCAL, and C Were being used to describe computer programs that were sequential

More information

UMBC. space and introduced backtrace. Fujiwara s FAN efficiently constrained the backtrace to speed up search and further limited the search space.

UMBC. space and introduced backtrace. Fujiwara s FAN efficiently constrained the backtrace to speed up search and further limited the search space. ATPG Algorithms Characteristics of the three main algorithms: Roth s -Algorithm (-ALG) defined the calculus and algorithms for ATPG using -cubes. Goel s POEM used path propagation constraints to limit

More information

Chapter 1: Introduction. Operating System Concepts 9 th Edit9on

Chapter 1: Introduction. Operating System Concepts 9 th Edit9on Chapter 1: Introduction Operating System Concepts 9 th Edit9on Silberschatz, Galvin and Gagne 2013 Objectives To describe the basic organization of computer systems To provide a grand tour of the major

More information

Why Multiprocessors?

Why Multiprocessors? Why Multiprocessors? Motivation: Go beyond the performance offered by a single processor Without requiring specialized processors Without the complexity of too much multiple issue Opportunity: Software

More information

Part 5. Verification and Validation

Part 5. Verification and Validation Software Engineering Part 5. Verification and Validation - Verification and Validation - Software Testing Ver. 1.7 This lecture note is based on materials from Ian Sommerville 2006. Anyone can use this

More information

Static Compaction Techniques to Control Scan Vector Power Dissipation

Static Compaction Techniques to Control Scan Vector Power Dissipation Static Compaction Techniques to Control Scan Vector Power Dissipation Ranganathan Sankaralingam, Rama Rao Oruganti, and Nur A. Touba Computer Engineering Research Center Department of Electrical and Computer

More information

A CSP Search Algorithm with Reduced Branching Factor

A CSP Search Algorithm with Reduced Branching Factor A CSP Search Algorithm with Reduced Branching Factor Igor Razgon and Amnon Meisels Department of Computer Science, Ben-Gurion University of the Negev, Beer-Sheva, 84-105, Israel {irazgon,am}@cs.bgu.ac.il

More information

Chapter 11: File System Implementation. Objectives

Chapter 11: File System Implementation. Objectives Chapter 11: File System Implementation Objectives To describe the details of implementing local file systems and directory structures To describe the implementation of remote file systems To discuss block

More information

Chapter 11: Implementing File-Systems

Chapter 11: Implementing File-Systems Chapter 11: Implementing File-Systems Chapter 11 File-System Implementation 11.1 File-System Structure 11.2 File-System Implementation 11.3 Directory Implementation 11.4 Allocation Methods 11.5 Free-Space

More information

Top-Level View of Computer Organization

Top-Level View of Computer Organization Top-Level View of Computer Organization Bởi: Hoang Lan Nguyen Computer Component Contemporary computer designs are based on concepts developed by John von Neumann at the Institute for Advanced Studies

More information

Virtual Memory. Reading: Silberschatz chapter 10 Reading: Stallings. chapter 8 EEL 358

Virtual Memory. Reading: Silberschatz chapter 10 Reading: Stallings. chapter 8 EEL 358 Virtual Memory Reading: Silberschatz chapter 10 Reading: Stallings chapter 8 1 Outline Introduction Advantages Thrashing Principal of Locality VM based on Paging/Segmentation Combined Paging and Segmentation

More information

Chapter 8: Virtual Memory. Operating System Concepts

Chapter 8: Virtual Memory. Operating System Concepts Chapter 8: Virtual Memory Silberschatz, Galvin and Gagne 2009 Chapter 8: Virtual Memory Background Demand Paging Copy-on-Write Page Replacement Allocation of Frames Thrashing Memory-Mapped Files Allocating

More information

On Computing Minimum Size Prime Implicants

On Computing Minimum Size Prime Implicants On Computing Minimum Size Prime Implicants João P. Marques Silva Cadence European Laboratories / IST-INESC Lisbon, Portugal jpms@inesc.pt Abstract In this paper we describe a new model and algorithm for

More information

Virtual Machines. 2 Disco: Running Commodity Operating Systems on Scalable Multiprocessors([1])

Virtual Machines. 2 Disco: Running Commodity Operating Systems on Scalable Multiprocessors([1]) EE392C: Advanced Topics in Computer Architecture Lecture #10 Polymorphic Processors Stanford University Thursday, 8 May 2003 Virtual Machines Lecture #10: Thursday, 1 May 2003 Lecturer: Jayanth Gummaraju,

More information

Lecture 23 Database System Architectures

Lecture 23 Database System Architectures CMSC 461, Database Management Systems Spring 2018 Lecture 23 Database System Architectures These slides are based on Database System Concepts 6 th edition book (whereas some quotes and figures are used

More information

Non-Uniform Memory Access (NUMA) Architecture and Multicomputers

Non-Uniform Memory Access (NUMA) Architecture and Multicomputers Non-Uniform Memory Access (NUMA) Architecture and Multicomputers Parallel and Distributed Computing MSc in Information Systems and Computer Engineering DEA in Computational Engineering Department of Computer

More information

CLOUD-SCALE FILE SYSTEMS

CLOUD-SCALE FILE SYSTEMS Data Management in the Cloud CLOUD-SCALE FILE SYSTEMS 92 Google File System (GFS) Designing a file system for the Cloud design assumptions design choices Architecture GFS Master GFS Chunkservers GFS Clients

More information

Main Points of the Computer Organization and System Software Module

Main Points of the Computer Organization and System Software Module Main Points of the Computer Organization and System Software Module You can find below the topics we have covered during the COSS module. Reading the relevant parts of the textbooks is essential for a

More information

CS399 New Beginnings. Jonathan Walpole

CS399 New Beginnings. Jonathan Walpole CS399 New Beginnings Jonathan Walpole Memory Management Memory Management Memory a linear array of bytes - Holds O.S. and programs (processes) - Each cell (byte) is named by a unique memory address Recall,

More information

Computer-System Architecture (cont.) Symmetrically Constructed Clusters (cont.) Advantages: 1. Greater computational power by running applications

Computer-System Architecture (cont.) Symmetrically Constructed Clusters (cont.) Advantages: 1. Greater computational power by running applications Computer-System Architecture (cont.) Symmetrically Constructed Clusters (cont.) Advantages: 1. Greater computational power by running applications concurrently on all computers in the cluster. Disadvantages:

More information

Overview of Digital Design with Verilog HDL 1

Overview of Digital Design with Verilog HDL 1 Overview of Digital Design with Verilog HDL 1 1.1 Evolution of Computer-Aided Digital Design Digital circuit design has evolved rapidly over the last 25 years. The earliest digital circuits were designed

More information