A Distributed-multicore Hybrid ATPG System


X. Cai and P. Wohl
Synopsys, Inc., Mountain View, CA, USA
{xcai,

Abstract

We present a distributed-multicore hybrid ATPG system which leverages the computing power of multiple machines, each with multiple CPUs. The system is versatile and scalable and supports flexible configuration. Experimental results are compared to a highly efficient multicore ATPG system.

1. Introduction

Even using today's most advanced CPUs, ATPG on large industrial designs can take so long that it must be truncated, resulting in less than optimal test coverage and design quality. A parallel ATPG system is a way to cut ATPG time as design sizes grow. In general, a parallel ATPG system can be classified as distributed over different machines or localized on one machine. In a localized system (e.g., multicore), speedup is limited by the number of CPUs and the performance of memory management on the machine. In a distributed system, inter-process communication can become the bottleneck as more machines get involved. For even more speedup, a hybrid mode that combines the advantages of multicore and distributed configurations is a good option.

In practice, most enterprise server farms have machines with 4-8 idle CPUs, but memory size limits the degree to which multicore ATPG can exploit the speedup scalability of such a system. A distributed-multicore hybrid system best fits these conditions: it can fully use the memory and CPUs on each machine while adding more speedup with more machines. Such a system can be cheaper to run than a multicore system on a high-end machine.

The paper is organized as follows. In Section 2, we present our job partitioning scheme and hybrid architecture. Experimental results are reported in Section 3. Section 4 discusses limitations and future improvements of the system. Finally, Section 5 concludes.

2. Our Hybrid System

An ATPG system contains 3 major components: test generation, good machine simulation and fault machine simulation. A parallel ATPG system should consider parallelization in all 3 components in order to get scalable speedup. Job partitioning is an important aspect of a parallel ATPG system; a good job partitioning scheme is the foundation of scalable speedup. Various job partitioning schemes have been discussed in previous works, and we compare our job partitioning to them in the following section. System architecture is another foundational block of a parallel ATPG system: a partitioning scheme may not perform well if the system architecture cannot implement the job partitioning efficiently.

2.1 Job partitioning

Various job partitioning schemes have been previously presented. These schemes can be classified as fault partitioning, search-space partitioning, circuit partitioning, heuristic partitioning and algorithm partitioning. For fault simulation, fault partitioning and sometimes pattern partitioning are commonly used methods [1, 2, 3, 4, 5]. Search-space partitioning can only be used in test generation and has the benefit of improving coverage for hard-to-detect faults [6, 7, 8, 9]. While circuit partitioning is sometimes possible [10, 11, 12], it can be tricky to handle the boundaries between partitions; this partitioning is also highly dependent on circuit topology, which makes it hard to get consistent performance across different designs. Heuristic partitioning refers to applying different test generation heuristics in parallel for a given fault [13].
Algorithm partitioning distributes different components of ATPG to be executed in a pipelined parallel fashion [14]. Such a partitioning scheme usually requires complex synchronization and a large amount of data transfer between the pipelined components.

We employ fault partitioning as our main partitioning scheme, as it can be applied to both test generation and fault simulation. Various fault partitioning schemes have been tested for fault simulation, and several heuristic algorithms on how to group faults together within a partition have been discussed. In [4], a predetermined fault partition based on gate level and fault effect cone correlation is presented. In [3], a multistage pipelined synchronous algorithm is implemented to reduce the possibility of missed detections when combined with pattern partitioning. However, all of these works mainly used static fault partitions, which may lead to an unbalanced work load among different processes. In [15], faults are not partitioned but multiple faults are simulated in parallel to exploit the vast vector parallelism of a GPU system. In [2], static partitioning based on fault correlation is used first and then dynamic partitioning is used as a supplement.

Both static and dynamic partitioning have limitations, as seen in previously proposed fault partitioning schemes. With static partitioning, the heuristic used can have significant effects on pattern set size and speedup. With dynamic partitioning, the synchronization overhead can be substantial [16]. Current industrial designs can have tens or even hundreds of millions of gates, and applying a correlation analysis to the entire fault list can be time consuming. With ever growing circuit size, such analysis provides diminishing benefits compared to a simple partition of the fault list roughly built on localities. On the other hand, increased network communication bandwidth and memory access speed make dynamic partitioning more applicable. We found that a simple locality-based initial partitioning at the beginning of ATPG combined with dynamic fault partitioning gives the best results [17]. The key idea is to blend static and dynamic fault partitioning at different stages of ATPG based on the number of undetected faults.

Unlike previous works, we do not limit one process to work on only one part of the fault list. We first order the fault list based on fan-out-free regions. Each slave process has access to the full fault list but starts picking a primary fault target from a different section of the fault set. The dynamic pattern compaction algorithm then picks secondary targets according to certain heuristics. If a fault has been picked as a primary target or has been detected, the fault is skipped by the next fault target selection. Other than that, each slave can choose any remaining fault as a target fault. Since we do not limit fault target selection to a subset of faults like a static fault partition scheme, we avoid the pattern inflation problem that may occur in a static partition system.

At the beginning of the ATPG process, there are plenty of faults, so different slaves are unlikely to select the same fault targets. However, as the fault population decreases, we rely more on fault status communication to avoid duplicate fault targets. We keep a global fault status table, which is checked before each fault target is selected. Each slave sends a request to update the global status table as soon as it detects a fault. Unlike previous works, our dynamic fault partitioning technique exchanges fault status information much more frequently but with less overhead. This is achieved by eliminating the locking/unlocking associated with such communication. We keep a local copy of the global fault status table on each sub-master machine so that slave processes need not go through the network for fault status lookup. Since each slave works on a different part of the fault list, it is unlikely that slaves would update the same fault entry in the table at the same time, so the locking/unlocking for updating the table can also be largely eliminated.

Another key part of our hybrid system is pattern partitioning. In our system, each slave process only simulates the patterns it has created locally. This essentially partitions the patterns for good and fault machine simulation. Many previous papers only addressed the fault simulation problem [3, 4, 15]. To avoid duplicating good machine simulations by different processes, it is necessary to partition the patterns [3].

2.2 System Architecture

Depending on the communication method between processes, two types of systems have been designed. One type is based on shared memory on one machine; the other is message passing between machines.
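To make the shared-memory style, and in particular the lock-free fault status table of Section 2.1, more concrete, the following minimal Python sketch shows several slave processes on one machine sharing an unlocked fault status table. The fault count, the 4-worker layout and the random stand-in for detection are illustrative assumptions, not the paper's implementation.

# Minimal sketch (not the authors' implementation) of a lock-free shared
# fault status table: slaves check the table before targeting a fault and
# mark detections without taking a lock.
import multiprocessing as mp
import random

NUM_FAULTS = 1000                 # hypothetical fault list size
UNDETECTED, DETECTED = 0, 1

def slave(worker_id, num_workers, status):
    # Each slave starts from a different section of the ordered fault list
    # but may target any fault that is still undetected.
    start = worker_id * NUM_FAULTS // num_workers
    order = list(range(start, NUM_FAULTS)) + list(range(0, start))
    for fault in order:
        if status[fault] == DETECTED:   # cheap read, no lock
            continue                    # skip faults already detected
        if random.random() < 0.9:       # stand-in for test generation + simulation
            status[fault] = DETECTED    # single-byte write, no lock taken

if __name__ == "__main__":
    # lock=False: the table is read and updated without locking
    status = mp.Array('b', [UNDETECTED] * NUM_FAULTS, lock=False)
    workers = [mp.Process(target=slave, args=(i, 4, status)) for i in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print("faults detected:", sum(status))

Because each entry only ever moves from undetected to detected, and slaves mostly work on different parts of the list, concurrent updates rarely collide, which is consistent with the paper's observation that locking for table updates can be largely eliminated; the real system additionally mirrors this table per machine and synchronizes the copies through the master.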
Shared memory based systems have been discussed earlier [11, 15]. In [11], a system combining fine-grained search-space partitioning in test generation with circuit partitioning in fault simulation is presented. This system requires different strategies for easy and hard-to-detect faults. It also has many synchronization points that undermine speedup; as a result, the speedup numbers are unpredictable. Several GPU-based fault simulation systems have also been implemented in recent years [15, 19, 20]. In such systems, fault simulation takes advantage of the large vector parallelism of a GPU to simulate multiple faults at the same time. This is an extension of the 32-bit-wise parallelism of existing parallel fault simulation and requires special hardware (GPUs) to implement.

Examples of message passing based systems can be found in [21, 2, 5]. In [2, 5], a distributed ATPG system is designed with a master/server setup in which each node can talk to other nodes for workloads. Such a system is not scalable because the number of communication links grows quadratically with the number of nodes. In [21], the good machine simulation was not parallelized, based on the assumption that it is not a major bottleneck. Such an assumption may no longer be true with current circuit sizes and complicated clocking features. Even if it were still true, good machine simulation could become a bottleneck as the number of nodes increases, as predicted by Amdahl's law [22].

In a shared memory system, the scale of parallelism is limited by the number of CPUs and the memory size of the machine. A message passing system can extend the computing power beyond one machine. As more and more machines become multi-core, a hybrid system can combine the strengths of the two approaches.

In a computing system, locality is an important means of achieving efficient system throughput. We designed our system to follow this principle as much as possible: all generated vectors are simulated by the same process that created them. As concluded in [5], vector broadcasting is not as efficient as fault broadcasting, mainly because good machine simulation has to be repeated for the same vectors with vector broadcasting. Thus we designed our system to eliminate vector broadcasting. While this could cause some lost coverage with a partitioned fault list, we solve this problem with a scheme wherein every process can potentially simulate any fault in the remaining fault list.

This scheme in turn may cause duplicate simulation work; we solve this second problem with efficient dynamic fault partitioning.

Overall scheme

Our hybrid system has 3 types of processes: the master process, the sub-master processes and the slave processes. The architecture for the entire system is shown in Figure 1. This is a hierarchical architecture meant to reduce the number of direct communication links in the system. All ATPG work is done by the slave processes, which generate and simulate their own test patterns. Master and sub-master processes consume very little memory and CPU, for logistics and communication purposes only. The hierarchy allows scaling to a large number of slave processes.

Figure 1. System architecture for hybrid mode

Master process

There is only one master process, which is the original process launched. The full design database and ATPG constraints are stored in the master process. The responsibilities of the master process are controlling the entire system, collecting fault status and patterns, and reporting progress (Figure 2). Before launching sub-masters, the master process saves a binary copy of the database including the ATPG constraints. Once launched, the sub-masters on remote machines can then read in the saved database. The fault list is then sent to the sub-masters from the master process. The master process then enters a waiting loop for pattern or fault status events from sub-masters, and exits this loop only after all sub-masters terminate. After launching sub-masters, the master process only accesses fault and pattern data; the rest of the design database can be swapped out to disk.

Sub-master processes

The sub-masters are launched on remote machines through server farm utilities. They are responsible for collecting patterns and fault status from slave processes and communicating with the master. The sub-master processes are mostly idle and are awakened from time to time by fault status or pattern activities (Figure 2). To avoid jamming up the communication channels between the master and sub-masters, the sub-masters consolidate the fault and pattern information from slaves to make the data transfer to the master more compact and efficient; such consolidation is not necessary in the communication between the sub-masters and the slaves, because they share the same memory system and communication is much more efficient. For example, the sub-master can combine two or more fault status events which arrived at the same time into one message. Since each message has a fixed overhead, this consolidation improves communication efficiency. Like the master process, a sub-master process only needs to access fault and pattern data after launching slave processes. However, unlike the master process, the rest of the database is shared by slave processes during ATPG. In our multicore system, no sub-master process is needed since all slave processes are on the same machine, and fault status changes can be updated directly in the shared fault status table.

Slave processes

The real ATPG and simulation work is done by the slave processes. They communicate only with their sub-masters. Each slave works similarly to a single-process ATPG flow. The process first performs test generation with dynamic pattern compaction to accumulate an interval of (usually 32) patterns. A primary fault target is randomly selected for test generation. We try to keep the primary targets evenly distributed over the entire circuit for one interval.
Then test generation packs as many care bits as possible for secondary fault targets. After the test generation phase, a parallel-pattern single-fault simulation is performed for all active faults. Fault targets and fault status changes are broadcast to other slaves on the same machine and also to the sub-master. The sub-master sends the information to the master, which in turn forwards it to other sub-masters. Unlike single-process ATPG, each slave checks the shared fault status table before a fault is targeted or simulated. The global fault status table records the most recent status for each fault after collecting information from all slave processes. Figure 2 shows flow charts for all 3 types of processes.
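As a rough summary of the slave flow just described, the sketch below outlines one slave's main loop in Python. The helpers pick_primary, generate_pattern, fault_simulate, broadcast_target and broadcast_detections are hypothetical stand-ins for the real test generation and simulation engines; only the interval size of 32 patterns comes from the paper.

# Illustrative sketch of one slave's main loop, not the actual implementation.
INTERVAL = 32  # patterns accumulated before fault grading, as in the paper

def slave_loop(fault_status, pick_primary, generate_pattern, fault_simulate,
               broadcast_target, broadcast_detections):
    while True:
        # --- test generation phase: build one interval of patterns ---
        patterns = []
        for _ in range(INTERVAL):
            primary = pick_primary(fault_status)   # skips faults already
            if primary is None:                    # detected or targeted
                return                             # no faults left: done
            broadcast_target(primary)              # tell other slaves/sub-master
            # dynamic pattern compaction adds secondary targets to the pattern
            patterns.append(generate_pattern(primary, fault_status))
        # --- simulation phase: grade only the locally created patterns ---
        detected = fault_simulate(patterns, fault_status)
        broadcast_detections(detected)             # update local table + sub-master

The two points this sketch is meant to capture are that a slave fault-simulates only the patterns it created itself (the pattern partitioning of Section 2.1) and that every target selection and detection goes through the shared fault status table.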

Figure 2. Process flows for master, sub-master and slave processes

Fault status communication

Fault status communication is crucial to the quality of results of the hybrid system. Different processes rely on efficient fault status communication to reduce duplicate work. There are two levels of communication. At the machine level, slave processes communicate with each other through shared memory. At the system level, different machines communicate through TCP/IP message passing. The slave processes associated with the same sub-master process share a fault status table. This table is updated immediately with any fault detection by slave processes. The sub-master process periodically collects new fault detections from this shared table. The new detections are sent to the master process as messages. On the other hand, the sub-master also processes new detection messages received from the master process. These new detections from other sub-masters are also updated in the locally shared fault status table.

The fault status changes at an uneven rate during ATPG: at the beginning, large numbers of faults are detected per pattern, and the rate gradually reduces to a few near the end. We designed the sub-master message package size to dynamically adjust to such changes in detection rate. The size of a fault status message is set to roughly the average number of detections per pattern, measured within a sliding window of 100 patterns. At the beginning of ATPG, most faults are detected by random patterns. As different slaves are working on different parts of the fault list, it is not critical to avoid duplicate work through communication. Thus, at the beginning, a larger message size is used to reduce the overall number of messages to be sent. Since there is an overhead for each message sent, a larger message size utilizes the communication bandwidth more efficiently. The message size is gradually reduced as detections per pattern decrease.

User Interface

Based on their needs and knowledge of available resources, users can specify the list of machines or get a fixed number of machines from the computer farm, and also the number of slave processes each machine should run. If we specify m sub-masters and n slaves for each sub-master, then m x n slave processes perform ATPG. Therefore, there are multiple possible configurations that launch the same total number of slave processes. It is possible to specify m=1 and n=1; such a system has 1 master process, 1 sub-master process and 1 slave process. It adds no value compared to traditional single-process ATPG, but such a configuration is allowed for debugging purposes.

3. Experimental Results

We selected 8 industrial circuits with roughly 1.5 to 70 million gates to evaluate our hybrid system. The circuit characteristics are listed in Table 1 (all numbers are in millions). D1 to D4 are bigger designs, for which we used the stuck-at fault model; D5 to D8 are smaller designs, for which we used the transition fault model. Since transition fault ATPG runs very long on D1-D4 and stuck-at ATPG runs very fast on D5-D8, we did not collect data for both fault models on all designs.

Table 1. Circuit characteristics.
name    Fault Model    #gates (M)    #flops (M)    #faults (M)
D1      stuck-at
D2      stuck-at
D3      stuck-at
D4      stuck-at
D5      transition
D6      transition
D7      transition
D8      transition

A 4x4 hybrid configuration was compared to single-process, 4-core and 16-core multicore runs. We also ran a 5x6 hybrid configuration to demonstrate the versatility of the system. The single-process and multicore results were obtained on one 2666MHz Intel Xeon machine with 256GB memory and 24 cores; the 24 cores are installed in 4 sockets with 6 cores in each socket.

Since we could not find other machines with the same CPUs, the hybrid results were obtained on 2666MHz Intel Xeon servers with 72GB memory and 8 cores each; each machine has 2 sockets with 4 cores in each socket. Although the CPU speed of the two types of machines is the same, the internal cache performance of the two may not be identical; however, a thorough investigation of caching effects is beyond the scope of this paper.

The final test coverage results are listed in Table 2. Single-process test coverage is shown in column 2 as the baseline. The final test coverage differences of the various multicore and hybrid configurations are listed in columns 3 to 6; a plus sign means higher coverage. In all multicore or hybrid runs, the final test coverage was higher than single process. The last row lists the average coverage gain. On average, the test coverage gain increases with the number of slave processes. The explanation of the coverage gain is that combining dynamic pattern compaction with dynamic fault partitioning brings more randomness into the system; some hard-to-detect faults may get fortuitous detections in a multicore system [17]. Further, in most cases, the final test coverage of hybrid is higher than multicore. The explanation for this behavior could be that the longer communication delays in a hybrid system cause some of the hard-to-detect faults to be targeted more times than in a multicore system. When a slave targets a fault, it informs the other slaves to avoid the duplicate work of targeting the same fault; this is good in general because it avoids pattern inflation. However, with hard-to-detect faults and slow communication, multiple slaves can end up targeting the same fault, thus increasing coverage but also pattern count.

Table 2. Final test coverage
name    Cov (%)    Final cov diff (%)
        Single     4      16     4x4    5x6
D1
D2
D3
D4
D5
D6
D7
D8
Ave

Table 3 lists the speedup at the end of each run; the table is arranged similarly to Table 2. For all runs in Table 3, hybrid mode has higher speedup than a multicore configuration with the same number of slave processes. This is due to the fact that a full-fledged multicore ATPG run may stress the performance of a machine's memory system. For example, in a multi-socket NUMA system, remote memory is more time consuming to access than local memory. If all required processes fit on one socket, all shared memory may be local to these processes. However, if the number of required processes exceeds the number of CPUs that one socket can schedule, additional processes will be scheduled on another socket. Since these processes all share the same database memory, some processes may have to transfer data from remote memory. A hybrid system can distribute working processes to different machines so that remote memory usage is reduced on each machine.

For designs D5 to D8, we used the transition fault model. The speedup of the multicore and hybrid configurations is lower for transition fault ATPG than for stuck-at ATPG. This is mainly due to the additional ATPG effort to detect more faults, which contributed to the additional fault coverage. In transition fault ATPG, the search space of a target fault is much greater than for a stuck-at fault, and with dynamic compaction the detection of a fault is more sensitive to fault ordering. With multicore or hybrid, more fault orderings are tested in parallel, one in each process, thus increasing the chance that a hard-to-test fault is detected.
Table 3. Speedup at the end of run
name    CPU(s) at end of run    Speedup
        Single                  4      16     4x4    5x6
D1
D2
D3
D4
D5
D6
D7
D8
Ave

For example, in D4 the speedup for 16 cores is far less than for the 4x4 hybrid. This design has a very long and flat tail in ATPG, so a little more test coverage gain (+0.01% from Table 2) diminished speedup considerably.

Table 4 lists the pattern count comparison at the end of each run. A + sign means pattern inflation and a - sign means pattern reduction. We can see that both multicore and hybrid create more patterns, but also deliver additional coverage. However, hybrid produces less compact pattern sets than a multicore configuration with the same number of slaves. This is due to the fact that network communication is much slower than on-chip shared memory communication, so that duplicate work is more likely to happen in a hybrid system than in a multicore system.

Table 4. Pattern count at the end of run
name    Patterns    Pattern diffs (%)
        Single      4      16     4x4    5x6
D1
D2
D3
D4
D5
D6
D7
D8
Ave

The designs we used may not be large enough to justify a run with 30 slave processes. We can see that the 5x6 hybrid configuration did not produce significantly better speedup with the designs we have, and the pattern inflation gets worse as design size decreases. However, we demonstrated the versatility of our hybrid system: a hybrid system can leverage the computing resources of multiple machines to achieve the expected speedup. The memory consumption on each sub-master host is similar to our multicore system described in [17]. Each sub-master copies the master process memory, and each slave process adds about 25-30% memory overhead.

4. Limitations and Discussions

Our ATPG system generates 32 patterns before fault grading these patterns, and we cannot guarantee that these 32 patterns do not have duplicate detections among them. With multiple ATPG processes, the number of patterns created in parallel increases even more. So our multicore or hybrid solution effectively increases the parallel pattern size, which can have a negative effect on pattern compactness.

We rely on fault status communication among multiple processes to avoid duplicated work in ATPG. The communication delay among machines has a significant impact on the compactness of the pattern set. As fewer and fewer faults remain, it is harder to avoid duplicate fault targets. This may also contribute to the pattern inflation of our multicore and hybrid systems. Communication over the network is much slower than communication through shared memory. In general, it is hard to share detailed information about fault targets, such as merging attempt counts, across different machines. This causes the ATPG algorithms to behave slightly differently between multicore and hybrid, which contributes to the slight difference in results between the two. It may also explain the increased average pattern inflation between a 16-process multicore system and a 4x4 hybrid system, as shown in Table 4. These areas can be further optimized in the future.

In some cases shown in Table 4, multicore and hybrid create a more compact pattern set than single-process ATPG. The reason is that fault status sharing in our parallel system essentially explores multiple fault orderings with different processes. With dynamic merging, some hard-to-detect faults in one process can be detected early in another process. This may also explain why our parallel ATPG results have slightly higher coverage than single-process ATPG. However, this benefit may be offset by the duplicated detection discussed earlier; the final result depends on how these two effects balance each other out.

An implication of the above property of our parallel ATPG system is the non-repeatability of results. We can see the final coverage and pattern set size change with the parallel configuration in Tables 2 and 4. It is hard to predict which configuration will produce the best results, since this is very design dependent. However, with a few experimental runs, the best configuration can easily be decided based on speedup and pattern inflation. In practice, the configuration is often fixed due to resource constraints.

5. Conclusions

We presented the architecture and algorithms of a novel distributed-multicore hybrid ATPG system.
Results indicate that on average our hybrid ATPG system can achieve similar speedup to multicore with the same number of slave processes. The hybrid system uses much less expensive machines, so it is easier for users to find resources to run the parallel ATPG job. The experimental results also showed significant pattern inflation when a large number of slave processes is used for relatively small or medium sized designs. This suggests future optimizations for a large number of slaves competing for a small number of remaining faults.

6. References

[1] An Analysis of Fault Partitioned Parallel Test Generation. Joseph M. Wolf, Lori M. Kaufman, Robert H. Klenke, James H. Aylor, Ron Waxman, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 15, No. 5, May.
[2] Distributed Implementation of an ATPG System Using Dynamic Fault Allocation. M. J. Aguado, E. de la Torre, M. A. Miranda, C. Lopez-Barrio, Proceedings of International Test Conference.
[3] SPITFIRE: Scalable Parallel Algorithms for Test Set Partitioned Fault Simulation. D. Krishnaswamy, E. M. Rudnick, J. H. Patel, P. Banerjee, Proceedings of the VLSI Test Symposium, April.

[4] Data Parallel-Fault Simulation. Minesh B. Amin and Bapiraju Vinnakota, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 7, No. 2, June.
[5] An Analysis of Fault Partitioned Parallel Test Generation. Joseph M. Wolf, Lori M. Kaufman, Robert H. Klenke, James H. Aylor, and Ron Waxman, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 15, No. 5, May.
[6] On the Efficiency of Parallel Backtracking. V. Nageshwara Rao and Vipin Kumar, IEEE Transactions on Parallel and Distributed Systems, 4(4), April.
[7] Parallel Test Generation with Low Communication Overhead. Sivaramakrishnan Venkatraman, Sharad Seth, Prathima Agrawal, Proceedings of International Conference on VLSI.
[8] A Parallel Branch and Bound Algorithm for Test Generation. Srinivas Patil and Prithviraj Banerjee, IEEE Transactions on Computer-Aided Design, Vol. 9, No. 3, March.
[9] ProperHITEC: A Portable, Parallel, Object-Oriented Approach to Sequential Test Generation. Steven Parkes, Prithviraj Banerjee, Janak Patel, Proceedings of Design Automation Conference, 1994.
[10] Parallel Test Generation Using Circuit Partitioning and Spectral Techniques. Consolacion Gil, Julio Ortega, Proceedings of the Sixth Euromicro Workshop on Parallel and Distributed Processing.
[11] Parallel Test Generation for Sequential Circuits on General-Purpose Multiprocessors. S. Patil, P. Banerjee, and J. H. Patel, Proceedings of Design Automation Conference, 1991.
[12] Parallelization Methods for Circuit Partitioning Based Parallel Automatic Test Pattern Generation. Robert H. Klenke, Ronald D. Williams, James H. Aylor, Proceedings of the VLSI Test Symposium.
[13] Experimental Evaluation of Testability Measures for Test Generation. Susheel J. Chandra and Janak H. Patel, IEEE Transactions on Computer-Aided Design, Vol. 8, No. 1, Jan.
[14] VLSI Logic and Fault Simulation on General-Purpose Parallel Computers. R. B. Mueller-Thurns, D. G. Saab, and R. F. D. J. A. Abraham, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 12, No. 3, Mar.
[15] Towards Acceleration of Fault Simulation Using Graphics Processing Units. Kanupriya Galati and Sunil P. Khatri, Proceedings of Design Automation Conference, 2008.
[16] Performance Trade-Offs in a Parallel Test Generation/Fault Simulation Environment. Srinivas Patil and Prithviraj Banerjee, IEEE Transactions on Computer-Aided Design, Vol. 10, No. 12, Dec.
[17] Highly Efficient Parallel ATPG Based on Shared Memory. Xiaolei Cai, Peter Wohl, John A. Waicukauski, and Pramod Notiyath, Proceedings of International Test Conference.
[18] Optimal Granularity and Scheme of Parallel Test Generation in a Distributed System. Hideo Fujiwara and Tomoo Inoue, IEEE Transactions on Parallel and Distributed Systems, Vol. 6, No. 7, July.
[19] GPU-Accelerated Fault Simulation and Its New Applications. Huawei Li, Dawen Xu, and Kwang-Ting Cheng, International Symposium on VLSI Design, Automation and Test.
[20] 3-D Parallel Fault Simulation with GPGPU. Min Li, Michael S. Hsiao, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 30, No. 10, October.
[21] Design and Implementation of a Parallel Automatic Test Pattern Generation Algorithm with Low Test Vector Count. Robert Butler, Brion Keller, Sarala Paliwal, Richard Schoonover, Joseph Swenton, Proceedings of International Test Conference.
[22] Validity of the Single Processor Approach to Achieving Large Scale Computing Capabilities. Gene M. Amdahl, Proceedings of AFIPS Spring Joint Computer Conference, 1967.


More information

Von Neumann architecture. The first computers used a single fixed program (like a numeric calculator).

Von Neumann architecture. The first computers used a single fixed program (like a numeric calculator). Microprocessors Von Neumann architecture The first computers used a single fixed program (like a numeric calculator). To change the program, one has to re-wire, re-structure, or re-design the computer.

More information

Chapter 4 NETWORK HARDWARE

Chapter 4 NETWORK HARDWARE Chapter 4 NETWORK HARDWARE 1 Network Devices As Organizations grow, so do their networks Growth in number of users Geographical Growth Network Devices : Are products used to expand or connect networks.

More information

A Study of High Performance Computing and the Cray SV1 Supercomputer. Michael Sullivan TJHSST Class of 2004

A Study of High Performance Computing and the Cray SV1 Supercomputer. Michael Sullivan TJHSST Class of 2004 A Study of High Performance Computing and the Cray SV1 Supercomputer Michael Sullivan TJHSST Class of 2004 June 2004 0.1 Introduction A supercomputer is a device for turning compute-bound problems into

More information

Performance impact of dynamic parallelism on different clustering algorithms

Performance impact of dynamic parallelism on different clustering algorithms Performance impact of dynamic parallelism on different clustering algorithms Jeffrey DiMarco and Michela Taufer Computer and Information Sciences, University of Delaware E-mail: jdimarco@udel.edu, taufer@udel.edu

More information

Design Issues 1 / 36. Local versus Global Allocation. Choosing

Design Issues 1 / 36. Local versus Global Allocation. Choosing Design Issues 1 / 36 Local versus Global Allocation When process A has a page fault, where does the new page frame come from? More precisely, is one of A s pages reclaimed, or can a page frame be taken

More information

Today. SMP architecture. SMP architecture. Lecture 26: Multiprocessing continued Computer Architecture and Systems Programming ( )

Today. SMP architecture. SMP architecture. Lecture 26: Multiprocessing continued Computer Architecture and Systems Programming ( ) Lecture 26: Multiprocessing continued Computer Architecture and Systems Programming (252-0061-00) Timothy Roscoe Herbstsemester 2012 Systems Group Department of Computer Science ETH Zürich SMP architecture

More information

CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS

CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS UNIT-I OVERVIEW & INSTRUCTIONS 1. What are the eight great ideas in computer architecture? The eight

More information

Lecture 7: Parallel Processing

Lecture 7: Parallel Processing Lecture 7: Parallel Processing Introduction and motivation Architecture classification Performance evaluation Interconnection network Zebo Peng, IDA, LiTH 1 Performance Improvement Reduction of instruction

More information

Millisort: An Experiment in Granular Computing. Seo Jin Park with Yilong Li, Collin Lee and John Ousterhout

Millisort: An Experiment in Granular Computing. Seo Jin Park with Yilong Li, Collin Lee and John Ousterhout Millisort: An Experiment in Granular Computing Seo Jin Park with Yilong Li, Collin Lee and John Ousterhout Massively Parallel Granular Computing Massively parallel computing as an application of granular

More information

LRU. Pseudo LRU A B C D E F G H A B C D E F G H H H C. Copyright 2012, Elsevier Inc. All rights reserved.

LRU. Pseudo LRU A B C D E F G H A B C D E F G H H H C. Copyright 2012, Elsevier Inc. All rights reserved. LRU A list to keep track of the order of access to every block in the set. The least recently used block is replaced (if needed). How many bits we need for that? 27 Pseudo LRU A B C D E F G H A B C D E

More information

Parallel Programming Multicore systems

Parallel Programming Multicore systems FYS3240 PC-based instrumentation and microcontrollers Parallel Programming Multicore systems Spring 2011 Lecture #9 Bekkeng, 4.4.2011 Introduction Until recently, innovations in processor technology have

More information

Parallel graph traversal for FPGA

Parallel graph traversal for FPGA LETTER IEICE Electronics Express, Vol.11, No.7, 1 6 Parallel graph traversal for FPGA Shice Ni a), Yong Dou, Dan Zou, Rongchun Li, and Qiang Wang National Laboratory for Parallel and Distributed Processing,

More information

Lecture 13: March 25

Lecture 13: March 25 CISC 879 Software Support for Multicore Architectures Spring 2007 Lecture 13: March 25 Lecturer: John Cavazos Scribe: Ying Yu 13.1. Bryan Youse-Optimization of Sparse Matrix-Vector Multiplication on Emerging

More information

Survey on Virtual Memory

Survey on Virtual Memory Survey on Virtual Memory 1. Introduction Today, Computers are a part of our everyday life. We want to make our tasks easier and we try to achieve these by some means of tools. Our next preference would

More information

TUNING CUDA APPLICATIONS FOR MAXWELL

TUNING CUDA APPLICATIONS FOR MAXWELL TUNING CUDA APPLICATIONS FOR MAXWELL DA-07173-001_v7.0 March 2015 Application Note TABLE OF CONTENTS Chapter 1. Maxwell Tuning Guide... 1 1.1. NVIDIA Maxwell Compute Architecture... 1 1.2. CUDA Best Practices...2

More information

1. Introduction. Traditionally, a high bandwidth file system comprises a supercomputer with disks connected

1. Introduction. Traditionally, a high bandwidth file system comprises a supercomputer with disks connected 1. Introduction Traditionally, a high bandwidth file system comprises a supercomputer with disks connected by a high speed backplane bus such as SCSI [3][4] or Fibre Channel [2][67][71]. These systems

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 5. Large and Fast: Exploiting Memory Hierarchy

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 5. Large and Fast: Exploiting Memory Hierarchy COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 5 Large and Fast: Exploiting Memory Hierarchy Principle of Locality Programs access a small proportion of their address

More information

Accelerating Implicit LS-DYNA with GPU

Accelerating Implicit LS-DYNA with GPU Accelerating Implicit LS-DYNA with GPU Yih-Yih Lin Hewlett-Packard Company Abstract A major hindrance to the widespread use of Implicit LS-DYNA is its high compute cost. This paper will show modern GPU,

More information

Design of Parallel Algorithms. Course Introduction

Design of Parallel Algorithms. Course Introduction + Design of Parallel Algorithms Course Introduction + CSE 4163/6163 Parallel Algorithm Analysis & Design! Course Web Site: http://www.cse.msstate.edu/~luke/courses/fl17/cse4163! Instructor: Ed Luke! Office:

More information

Mark Sandstrom ThroughPuter, Inc.

Mark Sandstrom ThroughPuter, Inc. Hardware Implemented Scheduler, Placer, Inter-Task Communications and IO System Functions for Many Processors Dynamically Shared among Multiple Applications Mark Sandstrom ThroughPuter, Inc mark@throughputercom

More information

CS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2014 Lecture 14

CS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2014 Lecture 14 CS24: INTRODUCTION TO COMPUTING SYSTEMS Spring 2014 Lecture 14 LAST TIME! Examined several memory technologies: SRAM volatile memory cells built from transistors! Fast to use, larger memory cells (6+ transistors

More information