Computer Science 246. Computer Architecture
1 Computer Architecture Spring 2010 Harvard University Instructor: Prof. David Brooks Lecture: Multiprocessing
2 Multiprocessors Outline Computation taxonomy Interconnection overview Shared memory design criteria Cache coherence Very broad overview H&P: Chapter 6 (mostly), 8.5 Next time: More Multiprocessors + Multithreading
3 Parallel Processing Multiple processors working cooperatively on problems: not the same as multiprogramming Goals/Motivation Performance: limits of uniprocessor ILP (branch prediction, RAW dependencies, memory) Cost efficiency: build big systems with commodity parts (uniprocessor cores) Scalability: just add more cores to get more performance Fault tolerance: if one processor fails, the rest can continue processing
4 Parallel Processing Sounds great, but... As usual, software is the bottleneck Difficult to parallelize applications Compiler parallelization is hard (huge research efforts in 80s/90s) By-hand parallelization is harder/expensive/error-prone Difficult to make parallel applications run fast Communication is expensive, which makes the first task even harder Inherent problems Insufficient parallelism in some apps Long-latency remote communication
5 Recall Amdahl's Law Speedup_overall = 1 / ((1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced) Example: Achieve speedup of 80x using 100 processors 80 = 1/[Frac_parallel/100 + 1 - Frac_parallel] => Frac_parallel = 0.9975 => only 0.25% of the work can be serial! Assumes linear speedup Superlinear speedup is possible in some cases Increased memory + cache with increased processor count
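The slide's arithmetic can be checked directly. A minimal Python sketch (function and variable names are mine, not from the slides) that solves the 80x-on-100-processors example:

```python
def speedup(frac_parallel, n_proc):
    """Amdahl's law: serial part runs at 1x, parallel part at n_proc x."""
    return 1.0 / ((1.0 - frac_parallel) + frac_parallel / n_proc)

# Slide example: target 80x speedup on 100 processors.
# Solve 80 = 1 / (f/100 + 1 - f) for f:
f = (1.0 - 1.0 / 80.0) / (1.0 - 1.0 / 100.0)
print(round(f, 4))              # fraction of work that must be parallel: 0.9975
print(round(speedup(f, 100)))   # sanity check: 80
```

The solved fraction confirms the slide's point: only 0.25% of the work may be serial.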
6 Parallel Application Domains Traditional Parallel Programs True parallelism in one job Regular loop structures Data usually tightly shared Automatic parallelization Called data-level parallelism Can often exploit vectors as well Workloads Scientific simulation codes (FFT, weather, fluid dynamics) Dominant market segment in the 1980s
7 Parallel Program Example: Matrix Multiply Parameters: Size of matrix: (N*N) P processors: each gets an (N/P^1/2) * (N/P^1/2) block C_ij needs A_ik, B_kj for k = 0 to N-1 Growth functions Computation grows as f(N^3) Computation per processor: f(N^3/P) Data size: f(N^2) Data size per processor: f(N^2/P) Communication: f(N^2/P^1/2) Computation/Communication: f(N/P^1/2)
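The growth functions above can be written as a small cost model. A sketch (constant factors dropped; names are illustrative, not from the slides):

```python
import math

def matmul_costs(N, P):
    """Per-processor cost model for block-partitioned N x N matrix multiply
    on P processors arranged as a sqrt(P) x sqrt(P) grid of blocks."""
    comp = N**3 / P              # f(N^3 / P) multiply-adds per processor
    data = N**2 / P              # f(N^2 / P) elements owned per processor
    comm = N**2 / math.sqrt(P)   # f(N^2 / sqrt(P)) elements exchanged
    return comp, data, comm

comp, data, comm = matmul_costs(1024, 16)
print(comp / comm)   # computation/communication ratio = N / sqrt(P) = 256.0
```

The ratio grows with N, so bigger problems amortize communication better, matching the slide's f(N/P^1/2) conclusion.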
8 Application Domains Parallel independent but similar tasks Irregular control structures Loosely shared data locked at different granularities Programmer defines and fine tunes parallelism Cannot exploit vectors Thread-level Parallelism (throughput is important) Workloads Transaction Processing, databases, web-servers Dominant MP market segment today
9 Database Application Example Bank Database Parameters: D = number of accounts P = number of processors in server N = number of ATMs (parallel transactions) Growth Functions Computation: f(N) Computation per processor: f(N/P) Communication? Lock records while changing them Communication: f(N) Computation/communication: f(1)
10 Computation Taxonomy Proposed by Flynn (1966) Dimensions: Instruction streams: single (SI) or multiple (MI) Data streams: single (SD) or multiple (MD) Cross-product: SISD: uniprocessors (chapters 3-4) SIMD: multimedia/vector processors (MMX, HP-MAX) MISD: no real examples MIMD: multiprocessors + multicomputers (ch. 6)
11 SIMD vs. MIMD MP vs. SIMD Programming model flexibility Could simulate vectors with MIMD but not the other way Dominant market segment cannot use vectors Cost effectiveness Commodity Parts: high volume (cheap) components MPs can be made up of many uniprocessors Allows easy scalability from small to large
12 Taxonomy of Parallel (MIMD) Processors Center on organization of main memory Shared vs. Distributed Appearance of memory to hardware Q1: Memory access latency uniform? Shared (UMA): yes, doesn't matter where data goes Distributed (NUMA): no, makes a big difference Appearance of memory to software Q2: Can processors communicate directly via memory? Shared (shared memory): yes, communicate via load/store Distributed (message passing): no, communicate via messages Dimensions are orthogonal e.g. DSM: (physically) distributed, (logically) shared memory
13 UMA vs. NUMA: Why it matters Ideal model: Perfect (single-cycle) memory latency Perfect (infinite) memory bandwidth Real systems: Latencies are long and grow with system size Bandwidth is limited Add memory banks + interconnect to hook them up (latency goes up)
14 UMA vs. NUMA: Why it matters UMA: uniform memory access From p0 same latency to m0-m3 Data placement doesn't matter Latency worse as system scales Interconnect contention restricts bandwidth Small multiprocessors only NUMA: non-uniform memory access From p0 faster to m0 (local) than m1-m3 Low latency to local memory helps performance Data placement important (software!) Less contention => more scalable Large multiprocessor systems
15 Interconnection Networks Need something to connect those processors/memories Direct: endpoints connect directly Indirect: endpoints connect via switches/routers Interconnect issues Latency: average latency more important than max Bandwidth: per processor Cost: #wires, #switches, #ports per switch Scalability: how latency, bandwidth, cost grow with processors Interconnect topology primarily concerns architects
16 Interconnect 1: Bus Direct interconnect style + Cost: f(1) wires + Latency: f(1) - Bandwidth: not scalable, f(1/P) Only works in small systems (<= 4) + Naturally provides ordered broadcast (which snooping protocols rely on)
17 Interconnect 2: Crossbar Switch Indirect interconnect Switch implemented as big MUXes + Latency: f(1) + Bandwidth: f(1) - Cost: f(2P) wires, f(P^2) switches, 4 wires per switch
18 Interconnect 3: Multistage Network Indirect interconnect Routing done by address decoding k: switch arity (#inputs/#outputs per switch) d: number of network stages = log_k P + Cost: f(d*P/k) switches, f(P*d) wires, f(k) wires per switch + Latency: f(d) + Bandwidth: f(1) Commonly used in large UMA systems
19 Interconnect 4: 2D Torus Direct interconnect + Cost: f(2P) wires, 4 wires per switch + Latency: f(P^1/2) + Bandwidth: f(1) Good scalability (widely used) Variants: 1D (ring), 3D, mesh (no wraparound)
20 Interconnect 5: Hypercube Direct interconnect k: arity (#nodes per dimension) d: dimension = log_k P + Latency: f(d) + Bandwidth: f(k*d) - Cost: f(k*d*P) wires, f(k*d) wires per switch Good scalability, but expensive switches; a new switch design is needed to add more nodes
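The scaling claims for these topologies can be compared side by side. A rough Python sketch (constant factors dropped; the point is how latency and cost grow with P, not absolute values):

```python
import math

# Asymptotic trends from the interconnect slides above (illustrative names).
def bus_bw_per_proc(P):   return 1 / P          # shared medium: f(1/P)
def crossbar_switches(P): return P * P          # switch cost: f(P^2)
def torus2d_latency(P):   return math.sqrt(P)   # hop count: f(P^1/2)
def hypercube_latency(P): return math.log2(P)   # hop count: f(log P)

for P in (16, 64, 256):
    print(P, torus2d_latency(P), hypercube_latency(P))
# 16 4.0 4.0
# 64 8.0 6.0
# 256 16.0 8.0
```

The hypercube's logarithmic diameter is why it scales better in latency, while the crossbar's quadratic switch count is what rules it out for large P.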
21 Interconnect Routing Store-and-Forward Routing Switch buffers entire message before passing it on Latency = [(message length / bandwidth) + fixed switch overhead] * #hops Wormhole Routing Pipeline message through interconnect Switch passes message on before it completely arrives Latency = (message length / bandwidth) + (fixed switch overhead * #hops) No buffering needed at switch Latency (relatively) independent of number of intermediate hops
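The two latency formulas above are easy to compare numerically. A sketch with hypothetical numbers (512-byte message, 1 byte/cycle links, 4-cycle switch overhead, 8 hops):

```python
def store_and_forward(msg_len, bandwidth, overhead, hops):
    # Each switch buffers the whole message before forwarding it.
    return (msg_len / bandwidth + overhead) * hops

def wormhole(msg_len, bandwidth, overhead, hops):
    # Message is pipelined; only the per-switch overhead pays once per hop.
    return msg_len / bandwidth + overhead * hops

print(store_and_forward(512, 1, 4, 8))  # 4128.0 cycles
print(wormhole(512, 1, 4, 8))           # 544.0 cycles
```

With wormhole routing the transmission time is paid once rather than per hop, which is why its latency is relatively independent of hop count.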
22 Shared Memory vs. Message Passing MIMD (appearance of memory to software) Message Passing (multicomputers) Each processor has its own address space Processors send and receive messages to and from each other Communication patterns explicit and precise Explicit messaging forces programmer to optimize this Used for scientific codes (explicit communication) Message passing systems: PVM, MPI + Simple hardware - Difficult programming model
23 Shared Memory vs. Message Passing Shared Memory (multiprocessors) One shared address space Processors use conventional load/stores to access shared data Communication can be complex/dynamic Simpler programming model (compatible with uniprocessors) Hardware controlled caching is useful to reduce latency + contention Has drawbacks Synchronization (discussed later) More complex hardware needed
24 Parallel Systems (80s and 90s)
Machine       | Communication | Interconnect | #CPUs | Remote latency (us)
SPARCcenter   | Shared memory | Bus          | <=20  | 1
SGI Challenge | Shared memory | Bus          | <=32  | 1
CRAY T3D      | Shared memory | 3D torus     |       |
Convex SPP    | Shared memory | X-bar/ring   |       |
KSR-1         | Shared memory | Bus/ring     |       |
TMC CM-5      | Messages      | Fat tree     |       |
Intel Paragon | Messages      | 2D mesh      |       |
IBM SP-2      | Messages      | Multistage   |       |
Trend towards shared memory systems
25 Multiprocessor Trends Shared Memory Easier, more dynamic programming model Can do more to optimize the hardware Small-to-medium size UMA systems (2-8 processors) Processor + memory + switch on single board (4x Pentium) Single-chip multiprocessors (POWER4) Commodity parts => soon glueless MP systems Larger NUMAs built from smaller UMAs Use commodity small UMAs with commodity interconnects (Ethernet, Myrinet) => NUMA clusters
26 Computation Taxonomy SISD SIMD MISD MIMD ILP Vectors, MM-ISAs Shared Memory Message Passing UMA vs. NUMA Cache Coherence Write Invalidate/Update? Snooping? Directory? Synchronization Memory Consistency
27 Major issues for Shared Memory Cache coherence Common sense: P1 Read[X] => P1 Write[X] => P1 Read[X] will return X P1 Write[X] => P2 Read[X] will return the value written by P1 P1 Write[X] => P2 Write[X] => serialized (all processors see the writes in the same order) Synchronization Atomic read/write operations Memory consistency model When will a written value be seen? P1 Write[X], (10ps later) P2 Read[X]: what happens? These are not issues for message passing systems. Why?
28 Cache Coherence Benefits of coherent caches in parallel systems? Migration and replication of shared data Migration: data moved locally, lowers latency + main memory bandwidth Replication: data being simultaneously read can be replicated to reduce latency + contention Problem: sharing of writeable data Example (correct value of A is in:): P0 reads A => memory, p0 cache P1 reads A => memory, p0 cache, p1 cache P0 writes A => p0 cache (and memory, if write-through) P1 reads A => P1 gets stale value on hit
29 Solutions to Cache Coherence No caches Not likely :) Make shared data non-cacheable Simple software solution low performance if lots of shared data Software flush at strategic times Relatively simple, but could be costly (frequent syncs) Hardware cache coherence Make memory and caches coherent/consistent with each other
30 HW Coherence Protocols Absolute coherence All copies of each block have same data at all times A little bit overkill Need appearance of absolute coherence Temporary incoherence is ok Similar to write-back cache As long as all loads get their correct value Coherence protocol: FSM that runs at every cache Invalidate protocols: invalidate copies in other caches Update protocols: update copies in other caches Snooping vs. Directory based protocols (HW implementation) Memory is always updated
31 Write Invalidate Much more common for most systems Mechanics Broadcast address of cache line to invalidate All processors snoop until they see it, then invalidate if in local cache Same policy can be used to service cache misses in write-back caches
Processor activity  | Bus activity         | CPU A's cache | CPU B's cache | Memory X
(initially)         |                      |               |               | 0
CPU A reads X       | Cache miss for X     | 0             |               | 0
CPU B reads X       | Cache miss for X     | 0             | 0             | 0
CPU A writes 1 to X | Invalidation for X   | 1             |               | 0
CPU B reads X       | Cache miss for X     | 1             | 1             | 1
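The write-invalidate trace on this slide can be reproduced with a toy two-CPU, single-block simulation (an illustrative Python sketch with invented names, not a real protocol implementation; write-back happens when the other CPU's dirty copy services a miss):

```python
def simulate(events):
    """Replay read/write events against one memory location and two caches."""
    mem, cache = 0, {"A": None, "B": None}   # None = not cached
    for cpu, op, *val in events:
        other = "B" if cpu == "A" else "A"
        if op == "read":
            if cache[cpu] is None:           # read miss: fetch freshest copy
                if cache[other] is not None:
                    mem = cache[other]       # write back remote copy first
                cache[cpu] = mem
        else:                                # write
            cache[other] = None              # invalidate the remote copy
            cache[cpu] = val[0]
    return mem, cache

mem, cache = simulate([("A", "read"), ("B", "read"),
                       ("A", "write", 1), ("B", "read")])
print(mem, cache)   # 1 {'A': 1, 'B': 1} -- matches the slide's final row
```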
32 Write Update (Broadcast)
Processor activity  | Bus activity          | CPU A's cache | CPU B's cache | Memory X
(initially)         |                       |               |               | 0
CPU A reads X       | Cache miss for X      | 0             |               | 0
CPU B reads X       | Cache miss for X      | 0             | 0             | 0
CPU A writes 1 to X | Write broadcast for X | 1             | 1             | 1
CPU B reads X       |                       | 1             | 1             | 1
Bandwidth requirements are excessive Can reduce by checking if word is shared (also can reduce write-invalidate traffic) Comparison with write invalidate: Multiple writes with no intervening reads require multiple broadcasts Multiple broadcasts needed for multi-word cache line writes (only 1 invalidate) Advantage: other processors can get the data faster
33 Bus-based protocols (Snooping) Snooping All caches see and react to all bus events Protocol relies on global visibility of events (ordered broadcast) Events: Processor (events from own processor) Read (R), Write (W), Writeback (WB) Bus Events (events from other processors) Bus Read (BR), Bus Write (BW)
34 Three-State Invalidate Protocol Three states: Invalid: don't have block Exclusive: have block and wrote it Shared: have block but only read it Transitions (CPU activity / resulting bus activity): Invalid -> Shared: CPU read; put read miss on bus Invalid -> Exclusive: CPU write; put write miss on bus Shared -> Shared: CPU read hit, or CPU read miss (put read miss on bus) Shared -> Exclusive: CPU write; WB, put write miss on bus Shared -> Invalid: write miss for block seen on bus Exclusive -> Exclusive: CPU read hit or write hit Exclusive -> Shared: read miss for block seen on bus; WB block Exclusive -> Invalid: write miss for block seen on bus; WB block
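The state machine can be written down directly as a transition table. A sketch using the slide's state names (bus write-back side effects omitted; the event labels R/W/BR/BW are mine):

```python
# Three-state (MSI-style) per-block FSM as a transition table.
# States: I (invalid), S (shared, read-only), E (exclusive, dirty).
# Events: R/W = own CPU read/write; BR/BW = bus read/write from another CPU.
NEXT = {
    ("I", "R"):  "S",   # read miss: fetch block shared
    ("I", "W"):  "E",   # write miss: fetch block exclusive
    ("S", "R"):  "S",   # read hit
    ("S", "W"):  "E",   # upgrade: put write miss on bus
    ("S", "BW"): "I",   # another CPU writes: invalidate our copy
    ("E", "R"):  "E",   # read hit
    ("E", "W"):  "E",   # write hit
    ("E", "BR"): "S",   # another CPU reads: write back, become shared
    ("E", "BW"): "I",   # another CPU writes: write back, invalidate
}

def run(state, events):
    for e in events:
        state = NEXT.get((state, e), state)  # unlisted events leave state alone
    return state

print(run("I", ["R", "W", "BR", "BW"]))  # I -> S -> E -> S -> I, prints I
```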
35 Three-State Invalidate Protocol Example: two processors issuing operations on block A1 Operations: A: P1 read A1: P2 read B: P1 write C: P2 read D: P1 write E: P2 write F-Z: P2 write (the slide's diagram traces each processor's block state through invalid, shared, and exclusive as these operations occur)
36 Three State Invalidate Problems Real implementations are much more complicated We assumed all operations are atomic A multiphase operation can be done with no intervening ops E.g. write miss detected, acquire bus, receive response Even read misses are non-atomic with split-transaction buses Deadlock is a possibility, see Appendix I Other simplifications: Write hits and misses treated the same Don't distinguish between true sharing and clean-in-one-cache Other caches could supply data on misses to shared blocks
37 Scalable Coherence Protocols: Directory Based Systems Bus-based systems are not scalable Not enough bus B/W for everyone's coherence traffic Not enough processor snooping B/W to handle everyone's traffic Directories: scalable cache coherence for large MPs Each memory entry (cache line) has a bit vector (1 bit per processor) Bit vector tracks which processors have cached copies of the line Send all messages to the memory directory If no other cached copies, memory returns data Otherwise memory forwards the request to the correct processor + Low B/W consumption (communicate only with relevant processors) + Works with general interconnect (bus not needed) - Longer latency (3-hop transaction: p0 => directory => p1 => p0)
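The directory's bit vector per block can be sketched in a few lines (illustrative Python; sets stand in for the hardware bit vector, and the class/method names are mine):

```python
# Minimal directory sketch: one sharer set per memory block tracking which
# processors hold a cached copy, so invalidates are point-to-point.
class Directory:
    def __init__(self, n_blocks, n_procs):
        self.n_procs = n_procs                 # bit vector would be this wide
        self.sharers = [set() for _ in range(n_blocks)]

    def read(self, block, proc):
        self.sharers[block].add(proc)          # record the new cached copy
        return sorted(self.sharers[block])

    def write(self, block, proc):
        to_invalidate = self.sharers[block] - {proc}
        self.sharers[block] = {proc}           # writer becomes sole owner
        return sorted(to_invalidate)           # invalidates the directory sends

d = Directory(n_blocks=4, n_procs=8)
d.read(0, 1); d.read(0, 5)
print(d.write(0, 3))   # [1, 5] -- only the relevant processors, no broadcast
```

This is the "+ Low B/W consumption" point above: invalidates go only to actual sharers, so no bus-wide broadcast or global snooping is needed.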
38 Multiprocessor Performance OLTP: TPC-C (Oracle) CPI ~ 7.0 DSS: TPC-D CPI ~ 1.61 AV: Altavista CPI ~1.3 Huge fraction of time in memory and I/O for TPC-C
39 Coherence Protocols: Performance Misses per 1000 instructions As block size is increased Coherence misses increase (false sharing) False sharing Sharing of different data in the same cache line Block is shared, but data word is not shared OLTP Workload
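False sharing can be made concrete with a toy model of two CPUs alternately writing two different words under an invalidate protocol (Python sketch; the 64-byte block size and all names are assumptions):

```python
BLOCK = 64   # assumed cache-block size in bytes

def coherence_misses(addr_a, addr_b, writes):
    """CPU A writes addr_a and CPU B writes addr_b, strictly alternating.
    Count the invalidation misses caused purely by block-level ownership."""
    same_block = addr_a // BLOCK == addr_b // BLOCK
    misses, owner = 0, None
    for i in range(writes):
        cpu = "A" if i % 2 == 0 else "B"
        if same_block and owner != cpu:
            misses += 1          # the write steals the block from the other CPU
        owner = cpu
    return misses

print(coherence_misses(0, 8, 10))    # words share a block: 10 misses
print(coherence_misses(0, 128, 10))  # words in different blocks: 0 misses
```

No data word is ever shared in either run; all the misses in the first case come from block granularity alone, which is why larger blocks increase coherence misses.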
40 Coherence Protocols: Performance Miss rate: 3C miss model => 4C miss model Capacity, compulsory, conflict, plus Coherence: additional misses due to the coherence protocol Complicates uniprocessor cache analysis As cache size is increased Capacity misses decrease Coherence misses may increase (more shared data is cached)
41 Coherence Protocols: Performance As processors are added Coherence misses increase (more communication)
42 Today's multiprocessor trends: Single-chip multiprocessors e.g. IBM POWER4 1 chip contains: two 1.3 GHz processor cores, L2, L3 tags Can connect 4 chips on 1 MCM to create an 8-way system Targets threaded server workloads, high-performance computing (die layout: Core0, Core1, L2 tags/cache, L3 tags, bus I/O, off-chip L3 cache)
43 Single Chip MP: The Future but how many cores? Some companies are betting big on it Sun's Niagara (8 cores per chip) Intel's rumored Tukwila or Tanglewood processor (8 or 16?) Basis in Piranha research project (ISCA 2000)
44 Multithreading Another trend: multithreaded processors Processor utilization: IPC / Processor Width Decreases as width increases (~50% on 4-wide) Why? Cache misses, branch mis-predicts, RAW dependencies Idea? Two (or more) processes (threads) share one pipeline Replicate process (thread) state PC, register file, bpred, page table pointer, etc (5% area overhead) One copy of stateless (naturally tagged) structures Caches, functional units, buses, etc Hardware thread context must be fast Multiple on-chip contexts => no need to load from memory
45 Multithreading Paradigms SuperScalar (Pentium 3, Alpha EV6/7) Coarse MT (IBM Pulsar) Fine MT Simultaneous MT (Intel P4-HT, EV8, POWER5)
46 Coarse vs. Fine Grained MT Coarse-grained Makes sense for in-order/shorter pipelines Switch threads on long stalls (L2 cache misses) Threads don't interfere with each other much Can't improve utilization on L1 misses/bpred mispredicts Fine-grained Out-of-order, deep pipelines Instructions from multiple threads in a stage at a time, miss or not Improves utilization in all scenarios Individual thread performance suffers due to interference
47 Pentium 4 Front-End Front end resources arbitrate between threads every cycle ITLB, RAS, BHB(Global History), Decode Queue are duplicated
48 Pentium 4 Backend Some queues and buffers are partitioned (only ½ entries per thread) Scheduler is oblivious to instruction thread Ids (limit on # per scheduler)
49 Multithreading Performance Simulated machine: 9-stage pipeline 128KB I/D Cache 6 IntALUs (4 Load/Store) 4 FPALUs 8 SMT threads 8-insn fetch (2 threads) Intel claiming more like a 20-30% increase on P4 Debate still exists on performance benefits
50 Synchronization Synchronization: important issue for shared memory Regulates access to shared data E.g. semaphore, monitor, critical section (s/w constructs) Synchronization primitive: lock Well-written parallel programs will synchronize on shared data
acquire(lock);     // while (lock != 0); lock = 1;
critical section;
release(lock);     // lock = 0;
0: ldw r1, lock    // wait for lock to be free
1: bnez r1, #0
2: stw #1, lock    // acquire lock
   // critical section
9: stw #0, lock    // release lock
51 Implementing Locks Called a spin lock; doesn't work quite right:
Processor 0                          Processor 1
0: ldw r1, lock
1: bnez r1, #0   // p0 sees lock free
                                     0: ldw r1, lock
                                     1: bnez r1, #0   // p1 sees lock free
2: stw #1, lock  // p0 acquires lock
                                     2: stw #1, lock  // p1 acquires lock
// p0 and p1 in critical section together
52 Implementing Locks Problem: acquire sequence (load-test-store) is not atomic Solution: ISA provides an atomic lock-acquire operation Load+check+store in one instruction (uninterruptible by definition) E.g. test&set instruction (also fetch-and-increment, exchange) t&s r1, lock does ldw r1, lock; stw #1, lock atomically:
0: t&s r1, lock
1: bnez r1, #0    // retry until the lock was free
2: ...            // critical section
3: stw #0, lock   // release
Lock-release is already atomic
53 More atomic instructions All atomic instructions can implement semaphores Fetch-and-Add (generalization of T&S) Compare-and-Swap (CAS) Load-linked and Store-conditional (LL/SC) Only CAS and LL/SC can implement lock-free and wait-free algorithms (hard) Semaphores can deadlock if a thread holding the lock swaps out Lock-free: thread does not require the lock, always makes progress Wait-free: can complete in a finite number of steps See proof: M.P. Herlihy. Wait-free synchronization. ACM Transactions on Programming Languages and Systems, 13(1):124-149, January 1991.
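To show why CAS enables lock-free algorithms, here is a CAS retry loop for a shared counter (Python sketch; as before, a small Lock models the atomicity the hardware CAS instruction provides, and the names are mine):

```python
import threading

_atomic = threading.Lock()   # models the atomicity of the CAS instruction

def cas(cell, expected, new):
    """Atomically: if *cell == expected, store new and report success."""
    with _atomic:
        if cell[0] == expected:
            cell[0] = new
            return True
        return False

def lock_free_increment(cell):
    while True:                       # retry loop: no lock is ever held
        old = cell[0]
        if cas(cell, old, old + 1):   # succeeds only if nobody intervened
            return

cell = [0]
threads = [threading.Thread(
    target=lambda: [lock_free_increment(cell) for _ in range(10_000)])
    for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(cell[0])   # 40000
```

Unlike the spin lock, a thread that is descheduled mid-update blocks nobody: some other thread's CAS simply succeeds, which is the "always makes progress" property.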
54 Load-Linked/Store-Conditional Example from Prof. Brüning, Universität Mannheim
55 HW Implementation Issues
56 Memory Ordering Consistency Memory updates may become reordered by the memory system
Processor 0           Processor 1
A = 0;                B = 0;
A = 1;                B = 1;
L1: if (B == 0)       L2: if (A == 0)
    critical section      critical section
Intuitively, it is impossible for both to be in the critical section BUT, if memory ops are reordered, it could happen Coherence: A's and B's values must be the same eventually Doesn't specify relative timing of coherence See: Sarita V. Adve, Kourosh Gharachorloo, Shared Memory Consistency Models: A Tutorial, IEEE Computer, 1995.
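The "intuitively impossible" claim can be verified exhaustively. Under sequential consistency every execution is some interleaving of each processor's operations in program order; a Python sketch enumerating all of them for this example (names and structure are mine):

```python
# Each processor's program, in program order: store own flag, then load other's.
P0 = [("P0", "store", "A"), ("P0", "load", "B")]   # A = 1; if (B == 0) ...
P1 = [("P1", "store", "B"), ("P1", "load", "A")]   # B = 1; if (A == 0) ...

def interleavings(a, b):
    """All merges of sequences a and b that preserve each one's order."""
    if not a: yield list(b); return
    if not b: yield list(a); return
    for rest in interleavings(a[1:], b): yield [a[0]] + rest
    for rest in interleavings(a, b[1:]): yield [b[0]] + rest

both_in_cs = 0
for order in interleavings(P0, P1):
    mem, loaded = {"A": 0, "B": 0}, {}
    for proc, op, var in order:
        if op == "store":
            mem[var] = 1
        else:
            loaded[proc] = mem[var]
    if loaded["P0"] == 0 and loaded["P1"] == 0:   # both would enter the CS
        both_in_cs += 1
print(both_in_cs)   # 0 -- no SC interleaving admits it
```

All six interleavings leave at least one load seeing 1, so both processors entering the critical section requires the store/load reordering that weaker models allow.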
57 Memory Ordering: Sequential Consistency System is sequentially consistent if the result of any execution is the same as if the operations of all processors were executed in some sequential order and the operations of each individual processor appear in this sequence in program order [Lamport] Sequential Consistency All loads and stores in order Delay completion of memory access until all invalidations caused by that access complete Delay next memory access until previous one completes Delay read of A/B until previous write of A/B completes Cannot place writes in a write buffer and continue with read! Simple for programmer Not much room for HW/SW performance optimizations
58 Sequential Consistency Performance Suppose a write miss takes: 40 cycles to establish ownership 10 cycles to issue each invalidate after ownership 50 cycles to complete/acknowledge invalidates Suppose 4 processors share the cache block Sequential consistency: must wait 40 + 4*10 + 50 = 130 cycles Ownership time is only 40 cycles Write buffers could continue before this
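The slide's arithmetic, spelled out (variable names are mine):

```python
# Under sequential consistency the write must wait for every invalidate to be
# issued and acknowledged; a relaxed model could continue once ownership is won.
ownership   = 40   # cycles to establish ownership
issue_inval = 10   # cycles to issue each invalidate after ownership
complete    = 50   # cycles to complete/acknowledge the invalidates
sharers     = 4    # processors holding the block

sc_write = ownership + sharers * issue_inval + complete
relaxed  = ownership
print(sc_write, relaxed)   # 130 40
```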
59 Weaker Consistency Models Assume programs are synchronized Observation: SC only needed for lock variables Other variables? Either in critical section (no parallel access) Or not shared Weaker consistency: can delay/reorder loads and stores More room for hardware optimizations Somewhat trickier programming model (memory barriers)
60 One other possibility Many processors have extensive support for speculation recovery Can this be re-used to support strict consistency? Use delayed commit feature for memory references If we receive an invalidate message, back-out the computation and restart Rarely triggered anyway (only on unsynchronized accesses that cause a race)
Lecture 30: Multiprocessors Flynn Categories, Large vs. Small Scale, Cache Coherency Professor Randy H. Katz Computer Science 252 Spring 1996 RHK.S96 1 Flynn Categories SISD (Single Instruction Single
More informationCOSC4201. Multiprocessors and Thread Level Parallelism. Prof. Mokhtar Aboelaze York University
COSC4201 Multiprocessors and Thread Level Parallelism Prof. Mokhtar Aboelaze York University COSC 4201 1 Introduction Why multiprocessor The turning away from the conventional organization came in the
More informationIssues in Parallel Processing. Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University
Issues in Parallel Processing Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University Introduction Goal: connecting multiple computers to get higher performance
More informationEE382 Processor Design. Processor Issues for MP
EE382 Processor Design Winter 1998 Chapter 8 Lectures Multiprocessors, Part I EE 382 Processor Design Winter 98/99 Michael Flynn 1 Processor Issues for MP Initialization Interrupts Virtual Memory TLB Coherency
More informationIntroduction to Parallel Computing
Portland State University ECE 588/688 Introduction to Parallel Computing Reference: Lawrence Livermore National Lab Tutorial https://computing.llnl.gov/tutorials/parallel_comp/ Copyright by Alaa Alameldeen
More informationParallel Architectures
Parallel Architectures Part 1: The rise of parallel machines Intel Core i7 4 CPU cores 2 hardware thread per core (8 cores ) Lab Cluster Intel Xeon 4/10/16/18 CPU cores 2 hardware thread per core (8/20/32/36
More information18-447: Computer Architecture Lecture 30B: Multiprocessors. Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 4/22/2013
18-447: Computer Architecture Lecture 30B: Multiprocessors Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 4/22/2013 Readings: Multiprocessing Required Amdahl, Validity of the single processor
More informationComputer Architecture Spring 2016
Computer Architecture Spring 2016 Lecture 19: Multiprocessing Shuai Wang Department of Computer Science and Technology Nanjing University [Slides adapted from CSE 502 Stony Brook University] Getting More
More informationChapter 6. Parallel Processors from Client to Cloud Part 2 COMPUTER ORGANIZATION AND DESIGN. Homogeneous & Heterogeneous Multicore Architectures
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 6 Parallel Processors from Client to Cloud Part 2 Homogeneous & Heterogeneous Multicore Architectures Intel XEON 22nm
More informationSerial. Parallel. CIT 668: System Architecture 2/14/2011. Topics. Serial and Parallel Computation. Parallel Computing
CIT 668: System Architecture Parallel Computing Topics 1. What is Parallel Computing? 2. Why use Parallel Computing? 3. Types of Parallelism 4. Amdahl s Law 5. Flynn s Taxonomy of Parallel Computers 6.
More informationCS 654 Computer Architecture Summary. Peter Kemper
CS 654 Computer Architecture Summary Peter Kemper Chapters in Hennessy & Patterson Ch 1: Fundamentals Ch 2: Instruction Level Parallelism Ch 3: Limits on ILP Ch 4: Multiprocessors & TLP Ap A: Pipelining
More informationMultiprocessor Cache Coherence. Chapter 5. Memory System is Coherent If... From ILP to TLP. Enforcing Cache Coherence. Multiprocessor Types
Chapter 5 Multiprocessor Cache Coherence Thread-Level Parallelism 1: read 2: read 3: write??? 1 4 From ILP to TLP Memory System is Coherent If... ILP became inefficient in terms of Power consumption Silicon
More information3/13/2008 Csci 211 Lecture %/year. Manufacturer/Year. Processors/chip Threads/Processor. Threads/chip 3/13/2008 Csci 211 Lecture 8 4
Outline CSCI Computer System Architecture Lec 8 Multiprocessor Introduction Xiuzhen Cheng Department of Computer Sciences The George Washington University MP Motivation SISD v. SIMD v. MIMD Centralized
More informationComputer Architecture Lecture 27: Multiprocessors. Prof. Onur Mutlu Carnegie Mellon University Spring 2015, 4/6/2015
18-447 Computer Architecture Lecture 27: Multiprocessors Prof. Onur Mutlu Carnegie Mellon University Spring 2015, 4/6/2015 Assignments Lab 7 out Due April 17 HW 6 Due Friday (April 10) Midterm II April
More informationComputer and Information Sciences College / Computer Science Department CS 207 D. Computer Architecture. Lecture 9: Multiprocessors
Computer and Information Sciences College / Computer Science Department CS 207 D Computer Architecture Lecture 9: Multiprocessors Challenges of Parallel Processing First challenge is % of program inherently
More informationComputing architectures Part 2 TMA4280 Introduction to Supercomputing
Computing architectures Part 2 TMA4280 Introduction to Supercomputing NTNU, IMF January 16. 2017 1 Supercomputing What is the motivation for Supercomputing? Solve complex problems fast and accurately:
More informationReview: Multiprocessor. CPE 631 Session 21: Multiprocessors (Part 2) Potential HW Coherency Solutions. Bus Snooping Topology
Review: Multiprocessor CPE 631 Session 21: Multiprocessors (Part 2) Department of Electrical and Computer Engineering University of Alabama in Huntsville Basic issues and terminology Communication: share
More informationProcessor Architecture and Interconnect
Processor Architecture and Interconnect What is Parallelism? Parallel processing is a term used to denote simultaneous computation in CPU for the purpose of measuring its computation speeds. Parallel Processing
More informationCSCI 4717 Computer Architecture
CSCI 4717/5717 Computer Architecture Topic: Symmetric Multiprocessors & Clusters Reading: Stallings, Sections 18.1 through 18.4 Classifications of Parallel Processing M. Flynn classified types of parallel
More informationCMSC Computer Architecture Lecture 12: Multi-Core. Prof. Yanjing Li University of Chicago
CMSC 22200 Computer Architecture Lecture 12: Multi-Core Prof. Yanjing Li University of Chicago Administrative Stuff! Lab 4 " Due: 11:49pm, Saturday " Two late days with penalty! Exam I " Grades out on
More informationParallel Architecture. Sathish Vadhiyar
Parallel Architecture Sathish Vadhiyar Motivations of Parallel Computing Faster execution times From days or months to hours or seconds E.g., climate modelling, bioinformatics Large amount of data dictate
More informationPage 1. Lecture 12: Multiprocessor 2: Snooping Protocol, Directory Protocol, Synchronization, Consistency. Bus Snooping Topology
CS252 Graduate Computer Architecture Lecture 12: Multiprocessor 2: Snooping Protocol, Directory Protocol, Synchronization, Consistency Review: Multiprocessor Basic issues and terminology Communication:
More informationCS 590: High Performance Computing. Parallel Computer Architectures. Lab 1 Starts Today. Already posted on Canvas (under Assignment) Let s look at it
Lab 1 Starts Today Already posted on Canvas (under Assignment) Let s look at it CS 590: High Performance Computing Parallel Computer Architectures Fengguang Song Department of Computer Science IUPUI 1
More information740: Computer Architecture Memory Consistency. Prof. Onur Mutlu Carnegie Mellon University
740: Computer Architecture Memory Consistency Prof. Onur Mutlu Carnegie Mellon University Readings: Memory Consistency Required Lamport, How to Make a Multiprocessor Computer That Correctly Executes Multiprocess
More informationCSE502: Computer Architecture CSE 502: Computer Architecture
CSE 502: Computer Architecture Multi-{Socket,,Thread} Getting More Performance Keep pushing IPC and/or frequenecy Design complexity (time to market) Cooling (cost) Power delivery (cost) Possible, but too
More informationNon-uniform memory access machine or (NUMA) is a system where the memory access time to any region of memory is not the same for all processors.
CS 320 Ch. 17 Parallel Processing Multiple Processor Organization The author makes the statement: "Processors execute programs by executing machine instructions in a sequence one at a time." He also says
More informationMotivation for Parallelism. Motivation for Parallelism. ILP Example: Loop Unrolling. Types of Parallelism
Motivation for Parallelism Motivation for Parallelism The speed of an application is determined by more than just processor speed. speed Disk speed Network speed... Multiprocessors typically improve the
More informationSMD149 - Operating Systems - Multiprocessing
SMD149 - Operating Systems - Multiprocessing Roland Parviainen December 1, 2005 1 / 55 Overview Introduction Multiprocessor systems Multiprocessor, operating system and memory organizations 2 / 55 Introduction
More informationOverview. SMD149 - Operating Systems - Multiprocessing. Multiprocessing architecture. Introduction SISD. Flynn s taxonomy
Overview SMD149 - Operating Systems - Multiprocessing Roland Parviainen Multiprocessor systems Multiprocessor, operating system and memory organizations December 1, 2005 1/55 2/55 Multiprocessor system
More informationTDT Coarse-Grained Multithreading. Review on ILP. Multi-threaded execution. Contents. Fine-Grained Multithreading
Review on ILP TDT 4260 Chap 5 TLP & Hierarchy What is ILP? Let the compiler find the ILP Advantages? Disadvantages? Let the HW find the ILP Advantages? Disadvantages? Contents Multi-threading Chap 3.5
More informationOnline Course Evaluation. What we will do in the last week?
Online Course Evaluation Please fill in the online form The link will expire on April 30 (next Monday) So far 10 students have filled in the online form Thank you if you completed it. 1 What we will do
More informationConvergence of Parallel Architecture
Parallel Computing Convergence of Parallel Architecture Hwansoo Han History Parallel architectures tied closely to programming models Divergent architectures, with no predictable pattern of growth Uncertainty
More informationComputer Architecture: Parallel Processing Basics. Prof. Onur Mutlu Carnegie Mellon University
Computer Architecture: Parallel Processing Basics Prof. Onur Mutlu Carnegie Mellon University Readings Required Hill, Jouppi, Sohi, Multiprocessors and Multicomputers, pp. 551-560 in Readings in Computer
More informationWHY PARALLEL PROCESSING? (CE-401)
PARALLEL PROCESSING (CE-401) COURSE INFORMATION 2 + 1 credits (60 marks theory, 40 marks lab) Labs introduced for second time in PP history of SSUET Theory marks breakup: Midterm Exam: 15 marks Assignment:
More informationEN164: Design of Computing Systems Lecture 34: Misc Multi-cores and Multi-processors
EN164: Design of Computing Systems Lecture 34: Misc Multi-cores and Multi-processors Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering
More informationCMSC 611: Advanced. Parallel Systems
CMSC 611: Advanced Computer Architecture Parallel Systems Parallel Computers Definition: A parallel computer is a collection of processing elements that cooperate and communicate to solve large problems
More informationParallel Processing. Computer Architecture. Computer Architecture. Outline. Multiple Processor Organization
Computer Architecture Computer Architecture Prof. Dr. Nizamettin AYDIN naydin@yildiz.edu.tr nizamettinaydin@gmail.com Parallel Processing http://www.yildiz.edu.tr/~naydin 1 2 Outline Multiple Processor
More informationToday s Outline: Shared Memory Review. Shared Memory & Concurrency. Concurrency v. Parallelism. Thread-Level Parallelism. CS758: Multicore Programming
CS758: Multicore Programming Today s Outline: Shared Memory Review Shared Memory & Concurrency Introduction to Shared Memory Thread-Level Parallelism Shared Memory Prof. David A. Wood University of Wisconsin-Madison
More informationMemory Consistency and Multiprocessor Performance. Adapted from UCB CS252 S01, Copyright 2001 USB
Memory Consistency and Multiprocessor Performance Adapted from UCB CS252 S01, Copyright 2001 USB 1 Memory Consistency Model Define memory correctness for parallel execution Execution appears to the that
More informationLecture 10: Cache Coherence: Part I. Parallel Computer Architecture and Programming CMU , Spring 2013
Lecture 10: Cache Coherence: Part I Parallel Computer Architecture and Programming Cache design review Let s say your code executes int x = 1; (Assume for simplicity x corresponds to the address 0x12345604
More informationShared Symmetric Memory Systems
Shared Symmetric Memory Systems Computer Architecture J. Daniel García Sánchez (coordinator) David Expósito Singh Francisco Javier García Blas ARCOS Group Computer Science and Engineering Department University
More informationOverview. Processor organizations Types of parallel machines. Real machines
Course Outline Introduction in algorithms and applications Parallel machines and architectures Overview of parallel machines, trends in top-500, clusters, DAS Programming methods, languages, and environments
More informationEECS 470. Lecture 18. Simultaneous Multithreading. Fall 2018 Jon Beaumont
Lecture 18 Simultaneous Multithreading Fall 2018 Jon Beaumont http://www.eecs.umich.edu/courses/eecs470 Slides developed in part by Profs. Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi,
More informationMemory Consistency and Multiprocessor Performance
Memory Consistency Model Memory Consistency and Multiprocessor Performance Define memory correctness for parallel execution Execution appears to the that of some correct execution of some theoretical parallel
More informationParallel Computing Platforms
Parallel Computing Platforms Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong (jinkyu@skku.edu)
More informationModule 5: Performance Issues in Shared Memory and Introduction to Coherence Lecture 10: Introduction to Coherence. The Lecture Contains:
The Lecture Contains: Four Organizations Hierarchical Design Cache Coherence Example What Went Wrong? Definitions Ordering Memory op Bus-based SMP s file:///d /...audhary,%20dr.%20sanjeev%20k%20aggrwal%20&%20dr.%20rajat%20moona/multi-core_architecture/lecture10/10_1.htm[6/14/2012
More informationCMSC 411 Computer Systems Architecture Lecture 13 Instruction Level Parallelism 6 (Limits to ILP & Threading)
CMSC 411 Computer Systems Architecture Lecture 13 Instruction Level Parallelism 6 (Limits to ILP & Threading) Limits to ILP Conflicting studies of amount of ILP Benchmarks» vectorized Fortran FP vs. integer
More informationCache Coherence. CMU : Parallel Computer Architecture and Programming (Spring 2012)
Cache Coherence CMU 15-418: Parallel Computer Architecture and Programming (Spring 2012) Shared memory multi-processor Processors read and write to shared variables - More precisely: processors issues
More informationParallel Computing Platforms. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University
Parallel Computing Platforms Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Elements of a Parallel Computer Hardware Multiple processors Multiple
More informationModule 5 Introduction to Parallel Processing Systems
Module 5 Introduction to Parallel Processing Systems 1. What is the difference between pipelining and parallelism? In general, parallelism is simply multiple operations being done at the same time.this
More informationEE382 Processor Design. Illinois
EE382 Processor Design Winter 1998 Chapter 8 Lectures Multiprocessors Part II EE 382 Processor Design Winter 98/99 Michael Flynn 1 Illinois EE 382 Processor Design Winter 98/99 Michael Flynn 2 1 Write-invalidate
More informationParallel Architectures
Parallel Architectures CPS343 Parallel and High Performance Computing Spring 2018 CPS343 (Parallel and HPC) Parallel Architectures Spring 2018 1 / 36 Outline 1 Parallel Computer Classification Flynn s
More informationMultiprocessors II: CC-NUMA DSM. CC-NUMA for Large Systems
Multiprocessors II: CC-NUMA DSM DSM cache coherence the hardware stuff Today s topics: what happens when we lose snooping new issues: global vs. local cache line state enter the directory issues of increasing
More informationParallel Processors. The dream of computer architects since 1950s: replicate processors to add performance vs. design a faster processor
Multiprocessing Parallel Computers Definition: A parallel computer is a collection of processing elements that cooperate and communicate to solve large problems fast. Almasi and Gottlieb, Highly Parallel
More informationIntroducing Multi-core Computing / Hyperthreading
Introducing Multi-core Computing / Hyperthreading Clock Frequency with Time 3/9/2017 2 Why multi-core/hyperthreading? Difficult to make single-core clock frequencies even higher Deeply pipelined circuits:
More information