ManArray Processor Interconnection Network: An Introduction

Gerald G. Pechanek (1), Stamatis Vassiliadis (2), Nikos Pitsianis (1,3)

(1) Billions of Operations Per Second (BOPS), Inc., Chapel Hill, NC, USA. gpechanek@bops.com
(2) Delft University of Technology, Department of Electrical Engineering, Delft, The Netherlands. stamatis@einstein.et.tudelft.nl
(3) Duke University, Department of Computer Science, Chapel Hill, NC, USA. nikos@bops.com

Abstract. This paper introduces the new interconnection network of the BOPS ManArray family of core products. To form a ManArray network, the processing elements are completely connected within clusters and communicate with members of only two other clusters, thereby reducing signal fan-out and wiring density. With this simple network, single-step communication between a hypercube node and its complement, single-step transpose operations, and a network diameter of 2 are achieved.

1 Introduction

As chip densities continue to improve, demand increases for the low-cost integration of high-performance parallel processing systems. Audio, video, and communication signal processing, for example, are three areas requiring very high performance that are in demand for low-cost consumer products. High-performance multiprocessor array systems, such as the mesh, torus, and hypercube [1, 2, 3, 4], have characteristics that, we feel, limit their use in System-On-a-Chip products. Consequently, the ManArray family of processor cores was developed for high-volume commercial applications, with improved network connectivity, a simple implementation scheme, and a simple programming model. This paper introduces the ManArray processor platform as a strong contender to become a ubiquitous, high-volume signal processor in commercial applications.

2 Background

In this section we briefly describe some characteristics of conventional networks which we felt were too limiting for the intended products. A crossbar switch interconnection network is generally known to be expensive to implement, since for N processors it has a cost of O(N^2). Even on single-chip systems with relatively small N, its wiring, fan-out, and logic delay limit its acceptability as a pervasive, scalable approach to single-chip multiprocessing. The torus has limited connectivity between processing elements (PEs), which can cause high communication latency. Although new approaches to arrays have been proposed in [5, 6, 7, 8], the irregularity of their PE combinations creates problems in implementation and in the generality of connections.

The hypercube [9, 10] reduces the communication latency from O(n) on an n x n torus to O(log n), the distance between two binary complement nodes. But even O(log n) can represent a high latency on large networks, and reducing the longest path between complement PEs has been deemed difficult and costly. In the next section we present the ManArray network, which alleviates these concerns by improving the connectivity among the PEs with low implementation expense and low interconnection wiring requirements.

3 ManArray Network

The ManArray network achieves the goals of providing higher connectivity than a mesh, torus, or hypercube network, a simple switch implementation for multiple array sizes, and a simple programming model. First we explain how we create the ManArray organization of PEs. Consider, by way of example, a 2D 4 x 4 torus and the corresponding embedded 4D hypercube, written as a 4 x 4 table with the hypercube node labels (see Fig. 1A). In Fig. 1A, the PE i,j nodes are labeled in Gray code as PE G(i),G(j), where G(x) is the Gray code of x. Along the rows and the columns of this table, the distance between adjacent elements is one. If columns 2, 3, and 4 are rotated one position up, the distance between corresponding elements of the first and second columns becomes two. Repeating the same rotation with columns 3 and 4, and then with column 4, the distance between the elements of a column and the corresponding elements of the adjacent columns is two. The resulting 4D ManArray table is shown in Fig. 1B. It is important to note that each row of the table contains a grouping of 4 nodes, including two pairs of diametrically opposite hypercube nodes. In higher-dimensional tori, and thus hypercubes, the grouping of diametrically opposite nodes is achieved by the same rotation along each new dimension except the last one.

Figure 1A: the 4 x 4 torus written as a table with Gray-coded hypercube node labels (PE-0,0 through PE-3,3). Figure 1B: the same nodes after the column rotations; each row now groups two pairs of diametrically opposite hypercube nodes.

Using these groupings, it can be shown that the complexity of the ManArray network is small although its connectivity is high. To demonstrate this, we show how this organization of PEs is interconnected with a simple cluster-switch network. A 4x4 ManArray with torus and hypercube node IDs, shown in Fig. 2, consists of four 2x2 clusters. The cluster switch for the upper left-hand 2x2 cluster is shown partitioned into four groups, each consisting of a 4-input and a 3-input multiplexer. Each of these groups is associated with a particular PE, as indicated with the dotted-line arrows. For example, PE 0,0 is associated with the A-group multiplexers a1 and a2. The circled multiplexers are controlled by their associated PE.
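The construction just described can be checked mechanically. The following Python sketch is illustrative rather than taken from the paper (gray(), complement(), and the pair count are helper names chosen here): it builds the Fig. 1A table from 2-bit Gray codes, applies the column rotations to obtain the Fig. 1B grouping, and confirms that each resulting row contains two pairs of hypercube-complement nodes, i.e. the node groupings of the four 2x2 clusters.

# Illustrative sketch: Fig. 1A -> Fig. 1B and the complement-pair property.

def gray(x, bits=2):
    # Binary-reflected Gray code of x, as a bit string.
    return format(x ^ (x >> 1), f'0{bits}b')

N = 4  # 4 x 4 torus with an embedded 4D hypercube

# Fig. 1A: PE i,j carries the hypercube label G(i)G(j).
torus = [[gray(i) + gray(j) for j in range(N)] for i in range(N)]

# Fig. 1B: rotate column j up by j positions, so row i holds PE (i+j) mod N, j.
manarray = [[torus[(i + j) % N][j] for j in range(N)] for i in range(N)]

def complement(label):
    return ''.join('1' if b == '0' else '0' for b in label)

for i, cluster in enumerate(manarray):
    pairs = sum(1 for a in cluster if complement(a) in cluster) // 2
    print(f"cluster {i}: {cluster}  complement pairs: {pairs}")  # two pairs each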

The 4x4 ManArray includes connection paths that connect hypercube complements, as shown in Fig. 2. For example, PE 0111 (PE 1,2) can communicate with PE 1000 (PE 3,0) as well as with the other members of its cluster. The longest path between hypercube-complement PEs, 4 steps for a 4D hypercube, is reduced to 1 step in the ManArray network. The improved connectivity and simplicity of the ManArray network support single-cycle communications and efficient algorithms.

Figure 2: 4x4 ManArray highlighting cluster-switch control.

4 ManArray Network Properties

In this section, we discuss some of the properties of the ManArray network within the problem domain of single-chip parallel processors. For the purposes of this paper we constrain our discussion to network sizes that can be implemented on a single chip. As technology advances, though, the number of PEs in a ManArray processor scales with it, allowing larger array sizes to be developed for future products. The network diameter is the largest distance between any pair of nodes and captures the worst-case number of steps required for node-to-node communication. The smaller the diameter, the fewer steps needed to communicate between far-away nodes, so small network diameters are desirable. As the table below shows, the network diameter of a d-dimensional hypercube is d, and with the addition of the complementary-node connections it becomes ⌈d/2⌉. Note that only the edges connecting complementary nodes are accounted for in the middle column. For this introductory paper, we add the third column, labeled "ManArray Network", which indicates the number of edges contained in the structure as well as the constant network diameter of 2 for current ManArray single-chip implementations.
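As a quick check of the ⌈d/2⌉ entry discussed above, the short Python sketch below (illustrative only, not from the paper) computes the diameter of a d-dimensional hypercube augmented with complement edges: a node pair at Hamming distance h is reached either directly in h hops or through the single complement edge in 1 + (d - h) hops, and the worst case over all h is ⌈d/2⌉.

import math

# Illustrative check: diameter of a hypercube with added complement edges.
def diameter_with_complement_edges(d):
    # For each Hamming distance h, take the cheaper of the direct route (h hops)
    # and the route through the complement edge (1 + d - h hops).
    return max(min(h, 1 + d - h) for h in range(d + 1))

for d in range(2, 9):
    assert diameter_with_complement_edges(d) == math.ceil(d / 2)
    print(f"d = {d}: diameter {d} without, {diameter_with_complement_edges(d)} with complement edges")

The ManArray clustering goes one step further, placing each node one hop from its complement and holding the network diameter at a constant 2.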

In this paper, we show by way of example the connectivity of a 4x4 ManArray (Figure 2). Each 4-PE cluster contains 12 uni-directional edges, or 6 bi-directional edges if the self-connecting edges are excluded. With four clusters, this amounts to 24 bi-directional edges. In a ManArray, any PE in a cluster can communicate with any PE in an adjacent cluster; consequently, there are 16 bi-directional edges between any pair of adjacent clusters. The ManArray needs only 8 uni-directional connections between each pair of adjacent clusters, since that is the maximum number of paths that can be connected between 8 PEs at any one time. By sharing these 8 uni-directional links appropriately with the multiplexers used in the cluster switches, the 16 bi-directional path combinations can be created. The total number of edges in a 4x4 ManArray is 24 + 4*16 = 88, corresponding to d=4 and k=2 in the table.

           Hypercube*      Hypercube + complement edges*    ManArray Network
Nodes      2^d             2^d                              2^d
Edges      d*2^(d-1)       (d+1)*2^(d-1)                    2^(2k-1)*((4*3^(k-1))-1), for d=2k
                                                            2^(2k)*((8*3^(k-1))-1), for d=2k+1
Diameter   d               ⌈d/2⌉                            2

* A hypercube and a hypercube with complementary edges are proper subgraphs of the ManArray. The upper bound on k and d depends upon the chosen process technology and the processor cycle-time requirements.

With the full number of ManArray edges provided, as shown in the third column above, the network diameter is reduced to a constant diameter of 2 for all d, within the design constraints of the process technology.
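The edge-count formulas in the third column can be evaluated directly. The Python sketch below is illustrative (the function names are chosen here, not taken from the paper); it tabulates the edge counts of the three columns and checks the d = 4 case against the 24 + 4*16 = 88 tally above.

# Illustrative evaluation of the edge-count formulas in the table.

def hypercube_edges(d):
    return d * 2 ** (d - 1)

def hypercube_plus_complement_edges(d):
    return (d + 1) * 2 ** (d - 1)

def manarray_edges(d):
    k = d // 2
    if d % 2 == 0:                                    # d = 2k
        return 2 ** (2 * k - 1) * (4 * 3 ** (k - 1) - 1)
    return 2 ** (2 * k) * (8 * 3 ** (k - 1) - 1)      # d = 2k + 1

assert manarray_edges(4) == 24 + 4 * 16 == 88         # the 4x4 ManArray example
for d in (2, 3, 4, 5, 6):
    print(d, hypercube_edges(d), hypercube_plus_complement_edges(d), manarray_edges(d))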

5 ManArray Processing

Generally speaking, ManArray combines PEs in clusters that also contain a Sequence Processor (SP), uniquely merged into the PE array, and a cluster switch interconnecting the PEs. The SP provides program control, contains the instruction and data address generation units, and dispatches instructions to the processor array. Each PE contains five execution units (a multiply-accumulate unit, an arithmetic logic unit, a data select unit (DSU), a load unit, and a store unit, supporting various 8/16/32/64-bit packed-data types), a 32x32-bit reconfigurable register file, a VLIW instruction memory unit, and local data memory. The DSU supports shifts, rotates, and single-cycle PE-to-PE communications across the ManArray network. With the indirect VLIW (iVLIW) architecture, the communication operations can be overlapped with the compute operations, thereby providing zero-latency data transfers between PEs. The load and store units provide independent data paths to the local memory in each PE of the array, allowing very high memory bandwidth for compute-intensive algorithms.

6 Conclusions

Using the ManArray network, BOPS has implemented an advanced, scalable family of DSP cores for emerging applications such as broadband communications, digital video, digital audio, imaging, and graphics. The BOPS ManArray (hardware, software, and programming environment) is the culmination of a thorough examination of DSP requirements, dozens of innovative, ground-breaking patents, and hundreds of man-years of development effort. The ManArray elegantly provides three basic levels of parallelism (indirect VLIW, packed data, and multi-processing), all independent of each other and available to the compiler or programmer on an as-needed basis. These features are combined in a way that allows a 2x2 ManArray processor to produce a radix-4 distributed 256-point FFT in 425 cycles, using 32-bit complex numbers (16 bits each for the real and imaginary parts), and an 8x8 IDCT in 34 cycles that meets IEEE standards. Finally, because these emerging markets are primarily System-On-Chip markets, BOPS is providing the ManArray as licensable IP in the form of cores, software, and programming tools.

REFERENCES

1. R. Cypher and J.L.C. Sanz, "SIMD Architectures and Algorithms for Image Processing and Computer Vision," IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. 37, No. 12, Dec.
2. K.E. Batcher, "Design of a Massively Parallel Processor," IEEE Transactions on Computers, Vol. C-29, No. 9, Sept.
3. P.N. Swarztrauber, "Multiprocessor FFTs," Parallel Computing 5, Elsevier Science Publishers B.V. (North-Holland).
4. Ian Foster, "Designing and Building Parallel Programs," Addison-Wesley Publishing Company, Inc., 1995.
5. G.G. Pechanek, M. Stojancic, S. Vassiliadis, and C.J. Glossner, "M.F.A.S.T.: A Single Chip Highly Parallel Image Processing Architecture," Proceedings of the IEEE 1995 International Conference on Image Processing, Oct. 1995, Washington, D.C.
6. G.G. Pechanek et al., "A Massively Parallel Diagonal Fold Array Processor," 1993 International Conference on Application Specific Array Processors, Oct. 1993, Venice, Italy.
7. G.G. Pechanek, S. Vassiliadis, and J.G. Delgado-Frias, "Digital Neural Emulators Using Tree Accumulation and Communication Structures," IEEE Transactions on Neural Networks, Vol. 3, No. 6, Nov.
8. G.G. Pechanek et al., "Multiple Fold Clustered Processor Torus Array," Proceedings of the Fifth NASA Symposium on VLSI Design, Nov. 4-5, 1993, University of New Mexico, Albuquerque, New Mexico.
9. Robert Cypher and Jorge L.C. Sanz, The SIMD Model of Parallel Computation, Springer-Verlag, New York, 1994.
10. F. Thomas Leighton, Introduction to Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes, Morgan Kaufmann Publishers, Inc., San Mateo, CA, 1992.
