A study on SIMD architecture


Gürkan Solmaz, Rouhollah Rahmatizadeh and Mohammad Ahmadian
Department of Electrical Engineering and Computer Science
University of Central Florida

Abstract

Single instruction, multiple data (SIMD) architectures became popular with the growing demand for data-streaming applications such as real-time games and video processing. Since modern desktop processors support SIMD instructions in various implementations, these machines can be used to optimize applications in which single instructions process multiple data. In this project, we study the use of SIMD architectures and their effects on the performance of specific applications. We choose the matrix multiplication and Advanced Encryption Standard (AES) encryption algorithms and modify them to exploit SIMD instructions. The performance improvements obtained with SIMD instructions are analyzed and validated by an experimental study.

I. INTRODUCTION

Performance of a computer system is defined by the amount of useful work the system accomplishes relative to the time and resources used. There are several avenues for improving the performance of computer systems, and researchers in several areas, from algorithm, compiler, and OS designers to hardware designers, strive for it. Nowadays, most CPU designs contain at least some vector-processing instructions, typically referred to as SIMD, which operate on a few vector elements per clock cycle in a pipeline. These vector units run the same mathematical operation on multiple data elements simultaneously and therefore change the performance equation. From the Iron Law, the execution time of a program is Time = IC × CPI × CT, where IC is the number of instructions (instruction count), CPI is the average number of cycles per instruction, and CT is the cycle time of the processor; performance is the inverse of this time. The use of SIMD instructions changes the IC and CPI values of a program.

With recent enhancements in processor architectures, modern processors have started to support 256-bit vector implementations, and interest in SIMD architectures within the computer architecture research community is increasing. In the near future, new powerful machines with enhanced SIMD architectures are expected to make new high-performance multiple-data applications available.

In this study, we target the SIMD architecture and its effects on the performance of selected cases, and we analyze our approach by evaluating the speed-up of programs that use the SIMD architecture. One challenge we face is configuring compilers to emit vector instructions in the binary executable file; another is designing case studies for measuring the speed-up. For these reasons, matrix multiplication and the AES encryption algorithm are chosen as the best candidates, since both perform many linear binary operations on multiple data. For the implementation part of this research, we use a single machine with minimum load as the test platform for measuring performance. As a group of graduate students taking the CDA 5106 course, we got involved in this challenge by doing research on SIMD CPU architecture. In this project, we worked on almost all phases together as a team, holding regular meetings before and after the milestones.
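To see how SIMD enters this equation, the following worked example uses purely illustrative numbers (our assumption, not data from this study): packing four scalar operations into one SIMD instruction shrinks IC even if CPI rises slightly.

    % Illustrative numbers only, not measurements from this study.
    % Assume IC = 10^9 and CPI = 1.0, with half of the instructions
    % vectorizable four at a time; assume CPI rises to CPI' = 1.2.
    \[
      IC' = \frac{1}{2} IC + \frac{1}{2}\cdot\frac{IC}{4} = 0.625\, IC
    \]
    \[
      \text{Speedup} = \frac{IC \cdot CPI \cdot CT}{IC' \cdot CPI' \cdot CT}
                     = \frac{1.0}{0.625 \times 1.2} \approx 1.33
    \]

The cycle time CT cancels because both versions run on the same processor.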
To sum up, Gürkan worked on the implementation and documentation, prepared the final report, and researched the history and related studies. Rouhollah worked on the implementation of the optimized algorithms and on the documentation phases; he also conducted the experiments. Mohammad proposed the main idea and worked on the proposal, the documentation for the benchmarks, and the analysis of the results.

The rest of the paper is organized as follows. Section II briefly summarizes the history of SIMD architectures and the related work. We describe SIMD and some architectures in Section III. We provide a detailed description of our benchmarks in Section IV. The results of the experiments are presented in Section V. We finally conclude in Section VI.

II. RELATED WORK

Let us briefly discuss the history of SIMD architectures and the related work in the literature. The first use of SIMD instructions was in the early 1970s [1]; for example, the CDC STAR-100 and TI ASC machines could perform the same operation on a batch of data. This style of architecture became especially common when Cray Inc. used it in its supercomputers, although the vector processing used in those machines is nowadays considered distinct from SIMD machines. Thinking Machines' CM-1 and CM-2, which are considered massively parallel processing-style supercomputers [2], started a new era in using SIMD for processing data in parallel. Current research, however, focuses on using SIMD instructions in desktop computers. Many of the tasks desktop computers perform these days, such as video processing and real-time gaming, apply the same operation to a batch of data, so companies tried to bring this architecture to desktops. As one of the earliest attempts, Sun Microsystems introduced SIMD integer instructions in its VIS (Visual Instruction Set) extensions for the UltraSPARC I microprocessor in 1995, and MIPS introduced MDMX (MIPS Digital Media eXtension). Intel made SIMD widely used by introducing the MMX extensions to the x86 architecture in 1996. Motorola then introduced the AltiVec system in its PowerPC processors, which was also used in IBM's POWER systems; this prompted Intel's response, the introduction of SSE. These days, SSE and its extensions are used more than the others.

Fig. 1. Processor array in a supercomputer. Taken from [5].
Fig. 2. The relationship between the processor array and the host computer. Taken from [5].
Fig. 3. Data processing with SISD vs. SIMD. Taken from [6].

There are various studies in the literature, conducted by research groups and companies, that focus on the hardware. Holzer-Graf et al. [3] studied efficient vector implementations of AES-based designs; three different vector implementations are analyzed and the performance of each is compared. The use of chip multiprocessing and the Cell Broadband Engine is described by Gschwind [4].

III. SIMD ARCHITECTURE

To understand the improvements being made in SIMD architectures, let us first look at the older ones. Early SIMD architectures were proposed for supercomputers [1] that consist of a number of processing units (elements) and a control unit. In these machines, the processing units (PUs) perform the computation, while the control unit manages the array of processing elements (PEs). The single control unit is generally responsible for reading instructions, decoding them, and sending control signals to the PEs. Data are supplied to the PEs by a memory, and the number of data paths from the memory to the PEs equals the number of PEs. These supercomputers also have specialized interconnection networks, which move data to and from the PEs flexibly and with high performance, as well as an I/O system that differs from one machine to another. Figure 1 illustrates the processor-array architecture of such supercomputers, including the memory modules, interconnection network, PEs, control unit, and I/O system. In some supercomputers, the processing elements are controlled by a host computer, as illustrated in Figure 2. Supercomputers categorized as multiple instruction, multiple data (MIMD) machines in Flynn's taxonomy then became popular, which reduced the interest in SIMD machines for some period of time. Enhancements in desktop computers later led to machines powerful enough to handle applications such as video processing, so SIMD architectures became popular again in the 1990s.

SIMD architectures exploit a property of the data stream called data parallelism. SIMD computing is also known as vector processing, since the rows of data coming to the processor are treated as vectors of data. Applications are almost never purely parallel, so the pure use of SIMD computing is not possible; instead, programs are written for single instruction, single data (SISD) machines and include SIMD instructions where applicable. The proportion of the sequential part and the SIMD part of the program determines the maximum speed-up according to Amdahl's law: Speedup_max = 1 / ((1 - p) + p/s), where p is the fraction of execution that can use SIMD and s is the speed-up of that fraction. Figure 3 shows the data parallelism exploited by a SIMD machine that processes three vectors at the same time, compared to an SISD machine that processes one row of data. The length of the vectors in a SIMD processor determines the number of elements of a given data type they can hold; for instance, a 128-bit vector implementation allows four-way single-precision floating-point operations, as the short sketch below illustrates.
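The following minimal C++ sketch (our illustration, not code from the paper) shows such a four-way single-precision addition with the SSE intrinsics that Visual Studio and GCC expose through <xmmintrin.h>.

    #include <xmmintrin.h>  // SSE intrinsics (128-bit, four packed floats)
    #include <cstdio>

    int main() {
        // Load four packed single-precision floats into 128-bit registers.
        __m128 va = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);
        __m128 vb = _mm_set_ps(40.0f, 30.0f, 20.0f, 10.0f);

        // One instruction adds all four lanes at once (SIMD),
        // replacing four scalar additions (SISD).
        __m128 vt = _mm_add_ps(va, vb);

        float out[4];
        _mm_storeu_ps(out, vt);  // store the result lanes back to memory
        std::printf("%g %g %g %g\n", out[0], out[1], out[2], out[3]);
        return 0;
    }

The single _mm_add_ps call is the vector counterpart of a four-iteration scalar loop, which is exactly the IC reduction discussed in the introduction.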

Fig. 4. Intra-element arithmetic and non-arithmetic operations. Taken from [6].
Fig. 5. Inter-element operations between the elements of a vector. Taken from [6].
Fig. 6. AltiVec architecture with 4 distinct registers. Taken from [6].

We may categorize SIMD operations by their types. The first obvious type is intra-element arithmetic and non-arithmetic operations: addition, multiplication, and subtraction are example arithmetic operations, while AND, XOR, and OR are examples of non-arithmetic operations. Figure 4 illustrates intra-element operations with two source vectors VA and VB and a destination vector VT. Each of these vectors contains four 32-bit elements, which means we can operate on two vectors that each hold four integers or single-precision floating-point values; similarly, we can process two vectors with two double-precision values each. The other type is inter-element arithmetic and non-arithmetic operations, which take place between the elements of a single vector; vector permutes and logical shifts are example inter-element operations (Figure 5).

To understand modern SIMD architectures better, let us now briefly discuss some of them. We start with the AltiVec architecture, used by many companies including Apple and Motorola. The AltiVec architecture is illustrated in Figure 6; as the figure shows, it has four distinct sets of registers. Vectors VA and VB are the source vectors, while vector VT is the destination vector: the source registers hold the operands, and the destination registers hold the result. The additional vector VC is called the filter or modifier. It is useful for many operations, including vector permutation, in which VC holds the indices that assign each element of VA and VB its new location in the destination vector VT. An illustration of this procedure can be seen in Figure 7, and a rough x86 analogue is sketched at the end of this section.

Fig. 7. Vector permutation with AltiVec architecture. Taken from [6].

Intel introduced the MMX/SSE architecture, capable of SIMD operations on integers and floating-point values, with its MMX series of processors, and later used the Katmai New Instructions (KNI) for the newer processors. As shown in Figure 9, they added new 128-bit registers and called them SSE registers. These processors had a floating-point unit alongside the SIMD units, as well as a switch mechanism that allows the processor to change its mode from SISD to SIMD. Figure 8 shows the organization of a 128-bit Intel MMX/SSE implementation: the 128-bit unit is built as a combination of two 64-bit processing units, which enables two different operations to be performed at the same time. Nowadays, Intel's 256-bit SIMD implementation is also available [7].
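As a concrete analogue of the permute operation just described, the following hedged C++ sketch uses SSSE3's _mm_shuffle_epi8, which permutes the bytes of a single source vector under a control vector, much as AltiVec's VC drives its permutation (our illustration; AltiVec's vperm additionally draws from two source vectors).

    #include <tmmintrin.h>  // SSSE3: _mm_shuffle_epi8
    #include <cstdio>

    int main() {
        // Source vector: bytes 0..15.
        __m128i src = _mm_setr_epi8(0, 1, 2, 3, 4, 5, 6, 7,
                                    8, 9, 10, 11, 12, 13, 14, 15);
        // Control vector (the role AltiVec's VC plays): for each output
        // byte, the index of the source byte to copy. Here: reverse order.
        __m128i ctrl = _mm_setr_epi8(15, 14, 13, 12, 11, 10, 9, 8,
                                     7, 6, 5, 4, 3, 2, 1, 0);
        __m128i dst = _mm_shuffle_epi8(src, ctrl);  // one-instruction permute

        unsigned char out[16];
        _mm_storeu_si128(reinterpret_cast<__m128i*>(out), dst);
        for (int i = 0; i < 16; ++i) std::printf("%d ", out[i]);
        std::printf("\n");
        return 0;
    }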
Here we finish our discussion of SIMD architectures, having covered significant examples, and move to the next stage of the study. The main idea of this study is to use these architectures with convenient benchmarks and evaluate the performance improvements they bring to various programs. Therefore, the benchmarks used in the study are described in detail in the next section.

IV. MATRIX MULTIPLICATION AND AES

There is no doubt that future processors will differ significantly from current designs and will reshape the way we think about programming. SIMD is one of the most important advancements in modern CPUs, and this research deals with SIMD and how it can be used to increase the performance of programs. The main goal of our research is to use the SIMD instruction set to improve program performance. The first problem we encountered is that there are no SIMD-optimized benchmarks for assessing the performance, so we created our own tools to assess the improvement in performance.

Fig. 8. Intel MMX/SSE architecture. Taken from [6].
Fig. 9. 128-bit Intel MMX/SSE architecture with SSE registers. Taken from [6].
Fig. 10. Ordinary implementation of matrix multiplication.
Fig. 11. Matrix multiplication with just 3 SIMD instructions.

Below, we briefly describe the types of vector operations that were available for the implementation [7], writing each as "result <- operands":

    V <- V        Example: complement all elements
    S <- V        Examples: min, max, sum
    V <- V x V    Examples: vector addition, multiplication, division
    V <- V x S    Examples: multiply or add a scalar to a vector
    S <- V x V    Example: calculate an element of a matrix (a dot product)

We chose matrix multiplication and AES as the benchmarks, for two reasons. First, both are used in many applications on today's desktop computers and mobile devices. Second, both can be optimized with SIMD instructions because they perform the same instructions on batches of data. We now briefly describe each benchmark.

A. Matrix multiplication

Matrix multiplication is one of the best candidates for optimization with SIMD instructions because it deals with matrices, which consist of arrays. It is one of the most common numerical operations, especially in the area of dense linear algebra, and forms the core of many important algorithms, including solvers of linear systems of equations, least-squares problems, and singular-value and eigenvalue computations. Figure 10 shows a naive implementation of matrix multiplication that uses many scalar instructions. By applying SIMD instructions, the number of executed instructions decreases and a noticeable performance improvement is achieved; Figure 11 shows the listing of the code with the SIMD instruction set, and a sketch of the idea follows.
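Since Figures 10 and 11 are not reproduced here, the following minimal C++ sketch (our reconstruction under stated assumptions, not the paper's exact listing) shows the core idea: computing four elements of a result row at once with SSE, assuming row-major float matrices whose dimension N is a multiple of 4.

    #include <xmmintrin.h>  // SSE intrinsics

    // C = A * B for N x N row-major float matrices; assumes N % 4 == 0.
    // Illustrative sketch only: no blocking, alignment, or tail handling.
    void matmul_sse(const float* A, const float* B, float* C, int N) {
        for (int i = 0; i < N; ++i) {
            for (int j = 0; j < N; j += 4) {           // four outputs per step
                __m128 acc = _mm_setzero_ps();
                for (int k = 0; k < N; ++k) {
                    // Broadcast A[i][k], multiply by four elements of row k of B.
                    __m128 a = _mm_set1_ps(A[i * N + k]);
                    __m128 b = _mm_loadu_ps(&B[k * N + j]);
                    acc = _mm_add_ps(acc, _mm_mul_ps(a, b));
                }
                _mm_storeu_ps(&C[i * N + j], acc);      // store C[i][j..j+3]
            }
        }
    }

Because each SIMD step produces four results per multiply-add, the instruction count of the inner loop shrinks roughly fourfold, matching the Iron Law discussion above.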
B. AES encryption algorithm

AES is the other candidate for implementation with SIMD instructions; it has vast applications everywhere in computer systems, from mobile devices to distributed data centers. AES is a block-cipher algorithm selected by the U.S. National Institute of Standards and Technology (NIST) through a contest, from a list of five finalists that were themselves selected from an original list of more than 15 submissions. AES has begun to supplant the Data Encryption Standard (DES), and the later Triple DES, in many cryptography applications. The algorithm was designed by two Belgian cryptologists, Vincent Rijmen and Joan Daemen [8], whose surnames are reflected in the cipher's name. The underlying Rijndael cipher has a variable block length and key length: the specification allows keys of 128, 192, or 256 bits to encrypt blocks of 128, 192, or 256 bits (all nine combinations of key length and block length are possible), and both lengths can easily be extended to multiples of 32 bits. Figure 12 illustrates the flow chart of the algorithm that we use. AES can be implemented very efficiently on a wide range of processors, and we have implemented it using SIMD instructions.
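As one concrete example of where SIMD helps in AES, the AddRoundKey step XORs the whole 128-bit state with a round key; the hedged C++ sketch below (our illustration, not the paper's implementation) performs it with a single SSE2 instruction instead of sixteen byte-wise XORs.

    #include <emmintrin.h>  // SSE2: 128-bit integer operations
    #include <cstdint>

    // AddRoundKey: state ^= round_key over the full 16-byte AES state.
    // Illustrative sketch only; a full AES round also needs SubBytes,
    // ShiftRows, and MixColumns.
    void add_round_key(uint8_t state[16], const uint8_t round_key[16]) {
        __m128i s = _mm_loadu_si128(reinterpret_cast<const __m128i*>(state));
        __m128i k = _mm_loadu_si128(reinterpret_cast<const __m128i*>(round_key));
        s = _mm_xor_si128(s, k);  // one 128-bit XOR replaces 16 scalar XORs
        _mm_storeu_si128(reinterpret_cast<__m128i*>(state), s);
    }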
Fig. 12. Flow chart for the AES algorithm.
Fig. 13. Performance gains using streaming SIMD extensions. Taken from [9].
Fig. 14. Time to do 512x512 multiplication in milliseconds.
Fig. 15. AES encryption (cycles/byte).

V. EXPERIMENTAL STUDY

A. Implementation

Basically, there are two ways to implement an algorithm using SIMD instructions. The first is to implement the algorithm with normal instructions and use a compiler that automatically converts them to SIMD instructions wherever possible; the second is to optimize the algorithm by writing the SIMD instructions manually. For the first method, Microsoft Visual Studio is not able to perform this task at this time, but Intel offers a compiler that is capable of it. However, the most optimized version is achieved by implementing an algorithm manually and choosing the most suitable instructions, so we chose the second method, which gives the best result. We implemented the applications in C++ and compiled them with Microsoft Visual Studio. We disabled all compiler optimization features to make sure that any improvement comes only from the SIMD instructions, not from compiler optimizations such as loop unrolling. To obtain an acceptable estimate of the number of cycles each algorithm needs, we closed all unnecessary processes and ran the implemented function a fixed number of times; we then divided the measured cycle count by the number of executions, which gives the average number of cycles the algorithm needs. A hedged sketch of such a measurement harness follows.
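The paper does not list its measurement code; the following minimal C++ sketch (our assumption of how such a harness can look, using the __rdtsc() intrinsic available in both MSVC's <intrin.h> and GCC/Clang's <x86intrin.h>) averages the cycle count over many runs as described above.

    #ifdef _MSC_VER
    #include <intrin.h>     // MSVC: __rdtsc()
    #else
    #include <x86intrin.h>  // GCC/Clang: __rdtsc()
    #endif
    #include <cstdint>
    #include <cstdio>

    // Average cycles per call of fn(), measured with the time-stamp counter.
    // Illustrative sketch: no warm-up, core pinning, or serialization
    // (cpuid/lfence), which a careful measurement would add.
    template <typename F>
    uint64_t avg_cycles(F fn, int runs) {
        uint64_t start = __rdtsc();
        for (int i = 0; i < runs; ++i) fn();
        return (__rdtsc() - start) / runs;
    }

    int main() {
        volatile float sink = 0.0f;
        uint64_t c = avg_cycles([&] {
            for (int i = 0; i < 1000; ++i) sink = sink + 1.0f;  // dummy workload
        }, 100000);
        std::printf("average cycles per call: %llu\n",
                    static_cast<unsigned long long>(c));
        return 0;
    }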
B. Results

The first benchmark was matrix multiplication. Figure 13 shows the results of Intel's own implementation [9]. Most implementations of matrix multiplication perform the task on matrices of fixed dimensions, which makes the algorithm faster because they avoid loops, which have their own overhead and cannot be written using SIMD instructions. Our implementation, in contrast, handles variable matrix dimensions; considering this point, our optimization is still comparable to the Intel implementation. Figure 14 compares our SIMD implementation of matrix multiplication with the original naive algorithm: it is about two times faster. Figure 15 shows the result of our optimization of AES encryption, which is about 23 times faster.

Some explanation of these results is in order. First, wherever we replace a normal instruction with a SIMD one, we decrease the number of instructions by a factor of four; however, not all instructions are replaceable by SIMD instructions. In addition, on average, the number of cycles needed to execute a SIMD instruction is higher than for the normal instructions doing the same task. But, considering the Iron Law, the decrease in instruction count affects the final performance more than the increase in the average cycles per instruction. So, successful optimization of an algorithm depends
on two factors: first, how many instructions are replaceable by SIMD instructions; second, which combination of instructions is used in the SIMD implementation. The second factor matters because different SIMD instructions need different numbers of cycles to execute.

VI. CONCLUSION

In this paper, we studied SIMD architectures. We summarized their history and explained some of the newer architectures. We proposed the use of SIMD for two applications, matrix multiplication and AES, and described in detail the operations performed for each. The experimental results showed that the use of SIMD significantly improves the performance of both programs.

REFERENCES

[1] Wikipedia, "SIMD," 2013. [Online]. Available: http://en.wikipedia.org/wiki/SIMD [Accessed 15-April-2013].
[2] D. A. Patterson and J. L. Hennessy, Computer Organization and Design: The Hardware/Software Interface.
[3] S. Holzer-Graf, T. Krinninger, M. Pernull, M. Schläffer, P. Schwabe, D. Seywald, and W. Wieser, "Efficient vector implementations of AES-based designs: a case study and new implementations for Grøstl," Topics in Cryptology - CT-RSA, 2013.
[4] M. Gschwind, "Chip multiprocessing and the Cell Broadband Engine," Computing Frontiers, 2006.
[5] H. W. Lawson, B. Svensson, and L. Wanhammar, Parallel Processing in Industrial Real-Time Applications. Prentice Hall, 1992.
[6] J. Stokes, "SIMD architectures," Ars Technica, March 2000.
[7] Intel Corp., "Intel Advanced Vector Extensions Programming Reference," June 2011.
[8] J. Daemen and V. Rijmen, The Design of Rijndael: AES - The Advanced Encryption Standard. Springer, 2002.
[9] Intel Corp., "Streaming SIMD Extensions - Matrix Multiplication," Application Note AP-930, June 1999.
