Introduction to Computing and Systems Architecture
|
|
- Esmond Pierce
- 5 years ago
- Views:
Transcription
1 Introduction to Computing and Systems Architecture 1. Computability A task is computable if a sequence of instructions can be described which, when followed, will complete such a task. This says little really. Therefore, we use Turing machines to allow us to more formally describe computability. Turing Machine The machine can attain a finite number of states. A number of transition rules exist that help govern the actions of the machine. A tape of infinite length exists which contains symbols (laid out in a linear manner). The tape can be read from (one symbol at a time) or written to (one symbol at a time) by the machine s read/write head. The state of the machine together with the symbol under scrutiny (position of the tape), and the transition rules present dictate what the machine will do next. For example, if we are in state S1, and we read symbol M1, then move to state S2 and move read/write head one to the left. By definition a Turing machine (as just described) is deterministic in nature. There also exists nondeterministic Turing machines. However, these are more for the theorist at this moment in time. They are very interesting though. A machine, which can be fed data and act upon such data by following rules to produce outcomes that are deterministic are all the ingredients we need to start programming. Computer Going from a Turing machine to a computer is a big engineering leap, but not a great theoretical one. One change, theoretically speaking, is a limitation on resources (finite). Consider a very simple computer made up of a CPU that has a list of instructions encoded within it (instruction set) and has access to memory which can hold data. The memory can be accessed by the CPU at any place (random access). The CPU reads memory locations in sequence, and decodes the data using its instruction set and does something (either write data back to memory or read data from memory). Now, depending on what the CPU is doing (its state) the CPU may decode the memory as instructions (as mentioned) or values. For example, add memory location 1 to memory location 2 and put the answer in memory location 3 has a few instructions and a few values. Now, this seems easy. What could possibly go wrong? Well... What if the CPU reads values and thinks its instructions? What if the CPU writes values into memory locations that already contain something (possibly instructions)? What if the person who wrote the program doesn t know what they are doing and writes values over the instructions that are responsible for the correct operation of the computer s basic systems (operating system)?
2 2. Engineering speed (and complications) Geography and magnitude: It is much quicker to reach out to the table and find out your friends number on your iphone, but takes a lot longer to access the global directory of phone numbers from the Internet... Getting data from memory is very slow compared to the CPU so let s get a chunk of memory and store it close by to speed things up (cache). Direct access Sometimes it is much quicker to simply google than to ask the teacher... The CPU is not the only piece of hardware around in a modern computer, however, traditionally it is the only thing that knows how to handle memory. Let s break this tradition and allow other things to access memory directly for reading and writing. Disk drives, graphic cards and many other devices need direct memory access (DMA). - (Multi-core use it as well) Parallel processing Two hands are better than one... Multi-processor and multi-core allow more than one processing element to simultaneously operate on the memory (either their own or shared). Sometimes the computer can automate the parallel process, but sometimes the programmer has to do it. Single Instruction Multiple Data (SIMD) is when an instruction can be applied to many pieces of data at the same time. If you have 50 cars and you want to apply the same velocity function to all of them this may be for you (note. check PS3 cell architecture). Multiple Instruction Multiple Data (MIMD) is when many instructions can be applied to many pieces of data at the same time. Yes, you may have SISD and MISD. 3. Caches In general a processor has a small amount of memory attached to it, with very fast access. This is known as a cache. The processor stores a copy of the most recent data it has accessed in the cache. This means that, if the processor needs to read that data again, it is accessed considerably faster than if it is accessed from main memory again. The time it takes a processor to read a piece of data from memory is known as latency. In effect, the processor has to wait until the data becomes available to it. Obviously the smaller the latency, the faster the processor can do its job. Reading from a cache has a much lower latency than reading from main memory, as the cache is a piece of physical memory attached to the processor specifically designed for fast access.
3 If a requested piece of data has been used recently then it is already in the cache, giving a much faster latency when accessing it. If the data is not in the cache then what is known as a cache miss occurs ie the data has not been found in the cache, so it has to be accessed from main menu, and will be copied into the cache in case it is needed again. Code which needs to run as optimally as possible is designed so that it reuses data in manageable chunks. A thoroughly optimised piece of code will work on one section of data, with no cache misses (ie no dependencies on data from other parts of memory), before moving on to another chunk of data which will, in turn, fill up the cache for reuse. A cache consists of a set of rapid access memory locations, known as cache lines each line contains: The address in main memory where the data has been copied from (or the tag). The data itself. Cache sizes vary from processor to processor, which means that highly optimised code for one processor may result in cache misses, or the opportunity for further cached data, on another processor. It is important to note that the cache is not physically part of main memory, and that it contains copies of the data held in main memory. This means that, if the data in main memory is changed, the cached value will not reflect that change i.e. the cache is not a set of pointers into main memory, it is a recent copy of the data in main memory. Cache Coherence This approach works nicely in a single processor system. However if there is more than one processor at work in the system (for example a CPU and a GPU), then an issue arises. We need to consider what happens if the data which has been copied from main memory into the cache of one processor is changed in main memory by another processor. Further to this, it is possible that two processors could have copied the same data from main memory into their respective caches again there is an issue if that data has been copied at different times and each processor is working with different values of the data. There is therefore a need to ensure that cached data is consistent with the data in main memory that it is copied from, and also that the caches of each processor can handle inconsistencies in matching data. This is known as cache coherence.
4 In order to resolve any inconsistencies between cached memory accesses, a coherency protocol is required. Various protocols exist to maintain coherency between distributed caches in a multiprocessor system. Sequential Consistency This is an early type of coherency protocol. The protocol requires that the results of the instructions carried out on multiple processors are identical to the results if the instructions had been carried out sequentially by a single processor. To achieve this, the protocol must ensure that: All main memory requests (both read and write) are carried out in the order specified by the control program Each cache has a single FIFO queue for memory requests (ie first in, first out). It can be seen that this ensures that each cache contains the most recent version of data as it is being processed another processor requesting that data can t read it until the first processor has written its changes back to main memory. Release Consistency This coherency protocol is based on the idea of a processor identifying data as being subject to change. It uses two synchronisation operations known as acquire and release - when the processor is ready to write to main memory it acquires that data, which prevents any other processor from accessing it. When it has finished writing, it releases the data, freeing up that data for other processors to acquire. It is similar to the idea of checking out a sourcecode file from a repository using SVN.
5 Weak Consistency This coherency protocol allows for more processes to occur concurrently, by relaxing the rules of Release Consistency slightly. Rather than locking out other processors during the whole period from the acquire to the release, this protocol ensures that the order of acquires and releases is seen to be consistent across all processors, but the order of reads and writes between those events can be seen to vary. The sum of the reads and writes between each synchronisation event must also be the same. 4. The Cell and Playstation3 The Cell Engine Broadband Architecture is a microprocessor developed by IBM, Sony and Toshiba. The Cell architecture is shown in the diagram below: Note: In the PS3 one of the SPEs is offline/locked out/lanced and one is dedicated for use by the operating system. This leaves six left for game developers. Overview The PowerPC Processing Element (PPE) may be considered capable of running alone, like any regular CPU. However, in this architecture the PPE may be aided in satisfying its computational requirements by the Synergistic Processing Elements (SPEs). The Element Interconnect Bus (EIB) accommodates communication between PPE, SPEs and external components. The PPE may control execution on the SPEs. This can take the form of scheduling processes to run and interacting with such processes (e.g., start, stop, interrupt). Memory Access Standard load/store commands are used throughout the system for memory access by PPE and SPEs (just like a regular CPU would use them). However, SPEs may only access their own local store (256 KB) whereas the PPE can access main memory and the memory of the PPEs. The local store of the SPEs is not cache (it is actually considered local memory), but it is high performance Static Random Access Memory (SRAM), higher performance than Dynamic Random Access Memory (DRAM) and is commonly found in cache architectures.
6 PPE main memory access is via cache (the PPE has L1 and L2) that utilises Direct Memory Access (DMA) for cache updates. All processing elements (PPE and SPEs) have DMA utilisation capabilities, making DMA the main focus of memory movement throughout the system. This allows a PPE to also access main memory (and the memory of other SPEs) via its local memory (using a virtual memory address to indicate the source of the get and a variable to indicate the size of memory to retrieve). All caches are cache coherent (access of stale memory locations in main memory are avoided). However, as the SPEs' local memory is not considered cache then this may prove problematic to some programmers. This really makes programmers consider the SPEs as distinct programming elements and should be treated as such. As an added complication (or bonus, depending on how you look at it) each SPE has a small amount of cache (512B). Cache coherence is avoided as requests from other processing units for a memory location that is actually in the SPE cache are satisfied from the SPE cache (not the local memory of the SPE). This gives a very nice, very fast, shared memory system that exhibits consistency. How you use this effectively is not clear, but it may have its uses XBox360 The Xbox360 also has a multiprocessor architecture, consisting of: 3 CPU cores with a shared 1Mb cache. 1 GPU with 10Mb embedded cache, and 48 parallel unified shaders. The CPU The central processing unit is a 64-bit triple-core processor. Each core contains multiple floating point processors, and SIMD vector processing units, so the focus of the hardware design is on floating point calculations. Each core is capable of simultaneous multithreading, although they utilise in-order execution, rather than the more recent out-of-order execution. The cores share a comparatively large cache (I Mb), which is accessible at half the CPU clock speed (i.e. significantly faster than main memory). The GPU The key to the GPU s power in the Xbox360 is the 10Mb cache, which is attached to the GPU on a daughter die. This cache is sufficiently large that graphical processes such as alpha-blending, z- buffering, and full-screen anti-aliasing can be implemented with virtually no performance cost to the GPU. Each of the 48 shader processors consists of a 5-wide vector unit, capable of executing two instructions per cycle i.e. each shader processor can execute 10 floating-point operations every cycle. The shader processors are arranged into three groups of 16 processors each. All processors in a SIMD group execute the same instruction, so in total up to three instruction threads can be simultaneously under execution. Main Memory The Xbox360 has 512 Mb of main memory. This memory is shared between the CPU and GPU, via a unified memory architecture. For development purposes, this means that there is no distinction
7 between graphics memory and main memory, so the overall memory footprint can be designed with the needs of each game in mind. The GPU also acts as the memory controller, although this is largely invisible to the developer. 6. Cross-platform development Most modern games are released on multiple platforms, unless they are developed by the hardware developer s in-house and first-party teams. There are two approaches which can be taken here: Develop on one platform, and port to the other platforms once the lead platform development is complete. Develop on multiple platforms at once, using shared technology as much as possible. Both approaches have their advantages and disadvantages. Developing on multiple platforms can lead to a better engineered codebase (as much of it has to compile and run in more than one development environment), but can also lead to taking the common ground of multiple platforms, so features which are designed to fully utilise one platform s abilities are prioritised below crossplatform functionality. In general, the skills and approaches required to run code on one multi-processor platform are the same as for another (setting up a code architecture for multi-threading will benefit both Xbox360 and PS3 development). However getting down to the specifics of what tasks are assigned to the processors must be platform-specific. As can be seen in this document, developing code to take full
8 advantage of the PS3 s SPU architecture with their minimal caches, is a different challenge to using the triple-core processor on the Xbox360 with its much larger cache. Identifying which parts of a game engine need to be targeted for platform-specific development is an important skill. The key decisions are based on how often a piece of code runs and how much data it requires to run.
Parallel Computing: Parallel Architectures Jin, Hai
Parallel Computing: Parallel Architectures Jin, Hai School of Computer Science and Technology Huazhong University of Science and Technology Peripherals Computer Central Processing Unit Main Memory Computer
More informationINSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing
UNIVERSIDADE TÉCNICA DE LISBOA INSTITUTO SUPERIOR TÉCNICO Departamento de Engenharia Informática Architectures for Embedded Computing MEIC-A, MEIC-T, MERC Lecture Slides Version 3.0 - English Lecture 12
More informationComputer Systems Architecture I. CSE 560M Lecture 19 Prof. Patrick Crowley
Computer Systems Architecture I CSE 560M Lecture 19 Prof. Patrick Crowley Plan for Today Announcement No lecture next Wednesday (Thanksgiving holiday) Take Home Final Exam Available Dec 7 Due via email
More informationXbox 360 Architecture. Lennard Streat Samuel Echefu
Xbox 360 Architecture Lennard Streat Samuel Echefu Overview Introduction Hardware Overview CPU Architecture GPU Architecture Comparison Against Competing Technologies Implications of Technology Introduction
More informationIBM Cell Processor. Gilbert Hendry Mark Kretschmann
IBM Cell Processor Gilbert Hendry Mark Kretschmann Architectural components Architectural security Programming Models Compiler Applications Performance Power and Cost Conclusion Outline Cell Architecture:
More informationCell Broadband Engine. Spencer Dennis Nicholas Barlow
Cell Broadband Engine Spencer Dennis Nicholas Barlow The Cell Processor Objective: [to bring] supercomputer power to everyday life Bridge the gap between conventional CPU s and high performance GPU s History
More informationThis Unit: Putting It All Together. CIS 371 Computer Organization and Design. What is Computer Architecture? Sources
This Unit: Putting It All Together CIS 371 Computer Organization and Design Unit 15: Putting It All Together: Anatomy of the XBox 360 Game Console Application OS Compiler Firmware CPU I/O Memory Digital
More informationComputer Architecture
Computer Architecture Slide Sets WS 2013/2014 Prof. Dr. Uwe Brinkschulte M.Sc. Benjamin Betting Part 10 Thread and Task Level Parallelism Computer Architecture Part 10 page 1 of 36 Prof. Dr. Uwe Brinkschulte,
More informationCSCI-GA Multicore Processors: Architecture & Programming Lecture 10: Heterogeneous Multicore
CSCI-GA.3033-012 Multicore Processors: Architecture & Programming Lecture 10: Heterogeneous Multicore Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com Status Quo Previously, CPU vendors
More informationCONSOLE ARCHITECTURE
CONSOLE ARCHITECTURE Introduction Part 1 What is a console? Console components Differences between consoles and PCs Benefits of console development The development environment Console game design What
More informationCell Processor and Playstation 3
Cell Processor and Playstation 3 Guillem Borrell i Nogueras February 24, 2009 Cell systems Bad news More bad news Good news Q&A IBM Blades QS21 Cell BE based. 8 SPE 460 Gflops Float 20 GFLops Double QS22
More informationThis Unit: Putting It All Together. CIS 501 Computer Architecture. What is Computer Architecture? Sources
This Unit: Putting It All Together CIS 501 Computer Architecture Unit 12: Putting It All Together: Anatomy of the XBox 360 Game Console Application OS Compiler Firmware CPU I/O Memory Digital Circuits
More informationThis Unit: Putting It All Together. CIS 371 Computer Organization and Design. Sources. What is Computer Architecture?
This Unit: Putting It All Together CIS 371 Computer Organization and Design Unit 15: Putting It All Together: Anatomy of the XBox 360 Game Console Application OS Compiler Firmware CPU I/O Memory Digital
More informationUnit 11: Putting it All Together: Anatomy of the XBox 360 Game Console
Computer Architecture Unit 11: Putting it All Together: Anatomy of the XBox 360 Game Console Slides originally developed by Milo Martin & Amir Roth at University of Pennsylvania! Computer Architecture
More informationXbox 360 high-level architecture
11/2/11 Xbox 360 s Xenon vs. Playstation 3 s Cell Both chips clocked at a 3.2 GHz Architectural Comparison: Xbox 360 vs. Playstation 3 Prof. Aaron Lanterman School of Electrical and Computer Engineering
More informationMultiprocessors. Flynn Taxonomy. Classifying Multiprocessors. why would you want a multiprocessor? more is better? Cache Cache Cache.
Multiprocessors why would you want a multiprocessor? Multiprocessors and Multithreading more is better? Cache Cache Cache Classifying Multiprocessors Flynn Taxonomy Flynn Taxonomy Interconnection Network
More informationHow to Write Fast Code , spring th Lecture, Mar. 31 st
How to Write Fast Code 18-645, spring 2008 20 th Lecture, Mar. 31 st Instructor: Markus Püschel TAs: Srinivas Chellappa (Vas) and Frédéric de Mesmay (Fred) Introduction Parallelism: definition Carrying
More informationChapter 17 - Parallel Processing
Chapter 17 - Parallel Processing Luis Tarrataca luis.tarrataca@gmail.com CEFET-RJ Luis Tarrataca Chapter 17 - Parallel Processing 1 / 71 Table of Contents I 1 Motivation 2 Parallel Processing Categories
More informationComputer Organization. Chapter 16
William Stallings Computer Organization and Architecture t Chapter 16 Parallel Processing Multiple Processor Organization Single instruction, single data stream - SISD Single instruction, multiple data
More informationRoadrunner. By Diana Lleva Julissa Campos Justina Tandar
Roadrunner By Diana Lleva Julissa Campos Justina Tandar Overview Roadrunner background On-Chip Interconnect Number of Cores Memory Hierarchy Pipeline Organization Multithreading Organization Roadrunner
More informationThe University of Texas at Austin
EE382N: Principles in Computer Architecture Parallelism and Locality Fall 2009 Lecture 24 Stream Processors Wrapup + Sony (/Toshiba/IBM) Cell Broadband Engine Mattan Erez The University of Texas at Austin
More information5 Computer Organization
5 Computer Organization 5.1 Foundations of Computer Science Cengage Learning Objectives After studying this chapter, the student should be able to: List the three subsystems of a computer. Describe the
More informationSpring 2011 Prof. Hyesoon Kim
Spring 2011 Prof. Hyesoon Kim PowerPC-base Core @3.2GHz 1 VMX vector unit per core 512KB L2 cache 7 x SPE @3.2GHz 7 x 128b 128 SIMD GPRs 7 x 256KB SRAM for SPE 1 of 8 SPEs reserved for redundancy total
More informationAll About the Cell Processor
All About the Cell H. Peter Hofstee, Ph. D. IBM Systems and Technology Group SCEI/Sony Toshiba IBM Design Center Austin, Texas Acknowledgements Cell is the result of a deep partnership between SCEI/Sony,
More informationModule 5 Introduction to Parallel Processing Systems
Module 5 Introduction to Parallel Processing Systems 1. What is the difference between pipelining and parallelism? In general, parallelism is simply multiple operations being done at the same time.this
More information1993 Paper 3 Question 6
993 Paper 3 Question 6 Describe the functionality you would expect to find in the file system directory service of a multi-user operating system. [0 marks] Describe two ways in which multiple names for
More informationCOSC 6385 Computer Architecture - Data Level Parallelism (III) The Intel Larrabee, Intel Xeon Phi and IBM Cell processors
COSC 6385 Computer Architecture - Data Level Parallelism (III) The Intel Larrabee, Intel Xeon Phi and IBM Cell processors Edgar Gabriel Fall 2018 References Intel Larrabee: [1] L. Seiler, D. Carmean, E.
More informationHigh Performance Computing. University questions with solution
High Performance Computing University questions with solution Q1) Explain the basic working principle of VLIW processor. (6 marks) The following points are basic working principle of VLIW processor. The
More information5 Computer Organization
5 Computer Organization 5.1 Foundations of Computer Science ã Cengage Learning Objectives After studying this chapter, the student should be able to: q List the three subsystems of a computer. q Describe
More informationCSCI 4717 Computer Architecture
CSCI 4717/5717 Computer Architecture Topic: Symmetric Multiprocessors & Clusters Reading: Stallings, Sections 18.1 through 18.4 Classifications of Parallel Processing M. Flynn classified types of parallel
More informationChapter 18 Parallel Processing
Chapter 18 Parallel Processing Multiple Processor Organization Single instruction, single data stream - SISD Single instruction, multiple data stream - SIMD Multiple instruction, single data stream - MISD
More informationInstruction Register. Instruction Decoder. Control Unit (Combinational Circuit) Control Signals (These signals go to register) The bus and the ALU
Hardwired and Microprogrammed Control For each instruction, the control unit causes the CPU to execute a sequence of steps correctly. In reality, there must be control signals to assert lines on various
More informationMULTIPROCESSORS AND THREAD-LEVEL. B649 Parallel Architectures and Programming
MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM B649 Parallel Architectures and Programming Motivation behind Multiprocessors Limitations of ILP (as already discussed) Growing interest in servers and server-performance
More informationSlide Set 5. for ENCM 501 in Winter Term, Steve Norman, PhD, PEng
Slide Set 5 for ENCM 501 in Winter Term, 2017 Steve Norman, PhD, PEng Electrical & Computer Engineering Schulich School of Engineering University of Calgary Winter Term, 2017 ENCM 501 W17 Lectures: Slide
More informationA Brief View of the Cell Broadband Engine
A Brief View of the Cell Broadband Engine Cris Capdevila Adam Disney Yawei Hui Alexander Saites 02 Dec 2013 1 Introduction The cell microprocessor, also known as the Cell Broadband Engine (CBE), is a Power
More informationMULTIPROCESSORS AND THREAD-LEVEL PARALLELISM. B649 Parallel Architectures and Programming
MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM B649 Parallel Architectures and Programming Motivation behind Multiprocessors Limitations of ILP (as already discussed) Growing interest in servers and server-performance
More informationBlueGene/L (No. 4 in the Latest Top500 List)
BlueGene/L (No. 4 in the Latest Top500 List) first supercomputer in the Blue Gene project architecture. Individual PowerPC 440 processors at 700Mhz Two processors reside in a single chip. Two chips reside
More informationChapter 1 Computer System Overview
Operating Systems: Internals and Design Principles Chapter 1 Computer System Overview Seventh Edition By William Stallings Course Outline & Marks Distribution Hardware Before mid Memory After mid Linux
More informationMemory Architectures. Week 2, Lecture 1. Copyright 2009 by W. Feng. Based on material from Matthew Sottile.
Week 2, Lecture 1 Copyright 2009 by W. Feng. Based on material from Matthew Sottile. Directory-Based Coherence Idea Maintain pointers instead of simple states with each cache block. Ingredients Data owners
More informationOrganisasi Sistem Komputer
LOGO Organisasi Sistem Komputer OSK 14 Parallel Processing Pendidikan Teknik Elektronika FT UNY Multiple Processor Organization Single instruction, single data stream - SISD Single instruction, multiple
More informationINF5063: Programming heterogeneous multi-core processors Introduction
INF5063: Programming heterogeneous multi-core processors Introduction Håkon Kvale Stensland August 19 th, 2012 INF5063 Overview Course topic and scope Background for the use and parallel processing using
More informationSpring 2010 Prof. Hyesoon Kim. Xbox 360 System Architecture, Anderews, Baker
Spring 2010 Prof. Hyesoon Kim Xbox 360 System Architecture, Anderews, Baker 3 CPU cores 4-way SIMD vector units 8-way 1MB L2 cache (3.2 GHz) 2 way SMT 48 unified shaders 3D graphics units 512-Mbyte DRAM
More informationRISC Processors and Parallel Processing. Section and 3.3.6
RISC Processors and Parallel Processing Section 3.3.5 and 3.3.6 The Control Unit When a program is being executed it is actually the CPU receiving and executing a sequence of machine code instructions.
More informationCS 152 Computer Architecture and Engineering. Lecture 11 - Virtual Memory and Caches
CS 152 Computer Architecture and Engineering Lecture 11 - Virtual Memory and Caches Krste Asanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~krste
More informationTop500 Supercomputer list
Top500 Supercomputer list Tends to represent parallel computers, so distributed systems such as SETI@Home are neglected. Does not consider storage or I/O issues Both custom designed machines and commodity
More informationCSC 553 Operating Systems
CSC 553 Operating Systems Lecture 1- Computer System Overview Operating System Exploits the hardware resources of one or more processors Provides a set of services to system users Manages secondary memory
More informationPOLITECNICO DI MILANO. Advanced Topics on Heterogeneous System Architectures! Multiprocessors
POLITECNICO DI MILANO Advanced Topics on Heterogeneous System Architectures! Multiprocessors Politecnico di Milano! SeminarRoom, Bld 20! 30 November, 2017! Antonio Miele! Marco Santambrogio! Politecnico
More informationA Cache Hierarchy in a Computer System
A Cache Hierarchy in a Computer System Ideally one would desire an indefinitely large memory capacity such that any particular... word would be immediately available... We are... forced to recognize the
More informationLast class: Today: Course administration OS definition, some history. Background on Computer Architecture
1 Last class: Course administration OS definition, some history Today: Background on Computer Architecture 2 Canonical System Hardware CPU: Processor to perform computations Memory: Programs and data I/O
More informationComputer Systems Architecture
Computer Systems Architecture Lecture 24 Mahadevan Gomathisankaran April 29, 2010 04/29/2010 Lecture 24 CSCE 4610/5610 1 Reminder ABET Feedback: http://www.cse.unt.edu/exitsurvey.cgi?csce+4610+001 Student
More informationAdministrivia. Minute Essay From 4/11
Administrivia All homeworks graded. If you missed one, I m willing to accept it for partial credit (provided of course that you haven t looked at a sample solution!) through next Wednesday. I will grade
More informationEE 4980 Modern Electronic Systems. Processor Advanced
EE 4980 Modern Electronic Systems Processor Advanced Architecture General Purpose Processor User Programmable Intended to run end user selected programs Application Independent PowerPoint, Chrome, Twitter,
More informationOptimizing DirectX Graphics. Richard Huddy European Developer Relations Manager
Optimizing DirectX Graphics Richard Huddy European Developer Relations Manager Some early observations Bear in mind that graphics performance problems are both commoner and rarer than you d think The most
More informationChapter 5 (Part II) Large and Fast: Exploiting Memory Hierarchy. Baback Izadi Division of Engineering Programs
Chapter 5 (Part II) Baback Izadi Division of Engineering Programs bai@engr.newpaltz.edu Virtual Machines Host computer emulates guest operating system and machine resources Improved isolation of multiple
More informationPage 1. Cache Coherence
Page 1 Cache Coherence 1 Page 2 Memory Consistency in SMPs CPU-1 CPU-2 A 100 cache-1 A 100 cache-2 CPU-Memory bus A 100 memory Suppose CPU-1 updates A to 200. write-back: memory and cache-2 have stale
More informationComputer Science 146. Computer Architecture
Computer Architecture Spring 24 Harvard University Instructor: Prof. dbrooks@eecs.harvard.edu Lecture 2: More Multiprocessors Computation Taxonomy SISD SIMD MISD MIMD ILP Vectors, MM-ISAs Shared Memory
More informationhigh performance medical reconstruction using stream programming paradigms
high performance medical reconstruction using stream programming paradigms This Paper describes the implementation and results of CT reconstruction using Filtered Back Projection on various stream programming
More informationConcurrent Programming with the Cell Processor. Dietmar Kühl Bloomberg L.P.
Concurrent Programming with the Cell Processor Dietmar Kühl Bloomberg L.P. dietmar.kuehl@gmail.com Copyright Notice 2009 Bloomberg L.P. Permission is granted to copy, distribute, and display this material,
More informationCell Broadband Engine Architecture. Version 1.0
Copyright and Disclaimer Copyright International Business Machines Corporation, Sony Computer Entertainment Incorporated, Toshiba Corporation 2005 All Rights Reserved Printed in the United States of America
More informationModern Processor Architectures (A compiler writer s perspective) L25: Modern Compiler Design
Modern Processor Architectures (A compiler writer s perspective) L25: Modern Compiler Design The 1960s - 1970s Instructions took multiple cycles Only one instruction in flight at once Optimisation meant
More informationROEVER ENGINEERING COLLEGE DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
ROEVER ENGINEERING COLLEGE DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING 16 MARKS CS 2354 ADVANCE COMPUTER ARCHITECTURE 1. Explain the concepts and challenges of Instruction-Level Parallelism. Define
More informationCHAPTER 4 MEMORY HIERARCHIES TYPICAL MEMORY HIERARCHY TYPICAL MEMORY HIERARCHY: THE PYRAMID CACHE PERFORMANCE MEMORY HIERARCHIES CACHE DESIGN
CHAPTER 4 TYPICAL MEMORY HIERARCHY MEMORY HIERARCHIES MEMORY HIERARCHIES CACHE DESIGN TECHNIQUES TO IMPROVE CACHE PERFORMANCE VIRTUAL MEMORY SUPPORT PRINCIPLE OF LOCALITY: A PROGRAM ACCESSES A RELATIVELY
More informationModern Processor Architectures. L25: Modern Compiler Design
Modern Processor Architectures L25: Modern Compiler Design The 1960s - 1970s Instructions took multiple cycles Only one instruction in flight at once Optimisation meant minimising the number of instructions
More informationChapter 1 Computer System Overview
Operating Systems: Internals and Design Principles Chapter 1 Computer System Overview Ninth Edition By William Stallings Operating System Exploits the hardware resources of one or more processors Provides
More informationCS162 Operating Systems and Systems Programming Lecture 14. Caching (Finished), Demand Paging
CS162 Operating Systems and Systems Programming Lecture 14 Caching (Finished), Demand Paging October 11 th, 2017 Neeraja J. Yadwadkar http://cs162.eecs.berkeley.edu Recall: Caching Concept Cache: a repository
More informationMemory. From Chapter 3 of High Performance Computing. c R. Leduc
Memory From Chapter 3 of High Performance Computing c 2002-2004 R. Leduc Memory Even if CPU is infinitely fast, still need to read/write data to memory. Speed of memory increasing much slower than processor
More informationParallel Processing. Computer Architecture. Computer Architecture. Outline. Multiple Processor Organization
Computer Architecture Computer Architecture Prof. Dr. Nizamettin AYDIN naydin@yildiz.edu.tr nizamettinaydin@gmail.com Parallel Processing http://www.yildiz.edu.tr/~naydin 1 2 Outline Multiple Processor
More information3.3 Hardware Parallel processing
Parallel processing is the simultaneous use of more than one CPU to execute a program. Ideally, parallel processing makes a program run faster because there are more CPUs running it. In practice, it is
More informationUnit 9 : Fundamentals of Parallel Processing
Unit 9 : Fundamentals of Parallel Processing Lesson 1 : Types of Parallel Processing 1.1. Learning Objectives On completion of this lesson you will be able to : classify different types of parallel processing
More informationPDF created with pdffactory Pro trial version How Computer Memory Works by Jeff Tyson. Introduction to How Computer Memory Works
Main > Computer > Hardware How Computer Memory Works by Jeff Tyson Introduction to How Computer Memory Works When you think about it, it's amazing how many different types of electronic memory you encounter
More informationPROCESS VIRTUAL MEMORY. CS124 Operating Systems Winter , Lecture 18
PROCESS VIRTUAL MEMORY CS124 Operating Systems Winter 2015-2016, Lecture 18 2 Programs and Memory Programs perform many interactions with memory Accessing variables stored at specific memory locations
More informationCourse II Parallel Computer Architecture. Week 2-3 by Dr. Putu Harry Gunawan
Course II Parallel Computer Architecture Week 2-3 by Dr. Putu Harry Gunawan www.phg-simulation-laboratory.com Review Review Review Review Review Review Review Review Review Review Review Review Processor
More informationComputer Organization
Objectives 5.1 Chapter 5 Computer Organization Source: Foundations of Computer Science Cengage Learning 5.2 After studying this chapter, students should be able to: List the three subsystems of a computer.
More informationChapter 18. Parallel Processing. Yonsei University
Chapter 18 Parallel Processing Contents Multiple Processor Organizations Symmetric Multiprocessors Cache Coherence and the MESI Protocol Clusters Nonuniform Memory Access Vector Computation 18-2 Types
More informationUNIT I (Two Marks Questions & Answers)
UNIT I (Two Marks Questions & Answers) Discuss the different ways how instruction set architecture can be classified? Stack Architecture,Accumulator Architecture, Register-Memory Architecture,Register-
More informationChapter 8. Multiprocessors. In-Cheol Park Dept. of EE, KAIST
Chapter 8. Multiprocessors In-Cheol Park Dept. of EE, KAIST Can the rapid rate of uniprocessor performance growth be sustained indefinitely? If the pace does slow down, multiprocessor architectures will
More informationOPENCL GPU BEST PRACTICES BENJAMIN COQUELLE MAY 2015
OPENCL GPU BEST PRACTICES BENJAMIN COQUELLE MAY 2015 TOPICS Data transfer Parallelism Coalesced memory access Best work group size Occupancy branching All the performance numbers come from a W8100 running
More informationWHY PARALLEL PROCESSING? (CE-401)
PARALLEL PROCESSING (CE-401) COURSE INFORMATION 2 + 1 credits (60 marks theory, 40 marks lab) Labs introduced for second time in PP history of SSUET Theory marks breakup: Midterm Exam: 15 marks Assignment:
More informationModern Virtual Memory Systems. Modern Virtual Memory Systems
6.823, L12--1 Modern Virtual Systems Asanovic Laboratory for Computer Science M.I.T. http://www.csg.lcs.mit.edu/6.823 6.823, L12--2 Modern Virtual Systems illusion of a large, private, uniform store Protection
More informationChapter 6 Caches. Computer System. Alpha Chip Photo. Topics. Memory Hierarchy Locality of Reference SRAM Caches Direct Mapped Associative
Chapter 6 s Topics Memory Hierarchy Locality of Reference SRAM s Direct Mapped Associative Computer System Processor interrupt On-chip cache s s Memory-I/O bus bus Net cache Row cache Disk cache Memory
More informationSAE5C Computer Organization and Architecture. Unit : I - V
SAE5C Computer Organization and Architecture Unit : I - V UNIT-I Evolution of Pentium and Power PC Evolution of Computer Components functions Interconnection Bus Basics of PCI Memory:Characteristics,Hierarchy
More informationComputing architectures Part 2 TMA4280 Introduction to Supercomputing
Computing architectures Part 2 TMA4280 Introduction to Supercomputing NTNU, IMF January 16. 2017 1 Supercomputing What is the motivation for Supercomputing? Solve complex problems fast and accurately:
More informationOperating Systems. Memory Management. Lecture 9 Michael O Boyle
Operating Systems Memory Management Lecture 9 Michael O Boyle 1 Memory Management Background Logical/Virtual Address Space vs Physical Address Space Swapping Contiguous Memory Allocation Segmentation Goals
More informationComputer Organization and Design, 5th Edition: The Hardware/Software Interface
Computer Organization and Design, 5th Edition: The Hardware/Software Interface 1 Computer Abstractions and Technology 1.1 Introduction 1.2 Eight Great Ideas in Computer Architecture 1.3 Below Your Program
More informationCellSs Making it easier to program the Cell Broadband Engine processor
Perez, Bellens, Badia, and Labarta CellSs Making it easier to program the Cell Broadband Engine processor Presented by: Mujahed Eleyat Outline Motivation Architecture of the cell processor Challenges of
More informationVirtual File System -Uniform interface for the OS to see different file systems.
Virtual File System -Uniform interface for the OS to see different file systems. Temporary File Systems -Disks built in volatile storage NFS -file system addressed over network File Allocation -Contiguous
More informationIntroduction. Coherency vs Consistency. Lec-11. Multi-Threading Concepts: Coherency, Consistency, and Synchronization
Lec-11 Multi-Threading Concepts: Coherency, Consistency, and Synchronization Coherency vs Consistency Memory coherency and consistency are major concerns in the design of shared-memory systems. Consistency
More informationProcessor Architecture and Interconnect
Processor Architecture and Interconnect What is Parallelism? Parallel processing is a term used to denote simultaneous computation in CPU for the purpose of measuring its computation speeds. Parallel Processing
More informationModule 18: "TLP on Chip: HT/SMT and CMP" Lecture 39: "Simultaneous Multithreading and Chip-multiprocessing" TLP on Chip: HT/SMT and CMP SMT
TLP on Chip: HT/SMT and CMP SMT Multi-threading Problems of SMT CMP Why CMP? Moore s law Power consumption? Clustered arch. ABCs of CMP Shared cache design Hierarchical MP file:///e /parallel_com_arch/lecture39/39_1.htm[6/13/2012
More informationPage 1. Review: Address Segmentation " Review: Address Segmentation " Review: Address Segmentation "
Review Address Segmentation " CS162 Operating Systems and Systems Programming Lecture 10 Caches and TLBs" February 23, 2011! Ion Stoica! http//inst.eecs.berkeley.edu/~cs162! 1111 0000" 1110 000" Seg #"
More informationMultiprocessor Support
CSC 256/456: Operating Systems Multiprocessor Support John Criswell University of Rochester 1 Outline Multiprocessor hardware Types of multi-processor workloads Operating system issues Where to run the
More informationCSC 2405: Computer Systems II
CSC 2405: Computer Systems II Dr. Mirela Damian http://www.csc.villanova.edu/~mdamian/csc2405/ Spring 2016 Course Goals: Look under the hood Help you learn what happens under the hood of computer systems
More informationCaches and Memory Hierarchy: Review. UCSB CS240A, Fall 2017
Caches and Memory Hierarchy: Review UCSB CS24A, Fall 27 Motivation Most applications in a single processor runs at only - 2% of the processor peak Most of the single processor performance loss is in the
More informationCellular Planets: Optimizing Planetary Simulations for the Cell Processor
Trinity University Digital Commons @ Trinity Computer Science Honors Theses Computer Science Department 4-18-2007 Cellular Planets: Optimizing Planetary Simulations for the Cell Processor Brent Peckham
More informationMapping C code on MPSoC for Nomadic Embedded Systems
-1 - ARTIST2 Summer School 2008 in Europe Autrans (near Grenoble), France September 8-12, 8 2008 Mapping C code on MPSoC for Nomadic Embedded Systems http://www.artist-embedded.org/ Lecturer: Diederik
More information! Readings! ! Room-level, on-chip! vs.!
1! 2! Suggested Readings!! Readings!! H&P: Chapter 7 especially 7.1-7.8!! (Over next 2 weeks)!! Introduction to Parallel Computing!! https://computing.llnl.gov/tutorials/parallel_comp/!! POSIX Threads
More informationChapter-6. SUBJECT:- Operating System TOPICS:- I/O Management. Created by : - Sanjay Patel
Chapter-6 SUBJECT:- Operating System TOPICS:- I/O Management Created by : - Sanjay Patel Disk Scheduling Algorithm 1) First-In-First-Out (FIFO) 2) Shortest Service Time First (SSTF) 3) SCAN 4) Circular-SCAN
More information1. Microprocessor Architectures. 1.1 Intel 1.2 Motorola
1. Microprocessor Architectures 1.1 Intel 1.2 Motorola 1.1 Intel The Early Intel Microprocessors The first microprocessor to appear in the market was the Intel 4004, a 4-bit data bus device. This device
More informationCS 426 Parallel Computing. Parallel Computing Platforms
CS 426 Parallel Computing Parallel Computing Platforms Ozcan Ozturk http://www.cs.bilkent.edu.tr/~ozturk/cs426/ Slides are adapted from ``Introduction to Parallel Computing'' Topic Overview Implicit Parallelism:
More informationMultiprocessors and Thread-Level Parallelism. Department of Electrical & Electronics Engineering, Amrita School of Engineering
Multiprocessors and Thread-Level Parallelism Multithreading Increasing performance by ILP has the great advantage that it is reasonable transparent to the programmer, ILP can be quite limited or hard to
More information