Optimizing Data Sharing and Address Translation for the Cell BE Heterogeneous CMP

1 Optimizing Data Sharing and Address Translation for the Cell BE Heterogeneous CMP
Michael Gschwind, IBM T.J. Watson Research Center

2 Cell Design Goals
Provide the platform for the future of computing
  10× performance of desktop systems
Computing density as main challenge
  Dramatically increase performance per X
  X = area, power, volume, cost, ...
Single-core designs offer diminishing returns on investment
  In power, area, design complexity and verification cost
2 M. Gschwind, Optimizing Data Sharing and Address Translation for the Cell BE

3 Cell B.E. Architecture
Heterogeneous multicore system architecture
  Power Processor Element (PPE) for control tasks
  Synergistic Processor Elements (SPEs) for data-intensive processing
Synergistic Processor Element (SPE) consists of
  Synergistic Processor Unit (SPU)
  Synergistic Memory Flow Control (MFC)
[Figure: block diagram. Eight SPEs (each an SPU with SXU and LS, plus an MFC) and the PPE (PXU, L1, L2) attach to the EIB (up to 96B/cycle); the MIC connects to dual XDR memory and the BIC to FlexIO.]

4 Heterogeneous Multiprocessors
HMPs are emerging as a new class of processors
  Central processors + architected accelerators
Beyond SoC
  Tighter system integration between processors and accelerators
  Multiprogramming and execution of user-defined code
  Use of accelerators by user-level applications
Desired attributes
  Efficient data sharing
  Efficient CPU/accelerator communication
  Process isolation

5 Integrated Execution
View heterogeneous architecture as the platform for execution
  Execute on different processor elements
  Share data across processor elements
Single (common) process container under operating system
  Heterogeneous threads (LWPs) for scheduling execution on different processor cores
  Share common effective (virtual) memory space

6 Heterogeneous Multi-threading and OS Management for Integrated Executables
Heterogeneous multi-threading model
  PPE threads
  SPE threads
SPE context fully managed
OS assignment of SPE threads
  Programmer directed using affinity mask
[Figure labels: application source and libraries; PPE object files and SPE object files; integrated executable; Cell Broadband Engine Linux; PPE threads (T1, T2) on the physical PPE; SPE threads on the physical SPEs.]

7 Synergistic Optimization of the Cell B.E.
Integrate SPE into established SMP architecture
Define a scalable system architecture
  From consumer electronics to supercomputers
[Figure labels: target applications; programming model; architecture definition; Cell B.E. design.]

8 Data access for accelerators
No memory access
  Input and output via device registers
Use real (physical) addresses to access system memory
  Similar to traditional device models
Use effective (virtual) addresses to access system memory

9 Programming models with accelerator real addressing
Applications run without address translation in RA space
  Similar to deeply embedded applications
  No paging possible
EA-space multiprogramming with real address accelerators
  Super-trusted applications
  Provide safety fencing mechanism for real address accesses
Accelerators managed by operating system
  OS provides accelerated services
  Parameters checked on OS entry
  High entry cost reduces acceleration benefits
Trusted/secure code in accelerator
[Figure labels: Programming/Performance; Security.]

10 Multiprogramming with real address accelerators
Applications need effective (virtual) to real translations
  How to obtain address translations?
  Where to store real addresses?
  How to revoke address translations?
Once a real address has been given to a user program, the page is pinned and cannot be moved or paged out
  Limits OS memory allocation and paging flexibility
Data non-contiguous at page boundaries
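The last point on this slide, that data is non-contiguous at page boundaries under real addressing, can be illustrated with a small sketch. It is not taken from the talk: the 4 KB page size, the `split_by_page` helper and the page-table contents are all assumptions chosen for illustration. It shows how a buffer that is contiguous in effective-address space breaks into separate real-address pieces wherever it crosses a page.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


def split_by_page(ea, length, page_table):
    """Translate the effective range [ea, ea + length) into (real_address, size) pieces.

    page_table is a hypothetical dict mapping virtual page number -> real page number.
    """
    pieces = []
    end = ea + length
    while ea < end:
        vpn, offset = divmod(ea, PAGE_SIZE)
        # A piece can never extend past the end of its page.
        size = min(PAGE_SIZE - offset, end - ea)
        pieces.append((page_table[vpn] * PAGE_SIZE + offset, size))
        ea += size
    return pieces


# Hypothetical mapping: neighbouring virtual pages land on unrelated real pages.
page_table = {16: 7, 17: 42, 18: 3}

# A 200-byte buffer starting 96 bytes before the end of virtual page 16.
pieces = split_by_page(16 * PAGE_SIZE + 4000, 200, page_table)
print(pieces)
```

In this sketch the single 200-byte buffer becomes two transfers at unrelated real addresses, and both real pages would have to stay pinned for as long as the user program holds those addresses.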

11 Virtual-address accelerators
Accelerators have access to page translation
  Architect MMU for accelerator
  System memory used by accelerator can be paged out
Virtual addressing provides pointer sharing
  Same pointer representation for same data
  Linear addressing in effective memory space across page boundaries
Virtual addressing provides process isolation
  Security in multiprogramming/multiuser environment

12 Accelerator MMU options
Minimized hardware complexity
  Small accelerator-specific virtual/physical lookup table
  If an address is not in the lookup table, notify the central processor
  Essentially, a software-managed accelerator TLB
Full system page tables
  Accelerator has a full-scale MMU which uses the system page table located in memory
  Performs hardware page table walks and participates in translation coherence actions
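The first option on this slide, a small lookup table that falls back to the central processor on a miss, can be sketched as follows. This is a toy model under stated assumptions, not the design of any real accelerator: the class name `SoftwareManagedTLB`, the LRU replacement policy, the 4 KB page size and the page-table contents are all hypothetical.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size, for illustration only


class SoftwareManagedTLB:
    """Toy accelerator-side lookup table; misses are serviced by a callback
    that stands in for the central processor."""

    def __init__(self, capacity, miss_handler):
        self.capacity = capacity
        self.miss_handler = miss_handler  # hypothetical: vpn -> real page number
        self.entries = OrderedDict()      # vpn -> rpn, kept in LRU order (assumed policy)
        self.misses = 0

    def translate(self, ea):
        vpn, offset = divmod(ea, PAGE_SIZE)
        if vpn in self.entries:
            self.entries.move_to_end(vpn)  # mark as most recently used
        else:
            self.misses += 1
            rpn = self.miss_handler(vpn)   # "notify central processor"
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)  # evict least recently used entry
            self.entries[vpn] = rpn
        return self.entries[vpn] * PAGE_SIZE + offset


# Hypothetical system page table owned by the central processor.
system_page_table = {0: 5, 1: 9, 2: 2}
tlb = SoftwareManagedTLB(capacity=2, miss_handler=system_page_table.__getitem__)
addresses = [tlb.translate(ea) for ea in (0, 100, 4096, 8192, 0)]
print(addresses, tlb.misses)
```

Every call to `miss_handler` in this sketch represents work done by the central processor on behalf of the accelerator, which is the cost the second option moves into the accelerator's own MMU.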

13 TLB software-management becomes serial bottleneck
Cost avoided by system-wide page tables in Cell B.E.
[Figure: normalized execution time plotted against SPEs; series: compute, address management.]

14 Enabling efficient translation management
Offer system protection and data security with memory translation across system
  Central CPU and accelerators
Enable distributed address translation
  Avoid serial bottleneck in central CPU
  Address translation parallelized across accelerator MMUs
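A toy model makes the serial-bottleneck argument concrete. The function `execution_time` and every number below are invented for illustration and are not measurements from the talk; the only idea borrowed from the slides is that compute work divides across SPEs, while translation misses handled by one central CPU do not.

```python
def execution_time(num_spes, compute_work, translations, miss_cost, centralized):
    """Toy model: compute parallelizes across SPEs; translation work either
    serializes on one central CPU or is spread across the accelerator MMUs."""
    compute = compute_work / num_spes
    if centralized:
        # Every miss is handled one at a time by the central processor.
        address_management = translations * miss_cost
    else:
        # Each accelerator MMU resolves its own share of the misses.
        address_management = translations * miss_cost / num_spes
    return compute + address_management


# Illustrative inputs only (arbitrary time units).
for n in (1, 2, 4, 8):
    central = execution_time(n, 800.0, 100, 1.0, centralized=True)
    distributed = execution_time(n, 800.0, 100, 1.0, centralized=False)
    print(n, central, distributed)
```

Under these assumed inputs the centralized variant spends as much time on address management as on compute once eight SPEs are in use, while the distributed variant keeps the same ratio it had with one SPE.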

15 Architecture optimization for data sharing
DMA data transfer into and out of local store
Managing data coherence between PPE data caches and DMA transfers
  Software managed
    Explicitly evict data by software in program
  Hardware managed
    DMA requests perform coherence interrogates
(See scenarios in the paper)

16 Coherence management in software
Software coherence cost scales with area synchronized
  Need to flush all cache lines from cache
  Insert cache flush instructions into code
  Power ISA dcbf instruction sequence
Performance modeling
  Execute code with explicit flush instructions on Cell BE
  Not needed for functional correctness, but reflects costs
  Compare execution time with unmodified code
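Why the cost scales with the area synchronized can be seen by counting the flushes a region needs: one flush instruction per cache line it touches. The sketch below is an illustration rather than the code used in the talk; the helper name `lines_to_flush` is hypothetical and the 128-byte line size is an assumption stated in the code.

```python
CACHE_LINE = 128  # bytes; assumed cache line size for this sketch


def lines_to_flush(start, length):
    """Return the line-aligned addresses that cover [start, start + length).

    Each returned address stands for one flush instruction (dcbf on Power ISA).
    """
    if length <= 0:
        return []
    first = start - start % CACHE_LINE
    last_byte = start + length - 1
    last = last_byte - last_byte % CACHE_LINE
    return list(range(first, last + CACHE_LINE, CACHE_LINE))


# An unaligned 300-byte region touches four lines.
print(lines_to_flush(100, 300))

# Flush count grows linearly with the size of the region.
for size in (1024, 16 * 1024, 256 * 1024):
    print(size, len(lines_to_flush(0, size)))
```

A programmer who cannot bound the shared region tightly has to pass a larger `length`, so the flush count, and with it the cost, grows with the worst-case assumption rather than with the data actually shared.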

17 Coherence cost for software managed coherence
Cost avoided by cache-coherent MFC block transfers
[Figure: plotted against SPEs; series: compute, flush time.]

18 Usage scenario impact on performance
Area to be synchronized hard to determine
  Programmers will use worst-case assumptions
FFT16M is a best-case scenario
  Accessed data ranges accurately determined
  High data reuse

19 Green Computing: TOP500 & green500.org
Cell BE offers superior power efficiency
  488 MFLOPS/W (DP)
Power-efficient design enables Petaflop computing
Cell BE powers
  the world's first and only Petaflop system
  the world's top 3 power-efficient high-performance systems

20 Cell BE: a Synergistic System Architecture
Cell BE is not a collection of different processors, but a synergistic whole
Accelerator integration important focus in design process
  Data sharing between central processor and accelerators
Memory address translation performed in parallel in accelerators
  Avoid serial bottleneck in address translation
Cell BE implements hardware DMA coherence
  Avoid programmer burden and programming errors
Software was a driver in the design of the architecture
  Optimize for processing-intensive applications

21 Cell BE
Cell BE is the result of a partnership between SCEI/Sony, Toshiba, and IBM
Cell BE represents the work of more than 400 people starting in 2000 and a design investment of about $400M
Thank you!

22 Copyright International Business Machines Corporation 2005,6,7,8. All Rights Reserved. Printed in the United States October
The following are trademarks of International Business Machines Corporation in the United States, or other countries, or both: IBM, IBM Logo, Power Architecture. Other company, product and service names may be trademarks or service marks of others.
All information contained in this document is subject to change without notice. The products described in this document are NOT intended for use in applications such as implantation, life support, or other hazardous uses where malfunction could result in death, bodily injury, or catastrophic property damage. The information contained in this document does not affect or change IBM product specifications or warranties. Nothing in this document shall operate as an express or implied license or indemnity under the intellectual property rights of IBM or third parties. All information contained in this document was obtained in specific environments, and is presented as an illustration. The results obtained in other operating environments may vary.
While the information contained herein is believed to be accurate, such information is preliminary, and should not be relied upon for accuracy or completeness, and no representations or warranties of accuracy or completeness are made. THE INFORMATION CONTAINED IN THIS DOCUMENT IS PROVIDED ON AN "AS IS" BASIS. In no event will IBM be liable for damages arising directly or indirectly from any use of the information contained in this document.

Cell Broadband Engine Architecture. Version 1.0

Cell Broadband Engine Architecture. Version 1.0 Copyright and Disclaimer Copyright International Business Machines Corporation, Sony Computer Entertainment Incorporated, Toshiba Corporation 2005 All Rights Reserved Printed in the United States of America

More information

OpenMP on the IBM Cell BE

OpenMP on the IBM Cell BE OpenMP on the IBM Cell BE PRACE Barcelona Supercomputing Center (BSC) 21-23 October 2009 Marc Gonzalez Tallada Index OpenMP programming and code transformations Tiling and Software Cache transformations

More information

Cell Programming Tips & Techniques

Cell Programming Tips & Techniques Cell Programming Tips & Techniques Course Code: L3T2H1-58 Cell Ecosystem Solutions Enablement 1 Class Objectives Things you will learn Key programming techniques to exploit cell hardware organization and

More information

Cell Broadband Engine Architecture. Version 1.02

Cell Broadband Engine Architecture. Version 1.02 Copyright and Disclaimer Copyright International Business Machines Corporation, Sony Computer Entertainment Incorporated, Toshiba Corporation 2005, 2007 All Rights Reserved Printed in the United States

More information

Technology Trends Presentation For Power Symposium

Technology Trends Presentation For Power Symposium Technology Trends Presentation For Power Symposium 2006 8-23-06 Darryl Solie, Distinguished Engineer, Chief System Architect IBM Systems & Technology Group From Ingenuity to Impact Copyright IBM Corporation

More information

Developing Code for Cell - Mailboxes

Developing Code for Cell - Mailboxes Developing Code for Cell - Mailboxes Course Code: L3T2H1-55 Cell Ecosystem Solutions Enablement 1 Course Objectives Things you will learn Cell communication mechanisms mailboxes (this course) and DMA (another

More information

All About the Cell Processor

All About the Cell Processor All About the Cell H. Peter Hofstee, Ph. D. IBM Systems and Technology Group SCEI/Sony Toshiba IBM Design Center Austin, Texas Acknowledgements Cell is the result of a deep partnership between SCEI/Sony,

More information

Cell Broadband Engine. Spencer Dennis Nicholas Barlow

Cell Broadband Engine. Spencer Dennis Nicholas Barlow Cell Broadband Engine Spencer Dennis Nicholas Barlow The Cell Processor Objective: [to bring] supercomputer power to everyday life Bridge the gap between conventional CPU s and high performance GPU s History

More information

Towards Efficient Video Compression Using Scalable Vector Graphics on the Cell Broadband Engine

Towards Efficient Video Compression Using Scalable Vector Graphics on the Cell Broadband Engine Towards Efficient Video Compression Using Scalable Vector Graphics on the Cell Broadband Engine Andreea Sandu, Emil Slusanschi, Alin Murarasu, Andreea Serban, Alexandru Herisanu, Teodor Stoenescu University

More information

Cell Broadband Engine Overview

Cell Broadband Engine Overview Cell Broadband Engine Overview Course Code: L1T1H1-02 Cell Ecosystem Solutions Enablement 1 Class Objectives Things you will learn An overview of Cell history Cell microprocessor highlights Hardware architecture

More information

PowerPC TM 970: First in a new family of 64-bit high performance PowerPC processors

PowerPC TM 970: First in a new family of 64-bit high performance PowerPC processors PowerPC TM 970: First in a new family of 64-bit high performance PowerPC processors Peter Sandon Senior PowerPC Processor Architect IBM Microelectronics All information in these materials is subject to

More information

OpenMP on the IBM Cell BE

OpenMP on the IBM Cell BE OpenMP on the IBM Cell BE 15th meeting of ScicomP Barcelona Supercomputing Center (BSC) May 18-22 2009 Marc Gonzalez Tallada Index OpenMP programming and code transformations Tiling and Software cache

More information

Hello World! Course Code: L2T2H1-10 Cell Ecosystem Solutions Enablement. Systems and Technology Group

Hello World! Course Code: L2T2H1-10 Cell Ecosystem Solutions Enablement. Systems and Technology Group Hello World! Course Code: L2T2H1-10 Cell Ecosystem Solutions Enablement 1 Course Objectives You will learn how to write, build and run Hello World! on the Cell System Simulator. There are three different

More information

Hands-on - DMA Transfer Using Control Block

Hands-on - DMA Transfer Using Control Block IBM Systems & Technology Group Cell/Quasar Ecosystem & Solutions Enablement Hands-on - DMA Transfer Using Control Block Cell Programming Workshop Cell/Quasar Ecosystem & Solutions Enablement 1 Class Objectives

More information

Amir Khorsandi Spring 2012

Amir Khorsandi Spring 2012 Introduction to Amir Khorsandi Spring 2012 History Motivation Architecture Software Environment Power of Parallel lprocessing Conclusion 5/7/2012 9:48 PM ٢ out of 37 5/7/2012 9:48 PM ٣ out of 37 IBM, SCEI/Sony,

More information

Cell Processor and Playstation 3

Cell Processor and Playstation 3 Cell Processor and Playstation 3 Guillem Borrell i Nogueras February 24, 2009 Cell systems Bad news More bad news Good news Q&A IBM Blades QS21 Cell BE based. 8 SPE 460 Gflops Float 20 GFLops Double QS22

More information

IBM Cell Processor. Gilbert Hendry Mark Kretschmann

IBM Cell Processor. Gilbert Hendry Mark Kretschmann IBM Cell Processor Gilbert Hendry Mark Kretschmann Architectural components Architectural security Programming Models Compiler Applications Performance Power and Cost Conclusion Outline Cell Architecture:

More information

( ZIH ) Center for Information Services and High Performance Computing. Event Tracing and Visualization for Cell Broadband Engine Systems

( ZIH ) Center for Information Services and High Performance Computing. Event Tracing and Visualization for Cell Broadband Engine Systems ( ZIH ) Center for Information Services and High Performance Computing Event Tracing and Visualization for Cell Broadband Engine Systems ( daniel.hackenberg@zih.tu-dresden.de ) Daniel Hackenberg Cell Broadband

More information

Roadrunner. By Diana Lleva Julissa Campos Justina Tandar

Roadrunner. By Diana Lleva Julissa Campos Justina Tandar Roadrunner By Diana Lleva Julissa Campos Justina Tandar Overview Roadrunner background On-Chip Interconnect Number of Cores Memory Hierarchy Pipeline Organization Multithreading Organization Roadrunner

More information

Exercise Euler Particle System Simulation

Exercise Euler Particle System Simulation Exercise Euler Particle System Simulation Course Code: L3T2H1-57 Cell Ecosystem Solutions Enablement 1 Course Objectives The student should get ideas of how to get in welldefined steps from scalar code

More information

Portable Parallel Programming for Multicore Computing

Portable Parallel Programming for Multicore Computing Portable Parallel Programming for Multicore Computing? Vivek Sarkar Rice University vsarkar@rice.edu FPU ISU ISU FPU IDU FXU FXU IDU IFU BXU U U IFU BXU L2 L2 L2 L3 D Acknowledgments Rice Habanero Multicore

More information

Evaluating the Portability of UPC to the Cell Broadband Engine

Evaluating the Portability of UPC to the Cell Broadband Engine Evaluating the Portability of UPC to the Cell Broadband Engine Dipl. Inform. Ruben Niederhagen JSC Cell Meeting CHAIR FOR OPERATING SYSTEMS Outline Introduction UPC Cell UPC on Cell Mapping Compiler and

More information

CellSs Making it easier to program the Cell Broadband Engine processor

CellSs Making it easier to program the Cell Broadband Engine processor Perez, Bellens, Badia, and Labarta CellSs Making it easier to program the Cell Broadband Engine processor Presented by: Mujahed Eleyat Outline Motivation Architecture of the cell processor Challenges of

More information

Crypto On the Playstation 3

Crypto On the Playstation 3 Crypto On the Playstation 3 Neil Costigan School of Computing, DCU. neil.costigan@computing.dcu.ie +353.1.700.6916 PhD student / 2 nd year of research. Supervisor : - Dr Michael Scott. IRCSET funded. Playstation

More information

Sony/Toshiba/IBM (STI) CELL Processor. Scientific Computing for Engineers: Spring 2008

Sony/Toshiba/IBM (STI) CELL Processor. Scientific Computing for Engineers: Spring 2008 Sony/Toshiba/IBM (STI) CELL Processor Scientific Computing for Engineers: Spring 2008 Nec Hercules Contra Plures Chip's performance is related to its cross section same area 2 performance (Pollack's Rule)

More information

Concurrent Programming with the Cell Processor. Dietmar Kühl Bloomberg L.P.

Concurrent Programming with the Cell Processor. Dietmar Kühl Bloomberg L.P. Concurrent Programming with the Cell Processor Dietmar Kühl Bloomberg L.P. dietmar.kuehl@gmail.com Copyright Notice 2009 Bloomberg L.P. Permission is granted to copy, distribute, and display this material,

More information

INSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing

INSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing UNIVERSIDADE TÉCNICA DE LISBOA INSTITUTO SUPERIOR TÉCNICO Departamento de Engenharia Informática Architectures for Embedded Computing MEIC-A, MEIC-T, MERC Lecture Slides Version 3.0 - English Lecture 12

More information

How to Write Fast Code , spring th Lecture, Mar. 31 st

How to Write Fast Code , spring th Lecture, Mar. 31 st How to Write Fast Code 18-645, spring 2008 20 th Lecture, Mar. 31 st Instructor: Markus Püschel TAs: Srinivas Chellappa (Vas) and Frédéric de Mesmay (Fred) Introduction Parallelism: definition Carrying

More information

CSE 120 Principles of Operating Systems

CSE 120 Principles of Operating Systems CSE 120 Principles of Operating Systems Spring 2018 Lecture 15: Multicore Geoffrey M. Voelker Multicore Operating Systems We have generally discussed operating systems concepts independent of the number

More information

IBM. Software Development Kit for Multicore Acceleration, Version 3.0. SPU Timer Library Programmer s Guide and API Reference

IBM. Software Development Kit for Multicore Acceleration, Version 3.0. SPU Timer Library Programmer s Guide and API Reference IBM Software Development Kit for Multicore Acceleration, Version 3.0 SPU Timer Library Programmer s Guide and API Reference Note: Before using this information and the product it supports, read the information

More information

Parallel Computing: Parallel Architectures Jin, Hai

Parallel Computing: Parallel Architectures Jin, Hai Parallel Computing: Parallel Architectures Jin, Hai School of Computer Science and Technology Huazhong University of Science and Technology Peripherals Computer Central Processing Unit Main Memory Computer

More information

Software Development Kit for Multicore Acceleration Version 3.0

Software Development Kit for Multicore Acceleration Version 3.0 Software Development Kit for Multicore Acceleration Version 3.0 Programming Tutorial SC33-8410-00 Software Development Kit for Multicore Acceleration Version 3.0 Programming Tutorial SC33-8410-00 Note

More information

COMP 322: Principles of Parallel Programming. Lecture 18: Understanding Parallel Computers (Chapter 2, contd) Fall 2009

COMP 322: Principles of Parallel Programming. Lecture 18: Understanding Parallel Computers (Chapter 2, contd) Fall 2009 COMP 322: Principles of Parallel Programming Lecture 18: Understanding Parallel Computers (Chapter 2, contd) Fall 2009 http://www.cs.rice.edu/~vsarkar/comp322 Vivek Sarkar Department of Computer Science

More information

COSC 6385 Computer Architecture - Data Level Parallelism (III) The Intel Larrabee, Intel Xeon Phi and IBM Cell processors

COSC 6385 Computer Architecture - Data Level Parallelism (III) The Intel Larrabee, Intel Xeon Phi and IBM Cell processors COSC 6385 Computer Architecture - Data Level Parallelism (III) The Intel Larrabee, Intel Xeon Phi and IBM Cell processors Edgar Gabriel Fall 2018 References Intel Larrabee: [1] L. Seiler, D. Carmean, E.

More information

Optimizing DMA Data Transfers for Embedded Multi-Cores

Optimizing DMA Data Transfers for Embedded Multi-Cores Optimizing DMA Data Transfers for Embedded Multi-Cores Selma Saïdi Jury members: Oded Maler: Dir. de these Ahmed Bouajjani: President du Jury Luca Benini: Rapporteur Albert Cohen: Rapporteur Eric Flamand:

More information

Computer Systems Architecture I. CSE 560M Lecture 19 Prof. Patrick Crowley

Computer Systems Architecture I. CSE 560M Lecture 19 Prof. Patrick Crowley Computer Systems Architecture I CSE 560M Lecture 19 Prof. Patrick Crowley Plan for Today Announcement No lecture next Wednesday (Thanksgiving holiday) Take Home Final Exam Available Dec 7 Due via email

More information

Comparing the IBM eserver xseries 440 with the xseries 445 Positioning Information

Comparing the IBM eserver xseries 440 with the xseries 445 Positioning Information Comparing the IBM eserver xseries 440 with the xseries 445 Positioning Information Main Feature x440 server x445 server IBM chipset First generation XA-32 Second generation XA-32 SMP scalability Support

More information

Department of Computer Science. Chair of Computer Architecture. Diploma Thesis. Execution of SPE code in an Opteron-Cell/B.E.

Department of Computer Science. Chair of Computer Architecture. Diploma Thesis. Execution of SPE code in an Opteron-Cell/B.E. Department of Computer Science Chair of Computer Architecture Diploma Thesis Execution of SPE code in an Opteron-Cell/B.E. hybrid system Andreas Heinig Chemnitz, March 11, 2008 Supervisor: Advisor: Prof.

More information

Overview. Column or Page Access (t CAC ) Access Times. Cycle Time. Random Access (t RAC ) Random Access Cycle Time (t RC )

Overview. Column or Page Access (t CAC ) Access Times. Cycle Time. Random Access (t RAC ) Random Access Cycle Time (t RC ) Overview DRAM manufacturers use a number of different performance specifications to describe the speed and competitiveness of their products. Although useful for comparing the offerings of the various

More information

GM8126 MAC DRIVER. User Guide Rev.: 1.0 Issue Date: December 2010

GM8126 MAC DRIVER. User Guide Rev.: 1.0 Issue Date: December 2010 GM8126 MAC DRIVER User Guide Rev.: 1.0 Issue Date: December 2010 REVISION HISTORY Date Rev. From To Dec. 2010 1.0 - Original Copyright 2010 Grain Media, Inc. All Rights Reserved. Printed in Taiwan 2010

More information

GM8126 I2C. User Guide Rev.: 1.0 Issue Date: December 2010

GM8126 I2C. User Guide Rev.: 1.0 Issue Date: December 2010 GM8126 I2C User Guide Rev.: 1.0 Issue Date: December 2010 REVISION HISTORY Date Rev. From To Dec. 2010 1.0 - Original Copyright 2010 Grain Media, Inc. All Rights Reserved. Printed in Taiwan 2010 Grain

More information

CSCI-GA Multicore Processors: Architecture & Programming Lecture 10: Heterogeneous Multicore

CSCI-GA Multicore Processors: Architecture & Programming Lecture 10: Heterogeneous Multicore CSCI-GA.3033-012 Multicore Processors: Architecture & Programming Lecture 10: Heterogeneous Multicore Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com Status Quo Previously, CPU vendors

More information

The network interface configuration property screens can be accessed by double clicking the network icon in the Windows Control Panel.

The network interface configuration property screens can be accessed by double clicking the network icon in the Windows Control Panel. Introduction The complete instructions for setting up the PowerPC 750FX Evaluation Kit are provided in the PowerPC 750FX Evaluation Board User's Manual which can be found on the 750FX Evaluation Kit CD.

More information

THE AUSTRALIAN NATIONAL UNIVERSITY First Semester Examination June COMP4300/6430 Parallel Systems

THE AUSTRALIAN NATIONAL UNIVERSITY First Semester Examination June COMP4300/6430 Parallel Systems THE AUSTRALIAN NATIONAL UNIVERSITY First Semester Examination June 2009 COMP4300/6430 Parallel Systems Study Period: 15 minutes Time Allowed: 3 hours Permitted Materials: Non-Programmable Calculator This

More information

Cell BE enabling density computing for data rich environments

Cell BE enabling density computing for data rich environments Cell BE enabling density computing for data rich environments Michael Gschwind Bruce D Amora Alexandre Eichenberger Cell Broadband Engine - enabling density computing for data-rich environments Cell History

More information

Performance Analysis of Cell Broadband Engine for High Memory Bandwidth Applications

Performance Analysis of Cell Broadband Engine for High Memory Bandwidth Applications Performance Analysis of Cell Broadband Engine for High Memory Bandwidth Applications Daniel Jiménez-González, Xavier Martorell, Alex Ramírez Computer Architecture Department Universitat Politècnica de

More information

Flavors of Memory supported by Linux, their use and benefit. Christoph Lameter, Ph.D,

Flavors of Memory supported by Linux, their use and benefit. Christoph Lameter, Ph.D, Flavors of Memory supported by Linux, their use and benefit Christoph Lameter, Ph.D, Twitter: @qant Flavors Of Memory The term computer memory is a simple term but there are numerous nuances

More information

Introduction to CELL B.E. and GPU Programming. Agenda

Introduction to CELL B.E. and GPU Programming. Agenda Introduction to CELL B.E. and GPU Programming Department of Electrical & Computer Engineering Rutgers University Agenda Background CELL B.E. Architecture Overview CELL B.E. Programming Environment GPU

More information

Computer Engineering Mekelweg 4, 2628 CD Delft The Netherlands MSc THESIS

Computer Engineering Mekelweg 4, 2628 CD Delft The Netherlands  MSc THESIS Computer Engineering Mekelweg 4, 2628 CD Delft The Netherlands http://ce.et.tudelft.nl/ 2010 MSc THESIS Implementation of Nexus: Dynamic Hardware Management Support for Multicore Platforms Efrén Fernández

More information

INF5063: Programming heterogeneous multi-core processors Introduction

INF5063: Programming heterogeneous multi-core processors Introduction INF5063: Programming heterogeneous multi-core processors Introduction Håkon Kvale Stensland August 19 th, 2012 INF5063 Overview Course topic and scope Background for the use and parallel processing using

More information

Cell Broadband Engine CMOS SOI 65 nm Hardware Initialization Guide

Cell Broadband Engine CMOS SOI 65 nm Hardware Initialization Guide Hardware Initialization Guide Title Page Copyright and Disclaimer Copyright International Business Machines Corporation, Sony Computer Entertainment Incorporated, Toshiba Corporation 2006, 2007 All Rights

More information

Introduction to PCI Express Positioning Information

Introduction to PCI Express Positioning Information Introduction to PCI Express Positioning Information Main PCI Express is the latest development in PCI to support adapters and devices. The technology is aimed at multiple market segments, meaning that

More information

Cell GC: Using the Cell Synergistic Processor as a Garbage Collection Coprocessor

Cell GC: Using the Cell Synergistic Processor as a Garbage Collection Coprocessor Cell GC: Using the Cell Synergistic Processor as a Garbage Collection Coprocessor Chen-Yong Cher Michael Gschwind IBM T. J. Watson Research Center, Yorktown Heights, NY 10598 chenyong@us.ibm.com mkg@us.ibm.com

More information

Revisiting Parallelism

Revisiting Parallelism Revisiting Parallelism Sudhakar Yalamanchili, Georgia Institute of Technology Where Are We Headed? MIPS 1000000 Multi-Threaded, Multi-Core 100000 Multi Threaded 10000 Era of Speculative, OOO 1000 Thread

More information

Lenovo RAID Introduction Reference Information

Lenovo RAID Introduction Reference Information Lenovo RAID Introduction Reference Information Using a Redundant Array of Independent Disks (RAID) to store data remains one of the most common and cost-efficient methods to increase server's storage performance,

More information
