A 1-GHz Configurable Processor Core MeP-h1
|
|
- Kerry Garrett
- 6 years ago
- Views:
Transcription
1 A 1-GHz Configurable Processor Core MeP-h1 Takashi Miyamori, Takanori Tamai, and Masato Uchiyama SoC Research & Development Center, TOSHIBA Corporation
2 Outline Background Pipeline Structure Bus Interface Implementation Hot Chips 17 2
3 Outline Background Pipeline Structure Bus Interface Implementation Hot Chips 17 3
4 Configurable Processor Customize a processor during RTL design Configurations Cache config., Local RAM config., bus width, etc. Extensions User customized instructions (DSP instruction extension) Self-running hardware blocks (Hardware engines) Configurable Processor Core Extensions A simple controller Hot Chips 17 4
5 MeP Processor Roadmap This work MeP-h series MeP-c series 5-stage single scalar c1 HDTV MPEG-2 Decoder MPEG-2 CODEC 2001 c2 DVD Player MPEG-2 Decoder c3 Hot Chips 17 5 High performance & High frequency Cellular 3D Graphics Automotive Image Recognition LSI Reconfigurable Processor PVR MPEG-2 CODEC h1 High freq. main CPUs (ex. ARM11, MIPS24K) High freq. extensions Cellular Application Processor
6 MeP-h1 Design Trade-off High performance & high frequency Target : 1GHz@65nm Configurable Processor Fully synthesizable ASIC standard design flow 1-port synchronous SRAM A simple controller for extensions Pipeline Design Issues Branch latency Local memory latency Extension i/f latency Hot Chips 17 6
7 Outline Background Pipeline Structure Bus Interface Implementation Hot Chips 17 7
8 Pipeline Structure PS PD 9 stages for load inst. Branch predecode Processor core AS F1 F2 Fetch & Decode IS DC I(Integer)-Pipe M(Memory)-Pipe A(Auxiliary)-Pipe V(Variable)-Pipe E1 E2 RB LA LD M1 M2 SB A1 A2 A3 An AC VD VE V1 Vn VC WB (ROB) Hot Chips 17 8 DSP inst. extension HW engine extension
9 Instruction Fetch Unit Branch predecode & prefetch No branch prediction mechanism Prefetch pr-pc PC +16 Adder br-pc AS Instruction Buffer Normal Branch Repeat 128bit 128bit Inst. Inst. 128bit Inst. 128bit 128bit Inst. Inst. 128bit 128bit Inst. Inst. I$ / I-RAM 128bit Inst. F1 IS Normal Issue Adder +2,4,8 Issue Selector Predecode Selector +2,4,8 Adder Branch Predecode PS F2 DC PC PC Inst. Decoder Inst. Inst. Pre- Decoder Adder Hot Chips 17 9 PC PD Branch Address from E1-stage
10 lw $0,($1) add $0,1 mov $2,$3 DC LA PD beq $0,$2,L1 Branch Prefetch 2-cycle stall Hot Chips LD M1 M2 SB WB DC s s E1 E2 RB WB PD s s DC E1 E2 RB WB PD PD IS DC E1 E2 RB WB L1: add $4,3 AS F1 F2 DC E1 E2 1-cycle branch penalty Branch penalty Branch taken : 1 4 cycles (cf. 4 cycles w/o branch prefetch) Branch not-taken: 0 cycles Repeat (loop) inst. iteration penalty : 0 cycles
11 Branch Prefetch Evaluation 250 w. Branch Prefetch 1.18 Cycle Counts [M cycles] w/o Branch Prefetch Relative performance 1.17 Relative performance JPEG encode JPEG decode Average branch penalty: 1.8 cycles More than 80% of branch penalty is less than or equal 2 cycles Hot Chips 17 11
12 Load Data Forwarding MeP-c3 5-stage pipeline MUX E M W A L U Forwarding D$ D-RAM L.A. Ext.. RF Load penalty = 1 cycle LA LD M1 M2 MeP-h1 E1 MUX Clock adjustment A L U E2 D$ D-RAM Forwarding L.A. Ext.. Load penalty = 2 cycles Hot Chips 17 12
13 Extension I/F Latency MeP-c3 ex. MeP-h1 RAM Processor core RAM DSP inst. extension RAM Processor core RAM DSP inst. extension HW engine extension RAM RAM HW engine extension RAM RAM 1-cycle domain 1 cycle 2 cycles >3 cycles Tightly coupled extensions Loosely coupled extensions Hot Chips 17 13
14 Tightly / Loosely Coupled Pipelines DC GPRs ROB Bypass 1 Bypass 2 Processor core Tightly coupled I-Pipe M-Pipe E1 E2 RB LA ALU LD M1 SRAM M2 SB WB (ROB) Loosely coupled A-Pipe V-Pipe A1 A2 A3 An AC VD VE V1 Vn VC DSP inst. extension HW engine extension Hot Chips 17 14
15 Structure of ROB and GPR ROB (Reorder buffer) - 8 entries Completion - 4 write ports - 3 read ports I-Pipe Commit GPRs M-Pipe A-Pipe ROB 8 entries GPRs 32b x bits x 16 words V-Pipe - 1 write port - 2 read ports Hot Chips 17 15
16 DSP Extension Interface Processor Core V-Pipe from I, M, and A-pipes VD VE V1 Vn VC WB (ROB) dsp Rn,Rm,code Rn = func(rn, Rm, code) Code Valid Rn Rm Rn Ack Busy DSP inst. extension dsp0 $0,$1 DC VD VE V1 V2 V3 VC WB dsp0 $2,$3 DC VD VE V1 V2 V3 VC WB dsp0 $4,$5 DC VD VE V1 V2 V3 VC WB lw $7,($8) DC LA LD M1 M2 SB ROB WB add $8,4 DC E1 E2 RB ROB ROB ROB WB Completion Commit Hot Chips 17 16
17 Outline Background Pipeline Structure Bus Interface Implementation Hot Chips 17 17
18 SoC Architecture JTAG SDRAM Debug module Run control unit Memory Controller Bus bridge PC trace Based on OCP2.0 Host CPU bus MeP module MeP-h1 processor core DMAC Global bus Bridge Control bus Extensions Extensions Local Bus Peripheral bus HW module Host CPU ex. ARM Peripherals Hot Chips 17 18
19 Bus Interface Based on OCP* 2.0 Protocol Split bus transaction Single request burst Posted write Pipelined request and response 2 outstanding requests *) OCP Open Core Master bus reset and slave bus reset Hot Chips 17 19
20 Outline Background Pipeline Structure Bus Interface Implementation Hot Chips 17 20
21 Configurations Configurations I$: 8KB, direct-mapped I-RAM: 4KB D$: 8KB, direct-mapped D-RAM: 4KB Bus width: 64 bits Debug function: Hardware breakpoints, Single step PC trace(8kb trace memory), etc. Extensions DSP instruction extension 4 hardware engines Hot Chips 17 21
22 First Implementation Global Bus Local Bus INTC Debug Trace Mem. 4KB Toshiba 65nm CMOS process Size: 1.58 mm x 2.96 mm I$ 4KB I$ 4KB LB DMAC Trace Mem. 4KB Gate counts : 250K gates (core only) I-RAM 2KB D-RAM 2KB D$ 4KB I-Tag I-RAM 2KB D-RAM 2KB D-Tag D$ 4KB Core BIU LB INTC LB DSP inst. extension LB Timer Hot Chips LB LB LB: Local Bus Power consumption* about 1GHz, w/o clock gating *) estimated by total of Tr. sizes and wire loads. Static Timing Analysis result Clock margin Critical path delay Total = 80 ps = 918 ps = 998 ps 1GHz
23 Lab. Result 1200 Voltage vs. Frequency (Room temp.) 1100!"#$%#&'()*+, Supply Voltage [V] Hot Chips 17 23
24 Conclusion Configurable processor aims to high performance & high operating frequency Pipeline design Deeper pipeline stages Branch predecode and prefetch Relaxed local memory (SRAM) timing Loosely coupled extensions using ROB First implementation Fabricated by 65nm CMOS and can operate at 1GHz Hot Chips 17 24
25 Thank you! Hot Chips 17 25
The Nios II Family of Configurable Soft-core Processors
The Nios II Family of Configurable Soft-core Processors James Ball August 16, 2005 2005 Altera Corporation Agenda Nios II Introduction Configuring your CPU FPGA vs. ASIC CPU Design Instruction Set Architecture
More information3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle?
CSE 2021: Computer Organization Single Cycle (Review) Lecture-10b CPU Design : Pipelining-1 Overview, Datapath and control Shakil M. Khan 2 Single Cycle with Jump Multi-Cycle Implementation Instruction:
More informationECE 2300 Digital Logic & Computer Organization. Caches
ECE 23 Digital Logic & Computer Organization Spring 217 s Lecture 2: 1 Announcements HW7 will be posted tonight Lab sessions resume next week Lecture 2: 2 Course Content Binary numbers and logic gates
More informationPipelining. CSC Friday, November 6, 2015
Pipelining CSC 211.01 Friday, November 6, 2015 Performance Issues Longest delay determines clock period Critical path: load instruction Instruction memory register file ALU data memory register file Not
More informationCPU Pipelining Issues
CPU Pipelining Issues What have you been beating your head against? This pipe stuff makes my head hurt! L17 Pipeline Issues & Memory 1 Pipelining Improve performance by increasing instruction throughput
More informationVenezia: a Scalable Multicore Subsystem for Multimedia Applications
Venezia: a Scalable Multicore Subsystem for Multimedia Applications Takashi Miyamori Toshiba Corporation Outline Background Venezia Hardware Architecture Venezia Software Architecture Evaluation Chip and
More informationChapter 4. The Processor
Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified
More informationAge nda. Intel PXA27x Processor Family: An Applications Processor for Phone and PDA applications
Intel PXA27x Processor Family: An Applications Processor for Phone and PDA applications N.C. Paver PhD Architect Intel Corporation Hot Chips 16 August 2004 Age nda Overview of the Intel PXA27X processor
More informationFour Steps of Speculative Tomasulo cycle 0
HW support for More ILP Hardware Speculative Execution Speculation: allow an instruction to issue that is dependent on branch, without any consequences (including exceptions) if branch is predicted incorrectly
More informationARM Processors for Embedded Applications
ARM Processors for Embedded Applications Roadmap for ARM Processors ARM Architecture Basics ARM Families AMBA Architecture 1 Current ARM Core Families ARM7: Hard cores and Soft cores Cache with MPU or
More informationSH-Mobile3: Application Processor for 3G Cellular Phones on a Low-Power SoC Design Platform
SH-Mobile3: Application Processor for 3G Cellular Phones on a Low-Power SoC Design Platform H. Mizuno, N. Irie, K. Uchiyama, Y. Yanagisawa 1, S. Yoshioka 1, I. Kawasaki 1, and T. Hattori 2 Hitachi Ltd.,
More informationMIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14
MIPS Pipelining Computer Organization Architectures for Embedded Computing Wednesday 8 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy 4th Edition, 2011, MK
More informationFujitsu System Applications Support. Fujitsu Microelectronics America, Inc. 02/02
Fujitsu System Applications Support 1 Overview System Applications Support SOC Application Development Lab Multimedia VoIP Wireless Bluetooth Processors, DSP and Peripherals ARM Reference Platform 2 SOC
More informationCOMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle
More informationCOMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition The Processor - Introduction
More informationThe Pentium II/III Processor Compiler on a Chip
The Pentium II/III Processor Compiler on a Chip Ronny Ronen Senior Principal Engineer Director of Architecture Research Intel Labs - Haifa Intel Corporation Tel Aviv University January 20, 2004 1 Agenda
More informationChapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor The Processor - Introduction
More informationSH-X3 Flexible SuperH Multi-core for High-performance and Low-power Embedded Systems
SH-X3 Flexible SuperH Multi-core for High-performance and Low-power Embedded Systems Shinichi Shibahara 1, Masashi Takada 2, Tatsuya Kamei 1, Kiyoshi Hayase 1, Yutaka Yoshida 1, Osamu Nishii 1, Toshihiro
More informationChapter 4. The Processor
Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified
More informationAnnouncement. ECE475/ECE4420 Computer Architecture L4: Advanced Issues in Pipelining. Edward Suh Computer Systems Laboratory
ECE475/ECE4420 Computer Architecture L4: Advanced Issues in Pipelining Edward Suh Computer Systems Laboratory suh@csl.cornell.edu Announcement Lab1 is released Start early we only have limited computing
More informationCopyright 2016 Xilinx
Zynq Architecture Zynq Vivado 2015.4 Version This material exempt per Department of Commerce license exception TSU Objectives After completing this module, you will be able to: Identify the basic building
More informationCS425 Computer Systems Architecture
CS425 Computer Systems Architecture Fall 2017 Multiple Issue: Superscalar and VLIW CS425 - Vassilis Papaefstathiou 1 Example: Dynamic Scheduling in PowerPC 604 and Pentium Pro In-order Issue, Out-of-order
More informationCommunications and Computer Engineering II: Lecturer : Tsuyoshi Isshiki
Communications and Computer Engineering II: Microprocessor 2: Processor Micro-Architecture Lecturer : Tsuyoshi Isshiki Dept. Communications and Computer Engineering, Tokyo Institute of Technology isshiki@ict.e.titech.ac.jp
More informationComplex Pipelines and Branch Prediction
Complex Pipelines and Branch Prediction Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T. L22-1 Processor Performance Time Program Instructions Program Cycles Instruction CPI Time Cycle
More informationPlease state clearly any assumptions you make in solving the following problems.
Computer Architecture Homework 3 2012-2013 Please state clearly any assumptions you make in solving the following problems. 1 Processors Write a short report on at least five processors from at least three
More informationThe Use Of Virtual Platforms In MP-SoC Design. Eshel Haritan, VP Engineering CoWare Inc. MPSoC 2006
The Use Of Virtual Platforms In MP-SoC Design Eshel Haritan, VP Engineering CoWare Inc. MPSoC 2006 1 MPSoC Is MP SoC design happening? Why? Consumer Electronics Complexity Cost of ASIC Increased SW Content
More informationData Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard
Data Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard Consider: a = b + c; d = e - f; Assume loads have a latency of one clock cycle:
More informationHotChips An innovative HD video and digital image processor for low-cost digital entertainment products. Deepu Talla.
HotChips 2007 An innovative HD video and digital image processor for low-cost digital entertainment products Deepu Talla Texas Instruments 1 Salient features of the SoC HD video encode and decode using
More informationCOMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle
More informationCENG 3531 Computer Architecture Spring a. T / F A processor can have different CPIs for different programs.
Exam 2 April 12, 2012 You have 80 minutes to complete the exam. Please write your answers clearly and legibly on this exam paper. GRADE: Name. Class ID. 1. (22 pts) Circle the selected answer for T/F and
More informationHi Hsiao-Lung Chan, Ph.D. Dept Electrical Engineering Chang Gung University, Taiwan
Processors Hi Hsiao-Lung Chan, Ph.D. Dept Electrical Engineering Chang Gung University, Taiwan chanhl@maili.cgu.edu.twcgu General-purpose p processor Control unit Controllerr Control/ status Datapath ALU
More informationLimitations of Scalar Pipelines
Limitations of Scalar Pipelines Superscalar Organization Modern Processor Design: Fundamentals of Superscalar Processors Scalar upper bound on throughput IPC = 1 Inefficient unified pipeline
More informationECE 4750 Computer Architecture, Fall 2017 T05 Integrating Processors and Memories
ECE 4750 Computer Architecture, Fall 2017 T05 Integrating Processors and Memories School of Electrical and Computer Engineering Cornell University revision: 2017-10-17-12-06 1 Processor and L1 Cache Interface
More informationThe ARM10 Family of Advanced Microprocessor Cores
The ARM10 Family of Advanced Microprocessor Cores Stephen Hill ARM Austin Design Center 1 Agenda Design overview Microarchitecture ARM10 o o Memory System Interrupt response 3. Power o o 4. VFP10 ETM10
More informationAssuming ideal conditions (perfect pipelining and no hazards), how much time would it take to execute the same program in: b) A 5-stage pipeline?
1. Imagine we have a non-pipelined processor running at 1MHz and want to run a program with 1000 instructions. a) How much time would it take to execute the program? 1 instruction per cycle. 1MHz clock
More informationCOMPUTER ORGANIZATION AND DESIGN
COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count CPI and Cycle time Determined
More informationFinal Exam Fall 2007
ICS 233 - Computer Architecture & Assembly Language Final Exam Fall 2007 Wednesday, January 23, 2007 7:30 am 10:00 am Computer Engineering Department College of Computer Sciences & Engineering King Fahd
More informationFujitsu SOC Fujitsu Microelectronics America, Inc.
Fujitsu SOC 1 Overview Fujitsu SOC The Fujitsu Advantage Fujitsu Solution Platform IPWare Library Example of SOC Engagement Model Methodology and Tools 2 SDRAM Raptor AHB IP Controller Flas h DM A Controller
More informationECE260: Fundamentals of Computer Engineering
Pipelining James Moscola Dept. of Engineering & Computer Science York College of Pennsylvania Based on Computer Organization and Design, 5th Edition by Patterson & Hennessy What is Pipelining? Pipelining
More informationSA-1500: A 300 MHz RISC CPU with Attached Media Processor*
and Bridges Division SA-1500: A 300 MHz RISC CPU with Attached Media Processor* Prashant P. Gandhi, Ph.D. and Bridges Division Computing Enhancement Group Intel Corporation Santa Clara, CA 95052 Prashant.Gandhi@intel.com
More informationCS 2410 Mid term (fall 2015) Indicate which of the following statements is true and which is false.
CS 2410 Mid term (fall 2015) Name: Question 1 (10 points) Indicate which of the following statements is true and which is false. (1) SMT architectures reduces the thread context switch time by saving in
More informationDesign Space Exploration for Memory Subsystems of VLIW Architectures
E University of Paderborn Dr.-Ing. Mario Porrmann Design Space Exploration for Memory Subsystems of VLIW Architectures Thorsten Jungeblut 1, Gregor Sievers, Mario Porrmann 1, Ulrich Rückert 2 1 System
More informationDYNAMIC AND SPECULATIVE INSTRUCTION SCHEDULING
DYNAMIC AND SPECULATIVE INSTRUCTION SCHEDULING Slides by: Pedro Tomás Additional reading: Computer Architecture: A Quantitative Approach, 5th edition, Chapter 3, John L. Hennessy and David A. Patterson,
More informationItanium 2 Processor Microarchitecture Overview
Itanium 2 Processor Microarchitecture Overview Don Soltis, Mark Gibson Cameron McNairy, August 2002 Block Diagram F 16KB L1 I-cache Instr 2 Instr 1 Instr 0 M/A M/A M/A M/A I/A Template I/A B B 2 FMACs
More informationEnhancing Energy Efficiency of Processor-Based Embedded Systems thorough Post-Fabrication ISA Extension
Enhancing Energy Efficiency of Processor-Based Embedded Systems thorough Post-Fabrication ISA Extension Hamid Noori, Farhad Mehdipour, Koji Inoue, and Kazuaki Murakami Institute of Systems, Information
More informationECE 111 ECE 111. Advanced Digital Design. Advanced Digital Design Winter, Sujit Dey. Sujit Dey. ECE Department UC San Diego
Advanced Digital Winter, 2009 ECE Department UC San Diego dey@ece.ucsd.edu http://esdat.ucsd.edu Winter 2009 Advanced Digital Objective: of a hardware-software embedded system using advanced design methodologies
More informationEITF20: Computer Architecture Part2.2.1: Pipeline-1
EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle
More informationTechniques for Mitigating Memory Latency Effects in the PA-8500 Processor. David Johnson Systems Technology Division Hewlett-Packard Company
Techniques for Mitigating Memory Latency Effects in the PA-8500 Processor David Johnson Systems Technology Division Hewlett-Packard Company Presentation Overview PA-8500 Overview uction Fetch Capabilities
More informationCaches. Hiding Memory Access Times
Caches Hiding Memory Access Times PC Instruction Memory 4 M U X Registers Sign Ext M U X Sh L 2 Data Memory M U X C O N T R O L ALU CTL INSTRUCTION FETCH INSTR DECODE REG FETCH EXECUTE/ ADDRESS CALC MEMORY
More informationIntroduction Architecture overview. Multi-cluster architecture Addressing modes. Single-cluster Pipeline. architecture Instruction folding
ST20 icore and architectures D Albis Tiziano 707766 Architectures for multimedia systems Politecnico di Milano A.A. 2006/2007 Outline ST20-iCore Introduction Introduction Architecture overview Multi-cluster
More informationCOMPUTER ORGANIZATION AND DESI
COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count Determined by ISA and compiler
More informationAdvanced Computer Architecture
Advanced Computer Architecture Chapter 1 Introduction into the Sequential and Pipeline Instruction Execution Martin Milata What is a Processors Architecture Instruction Set Architecture (ISA) Describes
More informationSuperscalar Processor
Superscalar Processor Design Superscalar Architecture Virendra Singh Indian Institute of Science Bangalore virendra@computer.orgorg Lecture 20 SE-273: Processor Design Superscalar Pipelines IF ID RD ALU
More informationEITF20: Computer Architecture Part2.2.1: Pipeline-1
EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle
More information/ : Computer Architecture and Design Fall Midterm Exam October 16, Name: ID #:
16.482 / 16.561: Computer Architecture and Design Fall 2014 Midterm Exam October 16, 2014 Name: ID #: For this exam, you may use a calculator and two 8.5 x 11 double-sided page of notes. All other electronic
More informationMulti-core microcontroller design with Cortex-M processors and CoreSight SoC
Multi-core microcontroller design with Cortex-M processors and CoreSight SoC Joseph Yiu, ARM Ian Johnson, ARM January 2013 Abstract: While the majority of Cortex -M processor-based microcontrollers are
More informationEmbedded Systems: Hardware Components (part I) Todor Stefanov
Embedded Systems: Hardware Components (part I) Todor Stefanov Leiden Embedded Research Center Leiden Institute of Advanced Computer Science Leiden University, The Netherlands Outline Generic Embedded System
More informationNext Generation Technology from Intel Intel Pentium 4 Processor
Next Generation Technology from Intel Intel Pentium 4 Processor 1 The Intel Pentium 4 Processor Platform Intel s highest performance processor for desktop PCs Targeted at consumer enthusiasts and business
More informationSuperscalar Organization
Superscalar Organization Nima Honarmand Instruction-Level Parallelism (ILP) Recall: Parallelism is the number of independent tasks available ILP is a measure of inter-dependencies between insns. Average
More informationCS 252 Graduate Computer Architecture. Lecture 4: Instruction-Level Parallelism
CS 252 Graduate Computer Architecture Lecture 4: Instruction-Level Parallelism Krste Asanovic Electrical Engineering and Computer Sciences University of California, Berkeley http://wwweecsberkeleyedu/~krste
More informationCS 152 Computer Architecture and Engineering
CS 152 Computer Architecture and Engineering Lecture 6 Superpipelining + Branch Prediction 2014-2-6 John Lazzaro (not a prof - John is always OK) TA: Eric Love www-inst.eecs.berkeley.edu/~cs152/ Play:
More informationCS 152 Computer Architecture and Engineering. Lecture 7 - Memory Hierarchy-II
CS 152 Computer Architecture and Engineering Lecture 7 - Memory Hierarchy-II Krste Asanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~krste
More informationHardware Design I Chap. 10 Design of microprocessor
Hardware Design I Chap. 0 Design of microprocessor E-mail: shimada@is.naist.jp Outline What is microprocessor? Microprocessor from sequential machine viewpoint Microprocessor and Neumann computer Memory
More informationAdvanced Instruction-Level Parallelism
Advanced Instruction-Level Parallelism Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu EEE3050: Theory on Computer Architectures, Spring 2017, Jinkyu
More informationThese actions may use different parts of the CPU. Pipelining is when the parts run simultaneously on different instructions.
MIPS Pipe Line 2 Introduction Pipelining To complete an instruction a computer needs to perform a number of actions. These actions may use different parts of the CPU. Pipelining is when the parts run simultaneously
More information6x86 PROCESSOR Superscalar, Superpipelined, Sixth-generation, x86 Compatible CPU
1-6x86 PROCESSOR Superscalar, Superpipelined, Sixth-generation, x86 Compatible CPU Product Overview Introduction 1. ARCHITECTURE OVERVIEW The Cyrix 6x86 CPU is a leader in the sixth generation of high
More informationThe Processor Pipeline. Chapter 4, Patterson and Hennessy, 4ed. Section 5.3, 5.4: J P Hayes.
The Processor Pipeline Chapter 4, Patterson and Hennessy, 4ed. Section 5.3, 5.4: J P Hayes. Pipeline A Basic MIPS Implementation Memory-reference instructions Load Word (lw) and Store Word (sw) ALU instructions
More informationEnergy Consumption Evaluation of an Adaptive Extensible Processor
Energy Consumption Evaluation of an Adaptive Extensible Processor Hamid Noori, Farhad Mehdipour, Maziar Goudarzi, Seiichiro Yamaguchi, Koji Inoue, and Kazuaki Murakami December 2007 Outline Introduction
More informationCOSC 6385 Computer Architecture - Pipelining
COSC 6385 Computer Architecture - Pipelining Fall 2006 Some of the slides are based on a lecture by David Culler, Instruction Set Architecture Relevant features for distinguishing ISA s Internal storage
More informationDevelopment of Low Power and High Performance Application Processor (T6G) for Multimedia Mobile Applications
Session 8D-2 Development of Low Power and High Performance Application Processor (T6G) for Multimedia Mobile Applications Yoshiyuki Kitasho, Yu Kikuchi, Takayoshi Shimazawa, Yasuo Ohara, Masafumi Takahashi,
More informationResource Efficiency of Scalable Processor Architectures for SDR-based Applications
Resource Efficiency of Scalable Processor Architectures for SDR-based Applications Thorsten Jungeblut 1, Johannes Ax 2, Gregor Sievers 2, Boris Hübener 2, Mario Porrmann 2, Ulrich Rückert 1 1 Cognitive
More informationEITF20: Computer Architecture Part2.2.1: Pipeline-1
EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle
More informationAdvanced d Instruction Level Parallelism. Computer Systems Laboratory Sungkyunkwan University
Advanced d Instruction ti Level Parallelism Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu ILP Instruction-Level Parallelism (ILP) Pipelining:
More informationStatic, multiple-issue (superscaler) pipelines
Static, multiple-issue (superscaler) pipelines Start more than one instruction in the same cycle Instruction Register file EX + MEM + WB PC Instruction Register file EX + MEM + WB 79 A static two-issue
More informationCSE 820 Graduate Computer Architecture. week 6 Instruction Level Parallelism. Review from Last Time #1
CSE 820 Graduate Computer Architecture week 6 Instruction Level Parallelism Based on slides by David Patterson Review from Last Time #1 Leverage Implicit Parallelism for Performance: Instruction Level
More informationCPS104 Computer Organization and Programming Lecture 20: Superscalar processors, Multiprocessors. Robert Wagner
CS104 Computer Organization and rogramming Lecture 20: Superscalar processors, Multiprocessors Robert Wagner Faster and faster rocessors So much to do, so little time... How can we make computers that
More informationProcessor (IV) - advanced ILP. Hwansoo Han
Processor (IV) - advanced ILP Hwansoo Han Instruction-Level Parallelism (ILP) Pipelining: executing multiple instructions in parallel To increase ILP Deeper pipeline Less work per stage shorter clock cycle
More informationNOW Handout Page 1. Review from Last Time #1. CSE 820 Graduate Computer Architecture. Lec 8 Instruction Level Parallelism. Outline
CSE 820 Graduate Computer Architecture Lec 8 Instruction Level Parallelism Based on slides by David Patterson Review Last Time #1 Leverage Implicit Parallelism for Performance: Instruction Level Parallelism
More informationPowerPC 740 and 750
368 floating-point registers. A reorder buffer with 16 elements is used as well to support speculative execution. The register file has 12 ports. Although instructions can be executed out-of-order, in-order
More informationROB: head/tail. exercise: result of processing rest? 2. rename map (for next rename) log. phys. free list: X11, X3. PC log. reg prev.
Exam Review 2 1 ROB: head/tail PC log. reg prev. phys. store? except? ready? A R3 X3 no none yes old tail B R1 X1 no none yes tail C R1 X6 no none yes D R4 X4 no none yes E --- --- yes none yes F --- ---
More informationregisters data 1 registers MEMORY ADDRESS on-chip cache off-chip cache main memory: real address space part of virtual addr. sp.
13 1 CMPE110 Computer Architecture, Winter 2009 Andrea Di Blas 110 Winter 2009 CMPE Cache Direct-mapped cache Reads and writes Cache associativity Cache and performance Textbook Edition: 7.1 to 7.3 Third
More informationELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 7: Memory Organization Part II
ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 7: Organization Part II Ujjwal Guin, Assistant Professor Department of Electrical and Computer Engineering Auburn University, Auburn,
More informationCSE Lecture 13/14 In Class Handout For all of these problems: HAS NOT CANNOT Add Add Add must wait until $5 written by previous add;
CSE 30321 Lecture 13/14 In Class Handout For the sequence of instructions shown below, show how they would progress through the pipeline. For all of these problems: - Stalls are indicated by placing the
More informationHigh Performance SMIPS Processor. Jonathan Eastep May 8, 2005
High Performance SMIPS Processor Jonathan Eastep May 8, 2005 Objectives: Build a baseline implementation: Single-issue, in-order, 6-stage pipeline Full bypassing ICache: blocking, direct mapped, 16KByte,
More informationCS 61C: Great Ideas in Computer Architecture Pipelining and Hazards
CS 61C: Great Ideas in Computer Architecture Pipelining and Hazards Instructors: Vladimir Stojanovic and Nicholas Weaver http://inst.eecs.berkeley.edu/~cs61c/sp16 1 Pipelined Execution Representation Time
More informationCS433 Midterm. Prof Josep Torrellas. October 16, Time: 1 hour + 15 minutes
CS433 Midterm Prof Josep Torrellas October 16, 2014 Time: 1 hour + 15 minutes Name: Alias: Instructions: 1. This is a closed-book, closed-notes examination. 2. The Exam has 4 Questions. Please budget your
More informationCreating a Scalable Microprocessor:
Creating a Scalable Microprocessor: A 16-issue Multiple-Program-Counter Microprocessor With Point-to-Point Scalar Operand Network Michael Bedford Taylor J. Kim, J. Miller, D. Wentzlaff, F. Ghodrat, B.
More informationComputer Architecture 计算机体系结构. Lecture 4. Instruction-Level Parallelism II 第四讲 指令级并行 II. Chao Li, PhD. 李超博士
Computer Architecture 计算机体系结构 Lecture 4. Instruction-Level Parallelism II 第四讲 指令级并行 II Chao Li, PhD. 李超博士 SJTU-SE346, Spring 2018 Review Hazards (data/name/control) RAW, WAR, WAW hazards Different types
More informationPredict Not Taken. Revisiting Branch Hazard Solutions. Filling the delay slot (e.g., in the compiler) Delayed Branch
branch taken Revisiting Branch Hazard Solutions Stall Predict Not Taken Predict Taken Branch Delay Slot Branch I+1 I+2 I+3 Predict Not Taken branch not taken Branch I+1 IF (bubble) (bubble) (bubble) (bubble)
More informationSH-MobileG1: A Single-Chip Application and Dual-mode Baseband Processor
SH-MobileG1: A Single-Chip Application and Dual-mode Baseband Processor Masayuki Ito 1, Takahiro Irita 1, Eiji Yamamoto 1, Kunihiko Nishiyama 1, Takao Koike 1, Yoshihiko Tsuchihashi 1, Hiroyuki Asano 1,
More informationPipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome
Pipeline Thoai Nam Outline Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome Reference: Computer Architecture: A Quantitative Approach, John L Hennessy
More informationEffective System Design with ARM System IP
Effective System Design with ARM System IP Mentor Technical Forum 2009 Serge Poublan Product Marketing Manager ARM 1 Higher level of integration WiFi Platform OS Graphic 13 days standby Bluetooth MP3 Camera
More informationHP PA-8000 RISC CPU. A High Performance Out-of-Order Processor
The A High Performance Out-of-Order Processor Hot Chips VIII IEEE Computer Society Stanford University August 19, 1996 Hewlett-Packard Company Engineering Systems Lab - Fort Collins, CO - Cupertino, CA
More informationCS 152 Computer Architecture and Engineering
CS 152 Computer Architecture and Engineering Lecture 22 Advanced Processors III 2005-4-12 John Lazzaro (www.cs.berkeley.edu/~lazzaro) TAs: Ted Hong and David Marquardt www-inst.eecs.berkeley.edu/~cs152/
More informationChapter 4. The Processor
Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware 4.1 Introduction We will examine two MIPS implementations
More informationCOMPUTER ORGANIZATION AND DESIGN
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle
More informationPipelining. Principles of pipelining. Simple pipelining. Structural Hazards. Data Hazards. Control Hazards. Interrupts. Multicycle operations
Principles of pipelining Pipelining Simple pipelining Structural Hazards Data Hazards Control Hazards Interrupts Multicycle operations Pipeline clocking ECE D52 Lecture Notes: Chapter 3 1 Sequential Execution
More informationHPC VT Machine-dependent Optimization
HPC VT 2013 Machine-dependent Optimization Last time Choose good data structures Reduce number of operations Use cheap operations strength reduction Avoid too many small function calls inlining Use compiler
More informationImplementing Flexible Interconnect Topologies for Machine Learning Acceleration
Implementing Flexible Interconnect for Machine Learning Acceleration A R M T E C H S Y M P O S I A O C T 2 0 1 8 WILLIAM TSENG Mem Controller 20 mm Mem Controller Machine Learning / AI SoC New Challenges
More informationCS 152, Spring 2011 Section 2
CS 152, Spring 2011 Section 2 Christopher Celio University of California, Berkeley About Me Christopher Celio celio @ eecs Office Hours: Tuesday 1-2pm, 751 Soda Agenda Q&A on HW1, Lab 1 Pipelining Questions
More information