A 50Mvertices/s Graphics Processor with Fixed-Point Programmable Vertex Shader for Mobile Applications

Size: px
Start display at page:

Download "A 50Mvertices/s Graphics Processor with Fixed-Point Programmable Vertex Shader for Mobile Applications"

Transcription

1 A 50Mvertices/s Graphics Processor with Fixed-Point Programmable Vertex Shader for Mobile Applications Ju-Ho Sohn, Jeong-Ho Woo, Min-Wuk Lee, Hye-Jung Kim, Ramchan Woo, Hoi-Jun Yoo Semiconductor System Lab. Dept. of EECS Korea Advanced Institute of Science and Technology

2 Outline Introduction System Architecture Architecture Overview Dual Operations Vertex Shader, Rendering Engine Low Power Techniques Fixed-Point SIMD Datapath Instruction-level Power Management Programmable Frequency Synthesizer Implementation Results Summary 2

3 Motivation Portable 3D graphics Wireless multimedia PDA and cell-phone Requirements Long battery time (3 hours play) High performance (>3Mvertices/s) Low cost (<5$) Problems Too large chip size Very complicated chip Graphics Evolutions Simple Shading ISSCC 2001 Texture Mapping ISSCC 2003 Programmable shading architecture Flexibility and low power consumption 3D-CG IP ISSCC 2004 Programmable Shading ISSCC

4 Programmable Graphics Pipeline Traditional Graphics Pipeline Programmable Graphics Pipeline Fixed Functions Transform Vertex Shader User-defined Vertex Processing Lighting Vertex Program Simple Graphics Clipping Clipping Flexible 3D Graphics Funtions Rasterization Rasterization Texture Mapping Texture Mapping 4

5 Standouts of Portable 3D Graphics ISSCC 2003 ISSCC 2004 This Work Process 0.16 µ m DRAM 0.18 µ m CMOS 0.18 µ m CMOS Frequency (MHz) 132 / / 50 Programmable Shading x x Vertex Program 1.1 Type of HW Integer RISC Floating-point Engine Fixed-point Vertex Shader Graphics Speed (Mvertices/s) Pixel Fill Rate (Mpixels/s) Power Consumption (mw) Performance Index (Kvertices/s per mw)

6 System Architecture ARM-10 Co-processor Architecture ARM-10 90b I$ 16KB D$ 16KB Co-processor Interface 32b 32b PFS CTRL PLL Fixed-point SIMD Datapath 90b Vertex Shader 2KB Code Mem. 128b Vertex Buffer 32KB Display Buf. System Bus (800MB/s) 32b Memory CTRL 32b Async. SRAM I/F 128b Rendering Engine Triangle Setup Pixel Processor 26KB Graphics Cache 32b 32b PERI 56b External I/O 6

7 Data Transfer Flow Separation of data and command paths Increase the parallelism of hardware blocks. Reduce the bus arbitration cycles and bandwidth. System Bus External Mem. Graphics Model Data Data Transfer Path Vertex Data Transfer Pixel Data Transfer ARM-10 Co-Proc I/F Vertex Shader Vertex Buf. Rendering Engine VS Commands Transfer RE Commands Transfer Command Transfer Path 7

8 Dual Operations Two operating states Tightly coupled co-processor (TCC) Normal co-processor General SIMD instructions Handshaking with ARM-10 ARM-10 ARM Inst.1 ARM Inst. 3 Vertex Shader SIMD Inst.2 SIMD Inst. 4 Parallel processor (PP) Independent processor Graphics instructions Parallel operations with ARM-10 ARM-10 ARM Program 1 ARM Program 2 Vertex Shader Vertex Program 8

9 Programmable Vertex Shader VP CTRL 2KB Code Mem. Graphics INSTR 32KB Display Buf. Input Vertex RF SWZ General SIMD RF ARM-10 Co- Proc I/F SIMD INSTR FETCH DEC & CTRL opa opb opc Fixed-point SIMD Datapath Fixed-point Special Function Unit CTRL Reg. Output Vertex RF 0 Output Vertex RF 1 Output Vertex RF 2 Control Datapath Vertex Buf. RE 9

10 Rendering Engine Set-up Operations Vertex Buf. 128b RE Instructions Triangle Setup Engine (TSE) Pixel Processor Interpol. Depth Compare write_mask Graphics Cache Depth / Frame: Direct-map Texture: 2-way 8KB Depth Cache REclk from PFS Texture Engine TM_req Aligner even odd 3KB Texture Cache 0 3KB Texture Cache 1 MEM I/F System Bus Clock Gating Blending 12KB Frame Cache 10

11 Fixed-point Graphics Processing : Low Power Technique (1) Fixed-point number representation Bit index Value b 7 b 6 b 5 b 4 b 3 b 2 b 1 b 0 b i {0,1} (ex) Q4.4 format sign m-bit integer part Energy efficiency 40% less energy than floating-point on average. Fraction point n-bit fraction part Power Consumption 17 % Less Power Fixed-point Arithmetic Unit Floating-point Arithmetic Unit 30% Faster Execution Time 11

12 Fixed-point SIMD Datapath (1) 8-stage pipeline with 4-way SIMD F I D E1 SIMD INSTR (Co-Proc I/F) 1st DEC 2nd DEC 4-way 32b Integer ALU Display Buf. ADDR Generation SRAM Display Buf. Access Graphics INSTR (Code Mem.) 4-way 32x16 Integer MUL SIMD Reg. Index Generation SIMD Reg. Access Forwarding CLZ Parallel Accesses of Operands Forwarding Only in General SIMD Reg. Single Cycle Throughput for Fixed-point MAC E2 E3 Pipeline Reg. Pipeline Reg. 4-way 32x16 Integer MUL CPA Array for Low 32b Result Integer DIV SQRT E4 Pipeline Reg. CPA Array for High 32b Result SHIFT Array CPA W Reg. Writeback 12

13 Fixed-point SIMD Datapath (2) Fast 4-cycle matrix transformation 50Mvertices/s graphics performance Broadcasted Vector Elements M0 M1 M2 M3 M4 M5 M6 M7 M8 M9 M10 M11 M12 M13 M14 M15 X Y Z W M0 M1 M2 M3 X X X X M4 M5 M6 M7 Y Y Y Y M8 M9 M10 M11 Z Z Z Z M12 M13 M14 M15 W W W W Convert 32b Fixed to 64b Integer MUL X MAC Y MAC Z MAC W Bypass of Low 32b Bypass of High 32b Accum. SHIFT MUL 32x16 MUL 32x16 MUL C S A C P A C S A C P A SHIFT Fixed-point Result Convert 64b Integer to 32b Fixed 13

14 Fixed-point Special Function Unit Calculate RCP (1/x) and RSQ (1/sqrt(x)) Fixed-point input Fixed-point output No additional data type conversion circuits For Qm.n Fixed-point Format Fixed-point Input 1.0 in Qm.n Count Leading Zero SHIFT Normalized Input Pre-scale Value Quotient Remainder Radix-4 Integer DIV SQRT Carry Propagate Adder Fixed-point Output Scale up fixed-point dividend to 64b space OP RCP RSQ Pre-scale Value # of Leading Zero n/2 + # of Leading Zero 14

15 Efficient Floating-point Extension Software floating-point emulation Add two special instructions: CAS, CLS If-else A single cycle SIMD arithmetic operation. Compare CAS Compare CLS Negative No Negative No Yes Yes SUB ADD Left Shift Right Shift Controlled ADD / SUB Controlled Logical Shift 80MFLOPS peak performance at 200MHz X 10 improvement over conventional integer SIMD unit 15

16 Instruction-level Power Management : Low Power Technique (2) SIMD RF and datapath activation in instruction-by-instruction. Clock Off PFS Vertex Shader Clock Source Enable SIMD Reg. Files ARM-10 Co-Proc I/F Valid Reg. D Latch E Q OP Fixed-point SIMD Datapath Shader is Called Prevent Signal Transitions 16

17 Programmable Frequency Synthesizer : Low Power Technique (3) Frequency scaling with clock gating Power Mode, Target Freq. VS Measure Workload of Current Frame PFS 1x 0.25x 0.5x RE 1x RE Cache Software Library ARM-10 Coarse-grained Clock Gating ARM-10 CLK. (MHz) Fast Normal Slow Step Continuous Frequency Scaling in Three Power Modes 17

18 PFS Block Diagram and Measurement REF CLK (1MHz) N CK PFD CP LPF VCO UP/ DOWN N 1x RISCclk OP_MODE [FAST/ NORMAL/ SLOW] FREQ_ CTRL P S 4 4 RESET PROGRAM COUNTER SWALLOW COUNTER PRE SCALAR CKout = (16P+S)xCK Programmable counters tune target freq. FREQ DIV 1x 1/4x GEclk REclk Enable 1/2x REMEMclk Software Measured Waveform (RISCclk) 127MHz 112MHz 93MHz 70MHz Frequency Change (Normal Mode) 18

19 System Power Consumption Power consumption (mw) Sustaining Graphics Performance (Polygon/s) 0.07M 1.56M 2.77M M 10M 50x Improvement 26% Reduction A: No Vertex Shader B: Integer SIMD Processor C: Floating-point Graphics Processor D: This work (w/ light and texture) E: This work (w/o light and texture) Vertex Shader RE with Graphics Mem. RISC with I/D $ Power Manager Others (BUS, IO) 0 A B C Conventional D E This Work 19

20 Performance Comparison KVertices / Sec mw Performance index of portable 3D graphics Processing speed / power consumption Analogous to MIPS/mW 5.0 ISSCC 2003 (2.3) V, 75MHz ISSCC 2004 (18.4) 1.25V, 400MHz ISSCC 2004 (18.5) 1.8V, 200MHz This Work 20

21 Die Photograph 0.18μm CMOS 1P6M 36 mm pin PBGA Power supply 1.8V: Core 3.3V: IO Transistors 2M Logic 96KB SRAM Power consumption Less than 155 mw 21

22 Summary Low power 3D graphics processor for mobile applications Integration of full 3D graphics pipeline with programmable vertex shader ARM-10 coprocessor architecture with dual operations Low power techniques Energy-efficient fixed-point SIMD datapath Instruction-level power management Programmable frequency synthesizer 50 Mvertices/s peak graphics performance 155 mw power consumption 22

Vertex Shader Design I

Vertex Shader Design I The following content is extracted from the paper shown in next page. If any wrong citation or reference missing, please contact ldvan@cs.nctu.edu.tw. I will correct the error asap. This course used only

More information

2D/3D Graphics Accelerator for Mobile Multimedia Applications. Ramchan Woo, Sohn, Seong-Jun Song, Young-Don

2D/3D Graphics Accelerator for Mobile Multimedia Applications. Ramchan Woo, Sohn, Seong-Jun Song, Young-Don RAMP-IV: A Low-Power and High-Performance 2D/3D Graphics Accelerator for Mobile Multimedia Applications Woo, Sungdae Choi, Ju-Ho Sohn, Seong-Jun Song, Young-Don Bae,, and Hoi-Jun Yoo oratory Dept. of EECS,

More information

Design and Optimization of Geometry Acceleration for Portable 3D Graphics

Design and Optimization of Geometry Acceleration for Portable 3D Graphics M.S. Thesis Design and Optimization of Geometry Acceleration for Portable 3D Graphics Ju-ho Sohn 2002.12.20 oratory Department of Electrical Engineering and Computer Science Korea Advanced Institute of

More information

Byeong-Gyu Nam, Jeabin Lee, Kwanho Kim, Seung Jin Lee, and Hoi-Jun Yoo

Byeong-Gyu Nam, Jeabin Lee, Kwanho Kim, Seung Jin Lee, and Hoi-Jun Yoo A Low-Power Handheld GPU using Logarithmic Arithmetic and Triple DVFS Power Domains Byeong-Gyu Nam, Jeabin Lee, Kwanho Kim, Seung Jin Lee, and Hoi-Jun Yoo Outline Backgrounds Proposed Handheld GPU Low-Power

More information

A 120mW Embedded 3D Graphics Rendering Engine with 6Mb Logically Local Frame-Buffer and 3.2GByte/s Run-time Reconfigurable Bus for PDA-Chip

A 120mW Embedded 3D Graphics Rendering Engine with 6Mb Logically Local Frame-Buffer and 3.2GByte/s Run-time Reconfigurable Bus for PDA-Chip A 120mW Embedded 3D Graphics Rendering Engine with 6Mb Logically Local Frame-Buffer and 3.2GByte/s Run-time Reconfigurable Bus for PDA-Chip Ramchan Woo*, Chi-Weon Yoon, Jeonghoon Kook, Se-Joong Lee, Kangmin

More information

A Low Power Multimedia SoC with Fully Programmable 3D Graphics and MPEG4/H.264/JPEG for Mobile Devices

A Low Power Multimedia SoC with Fully Programmable 3D Graphics and MPEG4/H.264/JPEG for Mobile Devices A Low Power Multimedia SoC with Fully Programmable 3D Graphics and MPEG4/H.264/JPEG for Mobile Devices Jeong-Ho Woo, Ju-Ho Sohn, Hyejung Kim, Jongcheol Jeong 1, Euljoo Jeong 1, Suk Joong Lee 1 and Hoi-Jun

More information

ISSCC 2001 / SESSION 9 / INTEGRATED MULTIMEDIA PROCESSORS / 9.2

ISSCC 2001 / SESSION 9 / INTEGRATED MULTIMEDIA PROCESSORS / 9.2 ISSCC 2001 / SESSION 9 / INTEGRATED MULTIMEDIA PROCESSORS / 9.2 9.2 A 80/20MHz 160mW Multimedia Processor integrated with Embedded DRAM MPEG-4 Accelerator and 3D Rendering Engine for Mobile Applications

More information

A fixed-point 3D graphics library with energy-efficient efficient cache architecture for mobile multimedia system

A fixed-point 3D graphics library with energy-efficient efficient cache architecture for mobile multimedia system MS Thesis A fixed-point 3D graphics library with energy-efficient efficient cache architecture for mobile multimedia system Min-wuk Lee 2004.12.14 Semiconductor System Laboratory Department Electrical

More information

A Programmable Vertex Shader with Fixed-Point SIMD Datapath for Low Power Wireless Applications

A Programmable Vertex Shader with Fixed-Point SIMD Datapath for Low Power Wireless Applications Graphics Hardware (004) T. Akenine-Möller, M. McCool (Editors) A Programmable Shader with ixed-point SIMD Datapath for Low Power Wireless Applications Ju-Ho Sohn Ramchan Woo Hoi-Jun Yoo Semiconductor System

More information

Cost-Effective Low-Power Graphics Processing Unit for Handheld Devices

Cost-Effective Low-Power Graphics Processing Unit for Handheld Devices INTEGRATED CIRCUITS FOR COMMUNICATIONS Cost-Effective Low-Power Graphics Processing Unit for Handheld Devices Byeong-Gyu Nam, Jeabin Lee, Kwanho Kim, Seungjin Lee, and Hoi-Jun Yoo, Korea Advanced Institute

More information

SA-1500: A 300 MHz RISC CPU with Attached Media Processor*

SA-1500: A 300 MHz RISC CPU with Attached Media Processor* and Bridges Division SA-1500: A 300 MHz RISC CPU with Attached Media Processor* Prashant P. Gandhi, Ph.D. and Bridges Division Computing Enhancement Group Intel Corporation Santa Clara, CA 95052 Prashant.Gandhi@intel.com

More information

A Reconfigurable Crossbar Switch with Adaptive Bandwidth Control for Networks-on

A Reconfigurable Crossbar Switch with Adaptive Bandwidth Control for Networks-on A Reconfigurable Crossbar Switch with Adaptive Bandwidth Control for Networks-on on-chip Donghyun Kim, Kangmin Lee, Se-joong Lee and Hoi-Jun Yoo Semiconductor System Laboratory, Dept. of EECS, Korea Advanced

More information

Design and Implementation of High Performance Application Specific Memory

Design and Implementation of High Performance Application Specific Memory Design and Implementation of High Performance Application Specific Memory - 고성능 Application Specific Memory 의설계와구현 - M.S. Thesis Sungdae Choi Dec. 20th, 2002 Outline Introduction Memory for Mobile 3D Graphics

More information

HP PA-8000 RISC CPU. A High Performance Out-of-Order Processor

HP PA-8000 RISC CPU. A High Performance Out-of-Order Processor The A High Performance Out-of-Order Processor Hot Chips VIII IEEE Computer Society Stanford University August 19, 1996 Hewlett-Packard Company Engineering Systems Lab - Fort Collins, CO - Cupertino, CA

More information

A Fast Synchronous Pipelined DRAM Architecture with SRAM Buffers

A Fast Synchronous Pipelined DRAM Architecture with SRAM Buffers A Fast Synchronous Pipelined DRAM Architecture with SRAM Buffers Chi-Weon Yoon, Yon-Kyun Im, Seon-Ho Han, Hoi-Jun Yoo and Tae-Sung Jung* Dept. of Electrical Engineering, KAIST *Samsung Electronics Co.,

More information

Multicore SoC is coming. Scalable and Reconfigurable Stream Processor for Mobile Multimedia Systems. Source: 2007 ISSCC and IDF.

Multicore SoC is coming. Scalable and Reconfigurable Stream Processor for Mobile Multimedia Systems. Source: 2007 ISSCC and IDF. Scalable and Reconfigurable Stream Processor for Mobile Multimedia Systems Liang-Gee Chen Distinguished Professor General Director, SOC Center National Taiwan University DSP/IC Design Lab, GIEE, NTU 1

More information

Embedded Systems Ch 15 ARM Organization and Implementation

Embedded Systems Ch 15 ARM Organization and Implementation Embedded Systems Ch 15 ARM Organization and Implementation Byung Kook Kim Dept of EECS Korea Advanced Institute of Science and Technology Summary ARM architecture Very little change From the first 3-micron

More information

AS THE MOBILE electronics market matures, third-generation

AS THE MOBILE electronics market matures, third-generation IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 39, NO. 7, JULY 2004 1101 A Low-Power 3-D Rendering Engine With Two Texture Units and 29-Mb Embedded DRAM for 3G Multimedia Terminals Ramchan Woo, Student Member,

More information

Development of a 3-D Graphics Rendering Engine with Lighting Acceleration for Handheld Multimedia Systems

Development of a 3-D Graphics Rendering Engine with Lighting Acceleration for Handheld Multimedia Systems 1020 IEEE Transactions on Consumer Electronics, Vol. 51, No. 3, AUGUST 2005 Development of a 3-D Graphics Rendering Engine with Lighting Acceleration for Handheld Multimedia Systems Byeong-Gyu Nam, Min-wuk

More information

Jim Keller. Digital Equipment Corp. Hudson MA

Jim Keller. Digital Equipment Corp. Hudson MA Jim Keller Digital Equipment Corp. Hudson MA ! Performance - SPECint95 100 50 21264 30 21164 10 1995 1996 1997 1998 1999 2000 2001 CMOS 5 0.5um CMOS 6 0.35um CMOS 7 0.25um "## Continued Performance Leadership

More information

Structure. Woo-Chan Park, Kil-Whan Lee, Seung-Gi Lee, Moon-Hee Choi, Won-Jong Lee, Cheol-Ho Jeong, Byung-Uck Kim, Woo-Nam Jung,

Structure. Woo-Chan Park, Kil-Whan Lee, Seung-Gi Lee, Moon-Hee Choi, Won-Jong Lee, Cheol-Ho Jeong, Byung-Uck Kim, Woo-Nam Jung, A High Performance 3D Graphics Rasterizer with Effective Memory Structure Woo-Chan Park, Kil-Whan Lee, Seung-Gi Lee, Moon-Hee Choi, Won-Jong Lee, Cheol-Ho Jeong, Byung-Uck Kim, Woo-Nam Jung, Il-San Kim,

More information

Vertex Shader Design II

Vertex Shader Design II The following content is extracted from the paper shown in next page. If any wrong citation or reference missing, please contact ldvan@cs.nctu.edu.tw. I will correct the error asap. This course used only

More information

Mali-400 MP: A Scalable GPU for Mobile Devices Tom Olson

Mali-400 MP: A Scalable GPU for Mobile Devices Tom Olson Mali-400 MP: A Scalable GPU for Mobile Devices Tom Olson Director, Graphics Research, ARM Outline ARM and Mobile Graphics Design Constraints for Mobile GPUs Mali Architecture Overview Multicore Scaling

More information

General Purpose Signal Processors

General Purpose Signal Processors General Purpose Signal Processors First announced in 1978 (AMD) for peripheral computation such as in printers, matured in early 80 s (TMS320 series). General purpose vs. dedicated architectures: Pros:

More information

VLSI Signal Processing

VLSI Signal Processing VLSI Signal Processing Programmable DSP Architectures Chih-Wei Liu VLSI Signal Processing Lab Department of Electronics Engineering National Chiao Tung University Outline DSP Arithmetic Stream Interface

More information

SH-Mobile3: Application Processor for 3G Cellular Phones on a Low-Power SoC Design Platform

SH-Mobile3: Application Processor for 3G Cellular Phones on a Low-Power SoC Design Platform SH-Mobile3: Application Processor for 3G Cellular Phones on a Low-Power SoC Design Platform H. Mizuno, N. Irie, K. Uchiyama, Y. Yanagisawa 1, S. Yoshioka 1, I. Kawasaki 1, and T. Hattori 2 Hitachi Ltd.,

More information

A SXGA 3D Display Processor with Reduced Rendering Data and Enhanced Precision. Seok-Hoon Kim MVLSI Lab., KAIST

A SXGA 3D Display Processor with Reduced Rendering Data and Enhanced Precision. Seok-Hoon Kim MVLSI Lab., KAIST A SXGA 3D Display Processor with Reduced Rendering Data and Enhanced Precision Seok-Hoon Kim MVLSI Lab., KAIST Contents Background Motivation 3D Graphics + 3D Display Previous Works Conventional 3D Image

More information

This Unit: Putting It All Together. CIS 371 Computer Organization and Design. Sources. What is Computer Architecture?

This Unit: Putting It All Together. CIS 371 Computer Organization and Design. Sources. What is Computer Architecture? This Unit: Putting It All Together CIS 371 Computer Organization and Design Unit 15: Putting It All Together: Anatomy of the XBox 360 Game Console Application OS Compiler Firmware CPU I/O Memory Digital

More information

An Evaluation of an Energy Efficient Many-Core SoC with Parallelized Face Detection

An Evaluation of an Energy Efficient Many-Core SoC with Parallelized Face Detection An Evaluation of an Energy Efficient Many-Core SoC with Parallelized Face Detection Hiroyuki Usui, Jun Tanabe, Toru Sano, Hui Xu, and Takashi Miyamori Toshiba Corporation, Kawasaki, Japan Copyright 2013,

More information

SH4 RISC Microprocessor for Multimedia

SH4 RISC Microprocessor for Multimedia SH4 RISC Microprocessor for Multimedia Fumio Arakawa, Osamu Nishii, Kunio Uchiyama, Norio Nakagawa Hitachi, Ltd. 1 Outline 1. SH4 Overview 2. New Floating-point Architecture 3. Length-4 Vector Instructions

More information

The Nios II Family of Configurable Soft-core Processors

The Nios II Family of Configurable Soft-core Processors The Nios II Family of Configurable Soft-core Processors James Ball August 16, 2005 2005 Altera Corporation Agenda Nios II Introduction Configuring your CPU FPGA vs. ASIC CPU Design Instruction Set Architecture

More information

The T0 Vector Microprocessor. Talk Outline

The T0 Vector Microprocessor. Talk Outline Slides from presentation at the Hot Chips VII conference, 15 August 1995.. The T0 Vector Microprocessor Krste Asanovic James Beck Bertrand Irissou Brian E. D. Kingsbury Nelson Morgan John Wawrzynek University

More information

Cornell University CS 569: Interactive Computer Graphics. Introduction. Lecture 1. [John C. Stone, UIUC] NASA. University of Calgary

Cornell University CS 569: Interactive Computer Graphics. Introduction. Lecture 1. [John C. Stone, UIUC] NASA. University of Calgary Cornell University CS 569: Interactive Computer Graphics Introduction Lecture 1 [John C. Stone, UIUC] 2008 Steve Marschner 1 2008 Steve Marschner 2 NASA University of Calgary 2008 Steve Marschner 3 2008

More information

Design Objectives of the 0.35µm Alpha Microprocessor (A 500MHz Quad Issue RISC Microprocessor)

Design Objectives of the 0.35µm Alpha Microprocessor (A 500MHz Quad Issue RISC Microprocessor) Design Objectives of the 0.35µm Alpha 21164 Microprocessor (A 500MHz Quad Issue RISC Microprocessor) Gregg Bouchard Digital Semiconductor Digital Equipment Corporation Hudson, MA 1 Outline 0.35µm Alpha

More information

ECE 2300 Digital Logic & Computer Organization. Caches

ECE 2300 Digital Logic & Computer Organization. Caches ECE 23 Digital Logic & Computer Organization Spring 217 s Lecture 2: 1 Announcements HW7 will be posted tonight Lab sessions resume next week Lecture 2: 2 Course Content Binary numbers and logic gates

More information

16.1. Unit 16. Computer Organization Design of a Simple Processor

16.1. Unit 16. Computer Organization Design of a Simple Processor 6. Unit 6 Computer Organization Design of a Simple Processor HW SW 6.2 You Can Do That Cloud & Distributed Computing (CyberPhysical, Databases, Data Mining,etc.) Applications (AI, Robotics, Graphics, Mobile)

More information

Spring 2010 Prof. Hyesoon Kim. AMD presentations from Richard Huddy and Michael Doggett

Spring 2010 Prof. Hyesoon Kim. AMD presentations from Richard Huddy and Michael Doggett Spring 2010 Prof. Hyesoon Kim AMD presentations from Richard Huddy and Michael Doggett Radeon 2900 2600 2400 Stream Processors 320 120 40 SIMDs 4 3 2 Pipelines 16 8 4 Texture Units 16 8 4 Render Backens

More information

Windowing System on a 3D Pipeline. February 2005

Windowing System on a 3D Pipeline. February 2005 Windowing System on a 3D Pipeline February 2005 Agenda 1.Overview of the 3D pipeline 2.NVIDIA software overview 3.Strengths and challenges with using the 3D pipeline GeForce 6800 220M Transistors April

More information

ELE 375 Final Exam Fall, 2000 Prof. Martonosi

ELE 375 Final Exam Fall, 2000 Prof. Martonosi ELE 375 Final Exam Fall, 2000 Prof. Martonosi Question Score 1 /10 2 /20 3 /15 4 /15 5 /10 6 /20 7 /20 8 /25 9 /30 10 /30 11 /30 12 /15 13 /10 Total / 250 Please write your answers clearly in the space

More information

EECS 322 Computer Architecture Superpipline and the Cache

EECS 322 Computer Architecture Superpipline and the Cache EECS 322 Computer Architecture Superpipline and the Cache Instructor: Francis G. Wolff wolff@eecs.cwru.edu Case Western Reserve University This presentation uses powerpoint animation: please viewshow Summary:

More information

0;L$+LJK3HUIRUPDQFH ;3URFHVVRU:LWK,QWHJUDWHG'*UDSKLFV

0;L$+LJK3HUIRUPDQFH ;3URFHVVRU:LWK,QWHJUDWHG'*UDSKLFV 0;L$+LJK3HUIRUPDQFH ;3URFHVVRU:LWK,QWHJUDWHG'*UDSKLFV Rajeev Jayavant Cyrix Corporation A National Semiconductor Company 8/18/98 1 0;L$UFKLWHFWXUDO)HDWXUHV ¾ Next-generation Cayenne Core Dual-issue pipelined

More information

ECE 574 Cluster Computing Lecture 16

ECE 574 Cluster Computing Lecture 16 ECE 574 Cluster Computing Lecture 16 Vince Weaver http://web.eece.maine.edu/~vweaver vincent.weaver@maine.edu 26 March 2019 Announcements HW#7 posted HW#6 and HW#5 returned Don t forget project topics

More information

Low-power Architecture. By: Jonathan Herbst Scott Duntley

Low-power Architecture. By: Jonathan Herbst Scott Duntley Low-power Architecture By: Jonathan Herbst Scott Duntley Why low power? Has become necessary with new-age demands: o Increasing design complexity o Demands of and for portable equipment Communication Media

More information

CS427 Multicore Architecture and Parallel Computing

CS427 Multicore Architecture and Parallel Computing CS427 Multicore Architecture and Parallel Computing Lecture 6 GPU Architecture Li Jiang 2014/10/9 1 GPU Scaling A quiet revolution and potential build-up Calculation: 936 GFLOPS vs. 102 GFLOPS Memory Bandwidth:

More information

The Alpha Microprocessor: Out-of-Order Execution at 600 Mhz. R. E. Kessler COMPAQ Computer Corporation Shrewsbury, MA

The Alpha Microprocessor: Out-of-Order Execution at 600 Mhz. R. E. Kessler COMPAQ Computer Corporation Shrewsbury, MA The Alpha 21264 Microprocessor: Out-of-Order ution at 600 Mhz R. E. Kessler COMPAQ Computer Corporation Shrewsbury, MA 1 Some Highlights z Continued Alpha performance leadership y 600 Mhz operation in

More information

COPROCESSOR APPROACH TO ACCELERATING MULTIMEDIA APPLICATION [CLAUDIO BRUNELLI, JARI NURMI ] Processor Design

COPROCESSOR APPROACH TO ACCELERATING MULTIMEDIA APPLICATION [CLAUDIO BRUNELLI, JARI NURMI ] Processor Design COPROCESSOR APPROACH TO ACCELERATING MULTIMEDIA APPLICATION [CLAUDIO BRUNELLI, JARI NURMI ] Processor Design Lecture Objectives Background Need for Accelerator Accelerators and different type of parallelizm

More information

Spring 2009 Prof. Hyesoon Kim

Spring 2009 Prof. Hyesoon Kim Spring 2009 Prof. Hyesoon Kim Application Geometry Rasterizer CPU Each stage cane be also pipelined The slowest of the pipeline stage determines the rendering speed. Frames per second (fps) Executes on

More information

Tutorial on GPU Programming #2. Joong-Youn Lee Supercomputing Center, KISTI

Tutorial on GPU Programming #2. Joong-Youn Lee Supercomputing Center, KISTI Tutorial on GPU Programming #2 Joong-Youn Lee Supercomputing Center, KISTI Contents Graphics Pipeline Vertex Programming Fragment Programming Introduction to Cg Language Graphics Pipeline The process to

More information

A 1-GHz Configurable Processor Core MeP-h1

A 1-GHz Configurable Processor Core MeP-h1 A 1-GHz Configurable Processor Core MeP-h1 Takashi Miyamori, Takanori Tamai, and Masato Uchiyama SoC Research & Development Center, TOSHIBA Corporation Outline Background Pipeline Structure Bus Interface

More information

Vector Architectures Vs. Superscalar and VLIW for Embedded Media Benchmarks

Vector Architectures Vs. Superscalar and VLIW for Embedded Media Benchmarks Vector Architectures Vs. Superscalar and VLIW for Embedded Media Benchmarks Christos Kozyrakis Stanford University David Patterson U.C. Berkeley http://csl.stanford.edu/~christos Motivation Ideal processor

More information

Age nda. Intel PXA27x Processor Family: An Applications Processor for Phone and PDA applications

Age nda. Intel PXA27x Processor Family: An Applications Processor for Phone and PDA applications Intel PXA27x Processor Family: An Applications Processor for Phone and PDA applications N.C. Paver PhD Architect Intel Corporation Hot Chips 16 August 2004 Age nda Overview of the Intel PXA27X processor

More information

CS 31: Intro to Systems Digital Logic. Kevin Webb Swarthmore College February 3, 2015

CS 31: Intro to Systems Digital Logic. Kevin Webb Swarthmore College February 3, 2015 CS 31: Intro to Systems Digital Logic Kevin Webb Swarthmore College February 3, 2015 Reading Quiz Today Hardware basics Machine memory models Digital signals Logic gates Circuits: Borrow some paper if

More information

Latches. IT 3123 Hardware and Software Concepts. Registers. The Little Man has Registers. Data Registers. Program Counter

Latches. IT 3123 Hardware and Software Concepts. Registers. The Little Man has Registers. Data Registers. Program Counter IT 3123 Hardware and Software Concepts Notice: This session is being recorded. CPU and Memory June 11 Copyright 2005 by Bob Brown Latches Can store one bit of data Can be ganged together to store more

More information

ISSCC 2003 / SESSION 14 / MICROPROCESSORS / PAPER 14.5

ISSCC 2003 / SESSION 14 / MICROPROCESSORS / PAPER 14.5 ISSCC 2003 / SESSION 14 / MICROPROCESSORS / PAPER 14.5 14.5 A 600MHz Single-Chip Multiprocessor with 4.8GB/s Internal Shared Pipelined Bus and 512kB Internal Memory Satoshi Kaneko, Katsunori Sawai, Norio

More information

CS 31: Intro to Systems Digital Logic. Kevin Webb Swarthmore College February 2, 2016

CS 31: Intro to Systems Digital Logic. Kevin Webb Swarthmore College February 2, 2016 CS 31: Intro to Systems Digital Logic Kevin Webb Swarthmore College February 2, 2016 Reading Quiz Today Hardware basics Machine memory models Digital signals Logic gates Circuits: Borrow some paper if

More information

ECE 313 Computer Organization FINAL EXAM December 13, 2000

ECE 313 Computer Organization FINAL EXAM December 13, 2000 This exam is open book and open notes. You have until 11:00AM. Credit for problems requiring calculation will be given only if you show your work. 1. Floating Point Representation / MIPS Assembly Language

More information

Introduction to Microprocessor

Introduction to Microprocessor Introduction to Microprocessor Slide 1 Microprocessor A microprocessor is a multipurpose, programmable, clock-driven, register-based electronic device That reads binary instructions from a storage device

More information

Graphics Processing Unit Architecture (GPU Arch)

Graphics Processing Unit Architecture (GPU Arch) Graphics Processing Unit Architecture (GPU Arch) With a focus on NVIDIA GeForce 6800 GPU 1 What is a GPU From Wikipedia : A specialized processor efficient at manipulating and displaying computer graphics

More information

The Bifrost GPU architecture and the ARM Mali-G71 GPU

The Bifrost GPU architecture and the ARM Mali-G71 GPU The Bifrost GPU architecture and the ARM Mali-G71 GPU Jem Davies ARM Fellow and VP of Technology Hot Chips 28 Aug 2016 Introduction to ARM Soft IP ARM licenses Soft IP cores (amongst other things) to our

More information

EECS150 - Digital Design Lecture 16 - Memory

EECS150 - Digital Design Lecture 16 - Memory EECS150 - Digital Design Lecture 16 - Memory October 17, 2002 John Wawrzynek Fall 2002 EECS150 - Lec16-mem1 Page 1 Memory Basics Uses: data & program storage general purpose registers buffering table lookups

More information

Techniques for Mitigating Memory Latency Effects in the PA-8500 Processor. David Johnson Systems Technology Division Hewlett-Packard Company

Techniques for Mitigating Memory Latency Effects in the PA-8500 Processor. David Johnson Systems Technology Division Hewlett-Packard Company Techniques for Mitigating Memory Latency Effects in the PA-8500 Processor David Johnson Systems Technology Division Hewlett-Packard Company Presentation Overview PA-8500 Overview uction Fetch Capabilities

More information

EECS Dept., University of California at Berkeley. Berkeley Wireless Research Center Tel: (510)

EECS Dept., University of California at Berkeley. Berkeley Wireless Research Center Tel: (510) A V Heterogeneous Reconfigurable Processor IC for Baseband Wireless Applications Hui Zhang, Vandana Prabhu, Varghese George, Marlene Wan, Martin Benes, Arthur Abnous, and Jan M. Rabaey EECS Dept., University

More information

Vector IRAM: A Microprocessor Architecture for Media Processing

Vector IRAM: A Microprocessor Architecture for Media Processing IRAM: A Microprocessor Architecture for Media Processing Christoforos E. Kozyrakis kozyraki@cs.berkeley.edu CS252 Graduate Computer Architecture February 10, 2000 Outline Motivation for IRAM technology

More information

The Alpha Microprocessor: Out-of-Order Execution at 600 MHz. Some Highlights

The Alpha Microprocessor: Out-of-Order Execution at 600 MHz. Some Highlights The Alpha 21264 Microprocessor: Out-of-Order ution at 600 MHz R. E. Kessler Compaq Computer Corporation Shrewsbury, MA 1 Some Highlights Continued Alpha performance leadership 600 MHz operation in 0.35u

More information

An Ultra High Performance Scalable DSP Family for Multimedia. Hot Chips 17 August 2005 Stanford, CA Erik Machnicki

An Ultra High Performance Scalable DSP Family for Multimedia. Hot Chips 17 August 2005 Stanford, CA Erik Machnicki An Ultra High Performance Scalable DSP Family for Multimedia Hot Chips 17 August 2005 Stanford, CA Erik Machnicki Media Processing Challenges Increasing performance requirements Need for flexibility &

More information

This Unit: Putting It All Together. CIS 501 Computer Architecture. What is Computer Architecture? Sources

This Unit: Putting It All Together. CIS 501 Computer Architecture. What is Computer Architecture? Sources This Unit: Putting It All Together CIS 501 Computer Architecture Unit 12: Putting It All Together: Anatomy of the XBox 360 Game Console Application OS Compiler Firmware CPU I/O Memory Digital Circuits

More information

EMBEDDED VERTEX SHADER IN FPGA

EMBEDDED VERTEX SHADER IN FPGA EMBEDDED VERTEX SHADER IN FPGA Lars Middendorf, Felix Mühlbauer 1, Georg Umlauf 2, Christophe Bobda 1 1 Self-Organizing Embedded Systems Group, Department of Computer Science University of Kaiserslautern

More information

Spring 2011 Prof. Hyesoon Kim

Spring 2011 Prof. Hyesoon Kim Spring 2011 Prof. Hyesoon Kim Application Geometry Rasterizer CPU Each stage cane be also pipelined The slowest of the pipeline stage determines the rendering speed. Frames per second (fps) Executes on

More information

Gemini: Sanjiv Kapil. A Power-efficient Chip Multi-Threaded (CMT) UltraSPARC Processor. Gemini Architect Sun Microsystems, Inc.

Gemini: Sanjiv Kapil. A Power-efficient Chip Multi-Threaded (CMT) UltraSPARC Processor. Gemini Architect Sun Microsystems, Inc. Gemini: A Power-efficient Chip Multi-Threaded (CMT) UltraSPARC Processor Sanjiv Kapil Gemini Architect Sun Microsystems, Inc. Design Goals Designed for compute-dense, transaction oriented systems (webservers,

More information

CSE 591: GPU Programming. Introduction. Entertainment Graphics: Virtual Realism for the Masses. Computer games need to have: Klaus Mueller

CSE 591: GPU Programming. Introduction. Entertainment Graphics: Virtual Realism for the Masses. Computer games need to have: Klaus Mueller Entertainment Graphics: Virtual Realism for the Masses CSE 591: GPU Programming Introduction Computer games need to have: realistic appearance of characters and objects believable and creative shading,

More information

Threading Hardware in G80

Threading Hardware in G80 ing Hardware in G80 1 Sources Slides by ECE 498 AL : Programming Massively Parallel Processors : Wen-Mei Hwu John Nickolls, NVIDIA 2 3D 3D API: API: OpenGL OpenGL or or Direct3D Direct3D GPU Command &

More information

EMBEDDED VERTEX SHADER IN FPGA

EMBEDDED VERTEX SHADER IN FPGA EMBEDDED VERTEX SHADER IN FPGA Lars Middendorf, Felix Mühlbauer 1, Georg Umlauf 2, Christophe Bobda 1 1 Self-Organizing Embedded Systems Group, Department of Computer Science University of Kaiserslautern

More information

Unit 11: Putting it All Together: Anatomy of the XBox 360 Game Console

Unit 11: Putting it All Together: Anatomy of the XBox 360 Game Console Computer Architecture Unit 11: Putting it All Together: Anatomy of the XBox 360 Game Console Slides originally developed by Milo Martin & Amir Roth at University of Pennsylvania! Computer Architecture

More information

SIMD Parallel Computers

SIMD Parallel Computers CS/ECE 757: Advanced Computer Architecture II (Parallel Computer Architecture) SIMD Computers Copyright 2003 J. E. Smith University of Wisconsin-Madison SIMD Parallel Computers BSP: Classic SIMD number

More information

ARM Processors for Embedded Applications

ARM Processors for Embedded Applications ARM Processors for Embedded Applications Roadmap for ARM Processors ARM Architecture Basics ARM Families AMBA Architecture 1 Current ARM Core Families ARM7: Hard cores and Soft cores Cache with MPU or

More information

Hi Hsiao-Lung Chan, Ph.D. Dept Electrical Engineering Chang Gung University, Taiwan

Hi Hsiao-Lung Chan, Ph.D. Dept Electrical Engineering Chang Gung University, Taiwan Processors Hi Hsiao-Lung Chan, Ph.D. Dept Electrical Engineering Chang Gung University, Taiwan chanhl@maili.cgu.edu.twcgu General-purpose p processor Control unit Controllerr Control/ status Datapath ALU

More information

Issue Logic for a 600-MHz Out-of-Order Execution Microprocessor

Issue Logic for a 600-MHz Out-of-Order Execution Microprocessor IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 33, NO. 5, MAY 1998 707 Issue Logic for a 600-MHz Out-of-Order Execution Microprocessor James A. Farrell and Timothy C. Fischer Abstract The logic and circuits

More information

Standard Graphics Pipeline

Standard Graphics Pipeline Graphics Architecture Software implementations of rendering are slow. OpenGL on Sparc workstations. Performance can be improved using sophisticated algorithms and faster machines. Real-time large-scale

More information

CPE300: Digital System Architecture and Design

CPE300: Digital System Architecture and Design CPE300: Digital System Architecture and Design Fall 2011 MW 17:30-18:45 CBC C316 Arithmetic Unit 10032011 http://www.egr.unlv.edu/~b1morris/cpe300/ 2 Outline Recap Chapter 3 Number Systems Fixed Point

More information

Design of Embedded DSP Processors Unit 5: Data access. 9/11/2017 Unit 5 of TSEA H1 1

Design of Embedded DSP Processors Unit 5: Data access. 9/11/2017 Unit 5 of TSEA H1 1 Design of Embedded DSP Processors Unit 5: Data access 9/11/2017 Unit 5 of TSEA26-2017 H1 1 Data memory in a Processor Store Data FIFO supporting DSP executions Computing buffer Parameter storage Access

More information

EECS 151/251A Fall 2017 Digital Design and Integrated Circuits. Instructor: John Wawrzynek and Nicholas Weaver. Lecture 14 EE141

EECS 151/251A Fall 2017 Digital Design and Integrated Circuits. Instructor: John Wawrzynek and Nicholas Weaver. Lecture 14 EE141 EECS 151/251A Fall 2017 Digital Design and Integrated Circuits Instructor: John Wawrzynek and Nicholas Weaver Lecture 14 EE141 Outline Parallelism EE141 2 Parallelism Parallelism is the act of doing more

More information

Monday Morning. Graphics Hardware

Monday Morning. Graphics Hardware Monday Morning Department of Computer Engineering Graphics Hardware Ulf Assarsson Skärmen består av massa pixlar 3D-Rendering Objects are often made of triangles x,y,z- coordinate for each vertex Y X Z

More information

Introduction to asynchronous circuit design. Motivation

Introduction to asynchronous circuit design. Motivation Introduction to asynchronous circuit design Using slides from: Jordi Cortadella, Universitat Politècnica de Catalunya, Spain Michael Kishinevsky, Intel Corporation, USA Alex Kondratyev, Theseus Logic,

More information

MIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14

MIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14 MIPS Pipelining Computer Organization Architectures for Embedded Computing Wednesday 8 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy 4th Edition, 2011, MK

More information

ECE 154A Introduction to. Fall 2012

ECE 154A Introduction to. Fall 2012 ECE 154A Introduction to Computer Architecture Fall 2012 Dmitri Strukov Lecture 10 Floating point review Pipelined design IEEE Floating Point Format single: 8 bits double: 11 bits single: 23 bits double:

More information

Computer Architecture. Lecture 6.1: Fundamentals of

Computer Architecture. Lecture 6.1: Fundamentals of CS3350B Computer Architecture Winter 2015 Lecture 6.1: Fundamentals of Instructional Level Parallelism Marc Moreno Maza www.csd.uwo.ca/courses/cs3350b [Adapted from lectures on Computer Organization and

More information

Integer Arithmetic. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

Integer Arithmetic. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University Integer Arithmetic Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu EEE3050: Theory on Computer Architectures, Spring 2017, Jinkyu Jeong (jinkyu@skku.edu)

More information

Graphics Hardware. Instructor Stephen J. Guy

Graphics Hardware. Instructor Stephen J. Guy Instructor Stephen J. Guy Overview What is a GPU Evolution of GPU GPU Design Modern Features Programmability! Programming Examples Overview What is a GPU Evolution of GPU GPU Design Modern Features Programmability!

More information

IMAGINE: Signal and Image Processing Using Streams

IMAGINE: Signal and Image Processing Using Streams IMAGINE: Signal and Image Processing Using Streams Brucek Khailany William J. Dally, Scott Rixner, Ujval J. Kapasi, Peter Mattson, Jinyung Namkoong, John D. Owens, Brian Towles Concurrent VLSI Architecture

More information

An Overview of Standard Cell Based Digital VLSI Design

An Overview of Standard Cell Based Digital VLSI Design An Overview of Standard Cell Based Digital VLSI Design With examples taken from the implementation of the 36-core AsAP1 chip and the 1000-core KiloCore chip Zhiyi Yu, Tinoosh Mohsenin, Aaron Stillmaker,

More information

ENGN1640: Design of Computing Systems Topic 06: Advanced Processor Design

ENGN1640: Design of Computing Systems Topic 06: Advanced Processor Design ENGN1640: Design of Computing Systems Topic 06: Advanced Processor Design Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown University

More information

EECS150 - Digital Design Lecture 09 - Parallelism

EECS150 - Digital Design Lecture 09 - Parallelism EECS150 - Digital Design Lecture 09 - Parallelism Feb 19, 2013 John Wawrzynek Spring 2013 EECS150 - Lec09-parallel Page 1 Parallelism Parallelism is the act of doing more than one thing at a time. Optimization

More information

EC 513 Computer Architecture

EC 513 Computer Architecture EC 513 Computer Architecture Complex Pipelining: Superscalar Prof. Michel A. Kinsy Summary Concepts Von Neumann architecture = stored-program computer architecture Self-Modifying Code Princeton architecture

More information

This Unit: Putting It All Together. CIS 371 Computer Organization and Design. What is Computer Architecture? Sources

This Unit: Putting It All Together. CIS 371 Computer Organization and Design. What is Computer Architecture? Sources This Unit: Putting It All Together CIS 371 Computer Organization and Design Unit 15: Putting It All Together: Anatomy of the XBox 360 Game Console Application OS Compiler Firmware CPU I/O Memory Digital

More information

Introduction to Pipelined Datapath

Introduction to Pipelined Datapath 14:332:331 Computer Architecture and Assembly Language Week 12 Introduction to Pipelined Datapath [Adapted from Dave Patterson s UCB CS152 slides and Mary Jane Irwin s PSU CSE331 slides] 331 W12.1 Review:

More information

Venezia: a Scalable Multicore Subsystem for Multimedia Applications

Venezia: a Scalable Multicore Subsystem for Multimedia Applications Venezia: a Scalable Multicore Subsystem for Multimedia Applications Takashi Miyamori Toshiba Corporation Outline Background Venezia Hardware Architecture Venezia Software Architecture Evaluation Chip and

More information

PicoServer : Using 3D Stacking Technology To Enable A Compact Energy Efficient Chip Multiprocessor

PicoServer : Using 3D Stacking Technology To Enable A Compact Energy Efficient Chip Multiprocessor PicoServer : Using 3D Stacking Technology To Enable A Compact Energy Efficient Chip Multiprocessor Taeho Kgil, Shaun D Souza, Ali Saidi, Nathan Binkert, Ronald Dreslinski, Steve Reinhardt, Krisztian Flautner,

More information

AMULET2e. AMULET group. What is it?

AMULET2e. AMULET group. What is it? 2e Jim Garside Department of Computer Science, The University of Manchester, Oxford Road, Manchester, M13 9PL, U.K. http://www.cs.man.ac.uk/amulet/ jgarside@cs.man.ac.uk What is it? 2e is an implementation

More information

CPE300: Digital System Architecture and Design

CPE300: Digital System Architecture and Design CPE300: Digital System Architecture and Design Fall 2011 MW 17:30-18:45 CBC C316 Number Representation 09212011 http://www.egr.unlv.edu/~b1morris/cpe300/ 2 Outline Recap Logic Circuits for Register Transfer

More information

ASSEMBLY LANGUAGE MACHINE ORGANIZATION

ASSEMBLY LANGUAGE MACHINE ORGANIZATION ASSEMBLY LANGUAGE MACHINE ORGANIZATION CHAPTER 3 1 Sub-topics The topic will cover: Microprocessor architecture CPU processing methods Pipelining Superscalar RISC Multiprocessing Instruction Cycle Instruction

More information