AnySP: Anytime Anywhere Anyway Signal Processing

Size: px
Start display at page:

Download "AnySP: Anytime Anywhere Anyway Signal Processing"

Transcription

1 1 AnySP: Anytime Anywhere Anyway Signal Processing Mark Woh 1, Sangwon Seo 1, Scott Mahlke 1,Trevor Mudge 1, Chaitali Chakrabarti 2, Krisztian Flautner 3 University of Michigan ACAL 1 Arizona State University 2 ARM, Ltd. 3 1

2 The Old Modern Mobile Mobile Phone Phone 2 Video Recording Future phones are becoming more complex Richer applications require much more requirements Video Editing How do phones handle this now? Higher Data Rates 3D Rendering Advanced Image Processing Photos From

3 Inside Today s Smart Phones 3 Inside the OMAP330 Application Processor Inside the X-Gold 608 (Representation of QCOM) 3 3

4 Power/Performance Requirements for Multiple Systems P e r f o r m a n c e ( G o p s ) Different applications have different power/performance characteristics! SODA ( 65 nm ) Mobile HD Video 3 G Wireless G Wireless SODA ( 90 nm ) We need to design keeping each application in mind! (Not GPP but Domain Specific Processor) TI C 6 X Imagine VIRAM Pentium M IBM Cell Power ( Watts )

5 5 The Applications Is there anything we can learn from the applications themselves? 5 5

6 H.26 Basics 6 Standard Basis Patterns Basis Coefficients Reconstructed Macroblock Modules Entropy decoder Inverse Transform Motion Compensation Intra Prediction Deblocking Filter Inverse Transform Past Frames Algorithms Context-adaptive VLD x integer kernel Half: 6-tap filter Quarter: 6-tap / bilinear filter Eighth: -tap filter Directional spatial prediction In-loop adaptive filter Residual Data Current Frame Power Profiling * Motion Compensation Future Frames Form Prediction T.-A. Liu, T.-M. Lin, S. -Z. Wang, et al. A low-power dual-mode video decoder for mobile applications, IEEE Communications Magazine, volume, issue 8, pp , Aug % 29% 10% 3% Misc Kernels 23% Decoded Block Deblocking Filter Reconstructed Macroblock Decoded Block Current Uncoded Block Intra-Prediction Prediction Based on Neighbors 6 6

7 G Wireless Basics 7 Ant D/A Guard Insertion IFFT Channel Encoder (LDPC) S/P Transmitted Data Modulator Demodulator Ant A/D Symbol Timing / Guard Detect FFT Channel Estimator MIMO Decoder (STBC) Channel Decoder (LDPC) Recovered Data Three kernels make up the majority of the work FFT Extract Data from Signals STBC Combine Data into More Reliable Stream LDPC Error Correction on Data Stream 7 7

8 8 Mobile Signal Processing Algorithm Characteristics G H.26 Algorithm SIMD Scalar Overhead SIMD Width Amount Workload (%) Workload (%) Workload (%) (Elements) of TLP FFT Low STBC High LDPC Low Deblocking Filter Medium Intra- PredicMon SIMD 85 comes 5 at a cost! Medium Inverse Transform Register 80 File 5 Power 15 8 High MoMon CompensaMon High Data Movement/Alignment Cost SIMD architectures have to deal with this! Algorithms have different SIMD widths From very large to very small Though SIMD width varies all algorithms can exploit it Large percentage of work can be SIMDized Larger SIMD width tend to have less TLP 8 8

9 Only the instructions shown in red are MMX computations. All other instructions are simply supporting these computations. 9 Pentium III SIMD code for Discrete Cosine Transform (DCT) lea ebx, DWORD PTR [ebp+128] load/address overhead mov DWORD PTR [esp+28], ebx load/address overhead $B1$2: xor eax, eax address overhead move dx, ecx address overhead lea edi, DWORD PTR [ecx+16] load/address overhead mov DWORD PTR [esp+2], ecx load/address overhead $B1$3: movq mm1, MMWORD PTR [ebp] load overhead pxor mm0, mm0 initialization overhead pmaddwd mm1, MMWORD PTR [eax+esi] True Computation movq mm2, MMWORD PTR [ebp+8] load overhead pmaddwd mm2, MMWORD PTR [eax+esi+8] True Computation add eax, 16address overhead paddw mm1, mm0 True Computation paddw mm2, mm1 True Computation movq mm0, mm2 load related overhead psrlq mm2, 32 SIMD reduction overhead povd ecx, mm0 SIMD load overhead movd ebx, mm2 SIMD load overhead add ecx, ebx SIMD conversion Overhead mov WORD PTR [edx], cx store overhead add edx, 2 address overhead cmp edi, edx branch related overhead jg $B1$3 loop branch overhead $B1$: move cx, DWORD PTR [esp+2] load/address overhead add ebp, 16 address overhead add ecx, 16 address overhead move ax, DWORD PTR [esp+28] load/address overhead cmp eax, ebp branch related overhead jg $B1$2 loop branch overhead 9 Feb 15, 200

10

11

12

13 Traditional SIMD Power Breakdown 13 SCALAR PIPE 9% INTERCONNECT 2% SODA WCDMA MEM 3% CONTROL 38% REG 37% MEM REG ALU+MULT CONTROL INTERCONNECT SCALAR PIPE ALU+MULT 11% Register File Power consumes a lot of power in traditional 32-wide SIMD architecture 13

14 Register File Access 1 100% 90% 80% 70% 60% 50% 0% 30% 20% 10% 0% Register Access Bypass Read Bypass Write Lots of power wasted on unneeded register file access! FFT STBC LDPC Deblocking Filter Intra- PredicMon Inverse Transform Many of the register file access do not have to go back to the main register file 1 1

15 Instruction Pair Frequency 15 InstrucMon Pair Frequency 1 mul9ply- add 26.71% 2 add- add 13.7% 3 shuffle- add 8.5% shib right- add 6.90% 5 subtract- add 6.9% 6 add- shib right 5.76% 7 mul9ply- subtract.00% 8 shib right- subtract 3.75% 9 add- subtract 3.07% 10 Others 20.5% a ) Intra - prediction and Deblocking Filter Combined InstrucMon Pair Frequency 1 shuffle- move 32.07% 2 abs- subtract 8.5% 3 move- subtract 8.5% shuffle- subtract 3.5% 5 add- shuffle 3.5% 6 Others 3.77% b ) LDPC InstrucMon Pair Frequency 1 shuffle- shuffle 16.67% 2 add- mul9ply 16.67% 3 mul9ply- subtract 16.67% mul9ply- add 16.67% 5 subtract- mult 16.67% 6 shuffle- add 16.67% c ) FFT Like the Multiply-Accumulate (MAC) instruction there is opportunity to fuse other instructions A few instruction pairs (3-5) make up the majority of all instruction pairs! 15 15

16 Data Alignment Problem! Intra- PredicMon 16 A B C D A B C D E F G H I J x16 luma MB prediction modes for each x block X I J K L A B C D A B C D A B C D A B C D F G H I E F G H D E F G C D E F A B C D A B C D E F G A B C D E F G a b c d Traditional SIMD machines take too e f g h A B C D B C D E C D E F D E F G A E B F B F C G C G D D D i j k l long or A Bcost C D E F too G H I Jmuch to do A Bthis C D E F G H I J K L m n o p G H I J A B C D G H I J A B C D Good news A B C D small fixed number patterns per kernel A A A A B B B B C C C C D D D D I E D C J F I E K G J A B C D E F G H I J K L D D D F L H K G I J K L E F G H D I J K C D E F H.26 Intra-prediction has 9 different prediction modes Each prediction mode requires a specific permutation 16 16

17 Summary 17 Conclusion about G and H.26 Lots of different sized parallelism From wide to 96 wide to 102 wide SIMD Which means many different SIMD widths need to be supported Very short lived values Lots of potential for instruction fusings Limited set of shuffle patterns required for each kernel 17

18 18 AnySP Design 18 18

19 Traditional SIMD Architectures 19 AGU 32- lane SIMD ALU SIMD Data MEM SIMD RF E X 32- lane SSN W B SIMD S T V SIMD to scalar V T S Scalar Data MEM scalar RF E X 16-bit ALU W B Scalar 32-Wide SIMD with Simple Shuffle Network 19 19

20 AnySP Architecture High Level 20 L1 Program Memory Controller DMA Multi-Bank Local Memory Bank 0 Bank 1 Bank C R O S S B A R 16-bit 8-wide 16 entry SIMD RF-0 16-bit 8-wide 16 entry SIMD RF-1 Multi-SIMD Datapath 16 Banked Memory with 8-wide SIMD FFU-0 8-wide SIMD FFU-1 Swizzle Network 16-bit 8- wide -entry Buffer-0 16-bit 8- wide -entry Buffer-1 Bank 8 Groups 2 of 8-Wide 16-bit 8-wide Flexible Bank Function 16 Units entry 8-wide SIMD 3 SIMD RF-2 FFU-2 Multiple Output Adder Tree 128x128 16bit Swizzle Network SRAM-based Crossbar 16-bit 8- wide -entry Buffer-2 Multi- Output Adder Tree 8 Groups of 8-wide SIMD (6 Total Lanes) To Inter-PE Bus Bank bit 8-wide 8-wide SIMD 16 entry Temporary Buffer and SIMD RF-7 FFU-7 Bypass Network 16-bit 8- wide -entry Buffer-7 Scalar Memory Buffer Scalar Pipeline AGU Group 0 Pipeline Datapath AGU and Scalar Pipeline AGU Group 1 Pipeline AGU Group 7 Pipeline AnySP PE 20 20

21 Multi-Width Support 21 Multi-Bank Local Memory Multi-SIMD Datapath AGU Group 0 AGU Group 1 AGU Group 2 AGU Group 3 Bank 0 Bank 1 Bank 2 Bank 3 Bank C R O S S B A R 16-bit 8-wide 16 entry SIMD RF-0 16-bit 8-wide 16 entry SIMD RF-1 16-bit 8-wide 16 entry SIMD RF-2 16-bit 8-wide 16 entry SIMD RF-3 8-wide SIMD FFU-0 8-wide SIMD FFU-1 8-wide SIMD FFU-2 8-wide SIMD FFU-3 Swizzle Network 16-bit 8- wide -entry Buffer-0 16-bit 8- wide -entry Buffer-1 16-bit 8- wide -entry Buffer-2 16-bit 8- wide -entry Buffer-3 Multi- Output Adder Tree AGU Group 7 Bank bit 8-wide 16 entry SIMD RF-7 8-wide SIMD FFU-7 16-bit 8- wide -entry Buffer-7 Each 8-wide SIMD Group works on different memory Normal 6-Wide SIMD mode all lanes share one AGU locations of the same 8-wide code AGU Offsets 21 21

22 AnySP FFU Datapath 22 Flexible Functional Unit allows us to 1. Exploit Pipeline-parallelism by joining two lanes together 2. Handle register bypass and the temporary buffer 3. Join multiple pipelines to process deeper subgraphs. Fuse Instruction Pairs 22 22

23 23 AnySP Results 23 23

24 Simulation Environment Traditional SIMD architecture comparison SODA at 90nm technology 2 AnySP Synthesized at 90nm TSMC Power, timing, area numbers were extracted Performance and Power for each kernel was generated using synthesized data on in-house simulator G based on a NTT DoCoMo G test setup H.26 CIF@30fps 2 2

25 25 AnySP Speedup vs SIMD-based Architecture N o r m a l i z e d S p e e d u p Baseline 6 - Wide Multi - SIMD Swizzle Network Flexible Functional Unit Buffer + Bypass 0. 0 FFT 102 pt Radix - 2 FFT 102 pt Radix - STBC LDPC H. 26 Intra Prediction H. 26 Deblocking Filter H. 26 Inverse Transform H. 26 Motion Compensation For all benchmarks we perform more than 2x better than a SIMD-based architecture 25 25

26 AnySP Energy-Delay vs SIMD-based Architecture 26 N o r m a l i z e d E n e r g y - D e l a y FFT 102 pt Radix - 2 FFT 102 pt Radix - SIMD - based Architecture AnySP STBC LDPC H. 26 Intra Prediction H. 26 Deblocking Filter H. 26 Inverse Transform H. 26 Motion Compenstation More importantly energy efficiency is much better! 26 26

27 AnySP Power Breakdown 27 PE System Total Est. Components SIMD Data Mem ( 32 KB ) SIMD Register File ( 16 x 102 bit ) SIMD ALUs, Multipliers, and SSN SIMD Pipeline + Clock + Routing SIMD Buffer ( 128 B ) SIMD Adder Tree Intra - processor Interconnect Scalar / AGU Pipeline & Misc. ARM ( Cortex - M 3 ) Global Scratchpad Memory ( 128 KB ) Inter - processor Bus with DMA 90 nm ( MHz ) Units Area mm 2 65 nm ( MHz ) nm ( MHz ) Area Area % % % % % % < 1 % % % % % % % G + H. 26 Decoder Power Power mw % % % % % % < 1 % % % < 1 % < 1 % 1. 5 < 1 % % We estimate that both H.26 and G wireless can be done in under 1 Watt at 5nm 27 27

28 Conclusion & Future Work Conclusion We have presented an example architecture that could possibly meet the requirements of 100Mbps G and HD video on the same platform Under the power budget and meeting the performance at 5nm 28 Future and Ongoing Work Application-specific language Larger class of algorithms for AnySP Better utilization of resources for non-parallel kernels Speedup sequential parts University 28 of Michigan - ACAL 28

Low Power DSP Architectures

Low Power DSP Architectures Overview 1 AnySP SODA++ Increase the Application Domain of the Wireless baseband architecture Diet-SODA SODA- - Taking the sugar out of SODA 1 1 2 Low Power DSP Architectures Trevor Mudge, Bredt Professor

More information

AnySP: Anytime Anywhere Anyway Signal Processing

AnySP: Anytime Anywhere Anyway Signal Processing AnySP: Anytime Anywhere Anyway Signal Processing Mark Woh, Sangwon Seo, Scott Mahlke, Trevor Mudge, Chaitali Chakrabarti 2 and Krisztian Flautner 3 Advanced Computer Architecture Laboratory 2 Department

More information

An Ultra Low Power SIMD Processor for Wireless Devices

An Ultra Low Power SIMD Processor for Wireless Devices An Ultra Low Power SIMD Processor for Wireless Devices Mark Woh 1, Sangwon Seo 1, Chaitali Chakrabarti 2, Scott Mahlke 1 and Trevor Mudge 1 1 Advanced Computer Architecture Laboratory 2 School of Electrical,

More information

ANYSP: ANYTIME ANYWHERE ANYWAY SIGNAL PROCESSING

ANYSP: ANYTIME ANYWHERE ANYWAY SIGNAL PROCESSING ... ANYSP: ANYTIME ANYWHERE ANYWAY SIGNAL PROCESSING... LOOKING FORWARD, THE COMPUTATION REQUIREMENTS OF MOBILE DEVICES WILL INCREASE BY ONE TO TWO ORDERS OF MAGNITUDE, BUT THEIR POWER REQUIREMENTS WILL

More information

ARCHITECTURE AND ANALYSIS FOR NEXT GENERATION MOBILE SIGNAL PROCESSING. Mark Woh

ARCHITECTURE AND ANALYSIS FOR NEXT GENERATION MOBILE SIGNAL PROCESSING. Mark Woh ARCHITECTURE AND ANALYSIS FOR NEXT GENERATION MOBILE SIGNAL PROCESSING by Mark Woh A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Electrical

More information

Workload Characterization: Can it save Computer Architecture and Performance Evaluation?

Workload Characterization: Can it save Computer Architecture and Performance Evaluation? Workload Characterization: Can it save Computer Architecture and Performance Evaluation? Lizy Kurian John The Laboratory for Computer Architecture (LCA) ECE Department, UT Austin ljohn@ece.utexas.edu (512)-232-1455

More information

A scalable, fixed-shuffling, parallel FFT butterfly processing architecture for SDR environment

A scalable, fixed-shuffling, parallel FFT butterfly processing architecture for SDR environment LETTER IEICE Electronics Express, Vol.11, No.2, 1 9 A scalable, fixed-shuffling, parallel FFT butterfly processing architecture for SDR environment Ting Chen a), Hengzhu Liu, and Botao Zhang College of

More information

Customizing Wide-SIMD Architectures for H.264

Customizing Wide-SIMD Architectures for H.264 Customizing Wide- Architectures for H.264 S. Seo, M. Woh, S. Mahlke, T. Mudge Department of Electrical and Computer Engineering University of Michigan, Ann Arbor, MI 48109 Email: swseo,mwoh,mahlke,tnm@umich.edu

More information

A System Solution for High-Performance, Low Power SDR

A System Solution for High-Performance, Low Power SDR A System Solution for High-Performance, Low Power SDR Yuan Lin 1, Hyunseok Lee 1, Yoav Harel 1, Mark Woh 1, Scott Mahlke 1, Trevor Mudge 1 and Krisztián Flautner 2 1 Advanced Computer Architecture Laboratory

More information

Accelerating 3D Geometry Transformation with Intel MMX TM Technology

Accelerating 3D Geometry Transformation with Intel MMX TM Technology Accelerating 3D Geometry Transformation with Intel MMX TM Technology ECE 734 Project Report by Pei Qi Yang Wang - 1 - Content 1. Abstract 2. Introduction 2.1 3 -Dimensional Object Geometry Transformation

More information

Multimedia Decoder Using the Nios II Processor

Multimedia Decoder Using the Nios II Processor Multimedia Decoder Using the Nios II Processor Third Prize Multimedia Decoder Using the Nios II Processor Institution: Participants: Instructor: Indian Institute of Science Mythri Alle, Naresh K. V., Svatantra

More information

Dynamic Translation for EPIC Architectures

Dynamic Translation for EPIC Architectures Dynamic Translation for EPIC Architectures David R. Ditzel Chief Architect for Hybrid Computing, VP IAG Intel Corporation Presentation for 8 th Workshop on EPIC Architectures April 24, 2010 1 Dynamic Translation

More information

VLSI Signal Processing

VLSI Signal Processing VLSI Signal Processing Programmable DSP Architectures Chih-Wei Liu VLSI Signal Processing Lab Department of Electronics Engineering National Chiao Tung University Outline DSP Arithmetic Stream Interface

More information

Using MMX Instructions to Implement a Synthesis Sub-Band Filter for MPEG Audio Decoding

Using MMX Instructions to Implement a Synthesis Sub-Band Filter for MPEG Audio Decoding Using MMX Instructions to Implement a Synthesis Sub-Band Filter for MPEG Audio Information for Developers and ISVs From Intel Developer Services www.intel.com/ids Information in this document is provided

More information

ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-11: 80x86 Architecture

ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-11: 80x86 Architecture ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-11: 80x86 Architecture 1 The 80x86 architecture processors popular since its application in IBM PC (personal computer). 2 First Four generations

More information

6x86 PROCESSOR Superscalar, Superpipelined, Sixth-generation, x86 Compatible CPU

6x86 PROCESSOR Superscalar, Superpipelined, Sixth-generation, x86 Compatible CPU 1-6x86 PROCESSOR Superscalar, Superpipelined, Sixth-generation, x86 Compatible CPU Product Overview Introduction 1. ARCHITECTURE OVERVIEW The Cyrix 6x86 CPU is a leader in the sixth generation of high

More information

Datapoint 2200 IA-32. main memory. components. implemented by Intel in the Nicholas FitzRoy-Dale

Datapoint 2200 IA-32. main memory. components. implemented by Intel in the Nicholas FitzRoy-Dale Datapoint 2200 IA-32 Nicholas FitzRoy-Dale At the forefront of the computer revolution - Intel Difficult to explain and impossible to love - Hennessy and Patterson! Released 1970! 2K shift register main

More information

Power Estimation of UVA CS754 CMP Architecture

Power Estimation of UVA CS754 CMP Architecture Introduction Power Estimation of UVA CS754 CMP Architecture Mateja Putic mateja@virginia.edu Early power analysis has become an essential part of determining the feasibility of microprocessor design. As

More information

Advanced Computer Architecture

Advanced Computer Architecture Advanced Computer Architecture Chapter 1 Introduction into the Sequential and Pipeline Instruction Execution Martin Milata What is a Processors Architecture Instruction Set Architecture (ISA) Describes

More information

MPEG-4: Simple Profile (SP)

MPEG-4: Simple Profile (SP) MPEG-4: Simple Profile (SP) I-VOP (Intra-coded rectangular VOP, progressive video format) P-VOP (Inter-coded rectangular VOP, progressive video format) Short Header mode (compatibility with H.263 codec)

More information

Mapping the AVS Video Decoder on a Heterogeneous Dual-Core SIMD Processor. NikolaosBellas, IoannisKatsavounidis, Maria Koziri, Dimitris Zacharis

Mapping the AVS Video Decoder on a Heterogeneous Dual-Core SIMD Processor. NikolaosBellas, IoannisKatsavounidis, Maria Koziri, Dimitris Zacharis Mapping the AVS Video Decoder on a Heterogeneous Dual-Core SIMD Processor NikolaosBellas, IoannisKatsavounidis, Maria Koziri, Dimitris Zacharis University of Thessaly Greece 1 Outline Introduction to AVS

More information

EECE.3170: Microprocessor Systems Design I Summer 2017 Homework 4 Solution

EECE.3170: Microprocessor Systems Design I Summer 2017 Homework 4 Solution 1. (40 points) Write the following subroutine in x86 assembly: Recall that: int f(int v1, int v2, int v3) { int x = v1 + v2; urn (x + v3) * (x v3); Subroutine arguments are passed on the stack, and can

More information

Processor Structure and Function

Processor Structure and Function WEEK 4 + Chapter 14 Processor Structure and Function + Processor Organization Processor Requirements: Fetch instruction The processor reads an instruction from memory (register, cache, main memory) Interpret

More information

Using MMX Instructions to Compute the AbsoluteDifference in Motion Estimation

Using MMX Instructions to Compute the AbsoluteDifference in Motion Estimation Using MMX Instructions to Compute the AbsoluteDifference in Motion Estimation Information for Developers and ISVs From Intel Developer Services www.intel.com/ids Information in this document is provided

More information

Scalable Multi-DM642-based MPEG-2 to H.264 Transcoder. Arvind Raman, Sriram Sethuraman Ittiam Systems (Pvt.) Ltd. Bangalore, India

Scalable Multi-DM642-based MPEG-2 to H.264 Transcoder. Arvind Raman, Sriram Sethuraman Ittiam Systems (Pvt.) Ltd. Bangalore, India Scalable Multi-DM642-based MPEG-2 to H.264 Transcoder Arvind Raman, Sriram Sethuraman Ittiam Systems (Pvt.) Ltd. Bangalore, India Outline of Presentation MPEG-2 to H.264 Transcoding Need for a multiprocessor

More information

Vector IRAM: A Microprocessor Architecture for Media Processing

Vector IRAM: A Microprocessor Architecture for Media Processing IRAM: A Microprocessor Architecture for Media Processing Christoforos E. Kozyrakis kozyraki@cs.berkeley.edu CS252 Graduate Computer Architecture February 10, 2000 Outline Motivation for IRAM technology

More information

MAPPING VIDEO CODECS TO HETEROGENEOUS ARCHITECTURES. Mauricio Alvarez-Mesa Techische Universität Berlin - Spin Digital MULTIPROG 2015

MAPPING VIDEO CODECS TO HETEROGENEOUS ARCHITECTURES. Mauricio Alvarez-Mesa Techische Universität Berlin - Spin Digital MULTIPROG 2015 MAPPING VIDEO CODECS TO HETEROGENEOUS ARCHITECTURES Mauricio Alvarez-Mesa Techische Universität Berlin - Spin Digital MULTIPROG 2015 Video Codecs 70% of internet traffic will be video in 2018 [CISCO] Video

More information

Flexible wireless communication architectures

Flexible wireless communication architectures Flexible wireless communication architectures Sridhar Rajagopal Department of Electrical and Computer Engineering Rice University, Houston TX Faculty Candidate Seminar Southern Methodist University April

More information

IA-32 Architecture COE 205. Computer Organization and Assembly Language. Computer Engineering Department

IA-32 Architecture COE 205. Computer Organization and Assembly Language. Computer Engineering Department IA-32 Architecture COE 205 Computer Organization and Assembly Language Computer Engineering Department King Fahd University of Petroleum and Minerals Presentation Outline Basic Computer Organization Intel

More information

Versal: AI Engine & Programming Environment

Versal: AI Engine & Programming Environment Engineering Director, Xilinx Silicon Architecture Group Versal: Engine & Programming Environment Presented By Ambrose Finnerty Xilinx DSP Technical Marketing Manager October 16, 2018 MEMORY MEMORY MEMORY

More information

History of the Intel 80x86

History of the Intel 80x86 Intel s IA-32 Architecture Cptr280 Dr Curtis Nelson History of the Intel 80x86 1971 - Intel invents the microprocessor, the 4004 1975-8080 introduced 8-bit microprocessor 1978-8086 introduced 16 bit microprocessor

More information

William Stallings Computer Organization and Architecture 10 th Edition Pearson Education, Inc., Hoboken, NJ. All rights reserved.

William Stallings Computer Organization and Architecture 10 th Edition Pearson Education, Inc., Hoboken, NJ. All rights reserved. + William Stallings Computer Organization and Architecture 10 th Edition 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved. 2 + Chapter 14 Processor Structure and Function + Processor Organization

More information

Digital Video Processing

Digital Video Processing Video signal is basically any sequence of time varying images. In a digital video, the picture information is digitized both spatially and temporally and the resultant pixel intensities are quantized.

More information

Assembly Language Programming Introduction

Assembly Language Programming Introduction Assembly Language Programming Introduction October 10, 2017 Motto: R7 is used by the processor as its program counter (PC). It is recommended that R7 not be used as a stack pointer. Source: PDP-11 04/34/45/55

More information

CS241 Computer Organization Spring 2015 IA

CS241 Computer Organization Spring 2015 IA CS241 Computer Organization Spring 2015 IA-32 2-10 2015 Outline! Review HW#3 and Quiz#1! More on Assembly (IA32) move instruction (mov) memory address computation arithmetic & logic instructions (add,

More information

Vertex Shader Design I

Vertex Shader Design I The following content is extracted from the paper shown in next page. If any wrong citation or reference missing, please contact ldvan@cs.nctu.edu.tw. I will correct the error asap. This course used only

More information

Lecture 12. Motivation. Designing for Low Power: Approaches. Architectures for Low Power: Transmeta s Crusoe Processor

Lecture 12. Motivation. Designing for Low Power: Approaches. Architectures for Low Power: Transmeta s Crusoe Processor Lecture 12 Architectures for Low Power: Transmeta s Crusoe Processor Motivation Exponential performance increase at a low cost However, for some application areas low power consumption is more important

More information

Hardware and Software Architecture. Chapter 2

Hardware and Software Architecture. Chapter 2 Hardware and Software Architecture Chapter 2 1 Basic Components The x86 processor communicates with main memory and I/O devices via buses Data bus for transferring data Address bus for the address of a

More information

CSC 2400: Computer Systems. Towards the Hardware: Machine-Level Representation of Programs

CSC 2400: Computer Systems. Towards the Hardware: Machine-Level Representation of Programs CSC 2400: Computer Systems Towards the Hardware: Machine-Level Representation of Programs Towards the Hardware High-level language (Java) High-level language (C) assembly language machine language (IA-32)

More information

Assembly Language. Lecture 2 - x86 Processor Architecture. Ahmed Sallam

Assembly Language. Lecture 2 - x86 Processor Architecture. Ahmed Sallam Assembly Language Lecture 2 - x86 Processor Architecture Ahmed Sallam Introduction to the course Outcomes of Lecture 1 Always check the course website Don t forget the deadline rule!! Motivations for studying

More information

Real instruction set architectures. Part 2: a representative sample

Real instruction set architectures. Part 2: a representative sample Real instruction set architectures Part 2: a representative sample Some historical architectures VAX: Digital s line of midsize computers, dominant in academia in the 70s and 80s Characteristics: Variable-length

More information

UMBC. A register, an immediate or a memory address holding the values on. Stores a symbolic name for the memory location that it represents.

UMBC. A register, an immediate or a memory address holding the values on. Stores a symbolic name for the memory location that it represents. Intel Assembly Format of an assembly instruction: LABEL OPCODE OPERANDS COMMENT DATA1 db 00001000b ;Define DATA1 as decimal 8 START: mov eax, ebx ;Copy ebx to eax LABEL: Stores a symbolic name for the

More information

General Purpose Signal Processors

General Purpose Signal Processors General Purpose Signal Processors First announced in 1978 (AMD) for peripheral computation such as in printers, matured in early 80 s (TMS320 series). General purpose vs. dedicated architectures: Pros:

More information

ISSCC 2006 / SESSION 22 / LOW POWER MULTIMEDIA / 22.1

ISSCC 2006 / SESSION 22 / LOW POWER MULTIMEDIA / 22.1 ISSCC 26 / SESSION 22 / LOW POWER MULTIMEDIA / 22.1 22.1 A 125µW, Fully Scalable MPEG-2 and H.264/AVC Video Decoder for Mobile Applications Tsu-Ming Liu 1, Ting-An Lin 2, Sheng-Zen Wang 2, Wen-Ping Lee

More information

Memory Models. Registers

Memory Models. Registers Memory Models Most machines have a single linear address space at the ISA level, extending from address 0 up to some maximum, often 2 32 1 bytes or 2 64 1 bytes. Some machines have separate address spaces

More information

Process Variation in Near-Threshold Wide SIMD Architectures

Process Variation in Near-Threshold Wide SIMD Architectures Process Variation in Near-Threshold Wide SIMD Architectures Sangwon Seo 1, Ronald G. Dreslinski 1, Mark Woh 1, Yongjun Park 1, Chaitali Charkrabari 2, Scott Mahlke 1, David Blaauw 1, Trevor Mudge 1 1 University

More information

Age nda. Intel PXA27x Processor Family: An Applications Processor for Phone and PDA applications

Age nda. Intel PXA27x Processor Family: An Applications Processor for Phone and PDA applications Intel PXA27x Processor Family: An Applications Processor for Phone and PDA applications N.C. Paver PhD Architect Intel Corporation Hot Chips 16 August 2004 Age nda Overview of the Intel PXA27X processor

More information

High Efficiency Video Coding. Li Li 2016/10/18

High Efficiency Video Coding. Li Li 2016/10/18 High Efficiency Video Coding Li Li 2016/10/18 Email: lili90th@gmail.com Outline Video coding basics High Efficiency Video Coding Conclusion Digital Video A video is nothing but a number of frames Attributes

More information

INSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing

INSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing UNIVERSIDADE TÉCNICA DE LISBOA INSTITUTO SUPERIOR TÉCNICO Departamento de Engenharia Informática for Embedded Computing MEIC-A, MEIC-T, MERC Lecture Slides Version 3.0 - English Lecture 22 Title: and Extended

More information

CS 16: Assembly Language Programming for the IBM PC and Compatibles

CS 16: Assembly Language Programming for the IBM PC and Compatibles CS 16: Assembly Language Programming for the IBM PC and Compatibles Discuss the general concepts Look at IA-32 processor architecture and memory management Dive into 64-bit processors Explore the components

More information

Advanced Video Coding: The new H.264 video compression standard

Advanced Video Coding: The new H.264 video compression standard Advanced Video Coding: The new H.264 video compression standard August 2003 1. Introduction Video compression ( video coding ), the process of compressing moving images to save storage space and transmission

More information

Load Effective Address Part I Written By: Vandad Nahavandi Pour Web-site:

Load Effective Address Part I Written By: Vandad Nahavandi Pour   Web-site: Load Effective Address Part I Written By: Vandad Nahavandi Pour Email: AlexiLaiho.cob@GMail.com Web-site: http://www.asmtrauma.com 1 Introduction One of the instructions that is well known to Assembly

More information

An Efficient Vector/Matrix Multiply Routine using MMX Technology

An Efficient Vector/Matrix Multiply Routine using MMX Technology An Efficient Vector/Matrix Multiply Routine using MMX Technology Information for Developers and ISVs From Intel Developer Services www.intel.com/ids Information in this document is provided in connection

More information

H.264 Decoding. University of Central Florida

H.264 Decoding. University of Central Florida 1 Optimization Example: H.264 inverse transform Interprediction Intraprediction In-Loop Deblocking Render Interprediction filter data from previously decoded frames Deblocking filter out block edges Today:

More information

IA-32 Architecture. Computer Organization and Assembly Languages Yung-Yu Chuang 2005/10/6. with slides by Kip Irvine and Keith Van Rhein

IA-32 Architecture. Computer Organization and Assembly Languages Yung-Yu Chuang 2005/10/6. with slides by Kip Irvine and Keith Van Rhein IA-32 Architecture Computer Organization and Assembly Languages Yung-Yu Chuang 2005/10/6 with slides by Kip Irvine and Keith Van Rhein Virtual machines Abstractions for computers High-Level Language Level

More information

A 50Mvertices/s Graphics Processor with Fixed-Point Programmable Vertex Shader for Mobile Applications

A 50Mvertices/s Graphics Processor with Fixed-Point Programmable Vertex Shader for Mobile Applications A 50Mvertices/s Graphics Processor with Fixed-Point Programmable Vertex Shader for Mobile Applications Ju-Ho Sohn, Jeong-Ho Woo, Min-Wuk Lee, Hye-Jung Kim, Ramchan Woo, Hoi-Jun Yoo Semiconductor System

More information

Assembly Language. Lecture 2 x86 Processor Architecture

Assembly Language. Lecture 2 x86 Processor Architecture Assembly Language Lecture 2 x86 Processor Architecture Ahmed Sallam Slides based on original lecture slides by Dr. Mahmoud Elgayyar Introduction to the course Outcomes of Lecture 1 Always check the course

More information

CSC 8400: Computer Systems. Machine-Level Representation of Programs

CSC 8400: Computer Systems. Machine-Level Representation of Programs CSC 8400: Computer Systems Machine-Level Representation of Programs Towards the Hardware High-level language (Java) High-level language (C) assembly language machine language (IA-32) 1 Compilation Stages

More information

IMPLEMENTATION OF H.264 DECODER ON SANDBLASTER DSP Vaidyanathan Ramadurai, Sanjay Jinturkar, Mayan Moudgill, John Glossner

IMPLEMENTATION OF H.264 DECODER ON SANDBLASTER DSP Vaidyanathan Ramadurai, Sanjay Jinturkar, Mayan Moudgill, John Glossner IMPLEMENTATION OF H.264 DECODER ON SANDBLASTER DSP Vaidyanathan Ramadurai, Sanjay Jinturkar, Mayan Moudgill, John Glossner Sandbridge Technologies, 1 North Lexington Avenue, White Plains, NY 10601 sjinturkar@sandbridgetech.com

More information

Venezia: a Scalable Multicore Subsystem for Multimedia Applications

Venezia: a Scalable Multicore Subsystem for Multimedia Applications Venezia: a Scalable Multicore Subsystem for Multimedia Applications Takashi Miyamori Toshiba Corporation Outline Background Venezia Hardware Architecture Venezia Software Architecture Evaluation Chip and

More information

On-Chip Vector Coprocessor Sharing for Multicores

On-Chip Vector Coprocessor Sharing for Multicores On-Chip Vector Coprocessor Sharing for Multicores Spiridon F. Beldianu and Sotirios G. Ziavras Electrical and Computer Engineering Department, New Jersey Institute of Technology, Newark, NJ, USA sfb2@njit.edu,

More information

RISC I from Berkeley. 44k Transistors 1Mhz 77mm^2

RISC I from Berkeley. 44k Transistors 1Mhz 77mm^2 The Case for RISC RISC I from Berkeley 44k Transistors 1Mhz 77mm^2 2 MIPS: A Classic RISC ISA Instructions 4 bytes (32 bits) 4-byte aligned Instructions operate on memory and registers Memory Data types

More information

A 167-processor Computational Array for Highly-Efficient DSP and Embedded Application Processing

A 167-processor Computational Array for Highly-Efficient DSP and Embedded Application Processing A 167-processor Computational Array for Highly-Efficient DSP and Embedded Application Processing Dean Truong, Wayne Cheng, Tinoosh Mohsenin, Zhiyi Yu, Toney Jacobson, Gouri Landge, Michael Meeuwsen, Christine

More information

The Microprocessor and its Architecture

The Microprocessor and its Architecture The Microprocessor and its Architecture Contents Internal architecture of the Microprocessor: The programmer s model, i.e. The registers model The processor model (organization) Real mode memory addressing

More information

High performance, power-efficient DSPs based on the TI C64x

High performance, power-efficient DSPs based on the TI C64x High performance, power-efficient DSPs based on the TI C64x Sridhar Rajagopal, Joseph R. Cavallaro, Scott Rixner Rice University {sridhar,cavallar,rixner}@rice.edu RICE UNIVERSITY Recent (2003) Research

More information

AVS VIDEO DECODING ACCELERATION ON ARM CORTEX-A WITH NEON

AVS VIDEO DECODING ACCELERATION ON ARM CORTEX-A WITH NEON AVS VIDEO DECODING ACCELERATION ON ARM CORTEX-A WITH NEON Jie Wan 1, Ronggang Wang 1, Hao Lv 1, Lei Zhang 1, Wenmin Wang 1, Chenchen Gu 3, Quanzhan Zheng 3 and Wen Gao 2 1 School of Computer & Information

More information

We can study computer architectures by starting with the basic building blocks. Adders, decoders, multiplexors, flip-flops, registers,...

We can study computer architectures by starting with the basic building blocks. Adders, decoders, multiplexors, flip-flops, registers,... COMPUTER ARCHITECTURE II: MICROPROCESSOR PROGRAMMING We can study computer architectures by starting with the basic building blocks Transistors and logic gates To build more complex circuits Adders, decoders,

More information

Static, multiple-issue (superscaler) pipelines

Static, multiple-issue (superscaler) pipelines Static, multiple-issue (superscaler) pipelines Start more than one instruction in the same cycle Instruction Register file EX + MEM + WB PC Instruction Register file EX + MEM + WB 79 A static two-issue

More information

Functional modeling style for efficient SW code generation of video codec applications

Functional modeling style for efficient SW code generation of video codec applications Functional modeling style for efficient SW code generation of video codec applications Sang-Il Han 1)2) Soo-Ik Chae 1) Ahmed. A. Jerraya 2) SD Group 1) SLS Group 2) Seoul National Univ., Korea TIMA laboratory,

More information

Using MMX Instructions to Perform Simple Vector Operations

Using MMX Instructions to Perform Simple Vector Operations Using MMX Instructions to Perform Simple Vector Operations Information for Developers and ISVs From Intel Developer Services www.intel.com/ids Information in this document is provided in connection with

More information

PicoServer : Using 3D Stacking Technology To Enable A Compact Energy Efficient Chip Multiprocessor

PicoServer : Using 3D Stacking Technology To Enable A Compact Energy Efficient Chip Multiprocessor PicoServer : Using 3D Stacking Technology To Enable A Compact Energy Efficient Chip Multiprocessor Taeho Kgil, Shaun D Souza, Ali Saidi, Nathan Binkert, Ronald Dreslinski, Steve Reinhardt, Krisztian Flautner,

More information

EECE416 :Microcomputer Fundamentals and Design. X86 Assembly Programming Part 1. Dr. Charles Kim

EECE416 :Microcomputer Fundamentals and Design. X86 Assembly Programming Part 1. Dr. Charles Kim EECE416 :Microcomputer Fundamentals and Design X86 Assembly Programming Part 1 Dr. Charles Kim Department of Electrical and Computer Engineering Howard University www.mwftr.com 1 Multiple Address Access

More information

Low-Level Essentials for Understanding Security Problems Aurélien Francillon

Low-Level Essentials for Understanding Security Problems Aurélien Francillon Low-Level Essentials for Understanding Security Problems Aurélien Francillon francill@eurecom.fr Computer Architecture The modern computer architecture is based on Von Neumann Two main parts: CPU (Central

More information

Moodle WILLINGDON COLLEGE SANGLI (B. SC.-II) Digital Electronics

Moodle WILLINGDON COLLEGE SANGLI (B. SC.-II) Digital Electronics Moodle 4 WILLINGDON COLLEGE SANGLI (B. SC.-II) Digital Electronics Advanced Microprocessors and Introduction to Microcontroller Moodle developed By Dr. S. R. Kumbhar Department of Electronics Willingdon

More information

CPU Architecture Overview. Varun Sampath CIS 565 Spring 2012

CPU Architecture Overview. Varun Sampath CIS 565 Spring 2012 CPU Architecture Overview Varun Sampath CIS 565 Spring 2012 Objectives Performance tricks of a modern CPU Pipelining Branch Prediction Superscalar Out-of-Order (OoO) Execution Memory Hierarchy Vector Operations

More information

EITF20: Computer Architecture Part2.2.1: Pipeline-1

EITF20: Computer Architecture Part2.2.1: Pipeline-1 EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle

More information

Introduction to Video Compression

Introduction to Video Compression Insight, Analysis, and Advice on Signal Processing Technology Introduction to Video Compression Jeff Bier Berkeley Design Technology, Inc. info@bdti.com http://www.bdti.com Outline Motivation and scope

More information

Storage I/O Summary. Lecture 16: Multimedia and DSP Architectures

Storage I/O Summary. Lecture 16: Multimedia and DSP Architectures Storage I/O Summary Storage devices Storage I/O Performance Measures» Throughput» Response time I/O Benchmarks» Scaling to track technological change» Throughput with restricted response time is normal

More information

Overview. Videos are everywhere. But can take up large amounts of resources. Exploit redundancy to reduce file size

Overview. Videos are everywhere. But can take up large amounts of resources. Exploit redundancy to reduce file size Overview Videos are everywhere But can take up large amounts of resources Disk space Memory Network bandwidth Exploit redundancy to reduce file size Spatial Temporal General lossless compression Huffman

More information

Computer Processors. Part 2. Components of a Processor. Execution Unit The ALU. Execution Unit. The Brains of the Box. Processors. Execution Unit (EU)

Computer Processors. Part 2. Components of a Processor. Execution Unit The ALU. Execution Unit. The Brains of the Box. Processors. Execution Unit (EU) Part 2 Computer Processors Processors The Brains of the Box Computer Processors Components of a Processor The Central Processing Unit (CPU) is the most complex part of a computer In fact, it is the computer

More information

Registers. Ray Seyfarth. September 8, Bit Intel Assembly Language c 2011 Ray Seyfarth

Registers. Ray Seyfarth. September 8, Bit Intel Assembly Language c 2011 Ray Seyfarth Registers Ray Seyfarth September 8, 2011 Outline 1 Register basics 2 Moving a constant into a register 3 Moving a value from memory into a register 4 Moving values from a register into memory 5 Moving

More information

16.317: Microprocessor Systems Design I Fall 2015

16.317: Microprocessor Systems Design I Fall 2015 16.317: Microprocessor Systems Design I Fall 2015 Exam 2 Solution 1. (16 points, 4 points per part) Multiple choice For each of the multiple choice questions below, clearly indicate your response by circling

More information

administrivia today start assembly probably won t finish all these slides Assignment 4 due tomorrow any questions?

administrivia today start assembly probably won t finish all these slides Assignment 4 due tomorrow any questions? administrivia today start assembly probably won t finish all these slides Assignment 4 due tomorrow any questions? exam on Wednesday today s material not on the exam 1 Assembly Assembly is programming

More information

Creating a Scalable Microprocessor:

Creating a Scalable Microprocessor: Creating a Scalable Microprocessor: A 16-issue Multiple-Program-Counter Microprocessor With Point-to-Point Scalar Operand Network Michael Bedford Taylor J. Kim, J. Miller, D. Wentzlaff, F. Ghodrat, B.

More information

Realizing Software Defined Radio A Study in Designing Mobile Supercomputers

Realizing Software Defined Radio A Study in Designing Mobile Supercomputers Realizing Software Defined Radio A Study in Designing Mobile Supercomputers by Yuan Lin A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Computer

More information

An HEVC Fractional Interpolation Hardware Using Memory Based Constant Multiplication

An HEVC Fractional Interpolation Hardware Using Memory Based Constant Multiplication 2018 IEEE International Conference on Consumer Electronics (ICCE) An HEVC Fractional Interpolation Hardware Using Memory Based Constant Multiplication Ahmet Can Mert, Ercan Kalali, Ilker Hamzaoglu Faculty

More information

Using MMX Instructions to Implement 2D Sprite Overlay

Using MMX Instructions to Implement 2D Sprite Overlay Using MMX Instructions to Implement 2D Sprite Overlay Information for Developers and ISVs From Intel Developer Services www.intel.com/ids Information in this document is provided in connection with Intel

More information

Assembly Language for Intel-Based Computers, 4 th Edition. Chapter 2: IA-32 Processor Architecture Included elements of the IA-64 bit

Assembly Language for Intel-Based Computers, 4 th Edition. Chapter 2: IA-32 Processor Architecture Included elements of the IA-64 bit Assembly Language for Intel-Based Computers, 4 th Edition Kip R. Irvine Chapter 2: IA-32 Processor Architecture Included elements of the IA-64 bit Slides prepared by Kip R. Irvine Revision date: 09/25/2002

More information

EITF20: Computer Architecture Part2.2.1: Pipeline-1

EITF20: Computer Architecture Part2.2.1: Pipeline-1 EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle

More information

EN164: Design of Computing Systems Lecture 24: Processor / ILP 5

EN164: Design of Computing Systems Lecture 24: Processor / ILP 5 EN164: Design of Computing Systems Lecture 24: Processor / ILP 5 Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown University

More information

ECE 417 Guest Lecture Video Compression in MPEG-1/2/4. Min-Hsuan Tsai Apr 02, 2013

ECE 417 Guest Lecture Video Compression in MPEG-1/2/4. Min-Hsuan Tsai Apr 02, 2013 ECE 417 Guest Lecture Video Compression in MPEG-1/2/4 Min-Hsuan Tsai Apr 2, 213 What is MPEG and its standards MPEG stands for Moving Picture Expert Group Develop standards for video/audio compression

More information

An Asynchronous Array of Simple Processors for DSP Applications

An Asynchronous Array of Simple Processors for DSP Applications An Asynchronous Array of Simple Processors for DSP Applications Zhiyi Yu, Michael Meeuwsen, Ryan Apperson, Omar Sattari, Michael Lai, Jeremy Webb, Eric Work, Tinoosh Mohsenin, Mandeep Singh, Bevan Baas

More information

ADDRESSING MODES. Operands specify the data to be used by an instruction

ADDRESSING MODES. Operands specify the data to be used by an instruction ADDRESSING MODES Operands specify the data to be used by an instruction An addressing mode refers to the way in which the data is specified by an operand 1. An operand is said to be direct when it specifies

More information

4G WIRELESS VIDEO COMMUNICATIONS

4G WIRELESS VIDEO COMMUNICATIONS 4G WIRELESS VIDEO COMMUNICATIONS Haohong Wang Marvell Semiconductors, USA Lisimachos P. Kondi University of Ioannina, Greece Ajay Luthra Motorola, USA Song Ci University of Nebraska-Lincoln, USA WILEY

More information

The Instruction Set. Chapter 5

The Instruction Set. Chapter 5 The Instruction Set Architecture Level(ISA) Chapter 5 1 ISA Level The ISA level l is the interface between the compilers and the hardware. (ISA level code is what a compiler outputs) 2 Memory Models An

More information

21 Baseband Processing Architectures for SDR

21 Baseband Processing Architectures for SDR Vijay/Wireless, Networking, Radar, Sensor Array Processing, and Nonlinear Signal Processing 46047_C021 Page Proof page 1 22.6.2009 4:08pm Compositor Name: BMani 21 Baseband Processing Architectures for

More information

05 - Microarchitecture, RF and ALU

05 - Microarchitecture, RF and ALU September 15, 2015 Microarchitecture Design Step 1: Partition each assembly instruction into microoperations, allocate each microoperation into corresponding hardware modules. Step 2: Collect all microoperations

More information

William Stallings Computer Organization and Architecture 8 th Edition. Chapter 14 Instruction Level Parallelism and Superscalar Processors

William Stallings Computer Organization and Architecture 8 th Edition. Chapter 14 Instruction Level Parallelism and Superscalar Processors William Stallings Computer Organization and Architecture 8 th Edition Chapter 14 Instruction Level Parallelism and Superscalar Processors What is Superscalar? Common instructions (arithmetic, load/store,

More information

Assembly Language for Intel-Based Computers, 4 th Edition. Chapter 2: IA-32 Processor Architecture. Chapter Overview.

Assembly Language for Intel-Based Computers, 4 th Edition. Chapter 2: IA-32 Processor Architecture. Chapter Overview. Assembly Language for Intel-Based Computers, 4 th Edition Kip R. Irvine Chapter 2: IA-32 Processor Architecture Slides prepared by Kip R. Irvine Revision date: 09/25/2002 Chapter corrections (Web) Printing

More information

Microprocessors vs. DSPs (ESC-223)

Microprocessors vs. DSPs (ESC-223) Insight, Analysis, and Advice on Signal Processing Technology Microprocessors vs. DSPs (ESC-223) Kenton Williston Berkeley Design Technology, Inc. Berkeley, California USA +1 (510) 665-1600 info@bdti.com

More information