ARC HS4x and HS4xD CPUs: New Dual-Issue Architecture Boosts Embedded Processor Performance

By Mike Demler, Senior Analyst, The Linley Group
May 2017

© 2017 The Linley Group, Inc.

This white paper describes the Synopsys DesignWare ARC HS4x and HS4xD series of licensable CPU cores. These are the company's newest CPUs for embedded applications requiring 32-bit RISC performance in a small silicon footprint with minimal power consumption. This paper was prepared by The Linley Group and sponsored by Synopsys, but the opinions and analysis are those of the author.

Synopsys's DesignWare ARC CPUs comprise a family of highly configurable and customizable processor cores, which ship in nearly two billion chips per year. ARC's popularity in embedded devices makes the company second only to ARM in the number of chips that integrate its licensable CPUs. More than 230 ARC licensees use the cores in products that span a broad range of embedded applications, such as automotive control systems, digital-audio devices, sensors, solid-state drives (SSDs), network-attached storage (NAS), and residential gateways.

Since acquiring ARC as part of its 2010 purchase of Virage Logic, Synopsys has continued to improve the CPU architecture and add new features that expand the product line. In 2011, the company introduced the 32-bit ARCv2 ISA and the ARC EM family for very low power and deeply embedded products. In 2013, it introduced the ARC HS family, implemented with the ARCv2 ISA; these were the first ARC cores to support dual- and quad-core configurations. The ARC HS cores target higher-performance embedded applications, but they are software compatible with the ARC EM family cores. In 2014, the introduction of ARC HS38 brought extensive enhancements, including the option to integrate a memory-management unit (MMU), which supports running Linux and other higher-level operating systems that use virtual memory.
The HS38 supports dual- (HS38x2) and quad-core (HS38x4) configurations, along with offering a shared L2 cache for cache-coherent symmetric multiprocessing (SMP).

In May 2017, Synopsys announced the new HS4x family, which enhances the ARC HS3x architecture by adding dual-issue capability to the 10-stage pipeline. The HS4x CPUs run the same software as previous HS models, but they deliver a 25% boost in CoreMarks per megahertz compared with the ARC HS3x. Designers can alternatively use the new architecture to deliver the same performance at a lower frequency than the previous version, reducing power consumption.

Like the predecessor ARC HS3x series, the new HS4x offers customers three base configurations, and all enable dual- and quad-core clusters. As Table 1 shows, the smallest new model is the HS44, which includes up to 16MB each of instruction and data closely coupled memories (CCMs) but no L2 cache or MMU. The HS46 adds L1 I/D caches of as much as 64KB each along with the CCMs, and it optionally has up to 8MB of L2 cache as well as an MMU. But including all those features is equivalent to licensing the top-of-the-line HS48, which supersedes the ARC HS38. Designers can configure all three cores with an optional IEEE 754-compliant FPU, a memory-protection unit (MPU), and a real-time trace (RTT) feature.

Feature                        | ARC HS44 | ARC HS46      | ARC HS48
Instruction Set                | 32-bit ARCv2 (all models)
CPU Freq (max)                 | 1.6GHz (worst case), 2.2GHz (typical) in TSMC 28nm HPM (all models)
Instruction Issue              | 2 per cycle (all models)
Pipeline Depth                 | 10 stages (all models)
L1 Caches (I/D)                | None     | 2-64KB/2-64KB | 2-64KB/2-64KB
Closely Coupled Memories (I/D) | 512B-16MB / 512B-16MB (all models)
L2 Cache                       | None     | Optional      | 256KB-8MB
Memory-Management Unit         | None     | Optional      | Standard
Options                        | FPU, MPU, real-time trace, DMA (all models)
Interfaces                     | 32-, 64-, or 128-bit AXI/AHB-Lite (all models)

Table 1. Key features of the Synopsys DesignWare ARC HS4x family. The company offers three base configurations, but designers can use the ARC Processor Extension (Apex) tools to further customize the CPUs.

Along with the new base HS4x configurations, Synopsys also developed two new ARC HS4xD processor cores that add DSP extensions previously available only with ARC EM. The ARCv2-DSP ISA adds more than 150 signal-processing instructions for audio, speech, and wireless-baseband applications. The ARC HS45D brings the DSP features to the HS44 base design, and it includes the same 16MB instruction and data CCMs. The ARC HS47D builds on the HS46, adding a DSP along with the CCMs and L1 caches. The HS4xD designs include a unified 32x32-bit multiplier/multiplier-accumulator (MUL/MAC). Designers can optionally add an L2 cache and MMU, as well as the DMA, FPU, MPU, and RTT features to the base designs. The DSP-equipped cores also support dual- and quad-core configurations.
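
The trade-off between performance and clock frequency mentioned earlier for the HS4x can be made concrete with a little arithmetic. The following minimal sketch computes the clock at which an HS4x would match the CoreMark score of an HS3x, assuming only the 25% per-clock (CoreMarks per megahertz) gain quoted in this paper. The function name and the 1,000MHz operating point are illustrative assumptions, not Synopsys figures, and the sketch ignores voltage scaling and workload effects that determine the actual power savings.

```python
def equivalent_clock_mhz(hs3x_clock_mhz: float, per_clock_gain: float = 0.25) -> float:
    """Clock at which a core with `per_clock_gain` higher CoreMarks/MHz
    matches the CoreMark score of the older core at `hs3x_clock_mhz`."""
    if hs3x_clock_mhz <= 0 or per_clock_gain <= -1:
        raise ValueError("clock must be positive and gain greater than -100%")
    # Equal scores: new_per_clock * new_clock == old_per_clock * old_clock,
    # where new_per_clock = old_per_clock * (1 + per_clock_gain).
    return hs3x_clock_mhz / (1.0 + per_clock_gain)


if __name__ == "__main__":
    old_clock = 1000.0  # hypothetical HS3x operating point, in MHz
    new_clock = equivalent_clock_mhz(old_clock)
    reduction = 100 * (1 - new_clock / old_clock)
    print(f"{old_clock:.0f}MHz on HS3x matches {new_clock:.0f}MHz on HS4x "
          f"({reduction:.0f}% lower clock)")
```

Under that single assumption, a 25% per-clock gain corresponds to a 20% lower clock for the same CoreMark throughput.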
ARC HS4x Architectural Overview

The ARC HS4x family uses the same 10-stage pipeline as its predecessor, but the addition of a second instruction decoder provides a substantial performance boost by increasing utilization of the functional units. This 32-bit RISC architecture has traditional 32-bit instructions and a subset of 16-bit instructions for greater code density in small-memory embedded systems. As with all ARC cores, designers can use the ARC Processor Extension (Apex) tools to add custom instructions and hardware, including their own Verilog RTL, auxiliary registers, and condition and status codes, as well as memory-mapped blocks and closely coupled peripherals.

Using extension core registers, ARC HS processors can support up to 60 core registers. The register file has four read ports and three write ports. Configurability enables designers to add or omit features in order to optimize the core's performance, size, and power consumption for the target application. As Figure 1 shows, the ARC HS4x functional units include two ALUs and two late ALUs. The late ALUs defer execution to stage 9 in the 10-stage pipeline, thereby avoiding stalls that would occur when data loaded from memory isn't available in time for the earlier ALUs. The late ALUs can also resolve branches that depend on load data, so most instructions suffer no load-to-use penalty.

Figure 1. Block diagram of Synopsys DesignWare ARC HS4x CPU. The design adds a second decoder that can simultaneously issue instructions to any two functional units.

The dual-issue capability, along with a second ALU and late ALU, enables the ARC HS4x to increase performance by 25% compared with the previous HS3x model. The other functional units remain the same as in the ARC HS3x, comprising a single divider, multiplier, MAC/SIMD, and optional FPU. The ARC HS4x can issue two ALU instructions in parallel or pair instructions from two different categories, including ALU, load/store, multiply (MPY), divide (DIV), floating point (FPU), and user defined.

The HS4x load/store unit executes 64-bit operations, enabling a single instruction to load or store data to or from a register pair. The ARC HS compiler automatically generates 64-bit load/stores, and the microarchitecture supports nonaligned access without incurring an additional one-cycle penalty. The load-to-use latency is one cycle for most ALU operations, and the load-to-store delay is also just one cycle. ARC HS is a nonblocking architecture and can handle up to four cache misses without blocking the pipeline. The optional data-memory port enables faster access to memory or peripheral data. The AXI interface supports up to 13 outstanding memory transactions on the system bus, which designers can configure as a 32-, 64-, or 128-bit interface.

In multicore designs, each HS4x core integrates a snoop interface to keep the L1 caches coherent. The snoop unit applies the MOESI (Modified, Owned, Exclusive, Shared, Invalid) protocol for cache-to-cache transfers. The optional I/O coherency port keeps I/O traffic coherent with the L1 caches. The shared L2 cache operates at the CPU clock frequency, and its Harvard architecture connects independently to the core's data and instruction buses. Designers can use configuration options to enable cache redundancy and ECC support, as well as to select up to a 16-way least recently used (LRU) policy and a 64- or 128-byte line size. The cache also supports reduced-power operation during processor sleep states.

A More Efficient Pipeline

Although HS4x CPUs issue instructions in order, the execution pipeline skews some operations so they can process and retire out of order, preventing pipeline stalls. This method enhances the instruction parallelism and throughput without needing additional hardware such as reorder buffers, which would increase die area and power. For example, by delaying a MPY operation to the third execution cycle (Exec3), that execution unit can receive operands from the ALUs while, at the same time, those ALUs are free to begin executing another instruction, as Figure 2 shows. Similarly, rather than allow the pipeline to stall while the ALUs wait for operands, the design will schedule those instructions for the late execution cycle (Exec3) as other instructions proceed to available execution units.

ALU operations can incur different latencies depending on whether they are basic or advanced. Advanced ALU functions are long latency, requiring an additional cycle to complete compared with basic ALU operations. An example of an advanced ALU operation is a rounding function.

Figure 2. Block diagram of ARC HS4x execution pipeline. The CPU can issue two instructions per cycle. It employs deferred execution to maximize function-unit utilization and prevent pipeline stalls.

Synopsys expects the ARC HS4x to support the same 2.2GHz maximum operating frequency as the predecessor HS3x, but the new dual-issue capability will increase performance efficiency by roughly 25%. The new design delivers 5.0 CoreMarks per megahertz (or 2.53 Dhrystone MIPS per megahertz). The dual-issue hardware adds roughly 50K gates to the design. For the smallest HS44, area increases by roughly 20-25%, so area efficiency stays the same as in the older design. The area increase is proportionally smaller in the HS46 and HS48 with L1 caches, however, so customers will gain both the 25% performance-efficiency boost and greater area efficiency.

Boosting DSP

Synopsys also offers two new dual-issue HS models that include support for ARC DSP extensions. The ARCv2-DSP ISA comprises most of the same set of instructions that the smaller ARC EMxD products support, providing for software compatibility. The ISA has more than 150 DSP instructions, including vector/SIMD operations and complex-math functions. The ARC HS4xD omits the EMxD's bitstream instructions, which those cores use for audio codecs, but adds support for 64-bit operands.

The smallest ARC HS4xD version is the HS45D, which essentially adds DSP features to the HS44, including up to 16MB CCMs. The ARC HS47D includes I/D caches, which are the same as in the HS46. Customers can separately license an MMU to enable an HS47D to run Linux, but the company isn't offering that feature combination in a preconfigured product. Other licensable options are a single-precision FPU, a memory-protection unit (MPU), real-time trace, a DMA, and a shared L2 cache that supports dual- and quad-core configurations.

As Figure 3 shows, the HS4xD replaces the integer multiplier with a DSP-specific multiplier that performs both the integer and fixed-point MPY/MAC functions, including fast and slow operations. The slow functions are those that take an additional cycle to set flags. The DSP core also adds signal-processing operations to the advanced ALU. The new dual-issue DSP architecture employs 64-bit source operands, which the ARC EMxD cores don't support. In the ARC HS4xD, that capability enables quad 16-bit and dual 32-bit SIMD operations. For example, the HS4xD can pack four 16-bit dot-product operations in a single instruction. The new design enables the HS4xD to perform single-cycle 32x32-bit multiply/multiply-accumulate (MAC) operations, two 32x16-bit MAC operations per cycle, or four 16x16-bit MAC operations per cycle.

Figure 3. Block diagram of ARC HS4xD execution pipeline. The HS4xD cores add DSP multipliers to the base set of function units and modify the ALUs for signal-processing operations.

The ARC HS4xD omits the X/Y memories that enhance DSP performance in the ARC EM9/11D, but both designs employ the same address-generation unit (AGU), which supports the bit-reverse and modulo-wrapping modes common in FFTs and digital filters. In the HS45D/47D, the AGUs also generate operands for explicit load/store operations, and they support access to the data cache. The AGU feature is a configuration option that includes a unit with four address pointers and modifiers as well as two offsets.

Along with the base dual-issue capability, the HS4xD can issue two 32-bit DSP ALU instructions in parallel or execute a DSP multiply command in parallel with a load/store. The base Apex, DIV, FPU, and branch instructions can't issue in the same cycle as DSP multiply. The 64-bit advanced and DSP ALU operations are restricted to single-issue dispatch.

Competitive Comparisons

The three DesignWare ARC HS4x models give designers the option to include or exclude L1 and L2 caches, an IEEE 754-compliant floating-point unit (FPU), a memory-protection unit (MPU), and a real-time trace block. The HS44 and HS46 omit the memory-management unit (MMU) and are therefore suitable for use in microcontrollers running an RTOS. The ARM Cortex-M7 is a smaller and lower-power competitor for such tasks, but it delivers less than half the peak performance of the ARC HS4x. The HS48 includes an MMU that supports Linux and other high-level operating systems. Fully configured, the HS48 competes for low-power embedded Linux designs with ARM's 32-bit Cortex-A7 and Cortex-A32, as well as Imagination Technologies' 64-bit MIPS I6500, as Table 2 shows.

Feature             | Synopsys ARC HS48 | ARM Cortex-A7 | ARM Cortex-A32 | Imagination MIPS I6500
Instruction Set     | 32-bit ARCv2 | 32-bit ARMv7 | 32-bit ARMv8 | 64-bit MIPS R6
SMP/SMT             | SMP | SMP | SMP | SMP and SMT
CPU Speed (max)     | 2.2GHz | 2.0GHz | 2.2GHz | 2.0GHz
Instr-Issue Rate    | 2 per cycle | 1 per cycle | 1 per cycle | 2 per cycle
Reordering?         | No | No | No | No
Pipeline Depth      | 10 stages | 8 stages | 8 stages | 9 stages
L1 Caches I/D       | 0-64KB | 4-64KB | 64KB | 0-64KB with ECC
TCM I/D             | 0-16MB | None | None | 0/1MB with ECC
L2 Cache            | 256KB-8MB | 128KB-1MB | 128KB-1MB | 512KB-8MB
Optional Extensions | FPU, MPU, real-time trace | DSP, Neon, FPU, TrustZone, SMP, LPAE | DSP, Neon, FPU, TrustZone, SMP, LPAE, ECC on caches, HW virtualization | ECC on L2 cache, 128-bit SIMD, DSP, FPU, H/W virtualization
MACs/Cycle          | 1x 32x32-bit | 1x 32x32-bit or 2x 16x16-bit | 1x 32x32-bit | 1x 64x64-bit
CoreMarks/MHz       | 5.0CM/MHz | 3.3CM/MHz | 3.3CM/MHz | 3.7CM/MHz (1 thread); 5.6CM/MHz (2 threads)
Max Performance     | 11,000 CoreMarks | 6,600 CoreMarks | 7,260 CoreMarks | 7,400 / 11,600 CoreMarks (2 threads)
Interfaces          | 32-, 64-, or 128-bit AXI, AHB-Lite | 128-bit Amba 4 | Amba 4 Ace, AXI4, Amba 5 Chi | 256-bit Amba AXI4
Die Area            | 0.25mm² | [?]mm² | [?]mm² | [?]mm²
Perf per Area       | 44kCM/mm² | 20kCM/mm² | 16kCM/mm² | 8kCM/mm²
Power (max)         | 0.05mW/MHz | 0.12mW/MHz | 0.07mW/MHz | 0.15mW/MHz
Perf per Watt       | 100CM/mW | 28CM/mW | 47CM/mW | 25CM/mW
RTL Release         | [?] | [?] | [?] | [?]

Table 2. Comparison of DesignWare ARC HS48 with three competing CPUs. The maximum clock frequencies assume speed-optimized synthesis for a 28nm high-k metal-gate (HKMG) process. Area estimates exclude the L1 cache. (Source: vendors, except The Linley Group estimate)

ARM designed Cortex-A7 to work as the little core in its original 32-bit Big.Little configuration. Cortex-A32 implements the 32-bit features of the ARMv8 ISA, offering designers a more area- and power-efficient alternative to the little 64-bit Cortex-A35. Imagination's MIPS I6500 is a low-end 64-bit CPU, but it runs MIPS32 software directly.
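
The derived rows in Table 2 follow from its per-megahertz figures. As a rough check, the sketch below recomputes peak CoreMarks (CoreMarks per megahertz times maximum clock) and performance per milliwatt (CoreMarks per megahertz divided by milliwatts per megahertz) from the table's single-thread entries. The dictionary layout and function names are assumptions made for illustration; only the numbers come from Table 2.

```python
# Single-thread figures copied from Table 2:
# (CoreMarks/MHz, max clock in MHz, max power in mW/MHz)
TABLE_2 = {
    "ARC HS48": (5.0, 2200, 0.05),
    "Cortex-A7": (3.3, 2000, 0.12),
    "Cortex-A32": (3.3, 2200, 0.07),
    "MIPS I6500": (3.7, 2000, 0.15),
}


def peak_coremarks(cm_per_mhz: float, clock_mhz: float) -> float:
    """Peak score = per-clock efficiency times maximum clock."""
    return cm_per_mhz * clock_mhz


def coremarks_per_mw(cm_per_mhz: float, mw_per_mhz: float) -> float:
    """Power efficiency; the MHz terms cancel out of the ratio."""
    return cm_per_mhz / mw_per_mhz


if __name__ == "__main__":
    for name, (cm, mhz, mw) in TABLE_2.items():
        print(f"{name:<11} peak={peak_coremarks(cm, mhz):8.0f} CoreMarks  "
              f"efficiency={coremarks_per_mw(cm, mw):6.1f} CM/mW")
```

After rounding, the results land on the table's entries, for example 11,000 CoreMarks and 100CM/mW for the HS48.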
The new dual-issue ARC HS4x design boosts maximum performance to 11,000 CoreMarks, which is approximately 50% higher than its nearest competitor in this group. Although the dual-threaded MIPS I6500 model delivers slightly higher performance than the ARC HS48, most embedded applications won't use that feature, which comes at the expense of four times the area and three times the power of the Synopsys core. The ARC and MIPS cores allow a maximum 8MB L2 cache, but the ARM cores are limited to a 1MB maximum. The larger caches save power that would otherwise be spent accessing off-chip memory. All these CPUs work in clusters of four or more cores. For embedded applications, ARC HS4x designers can include up to 16MB each of instruction and data closely coupled memory (CCM), a feature the ARM cores lack. The MIPS core supports just a 1MB data scratchpad memory.

Besides its performance advantage, the ARC HS4x also offers superior area and power efficiency compared with the ARM and MIPS CPUs. Excluding the L1/L2 caches and TCMs, the ARC HS48 occupies just 0.25mm², which is one-fourth the size of the I6500 and approximately 25% smaller than Cortex-A7. The result is 44kCM/mm², more than twice the area efficiency of the ARM cores and 5.5x better than the I6500. The Synopsys design delivers similar advantages in performance per milliwatt. Its 100CM/mW power efficiency is roughly twice that of Cortex-A32 and roughly four times that of Cortex-A7 and the MIPS I6500.

The ARC HS45D and HS47D lack an MMU, although designers can optionally add one to the HS47D. These cores are thus best suited to advanced microcontroller applications such as speech processing and wireless basebands. The ARC HS4xD competes for such designs with ARM's Cortex-R8, which offers optional DSP extensions. It will also compete with Ceva's configurable X2 DSP core, a design that's purpose-built for PHY-control tasks in 5G and LTE-Advanced modems. As Table 3 shows, the ARC HS4xD cores in a 28nm HPM process can run at a higher maximum clock frequency than the ARM and Ceva offerings, although the HS47D and X2 both sport 10-stage pipelines and Cortex-R8 has 11 stages.
The Ceva X2 integrates two scalar CPUs (SPUs in the company's parlance), but they deliver 10% less performance per megahertz than the ARC DSP. Cortex-R8 delivers 4.4 CoreMarks per megahertz, yielding peak performance that is 65% of the ARC HS4xD's.
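
The ratios quoted above, along with the 16-bit MACs/s row of Table 3, can be reproduced from the table's per-cycle and per-megahertz entries. The sketch below is a minimal illustration: the dictionary layout and the `summarize` helper are hypothetical names introduced here, and only the input numbers are taken from Table 3.

```python
# Figures copied from Table 3:
# (CoreMarks/MHz, max clock in MHz, 16x16-bit MACs per cycle)
TABLE_3 = {
    "ARC HS47D": (5.0, 2200, 4),
    "Cortex-R8": (4.4, 1600, 2),
    "Ceva X2": (4.5, 1000, 4),
}


def summarize(name: str) -> dict:
    """Derive peak CoreMarks and 16-bit GMACs/s for one Table 3 entry."""
    cm_per_mhz, clock_mhz, macs_per_cycle = TABLE_3[name]
    return {
        "cm_per_mhz": cm_per_mhz,
        "peak_coremarks": cm_per_mhz * clock_mhz,
        # MACs/cycle * MHz gives millions of MACs/s; divide by 1000 for GMACs/s
        "gmacs": macs_per_cycle * clock_mhz / 1000.0,
    }


if __name__ == "__main__":
    arc = summarize("ARC HS47D")
    for name in TABLE_3:
        s = summarize(name)
        print(f"{name:<10} peak={s['peak_coremarks']:7.0f} CoreMarks "
              f"({s['peak_coremarks'] / arc['peak_coremarks']:.2f}x ARC)  "
              f"per-MHz={s['cm_per_mhz'] / arc['cm_per_mhz']:.2f}x ARC  "
              f"{s['gmacs']:.1f} GMACs/s")
```

With these inputs, the Ceva X2's per-megahertz ratio is 0.90 (the "10% less" above) and Cortex-R8's peak ratio is about 0.64, which the text rounds to 65%.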

Feature             | Synopsys ARC HS47D | ARM Cortex-R8 | Ceva X2
Instruction Set     | 32-bit ARCv2-DSP | ARMv7, Thumb-2 | 32-bit Ceva-X
SMP/SMT             | SMP | Dual-core AMP/SMP | No
CPU Speed (max)     | 2.2GHz | 1.6GHz | 1.0GHz
Instr-Issue Rate    | 2 per cycle | 2 per cycle | 2 per cycle
Pipeline Depth      | 10 stages | 11 stages | 10 stages
L1 Caches I/D       | 0-64KB | 0-64KB | 0-128KB instr / 0-64KB data
TCM I/D             | 0-16MB | 0-1MB | 0-256KB instr TCM, 0-512KB data TCM
L2 Cache            | 256KB-8MB | 0-8MB | None
Optional Extensions | FPU, MPU, real-time trace | DSP, FPU, MPU, AMP, embedded-trace module (ETM) | Data/instruction cache, dynamic branch predictor, 0-2x FPU
MACs/Cycle          | 1x 32x32-bit, 4x 16x16-bit | 2x 16x16-bit, 4x 8x8-bit | 2x 32x32-bit, 4x 16x16-bit
MACs/s (16-bit)     | 8.8GMACs | 3.2GMACs | 4.0GMACs
CoreMarks/MHz       | 5.0CM/MHz | 4.4CM/MHz | 4.5CM/MHz
Max Performance     | 11,000 CoreMarks | 7,040 CoreMarks | 4,500 CoreMarks
AXI Interfaces      | 32-, 64-, or 128-bit AXI (main), optional 32-bit AXI (peripherals), optional 32- or 64-bit (slave) | 4x AXI (64-bit), 5x AXI (32-bit) | Instructions: 128-bit master; data: 128-bit master, [?]-bit slave
Die Area            | 0.22mm² | [?]mm²* | 0.18mm²
Perf per Area       | 50kCM/mm² | 44kCM/mm² | 25kCM/mm²
Power (max)         | 0.042mW/MHz | 0.054mW/MHz | Not disclosed
Perf per Watt       | 119CM/mW | 81CM/mW | Not disclosed

Table 3. Comparison of DSP-capable CPUs. Maximum clock frequencies assume speed-optimized synthesis for a 28nm HPM process. *Power-optimized nine-track library. (Source: vendors, except The Linley Group estimate)

The ARC HS4xD cores maintain the area and power efficiency of the non-DSP versions. By omitting an MMU, the HS47D is slightly smaller than the HS48, which gives it a higher area efficiency of 50kCM/mm². Although Cortex-R8 is smaller than the HS47D, its area efficiency is 12% less. Ceva's X2 is also roughly 20% smaller, but it delivers just half the area efficiency. ARM's Cortex-R8 SIMD/DSP instructions operate on 16- or 8-bit data values in 32-bit registers.
By comparison, the ARC HS4xD supports 64-bit source operations for quad 16-bit or dual 32-bit SIMD, and it includes built-in signal-processing functions such as FFT butterflies, loop transformations, and FIR and IIR filters. Ceva's design offers a more powerful DSP engine with dual 32x32-bit MACs and a five-way VLIW architecture, but designers looking for more-balanced RISC CPU and DSP operation will find the ARC cores to be a better choice. The Ceva design also has less TCM capability, increasing power consumption for memory transactions. And it lacks support for a shared L2 cache and SMP operation.

Summary

With its new HS4x design, Synopsys has enhanced its ARC lineup with a CPU family that delivers best-in-class area and performance efficiency for low-power embedded systems. By adding another instruction decoder and a second set of ALUs, the company efficiently increased utilization of the execution units to deliver 25% higher per-cycle performance. The new ARC HS4x dual-issue capability adds just 50K gates, and despite the slightly larger die area (0.25mm² for the HS48 versus 0.21mm² for the HS38), both area efficiency and power efficiency improve. For the fully configured ARC HS48, performance per square millimeter increases by 14% and performance per milliwatt increases by 29%. The MetaWare compiler automatically optimizes instruction execution to take advantage of the dual-issue scheduler, so the HS4x is a drop-in replacement for its predecessor and is transparent to the programmer.

The HS4xD cores extend the scalability of the ARC family by enabling designers to employ the DSP ISA throughout the lineup. The DSP-equipped CPUs provide upward compatibility for most ARC EM5D/7D/9D/11D software, but the HS-series 10-stage pipeline enables higher clock frequencies for up to a 2x DSP-performance boost. Synopsys eases adoption by supporting the ARC DSP cores with a software library that includes common signal-processing functions, such as audio/voice codecs, filters, FFTs, and matrix operations. Designers can configure the HS4x to run an RTOS such as ARC MQX, or they can add an MMU to run embedded Linux applications.
Multicore options provide additional flexibility and scalability, allowing designers to configure each HS core in a dual or quad cluster to optimize performance, power, and area. Designers can further customize their cores using Apex to implement user-defined functions.

The ARC HS4x and HS4xD preserve and extend the compactness, configurability, extensibility, and low power of the ARC processor architecture. The new features and performance boost will appeal to embedded-processor designers looking for high-end performance in an exceptionally efficient CPU core.

Mike Demler is a senior analyst at The Linley Group and a senior editor of Microprocessor Report. The Linley Group offers the most comprehensive analysis of the mobile semiconductor industry. We analyze not only the business strategy but also the internal technology. Our in-depth reports also cover topics including embedded processors, network processors, base-station processors, and Ethernet chips. For more information, see our web site.


More information

ASSEMBLY LANGUAGE MACHINE ORGANIZATION

ASSEMBLY LANGUAGE MACHINE ORGANIZATION ASSEMBLY LANGUAGE MACHINE ORGANIZATION CHAPTER 3 1 Sub-topics The topic will cover: Microprocessor architecture CPU processing methods Pipelining Superscalar RISC Multiprocessing Instruction Cycle Instruction

More information

Designing, developing, debugging ARM Cortex-A and Cortex-M heterogeneous multi-processor systems

Designing, developing, debugging ARM Cortex-A and Cortex-M heterogeneous multi-processor systems Designing, developing, debugging ARM and heterogeneous multi-processor systems Kinjal Dave Senior Product Manager, ARM ARM Tech Symposia India December 7 th 2016 Topics Introduction System design Software

More information

The Nios II Family of Configurable Soft-core Processors

The Nios II Family of Configurable Soft-core Processors The Nios II Family of Configurable Soft-core Processors James Ball August 16, 2005 2005 Altera Corporation Agenda Nios II Introduction Configuring your CPU FPGA vs. ASIC CPU Design Instruction Set Architecture

More information

Roadmap Directions for the RISC-V Architecture

Roadmap Directions for the RISC-V Architecture Roadmap Directions for the RISC-V Architecture Andes RISC-V Con November 13, 2018 Linley Gwennap, Principal Analyst About Linley Gwennap Founder, principal analyst, The Linley Group Leading vendor of technical

More information

Multicore and MIPS: Creating the next generation of SoCs. Jim Whittaker EVP MIPS Business Unit

Multicore and MIPS: Creating the next generation of SoCs. Jim Whittaker EVP MIPS Business Unit Multicore and MIPS: Creating the next generation of SoCs Jim Whittaker EVP MIPS Business Unit www.imgtec.com Many new opportunities Wearables Home wireless for everything Automation & Robotics ADAS and

More information

An Ultra High Performance Scalable DSP Family for Multimedia. Hot Chips 17 August 2005 Stanford, CA Erik Machnicki

An Ultra High Performance Scalable DSP Family for Multimedia. Hot Chips 17 August 2005 Stanford, CA Erik Machnicki An Ultra High Performance Scalable DSP Family for Multimedia Hot Chips 17 August 2005 Stanford, CA Erik Machnicki Media Processing Challenges Increasing performance requirements Need for flexibility &

More information

EN164: Design of Computing Systems Topic 08: Parallel Processor Design (introduction)

EN164: Design of Computing Systems Topic 08: Parallel Processor Design (introduction) EN164: Design of Computing Systems Topic 08: Parallel Processor Design (introduction) Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering

More information

Introduction to the ARM Architecture. or: a loose set of random facts blatantly copied from tech sheets and the Architecture Ref.

Introduction to the ARM Architecture. or: a loose set of random facts blatantly copied from tech sheets and the Architecture Ref. Introduction to the ARM Architecture or: a loose set of random facts blatantly copied from tech sheets and the Architecture Ref. Manual Glance into the past Initial ARM Processor developed by Acorn Computers,

More information

Introducing the Latest SiFive RISC-V Core IP Series

Introducing the Latest SiFive RISC-V Core IP Series Introducing the Latest SiFive RISC-V Core IP Series Drew Barbier DAC, June 2018 1 SiFive RISC-V Core IP Product Offering SiFive RISC-V Core IP Industry leading 32-bit and 64-bit Embedded Cores High performance

More information

Optimizing ARM SoC s with Carbon Performance Analysis Kits. ARM Technical Symposia, Fall 2014 Andy Ladd

Optimizing ARM SoC s with Carbon Performance Analysis Kits. ARM Technical Symposia, Fall 2014 Andy Ladd Optimizing ARM SoC s with Carbon Performance Analysis Kits ARM Technical Symposia, Fall 2014 Andy Ladd Evolving System Requirements Processor Advances big.little Multicore Unicore DSP Cortex -R7 Block

More information

ARM ARCHITECTURE. Contents at a glance:

ARM ARCHITECTURE. Contents at a glance: UNIT-III ARM ARCHITECTURE Contents at a glance: RISC Design Philosophy ARM Design Philosophy Registers Current Program Status Register(CPSR) Instruction Pipeline Interrupts and Vector Table Architecture

More information

ARM instruction sets and CPUs for wide-ranging applications

ARM instruction sets and CPUs for wide-ranging applications ARM instruction sets and CPUs for wide-ranging applications Chris Turner Director, CPU technology marketing ARM Tech Forum Taipei July 4 th 2017 ARM computing is everywhere #1 shipping GPU in the world

More information

Digital Signal Processor Core Technology

Digital Signal Processor Core Technology The World Leader in High Performance Signal Processing Solutions Digital Signal Processor Core Technology Abhijit Giri Satya Simha November 4th 2009 Outline Introduction to SHARC DSP ADSP21469 ADSP2146x

More information

Jack Kang ( 剛至堅 ) VP Product June 2018

Jack Kang ( 剛至堅 ) VP Product June 2018 Jack Kang ( 剛至堅 ) VP Product June 2018 SiFive RISC-V Core IP Product Offering SiFive RISC-V Core IP Industry leading 32-bit and 64-bit Embedded Cores High performance 64-bit Application Cores High Performance

More information

Separating Reality from Hype in Processors' DSP Performance. Evaluating DSP Performance

Separating Reality from Hype in Processors' DSP Performance. Evaluating DSP Performance Separating Reality from Hype in Processors' DSP Performance Berkeley Design Technology, Inc. +1 (51) 665-16 info@bdti.com Copyright 21 Berkeley Design Technology, Inc. 1 Evaluating DSP Performance! Essential

More information

CEVA-X1 Lightweight Multi-Purpose Processor for IoT

CEVA-X1 Lightweight Multi-Purpose Processor for IoT CEVA-X1 Lightweight Multi-Purpose Processor for IoT 1 Cellular IoT for The Massive Internet of Things Narrowband LTE Technologies Days Battery Life Years LTE-Advanced LTE Cat-1 Cat-M1 Cat-NB1 >10Mbps Up

More information

Advance CPU Design. MMX technology. Computer Architectures. Tien-Fu Chen. National Chung Cheng Univ. ! Basic concepts

Advance CPU Design. MMX technology. Computer Architectures. Tien-Fu Chen. National Chung Cheng Univ. ! Basic concepts Computer Architectures Advance CPU Design Tien-Fu Chen National Chung Cheng Univ. Adv CPU-0 MMX technology! Basic concepts " small native data types " compute-intensive operations " a lot of inherent parallelism

More information

TDT Coarse-Grained Multithreading. Review on ILP. Multi-threaded execution. Contents. Fine-Grained Multithreading

TDT Coarse-Grained Multithreading. Review on ILP. Multi-threaded execution. Contents. Fine-Grained Multithreading Review on ILP TDT 4260 Chap 5 TLP & Hierarchy What is ILP? Let the compiler find the ILP Advantages? Disadvantages? Let the HW find the ILP Advantages? Disadvantages? Contents Multi-threading Chap 3.5

More information

Supercharging the Embedded Device: ARM Cortex -M7. Ian Johnson Senior Product Manager, ARM

Supercharging the Embedded Device: ARM Cortex -M7. Ian Johnson Senior Product Manager, ARM Supercharging the Embedded Device: ARM Cortex -M7 Ian Johnson Senior Product Manager, ARM 1 ARM Cortex Processors across the Embedded Market Cortex -M processors Cortex -R processors Cortex -A processors

More information

This Material Was All Drawn From Intel Documents

This Material Was All Drawn From Intel Documents This Material Was All Drawn From Intel Documents A ROAD MAP OF INTEL MICROPROCESSORS Hao Sun February 2001 Abstract The exponential growth of both the power and breadth of usage of the computer has made

More information

ARM Cortex -M7: Bringing High Performance to the Cortex-M Processor Series. Ian Johnson Senior Product Manager, ARM

ARM Cortex -M7: Bringing High Performance to the Cortex-M Processor Series. Ian Johnson Senior Product Manager, ARM ARM Cortex -M7: Bringing High Performance to the Cortex-M Processor Series Ian Johnson Senior Product Manager, ARM 1 ARM Cortex Processors across the Embedded Market Cortex -M processors Cortex -R processors

More information

systems such as Linux (real time application interface Linux included). The unified 32-

systems such as Linux (real time application interface Linux included). The unified 32- 1.0 INTRODUCTION The TC1130 is a highly integrated controller combining a Memory Management Unit (MMU) and a Floating Point Unit (FPU) on one chip. Thanks to the MMU, this member of the 32-bit TriCoreTM

More information

RISC-V Core IP Products

RISC-V Core IP Products RISC-V Core IP Products An Introduction to SiFive RISC-V Core IP Drew Barbier September 2017 drew@sifive.com SiFive RISC-V Core IP Products This presentation is targeted at embedded designers who want

More information

ENGN1640: Design of Computing Systems Topic 06: Advanced Processor Design

ENGN1640: Design of Computing Systems Topic 06: Advanced Processor Design ENGN1640: Design of Computing Systems Topic 06: Advanced Processor Design Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown University

More information

REAL TIME DIGITAL SIGNAL PROCESSING

REAL TIME DIGITAL SIGNAL PROCESSING REAL TIME DIGITAL SIGNAL PROCESSING UTN - FRBA 2011 www.electron.frba.utn.edu.ar/dplab Introduction Why Digital? A brief comparison with analog. Advantages Flexibility. Easily modifiable and upgradeable.

More information

New ARMv8-R technology for real-time control in safetyrelated

New ARMv8-R technology for real-time control in safetyrelated New ARMv8-R technology for real-time control in safetyrelated applications James Scobie Product manager ARM Technical Symposium China: Automotive, Industrial & Functional Safety October 31 st 2016 November

More information

15CS44: MICROPROCESSORS AND MICROCONTROLLERS. QUESTION BANK with SOLUTIONS MODULE-4

15CS44: MICROPROCESSORS AND MICROCONTROLLERS. QUESTION BANK with SOLUTIONS MODULE-4 15CS44: MICROPROCESSORS AND MICROCONTROLLERS QUESTION BANK with SOLUTIONS MODULE-4 1) Differentiate CISC and RISC architectures. 2) Explain the important design rules of RISC philosophy. The RISC philosophy

More information

Microprocessors vs. DSPs (ESC-223)

Microprocessors vs. DSPs (ESC-223) Insight, Analysis, and Advice on Signal Processing Technology Microprocessors vs. DSPs (ESC-223) Kenton Williston Berkeley Design Technology, Inc. Berkeley, California USA +1 (510) 665-1600 info@bdti.com

More information

Hi Hsiao-Lung Chan, Ph.D. Dept Electrical Engineering Chang Gung University, Taiwan

Hi Hsiao-Lung Chan, Ph.D. Dept Electrical Engineering Chang Gung University, Taiwan Processors Hi Hsiao-Lung Chan, Ph.D. Dept Electrical Engineering Chang Gung University, Taiwan chanhl@maili.cgu.edu.twcgu General-purpose p processor Control unit Controllerr Control/ status Datapath ALU

More information

Hello and welcome to this Renesas Interactive module that provides an architectural overview of the RX Core.

Hello and welcome to this Renesas Interactive module that provides an architectural overview of the RX Core. Hello and welcome to this Renesas Interactive module that provides an architectural overview of the RX Core. 1 The purpose of this Renesas Interactive module is to introduce the RX architecture and key

More information

INTEL Architectures GOPALAKRISHNAN IYER FALL 2009 ELEC : Computer Architecture and Design

INTEL Architectures GOPALAKRISHNAN IYER FALL 2009 ELEC : Computer Architecture and Design INTEL Architectures GOPALAKRISHNAN IYER FALL 2009 GBI0001@AUBURN.EDU ELEC 6200-001: Computer Architecture and Design Silicon Technology Moore s law Moore's Law describes a long-term trend in the history

More information

Lecture 26: Parallel Processing. Spring 2018 Jason Tang

Lecture 26: Parallel Processing. Spring 2018 Jason Tang Lecture 26: Parallel Processing Spring 2018 Jason Tang 1 Topics Static multiple issue pipelines Dynamic multiple issue pipelines Hardware multithreading 2 Taxonomy of Parallel Architectures Flynn categories:

More information

Systemy RT i embedded Wykład 5 Mikrokontrolery 32-bitowe AVR32, ARM. Wrocław 2013

Systemy RT i embedded Wykład 5 Mikrokontrolery 32-bitowe AVR32, ARM. Wrocław 2013 Systemy RT i embedded Wykład 5 Mikrokontrolery 32-bitowe AVR32, ARM Wrocław 2013 Plan Power consumption of 8- and 16 bits - comparison AVR32 family AVR32UC AVR32AP SDRAM access ARM cores introduction History

More information

MIPS Technologies MIPS32 M4K Synthesizable Processor Core By the staff of

MIPS Technologies MIPS32 M4K Synthesizable Processor Core By the staff of An Independent Analysis of the: MIPS Technologies MIPS32 M4K Synthesizable Processor Core By the staff of Berkeley Design Technology, Inc. OVERVIEW MIPS Technologies, Inc. is an Intellectual Property (IP)

More information

Embedded Systems: Architecture

Embedded Systems: Architecture Embedded Systems: Architecture Jinkyu Jeong (Jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu ICE3028: Embedded Systems Design, Fall 2018, Jinkyu Jeong (jinkyu@skku.edu)

More information

ECE 471 Embedded Systems Lecture 2

ECE 471 Embedded Systems Lecture 2 ECE 471 Embedded Systems Lecture 2 Vince Weaver http://web.eece.maine.edu/~vweaver vincent.weaver@maine.edu 7 September 2018 Announcements Reminder: The class notes are posted to the website. HW#1 will

More information

ARMv8-A Software Development

ARMv8-A Software Development ARMv8-A Software Development Course Description ARMv8-A software development is a 4 days ARM official course. The course goes into great depth and provides all necessary know-how to develop software for

More information

A Closer Look at the Epiphany IV 28nm 64 core Coprocessor. Andreas Olofsson PEGPUM 2013

A Closer Look at the Epiphany IV 28nm 64 core Coprocessor. Andreas Olofsson PEGPUM 2013 A Closer Look at the Epiphany IV 28nm 64 core Coprocessor Andreas Olofsson PEGPUM 2013 1 Adapteva Achieves 3 World Firsts 1. First processor company to reach 50 GFLOPS/W 3. First semiconductor company

More information

Chapter 03. Authors: John Hennessy & David Patterson. Copyright 2011, Elsevier Inc. All rights Reserved. 1

Chapter 03. Authors: John Hennessy & David Patterson. Copyright 2011, Elsevier Inc. All rights Reserved. 1 Chapter 03 Authors: John Hennessy & David Patterson Copyright 2011, Elsevier Inc. All rights Reserved. 1 Figure 3.3 Comparison of 2-bit predictors. A noncorrelating predictor for 4096 bits is first, followed

More information

Computer Architecture

Computer Architecture Computer Architecture Slide Sets WS 2013/2014 Prof. Dr. Uwe Brinkschulte M.Sc. Benjamin Betting Part 10 Thread and Task Level Parallelism Computer Architecture Part 10 page 1 of 36 Prof. Dr. Uwe Brinkschulte,

More information

VLSI Signal Processing

VLSI Signal Processing VLSI Signal Processing Programmable DSP Architectures Chih-Wei Liu VLSI Signal Processing Lab Department of Electronics Engineering National Chiao Tung University Outline DSP Arithmetic Stream Interface

More information

4. The Processor Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3. Emil Sekerinski, McMaster University, Fall Term 2015/16

4. The Processor Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3. Emil Sekerinski, McMaster University, Fall Term 2015/16 4. The Processor Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3 Emil Sekerinski, McMaster University, Fall Term 2015/16 Instruction Execution Consider simplified MIPS: lw/sw rt, offset(rs) add/sub/and/or/slt

More information

Design and Implementation of a Super Scalar DLX based Microprocessor

Design and Implementation of a Super Scalar DLX based Microprocessor Design and Implementation of a Super Scalar DLX based Microprocessor 2 DLX Architecture As mentioned above, the Kishon is based on the original DLX as studies in (Hennessy & Patterson, 1996). By: Amnon

More information

VIII. DSP Processors. Digital Signal Processing 8 December 24, 2009

VIII. DSP Processors. Digital Signal Processing 8 December 24, 2009 Digital Signal Processing 8 December 24, 2009 VIII. DSP Processors 2007 Syllabus: Introduction to programmable DSPs: Multiplier and Multiplier-Accumulator (MAC), Modified bus structures and memory access

More information

Zynq-7000 All Programmable SoC Product Overview

Zynq-7000 All Programmable SoC Product Overview Zynq-7000 All Programmable SoC Product Overview The SW, HW and IO Programmable Platform August 2012 Copyright 2012 2009 Xilinx Introducing the Zynq -7000 All Programmable SoC Breakthrough Processing Platform

More information

Storage I/O Summary. Lecture 16: Multimedia and DSP Architectures

Storage I/O Summary. Lecture 16: Multimedia and DSP Architectures Storage I/O Summary Storage devices Storage I/O Performance Measures» Throughput» Response time I/O Benchmarks» Scaling to track technological change» Throughput with restricted response time is normal

More information

CS152 Computer Architecture and Engineering VLIW, Vector, and Multithreaded Machines

CS152 Computer Architecture and Engineering VLIW, Vector, and Multithreaded Machines CS152 Computer Architecture and Engineering VLIW, Vector, and Multithreaded Machines Assigned April 7 Problem Set #5 Due April 21 http://inst.eecs.berkeley.edu/~cs152/sp09 The problem sets are intended

More information

ENHANCED TOOLS FOR RISC-V PROCESSOR DEVELOPMENT

ENHANCED TOOLS FOR RISC-V PROCESSOR DEVELOPMENT ENHANCED TOOLS FOR RISC-V PROCESSOR DEVELOPMENT THE FREE AND OPEN RISC INSTRUCTION SET ARCHITECTURE Codasip is the leading provider of RISC-V processor IP Codasip Bk: A portfolio of RISC-V processors Uniquely

More information

Power 7. Dan Christiani Kyle Wieschowski

Power 7. Dan Christiani Kyle Wieschowski Power 7 Dan Christiani Kyle Wieschowski History 1980-2000 1980 RISC Prototype 1990 POWER1 (Performance Optimization With Enhanced RISC) (1 um) 1993 IBM launches 66MHz POWER2 (.35 um) 1997 POWER2 Super

More information

SYSTEMS ON CHIP (SOC) FOR EMBEDDED APPLICATIONS

SYSTEMS ON CHIP (SOC) FOR EMBEDDED APPLICATIONS SYSTEMS ON CHIP (SOC) FOR EMBEDDED APPLICATIONS Embedded System System Set of components needed to perform a function Hardware + software +. Embedded Main function not computing Usually not autonomous

More information

Fundamentals of Quantitative Design and Analysis

Fundamentals of Quantitative Design and Analysis Fundamentals of Quantitative Design and Analysis Dr. Jiang Li Adapted from the slides provided by the authors Computer Technology Performance improvements: Improvements in semiconductor technology Feature

More information

Embedded Systems. 7. System Components

Embedded Systems. 7. System Components Embedded Systems 7. System Components Lothar Thiele 7-1 Contents of Course 1. Embedded Systems Introduction 2. Software Introduction 7. System Components 10. Models 3. Real-Time Models 4. Periodic/Aperiodic

More information

Fundamentals of Computer Design

Fundamentals of Computer Design CS359: Computer Architecture Fundamentals of Computer Design Yanyan Shen Department of Computer Science and Engineering 1 Defining Computer Architecture Agenda Introduction Classes of Computers 1.3 Defining

More information

Ten Reasons to Optimize a Processor

Ten Reasons to Optimize a Processor By Neil Robinson SoC designs today require application-specific logic that meets exacting design requirements, yet is flexible enough to adjust to evolving industry standards. Optimizing your processor

More information

ARM CORTEX-R52. Target Audience: Engineers and technicians who develop SoCs and systems based on the ARM Cortex-R52 architecture.

ARM CORTEX-R52. Target Audience: Engineers and technicians who develop SoCs and systems based on the ARM Cortex-R52 architecture. ARM CORTEX-R52 Course Family: ARMv8-R Cortex-R CPU Target Audience: Engineers and technicians who develop SoCs and systems based on the ARM Cortex-R52 architecture. Duration: 4 days Prerequisites and related

More information

Computer and Hardware Architecture II. Benny Thörnberg Associate Professor in Electronics

Computer and Hardware Architecture II. Benny Thörnberg Associate Professor in Electronics Computer and Hardware Architecture II Benny Thörnberg Associate Professor in Electronics Parallelism Microscopic vs Macroscopic Microscopic parallelism hardware solutions inside system components providing

More information

CPU Project in Western Digital: From Embedded Cores for Flash Controllers to Vision of Datacenter Processors with Open Interfaces

CPU Project in Western Digital: From Embedded Cores for Flash Controllers to Vision of Datacenter Processors with Open Interfaces CPU Project in Western Digital: From Embedded Cores for Flash Controllers to Vision of Datacenter Processors with Open Interfaces Zvonimir Z. Bandic, Sr. Director Robert Golla, Sr. Fellow Dejan Vucinic,

More information

REAL TIME DIGITAL SIGNAL PROCESSING

REAL TIME DIGITAL SIGNAL PROCESSING REAL TIME DIGITAL SIGNAL PROCESSING UTN-FRBA 2010 Introduction Why Digital? A brief comparison with analog. Advantages Flexibility. Easily modifiable and upgradeable. Reproducibility. Don t depend on components

More information

EN164: Design of Computing Systems Topic 06.b: Superscalar Processor Design

EN164: Design of Computing Systems Topic 06.b: Superscalar Processor Design EN164: Design of Computing Systems Topic 06.b: Superscalar Processor Design Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown

More information

An introduction to Digital Signal Processors (DSP) Using the C55xx family

An introduction to Digital Signal Processors (DSP) Using the C55xx family An introduction to Digital Signal Processors (DSP) Using the C55xx family Group status (~2 minutes each) 5 groups stand up What processor(s) you are using Wireless? If so, what technologies/chips are you

More information

Cortex A8 Processor. Richard Grisenthwaite ARM Ltd

Cortex A8 Processor. Richard Grisenthwaite ARM Ltd Cortex A8 Processor Richard Grisenthwaite ARM Ltd 1 Evolution of the ARM Architecture Original ARM architecture: 32 bit RISC architecture 16 Registers 1 being the Program counter Conditional execution

More information

ARM Cortex-A9 ARM v7-a. A programmer s perspective Part1

ARM Cortex-A9 ARM v7-a. A programmer s perspective Part1 ARM Cortex-A9 ARM v7-a A programmer s perspective Part1 ARM: Advanced RISC Machine First appeared in 1985 as Acorn RISC Machine from Acorn Computers in Manchester England Limited success outcompeted by

More information

Designing with NXP i.mx8m SoC

Designing with NXP i.mx8m SoC Designing with NXP i.mx8m SoC Course Description Designing with NXP i.mx8m SoC is a 3 days deep dive training to the latest NXP application processor family. The first part of the course starts by overviewing

More information

Chapter 1 Computer System Overview

Chapter 1 Computer System Overview Operating Systems: Internals and Design Principles Chapter 1 Computer System Overview Ninth Edition By William Stallings Operating System Exploits the hardware resources of one or more processors Provides

More information

RM3 - Cortex-M4 / Cortex-M4F implementation

RM3 - Cortex-M4 / Cortex-M4F implementation Formation Cortex-M4 / Cortex-M4F implementation: This course covers both Cortex-M4 and Cortex-M4F (with FPU) ARM core - Processeurs ARM: ARM Cores RM3 - Cortex-M4 / Cortex-M4F implementation This course

More information

CS450/650 Notes Winter 2013 A Morton. Superscalar Pipelines

CS450/650 Notes Winter 2013 A Morton. Superscalar Pipelines CS450/650 Notes Winter 2013 A Morton Superscalar Pipelines 1 Scalar Pipeline Limitations (Shen + Lipasti 4.1) 1. Bounded Performance P = 1 T = IC CPI 1 cycletime = IPC frequency IC IPC = instructions per

More information

Advanced Computer Architecture

Advanced Computer Architecture Advanced Computer Architecture Chapter 1 Introduction into the Sequential and Pipeline Instruction Execution Martin Milata What is a Processors Architecture Instruction Set Architecture (ISA) Describes

More information