Design Techniques for Implementing an 800MHz ARM v5 Core for Foundry-Based SoC Integration. Faraday Technology Corp.

Table of Contents
1. Faraday & FA626TE Overview
2. Why We Need an 800MHz ARM v5 Core
3. Why CPU Optimization is Different
4. Design Techniques
5. System Level Considerations
6. Summary

Faraday: At a Glance
Fabless ASIC and IP supplier, spun off from UMC in 1993. Strategic IP partnership with UMC, which owns 20% of Faraday. Focused fabless business model. 2006 revenue: $171 million. Listed on the Taiwan exchange since 1999.
Today's operation: 700 employees worldwide, 550+ in R&D. Headquarters in Taiwan, with branch offices in the USA, Europe, Japan, and China.

Faraday: ARM's Valued Partner
CPU instruction set architecture licensee: holds ARMv4 and ARMv5TEJ instruction set licenses. Only a few tier-one companies have the capability to license and develop ARM-compliant cores.
ARM SUDL partner: single-usage ARM processor hard cores sub-licensed (ARM7TDMI, ARM922T, ARM946E).
Tool distributor: exclusive ARM tool distributor in Taiwan for tool kits (ADS, RVDS) and development platforms (Integrator, ...).
ARM926EJ-S core licensee: well-versed synthesis skills and physical design and verification expertise with a proprietary library, yielding the smallest hard core and highest performance.

FA626TE Overview
A 32-bit RISC processor core with the ARMv5TE instruction set: 8-stage pipeline, AXI/AHB interface, 32KB instruction and 32KB data caches, ICE interface, optional I/D scratch pads, and ARM tool chain support.
[Block diagram: CPU core, ICE, branch target buffer, coherence hardware, I cache, D cache, I and D scratch pads, memory management unit, write buffer, power saving logic, ASIE I/F, and AXI/AHB bus interfaces.]

Why We Need an 800MHz ARM v5 Core
Applications need high performance: OLPC, IP-STB, industrial PC, high-speed networking, infotainment, mobile convergence devices, and SOHO/home NAS.
Thousands of software packages need computing power: algorithms and logic that are risky to build in hardware, complex middleware, and sophisticated application software.
Integration and cost: an 800MHz CPU in the UMC 90nm process reduces mask cost and design cycle compared with 65nm, and a high-performance CPU in a foundry process provides ease of integration and cost savings.

Integrate an 800MHz CPU into a PC SoC
Faraday's PC SoC plan combines the FA626TE CPU with a 2D engine, a DDRII memory controller (to external DRAM), a display engine (DVI/LVDS), flash, 1x SATA, a security engine, a video engine, KB/mouse/LPC, 2x PCI Express x1, and 4x USB 2.0, plus connectivity options (wireless LAN, WiMAX, 3G/3.5G).

Target Applications
Low-cost PC / mobile device: web, Word/Excel, video, education.
WinCE/Linux embedded: thin client, panel PC, POS, digital signage, IPTV.
Faraday SoC features: integration, low power, low cost.

Why an ARM-Based SoC is Competitive
[Diagram: integration trend, 2007/E to 2008 H2. Faraday: FC-2 (CPU + MPU) + 2D toward a single PC SoC; major x86 solution: CPU + graphics NB + SB toward an integrated processor + SB.]
Faraday ARM SoC vs. x86 solution:
Solution: all-in-one vs. integrated processor + SB
CPU: 800MHz~1GHz vs. ~1GHz
GPU: 2D integrated vs. 2D/3D integrated
Video engine: integrated vs. accelerator only
Power: 2W vs. 5W
Price (CPU + chipset): a few vs. much

Why CPU Optimization is Different
General embedded CPU optimization methods:
Micro-architecture: pipeline, superscalar, multithread, 64-bit.
Process generation: 65nm, 45nm, 32nm.
Library & memory: high-speed cell library, high-speed memory.

Why CPU Optimization is Different
Faraday optimization methods:
[Figure: speed (MHz, 0 to 1,000) vs. cycle time (ns, 6 down to 1), with gains labeled "Pipeline Optimization" and "Circuits".]
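Speed and cycle time on this plot are reciprocals, f(MHz) = 1000 / T(ns), so 800MHz means a 1.25 ns cycle. A minimal sketch of the "pipeline optimization" axis follows, assuming a simple textbook model in which a fixed amount of logic is split evenly across stages and each stage pays a fixed flop/clocking overhead. The helper names and the 8 ns / 0.25 ns figures are hypothetical, not taken from the slide.

```python
def frequency_mhz(cycle_time_ns: float) -> float:
    """Clock frequency in MHz for a given cycle time in ns: f = 1000 / T."""
    return 1000.0 / cycle_time_ns


def pipelined_cycle_time_ns(logic_ns: float, stages: int, overhead_ns: float) -> float:
    """Cycle time when `logic_ns` of logic is split evenly across `stages`
    pipeline stages and each stage pays a fixed flop/clocking overhead."""
    return logic_ns / stages + overhead_ns


# Hypothetical numbers, not from the slide: 8 ns of logic, 0.25 ns per-stage overhead.
for n in (5, 6, 8):
    t = pipelined_cycle_time_ns(8.0, n, 0.25)
    print(f"{n} stages: {t:.2f} ns -> {frequency_mhz(t):.0f} MHz")
```

With these made-up numbers, 8 stages give 1.25 ns, i.e. 800 MHz. Because the per-stage overhead never shrinks, deeper pipelining has diminishing returns, which is one reason circuit-level work matters as well.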

Faraday CPU Technology (technology and its benefit)
Design database: data path (TLB, register file) for MHz and T2M (time to market); clock gating for power; de-skew for MHz.
Circuit implementation: over-driving, specific cell library, fast SRAM, and sense-amp DFF for MHz; power island for power.
Testing: structural approach for speed binning, for MHz.

Register File Improvements
In processes below 0.5µm, the register file (RF) is NOT difficult to design. However, it creates congestion (32 x 32 DFFs, muxes, multiple ports), and its performance determines the time budget of the forwarding path. Therefore, an automated RF makes P&R easy and eliminates routing congestion.
Improvement: 200 ps.
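To make concrete what "forwarding path" means here, below is a hypothetical behavioral model of a 32-entry x 32-bit register file with a same-cycle write-to-read bypass. The two-read/one-write port count and the class and method names are assumptions for illustration; the sketch says nothing about how Faraday's automated RF is laid out, only about the logic whose delay the RF timing must fit.

```python
from typing import Optional, Tuple


class RegisterFile:
    """Behavioral sketch: 32 entries x 32 bits, two read ports, one write port,
    and a same-cycle write-to-read bypass (forwarding). Port counts are assumed."""

    def __init__(self, entries: int = 32, width: int = 32) -> None:
        self.mask = (1 << width) - 1
        self.regs = [0] * entries

    def cycle(self, ra: int, rb: int,
              wr: Optional[int] = None, wdata: int = 0) -> Tuple[int, int]:
        """Read ports ra/rb and optionally write wdata to entry wr in one cycle."""
        wdata &= self.mask

        def read(idx: int) -> int:
            # Bypass: a read of the entry being written this cycle sees the new value.
            if wr is not None and idx == wr:
                return wdata
            return self.regs[idx]

        a, b = read(ra), read(rb)
        if wr is not None:
            self.regs[wr] = wdata
        return a, b


rf = RegisterFile()
print(rf.cycle(1, 2, wr=1, wdata=5))  # (5, 0)
```

In hardware the bypass is an address comparator plus a mux in front of each read port, so any delay the RF array adds comes straight out of that forwarding budget.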

Next Obvious Improvement: the TLB
The FA626TE TLB needs full associativity, but 64 40-bit comparators are HUGE. So we use the same methodology as for the register file. The result has the same benefit as the register file: a ~300 ps gain instead of a loss.
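A minimal sketch of what full associativity demands: every lookup compares the incoming tag against all 64 entries. The 64-entry, 40-bit figures come from the slide; what the tag contains (for example a virtual page number plus an address-space ID), the round-robin replacement, and all names below are assumptions, not FA626TE details.

```python
from typing import List, Optional


class FullyAssociativeTLB:
    """Sketch of a 64-entry fully associative TLB with 40-bit tags.
    Tag contents and round-robin replacement are assumptions."""

    def __init__(self, entries: int = 64, tag_bits: int = 40) -> None:
        self.entries = entries
        self.tag_mask = (1 << tag_bits) - 1
        self.tags: List[Optional[int]] = [None] * entries
        self.frames: List[int] = [0] * entries
        self.victim = 0  # next entry to replace

    def lookup(self, tag: int) -> Optional[int]:
        """Compare the tag against every entry; hardware does all 64 compares at once."""
        tag &= self.tag_mask
        for i, stored in enumerate(self.tags):  # one iteration per hardware comparator
            if stored == tag:
                return self.frames[i]
        return None

    def insert(self, tag: int, frame: int) -> None:
        self.tags[self.victim] = tag & self.tag_mask
        self.frames[self.victim] = frame
        self.victim = (self.victim + 1) % self.entries


tlb = FullyAssociativeTLB()
tlb.insert(0x12345, 7)
print(tlb.lookup(0x12345), tlb.lookup(0x54321))  # 7 None
```

The software loop hides the cost: in silicon, 64 entries x 40 bits means 2,560 bit comparisons per lookup, all evaluated in parallel and reduced to a single hit signal within the cycle.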

Custom Design Process: De-skew
Step 1: Place the CPU block and figure out its rough size.
Step 2: Rip out all cells and place the clock trunk first.
Step 3: Then place ALL DFFs, grouping them roughly to zero out clock uncertainty.
Step 4: Place the rest of the cells.
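The reason to place the clock trunk and flops before everything else is that clock uncertainty comes directly out of the cycle budget. The sketch below uses the textbook setup constraint for a single flop-to-flop path, lumping skew and jitter into one worst-case uncertainty term (real skew can help or hurt a given path depending on its sign). All delay values and the function name are hypothetical.

```python
def max_frequency_mhz(t_clk_to_q_ps: float, t_logic_ps: float,
                      t_setup_ps: float, t_uncertainty_ps: float) -> float:
    """Setup-time constraint for one flop-to-flop path:
    T >= t_clk_to_q + t_logic + t_setup + t_uncertainty (all in ps).
    Skew and jitter are lumped into a worst-case uncertainty term."""
    period_ps = t_clk_to_q_ps + t_logic_ps + t_setup_ps + t_uncertainty_ps
    return 1e6 / period_ps  # 1 / (period in ps), expressed in MHz


# Hypothetical path: same logic, before and after de-skew.
before = max_frequency_mhz(80, 1050, 40, 80)
after = max_frequency_mhz(80, 1050, 40, 0)
print(f"{before:.0f} MHz -> {after:.0f} MHz")  # 800 MHz -> 855 MHz
```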

How to Do Circuit Implementation
Logic: higher driving capability through 1.2V over-driving, with cells enlarged to an 11.5 grid.
Library: custom one-hot mux.
Memory: 1.2V over-driving and sense-amplifier-based DFF.
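The slide names a custom one-hot mux cell but does not describe it. The sketch below only shows the logic function such a cell implements, out = OR over i of (sel[i] AND in[i]); with a one-hot select there is no binary decoder in front of the data gates, which shortens the path. The function name and the 32-bit default width are assumptions.

```python
from typing import Sequence


def one_hot_mux(select: int, inputs: Sequence[int], width: int = 32) -> int:
    """AND-OR one-hot multiplexer: out = OR over i of (sel[i] AND in[i]).
    A one-hot select needs no binary decoder in front of the data gates."""
    if select <= 0 or select & (select - 1) or select >> len(inputs):
        raise ValueError("select must have exactly one bit set, within range")
    mask = (1 << width) - 1
    out = 0
    for i, value in enumerate(inputs):
        sel_word = mask if (select >> i) & 1 else 0  # replicate sel[i] across the word
        out |= sel_word & value                      # AND per input, OR across inputs
    return out


print(hex(one_hot_mux(0b010, [0xA, 0xB, 0xC])))  # 0xb
```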

Structural Approach for Speed Binning
Speed binning with delay test.
Advantages: a structural approach with automated pattern generation.
Issues to overcome:
Decide the critical paths and correlate them with the true longest paths in silicon.
Test conditions differ from actual system-mode operation.
Good path selection, picking a representative set of paths, is key to speed binning.
Correlation is needed between the speed achieved with structural test patterns and with system test.
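Path selection is where binning succeeds or fails. The following is one possible heuristic, not Faraday's method: take paths in order of worst static-timing slack, but cap how many share an endpoint so the tested set also covers different regions of the design. The `TimingPath` fields, the path names, and the cap are hypothetical, and silicon correlation is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class TimingPath:
    name: str
    endpoint: str    # capturing flop or block the path ends at
    slack_ps: float  # from static timing analysis; lower is more critical


def select_paths(paths: Iterable[TimingPath], per_endpoint: int = 1,
                 limit: int = 10) -> List[TimingPath]:
    """Pick the worst-slack paths while capping how many share an endpoint,
    so the tested set is both critical and spread across the design."""
    chosen: List[TimingPath] = []
    used: Dict[str, int] = {}
    for p in sorted(paths, key=lambda p: p.slack_ps):
        if used.get(p.endpoint, 0) < per_endpoint:
            chosen.append(p)
            used[p.endpoint] = used.get(p.endpoint, 0) + 1
            if len(chosen) == limit:
                break
    return chosen


paths = [TimingPath("a", "rf", -5), TimingPath("b", "rf", -3),
         TimingPath("c", "tlb", 10), TimingPath("d", "alu", 2)]
print([p.name for p in select_paths(paths, limit=3)])  # ['a', 'd', 'c']
```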

How to Reach 1GHz
[Figure: FA626TE simulation result across NMOS/PMOS process corners (SS, TT, FF), with bins at 800MHz, 900MHz, and 1GHz and relative-speed labels of 1.13x, 1.18x, 1.26x, and 1.36x.]
Speed binning at typical voltage in 90nm yields 18% to 36% higher performance.
Cost impact: with three bins, there is an estimated additional 5% to 10% increase in die cost (to be confirmed) due to potential yield loss.
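With the three bins named on this slide (800MHz, 900MHz, 1GHz), binning reduces to placing each part in the fastest bin its measured maximum frequency passes. A minimal sketch follows; the Fmax values are hypothetical, and real flows also apply guard bands and voltage/temperature margins that are omitted here.

```python
from typing import Dict, Iterable, Optional, Tuple

BINS_MHZ: Tuple[int, ...] = (1000, 900, 800)  # the three bins from the slide


def assign_bin(fmax_mhz: float, bins: Tuple[int, ...] = BINS_MHZ) -> Optional[int]:
    """Return the fastest bin the part passes, or None if below the slowest bin."""
    for b in sorted(bins, reverse=True):
        if fmax_mhz >= b:
            return b
    return None


def bin_distribution(fmax_list: Iterable[float],
                     bins: Tuple[int, ...] = BINS_MHZ) -> Dict[Optional[int], int]:
    """Count parts per bin; the None key collects parts that miss every bin."""
    dist: Dict[Optional[int], int] = {b: 0 for b in bins}
    dist[None] = 0
    for f in fmax_list:
        dist[assign_bin(f, bins)] += 1
    return dist


print(bin_distribution([1020, 905, 850, 700]))
```

Parts in the None bucket are the yield loss that the slide's 5% to 10% die-cost estimate refers to.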

System Diagram
AMBA 3 AXI interconnect bus: FA626TE processor, static controller, L2 cache controller, DDRII controller, and other application-specific logic.
AMBA high-speed bus (AHB), reached through an AXI-to-AHB bridge: DMA controller, AHB controller, and other application-specific logic.
AMBA 2 peripheral bus (APB), reached through an AHB-to-APB bridge: UART, GPIO, I2C/I2S, SSI, and other application-specific logic.
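The three-level AXI/AHB/APB hierarchy means a CPU access to a peripheral crosses two bridges, each decoding the address for its own bus. The sketch below shows that routing under a purely hypothetical memory map: none of the address ranges, and none of the slave names beyond the block names on the slide, come from the source, and the DMA controller's placement on the AHB is also an assumption.

```python
from typing import Dict, List, Optional, Tuple

AddrMap = Dict[str, Tuple[int, int]]  # name -> [start, end)

# Hypothetical memory map: none of these address ranges come from the slide.
AXI_SLAVES: AddrMap = {
    "ddr2_controller": (0x0000_0000, 0x4000_0000),
    "static_controller": (0x4000_0000, 0x5000_0000),
    "axi_to_ahb_bridge": (0x9000_0000, 0xA000_0000),
}
AHB_SLAVES: AddrMap = {
    "dma_controller": (0x9000_0000, 0x9010_0000),
    "ahb_to_apb_bridge": (0x9800_0000, 0x9900_0000),
}
APB_SLAVES: AddrMap = {
    "uart": (0x9800_0000, 0x9801_0000),
    "gpio": (0x9801_0000, 0x9802_0000),
    "i2c_i2s": (0x9802_0000, 0x9803_0000),
    "ssi": (0x9803_0000, 0x9804_0000),
}


def decode(addr: int, slaves: AddrMap) -> Optional[str]:
    """Address decoder for one bus level: which slave owns this address."""
    for name, (lo, hi) in slaves.items():
        if lo <= addr < hi:
            return name
    return None


def route(addr: int) -> List[Optional[str]]:
    """Chain of slaves a CPU access passes through, from AXI down to APB."""
    hops = [decode(addr, AXI_SLAVES)]
    if hops[-1] == "axi_to_ahb_bridge":
        hops.append(decode(addr, AHB_SLAVES))
        if hops[-1] == "ahb_to_apb_bridge":
            hops.append(decode(addr, APB_SLAVES))
    return hops


print(route(0x9801_0004))  # ['axi_to_ahb_bridge', 'ahb_to_apb_bridge', 'gpio']
```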

System Level Considerations
01. Clock domain: the system design must account for multiple clock frequencies. Support synchronous mode.
02. Power domain: the core voltage (1.2V) differs from the peripheral voltage (1.0V). Account for the added gate delay when inserting level shifters, and add a regulator to support multiple voltages.
03. Signal domain: level shifters handle signal conversion between voltage domains.
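Two of these checks can be written down directly. The sketch assumes that "synchronous mode" means each bus clock is the CPU clock divided by an integer, so clock edges stay aligned. That definition, the bus frequencies, and the function names are assumptions. The 1.2V and 1.0V values come from the slide, and any signal crossing between them is flagged as needing a level shifter.

```python
from fractions import Fraction

CORE_V, PERIPH_V = 1.2, 1.0  # voltages from the slide


def is_synchronous(cpu_mhz: int, bus_mhz: int) -> bool:
    """Assumed definition: the bus clock is the CPU clock divided by an integer."""
    return (Fraction(cpu_mhz) / Fraction(bus_mhz)).denominator == 1


def needs_level_shifter(src_v: float, dst_v: float) -> bool:
    """A signal crossing between different supply voltages needs a level shifter."""
    return src_v != dst_v


# Hypothetical clock plan around an 800MHz core.
for name, mhz in (("AXI", 200), ("AHB", 100), ("APB", 300)):
    print(name, mhz, "MHz synchronous:", is_synchronous(800, mhz))
print("core -> peripheral level shifter:", needs_level_shifter(CORE_V, PERIPH_V))
```

Using `Fraction` keeps the ratio test exact for integer MHz values. A non-integer ratio such as 800/300 would instead call for an asynchronous crossing with synchronizers.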

Summary
FA626TE positioning: 800MHz @ 90nm, for changing algorithms and varied software; target applications are OLPC, ULPC, IPC, IP-STB, and networking.
Faraday optimization: design data path, circuit implementation, testing.
Design technology: TLB optimization, register file optimization, de-skew methodology, specific cell library and memory, speed binning.
System considerations: clock domain, power domain, signal domain.

Welcome to visit the Faraday booth @303, or contact Albert Chen: albchen@faraday-tech.com, 408-747-7535.
