Reduce FPGA Power With Automatic Optimization & Power-Efficient Design. Vaughn Betz & Sanjay Rajput

Similar documents
Power Solutions for Leading-Edge FPGAs. Vaughn Betz & Paul Ekas

Reduce Your System Power Consumption with Altera FPGAs Altera Corporation Public

Low Power Design Techniques

Stratix II vs. Virtex-4 Power Comparison & Estimation Accuracy

Power Optimization in FPGA Designs

13. Power Optimization

8. Migrating Stratix II Device Resources to HardCopy II Devices

White Paper Performing Equivalent Timing Analysis Between Altera Classic Timing Analyzer and Xilinx Trace

Quartus II Prime Foundation

PowerPlay Early Power Estimator User Guide

FPGA Power Management and Modeling Techniques

Verilog for High Performance

Design Guidelines for Optimal Results in High-Density FPGAs

8. Best Practices for Incremental Compilation Partitions and Floorplan Assignments

FPGA for Software Engineers

PowerPlay Early Power Estimator User Guide for Cyclone III FPGAs

Early Models in Silicon with SystemC synthesis

4DM4 Lab. #1 A: Introduction to VHDL and FPGAs B: An Unbuffered Crossbar Switch (posted Thursday, Sept 19, 2013)

Stratix II FPGA Family

4. TriMatrix Embedded Memory Blocks in HardCopy IV Devices

ALTERA FPGAs Architecture & Design

Stratix vs. Virtex-II Pro FPGA Performance Analysis

FPGAs: FAST TRACK TO DSP

Field Programmable Gate Array (FPGA) Devices

Stratix II vs. Virtex-4 Performance Comparison

Tutorial for Altera DE1 and Quartus II

Intel Quartus Prime Pro Edition User Guide

altshift_taps Megafunction User Guide

Section IV. In-System Design Debugging

VHDL for Synthesis. Course Description. Course Duration. Goals

Best Practices for Incremental Compilation Partitions and Floorplan Assignments

FIR Compiler II User Guide

13. Recommended HDL Coding Styles

Synthesizable FPGA Fabrics Targetable by the VTR CAD Tool

ALTERA FPGA Design Using Verilog

2. TriMatrix Embedded Memory Blocks in Stratix II and Stratix II GX Devices

altmult_accum Megafunction User Guide

Advanced ALTERA FPGA Design

Cyclone II FPGA Family

5. Quartus II Design Separation Flow

Laboratory Exercise 8


Section III. Transport and Communication

System Debugging Tools Overview

Design Once with Design Compiler FPGA

Scalable and Modularized RTL Compilation of Convolutional Neural Networks onto FPGA

CHAPTER-IV IMPLEMENTATION AND ANALYSIS OF FPGA-BASED DESIGN OF 32-BIT FPAU

Saving Power by Mapping Finite-State Machines into Embedded Memory Blocks in FPGAs

Automatic Peak Detection System Power Analysis Using System on a Programmable Chip (SoPC) Methodology

13. LogicLock Design Methodology

Implementing Bus LVDS Interface in Cyclone III, Stratix III, and Stratix IV Devices

Embedded Memory Blocks in Arria V Devices

Qsys and IP Core Integration

Spiral 2-8. Cell Layout

An Overview of a Compiler for Mapping MATLAB Programs onto FPGAs

FIR Compiler MegaCore Function User Guide

Quartus II Software Device Support Release Notes

IDEA! Avnet SpeedWay Design Workshop

ISE Design Suite Software Manuals and Help

FPGAs: Instant Access

Compiler User Guide. Intel Quartus Prime Pro Edition. Updated for Intel Quartus Prime Design Suite: Subscribe Send Feedback

Altera Technical Training Quartus II Software Design

June 2003, ver. 1.2 Application Note 198

Digital Integrated Circuits

ECE Digital Engineering Laboratory. Designing for Synthesis

Synthesis Options FPGA and ASIC Technology Comparison - 1

Lab 3 Sequential Logic for Synthesis. FPGA Design Flow.

Implementing Multiple Memory Interfaces Using the ALTMEMPHY Megafunction

Quartus II Software Version 10.0 Device Support Release Notes

RAM-Based Shift Register (ALTSHIFT_TAPS) IP Core User Guide

Example Best and Median Results

INTRODUCTION TO CATAPULT C

Research Challenges for FPGAs

Overview. CSE372 Digital Systems Organization and Design Lab. Hardware CAD. Two Types of Chips

Stratix FPGA Family. Table 1 shows these issues and which Stratix devices each issue affects. Table 1. Stratix Family Issues (Part 1 of 2)

5. Clock Networks and PLLs in Stratix IV Devices

lpm_rom Megafunction User Guide

ALTERA M9K EMBEDDED MEMORY BLOCKS

Memory-Based Multiplier (ALTMEMMULT) Megafunction User Guide

Readings: Storage unit. Can hold an n-bit value Composed of a group of n flip-flops. Each flip-flop stores 1 bit of information.

Early Power Estimator for Intel Stratix 10 FPGAs User Guide

Power-aware RAM Mapping for FPGA Embedded Memory Blocks

20. Mentor Graphics LeonardoSpectrum Support

Field Programmable Gate Array

ECE 636. Reconfigurable Computing. Lecture 2. Field Programmable Gate Arrays I

EN2911X: Reconfigurable Computing Topic 02: Hardware Definition Languages

LOW POWER DESIGN IMPLEMENTATION OF A SIGNAL ACQUISITION MODULE RAVI BHUSHAN THAKUR. B.Tech, Jawaharlal Nehru Technological University, INDIA 2007

Design Tools for 100,000 Gate Programmable Logic Devices

Intel Quartus Prime Standard Edition User Guide

Design Guidelines for Using DSP Blocks

DDR and DDR2 SDRAM Controller Compiler User Guide

Lab #1. Topics. 3. Introduction to Verilog 2/8/ Programmable logic. 2. Design Flow. 3. Verilog --- A Hardware Description Language

18. Synopsys Formality Support

Unit 2: High-Level Synthesis

FPGAs & Multi-FPGA Systems. FPGA Abstract Model. Logic cells imbedded in a general routing structure. Logic cells usually contain:

Chapter 2. Cyclone II Architecture

Intel HLS Compiler: Fast Design, Coding, and Hardware

FPGA Implementation and Validation of the Asynchronous Array of simple Processors

Stratix III FPGAs. Bring your ideas to life.

Lab 2: Modifying LegUp to Limit the Number of Hardware Functional Units

Transcription:

Reduce FPGA Power With Automatic Optimization & Power-Efficient Design Vaughn Betz & Sanjay Rajput

Previous Power Net Seminar Silicon vs. Software Comparison 100% 80% 60% 40% 20% 0% 20% -40% Percent Error -60% -80% Underestimation Overestimation 3% 3% 20% 14% 13% 20% 20% 26% 7% 34% 28% 25% 5% 14% 130% 187% -100% Quartus II PowerPlay ISE XPower N/A (1) N/A (1) -100% -5% -86% -1% -86% -86% -5% -84% -2% -84% -12% -82% -6% -75% -63% -55% -42% -19% -3% -17% -14% -6% Quartus II PowerPlay: The Most Accurate Power Estimation Tool (1) The Xilinx ISE XPower Tool Crashes

Accurate Models Lead to Intelligent Decisions Intelligent decisions on power & thermal management Accurate power models are critical For FPGA software to make intelligent power optimization decisions Accurate power models are critical

Questions We ll Answer Today How do I estimate my design s power? How does power optimization work? How effective is Quartus II PowerPlay in reducing power? What are design techniques for reducing power? How can I ensure I m using the best CAD settings for power?

Power Analysis Overview Getting the Best Accuracy

Power Estimation Early Power Estimate Post Place & Route Power Estimates Actual Power (Too Late!) Accurate power estimates & analysis allow Time to optimize system FPGA CAD software to optimize design power Design decisions that reduce power

Early Power Estimator

Early Power Estimator Accuracy Dynamic power accuracy limited by Lack of place & route data User entry: Resources, toggle rates, enable rates EPE/Quartus dynamic power 1.40 1.20 1.00 0.80 0.60 0.40 0.20 0.00 +/-20% Error vs. Power Analyzer With Perfect User Data Entry Stratix II design

Quartus II Power Analyzer More accurate than Early Power Estimator Has precise resource counts Has power models for each RAM/ALM/DSP mode Considers exact routing used Best accuracy: Simulate for toggle rates Reduced accuracy: Vectorless analysis for toggle rates

Example: Power vs. RAM Mode 10 Dynamic power (mw/mhz) 9 8 7 6 5 4 3 2 1 EP2S60 Measurement Quartus II Estimate 0 M4K RAM M512 RAM MRAM RAM configuration 8700+ Designs Covering All FPGA Elements

Quartus II Automatic Power Optimization Settings, Algorithms and Results

Power Optimization Overview Timing-Driven Compiler Power-Driven Compiler Timing Critical? No Min Area Yes Min Delay Timing Critical? No Power Critical? No Yes Yes Min Delay Min Power Min Area

PowerPlay Power Optimization Power-driven synthesis & fitting Normal effort (default) Does not increase compile time No reduction in design performance No need for toggle rate data from simulation Extra effort Larger power reduction May increase compile time All timing optimization enabled, but performance loss possible Best with toggle rate data from simulation

PowerPlay Power Optimization Design Automatic, but less accurate Vectorless Estimation Estimate Toggle Rates RTL Simulation Requires testbench, more accurate Normal or Extra effort Power-Driven Synthesis Power-Driven Fit Gate-Level Simulation + Power Analyzer Evaluate Power Hardware Measurement Power Report

Three Components of Power 22% 11% 67% 99 Customer Designs, Stratix II Devices Core Dynamic Core Static I/O Dynamic Power Dominant Focus of Power Optimization

Core Dynamic Power 99 Customer Designs, Stratix II Devices ALM Registers 18% ALM Combinational 19% RAM Blocks 14% Routing 38% Clock Networks 9% DSP Blocks 2%* *Digital Signal Processing (DSP) Block power: 5% of Dynamic power for designs that use DSP blocks

Power-Optimal RAM Mapping Power inefficient to use large physical RAM for small memory or vice versa Tri-matrix memory gives 3 physical RAM sizes Quartus II automatically picks best physical RAM Large RAM Medium RAM Small RAM MRAM M4K M4K M512

RAM Enable Optimization Shut RAM down when unused less power Convert read/write enable to clock enable Read Addr Read Data Read En RAM Core Write Addr Write Data Write En Read Clk Write Clk

Power-Optimized RAM Mapping 1K X 16 RAM Default Option Power Efficient Option 16 2:4 Decoder 16 4 1Kx4 M4K RAMs 4 256x16 M4K RAMs

RAM Power Reduction (Stratix II) Normal effort: 6% average savings Extra effort: 21% average savings 10% RAM dynamic power change 0% -10% -20% -30% -40% -50% -60% -70% -80% -90% Normal Effort Extra Effort Design

Power-Driven Place & Route Minimize capacitance of high-toggling signals Without violating timing constraints 20 Million Toggle/s 100 Million Toggle/s Power Optimize

Clock Shut Down Hardware Stratix II: Can shut down clock at 3 levels of tree Top-level: Shut down 1/8 of clock tree Next-level: 1/500 of clock tree Bottom-level: 1/5000 of clock tree Blue: Clock Required Only Red Parts of Clock Network Toggle

Placement to Reduce Clock Power Clocking Legal, Timing Optimized Power Optimize Group Clocks for Maximum Shutdown

AL14 Experimental Flow Source HDL 33 Designs: Customer & Internal Altera Quartus II 5.1 Synthesis & Fitting with PowerPlay Quartus II 5.0 SP1 Synthesis & Fitting Compare Dynamic Power

Slide 24 AL14 I don't see any results. Is it slide 25 and 26? Amy Lee, 02/12/2005

Stratix II Results: Normal Effort 0% Logic Intensive DSP Intensive RAM Intensive Balanced Dynamic Power Reduction vs. Quartus 5.0-10% -20% -30% -40% -50% -60%

Stratix II Results: Extra Effort 0% Logic Intensive DSP Intensive RAM Intensive Balanced Dynamic Power Reduction vs. Quartus 5.0-10% -20% -30% -40% -50% -60%

Stratix II Results Summary Power Optimization Flow Dynamic Power vs. QII 5.0 F max Change Compile Time Change Effort: Normal -15% -- -- Effort: Extra Toggle rates: Simulation Effort: Extra Toggle rates: Vectorless Effort: Extra Toggle rates: Simulation Easy timing constraint -21% -2% +9% -18% -2% +9% -25% -- +100%

Designing for Lower Power Six Tips for Reducing Core Power

1. Use Hard IP Blocks Synthesis tool inferring all the hard blocks it should? Check with floor-plan editor or HDL viewer DSP Blocks Less power than logic elements except for small multiplies (e.g. 5x5) Use all the DSP logic (not just multipliers): Multiplier-accumulator, complex-multiplier, finite impulse response sample chaining, etc. Use altmult_accum MegaFunction if synthesis not inferring

1. Use Hard IP Blocks (Cont.) RAM blocks Usually inferred by synthesis Use altsyncram MegaFunction if necessary Shift registers Many toggling signals: Power inefficient Medium to large shift registers: Implement in FIFOs Use altshift_taps MegaFunction if necessary

2. Synthesize for Area Less logic usually means less power Fewer signals to toggle If sufficient timing margin, tell synthesis tool to optimize for area

3. Help Shut Down RAM Blocks Specify read-enable & write-enable signals on your RAMs whenever possible PowerPlay will convert to clock enables Completely shuts down RAM on many cycles Leave RAM Block Type = Auto Power optimizer will choose best RAM block

4. Use Clock Enables on Logic When clock enable is low 1. Register + downstream logic don t toggle 2. Clock inside a logic array block (LAB) is shut off Even when clock enable is not needed for functionality, it can save power

5. Minimize Glitching Some logic produces many transitions/cycle For example, CRC/parity, combinational multipliers Find Glitchy logic via power analyzer report Register inputs & outputs of Glitchy logic Insert pipeline registers if possible D Q Downstream Logic

6. Shut Down Idle Clocks Entire clock domain unused in some cycles Use the altclkctrl MegaFunction to safely gate the clock Shuts down entire clock tree lower power than a clock enable on all registers

Power Optimization Advisor & Design Space Explorer Your FAEs in Quartus

Power Optimization Advisor Explains power analysis best practices Power optimization suggestions In priority order Highlights recommended settings not enabled on your design Button to correct settings Located in Tools Menu

Power Optimization Advisor

Design Space Explorer Searches Quartus options to find best implementation Search for Lowest Power Finds settings that minimize power & meet timing quartus_sh --dse or Tools Menu

Summary & Wrap Up

Altera s PowerPlay Suite Most accurate power analysis tools Automatic power optimization Industry first Highly effective 20% to 25% dynamic power reduction on average Power Optimization Advisor guides you to lower power Better Tools: Lower Power & No Surprises

Stratix II vs. Virtex-4 Power Dynamic Power: Stratix II vs. Virtex-4 20% 10% 0% -10% -20% -30% -40% -50% -60% -70% Logic intensive DSP intensive RAM intensive Balanced

Power & Altera Hardware Stratix II Lowest dynamic power & lowest total power for 90nm high-density FPGAs Cyclone II Lowest power of any FPGA HardCopy II Smooth migration to lower power Typical: 2X to 3X reduction vs. FPGA

Resources & More Information Quartus II Handbook Volume 2, Chapter 9. - Power Optimization www.altera.com/quartus2 Stratix II power page www.altera.com/stratix2power Cyclone II power page www.altera.com/cyclone2power Power management resource center www.altera.com/power

Thank You For More Information Visit www.altera.com