Frequency and Voltage Scaling Design. Ruixing Yang

Similar documents
Low Power System-on-Chip Design Chapters 3-4

OUTLINE Introduction Power Components Dynamic Power Optimization Conclusions

Adaptive Voltage Scaling (AVS) Alex Vainberg October 13, 2010

Benchmarking of Dynamic Power Management Solutions. Frank Dols CELF Embedded Linux Conference Santa Clara, California (USA) April 19, 2007

FPGA Power Management and Modeling Techniques

DesignCon A Combined Hardware-Software Approach for Low-Power SoCs: Applying Adaptive Voltage Scaling and Intelligent Energy Management Software

Next-generation Power Aware CDC Verification What have we learned?

Cluster-based approach eases clock tree synthesis

Designing with Siliconix PC Card (PCMCIA) Power Interface Switches

Analog Devices Welcomes Hittite Microwave Corporation NO CONTENT ON THE ATTACHED DOCUMENT HAS CHANGED

Low-Power Technology for Image-Processing LSIs

FPGA. Logic Block. Plessey FPGA: basic building block here is 2-input NAND gate which is connected to each other to implement desired function.

Frequency Generator for Pentium Based Systems

SDR Forum Technical Conference 2007

Design and Simulation of Low Power 6TSRAM and Control its Leakage Current Using Sleepy Keeper Approach in different Topology

FPGA Programming Technology

Source EE 4770 Lecture Transparency. Formatted 16:43, 30 April 1998 from lsli

Advanced ALTERA FPGA Design

ALTERA FPGAs Architecture & Design

Reduce Your System Power Consumption with Altera FPGAs Altera Corporation Public

Power Reduction Techniques in the Memory System. Typical Memory Hierarchy

A Review Paper on Reconfigurable Techniques to Improve Critical Parameters of SRAM

CSE502: Computer Architecture CSE 502: Computer Architecture

Leakage Mitigation Techniques in Smartphone SoCs

RADPAL POWER-ON-RESET VDD RAMP RATE CHARACTERIZATION REPORT 4/24/98

8D-3. Experiences of Low Power Design Implementation and Verification. Shi-Hao Chen. Jiing-Yuan Lin

Chronos Latency - Pole Position Performance

An Integration of Imprecise Computation Model and Real-Time Voltage and Frequency Scaling

CS/EE 6810: Computer Architecture

Unleashing the Power of Embedded DRAM

Selecting PLLs for ASIC Applications Requires Tradeoffs

Performance Optimization for an ARM Cortex-A53 System Using Software Workloads and Cycle Accurate Models. Jason Andrews

Attack Your SoC Power Challenges with Virtual Prototyping

VHDL for Synthesis. Course Description. Course Duration. Goals

Pentium Processor Compatible Clock Synthesizer/Driver for ALI Aladdin Chipset

Verilog for High Performance

CSE502: Computer Architecture CSE 502: Computer Architecture

Design and Implementation of High Performance DDR3 SDRAM controller

On-chip Networks Enable the Dark Silicon Advantage. Drew Wingard CTO & Co-founder Sonics, Inc.

A Simple Model for Estimating Power Consumption of a Multicore Server System

CENG4480 Lecture 09: Memory 1

Advanced Multimedia Architecture Prof. Cristina Silvano June 2011 Amir Hossein ASHOURI

ESE370 Fall ESE370, Fall 2015 Project 2: Memory Design Wednesday, Nov. 11

Advanced Test Equipment Rentals ATEC (2832)

ESD Protection Circuits: Basics to nano-metric ASICs

Multi-core microcontroller design with Cortex-M processors and CoreSight SoC

IMPROVES. Initial Investment is Low Compared to SoC Performance and Cost Benefits

Reducing DRAM Latency at Low Cost by Exploiting Heterogeneity. Donghyuk Lee Carnegie Mellon University

Memory Design I. Array-Structured Memory Architecture. Professor Chris H. Kim. Dept. of ECE.

Department of Computer Science, Institute for System Architecture, Operating Systems Group. Real-Time Systems '08 / '09. Hardware.

IDEA! Avnet SpeedWay Design Workshop

! Memory. " RAM Memory. " Serial Access Memories. ! Cell size accounts for most of memory array size. ! 6T SRAM Cell. " Used in most commercial chips

New Advances in Micro-Processors and computer architectures

Powering FPGAs Using Digitally Controlled Point of Load Converters

8. Best Practices for Incremental Compilation Partitions and Floorplan Assignments

Low Power Methodology Manual For System-on-Chip Design

L12: I/O Systems. Name: ID:

Hardware Design with VHDL PLDs IV ECE 443

Real-Time Dynamic Voltage Hopping on MPSoCs

Views of Memory. Real machines have limited amounts of memory. Programmer doesn t want to be bothered. 640KB? A few GB? (This laptop = 2GB)

Semiconductor Memory Classification. Today. ESE 570: Digital Integrated Circuits and VLSI Fundamentals. CPU Memory Hierarchy.

The Memory Component

Unit 7: Memory. Dynamic shift register: Circuit diagram: Refer to unit 4(ch 6.5.4)

Last Time. Making correct concurrent programs. Maintaining invariants Avoiding deadlocks

Philip Andrew Simpson. FPGA Design. Best Practices for Team-based Reuse. Second Edition

Example: CPU-bound process that would run for 100 quanta continuously 1, 2, 4, 8, 16, 32, 64 (only 37 required for last run) Needs only 7 swaps

ARTIST-Relevant Research from Linköping

Embedded Systems: Hardware Components (part I) Todor Stefanov

APPLICATION NOTE 655 Supervisor ICs Monitor Battery-Powered Equipment

Synchronization In Digital Systems

A Low Power DDR SDRAM Controller Design P.Anup, R.Ramana Reddy

Symmetrical Buffered Clock-Tree Synthesis with Supply-Voltage Alignment

Topic 21: Memory Technology

Topic 21: Memory Technology

B.H.GARDI COLLEGE OF ENGINEERING & TECHNOLOGY (MCA Dept.) Parallel Database Database Management System - 2

SEMICON Solutions. Bus Structure. Created by: Duong Dang Date: 20 th Oct,2010

The mobile computing evolution. The Griffin architecture. Memory enhancements. Power management. Thermal management

Interfacing RLDRAM II with Stratix II, Stratix,& Stratix GX Devices

Efficient Evaluation and Management of Temperature and Reliability for Multiprocessor Systems

A Simulation: Improving Throughput and Reducing PCI Bus Traffic by. Caching Server Requests using a Network Processor with Memory

Analog Devices Welcomes Hittite Microwave Corporation NO CONTENT ON THE ATTACHED DOCUMENT HAS CHANGED

STUDY OF SRAM AND ITS LOW POWER TECHNIQUES

SigmaRAM Echo Clocks

6T- SRAM for Low Power Consumption. Professor, Dept. of ExTC, PRMIT &R, Badnera, Amravati, Maharashtra, India 1

Techniques for Optimizing Performance and Energy Consumption: Results of a Case Study on an ARM9 Platform

Design Guidelines for Optimal Results in High-Density FPGAs

Design and Implementation of an AHB SRAM Memory Controller

SHRI ANGALAMMAN COLLEGE OF ENGINEERING. (An ISO 9001:2008 Certified Institution) SIRUGANOOR, TIRUCHIRAPPALLI

EE251: Thursday November 30

CHAPTER 3 ASYNCHRONOUS PIPELINE CONTROLLER

Main Memory (RAM) Organisation

THE latest generation of microprocessors uses a combination

Using the Current Control for Dynamic Voltage Scaling to Reduce the Power Consumption of PC Systems

DATA SHEET. Features. General Description. Block Diagram

Model-Based Design for effective HW/SW Co-Design Alexander Schreiber Senior Application Engineer MathWorks, Germany

Buses. Maurizio Palesi. Maurizio Palesi 1

Real-Time Dynamic Energy Management on MPSoCs

Wave-Pipelining the Global Interconnect to Reduce the Associated Delays

Computing to the Energy and Performance Limits with Heterogeneous CPU-FPGA Devices. Dr Jose Luis Nunez-Yanez University of Bristol

Boost FPGA Prototype Productivity by 10x

Transcription:

Frequency and Voltage Scaling Design Ruixing Yang 04.12.2008

Outline Dynamic Power and Energy Voltage Scaling Approaches Dynamic Voltage and Frequency Scaling (DVFS) CPU subsystem issues Adaptive Voltages Scaling (AVS) Level Shifters and Isolation Voltage Scaling Interfaces Effect on Synchronous Timing Control of Voltage Scaling Examples of Voltage and Frequence Scaling Design The presentation is based on the reference book (M. Keating, et al., Low Power Methodology Manual for System-on-Chip Design, Springer, 2007. ) chapter 9 and 10. All the contents and figures used here are referenced from the book chapter 9 & 10.

Dynamic Power and Engergy Voltage Scaling reducing the supply voltage and clock frequence based on work load. Benefit: valuable for portable battery-powered products. Rarely is all the logic on a SoC required to run at the limit of performance at all times, there can be several different performance profiles. Price: complications are introduced to both the system design and implementation flow. Challenges: 1) Determining the minimum voltage to meet a particular (sub)system performance level; 2) Achieving timing closure over a range of voltages and clock speeds; 3) the actural available headroom for a design; 4) Temperature inversion.

Dynamic Power and Engergy Dynamic power dissipated by CMOS is described by the equation: P = C * f dyn eff * V 2 dd clock where Ceff is the capacitance, Vdd is the voltage (having the greatest effect on power), fclock is the frequence.

Dynamic Power and Engergy Engergy Savings From Voltage Scaling Discussions: Engergy is integration of power over the time to complete a task of work. Ignoring the effects of leakage power, clocking a block at half the frequence halves the dynamic power but takes twice as long to complete the work. Each voltage scaled block requires additional power rail.

Voltage Scaling Approaches The approaches to voltage scaling are: Static Voltage Scaling (SVS): Different blocks or subsystems are given different, fixed supply voltages. Multi-level Voltage Scaling (MVS): an extension of the static voltage scaling case where a block or subsystem is switched between two or more voltage levels. Only a few, fixed, discrete levels are supported for different operating modes. Dynamic Voltage and Frequence Scaling (DVFS): an extension of MVS where a larger number of voltage levels are dynamically switched between to follow changing workloads. Adpative Voltage Scaling (AVS): an extension of DVFS where a control loop is used to adjust the voltage.

Dynamic Voltage and Frequency Scaling (DVFS) CPU sub-system is powered by a programmable power supply. The rest of the chip is powered by fixed power supply. A PLL provides a high speed clock to SysClock Generator, which uses dividers to generate the CPU CLOCK and the SOC CLOCK. SW first decides the minimum CPU clock speed that meets the workload requirements, then determines the lowest supply voltage that will support that clock speed.

Dynamic Voltage and Frequency Scaling (DVFS) Timing/Voltage Values : DVFS uses a set of discrete voltage/ frequence pairs. Determining which values to support is a key design decision, application dependent. Too few operating points results in systems that spend too much time ramping between levels. Too many levels results in the power supply spending too much time hunting between different target voltages. The Effects of Temperature Inversion: Voltages must be limited to the range over which delay and voltage track monotonically, which means above the temperature inversion point. Below the temperature inversion point, delay and voltage invert their normal relationship. Libraries: To establish what voltage levels are needed for the selected clock frequence, we need do trial implementation at reduced voltages and measure the performance. Therefore we need libraries whose characterization extends beyond their nominal supply voltages. Switching Times and Algorithms: Switching performance levels take time for both voltage regulators and clock generators. Switching voltage levels is particular slow and switching frquencies is orders of magnitude faster than voltage level switching. Increase the voltage first and decrease the voltage after the frequence is lowered. Power Up Sequencing: Usually two external power supplies are used. Use digital counter, or handshake signal to avoid deadlocks due to IO pad signaling not being stable until the power rail is valid.

CPU Subsystem Design Issues DVFS is frequently used in CPU Subsystems. During power gating, the CPU is powered down and VDDRAM is set to the lower memory retention voltage. Clamps are used across the CPU memory interface to isolate the memories and hold at a retension voltage to avoid losing state during power down. Level shifters are used between the CPU and the rest of the chip. Below 130nm there is little or no headroom for voltage scaling memories, so a more practical design is shown below. Fixed, high voltage for the cache. Only CPU is voltage scaled. CPU and cache run on different supply voltages.

Adaptive Voltage Scaling (AVS) Previous discussion is open-loop techniques. Pairs of frequence/voltage are determined with sufficient margin to guarantee operation for best and worst case. AVS introduces a closed-loop feedback system between the voltage scaling power supply and delay-sensing performance monitor on the SoC. Performance Monitor is integrated with IP it is monitoring to get the best thermal tracking. The performance monitor communicates with a power controller which in turn set the voltage of the power supply.

Level Shifters and Isolation In any multi-voltage design, level shifters are required at the inferfaces of blocks operating at different voltages. It is much easier to design one direction level shifters. DVFS is always higher or lower voltage than the blocks that it interfaces with. Because the lack of voltage headroom for RAMs, the cache is always at a voltage higher than or equal to that of CPU. In theory, the bus interface of CPU can be a higher or lower voltage, for practical reason the bus is always operate at a voltage higher than or equal to the CPU. Otherwise, system errors! :(

Voltage Scaling Interfaces Effect on Synchronous Timing The timing of a synchronous interface between a DVFS block and the rest of the system is made more complex because DVFS changes the voltages and frequencies. No way to distribute a single, low-skew clock to both the DVFS block and the system. Solution asynchronous interface (E.g. ARM1176). Add asynchronous interface to an AMBA bus to complete with synchronizers in both directions. Latency maybe is not accepted. Not practical in most designs. Solution can be latch-based re-timing. Add latches between the CPU and AMBA bus. So the CPU clock is adjusted roughly aligned to the active of bus clock HCLK. CPU clock can be early or late relative to HCLK. CPU clock early solution is to over-constrain synthesis to guarantee that data arrives early. CPU clock late solution is that the latch assures the data is available.

Voltage Scaling Interfaces Effect on Synchronous Timing Read The Low-phase input latches (LphLAT) are transparent when HCLK is low. Input data is guaranteed by over- contraining synthesis to arrive before the rising edge of HCLK. At the rising edge of HCLK, the latch captures the input data and holds the data for half an HCLK cycle. This guarantees that the data to the CPU will meet setup and hold requirements, even with significant skew on CPUCLK late relative to HCLK. Write The High-phase HphLAT latches are transparent with HCLK is high. If the CPU clock is early, then the latch holds the old data on the bus until the write is complete. If the CPU clock is late, then data will be late arriving on the bus, so we over constrain the bus write timing in synthesis to guarantee that writes work correctly even if data is late by our worst case clock skew.

Voltage Scaling Interfaces Effect on Synchronous Timing An alternative approach to CPU-bus is to use standard rising-edge register. The CPU clock runs early enough so that it generates write data, the data is available at the input of REG early enough to meet the setup time requirements of the register. Write data is sampled by the register at the rising edge of HCLK and held for the duration of the write transaction. On read timing, we simply rely on the system to return read data before the CPU clock that occurs just before the rising edge of HCLK. Advantages: using edge-triggered registers, standard implementation work effectively to assure correct timing, makes automated design-for-test straightforward. Disadvantages: requires tighter over-constraining of the input paths to the CPU inferface.

Control of Voltage Scaling DVFS is used to save the energy only when the system level performance requirements are understood and it is clear when the frequence can be lowered. For enbedded system with a known workload, we can instrument the embedded firmware or hardware to drive the performance request and hence voltage requirements directly. For a real-time system, the deadlines are well understood and expressed in terms of scheduler priorities or scheduled events. The real-time requirements can be used to drive the performance and voltage scaling hardware. Simply tring to guess from system utilization metrics or statistics is not a good solution. Good example, ARM s Intelligent Energy Manager (IEM). It buids an awareness of producer and consumer task frequencies and deadlines. E.g. for GUI, the calls to the window server and the perceived display refresh rates may be used to judge the right level of performance for interactive tasks.

Examples of Voltage and Frequency Scaling Design

Summary Relationship between the voltage & frequency and Dynamic Power & Engergy Voltage Scaling Approaches DVFS (for CPU subsystem issues) AVS Design issues (level shifters and isolation, synchronous timing, scaling control)