Power and Thermal Models for RAMP2


Power and Thermal Models for RAMP2
Jose Renau
Department of Computer Engineering, University of California Santa Cruz
http://masc.cse.ucsc.edu

Motivation
- Performance is not the only first-order design parameter
  - Energy consumption & thermal issues
- RAMP platforms lack power and thermal statistics

Issues
- RAMP provides speed; we need fast power/thermal models
- Power Models
  - Compute power on the FPGA or desktop?
  - Most power models are for complex CPUs
  - Need simple, validated CPU models (IR setup to validate)
- Thermal Models
  - Complex thermal models are difficult to map to FPGAs
  - We need a more detailed thermal model: SOI, PCB...
- Shared infrastructure across platforms

RAMP2 Integration
Components: Target (MP modeled), Host (FPGA), SESCTherm (Desktop), FEM Solver (GPU)
1. Activity rate generation (adjust clock gate)
2. Activity rate transfer (once every ~1ms)
3. Estimate dynamic power; compute leakage with current temperature
4. Pass total power
5. Compute temperature
6. Pass temperature (every ~few ms)
7. Set thermal sensor
8. Pass temperature and power?
9. Thermal sensor visible to OS/CPU
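The numbered steps of the integration diagram can be sketched as a time-stepped loop. This is a minimal illustration, not the RAMP2 or SESCTherm code: `co_simulate`, `toy_power`, `toy_thermal`, and every constant below are hypothetical stand-ins, and the thermal model is a per-block lumped RC rather than a finite element solver.

```python
def toy_power(activity, temp_c):
    """Hypothetical block power in watts: dynamic term plus temperature-dependent leakage."""
    dynamic = 2.0 * activity
    leakage = 0.2 * ((temp_c + 273.15) / 318.15) ** 2
    return dynamic + leakage


def toy_thermal(temps, power, dt_s, r_th=5.0, c_th=0.01, ambient_c=45.0):
    """Hypothetical thermal model: one independent lumped RC per block, explicit Euler step."""
    return {
        block: t + dt_s * (power[block] - (t - ambient_c) / r_th) / c_th
        for block, t in temps.items()
    }


def co_simulate(activity_trace, power_model, thermal_model,
                sensor_update_ms=4, start_temp_c=45.0):
    """activity_trace holds one dict of per-block activity rates per 1 ms interval."""
    if not activity_trace:
        return []
    temps = {block: start_temp_c for block in activity_trace[0]}
    sensor = dict(temps)
    history = []
    for ms, activity in enumerate(activity_trace, start=1):
        # Steps 1-2: activity rates generated on the target and transferred every ~1 ms.
        # Steps 3-4: dynamic power plus leakage at the current temperature, passed as total power.
        power = {b: power_model(activity[b], temps[b]) for b in activity}
        # Step 5: compute the new temperature map.
        temps = thermal_model(temps, power, dt_s=1e-3)
        # Steps 6-7: pass temperature back and set the sensor every few ms.
        if ms % sensor_update_ms == 0:
            sensor = dict(temps)
        # Step 9: record the value the OS/CPU would see through the sensor.
        history.append((ms, dict(sensor)))
    return history


trace = [{"ALU": 1.0, "FP0": 0.1}] * 20
history = co_simulate(trace, toy_power, toy_thermal)
```

The sensor value lags the computed temperature because it is refreshed only every `sensor_update_ms` intervals, which mirrors the two different transfer periods in the diagram.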

Activity Rate Generation
- Similar approach to Wattch
- A counter for each resource
  - Increase after each use
- Account for clock gating
  - Not everything is 100% clock gated
- SESC uses ~80 counters for each CPU
  - Large 32-bit counters only require 320 bytes per CPU
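A minimal sketch of per-resource activity counters follows. The class name, the resource names, and the `gating_efficiency` parameter are assumptions made for illustration; the slide only states that each resource has a counter and that clock gating is imperfect.

```python
class ActivityCounters:
    """Hypothetical per-CPU counter bank, one 32-bit counter per modeled resource."""

    def __init__(self, resources, gating_efficiency=0.9):
        # gating_efficiency: assumed fraction of a block's active power saved while idle.
        self.counts = {name: 0 for name in resources}
        self.gating_efficiency = gating_efficiency

    def use(self, resource, times=1):
        # Increase after each use, wrapping like a 32-bit hardware counter.
        self.counts[resource] = (self.counts[resource] + times) & 0xFFFFFFFF

    def activity_rate(self, resource, cycles):
        """Effective activity in [0, 1] over `cycles`, accounting for imperfect gating."""
        busy = min(self.counts[resource] / cycles, 1.0)
        idle_cost = 1.0 - self.gating_efficiency
        return busy + (1.0 - busy) * idle_cost

    def storage_bytes(self):
        return 4 * len(self.counts)


counters = ActivityCounters([f"res{i}" for i in range(80)])
counters.use("res0", times=500)
rate = counters.activity_rate("res0", cycles=1000)
```

With 80 resources, `storage_bytes()` gives the 320 bytes per CPU quoted on the slide.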

Activity Rate Transfer
- ~80 counters per CPU
- Slow temperature transients
  [Figure: temperature (C, 60 to 85) vs. time (s, 0 to 4) for blocks RF, D$, FP0, Clock, MC, I$]
- ~1ms is enough
- Trivial encoding: (32 bit x 80) / 1ms = 320 KBytes/s per CPU
- AR (1/256) just requires 80 KBytes/s per CPU
- Additional optimizations are possible (floorplan block clustering, ~10kb/s)
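The bandwidth figures can be checked with a few lines of arithmetic. The sketch assumes that "AR (1/256)" means each activity rate is quantized to one of 256 levels (one byte); the function names are hypothetical.

```python
def transfer_rate_bytes_per_s(num_counters, bits_per_counter, interval_ms):
    """Bytes per second needed to ship all counters once per interval."""
    bytes_per_transfer = num_counters * bits_per_counter // 8
    return bytes_per_transfer * 1000 // interval_ms


def quantize_activity(count, cycles, levels=256):
    """Map a raw counter to an activity rate with 1/levels resolution (assumed encoding)."""
    rate = min(count / cycles, 1.0)
    return min(int(rate * levels), levels - 1)


raw = transfer_rate_bytes_per_s(80, 32, interval_ms=1)      # full 32-bit counters
compact = transfer_rate_bytes_per_s(80, 8, interval_ms=1)   # 8-bit activity rates
```

Here `raw` is 320,000 bytes/s and `compact` is 80,000 bytes/s, matching the 320 and 80 KBytes/s per CPU on the slide.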

Outline
- Motivation
- RAMP2 Integration
- SESCTherm (Thermal Model)
- Thermal Sensors
- Q/A

SESCTherm: Thermal Model
- Finite element analysis of thermal processes
  - Conduction, convection, and radiation
- Similar to HotSpot, but:
  - Different transistor densities through the die
  - Supports multiple cooling solutions
  - Package / die material layers (SOI, Al/Cu, etc.)
  - Highly extendable and scalable
  - 3D chips ready
- Can be used standalone or coupled with our architectural simulator (SESC)

SESCTherm Model
Figure 2.12: Sample Layer Stack for Flip-Chip Pin Grid Array-Type Package Assembly

Board+Densities+Package+SOI

SOI Modeling

SESCTherm Validation
- IR infrastructure used to validate SESCTherm
- Current: a simple flip-chip (no package)
- Future: a 40nm TSMC testchip with full package

SESCTherm Validation

Current Simulation Speed
- 250um resolution (100mm^2 chip)
- 10-500 times slower than real time
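To give a sense of the problem size behind these numbers, the sketch below counts grid cells for a given die area and resolution. It assumes a square die and a uniform grid, neither of which is stated on the slide, and `grid_cells` is a hypothetical helper.

```python
import math


def grid_cells(die_area_mm2, resolution_um, layers=1):
    """Number of cells for a uniform grid over a square die (both are assumptions)."""
    side_mm = math.sqrt(die_area_mm2)
    cells_per_side = math.ceil(side_mm * 1000 / resolution_um)
    return cells_per_side ** 2 * layers


per_layer = grid_cells(100, 250)
```

A 100mm^2 die at 250um resolution gives a 40 x 40 grid, or 1600 cells per modeled layer; each extra material layer in the stack multiplies that count.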

Faster Thermal Model
- Difficult to implement using FPGAs
  - Floating-point requirements
  - Continuous parameter update (temperature dependence)
- GPUs
  - Use 32-bit FP (done)
  - Port the sparse matrix solver to CUDA (done)
  - Currently, only 2x speedup
  - Optimize the CUDA port (work in progress)
- Objective
  - Detailed thermal simulation ~10x slower than native

SESCTherm Video
- AMD Athlon simulated thermal map
- 2 seconds of native simulation
- Video ~x16 slower than native (~MASC perf. RAMP2 goal)
- Currently, it requires 3 minutes to compute

Power Models
- Configurable floorplan
  [Figure: example floorplan with blocks Inst. Pick, Clock Distribution, ALUs, DTLB, Memory Controller, L1I, Bus, L1D, FP0, FRF, LSQ, SSE, Clock, Fetch, Sched, ROB, IRF, FPSched]
- Power model for each processor block:

  $\mathit{Power} = P_{dyn} + P_{leak0} \, T^2 \, e^{(P_{leak1}/T)} \, (1 - e^{P_{leak2}/T})$

  where $P_{dyn}$ is the dynamic term and the remaining product is the leakage term.
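The per-block power model can be written directly as a function. This is a sketch of the formula as it reads on the slide; the constants below are hypothetical, not fitted values, and are chosen negative in the exponents so that the leakage term stays positive and grows with temperature.

```python
import math


def block_power(p_dyn, temp_k, p_leak0, p_leak1, p_leak2):
    """Dynamic power plus P_leak0 * T^2 * e^(P_leak1/T) * (1 - e^(P_leak2/T)), T in kelvin."""
    leakage = (
        p_leak0
        * temp_k ** 2
        * math.exp(p_leak1 / temp_k)
        * (1.0 - math.exp(p_leak2 / temp_k))
    )
    return p_dyn + leakage


# Hypothetical constants for one block (not taken from the slides).
COEFFS = dict(p_leak0=1e-3, p_leak1=-2000.0, p_leak2=-500.0)

cool = block_power(p_dyn=1.5, temp_k=320.0, **COEFFS)
hot = block_power(p_dyn=1.5, temp_k=350.0, **COEFFS)
```

With these example constants the same block dissipates more at 350 K than at 320 K even though its dynamic power is unchanged, which is the effect the thermal feedback has to capture.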

Feedback Loop
Temperature affects:
- Leakage
  - Mostly quadratic effect
- Material properties
  - Resistance/capacitance: linear
- Thermal sensor model
  - CPU/OS
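The leakage part of this loop can be illustrated with a fixed-point iteration: temperature sets leakage, leakage adds power, and power sets temperature. The sketch assumes a single lumped thermal resistance and a purely quadratic leakage fit; all names and constants are hypothetical.

```python
def leakage_w(temp_c, leak_ref_w=1.0, ref_c=60.0):
    """Assumed leakage: quadratic in absolute temperature, equal to leak_ref_w at ref_c."""
    t_k = temp_c + 273.15
    ref_k = ref_c + 273.15
    return leak_ref_w * (t_k / ref_k) ** 2


def steady_state_temp(p_dyn_w, ambient_c=45.0, r_th_c_per_w=0.5,
                      tol=1e-6, max_iter=200):
    """Iterate T = ambient + R_th * (P_dyn + P_leak(T)) until it stops changing."""
    temp = ambient_c
    for _ in range(max_iter):
        new_temp = ambient_c + r_th_c_per_w * (p_dyn_w + leakage_w(temp))
        if abs(new_temp - temp) < tol:
            return new_temp
        temp = new_temp
    raise RuntimeError("no convergence; the feedback may be running away")


settled = steady_state_temp(p_dyn_w=20.0)
```

The settled temperature is higher than the value obtained from dynamic power alone, because the leakage contribution is evaluated at the temperature it helps create.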

Thermal Sensor
- CPU/OS response to temperature uses thermal sensors
- Not as trivial as we thought
  - E.g., a commercial GPU thermal sensor is over 1mm^2 @ 65nm
  - Sampling rate and accuracy are not so good
  - Self-heating, calibration, leakage bias, etc.
- Where should sensors be placed?
- What is the ideal number of sensors on die?
- Several sensor models
  - Ideal
  - ...
  - Internal Temp Sensor
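A non-ideal sensor can be sketched as a model with a limited sampling rate, a quantized reading, and a calibration offset. The `ThermalSensor` class and its parameters are assumptions for illustration and do not describe any of the sensor models in the talk.

```python
class ThermalSensor:
    """Hypothetical sensor: samples periodically, adds an offset, quantizes the result."""

    def __init__(self, sample_period_ms=10, resolution_c=1.0, offset_c=0.0):
        self.sample_period_ms = sample_period_ms
        self.resolution_c = resolution_c
        self.offset_c = offset_c
        self._last_sample_ms = None
        self._reading = None

    def read(self, now_ms, true_temp_c):
        """Return what the OS/CPU would see at time now_ms."""
        due = (
            self._last_sample_ms is None
            or now_ms - self._last_sample_ms >= self.sample_period_ms
        )
        if due:
            biased = true_temp_c + self.offset_c
            # Quantize to the sensor resolution.
            self._reading = round(biased / self.resolution_c) * self.resolution_c
            self._last_sample_ms = now_ms
        # Between samples the previous (stale) reading is returned.
        return self._reading


sensor = ThermalSensor(sample_period_ms=10, resolution_c=1.0, offset_c=2.0)
first = sensor.read(0, 70.4)
stale = sensor.read(5, 80.0)
fresh = sensor.read(10, 80.0)
```

In this example the reading at 5 ms still reports the value sampled at 0 ms, so a fast transient is invisible to the OS until the next sample.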


Questions?
Power/Thermal Models for RAMP2
Jose Renau
renau@soe.ucsc.edu
http://masc.cse.ucsc.edu

Backup Slides

Our Experimental Setup
[Figure: real-time infrared imaging setup, with labels Infrared Camera, Oil Flow, Voltage/Power Logger, Chip being measured, Oil Cooling & Pump System]