Power Measurements using performance counters


Power Measurements using performance counters CSL862: Low-Power Computing By Suman A M (2015SIY7524)

Power Consumption in Android

Smartphones are powered by batteries that are limited in size and therefore in capacity, so power management in a smartphone is of great concern.

Objective: To monitor the power, performance and frequency of the CPU and GPU while running various benchmarks, and to gather the resulting data.

Platform Specifications:
Phone Model : Mi4i
Processor : Qualcomm MSM8939
GPU : Qualcomm Adreno 405
System RAM : 2GB
CPU Cores : 8x ARM Cortex-A53
OS : Android 5.0

Figure 1: Test Platform Mi4i Smartphone

Tools used:

Benchmark Apps: Benchmark apps are standard application programs used to stress various resources of a system.

SPEC CPU: SPEC CPU is an industry-standardized, CPU-intensive benchmark suite that stresses a system's processor, memory subsystem and compiler. It contains 12 different benchmark tests. It provides a comparative measure of compute-intensive performance across the widest practical range of hardware, using workloads developed from real user applications.

PerformanceTest Mobile: PerformanceTest Mobile is an Android application used to benchmark an Android device with a variety of speed tests and to compare the results with other devices. It consists of various benchmarking tests.

Experiment:

Setup: The setup uses the HWMonitor Pro app, which provides performance and power data (as in fig ) at a time interval of 0.5 seconds. The data is acquired on a PC through the remote IP interface provided by the app and can be logged in CSV format, which is then imported into MS Excel for analysis and charting.

The experiment is conducted by running the HWMonitor Pro app in the background, where it collects the performance and power data and transmits it to the remote PC over IP. Power is measured by monitoring the drain current from the battery, as reported by the hardware measuring circuits present in the smartphone.

Initially, the system's idle power is measured with the phone in screen-off mode, where all the CPUs are in a sleep state. When the phone wakes, the LCD turns on and the LCD power is recorded; this forms the base power of the smartphone. The benchmark apps are then run in sequence, one by one, with an interval of 2 seconds between two benchmarks.

SPEC CPU: The SPEC CPU benchmarks are executed on the smartphone using an Android APK. The results show that all these benchmarks consume about the same power, around 2.5 to 3 W. The scheduler assigns each of them to a single core and there is no control to allocate more cores to the process, so they all show similar power values.

Figure 2: Power values of SPEC CPU Benchmarks (y-axis: Power (W); x-axis: benchmarks MCF, vortex, bzip2, crafty, gcc)
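The measurement described above (battery drain current sampled every 0.5 seconds and logged as CSV) can be turned into average power and energy with a few lines of Python. This is a minimal sketch, not the author's analysis: the column names `current_mA` and `voltage_mV` are hypothetical, the real HWMonitor Pro export may use different fields, and the sample rows are made up for illustration.

```python
import csv
import io

SAMPLE_INTERVAL_S = 0.5  # logging interval stated in the setup


def summarize_power_log(csv_text, interval_s=SAMPLE_INTERVAL_S):
    """Return (average power in W, energy in J) for a CSV power log.

    Assumes hypothetical columns 'current_mA' and 'voltage_mV';
    the actual export format of the logging app may differ.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    powers_w = []
    for row in reader:
        current_a = float(row["current_mA"]) / 1000.0
        voltage_v = float(row["voltage_mV"]) / 1000.0
        powers_w.append(current_a * voltage_v)  # P = I * V per sample
    if not powers_w:
        raise ValueError("log contains no samples")
    avg_w = sum(powers_w) / len(powers_w)
    # Each sample is treated as constant over one logging interval.
    energy_j = sum(p * interval_s for p in powers_w)
    return avg_w, energy_j


# Made-up rows for illustration only, not measured data.
sample_log = """current_mA,voltage_mV
200,4000
700,4000
650,4000
"""
avg_w, energy_j = summarize_power_log(sample_log)
print(f"average power: {avg_w:.2f} W, energy: {energy_j:.2f} J")
```

Treating each sample as constant over its interval is the simplest integration rule; with a 0.5 second interval it can miss short power bursts.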

PassMark Benchmark: The PassMark software provides a set of standard tests for benchmarking the performance of an Android smartphone.

Benchmarks:

CPU Tests
Integer Math Operations - The test uses a large set of random 32-bit and 64-bit integers and adds, subtracts, multiplies and divides these numbers.
Floating Point Math Operations - Similar to the integer test, it performs the same operations on floating-point data.
Prime Number Test - This algorithm uses loops and CPU operations that are common in computer software, and determines the speed at which numbers can be compared with other numbers.
String Sorting Test - The String Sorting Test uses the qsort algorithm to see how fast the CPU can sort strings.
Data Encryption Test - The Encryption Test encrypts blocks of random data using the Blowfish algorithm. This test uses many of the techniques in the math tests, but also uses a large amount of binary data manipulation and CPU mathematical functions like 'to the power of'.
Data Compression Test - The compression test uses an adaptive encoding algorithm based on a method described by Ian H. Witten, Radford M. Neal, and John G. Cleary in an article called "Arithmetic Coding for Data Compression". The system uses a model which maintains the probability of each symbol being the next encoded.

Memory Tests
Memory Read Test and Memory Write Test - A memory file is created in RAM. The size of the memory file is determined by the amount of RAM currently available.

GPU Stress Test
Three helicopters travel around a scene featuring 150 trees on a terrain with water. The terrain is formed with 5,000 triangular polygons (15,000 vertices). The trees and water consist of 2 polygons (6 vertices) and each helicopter object consists of 1,120 triangular polygons (3,360 vertices). Lighting and alpha blending are enabled.

The following parameters are recorded for analysis:
CPU Frequency (Core 0, Core 4)
CPU Utilization
GPU Frequency
Power
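To make the Integer Math Operations test concrete, the sketch below runs the four arithmetic operations over random 32-bit and 64-bit values and times the loop. It only illustrates the kind of workload described; it is not PassMark's implementation, the function name `integer_math_ops` is hypothetical, and because Python integers have arbitrary precision the fixed widths are emulated with bit masks.

```python
import random
import time


def integer_math_ops(count, seed=0):
    """Add, subtract, multiply and divide random 32- and 64-bit integers.

    Returns (operations performed, elapsed seconds, checksum).
    Illustrative only; not the benchmark's actual code.
    """
    rng = random.Random(seed)
    values32 = [rng.randrange(1, 2**32) for _ in range(count)]
    values64 = [rng.randrange(1, 2**64) for _ in range(count)]
    ops = 0
    checksum = 0  # keeps the results "used" so the work is not trivial
    start = time.perf_counter()
    for values, mask in ((values32, 2**32 - 1), (values64, 2**64 - 1)):
        for a, b in zip(values, values[1:]):
            checksum ^= (a + b) & mask  # wrap to the emulated width
            checksum ^= (a - b) & mask
            checksum ^= (a * b) & mask
            checksum ^= a // b          # b >= 1, so no division by zero
            ops += 4
    elapsed = time.perf_counter() - start
    return ops, elapsed, checksum


ops, elapsed, _ = integer_math_ops(100_000)
print(f"{ops} integer operations in {elapsed:.3f} s")
```

A loop like this keeps one core busy with almost no memory stalls, which is consistent with the near-full CPU utilisation reported for this test.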

Figure 3: Performance chart of benchmarks

The power consumption of the various benchmarks is plotted in fig 3. The data consists of the minimum, maximum and average power consumed by each benchmark. The following observations were made for each benchmark.

Figure 4: Power vs CPU Utilisation graph

Idle: The idle power of the smartphone is very low, 0.18 W. When the phone is asleep, it disables all processing modes and uses only essential resources such as GSM and 3G connectivity. Power is consumed only when there is a data transaction with the network.
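The minimum, maximum and average power per benchmark plotted in fig 3 can be derived from the logged samples by grouping them by benchmark. The following is a small sketch under assumptions: the `(benchmark, power)` pair layout and the function name `power_stats` are hypothetical, and the sample values are made up rather than taken from the measurements.

```python
from collections import defaultdict


def power_stats(samples):
    """Group (benchmark, power_w) samples into {name: (min, max, avg)}."""
    grouped = defaultdict(list)
    for name, power_w in samples:
        grouped[name].append(power_w)
    return {
        name: (min(vals), max(vals), sum(vals) / len(vals))
        for name, vals in grouped.items()
    }


# Made-up samples for illustration only, not the measured data.
samples = [
    ("idle", 0.10), ("idle", 0.20),
    ("lcd", 0.5), ("lcd", 1.0), ("lcd", 0.9),
]
for name, (lo, hi, avg) in power_stats(samples).items():
    print(f"{name}: min={lo:.2f} W, max={hi:.2f} W, avg={avg:.2f} W")
```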

LCD: The LCD is the most persistent power consumer in the smartphone, since it draws a constant power independent of the application load and always remains powered while the screen is on. It drew a continuous 0.8 W before any operation was performed on the phone.

Math Operations (Fixed point): This test runs a set of math operations and computes the results. It is a highly CPU-intensive process: the CPU utilisation is seen to be 98% and the power consumed is 6.7 W, most of it by the CPU.

Math Operations (Floating point): The floating-point operations consumed 4.7 W, which is less than the fixed-point math operations. This can be related to the lower CPU usage (85%), mainly because the floating-point operations are performed by the FPU, which consumes less power.

Prime Number: Because prime number detection involves many loops and memory operations, the CPU is not fully utilised, and the test consumes 5.7 W.

String Sorting: String sorting uses the quicksort algorithm, which utilises the CPU to the maximum. The sorting is not limited anywhere by memory operations, so the CPU is used at full strength to compute the results, giving the highest power of 7.5 W.

Data Encryption: Similar to prime number detection, the CPU utilisation is lower because data encryption has a memory dependence. It consumed 6.4 W, which is less than string sorting.

Figure 5: CPU Utilisation for various Benchmarks

Figure 6: Average, Min and Max Power consumption for various Benchmarks (y-axis: Power drain from battery (W))

Memory Read and Write: The memory operations transfer a block file from or to RAM. The CPU utilisation is very low (~14%), as this is a purely memory-bound operation. The power consumption of 2.1 W, that is (2.9 - 0.8) W after subtracting the LCD base power, can therefore be inferred to be consumed by the memory controller alone. The graph shows that write cycles consume more power than read cycles, as a memory write is a power-intensive operation.

GPU Test: The GPU test stresses the GPU with its 3D vector processing operations. It consumed 3.3 W, and the CPU utilisation is very low (11%), showing that the power is being used mainly by the GPU. Fig. 7 also shows that the GPU frequency increased to 550 MHz from the 400 MHz used in all other operations.
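The memory-controller estimate above is a baseline subtraction: the LCD base power is removed from the total measured during the test. The sketch below applies the same arithmetic to the two totals quoted in the text. The function name `power_above_baseline` is hypothetical, and attributing the whole remainder to a single component is the report's inference, not something the code verifies.

```python
LCD_BASELINE_W = 0.8  # LCD-on base power reported in the text


def power_above_baseline(total_w, baseline_w=LCD_BASELINE_W):
    """Power attributable to the workload after removing the base power."""
    return round(total_w - baseline_w, 2)


# Totals as quoted in the text for the two low-CPU-utilisation tests.
reported_total_w = {"Memory read/write": 2.9, "GPU test": 3.3}
for name, total_w in reported_total_w.items():
    extra_w = power_above_baseline(total_w)
    print(f"{name}: {total_w} W total, {extra_w} W above the LCD baseline")
```

The same subtraction applies to any other benchmark total, with the caveat that the remainder then also includes CPU power.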

Figure 7: GPU Frequency for various Benchmarks

Conclusion: The power distribution for the various benchmark apps was analysed. From it we can infer that operations which utilise the CPU consume the most power, with a peak of 7.5 W in the String Sorting test. The CPU therefore has the highest power rating, which depends on its operating frequency. The LCD consumes a continuous 0.8 W throughout the use of the smartphone, whereas the CPU draws bursts of peak power depending on the operations performed. The GPU and memory controller consume considerably less power than the CPU and contribute less to the power consumption of the system.