COL862 - Low Power Computing

Power Measurements Using Performance Counters and a Study of Low Power Computing Techniques on an IoT Development Board (PSoC 4 BLE Pioneer Kit) and the Arduino Mega 2560

Submitted by: Aakash Arora (2014EEY7553), Prasanth B L (2014CRF2355)

Study on Performance Measurement Counters for Different Platforms

Introduction

All current-generation processors have a performance monitoring unit (PMU). The PMU is hardware built into the processor to measure performance parameters such as instructions per cycle and cache misses. These measurements can be used to understand the nature of the current workload and to tune chip parameters at the hardware level, or to make software and hardware decisions for power optimization. A large number of performance events are available on each platform, but only a limited number of hardware counters can be monitored simultaneously at run time. We therefore need to identify a subset of counters that correlates effectively with the power consumption of the system. This selection could be done by mathematical modelling or by exhaustive statistical analysis that runs different benchmarks with different possible configurations. In this study we identified some performance counters for desktop (Intel i7 and AMD 15h family) and smartphone (Moto G3, ARM Cortex-A53, ARMv8) platforms that could be used to estimate power consumption.

Performance counters for Moto G3 (ARM Cortex-A53, ARMv8)

The Moto G3 uses a Cortex-A53 processor, which implements the ARMv8-A architecture. The Cortex-A53 has four cores, each with an L1 memory system, and a single shared L2 cache. The PMU of each core has a total of five hardware counters and can monitor 62 events. Some techniques for reading the performance counters at run time are presented below.

DS-5 Development Studio with the Streamline add-on module could be used to read the performance counters of the processor while a benchmark is running. The DS-5 tool can be used with any architecture version after ARMv7, but the smartphone must run a Linux- or Android-based operating system.

Newer-generation smartphones have a Hall effect sensor that measures the current flowing into and out of the battery. There are Android applications (e.g., Ampere) available on the Google Play store that display the charging and discharging current of the battery using the data read from the Hall effect sensor.

Performance counters for Intel i7

The Intel i7 has a 6- or 8-core CPU, and each core has its own performance monitoring unit (PMU) with 3 fixed counters and 4 general-purpose counters per hyper-thread for measuring the performance of the core subsystem. Shared subsystems such as the L3 cache fall under the uncore part, whose PMU has 8 general-purpose counters and one fixed counter. A total of 98 performance events are available for each core of the i7, but because of the hardware limitation mentioned above we have to select a subset of them. Online measurement tools are available for Intel processors, such as Intel Performance Counter Monitor (PCM) and Intel Power Gadget, which could be used to log these performance counters while running a benchmark.

Performance counters for AMD 15h family

A total of 98 performance monitor events can be monitored in the AMD 15h family, but only 4 hardware counters per core are available for online measurement. AMD System Monitor could be used to read the performance counters while running the benchmarks. In addition to the tools mentioned above for the Intel i7 and AMD 15h family, there are platform-independent tools such as OProfile for reading the performance counters of processors running Linux.
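The counter selection step mentioned in the introduction can be illustrated with a small sketch. It ranks counters by the absolute Pearson correlation between their readings and the measured power, and keeps only as many as the hardware can monitor at once. This is one simple statistical option, not necessarily the method used in this study; the counter names and all numbers are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def rank_counters(counter_samples, power_samples, keep):
    """Return the `keep` counter names with the largest |correlation| to power."""
    scores = {name: abs(pearson(values, power_samples))
              for name, values in counter_samples.items()}
    return sorted(scores, key=scores.get, reverse=True)[:keep]

# Hypothetical per-benchmark readings (made-up numbers, for illustration only).
counters = {
    "instructions_retired": [1.0, 2.0, 3.0, 4.0],
    "l2_cache_misses": [4.0, 1.0, 3.0, 2.0],
    "branch_mispredicts": [2.0, 2.5, 2.0, 2.5],
}
power_watts = [10.0, 20.0, 30.0, 40.0]

# Keep only two counters, as if only two hardware counters were free.
selected = rank_counters(counters, power_watts, keep=2)
print(selected)
```

Correlation ranking ignores redundancy between counters, so a real selection would likely also check that the chosen counters are not strongly correlated with each other.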

Development Board Specification

PSoC 4 BLE Pioneer kit

PSoC 4 BLE is a scalable and reconfigurable platform architecture for a family of programmable embedded system controllers with an ARM Cortex-M0 CPU. It combines programmable and reconfigurable analog and digital blocks with flexible automatic routing. The PSoC 4XX7_BLE product family, based on this platform, combines a microcontroller with an integrated Bluetooth Low Energy (BLE, also known as Bluetooth Smart) radio and subsystem (BLESS). Other features include digital programmable logic, high-performance analog-to-digital conversion (ADC), opamps with comparator mode, and standard communication and timing peripherals. The PSoC 4XX7_BLE products will be fully upward compatible with members of the PSoC 4 platform for new applications and design needs. The programmable analog and digital subsystems allow flexibility and in-field tuning of the design.

The device on the development board is the CY8C4247LQI-BL483. Some specifications relevant to our experiment are as follows:

32-bit MCU subsystem:
Single-core 48 MHz ARM Cortex-M0 CPU with single-cycle multiply
128 KB of flash with read accelerator
16 KB of SRAM

Arduino Mega 2560

The Arduino Mega 2560 is a microcontroller board based on the ATmega2560. It has 54 digital input/output pins (of which 15 can be used as PWM outputs), 16 analog inputs, 4 UARTs (hardware serial ports), a 16 MHz crystal oscillator, a USB connection, a power jack, an ICSP header, and a reset button. There is no DMA controller in the ATmega2560.

Specifications of the Arduino Mega 2560

Microcontroller: ATmega2560
Operating voltage: 5 V
Input voltage (recommended): 7-12 V
Input voltage (limit): 6-20 V
Digital I/O pins: 54 (of which 15 provide PWM output)
Analog input pins: 16
DC current per I/O pin: 20 mA
DC current for 3.3 V pin: 50 mA
Flash memory: 256 KB, of which 8 KB is used by the bootloader
SRAM: 8 KB
EEPROM: 4 KB

Experimental setup

To measure power and energy consumption, we need to acquire data from the board while the benchmark is running. The instantaneous current consumption is obtained through the data acquisition (DAQ) facility of the 34461A multimeter at a sampling rate of 20 samples per second. The data are read into the host system over the local area network. The programs are flashed onto the chip through the USB connection. The software used to program each chip is given below:

PSoC: PSoC Creator
Arduino: Arduino IDE

The PSoC 4 BLE kit has a jumper (Jumper 15) through which we can directly tap the current drawn by the chip. Since the Arduino Mega board does not have power tapping jumpers, we powered the board from an external supply and connected the multimeter in series with the supply. The connections made for the Arduino and PSoC development boards are shown below.

[Figure: Experimental setup for power measurement of the PSoC 4 BLE]
[Figure: Experimental setup for measuring the power consumption of the Arduino Mega 2560]

[Figure: Complete experimental setup]
[Figure: Connectivity for the PSoC 4 BLE Pioneer kit]
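The current samples acquired in this setup can be turned into power and energy with a few lines of post-processing. The sketch below assumes the samples arrive as a plain list of currents in mA at the stated 20 samples per second and that the supply voltage is constant; the function names and sample values are hypothetical and the multimeter's own logging format is not modelled.

```python
SAMPLE_RATE_HZ = 20  # sampling rate stated in the experimental setup

def power_trace_mw(current_ma, supply_v):
    """Instantaneous power in mW from current samples in mA (P = V * I)."""
    return [i * supply_v for i in current_ma]

def energy_joules(power_mw, sample_rate_hz=SAMPLE_RATE_HZ):
    """Approximate energy by summing each power sample times the sample period."""
    dt = 1.0 / sample_rate_hz
    return sum(p / 1000.0 for p in power_mw) * dt

# Hypothetical current samples in mA (not measured values from this report).
current_ma = [12.0, 12.0, 12.0, 12.0]
power_mw = power_trace_mw(current_ma, supply_v=5.0)
energy_j = energy_joules(power_mw)
print(power_mw, energy_j)
```

A rectangle-rule sum is used here because the sampling interval is fixed; with only 20 samples per second, short current spikes between samples would be missed by any integration rule.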

Evaluation of benchmarks and observations for PSoC 4 BLE and Arduino

Low power modes in PSoC

PSoC 4 low-power modes allow you to reduce overall power consumption while retaining essential functionality, especially when combined with other power-saving features and techniques. The power consumption in the low power modes is analyzed, and these measurements are used to estimate the CPU and SRAM power consumption using the differential approach. There are 5 low power modes available in the PSoC 4 BLE. A summary of the power modes is given in the figure below.

The measurements are done at supply voltages of 5 V and 3.3 V for the Sleep, Deep Sleep, and Hibernate modes. The power consumption decreases in the order Sleep, Deep Sleep, Hibernate. This is because Sleep mode keeps the most functionality ON (only the CPU is in the retention state), Deep Sleep mode keeps less, and in Hibernate mode most of the functionality is OFF.

Micro benchmark evaluation in PSoC 4 BLE Pioneer kit

The micro benchmarks are selected to find the power consumed by the CPU, the SRAM, and the other peripherals on the chip. Some component-level parameters are varied and the resulting power consumption is analyzed. The benchmarks are deliberately simple, so that we know in advance exactly what code is running on the CPU core. The purpose of selecting the PSoC 4 BLE is to understand power consumption in IoT devices. An application running on a wearable device mostly reads data from sensors and does some small computation, and the device is usually connected to a smartphone through Bluetooth Low Energy. Thus most of the benchmarks considered are CPU-bound. The micro benchmarks considered are a while loop, integer and floating-point arithmetic (addition, multiplication, division), matrix multiplication, and a load instruction (memory operation). The benchmarks are run with a chip supply voltage of 5 V and a CPU clock frequency of 48 MHz. They are run on the PSoC 4 BLE Pioneer kit, and the results are compared with the power consumption of the low power modes. In this analysis, the Sleep mode of the PSoC 4 BLE is treated as the idle mode: the CPU is in the retention state and all other blocks are ON.

Observations:

All the benchmarks consume more power than the Sleep mode. The addition instruction consumes slightly more power than the multiplication instruction. The infinite while loop consumes more power because it fully utilizes the CPU. Since there is no DMA in the chip we used, data is transferred to RAM through load instructions executed by the CPU.

Estimated CPU power

Through the differential approach the power consumption of the CPU is found; it is given by the formula below.

[Equation: estimated CPU power]

Since most of the workloads are CPU-bound, there is negligible difference in CPU power consumption across most of the micro benchmarks. The highest CPU power consumption is observed for the while loop (infinite loop) and for matrix multiplication.
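One plausible reading of the differential approach, assuming that the CPU is the only block whose state differs between a running benchmark and Sleep mode, is to subtract the Sleep mode power from the power measured during each benchmark. The sketch below shows that subtraction; the readings are hypothetical placeholders, not the values measured in this report.

```python
def differential_power(active_mw, reference_mw):
    """Power attributed to the block that is active in one mode but not the other."""
    return active_mw - reference_mw

# Hypothetical readings in mW (not measured values from this report).
sleep_mw = 10.0
benchmarks_mw = {"while_loop": 32.0, "int_add": 30.5, "int_mul": 30.0}

# Estimated CPU power per benchmark = benchmark power - Sleep mode power.
cpu_mw = {name: differential_power(p, sleep_mw)
          for name, p in benchmarks_mw.items()}
print(cpu_mw)
```

The same subtraction applied between two low power modes would attribute the difference to whichever blocks are switched off between them.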

Estimated SRAM power, high-speed clock, ADC

Through the differential approach the combined power consumption of the SRAM and the high-frequency clock is found; it is given by the formula below.

[Equation: estimated SRAM and high-speed clock power]

The collective power consumed by the SRAM and the high-frequency clock is constant across all the micro benchmarks. The total capacity of the SRAM is 16 KB, while the benchmarks use at most 400 bytes (in the matrix multiplication) and considerably less in all other cases. The measured power is therefore essentially the cost of keeping the SRAM switched ON; since the amount of data stored is negligible compared with the total capacity, there is no measurable difference between benchmarks.

Dynamic Voltage and Frequency Scaling in PSoC

Dynamic voltage and frequency scaling (DVFS) is the adjustment of the voltage and frequency settings of the CPU and of other blocks, such as the ADC and flip-flops, to reduce power consumption. In newer processors, however, the usefulness of DVFS is becoming less significant. The DVFS experiment on the PSoC 4 BLE Pioneer kit explores how useful DVFS is on the PSoC 4 BLE chip. The experiment programs the PSoC to perform an integer multiplication 96,000,000 times. The number 96,000,000 is chosen so that the program runs for at least 2 s, since the maximum frequency of the PSoC 4 chip is 48 MHz. The experiment is done for 2 voltage settings (5 V and 3.3 V) and 5 frequency settings (48 MHz, 24 MHz, 12 MHz, 6 MHz, 3 MHz). The results are presented below.
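The choice of 96,000,000 iterations can be checked with simple arithmetic. The sketch below computes a lower bound on the run time at each frequency setting, assuming (optimistically) one clock cycle per loop iteration; a real loop needs more cycles per iteration, so actual run times are longer. The function name and the `cycles_per_iteration` parameter are illustrative.

```python
ITERATIONS = 96_000_000
FREQUENCIES_MHZ = [48, 24, 12, 6, 3]

def min_runtime_s(iterations, freq_mhz, cycles_per_iteration=1):
    """Lower bound on run time: total cycles divided by the clock rate."""
    return iterations * cycles_per_iteration / (freq_mhz * 1_000_000)

# Lower-bound run time in seconds for each frequency setting.
runtimes = {f: min_runtime_s(ITERATIONS, f) for f in FREQUENCIES_MHZ}
print(runtimes)  # {48: 2.0, 24: 4.0, 12: 8.0, 6: 16.0, 3: 32.0}
```

Each halving of the clock frequency doubles the lower bound, which is why the run time has to be taken into account when comparing energy rather than power.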

From the above graph it is clear that lowering the working voltage decreases the power consumption: the chip consumes more power at 5 V than at 3.3 V.

If we consider only the average power consumed while the workload runs, it appears that decreasing the frequency also decreases the power consumption. However, decreasing the working frequency of the CPU increases the time taken to complete the workload, so the energy consumed can increase at lower frequencies. To investigate this, we acquired the complete power profile over the full duration of the experiment. While the CPU is executing instructions it is in the active state. To identify the completion of the workload, we designed the code so that it enters the Deep Sleep power mode once the benchmark completes.

[Figure: Instantaneous power consumption for the workload at 5 V; instantaneous power (mW) versus time (s)]

[Figure: Instantaneous power consumption for the workload at 3.3 V; instantaneous power (mW) versus time (s), with curves for 48 MHz, 24 MHz, 12 MHz, 6 MHz, and 3 MHz]

The instantaneous power plots are obtained by acquiring the current through the 34461A multimeter and multiplying it by the chip supply voltage. It is clear from the above plots that a CPU running at a higher frequency completes the given workload in less time than one running at a lower frequency. The instantaneous power consumption of the chip also decreases as the frequency decreases. The elapsed time is the same for both voltage settings (5 V and 3.3 V). A summary of the results is plotted below.

[Figure: Energy in joules for completing the workload]
[Figure: Time elapsed, in seconds]

To reflect the practical case, in which the CPU goes to an idle state after completing the workload, we make the chip enter Deep Sleep mode. We have therefore padded each measurement with the Deep Sleep power for the time remaining until T_max, where T_max is the maximum time taken by the CPU to complete the workload; in our case this is the time taken at the lowest frequency setting.
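The padding step can be sketched as follows, assuming the padded energy is the workload energy plus the Deep Sleep power multiplied by the time left until the slowest run finishes. The original equation is not available here, so this is a reconstruction from the description; the function name and all numbers are hypothetical.

```python
def padded_energy_j(workload_energy_j, workload_time_s, t_max_s, idle_power_w):
    """Workload energy plus idle (Deep Sleep) energy for the time left until t_max_s."""
    return workload_energy_j + idle_power_w * (t_max_s - workload_time_s)

# Hypothetical runs: frequency in MHz -> (workload energy in J, run time in s).
runs = {48: (0.20, 2.0), 3: (0.80, 32.0)}
idle_power_w = 0.001  # hypothetical Deep Sleep power

# T_max is the run time of the slowest (lowest-frequency) run.
t_max = max(time_s for _, time_s in runs.values())
padded = {f: padded_energy_j(e, t, t_max, idle_power_w)
          for f, (e, t) in runs.items()}
print(padded)
```

With this padding every frequency setting is compared over the same time window, so a fast run is charged for the idle time it leaves behind.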

[Figure: Energy consumed, padded with the idle state; energy in joules]

From the above plot it is clear that, after padding with the idle state, the CPU computing at a higher frequency consumes less energy than the CPU computing at a lower frequency. Thus for a CPU-bound workload there is no benefit in using DVFS on the PSoC 4 BLE. However, the idle power differs between working voltage settings: lower voltages give lower idle power, so when the chip is idle we can lower the voltage to save considerable power.

Macro Benchmark

The main work of a wearable device is to read data from a sensor and transfer it to a smartphone through Bluetooth Low Energy for further processing. Two macro benchmarks are therefore selected: one continuously senses data from the sensor, and the other senses the value and transfers it to the smartphone over Bluetooth Low Energy when queried. The latter implements low power techniques to reduce power consumption, such as entering Deep Sleep mode when there is no activity. The Bluetooth link connects to the Cypress Smart Android application on the smartphone, from which we can query the sensor data.

[Figure: Instantaneous power profile for the Bluetooth application; instantaneous power (mW) versus time (s), with annotated regions: advertising state, Bluetooth connected to smartphone, and data transmission between the smartphone and the kit]

From the above figure it is clear that when the data is queried there is a sudden spike in power consumption, after which the PSoC returns to Deep Sleep mode when there is no activity. The power consumption is low in this case because the program code scales the frequency of the processor.

Comparison of the power consumption of the two macro applications

These are long-running applications, so it is reasonable to compare their average power consumption. The results below show that the application using the low power techniques is more efficient.

[Figure: Power consumed (mW) by the two macro applications]
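The effect of returning to Deep Sleep between queries on the average power can be illustrated with a time-weighted average. The sketch below models each application as a list of (power, duration) segments; the segment values are hypothetical and are not taken from the measurements in this report.

```python
def average_power_mw(segments):
    """Time-weighted average power over (power_mw, duration_s) segments."""
    total_time = sum(duration for _, duration in segments)
    return sum(power * duration for power, duration in segments) / total_time

# Hypothetical 10 s window: a 1 s transmission burst, then Deep Sleep.
duty_cycled = [(40.0, 1.0), (0.5, 9.0)]
# Hypothetical always-active application over the same window.
always_on = [(40.0, 10.0)]

print(average_power_mw(duty_cycled), average_power_mw(always_on))
```

The longer the device stays in Deep Sleep between queries, the closer the average power gets to the Deep Sleep power.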

Chip temperature measurement using the die temperature sensor

The PSoC 4 BLE has a die temperature sensor from which the temperature of the chip can be read. The sensor produces a voltage proportional to the temperature of the die. This voltage is supplied as one of the single-ended inputs to the ADC (analog-to-digital converter) mux. When the temperature sensor is selected as the ADC input source and the ADC initiates a conversion, the resulting ADC output code can, with a little math, be converted into a temperature in degrees. Example applications include environmental monitoring of the system, for instance to test for overheating. In the experiment the UART is configured to display the die temperature in degrees Celsius. The results obtained at two different time instants are shown in the figure below.

Arduino

The micro benchmarks considered are a while loop, integer arithmetic (addition, multiplication), matrix multiplication, a load instruction (memory operation), continuous sensing of an analog voltage through the ADC at 10 kHz, and LINPACK. The benchmarks are run with a chip supply voltage of 5 V and a CPU clock frequency of 16 MHz. The power consumption of each benchmark is presented below.

All the benchmarks use the CPU intensively, so the variations between them are not significant. The idle power of the ATmega2560 is nearly 88% of the active power consumption. The power consumption trends across the micro benchmarks are similar to those of the PSoC 4 BLE.

Estimating the CPU power consumption

Through the differential approach the power consumption of the CPU is found; it is given by the formula below.

[Equation: estimated CPU power]

DVFS experiment in Arduino

The experiment runs a floating-point multiplication continuously in a while loop on the Arduino. The experiment is done for 2 voltage settings (5 V and 3.3 V) and 6 frequency settings (16 MHz, 8 MHz, 4 MHz, 2 MHz, 1 MHz, 0.5 MHz), with the supply voltage applied at the 5V and 3.3V pins of the Arduino board respectively. The results are presented below. From the graphs below it is clear that there is a considerable decrease in power consumption when we decrease the working frequency of the CPU.

[Figure: Power consumed (mW) versus frequency]

The Arduino has a Vin pin that accepts a voltage in the range 7-12 V. The applied voltage is regulated down to 5 V and supplied to the chip, so the remaining applied power is dissipated in the voltage regulator circuit. The readings below are obtained for Vin settings of 7.5 V, 9 V, 10.5 V, and 12 V.

[Figure: Power consumed (mW) versus frequency (16 MHz, 8 MHz, 4 MHz, 2 MHz, 1 MHz, 0.5 MHz) for Vin of 7.5 V, 9 V, 10.5 V, and 12 V]
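The regulator loss described above can be estimated with a short calculation. The sketch assumes a linear regulator whose input current equals the load current (its own quiescent current is neglected), so the loss is the voltage drop times the current. The load current is a hypothetical value, not a measurement from this report.

```python
def regulator_loss_mw(vin_v, load_ma, vout_v=5.0):
    """Power dropped across a linear regulator: (Vin - Vout) * I, in mW."""
    return (vin_v - vout_v) * load_ma

def input_power_mw(vin_v, load_ma):
    """Total power drawn from the Vin supply, in mW."""
    return vin_v * load_ma

load_ma = 50.0  # hypothetical board current in mA
vin_settings = (7.5, 9.0, 10.5, 12.0)

# Regulator loss in mW for each Vin setting used in the experiment.
losses = {vin: regulator_loss_mw(vin, load_ma) for vin in vin_settings}
print(losses)  # {7.5: 125.0, 9.0: 200.0, 10.5: 275.0, 12.0: 350.0}
```

Under this assumption the power delivered to the chip stays fixed at 5 V times the load current, while the total input power grows linearly with Vin.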

Future Work

The power consumption of different control loops (for example for, do-while, and while loops) could be compared. A thermal analysis of the loops could also be done. Memory-bound workloads should be compared alongside the CPU-intensive workloads.