µtasker Document µtasker Benchmark Measurements

Similar documents
µtasker Document TELNET

µtasker Boot-loader support

µtasker Document µtasker Bare-Minimum Boot Loader

Hello, and welcome to this presentation of the STM32L4 power controller. The STM32L4 s power management functions and all power modes will also be

SPI to Ethernet Module

µtasker Document µtasker MQTT/MQTTS

440GX Application Note

µtasker Document Kirin3 M52259 demo user s guide

µtasker Document CodeWarrior 10

Infineon C167CR microcontroller, 256 kb external. RAM and 256 kb external (Flash) EEPROM. - Small single-board computer (SBC) with an

E Copyright VARAN BUS USER ORGANIZATION 06/2015. Real Time Ethernet VARAN Bus

µtasker Document µtasker SNTP (Simple Network Time Protocol)

Lecture 3. The Network Layer (cont d) Network Layer 1-1

The latency of user-to-user, kernel-to-kernel and interrupt-to-interrupt level communication

6.9. Communicating to the Outside World: Cluster Networking

Using Time Division Multiplexing to support Real-time Networking on Ethernet

eip-24/100 Embedded TCP/IP 10/100-BaseT Network Module Features Description Applications

System Architecture Directions for Networked Sensors[1]

Network Driver for Microchip LAN7500 and LAN9500 User Guide

Internet II. CS10 : Beauty and Joy of Computing. cs10.berkeley.edu. !!Senior Lecturer SOE Dan Garcia!!! Garcia UCB!

Blackfin Optimizations for Performance and Power Consumption

LAPI on HPS Evaluating Federation

FreeBSD Network Performance Tuning

AN4777 Application note

Enabling Gigabit IP for Intelligent Systems

Clock Synchronous Control Module for Serial Flash Memory Access Firmware Integration Technology

Module Introduction! PURPOSE: The intent of this module, 68K to ColdFire Transition, is to explain the changes to the programming model and architectu

Network Design Considerations for Grid Computing

Last Time. Making correct concurrent programs. Maintaining invariants Avoiding deadlocks

STM32G0 MCU Series Efficiency at its Best

µtasker Document Controller Area Netwok (CAN)

µtasker Document Segment LCD user s guide

µtasker Document µtasker Multicasting and Internet Group Management Protocol (IGMP)

QUIZ Ch.6. The EAT for a two-level memory is given by:

µtasker Document Kinetis demo user s guide

Software development from a bird's eye view Ulrich Kloidt, Senior Application Engineer, Altium Europe GmbH

Proven 8051 Microcontroller Technology, Brilliantly Updated

MQX -celeration RTOS-integrated solutions

Enabling success from the center of technology. Networking with Xilinx Embedded Processors

GigaX API for Zynq SoC

The SpaceWire Transport Protocol. Stuart Mills, Steve Parkes University of Dundee. International SpaceWire Seminar 5 th November 2003

Since ESE GATE PSUs ELECTRICAL ENGINEERING COMPUTER FUNDAMENTALS. Volume - 1 : Study Material with Classroom Practice Questions

Migrating to Cortex-M3 Microcontrollers: an RTOS Perspective

COL862 - Low Power Computing

Operation Manual of EX9132CST-Series

DESIGN AND IMPLEMENTATION OF AN AVIONICS FULL DUPLEX ETHERNET (A664) DATA ACQUISITION SYSTEM

Embedding it better... µtasker Document. µtasker I 2 C Support, Demo and Simulation

RPC Interface Specification November 2001 Introduction

STM32 Journal. In this Issue:

Shared Address Space I/O: A Novel I/O Approach for System-on-a-Chip Networking

Section 31. DMA Controller

Contents. Cortex M On-Chip Emulation. Technical Notes V

EE 610 Part 2: Encapsulation and network utilities

Typical modules include interfaces to ARINC-429, ARINC-561, ARINC-629 and RS-422. Each module supports up to 8 Rx or 8Tx channels.

ERRATA SHEET INTEGRATED CIRCUITS. Date: 2008 June 2 Document Release: Version 1.6 Device Affected: LPC2468. NXP Semiconductors

Environmental Data Acquisition Using (ENC28J60)

Hello and welcome to this Renesas Interactive module that provides an architectural overview of the RX Core.

Introduction to TCP/IP Offload Engine (TOE)

Clicker 2 for Kinetis

Layer 4: UDP, TCP, and others. based on Chapter 9 of CompTIA Network+ Exam Guide, 4th ed., Mike Meyers

Lecture 8. Network Layer (cont d) Network Layer 1-1

Question 7: What are Asynchronous links?

STM32F7 series ARM Cortex -M7 powered Releasing your creativity

Optimizing TCP Receive Performance

PCnet-FAST Buffer Performance White Paper

AVR XMEGA Product Line Introduction AVR XMEGA TM. Product Introduction.

INT 1011 TCP Offload Engine (Full Offload)

Xen Network I/O Performance Analysis and Opportunities for Improvement

Fast packet processing in the cloud. Dániel Géhberger Ericsson Research

QuickSpecs. HP Z 10GbE Dual Port Module. Models

Message Passing Architecture in Intra-Cluster Communication

Mega128-Net Mega128-Net Mega128 AVR Boot Loader Mega128-Net

Lighting the Blue Touchpaper for UK e-science - Closing Conference of ESLEA Project The George Hotel, Edinburgh, UK March, 2007

Efficiency and memory footprint of Xilkernel for the Microblaze soft processor

CS61C : Machine Structures

The Myricom ARC Series with DBL

Architectural Differences nc. DRAM devices are accessed with a multiplexed address scheme. Each unit of data is accessed by first selecting its row ad

FIGURE Three EPROMs interfaced to the 8088 microprocessor.

RZ/N1 Multi-Protocol Industrial Ethernet Made Easy

CPCI-HPDI32ALT High-speed 64 Bit Parallel Digital I/O PCI Board 100 to 400 Mbytes/s Cable I/O with PCI-DMA engine

TCO305 TCP/IP Control System Interface Development Using Microchip Brand Microcontrollers

HIGH-PERFORMANCE NETWORKING :: USER-LEVEL NETWORKING :: REMOTE DIRECT MEMORY ACCESS

CS 43: Computer Networks. 16: Reliable Data Transfer October 8, 2018

STM32L4 System operating modes

uip, TCP/IP Stack, LPC1700

LAN bit Non-PCI Small Form Factor 10/100 Ethernet Controller with Variable Voltage I/O & HP Auto-MDIX Support PRODUCT FEATURES.

An Intelligent NIC Design Xin Song

Production Programming for HC12 internal Flash

CS61C : Machine Structures

STM32 F0 Value Line. Entry-level MCUs

(Version 1.0) 2011 WIZnet Co., Ltd. All Rights Reserved. For more information, visit our website at

Hello, and welcome to this presentation of the STM32 Flash memory interface. It covers all the new features of the STM32F7 Flash memory.

LabVIEW ON SMALL TARGET

Introduction to Netburner Network Development Kit (NNDK) MOD5282

Table 1 provides silicon errata information that relates to the masks 0M55B, 1M55B, and M55B of the MC9328MX21S (i.mx21s) applications processor.

Linux Network Tuning Guide for AMD EPYC Processor Based Servers

Architectural Pattern for a Very Small Microcontroller

Motivation CPUs can not keep pace with network

Amarjeet Singh. January 30, 2012

The Interconnection Structure of. The Internet. EECC694 - Shaaban

Transcription:

Embedding it better... µtasker Document µtasker Benchmark Measurements utaskerbenchmarks.doc/1.1 Copyright 2012 M.J.Butcher Consulting

Table of Contents 1. Introduction...3 2. M52235 EVB Power consumption test - V1.2.008...4 3. M52235 EVB UDP reaction test Project version V1.2.008...5 4. K70F120 Tower Kit UDP reaction test Project version V1.4.1...7 utaskerbenchmarks.doc/1.1 2/8 29.4.2012

1. Introduction µtasker is an operating system designed especially for embedded applications where a tight control over resources is desired along with a high level of user comfort to produce efficient and highly deterministic code. The operating system is integrated with TCP/IP stack and important embedded Internet services along-side device drivers and system specific project resources. µtasker and its environment are essentially not hardware specific and can thus be moved between processor platforms with great ease and efficiency. However the µtasker project setups are very hardware specific since they offer an optimal pre-defined (or a choice of pre-defined) configurations, taking it out of the league of board support packages (BSP) to a complete project support package (PSP), a feature enabling projects to be greatly accelerated. This document discusses some performance comparisons with various events and settings. utaskerbenchmarks.doc/1.1 3/8 29.4.2012

2. M52235 EVB Power consumption test - V1.2.008 The power consumption of the M52235EVB was measured from 3V3 when the software was operating at 40MHz with and without 100MHz LAN connection. Current from 3V3 (LEDs disabled) Without LAN connection With 100M LAN connection Normal operation 280mA 330mA With LOW_POWER task 260mA 310mA Note: Reference current when the processor is held in reset = 80mA Conclusion It can be seen that low power support saves current but the saving is minimum on the M52235EVB. It has to be further investigated whether the low power mode (using the stop instruction) puts the processor in the lowest power state or whether other settings may improve the figure. utaskerbenchmarks.doc/1.1 4/8 29.4.2012

3. M52235 EVB UDP reaction test Project version V1.2.008 UDP frames were sent to the evaluation board and echoed back by the software. The time from the Ethernet RX frame interrupt to the call back routine, the time required for a copy of the received data (using umemcpy()) to a backup buffer, and the time from the UDP transmission routine call until activating the transmit buffer were measured. The measurement was repeated for various project configurations and the influence of the memory requirements also recorded. The memory copy step is not required for an echo to be achieved but was tested to monitor general umemcpy() performance. The tests were performed at 40MHz. The delays are inversely proportional to the PLL clock used. The M5223X is specified to 60MHz although its PLL can operate beyond 100MHz under normal conditions (although not officially specified for this). 512 byte UDP echo Basic 408/181/586 = 1 175µs No UDP CS 84/181/265 = 530µs DMA for umemcpy and umemset Loop code in SRAM Loop code in SRAM and DMA Loop code in SRAM and DMA No UDP CS 408/88/492 = 988µs 393/164/559 = 1 116µs 393/88/470 = 951µs 84/88/170 = 342µs 1024 byte UDP echo 708/359/1 060 = 2 127µs 86/364/443 = 893µs 730/175/880 = 1 785µs 680/333/1 010 = 2 023µs 685/175/853 = 1 713µs 87/175/256 = 518µs Program size Reference Same as ref. RAM requirements Reference Same as ref. +208 bytes -20 bytes +848 bytes +473 bytes +800 bytes +364 bytes* +800 bytes +364 bytes Notes: Rx interrupt->callback / umemcpy() / Call to transmission = Total The ping reaction time from reception interrupt to transmit buffer activation has approximately 220µs in all cases The basic configuration is without any additional features activated but with UDP checksum calculation All times are in µs Active low power support gives identical reaction times * The umemcpy is either implemented as DMA or in SRAM and so adding DMA decreases the RAM use in this case The following routines were operating in SRAM when the option was set IP Check sum calculation, umemcpy, umemset, umemcmp, ustrlen, ustrcmp, ustrcpy utaskerbenchmarks.doc/1.1 5/8 29.4.2012

Conclusion It is seen that activating DMA support for umemcpy() results in useful performance improvements. Since the same routine is also used during the frame preparation, it also saves during this portion of the echo test. It has no RAM price and only a small additional code size of around 200 bytes. The results suggest that the SRAM routines run about 10% faster than when run directly from FLASH. The reaction time in the test was improved by about 5% overall but the price paid was about 800 bytes of extra code, plus about 400 bytes of SRAM for the routines themselves (several routines were operating in SRAM and so this figure is in fact much less when only the umemcpy() and IP check sum routines are compared). The most critical code segment in the test is obviously the IP check sum over the UDP data. By deactivating the calculation, the reception frame and also the transmission frame are processed much faster. DMA support cannot improve this calculation, whereas operation from SRAM results in only about a 6% advantage. This suggests that investment in an improved calculation algorithm may result in best additional performance increases. utaskerbenchmarks.doc/1.1 6/8 29.4.2012

4. K70F120 Tower Kit UDP reaction test Project version V1.4.1 This test is a repetition of the same test performed with the M52235EVB but on the Kinetis K70 operating at 120MHz. The main reason for repeating the test is to demonstrate the effect of the check sum offloading that is possible with the Ethernet controller in this newer device. The tests were made with 1024 bytes payload in the UDP frames. The comparison between M52235 is based on the assumption that its processing times can be reduced by 1/3 to account for the fact that the Kinetis is running 3x faster than the original clock rate (the M52235 cannot practically run at such a clock rate but the figures make comparisons simpler). Since the M52235 cannot perform UDP checksum calculations in HW, the Kinetis figures, with disabled UDP check sum, are compared with the check sum enabled but offloading active. 1024 byte UDP echo M52235 (120MHz interpolated) Basic 236/120/353 = 709µs No UDP CS* 29/121/148 = 298µs No UDP CS* Memory copy with DMA 29/58/85 = 173µs 1024 byte UDP echo Kinetis K70 (120MHz) 130/61/187 = 378µs 13.6/58/77 = 148µs 12.8/34.4/48.8 = 96µs *For M52235 tests the UDP checksum was disabled / For the Kinetis the UDP checksum was active but HW checksum offloading was enabled for both reception and transmission Conclusion This comparison shows that the Kinetis K70 executes the code, without any acceleration techniques, about twice as fast from Flash as the M52235, when its operation is interpolated to be running a 120MHz. A direct comparison with the M52235 running at 60MHz (its fastest speed) shows that the Kinetis execution is about 4x that of the Coldfire running from Flash. As was seen with the original tests with the M52235, the UDP checksum calculation is the main software load involved. The use of DMA for memory buffer copies improves efficiency but is still outweighed by the calculation overhead. This overhead could only be reduced in M52235 tests by disabling the checksum operation (which is an option for UDP but not so for most other TCP/IP protocols). Also in the case of the Kinetis tests it is clear that the use of DMA for memory buffer copies is useful and reduces the copy time by almost half. More interesting and important are the results of the use of the checksum offloading in offered by its Ethernet hardware since this achieves equivalent savings in software overhead as disabling the UDP checksum did for the Coldfire: - the processing of the reception was reduced from 130µs to 13.6µs just be enabling this utaskerbenchmarks.doc/1.1 7/8 29.4.2012

(960% improvement of efficiency and similar to the improvement obtained by completely disabling the checksum operation in the Coldfire) - the processing of the transmission was reduced from 187µs to 77µs, whereby this task includes a copy of the user s data to the Ethernet transmit buffer. Again, however, a similar saving is seen as was obtained by disabling UDP checksum in the Coldfire case. When comparing just the performance with UDP checksum active it shows that the Kinetis can process the test frame of 1024 bytes payload in 96µs (this includes an additional 1024 byte memory copy in the application to add extra overhead or around 30µs) whereas the M52235 would need 1 142µs to do the same at 60MHz (or interpolated to 120MHz still needs 571µs for the operation). The use of checksum offloading and DMA for memory buffer transfers allows the Kinetis to reduce its processing time from 378µs to 96µs, whereby the checksum offloading is the predominant factor in this test case. The use of both checksum offloading and DMA memory buffer transfer is however seen to be of significance to obtain optimal performance. Modifications: - V1.0 19.11.2006 Original version with M52235 tests - V1.1 29.4.1012 Added document header and UDP performance comparison with Kinetis utaskerbenchmarks.doc/1.1 8/8 29.4.2012