Software Power Optimizations In An Embedded System
Vishal Dalal
3G Wireless Group, Silicon Automation Systems Limited, Bangalore, India

C.P. Ravikumar
Department of Electrical Engineering, Indian Institute of Technology, New Delhi, India

Abstract

The topic of reducing power dissipation in embedded systems has received considerable attention in recent years. Techniques have been reported to minimize energy dissipation through (a) selection of better algorithms for the application, e.g. DSP algorithms that require fewer operations to perform a task such as filtering, (b) minimizing state transitions and switching activity in the hardware implementation, and (c) reducing the operating supply voltage by changing the architecture of the system, e.g. through the use of pipelining. However, power dissipation is often neglected when developing the software for embedded systems. Software optimization techniques can be used to reduce the cost, size, and power dissipation in embedded systems without adding to system overheads. In this paper, we view the power dissipation as consisting of two parts: the power dissipated in the application-specific integrated circuits (hardware power) and the power dissipated by the CPU, memory and associated busses (software power). We provide a trace-based technique to estimate software power and study the effect of different code optimization techniques on software power, performance and code size.

1. Introduction

The essential components of an embedded system are the following:

- a processor, which may either be a general-purpose microprocessor/microcontroller or an application-specific instruction-set processor,
- memory, which may be embedded in the processor or may be external to the processor,
- software that resides in the memory and runs on the processor, mainly responsible for real-time handling of I/O requests from the external world, and
- application-specific integrated circuits (coprocessors), which carry out the compute-intensive tasks.

Since many embedded systems, especially those used in mobile applications, run on battery power, it is important to ensure that the system dissipates the least possible power while still providing the required functionality. Algorithms that use fewer arithmetic/logic operations to perform the same function must be used to conserve energy, e.g. filtering algorithms that require fewer multiplications. Techniques have also been developed to reduce switching activity in the application-specific hardware in order to reduce power dissipation [2,3,4,9]. These techniques span all the levels of abstraction in VLSI design: architectural level (e.g. use of pipelining to reduce Vdd), gate level (e.g. technology mapping to reduce switching activity), transistor level (e.g. gate resizing), and layout level (e.g. use of shorter wires for high-activity nets).

Unfortunately, power dissipation is often neglected during the software implementation of the algorithms in embedded systems, since code size and performance take priority over power dissipation at this stage. There have been efforts to study power minimization through better use of the instruction repertoire of the CPU [9]. In this paper, our aim is to study software optimization techniques that are used in compilers with the objective of meeting these constraints.
We show that the use of code optimization can reduce the software power significantly without adversely affecting the code size or performance. By software power we mean the following components:

- power dissipated in the arithmetic/logic circuits and the control unit of the CPU when executing embedded code,
- power dissipated to charge and discharge the address and data busses,
- power dissipated within the memory circuits.

In most embedded systems, significant portions of code are executed repetitively, e.g. the ADPCM (Adaptive Differential Pulse Code Modulation) algorithm is executed for every input sample in a telecom system. The number of samples per second can be 8000 or larger. Thus, if the software is not efficiently written, we can expect that it will require a larger number of cycles, consume more power, and occupy more memory. Programmers may, at times, sacrifice efficiency to improve the readability of code and simplify debugging. The use of function calls is an example. Compared to inline coding, the use of function calls improves the modularity of the code and requires less memory, but increases the execution time and power dissipation because of stacking and unstacking. Depending on the number of times a function is called, its inline coding may reduce the power dissipation and improve the execution time considerably at the cost of extra memory space.

We study the effect of different code optimization techniques on software power, performance and code size. Our work is an extension of the study reported in [7], which mainly focussed on the size-performance tradeoff in code optimization, but did not consider the effect of the code optimizations, or of the order in which they are applied, on system power dissipation. We believe that ours is the first attempt at modeling and estimation of software power. Our results can be useful in implementing design decisions such as hardware-software partitioning. They are also useful in guiding compiler design.

The rest of the paper is organized as follows. In Section 2, we discuss the various components of software power. Section 3 explains the implementation environment. Section 4 describes the various optimization techniques considered in this paper. In Section 5, we discuss software power estimation. We explain the optimization flow in Section 6. In Section 7, we present the results of our study on the example of the ADPCM algorithm.
Section 8 concludes the paper.

2. Software Power

As mentioned in the previous section, we subdivide the power dissipation in an embedded system into hardware power and software power. The former includes the power dissipated in the application-specific hardware, whereas the latter includes the power dissipated in the CPU, the memory, and the address and data busses. We assume a CMOS implementation in this paper, which means that the main source of software power dissipation is the switching activity in the CPU, the memory circuits, and the busses.

2.1. Bus Power

The busses, comprising unidirectional address busses and bidirectional data busses, are groups of interconnecting wires through which the processor communicates with the memory and I/O circuits. Each line can be conveniently modeled as a lumped RC transmission line, where R is the wire resistance and C is the wiring capacitance. The capacitor C charges or discharges depending on the present and previous data. For example, on an 8-bit bus, if six of the eight lines change value between two consecutive data words, there are 6 transitions (switchings). One estimate shows that charging and discharging of bus lines will take up to half or more of the total chip power for 0.1 µm ULSI [6,8]. In another estimate, the power dissipated in the I/O busses can be as high as 80% [8]. There are coding techniques (such as bus-invert coding) which reduce external switching at the expense of slightly increasing the internal switching, in order to reduce the overall power. We attempt to reduce this switching activity through efficient source coding.

2.2. Memory Power

The power dissipated in memory can be a significant component of the overall power dissipated in an embedded system. In the InfoPad subsystem [1], 50% of the power is dissipated in memory.
The major components of memory power are as follows:

- power dissipated in the cell array,
- power dissipated to charge and discharge the word-line and bit-line capacitances,
- power dissipated in the decode logic,
- power dissipated in the sense amplifier.

The power dissipated depends on the type of memory access. A sequential memory access consumes less energy, as the next word can be returned from the same buffer. One can also expect that the switching on the address lines will be small in sequential accesses. One exception is when the previous word is the last one in the page. In that case, a separate page access needs to be performed, causing more power dissipation. A non-sequential access consumes more energy, as the next word address is either unrelated to the previous address or entirely different if the data is on a different page. In the latter case, the relative switching between successive words is also larger. These concepts are illustrated in Figure 1.

In the ARM microprocessor which we considered in this paper, CPU cycles are classified as S-cycles, N-cycles, or I-cycles. S-cycles refer to sequential memory accesses, N-cycles refer to non-sequential accesses, and I-cycles refer to internal cycles where there is no external memory access.
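
As a minimal sketch of the transition counting described in the bus power discussion above, the following C fragment counts how many bus lines toggle between consecutive 32-bit words. The function names `bus_transitions` and `total_switching` are illustrative assumptions and are not taken from the paper's tooling.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of bus lines that toggle when the bus value changes from
 * prev to next, i.e. the Hamming distance between the two words. */
unsigned bus_transitions(uint32_t prev, uint32_t next)
{
    uint32_t diff = prev ^ next;   /* set bits mark lines that change state */
    unsigned count = 0;

    while (diff != 0) {
        diff &= diff - 1;          /* clear the lowest set bit */
        count++;
    }
    return count;
}

/* Total switching over a trace of n consecutive bus words
 * (hypothetical helper; the trace format is an assumption). */
unsigned long total_switching(const uint32_t *words, size_t n)
{
    unsigned long total = 0;

    for (size_t i = 1; i < n; i++)
        total += bus_transitions(words[i - 1], words[i]);
    return total;
}
```

With these definitions, a change from 0x00 to 0x3F toggles six lines, and the sequential word addresses 0x1000, 0x1004, 0x1008, 0x100C toggle only four lines in total, which matches the expectation that sequential accesses cause little switching on the address lines.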

[Figure 1. Types of Memory Accesses (pages 0, 2, 3 and 4, with a sequential access marked)]

2.3. CPU Power

Every instruction executed by the CPU results in switching activity. We can broadly classify instructions as follows:

- load/store instructions,
- branch instructions,
- Type-1 arithmetic instructions (addition, subtraction, shift etc.),
- Type-2 arithmetic instructions (multiplication, division).

The average energy consumption for these instruction types can be measured either by gate-level simulation or by instruction-level current measurements [9]. Suppose that the relative weights associated with the average energy consumption of the four instruction types are W_j, 1 ≤ j ≤ 4, and that the number of instructions of each type is I_j. Then the CPU power P_cpu is given by

$$P_{cpu} \propto \frac{\sum_{j=1}^{4} W_j \times I_j}{\sum_{j=1}^{4} I_j}$$

3. Implementation Environment

We have used the ADPCM algorithm as a vehicle to demonstrate the software power optimizations. ADPCM is widely used in DECT (Digital Enhanced Cordless Telecommunications) wireless telephones in the MHz band. ADPCM is a speech compression and decompression algorithm. It takes the difference between successive samples of the signal and encodes the difference.

We assume that the ADPCM algorithm has been implemented as part of an embedded system which uses the ARM processor [15]. The ARM processor has two instruction sets: the normal 32-bit ARM instruction set and the 16-bit Thumb instruction set, which is a compressed form of the former. The Thumb instructions are decompressed at the time of execution to produce 32-bit ARM instructions, which are then executed as normal.

We used the ARM Software Development Toolkit [11,12,13], which enables the development of applications for the ARM family of microprocessors. The kit contains the ARMulator, which emulates the execution of applications on the ARM processor without accessing real hardware. The ARMulator models both the ARM and Thumb instruction sets.
There are a number of software modules provided with the ARMulator, such as the tracer, which can trace the executed instructions, the type of memory accesses, and any other events that occur during the execution. For example, the tracer can give us the number of S-cycles, N-cycles, and I-cycles.

3.1. Assumptions

In our estimation of software power, we made the following assumptions:

- there is no glitching on the busses,
- there is a single, bidirectional data bus,
- bus switching involves a full Vdd swing,
- on average, a non-sequential access takes twice the power of a sequential access,
- when the CPU performs 8/16-bit operations on a 32-bit data bus, it outputs 0's on the remaining lines.

4. Optimization Techniques

The various optimizations considered are fully described in [14]. Although the ARM compiler (armcc) provides the options -otime and -ospace for performing peephole optimizations for performance and code size, these were not effective, suggesting that improvements must be made at the source code level.

- A for loop coded as for (i = 1; i <= max; i++) can be replaced by for (i = max; i > 0; i--). The latter style is more efficient, since no register is required for saving max.
- In loop unrolling, two successive values of i can be processed in the same iteration, provided max is even. This reduces the total number of compare instructions, but increases code size.
- A program normally contains a number of function calls. These function calls are associated with computational overheads such as stacking and unstacking. If these functions are coded inline, these overheads can be eliminated, but at the cost of an increase in code size.
- Another technique is creating function macros through the #define preprocessor directive.

The following structural transformations can be applied to the ADPCM code:

- Making the code option specific instead of generalising it to include different data rates, PCM laws etc. The code was made specific to the 32-bit data rate, the uniform law PCM, and the ITU-T recommended standard.
- Making the code embedded-system oriented by eliminating the printf statements.
- Eliminating unnecessary additions. For example, D = (SLI + 65536 - SEI) & 65535; can be replaced by D = (SLI - SEI) & 65535; since 65536 is a 17-bit number and is removed by the 16-bit mask.
- Transforming the branching operations. For example, the Power2-exp function, used to find the exponent in ADPCM, contains:
  if (val >= 16384) i = 15; else if ((val >= 8192) && (val < 16384)) i = 14; ... else if ((val >= 8) && (val < 16)) i = 4; ...
  It can be replaced by:
  if (val >= 16384) i = 15; ... else if (val >= 16) i = 5; ... else if (val >= 1) i = 1;

These optimizations were applied individually, and their effect on power, performance and code size was analysed for both the 32-bit ARM compiled code (armcc) and the 16-bit Thumb compiled code (tcc).

[Figure 2. Optimization Flow (blocks: Source Code, Compiler, Emulator, Tracer, Estimator; outputs: Power, Performance, Code Size)]

5. Power Estimation

In order to estimate bus power, the switching between each two consecutive words on the 32-bit busses was calculated. The tracer module traces all the memory accesses in the execution of the program. As per the ARM documentation, an N-cycle can consume up to 2 clock cycles, whereas an S-cycle requires only one cycle. Therefore, the total number of cycles required to complete a program is (2N + S + I) clock cycles in the worst case.
The dissipated power is proportional to

$$\text{Total Switching} \times \frac{N + S + I}{2N + S + I}$$

The power dissipated in memory is proportional to

$$\frac{B \cdot N + S}{A \cdot N + S + I}$$

where B is the relative energy of a non-sequential access in comparison to a sequential access, and A is the relative length of a non-sequential access in comparison to a sequential access. In the worst case, A can be 2. The value of B will vary, depending upon the type and size of memory. One has to tune the value of B experimentally. In this work, we took B = 2.

6. Optimization Flow

The complete process of optimizing the code is depicted in the optimization flow shown in Figure 2. If the specifications are not met, then optimizations need to be applied again, as shown by the dotted lines.
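
The two proportionalities of Section 5 can be sketched in C as follows. This is only an illustration under stated assumptions: the struct `cycle_counts` and the function names are hypothetical, the results are relative figures rather than watts, and A and B are passed in as parameters because B has to be tuned for the memory in use.

```c
#include <stddef.h>

/* Cycle counts as reported by an instruction trace (hypothetical layout):
 * S = sequential, N = non-sequential, I = internal cycles. */
typedef struct {
    unsigned long s_cycles;
    unsigned long n_cycles;
    unsigned long i_cycles;
} cycle_counts;

/* Relative bus power: Total Switching x (N + S + I) / (2N + S + I).
 * The denominator is the worst-case clock count, where an N-cycle
 * takes two clocks. */
double relative_bus_power(unsigned long total_switching, cycle_counts c)
{
    double cycles = (double)c.n_cycles + (double)c.s_cycles + (double)c.i_cycles;
    double clocks = 2.0 * (double)c.n_cycles + (double)c.s_cycles + (double)c.i_cycles;

    return clocks > 0.0 ? (double)total_switching * cycles / clocks : 0.0;
}

/* Relative memory power: (B*N + S) / (A*N + S + I), where a is the
 * relative length and b the relative energy of a non-sequential access. */
double relative_memory_power(cycle_counts c, double a, double b)
{
    double energy = b * (double)c.n_cycles + (double)c.s_cycles;
    double clocks = a * (double)c.n_cycles + (double)c.s_cycles + (double)c.i_cycles;

    return clocks > 0.0 ? energy / clocks : 0.0;
}
```

For example, with made-up counts S = 600, N = 200, I = 200 and a total switching of 1000, the bus estimate is 1000 x 1000 / 1200, about 833.3, and with A = 2 and B = 2 the memory estimate is 1000 / 1200, about 0.83. Because both figures are relative, they are meaningful only when comparing two versions of the same program.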

7. Results

The percentage changes for each optimization technique with respect to the original ADPCM code are shown in Tables 1 and 2. A negative sign shows that the optimization degrades the respective criterion and therefore should not be applied. The compiler options -otime and -ospace either degrade the performance or are not able to optimize for the intended criterion. Simultaneous application of optimization techniques gives better results.

[Table 1. Percentage Changes for armcc. Columns: Optimizations, Bus power, Performance, Code size. Surviving row: option specific, 0.61, 1.43, 2.64. Surviving partial row: embedded oriented, 1.08.]

[Table 2. Percentage Changes for tcc. Columns: Optimizations, Bus power, Performance, Code size. Rows: -otime, -ospace, for loop, unnecessary addition, function call, function macros, embedded oriented. The numeric entries are not recoverable.]

The tables clearly show that some of these optimization techniques offer good improvements in power and performance, e.g. transformation of branching and function calls. We also note that many optimizations improve all three aspects, but some of them result in tradeoffs. In the latter situation, we can rank the optimization techniques for each of the three criteria, as shown in Tables 3 and 4. When used simultaneously, many techniques offer only marginal improvements. Many optimizations are performed on a small part of the code. This produces results which are locally optimum but not globally optimum. For example, the transformation of branching, which gives the maximum improvement in power, was used 4 times per sample in the ADPCM code, giving good improvements.

When the compiler performs optimizations one after another, there may be undesired interaction between them. Therefore, the order of optimizations matters. The compiler can try all possible orderings, but in practice it orders optimizations by experimentation because of time constraints.

[Table 3. Order of Optimizations for armcc. Surviving row labels, in the order extracted: branching, function call, unnecessary addition, embedded oriented, option specific, for loop termination, -ospace, -otime. The rank columns are not recoverable.]

[Table 4. Order of Optimizations for tcc. Surviving ranks for power: branching 1, option specific 2, embedded oriented 3, unnecessary addition 4, loop unrolling 5, function macros 6, followed by function call, for loop termination and -ospace. The remaining columns are not recoverable.]
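
The reason the branch transformation of Section 4 saves work is that, in an if/else-if chain tested in descending order, the upper-bound test of each branch is already implied by the failure of the branch before it. The C sketch below writes out both forms so that they can be compared. The function names are hypothetical, the middle branches follow the power-of-two pattern of the branches quoted in the text, the result 0 for val < 1 is an assumption, and the constant 65536 in `diff_with_add` is inferred from the remark that the eliminated term is a 17-bit number.

```c
#include <stdint.h>

/* Original style: every branch tests both ends of its range. */
int exponent_two_sided(int val)
{
    int i = 0;  /* assumption: values below 1 give exponent 0 */

    if (val >= 16384) i = 15;
    else if ((val >= 8192) && (val < 16384)) i = 14;
    else if ((val >= 4096) && (val < 8192)) i = 13;
    else if ((val >= 2048) && (val < 4096)) i = 12;
    else if ((val >= 1024) && (val < 2048)) i = 11;
    else if ((val >= 512) && (val < 1024)) i = 10;
    else if ((val >= 256) && (val < 512)) i = 9;
    else if ((val >= 128) && (val < 256)) i = 8;
    else if ((val >= 64) && (val < 128)) i = 7;
    else if ((val >= 32) && (val < 64)) i = 6;
    else if ((val >= 16) && (val < 32)) i = 5;
    else if ((val >= 8) && (val < 16)) i = 4;
    else if ((val >= 4) && (val < 8)) i = 3;
    else if ((val >= 2) && (val < 4)) i = 2;
    else if ((val >= 1) && (val < 2)) i = 1;
    return i;
}

/* Transformed style: only the lower bound is tested, because reaching
 * a branch already means all larger thresholds have failed. */
int exponent_one_sided(int val)
{
    int i = 0;

    if (val >= 16384) i = 15;
    else if (val >= 8192) i = 14;
    else if (val >= 4096) i = 13;
    else if (val >= 2048) i = 12;
    else if (val >= 1024) i = 11;
    else if (val >= 512) i = 10;
    else if (val >= 256) i = 9;
    else if (val >= 128) i = 8;
    else if (val >= 64) i = 7;
    else if (val >= 32) i = 6;
    else if (val >= 16) i = 5;
    else if (val >= 8) i = 4;
    else if (val >= 4) i = 3;
    else if (val >= 2) i = 2;
    else if (val >= 1) i = 1;
    return i;
}

/* Unnecessary addition: adding 65536 before a 16-bit mask has no effect
 * on the result, since 65536 is a multiple of 2^16. */
uint32_t diff_with_add(uint32_t sli, uint32_t sei)
{
    return (sli + 65536u - sei) & 65535u;
}

uint32_t diff_without_add(uint32_t sli, uint32_t sei)
{
    return (sli - sei) & 65535u;
}
```

Both exponent functions return the same value for every input, but the one-sided chain performs at most one comparison per branch instead of two. Likewise, the two difference functions agree for all inputs because unsigned arithmetic wraps modulo 2^32 before the mask is applied.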

8. Conclusions

In this paper, we have studied the effect of several source-level optimizations on the performance, power, and code size of embedded software. We illustrated the tradeoffs involved using the example of the ADPCM algorithm, which is often used in applications such as the answering machine. Our results indicate that significant reductions in power dissipation are possible through code rewriting. We have provided a method to estimate software power in embedded systems, which considers CPU power, bus power, and memory power.

Acknowledgements

We thank Thomas Major of Philips Semiconductors, Bangalore, for permitting us to use the ARM Software Development Kit. We thank Anil Sharma of Philips Semiconductors, Eindhoven, for many useful discussions.

References

[1] Burd T.D. and Brodersen R., "Processor Design For Portable Systems", Department of EECS, University of California at Berkeley.
[2] Chandrakasan A. and Brodersen R., "Low Power CMOS Design", IEEE Press.
[3] Chandrakasan A., Sheng S. and Brodersen R., "Low Power CMOS Digital Design", IEEE Journal of Solid State Circuits, April.
[4] Mehta H., Owens R.M., Irwin M.J., Chen R. and Ghosh D., "Techniques for Low Energy Software", Department of Computer Science and Engineering, The Pennsylvania State University, PA.
[5] Najm F., "Transition Density: A New Measure Of Activity In Digital Circuits", IEEE Transactions on CAD of Integrated Circuits and Systems, Feb.
[6] Nakagome Y., Itoh K. et al., "Sub-1-V Swing Internal Bus Architecture for Future Low Power ULSI's", IEEE Journal of Solid State Circuits, April.
[7] Sharma A. and Ravikumar C.P., "Efficient Implementation Of ADPCM Codec", The 13th International Conference on VLSI Design, Calcutta, January 3-7.
[8] Stan M. and Burleson W., "Bus Invert Coding For Low Power I/O", IEEE Transactions on VLSI Systems, pp 49-58, March.
[9] Tiwari V., Malik S. et al., "Power Analysis Of Embedded Software: A First Step Towards Software Power Minimization", IEEE Transactions on VLSI Systems, December.
[10] Weste N. and Eshraghian K., "Principles Of CMOS VLSI Design", Addison-Wesley.
[11] Advanced RISC Machine User Guide, ARM DUI 0040C.
[12] Advanced RISC Machine Reference Guide, ARM DUI B.
[13] ARM7TDMI Data Sheet, ARM DDI 0029E.
[14] ARM Application Note 34, "Writing Efficient C For ARM", ARM DAI 0034A.
[15] Website of Advanced RISC Machine.
More informationSDR Forum Technical Conference 2007
THE APPLICATION OF A NOVEL ADAPTIVE DYNAMIC VOLTAGE SCALING SCHEME TO SOFTWARE DEFINED RADIO Craig Dolwin (Toshiba Research Europe Ltd, Bristol, UK, craig.dolwin@toshiba-trel.com) ABSTRACT This paper presents
More informationOn GPU Bus Power Reduction with 3D IC Technologies
On GPU Bus Power Reduction with 3D Technologies Young-Joon Lee and Sung Kyu Lim School of ECE, Georgia Institute of Technology, Atlanta, Georgia, USA yjlee@gatech.edu, limsk@ece.gatech.edu Abstract The
More informationCPU ARCHITECTURE. QUESTION 1 Explain how the width of the data bus and system clock speed affect the performance of a computer system.
CPU ARCHITECTURE QUESTION 1 Explain how the width of the data bus and system clock speed affect the performance of a computer system. ANSWER 1 Data Bus Width the width of the data bus determines the number
More informationEnergy Issues in Software Design of Embedded Systems
Energy Issues in Software Design of Embedded Systems A. CHATZIGEORGIOU, G. STEPHANIDES Department of Applied Informatics University of Macedonia 156 Egnatia Str., 54006 Thessaloniki GREECE alec@ieee.org,
More informationUNIT 4 INTEGRATED CIRCUIT DESIGN METHODOLOGY E5163
UNIT 4 INTEGRATED CIRCUIT DESIGN METHODOLOGY E5163 LEARNING OUTCOMES 4.1 DESIGN METHODOLOGY By the end of this unit, student should be able to: 1. Explain the design methodology for integrated circuit.
More informationLPRAM: A Novel Low-Power High-Performance RAM Design With Testability and Scalability. Subhasis Bhattacharjee and Dhiraj K. Pradhan, Fellow, IEEE
IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 23, NO. 5, MAY 2004 637 LPRAM: A Novel Low-Power High-Performance RAM Design With Testability and Scalability Subhasis
More information6T- SRAM for Low Power Consumption. Professor, Dept. of ExTC, PRMIT &R, Badnera, Amravati, Maharashtra, India 1
6T- SRAM for Low Power Consumption Mrs. J.N.Ingole 1, Ms.P.A.Mirge 2 Professor, Dept. of ExTC, PRMIT &R, Badnera, Amravati, Maharashtra, India 1 PG Student [Digital Electronics], Dept. of ExTC, PRMIT&R,
More informationMicroprocessor Architecture
Microprocessor - 8085 Architecture 8085 is pronounced as "eighty-eighty-five" microprocessor. It is an 8-bit microprocessor designed by Intel in 1977 using NMOS technology. It has the following configuration
More informationCluster-based approach eases clock tree synthesis
Page 1 of 5 EE Times: Design News Cluster-based approach eases clock tree synthesis Udhaya Kumar (11/14/2005 9:00 AM EST) URL: http://www.eetimes.com/showarticle.jhtml?articleid=173601961 Clock network
More informationExploiting On-Chip Data Transfers for Improving Performance of Chip-Scale Multiprocessors
Exploiting On-Chip Data Transfers for Improving Performance of Chip-Scale Multiprocessors G. Chen 1, M. Kandemir 1, I. Kolcu 2, and A. Choudhary 3 1 Pennsylvania State University, PA 16802, USA 2 UMIST,
More informationHonorary Professor Supercomputer Education and Research Centre Indian Institute of Science, Bangalore
COMPUTER ORGANIZATION AND ARCHITECTURE V. Rajaraman Honorary Professor Supercomputer Education and Research Centre Indian Institute of Science, Bangalore T. Radhakrishnan Professor of Computer Science
More informationA Software LDPC Decoder Implemented on a Many-Core Array of Programmable Processors
A Software LDPC Decoder Implemented on a Many-Core Array of Programmable Processors Brent Bohnenstiehl and Bevan Baas Department of Electrical and Computer Engineering University of California, Davis {bvbohnen,
More informationPower Estimation of System-Level Buses for Microprocessor-Based Architectures: A Case Study
Power Estimation of System-Level Buses for Microprocessor-Based Architectures: A Case Study William Fornaciari Politecnico di Milano, DEI Milano (Italy) fornacia@elet.polimi.it Donatella Sciuto Politecnico
More informationARM ARCHITECTURE. Contents at a glance:
UNIT-III ARM ARCHITECTURE Contents at a glance: RISC Design Philosophy ARM Design Philosophy Registers Current Program Status Register(CPSR) Instruction Pipeline Interrupts and Vector Table Architecture
More informationShift Invert Coding (SINV) for Low Power VLSI
Shift Invert oding (SINV) for Low Power VLSI Jayapreetha Natesan* and Damu Radhakrishnan State University of New York Department of Electrical and omputer Engineering New Paltz, NY, U.S. email: natesa76@newpaltz.edu
More informationAdvanced Parallel Architecture Lesson 3. Annalisa Massini /2015
Advanced Parallel Architecture Lesson 3 Annalisa Massini - Von Neumann Architecture 2 Two lessons Summary of the traditional computer architecture Von Neumann architecture http://williamstallings.com/coa/coa7e.html
More informationCHAPTER 1 INTRODUCTION
CHAPTER 1 INTRODUCTION Rapid advances in integrated circuit technology have made it possible to fabricate digital circuits with large number of devices on a single chip. The advantages of integrated circuits
More informationCOMPUTER ARCHITECTURE AND ORGANIZATION Register Transfer and Micro-operations 1. Introduction A digital system is an interconnection of digital
Register Transfer and Micro-operations 1. Introduction A digital system is an interconnection of digital hardware modules that accomplish a specific information-processing task. Digital systems vary in
More informationEEL 4783: HDL in Digital System Design
EEL 4783: HDL in Digital System Design Lecture 13: Floorplanning Prof. Mingjie Lin Topics Partitioning a design with a floorplan. Performance improvements by constraining the critical path. Floorplanning
More informationAbbas El Gamal. Joint work with: Mingjie Lin, Yi-Chang Lu, Simon Wong Work partially supported by DARPA 3D-IC program. Stanford University
Abbas El Gamal Joint work with: Mingjie Lin, Yi-Chang Lu, Simon Wong Work partially supported by DARPA 3D-IC program Stanford University Chip stacking Vertical interconnect density < 20/mm Wafer Stacking
More informationECE 637 Integrated VLSI Circuits. Introduction. Introduction EE141
ECE 637 Integrated VLSI Circuits Introduction EE141 1 Introduction Course Details Instructor Mohab Anis; manis@vlsi.uwaterloo.ca Text Digital Integrated Circuits, Jan Rabaey, Prentice Hall, 2 nd edition
More information160 M. Nadjarbashi, S.M. Fakhraie and A. Kaviani Figure 2. LUTB structure. each block-level track can be arbitrarily connected to each of 16 4-LUT inp
Scientia Iranica, Vol. 11, No. 3, pp 159{164 c Sharif University of Technology, July 2004 On Routing Architecture for Hybrid FPGA M. Nadjarbashi, S.M. Fakhraie 1 and A. Kaviani 2 In this paper, the routing
More informationPerformance Analysis and Designing 16 Bit Sram Memory Chip Using XILINX Tool
Performance Analysis and Designing 16 Bit Sram Memory Chip Using XILINX Tool Monika Solanki* Department of Electronics & Communication Engineering, MBM Engineering College, Jodhpur, Rajasthan Review Article
More informationManaging Dynamic Reconfiguration Overhead in Systems-on-a-Chip Design Using Reconfigurable Datapaths and Optimized Interconnection Networks
Managing Dynamic Reconfiguration Overhead in Systems-on-a-Chip Design Using Reconfigurable Datapaths and Optimized Interconnection Networks Zhining Huang, Sharad Malik Electrical Engineering Department
More informationDesign and Implementation of Low Leakage Power SRAM System Using Full Stack Asymmetric SRAM
Design and Implementation of Low Leakage Power SRAM System Using Full Stack Asymmetric SRAM Rajlaxmi Belavadi 1, Pramod Kumar.T 1, Obaleppa. R. Dasar 2, Narmada. S 2, Rajani. H. P 3 PG Student, Department
More informationActel s SX Family of FPGAs: A New Architecture for High-Performance Designs
Actel s SX Family of FPGAs: A New Architecture for High-Performance Designs A Technology Backgrounder Actel Corporation 955 East Arques Avenue Sunnyvale, California 94086 April 20, 1998 Page 2 Actel Corporation
More informationEE586 VLSI Design. Partha Pande School of EECS Washington State University
EE586 VLSI Design Partha Pande School of EECS Washington State University pande@eecs.wsu.edu Lecture 1 (Introduction) Why is designing digital ICs different today than it was before? Will it change in
More informationColumn decoder using PTL for memory
IOSR Journal of Electronics and Communication Engineering (IOSR-JECE) e-issn: 2278-2834,p- ISSN: 2278-8735. Volume 5, Issue 4 (Mar. - Apr. 2013), PP 07-14 Column decoder using PTL for memory M.Manimaraboopathy
More informationUsing a Victim Buffer in an Application-Specific Memory Hierarchy
Using a Victim Buffer in an Application-Specific Memory Hierarchy Chuanjun Zhang Depment of lectrical ngineering University of California, Riverside czhang@ee.ucr.edu Frank Vahid Depment of Computer Science
More informationPower Efficient Arithmetic Operand Encoding
Power Efficient Arithmetic Operand Encoding Eduardo Costa, Sergio Bampi José Monteiro UFRGS IST/INESC P. Alegre, Brazil Lisboa, Portugal ecosta,bampi@inf.ufrgs.br jcm@algos.inesc.pt Abstract This paper
More informationChapter 2 Logic Gates and Introduction to Computer Architecture
Chapter 2 Logic Gates and Introduction to Computer Architecture 2.1 Introduction The basic components of an Integrated Circuit (IC) is logic gates which made of transistors, in digital system there are
More informationInternational Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering
IP-SRAM ARCHITECTURE AT DEEP SUBMICRON CMOS TECHNOLOGY A LOW POWER DESIGN D. Harihara Santosh 1, Lagudu Ramesh Naidu 2 Assistant professor, Dept. of ECE, MVGR College of Engineering, Andhra Pradesh, India
More informationMarching Memory マーチングメモリ. UCAS-6 6 > Stanford > Imperial > Verify 中村維男 Based on Patent Application by Tadao Nakamura and Michael J.
UCAS-6 6 > Stanford > Imperial > Verify 2011 Marching Memory マーチングメモリ Tadao Nakamura 中村維男 Based on Patent Application by Tadao Nakamura and Michael J. Flynn 1 Copyright 2010 Tadao Nakamura C-M-C Computer
More informationDESIGN AND SIMULATION OF 1 BIT ARITHMETIC LOGIC UNIT DESIGN USING PASS-TRANSISTOR LOGIC FAMILIES
Volume 120 No. 6 2018, 4453-4466 ISSN: 1314-3395 (on-line version) url: http://www.acadpubl.eu/hub/ http://www.acadpubl.eu/hub/ DESIGN AND SIMULATION OF 1 BIT ARITHMETIC LOGIC UNIT DESIGN USING PASS-TRANSISTOR
More informationAN10035_1 Comparing energy efficiency of USB at full-speed and high-speed rates
Comparing energy efficiency of USB at full-speed and high-speed rates October 2003 White Paper Rev. 1.0 Revision History: Version Date Description Author 1.0 October 2003 First version. CHEN Chee Kiong,
More informationAnalysis and Design of Low Voltage Low Noise LVDS Receiver
IOSR Journal of Electronics and Communication Engineering (IOSR-JECE) e-issn: 2278-2834,p- ISSN: 2278-8735.Volume 9, Issue 2, Ver. V (Mar - Apr. 2014), PP 10-18 Analysis and Design of Low Voltage Low Noise
More informationSTUDY OF SRAM AND ITS LOW POWER TECHNIQUES
INTERNATIONAL JOURNAL OF ELECTRONICS AND COMMUNICATION ENGINEERING & TECHNOLOGY (IJECET) International Journal of Electronics and Communication Engineering & Technology (IJECET), ISSN ISSN 0976 6464(Print)
More informationUniversität Dortmund. ARM Architecture
ARM Architecture The RISC Philosophy Original RISC design (e.g. MIPS) aims for high performance through o reduced number of instruction classes o large general-purpose register set o load-store architecture
More information1 Introduction to Microcontrollers
1 Introduction to Microcontrollers EE445 - Microcontrollers and Embedded Systems Chapter 1: Introduction to Microcontro EE445 Microcontrollers and Emb and and Embedded Embedded Microcontrollers EE445 -
More informationContents of this presentation: Some words about the ARM company
The architecture of the ARM cores Contents of this presentation: Some words about the ARM company The ARM's Core Families and their benefits Explanation of the ARM architecture Architecture details, features
More informationCALCULATION OF POWER CONSUMPTION IN 7 TRANSISTOR SRAM CELL USING CADENCE TOOL
CALCULATION OF POWER CONSUMPTION IN 7 TRANSISTOR SRAM CELL USING CADENCE TOOL Shyam Akashe 1, Ankit Srivastava 2, Sanjay Sharma 3 1 Research Scholar, Deptt. of Electronics & Comm. Engg., Thapar Univ.,
More informationPower Optimization in FPGA Designs
Mouzam Khan Altera Corporation mkhan@altera.com ABSTRACT IC designers today are facing continuous challenges in balancing design performance and power consumption. This task is becoming more critical as
More informationInternational Journal of Modern Trends in Engineering and Research e-issn No.: , Date: 2-4 July, 2015
International Journal of Modern Trends in Engineering and Research www.ijmter.com e-issn No.:349-9745, Date: -4 July, 015 Design a Full Adder Block for optimization of PDP Neha K. Sancheti 1, Shubhangi
More informationA 65nm LEVEL-1 CACHE FOR MOBILE APPLICATIONS
A 65nm LEVEL-1 CACHE FOR MOBILE APPLICATIONS ABSTRACT We describe L1 cache designed for digital signal processor (DSP) core. The cache is 32KB with variable associativity (4 to 16 ways) and is pseudo-dual-ported.
More informationHotChips An innovative HD video and digital image processor for low-cost digital entertainment products. Deepu Talla.
HotChips 2007 An innovative HD video and digital image processor for low-cost digital entertainment products Deepu Talla Texas Instruments 1 Salient features of the SoC HD video encode and decode using
More informationAnalysis of Power Dissipation and Delay in 6T and 8T SRAM Using Tanner Tool
Analysis of Power Dissipation and Delay in 6T and 8T SRAM Using Tanner Tool Sachin 1, Charanjeet Singh 2 1 M-tech Department of ECE, DCRUST, Murthal, Haryana,INDIA, 2 Assistant Professor, Department of
More informationA Hybrid Approach to CAM-Based Longest Prefix Matching for IP Route Lookup
A Hybrid Approach to CAM-Based Longest Prefix Matching for IP Route Lookup Yan Sun and Min Sik Kim School of Electrical Engineering and Computer Science Washington State University Pullman, Washington
More informationImplementation of ALU Using Asynchronous Design
IOSR Journal of Electronics and Communication Engineering (IOSR-JECE) ISSN: 2278-2834, ISBN: 2278-8735. Volume 3, Issue 6 (Nov. - Dec. 2012), PP 07-12 Implementation of ALU Using Asynchronous Design P.
More informationPower Measurement Using Performance Counters
Power Measurement Using Performance Counters October 2016 1 Introduction CPU s are based on complementary metal oxide semiconductor technology (CMOS). CMOS technology theoretically only dissipates power
More information