Reusing Cache for Real-Time Memory Address Trace Compression
- Martin Harris
- 6 years ago
Ing-Jer Huang
Dept. of Computer Science and Engineering, National Sun Yat-Sen University, Kaohsiung 804, Taiwan
ijhuang@cse.nsysu.edu.tw

Chung-Fu Kao
Dept. of Computer Science and Engineering, National Sun Yat-Sen University, Kaohsiung 804, Taiwan
cfkao@eslab.cse.nsysu.edu.tw

Abstract - Instruction traces help designers debug the system architecture and understand program behavior. However, one of the major problems of tracing is the high cost of storing the trace result, so reducing the trace information and compressing the trace volume is an important issue when tracing a program. Cache is one of the basic components in modern microprocessor design, and the cache is usually disabled while the system is being debugged. In this paper, we present a technique that reuses the system cache for memory trace compression during system debugging. A cache simulator that behaves exactly like the hardware cache can then restore the full trace result. The technique compresses the memory address trace in real time, which means memory references can be recorded continuously. Almost every modern microprocessor design, such as the ARM922T [3] and the MIPS processor cores [4], has an embedded cache (instruction cache and data cache). We therefore mainly aim at using the system cache to decrease the hardware cost while still recording the most important trace information without stopping the microprocessor. After compression, the trace data can be transferred to the debug host quickly and saved as trace files.

I. Introduction
In the era of system-on-chip (SoC), system and microprocessor debugging are increasingly important because of design complexity and time-to-market pressure. The IEEE-ISTO Nexus 5001 Forum [1] established an open industry standard that provides a general-purpose interface for the software development and debugging of embedded processors. To debug the microprocessor, the dynamically executed instructions are among the most useful information. They help the designer not only to detect hardware bugs but also to analyze program behavior. When the microprocessor runs a program, the dynamically executed instructions must be collected as a trace file. Usually, the addresses of the executed instructions are recorded in the trace file. The difficulties in obtaining a complete program trace stem from the high cost of recording every executed instruction when the program runs and from the large size of the resulting trace files [2]. A program that runs for minutes on even a 200 MHz RISC microprocessor produces trace files of gigabytes or terabytes. For example, at one 32-bit address per clock cycle, a 10 MHz, 32-bit microprocessor generates a full trace at (10 × 32)/8 = 40 MBytes/sec. A 100 MHz, 32-bit microprocessor generates a full trace at (100 × 32)/8 = 400 MBytes/sec. Therefore, various techniques have been proposed to reduce the trace volume. We propose a technique that reuses the system cache for memory address trace compression.

II. Related Work
The use of memory traces is an established technique for simulation-based research in computer architecture and systems [6]. Tracing techniques can be broadly divided into two classes, hardware approaches and software approaches [5][7], as shown in Fig. 1.
[Fig. 1 diagram: address tracing techniques. H/W approaches: H/W monitoring, embedded tracer, Nexus 5001, ETM, cached trace. S/W approaches: interrupt based, instrumented program, trap-bit method.]
Fig. 1. The classification of address tracing techniques.
Modifying either the trace mechanism (hardware) or the debugging tools (software) that record the addresses of executed instructions can reduce the tracing overhead. As Fig. 1 shows, the software approaches are easy to implement. However, they have difficulty storing the trace in real time, because the microprocessor executes faster than the trace can be collected [7]. The H/W monitoring approach [8][9] records processor memory requests directly, capturing both user and operating-system references. The embedded tracer in Fig. 1 integrates the trace mechanism and the microprocessor into a single chip. Examples are the ARM Embedded Trace Macrocell (ETM) [10] and the MIPS E/M family processor cores [4].

III. Proposed Approach
In this section, we present the proposed technique of reusing the cache for memory trace compression, which is suitable for real-time compression of memory reference traces. Fig. 2 shows the primary components of an embedded system: the microprocessor, caches, on-chip memory, on-chip bus, etc. The gray rectangle in Fig. 2 is our real-time address compression mechanism, which contains a branch/target filter (B/T filter) and a cached tracer.
[Fig. 2 diagram: μProcessor, B/T filter, I-cache (I-$), cached tracer, trace file, embedded memory, on-chip bus, and off-chip link, connected by signals (1), (2), and (3).]
Fig. 2. Embedding trace mechanism into system-on-chip.
In Fig. 2, signals (1), (2), and (3) denote the enable, address, and cache index, respectively. The branch/target filter, the system cache (instruction cache), and the cached tracer are introduced below.

A. Branch/Target Filter
When the microprocessor executes a program, the dynamic execution flow exhibits two kinds of locality: spatial locality and temporal locality. Spatial locality is the property that the next accessed address will probably be very close to the last accessed address. Accordingly, we define a basic block as a segment of program instructions that is executed contiguously. In other words, the instruction addresses within a basic block increase by a constant offset. Fig. 3 shows a simple basic block diagram. The constant offset depends on the machine's instruction set architecture (ISA). For example, successive address references of the ARM7 microprocessor increase by 4, e.g., a, a+4, a+8, a+c, etc. We call the first instruction in a basic block the target and the last instruction the branch. After the branch instruction executes, program flow jumps to the target of another basic block. Given this property of a basic block, we can record only the target and branch addresses and ignore the contiguous addresses, which reduces the trace volume.
[Fig. 3 diagram: Address and Instruction columns with addresses 0000, 000C, 0010; each basic block begins at a Target and ends at a Branch.]
Fig. 3. Illustration of basic block diagram.
The microprocessor sends an address for each memory request, and this address is received by the branch/target filter (B/T filter). Fig. 4 shows the B/T filter block diagram. When the B/T filter receives the current reference address, it subtracts the previous address value from the current address value. The operation of the B/T filter is shown in Fig. 5. If the offset between the previous and current addresses is not equal to the constant offset, the B/T filter asserts the enable signal (true) to the system cache and the cached tracer module. The system cache then caches the current address, and the cached tracer module records the trace information.
[Fig. 4 diagram: the address from the microprocessor enters the B/T filter, the previous address is subtracted from it, and the difference is compared (=) with the constant offset to produce enable.]
Fig. 4. Branch/Target filter architecture.

previous_address = 0;
offset = 0;
constant_offset = 4;   // user defined

Branch_Target_filter(current_address) {
    offset = current_address - previous_address;
    if (offset == constant_offset)
        enable = False;   // contiguous instruction
    else                  // offset != constant_offset
        enable = True;    // record branch/target address
    previous_address = current_address;
}
Fig. 5. Pseudo-code illustrating the B/T filter operation.

B. Cache System
After the B/T filter, only the branch and target references are recorded, but the trace volume may still be large. For example, a 32-bit RISC microprocessor sends a 32-bit address with each memory request. If the total number of branch and target addresses during the tracing stage is m, the total trace size is m × 32 bits. Memory references from programs are known to show copious temporal locality: if a memory address is accessed, it will probably be accessed again in the near future. With advances in chip technology, most new microprocessors incorporate embedded caches, which handle some memory references internally to increase system performance. During the debugging stage, however, the system usually disables the cache, and memory references bypass it. We therefore reuse the system cache as a small direct-mapped structure, called a compression cache, that keeps the recently seen memory references. If the next memory reference hits in the cache, the hit index is sent to the cached tracer module. Otherwise, a cache miss occurs and the full address is sent to the cached tracer module. In either case, the hit/miss information is also sent to the cached tracer module.

C. Cached Tracer
If a branch or target address is already stored in the cache (a cache hit), the system cache sends a Hit signal and the hit index to the cached tracer module. On a cache miss, the system cache sends a Miss signal and the full address to the cached tracer module. Accordingly, the trace file format is as follows, where H denotes hit and M
denotes miss: H(index) H(index) M(address) H(index) M(address) M(address), etc. Suppose a program is loop-intensive or has strong temporal locality, as multimedia programs typically do. Cache hits will then outnumber misses, and the trace file shrinks because the full address need not be recorded for every branch/target. The cached tracer module transmits the trace information to the off-chip host, which saves it as trace files. The cached tracer can transmit the data to the host via the JTAG port or other transmission protocols, such as the IEEE-ISTO Nexus 5001 AUX port or an Ethernet port.

IV. Experiment Results
We present five benchmark programs executed on an ARM7 microprocessor, tracing 1,000,000 instructions for each program. The five programs are DCT (Discrete Cosine Transform), FFT (Fast Fourier Transform), JPEG encoder, Fibonacci sequence, and Tower of Hanoi. All five are loop-intensive benchmarks, especially the recursive Fibonacci sequence and Tower of Hanoi programs. We also emulate different cache sizes to analyze the relation between cache size and trace compression. As mentioned above, the trace file size can be calculated by Eq. 1:

$$S = H(1 + I) + M(1 + A) \quad \text{(bits)} \qquad \text{(Eq. 1)}$$

where I is the cache index size, A is the memory address size, H is the number of cache hits, M is the number of cache misses, and S is the trace file size. In Eq. 1, the term (1 + I) reflects that a cache hit is recorded in the format H(index): one bit, 1, denoting the hit, followed by the I-bit hit index. The term (1 + A) reflects that a cache miss is recorded in the format M(full address): one bit, 0, denoting the miss, followed by the full A-bit address (32 bits here). Thus each cache hit costs 1 bit plus the hit index size, and each cache miss costs 1 bit plus the full address bit-width. Because the cache index size and the full address bit-width are fixed in a trace file, it is easy to decompress the full
trace information.

In each table, the instruction number is the total number of traced instructions. The address size is 32 bits because the ARM7TDMI is a 32-bit RISC microprocessor with a 32-bit bus. The trace size is the trace file size, the product of the instruction number and the address size. For the full trace, we use no compression and record the addresses of all dynamically executed instructions. The B/T filter row omits the contiguous addresses from the total referenced instruction addresses. The hit/miss column gives the cache hit and miss counts, whose sum should equal the B/T filter instruction number. The index size is
dependent on the cache size and equal to $\log_2(\text{cache size})$. The file size for the cached trace is calculated with Eq. 1. The compression ratio in each row is computed relative to the full trace. For example, the compression ratio in the 7th row of Table 1 (cache size 2K) is

$$\text{Compression ratio} = \left(1 - \frac{\text{Cached trace file size}}{\text{Full trace file size}}\right) \times 100\%$$

Table 1. Compression ratio of the DCT program
Table 2. Compression ratio of the FFT program
Table 3. Compression ratio of the JPEG encoder program
Table 4. Compression ratio of the Fibonacci sequence program
Table 5. Compression ratio of the Tower of Hanoi program

V. Conclusions
In this paper, we present a novel approach that reuses the system cache to reduce the trace volume and hardware design overhead. The dynamically executed instruction trace is among the most useful information for designers. It helps not only to detect hardware bugs but also to analyze program behavior. As microprocessor operating frequencies increase, trace volumes grow very quickly, so the trace data must be compressed to be recorded in real time. First, we define the basic-block concept and implement the branch/target filter to omit successively referenced memory addresses. Second, because the cache is usually disabled during system debugging, we reuse it, sharing the hardware resource, to compress the memory references. This technique relies on the temporal locality of programs. The experimental results show that the average compression ratio is approximately 90% after cache operation. In the future, we will investigate other compression methods, such as LZ and LZW, to obtain higher compression ratios.
References
[1] IEEE-ISTO Nexus 5001 Forum.
[2] J. R. Larus, "Efficient program tracing," Computer, Vol. 26, No. 5, pp. 52-61, May 1993.
[3] ARM Ltd., ARM922T Technical Reference Manual, ARM.
[4] MIPS web site, tation/processorcores/doclibrary
[5] S.-M. Huang, I.-J. Huang, and C.-F. Kao, "Reconfigurable real-time trace compressor for embedded microprocessors," Proc. IEEE Int'l Conf. Field Programmable Technology (FPT), 2003.
[6] R. A. Uhlig and T. N. Mudge, "Trace-driven memory simulation: a survey," ACM Computing Surveys, Vol. 29, No. 2, 1997.
[7] C. B. Stunkel, B. Janssens, and W. K. Fuchs, "Collecting traces from parallel computers," Proc. 24th Annual Hawaii Int'l Conf. System Sciences, Vol. 1, 1991.
[8] D. W. Clark, "Cache performance in the VAX-11/780," ACM Trans. Computer Systems, Vol. 1, pp. 24-37, 1983.
[9] A. Malony, "Cedar performance measurements," CSRD Report No. 579, Center for Supercomputing Research and Development, Univ. of Illinois, Urbana, IL, 1986.
[10] ARM Ltd., ETM9 Technical Reference Manual, ARM.
More informationRAMP-White / FAST-MP
RAMP-White / FAST-MP Hari Angepat and Derek Chiou Electrical and Computer Engineering University of Texas at Austin Supported in part by DOE, NSF, SRC,Bluespec, Intel, Xilinx, IBM, and Freescale RAMP-White
More informationFast Wavelet-based Macro-block Selection Algorithm for H.264 Video Codec
Proceedings of the International MultiConference of Engineers and Computer Scientists 8 Vol I IMECS 8, 19-1 March, 8, Hong Kong Fast Wavelet-based Macro-block Selection Algorithm for H.64 Video Codec Shi-Huang
More informationCycle accurate transaction-driven simulation with multiple processor simulators
Cycle accurate transaction-driven simulation with multiple processor simulators Dohyung Kim 1a) and Rajesh Gupta 2 1 Engineering Center, Google Korea Ltd. 737 Yeoksam-dong, Gangnam-gu, Seoul 135 984, Korea
More informationParallelism of Java Bytecode Programs and a Java ILP Processor Architecture
Australian Computer Science Communications, Vol.21, No.4, 1999, Springer-Verlag Singapore Parallelism of Java Bytecode Programs and a Java ILP Processor Architecture Kenji Watanabe and Yamin Li Graduate
More informationFujitsu SOC Fujitsu Microelectronics America, Inc.
Fujitsu SOC 1 Overview Fujitsu SOC The Fujitsu Advantage Fujitsu Solution Platform IPWare Library Example of SOC Engagement Model Methodology and Tools 2 SDRAM Raptor AHB IP Controller Flas h DM A Controller
More information1. Truthiness /8. 2. Branch prediction /5. 3. Choices, choices /6. 5. Pipeline diagrams / Multi-cycle datapath performance /11
The University of Michigan - Department of EECS EECS 370 Introduction to Computer Architecture Midterm Exam 2 ANSWER KEY November 23 rd, 2010 Name: University of Michigan uniqname: (NOT your student ID
More informationTHE HONG KONG UNIVERSITY OF SCIENCE & TECHNOLOGY Computer Organization (COMP 2611) Spring Semester, 2014 Final Examination
THE HONG KONG UNIVERSITY OF SCIENCE & TECHNOLOGY Computer Organization (COMP 2611) Spring Semester, 2014 Final Examination May 23, 2014 Name: Email: Student ID: Lab Section Number: Instructions: 1. This
More informationURL: Offered by: Should already know: Will learn: 01 1 EE 4720 Computer Architecture
01 1 EE 4720 Computer Architecture 01 1 URL: https://www.ece.lsu.edu/ee4720/ RSS: https://www.ece.lsu.edu/ee4720/rss home.xml Offered by: David M. Koppelman 3316R P. F. Taylor Hall, 578-5482, koppel@ece.lsu.edu,
More informationSR college of engineering, Warangal, Andhra Pradesh, India 1
POWER OPTIMIZATION IN SYSTEM ON CHIP BY IMPLEMENTATION OF EFFICIENT CACHE ARCHITECTURE 1 AKKALA SUBBA RAO, 2 PRATIK GANGULY 1 Associate Professor, 2 Senior Research Fellow, Dept. of. Electronics and Communications
More informationEmGen: An Automatic Test-Program Generation Tool for Embedded IP Cores
EmGen: An Automatic Test-Program Generation Tool for Embedded IP Cores Haihua Shen, Yunji Chen, and Jing Huang Institute of Computing Technology Chinese Academy of Sciences Beijing, China {shenhh, cyj,
More informationCode Compression for DSP
Code for DSP Charles Lefurgy and Trevor Mudge {lefurgy,tnm}@eecs.umich.edu EECS Department, University of Michigan 1301 Beal Ave., Ann Arbor, MI 48109-2122 http://www.eecs.umich.edu/~tnm/compress Abstract
More information16 Sharing Main Memory Segmentation and Paging
Operating Systems 64 16 Sharing Main Memory Segmentation and Paging Readings for this topic: Anderson/Dahlin Chapter 8 9; Siberschatz/Galvin Chapter 8 9 Simple uniprogramming with a single segment per
More informationCSE Computer Architecture I Fall 2009 Homework 08 Pipelined Processors and Multi-core Programming Assigned: Due: Problem 1: (10 points)
CSE 30321 Computer Architecture I Fall 2009 Homework 08 Pipelined Processors and Multi-core Programming Assigned: November 17, 2009 Due: December 1, 2009 This assignment can be done in groups of 1, 2,
More informationReconfigurable Architecture for Efficient and Scalable Orthogonal Approximation of DCT in FPGA Technology
Reconfigurable Architecture for Efficient and Scalable Orthogonal Approximation of DCT in FPGA Technology N.VEDA KUMAR, BADDAM CHAMANTHI Assistant Professor, M.TECH STUDENT Dept of ECE,Megha Institute
More informationEE 4980 Modern Electronic Systems. Processor Advanced
EE 4980 Modern Electronic Systems Processor Advanced Architecture General Purpose Processor User Programmable Intended to run end user selected programs Application Independent PowerPoint, Chrome, Twitter,
More informationGrowth outside Cell Phone Applications
ARM Introduction Growth outside Cell Phone Applications ~1B units shipped into non-mobile applications Embedded segment now accounts for 13% of ARM shipments Automotive, microcontroller and smartcards
More informationPredicting the Worst-Case Execution Time of the Concurrent Execution. of Instructions and Cycle-Stealing DMA I/O Operations
ACM SIGPLAN Workshop on Languages, Compilers and Tools for Real-Time Systems, La Jolla, California, June 1995. Predicting the Worst-Case Execution Time of the Concurrent Execution of Instructions and Cycle-Stealing
More informationDesign of Transport Triggered Architecture Processor for Discrete Cosine Transform
Design of Transport Triggered Architecture Processor for Discrete Cosine Transform by J. Heikkinen, J. Sertamo, T. Rautiainen,and J. Takala Presented by Aki Happonen Table of Content Introduction Transport
More informationCode Compression for the Embedded ARM/THUMB Processor
IEEE International Workshop on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications 8-10 September 2003, Lviv, Ukraine Code Compression for the Embedded ARM/THUMB Processor
More information1.3 Data processing; data storage; data movement; and control.
CHAPTER 1 OVERVIEW ANSWERS TO QUESTIONS 1.1 Computer architecture refers to those attributes of a system visible to a programmer or, put another way, those attributes that have a direct impact on the logical
More informationCache Justification for Digital Signal Processors
Cache Justification for Digital Signal Processors by Michael J. Lee December 3, 1999 Cache Justification for Digital Signal Processors By Michael J. Lee Abstract Caches are commonly used on general-purpose
More informationSystem Level Instrumentation using the Nexus specification
System Level Instrumentation using the Nexus 5001-2012 specification Neal Stollon, HDL Dynamics Chairman, IEEE 5001 Nexus Forum neals@hdldynamics.com nstollon@nexus5001.org HDL Dynamics SoC Solutions System
More information15 Sharing Main Memory Segmentation and Paging
Operating Systems 58 15 Sharing Main Memory Segmentation and Paging Readings for this topic: Anderson/Dahlin Chapter 8 9; Siberschatz/Galvin Chapter 8 9 Simple uniprogramming with a single segment per
More informationLecture 8 JPEG Compression (Part 3)
CS 414 Multimedia Systems Design Lecture 8 JPEG Compression (Part 3) Klara Nahrstedt Spring 2012 Administrative MP1 is posted Today Covered Topics Hybrid Coding: JPEG Coding Reading: Section 7.5 out of
More informationBus Encoding Techniques for System- Level Power Optimization
Chapter 5 Bus Encoding Techniques for System- Level Power Optimization The switching activity on system-level buses is often responsible for a substantial fraction of the total power consumption for large
More informationURL: Offered by: Should already know: Will learn: 01 1 EE 4720 Computer Architecture
01 1 EE 4720 Computer Architecture 01 1 URL: http://www.ece.lsu.edu/ee4720/ RSS: http://www.ece.lsu.edu/ee4720/rss home.xml Offered by: David M. Koppelman 345 ERAD, 578-5482, koppel@ece.lsu.edu, http://www.ece.lsu.edu/koppel
More informationUsing Shift Number Coding with Wavelet Transform for Image Compression
ISSN 1746-7659, England, UK Journal of Information and Computing Science Vol. 4, No. 3, 2009, pp. 311-320 Using Shift Number Coding with Wavelet Transform for Image Compression Mohammed Mustafa Siddeq
More informationDepartment of Electronics and Communication KMP College of Engineering, Perumbavoor, Kerala, India 1 2
Vol.3, Issue 3, 2015, Page.1115-1021 Effect of Anti-Forensics and Dic.TV Method for Reducing Artifact in JPEG Decompression 1 Deepthy Mohan, 2 Sreejith.H 1 PG Scholar, 2 Assistant Professor Department
More informationA Very Low Bit Rate Image Compressor Using Transformed Classified Vector Quantization
Informatica 29 (2005) 335 341 335 A Very Low Bit Rate Image Compressor Using Transformed Classified Vector Quantization Hsien-Wen Tseng Department of Information Management Chaoyang University of Technology
More informationLecture 22: Virtual Memory: Survey of Modern Systems
18-447 Lecture 22: Virtual Memory: Survey of Modern Systems James C. Hoe Dept of ECE, CMU April 15, 2009 S 09 L22-1 Announcements: Spring Carnival!!! Final Thursday, May 7 5:30-8:30p.m Room TBA Two Guest
More informationCS252 S05. Main memory management. Memory hardware. The scale of things. Memory hardware (cont.) Bottleneck
Main memory management CMSC 411 Computer Systems Architecture Lecture 16 Memory Hierarchy 3 (Main Memory & Memory) Questions: How big should main memory be? How to handle reads and writes? How to find
More information