Liquid Architecture: Microarchitecture Optimization for Embedded Systems
D. Schuehler, B. Brodie, R. Chamberlain, R. Cytron, S. Friedman, J. Fritts, P. Jones, P. Krishnamurthy, J. Lockwood, S. Padmanabhan, and H. Zhang
Dept. of Computer Science and Engineering, Washington University in St. Louis
Supported by NSF ITR
Liquid Architecture
- A configurable architecture that can adapt to the needs of a particular application
  - E.g., within an FPGA: soft-core processors
  - E.g., as an embedded processor: Tensilica supports configuration at fabrication time; Stretch supports configuration at run time
- Today's discussion is on performance analysis and configuration choice
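Block Diagram
[Figure: system block diagram. Components shown: the LEON SPARC-compatible processor with I-cache and D-cache; AHB and APB buses; memory controller, boot ROM, and external memory; UART and LED; an adapter; a control packet processor; a statistics module on the event bus; and a network interface with layered Internet protocol wrappers, all within the FPGA of the FPX platform.]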
Microarchitecture Configurability
- Instruction set
- Memory subsystem
  - Cache size (I and D)
  - Associativity
  - Cache line size
- Co-processor(s)
- Instruction pipeline
- Full HDL source is available
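As a rough illustration of how quickly these options multiply, the Python sketch below enumerates cache configurations from per-parameter option lists. The option values (way sizes, associativities, line sizes) are hypothetical and chosen only for illustration. They are not the set of options supported by the LEON core.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class CacheConfig:
    """One point in a hypothetical cache design space."""
    way_size_kb: int  # capacity of each way, in KB
    ways: int         # associativity (number of ways)
    line_bytes: int   # cache line size, in bytes

    @property
    def total_kb(self) -> int:
        return self.way_size_kb * self.ways


# Hypothetical option lists, for illustration only.
WAY_SIZES_KB = [1, 2, 4, 8, 16, 32]
WAYS = [1, 2, 4]
LINE_BYTES = [16, 32]


def enumerate_configs(way_sizes, ways, lines):
    """Return every combination of the given per-parameter options."""
    return [CacheConfig(s, w, l) for s, w, l in product(way_sizes, ways, lines)]


d_cache = enumerate_configs(WAY_SIZES_KB, WAYS, LINE_BYTES)
i_cache = enumerate_configs(WAY_SIZES_KB, WAYS, LINE_BYTES)

# The I-cache and D-cache are configured independently, so the joint space is a product.
joint_size = len(i_cache) * len(d_cache)
print(len(d_cache), joint_size)
```

Each further independent parameter, such as a co-processor or pipeline option, multiplies the count again.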
Design Flow
1. Write and compile the embedded SPARC application with GCC.
2. Identify a configuration for the candidate architecture.
3. Reconfigure the FPX hardware via the Internet and upload the system software.
4. Execute the program on the FPX platform and measure run-time performance.
Cycle-accurate Profiling
- Choose the methods to profile from the user interface. The methods in the application's .text segment are main, addquery, findmatch, computekey, computebase, computestep, fillquery, and Rnd, and the profile reports the time in cycles for each.
- Each method corresponds to an address range in .text, bounded by a low (Lo) and a high (Hi) address. The example bounds shown include a high bound of 0x400003EF and a low bound of 0x400005D8.
- The statistics module observes the program counter (PC) and the clock (CLK) on the event bus. On each clock cycle in which the PC lies within a method's Lo-Hi range, it increments (INCR) that method's counter.
- Several ranges can be monitored at once, each with its own counter, and the counts are sent to the user.
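The range-based counting done by the statistics module can be modeled in software. The Python sketch below is a behavioral model under stated assumptions, not the hardware design. It assumes the event bus delivers one PC value per clock cycle. The method names are attached to address ranges and a PC trace that are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RangeCounter:
    """Counts clock cycles in which the PC lies in [lo, hi] (inclusive)."""
    name: str
    lo: int
    hi: int
    cycles: int = 0

    def on_clock(self, pc: int) -> None:
        # INCR this method's counter when the PC on the event bus is in its range.
        if self.lo <= pc <= self.hi:
            self.cycles += 1


def profile(pc_trace, counters):
    """Feed one PC sample per clock cycle to every counter; return cycles per method."""
    for pc in pc_trace:
        for c in counters:
            c.on_clock(pc)
    return {c.name: c.cycles for c in counters}


# Hypothetical address ranges and PC trace, for illustration only.
counters = [
    RangeCounter("findmatch", 0x40000200, 0x400002FF),
    RangeCounter("computekey", 0x40000400, 0x4000047F),
]
trace = [0x40000100, 0x40000200, 0x40000204,
         0x40000400, 0x40000404, 0x40000408, 0x40000100]
result = profile(trace, counters)
print(result)  # {'findmatch': 2, 'computekey': 3}
```

Because the model checks the PC on every clock rather than on every retired instruction, any cycle in which the PC stays inside a method's range, including stall cycles, is charged to that method.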
Where is time spent?
[Figure: BLASTN biosequence search application; share of total run time (%) spent in findmatch, coreloop, and the rest of the program, for hash table sizes of 128K and 32K bytes.]
Expand to measure cache hits/misses
- For each function in .text (main, addquery, findmatch, computekey, computebase, computestep, fillquery, Rnd), the profile reports time in cycles plus cache hits and misses for reads and writes.
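Per-function cache statistics can be modeled in the same address-range style. The Python sketch below makes two assumptions. First, each clock sample carries the PC together with an optional cache event. Second, the event names ("read_hit", "read_miss", "write_hit", "write_miss") are a hypothetical encoding. Attributing a cache event to the PC seen in the same cycle is also a simplification for a pipelined core, because the memory access happens in a later stage than the fetch.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Tuple

# One sample per clock: (pc, cache_event), where cache_event is None or a
# hypothetical event name such as "read_hit" or "write_miss".
Sample = Tuple[int, Optional[str]]


@dataclass
class FunctionCacheStats:
    name: str
    lo: int
    hi: int
    events: Counter = field(default_factory=Counter)

    def on_clock(self, pc: int, cache_event: Optional[str]) -> None:
        # Count the cache event only when the PC is inside this function's range.
        if cache_event is not None and self.lo <= pc <= self.hi:
            self.events[cache_event] += 1


def collect(samples: Iterable[Sample], funcs: List[FunctionCacheStats]):
    """Apply every clock sample to every monitored function; return event counts."""
    for pc, ev in samples:
        for f in funcs:
            f.on_clock(pc, ev)
    return {f.name: dict(f.events) for f in funcs}


# Hypothetical range and samples, for illustration only.
funcs = [FunctionCacheStats("findmatch", 0x40000200, 0x400002FF)]
samples = [
    (0x40000200, "read_miss"),
    (0x40000204, None),
    (0x40000208, "read_hit"),
    (0x40000100, "read_hit"),   # outside findmatch, so not counted
    (0x4000020C, "write_hit"),
]
stats = collect(samples, funcs)
print(stats)
```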
Measure Several Configurations
Impact of D-cache Configuration
[Figure: BLASTN biosequence search application; D-cache hit rate (%) overall (Total) and within findmatch and coreloop, for each combination of hash table size and D-cache configuration: 128K, 1Kx1; 128K, 32Kx1; 128K, 16Kx2; 32K, 1Kx1; 32K, 32Kx1; 32K, 16Kx2.]
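To illustrate how configurations such as 1Kx1, 32Kx1, and 16Kx2 can behave differently, here is a small Python model of a set-associative cache that measures the hit rate on an address trace. The sketch rests on several assumptions:
- The notation NKxW means W ways of N KB each, so 16Kx2 is a 32 KB two-way cache.
- Replacement is LRU.
- Lines are 32 bytes.

The trace is synthetic and deliberately contrived. It is not the BLASTN workload.

```python
from collections import OrderedDict


class SetAssocCache:
    """LRU set-associative cache model (hit/miss accounting only, no data)."""

    def __init__(self, way_size_bytes: int, ways: int, line_bytes: int = 32):
        self.line_bytes = line_bytes
        self.ways = ways
        self.num_sets = way_size_bytes // line_bytes
        # One OrderedDict per set: keys are tags, order tracks recency (last = MRU).
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, addr: int) -> bool:
        line = addr // self.line_bytes
        index = line % self.num_sets
        tag = line // self.num_sets
        s = self.sets[index]
        if tag in s:
            s.move_to_end(tag)      # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(s) >= self.ways:
            s.popitem(last=False)   # evict the least recently used line
        s[tag] = None
        return False

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def run(trace, way_kb: int, ways: int) -> float:
    cache = SetAssocCache(way_kb * 1024, ways)
    for addr in trace:
        cache.access(addr)
    return cache.hit_rate


# Synthetic trace: two 4 KB arrays exactly 32 KB apart, read alternately, 3 sweeps.
trace = []
for _ in range(3):
    for off in range(0, 4096, 4):
        trace.append(0x0000 + off)  # array A
        trace.append(0x8000 + off)  # array B

for label, way_kb, ways in [("1Kx1", 1, 1), ("32Kx1", 32, 1), ("16Kx2", 16, 2)]:
    print(label, f"{run(trace, way_kb, ways):.3f}")
```

With this trace, both direct-mapped configurations miss on every access, because the two arrays map to the same sets and keep evicting each other. The two-way 16Kx2 cache keeps both lines resident and misses only on the first sweep. The pattern shows how associativity can matter as much as capacity. It says nothing about the measured BLASTN results.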
Impact of I-cache Configuration
[Figure: BLASTN biosequence search application; run time (s) for two I-cache sizes, one of them 4 KB, at BLASTN hash table sizes of 128K and 32K bytes.]
Further per-function metrics
- For each function in .text (main, addquery, findmatch, computekey, computebase, computestep, fillquery, Rnd): time in cycles, cache hits and misses (read and write), pipeline stalls, and branch prediction.
Time for Single Run
[Figure: time (s) for a single run, SimpleScalar 3.0 simulation vs. LEON.]
- Running on LEON is almost two orders of magnitude faster than simulation.
Implications of Slow Simulation
- The focus has historically been on measuring the performance of a single thread of a single application
  - Real applications often run in a multitasking environment, which affects cache behavior
  - This approach ignores OS (system call) performance
- The Liquid Architecture system enables direct measurement, including the OS
OS Boot Sequence
Summary
- Run-time reconfigurable processors will be available sooner rather than later
- Determining the desired configuration is a difficult design task
  - Large search space
  - Depends on accurate performance data
- The Liquid Architecture system enables direct measurement of performance properties
Current and Future Work
- Evaluation of several architectural design ideas
- Automated search of the design space
- Characterizing performance analysis methods
  - Analytic models
  - Simulation models
  - Direct execution models
- Usable as is for evaluating soft-core processors
- Would like to extend to higher-speed processors
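In its simplest form, an automated design-space search measures every feasible candidate and keeps the fastest. The Python sketch below shows that exhaustive search. It assumes a hypothetical measure_runtime callback. In the real system, this callback would reconfigure the FPX, run the application, and return the measured run time. Here a made-up cost model (fake_runtime) stands in for it so the example runs. The configuration tuple and the 32 KB resource budget (fits_budget) are hypothetical as well.

```python
from itertools import product
from typing import Callable, Iterable, Optional, Tuple

Config = Tuple[int, int]  # (D-cache way size in KB, number of ways); hypothetical


def exhaustive_search(
    candidates: Iterable[Config],
    measure_runtime: Callable[[Config], float],
    fits: Callable[[Config], bool],
) -> Tuple[Config, float]:
    """Measure every feasible candidate and return the fastest (config, runtime)."""
    best: Optional[Config] = None
    best_time = float("inf")
    for cfg in candidates:
        if not fits(cfg):          # skip configurations over the resource budget
            continue
        t = measure_runtime(cfg)   # real flow: reconfigure, run, read the counters
        if t < best_time:
            best, best_time = cfg, t
    if best is None:
        raise ValueError("no feasible configuration")
    return best, best_time


# Hypothetical stand-ins so the sketch runs without hardware.
def fake_runtime(cfg: Config) -> float:
    way_kb, ways = cfg
    miss_penalty = 10.0 / (way_kb * ways)  # made up: larger cache, fewer misses
    return 1.0 + miss_penalty * (0.5 if ways > 1 else 1.0)


def fits_budget(cfg: Config, max_kb: int = 32) -> bool:
    way_kb, ways = cfg
    return way_kb * ways <= max_kb


space = list(product([1, 4, 16, 32], [1, 2]))
result = exhaustive_search(space, fake_runtime, fits_budget)
print(result)  # ((16, 2), 1.15625)
```

Exhaustive search costs one measurement per feasible candidate. For larger spaces, a heuristic search would be needed to keep the number of hardware runs manageable.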