Introduction to the Tegra SoC Family and the ARM Architecture. Kristoffer Robin Stokke, PhD FLIR UAS
Goals of Lecture
- To give you something concrete to start on
- A simple introduction to the ARMv8 NEON programming environment
  - Register environment, instruction syntax
  - «Families» of instructions
  - Important for debugging, writing code and general understanding
- Programming examples
  - Intrinsics
  - Inline assembly
- Performance analysis using gprof
- Introduction to GDB debugging
Keep This Under Your Pillow
- GNU compiler intrinsics list:
  - ARM Infocenter
  - infocenter.arm.com -> developer guides (..) -> software development -> Cortex-A Series Programmer's Guide for ARMv8-A
- This may also be useful
- Last but not least: GDB. You will need it!
Tegra Family of SoCs
- Tegra: a family of mobile Systems-on-Chip (SoCs)
- Quite sophisticated: many knobs for tuning, experimentation and research!
- Wide operational range: runs at 2 W -> 20 W
- Frequency (clock) scaling
- Heterogeneous processors (CPU / GPU)

|                       | Tegra K1           | Tegra X1           | Tegra X2           | Tegra Xavier      |
| CPU, high performance | 4 x ARM Cortex-A15 | 4 x ARM Cortex-A57 | Dual-core Denver   | 8-core Carmel CPU |
| CPU, low power        | 1 x ARM Cortex-A15 | 4 x ARM Cortex-A53 | 4 x ARM Cortex-A57 |                   |
| GPU                   | 192-core Kepler    | 256-core Maxwell   | 256-core Pascal    | 512-core Volta    |
| Memory                | 2 GB (Jetson-TK1)  | 4 GB (Jetson-TX1)  | 8 GB LPDDR4        | ? GB LPDDR4       |

9/10/2018 armv8
Tegra-Powered Servers
- Systems-on-Module (SoMs) mounted in a 1U server
- (Figure: Jetson TX2 / TX1 blade server with 24 SoMs.)
- Very dynamic power usage depending on load: based on my own guess, I would say anything between 80 W and 500 W, depending on demand for processing power
Frequency Scaling
- Modern chips can change their clock frequency
  - Power <-> performance tradeoff
  - Processor / memory utilisation typically varies over time
- (Figure: power and performance over processor and memory frequencies.)
- The ondemand governor [1]
  - The idea is to reduce CPU frequency while the CPU is idle or mostly waiting on I/O
  - Other governors exist for RAM, GPU, buses etc.
- (Figure: CPU utilisation over time for the Sort application [2].)

[1] Pallipadi, Venkatesh, and Alexey Starikovskiy. "The ondemand governor." Proceedings of the Linux Symposium, Vol. 2.
[2] Ibrahim, Shadi, et al. "Governing energy consumption in Hadoop through CPU frequency scaling: An analysis." Future Generation Computer Systems 54 (2016).
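To make the ondemand idea concrete, here is a simplified sketch in C of an ondemand-style decision rule. It is not the Linux governor's code: the function name, the 80% up-threshold used below and the linear scaling under the threshold are assumptions chosen for illustration. The point is the shape of the policy: jump to the highest frequency when the CPU is busy, and scale down when it is idle or mostly waiting on I/O.

```c
/* Simplified ondemand-style frequency choice (illustrative sketch, not kernel code).
 * load_pct:        CPU utilisation over the last sampling period, 0..100
 * min_khz/max_khz: available frequency range (hypothetical values in the usage note)
 * up_threshold:    above this load, jump straight to max_khz (assumed policy) */
unsigned long ondemand_next_khz(unsigned load_pct, unsigned long min_khz,
                                unsigned long max_khz, unsigned up_threshold)
{
    if (load_pct > 100)
        load_pct = 100;                 /* clamp out-of-range input */
    if (load_pct > up_threshold)
        return max_khz;                 /* busy: go to the highest frequency */
    /* lightly loaded (idle or I/O-bound): frequency proportional to load */
    return min_khz + load_pct * (max_khz - min_khz) / 100;
}
```

For example, with a hypothetical 100-1100 MHz range and an 80% threshold, ondemand_next_khz(50, 100000, 1100000, 80) returns 600000 (kHz), while any load above 80% returns 1100000.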
Tegra X1 Block Diagram
(Figure: Tegra X1 block diagram. Labelled blocks:)
- CPU complex
- CPU interrupt controller
- CoreSight (debug hardware support)
- IRAM and a separate ARM7TDMI core
- Global interrupt control, clock controllers, timers etc.
- Bus / interconnect: mainly AXI (high-performance bus) and bridges
- 256-bit CPU interface to RAM
- RAM
- I/O: I2C / SPI; SATA, USB, MIPI, HDMI etc.
- VIC: hardware filters to post-process video (rotation, scaling, comp.)
- VI (Video Input): raw (YUV, Bayer) processing
- 2x ISPs (hidden)
- NVENC / NVDEC: video encode / video decode
- NVJPG
Tegra X1 CPU and Memory Hierarchy
- Each core has various compute capabilities: integer, floating point, NEON (SIMD)
- Four Cortex-A57 cores, each with:
  - 48 KB instruction L1 cache
  - 32 KB data L1 cache
- One shared 2 MB L2 cache
  - Instruction + data
  - Least-Recently Used (LRU) eviction policy
- 64-byte cache lines
- 32-bit interface to RAM
(Figure: Tegra K1 CPU and memory overview, which is similar to that of the TX)
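Cache Hierarchies and Performance
- Let's do an experiment! Read or write 800 MB in total
  - Vary the size of the buffer that is read back-to-back
  - E.g. read 24 KB repeatedly from the same buffer, until 800 MB have been read
- The buffer size determines where the data lives
  - Below 32 KB, the buffer fits in L1, so repeated reads are served from L1
  - Below 2 MB, the buffer fits in L2, so repeated reads are served from L2
  - For 10 MB... nothing useful stays cached: the buffer is larger than L2, so reads go to RAM
(Figure: CPU core, L1 cache (32 KB), L2 cache (2 MB) and RAM (4 GB), annotated 33k for a 24 KB buffer, 12k for a 64 KB buffer and 1k for an 800 MB buffer.)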
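A minimal C sketch of the experiment, assuming a POSIX system for clock_gettime. The function names are hypothetical, not the lecture's code. sweep_buffer reads a buf_bytes buffer back-to-back until total_bytes have been read and returns a checksum so the compiler cannot remove the reads; time_sweep_ms allocates and touches the buffer first so that page faults are not part of the timing.

```c
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime / CLOCK_MONOTONIC */
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Read a buf_bytes buffer repeatedly until total_bytes have been read.
 * The returned checksum keeps the compiler from optimising the reads away. */
uint64_t sweep_buffer(const uint64_t *buf, size_t buf_bytes, size_t total_bytes)
{
    size_t words = buf_bytes / sizeof(uint64_t);
    size_t passes = total_bytes / buf_bytes;
    uint64_t sum = 0;
    for (size_t p = 0; p < passes; p++)
        for (size_t i = 0; i < words; i++)
            sum += buf[i];
    return sum;
}

/* Time one run; returns elapsed milliseconds, or -1.0 if allocation fails. */
double time_sweep_ms(size_t buf_bytes, size_t total_bytes, uint64_t *checksum)
{
    uint64_t *buf = malloc(buf_bytes);
    if (!buf)
        return -1.0;
    for (size_t i = 0; i < buf_bytes / sizeof(uint64_t); i++)
        buf[i] = i;                     /* touch every page before timing */

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    *checksum = sweep_buffer(buf, buf_bytes, total_bytes);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    free(buf);
    return (t1.tv_sec - t0.tv_sec) * 1e3 + (t1.tv_nsec - t0.tv_nsec) / 1e6;
}
```

Calling time_sweep_ms with buffer sizes such as 24 KB, 64 KB, 10 MB and 800 MB, each with an 800 MB total, should expose the L1, L2 and RAM regions described on the slide; absolute numbers depend on the board, clock frequencies and compiler flags.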
Code Example and Profiling
(Plot: time [ms].)
- Compile with -pg
- Run the application: ./main
- Run gprof: gprof ./main gmon.out
- NB: prefetch operation
  - prfm <type><target><policy>, [reg] | label
  - Type: pld (for load), pst (for store), pli (for instruction)
  - Target: l1 or l2 (or l3)
  - Policy: keep (normal, retain in cache), strm (streaming, use once)
  - Example: prfm pldl1keep, [x0] (address in x0)
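As a hedged sketch of issuing the prefetch from C: on AArch64 the helper below emits prfm pldl1keep through inline assembly, and elsewhere it falls back to the GCC/Clang builtin __builtin_prefetch(addr, rw, locality). The helper names, the prefetch distance of 64 elements and the one-prefetch-per-64-byte-line spacing are assumptions to tune by measurement; prefetches are only hints that the hardware may ignore.

```c
#include <stddef.h>
#include <stdint.h>

/* Ask the core to fetch the cache line holding p into L1 for a later load
 * and keep it there (pldl1keep). A prefetch is only a hint. */
static inline void prefetch_l1_keep(const void *p)
{
#if defined(__aarch64__)
    __asm__ volatile("prfm pldl1keep, [%0]" : : "r"(p));
#else
    __builtin_prefetch(p, 0, 3);        /* GCC/Clang: read access, high locality */
#endif
}

#define PREFETCH_DIST 64                /* hypothetical distance, in elements */

/* Sum an array while prefetching PREFETCH_DIST elements ahead.
 * Roughly one prefetch per 64-byte line (16 x int32_t), assuming 64-byte lines. */
int64_t sum_with_prefetch(const int32_t *a, size_t n)
{
    int64_t s = 0;
    for (size_t i = 0; i < n; i++) {
        if ((i & 15) == 0 && i + PREFETCH_DIST < n)
            prefetch_l1_keep(&a[i + PREFETCH_DIST]);
        s += a[i];
    }
    return s;
}
```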
ARMv8 Registers
- 31 x 64-bit general-purpose registers: X0-X30 (W0-W30 for the low 32 bits)
- 32 x 128-bit vector registers: V0-V31
- SP / WSP: stack pointer
- XZR / WZR: zero registers
- PC: program counter
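The Vector Registers V0-V31: Packing
- Data in V0-V31 is packed into lanes, and you control how it is packed
  - Example: 16 bytes (V0.16B) or 8 bytes (V0.8B)
  - Example: 8 half-words (V0.8H) or 4 half-words (V0.4H)
  - The 8-byte and 4-half-word forms use only the lower 64 bits of the register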
Example: Vector Packing
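A small sketch of packing, assuming little-endian AArch64 (as used by Linux on Tegra). The same 128 bits are loaded as 16 byte lanes (.16B) and reinterpreted as 8 half-word lanes (.8H); the reinterpretation costs no instruction, it only changes the view. The function name is hypothetical, and the #else branch is a scalar model so the code also builds on non-NEON hosts.

```c
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Return half-word lane k (0..7) of a 128-bit register loaded from 16 bytes.
 * Load as 16 x 8-bit lanes (.16B), reinterpret as 8 x 16-bit lanes (.8H).
 * Little-endian: half-word lane k = byte 2k (low) | byte 2k+1 (high). */
uint16_t halfword_lane(const uint8_t bytes[16], int k)
{
#if defined(__ARM_NEON)
    uint8x16_t as_bytes = vld1q_u8(bytes);                  /* V.16B view */
    uint16x8_t as_halves = vreinterpretq_u16_u8(as_bytes);  /* V.8H view, same bits */
    uint16_t lanes[8];
    vst1q_u16(lanes, as_halves);  /* vgetq_lane_u16 needs a constant lane index */
    return lanes[k];
#else
    /* Scalar model of the same little-endian packing */
    return (uint16_t)(bytes[2 * k] | (bytes[2 * k + 1] << 8));
#endif
}
```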
Instruction Syntax
Programming With Intrinsics
- More in a bit!
Programming Example: Intrinsics
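The intrinsics example itself is not part of this transcription; as a stand-in, here is a sketch of a 4x4 single-precision matrix multiply using standard arm_neon.h intrinsics (vld1q_f32, vmulq_n_f32, vmlaq_n_f32, vst1q_f32). Each row of C is built as a linear combination of the four rows of B, so every step works on a full 128-bit register. The function name and the row-major layout are assumptions; the #else branch is a plain scalar reference.

```c
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* C = A * B for row-major 4x4 float matrices (C must not alias A or B).
 * Row i of C is a linear combination of the rows of B:
 *   C[i][:] = A[i][0]*B[0][:] + A[i][1]*B[1][:] + A[i][2]*B[2][:] + A[i][3]*B[3][:] */
void matmul4x4_f32(const float A[16], const float B[16], float C[16])
{
#if defined(__ARM_NEON)
    float32x4_t b0 = vld1q_f32(B + 0);   /* row 0 of B in one 128-bit register */
    float32x4_t b1 = vld1q_f32(B + 4);
    float32x4_t b2 = vld1q_f32(B + 8);
    float32x4_t b3 = vld1q_f32(B + 12);
    for (int i = 0; i < 4; i++) {
        float32x4_t c = vmulq_n_f32(b0, A[4 * i + 0]);  /* c  = b0 * A[i][0] */
        c = vmlaq_n_f32(c, b1, A[4 * i + 1]);           /* c += b1 * A[i][1] */
        c = vmlaq_n_f32(c, b2, A[4 * i + 2]);           /* c += b2 * A[i][2] */
        c = vmlaq_n_f32(c, b3, A[4 * i + 3]);           /* c += b3 * A[i][3] */
        vst1q_f32(C + 4 * i, c);
    }
#else
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++) {
            float s = 0.0f;
            for (int k = 0; k < 4; k++)
                s += A[4 * i + k] * B[4 * k + j];
            C[4 * i + j] = s;
        }
#endif
}
```

A scalar reference like the #else branch is also handy for checking the NEON path on the board, one small, easy-to-verify piece at a time.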
Inline Assembly
- Mostly harder than using intrinsics
- However, gives more control (and better performance?)
- Not always straightforward to figure out which mnemonics to use
  - Tip: compile the intrinsics version, then disassemble it and look with objdump -d or gdb
- Operand constraints
  - «m» : memory address
  - «r» : general-purpose register
  - «w» : floating-point / SIMD register
  - «i» : immediate
  - ... and more
- Specify clobbered («dirty») registers and more
Programming Example: Inline Assembly
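In the same spirit, a sketch of GCC-style extended inline assembly for AArch64: an fmla (fused multiply-accumulate) on four floats at a time. On AArch64, GCC's operand constraint for a floating-point/SIMD register is «w», and «+» marks an operand that is both read and written. The function name is hypothetical; off AArch64, and for leftover elements, a scalar loop does the work. Because fmla is fused, results can differ from the scalar loop in the last bit.

```c
#include <stddef.h>
#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* acc[i] += a[i] * b[i] for i in [0, n). */
void fmla_arrays(float *acc, const float *a, const float *b, size_t n)
{
    size_t i = 0;
#if defined(__aarch64__) && defined(__ARM_NEON)
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        float32x4_t vc = vld1q_f32(acc + i);
        /* "+w": vc is read and written and lives in a V register;
         * "w": va and vb are inputs in V registers.
         * GCC prints a V-register operand as vN, so "%0.4s" becomes e.g. "v0.4s". */
        __asm__("fmla %0.4s, %1.4s, %2.4s" : "+w"(vc) : "w"(va), "w"(vb));
        vst1q_f32(acc + i, vc);
    }
#endif
    for (; i < n; i++)          /* scalar tail (or the whole loop off AArch64) */
        acc[i] += a[i] * b[i];
}
```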
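Table Lookup
- Not straightforward to use for every purpose
- Vector table lookup: vtbl v0, {v1, v2, ..., vn}, vm (ARMv8 form: tbl v0.16b, {v1.16b, ..., vn.16b}, vm.16b)
  - v0: destination vector
  - {v1, v2, ..., vn}: source data (the table; up to four consecutive registers)
  - vm: data selector, one byte index per output byte (indices past the end of the table give 0)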
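Matrix Transpose
- tbl v0.16b, {v1.16b}, v2.16b
  - v1 holds the 2x2 matrix [a b; c d] row by row, viewed as 4 x 32-bit lanes (v1.4s): a b c d
  - v0 receives the transpose [a c; b d]: a c b d
  - v2 is the selector, built with a stride. Think like this: for each output row, select an increasing column
  - tbl selects bytes, so each 32-bit element needs four byte indices: 0-3, 8-11, 4-7, 12-15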
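A sketch of the 2x2 transpose with a single table lookup, using the AArch64 intrinsic vqtbl1q_u8 (which maps to tbl vd.16b, {vn.16b}, vm.16b). Since tbl selects bytes, each 32-bit element needs four consecutive byte indices in the selector. The function name is hypothetical; the #else branch models tbl in scalar code.

```c
#include <stdint.h>
#include <string.h>
#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Transpose a row-major 2x2 float matrix {a, b, c, d} -> {a, c, b, d}
 * with one byte-wise table lookup. */
void transpose2x2_tbl(const float in[4], float out[4])
{
    /* Selector: output float j comes from input float {0, 2, 1, 3}[j];
     * each float spans 4 bytes, hence 4 byte indices per element. */
    static const uint8_t sel[16] = { 0, 1, 2, 3,    8,  9, 10, 11,
                                     4, 5, 6, 7,   12, 13, 14, 15 };
#if defined(__aarch64__) && defined(__ARM_NEON)
    uint8x16_t table = vreinterpretq_u8_f32(vld1q_f32(in));  /* {v1.16b} */
    uint8x16_t idx   = vld1q_u8(sel);                        /* v2.16b  */
    vst1q_f32(out, vreinterpretq_f32_u8(vqtbl1q_u8(table, idx)));
#else
    /* Scalar model of TBL: out byte i = table byte sel[i] (all indices < 16) */
    const uint8_t *src = (const uint8_t *)in;
    uint8_t dst[16];
    for (int i = 0; i < 16; i++)
        dst[i] = src[sel[i]];
    memcpy(out, dst, sizeof dst);
#endif
}
```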
Code Profiling
- Compile with -pg
- Run the application: ./main
- Run gprof: gprof ./main gmon.out
(Chart: time to finish 100M computations for matrix multiply (MM) and transpose operations: transpose (lazy), transpose (NEON assembly), MM (NEON intrinsics), MM (NEON assembly).)
GDB Example
Tips
- Build functions to print out macroblocks from vector registers and memory
- Start small: test independent parts of the code that are easy to verify
- When in trouble, step through the code, display the relevant registers, and compare with output you know is correct
- Many things to investigate:
  - Single versus double precision?
  - Different (possibly more) ways to implement e.g. transpose?
  - Re-using vector registers across different functional blocks?
  - ...but stick to what the assignment says
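Following the first tip, a sketch of small printing helpers (hypothetical names). format_u8_lanes writes 16 byte lanes as fixed-width text, print_macroblock_u8 prints an 8x8 block from memory with a row stride, and, on NEON builds, print_u8x16 stores a vector register to memory before printing it, since intrinsics cannot index a lane with a run-time variable.

```c
#include <stdint.h>
#include <stdio.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Format 16 byte lanes as fixed-width decimals ("%4u" per lane) into out.
 * out must hold at least 65 bytes: 16 * 4 characters + terminating NUL. */
void format_u8_lanes(const uint8_t lanes[16], char out[65])
{
    for (int i = 0; i < 16; i++)
        snprintf(out + 4 * i, 5, "%4u", (unsigned)lanes[i]);
}

/* Print an 8x8 macroblock of bytes stored in memory with the given row stride. */
void print_macroblock_u8(const uint8_t *block, size_t stride)
{
    for (size_t row = 0; row < 8; row++) {
        for (size_t col = 0; col < 8; col++)
            printf("%4u", (unsigned)block[row * stride + col]);
        putchar('\n');
    }
}

#if defined(__ARM_NEON)
/* Copy a vector register to memory, then print its 16 byte lanes. */
void print_u8x16(const char *name, uint8x16_t v)
{
    uint8_t lanes[16];
    char text[65];
    vst1q_u8(lanes, v);
    format_u8_lanes(lanes, text);
    printf("%s:%s\n", name, text);
}
#endif
```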
Good Luck!
- You're going to need it
ARMv7 vs. ARMv8
- ARMv8 uses the same mnemonics for SIMD as for general-purpose registers
  - E.g., in ARMv7: «mul r0, r0, r1» (normal) and «vmul.i32 d0, d0, d1» (SIMD)
  - In ARMv8: «mul x0, x0, x1» (normal) and «mul v0.2s, v0.2s, v1.2s» (SIMD)
  - Simplifies life, but take care to use the correct operands
- ARMv8 has twice as many 128-bit registers: 32 (V0-V31) vs. 16 (Q0-Q15) for ARMv7
- Different instruction syntax
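A sketch contrasting the two mul forms through GCC-style inline assembly on AArch64: the same mnemonic takes X registers («r» constraint) in the scalar case and arranged V registers («w» constraint, .4s) in the SIMD case. The function names are hypothetical, and plain C is used on other targets.

```c
#include <stdint.h>
#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Scalar multiply: assembles to e.g. "mul x0, x1, x2" on AArch64. */
int64_t scalar_mul(int64_t a, int64_t b)
{
#if defined(__aarch64__)
    int64_t r;
    __asm__("mul %0, %1, %2" : "=r"(r) : "r"(a), "r"(b));
    return r;
#else
    return a * b;
#endif
}

/* Vector multiply of four 32-bit lanes: e.g. "mul v0.4s, v1.4s, v2.4s". */
void vector_mul4(const int32_t a[4], const int32_t b[4], int32_t out[4])
{
#if defined(__aarch64__) && defined(__ARM_NEON)
    int32x4_t va = vld1q_s32(a), vb = vld1q_s32(b), vr;
    __asm__("mul %0.4s, %1.4s, %2.4s" : "=w"(vr) : "w"(va), "w"(vb));
    vst1q_s32(out, vr);
#else
    for (int i = 0; i < 4; i++)
        out[i] = a[i] * b[i];
#endif
}
```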
Tegra K1: a Mobile Heterogeneous Multicore SoC
- Quad-core high-performance CPU: ARM Cortex-A15, frequency scaling
- Dedicated low-power core: ARM Cortex-A15, frequency scaling
- 192-core, Kepler-based GPU: CUDA programmable, frequency scaling
- SIMD (NEON) multimedia instructions
- 2 GB DDR3 RAM: CPU-GPU shared, frequency scaling