Fast, Scalable and Energy Efficient IO Solutions: Accelerating infrastructure SoC time-to-market
- Kory Glenn
- 5 years ago
Transcription
1 Fast, Scalable and Energy Efficient IO Solutions: Accelerating infrastructure SoC time-to-market. Sridhar Valluru, Product Manager. ARM Tech Symposia 2016
2 Intelligent Flexible Cloud
3 Scalability and Flexibility [Figure: compute, storage and acceleration blocks processing packet flows. Target design space: from an access point (2-5 W SoC, 4-8 CPUs, ~30 mm²) to a data center (100 W SoC, CPUs, ~300 mm²)]
4 Execution environment supporting IFC [Figure: software stack. Non-privileged: containers, JVMs, applications and VNFs. Privileged: guest OS 1 to 3, running in virtual machines 1 to 3. Hyper-privileged: virtual machine monitor (VMM)/hypervisor. Firmware, option ROMs, etc. Physical machine (e.g., processors, DRAM, caches, MMU, IOMMU, other resources and SoC devices...). Optional, system-dependent other external devices (e.g., disks, NICs, FPGAs, GPUs, crypto, other accelerators, other devices...)]
5 IO challenges for next-gen SoC systems: Scalability, Performance, Power Efficiency. Limited number of hardware IO due to capacity; large number of translations; large number of page table walks; large page table walks; large number of memory accesses; large number of IO stream traffic management; insufficient TLB in SMMU; enormous TLB; large dynamic power
6 System memory management unit (SMMU)
7 Next-generation example server subsystem [Figure components: Cortex-A processors; Generic Interrupt Controller (GIC); IO (PCIe and accelerators); interconnect; System MMU (SMMU); Coherent Mesh Network (CMN-600); non-coherent interconnect; peripherals; security (CryptoCell); 1-8 DMC-620 DDR4 memory controllers; CoreSight SoC; ELA-500]
8 ARM IO MMU or system MMU (SMMU)
IO accelerators (IO #1, IO #2, IO #3) issue virtual addresses in the virtual address space (VA); memory is accessed with physical addresses in the physical address space (PA).
Translation Buffer Unit (TBU): performs translation from VA to PA; holds the TLB; performs security/access checks; requests a translation from the TCU on a miss.
Local AXI Stream: free-flowing transport; enables distributed SMMU address translation over an AXI-Stream interconnect.
Translation Control Unit (TCU): performs table walks of translation tables; handles ATS* requests/responses for PCIe; performs security/access checks.
* ATS: PCIe address translation services (PCIe 3.0 ECN)
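The split between the Translation Buffer Unit and the TCU can be illustrated with a small software model. This is only a sketch under stated assumptions: the class names `ToyTBU` and `ToyTCU`, the 4 KiB page size, the flat dictionary page table and the LRU replacement policy are all hypothetical choices made for illustration, not details taken from the slides or from any ARM product.

```python
from collections import OrderedDict

PAGE_SHIFT = 12  # assumption: 4 KiB pages


class ToyTCU:
    """Hypothetical stand-in for the TCU: 'walks' a flat page table."""

    def __init__(self, page_table):
        self.page_table = page_table  # {virtual page number: physical page number}
        self.walks = 0

    def walk(self, vpn):
        self.walks += 1
        return self.page_table[vpn]  # a KeyError models a translation fault


class ToyTBU:
    """Hypothetical stand-in for the translation buffer unit: a small LRU TLB."""

    def __init__(self, tcu, tlb_entries=4):
        self.tcu = tcu
        self.tlb = OrderedDict()
        self.tlb_entries = tlb_entries

    def translate(self, va):
        vpn = va >> PAGE_SHIFT
        offset = va & ((1 << PAGE_SHIFT) - 1)
        if vpn in self.tlb:
            self.tlb.move_to_end(vpn)         # TLB hit: refresh LRU position
            ppn = self.tlb[vpn]
        else:
            ppn = self.tcu.walk(vpn)          # TLB miss: ask the TCU
            self.tlb[vpn] = ppn
            if len(self.tlb) > self.tlb_entries:
                self.tlb.popitem(last=False)  # evict the least recently used entry
        return (ppn << PAGE_SHIFT) | offset


tcu = ToyTCU({0x10: 0x80, 0x11: 0x95})
tbu = ToyTBU(tcu)
pa_first = tbu.translate(0x10ABC)  # miss, resolved by one table walk
pa_again = tbu.translate(0x10DEF)  # same page, served from the TLB
```

In this model the second access to the same page does not reach the TCU, which is the effect the buffer unit's TLB is meant to provide.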
9 SMMU architecture evolution
SMMUv1 features: support for v7 page tables for IO virtualization; 4k page granule; implemented by CoreLink MMU-401.
SMMUv2 adds: up to 128 translation contexts; support for v8 page tables; 64k page granule; implemented by CoreLink MMU-500.
SMMUv3 adds: scalability enhancements for millions of translation contexts; context store in memory; PCIe address translation services (ATS) for returning translations to end points with address translation caches (ATC); PCIe process address space ID (PASID) for process-specific translations; PCIe page request interface (PRI) support for access to unpinned pages in memory; software communication via memory queues (non-blocking / scalable); support for message-signalled interrupts.
10 Addressing performance & scalability challenge
SMMU microarchitecture: VA to PA translation overhead; limited TLB scalability with the number of IO devices.
Micro-TLB: small, fully associative. Config cache: caches context info.
SMMU TCU: main TLB (large, set associative); multi-level walk cache (separate S1/S2).
Local ATC (address translation caches): VA to PA translation in the ATC cache; ATC removes dependency on TLB size; populating the ATC requires the SMMU to support ATS (address translation services).
[Figure: IO #1, IO #2 and IO #3, each with its own ATC (ATC #1 to ATC #3), connected over the AXI-Stream interconnect to the SMMU TCU]
11 Advantages of PCIe ATS for IO access performance
Scalability of ATC: the size and number of ATCs grow with the number of IO devices, whereas the TLB size in the SMMU is fixed (however large).
Independence of ATCs: local ATC accesses are independent of each other and do not result in cache thrashing. A shared TLB in an SMMU can suffer from thrashing if multiple IO devices access too many scattered locations in memory.
Customizable pre-fetch: IO devices can request translations ahead of time according to known access patterns. A shared TLB in an SMMU is not aware of IO access patterns and cannot implement a universal pre-fetch policy.
Customizable replacement policies: IO devices can prioritize caching of some entries over others based upon known access patterns. E.g., an Ethernet NIC might choose to exclusively cache ring descriptor translations and store only data buffer translations temporarily.
Support for unpinned memory without stalling faults with the use of PRI.
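The thrashing argument can be made concrete with a toy simulation. The sketch below is an illustration only: the cache sizes, the LRU policy, the round-robin access pattern and all names (`LruCache`, `run`) are assumptions chosen to expose the effect, not measurements or parameters of any real SMMU or endpoint. Note that the per-device caches add up to more total capacity than the shared one, which mirrors the point that ATC capacity grows with the number of devices.

```python
from collections import OrderedDict


class LruCache:
    """Minimal LRU cache that only counts misses."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.misses = 0

    def access(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)     # hit: refresh LRU position
            return
        self.misses += 1
        self.entries[key] = True
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry


def run(num_devices=3, pages_per_device=4, rounds=50):
    """Interleave accesses from several devices, each cycling over its own pages."""
    shared_tlb = LruCache(capacity=8)  # one fixed-size TLB shared by all devices
    atcs = [LruCache(capacity=4) for _ in range(num_devices)]  # one small ATC per device
    for step in range(rounds * pages_per_device):
        for dev in range(num_devices):
            page = (dev, step % pages_per_device)
            shared_tlb.access(page)
            atcs[dev].access(page)
    total = rounds * pages_per_device * num_devices
    return total, shared_tlb.misses, sum(atc.misses for atc in atcs)


total, shared_misses, atc_misses = run()
print(f"accesses={total} shared TLB misses={shared_misses} per-device ATC misses={atc_misses}")
```

With these assumed parameters the combined working set (12 pages) exceeds the shared TLB (8 entries), so the cyclic pattern evicts every entry before it is reused, while each per-device cache only takes its initial cold misses.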
12 ARM SMMU and Cadence PCIe RP integration
M: AXI master interface. All normal PCIe packets, with or without a translated address, are seen here.
S: AXI slave interface.
T: DTI-ATS (distributed translation interface for PCIe ATS). Supports ATS translation requests from the EP, invalidation requests from the TCU, and PRI (page request interface) requests from the EP.
[Figure: EndPoint (EP, with ATC and M interface), PCIe Link, Root Port (RP, with M, S and T interfaces), DTI-ATS link to the TCU, SMMU, Mem]
* ATS: PCIe address translation services (PCIe 3.0 ECN)
13 Cadence PCIe RC's DTI-ATS features
Separate interface provided with the PCIe RC IP: all PCIe ATS related requests, responses and invalidations are routed to this I/F.
DTI-ATS implementation supports additional features: PCIe PRI (page request interface); PCIe PASID support (process address space ID).
DTI-ATS is conveyed using AXI4-Stream: separate master AXI4-Stream and slave AXI4-Stream interfaces; transaction sideband signals indicate the context information; DTI-ATS packets can be presented/accepted in one clock cycle.
Debug & status: registers to capture status and error conditions encountered in the DTI-ATS protocol.
14 SMMU with PCIe operation: no ATS
1. Client logic generates a TLP with an untranslated address
2. EP sends this as a PCIe TLP to the RP
3. On receipt by the RP, since the packet is a data-flow packet, this is sent on the M interface
a) If the TBU does not have a suitable translation for the address received, it will issue a request to the TCU
b) The TCU will respond with the response for the TBU
4. The TBU then forwards the transaction to the memory
[Figure: EndPoint (EP), PCIe Link, Root Port (RP; M, S and T interfaces), DTI-ATS, TCU, SMMU and Mem, annotated with steps 1 to 4, a and b]
15 SMMU with PCIe operation: with ATS and ATC hit
1. Client logic generates a TLP with a virtual address
2. The client logic uses the translated address if available from the ATC
3. The EP sends this as a PCIe TLP that has a translated address
4. On receipt by the RP, since the packet is a data-flow packet, this is sent on the M interface
5. The TBU then forwards the transaction to the memory via the main interconnect
[Figure: ATC lookup hit in the EndPoint (EP), PCIe Link, Root Port (RP; M, S and T interfaces), DTI-ATS, TCU, SMMU and Mem, annotated with steps 1 to 5]
16 SMMU with PCIe operation: with ATS and ATC miss
1. EP client generates a PCIe translation request for a particular address that needs translation
2. Translation request goes out on the PCIe link to the RP
3. RP sends the translation request it received on the T interface to the TCU
4. The TCU then generates the response completion
5. The RP repacks the translation completion TLP back to the EP
6. Once the EP has received this completion for the translation request it generated, it populates the local ATC
[Figure: ATC lookup miss in the EndPoint (EP), PCIe Link, Root Port (RP; M, S and T interfaces), DTI-ATS, TCU, SMMU and Mem, annotated with steps 1 to 6]
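The ATC hit and miss flows on the preceding slides can be sketched as a small software model. Everything here is a hypothetical illustration: the class names (`ToyEndpoint`, `ToyRootPort`, `ToyTCU`), the 4 KiB page size, the dictionary page table and the log strings are assumptions, and real PCIe address translation services traffic involves TLP formats, completions and invalidations that this sketch does not model.

```python
from dataclasses import dataclass, field

PAGE_SHIFT = 12  # assumption: 4 KiB pages


@dataclass
class ToyTCU:
    """Hypothetical TCU: answers translation requests from a flat page table."""
    page_table: dict  # {virtual page number: physical page number}

    def translation_completion(self, vpn):
        return self.page_table[vpn]


@dataclass
class ToyRootPort:
    """Hypothetical root port: T interface for translation requests, M for data."""
    tcu: ToyTCU
    log: list = field(default_factory=list)

    def translation_request(self, vpn):
        self.log.append("T: translation request")
        return self.tcu.translation_completion(vpn)

    def memory_request(self, pa):
        self.log.append("M: translated TLP")
        return pa


@dataclass
class ToyEndpoint:
    """Hypothetical endpoint with a local address translation cache (ATC)."""
    rp: ToyRootPort
    atc: dict = field(default_factory=dict)

    def dma(self, va):
        vpn = va >> PAGE_SHIFT
        offset = va & ((1 << PAGE_SHIFT) - 1)
        if vpn not in self.atc:
            # ATC miss: send a translation request and populate the ATC
            self.atc[vpn] = self.rp.translation_request(vpn)
        # ATC hit (or freshly populated): send a TLP with a translated address
        pa = (self.atc[vpn] << PAGE_SHIFT) | offset
        return self.rp.memory_request(pa)


rp = ToyRootPort(ToyTCU({0x2: 0x7}))
ep = ToyEndpoint(rp)
first = ep.dma(0x2010)   # ATC miss, then translated access
second = ep.dma(0x2020)  # ATC hit, no translation request
```

In this model only the first access to a page generates traffic on the T interface; later accesses to the same page go out already translated.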
17 IO challenges for next-gen SoC systems: Scalability, Performance, Power Efficiency. ATC allows the PCIe RC to support multiple IO accelerators; with ATC, address translation is no longer needed for every transaction; ATC & TCU minimize memory accesses for page table walks; the AXI Stream interface allows distributed TBUs to be connected to a TCU; the TCU cache reduces page table walks; a custom ATC in the IO accelerator removes the need for a very large TLB in the SMMU
18 Summary
IFC is driving the need for scalability, performance and efficiency for IO accesses in infrastructure SoCs.
ARM has been addressing IO virtualization solutions via its SMMU.
Fast, performant IO such as PCIe Gen4 from Cadence has been efficiently integrated with ARM's SMMU through an architected interface, DTI-ATS.
The combined SMMU-PCIe solution delivers high-performance access for IO devices with PCIe ATS as well as PRI and PASID support.
SMMU IP from ARM is designed to handle the performance, scalability, and power efficiency demands from SoCs for IFC.
19 Questions? Want to know more? Please contact
Designing, developing, debugging ARM and heterogeneous multi-processor systems Kinjal Dave Senior Product Manager, ARM ARM Tech Symposia India December 7 th 2016 Topics Introduction System design Software
More informationMobile & IoT Market Trends and Memory Requirements
Mobile & IoT Market Trends and Memory Requirements JEDEC Mobile & IOT Forum Ivan H. P. Lin ARM Segment Marketing Copyright ARM 2016 Outline Wearable & IoT Market Opportunities Challenges in Wearables &
More informationThe Nios II Family of Configurable Soft-core Processors
The Nios II Family of Configurable Soft-core Processors James Ball August 16, 2005 2005 Altera Corporation Agenda Nios II Introduction Configuring your CPU FPGA vs. ASIC CPU Design Instruction Set Architecture
More informationAccelerating Pointer Chasing in 3D-Stacked Memory: Challenges, Mechanisms, Evaluation Kevin Hsieh
Accelerating Pointer Chasing in 3D-Stacked : Challenges, Mechanisms, Evaluation Kevin Hsieh Samira Khan, Nandita Vijaykumar, Kevin K. Chang, Amirali Boroumand, Saugata Ghose, Onur Mutlu Executive Summary
More informationADDRESS TRANSLATION AND TLB
ADDRESS TRANSLATION AND TLB Mahdi Nazm Bojnordi Assistant Professor School of Computing University of Utah CS/ECE 6810: Computer Architecture Overview Announcement Homework 4 submission deadline: Mar.
More informationComputer Science 146. Computer Architecture
Computer Architecture Spring 2004 Harvard University Instructor: Prof. dbrooks@eecs.harvard.edu Lecture 18: Virtual Memory Lecture Outline Review of Main Memory Virtual Memory Simple Interleaving Cycle
More information... Application Note AN-531. PCI Express System Interconnect Software Architecture. Notes Introduction. System Architecture.
PCI Express System Interconnect Software Architecture Application Note AN-531 Introduction By Kwok Kong A multi-peer system using a standard-based PCI Express (PCIe ) multi-port switch as the system interconnect
More informationDynamIQ Processor Designs Using Cortex-A75 & Cortex-A55 for 5G Networks
DynamIQ Processor Designs Using Cortex-A75 & Cortex-A55 for 5G Networks Jeff Maguire Senior Product Manager Infrastructure IP Product Management Arm 2017 Arm Limited Arm Tech Symposia 2017 Agenda 5G networks
More informationAn Approach for Implementing NVMeOF based Solutions
An Approach for Implementing OF based olutions anjeev Kumar oftware Product Engineering, HiTech, Tata Consultancy ervices 25 May 2018 1 DC India 2018 Copyright 2018 Tata Consultancy ervices Limited Agenda
More informationData Path acceleration techniques in a NFV world
Data Path acceleration techniques in a NFV world Mohanraj Venkatachalam, Purnendu Ghosh Abstract NFV is a revolutionary approach offering greater flexibility and scalability in the deployment of virtual
More information2017 Storage Developer Conference. Mellanox Technologies. All Rights Reserved.
Ethernet Storage Fabrics Using RDMA with Fast NVMe-oF Storage to Reduce Latency and Improve Efficiency Kevin Deierling & Idan Burstein Mellanox Technologies 1 Storage Media Technology Storage Media Access
More informationCIS Operating Systems I/O Systems & Secondary Storage. Professor Qiang Zeng Spring 2018
CIS 3207 - Operating Systems I/O Systems & Secondary Storage Professor Qiang Zeng Spring 2018 Previous class Memory subsystem How to allocate physical memory? How to do address translation? How to be quick?
More information