CO405H. Department of Computing, Imperial College London. Computing in Space with OpenSPL Topic 14: Networking DFEs
1 CO405H Computing in Space with OpenSPL
Topic 14: Networking DFEs
Oskar Mencer, Georgi Gaydadjiev
Department of Computing, Imperial College London
http:// http://
CO405H course page: WebIDE: OpenSPL consortium page: http://cc.doc.ic.ac.uk/openspl16/ http://openspl.doc.ic.ac.uk http://
2 DFEs in the network
Networks operate on data streams.
This seems like a natural use scenario for DFEs.
Some problems are specific to network processing.
Possible solutions.
Applications: Long Int. Multiplication, RSA Cryptography, Dynamic Programming, Laplace Heat Equation, Viterbi Decoder, Sound Synthesis, Neural Networks
3 The networking DFE Card
QSFP ports {TOP, MID, BOT}: 4 x 10G serial links.
[Block diagram: QSFP TOP, QSFP MID, QSFP BOT and PTP connect to the Reconfigurable Logic; QMEM: QDR II SRAM (18MB) x 4; LMEM: DDR3 DRAM (8-16GB) x 3; PCI Express x16 (electrical), x8 (logical)]
JDFE is pin compatible but has 8 serial links: 4 to the switch fabric (ports selectable via JunOS) and 4 directly connected to the reconfigurable chip.
4 The Software Stack
The user application is written using the SLiC API and is statically linked to libslic.a.
SLiC relies on MaxelerOS's MaxRT, linked via libmaxeleros.so, a shared library.
MaxRT communicates with the driver.
The driver exposes functionality through the file system: /dev/maxeler0 and /proc/maxeler/.
MaxRT performs ioctls on the device file and also uses file operations on the /proc/maxeler/dev0/ files.
The driver communicates with the hardware via Slave IO, which is effectively memory-mapped IO: the driver writes to a special memory address and the hardware receives the data.
The hardware communicates directly with the user application using DMA: the hardware accesses the system RAM directly, and the user app reads from and writes to system RAM.
5 Links to/from CPU
The manager can create a special entity for exchanging data with the CPU.
All communication with the CPU is done over PCI Express.
Generally, there is only one way to exchange data with the CPU: memory access.
From the CPU point of view:
To send data to the DFE -> write it to a special memory address (pointer).
To read data from the DFE -> the DFE writes the data directly into a given buffer, so the CPU simply polls that memory region and the data will appear there at some point.
From the DFE point of view:
Receiving data from the CPU -> data will appear at the output of a special manager block and go over a link into a kernel.
Sending data to the CPU -> send the data out on a link that is connected to the special manager block.
[Diagram: DFE, System RAM, CPU]
6 Link Interfaces (flow control)
Links have 2 types of interfaces:
1. PUSH: Valid / Stall semantic
2. PULL: Read / (empty/almost empty) semantic
The direction of the data determines the arrow direction: Input = data coming into the block, Output = data going out.
[Diagram: Source, Sink; Input, Manager Node, Output]
7 Computer Networks (Packets and Frames)
IO is connected in the Manager code: addethernetstream( ) will create all the necessary components for you to be able to receive packets from the network.
The terms Packets and Frames are used interchangeably, but we prefer the term Frame.
A frame is any data that is presented to the user alongside the following metadata:
SOF: Start Of Frame indicator
EOF: End Of Frame indicator
MOD: number of valid bytes in the End Of Frame word
8 10G Network Traffic Through a Serial Link
10Gbps -> 1 bit every 0.1ns -> 64 bits every 6.4ns
6.4ns period -> 156.25MHz
[Diagram: 10Gbps (e.g., fiber) -> SFP module -> 1 bit at 10GHz -> Deserializer -> MAC -> MOD, EOF, SOF, Data at 156.25MHz -> Kernel inside the Manager]
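The arithmetic on this slide can be checked with a few lines of Python. This is only an illustrative sketch: the function name `word_clock_mhz` is made up here and is not part of any Maxeler tool.

```python
def word_clock_mhz(line_rate_gbps: float, word_bits: int) -> float:
    """Clock (in MHz) needed to carry a serial line rate as parallel words."""
    bit_time_ns = 1.0 / line_rate_gbps        # 10 Gbps -> 0.1 ns per bit
    word_period_ns = bit_time_ns * word_bits  # 64 bits -> 6.4 ns per word
    return 1000.0 / word_period_ns            # period in ns -> frequency in MHz


# 64-bit words at 10 Gbps need a clock of about 156.25 MHz;
# 32-bit words need about twice that.
print(word_clock_mhz(10, 64))
print(word_clock_mhz(10, 32))
```

Halving the word width doubles the required clock, which is why the narrower interface has to run faster to carry the same 10Gbps.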
9 Standard Ethernet Interfaces
The most commonly used 10G Ethernet interfaces vary simply by data width: 32 bits at 312.5MHz or 64 bits at 156MHz (it's really 156.25MHz).
This is because: 32b * 312.5MHz = 64b * 156.25MHz = 10Gbps
64-bit interface: Data = 64 bits, SOF = 1 bit, EOF = 1 bit, MOD = BitsToAddress(64/8) = 3 bits
32-bit interface: Data = 32 bits, SOF = 1 bit, EOF = 1 bit, MOD = BitsToAddress(32/8) = 2 bits
Example with a 69-bit wide bus: SOF 1 bit | EOF 1 bit | MOD 3 bits | Data 64 bits
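The field widths above can be reproduced with a small sketch. The slide names a `BitsToAddress` function; the Python version below is an assumed re-implementation (the number of bits needed to index n positions), and `bus_width` is a hypothetical helper.

```python
def bits_to_address(n: int) -> int:
    """Bits needed to address n distinct positions (0 .. n-1). Assumed semantics."""
    return max(1, (n - 1).bit_length())


def bus_width(data_bits: int) -> dict:
    """Field widths of a framed link: data plus SOF, EOF and MOD metadata."""
    mod_bits = bits_to_address(data_bits // 8)  # MOD counts bytes in a word
    return {"data": data_bits, "sof": 1, "eof": 1, "mod": mod_bits,
            "total": data_bits + 1 + 1 + mod_bits}


print(bus_width(64))  # mod is 3 bits, total is 69 bits
print(bus_width(32))  # mod is 2 bits, total is 36 bits
```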
10 Network Traffic: User Point of View
Example of a 59-byte frame over a 64-bit link running at 156.25MHz.
MOD indicates how many bytes are valid in the last word of the frame.
MOD is only valid when EOF = 1.
[Timing diagram: SOF, EOF, MOD and Data (64 bits = 8 bytes) against time in cycles]
11 Hello World!
[Timing diagram: the frame "Hello world!" on a 64-bit link, showing SOF, EOF, MOD and Data (64 bits) against time in cycles; the first word carries "Hello wo" and the second word carries "rld!"]
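A minimal sketch of how a byte string maps onto framed words with SOF, EOF and MOD. The `Word` type and `to_frame_words` are illustrative names, not Maxeler APIs. One assumption is labeled in the code: a completely full last word is encoded as MOD = 0, which is what a 3-bit MOD field suggests for 8-byte words.

```python
from typing import List, NamedTuple


class Word(NamedTuple):
    data: bytes  # always word_bytes long; the last word is zero-padded
    sof: bool    # set on the first word of the frame
    eof: bool    # set on the last word of the frame
    mod: int     # valid bytes in the last word; only meaningful when eof is set


def to_frame_words(payload: bytes, word_bytes: int = 8) -> List[Word]:
    """Split a frame into fixed-width words with SOF/EOF/MOD metadata."""
    if not payload:
        raise ValueError("a frame needs at least one byte")
    chunks = [payload[i:i + word_bytes]
              for i in range(0, len(payload), word_bytes)]
    last = len(chunks) - 1
    words = []
    for i, chunk in enumerate(chunks):
        eof = i == last
        # Assumption: a full last word wraps around to MOD = 0.
        mod = len(chunk) % word_bytes if eof else 0
        words.append(Word(chunk.ljust(word_bytes, b"\x00"), i == 0, eof, mod))
    return words


for word in to_frame_words(b"Hello world!"):
    print(word)
```

The 12-byte "Hello world!" frame occupies two words, with MOD = 4 on the second. The 59-byte frame from the previous slide would occupy eight words, with MOD = 3 on the last.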
12 Clock Domains
Simply put: a set of blocks in which data can move at a certain rate.
Typical clock domains:
PCI Express (Gen 2.0 x8): 250MHz
Network (64 bits): 156MHz
LMEM: 400MHz
QMEM: 550MHz
Stream Clock (default): 100MHz
The higher the clock frequency, the faster data can move.
Data can move between clock domains, but it might not appear contiguous at the destination clock domain.
[Timing diagram: the same data at 100MHz and at Network (156MHz) against time]
13 Clock Domains (cont)
Every Manager block is associated with a clock domain.
Some blocks, like Kernels and State Machines, are flexible and can be assigned to a specific clock domain by the user.
When blocks that belong to different clock domains are connected, the Manager automatically inserts a dual-clock FIFO to help with the domain transition.
[Diagram: PCI Express 250MHz, LMEM 400MHz, K1 300MHz, K2 100MHz]
14 Clock Domains and Throughput
Consider a Kernel that generates data on every clock cycle.
For a fixed length of time: the higher the clock frequency, the more data the kernel will generate.
[Timing diagram: K1 at 100MHz and K2 at 200MHz against time in ns]
15 Store and Forward
The act of storing a complete frame before forwarding it downstream for further processing.
Common for: checksum verification.
Less common: clock-domain transitioning.
[Timing diagram: Input and Output against time, with SOF and EOF marked]
16 Store and Forward: Frame Continuity
A store and forward block can convert a non-continuous frame into a continuous one.
This property is important when connecting directly to Ethernet MACs, since those can only work with continuous frames.
[Timing diagram: a frame (SOF to EOF) at 100MHz, at Network (156MHz), and after S&F, against time]
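The effect of store and forward on frame continuity can be sketched as a cycle-by-cycle simulation. This is a simplified model under stated assumptions: one word per cycle, `None` marking a cycle with no valid word, and zero added latency once the EOF word arrives. The name `store_and_forward` is illustrative only.

```python
from collections import deque
from typing import Iterable, List, Optional, Tuple

Word = Tuple[str, bool, bool]  # (data, sof, eof)


def store_and_forward(cycles: Iterable[Optional[Word]]) -> List[Optional[Word]]:
    """Buffer each frame until its EOF arrives, then emit it without gaps."""
    out: List[Optional[Word]] = []
    partial: List[Word] = []  # words of the frame still being received
    ready: deque = deque()    # words of complete frames waiting to be sent
    for word in cycles:
        if word is not None:
            partial.append(word)
            if word[2]:       # EOF: the whole frame is now stored
                ready.extend(partial)
                partial.clear()
        out.append(ready.popleft() if ready else None)
    while ready:              # drain what is left after the input ends
        out.append(ready.popleft())
    return out


# A frame that arrives with gaps, as it might after a clock-domain crossing.
gappy = [("a", True, False), None, ("b", False, False), None, ("c", False, True)]
print(store_and_forward(gappy))
```

In the output, the three words of the frame appear on consecutive cycles, at the cost of waiting for the whole frame first.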
17 Kernel Latency
The pipeline depth from a specific input to a specific output.
For HPC applications: typically in the 1,000s.
For networking applications: typically in the 100s.
For ultra-low-latency applications: typically in the 10s.
The numbers in (brackets) are the node latencies.
18 Stream Hold
Data flowing through a graph will only be valid at certain times.
If we're interested in the 4th data item relative to the start of frame, we can store it for later use by using a stream hold.
The streamhold will only remember the value that was at its input when valid & sof = true.
Only when valid & sof = true was stream.offset(data, 3) interesting:
streamhold(stream.offset(data, 3), valid & sof)
[Timing diagram: Valid, SOF, input of StreamHold (a b c d e f) and output of StreamHold (d d d) against wall time in cycles; the graph slices data with offset +3 and holds it on Valid & Sof after scheduling]
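The behavior of the stream hold can be imitated in plain Python over lists, one element per cycle. This is a behavioral sketch only, not the MaxJ API: `stream_offset` and `stream_hold` are stand-ins written for this example. It assumes the held value is visible on the same cycle it is stored and that the output is `None` before the first store.

```python
def stream_offset(stream, offset):
    """Value `offset` cycles ahead of each cycle (None past the stream end)."""
    n = len(stream)
    return [stream[i + offset] if 0 <= i + offset < n else None
            for i in range(n)]


def stream_hold(values, store):
    """Output the most recent value seen on a cycle where `store` was true."""
    held = None
    out = []
    for value, store_now in zip(values, store):
        if store_now:
            held = value
        out.append(held)
    return out


data = list("abcdef")
valid = [True] * 6
sof = [True, False, False, False, False, False]
store = [v and s for v, s in zip(valid, sof)]

print(stream_hold(stream_offset(data, 3), store))
# -> ['d', 'd', 'd', 'd', 'd', 'd']
```

At the SOF cycle the offset stream presents the 4th item, `d`, and the hold keeps presenting it on every later cycle.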
19 Kernel behavior: Pull in, Push out
PULL type inputs, PUSH type outputs.
There are FIFO queues at each input and output of a kernel.
FIFOs can normally hold up to 512 data words before being full.
[Diagram: pull input (Read, Data, Empty) -> Kernel -> push output (Data, Valid, Full/stall)]
20 Kernel Flow Control: Stalling
Input: a read is attempted but there is no data available. The kernel will stop everything until there is more data!
Output: a write is attempted but the output buffer is full. The kernel will stop everything until there is more space!
[Diagram: pull input -> K -> push output]
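The two stall conditions can be expressed as a single-cycle step function over an input and an output FIFO. This is a toy model: `kernel_tick` is a hypothetical name, and the "computation" is just an upper-casing stand-in. Only the 512-word FIFO depth comes from the slides.

```python
from collections import deque

FIFO_DEPTH = 512  # from the slides: FIFOs normally hold up to 512 words


def kernel_tick(inp: deque, out: deque, depth: int = FIFO_DEPTH) -> str:
    """Advance the kernel by one cycle and report what happened."""
    if not inp:
        return "stall: input empty"    # pull input has nothing to read
    if len(out) >= depth:
        return "stall: output full"    # push output has no space left
    out.append(inp.popleft().upper())  # stand-in for the real computation
    return "ran"


inp, out = deque("ab"), deque()
print(kernel_tick(inp, out, depth=1))  # ran
print(kernel_tick(inp, out, depth=1))  # stall: output full
out.popleft()                          # downstream consumes one word
print(kernel_tick(inp, out, depth=1))  # ran
print(kernel_tick(inp, out, depth=1))  # stall: input empty
```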
21 Kernel Flushing
I have only 2 data items: A and B.
[Diagram: Input -> kernel pipeline -> Output]
22 Kernel Flushing
A goes in, B is still outside.
[Diagram: Input -> A -> Output]
23 Kernel Flushing
Both A and B are processed. Input has now gone empty!
[Diagram: Input -> B, A, A -> Output]
24 Kernel Flushing
A problem! No more inputs: the kernel is stalling!
[Diagram: Input -> B, A, A -> Output]
25 How long will a kernel run for?
We normally specify the number of expected data items.
This is called the RunCycleCounter and it's part of the kernel's flushing logic.
[Diagram: Run Cycle Count = 2]
26 Kernel Flushing With Cycle Count = 2
We got 2 data items, so the flushing logic kicks in!
[Diagram: Run Cycle Count = 2; Input -> B, A, A -> Output]
27 Kernel Flushing With Cycle Count = 2
Flushing.
[Diagram: Run Cycle Count = 2; Input -> B, B, A -> Output]
28 Kernel Flushing With Cycle Count = 2
Still flushing.
[Diagram: Run Cycle Count = 2; Input -> B -> Output]
29 Kernel Flushing With Cycle Count = 2
And we're done!
[Diagram: Run Cycle Count = 2; Input -> (empty) -> Output]
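The flushing sequence on these slides can be reproduced with a toy pipeline model. It assumes a fixed-depth pipeline that advances one stage per cycle and a blocking input. `run_kernel` and its parameters are illustrative names; only the idea of a run cycle count comes from the slides.

```python
from collections import deque
from typing import List, Optional


def run_kernel(items: List[str], depth: int,
               run_cycle_count: Optional[int],
               max_ticks: int = 100) -> List[str]:
    """Simulate a blocking-input kernel; return the items that reach the output."""
    inp = deque(items)
    pipeline = deque([None] * depth)  # leftmost is the newest stage
    out: List[str] = []
    consumed = 0
    for _ in range(max_ticks):
        flushing = run_cycle_count is not None and consumed >= run_cycle_count
        if flushing:
            if all(stage is None for stage in pipeline):
                break                 # pipeline drained: the kernel is done
            new = None                # flushing logic pushes a bubble through
        elif inp:
            new = inp.popleft()
            consumed += 1
        else:
            continue                  # input empty: the whole kernel stalls
        pipeline.appendleft(new)
        oldest = pipeline.pop()
        if oldest is not None:
            out.append(oldest)
    return out


print(run_kernel(["A", "B"], depth=3, run_cycle_count=2))     # ['A', 'B']
print(run_kernel(["A", "B"], depth=3, run_cycle_count=None))  # []
```

Without a run cycle count, both items stay stuck inside the pipeline because nothing pushes them out once the input runs dry.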
30 There is, however, a problem
This doesn't work with networking.
How many data items will the kernel expect? Unknown.
We might receive 1. We might receive 2. An infinite amount, or nothing at all!
What do we do?
31 Non-Blocking Inputs
Non-blocking inputs solve this problem.
They never stall the kernel, even when there's no data available!
They do this by adding a valid bit to every incoming word: 32 bits -> 33 bits = 32 bits + 1 valid bit
32 Kernel Flushing: Non-Blocking Input
[Diagram: Input -> (-, v=0), (-, v=0), (-, v=0), (-, v=0) -> Output]
33 Kernel Flushing: Non-Blocking Input
[Diagram: Input -> (A, v=1), (-, v=0), (-, v=0), (-, v=0) -> Output]
34 Kernel Flushing: Non-Blocking Input
[Diagram: Input -> (B, v=1), (A, v=1), (A, v=1), (-, v=0) -> Output]
35 Kernel Flushing: Non-Blocking Input
[Diagram: Input -> (-, v=0), (B, v=1), (B, v=1), (A, v=1) -> Output]
36 Kernel Flushing: Non-Blocking Input
[Diagram: Input -> (-, v=0), (-, v=0), (-, v=0), (B, v=1) -> Output]
37 Kernel Flushing: Non-Blocking Input
The kernel keeps running forever with valid=0 until more real data arrives!
Things will further improve with Custom Kernels, a new class currently under development.
[Diagram: Input -> (-, v=0), (-, v=0), (-, v=0), (-, v=0) -> Output]
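A non-blocking input can be modeled by attaching a valid flag to every word and letting the pipeline advance on every cycle whether or not data arrived. This is a toy sketch; `run_nonblocking` and the `arrivals` mapping are made-up names for illustration.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Tagged = Tuple[Optional[str], bool]  # (data, valid)


def run_nonblocking(arrivals: Dict[int, str], depth: int,
                    ticks: int) -> List[Tagged]:
    """Run for `ticks` cycles; `arrivals` maps a cycle number to the word arriving then."""
    pipeline = deque([(None, False)] * depth)  # starts full of invalid words
    out: List[Tagged] = []
    for t in range(ticks):
        # The input never stalls: with no data, an invalid word enters instead.
        word = (arrivals[t], True) if t in arrivals else (None, False)
        pipeline.appendleft(word)
        out.append(pipeline.pop())
    return out


trace = run_nonblocking({1: "A", 2: "B"}, depth=3, ticks=8)
print([data for data, valid in trace if valid])  # ['A', 'B']
print(len(trace))                                # 8
```

Every cycle produces an output word, but only the words with valid = 1 carry real data, so downstream logic must check the valid bit.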
38 Network DFE Simulated System
MaxelerOS comes with a simulation environment.
It aims to be cycle-accurate when compared to the hardware environment.
In reality, the timing of dynamic events is completely different, but the in-kernel simulation is very accurate.
SLiC is always the same, so it's trivial for an application to switch between hardware and simulation.
The hardware is simulated inside the MaxelerOS Sim Daemon.
[Diagram: Application, Normal SLiC, Simulation MaxRT, MaxelerOS Simulation Daemon, CPU]
39 Simulated Networks
Simulation uses TUN/TAP devices to create virtual NICs in Linux.
These NICs simulate a network device that has a direct connection to a physical port on the simulated DFE.
Linux can send and receive packets through the simulated NIC, and those go to/from the simulated DFE's port.
[Diagram: Simulated DFE (TOP, MID, BOT), Simulated Network, three Simulated NICs, Simulation Daemon, Linux]
40 Simulation in Practice
The simulated DFE is completely invisible from Linux's point of view.
The best way to think about it is that the DFE is a different computer entirely that happens to be connected to the same network as the simulated NIC.
This means that a Linux program that uses standard sockets can send data back and forth to the simulated DFE using standard network protocols (TCP, UDP, ICMP, etc.).
You need to make sure:
To assign the simulated NIC an IP address.
To assign the simulated DFE an IP address which is in the same network as the simulated NIC.
[Diagram: Simulated DFE TOP port connected to Simulated NIC tap0 in Linux]
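Because the simulated DFE looks like any other host on the network, an ordinary socket program is enough to talk to it. The sketch below uses only Python's standard `socket` module. The helper name `send_datagram` is made up for this example, and the address and port that the simulated DFE listens on are not given in the slides, so the caller has to supply them.

```python
import socket


def send_datagram(payload: bytes, host: str, port: int) -> int:
    """Send one UDP datagram; return the number of bytes handed to the stack."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(payload, (host, port))
```

For example, if the simulated DFE had been assigned a (hypothetical) address 172.16.50.1 in the same network as tap0 and a kernel were listening on a (hypothetical) UDP port 9000, calling `send_datagram(b"Hello world!", "172.16.50.1", 9000)` would route the frame through the simulated NIC to the simulated DFE's port.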
41 Metadata
Metadata: data about data.
Links in the Manager are designed to have data stream from one manager entity to another; metadata is essential for contextualizing the data.
Most common examples:
SOF, EOF, MOD: indicate how to interpret the data as a frame.
UDP/TCP socket: tells which remote connection the data belongs to.
[Diagram: Remote 1, Remote 2, Remote 3, Network, MaxTCP; Data with Socket = 3]
42 Framed Link Fields with Metadata
Ultimately, a link is just a collection of wires: data is indistinguishable from metadata from the hardware's point of view.
When the link connects to the destination block, the link fields are viewed as one wide bus, in our example 83 wires.
The individual fields are sliced out of this bus: the first 64 wires are the data, the next 14 wires are the socket number, etc.
SOF 1 bit | EOF 1 bit | MOD 3 bits | Socket 14 bits | Data 64 bits
The link width is 83 bits, which includes both data and metadata. That's 83 wires going into the manager block.
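Slicing fields out of the wide bus amounts to shifting and masking. In the sketch below the bus is a Python integer. The slide only states that data occupies the first 64 wires and the socket number the next 14, so the order of MOD, EOF and SOF above them is an assumption read off the field diagram. `pack` and `unpack` are illustrative helpers.

```python
# (name, width) pairs, lowest wires first. The position of mod/eof/sof
# above the socket field is an assumption, not stated in the slides.
FIELDS = [("data", 64), ("socket", 14), ("mod", 3), ("eof", 1), ("sof", 1)]
LINK_WIDTH = sum(width for _, width in FIELDS)  # 83 wires


def pack(values: dict) -> int:
    """Concatenate the fields into one integer that represents the bus."""
    bus, shift = 0, 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        bus |= value << shift
        shift += width
    return bus


def unpack(bus: int) -> dict:
    """Slice the individual fields back out of the bus."""
    fields, shift = {}, 0
    for name, width in FIELDS:
        fields[name] = (bus >> shift) & ((1 << width) - 1)
        shift += width
    return fields


word = {"data": int.from_bytes(b"rld!", "little"),
        "socket": 3, "mod": 4, "eof": 1, "sof": 0}
bus = pack(word)
print(LINK_WIDTH)           # 83
print(unpack(bus) == word)  # True
```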
43 Summary
Network streams can be handled by DFE kernels.
Networking DFEs have specific requirements.
There are challenges with flow control.
Flushing kernels and non-blocking inputs can help.
Networking DFE kernels care about latency.
CO405H. Department of Compu:ng Imperial College London. Computing in Space with OpenSPL Topic 5: Programming DFEs, basics
CO405H Computing in Space with OpenSPL Topic 5: Programming DFEs, basics Oskar Mencer Georgi Gaydadjiev Department of Compu:ng Imperial College London h#p://www.doc.ic.ac.uk/~oskar/ h#p://www.doc.ic.ac.uk/~georgig/
More informationPowerPC on NetFPGA CSE 237B. Erik Rubow
PowerPC on NetFPGA CSE 237B Erik Rubow NetFPGA PCI card + FPGA + 4 GbE ports FPGA (Virtex II Pro) has 2 PowerPC hard cores Untapped resource within NetFPGA community Goals Evaluate performance of on chip
More informationCO405H. Department of Compu:ng Imperial College London. Computing in Space with OpenSPL Topic 4: DataFlow Engines (DFEs)
Oskar Mencer CO405H Computing in Space with OpenSPL Topic 4: DataFlow Engines (DFEs) Georgi Gaydadjiev Department of Compu:ng Imperial College London h#p://www.doc.ic.ac.uk/~oskar/ h#p://www.doc.ic.ac.uk/~georgig/
More informationCO405H. Department of Compu:ng Imperial College London. Computing in Space with OpenSPL Topic 4: DataFlow Engines (DFEs)
Oskar Mencer CO405H Computing in Space with OpenSPL Topic 4: DataFlow Engines (DFEs) Georgi Gaydadjiev Department of Compu:ng Imperial College London h#p://www.doc.ic.ac.uk/~oskar/ h#p://www.doc.ic.ac.uk/~georgig/
More informationP51: High Performance Networking
P51: High Performance Networking Lecture 6: Programmable network devices Dr Noa Zilberman noa.zilberman@cl.cam.ac.uk Lent 2017/18 High Throughput Interfaces Performance Limitations So far we discussed
More informationCO405H. Department of Compu:ng Imperial College London. Computing in Space with OpenSPL Topic 15: Porting CPU Software to DFEs
CO405H Computing in Space with OpenSPL Topic 15: Porting CPU Software to DFEs Oskar Mencer Georgi Gaydadjiev Department of Compu:ng Imperial College London h#p://www.doc.ic.ac.uk/~oskar/ h#p://www.doc.ic.ac.uk/~georgig/
More informationChapter 9: A Closer Look at System Hardware
Chapter 9: A Closer Look at System Hardware CS10001 Computer Literacy Chapter 9: A Closer Look at System Hardware 1 Topics Discussed Digital Data and Switches Manual Electrical Digital Data Representation
More informationChapter 9: A Closer Look at System Hardware 4
Chapter 9: A Closer Look at System Hardware CS10001 Computer Literacy Topics Discussed Digital Data and Switches Manual Electrical Digital Data Representation Decimal to Binary (Numbers) Characters and
More information100% PACKET CAPTURE. Intelligent FPGA-based Host CPU Offload NIC s & Scalable Platforms. Up to 200Gbps
100% PACKET CAPTURE Intelligent FPGA-based Host CPU Offload NIC s & Scalable Platforms Up to 200Gbps Dual Port 100 GigE ANIC-200KFlex (QSFP28) The ANIC-200KFlex FPGA-based PCIe adapter/nic features dual
More informationInfiniBand SDR, DDR, and QDR Technology Guide
White Paper InfiniBand SDR, DDR, and QDR Technology Guide The InfiniBand standard supports single, double, and quadruple data rate that enables an InfiniBand link to transmit more data. This paper discusses
More informationGeneric Model of I/O Module Interface to CPU and Memory Interface to one or more peripherals
William Stallings Computer Organization and Architecture 7 th Edition Chapter 7 Input/Output Input/Output Problems Wide variety of peripherals Delivering different amounts of data At different speeds In
More informationFast packet processing in the cloud. Dániel Géhberger Ericsson Research
Fast packet processing in the cloud Dániel Géhberger Ericsson Research Outline Motivation Service chains Hardware related topics, acceleration Virtualization basics Software performance and acceleration
More informationCO405H. Department of Compu:ng Imperial College London. Computing in Space with OpenSPL Topic 6: Programming DFEs (advanced)
CO405H Computing in Space with OpenSPL Topic 6: Programming DFEs (advanced) Oskar Mencer Georgi Gaydadjiev Department of Compu:ng Imperial College London h#p://www.doc.ic.ac.uk/~oskar/ h#p://www.doc.ic.ac.uk/~georgig/
More informationINT G bit TCP Offload Engine SOC
INT 10011 10 G bit TCP Offload Engine SOC Product brief, features and benefits summary: Highly customizable hardware IP block. Easily portable to ASIC flow, Xilinx/Altera FPGAs or Structured ASIC flow.
More informationNFS/RDMA over 40Gbps iwarp Wael Noureddine Chelsio Communications
NFS/RDMA over 40Gbps iwarp Wael Noureddine Chelsio Communications Outline RDMA Motivating trends iwarp NFS over RDMA Overview Chelsio T5 support Performance results 2 Adoption Rate of 40GbE Source: Crehan
More informationCO405H. Department of Compu:ng Imperial College London. Computing in Space with OpenSPL Topic 9: Programming DFEs (Loops II)
CO405H Computing in Space with OpenSPL Topic 9: Programming DFEs (Loops II) Oskar Mencer Georgi Gaydadjiev special thanks: Jacob Bower Department of Compu:ng Imperial College London h4p://www.doc.ic.ac.uk/~oskar/
More informationViews of Memory. Real machines have limited amounts of memory. Programmer doesn t want to be bothered. 640KB? A few GB? (This laptop = 2GB)
CS6290 Memory Views of Memory Real machines have limited amounts of memory 640KB? A few GB? (This laptop = 2GB) Programmer doesn t want to be bothered Do you think, oh, this computer only has 128MB so
More informationReview: Hardware user/kernel boundary
Review: Hardware user/kernel boundary applic. applic. applic. user lib lib lib kernel syscall pg fault syscall FS VM sockets disk disk NIC context switch TCP retransmits,... device interrupts Processor
More informationHIGH-PERFORMANCE NETWORKING :: USER-LEVEL NETWORKING :: REMOTE DIRECT MEMORY ACCESS
HIGH-PERFORMANCE NETWORKING :: USER-LEVEL NETWORKING :: REMOTE DIRECT MEMORY ACCESS CS6410 Moontae Lee (Nov 20, 2014) Part 1 Overview 00 Background User-level Networking (U-Net) Remote Direct Memory Access
More informationPCIe 10G SFP+ Network Card
PCIe 10G SFP+ Network Card User Manual Ver. 1.00 All brand names and trademarks are properties of their respective owners. Contents: Chapter 1: Introduction... 3 1.1 Product Introduction... 3 1.2 Features...
More informationIntroduction to PCI Express Positioning Information
Introduction to PCI Express Positioning Information Main PCI Express is the latest development in PCI to support adapters and devices. The technology is aimed at multiple market segments, meaning that
More informationINT 1011 TCP Offload Engine (Full Offload)
INT 1011 TCP Offload Engine (Full Offload) Product brief, features and benefits summary Provides lowest Latency and highest bandwidth. Highly customizable hardware IP block. Easily portable to ASIC flow,
More informationFELI. : the detector readout upgrade of the ATLAS experiment. Soo Ryu. Argonne National Laboratory, (on behalf of the FELIX group)
LI : the detector readout upgrade of the ATLAS experiment Soo Ryu Argonne National Laboratory, sryu@anl.gov (on behalf of the LIX group) LIX group John Anderson, Soo Ryu, Jinlong Zhang Hucheng Chen, Kai
More informationCSE398: Network Systems Design
CSE398: Network Systems Design Instructor: Dr. Liang Cheng Department of Computer Science and Engineering P.C. Rossin College of Engineering & Applied Science Lehigh University April 04, 2005 Outline Recap
More information10GE network tests with UDP. Janusz Szuba European XFEL
10GE network tests with UDP Janusz Szuba European XFEL Outline 2 Overview of initial DAQ architecture Slice test hardware specification Initial networking test results DAQ software UDP tests Summary 10GE
More informationThis Unit: Main Memory. Building a Memory System. First Memory System Design. An Example Memory System
This Unit: Main Memory Building a Memory System Application OS Compiler Firmware CPU I/O Memory Digital Circuits Gates & Transistors Memory hierarchy review DRAM technology A few more transistors Organization:
More informationCS252 S05. Main memory management. Memory hardware. The scale of things. Memory hardware (cont.) Bottleneck
Main memory management CMSC 411 Computer Systems Architecture Lecture 16 Memory Hierarchy 3 (Main Memory & Memory) Questions: How big should main memory be? How to handle reads and writes? How to find
More informationWhy memory hierarchy? Memory hierarchy. Memory hierarchy goals. CS2410: Computer Architecture. L1 cache design. Sangyeun Cho
Why memory hierarchy? L1 cache design Sangyeun Cho Computer Science Department Memory hierarchy Memory hierarchy goals Smaller Faster More expensive per byte CPU Regs L1 cache L2 cache SRAM SRAM To provide
More information6.9. Communicating to the Outside World: Cluster Networking
6.9 Communicating to the Outside World: Cluster Networking This online section describes the networking hardware and software used to connect the nodes of cluster together. As there are whole books and
More information10G bit UDP Offload Engine (UOE) MAC+ PCIe SOC IP
Intilop Corporation 4800 Great America Pkwy Ste-231 Santa Clara, CA 95054 Ph: 408-496-0333 Fax:408-496-0444 www.intilop.com 10G bit UDP Offload Engine (UOE) MAC+ PCIe INT 15012 (Ultra-Low Latency SXUOE+MAC+PCIe+Host_I/F)
More informationAdvanced Computer Networks. End Host Optimization
Oriana Riva, Department of Computer Science ETH Zürich 263 3501 00 End Host Optimization Patrick Stuedi Spring Semester 2017 1 Today End-host optimizations: NUMA-aware networking Kernel-bypass Remote Direct
More information100 GBE AND BEYOND. Diagram courtesy of the CFP MSA Brocade Communications Systems, Inc. v /11/21
100 GBE AND BEYOND 2011 Brocade Communications Systems, Inc. Diagram courtesy of the CFP MSA. v1.4 2011/11/21 Current State of the Industry 10 Electrical Fundamental 1 st generation technology constraints
More informationPerformance Analysis and Evaluation of Mellanox ConnectX InfiniBand Architecture with Multi-Core Platforms
Performance Analysis and Evaluation of Mellanox ConnectX InfiniBand Architecture with Multi-Core Platforms Sayantan Sur, Matt Koop, Lei Chai Dhabaleswar K. Panda Network Based Computing Lab, The Ohio State
More informationTMS320C64x EDMA Architecture
Application Report SPRA994 March 2004 TMS320C64x EDMA Architecture Jeffrey Ward Jamon Bowen TMS320C6000 Architecture ABSTRACT The enhanced DMA (EDMA) controller of the TMS320C64x device is a highly efficient
More informationOPERATING SYSTEMS CS136
OPERATING SYSTEMS CS136 Jialiang LU Jialiang.lu@sjtu.edu.cn Based on Lecture Notes of Tanenbaum, Modern Operating Systems 3 e, 1 Chapter 5 INPUT/OUTPUT 2 Overview o OS controls I/O devices => o Issue commands,
More informationAN 829: PCI Express* Avalon -MM DMA Reference Design
AN 829: PCI Express* Avalon -MM DMA Reference Design Updated for Intel Quartus Prime Design Suite: 18.0 Subscribe Latest document on the web: PDF HTML Contents Contents 1....3 1.1. Introduction...3 1.1.1.
More information1G Bit TCP+UDP Offload Engine (TOE+UOE) Hardware IP Core
Intilop Corporation 4800 Great America Pkwy Ste-231 Santa Clara, CA 95054 Ph: 408-496-0333 Fax:408-496-0444 www.intilop.com 1G bit TCP+UDP Offload Engine MAC + Host_IF (Same PHY Port) INT 2511 (Ultra-Low
More informationDESIGN AND IMPLEMENTATION OF AN AVIONICS FULL DUPLEX ETHERNET (A664) DATA ACQUISITION SYSTEM
DESIGN AND IMPLEMENTATION OF AN AVIONICS FULL DUPLEX ETHERNET (A664) DATA ACQUISITION SYSTEM Alberto Perez, Technical Manager, Test & Integration John Hildin, Director of Network s John Roach, Vice President
More informationTopic & Scope. Content: The course gives
Topic & Scope Content: The course gives an overview of network processor cards (architectures and use) an introduction of how to program Intel IXP network processors some ideas of how to use network processors
More informationCS510 Operating System Foundations. Jonathan Walpole
CS510 Operating System Foundations Jonathan Walpole OS-Related Hardware & Software 2 Lecture 2 Overview OS-Related Hardware & Software - complications in real systems - brief introduction to memory protection,
More informationSo computers can't think in the same way that people do. But what they do, they do excellently well and very, very fast.
Input What is Processing? Processing Output Processing is the thinking that the computer does - the calculations, comparisons, and decisions. Storage People also process data. What you see and hear and
More informationPerformance Evaluation of Myrinet-based Network Router
Performance Evaluation of Myrinet-based Network Router Information and Communications University 2001. 1. 16 Chansu Yu, Younghee Lee, Ben Lee Contents Suez : Cluster-based Router Suez Implementation Implementation
More informationCluster Computing. Interconnect Technologies for Clusters
Interconnect Technologies for Clusters Interconnect approaches WAN infinite distance LAN Few kilometers SAN Few meters Backplane Not scalable Physical Cluster Interconnects FastEther Gigabit EtherNet 10
More informationCommon Computer-System and OS Structures
Common Computer-System and OS Structures Computer System Operation I/O Structure Storage Structure Storage Hierarchy Hardware Protection General System Architecture Oct-03 1 Computer-System Architecture
More informationBasics DRAM ORGANIZATION. Storage element (capacitor) Data In/Out Buffers. Word Line. Bit Line. Switching element HIGH-SPEED MEMORY SYSTEMS
Basics DRAM ORGANIZATION DRAM Word Line Bit Line Storage element (capacitor) In/Out Buffers Decoder Sense Amps... Bit Lines... Switching element Decoder... Word Lines... Memory Array Page 1 Basics BUS
More informationDistributed Queue Dual Bus
Distributed Queue Dual Bus IEEE 802.3 to 802.5 protocols are only suited for small LANs. They cannot be used for very large but non-wide area networks. IEEE 802.6 DQDB is designed for MANs It can cover
More information7/28/ Prentice-Hall, Inc Prentice-Hall, Inc Prentice-Hall, Inc Prentice-Hall, Inc Prentice-Hall, Inc.
Technology in Action Technology in Action Chapter 9 Behind the Scenes: A Closer Look a System Hardware Chapter Topics Computer switches Binary number system Inside the CPU Cache memory Types of RAM Computer
More informationModule 12: I/O Systems
Module 12: I/O Systems I/O Hardware Application I/O Interface Kernel I/O Subsystem Transforming I/O Requests to Hardware Operations Performance Operating System Concepts 12.1 Silberschatz and Galvin c
More informationNegotiating the Maze Getting the most out of memory systems today and tomorrow. Robert Kaye
Negotiating the Maze Getting the most out of memory systems today and tomorrow Robert Kaye 1 System on Chip Memory Systems Systems use external memory Large address space Low cost-per-bit Large interface
More informationComputer Systems Architecture I. CSE 560M Lecture 18 Guest Lecturer: Shakir James
Computer Systems Architecture I CSE 560M Lecture 18 Guest Lecturer: Shakir James Plan for Today Announcements No class meeting on Monday, meet in project groups Project demos < 2 weeks, Nov 23 rd Questions
More informationBarcelona: a Fibre Channel Switch SoC for Enterprise SANs Nital P. Patwa Hardware Engineering Manager/Technical Leader
Barcelona: a Fibre Channel Switch SoC for Enterprise SANs Nital P. Patwa Hardware Engineering Manager/Technical Leader 1 Agenda Introduction to Fibre Channel Switching in Enterprise SANs Barcelona Switch-On-a-Chip
More informationArchitecture of Computers and Parallel Systems Part 2: Communication with Devices
Architecture of Computers and Parallel Systems Part 2: Communication with Devices Ing. Petr Olivka petr.olivka@vsb.cz Department of Computer Science FEI VSB-TUO Architecture of Computers and Parallel Systems
More informationLecture 23. Finish-up buses Storage
Lecture 23 Finish-up buses Storage 1 Example Bus Problems, cont. 2) Assume the following system: A CPU and memory share a 32-bit bus running at 100MHz. The memory needs 50ns to access a 64-bit value from
More informationMulti-Gigabit Transceivers Getting Started with Xilinx s Rocket I/Os
Multi-Gigabit Transceivers Getting Started with Xilinx s Rocket I/Os Craig Ulmer cdulmer@sandia.gov July 26, 2007 Craig Ulmer SNL/CA Sandia is a multiprogram laboratory operated by Sandia Corporation,
More informationPicture of memory. Word FFFFFFFD FFFFFFFE FFFFFFFF
Memory Sequential circuits all depend upon the presence of memory A flip-flop can store one bit of information A register can store a single word, typically 32-64 bits Memory allows us to store even larger
More informationMultilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology
1 Multilevel Memories Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Based on the material prepared by Krste Asanovic and Arvind CPU-Memory Bottleneck 6.823
More informationComp 204: Computer Systems and Their Implementation. Lecture 18: Devices
Comp 204: Computer Systems and Their Implementation Lecture 18: Devices 1 Today Devices Introduction Handling I/O Device handling Buffering and caching 2 Operating System An Abstract View User Command
More informationThe Memory Component
The Computer Memory Chapter 6 forms the first of a two chapter sequence on computer memory. Topics for this chapter include. 1. A functional description of primary computer memory, sometimes called by
More informationNetronome NFP: Theory of Operation
WHITE PAPER Netronome NFP: Theory of Operation TO ACHIEVE PERFORMANCE GOALS, A MULTI-CORE PROCESSOR NEEDS AN EFFICIENT DATA MOVEMENT ARCHITECTURE. CONTENTS 1. INTRODUCTION...1 2. ARCHITECTURE OVERVIEW...2
More informationI/O Devices. Nima Honarmand (Based on slides by Prof. Andrea Arpaci-Dusseau)
I/O Devices Nima Honarmand (Based on slides by Prof. Andrea Arpaci-Dusseau) Hardware Support for I/O CPU RAM Network Card Graphics Card Memory Bus General I/O Bus (e.g., PCI) Canonical Device OS reads/writes
More informationComputer System Components
Computer System Components CPU Core 1 GHz - 3.2 GHz 4-way Superscaler RISC or RISC-core (x86): Deep Instruction Pipelines Dynamic scheduling Multiple FP, integer FUs Dynamic branch prediction Hardware
More informationQUIZ Ch.6. The EAT for a two-level memory is given by:
QUIZ Ch.6 The EAT for a two-level memory is given by: EAT = H Access C + (1-H) Access MM. Derive a similar formula for three-level memory: L1, L2 and RAM. Hint: Instead of H, we now have H 1 and H 2. Source:
More informationINT-1010 TCP Offload Engine
INT-1010 TCP Offload Engine Product brief, features and benefits summary Highly customizable hardware IP block. Easily portable to ASIC flow, Xilinx or Altera FPGAs INT-1010 is highly flexible that is
More information15: OS Scheduling and Buffering
15: OS Scheduling and ing Mark Handley Typical Audio Pipeline (sender) Sending Host Audio Device Application A->D Device Kernel App Compress Encode for net RTP ed pending DMA to host (~10ms according to
More informationQsys and IP Core Integration
Qsys and IP Core Integration Stephen A. Edwards (after David Lariviere) Columbia University Spring 2016 IP Cores Altera s IP Core Integration Tools Connecting IP Cores IP Cores Cyclone V SoC: A Mix of