This presentation covers Gen Z Buffer operations. Buffer operations are used to move large quantities of data between components.


Transcription:

This presentation covers Gen Z Buffer operations. Buffer operations are used to move large quantities of data between components. 1


Buffer operations are used to put or push up to 4 GiB of data from buffer A to buffer B, or to get or pull data from buffer B to buffer A. Buffer operations can be used in point-to-point, single-subnet, and multi-subnet topologies. 3
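The put/push and get/pull semantics above can be sketched as a toy model. This is purely illustrative Python; `buffer_put`, `buffer_get`, and `MAX_XFER` are invented names, not part of any Gen-Z software interface:

```python
# Toy model of Gen-Z buffer put/get semantics; names are illustrative,
# not part of any real Gen-Z API. Buffers are modeled as byte sequences.

MAX_XFER = 4 * 2**30  # a single buffer operation moves up to 4 GiB

def buffer_put(src: bytes, dst: bytearray, length: int) -> None:
    """Put/push `length` bytes from buffer A (src) to buffer B (dst)."""
    if length > MAX_XFER:
        raise ValueError("buffer operations move at most 4 GiB")
    dst[:length] = src[:length]

def buffer_get(src: bytes, dst: bytearray, length: int) -> None:
    """Get/pull `length` bytes from buffer B (src) into buffer A (dst)."""
    if length > MAX_XFER:
        raise ValueError("buffer operations move at most 4 GiB")
    dst[:length] = src[:length]
```

The two operations differ only in which side initiates the transfer; the data movement itself is symmetric.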

Buffer operations can be used to move data within a component. For example, if a component contains two addressable resources (primary and secondary media), then a buffer operation can be used to put or get data from one medium to the other. Further, if the Responder contains an accelerator (e.g., an encryption, compression, graph, or KVS engine), then as the data is moved between the two buffers, the Responder can perform additional accelerator-specific operations in parallel; these accelerations are transparent to the application. A vector buffer operation enables up to 4 independently addressed buffers to be moved between components, or within a single component. 4
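A vector buffer operation, as described above, can be modeled as a small sketch (hypothetical names; the 4-buffer limit comes from the text):

```python
# Toy model of a vector buffer operation: up to 4 independently
# addressed buffers moved in one operation (illustrative only).

MAX_VECTORS = 4

def vector_buffer_put(pairs):
    """pairs: iterable of (src_bytes, dst_bytearray), at most 4 entries."""
    pairs = list(pairs)
    if len(pairs) > MAX_VECTORS:
        raise ValueError("a vector buffer operation addresses at most 4 buffers")
    for src, dst in pairs:
        dst[:len(src)] = src  # each buffer pair is moved independently
```

Each source/destination pair is independently addressed, so one operation can scatter or gather across distinct resources.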

A Buffer Put operation is translated by the Responder (component A) into a series of Write requests to component B. Only the Requester and component A need to support buffer operations; component B only needs to support normal read and write operations. This example illustrates how buffer operations can eliminate unnecessary data movement. For example, in some file or storage solutions, the data needs to flow through the host processor and memory (this is often accomplished by having component A perform a series of DMA Writes to host memory, signal component B when the DMA Writes are complete, and then having component B perform a series of DMA Reads to pull the data down). Though not shown in this example, the Requester could issue a Buffer Put request to instruct component A to write a buffer to the Requester's memory, i.e., the Requester contains buffer B. This provides multiple advantages:
- Eliminates the need to provision and program work queues, as well as the associated control-plane protocol overheads.
- Eliminates per-Requester static resource provisioning: a Responder supports N outstanding buffer operations usable by multiple Requesters based on workload needs.
- Enables all data movement to be offloaded to the Responder. Significant software stack reductions are possible (perhaps a single processor instruction to perform a put or get operation). 5
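The translation step can be sketched as a chunking loop. This is a minimal model, not the Gen-Z wire format; `WRITE_MAX` is an assumed per-request payload limit chosen for illustration:

```python
# Illustrative model of a Responder translating one Buffer Put into a
# series of plain Write requests to component B (not the Gen-Z wire
# format). WRITE_MAX is an assumed per-request payload limit.

WRITE_MAX = 256  # bytes per Write request (assumption for illustration)

def translate_buffer_put(buffer_a: bytes):
    """Yield (offset, payload) Write requests covering buffer A."""
    for off in range(0, len(buffer_a), WRITE_MAX):
        yield off, buffer_a[off:off + WRITE_MAX]

def component_b_receive(writes, size: int) -> bytearray:
    """Component B needs only ordinary write support, not buffer ops."""
    mem = bytearray(size)
    for off, payload in writes:
        mem[off:off + len(payload)] = payload
    return mem
```

Note that `component_b_receive` implements nothing buffer-specific: that is the point of the slide, since only the Requester and component A must understand buffer operations.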

A Buffer Get operation is conceptually the same as a Buffer Put operation, except it uses Read or Large Read operations instead of Write operations. Buffer Get operations can improve solution efficiency by enabling data to be pulled at the rate that component A can consume it. This can reduce fabric congestion as well as improve component A's responsiveness, since component A controls when the data is read. 6
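The pull pacing described above can be sketched as follows (illustrative only; `READ_MAX` is an assumed per-request limit):

```python
# Sketch of the pull model: component A issues each Read request only
# when it is ready to consume the previous chunk, so data arrives at
# the consumer's own rate. READ_MAX is an assumed per-request limit.

READ_MAX = 128

def buffer_get_pull(remote: bytes, consume):
    """Drain a remote buffer chunk by chunk at the consumer's pace."""
    off = 0
    while off < len(remote):
        chunk = remote[off:off + READ_MAX]  # one Read request
        consume(chunk)                      # component A sets the pace
        off += len(chunk)
```

Because the reader, not the writer, initiates each transfer, data never backs up in the fabric ahead of a slow consumer.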

There are multiple types of Buffer operations. This slide illustrates a Buffer Signaled Put operation. A Buffer Signaled Put operation uses a Multi-Op Signaled Write to move the last bit of data. Upon receipt of a Signaled Write, the Responder atomically updates the signaled address. Components that periodically poll the signaled address will detect the data modification and take action. Using a Signaled Write eliminates the need to explicitly coordinate put-operation completion (the Responder does not even need to know which, or how many, other components or process threads are polling the signaled address). Eliminating coordination cost and complexity can yield significant performance gains and improve solution quality and robustness. 7
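The completion pattern can be modeled like this; the class and method names are invented, and a Python lock stands in for what is an atomic hardware update of the signaled address:

```python
# Model of the Signaled Write completion pattern. Pollers snapshot the
# signaled address and detect any later atomic update; the writer never
# needs to know who is polling. (Lock is a stand-in for a HW atomic.)
import threading

class SignaledAddress:
    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def read(self) -> int:
        with self._lock:
            return self._value

    def signaled_write(self) -> None:
        """Responder atomically updates the value on the final Write."""
        with self._lock:
            self._value += 1

sig = SignaledAddress()
snapshot = sig.read()   # each poller remembers the last value it saw
sig.signaled_write()    # last bit of put data lands; address updated
completed = sig.read() != snapshot  # poller detects the modification
```

Any number of pollers can take independent snapshots; none of them, nor the writer, coordinates with the others.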

A Buffer Signaled Get is the read analog of a Buffer Signaled Put. Instead of a Signaled Write as the last operation, component A performs a local atomic increment to signal the other components or process threads. 8

Gen Z supports Dynamic Buffer Allocation and Release operations. Allocation can be executed as a standalone operation or in combination with a Put, Get, or their Signaled variants. Dynamic Buffer Allocation eliminates the need for a Requester to explicitly manage buffer allocation. For example, consider an application that continually receives new data objects such as pictures or videos. Instead of having to identify a storage or memory resource capable of storing each object, the Requester can simply issue a Dynamic Buffer Put to a Responder, which automatically allocates the required buffer and receives the subsequent put data. If the Responder is a Transparent Router (TR), then it could transparently represent any number of memory and storage components, thus supporting very large addressable resources, a variety of data preservation technologies (e.g., RAID), aggregate performance (e.g., memory interleave), memory and storage tiers, etc. Further, since many service providers analyze objects to identify faces, places, etc., signaled put and get operations can be used to automatically inform any number of accelerators to analyze each object and generate additional metadata or improve data analytics. 9
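The allocate-on-put flow can be sketched as a toy Responder; all class, method, and handle names here are hypothetical:

```python
# Hypothetical model of Dynamic Buffer Allocation combined with Put:
# the Requester never pre-allocates; the Responder allocates a buffer
# sized for the incoming object and returns a handle. Names invented.

class Responder:
    def __init__(self):
        self._buffers = {}
        self._next_handle = 0

    def dynamic_buffer_put(self, data: bytes) -> int:
        """Allocate a buffer for `data`, store it, return its handle."""
        handle = self._next_handle
        self._next_handle += 1
        self._buffers[handle] = bytearray(data)
        return handle

    def buffer_get(self, handle: int) -> bytes:
        """Pull the contents of a previously allocated buffer."""
        return bytes(self._buffers[handle])

    def buffer_release(self, handle: int) -> None:
        """Standalone Release frees the dynamically allocated buffer."""
        del self._buffers[handle]
```

The Requester's side of the exchange reduces to a single call per object, which is the resource-management simplification the slide describes.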

Buffer operations provide numerous application functional and performance benefits. They are more than simple put and get data movement mechanisms: buffer operations can be used to augment or enhance solution stacks while minimizing data movement, or can be combined with acceleration technology to provide value-added processing that is transparent to applications and middleware. 10

This concludes this presentation. Thank you. 11