OpenCAPI Technology. Myron Slota, OpenCAPI Consortium. Join the Conversation #OpenPOWERSummit


Industry Collaboration and Innovation

OpenCAPI Topics
- Industry Background (Computation, Data Access)
- Where/How OpenCAPI Technology is used
- Technology Overview and Advantages
- Demonstrations
- OpenCAPI Consortium: Where it all Happens

Key Messages Throughout
- Open I/O standard
- High performance: no OS/hypervisor/FW overhead, with low latency and high bandwidth
- Not tied to Power: architecture agnostic
- Very low accelerator design overhead
- Programming ease
- Ideal for accelerated computing and SCM
- Supports heterogeneous environments
- Use cases optimized for within a single system node
- Products exist today!

Industry Background that Defined OpenCAPI

Computation
- Growing computational demand due to emerging workloads (e.g., AI, cognitive)
- Moore's Law no longer being sustained by traditional silicon scaling
- Driving increased dependence on hardware acceleration for performance

Data Access
- Hyperscale datacenters and HPC need much higher network bandwidth; 100 Gb/s -> 200 Gb/s -> 400 Gb/s links are emerging
- Deep learning and HPC require more bandwidth between accelerators and memory
- Emerging memory/storage technologies are driving the need for high bandwidth with low latency

- Hardware accelerators are defining the attributes of a high-performance bus
- Growing demand for network performance and network offload
- Introduction of device coherency requirements (IBM's introduction in 2013)
- Emergence of complex storage and memory solutions
- Various form factors, with no single one able to address everything (e.g., GPUs, FPGAs, ASICs)

All of the above is relevant to modern data centers.

Use Cases: A True Heterogeneous Architecture Built Upon OpenCAPI (OpenCAPI 3.0 and OpenCAPI 3.1). The OpenCAPI specifications are downloadable from the website at www.opencapi.org (register, then download).

OpenCAPI Key Attributes

[Slide diagram: any OpenCAPI-enabled processor with TL/DL and 25Gb I/O connects to accelerated OpenCAPI devices (FPGA, SoC, GPU, ASIC/FFSA) carrying TLx/DLx, an accelerated function, caches, and device memory; OpenCAPI memory buffers attach standard system memory and advanced SCM solutions; devices use load/store or block access for storage/compute/network use cases.]

1. Architecture-agnostic bus: applicable to any system/microprocessor architecture
2. Optimized for high bandwidth and low latency
3. High-performance 25 Gbps PHY design with zero overhead
4. Coherency: attached devices operate natively within the application's user space and coherently with the host microprocessor
5. Virtual addressing enables low overhead with no kernel, hypervisor, or firmware involvement
6. Wide range of use cases and access semantics
7. CPU-coherent device memory (Home Agent Memory)
8. Architected for both classic memory and emerging advanced Storage Class Memory
9. Minimal OpenCAPI design overhead (less than 5% of an FPGA)

POWER9 IO Features - POWER9 IO Leading the Industry: PCIe Gen4, CAPI 2.0, NVLink 2.0, OpenCAPI 3.0
- POWER9 silicon die offered in various packages (scale-out, scale-up)
- 8 and 16 Gbps PHY; protocols supported: PCIe Gen3 x16 and PCIe Gen4 x8, with CAPI 2.0 on PCIe Gen4
- 25 Gbps PHY; protocols supported: OpenCAPI 3.0 and NVLink 2.0

Virtual Addressing and Benefits

An OpenCAPI device operates in the virtual address spaces of the applications that it supports.
- Eliminates kernel and device-driver software overhead
- Allows the device to operate on application memory without kernel-level data copies or pinned pages
- Simplifies the programming effort to integrate accelerators into applications
- Improves accelerator performance

The virtual-to-physical address translation occurs in the host CPU.
- Reduces design complexity of OpenCAPI-attached devices
- Makes it easier to ensure interoperability between OpenCAPI devices and different CPU architectures
- Security: since the OpenCAPI device never has access to a physical address, a defective or malicious device cannot reach memory belonging to the kernel or to applications it is not authorized to access
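As a rough illustration of this programming model, here is a minimal C sketch; it is not a real OpenCAPI library API. accel_attach() and accel_submit() are hypothetical names, stubbed out in software so the example compiles and runs, and the comments note what the real hardware path would do instead.

    /* Sketch only: accel_attach()/accel_submit() are hypothetical stand-ins,
     * emulated on the CPU here. The point is the programming model described
     * above: the application passes ordinary virtual addresses (plain malloc'd
     * pointers) to the device, with no kernel copies, no page pinning, and no
     * physical addresses ever visible to the device. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    struct accel_job {
        const void *src;   /* application virtual address of the input  */
        void       *dst;   /* application virtual address of the output */
        size_t      len;
    };

    /* Stub for attaching the device to this process's address space; on real
     * hardware the host CPU performs virtual-to-physical translation on the
     * device's behalf. */
    static int accel_attach(void) { return 0; }

    /* Stub for the accelerated function (here simply a byte-wise copy). */
    static void accel_submit(struct accel_job *job)
    {
        memcpy(job->dst, job->src, job->len);
    }

    int main(void)
    {
        size_t len = 1 << 20;
        char *src = malloc(len);       /* ordinary, unpinned heap memory */
        char *dst = malloc(len);
        if (!src || !dst || accel_attach() != 0)
            return 1;

        memset(src, 0x5A, len);
        struct accel_job job = { src, dst, len };
        accel_submit(&job);            /* pointers handed over as-is */

        printf("offload %s\n", memcmp(src, dst, len) == 0 ? "ok" : "failed");
        free(src);
        free(dst);
        return 0;
    }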

Acceleration Paradigms with Great Performance

OpenCAPI is ideal for acceleration due to the bandwidth to and from accelerators, best-of-breed latency, and the flexibility of an open architecture. In each paradigm a DLx/TLx-equipped accelerator attaches to the processor chip.
- Memory Transform (basic work offload): machine or deep learning such as natural language processing, sentiment analysis, or other actionable intelligence using OpenCAPI-attached memory
- Egress Transform: encryption, compression, erasure coding prior to delivering data to the network or storage
- Ingress Transform: video analytics, network security, deep packet inspection, data-plane acceleration, video encoding (H.265), high-frequency trading, etc.
- Needle-in-a-Haystack Engine: database searches, joins, intersections, merges; only the needles from the large haystack of data are sent to the processor (sketched in the example that follows)
- Bi-Directional Transform: NoSQL such as Neo4j with graph node traversals, etc.
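As a concrete illustration of the needle-in-a-haystack paradigm, here is a minimal sketch with the accelerated function emulated in plain C so it runs anywhere; struct record and haystack_scan() are illustrative names, not part of any OpenCAPI API. The point is the data reduction: only the matching records cross the link back to the host.

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    struct record { uint64_t key; uint64_t payload; };

    /* Emulates what the accelerated function would do in FPGA logic:
     * scan the haystack held in device-attached memory and keep matches. */
    static size_t haystack_scan(const struct record *haystack, size_t n,
                                uint64_t wanted_key,
                                struct record *needles, size_t max_out)
    {
        size_t found = 0;
        for (size_t i = 0; i < n && found < max_out; i++)
            if (haystack[i].key == wanted_key)
                needles[found++] = haystack[i];
        return found;
    }

    int main(void)
    {
        size_t n = 1 << 20;                         /* ~16 MB haystack  */
        struct record *haystack = calloc(n, sizeof *haystack);
        struct record needles[64];
        if (!haystack)
            return 1;

        haystack[12345].key = 42;                   /* plant one needle */

        size_t hits = haystack_scan(haystack, n, 42, needles, 64);
        printf("%zu needle(s) returned; only %zu bytes crossed the link\n",
               hits, hits * sizeof(struct record));

        free(haystack);
        return 0;
    }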

Comparison of Memory Paradigms
- Common physical interface between non-memory and memory devices
- The OpenCAPI protocol was architected to minimize latency; excellent for classic DRAM memory
- Extreme bandwidth beyond the classical DDR memory interface
- Agnostic interface will handle evolving memory technologies in the future (e.g., compute-in-memory)
- Ability to use a memory buffer to decouple the raw memory from the host interface, optimizing power, cost, and performance

Main Memory (basic DDR4/5 attach, OpenCAPI 3.1 architecture): an ultra-low-latency ASIC buffer chip adds only about +5 ns on top of a native DDR direct connect.

Emerging Storage Class Memory: Storage Class Memories have the potential to be the next disruptive technology; examples include ReRAM, MRAM, and Z-NAND, all racing to become the de facto standard.

Tiered Memory: Storage Class Memory tiered with traditional DDR memory, all built upon the OpenCAPI 3.1 and 3.0 architectures, while retaining the ability to use load/store semantics (see the sketch below).
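To make the load/store point concrete, here is a minimal sketch that assumes the OpenCAPI-attached (home agent) memory is exposed to Linux as an additional NUMA node; that exposure is an assumption about the platform, not something stated on the slide. Once a buffer is placed on that node, software touches it with ordinary loads and stores, with no driver call per access.

    /* Build with: cc sketch.c -lnuma   (requires libnuma headers) */
    #include <numa.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        if (numa_available() < 0) {
            fprintf(stderr, "no NUMA support on this system\n");
            return 1;
        }

        /* Pick the highest-numbered node as a stand-in; on a real system you
         * would identify the OpenCAPI/SCM-backed node explicitly. */
        int node = numa_max_node();

        size_t len = 64 << 20;
        char *buf = numa_alloc_onnode(len, node);
        if (!buf)
            return 1;

        memset(buf, 0xA5, len);          /* plain stores */
        volatile char x = buf[len - 1];  /* plain load   */
        (void)x;

        numa_free(buf, len);
        return 0;
    }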

CAPI and OpenCAPI Performance (measured DMA bandwidth)

                    CAPI 1.0              CAPI 2.0              OpenCAPI 3.0
                    PCIe Gen3 x8          PCIe Gen4 x8          25 Gb/s x8
                    (measured @ 8 Gb/s)   (measured @ 16 Gb/s)  (measured @ 25 Gb/s)
    128B DMA Read   3.81 GB/s             12.57 GB/s            22.1 GB/s
    128B DMA Write  4.16 GB/s             11.85 GB/s            21.6 GB/s
    256B DMA Read   N/A                   13.94 GB/s            22.1 GB/s
    256B DMA Write  N/A                   14.04 GB/s            22.0 GB/s

CAPI 1.0 was measured on POWER8 (introduced in 2013); CAPI 2.0 and OpenCAPI 3.0 were measured on second-generation POWER9 with Xilinx KU60/VU3P FPGAs. OpenCAPI is an open architecture designed with a clean slate, focused on bandwidth and latency.
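As a rough sanity check on these numbers (assuming "x8" denotes eight lanes at the quoted per-lane signaling rate, and ignoring line encoding and protocol overhead): eight lanes at 25 Gb/s give 200 Gb/s, or 25 GB/s of raw bandwidth per direction, so the measured 22.1 GB/s corresponds to roughly 88% utilization. By comparison, PCIe Gen4 x8 at 16 GT/s carries about 16 GB/s raw, consistent with the roughly 14 GB/s measured for CAPI 2.0.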

Latency Test Results

Latency Test: a simple workload created to simulate communication between the system and an attached FPGA.
1. Copy 512 bytes from the host send buffer to the FPGA.
2. The host waits for a 128-byte cache injection from the FPGA, polling on the last 8 bytes.
3. Reset the last 8 bytes.
4. Go to step 1 and repeat.
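The loop above can be written as host-side C. This is a minimal sketch, not code from the demo: the buffer names and the FPGA mapping are assumptions, and main() merely emulates a single round trip in software so the example compiles and runs without hardware.

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* One round trip per iteration, following the four steps above. */
    static void latency_loop(volatile uint8_t *fpga_rx,      /* 512B FPGA receive buffer (MMIO/window mapping, assumed) */
                             volatile uint64_t *inject_buf,  /* 128B host line the FPGA writes via cache injection, as u64[16] */
                             const uint8_t *send_buf,        /* 512B host send buffer */
                             int iterations)
    {
        for (int i = 0; i < iterations; i++) {
            memcpy((void *)fpga_rx, send_buf, 512);   /* 1. copy 512B host -> FPGA            */
            while (inject_buf[15] == 0)               /* 2. poll the last 8B of the injected line */
                ;
            inject_buf[15] = 0;                       /* 3. reset the last 8 bytes              */
            /* 4. repeat; timestamping each iteration (not shown) yields the round-trip latency */
        }
    }

    int main(void)
    {
        /* Software stand-in so the sketch runs without hardware: plain host
         * buffers replace the FPGA mapping, and the injection flag is pre-set
         * so the single emulated round trip completes immediately.
         * On real hardware the FPGA, not the host, sets inject_buf[15]. */
        static uint8_t  fpga_rx[512];
        static uint64_t inject_buf[16];
        static uint8_t  send_buf[512];

        inject_buf[15] = 1;
        latency_loop(fpga_rx, inject_buf, send_buf, 1);
        puts("one emulated round trip completed");
        return 0;
    }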

OpenCAPI-Enabled FPGA Cards: Mellanox Innova-2 Accelerator Card and Alpha Data 9V3 Accelerator Card. [Slide shows a typical eye diagram at 25 Gb/s using these cards.]

Barreleye G2 System Demo: packet classifier demonstration on an actual Barreleye G2 demo system using the Alpha Data 9V3 accelerator card (early classifier bring-up at 20 Gb/s).

Barreleye G2 System Demo: packet classifier demonstration on an actual Barreleye G2 demo system using the Mellanox Innova-2 accelerator card (early classifier bring-up at 20 Gb/s).

OpenCAPI Consortium
- Incorporated September 13, 2016; announced October 14, 2016
- Open forum founded by AMD, Google, IBM, Mellanox, and Micron to manage the OpenCAPI specification, establish enablement, and grow the ecosystem
- Currently over 35 members
- Consortium now established:
  - Board of Directors (AMD, Google, IBM, Mellanox Technologies, Micron, NVIDIA, Western Digital, Xilinx)
  - Governing documents (bylaws, IPR policy, membership) with established membership levels
  - Website: www.opencapi.org
  - Technical Steering Committee with an established work group process; Marketing/Communications Committee
  - Work groups: TL Specification, DL Specification, PHY Signaling, PHY Mechanical, Compliance, and Enablement; additional work groups being created include Memory, Software, Accelerator, and more
  - OpenCAPI specification available on the website; it was contributed to the consortium as the starting point for the work groups
  - Design enablement available today (reference designs, documentation, simulation environment, exercisers, etc.)

OpenCAPI Design Enablement

    Item                                                                      Availability
    OpenCAPI 3.0 TLx and DLx Reference Xilinx FPGA Designs (RTL and Specs)    Today
    Xilinx Vivado Project Build with Memcopy Exerciser                        Today
    Device Discovery and Configuration Specification and RTL                  Today
    AFU Interface Specification                                               Today
    Reference Card Design Enablement Specification                            2Q18
    25Gbps PHY Signal Specification                                           Today
    25Gbps PHY Mechanical Specification                                       Today
    OpenCAPI Simulation Environment (OCSE) Tech Preview                       Today
    Memcopy and Memory Home Agent Exercisers                                  Today
    Reference Driver                                                          2Q18

Membership Entitlement Details
- Strategic level ($25K): draft and final specifications and enablement; license for product development; work group participation and voting; TSC participation; vote on new Board members; nominate and/or run for officer election; prominent listing in appropriate materials
- Contributor level ($15K): draft and final specifications and enablement; license for product development; work group participation and voting; TSC participation; submit proposals
- Observing level ($5K): final specifications and enablement; license for product development
- Academic and Non-Profit level (free): final specifications and enablement; work group participation and voting

Current Members. [Slide shows member logos grouped by Strategic, Contributor, Observing, and Academic membership levels.]

Cross-Industry Collaboration and Innovation: research and academic, SW deployment, systems and software, accelerator solutions, SoC, OpenCAPI protocol, products and services. Welcoming new members in all areas of the ecosystem.

OpenCAPI Consortium Next Steps: JOIN TODAY! www.opencapi.org. Come see us at the OpenCAPI booth in the Exhibit Hall.