Achieving the performance benefits of Infiniband in Java
Mark Falco
Oracle Coherence Development
Copyright 2011, Oracle and/or its affiliates. All rights reserved.

The following is intended to outline general product use and direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle.

Exalogic / Exabus

Exalogic - Hardware
- 24 cores
- 96GB RAM
- 30 compute nodes in a full rack
- QDR Infiniband

Infiniband
- High throughput (~32 Gb/s in QDR)
- Low latency (~1 us)
- Super Jumbo Frames (MTU 64KB)
- Supports standard IP stack (UDP/TCP)
- Verbs based API
- Remote Direct Memory Access (RDMA)
  - pre-registered memory accessible to remote machines
  - operates without involving host CPU

Exabus - Exalogic I/O and Network Design
Eliminates cloud, cluster and network virtualization I/O bottlenecks

[Figure: Exalogic X2-2 network diagram. Labels: Ethernet Gateway Switches; Spine Switch (IB); Data Center Service Network (10GbE); Standard Oracle Database; Data Center Mgmt Network (GbE); Management Switch; Management Network (GbE); Exabus (InfiniBand I/O Backplane); Compute Nodes; ZFS Storage; Exadata; Exalogic; SPARC SuperCluster.]

Exabus - Optimizations
- Direct Memory I/O for Java
- New Java APIs and Exalogic Elastic Cloud Software
  - Low Latency Java support for Infiniband
  - Optimized implementation for Exalogic Infiniband
- Surfacing low-level advanced networking capabilities

Infiniband - Sockets Direct Protocol (SDP)
- Streaming sockets API, i.e. SOCK_STREAM
- Easily integrated into TCP based applications
- zero-copy or kernel-bypass
- Java availability
  - Proprietary in JDK6
  - Standard in JDK7

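Because SDP exposes the ordinary streaming-sockets API, unmodified java.net code can in principle run over it. The sketch below is a plain TCP echo over loopback; the class name EchoOverStreamSocket is made up for illustration. In JDK 7, SDP is selected outside the code, through a rules file named by the com.sun.sdp.conf system property, so on a host without Infiniband this example simply runs over TCP.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical example class: nothing here is SDP-specific, which is the point.
class EchoOverStreamSocket {

    /** Sends one message to a local echo server and returns the echoed reply. */
    static String roundTrip(String message) throws Exception {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            // Server side: accept one connection and echo one UTF string back.
            Thread echo = new Thread(() -> {
                try (Socket peer = server.accept();
                     DataInputStream in = new DataInputStream(peer.getInputStream());
                     DataOutputStream out = new DataOutputStream(peer.getOutputStream())) {
                    out.writeUTF(in.readUTF());
                    out.flush();
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            });
            echo.start();

            // Client side: ordinary stream socket, same code for TCP or SDP.
            try (Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
                 DataOutputStream out = new DataOutputStream(client.getOutputStream());
                 DataInputStream in = new DataInputStream(client.getInputStream())) {
                out.writeUTF(message);
                out.flush();
                String reply = in.readUTF();
                echo.join();
                return reply;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("ping"));
    }
}
```

To try it over SDP on a JDK 7 or later host with Infiniband, the JVM would be started with -Dcom.sun.sdp.conf pointing at a rules file containing bind and connect entries for the Infiniband addresses; the addresses themselves depend on the deployment.
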
Infiniband - Coherence Integration
- Initially attempted over standard UDP
- Experimented with TCP/SDP
- Required many co-located nodes to utilize bandwidth
  - Dozens in order to max out HCA
- Latencies
  - Large objects: benefit from Infiniband without protocol change
  - Small objects: on-par with standard ethernet (300-600us)

MessageBus (Exabus)
- Binary low-level message transport
- Multi-point addressing
- Reliable ordered delivery
- Asynchronous event based programming model
- Pluggable provider based framework
  - SocketBus (TCP/SDP)
  - Native RDMA

Exabus - Next-generation of Exalogic performance optimization

[Figure: Exabus software stack. Applications: Coherence; WebLogic; Tuxedo; any Linux or Solaris app. Transports: IB Transport APIs; SDP; Native RDMA; EoIB; TCP/IP; IPoIB. Base: InfiniBand Core, Hardware and Firmware. Legend: New for Exalogic V1.1; Exalogic V1.0.]

MessageBus - API

    // The interface name is missing from the slide text; "MessageBus" is assumed.
    public interface MessageBus {
        // Register the collector that receives all bus events.
        void setEventCollector(Collector<Event> collector);

        // Bus lifecycle.
        void open();
        void close();

        // Per-peer connection lifecycle.
        void connect(EndPoint peer);
        void disconnect(EndPoint peer);
        void release(EndPoint peer);

        // Messaging.
        void flush();
        void send(EndPoint peer, BufferSequence buf, Object receipt);
    }

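A minimal sketch of how the calls above might be driven, using stand-in types so that it runs on its own. Everything here is hypothetical: LoopbackBus, the trimmed Event class, String peers in place of EndPoint, and ByteBuffer in place of BufferSequence are illustrative substitutes, not the Exabus classes. The toy bus delivers each message back to the sender's own collector, and it assumes that flush() is what pushes out queued sends.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class LoopbackBusDemo {
    // Hypothetical stand-ins for the types named on the slide.
    enum Type { OPEN, CLOSE, CONNECT, MESSAGE, RECEIPT }

    static final class Event {
        final Type type;
        final String peer;
        final Object content;

        Event(Type type, String peer, Object content) {
            this.type = type;
            this.peer = peer;
            this.content = content;
        }
    }

    interface Collector<T> {
        void add(T value);
    }

    /** Toy bus: every sent message comes back to the sender's own collector. */
    static final class LoopbackBus {
        private Collector<Event> collector;
        private final List<Event> pending = new ArrayList<>();

        void setEventCollector(Collector<Event> collector) { this.collector = collector; }
        void open() { collector.add(new Event(Type.OPEN, null, null)); }
        void connect(String peer) { collector.add(new Event(Type.CONNECT, peer, null)); }

        // Asynchronous send: nothing is emitted until flush() (an assumption).
        void send(String peer, ByteBuffer buf, Object receipt) {
            pending.add(new Event(Type.MESSAGE, peer, buf.asReadOnlyBuffer()));
            pending.add(new Event(Type.RECEIPT, peer, receipt));
        }

        void flush() {
            for (Event e : pending) {
                collector.add(e);
            }
            pending.clear();
        }

        void close() { collector.add(new Event(Type.CLOSE, null, null)); }
    }

    /** Runs one open/connect/send/flush/close cycle and returns the event log. */
    static List<String> run() {
        List<String> log = new ArrayList<>();
        LoopbackBus bus = new LoopbackBus();
        bus.setEventCollector(e -> log.add(e.type + (e.peer == null ? "" : " " + e.peer)));

        bus.open();
        bus.connect("peer-1");
        ByteBuffer body = ByteBuffer.wrap("hello".getBytes(StandardCharsets.UTF_8));
        bus.send("peer-1", body, "receipt-1");
        bus.flush();
        bus.close();
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

The receipt object passed to send() is handed back with the RECEIPT event, which is how a caller can correlate a delivery confirmation with the message it sent.
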
MessageBus - Events

Event               Indicates
OPEN                Start of bus event stream
CLOSE               End of bus event stream
CONNECT             Start of per-connection event stream
DISCONNECT          End of confirmed delivery per-connection event stream
RELEASE             End of per-connection event stream
MESSAGE             Local message delivery
RECEIPT             Message delivery confirmation
BACKLOG_EXCESSIVE   Start of backlog condition
BACKLOG_NORMAL      End of backlog condition

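One way an application might react to these events is to keep a small piece of per-peer state and consult it before sending. The sketch below is an interpretation, not the Coherence implementation: BacklogTracker and its methods are hypothetical names, peers are plain strings, and it assumes that backlog events are reported per peer.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: tracks which peers are connected and not backlogged.
class BacklogTracker {
    enum EventType {
        OPEN, CLOSE, CONNECT, DISCONNECT, RELEASE,
        MESSAGE, RECEIPT, BACKLOG_EXCESSIVE, BACKLOG_NORMAL
    }

    private final Set<String> connected = new HashSet<>();
    private final Set<String> backlogged = new HashSet<>();

    /** Updates per-peer state from one bus event. */
    void onEvent(EventType type, String peer) {
        switch (type) {
            case CONNECT:            // start of the per-connection stream
                connected.add(peer);
                break;
            case DISCONNECT:         // delivery is no longer confirmed
                connected.remove(peer);
                break;
            case RELEASE:            // end of the per-connection stream
                connected.remove(peer);
                backlogged.remove(peer);
                break;
            case BACKLOG_EXCESSIVE:  // sender should pause
                backlogged.add(peer);
                break;
            case BACKLOG_NORMAL:     // sender may resume
                backlogged.remove(peer);
                break;
            default:                 // OPEN, CLOSE, MESSAGE, RECEIPT: no state change here
                break;
        }
    }

    /** True when the peer is connected and not in a backlog condition. */
    boolean maySend(String peer) {
        return connected.contains(peer) && !backlogged.contains(peer);
    }

    public static void main(String[] args) {
        BacklogTracker tracker = new BacklogTracker();
        tracker.onEvent(EventType.CONNECT, "peer-1");
        tracker.onEvent(EventType.BACKLOG_EXCESSIVE, "peer-1");
        System.out.println("may send during backlog: " + tracker.maySend("peer-1"));
        tracker.onEvent(EventType.BACKLOG_NORMAL, "peer-1");
        System.out.println("may send after backlog: " + tracker.maySend("peer-1"));
    }
}
```

The tracker is not thread-safe; a real collector may be called from transport threads and would need synchronization or concurrent sets.
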
MessageBus - Native RDMA
- Zero-copy and kernel-bypass
- Optimized for sender latency
- Predictive notifications avoid costly interrupts
- Asynchronous task based system manages protocol
- Custom DirectByteBuffer
  - allows for zero-copy
  - reduces GC pressure

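The custom DirectByteBuffer itself is not shown in the slides. As a rough illustration of why off-heap buffers help, the sketch below pools standard buffers from ByteBuffer.allocateDirect so that the same native memory is reused across messages instead of creating new garbage for each one. DirectBufferPool is a hypothetical name, and this uses only the stock JDK class, not the custom implementation the slide refers to.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;

// Hypothetical pool of fixed-size direct (off-heap) buffers. Not thread-safe.
class DirectBufferPool {
    private final ArrayDeque<ByteBuffer> free = new ArrayDeque<>();
    private final int bufferSize;

    DirectBufferPool(int bufferSize) {
        this.bufferSize = bufferSize;
    }

    /** Returns a pooled buffer if one is free, otherwise allocates a new direct buffer. */
    ByteBuffer acquire() {
        ByteBuffer buf = free.poll();
        return buf != null ? buf : ByteBuffer.allocateDirect(bufferSize);
    }

    /** Resets position and limit (contents are not zeroed) and returns the buffer to the pool. */
    void release(ByteBuffer buf) {
        buf.clear();
        free.push(buf);
    }

    public static void main(String[] args) {
        DirectBufferPool pool = new DirectBufferPool(64 * 1024);
        ByteBuffer first = pool.acquire();
        first.putInt(42);
        pool.release(first);
        ByteBuffer second = pool.acquire();
        System.out.println("direct: " + second.isDirect() + ", reused: " + (first == second));
    }
}
```

Direct buffers live outside the Java heap, so native I/O code can read and write them in place, and reusing them keeps allocation off the hot path.
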
Message Transfer - Native RDMA

[Figure: message exchange between Sender and Receiver. Labels: RDMA Write Header; Ring Buffer; Allocation; Message; RDMA Read Body; Message Ring Buffer; RDMA Write Receipt; Delivery to Collector (on both sides).]

MessageBus - Coherence Integration
- Pluggable message transport per service
  - Legacy system utilized a single transport for entire JVM
- Increased Parallel Processing
  - Network I/O
  - Message Deserialization
  - Message Delivery
- Java context switches: 1 vs. 3
  - Potential for zero context switches

MessageBus - Coherence Integration

[Figure: three cluster members connected over Exabus RDMA. Each member runs the same three services: PartitionedCache Service (Cache: A, B, C), PartitionedCache Service (Cache: D, E, F), and InvocationService. Endpoint addresses shown in the figure:]

Member 1                   Member 2                   Member 3
tmb://192.168.1.1:8000.1   tmb://192.168.1.2:8000.2   tmb://192.168.1.2:8000.3
tmb://192.168.1.1:8001.1   tmb://192.168.1.2:8001.2   tmb://192.168.1.2:8001.3
tmb://192.168.1.1:8002.1   tmb://192.168.1.2:8002.2   tmb://192.168.1.2:8002.3

MessageBus - Coherence Integration
- The network is no longer the bottleneck
- Measured Improvements
  - small number of nodes can max out HCA
  - latencies reduced to ~100us RDMA Bus, ~200us SocketBus
- Future direction
  - more MessageBuses per service
  - prototyped solution drops latency down to 70us
  - designs to drop latency to 40us

Q&A