A RESTful Java Framework for Asynchronous High-Speed Ingest
Pablo Silberkasten, Jean De Lavarene, Kuassi Mensah
JDBC Product Development
October 5, 2017
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle.
Program Agenda
1. What is an HTTP/REST High-Speed Ingestor?
2. Common issues when ingesting data
3. Our approach to address those issues
4. Coding and implementing this approach
5. Demo architecture and execution
What is a REST Ingestor?
- A separate channel (HTTP/REST) to stream data into the database.
- It receives data continuously, concurrently (>10k concurrent clients), and at high traffic volumes, to be inserted into the database.
- It exposes REST APIs:
  - operational (stream data)
  - administrative (CRUD operations over the destination)
Common issues when ingesting data
Resource contention and transparency
- The number of connections/sessions becomes a performance issue and a bottleneck for scaling.
- Regular JDBC SQL inserts do not scale.
- Clients block waiting for a response.
- The following should be resolved transparently, with no changes on the client:
  - Routing
  - High availability
  - Disaster recovery
  - Planned maintenance and changes in the database topology
  - Database upgrades (new features; version translation is not trivial)
Our approach
- Decouple clients from the database using a proxy-like design pattern.
- Multiplex (reuse) connections.
- Work asynchronously (do not block for a response).
- Be able to scale: collect objects on thousands of sockets using just a handful of threads, based on the Java NIO Selector API and state machine algorithms.
- Build batch insert blocks.
- Write to the database asynchronously, without blocking.
- Be able to route inserts based on attributes of the objects and the state of the database nodes.
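The "handful of threads over thousands of sockets" idea can be sketched with the standard java.nio Selector API. This is a minimal illustration, not the framework's own code: the class name SelectorLoopSketch, the newline-terminated framing, and the "ACK" reply are assumptions made for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

// Hypothetical sketch: one thread serves every connected socket.
class SelectorLoopSketch implements Runnable {
    private final Selector selector;
    private final ServerSocketChannel server;
    private final int port;
    private volatile boolean running = true;

    SelectorLoopSketch(int requestedPort) throws IOException {
        selector = Selector.open();
        server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", requestedPort));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);
        port = ((InetSocketAddress) server.getLocalAddress()).getPort();
    }

    int port() { return port; }

    // Asks the loop to stop; the loop thread releases all channels itself.
    void stop() { running = false; selector.wakeup(); }

    @Override
    public void run() {
        try {
            while (running) {
                selector.select(); // sleeps until a channel is ready or wakeup() is called
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove();
                    try {
                        if (!key.isValid()) continue;
                        if (key.isAcceptable()) accept();
                        else if (key.isReadable()) read(key);
                    } catch (IOException e) {
                        closeQuietly(key); // one bad client must not stop the loop
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            for (SelectionKey key : selector.keys()) closeQuietly(key);
            try { selector.close(); } catch (IOException ignored) { }
        }
    }

    private void accept() throws IOException {
        SocketChannel client = server.accept();
        if (client == null) return;
        client.configureBlocking(false);
        // The attachment is this client's private read buffer.
        client.register(selector, SelectionKey.OP_READ, ByteBuffer.allocate(1024));
    }

    private void read(SelectionKey key) throws IOException {
        SocketChannel client = (SocketChannel) key.channel();
        ByteBuffer buffer = (ByteBuffer) key.attachment();
        int n = client.read(buffer); // non-blocking: returns what is available
        if (n < 0) { closeQuietly(key); return; }
        if (buffer.position() > 0 && buffer.get(buffer.position() - 1) == '\n') {
            buffer.clear();
            // Toy acknowledgement; a real server would also handle partial writes.
            client.write(ByteBuffer.wrap("ACK\n".getBytes(StandardCharsets.US_ASCII)));
        } else if (!buffer.hasRemaining()) {
            closeQuietly(key); // request larger than this sketch supports
        }
    }

    private static void closeQuietly(SelectionKey key) {
        key.cancel();
        try { key.channel().close(); } catch (IOException ignored) { }
    }
}
```

Running one instance on a single thread is enough to serve many clients at once, because no call in the loop waits on any individual socket.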
The code challenge
- Use just a handful of threads, yet read/write over thousands of sockets.
- Expose a non-blocking, lightweight, stand-alone interface (and build it!) using standard protocols.
- Protect the infrastructure through flow control (backpressure).
- Optimize the persistence of data by sending prepared messages to the database without blocking.
- Provide high availability.
- Provide scalability: grow elastically with demand.
- Listen for events on the database.
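Flow control between the service threads and the database workers can be sketched with a bounded queue. This is only one possible reading of the backpressure point above: the class name FlowControlSketch and the choice to answer HTTP 503 when the queue is full are assumptions for illustration, not something the slides specify.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: a bounded hand-off queue that never blocks the caller.
class FlowControlSketch<T> {
    static final int HTTP_OK = 200;
    static final int HTTP_UNAVAILABLE = 503; // assumed overload response

    private final BlockingQueue<T> pending;

    FlowControlSketch(int capacity) {
        pending = new ArrayBlockingQueue<>(capacity);
    }

    // Called by a service thread. offer() returns immediately, so the
    // I/O thread is never parked waiting for the database side.
    int submit(T record) {
        return pending.offer(record) ? HTTP_OK : HTTP_UNAVAILABLE;
    }

    // Called by a database worker; returns null when nothing is queued.
    T poll() {
        return pending.poll();
    }

    int backlog() {
        return pending.size();
    }
}
```

The capacity bounds how much unpersisted data the collector holds in memory; once it is reached, clients are told to back off instead of the backlog growing without limit.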
A handful of threads providing I/O over thousands of open sockets

[Diagram: REST Request -> Ack. -> Persist; state machine algorithms with partially read requests]

    // Main class starting a server socket
    // EVERYTHING IS EVENT DRIVEN!!
    RESTServer server = new RESTServer();
    server.startServerChannel(new AcceptEventListener() {
        @Override
        public void onAccept(SocketChannel channel) {
            // Set the incoming channel to non-blocking mode
            channel.configureBlocking(false);
            // Associate a listener for the business logic and
            // register it on a pool of threads where the logic will
            // be executed (tens of thousands of listeners on a pool of threads)
            server.registerListener(channel, new IOEventListener() {
                @Override
                public void onRead() throws IOException {
                    // Retrieve the buffer associated with this
                    // channel and resume reading until the end
                    ByteBuffer buffer = channels.get(channel);
                    int bytesRead = channel.read(buffer); // won't block!
                    if (reachEndOfRequest) {
                        // When the reading is over, send the ack
                        // (HTTP 200) immediately; processing is async.
                    }
                    // Otherwise return and resume on the next read event
                }
            });
        }
    });
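Because a non-blocking read may deliver only part of a request, each channel needs a small state machine that remembers how far it got. The following is a minimal sketch of that idea under simplifying assumptions: the class name PartialRequestReader is hypothetical, only a Content-Length framed body is handled, and the whole request is buffered in memory.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: accumulates chunks until a full HTTP request has arrived.
class PartialRequestReader {
    enum State { HEADERS, BODY, COMPLETE }

    private static final byte[] HEADER_END = "\r\n\r\n".getBytes(StandardCharsets.US_ASCII);

    private final ByteArrayOutputStream received = new ByteArrayOutputStream();
    private State state = State.HEADERS;
    private int bodyStart;
    private int contentLength;

    // Feed whatever the last read returned; returns the state reached so far.
    State feed(ByteBuffer chunk) {
        while (chunk.hasRemaining()) {
            received.write(chunk.get());
        }
        byte[] all = received.toByteArray(); // simple, not optimized
        if (state == State.HEADERS) {
            int end = indexOf(all, HEADER_END);
            if (end >= 0) {
                bodyStart = end + HEADER_END.length;
                contentLength = parseContentLength(
                        new String(all, 0, end, StandardCharsets.US_ASCII));
                state = State.BODY;
            }
        }
        if (state == State.BODY && all.length - bodyStart >= contentLength) {
            state = State.COMPLETE;
        }
        return state;
    }

    String body() {
        if (state != State.COMPLETE) {
            throw new IllegalStateException("request not fully read yet");
        }
        return new String(received.toByteArray(), bodyStart, contentLength,
                StandardCharsets.UTF_8);
    }

    // Returns 0 when the header is absent (request without a body).
    private static int parseContentLength(String headers) {
        for (String line : headers.split("\r\n")) {
            if (line.regionMatches(true, 0, "Content-Length:", 0, 15)) {
                return Integer.parseInt(line.substring(15).trim());
            }
        }
        return 0;
    }

    private static int indexOf(byte[] data, byte[] pattern) {
        outer:
        for (int i = 0; i + pattern.length <= data.length; i++) {
            for (int j = 0; j < pattern.length; j++) {
                if (data[i + j] != pattern[j]) continue outer;
            }
            return i;
        }
        return -1;
    }
}
```

A read handler would keep one such reader per channel, call feed after every read, and send the acknowledgement only when COMPLETE is returned.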
Non-blocking, lightweight, stand-alone interface using standard protocols (1/2)

    // RESTServer's method to start a ServerSocketChannel
    public ServerSocketChannel startServerChannel(
            final AcceptEventListener listener, final int port) throws IOException {
        // Create the server channel
        ServerSocketChannel serverSocketChannel = ServerSocketChannel.open();
        serverSocketChannel
            .bind(new InetSocketAddress(port))
            .configureBlocking(false); // <--!!
        // Register in SelectionServices, which is
        // the same pool as the IOEventListener
        SelectionServices.getDefaultService().register(serverSocketChannel,
            new AcceptEventHandler(serverSocketChannel, listener));
        return serverSocketChannel;
    }
Non-blocking, lightweight, stand-alone interface using standard protocols (2/2)

[Diagram: Admin API alongside the REST Request -> Ack. -> Persist flow, with an "On Error/Event" callback arrow]

Cloud-based API for both administrative and operational usage.
- Operational: putRecord, putRecordBatch
- Admin: addStorage, listStorage, removeStorage, updateStorage
Storage updates can take lambdas to act upon events (including errors).
Optimize the persistence of data by sending prepared messages to the database without blocking (1/2)

    ...
    if (reachEndOfRequest) {
        // Before sending the ack to the client, build the object
        // TODO Optimize, we might not need to unmarshal
        MyObject myObject = MyObject.fromJSonByteBufferJSon(buffer);
        // Queue on the database workers
        databaseWorkers.queue(myObject);
        // Send the response
        bufferResponse.rewind();
        channel.write(bufferResponse);
        // Clear the buffer
        buffer.clear();
    }
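On the other side of that queue, a database worker can group whatever has accumulated into one batch insert. The sketch below shows one way this might look; the class name BatchingWorkerSketch, the events table, and its payload column are hypothetical, while the JDBC calls used (prepareStatement, addBatch, executeBatch) are standard java.sql API.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical sketch: drains queued records and flushes them in batches.
class BatchingWorkerSketch<T> {
    private final BlockingQueue<T> queue;
    private final int maxBatch;
    private final Consumer<List<T>> sink;

    BatchingWorkerSketch(BlockingQueue<T> queue, int maxBatch, Consumer<List<T>> sink) {
        this.queue = queue;
        this.maxBatch = maxBatch;
        this.sink = sink;
    }

    // Waits up to the timeout for a first record, then takes whatever else
    // is already queued (up to maxBatch) and flushes it as a single batch.
    // Returns the number of records flushed.
    int drainOnce(long timeout, TimeUnit unit) throws InterruptedException {
        T first = queue.poll(timeout, unit);
        if (first == null) {
            return 0;
        }
        List<T> batch = new ArrayList<>(maxBatch);
        batch.add(first);
        queue.drainTo(batch, maxBatch - 1); // does not wait for more records
        sink.accept(batch);
        return batch.size();
    }

    // Example JDBC sink: one round trip for the whole batch.
    // Table and column names are assumptions for illustration.
    static void insertBatch(Connection connection, List<String> payloads) throws SQLException {
        String sql = "INSERT INTO events (payload) VALUES (?)";
        try (PreparedStatement statement = connection.prepareStatement(sql)) {
            for (String payload : payloads) {
                statement.setString(1, payload);
                statement.addBatch();
            }
            statement.executeBatch();
        }
    }
}
```

Each worker thread would own one connection and call drainOnce in a loop, so batch size adapts to load: small batches when traffic is light, full batches under pressure.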
Optimize the persistence of data by sending prepared messages to the database without blocking (2/2)

Two more options to optimize data persistence:
1. Use database-specific technology (going outside the JDBC spec), such as Direct Path in Oracle Database (ready in OCI; we are working on adding this support to the thin driver too).
2. Use the asynchronous driver (see Douglas's presentation).
High Availability and Elasticity
- Redundancy over intercommunicating managed servers (nodes) in a cluster.
- Session state is stored in an external in-memory data grid (optionally Coherence*Web) so that sessions can migrate across nodes.
- Template servers can be started (and stopped) automatically to meet an increase in demand, based on thresholds and template values.
[Diagram: front-end HTTP load balancer]
Database event awareness
UCP/Simplefan is used to receive events from the database and re-route requests accordingly:
- Node Down event: act when a node goes down.
- Service Down event: act when a service goes down.
- Load Advisory event: route requests based on the load of the nodes in the cluster.
[Diagram: FAN events]
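How such events might drive routing can be sketched independently of the actual event source. The class below is purely illustrative: EventAwareRouterSketch and its callback names are hypothetical and do not correspond to the UCP or simplefan API; in a real collector the callbacks would be invoked from whatever listener receives the FAN events.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: keeps a view of available nodes and their load.
class EventAwareRouterSketch {
    // Node name -> last advertised load (percent). Absent means unavailable.
    private final Map<String, Integer> loadByNode = new ConcurrentHashMap<>();

    void onNodeUp(String node) {
        loadByNode.putIfAbsent(node, 0);
    }

    void onNodeDown(String node) {
        loadByNode.remove(node);
    }

    // Load updates for nodes that are not currently up are ignored.
    void onLoadAdvisory(String node, int loadPercent) {
        loadByNode.computeIfPresent(node, (name, previous) -> loadPercent);
    }

    // Picks the least loaded available node; ties are broken by name.
    Optional<String> route() {
        String best = null;
        int bestLoad = Integer.MAX_VALUE;
        for (Map.Entry<String, Integer> entry : loadByNode.entrySet()) {
            int load = entry.getValue();
            if (best == null || load < bestLoad
                    || (load == bestLoad && entry.getKey().compareTo(best) < 0)) {
                best = entry.getKey();
                bestLoad = load;
            }
        }
        return Optional.ofNullable(best);
    }
}
```

A database worker would call route before each batch, so a node reported down stops receiving inserts without any change on the client side.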
The Demo: high-level technical approach (recap)
[Diagram: POST requests reach the Service Layer (four service threads), pass through Flow Control to the DB Worker Layer (a pool of DB worker threads), which performs the DB write.]
The Demo: Execution
JMeter (load simulator) running on 1 Exalogic X4-2 compute node*:
- 10,000 concurrent threads
- 2 start-up time
- 1 between posts
- 1 KB payload
Collector running on 1 Exalogic X4-2 compute node*:
- 4 threads for service I/O
- 10 worker threads to persist to the database (each with a dedicated connection to the database)
Oracle Database 12c running on 1 Exalogic X4-2 compute node*:
- Table columns: EventId, Timestamp Event Generation, Timestamp Event Ingestion, Payload (1 KB)
Results:
- Rate: 10,000 rpc/second
- Avg. response time: ~2-3 ms
- Inserted > 1M rows (~1 GB)
*Exalogic X4-2 compute node: 94.6 GB memory; CPU: 12 cores, 24 threads, 2 sockets, Intel Xeon Processor X5675 3.07 GHz
Demo
Q&A Next Steps