Reliable Distributed Messaging with HornetQ Lin Zhao Software Engineer, Groupon lin@groupon.com
Agenda Introduction MessageBus Design Client API Monitoring Comparison with HornetQ Cluster Future Work
Introduction
Groupon Architecture REST Service Oriented Data streamed through messaging
(Part of) Groupon Architecture Merchant Data Relevance SEO Deals MessageBus Users Orders Getaways Accounting Goods
Problem Statement I want to know when something happens in another world I want to listen to existing events without integration effort Subscriber 1 Message Platform Publisher Subscriber 2 New Subscriber
MessageBus
Requirements Performance Scalability Reliability Data Replication
HornetQ JBoss open-source messaging project. Journal based, protocol agnostic. HornetQ Server JMS Client Journal Network STOMP Client Protocol Manager HornetQ Core Page Store
Architecture STOMP Publisher 1 Load Balancer DRBD Backup 2 DRBD Backup 1 HornetQ 1 STOMP Consumer HornetQ 2
Design A publisher publishes to one broker at a time. The publisher reconnects once every 5 minutes (configurable) so that the load balancer can assign it a new broker to publish to. A broker list, referred to as the consumer list, resides on each broker. This list is a text file that can be modified on the fly. The consumer client reaches a random broker through the load balancer, then discovers the rest of the cluster by reading the consumer list on that broker. The consumer fetches the consumer list once every 5 minutes.
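The consumer-side discovery described above can be sketched roughly as follows. This is a minimal illustration, not the MessageBus client: the slides only say the consumer list is a text file, so the one-`host:port`-per-line format, the fallback port 61613 (a common STOMP port), and the names `parse_consumer_list` and `BrokerDirectory` are all assumptions.

```python
import time

REFRESH_INTERVAL_SECONDS = 5 * 60  # the 5-minute refresh from the slide


def parse_consumer_list(text):
    """Parse a consumer list, assuming one 'host' or 'host:port' per line."""
    brokers = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments (assumed convention)
        host, _, port = line.partition(":")
        brokers.append((host, int(port) if port else 61613))
    return brokers


class BrokerDirectory:
    """Caches the broker list and refetches it once the interval has passed."""

    def __init__(self, fetch_list, now=time.monotonic):
        self._fetch_list = fetch_list  # callable returning the list text
        self._now = now                # injectable clock, useful in tests
        self._brokers = []
        self._fetched_at = None

    def brokers(self):
        stale = (
            self._fetched_at is None
            or self._now() - self._fetched_at >= REFRESH_INTERVAL_SECONDS
        )
        if stale:
            self._brokers = parse_consumer_list(self._fetch_list())
            self._fetched_at = self._now()
        return list(self._brokers)
```

A consumer would call `brokers()` before each connection pass and open a STOMP connection to any broker it is not yet connected to; a broker added to the list is picked up within one refresh interval.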
Guarantees Messages are delivered at least once. A message may be redelivered when the server deems it necessary to uphold the at-least-once guarantee. Messages are replicated in real time to backup machines. When send_safe returns, the message has at least reached the memory of the backup machine. Messages may be delivered out of order.
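Because delivery is at least once and possibly out of order, a consumer generally has to tolerate duplicates. One common way to do that, sketched here under the assumption that every message carries a unique ID (the slides do not say so), is to remember recently seen IDs and skip repeats. `DedupingHandler` is a hypothetical name, not part of the MessageBus client.

```python
from collections import OrderedDict


class DedupingHandler:
    """Wraps a message handler and drops messages whose ID was seen recently."""

    def __init__(self, handler, max_remembered=10000):
        self._handler = handler
        self._seen = OrderedDict()   # message IDs in insertion order
        self._max = max_remembered

    def handle(self, message_id, body):
        """Return True if the message was processed, False if it was a duplicate."""
        if message_id in self._seen:
            return False
        self._handler(body)
        self._seen[message_id] = True
        if len(self._seen) > self._max:
            self._seen.popitem(last=False)  # forget the oldest ID
        return True
```

The memory is bounded, so a duplicate that arrives after its ID has been evicted is processed again; handlers whose effects are naturally idempotent avoid that limitation.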
Interactions STOMP Publisher 1 STOMP Publisher 2 Load Balancer DRBD Backup 2 DRBD Backup 1 HornetQ 1 HornetQ1,HornetQ2 STOMP Consumer HornetQ 2 5 Minutes passed
Adding a New Broker
1. Start the standalone broker.
2. Update the consumer list on each broker to include the new one.
3. Wait 5 minutes (the consumer list refresh interval).
4. Update the load balancer to direct publisher traffic to the new broker.
Adding a New Broker STOMP Publisher 1 STOMP Publisher 2 Load Balancer HornetQ 2 STOMP Consumer DRBD Backup 3 DRBD Backup 2 DRBD Backup 1 HornetQ 1 HornetQ 3
Removing a Broker
1. Take the broker out of rotation on the load balancer.
2. Wait until all messages on this broker are consumed.
3. Remove the broker from the consumer lists of the existing brokers.
4. Retire the broker.
Removing a Broker STOMP Publisher 1 STOMP Publisher 2 Load Balancer HornetQ 2 STOMP Consumer DRBD Backup 3 DRBD Backup 2 DRBD Backup 1 HornetQ 1 HornetQ 3
When a Broker is Down The load balancer detects the downtime through heartbeats and no longer routes publishes to the bad broker. Consumers continue to consume from the good brokers without interruption. The dropped connection is logged. Persisted messages on the bad broker are replicated through DRBD. If the downtime is permanent, data on the backup hosts can be migrated and used on a new host (manual operation needed).
Data Replication DRBD 8.3, Protocol B: in-memory replication guarantee. HornetQ Server DRBD Backup /dev/sdax /dev/sday../hornetq/ journal Primary HornetQ Core../hornetq/ paging Network Protocol Manager../hornetq/ journal Secondary../hornetq/ paging
Data Replication Mbus Client HornetQ Server send(message) DRBD Backup DRBD sync Local write Local write Receipt
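The send sequence above can be modeled with a toy simulation. With DRBD Protocol B, a write counts as complete once the local disk write has finished and the replicated data has reached the peer node, before the peer has written it to disk. The classes below are illustrative stand-ins and not HornetQ or DRBD APIs.

```python
class BackupNode:
    """Toy DRBD secondary: acknowledges on receipt, writes to disk later."""

    def __init__(self):
        self.memory = []  # received but not yet written to disk
        self.disk = []

    def receive(self, record):
        self.memory.append(record)
        return True  # acknowledged before the disk write (Protocol B)

    def flush(self):
        self.disk.extend(self.memory)
        self.memory.clear()


class PrimaryBroker:
    """Toy broker: journals locally, replicates, then returns a receipt."""

    def __init__(self, backup):
        self.journal = []
        self.backup = backup

    def send(self, message):
        self.journal.append(message)          # local write on the primary
        if not self.backup.receive(message):  # DRBD sync to the backup
            raise IOError("backup did not acknowledge the write")
        return "RECEIPT"                      # only now is the client told
```

At the moment the receipt is returned, the message is in the primary's journal and in the backup's memory, which matches the guarantee stated for send_safe.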
Monitoring HornetQ Core 1 HornetQ Core 2 JMX2 HTML JMX2 HTML Aggregator HornetQ Core 3 JMX2 HTML Monitord Agent Metrics Ganglia Visualized Data Nagios Email/Phone alerts Monitord Cluster
Monitoring
Client API Publisher start(config) stop() publish(message) publishsafe(message)
Client API Consumer start(config) stop() receive receive_immediate receive(timeout) (Continued)
Client API Consumer ack acksafe nack keepalive
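To show how the calls listed above might fit together, here is a rough in-memory sketch. Only the method names come from the slides; the behavior is assumed: `publishsafe` is taken to return once the broker has confirmed the message, `receive_immediate` to return `None` when nothing is waiting, and `nack` to put the message back for redelivery. `acksafe` and `keepalive` are left out because the slides do not describe them, and `InMemoryBroker` is a hypothetical stand-in for a real broker connection.

```python
import queue


class InMemoryBroker:
    """Stand-in for a broker; not part of MessageBus."""

    def __init__(self):
        self.messages = queue.Queue()


class Publisher:
    def start(self, config):
        self._broker = config["broker"]

    def stop(self):
        self._broker = None

    def publish(self, message):
        self._broker.messages.put(message)

    def publishsafe(self, message):
        # Assumed semantics: return only after the broker has confirmed.
        self._broker.messages.put(message)
        return True


class Consumer:
    def start(self, config):
        self._broker = config["broker"]
        self._unacked = None

    def stop(self):
        self._broker = None

    def receive(self, timeout=None):
        """Block for up to `timeout` seconds; return None if nothing arrives."""
        try:
            self._unacked = self._broker.messages.get(timeout=timeout)
        except queue.Empty:
            return None
        return self._unacked

    def receive_immediate(self):
        """Return a waiting message, or None without blocking."""
        try:
            self._unacked = self._broker.messages.get_nowait()
        except queue.Empty:
            return None
        return self._unacked

    def ack(self):
        self._unacked = None

    def nack(self):
        # Assumed semantics: hand the message back for redelivery.
        if self._unacked is not None:
            self._broker.messages.put(self._unacked)
            self._unacked = None
```

In this sketch a nacked message goes to the back of the queue, so it is redelivered after messages that were published later, an instance of the out-of-order delivery the guarantees allow.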
Single Node Alternative What if I really want an ordering guarantee? Use a single-broker cluster. Primary DRBD & GFS Master HornetQ Primary Backup HornetQ Cluster Manager (Load Balancer) Publisher Consumer
Single Node Alternative Must fail over automatically to ensure near-zero downtime. The brokers need to be physically close to each other to reduce latency and avoid a performance impact. Mbus Client HornetQ Server send(message) DRBD sync Local write Receipt DRBD Backup
Comparison with HornetQ Cluster
HornetQ Cluster A built-in HornetQ feature. Supports load balancing and redistribution. Most requests take two hosts to handle, which compromises throughput. Publisher HornetQ 1 HornetQ 2 HornetQ 3
Benchmarks
Setup: two HornetQ servers, each with:
CPU: 2 x Intel E5645 (2.4 GHz, 6 cores, Hyper-Threaded, 12 MB cache, 5.86 GT/s QPI)
Memory: 64 GB DDR3-1333
Storage: 4 x 1 TB SATA, 3 Gb/s, 7200 RPM, 64 MB cache, software RAID 10
NIC: 1 Gbps Ethernet
Tests:
Scenario 1: 50 publishers publish 10,000 messages each.
Scenario 2: 50 publishers publish 10,000 messages each while 50 consumers are consuming from the same queue.
Benchmarks 3386 2848 1417 1445
MessageBus Or HornetQ Cluster?
Future Work
Future Work RESTful service. The publisher is in production. The consumer is tricky: the REST service needs to buffer messages and dynamically determine the number of consumers. Real prioritized HornetQ in paging mode. Open sourcing the project.
Contact messagebust-team@groupon.com lin@groupon.com http://engineering.groupon.com