High Availability High Performance Plone
Guido Stevens
guido.stevens@cosent.nl
www.cosent.nl
Social Knowledge Technology
Plone Worldwide
Resilience
Please wave, to improve my speech
Plone as usual
Aspeli: über-buildout for a production Plone server
Regebro: Plone-Buildout-Example
nginx frontend
varnish cache
haproxy balancer
4x plone instance
zeo backend
Plone as usual webserver :80
Plone as usual caching
Plone as usual balancing across Plone instances
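The balancing step can be pictured as a round-robin over the Plone instances that still pass their health checks. The following is a minimal Python sketch of that idea, not HAProxy's implementation; the class name and the instance names are hypothetical.

```python
from itertools import count


class RoundRobinBalancer:
    """Round-robin over backends, skipping any marked down by health checks."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._counter = count()

    def mark_down(self, backend):
        self.healthy.discard(backend)

    def mark_up(self, backend):
        if backend in self.backends:
            self.healthy.add(backend)

    def pick(self):
        # Advance the rotation at most once per backend, so one call never loops forever.
        for _ in range(len(self.backends)):
            candidate = self.backends[next(self._counter) % len(self.backends)]
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy Plone instance left")


# Hypothetical instance names for the "4x plone instance" setup.
balancer = RoundRobinBalancer(["instance1", "instance2", "instance3", "instance4"])
balancer.mark_down("instance2")
print([balancer.pick() for _ in range(6)])
# ['instance1', 'instance3', 'instance4', 'instance1', 'instance3', 'instance4']
```

With instance2 marked down, requests keep flowing to the remaining three instances; a real balancer would also mark it up again once its health check passes.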
Plone as usual Plone instances
Plone as usual ZEO backend
Meet the client
High-profile internet technology NGO
Slashdot traffic levels:
0.4 million page views / peak day
4 million page views / month
40 million hits / month
Mission-critical web presence: 100% uptime for the previous 5 years
Non-Plone sysadmins
High security
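A quick back-of-envelope calculation puts these traffic figures in perspective. This Python sketch only does arithmetic on the numbers above; the 30-day month is an assumption.

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_MONTH = 30  # assumption: a 30-day month

peak_day_page_views = 400_000    # 0.4 million page views on a peak day
monthly_page_views = 4_000_000   # 4 million page views per month
monthly_hits = 40_000_000        # 40 million hits per month

# Mean rate across the peak day; the busiest minutes will be well above this.
peak_day_rate = peak_day_page_views / SECONDS_PER_DAY
average_day_page_views = monthly_page_views / DAYS_PER_MONTH
peak_to_average = peak_day_page_views / average_day_page_views
# Each page view pulls in several resources (CSS, JS, images), each a separate hit.
hits_per_page_view = monthly_hits / monthly_page_views

print(f"peak day: {peak_day_rate:.1f} page views/s on average")
print(f"average day: {average_day_page_views:,.0f} page views")
print(f"peak day is {peak_to_average:.0f}x an average day")
print(f"{hits_per_page_view:.0f} hits per page view")
```

That works out to about 4.6 page views per second averaged over a peak day, roughly 133,000 page views on an average day (so a peak day is about three times busier), and about 10 hits per page view.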
No can do SPOF WTF? SPOF
Architecture Goals
Must convince file-based, 100%-uptime sysadmins
No SPOF: eliminate all Single Points Of Failure
Automated failover: no manual intervention
Extreme performance
Extreme resilience: survive killall -9 Plone
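The "No SPOF" goal can be checked mechanically once the architecture is written down as a graph of which component forwards to which. A minimal Python sketch, assuming a hand-written topology (the component names and the virtual "users" and "data" endpoints are hypothetical): a component is a single point of failure when removing it alone cuts every path from users to data.

```python
from collections import deque


def still_connected(graph, source, target, removed):
    """Breadth-first search from source to target with one node taken out."""
    seen, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in graph.get(node, ()):
            if nxt != removed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def single_points_of_failure(graph, source="users", target="data"):
    """Components whose failure alone cuts every path from source to target."""
    nodes = set(graph) | {n for succs in graph.values() for n in succs}
    return sorted(n for n in nodes - {source, target}
                  if not still_connected(graph, source, target, removed=n))


# Hypothetical topology of the "Plone as usual" stack.
as_usual = {
    "users": ["nginx"], "nginx": ["varnish"], "varnish": ["haproxy"],
    "haproxy": ["plone1", "plone2", "plone3", "plone4"],
    "plone1": ["zeo"], "plone2": ["zeo"], "plone3": ["zeo"], "plone4": ["zeo"],
    "zeo": ["data"],
}
print(single_points_of_failure(as_usual))  # ['haproxy', 'nginx', 'varnish', 'zeo']

# Hypothetical duplicated setup: every tier has a twin.
duplicated = {
    "users": ["lb-a", "lb-b"],
    "lb-a": ["web-a", "web-b"], "lb-b": ["web-a", "web-b"],
    "web-a": ["plone-a"], "web-b": ["plone-b"],
    "plone-a": ["mysql-a", "mysql-b"], "plone-b": ["mysql-a", "mysql-b"],
    "mysql-a": ["data"], "mysql-b": ["data"],
}
print(single_points_of_failure(duplicated))  # []
```

In the usual stack only the four Plone instances are redundant. Note that the model only checks paths: it says nothing about capacity, or about whether failover to the twin actually happens without manual intervention.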
Meet Paul Stevens
My brother
mod_wodan + DBmail
Plone developer
pjstevns on irc/github/etc
NFG Net Facilities Group: premium hosting, 24/7 MySQL HA since the stone age
www.nfg.nl
3-tier
Duplicate setup
Load Balancer
Load Balancer
Client-provided hardware load balancer
Alternative: Linux Virtual Server + HAProxy
2x HAProxy in active/passive config
this would be an EXTRA layer of HAProxy, not shown in diagram
use a highly available virtual IP address
monitor with Heartbeat or comparable
fail over the virtual IP address with arping broadcasts
Alternative: AWS
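The heartbeat-plus-virtual-IP pattern boils down to a small state machine: the passive node takes over the address when it stops hearing from the active one. This is a minimal Python sketch under stated assumptions, not Heartbeat's actual logic: the 3-second dead time, the node names and the class name are hypothetical, and bringing up the address and sending the arping broadcasts is only indicated in a comment.

```python
import math
from dataclasses import dataclass, field


@dataclass
class VipFailover:
    """Active/passive ownership of one virtual IP address by two nodes."""

    active: str
    passive: str
    dead_after: float = 3.0  # hypothetical: seconds of silence before failover
    last_seen: dict = field(default_factory=dict)

    def heartbeat(self, node, now):
        # Record that `node` was alive at time `now` (seconds).
        self.last_seen[node] = now

    def tick(self, now):
        """Return the node that should own the virtual IP at time `now`."""
        active_silent = now - self.last_seen.get(self.active, -math.inf) > self.dead_after
        passive_alive = now - self.last_seen.get(self.passive, -math.inf) <= self.dead_after
        if active_silent and passive_alive:
            # Failover: the passive node takes the VIP. In a real setup it would now
            # configure the address and send gratuitous ARP (e.g. with arping) so the
            # network learns the new MAC address.
            self.active, self.passive = self.passive, self.active
        return self.active


pair = VipFailover("haproxy-a", "haproxy-b")
pair.heartbeat("haproxy-a", 0)
pair.heartbeat("haproxy-b", 0)
print(pair.tick(1))             # haproxy-a
pair.heartbeat("haproxy-b", 4)  # haproxy-a has gone quiet
print(pair.tick(4))             # haproxy-b
```

When haproxy-a comes back, this sketch leaves the address on haproxy-b instead of failing back immediately, which avoids flapping; whether and when to fail back is a policy choice.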
Ensure physical separation
Ensure redundancy across physical servers
no use failing over on the same machine
separate machines in separate data centers
Gotcha: virtual machines being moved around
Disable the HA facilities of the virtualization platform
We'll do our own HA
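Physical separation is easy to lose silently, for example when the virtualization platform migrates a virtual machine. A minimal Python sketch of an inventory check, assuming a hypothetical inventory format of (role, virtual machine, physical host, data center) tuples: it reports every role whose replicas do not span at least two physical hosts and two data centers.

```python
from collections import defaultdict


def placement_problems(placements):
    """Map each badly placed role to a description of the problem."""
    hosts = defaultdict(set)
    sites = defaultdict(set)
    for role, _vm, host, datacenter in placements:
        hosts[role].add(host)
        sites[role].add(datacenter)
    problems = {}
    for role in hosts:
        if len(hosts[role]) < 2:
            problems[role] = "all replicas on one physical host"
        elif len(sites[role]) < 2:
            problems[role] = "all replicas in one data center"
    return problems


# Hypothetical inventory: the platform has migrated db-b onto db-a's blade.
inventory = [
    ("haproxy", "lb-a", "blade1", "dc1"),
    ("haproxy", "lb-b", "blade7", "dc2"),
    ("mysql", "db-a", "blade2", "dc1"),
    ("mysql", "db-b", "blade2", "dc1"),
]
print(placement_problems(inventory))  # {'mysql': 'all replicas on one physical host'}
```

Run from a cronjob against the virtualization platform's inventory, a check like this catches the "moving virtuals around" gotcha before a blade failure does.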
Full cluster
Replacing ZEO
ZEO versus Relstorage
ZEO | Relstorage
ZEO protocol | ZEO protocol
filestorage | MySQL or PostgreSQL
object pickles | object pickles: no alchemy!
ZRS replication: $$$ at the time, later open-sourced | MySQL replication: done that 24/7 since 2001, widely used
No hot failover: slave-to-master reconfig | Hot failover: multi-master
Relstorage on MySQL
Blobstorage
Not shown in diagram
Client-provided NetApp MetroCluster NFS disks
no need to care about replication and HA for those
Alternatives:
DRBD + NFS
AWS Elastic Block Store (EBS)
F-sniper + rsync + NFS
Why not run the database on that?
disk replication + NFS + ZEO: what can possibly go wrong?
Apache + Wodan
mod_wodan
Caching module for Apache, written in C
Originally by ICS for nu.nl
Now maintained by NFG
Stores response body + headers on disk
BOFH attitude to caching policies
Used in anger
Alternative: stxnext.staticdeployment
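To make "store response body + headers on disk" concrete, here is a minimal Python sketch of a persistent one-file-per-URL cache that keeps expired entries as a fallback. It is not mod_wodan's code or on-disk format: the JSON file layout, the SHA-1 file names, the class name and the `fetch` callback are all assumptions for illustration.

```python
import hashlib
import json
import time
from pathlib import Path


class DiskCache:
    """Persistent cache: one file per URL holding response headers and body."""

    def __init__(self, root, ttl):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.ttl = ttl  # seconds an entry counts as fresh

    def _path(self, url):
        # Hypothetical layout: file name is the SHA-1 of the URL.
        return self.root / hashlib.sha1(url.encode("utf-8")).hexdigest()

    def store(self, url, headers, body, now):
        entry = {"stored": now, "headers": headers, "body": body}
        self._path(url).write_text(json.dumps(entry), encoding="utf-8")

    def get(self, url, fetch, now=None):
        """Return (headers, body, source). fetch(url) returns (headers, body) or raises OSError."""
        now = time.time() if now is None else now
        path = self._path(url)
        entry = json.loads(path.read_text(encoding="utf-8")) if path.exists() else None
        if entry is not None and now - entry["stored"] < self.ttl:
            return entry["headers"], entry["body"], "fresh"
        try:
            headers, body = fetch(url)
        except OSError:
            # Backend unreachable: an expired copy beats an error page.
            if entry is not None:
                return entry["headers"], entry["body"], "stale"
            raise
        self.store(url, headers, body, now)
        return headers, body, "backend"
```

Because entries live on disk, a new `DiskCache` over the same directory after a restart starts with a full cache, and an expired page is still served when the backend is down; a page that was never cached still fails.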
Varnish | Wodan
Proxy process | Apache module
RAM memory cache | Persistent disk cache
restart: empty cache | restart: full cache
expired: gone | expired: kept as fallback
Plays nice with request + response headers | BOFH: my way or the highway
etag split-view | single cache file per page
purge API, plone.app.caching | cronjobs for maintenance: crawl sitemap, delete removed pages
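The cronjob maintenance could look roughly like this: read the site's sitemap, refresh every listed page, and delete cache entries for pages that are no longer listed. A minimal Python sketch using only the standard library; the function names are hypothetical, it assumes a sitemap in the sitemaps.org 0.9 format, and it leaves out fetching the sitemap and talking to the cache.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return the set of <loc> URLs in a sitemaps.org-format sitemap."""
    root = ET.fromstring(xml_text)
    return {loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)}


def maintenance_plan(xml_text, cached_urls):
    """Return (pages to re-crawl, cached pages to delete because they left the site)."""
    live = sitemap_urls(xml_text)
    return sorted(live), sorted(set(cached_urls) - live)


sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.org/</loc></url>
  <url><loc>http://www.example.org/news</loc></url>
</urlset>"""
crawl, delete = maintenance_plan(
    sitemap, ["http://www.example.org/", "http://www.example.org/old-page"])
print(crawl)   # ['http://www.example.org/', 'http://www.example.org/news']
print(delete)  # ['http://www.example.org/old-page']
```

Crawling every live page keeps the disk cache full, which is what makes the Wodan-only failure mode possible; deleting removed pages stops the cache from serving content that no longer exists.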
Varnish plus Wodan
Varnish | Wodan
unload Plone | failsafe content delivery
plone.app.caching policies | hard policy config
pages: 1 hour | pages: 1 minute
resources: longer | resources: longer
purge on edit | edits show up after the 1-minute refresh
etag split-view: per-user page versions, cache authenticated | Gotcha: anonymous only, editors bypass Wodan
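The two page lifetimes interact: an edit purges Varnish, but Wodan may keep its copy for up to its own lifetime. A minimal Python sketch of that reasoning for anonymous visitors, assuming Wodan sits in front of Varnish and each cache refills from the one behind it; the function name is hypothetical and fetch latency is ignored.

```python
# Page lifetimes from the slide, in seconds.
PAGE_TTL = {"wodan": 60, "varnish": 3600}


def worst_case_staleness(ttls, purged_on_edit):
    """Upper bound (seconds) on how long an edit can stay invisible to anonymous visitors.

    Caches are stacked, each refilling from the one behind it. A cache purged on
    edit adds nothing; an unpurged cache can pass on a copy that is already up to
    its TTL old, which the caches in front may then keep for their own TTLs.
    """
    return sum(ttl for name, ttl in ttls.items() if name not in purged_on_edit)


print(worst_case_staleness(PAGE_TTL, purged_on_edit={"varnish"}))  # 60
print(worst_case_staleness(PAGE_TTL, purged_on_edit=set()))        # 3660
```

With purge on edit the 1-hour Varnish lifetime drops out and edits surface within about a minute; without it, an edit could stay hidden for up to an hour and a minute.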
Failure Modes
MySQL failover
Multi-Master
MySQL multi-master cross replication
either server can be master
each is a slave of the other
hot failover and failback
Gotcha: use only 1 master at a time
Relstorage is not multi-master
avoid replication errors
mmm_agent server (not shown in diagram)
monitors MySQL health and replication
manages a highly available virtual MySQL IP address
think: Heartbeat for MySQL
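The "only 1 master at a time" rule suggests a writer-selection policy like the following minimal Python sketch. It is not mmm_agent's algorithm: the function name, node names and inputs are hypothetical. The current writer keeps the virtual IP while it is healthy, so there is never more than one writer and no flapping; only when it fails is another healthy node with working replication promoted.

```python
def choose_writer(current, healthy, replicating):
    """Return the node that should hold the writer virtual IP, or None.

    current: node holding the writer IP now (or None)
    healthy: set of nodes passing MySQL health checks
    replicating: set of nodes whose replication is running without errors
    """
    if current in healthy:
        return current  # stick with the current master: one writer at a time
    candidates = sorted(healthy & replicating)
    return candidates[0] if candidates else None


nodes = {"db-a", "db-b"}
print(choose_writer("db-a", nodes, nodes))        # db-a
print(choose_writer("db-a", {"db-b"}, {"db-b"}))  # db-b: failover
print(choose_writer("db-b", nodes, nodes))        # db-b: db-a is back, stays slave
```

Returning None when no healthy node has working replication is a deliberate choice in this sketch: promoting a node with broken replication risks writing on top of missing data.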
Blade failure
Wodan only
Plone as usual file-based content delivery
Readonly Rescue Mode
File-based content delivery
mod_wodan full cache of all pages + resources
cached search results (Subject / tag cloud)
AJAX-driven graceful degradation
detect backend down via a non-cached lightweight view: @@ipaddress, not a full page, so minimal rendering overhead
disable interactive elements via CSS: search bar, personal tools display:none
Gotcha: anonymous only
down for authenticated users until manual reconfig
Gotcha: ErrorDocument
pre-cache a nice page but preserve the HTTP error status code
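The ErrorDocument gotcha is the difference between serving a friendly page and serving it with the right status. The sketch below shows the principle as a WSGI middleware in Python rather than as Apache configuration; the function name and page text are hypothetical. If the backend fails, visitors get the pre-rendered page, but with a 503 status so caches and crawlers do not treat it as real content.

```python
import sys


def rescue_error_page(app, error_page_html, status="503 Service Unavailable"):
    """Wrap a WSGI app: on failure, serve a pre-rendered page but keep an error status."""
    body = error_page_html.encode("utf-8")

    def middleware(environ, start_response):
        try:
            return app(environ, start_response)
        except Exception:
            # Passing exc_info lets start_response replace headers the app may
            # already have set (PEP 3333); a 200 here would get the error page cached.
            start_response(status,
                           [("Content-Type", "text/html; charset=utf-8"),
                            ("Content-Length", str(len(body)))],
                           sys.exc_info())
            return [body]

    return middleware
```

This sketch only catches errors raised while the wrapped app is called; errors raised later, while the response body is being iterated, would need extra handling.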
No-downtime maintenance
cosent.nl/blog