GFS Best Practices and Performance Tuning
Curtis Zinzilieta, Red Hat Global Services
May 2007
GFS Overview
- Highly available, cluster-aware filesystem
- All nodes read and write concurrently through shared SAN storage
- Supports Fibre Channel SAN, iSCSI, SAS, GNBD
- With correct infrastructure, can sustain any single point of failure
- Scales to hundreds of nodes; typical clusters run 2 to 100 nodes
- POSIX compliant: supports POSIX locking, POSIX ACLs, quotas, and extended attributes
- Oracle Certified Cluster Filesystem
  - Requires the GULM lock manager now; DLM is in certification
- Continued enhancements:
  - Root filesystem installations via anaconda
  - Making journals special files and movable
  - Faster statfs performance
GFS Configurations
- Single-node and clustered configurations
- Single node:
    gfs_mkfs -p lock_nolock -j 5 /dev/vg00/mygfs
- Clustered filesystem
  - Runs on clustered LVM2
  - Reconfiguration of a single-node deployment is possible
- Cluster mirroring just released in RHEL4.5, scheduled for RHEL5.2
- Shared storage for virtual client live migration and support
- Setup and configuration tools
  - Conga, web management interface
  - system-config-cluster
GFS Best Practices: Infrastructure and Storage Connection
- Storage connections
  - iSCSI
  - GNBD
  - Fibre Channel
  - SAS
  - All can use multipath (MPIO)
- System setup
  - ntpd
- Active/active performance and redundancy
GFS Best Practices: Infrastructure and Storage Connection
- Network layout
  - Multiple networks
  - NIC bonding
[Diagram: redundant network layout with paired switches (A and B) and a power fence]
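A minimal active-backup bond on RHEL4/5 can be sketched as below; the interface names and addresses are assumptions for illustration, not taken from the presentation.

```
# /etc/modprobe.conf -- load the bonding driver in active-backup mode
# with a 100ms link monitor (mode and interval are example choices)
alias bond0 bonding
options bond0 mode=1 miimon=100

# /etc/sysconfig/network-scripts/ifcfg-bond0 (address is a placeholder)
DEVICE=bond0
IPADDR=192.168.10.11
NETMASK=255.255.255.0
BOOTPROTO=none
ONBOOT=yes

# /etc/sysconfig/network-scripts/ifcfg-eth0 (repeat for eth1)
DEVICE=eth0
MASTER=bond0
SLAVE=yes
BOOTPROTO=none
ONBOOT=yes
```

With the two slaves cabled to different switches (A and B above), either switch can fail without losing the cluster heartbeat network.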
GFS Best Practices: Fencing and Recovery
- Power fencing
- Fabric fencing
- Management fencing, physical and virtual
- SCSI reservations
- IPMI
- Agents are found in /sbin/fence_<methodname>
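Each agent in /sbin can be run by hand to verify that fencing actually works before the cluster has to rely on it. A sketch using the IPMI agent follows; the address and credentials are placeholders:

```shell
# Check that the management controller responds, then test a reboot.
# IP, login, and password below are assumptions for illustration.
fence_ipmilan -a 10.0.0.50 -l admin -p secret -o status
fence_ipmilan -a 10.0.0.50 -l admin -p secret -o reboot
```

The same -o status check applies to most agents and is a quick way to catch bad credentials or firewalled management interfaces before a real failure.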
GFS Best Practices: Quorum
- One vote per node; a majority is quorum and in control
- Two-node clusters
- Qdisk partition, re-added in RHEL4u4 and RHEL5
  - Non-LVM raw partition, 10MB
  - Best on its own LUN for better performance
  - Additional voting and quorum selection:
      <cman expected_votes="1" two_node="0"/>
      <quorumd interval="1" tko="10" votes="3" device="/dev/foo">
          <heuristic program="ping A -c1 -t1" score="1" interval="2"/>
      </quorumd>
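Before the quorumd configuration above can be used, the raw partition has to be labeled with mkqdisk; the device name and label here are assumptions:

```shell
# Label the raw ~10MB qdisk partition once, from any one node
mkqdisk -c /dev/sdc1 -l mycluster_qdisk

# List visible quorum disks to confirm every node sees the same label
mkqdisk -L
```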
GFS Best Practices: Disk Format
- Formatting considerations
  - Block size
  - Journals
  - Volume size
    gfs_mkfs -J 64 -j 5 -t mycluster:mygfs -p lock_dlm /dev/myvg/mygfs
GFS Best Practices: Cluster Locking
- Cluster lock management: GULM and DLM
- lock_nolock for single-node use
- Switch an unmounted filesystem from single-node to clustered locking:
  - Display the superblock:
      gfs_tool sb /dev/vg00/mygfs all
  - Reconfigure the locking protocol:
      gfs_tool sb /dev/vg00/mygfs proto lock_dlm
  - Reconfigure the lock table:
      gfs_tool sb /dev/vg00/mygfs table mycluster:mygfs
GFS Performance Tuning
GFS Performance Tuning
- Performance tuning from two angles:
- Faster failover
  - Quick detection of and recovery from failed nodes or services
- GFS throughput and filesystem performance
  - On-disk format
  - Parameters and tuning: gfs_tool
  - Kernel and other system tuning
GFS Performance Tuning: Configuration File Options, Fast Failover
- Service and heartbeat timeouts, RHEL4:
  - deadnode_timeout: how long until a node is considered dead, due to missed heartbeats or monitoring; defaults to 21 seconds on RHEL4
  - hello_timer: interval between heartbeat messages
  - max_retries: how many times the heartbeat will be retried
      <cman deadnode_timeout="5" hello_timer="21" max_retries="5"/>
  - Changeable on the fly in /proc/cluster/config/cman/
  - Raise deadnode_timeout when capturing cores, using sysrq-t, or other debugging:
      echo 30 > /proc/cluster/config/cman/deadnode_timeout
- post_fail_delay: number of seconds to wait before fencing after a failure is detected
      <fence_daemon post_fail_delay="0"/>
GFS Performance Tuning: Configuration File Options, Fast Failover
- Service and heartbeat timeouts, RHEL5:
  - token: milliseconds until a lost token is declared and the node times out (here 22 seconds)
      <totem consensus="5000" join="90" token="22000"/>
- Raise post_fail_delay when capturing cores, using sysrq-t, or other debugging
  - Edit /etc/cluster/cluster.conf:
      <fence_daemon post_fail_delay="30" post_join_delay="12"/>
GFS Performance Tuning: Configuration File Options, Fast Failover
- Filesystem tuning for faster failover:
    gfs_tool settune /mountpoint recoverd_secs 20
  - Controls dead-machine journal recovery; defaults to every 60 seconds
- Fast fencing
  - Device selection
  - Recovery action
GFS Performance Tuning: Filesystem Performance and Tunables
- gfs_tool:
    gfs_tool gettune /mountpoint
- Locking management: ilimit, ilimit_tries, ilimit_min, scand_secs, inoded_secs, glock_purge, demote_secs
  - glock_purge: new for RHEL4.5, scheduled for 5.1; clears a percentage of unused glocks
      gfs_tool settune /mntpoint glock_purge 50
  - demote_secs: demotes locks into less restrictive states and subsequently flushes cached data to disk
      gfs_tool settune /mntpoint demote_secs 100
  - Decrease inoded_secs from the 15-second default
  - Decrease scand_secs from the 5-second default
GFS Performance Tuning: Filesystem Tunables, continued
- gfs_tool, continued
  - new_files_jdata: when enabled, new files journal full data as well as metadata; leave at 0 to journal metadata only
      gfs_tool settune /mntpoint new_files_jdata 0
  - Set readahead appropriately for the workload:
      gfs_tool settune /mntpoint max_readahead <newvalue>
- Mount with noatime:
    mount -t gfs -o noatime /dev/vg00/mygfs /mountpoint
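To make the noatime mount persistent across reboots, the usual approach is an /etc/fstab entry; the device and mountpoint below follow the example above:

```
/dev/vg00/mygfs  /mountpoint  gfs  noatime  0 0
```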
GFS Performance Tuning: On-Disk Format
- Journals
  - Smaller journals will sometimes make a performance difference
- Application tuning or layout
  - Multiple subdirectories to resolve stat() loads
- statfs changes released for RHEL4.5, scheduled for 5.1. To enable:
    gfs_tool settune /mntpoint statfs_fast 1
GFS Performance Tuning: System Tuning Guidelines
- I/O schedulers: CFQ versus deadline
- Readahead and queue depth:
    /sys/block/<dev>/queue/nr_requests
    /sys/block/<dev>/queue/read_ahead_kb
- Direct I/O
  - Always use direct I/O if the application does its own caching (e.g. databases)
- HBA device tuning
  - QLogic
  - lpfc (Emulex)
  - MPIO
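On RHEL5 the scheduler and queue settings can be changed per device at runtime through sysfs (on RHEL4's 2.6.9 kernel, set the scheduler globally with the elevator= boot parameter instead). A sketch, where the device name and values are illustrative, not recommendations:

```shell
# RHEL5: switch /dev/sdb to the deadline scheduler at runtime
echo deadline > /sys/block/sdb/queue/scheduler

# Deepen the request queue and raise readahead for streaming workloads
echo 512  > /sys/block/sdb/queue/nr_requests
echo 1024 > /sys/block/sdb/queue/read_ahead_kb
```

These settings do not persist across reboots; on RHEL they are typically reapplied from /etc/rc.local or an init script.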
GFS Performance Tuning: Performance Monitoring and Tuning Tools
- Standard Unix tools
  - iostat, vmstat, nfsstat, sar
- Benchmarking tools
  - IOzone
  - Bonnie++
  - Postal, found at http://www.coker.com.au/postal/
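Typical benchmark invocations are sketched below; the file sizes and paths are assumptions, and the working set should be well over installed RAM so the page cache is not all that gets measured:

```shell
# IOzone: sequential write/rewrite (-i 0) and read/reread (-i 1),
# 2GB file with 64KB records, on a GFS mount (path is a placeholder)
iozone -s 2g -r 64k -i 0 -i 1 -f /mnt/gfs/iozone.tmp

# Bonnie++: 4GB working set in a GFS-backed directory,
# dropping privileges to the nobody user
bonnie++ -d /mnt/gfs -s 4096 -u nobody
```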
Where to go from here? References
- Red Hat Training
  - RH436, Enterprise Storage Management
    https://www.redhat.com/training/architect/courses/rh436.html
- This presentation
    http://people.redhat.com/czinzili/docs/
- Mailing lists
  - linux-cluster: http://www.redhat.com/mailman/listinfo/linux-cluster
- Web sites
  - GFS home page: http://www.redhat.com/gfs/
  - Cluster home page: http://sources.redhat.com/cluster/
  - Conga home page: http://sources.redhat.com/cluster/conga/index.html
Where to go from here? Other Documentation
- Red Hat docs: http://www.redhat.com/docs/manuals/csgfs/
- Patrick Caulfield's OpenAIS writeup on CMAN interaction:
    http://people.redhat.com/pcaulfie/docs/aiscman.pdf
Questions?
This presentation can be found at http://people.redhat.com/czinzili/docs/