Insight Case Studies Tuning the Beloved DB-Engines Presented By Nithya Koka and Michael Arnold
Who is Nithya Koka? Senior Hadoop Administrator Project Lead Client Engagement On-Call Engineer Cluster Ninja On numerous Insight projects 5+ years in IT - 4 years with Hadoop
Who is Michael Arnold? Principal Systems Engineer Automation geek 20+ years in IT - 9 years with Hadoop I help people deal with: Servers (physical and virtual) Networks Server operating systems Hadoop distributions Making it all run smoothly
Agenda Impala Tuning Case Study HBase Tuning Case Study
Impala Tuning Impala Tuning Case Study
Impala Tuning Case Study: ClientA Impala Woes 1. Impala threads peak and crash the daemon; all queries hang, causing a complete outage for end users. This had been happening for two years, on and off, across multiple support tickets and several tuning attempts, with no trend in the hosts or timeframes where incidents occurred. 2. Impala queries in Hue error out with expired-results messages.
Impala Tuning Initial Insight Evaluation Gotchas Captured: Role layout: overburdened master hosts Running a buggy RHEL kernel (Linux 2.6.32-504.3.3.el6.x86_64) Multiple Java versions Default swappiness Transparent hugepages enabled
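A minimal sketch of the OS-level remediation for the swappiness and transparent-hugepage gotchas, run as root on each host (RHEL 6 assumed; the exact THP path varies by kernel build):

# Lower swappiness (persist the setting in /etc/sysctl.conf as well)
sysctl -w vm.swappiness=1
# Disable transparent hugepages; on RHEL 6 the path may be
# /sys/kernel/mm/redhat_transparent_hugepage instead
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag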
Impala Tuning Impala Threads Typical Incident Pattern
Impala Tuning Impala Threads: Deep Dive 1. Potential disk errors in dmesg output on incident-prone hosts. 2. JVM crashes reported by Impala. 3. A snowballing HDFS file count.
Impala Tuning [Chart: HDFS file count growth, from 750K to 1.15 million files]
Impala Tuning Impala Threads: Deep Dive 1. Disk Errors Without spill directories configured, scratch space defaulted to /tmp/impala-scratch, which was unsuitable for the scale and concurrency. Resolution: Spread the disk spill across the data drives.
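A minimal sketch of the spill configuration; the paths below are illustrative. In Cloudera Manager this is the Impala Daemon Scratch Directories setting; as a raw impalad startup flag it looks like:

--scratch_dirs=/data1/impala/scratch,/data2/impala/scratch,/data3/impala/scratch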
Impala Tuning Impala Threads: Deep Dive 1. Disk Errors Identified a bad RAID controller: three problem disks on a master host holding the RAID10 virtual disk for the NameNode, a RAID1 virtual disk for the JournalNode, and another RAID1 virtual disk for ZooKeeper. Resolution: The host with the bad disks was decommissioned, the disks were replaced, and the host was brought back in a good state. Regular scans have been scheduled with the RAID controller CLI to alert on any future incidents.
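A hypothetical scheduled check of that kind, assuming an LSI MegaRAID controller with MegaCli installed (substitute your controller's CLI and install path); cron entry for root:

# Log drive firmware state and media error counts every hour
0 * * * * /opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL | grep -E 'Firmware state|Media Error Count' >> /var/log/raid-check.log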
Impala Tuning Impala Threads: Deep Dive 2. Impala-Reported JVM Crashes The running OS kernel version is known to cause CDH applications to pause, resulting in the JVM hangs seen in the Impala reports. Resolution: Upgrading the kernel to 2.6.32-504.16.2.el6 or later is recommended.
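A minimal sketch of the check and upgrade, assuming RHEL/CentOS 6 hosts with yum access:

# Confirm the running kernel version
uname -r
# Install the newer kernel, then schedule a rolling reboot of the hosts
yum update kernel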
Impala Tuning Impala Threads: Deep Dive 3. The Small Files Problem Parquet files on the order of kilobytes led to slow I/O throughput. Coordinator and executor connections failed due to high scan times against the NameNode. The failed executor connections kicked off more threads, which added up quickly and crashed the daemon. Resolution: By rewriting the Parquet compaction job to use dynamic partitioning, the client produced 1 file in place of 29, significantly reducing the overall file count.
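A minimal sketch of that compaction pattern via impala-shell; the host, database, table, and column names are hypothetical:

impala-shell -i coordinator.example.com -q "
  INSERT OVERWRITE TABLE db.events_compact PARTITION (event_date)
  SELECT col_a, col_b, event_date FROM db.events;"

With dynamic partitioning, each partition's many small Parquet files are rewritten into a small number of larger files in the target table.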
Impala Tuning Impala Threads: Deep Dive Tuning for Scale Since Impala 2.9, we can assign Impala daemons as dedicated query coordinators or query executors. These two roles can now be tuned according to their responsibilities, giving us more flexibility.
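A minimal sketch of the role assignment using the Impala 2.9+ startup flags; in Cloudera Manager these go into the Impala Daemon command-line argument safety valve for each role group, and the host groupings below are illustrative:

# Dedicated coordinators (e.g., on utility hosts)
--is_coordinator=true
--is_executor=false
# Dedicated executors (on worker hosts)
--is_coordinator=false
--is_executor=true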
Impala Tuning Impala Threads: Deep Dive Tuning for Scale Coordinators: Perform the network communication to keep metadata up to date and route query results to the appropriate clients. Experience significant network and CPU overhead with queries containing a large number of query fragments. Need a large JVM heap for caching metadata for all table partitions and data files.
Impala Tuning Impala Threads: Deep Dive Tuning for Scale Executors: Need only the default JVM heap, leaving more memory available to process CPU-intensive joins, aggregations, and other operations. Executors perform I/O-intensive scans.
Impala Tuning Impala Threads: Deep Dive Tuning for Scale Coordinators: How many? [Our cluster: 3] Small is good (a minimum of 1 dedicated). Considerations: number of Impala daemons, DDL queries, and average query resource usage at various stages. Where do they go? [Our cluster: utility hosts] Coordinators can go on non-worker hosts, so workers do not lose CPU, memory, or disk to coordination.
Impala Tuning High Availability Choosing the right load-balancing algorithm for high availability through a proxy. LeastConn: What? Connects sessions to the coordinator with the fewest connections, balancing the load evenly. When? Many independent, short-running queries. Where? Recommended for Impala with F5.
Impala Tuning High Availability Choosing the right load-balancing algorithm for high availability through a proxy. RoundRobin: What? Distributes connections across all coordinator nodes; a list of servers with a weight parameter defines the distribution. When? Predictable and stable balancing, but it requires benchmarks and load testing. Where? Not recommended by Cloudera for Impala.
Impala Tuning High Availability Choosing the right load-balancing algorithm for high availability through a proxy. Source Persistence: What? The source IP address is hashed and divided by the total weight of the running servers to determine which server receives the request. When? Impala workloads containing a mix of queries and DDL statements, such as CREATE TABLE and ALTER TABLE. Where? Required for setting up high availability with Hue.
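A minimal sketch of how these algorithms map onto a proxy configuration, assuming HAProxy in front of three dedicated coordinators; hostnames and proxy ports are illustrative (21000 and 21050 are the default impala-shell and JDBC/ODBC ports):

listen impala-shell
    bind :21001
    mode tcp
    balance leastconn          # short-running, independent queries
    server coord1 coord1.example.com:21000 check
    server coord2 coord2.example.com:21000 check
    server coord3 coord3.example.com:21000 check

listen impala-jdbc
    bind :21051
    mode tcp
    balance source             # source persistence, required for Hue
    server coord1 coord1.example.com:21050 check
    server coord2 coord2.example.com:21050 check
    server coord3 coord3.example.com:21050 check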
HBase Tuning HBase Tuning Case Study
HBase Tuning Case Study: ClientB OpenTSDB Platform Upgrade The client wanted to upgrade from a manually installed HBase environment to the Cloudera distribution's HBase. New hardware with a much larger RAM footprint. SSDs, because, why not? (Not important to this tuning.)
HBase Tuning Initial Insight Evaluation Gotchas Captured: None, really. It is not installed yet, but we will need to tune HBase to utilize a lot more memory.
HBase Tuning Java Use the Java Development Kit (JDK) version 8.
HBase Tuning Java Enable garbage collection (GC) logging. -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+PrintAdaptiveSizePolicy -XX:+PrintReferenceGC -XX:+PrintFlagsFinal -Xloggc:/var/log/hbase/regionserver-gc.log
HBase Tuning Java Enable garbage collection (GC) log rotation. -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=200M
HBase Tuning Java Enable G1GC Garbage Collector for RegionServer. -XX:+UseG1GC -XX:MaxGCPauseMillis=100 https://www.oracle.com/technetwork/java/javase/tech/g1-intro-jsp-135488.html
HBase Tuning Java Tune G1GC. -XX:+ParallelRefProcEnabled -XX:-ResizePLAB -XX:ParallelGCThreads=<8 + (logical processors - 8) * 5/8> -XX:+UnlockExperimentalVMOptions -XX:G1NewSizePercent=3 https://www.oracle.com/technetwork/articles/java/g1gc-1984535.html
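A worked example of the ParallelGCThreads formula, assuming a host with 40 logical processors:

# 8 + (40 - 8) * 5/8 = 8 + 20 = 28
-XX:ParallelGCThreads=28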
HBase Tuning Configuration Where do the HBase GC settings go? Cloudera Manager: HBase -> Configuration -> SCOPE:RegionServer / CATEGORY:Advanced / Java Configuration Options for HBase RegionServer Ambari: Service/HBase/Configs -> CONFIGS / ADVANCED / Advanced hbase-env / hbase-env template
HBase Tuning Java Increase the Java Heap of the HBase RegionServer. CM: Java Heap Size of HBase RegionServer in Bytes: 31 GiB Ambari: HBase RegionServer Maximum Memory: 31 GiB Never set the heap size to values between 32-48 GiB: above roughly 32 GiB the JVM loses compressed ordinary object pointers, so heaps in that range effectively hold less data. https://blog.codecentric.de/en/2014/02/35gb-heap-less-32gb-java-jvm-memory-oddities/
HBase Tuning HBase Enable the HBase BucketCache. RegionServer Advanced Configuration Snippet (Safety Valve) for hbase-site.xml: hbase.bucketcache.ioengine: offheap hbase.bucketcache.size: 32 GiB (or 96 GiB) hfile.block.cache.size: 0.2
HBase Tuning HBase Enable the HBase BucketCache. HBase Client Environment Advanced Configuration Snippet for hbase-env.sh: HBASE_OFFHEAPSIZE=36G (or 100G) HBASE_OPTS=-XX:MaxDirectMemorySize=36G (or 100G) The off-heap/direct-memory limit is set a few GiB above the BucketCache size (32 GiB + ~4 GiB headroom = 36G; 96 GiB + ~4 GiB = 100G) so that other direct-memory users still have room.
HBase Tuning HBase Enable HBase MultiWAL Support. hbase.wal.provider: Multiple HDFS WAL hbase.wal.regiongrouping.numgroups: (number of drives / 3), e.g., 12 data drives gives 4 WAL groups.
HBase Tuning HDFS Enable HDFS Hedged Reads. dfs.client.hedged.read.threadpool.size: 20 dfs.client.hedged.read.threshold.millis: 500 milliseconds
References https://impala.apache.org/docs/build/html/topics/impala_scalability.html https://impala.apache.org/docs/build/html/topics/impala_partitioning.html https://impala.apache.org/docs/build/html/topics/impala_proxy.html https://software.intel.com/en-us/blogs/2014/06/18/part-1-tuning-java-garbage-collection-for-hbase http://gceasy.io/
Thank You Questions? Get in touch with us: www.clairvoyantsoft.com
Contact Us SEATTLE, WA CHANDLER, AZ DALLAS, TX BOSTON, MA PUNE, INDIA 6185 W Detroit St. Chandler, AZ +1 (623) 282 2385 Nithya Koka @nithya_koka https://www.linkedin.com/in/nithyakoka Michael Arnold @hadoopgeek https://www.linkedin.com/in/michaelarnold