IBM Security Guardium: Sniffer restart & High CPU correlation alerts
IBM Security Support Open Mic, presented by Lisette Contreras, Guardium Support
September 14, 2017
IBM Security

Troubleshooting Sniffer restarts

Problem: The sniffer is restarting.
Goal: Find out why and resolve it.

How do you know the sniffer restarted?

Most of the time, a reported symptom leads to the discovery:
- Appliance is slow to log in or run reports, or is generally slow
- STAPs disconnecting/reconnecting
- STAPs flickering between green/red

Other ways to find out:
- The PID/TID column (sniffer process ID) in the Buf Usage Monitor report changed
- A message in the messages file indicates the sniffer started
- An alert was received, if a Correlation Alert for sniffer restarts has been configured and enabled (recommended; more on this later)

Troubleshooting Sniffer restarts (Cont'd)

Example of how to know the sniffer restarted:
- Buf Usage Monitor report (relevant columns only)

[Screenshot of the Buf Usage Monitor report not reproduced.]

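The PID/TID check can also be done programmatically once the report rows are available outside the GUI. The following is a minimal sketch, assuming the rows have already been exported as (timestamp, PID) pairs in chronological order. The function name, the row layout, and the sample values are hypothetical and not taken from a real appliance.

```python
from typing import Iterable, List, Tuple


def find_pid_changes(rows: Iterable[Tuple[str, int]]) -> List[Tuple[str, int, int]]:
    """Return (timestamp, old_pid, new_pid) for each row where the sniffer PID changed.

    rows: (timestamp, pid) pairs in chronological order (assumed export layout).
    """
    changes = []
    previous_pid = None
    for timestamp, pid in rows:
        # A different PID than the previous row means the sniffer process was replaced.
        if previous_pid is not None and pid != previous_pid:
            changes.append((timestamp, previous_pid, pid))
        previous_pid = pid
    return changes


# Hypothetical sample rows, for illustration only.
sample_rows = [
    ("2017-08-23 20:35:00", 4121),
    ("2017-08-23 20:36:00", 4121),
    ("2017-08-23 20:38:00", 9977),
    ("2017-08-23 20:39:00", 9977),
]

print(find_pid_changes(sample_rows))
# prints [('2017-08-23 20:38:00', 4121, 9977)]
```

Each reported tuple marks the first row that carries the new PID, so the restart happened between that row and the one before it.
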
Troubleshooting Sniffer restarts (Cont'd)

Example of how to know the sniffer restarted (cont'd):
- The message "Guardium Sniffer Started" in the messages file

[Screenshots of Example 1 and Example 2 not reproduced.]

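If a copy of the messages file is available for offline review, the search for the start message can be scripted. This is a rough sketch only: the message text comes from the slide, but the sample log lines and their syslog-style layout are hypothetical, and the real file may look different.

```python
import re
from typing import Iterable, List

# Case-insensitive match, because the capitalisation of the message may vary.
SNIFFER_STARTED = re.compile(r"guardium sniffer started", re.IGNORECASE)


def find_sniffer_starts(lines: Iterable[str]) -> List[str]:
    """Return every log line that contains the sniffer start message."""
    return [line.rstrip("\n") for line in lines if SNIFFER_STARTED.search(line)]


# Hypothetical log lines, for illustration only.
sample_log = [
    "Aug 23 20:36:59 collector1 kernel: hypothetical unrelated message\n",
    "Aug 23 20:37:08 collector1 sniffer: Guardium Sniffer Started\n",
    "Aug 23 20:40:00 collector1 cron: hypothetical unrelated message\n",
]

for hit in find_sniffer_starts(sample_log):
    print(hit)
```

The function accepts any iterable of strings, so an open file handle on an exported copy of the messages file can be passed in directly.
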
Troubleshooting Sniffer restarts (Cont'd)

Example of how to know the sniffer restarted (cont'd):
- Alert received

Prerequisite: a Correlation Alert for sniffer restarts is pre-configured and enabled (more on this later).

[Sample alert screenshot not reproduced.]

How to build alert to troubleshoot Sniffer restarts (Recommended)

Correlation alert: an alert on special conditions, based on historical analysis of data.

You can receive an alert via email when certain conditions occur, for example, if there have been more than X sniffer restarts in the last Y hours.

You can build your own alert or import an existing one.

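The "more than X restarts in the last Y hours" condition is a count over a time window compared against a threshold. The sketch below illustrates only that logic; it is not how Guardium evaluates correlation alerts internally, and the function name, parameters, and sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterable


def should_alert(restart_times: Iterable[datetime], now: datetime,
                 window_hours: int, threshold: int) -> bool:
    """Return True when more than `threshold` restarts fall in the last `window_hours`."""
    window_start = now - timedelta(hours=window_hours)
    recent = [t for t in restart_times if window_start < t <= now]
    return len(recent) > threshold


# Hypothetical restart timestamps, for illustration only.
now = datetime(2017, 8, 24, 8, 0, 0)
restarts = [
    datetime(2017, 8, 22, 9, 0, 0),    # outside the 24-hour window
    datetime(2017, 8, 23, 10, 15, 0),
    datetime(2017, 8, 23, 14, 30, 0),
    datetime(2017, 8, 23, 20, 37, 8),
    datetime(2017, 8, 24, 1, 5, 0),
]

print(should_alert(restarts, now, window_hours=24, threshold=3))
# prints True (4 restarts in the window, which is more than 3)
```

Here `window_hours` plays the role of the accumulation interval and `threshold` the role of the alert threshold described in the Alert Builder fields.
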
How to build alert to troubleshoot Sniffer restarts: Correlation alert (Cont'd)

To import an existing alert:

Technote with some pre-made alerts: http://www-01.ibm.com/support/docview.wss?uid=swg21698726

One of them notifies you if the sniffer restarted more than 3 times in the last 24 hours. To use it:
1- Import the alert definition (GUI: Definitions Import).
2- Open the alert definition (GUI: Alert Builder) to view the definition.
3- Modify the alert definition to fit your needs (optional).
4- Mark the alert as Active (checkbox) in the alert definition panel.

How to build alert to troubleshoot Sniffer restarts: Correlation alert (Cont'd)

To build your own alert:
1- Open Alert Builder (GUI: Protect > Database Intrusion Detection > Alert Builder).
2- Fill out the fields to create your alert.
   Mandatory fields:
   - Name
   - Run Frequency
   - Query (the query to be evaluated by the alert)
   - Accumulation interval (how far back in the historical data the alert checks)
   - Threshold
   Optional but recommended fields:
   - Recommended Action (this text appears in the body of the alert email)
   - Alert Receivers (e.g. SYSLOG, email addresses)
3- Mark the alert as Active (checkbox) in the alert definition panel.
4- Save the alert (click Apply).
5- Make sure the Alerter is running (v9: Administration Console > Configuration > Alerter; v10: Setup > Tools and Views > Alerter).

Sample Alert Definition

[Screenshots of the sample alert definition not reproduced.]

How to read the alert (Investigate)

Interpret the email:

[Screenshot of the alert email not reproduced.]

Finding when the sniffer restarted, using the alert

Take note of the Query Period in the alert.

If you used a detailed query, you get more details in Alert Details. For example, the query used in our example gives the exact Timestamp when the sniffer restarted.

What to do next: Review the messages file for a sniffer restart message (Guardium Sniffer started), or review the Buf Usage Monitor report around the same timeframe.

Tip: Use a query that returns additional details, such as the exact Timestamp above, so that the timeframe to be searched is greatly reduced.

Finding why the sniffer restarted

So now we know there was a sniffer restart on 08/23 20:37:08. Why did it restart?
- Review the messages file for a sniffer restart message (Guardium Sniffer started) around the same timeframe
- Review the Buf Usage Monitor report

Messages file: it was a crash. Some abnormal condition led the sniffer to crash.

What to do next? To investigate the crash:
- If the crash is happening frequently, enable debug mode on the sniffer and watch for a coredump file to be generated on the next crash. Send the coredump to Technical Support for analysis to find the cause of the crash.
- Review the Buf Usage Monitor report around the time of the crash to pinpoint possible causes of sniffer strain that could have led to the crash.

Finding why the sniffer restarted - Buf Usage Monitor report

[Screenshot of the Buf Usage Monitor report not reproduced; some columns were cut for better visualization.]

At 08/23 20:37:08, the sniffer was struggling: both the Analyzer Queue and the Logger Queues were very high. Sniffer memory had been growing consistently.

Conclusion: The sniffer's struggle seems to have been caused by heavy traffic (high Analyzer Queues).

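The same reading of the report can be approximated in code when the rows are exported. This is a sketch under stated assumptions: the dictionary keys, the sample values, and the queue limits are all hypothetical, and what counts as "very high" depends on the appliance and its normal load.

```python
from typing import Dict, List


def find_strained_rows(rows: List[Dict], analyzer_limit: int, logger_limit: int) -> List[Dict]:
    """Return rows where both the analyzer queue and the logger queue exceed their limits."""
    return [
        r for r in rows
        if r["analyzer_queue"] > analyzer_limit and r["logger_queue"] > logger_limit
    ]


def memory_steadily_growing(rows: List[Dict]) -> bool:
    """Return True when sniffer memory increases on every consecutive row."""
    memory = [r["sniffer_memory"] for r in rows]
    return len(memory) > 1 and all(later > earlier for earlier, later in zip(memory, memory[1:]))


# Hypothetical report rows, for illustration only.
report_rows = [
    {"timestamp": "08/23 20:34", "analyzer_queue": 10, "logger_queue": 5, "sniffer_memory": 1000},
    {"timestamp": "08/23 20:35", "analyzer_queue": 5000, "logger_queue": 20, "sniffer_memory": 1400},
    {"timestamp": "08/23 20:36", "analyzer_queue": 90000, "logger_queue": 70000, "sniffer_memory": 2100},
    {"timestamp": "08/23 20:37", "analyzer_queue": 95000, "logger_queue": 80000, "sniffer_memory": 2900},
]

strained = find_strained_rows(report_rows, analyzer_limit=50000, logger_limit=50000)
print([r["timestamp"] for r in strained])
# prints ['08/23 20:36', '08/23 20:37']
print(memory_steadily_growing(report_rows))
# prints True
```
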
Finding why the sniffer restarted

Conclusion: In this case, the strain that led to the sniffer crash seems to have been caused by heavy traffic (high Analyzer Queues).

Next: Use reports and investigation to understand why there was heavy traffic.

DB Throughput Built-in Report:

[Screenshot of the report not reproduced.]

Finding why the sniffer restarted

Sample query to help find the databases with the most traffic: number of logged SQLs per DB server.

[Screenshot of the sample query not reproduced.]

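The aggregation behind such a query is a count of logged SQL statements grouped by DB server. The sketch below shows only that grouping on exported data; it is not the Guardium query itself, and the function name and server addresses are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_servers_by_sql_count(server_ips: Iterable[str], limit: int = 3) -> List[Tuple[str, int]]:
    """Count logged SQL statements per DB server and return the busiest servers first.

    server_ips: one entry per logged SQL statement, holding the DB server address.
    """
    return Counter(server_ips).most_common(limit)


# Hypothetical data: one server address per logged SQL statement.
logged_sql_servers = [
    "10.0.0.5", "10.0.0.7", "10.0.0.5",
    "10.0.0.9", "10.0.0.5", "10.0.0.7",
]

print(top_servers_by_sql_count(logged_sql_servers))
# prints [('10.0.0.5', 3), ('10.0.0.7', 2), ('10.0.0.9', 1)]
```
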
Troubleshooting High CPU

When is it a problem?

High CPU by itself is not always a problem.

Potentially not a problem when it:
- lasted only a short period of time
- happened at the same time as a legitimate heavy burst of traffic
- did not cause loss of data packets
- did not cause significant performance impact
- did not destabilize the sniffer (crashes, restarts)

A concern if:
- the system is very slow, with significant impact
- processes are running or completing much later than expected, or not completing at all
- STAPs keep disconnecting/reconnecting
- data packets are lost

How do you find out?
- Symptoms are reported (for example, audit processes have not been received as expected)
- An alert is received (if it has been configured)

Troubleshooting High CPU

Sample content of an email alert that lists occurrences when System Cpu Load was high (based on a query configured to give that information):

[Screenshot of the alert email not reproduced.]

Troubleshooting High CPU

Common causes:
- Huge processes running on the appliance
- Batch jobs are stuck
- Heavy traffic
- Insufficient resources (RAM, CPUs)

What to do:

1) Check that system resources meet the minimum requirements (RAM, CPUs, ...).

Troubleshooting High CPU

2) Check the Buf Usage Monitor report: are the Analyzer and Logger keeping up with the incoming traffic?

If Analyzer Rate, Analyzer Queues, Logger Rate, and Logger Queues are all high, heavy incoming traffic is causing the sniffer to struggle to handle incoming data. This is probably the reason for the high CPU.

Find what is causing the heavy traffic.

If it is a legitimate, occasional situation (e.g. a very heavy batch program that runs only once a month), plan ahead so that next time the appliance that receives this traffic has more resources to handle it.

If the heavy traffic is consistent, reduce the traffic coming to this collector:
- Review your policies and ignore some trusted traffic if possible (ignore STAP sessions).
- Move some STAP traffic to a less busy collector. Find the STAP sending the most traffic (CLI command "iptraf" or the built-in DB Server Throughput report (GUI: Manage > Reports > Unit Utilization > DB Server Throughput)).

Troubleshooting High CPU

3) support show db-processlist running

Have one or more queries (at the top of the list) been running for an extremely long time?
Look at the first query listed (the longest running). Some queries can be stopped from the GUI (Manage > Maintenance > General > Running Query Monitor). Check in the GUI whether this query is listed; if so, kill it so that resources are freed and CPU utilization may come down.

Are one or more queries in Waiting for Metadata Lock status?
Most of these scenarios require Support intervention to find the cause (usually at the mysql level) and resolve it (usually by killing the query as root).

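The two checks above can be expressed as a small triage function once the process list has been captured. This is a sketch under stated assumptions: it assumes each entry has already been parsed into a dictionary with a running time in seconds and a state string, similar to the Time and State columns of a MySQL process list. The exact output layout of the CLI command is not shown in this deck, so the keys, sample rows, and the one-hour cutoff are hypothetical.

```python
from typing import Dict, List


def triage_processlist(rows: List[Dict], long_running_seconds: int) -> Dict[str, List[Dict]]:
    """Split process-list rows into long-running queries and queries waiting on a metadata lock."""
    # Longest-running first, mirroring the "top of the list" reading described above.
    by_time = sorted(rows, key=lambda r: r["time"], reverse=True)
    return {
        "long_running": [r for r in by_time if r["time"] >= long_running_seconds],
        "metadata_lock": [r for r in by_time if "metadata lock" in r["state"].lower()],
    }


# Hypothetical parsed rows, for illustration only.
process_rows = [
    {"id": 101, "time": 12, "state": "Sending data"},
    {"id": 102, "time": 86400, "state": "Copying to tmp table"},
    {"id": 103, "time": 7200, "state": "Waiting for table metadata lock"},
]

result = triage_processlist(process_rows, long_running_seconds=3600)
print([r["id"] for r in result["long_running"]])
# prints [102, 103]
print([r["id"] for r in result["metadata_lock"]])
# prints [103]
```
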
Troubleshooting High CPU

4) support show db-top-tables all

Look for extremely large tables (listed at the top). If one of the longest-running queries (from support show db-processlist running) is an Audit Process, and the query uses a huge table, the delay is probably linked to the size of the table. Try to reduce the size of the table.

5) Is the sniffer crashing/restarting constantly?

This can also lead to high CPU consumption. Check the messages file and the Buf Usage Monitor report to troubleshoot the sniffer restarts and try to resolve them.

Troubleshooting High CPU

6) support show top cpu

Find which processes are consuming most of the CPU, and take appropriate action to resolve. The most common processes are:

sniffer => The cause is usually heavy traffic, as explained above. Take action to reduce traffic.

java => The cause is usually one or more long-running (or hung) audit processes. Run:
   support show db-processlist running
The longest-running processes are at the top of the list. If they have been running for a long time or are stuck, try killing them.

mysql => The cause is usually either heavy traffic or issues at the mysql level (for example, crashed tables).
- For heavy traffic, take action to reduce traffic.
- Review mysql-error.log (in support must_gather system_db_info) for mysql errors. If it mentions crashed tables, run the command to find crashed tables (v10):
   support find crashed_tables ALL
  Then check the log (crashed_tables.log) using the fileserver command. If crashed tables are found, report the output to Support.

Where do you get more information?

Questions on this or other topics can be directed to the product forum: http://ibm.biz/guardiumforum

- Technote "Identifying and resolving common sniffer problems with the Buffer Usage report": www.ibm.com/support/docview.wss?uid=swg21994083
- Security Learning Academy: www.securitylearningacademy.com
- Get started with IBM Security Support: ibm.biz/security-support-start-here
- IBM Support Portal: ibm.com/support
- Sign up for My Notifications: ibm.com/software/support/einfo.html

Copyright IBM Corporation 2017. All rights reserved.