Using Automated Network Management at Fiserv June 2012
Brought to you by the Vivit Network Automation Special Interest Group (SIG)
Leaders: Chris Powers & Wendy Wheeler
Your input is welcomed on new topics!
Today's Presenter
Chris Powers, Senior Tools Engineer, Fiserv
Housekeeping
- This LIVE session is being recorded and will be available to all Vivit members
- Session Q&A: please type questions in the Questions pane
Webinar Control Panel Toggle View Window between Full screen/window mode. Questions
Agenda
- Introduction
- Discussion on how Fiserv uses and tunes Network Automation
- Examples of customizations to maximize NA's performance
- Examples
  - Increase Task Performance
  - Improve Reporting
  - Improve working with HP Support staff
- Summary
Background Information
Currently we have:
- Approximately 4500 devices in NA
- Two new cores being deployed (replacing the current cores)
- Many satellites being deployed (at remote data centers)
- HP OO work: using OO to integrate NA with 3rd party products, perform custom actions, and work around existing issues / limitations within NA
Network Automation
NA Performance
If you've just installed NA and haven't made any changes, you have a very solid foundation to handle all of your tasks. But at some point, you may need more:
- How can you do more changes in a set time window?
- How can you perform more tasks in the same amount of time?
Increase NA Performance
There are a number of ways to do this:
- Tune Network Automation and the DB (Oracle, SQL Server, etc.)
- Tune how NA interacts with devices
- Tune what NA does or when it does it
Always Keep This Picture in Your Mind When Tuning Network Automation
Tune Network Automation, DB
Most people I talk with ask how they can do more in the same amount of time. One way is to increase the limits around tasks:
- Max Concurrent Tasks
- Max Concurrent Group Tasks
- Max Task Length
But it's just not as easy as increasing these numbers.
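To see why the task limits are the first thing people reach for, here is a rough Python sketch of the idealised arithmetic. It is not from the session: the function name and the 30-second average task time are assumptions for illustration, and it deliberately ignores memory, database, and device limits, which is exactly why raising the numbers alone is not enough.

```python
import math


def estimated_window_minutes(device_count, max_concurrent_tasks, avg_task_seconds):
    """Idealised wall-clock minutes to run one task per device.

    Assumes every task takes avg_task_seconds and that the server can
    really sustain max_concurrent_tasks at once (a big assumption).
    """
    if max_concurrent_tasks < 1:
        raise ValueError("max_concurrent_tasks must be at least 1")
    # Tasks run in "waves" of at most max_concurrent_tasks devices each.
    waves = math.ceil(device_count / max_concurrent_tasks)
    return waves * avg_task_seconds / 60
```

With the roughly 4500 devices mentioned earlier and an assumed 30 seconds per task, calling `estimated_window_minutes(4500, 50, 30)` and then `estimated_window_minutes(4500, 100, 30)` shows the window halving on paper. Whether the server can actually keep up is a separate question.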
Basic Information about NA and Java
When a Java program starts, the Java Virtual Machine (JVM) gets some memory from whatever operating system is running on the machine. The JVM uses this memory for all its tasks. That memory includes:
- Heap: the memory area where objects created at run time are stored
- PermGen: holds class metadata, i.e. details about the classes behind the things in the heap
Basic Information about NA and Java
Every time something creates an object, that object is allocated memory from the heap. When the object dies and garbage collection reclaims it, the memory goes back to the heap. So the heap stores the objects, and the PermGen keeps details about the classes of the things inside it.
Basic Information about NA and Java
JVM memory also includes:
- Stack: each thread started by NA is allocated a stack
  - Memory address space outside of Heap and PermGen
  - Contains data no other threads can access, including the local variables, parameters, and return values of each method the thread has invoked
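Because thread stacks live outside Heap and PermGen, every extra concurrent thread adds to the process footprint. The Python sketch below is one way to estimate that. The function name is hypothetical, and the per-thread stack size and thread count are inputs you would have to look up for your own JVM and OS; nothing here is an NA setting.

```python
def jvm_footprint_mb(heap_mb, permgen_mb, thread_count, stack_kb_per_thread):
    """Rough lower bound on JVM memory use, in MB.

    heap_mb and permgen_mb are the configured maximums; thread stacks
    are allocated outside both, one stack per thread. Other native
    memory used by the JVM is not counted here.
    """
    stack_mb = thread_count * stack_kb_per_thread / 1024
    return heap_mb + permgen_mb + stack_mb
```

For example, `jvm_footprint_mb(10000, 512, 200, 1024)` adds an assumed 200 threads at an assumed 1024 KB each on top of a 10000 MB heap and 512 MB PermGen. Raising the task limits pushes the thread term up even when the heap setting is untouched.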
Key Comments
- Back up the file before you change it.
- Tuning is an art form: if you modify one value, you may need to adjust something else to compensate. Think of this in terms of an equation; if you change one side, you'll need to change something on the other to maintain balance.
- There is risk in making changes to files. Have a change request filed in case something happens.
- Notify your team of what you're looking to do.
- Open a support case if you have questions.
NA Configuration Files
NA pulls Java settings from this file: /opt/na/server/ext/wrapper/conf/jboss_wrapper.conf
    wrapper.java.additional.3=-Xmn170m
    wrapper.java.additional.7=-XX:MaxPermSize=512m
    wrapper.java.initmemory=512
    wrapper.java.maxmemory=512
NA Configuration Files
Based on one of our servers with 24 GB of RAM, I came up with the following values:
    wrapper.java.additional.3=-Xmn3333m
    wrapper.java.additional.7=-XX:MaxPermSize=512m
    wrapper.java.initmemory=10000
    wrapper.java.maxmemory=10000
So:
- wrapper.java.additional.3 (-Xmn) is 1/3 of the total Heap size
- MaxPermSize = 512m
- I'm using about 40% of the server's memory for initmemory / maxmemory
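The ratios above can be turned into a small Python sketch. The two function names are made up for this example. The property names and the one-third rule come from the slide, and the JVM flags are written in their case-sensitive form (`-Xmn`, `-XX:MaxPermSize`). Treat the output as a starting point to review, not a recommendation for every server.

```python
def wrapper_settings(heap_mb, permgen_mb=512):
    """Build jboss_wrapper.conf lines using the slide's rules of thumb.

    -Xmn is set to one third of the heap, and initmemory equals
    maxmemory so the heap is fixed at heap_mb.
    """
    young_mb = heap_mb // 3
    return [
        f"wrapper.java.additional.3=-Xmn{young_mb}m",
        f"wrapper.java.additional.7=-XX:MaxPermSize={permgen_mb}m",
        f"wrapper.java.initmemory={heap_mb}",
        f"wrapper.java.maxmemory={heap_mb}",
    ]


def heap_share_of_ram(heap_mb, ram_gb):
    """Fraction of physical RAM taken by the heap (1 GB = 1024 MB)."""
    return heap_mb / (ram_gb * 1024)
```

`wrapper_settings(10000)` reproduces the four lines shown for the 24 GB server, and `heap_share_of_ram(10000, 24)` comes out at roughly 0.41, which matches the "about 40%" figure.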
NA Configuration Files
Why not use more of the RAM? Increasing the Heap size may result in other performance problems. For example, garbage collection may take longer, causing the application to pause.
NA Configuration Files
Database configuration file: /opt/na/server/ext/jboss/server/default/deploy/db-ds.xml
C3P0 Hibernate Connection Pool: HP NA uses C3P0 hibernate libraries to manage database connections.
    <attribute name="maxpoolsize">99</attribute>
Tune Network Automation, DB
If you just increase the task numbers, you should expect to run into some errors:
- java.lang.OutOfMemoryError: Java heap space
- java.lang.OutOfMemoryError: GC overhead limit exceeded
  - If GC is taking more than 98% of total time and recovering less than 2% of the heap, Java throws this error.
- java.lang.StackOverflowError (stack)
- java.lang.OutOfMemoryError: unable to create new native thread
- java.net.SocketException: Too many open files
  - Run lsof and check the OS file descriptor limits, though there could be a memory leak
  - Work with the Sys Admin who manages the server(s)
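After raising task limits, it helps to know which of these errors actually shows up and how often. The following is a minimal Python sketch that counts the error strings listed above in any iterable of log lines. The labels and function name are invented for the example, and which NA log file to feed it is left to you, since that is not stated here.

```python
from collections import Counter

# Error strings from the list above; matching is case-insensitive.
SIGNATURES = {
    "heap space": "java.lang.OutOfMemoryError: Java heap space",
    "gc overhead": "java.lang.OutOfMemoryError: GC overhead limit exceeded",
    "native thread": "java.lang.OutOfMemoryError: unable to create new native thread",
    "stack overflow": "java.lang.StackOverflowError",
    "open files": "java.net.SocketException: Too many open files",
}


def count_error_signatures(lines):
    """Return a Counter of how many lines contain each known error."""
    counts = Counter()
    for line in lines:
        lowered = line.lower()
        for label, needle in SIGNATURES.items():
            if needle.lower() in lowered:
                counts[label] += 1
    return counts
```

A file can be passed directly, for example `count_error_signatures(open(path, errors="replace"))`, where `path` is whichever log you are inspecting. A jump in one counter after a change points at the resource to look at next (heap, threads, or file descriptors).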
Tune Network Automation, DB
Why is this so difficult? Why is it hard to get exact numbers for my system?
Every system is different:
- OS
- 32 / 64 bit
- Even how NA is used
  - Drivers
  - Tasks
  - Out of the box / Customized
  - HS / MM / Single Core / Satellites / etc.
  - 3rd party integrations
Tune how NA interacts with devices
Even if it's not a task involving a change, I'm always thinking about time, performance and efficiency.
- Don't add devices that don't work
  - If you must add these devices, try to limit tasks that will run against them
- Remove / deactivate devices that don't work
  - Discontinued devices
  - Have processes in place to make these visible to the proper resources (if applicable, make sure those who can fix them know about them)
- Remove / limit device password rules
- Filter, Filter, Filter
- Limit connection options (telnet, ssh, tftp, ftp, scp, SNMP, etc.)
  - If you don't support RLogin, why have it enabled by default?
Tune how NA interacts with devices
- Our Scheduled Snapshot task does not run off of Inventory, but off a dynamic group, Managed Devices
- Managed Devices checks and verifies a config file exists
- This cuts down on unproductive device tasks
- Emails alert engineers if there is a device issue
- When the issue is fixed, NA is again able to manage the device and adds it back to the Managed Devices group
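The idea behind the Managed Devices group can be sketched in a few lines of Python. This is only an illustration of the filtering logic, not how NA evaluates dynamic groups. The `Device` record and its `has_stored_config` field are hypothetical stand-ins for whatever rule the real group uses.

```python
from dataclasses import dataclass


@dataclass
class Device:
    hostname: str
    has_stored_config: bool  # hypothetical flag: a config file exists for the device


def split_inventory(inventory):
    """Split devices into (managed, needs_attention).

    Scheduled tasks would run only against the first list; the second
    list is what gets reported to engineers.
    """
    managed, needs_attention = [], []
    for device in inventory:
        if device.has_stored_config:
            managed.append(device)
        else:
            needs_attention.append(device)
    return managed, needs_attention
```

Because membership is recomputed from the device's current state, a device moves back into the managed list as soon as its issue is fixed, with no manual step.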
Device Password Rules
Connection Methods
- Go for the averages: if you have a few devices that need a method, set it just for those devices; don't globally enable it.
- You can easily automate changing connection methods based on adding a device
  - Event Notification / Response Rules
  - Scripts or Flows
  - Multi-Task Projects
Tune Drivers
- Driver enhancements can definitely assist in the overall performance and functionality of Network Automation
- Try to stay as current as possible with driver packs
  - An exception may be where you use a hot fix that is not included in the latest driver pack.
- If you find defects or would like improvements with drivers, open a case with support.
Tune Drivers
If you use a new driver / hot fix, you may find cases where things still are not quite right.
- Validate that the fix is working; logging will do this.
- Try a Checkpoint Snapshot; depending on the driver, this may fix the issue.
- If possible, try to make a config change; frequently this will clear the prior issue.
Tune Reporting
While some may say Reporting doesn't really impact performance, it does play a role in how you can improve NA functionality and, in some cases, outright performance.
- Create reports that show device issues
  - Place on a repeating basis, send to key resources
  - Use out of the box options as well as the options available through scripting / flows
  - This will provide a quick list for the people who should be able to resolve these issues.
- Customize Canned Reports
  - If you have a request to modify a canned report, definitely check with HP Support or post on forums
  - Be aware that some types of reports (Tasks) may run slower than others (Devices)
  - If possible, try to find the data elsewhere. For example, I have code that produces a report that looks like it contains task data, but does not; it's all device data (so it runs faster).
Tune Reporting
Custom Report showing Device Access Issues
- Runs daily
- Distributed via email; the report is an attachment
Sample rows (columns shown in two blocks):
Device Hostname | IP Address    | Last Access Attempt Date | Last Access Attempt Time | Last Access Attempt Status
10.01.10.1      | 10.01.10.1    | 12-Apr-12                | 17:10:30                 | Run Diagnostics: Problem accessing device
hosta           | 192.168.10.30 | 20-Apr-12                | 6:36:46                  | Take Snapshot: Problem accessing device
172.16.74.3     | 172.16.74.3   | --                       |                          |

Device Hostname | Last Successful Snapshot Date | Last Successful Snapshot Time | Comments | Management Status
10.01.10.1      | --                            |                               |          | 0
hosta           | 14-Apr-12                     | 06:33:29                      |          | 0
172.16.74.3     | --                            |                               | Device deactivated by na_admin on 2011-03-17 09:28:59 | 1
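A report like this can be approximated with a short Python sketch that builds a CSV from device data only. This is not the code used at Fiserv. The input is assumed to be a list of dictionaries keyed by the column names on the slide, and how those records are pulled out of NA is out of scope here.

```python
import csv
import io

COLUMNS = [
    "Device Hostname",
    "IP Address",
    "Last Access Attempt Date",
    "Last Access Attempt Time",
    "Last Access Attempt Status",
]


def access_issue_report(devices):
    """Return CSV text listing devices whose last access attempt failed.

    Missing fields are written as "--"; extra fields are ignored.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=COLUMNS,
        restval="--",
        extrasaction="ignore",
        lineterminator="\n",
    )
    writer.writeheader()
    for device in devices:
        status = device.get("Last Access Attempt Status", "")
        # Keep only rows whose status text reports an access problem.
        if "Problem accessing device" in status:
            writer.writerow(device)
    return buffer.getvalue()
```

The returned text can be saved as an attachment for the daily email. Filtering on the status text is an assumption based on the sample rows; a real report would use whatever field marks a failed access attempt.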
Tune Reporting
Other areas for Custom Reports:
- Policies / Compliance (out of the box)
- Device Add Issues (out of the box)
- Monthly Command Scripts (out of the box)
Other reports we use:
- Monthly report showing NA kicking off OO flows
- Devices added but without a driver
- Devices added but without model info (i.e. Snapshots fail)
Improve working with HP Support staff
- Document. Actually, over-document. This cuts down on the ping-pong nature of a case.
- Just because it's clear / obvious to you doesn't mean someone else will see it.
- When I open a case, I'll include a troubleshooting.zip file, and I'll make a best-effort guess (or go by past similar cases) at what logging I need to turn up.
- I've recently adjusted what logging I use, as I've noticed a change in what is requested by support. I may not always agree, but I know it'll be requested. So, if I do this initially, I save everyone some time.
- I'll include screen shots, device details, and verbose notes. I've even been known to attach videos to show the issue happening if it's rather obscure.
Improve working with HP Support staff
- If I am not sure what logging is best, I'll make that comment initially, so I should get the info back sooner rather than later.
- I have multiple custom task templates that help me do this. For example, I had a time where I experienced a large number of driver issues, so I created a template that was specific to this type of issue. When I have a similar problem, I can use this template and then submit the logs it generates to support, knowing they will contain useful information.
- Just keep in mind that some actions may not have a taskid associated, so if you need to turn up logging, you need to do it globally.
Template Example
Improve working with HP Support staff
- Ask for feedback from support.
- While something may work great for you, it may make things much more difficult for others.
- While I was validating a hotfix driver, I started to use an app that would do screen prints of webpages. This was easy for me to do when I needed to copy a long webpage. However, the result was an image that was not searchable. As a result, the driver guys couldn't search through the results, and that slowed down their work dramatically. It wasn't worth the minor savings in time for me.
- Had I not asked, I may not have known.
Improve working with HP Support staff
Suggestions I've received from HP Support:
- Prioritize a support case according to the issue: for example, if the customer has a question, that must be a sev 4 or 3 case.
- It's important to us to know if our customers have changed their environment, such as product upgrades, new hotfixes, new topology (HS, MM), new servers, etc.
- If the case is related to NA performance, it would be great to have NA logs and server specifications; also include OS details.
- If possible, don't delete files inside the troubleshoot.zip bundle.
- If they are looking for a Request For Enhancement (RFE), it would be good to explain the scenario and their expectations and, if possible, provide an example.
- On the other hand, a general troubleshoot.zip file (without specific logs) has to be attached to the case.
Summary
Today we've covered how one can tune Network Automation to increase its performance and get more value out of it.
Thank you
Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.