Services: Monitoring and Logging 9/16/2018 IST346: Info Tech Management & Administration 1
Recall: Server vs. Service A server is a computer. A service is an offering provided by server(s). HTTP 9/16/2018 IST346: Info Tech Management & Administration 2
Services Unify a set of workstations into a distributed computing environment, since they share common resources. Typical environments have several services, and services often depend on other services. Some services are simple, and have no interaction s on the user s part. (network time, or NTP for example) It is best to think about any given service in terms of its components and interdependencies. 9/16/2018 IST346: Info Tech Management & Administration 3
Components: An Anatomy of a service Rules for accessing or extending the service beyond the components Protocols Instructions for transferring data to/from client/server (transport mechanism) APIs Components Client and Server software part of the application or service interaction 9/16/2018 IST346: Info Tech Management & Administration 4
Example of a service anatomy: Wordpress Protocols HTTP, WebDAV, SMTP APIs XML-RPC Atom RSS Components Web Browsers Apache Web Server PHP Scripting MySQL Database 9/16/2018 IST346: Info Tech Management & Administration 5
Service Dependencies Services at the application level depend on lowerlevel services for their operation. Application Network Core/Infrastructure Complex interface, several protocols. Basic interface with underlying protocol. Essential to all other services. Minimal interface. 9/16/2018 IST346: Info Tech Management & Administration 6
Service Dependencies: Example Application HTTP, SMTP Network DNS, DHCP, LDAP Core Infrastructure 9/16/2018 IST346: Info Tech Management & Administration 7
Services every IT professional should know Core NTP Network time protocol. Keeps the clocks in sync on several hosts DNS Domain name system a method of IP address to host name resolution. DHCP Dynamic Host configuration Protocol a method of assigning IP information over the network. LDAP Lightweight Directory Access Protocol a hierarchal database of directory information (users, groups, organizations, etc) Kerberos A network authentication protocol, used for securely evaluating identities over a network 9/16/2018 IST346: Info Tech Management & Administration 8
Services every IT professional should know Network HTTP Hypertext transport protocol. The application protocol for the WWW SSL Secure Sockets Layer an encrypted channel for HTTP traffic SSH / SCP Secure Shell, Secure Copy. Unix/Linux remote shell and remote file copy protocols. NFS Network File System File sharing for unix-like computers. RDP Remote Desktop protocol. A proprietary protocol for accessing Windows hosts over a network. SMTP Simple Mail Transport Service. Mail routing protocol. OAuth2 for account authentication and authorization. 9/16/2018 IST346: Info Tech Management & Administration 9
Providing a Service Any service you provide must be: Defined Reliable Scalable Monitored Maintained Supported 1 2 3 4 5 6 9/16/2018 IST346: Info Tech Management & Administration 10
Defining your Service Customers are the reason for your service How will they use it? What features do they need? Want? How critical is this service? What are the required levels of availability and support? Formulate a SLA (Service Level Agreement) This will define the service being offered Clarify the expectations for support levels and response time Defined 1 9/16/2018 IST346: Info Tech Management & Administration 11
Service Reliability Keep it simple Simple systems are more reliable and easier to maintain Make the trade-off between features and reliability Use reliable hardware, of course! Reliable 2 Take advantage of vendor relationships Have them provide recommendations (they should be the experts!) Let multiple vendors compete for your business Choose a vendor based on not only features but the stability of their company and product 9/16/2018 IST346: Info Tech Management & Administration 12
More Reliability Reliable Use Open Architecture: Open protocol standards and file formats RFC s from the IETF http://www.rfc-editor.org Pros Bigger selection of products and vendors to choose from Decoupled client and server selection Avoids being locked in to a specific platform or vendor Con Sometimes open standards don t go far enough Google-Worthy Service-Oriented Architecture is changing the game a bit, as most services are gravitating towards interoperability (working with each other) 2 9/16/2018 IST346: Info Tech Management & Administration 13
Last Slide on Reliability Any service should have 3 environments Usually, each environment is on separate hardware Reliable 2 Primary Environment for the Service Prod Dev Where you build out new features for the service / upgrades Test Mirror image of prod environment for testing purposes 9/16/2018 IST346: Info Tech Management & Administration 14
Scalability Scalable Scalability A service s ability to grow with its demand. Helps maintain performance levels. You should try to plan for scalability when designing your service. Two types of scalability: Vertical (scale up) Increasing the size of the node. Eg. add more RAM or an extra CPU to a server, buy a bigger washing machine, Horizontal (scale out) Adding more nodes to the service. Eg. purchase three more servers and balance their load, buy another washing machine, but keep your old one. 3 9/16/2018 IST346: Info Tech Management & Administration 15
Scalability: H vs. V Scalable 3 Vertical Scalability: Before After Load Balancer Horizontal Scalability: Before After After After 9/16/2018 IST346: Info Tech Management & Administration 16
Scaling your Laundry You ve got more laundry than your current washing machine can handle!! Scalable 3 Vertical Horizontal Use a bigger, faster washing machine Use more than one washing machine 9/16/2018 IST346: Info Tech Management & Administration 17
Service Monitoring Layered Monitoring: PPS! 1. Ping: Monitor the host 2. Port: Monitor the port for the service 3. Service: Connect to the port; verify the response Monitored 4 A Monitoring agent should send an alert to the IT team when things aren t right. What, When, Where Without adequate monitoring you cannot offer good service! 9/16/2018 IST346: Info Tech Management & Administration 18
Monitoring and Logging Monitored 4 Service Monitoring Observing service activity in real-time This is done by a computer, not a human. Important events are passed on to a human (notification). Service Logging Keeping historical records of service activity This data grows over time and can become quite large. Only referred to when needed to troubleshoot a problem or trace down a security incident. 9/16/2018 IST346: Info Tech Management & Administration 19
Why Bother? Monitored 4 Why do we Monitor? To detect / identify problems quickly. Ideally you want to know about it before your users do. To determine if resources are being constrained or over utilized. Why do we Log? Help get to the root cause of an issue or incident. Help us predict problem and avoid them. Provide historical data or trends for service usage. Report on service activity. If you re not measuring it you aren t managing it 9/16/2018 IST346: Info Tech Management & Administration 20
How Monitoring and Logging Work Server Log Network Activity Service External Service Monitor Internal Service Monitor Event Event 9/16/2018 IST346: Info Tech Management & Administration 21 SA
Example: Simple Web Service Monitoring Linux Host: web.syr.edu Network Activity Apache HTTPD access_log nmap web.syr.edu ps aux grep httpd Event: Port unavailable Event: Service stopped 9/16/2018 IST346: Info Tech Management & Administration 22
Example: Email Service Monitoring Exchange email Network Activity Web mail service http access_logs SCOM testmailflow.ps1script Windows IIS web service monitor Event: Port unavailable Event: Service stopped 9/16/2018 IST346: Info Tech Management & Administration 23
What to Monitor, what to Log? Monitored 4 Monitor for a condition. Send alert when the condition is met. Log the condition whether it sends an alert or not. Examples: (Why would you monitor/log these?) CPU utilization stays at 100% for X minutes. Free disk space drops below 10%. Port does not respond for 1500 ms HTTP request take more than 5 sec to get response. 9/16/2018 IST346: Info Tech Management & Administration 24
Better Monitoring Monitored Normal Normal: When a service fails you send an alert. Proactive Monitoring Proactive: When a service show signs it is about to fail you send an alert. (100% cpu, Long responses, etc.) Automated Responses Normal: When a service fails you send an alert. Automated: When the service fails, you attempt to restart it. If the restart fails, you send an alert. PM and AR are difficult and time-consuming to implement, but are time savers for difficult problems with no permanent fix. A layered approach is always better. 4 9/16/2018 IST346: Info Tech Management & Administration 25
Alerts! Monitored Types: Email TXT message SMS Page Automated dialer calls phone. Pick the appropriate Alert for the appropriate Event and time. Don t send email when you re not going to check it! In a layered approach, you might send an email, and if the problem persists send a TXT, etc 4 9/16/2018 IST346: Info Tech Management & Administration 26
Logging Monitored Log files get very large since they record all activity. Log file rotation service points to a different log file after a specified interval. Lets you backup log files Keeps the size of the files manageable. Log files are text and they compress nicely. How long do you keep logs? Depends on service, depends on your policy It s not a decision the SA should make. Ship logs to a Big Data system like Hadoop or ElastiSearch Like an insurance policy. Not very useful until the off chance that you need it... then you re glad you have it! 4 9/16/2018 IST346: Info Tech Management & Administration 27
Service Maintenance Maintained 5 Yes, there will come a time when you will need to deny service. (Make it unavailable.) Upgrades to hardware / OS / Service itself Plan and advertise your service outages so your users can plan accordingly. Make sure your outage complies with your TOS. 9/16/2018 IST346: Info Tech Management & Administration 28
Supporting your service After your service is up and running, but before rolling it out you should: Supported 6 Document how the service should be used and maintained by your IT staff Train your IT staff how to support the new service Train the users, if required Build out self-help support for the service to reduce calls to the helpdesk. Don t forget to advertise the new service to your users. Roll it out using One Some - Many so you can get a handle of any unforeseen issues. 9/16/2018 IST346: Info Tech Management & Administration 29