Computer Science Capstone Design Project Description

Project Title: Integrated AWS Cloud Asset Status Dashboard

Sponsor Information

Scott Hancock
Associate, Gore IT
E-mail: shancock@wlgore.com
Phone: +1-928-864-4555

W.L. Gore (wlgore.com)

Tim Fraser
Cloud Architect
E-mail: tfraser@wlgore.com
Phone: +1-928-864-3674

Project Overview

One of the most significant trends in enterprise IT in recent years has been the transition away from in-house data centers and software to cloud-based solutions that are not only cheaper to maintain, but also offer greater flexibility, scalability, and robustness. Our company is embarking on a major transition to IaaS (infrastructure as a service) and PaaS (platform as a service) cloud-based solutions implemented on Amazon Web Services (AWS), in conjunction with an existing enterprise data center application strategy. The company is simultaneously investing in enterprise IT Service Management (ITSM) to improve incident/problem/change management.

The concept of ITSM refers, in general, to maintaining a broad overview of a complex IT infrastructure and its performance, and keeping track of its evolution over time. In essence, the system records incidents, representing events where something caused the infrastructure to be degraded (e.g., the system slowed significantly or crashed, storage was compromised, etc.); it then records the underlying problem that was revealed by subsequent investigation of the incident (e.g., a key RDS instance was overloaded and bogged down); and finally it documents the change that was made to the infrastructure to solve the problem and prevent future incidents (e.g., a second RDS instance was created and the load was balanced between the two). In this way, the ITSM system documents the current state of a complex infrastructure, as well as the series of changes that got it to that point.
The underlying database that supports the operation of the ITSM is commonly known as a Configuration Management Database (CMDB), and it is around this key element that this project is focused.

The Problem:

As the system of record that documents the current and all past states of the overall infrastructure, the CMDB is the heart of the ITSM. It tracks all physical or virtual computational assets in the system, how these assets are used by the various applications running on the infrastructure, and which application-critical services are defined as unique app services (this can be a URL, a Web Services API, or even a legacy module/app). The CMDB receives alerts and alarms from the infrastructure's various monitoring agents, which detect problems in real time, after the fact (by scraping logs), or through alarms posted by the applications themselves. These events are then presented on a systems management dashboard for IT operations to show the availability and health of infrastructure, applications, etc.

The problem is that traditionally these CMDBs have not been designed for the agility and elasticity of cloud-based systems, where virtual instances of applications spin up and down on the fly (through AWS capabilities like auto scaling). This means that IT administrators at Gore currently must use the AWS dashboards to really know what's happening; in particular, the powerful enterprise infrastructure management tools (e.g., ServiceNow) that Gore uses for ITSM purposes simply don't have visibility into the status of the cloud-based infrastructure elements running on AWS. As a result, troubleshooting a particular incident takes much longer. This is a big problem when an alert is critical, meaning that the problem is occurring right then, in real time, and is negatively impacting critical business functions. In practical terms, such a problem has the highest possible priority because a critical end-user application is experiencing either an outage or severe degradation.

Summary of what is needed:

1) Integrate real-time Amazon Web Services (AWS) cloud (IaaS) asset lifecycle/status tracking, covering both the deployment of apps to AWS and the running/operating/scaling events of components in AWS, with the enterprise Configuration Management Database (CMDB).
2) Query, navigate, and display the unified data in the form of a Dependency Map (see the sample ACID screenshots below for more information).

System Requirements:

The CMDB Integration (System Integration):

The overall aim of this project is to explore how the ServiceNow ITSM tool used by Gore could be adapted to work well within the new hybrid cloud-based infrastructure we are moving to at Gore.

Register/de-register/update the status of AWS compute assets (and related application/service names) in the ServiceNow CMDB.

The system will also need to compare the originally registered AWS assets (the template of what should be running) against what is actually running live in AWS, and against what the enterprise CMDB says should be up and running. If these don't match, an exception report should be generated.
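
The three-way comparison described above (registered template, live AWS state, CMDB records) can be sketched with plain set operations. This is only a minimal illustration under stated assumptions, not a prescribed design: the asset identifiers, the `reconcile` function, and the `ExceptionReport` fields are hypothetical names, and in a real system the three inputs would be fetched from the deployment template, the AWS APIs, and the CMDB API respectively.

```python
from dataclasses import dataclass, field
from typing import Iterable, Set


@dataclass
class ExceptionReport:
    """Mismatches between the three views of the asset inventory (hypothetical shape)."""
    missing_in_aws: Set[str] = field(default_factory=set)     # in template, not running
    unexpected_in_aws: Set[str] = field(default_factory=set)  # running, not in template
    missing_in_cmdb: Set[str] = field(default_factory=set)    # running, unknown to CMDB
    stale_in_cmdb: Set[str] = field(default_factory=set)      # in CMDB, not running

    def is_clean(self) -> bool:
        """True when all three views agree."""
        return not (self.missing_in_aws or self.unexpected_in_aws
                    or self.missing_in_cmdb or self.stale_in_cmdb)


def reconcile(template: Iterable[str],
              running: Iterable[str],
              cmdb: Iterable[str]) -> ExceptionReport:
    """Compare asset IDs from the template, live AWS, and the CMDB."""
    template, running, cmdb = set(template), set(running), set(cmdb)
    return ExceptionReport(
        missing_in_aws=template - running,
        unexpected_in_aws=running - template,
        missing_in_cmdb=running - cmdb,
        stale_in_cmdb=cmdb - running,
    )


# Made-up asset IDs, for illustration only.
report = reconcile(
    template={"i-app-1", "i-app-2", "i-db-1"},
    running={"i-app-1", "i-db-1", "i-tmp-9"},
    cmdb={"i-app-1", "i-app-2", "i-db-1"},
)
if not report.is_clean():
    print("Exception report:", report)
```

Keeping the four mismatch categories separate lets the report distinguish an asset that failed to start from one that was started outside the deployment plan.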
If the enterprise CMDB shows an Incident/Problem for a host/application, allow engineers to view the real-time status of all related running assets and report their status (up, down, degraded).

The Application Component Infrastructure Dependency (ACID) Tool [AKA the Dependency Map website]:

The enterprise CMDB system (the ServiceNow tool) is supposedly cloud-ready, but this turns out to be only partially true: it comes with an underlying CMDB that could potentially deal with cloud-based assets like those hosted on AWS, but it does not seem to have a useful (and, most importantly, simple-to-use) front end that can provide the critical dashboard function, drawing on the CMDB and the real-time status/configuration of the AWS resources to effectively display infrastructure status to IT managers.
The core aim of this part of the project is to create a front-end application that presents a GUI dashboard to IT managers/engineers to support Incident and Change Management decisions in real time. The overall aim of this application is to build and maintain a unified support tool for technical support staff at Gore. The tool uses real-time information from AWS and the federated CMDB to provide a simple-to-use, easy-to-navigate view/tree of all assets (clients, apps, hosts, network, storage, and any specific AWS resources such as EC2, S3, RDS, etc.) that are being used by any infrastructure elements, together with their relationships, dependencies, and system status across all elements of this complex infrastructure.

o Provide a support tool for fast troubleshooting and impact assessment when things stop working.
o The key success factor will be a simple, intuitive, and responsive web UI that can be accessed via a phone browser or a PC. The UI design needs to be clean, simple, and flexible enough to adjust to the device orientation and views, giving fast access to the basic element information and related status (what is running versus what was intended to be deployed, according to the deployment template from the app deployment plan). This way the engineer can identify problems at a glance or very quickly.
o Below is a sample UI (mocked up) to illustrate the concept, but you are free to use your creativity to improve this design and make it as efficient as possible. The navigation concept is that of a hierarchy tree: you navigate UP or DOWN the dependency tree.
SAMPLE MAIN SCREEN:

[Sample main screen screenshot; pane labels: Up The Tree, Down The Tree]

1) Start by clicking on one of the tabs at the top of the three windows:
   Customers
      o Applications
   Hosts
      o Host Instance/AWS Resources (EC2, RDS, EBS, RT53, etc.)
   Network Devices (started as a POC)
2) Then double-click on any entry in the list to see the information about that component (up/down associations and dependencies).

Usage/Navigation Tips:

You can select one or more items in the leftmost pane to then see details as follows:

[UNION] - Meaning "list all the components for all the selected items."
[INTERSECT] - Meaning "tell me what the selected items (e.g., applications) have in common."
[DETAILS] - Application tab only: this is there to demonstrate the concept of component details pages with information about that component. We have not collected details at this level yet, as this would take a lot more time than we were allotted for this exercise. You will also see the possibility of linking support and technical documents from here to aid in troubleshooting when problems arise and information is needed in short order.
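
The [UNION] and [INTERSECT] operations, and the up/down navigation of the dependency tree, map naturally onto set operations over a dependency map. The following is a minimal Python sketch under stated assumptions: the application and component names, the `DEPENDENCIES` dictionary, and the three function names are all hypothetical, and a real implementation would read the relationships from the CMDB rather than from a hard-coded dictionary.

```python
from typing import Dict, Iterable, Set

# Hypothetical sample data: application name -> components it depends on.
DEPENDENCIES: Dict[str, Set[str]] = {
    "order-portal": {"web-host-1", "rds-orders", "s3-assets"},
    "reporting": {"batch-host-2", "rds-orders"},
    "intranet": {"web-host-1", "s3-assets"},
}


def union_components(selected: Iterable[str],
                     deps: Dict[str, Set[str]]) -> Set[str]:
    """[UNION]: every component used by any of the selected items."""
    result: Set[str] = set()
    for item in selected:
        result |= deps.get(item, set())
    return result


def intersect_components(selected: Iterable[str],
                         deps: Dict[str, Set[str]]) -> Set[str]:
    """[INTERSECT]: components that all selected items have in common."""
    component_sets = [deps.get(item, set()) for item in selected]
    if not component_sets:
        return set()
    return set.intersection(*component_sets)


def dependents_of(component: str, deps: Dict[str, Set[str]]) -> Set[str]:
    """Navigate UP the tree: which items depend on this component."""
    return {item for item, components in deps.items() if component in components}


selection = ["order-portal", "reporting"]
print(sorted(union_components(selection, DEPENDENCIES)))
print(sorted(intersect_components(selection, DEPENDENCIES)))
print(sorted(dependents_of("rds-orders", DEPENDENCIES)))
```

With this sample data, the intersection shows that the two selected applications share a single database, which is the kind of shared dependency an engineer would want to spot during impact assessment.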
Not Part of Required Scope, but Desired Capabilities (Dependency Graph/Visualization Feature):

Select one or more applications in the Applications tab (multi-select is supported), then click the [DRAW] button. The system will create a block diagram (a dynamic graphic/chart) of all the selected applications to show their relationships visually.

Also optionally, time permitting: when you click the [Details] button, you can see relevant information about that component or application, contact information from the CMDB, and any support contact's on-call phone number.

[Screenshot: sample display of what is selected above]
Non-Functional Requirements:

Cost: The system needs to be designed for low operating cost in AWS, and to minimize software costs by using open-source and built-in AWS offerings (i.e., no license requirements wherever possible).

High availability: Designs should use AWS sites, regions, and availability zones, and follow best practices for designing applications for HA, with at least 99.95% availability.

Security & Identity Access Management (IAM): Knowledge of various authentication mechanisms (LDAP, SAML 2.0, OAuth; preferably OneLogin (https://www.onelogin.com/) for federated IAM in AWS).

Implementation Constraints/Technology Considerations:

Although the end goal is that the system will communicate with the underlying ServiceNow CMDB when deployed within the Gore infrastructure, this will not be possible during the development/testing of this project due to licensing issues; ServiceNow is licensed only within the Gore infrastructure. This means that the team will have to research open-source CMDBs (a good number exist; an example is http://www.device42.com/) to find one that has an API as similar as possible to the ServiceNow CMDB API. The idea is that we will prove the concept and develop the front-end GUI system using the open-source CMDB, and then plug it into the ServiceNow CMDB when the completed system is deployed at Gore.

Knowledge, skills, and expertise required for this project:

Working knowledge of the AWS ecosystem, namely EC2, S3, EBS, DynamoDB, RDS, API Gateway, AWS Docker container services, and AWS microservices with Lambda.

Working knowledge of modern website design using rich, mobile-responsive apps built in HTML5.

Web programming: familiarity with common web technologies. PHP pages must be implemented in PHP 7.x. API and back-end web services must be developed in Java and/or JavaScript (e.g., Node.js). The front end can be built using Angular.

Databases: some experience with database design and SQL queries will aid in communicating with the server's Amazon RDS for MySQL database and/or Amazon DynamoDB (for unstructured data using JSON/XML docs).

Equipment Requirements:

If there is a need for licensed software from W.L. Gore's suppliers (e.g., OneLogin.com, ServiceNow.com), these can simply be simulated or substituted by open-source products. The key will be to focus on developing the core GUI, and not on the infrastructure that it will be embedded in. Near the end of the project, Gore can make arrangements to test in a Gore AWS account/VPC sandbox environment where the licensed APIs are available.
Software and other Deliverables:

1. Basic deliverables include: a fully functioning, mobile-friendly (mobile-responsive) web application and the related back-end services/apps/APIs, plus data/load scripts to seed test data.
2. Complete GitHub library with all source artifacts and documents for this project. Should include issue tracking to document any open issues.
3. Complete user's manual: brief, well-written, and aimed at non-technical users. Could be done as a web-based document; preferred: wiki pages.
4. As-built report (required of all CS capstones) that carefully documents requirements with user stories/use cases, design decisions, and implementation details. Should allow a future team to easily pick up where this one left off (including a defect list with status and severity assignments).
5. Professionally documented source code, scripts (Chef or Puppet), and related test cases, with documented unit test results for all expected use cases.