AGENDA ITEM: 3.4
INFORMATION MANAGEMENT, TECHNOLOGY & GOVERNANCE COMMITTEE
DATE OF MEETING: 3 MAY 2018

Subject: IT DISASTER RECOVERY AND BUSINESS CONTINUITY PLAN
Approved and Presented by: Andrew Durant/Ellen Sullivan
Prepared by: Michael Jones
Other Committees and meetings considered at: None
Considered by Executive Committee on: Not considered at time of reporting

PURPOSE:
The purpose of this paper is to update the Information Management, Technology & Governance Committee on the ICT Disaster Recovery and Business Continuity Plan.

Approval/Ratification/Decision, Discussion, Information

THE PAPER IS ALIGNED TO THE DELIVERY OF THE FOLLOWING STRATEGIC OBJECTIVE(S) AND HEALTH AND CARE STANDARD(S):
Well Being Objective 8: Transforming in Partnership
Health and Care Standards: Organisational Priority 27
The information outlined in this paper supports Governance, Leadership and Accountability.
EXECUTIVE SUMMARY:
As requested at the last IMTG meeting, this report provides an update on the following area:
1. ICT Disaster Recovery and Business Continuity

Actions undertaken:
A review of ICT system documentation and processes has been completed, and the results brought together to develop a comprehensive set of tools, manuals and policies to support ICT disaster recovery processes. A draft integrated business continuity plan has been developed to provide a standard approach to business continuity that clearly identifies the shared resources available to respond to incidents.

The work began with an audit and review of the systems and information sources used by Powys ICT to support the infrastructure. Identified actions included the creation of plans, manuals, test plans and supporting procedures.

Disaster Recovery Policy / Plan
A top-level Disaster Recovery Policy has been drafted (PTHB-ICT007 ICT Disaster Recovery Policy) and handed over to the Powys ICT infrastructure team for implementation.
A top-level Disaster Recovery Plan (PLA-POW-ICT001_Disaster Recovery) has been completed and handed over to the Powys ICT infrastructure team for implementation. It provides more detailed instructions on the processes to be followed in a disaster recovery response. The plan was originally created to meet health requirements; during 2018 it will be replaced by a single integrated plan for health and council ICT.

Disaster Recovery Manuals
A set of manuals recording the information required for system recovery has been created for the systems managed by Powys ICT for health users.
There is a core of 21 manuals providing detailed instructions and configuration information for system management and recovery. This includes information that allows the infrastructure team to plan for failover options. The manuals also include a backup and DR test plan, agreed by the infrastructure team, for regular testing and assurance around system recovery. The manuals are reviewed annually, or more frequently if the infrastructure team determines a need. They are stored electronically in a number of locations across Powys to maximise access options, and paper copies are kept in the main computer room in Bronllys.

Disaster Recovery Toolkit
Timely operational response requires a variety of resources, including information on network designs, network addressing, the physical location of equipment and warranty details. For health locations and systems, this information has been consolidated into an electronic library labelled the disaster recovery toolkit.
The toolkit includes shutdown and start-up scripts to support the infrastructure team in responding to incidents. It also includes documented maintenance checks that support proactive monitoring of systems, minimising service downtime and reducing operational risk. This is supported by a software library that enables restoration of software, applications and operating systems.

Disaster Recovery Infrastructure
During 2017/18 the ICT service has undertaken a number of actions designed to improve disaster recovery capability. These include:

1. Storage Area Network
A storage area network (SAN) is a highly resilient storage solution. Powys has implemented a SAN that improves performance and resilience over the systems previously in place.

2. Cluster
A cluster is a group of physical servers working together to provide a virtual server infrastructure.
This provides a load-balanced system of four nodes capable of hosting resources; if a physical node is lost, its services are switched to the other nodes automatically and with minimal impact on users. In Powys each node should be capable of providing services on its own in the event of an incident (i.e. we can lose three nodes with limited impact on services).

3. Switch Refresh
The core of our networks is based around central network switches. Much of the switch estate throughout Powys was old and no longer met performance requirements. At the end of financial year 2017 a programme of replacing older units was undertaken to improve resilience and performance across local networks.

4. Exchange
Work was completed during the first quarter of 2017 to migrate Powys to Exchange 2010, based on a three-server solution. Two servers, located in Bronllys, provide automatic failover to each other. The third server is located in Brecon and provides off-site failover for the service.

5. DR Failover Site
A medium-term solution was implemented in Brecon to provide a manual failover site for the majority of services provided from Bronllys. It gives the infrastructure team the capability to re-provide services within a reasonable time frame even if they do not have physical access to the main computer room in Bronllys.

6. Wireless Programme
To support alternative access options for users and provide further resilience, a programme has been undertaken to provide blanket wireless network coverage across health board sites.

Business Continuity Plan
During 2017 the infrastructure team developed a health-centric business continuity plan. This will be replaced by a comprehensive integrated plan covering all Powys ICT operations across council and health, which will support a standard approach to business continuity and clearly identify the shared resources available to respond to incidents. Completion of the integrated ICT business continuity plan is targeted for mid-summer 2018.
As Powys ICT is a service provider to both the wider council and the health board, parts of the plan will depend on other areas clearly identifying their ICT business continuity requirements within their own plans. This will enable Powys ICT to understand resource requirements and update its plan accordingly.
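The resilience claim for the four-node cluster (each node able to carry the service alone, so up to three nodes can be lost) can be checked with simple capacity arithmetic. The minimal Python sketch below assumes identical nodes and workloads that can move freely between surviving nodes; the function name and the capacity and load figures are hypothetical illustrations, not a description of the actual Powys configuration.

```python
import math


def tolerable_node_failures(node_count: int, per_node_capacity: float, total_load: float) -> int:
    """Return how many nodes can fail while the surviving nodes still carry total_load.

    Hypothetical illustration: assumes identical nodes and that workloads
    (e.g. virtual servers) can be redistributed freely across survivors.
    A negative result means the cluster is already over capacity.
    """
    if node_count <= 0 or per_node_capacity <= 0:
        raise ValueError("node_count and per_node_capacity must be positive")
    if total_load < 0:
        raise ValueError("total_load must be non-negative")
    # Minimum number of nodes needed to host the whole workload (at least one must stay up).
    nodes_needed = max(1, math.ceil(total_load / per_node_capacity))
    return node_count - nodes_needed


# Hypothetical figures in arbitrary capacity units.
print(tolerable_node_failures(4, 100.0, 90.0))   # 3: one node can carry everything
print(tolerable_node_failures(4, 100.0, 250.0))  # 1: three nodes are needed
```

The first call corresponds to the stated design target: as long as the total workload fits on a single node, three of the four nodes can be lost. If the workload grows beyond one node's capacity, the tolerance falls accordingly, which is why capacity should be reviewed alongside the manuals.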
RECOMMENDATION(S):
It is recommended that the Information Management, Technology & Governance Committee DISCUSSES and NOTES the IT Disaster Recovery and Business Continuity Plan update.

NEXT STEPS:
There are still single points of failure that require work to mitigate risks.
Offsite backup and disaster recovery sites will need significant improvement to provide robust capabilities. Investigative work undertaken has identified cloud solutions as the most appropriate option. Once NWIS has completed the initial configuration and pilot work on Azure / Office 365, this will be investigated as an appropriate solution for health (mirroring work already undertaken for the council).
To reduce risks and improve business continuity, it has been determined that services currently provided from the Bronllys computer room should be provided either from a purpose-built third-party data centre or via a cloud solution.
Work is underway to consolidate services onto a centrally provided infrastructure; this will reduce costs and the amount of hardware to be supported, thereby reducing risk.
Further work to improve failover capabilities, e.g. DHCP failover, is underway to automate processes and minimise impact on users.