Server Virtualization and Optimization at HSBC
John Gibson, Chief Technical Specialist, HSBC Bank plc
Background
- Over 5,500 Windows servers deployed in the last 6 years.
- Historically, Windows technology dictated a 1:1 service-to-server relationship for security, resiliency, compatibility and management reasons.
- The resulting server sprawl was costly in terms of power consumption, server underutilisation, capital expenditure and staff.
- Server optimisation strategies aim to consolidate as many services onto as few servers as possible.
- 2 years ago our research focused on virtualisation technologies as one option for consolidation.
- Evaluations and proof-of-concept work highlighted VMware ESX Server as the only viable option for us; we embarked on a virtualisation migration project to go from 1,300 to 150 physical servers.
Results to date
[Chart: Virtual Infrastructure and Migration History; series include H/W demise, total migrations, total VMs, H/W reused, ESX hosts and the migration target]
Results to date
[Chart: Windows operating instances 2005/2006, all sites, Oct-05 to Oct-06; series include operating instances across all DCs, operating instances from new services on VMs, and operating instances from VM migrations]
Where we are today
- 1,270 VMs running Production, Dev and UAT services:
  - 990 new VMs provisioned
  - 280 physical servers P2Vd
- Physical servers:
  - 587 demised and thrown away
  - 330 reused
  - 900 purchases avoided
Benefits
- Reduced and contained the number of physical servers, curtailing costs (capital cost avoidance):
  - GBP 1,650,000 from reuse of physical servers and ever-greening
  - GBP 3,220,000 from avoided purchase of new servers
  - GBP 160,000 in hardware maintenance costs from demised servers
- Reduced time to market.
- Reduced the rates of Windows server recharge to the business.
- Improved physical server utilisation, operability, automation and manageability.
- Team in India successfully engaged with OS deployment and migrations, enabling release of UK-based contract staff.
Benefits: Environmental
- 141 kW power saving from server demise and avoided new server installs, against 80 kW consumed by the ESX infrastructure.
- Farms provide capacity for 2,500 additional VMs, alleviating critical data centre constraints while still allowing new services to be deployed.
- 51 racks' worth of space reclaimed.
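The power figures above can be netted off as a quick back-of-envelope check. This is a minimal sketch; the inputs are the kW values quoted on the slide, but netting the ESX infrastructure draw against the saving is my own reading, not an HSBC-stated result:

```python
# Back-of-envelope net power position from the figures above.
saved_kw = 141.0  # saving from demised servers and avoided new installs (slide figure)
esx_kw = 80.0     # consumed by the ESX infrastructure itself (slide figure)

net_saving_kw = saved_kw - esx_kw  # assumed netting-off, not stated on the slide
print(net_saving_kw)  # 61.0
```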
Benefits: Environmental
[Chart: Environmental avoidance at DC sites, Jan 06 to Dec 06, power in kW; series include total power drawn by new hardware, total power saved from new services on VMs, total power saved from demised servers, combined environmental avoidance, and the combined physical environmental neutral position]
Timeline - VMware from PoC into Production
- Mar 04: Proof of concept
- Oct 04: 1st development servers deployed
- Jan 05: Engaged VMware Professional Services
- Feb 05: Operational framework in place (the important bit)
- Mar 05: Contingency servers deployed
- Apr 05: Production servers deployed
- Sep 05: Sieve of whole estate; 1,300 servers identified for migration
- Oct 05: Started P2V migrations of existing estate
- Feb 06: Started to deploy new 4-way infrastructure
- Aug 06: 4-way infrastructure online
- Aug 06: Commenced VI3 testing
- Nov 06: VC 2 upgrade complete
- Nov 06: ESX 3 into production
Design Methodology
- VI designed using industry best practices and recommendations, whilst maintaining a cost/performance balance.
- Vetted and validated by VMware Professional Services and peer review.
- Infrastructure is a mix of two tiers for an optimal balance of cost, performance and time to deliver migrations and new services.
- Going forward, all new services will be deployed on a virtualised platform unless technical constraints deem otherwise, to take advantage of our ability to deliver faster, cheaper and more efficiently.
Host Design: Tier 1 Clusters
- Tier 1: clustered farms of new high-performance 4-way hosts
  - 10 hosts per farm
  - 250 VM guests per farm
  - Multi-CPU VMs possible
  - Clean environment for new provisioning and migrations
- Shared dedicated SAN storage
  - Cheap tier 2 (non-SRDF) disk
  - 6 TB per farm as multiple 400 GB LUNs
  - Option to connect to the shared fabric for tier 1 SAN (for replicated SRDF data requirements) via additional HBA ports in the host
- Can VMotion SAN-based VMs around the environment while up and running for hardware maintenance; also, with VI3, using HA and DRS
- Gigabit networking for hosts and guests; trunked data ports presenting multiple data VLANs
- Average 20:1 VM-to-host ratio
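The Tier 1 figures above imply some per-VM numbers worth spelling out. A minimal sketch: the inputs are the slide's stated farm parameters; the derived per-host and per-VM figures are my own arithmetic, not HSBC's stated design targets:

```python
# Rough per-farm sizing derived from the Tier 1 figures above.
hosts_per_farm = 10            # slide figure
vms_per_farm = 250             # slide figure
storage_tb_per_farm = 6.0      # slide figure, carved as 400 GB LUNs
lun_gb = 400                   # slide figure

vms_per_host_full = vms_per_farm / hosts_per_farm      # 25:1 on a full farm
gb_per_vm = storage_tb_per_farm * 1024 / vms_per_farm  # shared storage budget per VM
luns_per_farm = storage_tb_per_farm * 1024 / lun_gb    # number of 400 GB LUNs

print(vms_per_host_full, round(gb_per_vm, 1), round(luns_per_farm))
```

Note that a full farm runs at 25:1, above the observed 20:1 average; the gap is the headroom that lets VMotion evacuate a host for maintenance.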
Host Design: Tier 2 Farms
- Tier 2: blade servers, 2 CPUs
  - Average 5 to 10 VMs per blade
  - Ideal for single-CPU VMs
- 100 Mb full-duplex or gigabit networks; single data VLAN
- Local storage only (300 GB): can't move guests around while up and running
- Focus on reuse of existing hardware
- Benefits: hardware reuse and optimisation; eases migration activity by reducing IP address changes
- Average 7:1 VM-to-host ratio
Current Estate Overview
- ESX hosts:
  - 120 Tier 1 hosts (SAN connected)
  - 190 Tier 2 hosts (local storage)
- 70 separate VLANs
- Currently 1,200 VMs
- Driving CPU utilisation above 50% on ESX hosts (above 80% on full hosts)
- Capacity for 2,500 VMs, taking us into next year
- VM-to-host density: Tier 1 20:1, Tier 2 7:1
Operational and Support Framework
- Engaged VMware Professional Services to assist in building the VI process flow.
- Treat VMs the same way as physical servers, to keep things consistent:
  - Each VM is logically separate from the others and has its own IP address and name.
  - Each has BMC Patrol, AV and backup agents installed locally.
  - Each production VM has NetBackup installed for local backups.
- Vizioncore esxRanger takes weekly snapshots of all the VMs across the network to NFS file servers.
- Focused on using the same build standards and methods of accessing servers as for physical machines.
- The goal is to design once and then re-use in a consistent and globally supportable framework.
Operational Framework
- Communication and tailored training were key to gaining buy-in: new technology, new concepts and a change of mindset (e.g. VMotion).
- Support and operational acceptance processes are now treated as BAU.
- Biggest challenges:
  - The pressure is on to adopt the technology, placing demands on the VI team to scale, e.g. DMZ, raw LUNs, virtual desktops, larger guests.
  - Avoiding virtual sprawl (balancing new infrastructure and reuse).
- Never had any major problems with the VI in 24+ months.
VM Guest Provisioning and Requests
- Use hosting criteria to determine if and where to host guests.
- Request-process turnaround is less than 3 days for a VM guest, usually 3 hours.
- Requestors are required to complete a form to ensure all required build and billing information is captured. We validate, then pass to our team in India to implement the VM.
- Deploy the minimum required; not all VMs are created equal:
  - Small memory sizes, with options of 384 MB, 512 MB, 768 MB and 1 GB+.
  - Deploy single-CPU VMs by default, unless a need for more can be proven.
  - 90% of guests have 1 CPU and less than 512 MB RAM.
  - Only create data volumes of an appropriate size.
- This way we ensure a good ratio of VMs per host.
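The sizing rules above could be sketched as a simple validation step. This is a hypothetical illustration of the "deploy the minimum required" policy, under the stated assumptions (384 MB to 1 GB+ memory options, single CPU by default); the function, field names and defaults are my own, not HSBC's actual request form or tooling:

```python
# Hypothetical sketch of the provisioning policy described above.
ALLOWED_MEMORY_MB = (384, 512, 768, 1024)  # slide's stated options; 1 GB+ allowed

def validate_request(name, memory_mb=384, cpus=1, multi_cpu_justified=False):
    """Return a normalised build spec, or raise if the request breaks policy."""
    if memory_mb not in ALLOWED_MEMORY_MB and memory_mb < 1024:
        raise ValueError(f"memory must be one of {ALLOWED_MEMORY_MB} or 1 GB+")
    if cpus > 1 and not multi_cpu_justified:
        raise ValueError("multi-CPU VMs need a proven requirement; default is 1 vCPU")
    return {"name": name, "memory_mb": memory_mb, "cpus": cpus}

spec = validate_request("app01", memory_mb=512)  # "app01" is an illustrative name
print(spec)  # {'name': 'app01', 'memory_mb': 512, 'cpus': 1}
```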
Migration Tools
- Server sieve: candidates for virtualisation
  - Took a view on 4,000 servers and came up with 1,300 to virtualise or demise.
  - Stats are low as we have a lot of Citrix and Grid servers.
  - Eliminated hardware component issues (e.g. encryption cards).
  - MS Perfmon did the job for us.
- P2V processes and tools:
  - VMware P2V Assistant
  - PlateSpin PowerConvert
Q&A
Any questions?
Follow up: PM me on the VMware Community Forums, http://www.vmware.com/community
John Gibson
Presentation Download
Please remember to complete your session evaluation form and return it to the room monitors as you exit the session.
The presentation for this session can be downloaded at http://www.vmware.com/vmtn/vmworld/sessions/
Enter the following to download (case-sensitive):
Username: cbv_rep
Password: cbvfor9v9r