Managing The Infrastructure Of Data Centers David Cuthbertson Square Mile Systems Ltd david.cuthbertson@squaremilesystems.com www.squaremilesystems.com
Square Mile Background: Develop toolsets, training and techniques for operational management of complex IT infrastructure. Focus areas: data center management, connectivity management, system change impact analysis, documentation techniques, infrastructure visualisation. All technologies! Layers: Business Processes (departmental, company); Services (end user, infrastructure, supplier); Applications (PC, server, mainframe, SOA); Virtual Infrastructure (network, servers, storage, DBMS); Hardware Infrastructure (network, servers, UPS, storage, other); Fixed Infrastructure (cabling, power, cabinets, buildings); Data Center Infrastructure.
Common Management Issues: The data centre is often not visible, nor are the staff. An "if it's not broke" attitude isn't good; infrastructure risks need to be managed. IT groups are often task and project orientated, with less focus on operational issues. Getting funds allocated for improving management techniques is difficult. Skill sets need to evolve with technology and organisational control requirements.
Defining Management Planning what needs to be done to achieve a particular result Organising and directing resources. Controlling and making adjustments as needed Motivating all those involved.
Management Maturity
1 Reactive: individual approach
2 Repeatable: some process, often informal
3 Defined: process documented and explained
4 Managed: process checked and reviewed for gaps
5 Optimised: process open to external review and updated regularly
Where might you be if you: a) didn't label patch cables? b) labelled patch cables consistently? c) audited records against patching documentation?
Why Improve DC Management? 1. New technology demands: cooling, power, cabling, weight. 2. Save on capital and operational costs: optimise existing facilities, reduce power and other costs. 3. Less tolerance of outages and disruption. 4. Speed of change. 5. External need for evidence of control.
Changing Requirements (before / after)
No. of servers per cabinet: 3-6 / 30-40
Power dissipated per cabinet: 300-2000W / 3kW-25kW
Current service to cabinet: 16A / 2x32A or 3 phase
Network types: 100M / 1G, 10G, SAN
No. of cables, power: 1 or 2 / 2 to 6 (per server)
No. of cables, network: 1 or 2 / 5 to 10
No. of cables, cabinet total: 20-30 / 300-400
Types of equipment: Servers, Blade Servers, Monitor, Power Distribution Units, KVMs, MidSpan Boxes, Power Strips, Disk Arrays (Storage), UPS, Smart Power Strips, Regular Power Strips
New Technology Challenges. Sun Blade 8000 blade chassis: 4 power supplies (N+1), 9kW; 3 chassis per rack, 27kW? HP C7000 blade chassis: up to 6 power supplies, 13kW; 4 chassis per rack. Cisco Nexus 7000 data center switch: 3 power supplies, 12kW, up to 384 ports. And in the next few weeks?
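The rack totals on this slide are chassis count multiplied by chassis rating. A minimal Python sketch of that arithmetic, assuming the kW figures quoted above are per-chassis maximum ratings; the function name `rack_load_kw` is illustrative and not from the original slides:

```python
def rack_load_kw(chassis_kw, chassis_per_rack):
    """Worst-case rack load if every chassis drew its full rating."""
    return chassis_kw * chassis_per_rack


# Figures quoted on the slide
sun_rack = rack_load_kw(9, 3)    # Sun Blade 8000: 3 chassis at 9kW
hp_rack = rack_load_kw(13, 4)    # HP C7000: 4 chassis at 13kW

for name, load in (("Sun Blade 8000", sun_rack), ("HP C7000", hp_rack)):
    print(f"{name}: {load} kW per rack at full rating")
```

These are full-rating totals only; the measured draw of a populated rack may be considerably lower.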
Starting Well: 1. Specify and build the infrastructure using a standards-based approach: TIA942 data centre design; other standards (TIA, EN, etc.). 2. Test the installation for conformance to requirements. 3. Hand over documentation, skills transfer and operational procedures to the customer.
So How Did This Happen?
Different Working Practices
So... You may understand, but you can't assume others do. Professionally designed infrastructure will be compromised without professional management practices!
Defining Best Practices: You could define your own best practice: authority, experience, technically qualified, best communicator, management information. Or you could adopt a framework: a quicker path to the end result, with fewer opinions.
Management Frameworks: ITIL / ISO20000 (Service Management); BS25999 (Business Continuity); ISO27001 (Information Security); CoBit (IT Governance). All have a continuous process: Plan, Do, Check, Act. But no equivalent for data center management!
ISO20000/ITIL V2. Service Delivery Processes: Capacity Management, Service Continuity & Availability Management, Service Level Management, Service Reporting, Security Management, Financial Management. Control Processes: Configuration Management, Change Management. Release Processes: Release Management. Resolution Processes: Incident Management, Problem Management. Relationship Processes: Business Relationship Management, Supplier Management.
Why have a framework? Common understanding of complex issues (terms, processes, roles). Measurement, identification of gaps. Communication. Training for individuals and teams. Focus provided. Easier adoption of industry techniques. Overcomes internal reluctance to change.
Example of Best Practice: Procuring a new server. Policies: sign off, payment. Ordering process: life cycle. Purchase orders: common reference. Roles and responsibilities: specify, order and approve.
Best Practice in Data Centers. Design: TIA942 standard, Uptime Institute, manufacturer guidelines. Build and Install: standards and regulations (TIA, etc.). Operate: ??? EU Code of Conduct for Data Centers (new!).
Different Power Views [Figure: power strips on 16A feeds supplying servers and a KVM] What should the working limit be for the power strip?
So... Monitoring tools are useful, but they only tell you what they see. For managing power infrastructure we may need multiple values: manufacturer power rating; derated power (often 60% of the manufacturer rating); design power; actual power.
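The four power values can be held side by side for each device so that they can be compared. A minimal Python sketch, assuming the 60% derating quoted above as a tunable default; the class name, field names and example figures are hypothetical and not from the original slides:

```python
from dataclasses import dataclass

# "Often 60% of manufacturer" per the slide; treat this as an adjustable assumption.
DERATING_FACTOR = 0.6


@dataclass
class PowerView:
    device: str
    manufacturer_w: float  # nameplate rating
    design_w: float        # figure used when the room was designed
    actual_w: float        # measured draw, e.g. from a metered power strip

    @property
    def derated_w(self):
        """Manufacturer rating scaled by the derating factor."""
        return self.manufacturer_w * DERATING_FACTOR

    def over_design(self):
        """True when the measured draw exceeds the design allowance."""
        return self.actual_w > self.design_w


# Hypothetical example values
chassis = PowerView("blade chassis", manufacturer_w=9000, design_w=6000, actual_w=4200)
print(chassis.device, chassis.derated_w, chassis.over_design())
```

Keeping the values separate makes it visible when a monitoring tool's actual figure drifts past what the room was designed for.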
Managing Existing Data Centers Environment limits Information sets - formal and informal Working practices - formal and informal Roles / responsibilities Current issues Establish priorities
Room: Establish Design Limits. Architectural and structural: weight. Mechanical: cooling, fire detection/suppression. Electrical: power. Cabling: standards and limitations.
Is this Rack Full? It depends on: space, weight, power, cooling, connectivity. [Figure: front view of rack 01-07 holding UK_BIRM_UX05 to UK_BIRM_UX08 and SVR-BHAM-010701, with cable management 01-07-04, PP01-07-01, PWR01-07-A and PWR01-07-B]
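Whether a rack is "full" can be checked one dimension at a time, since any single limit can block an installation. A minimal Python sketch of that check; the dimension names, limits and request figures are hypothetical examples, not values from the original slides:

```python
# The five dimensions named on the slide, with assumed units.
DIMENSIONS = ("space_u", "weight_kg", "power_w", "cooling_w", "ports")


def blocking_limits(limits, in_use, request):
    """Return the dimensions that would exceed their limit if the request were installed."""
    return [d for d in DIMENSIONS if in_use[d] + request.get(d, 0) > limits[d]]


# Hypothetical rack: plenty of free space, little spare power or cooling.
limits = {"space_u": 42, "weight_kg": 900, "power_w": 6000, "cooling_w": 6000, "ports": 48}
in_use = {"space_u": 20, "weight_kg": 400, "power_w": 5200, "cooling_w": 5200, "ports": 30}
request = {"space_u": 10, "weight_kg": 150, "power_w": 1500, "cooling_w": 1500, "ports": 8}

print(blocking_limits(limits, in_use, request))
```

In this example the rack has 22U free, yet it is full for the purposes of this request because power and cooling would be exceeded.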
Controlling the Environment Known design limits A baseline of the current estate Change approval process Forward planning for capacity Regular reviews against limits Maintenance practices Routine Verification on process adoption
Data Center Documentation Commissioning documentation Project plans and designs Testing results Initial systems provision BMS Data Center A to Z Operational documentation Various sets for ongoing management
Different Teams, Different Focus. Layers: Business Processes (departmental, company); Services (end user, infrastructure, supplier); Applications (PC, server, mainframe, SOA); Virtual Infrastructure (PCs, network, servers, storage, DBMS); Hardware Infrastructure (PCs, network, servers, UPS, storage, other); Fixed Infrastructure (cabling, power, racks, rooms, buildings). Teams: Systems; Service Management; Applications; Networks LAN/SAN; Mid-range Servers; Desktops IMAC; Data Centre.
Different views of a server: rack position, service impact, floor plan, H/W build, power supply, network connections. [Figure: blade chassis BLADE_BIRM01 with blades UK_BIRM01_BLADE-01 through -05, -09, -10 and -12, and switches BLADE-BIRM01.BLADE-SW1 and BLADE-BIRM01.BLADE-SW2]
Recommended Information Sets Space Environment (power, cooling) Connectivity (power, networks) Asset and Inventory controls Device management Service management
Where to Start? Structured cabling only, LAN diagrams, KVM architecture, inventory list, storage diagrams, patching spreadsheets, KVM, WAN diagrams, point to point cabling, asset list, IIS architecture, building wiring diagrams, power architecture, computer room layout, edge switches, backbone switches, legacy systems, power distribution, blade switches, PDUs, circuit breakers, labelling standards, SAN architecture, PABX port mapping, LAN architecture, power strip connections.
LAN Connectivity Example: Identifying Focus for LAN Baseline Project. [Matrix: user impact of disconnect (low to high) against amount of connections (low to high), quadrants numbered 1 to 4] 1. Backbone cabling 2. Cabinet/Zone cabling 3. Floor boxes 4. Servers 5. Core Switches 6. Edge Switches 7. Wireless Access Points 8. Routers 9. Firewalls 10. SANs 11. Power strips 12. KVMs 13. IP phones 14. Desktops
To Manage Connectivity: 1. Document the fixed infrastructure first: backbone, power, vertical. 2. The active components: switches, servers, SAN etc. 3. Finally the connectivity: local, path and endpoints.
Defining the Level of Detail: 1. Local patch? 2. End to end path? 3. All devices connected to the switch? [Figure: patch panels illustrating each level]
Asset Controls Lists of all devices and assets Their current status and location Previous history and audit trail Often combined with maintenance and procurement data Auto-discovery can help, but often limited in value in data centers.
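An asset record with current status, location and an audit trail can be kept in a very small structure. A minimal Python sketch, assuming an in-memory record; the class, its fields and the example values are hypothetical (the identifier is borrowed from the rack diagram earlier in the deck) and this does not describe any particular asset tool:

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Asset:
    asset_id: str
    status: str
    location: str
    # Audit trail of (change date, previous status, previous location)
    history: list = field(default_factory=list)

    def update(self, when, status=None, location=None):
        """Record the previous state, then apply the change."""
        self.history.append((when, self.status, self.location))
        if status is not None:
            self.status = status
        if location is not None:
            self.location = location


# Hypothetical example: a server moved between racks
server = Asset("SVR-BHAM-010701", status="in service", location="rack 01-07")
server.update(date(2009, 3, 4), location="rack 01-08")
print(server.location, server.history)
```

Because every change appends the prior state, the record can answer both "where is it now?" and "where was it before?".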
Device or Element Management Network, server, storage monitoring Configuration systems Automated deployment / provisioning Network and other architecture diagrams Automated discovery and scanning Backup and failover
Service & Risk Management Help or service desk system Project control or workflow system Services maps Devices mapped to critical services Service monitoring tools Billing and charging Recovery planning and testing
DC Capacity Management: Demand management to capture requests. Existing + allocated demand recorded. Capacity plan and database. Reporting and trending on: space; power; cooling; network, SAN port availability; resource (staff). Green reporting.
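Recording existing and allocated demand against a limit gives a simple headroom figure for each resource. A minimal Python sketch of that report; the function name, the resource limits and the demand figures are hypothetical examples, not values from the original slides:

```python
def headroom(limit, existing, allocated):
    """Summarise committed demand (existing + allocated) against a limit."""
    committed = existing + allocated
    return {
        "committed": committed,
        "free": limit - committed,
        "utilisation_pct": round(100 * committed / limit, 1),
    }


# Hypothetical room-level figures: (limit, existing demand, allocated demand)
capacity = {
    "space_u": (840, 520, 90),
    "power_kw": (100, 60, 25),
    "cooling_kw": (120, 60, 25),
}

for resource, (limit, existing, allocated) in capacity.items():
    print(resource, headroom(limit, existing, allocated))
```

Counting allocated demand as already committed avoids promising the same headroom to two projects.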
Charging and Funding Different perspectives Space Power Cooling Network Ports used Shared Infrastructure Costs and Support Hardware Maintenance Costs and Support Operations Costs and Support
Meeting the Needs of 3rd Parties: SOX, PCI, FSA, auditors, etc. Building and planning requirements. Employment and buildings legislation: disability, health and safety, electricity at work, carbon tax, and others. Insurance. What would be sufficient evidence to satisfy them of your controls in most cases?
Energy Issues: EU regulations already in place: energy performance of buildings; Energy-using Products Directive (colour codes on white goods); WEEE and RoHS directives. US Green Building Council LEED Program (Leadership in Energy and Environmental Design). The Green Grid programme. EU Code of Conduct for Data Centres: completed 1Q2009; covers all data center, server and equipment rooms. UK draft climate change bill. Carbon trading.
EU Code of Conduct Aim is to inform and stimulate data center owners to reduce energy consumption Understand energy usage Raise awareness Communicate practices which will reduce energy consumption Voluntary Available at http://dcsg.bcs.org
EU Code of Conduct: Measurements against best practices for: cooling; power equipment; other data center equipment; data center utilisation, management and planning; IT equipment and services; energy monitoring; temperature and humidity requirements for equipment. Suggested limits are 5°C to 40°C.
EU Code of Conduct Additional best practices document is useful for all as it covers Design Operate New equipment and retrofit issues IT equipment selection Power, cooling, storage Monitoring and reporting
Managing Risk: What presents the greatest risk?
Evidence of Conformance Policies covering control, security etc. Evidence of processes that support the policies Change records Build and test records Written material or email trails Communications Incident reviews Access lists
Current Issues: Security of data about individuals' financial and personal lives is becoming high profile. The data in the data center is valuable! www.idtheftcenter.org Example: ITRC20090304-01, NYPD Pension Fund, 3/4/2009. A civilian official of the NYPD's pension fund has been charged with stealing the identities of 80,000 current and retired cops, sources said. He allegedly got into a secret backup-data warehouse on Staten Island last month and walked out with eight tapes packed with Social Security numbers, direct-deposit information for bank accounts, and other sensitive material.
Management Maturity
1 Reactive: individual approach
2 Repeatable: some process, often informal
3 Defined: process documented and explained
4 Managed: process checked and reviewed for gaps
5 Optimised: process open to external review and updated regularly
What will be different next year?
Managing the Infrastructure Planning what needs to be done to achieve a particular result Organising and directing resources. Controlling and making adjustments as needed Motivating all those involved
Thank you for your attention Questions or feedback? David Cuthbertson Square Mile Systems Ltd www.squaremilesystems.com www.assetgen.com