Maximizing Voice Services Uptime in VoIP Environments Bill Caraher, von Briesen & Roper Jim McCue, Rodey Law Firm
Overview Phone systems and network design VOIP components and services: What are they? Where are they? Essential in a disaster? Vendor approaches to disaster recovery/business continuity What can go wrong? How do we adjust or do we?
Rodey Phone System/Network Design Rodey Law Firm: 70 attorneys, 150 people, 2 offices in Albuquerque and Santa Fe, New Mexico Shoretel Phone System replaced Siemens 9751 PBX Headquarters server in Albuquerque, Distributed Voice Services server in Santa Fe (2 VM servers) Network: HP Procurve Gigabit POE Switches VLANs: data per floor, voice Routing switches MSTP: Multiple Spanning Tree Protocol, per VLAN Spanning Tree Interoffice connection: QMOE (Qwest Metropolitan Over Ethernet)
Rodey Phone Diagram: Phones/Switches
Rodey Phone Diagram: External Lines
Rodey Network Diagram
von Briesen Phone System/ Network Design von Briesen Law Firm: 110 Attorneys, 250 total employees. 4 Offices in Wisconsin Cisco VoIP Replaced (4) Legacy PBX Systems Centralized Voice System at HQ Dual P2P T1 between HQ, PRI & POTS @each Voice VLANs & QoS Routing
von Briesen Voice Infrastructure
von Briesen Network diagram
Phone Services Which services are needed in a DR scenario? Switch to an outside service? Conference calling, faxing Essential?: Internal 4 digit dialing Outbound phone calls Inbound phone calls: Voice mail Unified Messaging Auto attendant Call forwarding 911 services Less essential? TAPI PC interface with phone system Conference Calling Cost Recovery Faxing: Inbound, outbound Overhead paging Music on Hold Extension monitoring Presence and IM Dial by Voice
Shoretel DR Features Distributed Call Control Distributed call control on each switch: Call setup, teardown, conferencing, transfer, forward Peer to peer: switch to switch Distributed Voice Applications Voice mail Auto attendant Unified messaging Call control N+1 redundancy Voice switch failure Phone-switch heartbeat Caveat: lose a switch and a headquarters server Voice mail failover/auto attendant failover DVS to headquarters, DVS to DVS Headquarters won't fail over to DVS Auto attendant to backup auto attendant Server High Availability: Doubletake VMware with HA PSTN failover DRS enabled?
Shoretel DR Features Outbound Trunk Call Routing/Failover: Selects from all lines marked for type of call (local, ld, international, 911, etc.) Trunk group rollover Extension portability: Different phone Wireless phone Soft phone Office Anywhere VPN accessible: phone, soft phone Mobile Call Manager - blackberries 911 services: Ensuring 911 calls and accurate location information are routed to the appropriate Public Safety Answering Point (PSAP) Native Shoretel 911 E911 Add on
Cisco DR Features Phone System Disaster Recovery Redundant Call Manager Subscriber & Publisher Supports virtualization for server-based components SRST Gateway Failover Local survivability in branch offices & HQ Voice Mail Clustered Unity Server (Windows) All voicemail stored in Exchange: no data loss or data failover with clustered Exchange Auto attendant failover Subscriber / Publisher Failover trunk groups from carrier POTS backup Service Advertisement Framework (SAF) New in CM 8.0: services and DNs can be advertised by different hardware when a failure occurs, no manual switchover. Location Disaster Recovery Centralized phone system in data center Soft phones Take desk phones remote via Internet proxy, no VPN necessary Power Loss Have local UPS and bricks available when PoE is used. Feature Disaster Recovery Virtualize or cluster important services, but weigh cost vs. benefit
What can go wrong?
Phone Failure Users value their phone service! Rodey: Replace phone Log into another phone (LAN, Wireless, VPN) Use Softphone (LAN, VPN) Office Anywhere Mobile remote (BB) von Briesen: Log into another phone Cisco Extension Mobility Replacement hardware on hand Single Number Reach: calls go to cell phone automatically Cisco Mobility: outbound dial via cell phone Proxy desk phone via Internet and ASA Softphones
Phone switch failure (Jim) IP switch Analog switch T1 switch Solutions Rodey Shoretel IP Switch failover: local, remote Analog Switch: Replace Move extension in software, hard wire T1 Switch: Redundant lines Replace
Phone switch failure
Phone Switch Failure von Briesen Cisco Core Network Routing & Switching Cisco Endpoint Switching PoE / Bricks SRST: Phones register with Gateway (Router)
LAN Failure von Briesen To phone: Single network cable into core switch w/VLAN To server: Multiple network cables into core switch w/VLAN Solutions von Briesen Rely on core switch for fault tolerance on LAN Long term plan is to virtualize servers, easy to spin up/restore. Rodey Spanning Tree per VLAN Dual port switches Dual port servers Spare network switches! Wireless phones DHCP
Interoffice Line Failures MPLS, Frame Relay, T1, Metro over Ethernet, etc. Solutions Rodey PSTN failover Backup Internet VPN Switch is call manager von Briesen Multiple T1 lines between offices, Internet VPN backup, SRST local survivability. Long term failure: Phone Proxy through Internet
External line failures External lines are the weakest link in your phone system! Types: Analog PRI SIP Solutions: von Briesen Outbound: multiple PRIs, call routing over data lines, POTS Inbound: multiple PRIs, calls can originate from any PRI (Telco Trunk DID Trunk Group Failover) SIP solution for Cisco Rodey Outbound PRI call routing/failover: Alb PRI2 to Alb PRI1 in descending order Alb PRI1 to Alb POTS line SF PRI to Alb PRI1, PRI2 SF PRI to SF POTS line Inbound: Alb PRI1 to Alb PRI2 in ascending order SF PRI to SF POTS line SF PRI via Qwest BCR to Alb PRI SIP solution for Shoretel?
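The Rodey outbound failover order above can be sketched as an ordered preference list per site. This is only an illustration, not ShoreTel configuration: the trunk names are hypothetical, and the Santa Fe ordering (SF PRI, then the Albuquerque PRIs, then the SF POTS line) is one reading of the slide.

```python
# Hypothetical trunk names; the per-site order is an assumed reading of the slide.
OUTBOUND_ORDER = {
    "ALB": ["ALB-PRI2", "ALB-PRI1", "ALB-POTS"],
    "SF": ["SF-PRI", "ALB-PRI1", "ALB-PRI2", "SF-POTS"],
}


def pick_trunk(site, trunks_up):
    """Return the first trunk in the site's failover order that is up, else None."""
    for trunk in OUTBOUND_ORDER[site]:
        if trunk in trunks_up:
            return trunk
    return None


# Albuquerque with PRI2 down rolls over to PRI1.
print(pick_trunk("ALB", {"ALB-PRI1", "ALB-POTS"}))  # ALB-PRI1
# Santa Fe with its own PRI down and the interoffice link up uses an Albuquerque PRI.
print(pick_trunk("SF", {"ALB-PRI1", "ALB-PRI2", "SF-POTS"}))  # ALB-PRI1
```

Walking every combination of failed trunks through a table like this is a cheap way to confirm that no site is left without an outbound path before testing on the live system.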
Server failure: von Briesen Publisher & Subscriber Failure Scenario The cool thing about Cisco is that you can lose both the Call Manager Publisher & Subscriber and still have dial tone. Phones can re-register with the Gateway at HQ and Branch Offices. *Some Features will not work when in SRST mode.
Server failure: Rodey Headquarters down, DVS up Voice Mail: Headquarters VM forwards to switch-based backup auto attendant (not to SF DVS) Remote VM still functional IP phone failover will not work at either location PCM: Headquarters doesn't work Remote site works Can't change call handling mode/configuration Users can't reassign extensions
Server failure: Rodey DVS down, Headquarters up Voice Mail: Headquarters VM still functional Remote users forward to Headquarters VM Remote users can't use Office Anywhere, reassign extension IP phone failover works PCM: Headquarters works Remote site doesn't HQ users can reassign extensions, remote can't (uses DVS VM)
Power failure Total Computer Room Network Switch Rooms Personal Offices Solutions Rodey Power Failures POE, UPS, n+1 power supplies, n+1 UPS, power bricks Analog line hard wired for emergencies, power backup runs out von Briesen 95% of all phones have local power bricks, not on PoE. Data Center on UPS power that will hold for 90 minutes. After that, HQ is dead but branch offices work on SRST. In '11 or '12 the DC will be moved to a co-location facility with generator backup power. Backup analog lines will only work if the phones and gateway have power.
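How long a UPS "will hold" depends on the load attached to it, which matters when PoE moves phone power into the switch rooms. A rough back-of-the-envelope sketch, assuming a simple energy-over-load model (real batteries discharge non-linearly, so treat the result as an upper bound). The battery capacity, efficiency, and per-device wattages below are made-up illustration values, not figures from either firm.

```python
def ups_runtime_minutes(battery_wh, load_watts, efficiency=0.9):
    """Rough runtime estimate: usable battery energy divided by the attached load."""
    if load_watts <= 0:
        raise ValueError("load_watts must be positive")
    return battery_wh * efficiency / load_watts * 60


# Hypothetical switch room: one PoE switch (~150 W) feeding 24 phones at ~6 W each.
load = 150 + 24 * 6  # 294 W
print(round(ups_runtime_minutes(1000, load)))  # 184
```

Running the same estimate with and without the PoE phone load shows how much runtime the phones cost, which helps when deciding between PoE and local power bricks.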
Office loss Headquarters, remote office, co-location facility? Solutions von Briesen: A total loss of the HQ would be problematic. The branch offices could operate, but we would need to re-establish a Call Manager and ASA Proxy through another office or co-location facility. Voicemail is also at the HQ, so that and the feature servers would need to be rebuilt.
Office loss: Rodey Redirect DIDs with BCR BCR: Qwest Business Continuation Routing Call forwarding service Predefined groups of numbers Forward to up to 2 alternate locations Triggered by phone call DID info preserved Dependent on CO functionality Restore HQ server or DVS Reconfigure affected extensions Use either Doubletake or VMware to sync copy of HQ server/DVS
Plan for the Minor Disasters Troubleshooting a faulty VoIP system can be tricky. Intermittent problems are the worst and hardest to resolve! Back into the problem: identify what doesn't work, then think about what is required for that part of the system to function. Keep documentation up to date and available. Example: Intermittent one-way audio on some inbound long distance calls. Enable verbose logging and write output to a syslog server; QRG buttons on phones that are problematic write an extended log to the system for analysis. Open a ticket with the telco carrier. Request a dispatch to rule out local equipment. They always want to blame CPE. In a recent scenario the carrier closed my ticket twice, stating it was CPE with a faulty DID. I finally got a tech dispatched; he agreed there was a problem on their end, and they determined that the number had been ported by accident. Open a ticket with the VoIP system vendor. They will often help get to the root cause of the failure. Be prepared with version numbers, log files and lots of details to get the call going quicker.
Jim Summary Look for all the points of failure (phones, switches, LAN, WAN, external lines, power, operator) and test solutions Don't create your own disasters Don't forget your network!! Rodey weakness: dependence on HQ server for phone/switch assignment changes, Alb VM Solution: VMware with HA certified with version 11 Solution: move VM in Albuquerque to new Alb DVS
Summary Bill Successful disaster recovery involves planning and execution. Choose the right telco partner that can automate as much failover as possible. Keep the WAN and LAN healthy: you now have a local voice network. Look for single points of failure in the design and system. Test your DR plan. Weigh the cost vs. benefit on some redundancy scenarios: is it really necessary to have every feature working?
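One way to look for single points of failure on paper is to model the voice path as a graph and check which components, if removed, cut a phone off from the PSTN. The sketch below does this by brute force with a breadth-first search. The topology and node names are hypothetical and do not describe either firm's network.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map from (a, b) link pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def reachable(graph, start, goal, removed=None):
    """Breadth-first search from start to goal, skipping the removed node."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in graph.get(node, ()):
            if nxt != removed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def single_points_of_failure(graph, start, goal):
    """Return every intermediate node whose loss disconnects start from goal."""
    candidates = set(graph) - {start, goal}
    return sorted(n for n in candidates if not reachable(graph, start, goal, removed=n))


# Hypothetical voice path: dual core switches and dual PRIs, one access switch and gateway.
EDGES = [
    ("phone", "access-switch"),
    ("access-switch", "core-1"), ("access-switch", "core-2"),
    ("core-1", "gateway"), ("core-2", "gateway"),
    ("gateway", "pri-1"), ("gateway", "pri-2"),
    ("pri-1", "pstn"), ("pri-2", "pstn"),
]

print(single_points_of_failure(build_graph(EDGES), "phone", "pstn"))
# ['access-switch', 'gateway']
```

In this made-up layout the redundant core switches and PRIs drop out of the list, while the access switch and gateway remain as the items to cover with spares or a survivability feature.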
Audience questions and real life experiences