Control and Management of Optical Networks

Typical Management Functions! Performance management! Fault management! Configuration management! Security management! Accounting management! Safety management

Performance Management! Monitoring and managing performance parameters in the network.! Can be used for QoS provisioning and to monitor whether clients comply with the requirements.! Provides data to administrators, clients and other management functions.

Fault Management! Detecting and isolating failures in the network.! Restoring the traffic affected by failed links.! In optical networks most of these functions fall under Protection and Restoration.

Configuration Management! Managing addition/removal of equipment (which may require rerouting of traffic) and software upgrades.! Connection management deals with setting up and tearing down connections, and can be:! Centralized! Distributed! Adaptation management: incoming optical/electrical signals have to be converted into WDM signals accepted by the network.

Security Management! Authentication, encryption, user rights.! Network security can be horizontally and/or vertically partitioned.! Vertical partitioning means that some users may only access some of the network elements.! In horizontal partitioning, some users may access some parameters all across the network (e.g., a user may access the parameters of its own lightpath).

Accounting Management! Billing! Lifetime history of users! Lifetime history of components! Does not differ much for optical networks compared to other wired networks

Safety Management! Optical radiation safety! Eye safety (Class-1 Laser)

Management Framework and Protocols

Centralized vs. Distributed Management! Centralized management usually relies on hierarchical management systems.! Centralized management is usually slow due to large software path overheads.! Decentralized methods are much faster (which is necessary, e.g., for protection), thus some functions should be decentralized.! In large networks, distributed management allows more than one manager. This is recommended especially if different domains are managed by different managers. Managers will have to communicate with other managers to perform certain functions.

Management Framework! Network elements have to be managed, e.g., OLTs, OADMs, OXCs, etc. Each of these elements has an associated element management system (EMS).! EMSs communicate with their elements through management interfaces (and software agents).! EMSs in a network are interconnected by a data communication network (e.g., the OSC can be used for that).! An EMS has a good view of the element but not necessarily of the network. Thus EMSs communicate with a network management system (NMS), also called an operations support system (OSS).

Information Model! EMSs operate on information models (IMs).! IMs are usually implemented in an object-oriented language, since they have an object-oriented representation themselves.! Inheritance plays an important role.! For example, connection trails are IM objects that can represent lightpaths.

Management Protocols! Most management systems are master-slave based, with get and set operations over IMs.! Additionally, messages can be slave-initiated (exception reports, alarms, or traps).! The best-known IP-based management protocol is SNMP. The IM in SNMP is called the MIB (management information base).! Yet most carriers still rely on TL-1 (Transaction Language-1). TL-1 is a simple ASCII command language.
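
The master-slave pattern described above (manager-initiated get/set plus slave-initiated traps) can be sketched with a toy model. This is only an illustration, not SNMP or TL-1: the `Agent` and `Manager` classes, the parameter names, and the threshold values are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Trap = Tuple[str, str, float]  # (element name, alarm type, measured value)


class Agent:
    """Toy agent inside a network element (hypothetical, not SNMP or TL-1)."""

    def __init__(self, name: str, send_trap: Callable[[Trap], None]) -> None:
        self.name = name
        self._send_trap = send_trap
        # A tiny flat "information model": parameter name -> value.
        self._model: Dict[str, float] = {
            "rx_power_dbm": -12.0,
            "los_threshold_dbm": -30.0,
        }

    def get(self, key: str) -> float:
        """Manager-initiated read of one parameter."""
        return self._model[key]

    def set(self, key: str, value: float) -> None:
        """Manager-initiated write of one existing parameter."""
        if key not in self._model:
            raise KeyError(key)
        self._model[key] = value

    def report_measurement(self, rx_power_dbm: float) -> None:
        """Called by the (simulated) hardware; may raise a slave-initiated trap."""
        self._model["rx_power_dbm"] = rx_power_dbm
        if rx_power_dbm < self._model["los_threshold_dbm"]:
            self._send_trap((self.name, "loss_of_signal", rx_power_dbm))


class Manager:
    """Toy manager that simply records the traps it receives."""

    def __init__(self) -> None:
        self.traps: List[Trap] = []

    def receive_trap(self, trap: Trap) -> None:
        self.traps.append(trap)


manager = Manager()
agent = Agent("OADM-1", manager.receive_trap)
agent.set("los_threshold_dbm", -25.0)   # manager-initiated set
agent.report_measurement(-28.0)         # below threshold, so a trap is sent
```

After these calls the manager holds exactly one trap, which it never had to poll for.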

Management Protocols! The standardization of a new management framework is timely and underway: TMN (Telecommunications Management Network).! TMN's hierarchical management protocol is called CMIP (Common Management Information Protocol). CMIP runs over OSI but has also been defined for TCP/IP.! In order to be able to manage a network with equipment from various vendors, a common interface is needed. CORBA (Common Object Request Broker Architecture) can be used to enable communication of software agents.

Optical Layer Services and Interfacing

Clients of the Optical Layer! Clients of the optical layer should be able to control the optical layer, thus a clearly specified interface is needed.! Management functions are:! Lightpath set-up and tear-down! Lightpath bandwidth requests! Adaptation functions to convert WDM signals to client signals (transponders, supported wavelengths, modulation, bit-rates)! Performance level management (e.g., BER should be below 10^-12).

Clients of the Optical Layer! Management functions cont'd:! Different levels of protection may be supported, possibly transferring low-priority data on the protection bandwidth. Restoration time specification.! Direction of lightpaths (uni- or bidirectional).! Multicasting (drop-and-continue)! Signal jitter requirements of the client layer have to be addressed (with 2R or 3R).! Requirements on maximum delay have to be addressed (e.g., ESCON)! Root-cause alarms. One failure can result in several alarms, most of which have to be suppressed.

Interface Between OL and Clients! A simple interface is used today, where the clients talk to the optical layer's EMS, and that talks to the equipment's EMS. This works well assuming long holding times.! In the future, it would be more appropriate to specify a signaling interface between the OL and its clients. Some carriers still think this is not necessary.

Layers Within the Optical Layer

Sub-layers in the Optical Layer! Three sub-layers have been defined by the ITU: 1. Optical channel (OCh) layer, the top sub-layer, responsible for end-to-end routing of lightpaths (optical channels). 2. Optical multiplex section (OMS) layer: an OMS is the section between OLTs and OADMs. 3. Optical transmission section (OTS) layer; the OTS also includes the OSC.

Sub-layers of the OCh! OCh-TS (transparent section): section of the lightpath within an all-optical subnet, no OEO conversions! OCh-S (section): adds some overhead, e.g., FEC! OCh-P (path): end-to-end lightpaths! The above sub-layering enables a division of management functions.

Multi-vendor Interoperability

Interoperability?! Today there is (almost) no interoperability, and it seems utopian for the near future.! Different vendors may use different modulations, bit rates, and wavelengths; even the OSC can be at different wavelengths. Additionally, WDM is an analog transmission technique, which makes standardization even harder.! Interoperability is usually achieved by interconnecting equipment using transponders.! There are attempts to standardize WDM and the corresponding signaling protocols.

Performance and Fault Management

Performance and Fault Management! Performance management ensures that the performance metrics stay above a lower threshold.! Fault management generates alerts if some of the performance metrics fall below the threshold. Fault management also includes recovery from failures.

Transparency! If the network is transparent it is almost impossible to monitor BER, so optical power levels have to be measured instead. Yet even this depends on the signal format, which should be known to the measuring equipment.! Non-transparent networks (e.g., for OC-48 SONET) on the other hand are cheap and easily managed (yet not future-proof).! Today most networks are somewhat transparent, where the clients are predefined (e.g., SONET/SDH, ATM, ESCON, Gb Ethernet).

BER measurement! BER can only be measured in the electrical domain.! SONET has built-in bits in the frame to communicate BER measurements.! Estimating BER from optical power is almost impossible.

Optical Trace! Traces identify lightpaths inside the network (e.g., the source and destination IP address).! Different sub-paths may have different traces.

Alarm Management! A single failure (e.g., a link failure) can cause multiple alarms.! Thus single root-cause alarms are required so as not to confuse the management system.! Alarm suppression is accomplished by special signals: 1. Forward defect indicator (FDI) 2. Backward defect indicator (BDI)

Forward Defect Indication! When detecting a failure, a node sends out an FDI to the downstream nodes.! FDIs are also passed on to further downstream nodes. If an FDI is received from an upstream node, the current node will suppress its own fault indication.

Backward Defect Indication! A node that detects a failure also sends a BDI signal to the upstream node. If that upstream node has not itself sent out an FDI, it knows that the failure lies between itself and the downstream node.
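
The FDI/BDI mechanism above can be illustrated with a small simulation on a linear chain of nodes, with traffic flowing from node 0 to the last node. This is a simplified sketch: the function name `propagate_defect`, the node numbering, and the returned dictionary are assumptions made for illustration, and real equipment signals per layer and per direction.

```python
from typing import Dict, List, Optional, Set, Tuple


def propagate_defect(num_nodes: int, failed_link: int) -> Dict[str, object]:
    """Simulate FDI/BDI on a chain 0 -> 1 -> ... -> num_nodes-1.

    failed_link = k means the link between node k and node k+1 is cut.
    """
    if not 0 <= failed_link < num_nodes - 1:
        raise ValueError("failed_link must identify an existing link")

    alarms: List[int] = []        # nodes that raise a fault indication
    suppressed: List[int] = []    # nodes that suppress theirs (they got an FDI)
    sent_fdi: Set[int] = set()    # nodes that sent an FDI downstream
    bdi_receiver: Optional[int] = None
    fdi_arriving = False          # is an FDI arriving from upstream?

    for node in range(num_nodes):
        if node <= failed_link:
            continue              # upstream of the cut: signal still present
        if fdi_arriving:
            suppressed.append(node)          # FDI received: stay quiet
        else:
            alarms.append(node)              # first node to see the defect
            bdi_receiver = node - 1          # BDI goes to the upstream node
        if node < num_nodes - 1:
            sent_fdi.add(node)               # pass the FDI further downstream
        fdi_arriving = True

    located: Optional[Tuple[int, int]] = None
    if bdi_receiver is not None and bdi_receiver not in sent_fdi:
        # The BDI receiver sent no FDI itself, so the fault is on its
        # downstream link.
        located = (bdi_receiver, bdi_receiver + 1)
    return {"alarms": alarms, "suppressed": suppressed, "located_link": located}


result = propagate_defect(num_nodes=5, failed_link=1)
```

In this run only node 2 raises an alarm, nodes 3 and 4 suppress theirs, and node 1 localizes the failure to the link between nodes 1 and 2.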

Data Communication Network

Data Communication Network! The DCN is used for signalling. It is usually a standard TCP/IP or OSI network.! The DCN should be well connected (e.g., 2-connected), in which case the DCN can survive broken links.

Data Communication Network! 1. The DCN can be a separate network (out of band), outside the optical layer. Simple leased lines can be used for this, but not all elements may be connected (e.g., EDFAs). 2. The DCN can be built using the OSC. Yet all-optical equipment (e.g., all-optical OXCs) cannot easily accommodate this. 3. In-band, inside the optical layer, with superposed channels.

Optical Layer Overhead

Optical Layer Overheads! Overhead in the optical layer is required to enable measurement of the quality of the link (e.g., BER).! This is implicit with SONET, but not in general true for the optical layer.

Pilot Tone / Subcarrier Multiplexed Overhead! A subcarrier is modulated with the additional signalling overhead. This channel has relatively low bandwidth and amplitude.! Demultiplexing the pilot tone does not require the demultiplexing of the entire signal.! It is relatively inexpensive and does not depend on the format of the real signal. Yet it cannot be used to measure BER and can only be terminated at the termination points of the real signal.! The trace function is easily accomplished.

Optical Supervisory Channel! If optical amplifiers are used, an OSC is employed to monitor those amplifiers. This same signal can be used to establish the DCN.! The wavelength selection for the OSC involves some trade-offs (e.g., it should not be close to the pump wavelengths). The ITU standardized the OSC at 1510 nm. Another good choice is above the L band (1620 nm) to avoid collision with pump sources and the S band.

Rate-Preserving Overhead! Undefined bytes in the existing SONET frame structure are used for FEC.! It cannot be used in a transparent network, and the equipment has to be SONET/SDH enabled. (Some of the unused bits have already been hijacked by different vendors for their own purposes.)

Digital-Wrapper Overhead! Similar to the subcarrier multiplexing method, except that additional bits are added to the signal carried on each wavelength. It can indeed be used for BER indication.! It can be used for FEC, and to establish the DCN. The wrapper is somewhat transparent to the signals transmitted.! It does not work well with legacy equipment.

Optical Safety

Classes of Lasers! A Class-I laser cannot emit damaging radiation. Power levels have to be maintained below given threshold values even in the case of single failures.! Class-IIIa systems allow higher emission powers but may only be handled by trained personnel.! Class-IIIb systems can emit even higher powers and can cause damage even when the beam is not focused.! Usually long-haul networks employ Class-III lasers, while enterprise and metro networks employ Class-I lasers.

Other Safety Considerations! Most equipment connectors have shutters closing when they are disconnected.! Protocols can also be employed to turn off laser sources if failures have been detected, e.g., the open fiber control protocol of Fibre Channel.

Network Survivability Based on: Optical Networks: A Practical Perspective (2nd Edition), Chapter 10, by R. Ramaswami, K. N. Sivarajan

Resilience Against Failures! Availability of a connection should be better than 99.999% (five 9s), corresponding to roughly 5 minutes of downtime a year.! Survivability (resilience) is an important property of optical networks, so that they can continue providing service even in the presence of failures.
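
The relation between an availability figure and yearly downtime is simple arithmetic. The sketch below assumes a 365-day year; the function name is chosen for illustration only.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year


def downtime_minutes_per_year(availability: float) -> float:
    """Expected downtime per year for a given availability in [0, 1]."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


five_nines = downtime_minutes_per_year(0.99999)  # about 5.26 minutes
four_nines = downtime_minutes_per_year(0.9999)   # about 52.6 minutes
```

Each additional "9" cuts the allowed downtime by a factor of ten.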

Protection Switching! Protection switching is the key technique used for survivability.! With protection switching, traffic is rerouted between end-points using redundant bandwidth.! Restoration is the process of doing the protection switching (protection is a proactive term, while restoration is a reactive one).! To ensure fast restoration, management of protection should be distributed, not centralized.

Sources of Failures! In most cases failures are human-triggered, e.g., cable cuts (Joe S. with a backhoe), turning the wrong switch, or disconnecting the wrong cable. Cable that is run together with other infrastructure (e.g., gas pipes) is less likely to be cut.! The next most likely cause of failures is equipment failure, e.g., transmitters, receivers, and other active elements; thus controllers are usually redundant in equipment.! Since today most equipment is software-controlled (and this is the trend), software bugs represent another source of failures.! Additionally, entire offices can go down (usually due to Mother Nature).

Protection! Protection is also needed to be able to service equipment without downtime.! Usually resilience is engineered for a single failure at a time. Large networks are usually subdivided into subnetworks, each of them capable of handling a single failure.! Engineering for a single failure is sufficient if the mean time between failures (MTBF) is significantly larger than the average repair time.! Yet, some schemes can by their nature provide protection against multiple failures.
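
Why a large MTBF relative to the repair time justifies single-failure engineering can be made concrete with a rough estimate. The sketch below assumes, purely for illustration, that failures are independent and exponentially distributed with mean MTBF; the slides do not state a failure model, and the function name is hypothetical.

```python
import math


def prob_second_failure_during_repair(mtbf_hours: float, repair_hours: float) -> float:
    """Probability that another failure occurs while the first is being repaired.

    Assumes exponentially distributed times between failures (an
    illustrative assumption, not a claim about real networks).
    """
    if mtbf_hours <= 0 or repair_hours < 0:
        raise ValueError("mtbf_hours must be > 0 and repair_hours >= 0")
    return 1.0 - math.exp(-repair_hours / mtbf_hours)


# MTBF of 10,000 hours and a 10-hour repair: overlap is very unlikely.
rare = prob_second_failure_during_repair(10_000.0, 10.0)
# Repair time equal to MTBF: overlapping failures become the common case.
common = prob_second_failure_during_repair(10.0, 10.0)
```

Under this model the overlap probability is approximately the ratio of repair time to MTBF whenever that ratio is small.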

Restoration Times! A (standardized) rule of thumb coming from the requirements of SONET is to restore connections in less than 60 ms (10 ms of which should be sufficient for fault detection and initiation of restoration).! In data networks this is not a strong requirement, which enables reducing the cost of equipment (yet a downtime of 1 second results in about 311 MB, 1.24 GB, and 5 GB of lost information on an OC-48, OC-192, and OC-768 link, respectively).
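
The amount of data lost during an outage follows directly from the line rate. An OC-n signal runs at n × 51.84 Mb/s; the sketch below uses that rate and counts all bits on the line, overhead included. The function name is chosen for illustration.

```python
STS1_RATE_BPS = 51.84e6  # OC-1 / STS-1 line rate in bits per second


def bytes_lost(oc_level: int, outage_seconds: float) -> float:
    """Bytes that would have crossed an OC-n link during the outage."""
    if oc_level < 1 or outage_seconds < 0:
        raise ValueError("oc_level must be >= 1 and outage_seconds >= 0")
    return oc_level * STS1_RATE_BPS * outage_seconds / 8.0


# Loss for a 1-second outage, in megabytes (10^6 bytes), per link type.
loss_mb = {n: bytes_lost(n, 1.0) / 1e6 for n in (12, 48, 192, 768)}

# Loss if the 60 ms restoration target is met on an OC-192 link.
loss_60ms_mb = bytes_lost(192, 0.060) / 1e6
```

Meeting the 60 ms target keeps the loss on an OC-192 link to roughly 75 MB instead of more than a gigabyte.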

Layers of Survivability! Protection can be performed at: 1. The physical layer (SONET layers, optical layer) 2. The link layer (e.g., ATM layer, MPLS layer) 3. The network layer (e.g., IP layer, IP routers)! In general, the higher the layer at which protection is implemented, the slower it gets (but also the more sophisticated it can be).

Working Paths vs. Protection Paths! Working paths carry traffic in normal conditions.! Protection paths are used in case of failures on working paths.! Working and protection paths should be routed link (and node) disjoint.! Ring topologies are popular because they are 2-connected with the minimum number of links. Yet most research today is focused on mesh topologies (the future).

Dedicated vs. Shared Protection! Dedicated protection assumes that equal (redundant) bandwidth is available for each connection on the working and protection paths.! In shared protection several protection paths can use the same wavelength, assuming that only one failure happens at a time (and to restore from that failure, only one of the protection paths needs to be established).! Shared protection can reduce the overall bandwidth requirement (but will require higher restoration times).! Redundant bandwidth can be reused to carry best-effort traffic (as long as protection has higher priority than this traffic).

Revertive vs. Nonrevertive Protection! In both revertive and nonrevertive schemes traffic is automatically switched from the working path to the protection path in case of a failure.! In revertive protection, as soon as the failure has been fixed the system automatically switches back to the working path.! In nonrevertive protection manual intervention is needed to switch back to the working path.! Shared protection schemes should be revertive, so that the shared protection bandwidth is freed again for other failures.

Unidirectional vs. Bidirectional Protection! In unidirectional protection, protection switching is handled in each direction separately.! In bidirectional protection, protection switching is handled in both directions simultaneously.! Fiber cuts are easily handled with unidirectional protection, but in the case of, e.g., a transmitter failure, unidirectional protection only switches the failed direction to another path (or fiber).! Unidirectional protection is used with dedicated protection, since then no signalling is needed.

Automatic Protection Switching! APS is a signaling protocol (e.g., to be used with bidirectional switching) for devices to quickly notify each other about the need for restoration.! In a simple APS, a node that encounters loss of signal switches its own transmitter over to the protection path, by which it signals the failure to the peer node.! In shared protection schemes APS is required to coordinate access to the shared protection bandwidth.

Rerouting of Traffic! Path switching: the connection is rerouted end-to-end along an alternate path.! Span switching: the connection is rerouted on a spare link between the same nodes (where the failure happened).! Ring switching: the connection is rerouted along a ring between the adjacent nodes that encountered the failure.

Protection in SONET

SONET! A major goal of SONET was to have fast restoration times.! The SONET layer includes the path and line sublayers, and protection schemes are used at both. The path layer is terminated end-to-end, while the line layer is terminated at every TM, ADM, or OXC, but not at every regenerator (section).! A path-layer scheme operates on individual connections (e.g., on an individual STS-1 on an OC-12 ring).! A line-layer scheme, on the other hand, switches the entire OC-12 traffic.

1+1 Protection Between Nodes! 1+1 protection means that there is a dedicated protection fiber and traffic is transmitted over both links simultaneously (e.g., two fiber-pairs between nodes).! The destination simply selects one of the signals (e.g., the strongest) and switches over to the other in case this signal fails, which gives very quick restoration.! The two endpoints may use the links on different fiber-pairs, which is not a problem.

1:1 Protection Between Nodes! Similar to 1+1, but traffic is transmitted over only one of the fiber-pairs (the working fiber).! Both nodes have to use the same fiber-pair at all times, for which APS is needed; this makes it slower than 1+1 protection switching.! The redundant protection bandwidth can be used to transfer best-effort traffic as long as protection has priority over it (which is seldom used today).

1:N Protection Between Nodes! Generalized version of 1:1 protection, where N working fibers share a single protection fiber.! This scheme can handle the failure of a single fiber, but APS has to be used to avoid using the protection fiber simultaneously for more than one failure.
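
The arbitration that APS has to perform in a 1:N scheme can be sketched as a small state machine. This is a toy model, not the SONET APS protocol: the class name `OneForNProtection`, the first-come-first-served policy, and the revertive release are assumptions made for illustration (real APS uses request priorities).

```python
from typing import Optional, Set


class OneForNProtection:
    """Toy 1:N arbiter: N working fibers share one protection fiber."""

    def __init__(self, n_working: int) -> None:
        if n_working < 1:
            raise ValueError("need at least one working fiber")
        self.n_working = n_working
        self.failed: Set[int] = set()          # working fibers currently down
        self.protected: Optional[int] = None   # fiber using the protection fiber

    def fail(self, fiber: int) -> bool:
        """Report a failure; return True if the fiber's traffic is restored."""
        self._check(fiber)
        self.failed.add(fiber)
        if self.protected is None:
            self.protected = fiber             # protection fiber was free
            return True
        return self.protected == fiber         # otherwise it is already taken

    def repair(self, fiber: int) -> None:
        """Repair a fiber; traffic reverts and the protection fiber is freed."""
        self._check(fiber)
        self.failed.discard(fiber)
        if self.protected == fiber:
            self.protected = None

    def _check(self, fiber: int) -> None:
        if not 0 <= fiber < self.n_working:
            raise ValueError("unknown working fiber")


aps = OneForNProtection(n_working=4)
first = aps.fail(2)    # True: fiber 2 is switched to the protection fiber
second = aps.fail(0)   # False: a second simultaneous failure is not protected
```

The second failure stays unprotected until fiber 2 is repaired, which is exactly the single-failure limit of the 1:N scheme.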

Self Healing SONET Rings

SONET Rings! Rings are the simplest 2-connected structures (and topologically efficient too).! Today most legacy infrastructures are SONET self-healing rings.! Two main approaches to SONET rings are: 1. Unidirectional rings (traffic is carried in one direction only) 2. Bidirectional rings (traffic is carried in both directions)

SONET Self-healing Rings! 1. Unidirectional path-switched rings (UPSR) have a single fiber pair between adjacent ADMs. 2. Bidirectional line-switched rings with four fibers (BLSR/4) have two fiber pairs between adjacent nodes. 3. Bidirectional line-switched rings with two fibers (BLSR/2) have a single fiber pair between adjacent nodes.

Unidirectional Path-switched Rings (UPSR)! One fiber is used as the working fiber and the other as the protection fiber. Similar to the linear 1+1 scheme.! The working fiber's and the protection fiber's directions are different (e.g., one is clockwise while the other is counterclockwise).! Switch-over is done on a per-connection basis, thus at the path layer (unlike with 1+1 protection).

Unidirectional Path-switched Rings (UPSR)! Can handle link, node, transmitter or receiver failures easily.! No signaling protocol is needed.! The redundant bandwidth is equal to the working bandwidth and cannot be reused for other connections.! UPSR cannot spatially reuse bandwidth.! UPSRs are popular in lower-speed access networks (in a hubbed situation there is no need for spatial reuse anyway).! Typical UPSR speeds are below OC-24.! Ring length is limited by the restoration times (and by the fact that the propagation delays in the two directions differ).

Bidirectional Line-switched Rings (BLSR)! More sophisticated than UPSRs.! They operate at the line layer.! In BLSR/4 two fibers are used as working and two fibers are used as protection.! Working traffic can be carried in both directions (unlike UPSR), usually with shortest-path routing (but not always).! BLSRs support up to 16 nodes (limited by the 4-bit node identifier). Maximum length is 1200 km (corresponding to 6 ms). Longer rings are permitted if the 60 ms requirement is relaxed (submarine cables).
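
The two numeric limits above can be checked with a short calculation. The 1200 km and 6 ms figures together imply a propagation delay of about 5 µs per km of fiber, which is the value assumed below; the function names are chosen for illustration.

```python
FIBER_DELAY_US_PER_KM = 5.0  # assumed: roughly 5 microseconds per km of fiber


def ring_propagation_delay_ms(ring_km: float) -> float:
    """One-way propagation delay around a ring of the given length."""
    if ring_km < 0:
        raise ValueError("ring length must be non-negative")
    return ring_km * FIBER_DELAY_US_PER_KM / 1000.0


def max_nodes(id_bits: int) -> int:
    """Number of distinct node identifiers that fit in id_bits bits."""
    if id_bits < 0:
        raise ValueError("id_bits must be non-negative")
    return 2 ** id_bits


delay_1200_km = ring_propagation_delay_ms(1200.0)  # 6 ms
nodes_4_bits = max_nodes(4)                        # 16 nodes
```

Propagation alone thus consumes a tenth of the 60 ms restoration budget on a maximum-length ring, before any signaling or switching time is counted.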

Bidirectional Line-switched Rings (BLSR/4)! BLSR/4 can do two types of protection: span switching and ring switching.! With span switching, traffic on a defective working fiber between adjacent nodes is rerouted onto the protection fiber between those two nodes.! In case of a regular fiber cut, ring switching is used, in which traffic is rerouted around the ring, involving the whole length of the protection fiber.
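
Ring switching can be illustrated by computing the route a connection takes after a span fails. The sketch below assumes nodes numbered clockwise, a working path routed clockwise, and loopback at the two nodes adjacent to the failed span; the function names and the numbering are illustrative, not part of any standard.

```python
from typing import List


def clockwise_path(n: int, src: int, dst: int) -> List[int]:
    """Nodes visited going clockwise from src to dst on an n-node ring."""
    path = [src]
    while path[-1] != dst:
        path.append((path[-1] + 1) % n)
    return path


def ring_switched_path(n: int, src: int, dst: int, failed_span_start: int) -> List[int]:
    """Route after ring switching around the failed span (k, k+1 mod n).

    The node before the cut loops traffic back the long way around the
    ring (on protection capacity) to the node after the cut, where it
    rejoins the original route.
    """
    k = failed_span_start % n
    k_next = (k + 1) % n
    working = clockwise_path(n, src, dst)
    if (k, k_next) not in zip(working, working[1:]):
        return working                      # failure does not affect this path
    idx = working.index(k)
    detour = [k]
    while detour[-1] != k_next:
        detour.append((detour[-1] - 1) % n)  # counterclockwise, away from the cut
    return working[:idx] + detour + working[idx + 2:]


# 6-node ring, connection 0 -> 3, span between nodes 1 and 2 fails.
restored = ring_switched_path(6, 0, 3, 1)
```

The restored route passes back through node 0 and even through the destination node 3 before looping back at node 2, which is why ring-switched paths can be much longer than working paths.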

Bidirectional Line-switched Rings (BLSR/2)! In a BLSR/2, the protection fibers are embedded into the working fibers by sharing the bandwidth of a fiber, thus half the capacity of each fiber is reserved for protection purposes.! Span switching is not possible, yet ring switching works similarly to that of BLSR/4.! The protection bandwidth in both BLSR cases can be used to carry low-priority traffic.

Bidirectional Line-switched Rings (BLSRs)! BLSRs provide spatial reuse, thus they are more efficient than UPSRs (since protection bandwidth is shared among all connections).! BLSRs are thus a good choice for long-haul networks. BLSRs can have speeds of up to OC-192. Metro carriers prefer BLSR/2 while long-haul carriers prefer BLSR/4 (it can handle more failures, e.g., a transmitter failure on each span).! Compared to UPSRs, BLSRs require much more complex signaling (BLSR/4s even more so), but the SONET frame structure provides bits for these purposes.

Node Failures in BLSRs! Node failures are less likely than link failures (due to redundant equipment).! BLSR restoration is more complicated in the event of a node failure: if each of the nodes adjacent to the failed node assumes that a link has been broken and performs ring switching, connections can end up misconnected.! To avoid such misconnections, nodes first have to determine what the exact situation is, and then remove connections to/from the failed node. This requires extensive signaling and is called squelching.! Each node maintains squelch tables to know which connections need to be squelched in the event of a node failure (this slows down restoration).

Ring Interconnection! Bigger (mesh) networks can be established by the interconnection of rings (or mesh networks can be seen as interconnected rings).! The simplest way to connect rings is to connect the add/drop ports of two ADMs in two different rings (with additional grooming capabilities).! Yet if one of these ADMs fails, the two rings become separated.

Dual Homing! Two adjacent ADMs in each ring are used as hubs to 2-connect the two rings. Thus one of these hubs can fail and the connection would still be available.! Usually the hubs in one ring use a drop-and-continue approach (where data is not removed from the link).! Dual homing is extensively used in long-haul networks to interconnect BLSRs, and also to interconnect metro BLSRs with access UPSRs.

Protection for IP Networks

IP Networks! Traditionally IP networks provided best-effort services. This is changing (IntServ, DiffServ), and one of the possible vehicles for QoS provisioning can be MPLS.! In IP networks, the intradomain IP routing protocols (e.g., OSPF) provide restoration by updating their routing tables after encountering failures. Usually it takes several tens of seconds to detect a failure and several minutes for the network and the routing tables to converge to a stable state.! Another option is to have the underlying layer provide information about failures, which would speed up the recovery process.

IP Networks Today! Adjacent routers exchange hello messages periodically to detect link failures.! Loss of a link is detected by counting the number of missing hello packets.! (A typical set-up is 10 s periods and 3 missing packets => 30-40 s just for detection.) Core routers typically detect failures in the 10 s range.! There is a lower limit on the hello interval (1 s), but that is still too much time spent on failure detection.
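
The 30-40 s detection figure can be reproduced with a simple timer model. The sketch assumes the neighbor is declared down when a dead timer, restarted at every received hello, expires; with the common OSPF defaults (hello interval 10 s, dead interval 40 s) three hellos go missing before the timer fires. The function name is chosen for illustration.

```python
from typing import Tuple


def detection_delay_bounds(hello_interval_s: float, dead_interval_s: float) -> Tuple[float, float]:
    """Bounds on the time between a link failure and its detection.

    The failure happens somewhere within one hello interval after the
    last received hello, and the dead timer runs from that last hello.
    Returns (best case, worst case) in seconds.
    """
    if hello_interval_s <= 0 or dead_interval_s < hello_interval_s:
        raise ValueError("need 0 < hello_interval_s <= dead_interval_s")
    best = dead_interval_s - hello_interval_s  # failure just before the next hello
    worst = dead_interval_s                    # failure right after a hello
    return best, worst


typical = detection_delay_bounds(10.0, 40.0)  # (30.0, 40.0)
fastest = detection_delay_bounds(1.0, 4.0)    # (3.0, 4.0) with 1 s hellos
```

Even with 1 s hellos, detection alone takes seconds, far more than the 60 ms SONET restoration target.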

The Alternative: Protection at the Optical Layer

WDM! WDM can be used to serve clients such as SONET, IP, or ATM.! Client layers have their own protection mechanisms and do not have to rely on the optical layer's resilience.! Yet resilience in the optical layer can be extremely helpful (if not instrumental) in reducing the cost/complexity of client layers, as seen next.

Reason 1 for Optical Layer Protection! Not all clients (e.g., ATM, ESCON, Fibre Channel, IP) have protection methods as sophisticated as those of SONET.! Yet, since the industry seems to converge onto a uniform packet-switched approach (likely to be IP), all the above clients are expected to need the same kind of protection that SONET offers.! This can be done either in each individual client or down at the OL (which is more cost-effective).

Reason 2 for Optical Layer Protection! Cost savings of OL protection compared to client-layer protection.! Optical layer protection is more efficient since it can share protection bandwidth among different clients (which the clients themselves cannot do).! The cost of a router port is significantly higher than the cost of an optical layer port, thus it is better to leave protection switching to the optical layer (fewer client ports are required).

Reason 3 for Optical Layer Protection! A WDM link carries more than one wavelength, and a cable carries more than one WDM link.! In case of a cut, each stream has to be restored by the client layer, resulting in an extreme amount of alarm management signalling.! If the OL takes care of this, fewer entities have to be rerouted => faster and simpler, with less overhead.

Reason 4 for Optical Layer Protection! OL protection introduces an additional degree of resilience on top of the client layer's protection.! Repair of a failed link can take a long time, during which additional protection may be available due to the combination of the two protection schemes.

Reason 5 for Optical Layer Protection! SONET protection is mainly based on rings (UPSR, BLSRs), which require reserved protection bandwidth.! However, in the backbone a mesh topology is more likely, thus distributed shared protection can further save on bandwidth requirements (and thus enhance efficiency).! Protection schemes for mesh topologies are currently being developed (hot research area).

Drawback 1 of Optical Layer Protection Only! OL cannot protect against failures in the client equipment (e.g., transmitters, receivers).

Drawback 2 of Optical Layer Protection Only! Decision on when to invoke OL protection is non-trivial.! No BER measurements are available, only the presence or absence of power.! Even power level measurements have to be digital (loss of light), since the OL has no way of knowing what the values are intended to be.

Drawback 3 of Optical Layer Protection Only! In the OL only lightpaths can be protected.! If protection of individual flows in the lightpaths is required, then the client layer has to do that.

Drawback 4 of Optical Layer Protection Only! Protection paths can be significantly longer than working paths, which requires optical signal level engineering.

Drawback 5 of Optical Layer Protection Only! Interworking of protection schemes is a serious issue: the protection mechanisms of different layers must be coordinated so that they do not counteract each other.

Protected Service Classes! Different protection schemes can be used for different QoS requirements of streams.! There are no standards defined but there is some consensus on the different classes of protected services.

Protected Service Classes! Platinum: very high availability, very fast restoration (e.g., dedicated 1+1 SONET-like protection).! Gold: high availability, fast restoration (n × 100 ms, e.g., shared mesh protection).! Silver: good availability, best-effort restoration (e.g., the connection is reattempted in case of failure).! Bronze: unprotected lightpaths; in the case of failure, the connection is lost.! Lead: no protection or guaranteed bandwidth (e.g., using protection bandwidth to transmit best-effort data).

Protected Service Classes! The mapping of applications to the different service classes is not clear yet.! Today most services are platinum, since the infrastructure is still SONET based.! The trend is towards asynchronous data communications, thus it is likely that all of the previous services will be defined.

Optical Layer Protection

The Optical Layer (review)! The optical layer consists of:! Optical channel layer, OCh (analogous to the SONET path layer)! Optical multiplex section layer, OMS (analogous to the SONET line layer)! Optical transmission section layer, OTS (analogous to the SONET section layer)! OL protection can be done in either OCh or OMS. OCh restores on a lightpath basis end-to-end, while OMS restores on a fiber basis locally.

OL Protection Alternatives! There is a significant cost difference between OCh and OMS protection.! OMS protection in general requires less equipment, since it operates on a set of lightpaths, whereas OCh protection handles the lightpaths individually.! The cost of OMS protection is independent of the number of channels, while that of OCh protection depends heavily on it.

OL Equipment! In the OL, OLTs and OADMs can both provide OCh and OMS protection.! OXCs are likely to perform protection at the OCh level (or even higher if it is a grooming OXC).! Meshed backbones are likely to use OXCs at the OCh level.! Metropolitan networks with a large number of OLTs and OADMs are likely to rely on OCh and OMS protection of the OL.

1+1 OMS Protection! Simplest OL protection scheme.! The WDM signal is split onto two diversely routed physical links, and the receiver selects the better of the two signals using either a 2×1 switch or two EDFAs (one off and the other on) and a combiner.! Splitting results in a 3 dB loss that has to be accounted for in the power budget (along with other insertion losses).
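
The 3 dB figure is the ideal loss of splitting the power equally between two outputs, 10·log10(2). A minimal power-budget sketch follows; the function names and the example loss values are illustrative assumptions, not figures from the slides.

```python
import math


def splitter_loss_db(num_outputs: int = 2) -> float:
    """Ideal loss of an equal-ratio power splitter with num_outputs ports."""
    if num_outputs < 1:
        raise ValueError("num_outputs must be >= 1")
    return 10.0 * math.log10(num_outputs)


def received_power_dbm(tx_power_dbm: float, path_loss_db: float,
                       excess_loss_db: float = 0.0) -> float:
    """Received power on one branch of a 1+1 protected link.

    excess_loss_db covers insertion losses beyond the ideal split
    (connectors, the selector switch, and so on).
    """
    return tx_power_dbm - splitter_loss_db(2) - excess_loss_db - path_loss_db


split = splitter_loss_db(2)  # about 3.01 dB
# Example: 0 dBm launch power, 20 dB path loss, 0.5 dB excess loss.
rx = received_power_dbm(0.0, 20.0, 0.5)  # about -23.51 dBm
```

Since the protection route is usually longer than the working route, its path loss should be used when checking the budget against receiver sensitivity.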

1:1 (1:N) OMS Protection! Similar to SONET 1:1 (1:N) protection! Typically has a switch at the transmitter (less signal power loss).! APS protocol is needed for protection coordination between endpoints.

OMS Dedicated Protection Ring (OMS-DPRing)! Similar to a SONET UPSR, or more precisely to a unidirectional line-switched ring (ULSR).! In a likely implementation it is just two overlaid bus networks (with the ring broken at one node). If a link fails, that break point is moved to a node adjacent to the failed link.

OMS Shared Protection Ring (OMS-SPRing)! Similar to BLSR/4.! Two fibers have corresponding WDM equipment while the other two do not (cost and loss saving).! It can do either span or ring switching of fibers (OMSs).! Protection fibers may have to include amplifiers to compensate for longer paths.! A two-fiber version is theoretically possible (half the wavelengths for protection and the other half for working, and vice versa on the other fiber). Yet it requires groups of wavelengths to be multiplexed (which is not at the OMS level).

1:N Transponder Protection! None of the aforementioned schemes protects against transponder failures.! One transponder can be set aside to protect N transponders. Yet since fixed-wavelength transponders are used most of the time, in case of failure a new lightpath has to be set up at the new wavelength (the alternative is to use tunable spare transponders).

1+1 OCh Dedicated Protection! Two lightpaths on disjoint routes are set up for each client.! The signal is split at the input, and the destination selects the better signal.! Can be deployed in rings: OCh-DPRing.! The protection bandwidth is not shared among different clients, yet the scheme is simple to implement.

OCh-Shared Protection Ring (OCh-SPRing)! Working lightpaths are set up along the shortest path on the ring. If a lightpath fails, it can be restored either by span or ring switching.! Non-overlapping lightpaths can use the same protection lightpath in the ring (spatial reuse).! In case of node failures squelching is needed (signaling overhead).

OCh Mesh Protection! Backbone networks are more densely connected and thus are mesh networks.! Mesh protection can be more bandwidth efficient than virtual ring topologies (wavelengths can be shared among rings). In general, the denser the network, the higher the gain from mesh protection (20-60%).

Mesh Protection Problems! Standardization is needed (just like in SONET).! The network has to be treated as one entity (unlike with rings, where partitioning is possible), since partitioning the network reduces bandwidth efficiency. Human errors will have a greater impact on the network.! Management is significantly more complex than that of rings. New management tools are needed (for working and protection path calculations), but since the network does not have to be partitioned into rings, the likelihood of blocking in the shared case is reduced.

Mesh Protection Problems! New (network-wide) signaling mechanisms are needed to provide rapid restoration.! Protection routing tables (topology and protection paths) will have to be maintained at the nodes. If the network changes, all the tables have to be updated and must always be consistent (unlike IP routing tables).

Online vs. Offline Protection! If protection routes are computed while setting up working paths then protection is offline (pro-active).! If protection routes are calculated after a failure then protection is online (reactive).

Offline Protection! The computation can be based on disjoint path routes, but can also involve several protection routes for different failures (this has an impact on the signaling and memory overheads).! If routes are computed centrally, the computation is relatively easy, but there is a single point of failure and a bottleneck can form (both communication- and processing-wise).! If routes are computed in a distributed fashion, the failure notification has to be flooded (or sent one by one), in which case all the routers involved have to reconfigure themselves.
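One simple way to precompute a disjoint working/protection pair is a two-step heuristic: find a shortest working route, remove its links, and search again. The sketch below assumes hop count as the metric and an undirected topology given as an adjacency dictionary; the function names and the example topology are hypothetical. This heuristic can fail on some topologies even when a disjoint pair exists (Suurballe's algorithm avoids that), so it is meant only as an illustration.

```python
from collections import deque

def bfs_path(adj, src, dst, banned=frozenset()):
    """Fewest-hop path from src to dst that avoids the banned links."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while u is not None:      # walk predecessors back to src
                path.append(u)
                u = prev[u]
            return path[::-1]
        for v in adj.get(u, ()):
            if v not in prev and frozenset((u, v)) not in banned:
                prev[v] = u
                queue.append(v)
    return None

def working_and_protection(adj, src, dst):
    """Two-step heuristic: shortest working path, then a link-disjoint backup."""
    working = bfs_path(adj, src, dst)
    if working is None:
        return None, None
    used = {frozenset(pair) for pair in zip(working, working[1:])}
    protection = bfs_path(adj, src, dst, banned=used)
    return working, protection

# Hypothetical five-node topology.
adj = {
    "A": ["B", "C"], "B": ["A", "D"], "C": ["A", "E"],
    "D": ["B", "E"], "E": ["C", "D"],
}
print(working_and_protection(adj, "A", "D"))
```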

Online Protection! Will there be enough idle bandwidth to restore the traffic? It is possible that not all lightpaths will be restored.! In case of a distributed calculation, lightpaths will have to contend for the available bandwidth, which requires additional signaling overhead (and is slower too).! A centralized calculation would handle contention more easily, but the communication and processing overheads would be too large.

Mesh Protection Requirements! Route computation (Dijkstra)! Topology and table maintenance (OSPF)! Signaling for protection routes (RSVP, PNNI, SS7)! Most of these have already been implemented in IP or ATM networks (see the examples in brackets), but are too slow.
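For the route-computation requirement, a minimal sketch of Dijkstra's shortest-path algorithm is given below. The graph representation (a dictionary of neighbor-to-cost dictionaries), the link costs and the node names are illustrative assumptions.

```python
import heapq

def dijkstra(graph, src, dst):
    """Return (path, cost) of a least-cost route, or (None, inf) if unreachable.

    graph maps each node to a dict {neighbor: non-negative link cost}.
    """
    dist = {src: 0}
    prev = {}
    heap = [(0, src)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue                  # stale heap entry
        done.add(u)
        if u == dst:
            break
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None, float("inf")
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1], dist[dst]

# Hypothetical four-node mesh with assumed link costs.
graph = {
    "A": {"B": 4, "C": 1},
    "B": {"A": 4, "C": 2, "D": 1},
    "C": {"A": 1, "B": 2, "D": 5},
    "D": {"B": 1, "C": 5},
}
print(dijkstra(graph, "A", "D"))
```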

Interworking between Layers! Since protection can be done at several layers (and their corresponding sub-layers), overall protection must be coordinated to avoid pathological situations of independent but contending protection initiations.! If the protection schemes of the different layers have well-defined behaviors, then the above contention can be avoided.

Avoiding Contention: A Simple Example! If the following conditions are satisfied, then the network will always restore traffic flows: 1. There exists a protection path for each involved protection layer. 2. The server layer's protection does not depend on a failure detection from the client layer. 3. The client layer is revertive (it will always try to switch back to the original working path).

Interworking?! Although interworking would be desirable, it is not always possible (different entities trigger restoration for different layers).! Yet, if priority between protection schemes could be given, then that would implicitly ensure interworking (e.g., OL restoration is so fast that the IP layer does not have time to detect a failure).! Another approach is to delay restoration at the higher layers, waiting for the lower layer to act, but that introduces additional delay (why not have the fastest layer react first?).
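The two approaches above (a lower layer that is simply faster, or a hold-off delay at the higher layer) can be compared with a small timing sketch. It assumes the lower layer always attempts restoration and that the higher layer acts only if the failure is still present when its detection time plus hold-off delay has elapsed; the function name and all timing values are hypothetical.

```python
def layers_that_act(lower_restore_ms, upper_detect_ms, hold_off_ms=0.0):
    """Which layers initiate restoration after a single failure.

    lower_restore_ms: time for the lower (e.g. optical) layer to restore traffic
    upper_detect_ms:  time for the higher (e.g. IP) layer to detect the failure
    hold_off_ms:      extra delay the higher layer waits before reacting
    """
    upper_trigger_ms = upper_detect_ms + hold_off_ms
    acted = ["lower"]
    if lower_restore_ms >= upper_trigger_ms:
        acted.append("upper")        # both layers react: possible contention
    return acted

# Assumed numbers only, for illustration.
print(layers_that_act(50, 1000))      # fast lower layer, slow detection above
print(layers_that_act(50, 30))        # higher layer detects first: contention
print(layers_that_act(50, 30, 100))   # hold-off delay avoids the contention
```

The hold-off case avoids contention but adds the hold-off time to the outage whenever the lower layer fails to restore, which is the extra delay mentioned above.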