Daitan White Paper
WebRTC Lessons Learned
SUCCESSFULLY SUPPORTING WEBRTC IN BUSINESS APPLICATIONS
Highly Reliable Software Development Services
http://www.daitangroup.com/webrtc
1 ABSTRACT: SUPPORTING WEBRTC IN BUSINESS APPLICATIONS

The intended audience for this paper is providers of business solutions that support real-time communications (conferencing, webcasting, customer relationship management, customer service, call center systems, project management, collaboration, etc.). This paper assumes that the reader is familiar with real-time communication technologies and WebRTC. If you are looking for a WebRTC primer, please download the Daitan paper WebRTC and Universal Communications (available at http://daitangroup.com/webrtc).

For the first time, WebRTC provides a non-proprietary layer of functionality that developers can use without being encumbered by the limitations of one application or platform, and that end users can use without downloading and installing any proprietary component.

With a standard RTC client available everywhere, it becomes more difficult to require users to download and install a proprietary client in order to participate in a conference call or use a business application, so business solution providers are under increasing pressure to support WebRTC clients. Beyond that, the availability of WebRTC opens doors for new functionality, broadens the potential audience for the application, frees software vendors from royalties and license fees, and can be used to improve user experience.

Daitan Group has collaborated with several software vendors over the past two years and pioneered the implementation of WebRTC in business applications. In this paper, we share some of the experience and insight we have gathered in the process.

Overall, we found that, because WebRTC is primarily designed as a way to integrate real-time communications into websites (and not as a general communication or conferencing tool), integrating it into business solutions presents more challenges than one would expect.
Interestingly, most of the proven WebRTC use cases we have seen so far are geared towards business applications.
2 THE CHALLENGES (ISN'T WEBRTC SUPPOSED TO BE EASY?)

The goal of WebRTC is to make real-time communications capabilities easy to implement and use. But WebRTC is architected and designed primarily for decentralized, flat topologies with minimal structured signaling, where media streams travel peer-to-peer without being processed by a server.

Many business applications have different requirements. They are usually deployed in a centralized topology, with an application server, so that they can support multiple types of endpoint clients, use standardized business logic, and integrate with other business systems. Those differences in design assumptions create a series of challenges when attempting to support WebRTC clients in a traditional business application.

When implementing WebRTC, you must consider:

Interoperability
While consumers can switch tools overnight, business systems need to support a gradual transition and the co-existence of different sets of technologies. There is a large installed base of SIP phones, soft-phones, video conferencing equipment, etc., that will remain in use for the foreseeable future. WebRTC is just one more technology in the mix; it is not intended to replace some of the existing components. WebRTC standards are still evolving, and there are still some uncertainties related to video codecs and critical mass adoption.

Performance & Scalability
WebRTC is designed to minimize the need for user intervention, so client implementations today seek to autonomously adjust the parameters of the call to adapt to available bandwidth on a best-effort basis. In a business application, performance requirements are typically more stringent, so the application also needs to monitor the connections and take corrective action, or at least warn the administrator, when call quality degrades. Another key question is scalability. Is the mesh topology adopted by consumer-oriented tools the right answer? Or is there a requirement to support a large number of users, which dictates an MCU-based solution that can scale further?

Mobile Devices
Apple and Microsoft platforms do not support WebRTC today. Codec standards and hardware acceleration are important to minimize CPU and battery use on mobile devices. Hardware-based implementations of VP8 encoding/decoding are becoming available in most mobile SoCs and chipsets.

We will examine these challenges in the next sections of this paper.
3 WEBRTC INTEROPERABILITY

The figure below shows a simplified architectural view of WebRTC. Application developers will mostly interact with the Web API. The latest documentation for the WebRTC Web API (published by the W3C WebRTC Working Group) can be found here: http://dev.w3.org/2011/webrtc/editor/webrtc.html

Figure 1 - WebRTC General Architecture (source: webrtc.org)

The stack above runs on the client device. For client devices to communicate with each other through WebRTC, and with other communication elements in the business, several other network elements are required. An application server or web server can host an application that allows users to register or find each other in order to request a connection and negotiate media parameters. STUN and TURN servers allow the peer-to-peer media connections between the client browsers to traverse Network Address Translation (NAT) gateways and firewalls.
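As a minimal sketch of how a client is pointed at the STUN and TURN servers described above, the configuration handed to the browser's RTCPeerConnection constructor might look like the following. The host names, ports and credentials are placeholders (assumptions, not real servers), and the IceServer and PeerConfig interfaces are declared locally to mirror the shape of the standard configuration dictionary, so that the snippet can be checked outside a browser.

```typescript
// Local mirror of the standard ICE server dictionary (urls, username, credential).
interface IceServer {
  urls: string | string[];
  username?: string;
  credential?: string;
}

interface PeerConfig {
  iceServers: IceServer[];
}

// Hypothetical helper: the hosts below are placeholders for your own deployment.
function buildPeerConfig(turnUser: string, turnPassword: string): PeerConfig {
  return {
    iceServers: [
      // STUN lets a client discover its public address when it sits behind a NAT.
      { urls: "stun:stun.example.com:3478" },
      // TURN relays media when no direct peer-to-peer path can be established.
      {
        urls: [
          "turn:turn.example.com:3478?transport=udp",
          "turn:turn.example.com:3478?transport=tcp",
        ],
        username: turnUser,
        credential: turnPassword,
      },
    ],
  };
}

// In a browser, the result would be used as:
//   const pc = new RTCPeerConnection(buildPeerConfig("alice", "secret"));
```

Listing the TURN server with both UDP and TCP transports is a common precaution for restrictive firewalls; whether it is needed depends on the networks your users sit behind.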
If your application requires integration with traditional telephony or uses Session Initiation Protocol (SIP) as the call initiation and control protocol, you will need to deploy a SIP gateway that can convert SIP over WebSocket to SIP over UDP.

If all endpoints use the same set of audio/video codecs, media streams can flow directly between them. The most common scenario in business applications, however, is that existing endpoints use different codecs. In that case, you will also need to deploy RTP media gateways that perform transcoding.

The diagram below details the infrastructure needed to run WebRTC as part of a typical business application.

Figure 2 - WebRTC Network Infrastructure

3.1 INTEROPERABILITY - MEDIA

The WebRTC standard will eventually define the mandatory codecs to be supported by compliant implementations, but that is still a topic under debate within the standardization community. The two main contenders are VP8/VP9 and H.264/H.265 (we will refer to them as VP8 and H.264, which are the most common versions deployed today).

Google has acquired the technology behind the VP8 codec, released its implementation in open source form and promised never to demand royalty payments. H.264 is a long-standing standard that is available everywhere (including hardware-based acceleration on many platforms). The challenge with H.264 is that it is not royalty-free.
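The rule stated earlier in this section, that media can flow directly only when both endpoints share a codec and otherwise needs a transcoding gateway, can be sketched as a comparison of the codecs each side announces in its SDP. This is an illustrative simplification: the function names are hypothetical, only the a=rtpmap lines are inspected, and real negotiation also depends on codec parameters (such as H.264 profiles) that are ignored here.

```typescript
// Collect the codec names (upper-cased, de-duplicated) announced in a=rtpmap lines.
function codecsInSdp(sdp: string): string[] {
  const codecs: string[] = [];
  for (const line of sdp.split(/\r?\n/)) {
    // Example line: "a=rtpmap:100 VP8/90000"
    const match = /^a=rtpmap:\d+ ([^\/]+)\//.exec(line);
    const name = match ? match[1] : undefined;
    if (name && codecs.indexOf(name.toUpperCase()) === -1) {
      codecs.push(name.toUpperCase());
    }
  }
  return codecs;
}

// Codecs announced by both endpoints.
function sharedCodecs(sdpA: string, sdpB: string): string[] {
  const codecsB = codecsInSdp(sdpB);
  return codecsInSdp(sdpA).filter((codec) => codecsB.indexOf(codec) !== -1);
}

// True when the endpoints have none of the given video codecs in common,
// i.e. a media gateway would have to transcode video between them.
function needsVideoTranscoding(
  sdpA: string,
  sdpB: string,
  videoCodecs: string[] = ["VP8", "H264"]
): boolean {
  const shared = sharedCodecs(sdpA, sdpB);
  return !videoCodecs.some((codec) => shared.indexOf(codec) !== -1);
}

// Abbreviated, made-up SDP fragments for a browser and a room video system.
const browserSdp = [
  "m=audio 9 RTP/SAVPF 111 0",
  "a=rtpmap:111 opus/48000/2",
  "a=rtpmap:0 PCMU/8000",
  "m=video 9 RTP/SAVPF 100",
  "a=rtpmap:100 VP8/90000",
].join("\r\n");

const roomSystemSdp = [
  "m=audio 5004 RTP/AVP 0",
  "a=rtpmap:0 PCMU/8000",
  "m=video 5006 RTP/AVP 96",
  "a=rtpmap:96 H264/90000",
].join("\r\n");
```

With these two fragments the endpoints share an audio codec (PCMU) but no video codec, so audio could flow directly while video would need to pass through a transcoding gateway.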
For most stand-alone applications, the choice of video codec is not very relevant, but for business solutions that require integration with existing systems, the codec choice affects the need for transcoding. Most WebRTC implementations today use the VP8 video codec, while most of the installed base of video systems uses H.264.

Assuming the most likely scenario above, where more than one codec (and different versions of each codec) must be supported, a business application or additional network element will have to transcode media. Media transcoding requires significant computing power, which can limit the scalability of the solution. Traditionally, media servers use specific DSP-based hardware to transcode, but software-based transcoding on generic high-performance CPUs is now becoming more common (and is a requirement for flexible deployments using a standard cloud infrastructure). Estimating the CPU resources needed for transcoding is crucial to determining the viability of a project that deploys WebRTC in the cloud.

WebRTC relies heavily on RTCP for controlling the media stream (e.g., for requesting an image refresh), while many existing applications and devices do that through the application layer (e.g., through SIP messages), so the transcoding or gateway element needs to handle this conversion too.

3.2 INTEROPERABILITY - CALL SETUP AND SIGNALING

WebRTC was originally defined as a technology to facilitate the integration of communications into web applications. Because it was not intended to serve as a tool for universal communication, it does not include a signaling layer for establishing and controlling calls.

SIP is broadly used in telephony for call setup. It is the natural candidate for business applications that support calling between clients, and a requirement if integration with business communications systems is needed. SIP was designed to run over a separate control network.
In web applications, it is more convenient to transport all the traffic (including call setup and control) over web protocols, so SIP is typically encapsulated in WebSocket connections. If the call manager used by the business application does not support SIP over WebSocket, a WebRTC SIP gateway is required.

SIP stacks included in open-source WebRTC clients such as JsSIP and sipml5 support the establishment of basic sessions. If additional features such as call transfer or conferencing need to be supported, these stacks must be adjusted. Some development effort should be expected, depending on how complete the SIP implementation needs to be.

If your application provides only simple calling features and does not require integration with legacy telephony, a possible alternative to SIP is to implement a proprietary session establishment protocol. Some WebRTC services and platforms on the market today have taken this approach and created simple signaling through JSON or XML. In practice, however, we have found that typical business use cases demand more complete call control, which leads to reinventing the wheel to a large extent and still leaves vendors with the inherent problem of incompatibility with SIP-based business systems.

3.3 INTEROPERABILITY - BROWSERS

As of this writing, Google Chrome, Mozilla Firefox and Opera have built-in support for WebRTC. That represents more than half of the browser installed base, but does not include Apple Safari and Microsoft Internet Explorer. Apple has not stated a position but, given the leveling effect of WebRTC, it is unlikely to join unless it is forced to do so. Microsoft has made moves (and introduced a similar but different standard called CU-RTC-Web), but it has a clear conflict of interest now that it owns Skype. Reaching a critical mass of applications without early support from these two important platform/browser players is a major challenge for WebRTC, as is winning support from Safari and Internet Explorer over time.

From the user perspective, the implementations of WebRTC across browsers already interoperate without problems (for example, a user on Chrome can talk to a user on Firefox).

For the developer, however, there are differences that need to be considered and dealt with. Since the WebRTC standards are still evolving, the API implementation is not 100% uniform among the browsers (e.g., vendor prefixes are added to API function names). Although this does not pose a major problem for web developers who delve into WebRTC development, we have found that tools and modules such as adapter.js provide the necessary abstraction of these differences to simplify the development effort.

In addition, the completeness of the WebRTC engine implementation can differ, so features such as screen capture are (as of this writing) not available in all browsers. Application developers need to consider how they will deal with these differences in order to provide a clean experience for their users.
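To illustrate the kind of difference that a shim such as adapter.js hides, the sketch below looks up the peer connection constructor under its standard name and under the vendor-prefixed names that Chrome and Firefox have used. It is not adapter.js code: the function and interface names are hypothetical, and the global object is passed in as a parameter (rather than read from window) so that the lookup can be exercised outside a browser.

```typescript
// The subset of browser globals this sketch inspects. The constructors are
// treated as opaque values because only their presence matters here.
interface BrowserGlobals {
  RTCPeerConnection?: unknown;
  webkitRTCPeerConnection?: unknown; // prefixed name used by Chrome and Opera
  mozRTCPeerConnection?: unknown;    // prefixed name used by Firefox
}

interface PeerConnectionLookup {
  ctor: unknown;
  source: "standard" | "webkit" | "moz";
}

// Return the first available constructor, preferring the unprefixed name,
// or null when the browser has no WebRTC support at all.
function findPeerConnection(globals: BrowserGlobals): PeerConnectionLookup | null {
  if (globals.RTCPeerConnection) {
    return { ctor: globals.RTCPeerConnection, source: "standard" };
  }
  if (globals.webkitRTCPeerConnection) {
    return { ctor: globals.webkitRTCPeerConnection, source: "webkit" };
  }
  if (globals.mozRTCPeerConnection) {
    return { ctor: globals.mozRTCPeerConnection, source: "moz" };
  }
  return null;
}

// In a browser this would be called as findPeerConnection(window as BrowserGlobals).
```

A null result is the application's cue to fall back to a plug-in, a native client, or a clear "browser not supported" message, which is one way to keep the user experience clean across browsers.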
4 WEBRTC PERFORMANCE & SCALABILITY

4.1 MONITORING MEDIA PERFORMANCE

WebRTC is designed to minimize the need for user intervention, including autonomously adjusting the parameters of the call to adapt to available bandwidth on a best-effort basis. In a business application, performance requirements are typically more stringent. The application should monitor the connections and adjust the media stream to optimize user experience, or at least notify the administrator when call quality degrades.

Media quality can be monitored by using the WebRTC getStats API, which provides information such as packet loss, available bandwidth and frame rate. Based on that, the WebRTC client can, for example:

- Renegotiate the media and limit the bandwidth by changing the Session Description Protocol (SDP) parameter for bandwidth
- Request another media stream with a lower resolution or different constraints and renegotiate the media with the peer

As in any media application, the performance of any in-line component of the system (such as a TURN relay server or media gateway) can affect the performance and quality of the connection and ultimately the user experience. We have found surprising differences in TURN server performance, which can lead to unexpected quality issues, so we recommend taking a high degree of care in selecting this element.

4.2 DO I NEED AN MCU?

The WebRTC design assumption is that media streams will travel directly peer-to-peer, without need for intermediation. For multi-user applications this implies a mesh topology, where each participant is a peer to all others and exchanges audio and video streams with every other participant. This reduces the need for network infrastructure and minimizes latency, but requires all endpoints to use the same codecs and have similar capabilities.

Because a mesh topology requires every node to handle multiple audio/video streams, the CPU and bandwidth requirements at each node grow with every party added: in a call with N parties, each node sends and receives N-1 streams, and the call as a whole carries N(N-1) streams. That is why pure WebRTC (or other consumer-oriented applications like Skype) cannot scale beyond a handful of parties.

One alternative approach that works in some applications is to elect one of the nodes to take on the task of mixing audio streams and composing a single video stream to be sent to every other client. But the CPU and bandwidth requirements at the central node increase in proportion to the number of parties in the call.
This introduces other complications, such as electing the node to do the mixing and dealing with the eventuality of that node failing.

Business applications often require a larger number of users as well as support for a heterogeneous set of endpoints (e.g., IP phones, soft-phones, room video conferencing systems) that use different codecs and have different capabilities (screen sizes, data, etc.). Many applications implement a Multipoint Control Unit (MCU) that centralizes the processing of all audio and video streams. The MCU, also referred to as a conference bridge, is a central gateway in a multipoint videoconferencing system. Historically, an MCU required specialized DSP-based hardware capable of processing audio/video in real time, but it is now possible to implement MCUs entirely in software running on standard hardware and in cloud environments.

The next diagram provides a summary and comparison of these three alternatives. Your use cases will dictate which option is right for you.
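To make the scaling difference between the mesh, star and MCU approaches discussed in this section concrete, the sketch below counts media streams per participant and at the central element. It rests on a simplifying assumption that is ours rather than a measurement: each participant's audio and video are counted together as one stream per direction, and a mixing node or MCU returns exactly one composed stream to each participant. The type and function names are hypothetical.

```typescript
type Topology = "mesh" | "star" | "mcu";

interface StreamLoad {
  sentPerPeer: number;             // streams each ordinary participant sends
  receivedPerPeer: number;         // streams each ordinary participant receives
  handledByCentralElement: number; // streams in and out of the mixing node or MCU
}

function streamLoad(topology: Topology, parties: number): StreamLoad {
  if (parties < 2) {
    throw new Error("a call needs at least two parties");
  }
  const others = parties - 1;
  if (topology === "mesh") {
    // Every peer exchanges a stream with every other peer; nothing is central.
    return { sentPerPeer: others, receivedPerPeer: others, handledByCentralElement: 0 };
  }
  if (topology === "star") {
    // The elected peer receives one stream from, and sends one mixed stream to,
    // each of the other participants.
    return { sentPerPeer: 1, receivedPerPeer: 1, handledByCentralElement: 2 * others };
  }
  // MCU: a separate server exchanges one stream in each direction with every party.
  return { sentPerPeer: 1, receivedPerPeer: 1, handledByCentralElement: 2 * parties };
}

// Total one-way streams in a full mesh: each of N peers sends to N-1 others.
function totalMeshStreams(parties: number): number {
  return parties * (parties - 1);
}
```

Under these assumptions, a 10-party mesh call has every client sending and receiving 9 streams (90 in total), whereas with an MCU every client handles one stream in each direction and the MCU handles 20.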
MESH
Every peer receives the stream from all other peers.
Pros: No need for an external server to process media. The peer can select which stream to view or hear and change the video's layout.
Cons: High CPU and bandwidth usage on all peers. Maximum number of users is limited by the client's hardware and bandwidth.

STAR
One peer is selected as the server and is responsible for mixing all media exchanged between all peers.
Pros: No need for an external server to process media.
Cons: Intelligence is on the client. High CPU and bandwidth usage on the server peer.

MCU
An extra network element transcodes and mixes the audio and video streams from all participants.
Pros: Low CPU and bandwidth usage on the peers. Maximum number of users is limited by the MCU.
Cons: Requires an extra network element (server) to process media. Layout is defined by the MCU.

5 WEBRTC MOBILE

On the mobile front, WebRTC is supported on Android by Chrome, Firefox, and Opera, but on Apple iOS and Microsoft Windows there is no native browser support for WebRTC. For mobile WebRTC applications on iOS and Windows devices, the alternative to native browser support is to use an SDK/library that provides the WebRTC engine functionality and an API for application development.

Another major challenge that is very real today is the CPU and battery consumption incurred on mobile endpoints to encode and decode VP8 video. While H.264 encoding and decoding has been supported in mobile chipsets for many years, there was no such support for VP8 until recently. Encoding and decoding VP8 or any other media format on the CPU is a heavy process that leads to poor performance and a heavy drain on the battery.
Figure 3 - Source: http://www.webmproject.org

The graph above shows the power consumption impact of VP8 decoding on an Android tablet based on the Rockchip RK2918.

New System on a Chip (SoC) implementations that support hardware encoding/decoding of VP8 are gradually becoming available. A few examples:

- Qualcomm Snapdragon 800
- NVIDIA Tegra 4
- ST-Ericsson NovaThor LT9540
- Intel Z3480

More recent phones and tablets, including for example the Google Nexus 5, Samsung Note and Galaxy, and Sony Xperia, already support VP8 in hardware.

6 CONCLUSION: WEBRTC IN BUSINESS APPLICATIONS

WebRTC is a great step forward and promises to change the way we use our computers and smartphones to communicate in real time. A ubiquitous and standardized RTC client across all devices and platforms not only democratizes access to communications, but also opens the door to a wave of creative uses of voice and video in business and consumer applications.

Because WebRTC design assumptions differ from the use cases of traditional business applications, delivering successful enterprise or carrier-grade applications using WebRTC presents significant challenges. Before embarking on a WebRTC project, evaluate solutions carefully to see how they will work in your environment and with your use cases.
7 GETTING STARTED WITH WEBRTC

The best way to get started with WebRTC is by interacting with the technology and making real calls. To demonstrate some of the capabilities of WebRTC and how it can be integrated with legacy communications infrastructure, Daitan created a simple application that integrates a WebRTC client with FreeSWITCH for typical cloud services deployments and with Clearwater IMS for carrier environments.

This demo shows that WebRTC can be integrated seamlessly with existing communications infrastructure and can support the transition from a world using proprietary endpoints to a world that embraces Internet technologies.

To access the demonstration and make real calls from WebRTC, please visit http://daitangroup.com/webrtc.
ABOUT DAITAN GROUP

Daitan provides highly reliable software development services. We partner with technology vendors to help them develop their next software solution in Telecom, Unified Communications and Cloud/Web solutions. We pioneered WebRTC implementation and some of our customers were first in the market with their business solutions supporting the technology. To find out more about what Daitan can do for you, please visit http://daitangroup.com.

All rights reserved. Other company and product names may be trademarks of their respective owners.