A Multilingual Video Chat System Based on the Service-Oriented Architecture


2017 IEEE Symposium on Service-Oriented System Engineering

Jayanti Andhale, Chandrima Dadi, Zongming Fei
Laboratory for Advanced Networking, University of Kentucky, Lexington, Kentucky 40506, USA
Emails: jran229@g.uky.edu, cda232@g.uky.edu, fei@netlab.uky.edu

Abstract: The use of video chat and video conference applications is ubiquitous, especially in this era of wireless and mobility. However, a video chat system that allows communicating parties to speak or type in different languages has not been widely used. In this paper, we propose a service-oriented architecture to develop a multilingual video chat system that supports people speaking and texting in different languages. It uses the Web Real Time Communication (WebRTC) technology and takes advantage of services available on the Internet, including the Google Web Speech API, the Google Transliterate API, and Microsoft Translator. It is a browser-based solution that allows users to connect from various platforms, such as Windows, Linux, or Mac. Since the application uses WebRTC, the user does not have to download and install any plugins. The service-oriented architecture design based on WebRTC allowed us to develop and implement the whole system in a short period of time.

Keywords: video conferencing; multilingual; translation; social networking

I. INTRODUCTION

Video conference applications have been widely used in the workplace and in our daily life. They bring us closer and enable us to see each other even when we are located in different places. They give us the look and feel of a face-to-face meeting and greatly shorten the distance between people. Most of the existing video conferencing applications, such as iChat [1], Google Hangouts [2], and Skype [3], require users to download native applications (apps) or plugins. These plugins or downloads can sometimes be difficult to install.
In addition, users are required to constantly update these applications and maintain the up-to-date version. Recently the Web Real Time Communication (WebRTC) technology [4] has attracted attention because it enables in-browser communications by integrating voice and video directly in the browser without the need for any download or plugin. The Web API it provides can easily be used to develop web-based real-time applications.

Another issue with most existing video conferencing applications, including WebRTC-based applications such as OpenTok [5], Vline [6], and GoInstant [7], is that they require users in the same virtual room to use a common (natural) language for communication. One of the design goals of this project is to enable people speaking different languages to use the application for video chat and text messaging. To achieve this goal, we adopt a service-oriented architecture for the system design. Instead of implementing from scratch all the functions needed to facilitate multilingual communication, we take advantage of the services available on the Internet and invoke them in our multilingual video chat application. In particular, we make use of the Google Web Speech service, the Google Transliterate service, and Microsoft Translator in our system. Our experience with the WebRTC technology and the service-oriented architecture highlights some challenges and lessons learned in the implementation. The approach allowed us to develop and implement the whole system in a short period of time.

The rest of the paper is organized as follows. Section II presents the design of the service-oriented architecture of the multilingual video conference system. Section III illustrates the steps for establishing media and data channels between browsers. Section IV describes the implementation of the modules in the system. Section V discusses related work, and Section VI concludes the paper.

II. THE SYSTEM DESIGN BASED ON THE SERVICE-ORIENTED ARCHITECTURE

As mentioned before, the goal of the multilingual video conference/chat system (called MLChat) is to enable people speaking different languages to video conference/chat with each other directly. Each person can select the language of his/her preference and use that language when speaking in the video call and when typing text messages. If two persons A and B using different languages communicate with each other using MLChat, the video displayed at A will be captioned in A's language. Ideally, we would like the audio to be translated and synthesized in A's language as well. Similarly, the text messages displayed at A will be in A's language, even if B uses a different language. The situation on B's side is symmetric, i.e., everything appears in B's language.

The design of the multilingual video conference/chat system is based on the service-oriented architecture, as shown in Figure 1. When two users want to communicate via video conferencing/video chat, they enter the system from their respective browsers. WebRTC provides the basic functions for video/audio communications between browsers. It is distributed as part of the browser code, and many popular browsers, such as Google Chrome and Mozilla Firefox, support it. MLChat uses the HTML5 and JavaScript APIs provided by WebRTC to capture user media, establish a peer-to-peer connection, and establish a data channel between browsers. To establish the peer-to-peer connections, MLChat implements a signaling server that facilitates the process. It is co-located with the web server that serves the application pages to the browsers. The HTML5 Web Speech service from Google is used to capture the voice data, which is sent to the server and translated into the language of the receiving side. It is then displayed as captioning in that language. The Web Speech service is invoked by each browser independently. The translation support is provided by the Microsoft Translator service and is invoked by the web/signaling server. The Google Transliteration service is also used to let users type in the selected script.

978-1-5090-6320-8/17 $31.00 2017 IEEE. DOI 10.1109/SOSE.2017.17

Figure 1. The Architecture of the System

A. Web Browser and WebRTC

The web browser is the portal and the interface for users. To start the system, the user just loads the page using the URL provided. One of the key components in the browser is WebRTC, which provides an abstraction layer and APIs that allow developers to ignore the low-level details of dealing with voice, video, and cross-browser communication. WebRTC defines an abstract session layer that performs session management, including conference initiation and conference management. The video and audio engine modules capture media content from the camera and microphone, respectively, and send it to the application. In addition, the video and audio engines render received media data back. These engines contain the audio codec, video codec, text codec, and other components essential for encoding and decoding audio/video data and processing text messages. Several techniques are used to improve the quality of voice and video playback, including an equalizer for audio, image enhancement techniques for video, and echo cancellation. WebRTC uses real-time protocols (e.g., RTP [8] or SRTP) to send audio and video data over UDP. These real-time protocols carry timing information, and their associated control protocols specify information such as the media codec, frame rate, and bit rate. Besides the WebRTC APIs, signaling is another important task when establishing browser-to-browser communication. Signaling is used to exchange control messages about the communication channels. Signaling methods and protocols are not specified by WebRTC and are not part of the WebRTC API. These functions are implemented by the signaling server.

B. Browser to Browser Media Channel

We use the web APIs provided by WebRTC and the signaling server to establish a browser-to-browser media channel and data channel. One important feature of WebRTC is that it provides a way to traverse NATs and firewalls: when two browsers are behind a NAT and/or firewall, they are still able to communicate with each other. We used three APIs provided by WebRTC in developing the MLChat system: getUserMedia, RTCPeerConnection, and RTCDataChannel. We use the getUserMedia JavaScript API and HTML5 elements to obtain the local media stream. We use the RTCPeerConnection API to instantiate an RTCPeerConnection object at each browser and start the signaling for establishing browser-to-browser communication. We use RTCDataChannel for the transmission of other data, such as video captioning and text messages. After a successful signaling process, a media channel and a data channel between the browsers are established, as shown in Figure 1. The media channel is used to exchange audio and video data between browsers and hence provides the video chat feature of the application. The data channel can be used to communicate other data.
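A minimal browser-side sketch of how these three APIs fit together. The STUN/TURN addresses, credentials, and callback names here are assumptions for illustration, not MLChat's actual source:

```javascript
// Sketch: capture local media, create the peer connection, and open a
// data channel for captions and text messages (assumed names throughout).
const ICE_CONFIG = {
  iceServers: [
    { urls: 'stun:stun.example.org:3478' },   // STUN: discover the public IP
    { urls: 'turn:turn.example.org:3478',     // TURN: relay fallback
      username: 'user', credential: 'secret' },
  ],
};

// Build the signaling envelope that is forwarded through the server.
function makeSignal(type, payload) {
  return { type, payload, ts: Date.now() };
}

async function startCall(sendSignal, onRemoteStream, onData) {
  // Modern equivalent of the navigator.getUserMedia() call used in the paper.
  const local = await navigator.mediaDevices.getUserMedia({ audio: true, video: true });
  const pc = new RTCPeerConnection(ICE_CONFIG);
  local.getTracks().forEach((t) => pc.addTrack(t, local));

  // Data channel for captions and translated text messages.
  const dc = pc.createDataChannel('mlchat');
  dc.onmessage = (e) => onData(JSON.parse(e.data));

  pc.onicecandidate = (e) => { if (e.candidate) sendSignal(makeSignal('candidate', e.candidate)); };
  pc.ontrack = (e) => onRemoteStream(e.streams[0]);

  const offer = await pc.createOffer();       // SDP offer (step 5 in Section III)
  await pc.setLocalDescription(offer);
  sendSignal(makeSignal('offer', offer));
  return { pc, dc, local };
}
```

The answering browser would run a symmetric routine with createAnswer() instead of createOffer().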
C. Application/Signaling Server

We implement a single server that performs the functions of both the application server and the signaling server. As the application server, it serves the HTML file of the application to the browsers when the users load the page. As the signaling server, it acts as the forwarding engine for the signaling process once local media streams are active in the browser.

D. Google Web Speech and Transliteration Engine

We use the HTML5 Web Speech API from Google for voice recognition (speech to text). It allows continuous speech dictation and can record recognized speech in text format.
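Continuous dictation with this API might be wired up as follows. The helper names are assumptions, and webkitSpeechRecognition is the Chrome-prefixed interface; restarting in onend keeps dictation running when a recognition session ends:

```javascript
// Pure helper: concatenate the finalized segments of a results list.
function latestFinalTranscript(results) {
  let text = '';
  for (const r of results) {
    if (r.isFinal) text += r[0].transcript;
  }
  return text.trim();
}

// Browser-side sketch (assumed names; Chrome-only prefixed constructor).
function createRecognizer(lang, onFinalText) {
  const rec = new webkitSpeechRecognition();
  rec.lang = lang;             // e.g. 'en-US', 'hi-IN'
  rec.continuous = true;       // keep dictating across pauses
  rec.interimResults = false;  // only report finalized text
  rec.onresult = (e) => onFinalText(latestFinalTranscript(Array.from(e.results)));
  rec.onend = () => rec.start(); // restart when the session times out
  rec.start();
  return rec;
}
```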

Because the session limit of the speech recognition API is 60 seconds, the application has to restart speech recognition every 60 seconds. With the help of the established data channel, we can provide a real-time captioning feature for video conferencing. If the preferred languages of the communicating parties are different, we can provide the captioning in the language chosen by the receiver with the help of the translation service discussed below.

We use the HTML5 transliteration API from Google to transliterate the text entered into a specific text area into a script selected by the user. It handles only UTF-8 text and uses a dictionary-based phonetic transliteration approach. It provides the transliteration typing service for Chinese, Hindi, and other languages.

E. Microsoft Translator

We use Microsoft Translator to translate text between different languages. It is invoked by the application/signaling server because it is a subscribed service. The Microsoft Translator API is a cloud-based automatic translation service supporting multiple languages. When a browser gets text chats or text from the speech recognition service, it sends them to the application server. After the translation, the content in the selected language is sent to the receiver for display.

III. ESTABLISHING THE MEDIA CHANNEL

One of the key steps in starting the video conference is to establish the media channel between the two browsers using the signaling process. The application follows the steps illustrated in Figure 2 for this process. The browsers can be either Google Chrome or Mozilla Firefox. During the process, we set up the channel using Interactive Connectivity Establishment (ICE), which employs Session Traversal Utilities for NAT (STUN) and Traversal Using Relays around NAT (TURN) to traverse NATs and firewalls.

Figure 2. MLChat Peer-to-Peer Media Channel Setup

1: Browser 1 loads the MLChat URL.
2: Browser 2 loads the MLChat URL.
3: Browser 1 loads local user media.
4: Browser 2 loads local user media.
5: Browser 1 instantiates the peer connection object and prepares the call offer using the Session Description Protocol (SDP). It contains the supported configuration for the session, i.e., the description of the local MediaStreams.
6: The offer is sent to the signaling server.
7: The signaling server forwards the offer to Browser 2.
8: Browser 2 instantiates the peer connection object and sets its local configuration based on the received offer. It prepares an answer using SDP. It contains the supported configurations for the session that are compatible with the parameters in the offer.

Figure 3. MLChat System
Figure 4. MLChat: User Local Media

9: The answer is sent to the signaling server.
10: The signaling server forwards the answer to Browser 1.
11: Browser 1 sets its local configuration based on the received answer. Based on the type of connectivity discovered by running tests against the STUN and TURN servers, Browser 1 calls the startIce() method of the peer connection and prepares an ICE candidate.
12: The ICE candidate is sent to the signaling server.
13: The signaling server forwards the ICE candidate to Browser 2.
14: Browser 2 adds the received candidate to the current session and prepares its own ICE candidate.
15: The ICE candidate is sent to the signaling server.
16: The signaling server forwards the ICE candidate to Browser 1.
17: Browser 1 adds the received candidate to the session.
18: Browser 1 formulates the best solution and tries to establish a connection with Browser 2. If successful, Browser 1 and Browser 2 establish the media channel between them.

After the media channel is established, the audio and video streams are sent from one browser to the other, as shown in Figure 1. During the process, both the STUN server and the TURN server are used, as illustrated in Figure 3. STUN servers are used to find the public IP addresses of the browsers if they are located behind NAT boxes. The TURN server is used as a fallback mechanism for browsers that cannot establish a peer-to-peer connection; it acts as a media proxy server between the browsers. The RTCDataChannel is established over the media channel. It provides the text messaging feature of the application and transfers the captioning data for the video content.

IV. IMPLEMENTATION OF THE MULTILINGUAL VIDEO CHAT SYSTEM

MLChat uses several APIs, JavaScript libraries, and packages. The server runs in the Node.js environment and uses the Express development framework. Additionally, the server uses the socket.io library for secure communication and listens on port 8443.
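The forwarding role of the signaling server can be sketched as transport-agnostic logic. In MLChat this would sit behind socket.io on port 8443; the class, field, and method names here are assumptions:

```javascript
// Sketch of the signaling server's forwarding engine: clients join a room,
// and offer/answer/candidate messages are relayed to the other peers.
class SignalingRelay {
  constructor() {
    this.rooms = new Map(); // roomId -> Set of clients ({ id, send })
  }

  join(roomId, client) {
    if (!this.rooms.has(roomId)) this.rooms.set(roomId, new Set());
    this.rooms.get(roomId).add(client);
  }

  // Forward a signaling message to every peer in the room except the sender.
  forward(roomId, fromId, msg) {
    let delivered = 0;
    for (const c of this.rooms.get(roomId) || []) {
      if (c.id !== fromId) {
        c.send(msg);
        delivered += 1;
      }
    }
    return delivered; // number of peers the message reached
  }
}
```

With socket.io, each connection would call join() on connect and forward() for every 'offer', 'answer', and 'candidate' event, with client.send wrapping socket.emit.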
The messages sent and received during the RTCPeerConnection establishment phase are exchanged through the server. When the user starts the MLChat system and loads the page, the screen shown in Figure 4 appears. The user can click the Start Video button to request access to the microphone and the camera. For privacy reasons, the user has to explicitly give the application permission to use these devices to capture media data. The browser uses the navigator.getUserMedia() API provided by WebRTC to obtain the video and audio streams. Each MediaStream has one input and one output. The input can be the MediaStream generated by navigator.getUserMedia(). The output stream can be the one sent by the other peer.

After the browsers obtain local media data using the getUserMedia API, they establish a peer connection with each other and exchange session, network, and media information as described before. They use the ICE framework to find the other peer via the STUN server and may use the TURN server as a proxy for transferring media data.

In addition to providing the basic functions of video conferencing using WebRTC, MLChat also aims to provide additional support, especially for users speaking different languages. As a starting point, we implement a real-time captioning function. Using the HTML5 speech-to-text API provided by the Google Chrome browser, the speech is recorded and transferred to the other side so that real-time captioning can be displayed along with the video content. Figure 5 shows the short captioning under the received video window (the big window). In particular, MLChat allows the user to select the language of his/her preference. In Figure 5 the user selected Hindi in the selection box. The transcribed text from the peer is sent to the server, which translates it into Hindi and sends it to the browser. During this process, the application uses Microsoft Translator for translation.
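A sketch of the server-side translation call. The system described in the paper used the Microsoft Translator service of that era; this sketch assumes the current Translator v3 REST endpoint and a subscription key supplied via an environment variable:

```javascript
// Server-side sketch (assumed endpoint and key handling, not MLChat's code).
const ENDPOINT = 'https://api.cognitive.microsofttranslator.com/translate';

// Pure helper: build the URL and JSON body the Translator API expects.
function buildTranslateRequest(text, toLang) {
  return {
    url: `${ENDPOINT}?api-version=3.0&to=${encodeURIComponent(toLang)}`,
    body: JSON.stringify([{ Text: text }]),
  };
}

async function translate(text, toLang) {
  const { url, body } = buildTranslateRequest(text, toLang);
  const res = await fetch(url, {
    method: 'POST',
    headers: {
      'Ocp-Apim-Subscription-Key': process.env.TRANSLATOR_KEY, // subscribed service
      'Content-Type': 'application/json',
    },
    body,
  });
  const data = await res.json();
  return data[0].translations[0].text; // first translation of the first item
}
```

The server would call translate() for each caption or chat message before relaying it, e.g. translate('hello', 'hi') for a receiver who selected Hindi.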
As shown in the figure, the caption is displayed in Hindi, even though the other user may speak a different language.

When entering text messages to send to the other user, MLChat uses Google's Transliteration API to allow the user to type in the script of the language the user has selected, as shown in Figure 6. The Transliteration API supports multiple destination languages, such as Arabic, Chinese, Greek, Hindi, Russian, Urdu, Serbian, and Persian. If the preferred language selected by the user is not one of the languages supported by the Transliteration API, the text message can be typed in English.
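Captions and chat text both travel over the RTCDataChannel as serialized messages. One possible message format (the field names are an assumption, not MLChat's actual wire format):

```javascript
// Sketch: tag each data-channel payload so the receiver can route it to
// either the caption renderer or the chat window.
function makeChatMessage(kind, lang, text) {
  if (kind !== 'caption' && kind !== 'chat') {
    throw new Error(`unknown message kind: ${kind}`);
  }
  return JSON.stringify({ kind, lang, text });
}

// Receiver side: parse and dispatch to the matching handler.
function handleChannelMessage(raw, handlers) {
  const msg = JSON.parse(raw);
  handlers[msg.kind](msg.lang, msg.text);
  return msg;
}
```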

Figure 5. MLChat: Translated Subtitles
Figure 6. MLChat: Translation and Transliteration for text messages

Similar to captioning, the text messages are sent to the server and translated into the language of the other side. In Figure 6, the user types in the selected Hindi script. When the browser receives a text message from the other side, it is displayed in Hindi, even if the other side uses a different language.

V. RELATED WORK

Over the decades, we have seen an increasing demand for Real Time Communication (RTC) [9], such as video conferencing. These applications have been used in distance learning, business meetings, and social networking sites. Distance learning allows students to take classes remotely, saving time and travel. In business, these online applications allow employees in different parts of the world to share their ideas and collaborate with each other without the need to be physically present in the same place. Furthermore, social networking sites allow friends and families to get together virtually and share their thoughts.

Some popular real-time applications are iChat [1], Google Hangouts [2], and Skype [3]. iChat is a video chat application from Apple and is included in the latest Mac operating system as a built-in plugin. Google Hangouts provides support for video conferencing, desktop sharing, and instant messaging. However, similar to iChat, Google Hangouts requires the user to install the Google Talk plugin. Skype also allows instant messaging and video conferencing, but the user has to install the application.

We have seen that some web applications have started using WebRTC [4] for real-time communication, such as Appear.in [10], OpenTok [5], Vline [6], Bistri [11], GoInstant [7], GetOnSIP [12], and 1Click [13]. These applications implement browser-based video conferencing, file sharing, and/or instant messaging for a group of people. However, they do not provide multilingual support. A closely related work is the multilingual chat system [14], which supports multilingual chat and focuses on using images automatically generated at the sender side to detect mistranslation. However, it supports only text chat, without a video chat function, and is a traditional stand-alone system rather than a web-based system.

VI. CONCLUSION

In this paper, we developed a video chat application that is plugin-free and platform independent. The application allows two users in any part of the world to communicate using their own preferred languages. The service-oriented architecture greatly facilitated the design and development of the system.

ACKNOWLEDGMENT

This research work was supported in part by a grant from the Kentucky Science and Engineering Foundation as per Grant Agreement #KSEF-148-502-16-394 with the Kentucky Science and Technology Corporation.

REFERENCES

[1] Apple iChat, https://www.macupdate.com/app/mac/12174/apple-ichat.
[2] Google Hangouts, https://hangouts.google.com/.
[3] Skype, https://www.skype.com/en/.
[4] WebRTC, https://webrtc.org/.
[5] The OpenTok Platform - A Cloud Platform for Embedding, http://www.tokbox.com.
[6] Vline, http://blog.vline.com/.
[7] GoInstant, https://github.com/goinstant.
[8] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, "RTP: A transport protocol for real-time applications," RFC 1889, 1996.
[9] S. Casner and S. Deering, "First IETF internet audiocast," ACM Computer Communication Review, pp. 92-97, July 1992.
[10] Appear.in, https://appear.in/.
[11] Bistri, https://bistri.com/.
[12] GetOnSIP, https://www.onsip.com/getonsip.
[13] 1Click, https://1click.io/.
[14] E. Hosogai, T. Mukai, S. Jung, Y. Kowase, A. Bossard, Y. Xu, M. Ishikawa, and K. Kaneko, "A multilingual chat system with image presentation for detecting mistranslation," Journal of Computing and Information Technology, vol. 19, no. 4, pp. 247-253, 2011.