Ubiquitous Programmable Internet Telephony End System Services

Xiaotao Wu

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Graduate School of Arts and Sciences

COLUMBIA UNIVERSITY

2006

ABSTRACT

Ubiquitous Programmable Internet Telephony End System Services

Xiaotao Wu

Telecommunication networks are moving from the circuit-switched public switched telephone network (PSTN) to packet-switched Internet telephony. A major difference between Internet telephony and the PSTN is that the PSTN usually assumes dumb endpoints while Internet telephony incorporates intelligent endpoints. In Internet telephony, endpoints usually have a CPU and memory, so they are programmable and can perform services such as call forwarding, transfer, and screening. In addition, peer-to-peer (P2P) technologies introduce telecommunication networks that do not need proxy or application servers to make calls. In such P2P networks, many telecommunication services have to be performed on end systems, such as intelligent phones. The enhanced capabilities of end systems and the service requirements in P2P networks motivate the investigation of end system services in this thesis.

Performing services in end systems may result in many new communication services, make telecommunication services more distributed, and make telecommunication networks more robust and efficient overall. At the same time, telecommunication services may become more difficult to manage, thus requiring new techniques for creating and composing services.

Because the PSTN assumes dumb endpoints, most existing service research on the PSTN focuses on services in network servers and is not well-suited for end system services. Therefore, it is important to conduct research specifically for end system services, such as defining end system services, developing efficient and user-friendly tools for creating end system services, managing end system feature interactions, and integrating end system services with other Internet services, such as web, email, location-based services, and networked appliance control.

In this dissertation, I first analyze the difference between end system services and network services. The analysis shows that they differ in call models, targeted service creators, and call control actions. Because of these differences, I define a new scripting language called the Language for End System Services (LESS) specifically for describing end system services. LESS is designed to allow comparatively inexperienced end users to create services. Because inexperienced end users could create services that have harmful side effects, I limit the functionality of LESS to do nothing more complex than describing Internet telephony services.

One important feature of LESS is that it uses a tree diagram to represent telecommunication services. The tree diagram provides many benefits for creating services, such as safety, ease of analysis, and back-and-forth translation between graphical and textual representations of services. It also makes feature interactions among multiple LESS scripts easy to handle. Based on the tree diagram, I propose a method that uses action conflict tables and a tree merging algorithm to handle feature interactions. Once a potential feature conflict is detected, my solution can also help to resolve it.

The tree diagram also suggests a way to automatically generate services by using decision tree induction. Services that are automatically generated, based on communication behaviors, can be a great help to end users who may not be aware of available services or do not know how to customize or create their desired services. I have built a service learning system that uses the Incremental Tree Induction (ITI) algorithm to automatically generate services. The generated services can be represented as decision trees for call handling, and the decision trees can be translated to LESS scripts. One drawback of auto-generated services is that they may sometimes perform unwanted actions that cause users to lose calls, spend more money than necessary, or sacrifice privacy. I refer to these undesirable outcomes as service risks. The service learning system I developed contains some preliminary work on avoiding or reducing service risks.

To evaluate how useful LESS is for end users, I conducted a survey on service creation by end users. The survey shows that relatively inexperienced users are willing and able to create their desired services, and that our LESS-based service creation tool, the Columbia University Telecommunication service Editor (CUTE), fits their needs. In addition, the survey shows that many survey participants can easily understand LESS source code. The survey also reveals that users can understand feature conflicts and would like to resolve them based on the choices provided by service creation tools, such as CUTE.

Beyond end system service creation research, I also investigated location-based services in Internet telephony, especially on Internet telephony end systems. In this dissertation, I first analyze location-based services in Internet telephony. Following the analysis, I introduce the implementation of location-based services in our lab environment, as well as the prototype implementation of emergency call handling in SIP-based Internet telephony systems. I also discuss our SIP-based global-scale ubiquitous computing architecture. The architecture uses the Service Location Protocol (SLP) to find available resources based on location information, then uses the SIP third-party call control (3PCC) architecture to control available resources.

To test the hypotheses of my research, I have implemented a SIP user agent, SIPC. SIPC contains basic SIP functions and an end system service execution environment. It also supports many other Internet functions, such as service discovery, event notification, networked appliance control, instant messaging, and multicast media streaming based on the Session Announcement Protocol (SAP). Multiple functions in SIPC can interact with each other to provide new services. This dissertation also discusses different approaches to multi-function integration and interaction in SIPC.

Contents

List of Tables
List of Figures
Acknowledgments

Chapter 1 Introduction
  Background and related work
  Internet Telephony
  Telecommunication services and service creation
  Session Initiation Protocol (SIP)
  SIP for Instant Messaging and Presence Leveraging Extensions
  Call Processing Language (CPL)
  Location-based services
  Ubiquitous computing
  Other Internet protocols related to my thesis
  End system services

Part I The Language for End System Services (LESS)

Chapter 2 Motivation and design strategies of LESS
  Design strategies of LESS
  Requirements for LESS
  Design strategies
  Comparing LESS with other service creation languages

Chapter 3 LESS definition
  High level abstraction
  Grammar, types, and variables
  Grammar and types
  Variables
  Program execution
  Basic LESS elements
  Triggers
  Switches
  Modifiers
  Actions
  Subactions
  LESS extensions
  Media handling extension
  Mid-call handling extension
  User interaction extension
  Instant messaging extension
  3.6.5 Event handling extension
  Queue handling extension

Chapter 4 Using LESS to program communication services
  Services for Q.1211, 5ESS, and CSTA phase III
  Using LESS to program conferencing services
  Program presence-enabled conferencing services
  Presence aggregation
  Presence aggregation with location-based services

Chapter 5 Service creation tools for LESS
  Web-based service creation
  GUI-based service creation
  Two-stage service creation

Chapter 6 Evaluating LESS
  The simplicity and safety of LESS
  Simplicity
  Safety
  Summary
  A survey of service creation by end users
  The willingness of users to create services
  The capability of users to create services
  Evaluating CUTE
  Evaluating LESS
  Services of interest
  6.2.6 Handling feature conflicts
  Summary

Part II Handling feature interactions in LESS

Chapter 7 Handling feature interactions in LESS
  Related work
  Feature interaction detection in LESS
  End system call control actions
  End system presence and event notification actions
  Other end system services
  Feature interactions between caller's preference, end system's capabilities and users' service scripts
  Using tree-merging to detect and resolve feature interactions
  Tree merging algorithm
  Feature interactions caused by multiple triggers
  Implementation
  Conclusion and future work

Part III Service learning and service risk management

Chapter 8 Service learning and service risk management
  Representing users' communication behaviors
  Decision tree learning
  Criteria on choosing a learning algorithm
  8.2.2 Incremental Tree Induction (ITI) algorithm
  Accuracy of the ITI algorithm
  Performance of the ITI algorithm
  Using LESS to represent learned results
  Service risk management
  Identify service risks
  Analyze service risks
  Resolving risks
  Implementations
  Conclusion and future work

Part IV Location-based services and ubiquitous computing using SIP

Chapter 9 Location-based services in Internet telephony systems
  Introduction
  Location description and detection
  Location-based services
  Sending location information to remote parties for location tracking
  Making communication decisions
  Location-triggered actions
  Resource discovery
  Treat a location as a communication entity
  Extending LESS for location-based services
  Implementations
  Location-based services in SIPC
  9.5.2 Location-based services demo in our lab environment
  Emergency services architecture and prototype
  Conclusion

Chapter 10 Ubiquitous computing using SIP
  Related work
  System architecture
  Resource discovery and control
  Resource discovery using SLP
  Resource control using SIP, HTTP and SOAP
  Call Control
  Event-Triggered Actions
  Events
  Using back-to-back user agent (B2BUA) to control resources
  Access Control
  Privacy
  Service examples
  Conclusion

Part V SIPC, a multi-function programmable SIP user agent

Chapter 11 SIPC, a multi-function SIP user agent
  Introduction
  New services introduced by multi-function integration
  Setting up a favorable communication environment
  Call handling based on presence information
  Using networked resources
  Location sensing and location-based services
  Multimedia session sharing
  Voicemail handling
  Conference floor control with active talker indicator
  How to integrate multiple functions
  Functions integrated in SIPC
  Overlap among SIPC functions
  Interaction among SIPC functions
  Approaches for multi-function integration
  Programming multi-function interactions
  Implementation
  Conclusion

Chapter 12 Conclusions and Future Work

Part VI Appendix

The XML Schema for LESS
  XML Schema for LESS

List of Tables

1.1 CPL switches
CPL operations
User-interested services
Call control actions
The context assumption and expected result of call control actions
Call control action conflict table for handling incoming trigger
Interactions between services on end systems and proxy servers for incoming call handling
Interactions between services on end systems and proxy servers for outgoing call and call termination handling
Action conflicts between two end systems for incoming call handling
More complicated action conflicts between two end systems
Context assumption and expected results of lamp control actions
Interactions between lamp control actions
LESS actions may cause loss
The characteristics of different communication methods
9.1 Differences between two approaches
Functions in SIPC

List of Figures

1.1 (a) Routing model in Internet telephony and (b) Routing model in the PSTN
Call states in CCXML
SIP entities
SIP event notification architecture
Graphical representation of a CPL script
Call models of network services and end system services
Call decision making process in LESS
Address-switch example
An example of the tree-like structure for trigger handling
LESS script example
Using C-like pseudo code to program a service
(a) Call states for an incoming call (b) Call states for an outgoing call
Relationship among LESS elements
LESS script execution
Call movement scenario
4.2 Call movement signaling flow
Presence-enabled conferencing script
Simple event aggregation script
Browser-based LESS service creation
Choose switches or actions
Columbia University Telecommunication service Editor (CUTE)
Constructing a call handling decision tree
Two-stage LESS service creation
Users' willingness to create services
Service creation for scenario
Service creation for scenario
Service creation for scenario
Using CUTE to create services
Understanding LESS source code
Being aware of feature conflicts
Detecting feature conflicts
Handling feature conflicts
Merging two trees to get a preferable service
Sample script for defining decision rules
Tree merging process
Checking two rules conflict or not
Determining overlap between two rule paths
Detecting feature interactions among scripts with different triggers
7.7 SIPC's service manager
Resolve feature interactions
Finding the scripts to merge
Configuration dialog
Events and actions log
Service decision tree
Two different trees representing the same data set
Training time of the ITI algorithm
Location detection
Displaying the locations in contact list
Displaying the locations on a floor map
Display geographic location on a local map
Location-based communication services prototype
Emergency call handling architecture
Ubiquitous computing system architecture
SIP-based ubiquitous computing in a hotel
Session setup with using the visited domain camera
Overlap among SIPC functions
Interaction among SIPC functions
Function set in SIPC
Main user interface of SIPC
People menu of SIPC
Status menu of SIPC
11.7 Action menu of SIPC
Internet TV in SIPC
SIPC networked appliance control

Acknowledgments

First of all, I must express my sincere and foremost gratitude to Prof. Henning Schulzrinne, for his insightful guidance, brilliant ideas, and tremendous efforts for my research. I will be forever indebted to him. Also, I am delighted to have been a member of the Internet Real-Time (IRT) Laboratory headed by Prof. Schulzrinne. All IRT members are talented. I have learned a lot from them through our collaboration and weekly group meetings. I would also like to thank SIPQuest for financially supporting my research. Last but not least, I wish to thank my family members, especially my wife, Shan Chen, for her love, encouragement and support during my Ph.D. study.

Chapter 1 Introduction

Telecommunication networks are moving from the circuit-switched public switched telephone network (PSTN) to packet-switched Internet telephony. In Internet telephony, telecommunication services can be executed not only on network servers, but also on endpoints, such as intelligent phones. Performing services on end systems introduces many new communication services, makes telecommunication services more distributed, and makes telecommunication networks more robust and efficient overall. At the same time, it may also make telecommunication services more difficult to manage, thus requiring new techniques for creating and composing services. Because most existing service research on the PSTN focuses on services in network servers and is not well-suited for end systems, it is important to conduct research specifically on end system services, which is the main focus of this dissertation.

This dissertation addresses three topics for end system services: service creation, location-based services, and integrating multiple functions in a Session Initiation Protocol (SIP) user agent. The dissertation is structured as follows: In the rest of the introduction, Section 1.1 provides background knowledge and related work to this dissertation. Section 1.2 defines end system services.

Following the introduction is the first topic of my dissertation: end system service creation. Part I, Part II, and Part III address different aspects of end system service creation. Part I discusses the differences between end system services and network services, then presents the Language for End System Services (LESS) [140], which I defined to describe end system services. For safety, ease of analysis, and back-and-forth translation between graphical and textual representations of LESS scripts, LESS uses a tree diagram to represent telecommunication services.

The tree diagram makes it easy to handle feature interactions among LESS scripts. Part II introduces a tree-merging algorithm to detect interactions among LESS scripts. The algorithm is based on the LESS action conflict tables, which I defined for analyzing LESS-based feature interactions. Once a feature interaction is detected, the algorithm can identify the conditions that may cause the interaction, and I have developed a service management system that can guide users to resolve detected feature conflicts.

The LESS tree diagram also suggests a way to automatically generate services by performing decision tree induction. The induction is based on users' communication behavior data, which can be collected by users' endpoints. The induced decision trees can be converted to LESS scripts to represent the generated services. Automatically generated services can help users to better understand what they need, because users are often not aware of available services or do not know how to customize or create their desired services. However, automatically generated services may sometimes perform undesired actions that cause users to lose calls, spend more money than necessary, or sacrifice privacy. I name these undesired actions service risks. Avoiding or reducing service risks is critical to making a service learning system usable. Part III discusses both service learning and service risk management.

The second topic of my dissertation is location-based services, which are discussed in Part IV. This part presents work I took part in on location-based services, including analyzing location-based services in Internet telephony, implementing location-based services in our lab environment, designing and developing a SIP-based ubiquitous computing architecture, and prototyping an emergency call handling system.

The third topic of my dissertation presents a SIP user agent, SIPC [64], which I implemented as a testbed for my research work. I integrated many different functions and protocols in SIPC, such as networked appliance control, real-time multimedia streaming, instant messaging, handling presence information, networked resource discovery, third-party call control, handling multicast session announcement, location sensing, emergency call handling, and conference floor control. Part V introduces SIPC and discusses the advantages of integrating multiple functions in one user agent. Chapter 12 concludes my dissertation.

1.1 Background and related work

Internet Telephony

Internet Telephony (also called Voice over Internet Protocol (VoIP), or IP Telephony) originally referred to carrying voice conversations over the Internet or any other IP-based network. The term is now used more generally for transporting multimedia streams: not only voice streams, but also video, text, white board, and application sharing data streams.

One of the advantages of Internet telephony is reduced cost. While there is a cost to consumers for subscribing to Internet service, using Internet telephony over the existing Internet service usually does not involve any extra charge. Hence, Internet telephony phone calls are widely regarded as free because a single network carries both voice and data. Besides cost saving, Internet telephony also introduces many new communication services through integration with other Internet services such as email, web, presence, instant messaging, networked appliance control, and directory services. Traditional telephony services, such as call forwarding, transfer, and screening, can be enhanced by the integration.

An Internet telephony call usually consists of two parts: call signaling and media transmission. The call signaling part uses call signaling protocols, such as the Session Initiation Protocol (SIP) [106] or H.323 [62], to establish and maintain a call session. The media transmission part usually uses the Real-time Transport Protocol (RTP) [117] to transmit media packets (RTP "provides end-to-end network transport functions suitable for applications transmitting real-time data, such as audio, video or simulation data, over multicast or unicast network services" [117]).

In Internet telephony, both call signaling and media streams flow over a packet-switched IP-based network, instead of the dedicated, circuit-switched voice transmission lines used in the traditional public switched telephone network (PSTN). IP-based networks do not provide quality of service guarantees, while multimedia conversations require real-time data transmission. To ensure low latency of media streams while still applying call routing services, media streams in Internet telephony are often transmitted end-to-end, but call signaling goes through service providers' servers, as shown in Figure 1.1(a). This makes end systems the only entities where signaling and media flows converge. This differs from the PSTN, in which both signaling messages and media streams traverse service providers' servers, as shown in Figure 1.1(b).

Figure 1.1: (a) Routing model in Internet telephony and (b) Routing model in the PSTN

Telecommunication services and service creation

Both the PSTN and Internet telephony provide a very basic service to connect two endpoints and establish a call session. In the PSTN, this basic service is referred to as POTS (Plain Old Telephone Service). In Internet telephony, we can use the same term for a simple end-to-end call setup. In addition to POTS, telephony systems usually provide many other functionalities for people's convenience. These additional functionalities are usually called services or features. A service is viewed as an alternative to POTS for call setup. For example, for an incoming call, POTS alerts the callee's device, but the call blocking service (CB) rejects the call automatically without alerting.

Internet telephony should provide new telecommunication services and make service creation and deployment more efficient. First of all, Internet telephony must provide services comparable to those in the PSTN. Furthermore, Internet telephony can provide new services by integrating existing Internet services. The service architecture of Internet telephony should be open so people can easily integrate more services as the Internet evolves. Besides, services in Internet telephony should be distributed to make the whole service network robust and efficient. In addition, it is desirable to have a user-friendly service creation environment not only for professional service developers, but also for comparatively inexperienced end users.

Many efforts [103] [39] [92] [57] [150] have explored service creation and execution in Internet telephony systems. Among these efforts, we are more interested in the ones using open standards because we expect distributed services. Services following open standards can better ensure service portability. There are several standards organizations or forums providing open standards for telephony service creation, such as the International Telecommunication Union Telecommunication Standardization Sector (ITU-T), the Internet Engineering Task Force (IETF), the Java API for Integrated Networks (JAIN) Forum, the European Computer Manufacturers Association (ECMA), and the World Wide Web Consortium (W3C). Below I use call blocking (CB) as an example service to discuss these standards. In my examples, CB blocks calls from sip:tom@example.com.

JAIN SIP Servlet API

The program below shows how to use the SIP Servlet API [95], which is defined by the JAIN Forum, to program the CB service. The program retrieves the From header value from an incoming SIP request. If the value matches sip:tom@example.com, it sends a 403 Forbidden response to the caller. SIP Servlets are language and protocol specific. Service programmers must know Java programming and understand SIP to use the SIP Servlet API.

    public void doInvite(SipServletRequest a_request)
        throws ServletException, IOException
    {
        // Establish session state.
        SipApplicationSession appsession = a_request.getApplicationSession();
        SipSession sipsession = a_request.getSession();

        // Act as a terminating user agent and return
        // a 403 Forbidden response
        if (a_request.getFrom().getURI().toString().equals
            ("sip:tom@example.com")) {
            SipServletResponse response =
                a_request.createResponse(SipServletResponse.SC_FORBIDDEN);
            response.send();
        }
    }

W3C Call Control XML (CCXML)

The program below shows how to use Call Control XML (CCXML) [135] to handle call blocking (CB). CCXML is published by W3C. It is designed to complement and integrate with a VoiceXML [133] interpreter to provide telephony call control support. VoiceXML is also published by W3C. It is designed to create audio dialogs that bring the advantages of web-based development and content delivery to interactive voice response (IVR) applications.

    <?xml version="1.0" encoding="utf-8"?>
    <ccxml xmlns="...">
      <var name="in_connectionid" expr="''" />
      <var name="currentstate" expr="'initial'" />
      <eventprocessor statevariable="currentstate">
        <transition state="initial" event="connection.alerting" name="evt">
          <assign name="in_connectionid" expr="evt.connectionid" />
          <assign name="uri" expr="evt.connection.remote.uri"/>
          <if cond="uri == 'sip:tom@example.com'">
            <assign name="currentstate" expr="'failed'" />
            <reject connectionid="in_connectionid" />
          <else/>
            <assign name="currentstate" expr="'alerting'" />
          </if>
        </transition>
      </eventprocessor>
    </ccxml>

At the beginning of the script, the call is in the initial state. The script checks the URI of the caller; if it is sip:tom@example.com, the script rejects the call and the call state changes to the failed state. CCXML programmers must understand call states to program services. Figure 1.2 shows the call states in CCXML.

Figure 1.2: Call states in CCXML

IETF SIP Common Gateway Interface (CGI)

The IETF IPTEL working group published the SIP Common Gateway Interface (SIP CGI) [71] and the Call Processing Language (CPL) [72] for service programming. SIP CGI is based on the HTTP CGI [4]. Through the SIP CGI interface, SIP requests and responses can be communicated to external programs, which can then respond with appropriate actions to take. The script below shows a Windows batch script handling the CB. SIP CGI programmers must know every detail of SIP to program communication services.

    @echo OFF
    IF NOT "%REQUEST_METHOD%"=="INVITE" GOTO EXIT
    IF "%SIP_FROM%"=="sip:tom@example.com" GOTO BLOCK
    GOTO EXIT
    :BLOCK
    echo SIP/2.0 403 Forbidden
    :EXIT
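To make the decision logic of the SIP CGI example easier to test in isolation, here is a minimal Python sketch of the same call blocking rule. It is an illustration rather than part of the SIP CGI specification: the function name `handle_request` and the constant `BLOCKED_CALLER` are hypothetical, the environment is passed in as a plain dictionary, and the sketch assumes, as the batch script does, that `SIP_FROM` holds exactly the caller's SIP URI.

```python
# Caller blocked in the document's call blocking (CB) examples.
BLOCKED_CALLER = "sip:tom@example.com"


def handle_request(env):
    """Return the SIP status line to emit, or None to take no action.

    `env` mimics the CGI environment; REQUEST_METHOD and SIP_FROM are the
    variable names used by the batch script. Everything else is an
    assumption made for this sketch.
    """
    if env.get("REQUEST_METHOD") != "INVITE":
        return None  # only call setup requests are screened
    if env.get("SIP_FROM") == BLOCKED_CALLER:
        return "SIP/2.0 403 Forbidden"  # reject without alerting the callee
    return None  # fall through to default handling


# Example: an INVITE from the blocked caller is rejected.
print(handle_request({"REQUEST_METHOD": "INVITE",
                      "SIP_FROM": "sip:tom@example.com"}))
```

Returning a value instead of printing inside the function keeps the rule separate from the I/O, so the three cases (non-INVITE request, blocked caller, any other caller) can each be checked directly.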

IETF Call Processing Language (CPL)

The Call Processing Language (CPL) is an XML-based language. CPL is designed to be powerful enough to describe a large number of services, but it is limited in power so that it can run safely in Internet telephony servers. I introduce CPL in detail later in this chapter, after introducing the Session Initiation Protocol (SIP). Here I just show a CPL script handling CB. The script uses address-switch to check the caller's address. If it is sip:tom@example.com, the script rejects the call. CPL programmers must understand several concepts in CPL, such as switches and actions, to program services. Compared with SIP Servlets, CCXML, and SIP CGI, CPL requires a service programmer to learn fewer concepts. LESS is based on CPL.

    <?xml version="1.0" encoding="utf-8"?>
    <cpl xmlns="...">
      <incoming>
        <address-switch field="origin">
          <address is="sip:tom@example.com">
            <reject status="403" reason="forbidden"/>
          </address>
        </address-switch>
      </incoming>
    </cpl>

ECMA Computer Supported Telecommunication Applications (CSTA)

ECMA defines services for Computer Supported Telecommunications Applications (CSTA) [59]. The CSTA standards are similar to ITU-T's Q.1200-series recommendations [60].

Both of them only define services and provide no programmable interface for service creation. However, CSTA has defined an XML-based protocol to allow service applications to control telephony entities. For example, the script below shows an XML script for terminating a call.

    <?xml version="1.0" encoding="utf-8"?>
    <ClearConnection xmlns="...">
      <connectionToBeCleared>
        <callID> </callID>
        <deviceID>sip:tom@example.com</deviceID>
      </connectionToBeCleared>
    </ClearConnection>

Session Initiation Protocol (SIP)

Most of my work is based on the Session Initiation Protocol (SIP) [106]. SIP is an IETF standard used for Internet telephony call session setup. With SIP extensions, such as SIP extensions for presence [100] and SIP extensions for instant messaging [32], SIP can also handle functions beyond multimedia session setup.

SIP methods

SIP defines several methods: INVITE invites a user to a call; BYE terminates a connection in a call; ACK is used for reliable message exchanges for invitations; CANCEL cancels a pending request; OPTIONS conveys information about capabilities, e.g., what SIP methods a SIP entity can support; REGISTER conveys information about a user's location to a SIP server. SIP also has many methods defined in its extensions. A later section of this chapter introduces the SIP for Instant Messaging and Presence Leveraging Extensions in detail.

The syntax of SIP is similar to that of the HyperText Transfer Protocol (HTTP) [44]. A SIP request contains header fields used to convey information about the request. Examples of header fields are To, which lists the request destination; From, which lists the request originator; Subject, which identifies the subject of the request; Call-ID, which contains a unique identifier of the request; and Contact, which lists the device location where a user can be contacted. The message below shows a typical SIP INVITE request for a call setup.

    INVITE sip:tom@example.com SIP/2.0
    Via: SIP/2.0/UDP ;branch=
    To: Tom <sip:tom@example.com>
    From: Bill <sip:bill@example.com>;tag=
    Call-ID: a5b6c7d888717@
    Subject: Lunch
    CSeq: 2 INVITE
    Contact: <sip:bill@ >
    Content-Type: application/sdp
    Content-Length: 187

    v=0
    o=bill IN IP
    s=lunch
    c=IN IP
    t=0 0
    m=audio RTP/AVP 0

In the request, the Content-Type header indicates that the request uses the Session Description Protocol (SDP) [52] to carry the parameters for media transmission. The SDP content tells the request recipient the IP address and port number at which the request sender is ready to receive an audio stream. The audio stream should be transmitted by the Real-time Transport Protocol (RTP) [117] and use the PCM µ-law codec (codec is a portmanteau of either Compressor-Decompressor or Coder-Decoder, which describes a device or program capable of performing transformations on a data stream, e.g., converting an analog audio signal to a digital signal; 0 is the payload type of the PCM µ-law codec defined in RFC 1890 [110]).

SIP entities

A SIP-based telephony system usually consists of several different entities. Figure 1.3 shows different SIP entities and the typical message flow for a call setup. SIP-based endpoints are usually called SIP user agents (UAs). They can be SIP hardware phones or SIP soft-phones. A SIP soft-phone is software that can understand SIP and handle media transmission; an example is the Windows Messenger application. A SIP user agent usually consists of a SIP user agent client (UAC) and a SIP user agent server (UAS). A UAS responds to the requests sent by a UAC. For example, in Figure 1.3, UA-1's UAC sends an INVITE request, and UA-2's UAS receives the request and sends a 200 OK response. SIP UAs can talk end-to-end without any servers in between. But in general, a SIP telephony system deploys registrars, proxy servers, and redirect servers for call routing.
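As an illustration of how an endpoint might read the INVITE request and SDP body described earlier, the following Python sketch splits a SIP request into its start line, header fields, and body, then extracts the audio stream parameters from the SDP. It is a simplified sketch, not a conforming SIP parser: repeated header fields, header folding, and compact header names are ignored. The function names are hypothetical, and the address 192.0.2.10, the tag, the Call-ID, and port 49170 in the sample message are placeholder values chosen for this example, not values from the original message.

```python
def parse_sip_request(raw):
    """Split a SIP request into (method, request_uri, headers, body)."""
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    method, request_uri, _version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only, since header values contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, request_uri, headers, body


def parse_sdp_audio(body):
    """Return (address, port, payload_types) for the first audio stream."""
    address = None
    for line in body.splitlines():
        if line.startswith("c="):
            # c=<nettype> <addrtype> <connection-address>
            address = line[2:].split()[2]
        elif line.startswith("m=audio"):
            # m=<media> <port> <proto> <fmt list>
            fields = line[2:].split()
            return address, int(fields[1]), [int(f) for f in fields[3:]]
    return None


# Hypothetical message; addresses, tag, and port are placeholders.
EXAMPLE_INVITE = "\r\n".join([
    "INVITE sip:tom@example.com SIP/2.0",
    "To: Tom <sip:tom@example.com>",
    "From: Bill <sip:bill@example.com>;tag=1234",
    "Call-ID: abc123@192.0.2.10",
    "CSeq: 2 INVITE",
    "Content-Type: application/sdp",
    "",
    "v=0",
    "o=bill 0 0 IN IP4 192.0.2.10",
    "s=lunch",
    "c=IN IP4 192.0.2.10",
    "t=0 0",
    "m=audio 49170 RTP/AVP 0",
])

method, uri, headers, body = parse_sip_request(EXAMPLE_INVITE)
audio = parse_sdp_audio(body)
```

With the sample message, `method` is `INVITE`, `headers["content-type"]` is `application/sdp`, and `audio` carries the receive address, the port, and payload type 0 (PCM µ-law), which is the information the callee needs to start sending RTP packets.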

[Figure 1.3: SIP entities. The diagram shows user agents UA-1 (Bill), UA-2 (Bob), UA-3 (Tom), and UA-4 (Bob), a registrar, a proxy server, a redirect server, and a B2BUA, with the REGISTER, INVITE, 3xx, and 200 OK messages exchanged among them and the resulting media streams.]

Registrars can map a user's logical address, e.g., sip:tom@example.com, to the addresses of his devices, e.g., a SIP URI that contains a device's IP address. Proxy servers can then forward a call to a user's logical address towards a user's device based on the mapping, and redirect servers can inform the caller of the mapping. This way, people can be contacted by their logical addresses, instead of device addresses. For example, in Figure 1.3, Tom's user agent tells the registrar the IP address of the device that Tom uses. When Bill calls Tom, Bill can dial sip:tom@example.com, instead of the device address, to reach Tom.

One important feature of proxy servers is forking: a proxy server can send a SIP request to a number of locations sequentially or in parallel. Forking is very useful when a user owns multiple end devices, e.g., a home phone, an office phone, a mobile phone, and a softphone. For an incoming call, forking can alert all the devices at the same time,

so the user can pick up the most convenient device to answer the call. Once one device answers the call, a forking proxy will cancel calls to the other devices.

Proxy servers and redirect servers only handle call signaling routing; media streams are still transmitted end-to-end. However, some services, such as an anonymous call service, require that media streams also traverse service servers. For example, when Bill wants to make an anonymous call to Tom, the service server talks to Bill's UA as a UAS to process Bill's INVITE request. On the other end, it sends an INVITE to Tom as a UAC to establish a call with Tom. Both Bill's and Tom's media streams terminate at the service server, so Tom will not know Bill's device address. The service server between Bill and Tom is called a back-to-back user agent (B2BUA). A back-to-back user agent (B2BUA) is a logical entity that receives a request and processes it as a user agent server (UAS). In order to determine how the request should be answered, it acts as a user agent client (UAC) and generates requests [106].

Reasons to base my work on SIP

I base my work on SIP for the following reasons:

SIP allows end-to-end operations: Two SIP user agents (UAs) can talk to each other directly. SIP header fields, which can be inserted and modified by end systems, can control services in end systems without interference from proxy servers. These features make it easy to implement end system services in SIP endpoints.

SIP is simple: The simplicity of SIP also makes it easy to develop services in end systems, since end systems often have limited computational capabilities.

SIP can be easily integrated with other Internet services: For example, it is very easy

to forward calls to web pages or email by putting an HTTP URL or a mailto URL in a SIP redirect response's Contact header. Web pages can also contain SIP URLs so users can click-to-dial a SIP call.

We have a SIP-based infrastructure: Members of the IRT lab have developed a SIP-based infrastructure named Columbia InterNet Extensible Multimedia Architecture (CINEMA) [68]. CINEMA is a set of SIP-based Internet multimedia servers for creating an enterprise Internet telephony and multimedia system. The core part of CINEMA is a SIP proxy, redirect, presence, and registration server, named SIPD, which can execute service scripts, such as Call Processing Language (CPL) scripts [70] [72], SIP servlets [95], and SIP CGI scripts [71], to perform call control services, such as follow-me and call screening. CINEMA also contains a SIP conference server and a SIP voicemail/unified messaging server.

I developed a SIP user agent: I developed a SIP user agent, SIPC, as a testing platform to experiment with my research work. Initially, SIPC could only be used for Internet telephony calls. With my research work integrated into it, SIPC now supports multimedia conferences, instant messaging, shared web browsing, desktop sharing, control of networked appliances, and location-based services. It is also able to execute LESS [144] or SIP CGI [71] scripts to perform call control services and has a service management system to handle feature learning and feature interactions. I introduce SIPC in more detail in Part V.

SIP for Instant Messaging and Presence Leveraging Extensions

The IETF SIP for Instant Messaging and Presence Leveraging Extensions (SIMPLE) working group has defined a set of standards for the suite of services collectively known as instant messaging and presence (IMP). Among these standards, the SIP MESSAGE [32] method was defined to convey text-based instant messages. The SIP event notification architecture [98] was defined to handle event subscriptions and notifications. Figure 1.4 shows the architecture.

[Figure 1.4: SIP event notification architecture. The diagram shows presence user agents PUA-1, PUA-2, and PUA-3, presence agents PA-1 and PA-2, and a proxy/registrar, connected by arrows labeled PUB, SUB, NOT, and REGISTER, with labels for the presentity and the watcher.]

In the architecture, a Presence User Agent (PUA) manipulates presence information for a presentity. A PUA is an entity that can acquire the presence information of a human, program, or collection of humans and/or programs and can transmit the state information to a presentity. A presentity is an entity that provides presence information to a presence service. The presence service can then send the presence information to watchers who are interested in the information. A watcher is a logical entity that requests presence information about a presentity, or watcher information about a watcher, from presence services. There can be multiple PUAs per presentity. A presence agent

(PA) can access presence data manipulated by PUAs for the presentity. It can then receive SUBSCRIBE requests, respond to them, and generate notifications of changes in presence state. One way for a PA to access the data is by co-locating the PA with the proxy/registrar. Another way is to co-locate the PA with the PUA of the presentity [100]. In Figure 1.4, PUA-2 and PA-1 are co-located, so PA-1 can directly get the presence information of PUA-2. PA-1 can also be co-located with the proxy/registrar and get the presence information through SIP REGISTER requests sent by PUAs. If a PUA and a PA are not co-located, the PUA can use SIP PUBLISH [88] requests to send its status to the PA. PA-2 may use SIP SUBSCRIBE requests to ask for presence status stored in PA-1, and PA-1 uses SIP NOTIFY requests to send the status. PUA-3 may use XCAP [99] or WebDAV [78] to retrieve the status information stored at PA-2.

In the SIP event notification architecture, presence information is usually encoded in the Presence Information Data Format (PIDF) [128]. Below is an example of a PIDF document showing that pres:tom@example.com is online.

<?xml version="1.0" encoding="utf-8"?>
<presence xmlns="urn:ietf:params:xml:ns:pidf"
          entity="pres:tom@example.com">
  <tuple id="sg89ae">
    <status>
      <basic>open</basic>
    </status>
    <contact priority="0.8">tel: </contact>
  </tuple>
</presence>
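As an illustration of how a watcher might read such a document, the Python sketch below extracts the presentity and the basic status of each tuple using only the standard library. The function name basic_status is hypothetical, and the sample document is a copy of the example above with the contact element left out; real PIDF documents can carry further elements that this sketch ignores.

```python
import xml.etree.ElementTree as ET

# Namespace of PIDF elements, as declared in the example document.
PIDF_NS = {"pidf": "urn:ietf:params:xml:ns:pidf"}


def basic_status(pidf_xml):
    """Return (entity, {tuple_id: basic_status}) for a PIDF document."""
    root = ET.fromstring(pidf_xml)
    statuses = {}
    for tup in root.findall("pidf:tuple", PIDF_NS):
        basic = tup.find("pidf:status/pidf:basic", PIDF_NS)
        statuses[tup.get("id")] = basic.text if basic is not None else None
    return root.get("entity"), statuses


# Sample based on the example document (contact element omitted).
DOC = b"""<?xml version="1.0" encoding="utf-8"?>
<presence xmlns="urn:ietf:params:xml:ns:pidf"
          entity="pres:tom@example.com">
  <tuple id="sg89ae">
    <status>
      <basic>open</basic>
    </status>
  </tuple>
</presence>"""

entity, statuses = basic_status(DOC)
print(entity, statuses)
```

A basic status of open means the presentity is available through that tuple; closed means it is not.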

Call Processing Language (CPL)

The Call Processing Language (CPL) [72] is a language that can be used to describe and control Internet telephony services. LESS is based on CPL because CPL has many characteristics that are well-suited for end system services. However, CPL itself is not sufficient for end system services because it was originally designed to run on signaling servers, specifically on proxy or redirect servers, though some of the signaling operations in CPL, such as reject, can also be applied to end systems.

Abstractly, a CPL script consists of a collection of nodes, each of which describes an action that can be performed or a choice that can be made. All nodes have outputs, which depend on the result of the action or the choice to make. Nodes are arranged in a tree diagram, starting at a single root node; outputs of a node are connected to additional nodes, which are considered children of the node. There are no back-references from a child node to its ancestors or itself.

There are several advantages of using the tree diagram. First, it allows easy back-and-forth translation between graphical and textual representations; second, it facilitates program inspection, such as feature interaction handling; third, it avoids infinite loops and makes service scripts safe; and fourth, our end user service creation survey shows that the tree diagram is easy for users to understand.

A CPL script can contain several different root nodes for handling different call events. When a call event, e.g., an incoming call, happens, the root node that matches the event gets invoked. Based on the output of the root node, more nodes are processed to check additional conditions or perform further actions. This process continues until a node with no specified outputs is reached. Figure 1.5 shows the graphical representation of a CPL script. In the figure, the incoming node always has a default output for

further call handling. The address-switch node checks whether the call is from a particular address. If it is, the script proxies the call to a specified destination; otherwise, it automatically rejects the call. The script further handles the outputs of the proxy action. If the destination returns a busy response, the script will redirect the call to sip:tom@vmail.com.

Figure 1.5: Graphical representation of a CPL script

The above example shows three basic node types, namely toplevel-action (similar to the trigger defined in LESS; see Chapter 3), switch, and signaling operation (similar to the action defined in LESS; see Chapter 3). Toplevel-action nodes serve as root nodes of CPL scripts. CPL defines two toplevel-action nodes, namely incoming and outgoing, that handle incoming calls and outgoing calls, respectively. Switch nodes represent choices a CPL script can make. Table 1.1 shows the switches defined in CPL. Signaling operation nodes cause signaling messages to be generated in the underlying

signaling protocol. Table 1.2 shows the three signaling operations defined in CPL.

Switch            Handling calls based on
time-switch       the time of executing a LESS script
address-switch    the addresses, e.g., the caller's URI
priority-switch   the priority of the original call
string-switch     the free-form strings in a call request
language-switch   the languages in which the caller wishes to communicate

Table 1.1: CPL switches

Action     Definition        Parameters
accept     accept a call     none
reject     reject a call     code, reason
redirect   redirect a call   permanent

Table 1.2: CPL operations

In CPL, one important concept is its location model. Note that the term location in CPL refers to a routable Internet address, not a person's physical location such as a street address. When a CPL script gets invoked, it maintains a location set as an implicit global variable throughout the execution of the script. When a CPL script processes an incoming call, its location set is initialized to be an empty set. For an outgoing call, it is initialized to the destination address of the call. CPL has three operations to modify the location set. These operations are called location modifiers.

Explicit location modifier: An explicit location modifier specifies a location literally, as a URL. The URL will be added to the location set. The modifier has a parameter clear indicating whether the location set should be cleared before adding the URL.

Location lookup modifier: Locations can also be looked up through external means, e.g., registration information. The found locations will be added to the location set. This modifier also has a clear parameter for clearing the current location set.

Location removal modifier: A CPL script can also explicitly remove a location from the current location set by using the location removal modifier.

Relationship between CPL and LESS

LESS inherits CPL's tree diagram, the concept of triggers, switches, and actions, and CPL's location model. LESS extends CPL with new triggers, switches, modifiers, and actions to make it suitable for end system services. LESS also incorporates many other Internet services, such as location-based services, presence and instant messaging, and networked appliance control. In addition, LESS introduces several new concepts, such as variables, to make the language more powerful than CPL.

Location-based services

Because many telecommunication services, such as emergency call services, rely on people's physical locations, location information is very important to telecommunications. Location-based services have received a lot of attention from wireless providers and are integrated into the 3GPP (UMTS) service architecture and WAP [75]. In Internet telephony, people have also started to investigate location-based services. For example, the Presence and Integrated Communications (PIC) Working Group in the Internet2 community has conducted several trials to demonstrate enhanced SIP-based communications by integrating location-based services [63].

We can roughly divide the ongoing location-based services research into three categories: location acquisition, location representation, and location usage.

Location acquisition

Triangulation, scene analysis, and proximity are the three principal techniques for automatic location-sensing for people. The triangulation location-sensing technique uses the geometric properties of triangles to compute object locations. For example, a 2D position can be determined based on the distance to three known non-collinear points. Triangulation is commonly used in wireless networks to get the locations of handheld devices. The scene analysis location-sensing technique uses features of a scene observed from a particular vantage point to draw conclusions about the location of the observer or of objects in the scene. A proximity location-sensing technique entails determining when an object is near a known location. For example, when a person presents his identification (e.g., a radio frequency identification (RFID) tag or a swipe card) at an identification reader, and we know the location of the identification reader, the location of the person can be determined. There are many different kinds of identification devices, such as magnetic swipe cards, RFID tags, Bluetooth tags, and iButtons [121].

There are some other techniques to get a device's location on the Internet. For example, administrators can identify which room a computer is in by checking which network jack the computer is using. On the public Internet, a computer's rough location can also be determined by its IP address, but in general, this method is too coarse to be useful.
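The triangulation example mentioned above, a 2D position computed from the distances to three known non-collinear points, can be sketched in a few lines of Python. This is a minimal illustration that assumes exact, noise-free distance measurements; the function name trilaterate and the coordinates used are hypothetical. Subtracting the circle equation of the first reference point from those of the other two leaves two linear equations in x and y, which the sketch solves with Cramer's rule.

```python
from math import hypot


def trilaterate(p1, r1, p2, r2, p3, r3):
    """Return (x, y) at distances r1, r2, r3 from reference points p1, p2, p3."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    # Circle i: (x - xi)^2 + (y - yi)^2 = ri^2.
    # Subtracting circle 1 from circles 2 and 3 removes the squared unknowns.
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = r1**2 - r2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = r1**2 - r3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("reference points are collinear")
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - a21 * b1) / det
    return x, y


# Hypothetical reference points and a device located at (3, 4).
anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
device = (3.0, 4.0)
dists = [hypot(device[0] - ax, device[1] - ay) for ax, ay in anchors]
print(trilaterate(anchors[0], dists[0], anchors[1], dists[1], anchors[2], dists[2]))
```

With measured distances that contain noise, the three circles generally do not meet in one point, and a least-squares fit over more reference points would be the usual refinement.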

Location representation

There are a number of efforts currently underway for establishing a standard for positioning techniques and a standard for relaying location information. Proposals are offered by the Location Interoperability Forum (LIF) under the Mobile Location Protocol [89] and the Open GIS Consortium. The GEOPRIV working group in the IETF deals primarily with location-related privacy issues. The IETF SIMPLE working group has also done some work on location information representation [116]. The location information could be a room (name or function information), civic (street and community), categorical (such as "movie theater"), activity (such as "flying"), and privacy policies (such as "quiet"). The information can be conveyed, for example, in a SIP NOTIFY request in RPID format [116] or as part of the DHCP option for civic location [111].

Ubiquitous computing

Ubiquitous computing aims to enhance computer use by making many computers available throughout the physical environment, but making them effectively invisible to the user [137].

Existing ubiquitous computing environments

In the past decade, there have been numerous efforts in ubiquitous computing. Some examples are the Intelligent Room (MIT) [26], the Interactive Workspaces Project (Stanford University) [20], the Aura Project (CMU) [47], the Composable ad hoc location-based services (UC Berkeley) [56], and the Easy Living (Microsoft) project [27], to name a few. While these projects have successfully built systems that effectively interact with

the user and the environment, they use proprietary systems, are primarily based on non-standard protocols, and are generally limited to a single organization or building. The ubiquitous computing architecture in this dissertation uses open protocol standards like SIP, SLP [51], and on-going efforts in the IETF.

Service Location Protocol (SLP)

The Service Location Protocol provides a scalable framework for discovering and selecting network services. There are three entities in the SLP framework. A user agent (UA) is a process working on the user's behalf to establish contact with some services. The UA retrieves service information from the service agents or directory agents. A service agent (SA) is a process working on behalf of one or more services to advertise the services. A directory agent (DA) is a process which collects service advertisements. There can only be one DA present per given host.

A UA may issue a Service Request (SrvRqst) through multicast specifying the characteristics of the services it requires. Once an SA receives the SrvRqst, it can send a Service Reply (SrvRply) specifying the location of the available services. An SA can also send register messages (SrvReg) containing the services it wants to advertise to a DA. A UA can then send a SrvRqst to the DA to find available services.

Other Internet protocols related to my thesis

I use many Internet protocols in my dissertation. In previous sections, I briefly introduced SIP [106][32][98], SDP [52], RTP [117][110], and SLP [51]. There are three other protocols related to my dissertation: the Real Time Streaming Protocol (RTSP) [119], the Simple Object Access Protocol (SOAP) [134], and the Session Announcement Protocol

(SAP) [53].

Real Time Streaming Protocol (RTSP)

RTSP is an application-level protocol for control over the delivery of data with real-time properties. RTSP provides an extensible framework to enable controlled, on-demand delivery of real-time data, such as audio and video [119]. RTSP allows a client to remotely control a streaming media server, issuing VCR-like commands such as play and pause, and accessing files on the server based on timestamps. Below is an example RTSP message which plays seconds 10 through 15 of an audio clip.

PLAY rtsp://audio.example.com/audio RTSP/1.0
CSeq: 835
Session:
Range: npt=10-15

Simple Object Access Protocol (SOAP)

SOAP provides the definition of the XML-based information which can be used for exchanging structured and typed information between peers in a decentralized, distributed environment [134]. SOAP can be used to perform remote procedure calls (RPCs). The sender can encapsulate a remote procedure's name and parameters in a SOAP request and get the procedure execution result in a returned SOAP message. SOAP is usually carried as the content of HTTP messages, but SIP messages can also carry SOAP content, e.g., to perform conference control.

A SOAP message is contained in an envelope. Within the envelope are two additional sections: the header and the body of the message. The header section is optional.

It contains relevant information about the message, such as the date. The body contains the real information to exchange between peers. Below is an example of a SOAP document.

<soap:Envelope xmlns:soap="...">
  <soap:Body>
    <getuserdetails xmlns="...">
      <userid>827635</userid>
    </getuserdetails>
  </soap:Body>
</soap:Envelope>

Session Announcement Protocol (SAP)

SAP is an application-level protocol for advertising multimedia sessions through multicast. Usually, a SAP message carries SDP content to communicate the relevant session information to prospective participants. SAP uses well-known multicast addresses (one for IPv4 and FF0X:0:0:0:0:0:2:7FFE for IPv6, both with port 9875) to send session information. These well-known addresses are like TV guide channels for SAP listening applications. A SAP listening application can listen to the well-known SAP multicast addresses and get the information of all advertised multicast sessions.

1.2 End system services

In Internet telephony systems, end systems and network servers cooperate to provide services, including and beyond basic call setups. In many cases, services can be implemented in either end systems or servers; for example, both proxy servers and user agents can reject calls. This section defines end system services.

In this dissertation, end systems are the systems located at the end of session signaling routing paths. They can initiate or accept session setup messages, and usually directly interact with users. Accordingly, end system services are the services performed on end systems. Network servers, such as SIP proxy servers, are the entities that route session messages, and network services are the services running on network servers. According to the end-to-end design principle of the Internet [109], I believe that many telecommunication services that are in service-provider-controlled servers can move to end-user-operated end devices, e.g., the pickup call service, which I introduce in detail in Chapter 4.1.

In this dissertation, my focus is mainly on services running on user-operated end devices because the PSTN assumes dumb end devices, so there is little service research on those devices. Providing services in user-operated end systems has the advantage that on-the-spot interaction with users is much easier. Network services can only interact via protocol messages and possibly media content, rather than GUIs. It might be possible to incorporate user interaction into scripts running in servers, but it is likely to be far more cumbersome and subject to network delays. In addition, in Internet telephony, end systems are in general the only entities where signaling and media flows converge. This is architecturally different from the legacy PSTN, in which switches route both signaling and media flows.

The next chapter provides more differences between end system services and network services. The differences result in different techniques for creating end system services. Because there are many different Internet telephony end devices, it is desirable

to define a portable service description language for end systems, instead of different toolkits for different end devices. The next chapter also introduces the service description language I defined in detail.

Part I

The Language for End System Services (LESS)

This part introduces the Language for End System Services (LESS), which I defined specifically for describing services running on people's end devices, such as SIP hardware phones and softphones. This part includes five chapters. Chapter 2 introduces the design strategies of LESS; Chapter 3 defines LESS; Chapter 4 discusses how to use LESS to program services; Chapter 5 presents the tools I developed for LESS-based service creation; Chapter 6 evaluates the simplicity and safety of LESS and analyzes the survey I conducted on end users' willingness and capability to create their desired services.

Chapter 2

Motivation and design strategies of LESS

Enabling end system services can bring great convenience to users. In general, while it is difficult for users to modify network services that are operated by service providers, they can install and modify services on end systems they own. However, the existing service creation techniques, which are mainly designed for the PSTN, are not well suited for end system services. It is important to define a language specifically for end system service creation for the following reasons.

First, the call models for end system services and network services differ. Figure 2.1 shows the models of a two-party call for network services and end system services. Network services mainly deal with how to establish connections between multiple addresses, e.g., routing calls, while end system services focus on how to control local resources and applications to communicate with remote entities, e.g., controlling audio and video devices to generate or play out media streams. The two different call models imply different service states, events, and actions. Hence, many network services, e.g., forking,

are unsuitable for end systems and vice versa.

[Figure 2.1: Call models of network services and end system services. The diagram labels are Call, Connection, Address1, Address2, Remote, app1, app2, and app3; panel a is the network service call model and panel b is the end system service call model.]

Second, the languages for end system services and network services have different simplicity and functionality requirements. In general, network servers are controlled by experienced users, such as system administrators, but endpoints are handled by inexperienced users. Experienced users often need a language that can program all sorts of services, but inexperienced users require an easy-to-understand language. Therefore, the functional richness of a network service programming language is more important than its simplicity. On the other hand, an end system service programming language must be relatively simple.

Third, as we discussed in Chapter 1.2, in Internet telephony, signaling messages, media streams, and direct user interactions converge at end systems. Thus, an end system service programming language must take advantage of this convergence and allow users to directly handle media streams and user interactions.

For the above reasons, I designed a language specifically for end system service creation, which is called the Language for End System Services (LESS).

Design strategies of LESS

Requirements for LESS

I set up several requirements for designing LESS. First, LESS or a commonly used subset of LESS must be relatively simple, easy to understand, safe for creating error-free services, and powerful enough for comparatively inexperienced users to build a wide range of services. Because its power is restricted to ensure its simplicity and safety, the language need not handle all kinds of services and need not be Turing-complete. However, it must be extensible so that experienced service programmers can introduce more complicated concepts and elements to build more powerful services. Second, LESS must be platform-neutral because users often want to create their services once and use them on different platforms. Third, it must also support user interactions and control media streams because signaling messages, media streams, and direct user interactions converge at end systems.

Design strategies

The development of the Internet and the rise of the Extensible Markup Language (XML) [24] as a language standard have prompted proposals that XML-based scripting languages be used for creating telecommunications services. Among other advantages, XML is platform, network, and technology neutral, and independent of underlying programming languages. In addition, XML-based languages are readable by machines as well as humans. For example, HTML's ease of learning and the view source capability of browsers have bootstrapped the Web's popularity in an amazing way. For the above reasons, I base LESS on XML.

My colleague Dr. Jonathan Lennox, my advisor Prof. Henning Schulzrinne, and I have defined the Call Processing Language (CPL) [72] to describe Internet telephony services. CPL has a tree-like structure and avoids the use of loops, variables, and recursion to allow program inspection and the back-and-forth translation between graphical and textual representations. These characteristics make it a good candidate for end system service creation. However, it was originally designed for proxy servers and lacks support for end system services. For example, CPL scripts cannot accept incoming calls and cannot handle timer-triggered services. These drawbacks make it insufficient for end system service creation. Given the advantages and drawbacks of CPL, I base LESS on CPL, but add many extensions and introduce several new concepts to CPL.

In Internet telephony, services will not be a small number of named services such as call forwarding busy; rather, it appears more plausible to have policy-based or rule-based services, e.g., certain events under certain conditions invoking certain actions, similar to how email user agents, for example, support message filtering and forwarding. Using rules to describe a service is much simpler than using C/C++ or Java to program a service, so it is more appropriate for comparatively inexperienced end users. Meanwhile, rule-based languages can also be powerful enough to represent a wide range of telecommunication services [141]. Thus, LESS should be a rule-based language.

As we mentioned in Chapter 1, SIP has many advantages for developing end system services, so I define LESS based on SIP, though LESS can also be used in other Internet telephony end systems, such as H.323 endpoints.
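The rule-based idea described above, certain events under certain conditions invoking certain actions, can be illustrated with a small Python sketch. It is not LESS syntax and not an actual LESS interpreter; the Rule class, the decide function, and the event and action names are hypothetical and serve only to show how a service reduces to an ordered list of rules that is evaluated once per event, without loops inside the rules themselves.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    event: str                                  # event that triggers the rule
    condition: Callable[[Dict[str, str]], bool]  # check on the event context
    action: str                                 # action to take when it matches


def decide(rules: List[Rule], event: str, context: Dict[str, str],
           default: str = "none") -> str:
    """Return the action of the first rule matching the event and context."""
    for rule in rules:
        if rule.event == event and rule.condition(context):
            return rule.action
    return default


# Hypothetical call screening service: reject one caller, accept all others.
rules = [
    Rule("incoming", lambda c: c.get("caller") == "sip:tom@example.com", "reject"),
    Rule("incoming", lambda c: True, "accept"),
]

print(decide(rules, "incoming", {"caller": "sip:tom@example.com"}))  # prints: reject
print(decide(rules, "incoming", {"caller": "sip:bill@example.com"}))  # prints: accept
```

Because each event is checked against the rules once and an action never re-enters the rule list, a service written this way always terminates, which is the safety property the tree-like structure is meant to preserve.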

Comparing LESS with other service creation languages

Several XML-based call control languages have been proposed, such as the Call Policy Markup Language (CPML) [2], the Telephony Markup Language (TML) [5], CallXML [3], the Call Processing Language (CPL) [72], the Service Creation Markup Language (SCML) [16], VoiceXML [133], and the Call Control Extensible Markup Language (CCXML) [135]. I observe that the more mature and interesting of these proposals are CPL, being standardized by the Internet Engineering Task Force (IETF); SCML, being developed by the JAIN forum; and CCXML, being developed by the Voice Browser working group in the World Wide Web Consortium (W3C). My examination of CPL, SCML, and CCXML concludes that none of the three approaches provides enough support for end system services.

As I mentioned earlier, CPL was originally designed to run on proxy servers, so it lacks the functions for end system services. It can proxy, reject, or redirect calls, but cannot accept a call or modify media attributes of a call. In addition, CPL cannot be activated through non-call events, such as a timer. Apart from these limitations, CPL has all the characteristics we need for end system service creation, such as the tree-like structure, the event-driven model, no loops, no recursion, and no user-defined variables. I thus base LESS on CPL, but make it more powerful and better suited for end system service creation.

For telephony applications, W3C has defined VoiceXML [82, 124] to facilitate Interactive Voice Response (IVR) systems. VoiceXML is rich in user interaction but has limited call control functions. To complement VoiceXML, W3C also defined Call Control XML (CCXML [135]) to program user interactions and point-to-point call control functions. First of all, the design purpose of CCXML is to provide telephony call control support for VoiceXML, making it limited to only a subset of the services we want for end system services. For example, it cannot handle non-telephony events, such as presence indication, and cannot perform actions in addition to telephony call control, such as instant messaging. The elements defined in CCXML are at a lower level of abstraction than what we need. For example, the createcall element defined in CCXML asks for callerid, connectionid, and aai (application-to-application information) as its attributes, and will generate one or more connection.progressing events. It is difficult for nontechnical, inexperienced service creators to figure out the correct values for those attributes. In addition, CCXML explicitly uses a finite state machine (FSM) in its scripts. It uses a <transition> tag to describe state changes in the FSM. A CCXML service programmer must remember the states defined in CCXML to program a service. Further, CCXML allows user-defined variables and loops. Because inexperienced users may create services that can be harmful to themselves or others, e.g., a service with an infinite loop or an out-of-bound value assigned to a variable, CCXML is not as safe as CPL for end users to create services.

SCML is incomplete and there are apparently no people actively working on it. However, since it was an effort in the JAIN forum, which is the most important community for Java-based communication service development, it is still necessary to compare SCML with LESS. SCML is closely tied to the JAIN Java Call Control (JCC) API and defined using an XML schema that is derived from JCC. The object model of JCC [66]

is exactly the same as the network service call model in Figure 2.1.a. It focuses on how to build connections among addresses, not on how to control local applications. It is designed mainly for call routing, instead of end system services. In addition, SCML also focuses only on call-related events.

Based on the above comparison, I believe that LESS is more appropriate than the existing XML-based solutions for describing end system services.

Chapter 3

LESS definition

3.1 High level abstraction

Figure 3.1 shows how LESS abstracts communication services. A LESS script is a collection of event handling rules. The events can be an incoming call, an outgoing call, or a timer event. For example, a LESS event handling rule can be "When an incoming call arrives, check the caller of the call. If the caller is sip:tom@example.com, reject the call." LESS describes the rules in a structured way so that a LESS interpreter can interpret them and perform the desired communication services.

A LESS event handling rule consists of a trigger, zero or more switches, and one or more actions. Modifiers are used to provide arguments for actions. A rule is invoked only when a signaling or non-signaling event matches its trigger, e.g., an incoming call event matches an incoming trigger. For a given event, a LESS script contains no more than one matching trigger. Once a trigger is invoked, switches check the status of triggers and their context to make decisions. For example, an address-switch checks the caller's and the callee's addresses and makes call decisions (accept, reject, or redirect calls) based

on those addresses. Modifiers can provide action arguments. For example, a location modifier can specify the target uniform resource identifiers (URIs) of a redirect or a call action. One action may be followed by additional processing, e.g., more switches for condition checking and more actions for event handling. Multiple actions can also be executed in parallel.

Figure 3.1: Call decision making process in LESS

The relationship between triggers and actions is important in defining LESS. Actions are used to handle the event that invokes a trigger. Not every action can apply to every trigger. In the definition of LESS, we clearly define the triggers that each action can apply to. For example, an accept action, which automatically accepts an incoming call, can only be used in an incoming trigger. We cannot use an accept action in an outgoing trigger, which handles outgoing SIP INVITE requests. In addition, no LESS action in a LESS script may invoke another LESS trigger, which avoids infinite loops. For example, a LESS call action does not invoke a LESS outgoing trigger.

LESS inherits the tree-like structure of CPL, in which LESS elements, such as triggers, switches, and actions, are called nodes. Each node has one or more outputs that can connect to additional nodes to further process an event. For example, the script fragment in Figure 3.2 shows an address-switch with two outputs. The first output handles incoming calls from sip:tom@abc.com. The second output handles calls

from others.

    <address-switch field="origin">
      <address is="sip:tom@abc.com">
        <accept/>
      </address>
      <otherwise>
        <reject/>
      </otherwise>
    </address-switch>

Figure 3.2: Address-switch example

Figure 3.3: An example of the tree-like structure for trigger handling

Each LESS script starts at a trigger node, which then connects to a switch node or an action node. The outputs of the switch or action node are connected to additional nodes until a node with no specified outputs is reached. There are no back references from a child node to its ancestors or to itself. Later in this dissertation, I use the term LESS decision tree to refer to a LESS script. A LESS decision tree can be converted to a set of event handling rules if we define a rule as the path from the root of a decision tree (a trigger node) to a leaf of the decision tree (an action node). Figure 3.3 shows a decision tree, which represents the LESS service script shown in Figure 3.4. The script

can be translated into the C-like pseudo code shown in Figure 3.5. The C-like pseudo code is more concise than the XML-based LESS script, but as I discussed in Chapter 2.1.2, LESS has many advantages that C/C++ or Java cannot provide.

    <?xml version="1.0"?>
    <less>
      <incoming>
        <address-switch field="origin">
          <address is="sip:t@abc.com">
            <accept/>
          </address>
          <otherwise>
            <priority-switch>
              <priority equal="emergency">
                <accept/>
              </priority>
              <otherwise>
                <reject status="486"/>
              </otherwise>
            </priority-switch>
          </otherwise>
        </address-switch>
      </incoming>
    </less>

Figure 3.4: LESS script example

    void incomingcallhandling (Call c) {
      if (c.caller == "sip:tom@abc.com") {
        accept(c);
      } else {
        if (c.priority == "emergency") {
          accept(c);
        } else {
          reject(c);
        }
      }
    }

Figure 3.5: Using C-like pseudo code to program a service

In general, it is important to maintain the states of calls in call control services. However, because LESS is designed for comparatively inexperienced end users, the call states

should be transparent to service programmers, but a LESS interpreter must maintain call states to handle services. Figure 3.6 shows the call states in LESS. When a LESS interpreter invokes an incoming trigger or performs a call action, the interpreter initiates a call in the calling state. If the call is accepted, it enters the connected state. If the call is rejected or redirected, it enters the finished state. A connected call that is terminated or successfully transferred also enters the finished state. All other actions do not change the call state.

[Figure 3.6 diagram. States: calling, connected, finished. Transition labels: accept, reject, redirect (incoming call); accepted, busy, failure, redirection, no-answer (outgoing call); terminate, transfer (both); other actions leave the state unchanged.]

Figure 3.6: (a) Call states for an incoming call (b) Call states for an outgoing call

3.2 Grammar, types, and variables

Grammar and types

LESS is an XML-based language and has its grammar and data types defined in XML Schema. I include the XML schema of LESS as an appendix of this dissertation. This section presents an overview of the grammar, data types, and system variables defined in LESS.
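The call states of Figure 3.6, described in Section 3.1, can be sketched as a small transition table. This is only an illustration in Python, not part of LESS or of any interpreter described here: the names `TRANSITIONS` and `next_state` are hypothetical, and the outgoing-call outcomes (busy, failure, redirection, noanswer) are an assumption based on the outputs of the call action.

```python
# Hypothetical sketch of the call states a LESS interpreter keeps per call.
# Keys are (current state, action or outcome); values are the resulting state.
TRANSITIONS = {
    # Incoming call: local actions decide the outcome.
    ("calling", "accept"): "connected",
    ("calling", "reject"): "finished",
    ("calling", "redirect"): "finished",
    # Outgoing call: the remote side decides the outcome (assumed names).
    ("calling", "accepted"): "connected",
    ("calling", "busy"): "finished",
    ("calling", "failure"): "finished",
    ("calling", "redirection"): "finished",
    ("calling", "noanswer"): "finished",
    # A connected call ends when it is terminated or transferred.
    ("connected", "terminate"): "finished",
    ("connected", "transfer"): "finished",
}


def next_state(state, action):
    """Return the call state after an action; other actions change nothing."""
    return TRANSITIONS.get((state, action), state)


if __name__ == "__main__":
    state = "calling"
    for action in ("log", "accept", "mail", "terminate"):
        state = next_state(state, action)
        print(action, "->", state)
```

Keeping the table inside the interpreter matches the intent stated above: the script author never names a state, and only the interpreter tracks it.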

Figure 3.7: Relationship among LESS elements

The LESS schema contains four abstract data types: TriggerType, SwitchType, ModifierType, and ActionType; and four abstract elements that instantiate these abstract types: trigger, switch, modifier, and action. The schema then defines the relationship among the elements shown in Figure 3.7: only triggers can be immediate child elements of the <less> tag, which is the root tag of a LESS document; switches, modifiers, and actions can only be sub-elements of triggers; and only actions can be the innermost elements. To enforce this element relationship, we require that all LESS elements, including elements in LESS extensions, be members of the substitution groups of these four abstract elements. substitutionGroup is a mechanism defined in XML Schema that allows elements to be substituted for other elements, so that elements in a substitution group can be used interchangeably. This way, we can ensure that all triggers, switches, modifiers, and actions defined in LESS and its extensions follow the tree-like structure, so that any analysis applied to LESS will also be applicable to LESS extensions.

Variables

LESS does not allow user-defined variables, but it provides several pre-defined variables to retrieve system information, user information, agent information, trigger information, and action information. In general, inexperienced service creators need not use any variables because, even without variables, LESS can still handle a wide range of services, which I will discuss in Chapter 4. However, providing pre-defined variables enables more complicated services needed by experienced service creators, without compromising the safety of the language. LESS variables can only be used in LESS actions and modifiers to provide parameters. LESS switches must not use any variables. We use the format {variable-name} to represent a variable in a LESS script. For example, in a reject action, we can provide its reason parameter as below.

    <reject code="486" reason="my current activity is {user.activity} ">

System information variables

System information variables represent the attributes of the end system that is running a LESS script.

system.connection-speed represents the connection speed of the end system. Available values are Dial-up (56kb/s), DSL/Cable (256kb/s), DSL/Cable (768kb/s), T1 or higher (1.5Mb/s), or numerical speed values.

system.device represents the device and model information of the end system as a string, e.g., Pingtel Xpressa, PX-1.

system.os represents the operating system of the end system, e.g., Microsoft Windows XP Professional, Service Pack 2 Build

User information variables

User information variables provide information about the user who runs the LESS script.

user.presence represents the user's presence status. Available values are online and offline. The values can be mapped to open and closed defined in the <basic> element in the Presence Information Data Format (PIDF) [128].

user.activity represents the user's current activity. Available activity values are defined in the Rich Presence Extensions to the Presence Information Data Format (RPID) [116].

user.mood represents the user's current mood. Available mood values are also defined in RPID.

user.location represents the user's physical location, which can be a civic location, a geospatial location, or the place type of a location. We use the elements defined in the civilloc format [93] to represent civic locations. For example, user.location.country

represents the country which the user is in. We use user.location.latitude and user.location.longitude to represent geospatial coordinates, and user.location.placetype to represent place types. The values of the place types are defined in RPID.

user.language represents the user's preferred language. Language values are defined in RFC 3066 [10]. Users can configure this value in their SIP user agents.

Agent information variables

For a given LESS script, agent information variables contain information about the SIP user agent running the script.

agent.name represents the name and the version of the user agent. In a SIP user agent, the LESS interpreter should set the value of this variable to the value of the User-Agent header in a SIP request.

agent.number-of-calls represents the number of currently active calls in the user agent.

agent.capabilities represents the capabilities of the user agent. Capabilities can be represented by following the format defined in RFC 3840 [107]. For example, agent.capabilities.sip.audio indicates the audio capability of the user agent.

Trigger information variables

For a trigger in a given LESS script, trigger information variables represent information about the event that invokes the trigger.

trigger.origin represents the originator of the trigger's signaling message. This variable and the trigger.destination variable can only be used in triggers that

are caused by signaling messages; they are not available in, e.g., a timer trigger. For an incoming SIP request, the value of this variable is the value of the SIP From header. A user can use trigger.origin.uri to represent the URI of the originator, trigger.origin.uri.domain to represent the domain part of the URI, trigger.origin.uri.user to represent the user part of the URI, and trigger.origin.display-name to represent the display-name of the originator.

trigger.destination represents the destination of the trigger's signaling message. For an incoming SIP request, the value of this variable is the string of the SIP To header.

trigger.timestamp represents the time of receiving a signaling message. The time is in the format defined in RFC 1123 [22].

Action information variables

Action information variables contain information about the last performed action.

action.last-action-result represents the return value of the last performed action. Not every action returns a value. If the last performed action does not return a value, action.last-action-result is an empty string.

3.3 Program execution

Figure 3.8 shows a SIP user agent with a Call Control part, which performs default event handling actions, and a LESS engine part, which executes LESS scripts to handle events. For every event on the user agent, such as an incoming INVITE request, the call control part first passes the event to the LESS engine. If the event matches a LESS trigger in

the scripts loaded by the LESS engine, the LESS engine will execute the matched LESS script to handle the event. If the LESS engine performs a signaling action to handle the event, the call control part will not perform its default event handling actions. Otherwise, the call control part will perform the default event handling actions to handle the event.

[Figure 3.8 diagram. An incoming INVITE reaches the user agent's Call Control part, which passes it to the LESS engine for incoming trigger handling. No matching branch: perform the default incoming call handling function (e.g., alerting). Matching branch: perform the actions in the matching branch.]

Figure 3.8: LESS script execution

A LESS script cannot pass any information to other LESS scripts or to its own subsequent invocations. However, it may change system or user status, which may affect the behavior of other LESS scripts or the script's subsequent invocations. For example, in the script below, if the user is not on-the-phone, an incoming call will be automatically answered and the user's status will become on-the-phone. For the next incoming call, the script will be executed again, but this time it rejects the call.

    <?xml version="1.0"?>
    <less>
      <incoming>
        <status-switch statusname="activity">
          <status is="on-the-phone">
            <reject status="486" reason="busy"/>
          </status>
          <otherwise>
            <accept/>
          </otherwise>
        </status-switch>
      </incoming>
    </less>

In the above example, if two incoming calls arrive at almost the same time, the first call invokes the script first. If we allow the second call to invoke the script again before the first invocation is finished, which call is accepted is undetermined. The simplest way to make the result of LESS script execution deterministic is to execute LESS scripts sequentially.

Sequential execution has a potential problem if a LESS script requires human intervention to complete, e.g., a call action needs to wait until the callee answers or rejects the call. Human intervention usually takes a long time and blocks the execution of other scripts. Handling this problem requires a feature interaction detection mechanism that checks the relationship among LESS actions. Handling feature interactions in LESS is discussed in detail in Part II.
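The effect of sequential execution on the script above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the LESS engine itself: `busy_script` is a hand-written rendering of the status-switch script, `run_sequentially` is a hypothetical event loop, and the user's status is modeled as a plain dictionary.

```python
from collections import deque


def busy_script(user, call):
    """Hand-written rendering of the status-switch script (illustrative only)."""
    if user["activity"] == "on-the-phone":
        return f"reject {call} (486 busy)"
    # Accepting a call changes the user's status, which later invocations see.
    user["activity"] = "on-the-phone"
    return f"accept {call}"


def run_sequentially(calls, user):
    """Handle queued events one at a time, never overlapping two invocations."""
    queue = deque(calls)
    results = []
    while queue:
        call = queue.popleft()
        # The next invocation starts only after this one has returned.
        results.append(busy_script(user, call))
    return results


if __name__ == "__main__":
    user = {"activity": "idle"}
    for line in run_sequentially(["call-1", "call-2"], user):
        print(line)
```

Because the second invocation starts only after the first has updated the user's status, the outcome depends solely on arrival order: the first queued call is accepted and the second is rejected.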

3.4 Basic LESS elements

Chapter 3.1 lists four types of elements in LESS, namely triggers, switches, actions, and modifiers. This section briefly introduces every specific element in LESS. The complete definition of all the elements is in the Internet draft we submitted to the IETF [144].

Triggers

Basic LESS defines three triggers: incoming, outgoing, and timer.

incoming: The incoming trigger handles incoming calls. In a SIP-based Internet telephony system, this trigger can be invoked by an incoming INVITE message. The trigger does not have any attributes.

outgoing: The outgoing trigger handles user-made outgoing calls. In a SIP-based Internet telephony system, this trigger is invoked when a user makes an outgoing call and generates an outgoing INVITE message. The trigger does not have any attributes.

timer: The timer trigger handles timer-based events, e.g., automatically making an outgoing call at a specific time. The timer trigger has the same attributes as those defined in the time-switch (Chapter 3.4.2), except the dtend and the duration attributes. A timer trigger is invoked when its attributes are matched. For example, the timer trigger below is invoked every weekday at 9:00AM since 09/01/2006.

    <timer tzid="america/new_york" tzurl="..."
           dtstart=" t090000" freq="weekly" byday="mo,tu,we,th,fr">
      ...
    </timer>

Switches

Every switch checks a certain condition and decides which action to take. Each switch has two general outputs. The not-present output handles the situation in which the switch does not know the exact value of the condition it checks. For example, if we use a language-switch to handle an incoming INVITE message based on the caller's language preferences but there is no Language header in the incoming SIP request, the actions in the not-present output of the language-switch will be executed. The other general output is otherwise, which handles the situation in which none of the other outputs matches the condition that the switch checks.

Basic LESS defines six switches: time-switch, address-switch, language-switch, priority-switch, string-switch, and status-switch. Below I introduce every switch defined in the basic LESS definition.

time-switch: The time-switch in LESS is the same as the time-switch defined in CPL [72] Section 4.4. A time-switch in a script makes decisions based on when the script is executed. For example, the script below automatically rejects incoming calls in a specific time period. As shown in the example, every time-switch has an output named time, which specifies a period of time to match.

    <?xml version="1.0"?>
    <less>
      <incoming>
        <time-switch>
          <time dtstart=" t163707z" dtend=" t173707z">
            <reject status="486" reason="busy"/>
          </time>
        </time-switch>
      </incoming>
    </less>

address-switch: The address-switch in LESS is the same as the address-switch defined in CPL [72] Section 4.1. An address-switch makes decisions based on the addresses presented in SIP requests, such as the caller's address and the callee's address. For example, the script below automatically accepts incoming calls from sip:bob@example.com. As shown in the example, every address-switch has an output named address, which specifies the address to match.

    <?xml version="1.0"?>
    <less>
      <incoming>
        <address-switch field="origin">
          <address is="sip:bob@example.com">
            <accept/>
          </address>
        </address-switch>
      </incoming>
    </less>

priority-switch: The priority-switch in LESS is the same as the priority-switch defined in CPL [72] Section 4.5. A priority-switch makes decisions based on the priority specified in call signaling messages. In a SIP INVITE message, the priority is specified in the Priority header. Every priority-switch has an output named priority, which specifies the exact priority value to match.

string-switch: The string-switch in LESS is the same as the string-switch defined in CPL [72] Section 4.2. A string-switch makes decisions based on free-form strings in a call request. For example, a LESS script can make a call decision based on the Subject header of an incoming INVITE message. The string-switch has an output named string, which specifies the exact string value to match.

language-switch: The language-switch in LESS is the same as the language-switch defined in CPL [72] Section 4.3. For a SIP call, a language-switch makes decisions based on the languages that the caller wishes to use. The caller's preferred languages are specified in SIP Accept-Language headers. The language-switch has an output named language, which specifies the exact languages to match.

status-switch: A status-switch in a script makes decisions based on the script owner's status, such as online or offline, or on other people's status that can be acquired by the script owner. The switch has a parameter, uri, which specifies the URI of the presentity. If the parameter is not provided, the presentity is the script owner.

The status-switch has an output named status, which specifies the status to match. Available status values are:

number-of-calls: A status-switch can make decisions based on the number of calls that a person is involved in.

presence: A status-switch can make decisions based on the presence status, such as online or offline, of a person.

mood: A status-switch can make decisions based on the mood, such as happy or sad, of a person.

activity: A status-switch can make decisions based on the activity, such as sleep, of a person.

Modifiers

LESS follows the location model defined in CPL. When a LESS script is invoked, it maintains a location set as an implicit global variable throughout the execution of the script. For an outgoing trigger, the location set is initialized to the destination of the call. For all the other triggers defined in this dissertation, the location set is initialized as an empty set. LESS has three modifiers to modify the location set.

location modifier: The location modifier is the same as the location modifier defined in CPL [72] Section 5.1. This modifier can explicitly add a location URL to the current location set. For example, the following script fragment redirects a request to sip:tom@example.com.

    <location url="sip:tom@example.com">
      <redirect/>
    </location>

The modifier has a parameter clear to indicate whether the current location set should be cleared before adding the new URL. The default value of the clear parameter is true.

lookup modifier: The lookup modifier is similar to the lookup modifier defined in CPL [72] Section 5.2. The lookup modifier can retrieve one or more locations from an external location source, such as an addressbook or a database, and add the retrieved addresses to the current location set. LESS adds two parameters to CPL's lookup modifier for more flexible location lookup. The source parameter indicates where to find the location repository. It can be a web URL or a local addressbook. The other parameter is query, which contains the query string to retrieve locations from a location repository. The lookup modifier also has a parameter clear indicating whether the current location set should be cleared before adding the new URLs. The default value of the clear parameter is true. For example, the following script fragment retrieves locations from a web URL.

    <lookup source="..." query="?from={trigger.origin.uri}">

remove-location modifier: The remove-location modifier is the same as the remove-location modifier defined in CPL [72] Section 5.3. This modifier can remove one or more locations from the current location set.
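How the three modifiers act on the location set can be sketched as follows. This is a hedged illustration in Python rather than part of the LESS definition: the class `LocationSet` and its method names are hypothetical, the `lookup` method receives URLs that are assumed to have been retrieved already (querying the external source is out of scope), and `clear` defaults to true as stated above.

```python
class LocationSet:
    """Illustrative model of the implicit location set of a running script."""

    def __init__(self, initial=()):
        # An outgoing trigger would start with the call destination;
        # other triggers start with an empty set.
        self.urls = list(initial)

    def _add(self, url):
        # Keep insertion order but avoid duplicate entries.
        if url not in self.urls:
            self.urls.append(url)

    def location(self, url, clear=True):
        """Model of the location modifier: add one explicit URL."""
        if clear:
            self.urls.clear()
        self._add(url)

    def lookup(self, retrieved, clear=True):
        """Model of the lookup modifier: add URLs retrieved from a source."""
        if clear:
            self.urls.clear()
        for url in retrieved:
            self._add(url)

    def remove_location(self, url):
        """Model of the remove-location modifier: drop a matching URL."""
        self.urls = [u for u in self.urls if u != url]


if __name__ == "__main__":
    locations = LocationSet(["sip:bob@example.com"])
    locations.location("sip:tom@example.com")  # clears, then adds
    locations.lookup(["sip:mary@example.com"], clear=False)
    print(locations.urls)
```

A signaling action such as redirect or call would then read its targets from `urls`, which is why the modifiers wrap the actions they parameterize in the script fragments above.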

Actions

The XML schema of LESS ensures that only actions can be the innermost elements of a LESS XML document. This way, every branch of a LESS decision tree has at least one action as its leaf. A decision branch without any actions is meaningless to users. A LESS action usually has one or more outputs, which represent the results of the action and the further handling based on those results. For example, the outputs of a call action can be accepted, busy, noanswer, failure, and redirection. The script below shows how to use the noanswer output of a call action to implement the call-forwarding-on-no-answer service.

    <timer dtstart=" t163707z">
      <location url="sip:tom@abc.com">
        <call>
          <noanswer>
            <location url="sip:mary@abc.com">
              <call/>
            </location>
          </noanswer>
        </call>
      </location>
    </timer>

In a LESS script, multiple actions can execute sequentially or in parallel. For example, in the script below, the alert action and the accept action are performed at the same time.

    <incoming>
      <alert duration="1000" style="vibrate"/>
      <accept/>
    </incoming>

But in the following script, the transfer action and the accept action are performed sequentially.

    <incoming>
      <accept>
        <next>
          <location url="...">
            <transfer/>
          </location>
        </next>
      </accept>
    </incoming>

In the above script, the transfer action is performed after the accept action completes. In a SIP user agent, completing an accept action means that the user agent has sent a SIP 200 OK response, received the SIP ACK request for the SIP 200 OK response, and established a media channel. Each action has a complete state, which I describe in detail in the latter part of this section.

Each LESS action can use a next output to define its subsequent actions. The later executed action must wait until the earlier executed action completes. Note that a LESS action may have other outputs in addition to the next output. A LESS action is considered complete only if all

the actions in its outputs except next are completed. For example, in the following script, the transfer action is executed after the alert action in the accepted output completes.

    <timer dtstart=" t163707z">
      <location url="...">
        <call>
          <accepted>
            <alert duration="10"/>
          </accepted>
          <next>
            <location url="...">
              <transfer/>
            </location>
          </next>
        </call>
      </location>
    </timer>

Below I introduce every action defined in the basic definition of LESS. For each action, I first show the parameters of the action, then state which triggers the action may apply to, and then discuss the completion conditions and the pre-conditions of the action. Pre-conditions describe the system or user status required to execute an action. For example, in an end system with only one audio device, one of the pre-conditions for accepting a call is that no other applications are using the audio device.

accept: The accept action can accept an incoming call. The action has no parameters, but service creators can use media modifiers to change the media attributes, such as codecs, of accepted calls. Accept actions can only be applied to an incoming trigger to handle incoming calls. An accept action will generate a SIP 200 OK response for an incoming call. The action completes when the SIP ACK request for the SIP 200 OK is received and the media channels are established. Different user agents may have different pre-conditions for an accept action. In general, for an audio-only call, the pre-conditions are that the audio devices are not occupied by other calls or applications. The action has only one output, next, for subsequent actions.

reject: The reject action in LESS is the same as the reject action defined in CPL [72] Section 6.3. This action can reject an incoming call. It can only be applied to an incoming trigger. The action completes when it generates a reject message (e.g., a SIP 486 Busy response). This action does not have pre-conditions and has only one output, next, for subsequent actions.

redirect: The redirect action in LESS is the same as the redirect action defined in CPL [72] Section 6.2. This action can redirect a SIP request to another location specified in the current location set. It can be applied to any trigger that is invoked by an incoming request, such as incoming, subscription, and message. This action completes when it generates a SIP 3xx response. It does not have pre-conditions and has only one output, next, for subsequent actions.

call: The call action can generate an outgoing call. The action has two parameters: timeout indicating the time to try before giving up the call attempt; and

replace indicating which call at the calling target should be replaced. Service creators can also use location modifiers to specify the destination of the call, or media modifiers to change media attributes of the call. The action has six outputs: accepted, busy, failure, noanswer, redirection, and next. The accepted output handles SIP 2xx responses; the busy output handles SIP 486, 600, and 603 responses; the failure output handles other SIP 4xx, 5xx, and 6xx responses; the redirection output handles SIP 3xx responses; the noanswer output handles the situation in which the call attempt is not answered before the timeout; and the next output handles subsequent actions. In a redirection output, the new address to which the call is redirected will be added to the current location set. The call action can be applied to any trigger. The action completes when a non-provisional response is received or when it times out. In general, for an audio-only call, the pre-conditions of the action are that the audio devices are not occupied by other calls or applications.

terminate: The terminate action can terminate all the ongoing sessions or call attempts. The action can be applied to any trigger. It completes when it generates a BYE or CANCEL request. The action has only one output, next, for subsequent actions. There are no pre-conditions required for this action.

mail: The mail action in LESS is the same as the mail action defined in CPL [72] Section 7.1. This action can send an e-mail to the locations specified in the current location set. This action is a non-signaling action. It has one output, next, for subsequent actions. There are no pre-conditions required for this action. The action can be applied to any trigger. It completes when it generates an e-mail and sends it out.
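The way the call action classifies a final result into one of its outputs, as described for the call action above, can be sketched as a small function. This is an illustrative Python sketch under stated assumptions: the function name `call_output` and its parameters are hypothetical, and only the status-code classes come from the text.

```python
def call_output(status_code=None, timed_out=False):
    """Map the result of a call attempt to the name of a call action output."""
    if timed_out:
        # No final response arrived before the timeout parameter expired.
        return "noanswer"
    if status_code is None or status_code < 200:
        # Provisional (1xx) responses do not complete the action.
        raise ValueError("the call action is not complete yet")
    if status_code < 300:
        return "accepted"      # SIP 2xx
    if status_code < 400:
        return "redirection"   # SIP 3xx
    if status_code in (486, 600, 603):
        return "busy"          # busy-type responses named in the text
    if status_code < 700:
        return "failure"       # all other 4xx, 5xx, and 6xx responses
    raise ValueError("not a SIP status code")


if __name__ == "__main__":
    for code in (200, 302, 486, 404, 603):
        print(code, "->", call_output(code))
    print("timeout ->", call_output(timed_out=True))
```

The busy codes are tested before the generic failure range because 486, 600, and 603 also fall inside the 4xx and 6xx classes.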

log: The log action in LESS is similar to the log action defined in CPL [72] Section 7.2, with a newly added parameter, recording, to record a call. The recording parameter makes it easy to associate a log with a recorded media clip. The log action is a non-signaling action. The action has only one output, next, for subsequent actions. There are no pre-conditions for this action. The action can be applied to any trigger, but the recording attribute is valid only if the action is inside an incoming or an outgoing trigger, or inside the outputs of a call action. The action completes immediately, but will continue to log all the activities related to the trigger that causes this action.

wait: The wait action halts LESS script execution for the period of time specified in its duration attribute. The wait action can be applied to any trigger. It completes after the time specified in its duration attribute has passed. The action has only one output, next, for subsequent actions. It does not require any pre-conditions.

3.5 Subactions

LESS inherits the subaction and sub tags from CPL for subaction definition. Subactions are actions which can be called from other actions. They allow LESS scripts to reuse a segment of a script in different decision tree branches.

3.6 LESS extensions

Some user agents may support additional LESS features beyond those listed in the basic LESS definition. Some extensions are designed only for experienced service creators,

such as the queue handling extension, but most extensions can be used by comparatively inexperienced users. LESS extensions are indicated by XML namespaces [23] in LESS scripts. Every extension must have an appropriate XML namespace assigned to it. The XML namespace of every extension must be different from the XML namespace of the basic LESS definition. No extension can change the syntax or semantics of the basic LESS definition.

As I discussed before, the XML schema of LESS has four abstract elements: trigger, switch, action, and modifier. Accordingly, there are four abstract data types: TriggerType, SwitchType, ActionType, and ModifierType. Each trigger in a LESS extension must be defined in the substitutionGroup of the abstract trigger element, and have its type as an <xs:restriction> of the TriggerType; each switch in a LESS extension must be defined in the substitutionGroup of the abstract switch element, and have its type as an <xs:restriction> of the SwitchType; each action in a LESS extension must be defined in the substitutionGroup of the abstract action element, and have its type as an <xs:restriction> of the ActionType; and each modifier in a LESS extension must be defined in the substitutionGroup of the abstract modifier element, and have its type as an <xs:restriction> of the ModifierType. Using substitutionGroup preserves the tree-like structure of LESS when introducing LESS extensions and can help to check syntax errors in LESS scripts.

Below, I briefly introduce several LESS extensions. More detailed descriptions and service examples of the extensions can be found in the Internet draft, LESS: Language for End System Services in Internet Telephony [144].

Media handling extension

The media handling extension defines the elements that can change the media attributes of calls. It contains one action, mediaupdate, and one modifier, media.

mediaupdate action: The mediaupdate action can change the media attributes of all the ongoing calls. The action does not have any parameters, but it uses media modifiers to specify the media attributes of the ongoing calls. The mediaupdate action can be applied to any trigger. It has one output, next, for subsequent actions. There are no pre-conditions for this action. The action completes when the media attributes of all the sessions have been updated.

media modifier: The media modifier can specify what media and codec to use for a signaling action. For example, the script below accepts a call and plays the hello.au file to the caller.

    <media media="audio" input="hello.au" mode="sendonly">
      <accept/>
      ...
    </media>

Mid-call handling extension

LESS can handle three mid-call operations: mute, hold, and transfer. We can use mediaupdate to handle mute and hold by setting the media attributes of calls. Call transfer can only be handled by a new action, transfer, which is defined in the mid-call handling extension. The extension has another action, merge, which can merge all existing calls into a conference call.

transfer action: The transfer action transfers all existing calls to another URI, which is specified in the current location set. The action has one parameter, timeout, which indicates how long to try before giving up the transfer attempt. The transfer action can be applied to any trigger. It has six outputs: accepted, busy, noanswer, failure, redirection, and next. These outputs are the same as the outputs defined in the call action. There are no pre-conditions for a transfer action. If a user agent without ongoing calls performs a transfer action, the user agent's LESS interpreter must ignore the transfer action. The transfer action completes when a notification from the target of the action arrives indicating the success or failure of the action, or when the action times out.

merge action: The merge action can merge all existing calls into a conference call. Internet telephony endpoints, especially soft phones, can usually handle audio mixing, so hosting a small conference call on an endpoint is viable. The merge action can be applied to any trigger. The merge action has only one output, next, and no pre-conditions. The action completes when it establishes a conference call for all the existing calls.

User interaction extension

The user interaction extension performs actions for alerting users and getting user input. It defines one trigger and two actions.

command trigger: The command trigger is invoked when a user performs a pre-defined action, e.g., pressing a button. It has an attribute, name, indicating which action is performed by the user.

alert action: The alert action can play alerting messages. An alerting message can be an audio file, a text message, a device vibration, or a flashing icon. After performing an alert action, a LESS interpreter expects human intervention to stop the alerting. The alert action has five attributes. The duration attribute indicates how long to play an alert. The priority attribute gives the priority of the alert; possible values are emergency, urgent, normal, and non-urgent. A user agent can use different pre-defined alerting tones for different priorities. The icon attribute indicates the URL of an icon file to be shown in an alerting dialog. The message attribute contains the text message to be shown. The style attribute indicates how to play an alerting message; possible values are vibrate, sound, flash, and text, and the default is sound. If the style attribute is sound, the file attribute gives the URL of the file. The alert action has two outputs. The timeout output handles the case in which no user intervenes within the time indicated by the duration attribute. The next output handles subsequent actions. The alert action completes when a user intervenes to stop the alerting, or when the time exceeds that specified in the duration attribute. This action can be applied to any trigger.

getinput action: The getinput action can acquire users' input. This action returns a value and stores it in the {action.last-action-result} variable. The action has three attributes. The source attribute indicates where to collect user input data; possible values are screen and sound, for collecting data from a GUI or from an audio input device, respectively. The entity attribute indicates whose input to collect; possible values are local and remote. Local means the input is from the endpoint that runs the script. Remote means the input is from the

other entities, e.g., DTMF digits in an RTP stream. The timeout attribute indicates how long to wait for the input. The action has two outputs. The noanswer output handles the case in which the waiting time exceeds the value specified in the timeout attribute. The next output handles subsequent actions. This action can be applied to any trigger.

Instant messaging extension

The instant messaging extension handles instant messages. It includes a trigger for handling incoming messages and an action for sending messages.

message trigger: The message trigger handles incoming instant messages. There are no attributes for the trigger.

sendmsg action: The sendmsg action can generate an outgoing MESSAGE request. The action has four outputs: success, failure, redirection, and next. The success output handles SIP 2xx responses to the SIP MESSAGE request; the failure output handles SIP 4xx, 5xx, and 6xx responses; and the redirection output handles SIP 3xx responses. The next output handles subsequent actions. The action has one attribute, message, containing the plain-text message to send. The sendmsg action completes when it sends out a SIP MESSAGE request. This action can be applied to any trigger.

Event handling extension

The event handling extension is used to handle event subscriptions and notifications. It contains two triggers, one switch, and five actions.

subscription trigger: The subscription trigger handles incoming event subscriptions. There are no attributes for this trigger. For a SIP user agent, the trigger is invoked when the user agent receives an incoming SUBSCRIBE request.

notification trigger: The notification trigger handles incoming event notifications. There are no attributes for this trigger. For a SIP user agent, the trigger is invoked when the user agent receives an incoming NOTIFY request.

event-switch switch: This switch allows a LESS script to make decisions based on the event values in an incoming event subscription or notification. The switch can branch on the package and the event value of an event.

approve action: The approve action can approve an incoming event subscription. This action can only be applied to a subscription trigger. It has no attributes and only one output, next, for subsequent actions. In a SIP user agent, the action completes after sending a 200 response and a NOTIFY message for the incoming event subscription. In the NOTIFY message, the value of the Subscription-State header must be set to active.

deny action: The deny action can deny an incoming event subscription. This action can only be applied to a subscription trigger. It has no attributes and only one output, next, for subsequent actions. In a SIP user agent, the action completes after sending a 202 response and an immediate NOTIFY message for the incoming event subscription. In the NOTIFY request, the value of the Subscription-State header must be set to terminated, with the reason code rejected.

defer action: The defer action can defer the decision on approving or denying an incoming event subscription. The state of the event subscription will be set as

pending. It has no attributes and only one output, next, for subsequent actions. In a SIP user agent, the action completes after sending a 202 response and a NOTIFY message for the incoming event subscription. In the NOTIFY request, the value of the Subscription-State header must be set to pending.

subscribe action: The subscribe action can generate an event subscription to a presentity. The action has seven outputs: the approved output handles approved subscriptions; the denied output handles denied subscriptions; the pending output handles pending subscriptions; the noanswer output handles timed-out subscriptions; the failure output handles failed subscriptions; the redirection output handles redirected subscriptions; and the next output handles subsequent actions. The subscribe action has three attributes: the timeout attribute indicates how long to try before giving up the subscription; the package attribute gives the event package in which the subscriber is interested; and the expires attribute gives the duration of the subscription. By setting the expires value to 0, a user can create an immediately expired subscription to poll the current status of a presence agent. The action completes when the user agent receives a non-provisional response and a NOTIFY message for the SUBSCRIBE message it generates. This action can be applied to any trigger.

notify action: The notify action can generate an event notification to a watcher. The action has four outputs: the success output handles successful notifications; the failure output handles failed notifications; the redirection output handles redirected notifications; and the next output handles subsequent actions. The notify action has two attributes: the package attribute indicates the event package in the outgoing event notification; and the event attribute indicates the event

value in the event notification. The action can be applied to any trigger.

Queue handling extension

Many call center services require queuing incoming calls for available operators. This extension defines two actions for queuing operations.

enqueue action: The enqueue action can put a call in a waiting queue. A user agent can maintain multiple queues for calls. The enqueue action has one parameter, queue, which provides the name of the queue for an enqueue action. A user agent must maintain a default call queue. If the queue parameter is not provided, calls will be put into the default call queue. The enqueue action has two outputs: the failure output handles the case in which the enqueue action fails, e.g., because the queue is full; the next output handles subsequent actions. The enqueue action completes when the information of the queued call has been saved. If a call in a queue terminates, the call will be automatically removed from the queue. This action can only be applied to incoming or outgoing triggers. Note that transfer and mediaupdate actions will not be applied to queued calls.

dequeue action: The dequeue action can retrieve a call from a waiting queue. The dequeue action has one parameter, queue, indicating the name of the queue. The action has three outputs: the success output handles the case in which the dequeue action successfully retrieves a call; the failure output handles the case in which the dequeue action fails to retrieve a call, e.g., because the queue is empty; and the next output handles subsequent actions. For a retrieved call, the remote party's address of the call will be added to the current location set. This action can be applied to any trigger.
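The queue semantics described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the LESS interpreter: the `CallQueues` class, its `capacity` limit, the first-in-first-out ordering, and the dictionary representation of a call are all hypothetical. Each method returns the name of the output that would be followed.

```python
from collections import deque


class CallQueues:
    """Named call queues with a mandatory default queue (sketch)."""

    DEFAULT = "default"

    def __init__(self, capacity=10):
        # 'capacity' is a hypothetical per-queue limit, used here only to
        # show the 'queue is full' failure case.
        self.capacity = capacity
        self.queues = {self.DEFAULT: deque()}

    def enqueue(self, call, queue=None):
        """Put a call in the named queue, or the default queue if none."""
        q = self.queues.setdefault(queue or self.DEFAULT, deque())
        if len(q) >= self.capacity:
            return "failure"
        q.append(call)
        return "next"

    def dequeue(self, location_set, queue=None):
        """Retrieve a call and add its remote address to the location set."""
        q = self.queues.get(queue or self.DEFAULT)
        if not q:  # unknown or empty queue
            return "failure"
        call = q.popleft()  # FIFO order is an assumption
        location_set.append(call["remote"])
        return "success"

    def call_terminated(self, call):
        """Remove a terminated call from whichever queue holds it."""
        for q in self.queues.values():
            if call in q:
                q.remove(call)


queues = CallQueues(capacity=1)
first = queues.enqueue({"remote": "sip:a@example.com"}, "callback")
second = queues.enqueue({"remote": "sip:b@example.com"}, "callback")
locations = []
retrieved = queues.dequeue(locations, "callback")
```

With a capacity of one, the second enqueue takes the failure output, and the dequeue adds the first caller's address to the location set, so a following call action would ring that party.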

Chapter 4

Using LESS to program communication services

I tried to use LESS to create the services defined in ITU Q.1211 [61], 5ESS switches [14], and CSTA Phase III [59]. My service creation experience confirms that LESS can handle most of the end system services defined in those documents. In addition, LESS can handle many new services integrated with other Internet services. This chapter summarizes my service creation experience and discusses how to use LESS to handle conferencing.

4.1 Services for Q.1211, 5ESS, and CSTA phase III

A large part of the services defined in Q.1211 and 5ESS switches can be handled by using LESS on end systems. Some services are designed for proxy servers and require the proxy action defined in CPL. There are two kinds of services that can be handled by neither LESS nor CPL. One is VPN related services defined in Q.1211, such as

Off-net access (OFA) and Off-net calling (ONC). Because VPN services are link layer or network layer services, they are out of the scope of LESS and CPL, which are designed for application layer services. The other is charging related services defined in Q.1211, such as Premium Charging (PRMC), Reverse Charging (REVC), and Split Charging (SPLC). Charging services are usually not handled by user-operated endpoints. However, the log action defined in LESS and CPL may help to handle charging services. With call logs, people can find out how many calls they have made and for how long, and then calculate the cost.

A detailed description of how to handle each service defined in Q.1211, 5ESS switches, and CSTA phase III can be found in my technical report End System Service Examples [141]. Below I use a LESS script as an example to show how LESS handles the Automatic Call Back (ACB) service defined in Q. The ACB feature allows a called party to automatically call back the calling party for the last call received by the called party.

    <less>
      <incoming>
        <status-switch>
          <status status="busy">
            <reject>
              <next>
                <Queue:enqueue queue="callback"/>
              </next>
            </reject>
          </status>
        </status-switch>
      </incoming>
      <Event:notification>
        <address-switch field="origin">
          <address
            <Event:event-switch>
              <Event:event package="presence" status="open" activity="">
                <Queue:dequeue queue="callback">
                  <success>
                    <call/>
                  </success>
                </Queue:dequeue>
              </Event:event>
            </Event:event-switch>
          </address>
        </address-switch>
      </Event:notification>
    </less>

The service involves two steps. The first is to queue a received call; the second is to detect the called party's availability, then retrieve the queued call and call back.

Most of the call control services defined in CSTA phase III can be expressed as

an action defined in LESS; for example, the join call service can be handled by the merge action. But LESS does not handle services other than call control defined in CSTA phase III, such as monitoring services.

One interesting category of services defined in CSTA phase III is call movement services, such as the Group Pickup service. Figure 4.1 shows the general situation of the call movement services.

Figure 4.1: Call movement scenario (before and after scenarios). a: alerting, h: holding, q: queued, c: connected, *: any state; D1, D2, D3: different endpoints; C: call

As shown in Figure 4.1, the service can move one call from the endpoint D1 to the endpoint D3 once the endpoint D3 picks up the call. This is difficult to handle in a SIP-based Internet telephony system because SIP user agents need to follow the offer/answer model of SIP [105] for call setup. If there are no SIP INVITE messages between D3 and D1, the call cannot be moved to D3. So, if D3 wants to pick up a call, it must receive an INVITE containing the media information of D2, or send an INVITE to D2 to replace the call between D2 and D1. In a circuit-switched environment, a switch can easily handle the call movement services by detecting the off-hook event of D3 and then connecting the call to D3. We certainly can use master/slave protocols, such as the Media Gateway Control Protocol (MGCP [13] or Megaco [36]), to control D3. But the master/slave approach breaks the

peer-to-peer architecture of SIP.

Figure 4.2: Call movement signaling flow. a: alerting, h: holding, q: queued, c: connected; D1, D2, D3: different endpoints

One way to handle the service in a SIP-based environment is to implement SIP dialog event notifications [102] and end system services in D3. As shown in Figure 4.2, when D3 answers a call from D1, it subscribes to D1's SIP dialog states. If D1 is in a call, D3 will automatically send an INVITE to D2 to replace the call between D1 and D2 by using the SIP Replaces header [80]. The script below shows how to program the Group Pickup service in LESS.

    <less>
      <UI:command command="pickup">
        <location url="sip:d1@example.com">
          <Event:subscribe package="dialog" expires="0"/>
        </location>
      </UI:command>
      <Event:notification>
        <address-switch field="origin">
          <address
            <Event:event-switch>
              <Event:event package="dialog" state="trying,proceeding,early,confirmed">
                <location uri="{trigger.event.dialog.remote}">
                  <call replace="{trigger.event.call-id}"/>
                </location>
              </Event:event>
            </Event:event-switch>
          </address>
        </address-switch>
      </Event:notification>
    </less>

4.2 Using LESS to program conferencing services

We can program a presence-enabled conferencing service by using LESS. The service can check the presence status of essential conference participants and start the conference only when all the essential participants are online. The service can reduce the time people spend waiting for others to join a conference.

We can also use LESS to program location-based conferencing services. When a group of people meet in a conference room, the location update event can invoke an action to automatically place them in a conference. These participants can then share slides and have their conversation recorded by the conference server. External participants can also be invited into the meeting.

Program presence-enabled conferencing services

    <less>
      <timer dtstart=" t090000">
        <location url="sip:vip_tom@foo.com">
          <EVENT:subscribe package="aggregation"/>
        </location>
      </timer>
      <EVENT:notification>
        <address-switch field="origin">
          <address url="sip:vip_tom@foo.com">
            <EVENT:event-switch>
              <EVENT:event package="aggregation" is="match">
                <lookup source="participants">
                  <success>
                    <call/>
                    ...

Figure 4.3: Presence-enabled conferencing script

Figure 4.3 shows a LESS script to program presence-enabled conferencing. The script is invoked by a timer event. The start time of the timer event can be set to the scheduled conference start time. The script first sends a subscription to a presence agent for the aggregated presence status [98] of all the essential participants. The content of the subscription is shown in Figure 4.4 and discussed in detail in the presence aggregation section. The script then waits for a notification of the aggregated presence status. Once all the

essential participants are online, the script will receive a notification and make calls to all the participants.

Presence aggregation

    <trigger>
      <all>
        <match contact="sip:tom@example.com" package="presence" status="open"/>
        <match contact="sip:bob@example.com" package="presence" status="open"/>
        <any>
          <match contact="sip:alice@foo.com" package="presence" status="open"/>
          <match contact="sip:steve@foo.com" package="presence" status="open"/>
        </any>
      </all>
    </trigger>

Figure 4.4: Simple event aggregation script

It is easy for LESS to check one person's presence status and perform actions. However, it is difficult to check the presence status of several people simultaneously and trigger actions based on the combined status. This requires aggregation of all the subscriptions. To keep the service logic easier to implement and understand, it is preferable to handle the aggregation in a separate presence agent. The conference server handles the service logic based only on the aggregated event. For example, if a conference server wants to wait until Tom and Bob from example.com, and one of Alice and Steve from foo.com, are online to start a conference, it can simply put the script shown in Figure 4.4 as the content of the subscription.
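To make the all/any semantics of the aggregation script concrete, the following Python sketch evaluates the script from Figure 4.4 against a table of current presence states. It is an illustration only: the `evaluate` function and the `(contact, package)` status table are assumptions introduced here, not the presence agent's actual implementation.

```python
import xml.etree.ElementTree as ET

# The aggregation script from Figure 4.4.
AGGREGATION = """
<trigger>
  <all>
    <match contact="sip:tom@example.com" package="presence" status="open"/>
    <match contact="sip:bob@example.com" package="presence" status="open"/>
    <any>
      <match contact="sip:alice@foo.com" package="presence" status="open"/>
      <match contact="sip:steve@foo.com" package="presence" status="open"/>
    </any>
  </all>
</trigger>
"""


def evaluate(node, status):
    """Return True if the aggregation rooted at 'node' is satisfied.

    'status' maps (contact, package) to the latest reported value.
    """
    if node.tag == "match":
        key = (node.get("contact"), node.get("package"))
        return status.get(key) == node.get("status")
    results = [evaluate(child, status) for child in node]
    if node.tag == "any":
        return any(results)
    # <trigger> and <all> both require every child to hold.
    return all(results)


root = ET.fromstring(AGGREGATION)
status = {
    ("sip:tom@example.com", "presence"): "open",
    ("sip:bob@example.com", "presence"): "open",
    ("sip:steve@foo.com", "presence"): "open",
}
ready = evaluate(root, status)
```

Here Tom, Bob, and Steve are online, so the aggregated event matches even though Alice has not reported any status.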

The aggregation can be more complex if more conditions, such as time, the callee's capabilities, and language preferences, are considered. For example, if Tom serves as the moderator of a conference and a PSTN phone is not convenient for him to perform conference control functions, the presence aggregation may check the URI of Tom's user agent. If the user=phone parameter is present in Tom's contact URI, the presence agent should not consider this contact as a match for the presence event aggregation.

The presence agent that handles presence aggregation should decouple each aggregated subscription and send individual subscriptions to all the corresponding parties.

An alternative to event aggregation is to define an aggregated presentity, such as my-group@domain, and define rules such as "my-group is online when all of its members are online". Rules can be a simple logical AND or OR, or a more complex function of individual events. For simple cases such as a room, the entity can be defined naturally, e.g., room460@columbia.edu.

Presence aggregation with location-based services

The presence aggregation can further interact with location-based services. Suppose Tom, Bob, and Alice would like to have a conference. Their user agents can learn their physical locations from infrared location sensors, Bluetooth beacons, or DHCP options [111]. The user agents then publish the location information to their presence agents. The presence agents will notify the conference server, which subscribes to their physical location information [93]. The conference server discovers that Tom and Bob are in the same building, whereas Alice is away. With the Service Location Protocol (SLP [51]), the server may find that room 460 is close to both Tom and Bob and has good communication capabilities. The conference server sends instant messages to Tom and

Bob asking them to go to that room. When the server discovers that both Tom and Bob are in that room, it starts a conference and invites the devices in room 460, such as the room speakers and video projectors, to the conference. It also invites Alice to the same conference. This way, Tom and Bob can talk to each other face-to-face and teleconference with Alice.

Chapter 5

Service creation tools for LESS

There are several ways to create LESS scripts. Manual editing is certainly an option, but it is tedious, error-prone, and unsuitable for inexperienced users. I developed two service creation tools for LESS: one uses a web interface, and the other uses a graphical user interface (GUI). The LESS service creation process can be divided into two stages. The first stage generates a service template, and the second replaces the variables in the template with real values. This chapter introduces my service creation tools and the two-stage service creation process.

5.1 Web-based service creation

The web-based service creation for LESS can be server-based or browser-based. The server-based approach has the service creation logic running on a web server, for example, using a Common Gateway Interface (CGI) [4] script to handle the service creation process. A user can provide arguments for LESS triggers, switches, or actions. The CGI script can then generate further service creation options.


The browser-based approach has the service creation logic running in users' browsers, for example, using a client-side JavaScript script. The advantage of the browser-based approach is its efficiency: there is no back-and-forth HTTP exchange during service creation.

Figure 5.1: Browser-based LESS service creation

Figure 5.2: Choose switches or actions

Figure 5.1 shows the browser-based LESS service creation interface that I developed by using client-side JavaScript. A user can click on the pull-down menu to decide how to handle a trigger, as shown in Figure 5.2.

The pull-down menu lists the switches or actions available for a trigger. Once a user chooses a switch or an action, the JavaScript script will generate new entries for the user to configure the attributes of the switch or the action. Different switches or actions have different attributes. For example, if the user chooses "Check the time", the new entries will allow the user to specify the starting time and ending time of a time-switch. But if the user chooses "Check the priority of the call", the new entries will ask the user for a priority value. Once a user finishes the form, the user can click the Submit button to generate a LESS script. The LESS script can be displayed in the web browser and saved locally, or sent to a web server for service distribution.

5.2 GUI-based service creation

Figure 5.3: Columbia University Telecommunication service Editor (CUTE)

Figure 5.4: Constructing a call handling decision tree

Figure 5.3 shows the service creation tool I developed. The tool is called CUTE, which stands for Columbia University Telecommunication service Editor. To guide inexperienced users in creating their desired services, CUTE lists its available tasks when it starts. Users can simply pick a task, and CUTE will list the operations available to handle the task, as shown in Figure 5.4. By clicking the pull-down menus and choosing the desired operations, users can build up a decision tree. Experienced users can also drag and drop triggers, switches, and actions onto the service creation panel and connect them together to generate decision trees. CUTE can convert decision trees to LESS scripts.
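Converting a decision tree into a LESS script is essentially a recursive walk that emits one XML element per node. The Python sketch below illustrates the idea under stated assumptions: the `(tag, attributes, children)` tuple representation and the `to_element` helper are hypothetical and do not describe CUTE's internal data structures.

```python
import xml.etree.ElementTree as ET


def to_element(node):
    """Turn a (tag, attributes, children) tuple into an XML element."""
    tag, attrs, children = node
    el = ET.Element(tag, attrs)
    for child in children:
        el.append(to_element(child))
    return el


# A small decision tree: reject incoming calls when the user is busy.
tree = ("less", {}, [
    ("incoming", {}, [
        ("status-switch", {}, [
            ("status", {"status": "busy"}, [
                ("reject", {}, []),
            ]),
        ]),
    ]),
])

script = ET.tostring(to_element(tree), encoding="unicode")
```

Because every node of the decision tree maps to exactly one element, the same walk run in reverse recovers the tree from a script, which is what allows back-and-forth translation between the graphical and textual forms.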

Figure 5.5: Two-stage LESS service creation. Elements shown: LESS designer, LESS editor, XML editor, service.less (template), less.xsl, XSLT, service.html, translate.cgi, user, configuration editor, service_foo.less

5.3 Two-stage service creation

Users can directly generate their desired services by using the service creation tools I developed. However, it is more efficient to break the service creation process into two stages. The first stage is to create a service template, represented in Figure 5.5 as service.less. The service template is written in LESS, but uses conventions for user-configurable values. For example, <address is="{$var}"> means that the address value is configurable, and {$var} should be replaced later by user input. The second stage is to instantiate the service template and generate usable LESS scripts. Users can use a graphical editor to fill in the variables in the template. The template can also be translated into an HTML page by Extensible Stylesheet Language (XSL) Transformations (XSLT). During the translation, by using the xsl:if tag, the XSLT script can check the value of each attribute. If the value is {$var}, the XSLT script can generate an HTML input tag so that users can enter values in an HTML form. Finally, a user-configured LESS script can be generated by translate.cgi.
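The second stage amounts to substituting user input for every {$var} attribute value in the template. The thesis does this with XSLT and translate.cgi; the Python sketch below swaps in a plain ElementTree walk purely to illustrate the substitution step. The `instantiate` function, the example template, and the convention of supplying values in document order are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

PLACEHOLDER = "{$var}"

# Hypothetical template: reject calls from a user-configurable address.
TEMPLATE = (
    '<less><incoming><address-switch field="origin">'
    '<address is="{$var}"><reject/></address>'
    '</address-switch></incoming></less>'
)


def instantiate(template_text, values):
    """Fill each {$var} attribute, in document order, from 'values'."""
    root = ET.fromstring(template_text)
    supplied = iter(values)
    for el in root.iter():
        # Copy the items so the attribute dict is not mutated mid-iteration.
        for name, value in list(el.attrib.items()):
            if value == PLACEHOLDER:
                # Raises StopIteration if too few values were supplied.
                el.set(name, next(supplied))
    return ET.tostring(root, encoding="unicode")


configured = instantiate(TEMPLATE, ["sip:bob@example.com"])
```

Attributes that already carry concrete values, such as field="origin", pass through unchanged, so one template can be instantiated many times with different user input.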

Chapter 6

Evaluating LESS

6.1 The simplicity and safety of LESS

LESS is designed for programming communication services in end systems. The goal of the language is to allow comparatively inexperienced users, such as college students, to use the graphical LESS service creation tool or the web-based service creation tool to create a wide range of services based on the basic LESS definitions. More experienced users can create more complicated services with LESS extensions and LESS pre-defined variables. To achieve this goal, the language must represent a high-level abstraction of communication behaviors, as I described in Chapter 3.1. The elements in the language must have semantic meanings. The language has to be easy to learn and safe. In this section, I will show how the language provides simplicity and safety for communication service creation.

Simplicity

The simplest way to achieve simplicity is through thoughtful reduction [76]. LESS is not designed to handle all communication services. Instead, it uses a very limited set of elements to represent commonly used communication services. As I discussed in End System Service Examples [141], most of the end system services defined in ITU Q.1211 [61], 5ESS switches [14], and CSTA Phase III [59] can be programmed in LESS.

Organization makes a system of many appear fewer [76]. LESS uses a tree-like structure to represent services. The tree-like structure organizes LESS elements in a defined order, with triggers as the roots of trees and actions as the leaves. Our end user service creation survey, which I will introduce in Chapter 6.2, shows that users can easily understand the tree-like structure when creating services.

Savings in time feel like simplicity [76]. In our end user service creation survey, college students could start to create LESS-based services after simply watching a three-minute movie. The savings in learning time make LESS feel simple.

Openness simplifies complexity [76]. LESS is an open language. It can be easily extended. The basic definition of LESS covers most of the commonly used services. More complicated services, such as call queueing and presence-based call handling, can be handled by LESS extensions.

The fewer concepts to understand in a language the better [127], though in some cases two smaller concepts might be simpler and more flexible than one more powerful but complicated concept. For example, separating commonly used action arguments, such as a person's URI, from actions is better than putting the arguments into every single action.

Simplicity enters in four aspects: uniformity (rules are few and simple), generality (a small number of general functions provide more specialized functions as special cases), familiarity (familiar symbols and usages are adopted whenever possible), and brevity (economy of expression is sought) [40]. For LESS, we consider analyzability more important than brevity because LESS is an XML-based language designed for communication services. In communication service creation, users may encounter feature interaction [142] problems; being able to analyze and resolve feature interactions is essential for users to create modularized services. We discuss the generality, uniformity, familiarity, and analyzability of LESS below.

Generality

Generality requires a language to have a small number of general elements. The special elements of the language can be introduced by extending the general elements. As illustrated in Chapter 3.1, LESS has four types of elements: triggers, switches, actions, and modifiers. Every new element defined in LESS must fall into one of these types. In other words, the XML schema of LESS defines four basic abstract elements, namely trigger, switch, action, and modifier, and every LESS element must belong to the substitution group of one of the four abstract elements.

Uniformity

Uniformity requires a language with few and simple rules to generate valid programs. There are four rules in LESS. The rules are summarized below:

Trigger rule: A trigger must be the root of a LESS decision tree and can appear no more than once in a LESS script.

Switch rule: Switches must be internal nodes of a LESS decision tree.

Action rule: Only actions can be LESS decision tree leaves.

Modifier rule: A modifier can only be used as the parent element of actions.

Familiarity

Familiarity requires a language not to violate the common use of notation. LESS is an XML-based language and uses the datatypes defined in XML Schema. The elements defined in LESS all have semantic meanings, so they are easy for LESS service creators to understand and remember.

Analyzability

Analyzability requires easy program inspection. For LESS, we are more interested in how LESS handles feature interaction problems in communication services. Feature interactions among multiple LESS scripts can be easily handled by using action conflict tables and a tree merging algorithm I designed. I discuss the tables and the algorithm in detail in Part II.

Safety

Because LESS may be used by comparatively inexperienced users to program services, we expect that LESS service creators may make errors that seem naive to experienced programmers. We also expect that people may download malicious service scripts created by third parties. Thus, error prevention mechanisms are required for service creation. The error prevention mechanisms for LESS fall into two categories: one puts restrictions on the language itself, and the other puts restrictions on LESS interpreters. We are

more interested in the restrictions on the language itself because they directly affect the language design.

Type safety

Type checking is an effective way to catch programming errors, from trivial errors, such as misspelled identifiers, to fairly deep errors, such as violations of data structure invariants. LESS is an XML-based language and uses XML Schema [42] to define its elements. The strong typing mechanism in XML Schema, along with the large set of intrinsic types and the ability to create user-defined types, provides for a high level of type safety in instance documents. This feature can be used to express stricter data type constraints, such as those on attribute values, when using XML Schema for validation [42].

In LESS, there are no user-defined variables, so only static type checking is needed. There are many advantages to using a statically type-checked language. Static type checking can provide earlier, and usually more accurate, information on programming errors. It can eliminate the need for run-time checks, which may slow program execution. Statically type-checked languages may be less expressive than dynamically type-checked languages. However, because LESS allows comparatively inexperienced service creators to create services, safety is more important than expressiveness.

Control flow safety

LESS uses a tree-like structure to make call decisions. There are no back-references from a node to its ancestors or itself, so there are no loops or recursion in LESS scripts, and it

excludes the possibility of non-terminating or non-decidable LESS scripts. In addition, the LESS trigger rule ensures that a specific trigger can appear no more than once in a LESS script. This rule helps to avoid run-time feature interactions in LESS scripts.

Memory access

A language can be described as unsafe if it allows some means to access memory directly. In LESS, there are no pointers, no direct memory accesses, and no user-defined variables. The LESS XML schema defines a maximum string length for every element of string type to prevent buffer overflow attacks. If a user puts a very long string in a service script, the script is considered invalid when it is checked against the LESS XML schema.

LESS interpreter safety

There are several ways to ensure the safety of LESS scripts by appropriately building LESS interpreters. Interpreter developers must ensure that every trigger has default actions defined. If a trigger is invoked in a LESS interpreter but no actions are defined for the matching context, the interpreter must perform the default actions for that trigger. This makes LESS scripts decidable. LESS differs from CPL [72] in how it deals with program safety. CPL is designed to run on signaling servers, such as SIP proxy servers. The safety considerations for CPL ensure that semi-trusted users cannot create malicious or incompetent service scripts that disrupt other users' services, including crashing the server, revealing security-sensitive information, and causing denial of service. Because LESS is used on users' end devices, in general, it does not need to handle interference among multiple

users. However, a LESS interpreter should ensure safe resource usage, such as CPU usage. It needs to define how deep a decision tree can be, the minimum interval for a timer trigger, and which multimedia devices a LESS script can control.

Using third-party created or auto-generated service scripts

Sometimes, users may not be aware of a new service or may not know how to create a service. Allowing third-party created services, or having services created by feature learning, can be a great help to users. However, users have to check the safety of third-party created or auto-generated scripts, for example, to make sure that the scripts will not forward their calls to unwanted parties. The safety considerations discussed in the previous sections also apply to third-party or auto-generated service scripts. In addition, the tree-like structure of LESS makes it possible to convert any valid LESS script to a graphical representation of a decision tree. This is a big advantage of LESS for safety checking. Users do not have to check LESS programs line by line; instead, they can inspect the graphical representation and easily find out what actions a service script will perform.

Summary

This analysis shows that LESS is well-suited for comparatively inexperienced service creators to create end system services. LESS follows CPL's design strategy of limiting the power of the language, while remaining powerful enough to describe a large number of services and features, as described in our technical report End System Service Examples [141].
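Several of the safety rules discussed in this section (only actions as leaves, each trigger at most once, a bounded tree depth, and bounded string lengths) are structural and can be checked before a script runs. The following is a minimal sketch of such a check, not the LESS interpreter itself. The element names in `TRIGGERS` and `ACTIONS`, the limits `MAX_DEPTH` and `MAX_STRING`, and the function `check_script` are all assumptions made for illustration; the real names and limits would come from the LESS XML schema and the interpreter configuration.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names and limits, for illustration only.
TRIGGERS = {"incoming", "outgoing", "timer"}
ACTIONS = {"accept", "reject", "redirect", "log", "mail"}
MAX_DEPTH = 8      # assumed interpreter limit on decision tree depth
MAX_STRING = 256   # assumed schema limit on string length


def check_script(xml_text):
    """Return a list of rule violations found in a LESS-like script."""
    root = ET.fromstring(xml_text)
    errors = []
    seen_triggers = set()
    for child in root:
        # Trigger rule: a specific trigger appears no more than once.
        if child.tag in TRIGGERS:
            if child.tag in seen_triggers:
                errors.append(f"trigger '{child.tag}' appears more than once")
            seen_triggers.add(child.tag)
        _walk(child, 1, errors)
    return errors


def _walk(node, depth, errors):
    # Resource safety: bound the depth of the decision tree.
    if depth > MAX_DEPTH:
        errors.append(f"tree deeper than {MAX_DEPTH} at <{node.tag}>")
        return
    # Memory safety: bound the length of every attribute string.
    for value in node.attrib.values():
        if len(value) > MAX_STRING:
            errors.append(f"attribute of <{node.tag}> exceeds {MAX_STRING} characters")
    children = list(node)
    # Action rule: only actions can be leaves.
    if not children and node.tag not in ACTIONS:
        errors.append(f"leaf <{node.tag}> is not an action")
    for c in children:
        _walk(c, depth + 1, errors)


script = """
<less>
  <incoming>
    <address-switch>
      <address is="sip:t@a.com"><accept/></address>
      <otherwise><reject/></otherwise>
    </address-switch>
  </incoming>
</less>
"""
print(check_script(script))
```

Because the script is parsed into a tree, loops and back-references cannot be expressed at all, so the sketch only has to check the remaining structural rules.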

A survey of service creation by end users

I conducted a survey on end users' willingness and capability to create their desired services. The survey is based on the graphical service creation tool I implemented for LESS. We call the tool CUTE, which stands for Columbia University Telecommunication service Editor (Chapter 5.2). Survey participants took three steps to complete the survey: first, watch a short movie (2 minutes and 37 seconds) showing an example of using CUTE to create services; second, use CUTE to create services for three scenarios; and third, fill out an online form about their background and service creation experience. The three scenarios (scenarios 1, 2, and 3) in the second step have different complexities, but all treat incoming calls. Scenario 1 differentiates calls based only on the callers' addresses; scenario 2 differentiates calls based on both the callers' addresses and the time of the calls; and scenario 3 differentiates calls based on the callers' addresses, the status of the callees, and the priority of the calls.

The survey participants are divided into three groups:

Group 1 consists of experienced computer users with some knowledge of telecommunication services. This group includes nine people (master's or Ph.D. students) from the Internet Real-Time Laboratory (IRT) at Columbia University and one master's student from the University of Ottawa. They volunteered to do the survey.

Group 2 consists of experienced computer users who know little about telecommunication services. This group includes five undergraduate students from the Computer Science Department at Columbia University.

Group 3 consists of people who use computers mainly for text editing and web browsing, including four graduate students from the dental school, business school, medical school, and statistics department, respectively, and one faculty member from the biostatistics department. We

paid group 2 and 3 participants ten dollars each for doing the survey. The survey results show that the participants are willing and able to create end system services on their own. Below I present and analyze the survey results regarding how many participants would like to create services on their own, whether they can correctly create services for scenarios 1, 2, and 3, what services they are interested in, whether they like CUTE, whether they can understand LESS, and whether they are aware of feature conflicts and how they would like to detect and resolve them. Because this survey requires participants' background information, such as the department they work in, we had the survey reviewed by the Columbia University Institutional Review Board (IRB). The IRB concluded that this survey was not human subjects research and thus did not require further review (IRB AAAA5250 (Y1M00)).

The willingness of users to create services

Figure 6.1 shows how many participants would like to create their own services. Overall, 70% of participants are interested in creating all their desired services. This result was unexpected to me, as presumably users cannot and should not create complicated services. I expected that most participants would create simple services by themselves but ask professionals for complicated ones. However, the result shows that only 15% of the participants would like to ask professionals to create complicated services. One explanation is that most participants do not need complicated services. Another explanation is that they had not thought about any complicated services when doing the survey. Whatever the explanation, the result reveals that most participants (85% of the participants, including 90% of group 1 participants, 60% of group 2 participants, and 100% of group 3 participants) are willing to create all or part of their desired services.
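The overall percentages in this survey are averages over groups of different sizes (ten, five, and five participants), so they are weighted by group size rather than being a plain mean of the three group percentages. The short sketch below reproduces the 85% figure from the per-group percentages given above; the function name `overall_percentage` is introduced here only for illustration.

```python
# Group sizes as stated in the description of the survey participants.
GROUP_SIZES = {"group 1": 10, "group 2": 5, "group 3": 5}


def overall_percentage(group_percentages):
    """Combine per-group percentages into an overall one, weighted by group size."""
    total = sum(GROUP_SIZES.values())
    count = sum(GROUP_SIZES[g] * p / 100 for g, p in group_percentages.items())
    return 100 * count / total


# Participants willing to create all or part of their desired services.
willing = {"group 1": 90, "group 2": 60, "group 3": 100}
print(overall_percentage(willing))  # 9 + 3 + 5 = 17 of 20 participants
```

The same check applies to the other per-group figures reported in this section.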

[Figure 6.1: Users' willingness to create services. Bar chart of the percentage of participants, overall and for groups 1, 2, and 3, selecting each answer to the survey question below.]

Survey question: Do you like to create your telecommunication services by your own or pay a professional to create them?
1. I would like to create all my services.
2. I would like to create some of my services, but pay a professional to create some complicated services for me.
3. I would like to pay a professional to create all services because that's more reliable.
4. I do not need to use any supplemental telecommunication services at all. I only need direct end-to-end call.
5. Other

The capability of users to create services

Users' willingness alone cannot ensure successful service creation. The capability of users to create services is also important. The second step of the survey has three scenarios to evaluate that capability. The three scenarios are described below:

Scenario 1: When John Smith calls me, forward the call to ;
Scenario 2: I have a meeting on 05/30/2006, from 9:00AM to 11:00AM. For an incoming call during the meeting, if the call is from my boss, John Smith, I will answer the call. Otherwise, I will forward the call to my voicemail

at ;
Scenario 3: For an incoming call while my activity is "on the phone", if the call is not from my boss, John Smith, and is not an emergency call, I will reject the call.

[Figure 6.2: Service creation for scenario 1. Bar chart of the percentage of participants, overall and for groups 1, 2, and 3, whose service was correct, partially correct, wrong, or not created.]

Figure 6.2 shows that 90% of the participants (80% for group 1, 100% for group 2, and 100% for group 3) correctly created the service for scenario 1. If we exclude the participants who did not create the service (10%), all participants correctly created the service for scenario 1. We can then conclude that end users with some computer experience can handle simple service creation, such as handling calls based only on the caller's address. Figure 6.3 shows that 65% of the participants (60% for group 1, 80% for group 2,

and 60% for group 3) correctly created the service for scenario 2. If we exclude the participants who did not create the service, 81% of the participants correctly created the service.

[Figure 6.3: Service creation for scenario 2. Bar chart of the percentage of participants, overall and for groups 1, 2, and 3, whose service was correct, partially correct, wrong, or not created.]

Figure 6.4 shows that 80% of the participants (70% for group 1, 100% for group 2, and 80% for group 3) correctly created the service for scenario 3. If we exclude the participants who did not create the service, 84% of the participants correctly created the service.

The service creation results for scenarios 2 and 3 show that end users can create some more complicated services, but not every user can create these services correctly. Hence, it is important to define and design viable ways to find potentially incorrect

services, such as by learning from users' call history, or by simulating use cases.

[Figure 6.4: Service creation for scenario 3. Bar chart of the percentage of participants, overall and for groups 1, 2, and 3, whose service was correct, partially correct, wrong, or not created.]

Evaluating CUTE

Figure 6.5 shows that 80% of the participants (80% for all groups) can easily use CUTE to create services. On one hand, this shows that CUTE has a relatively easy-to-use interface. On the other hand, it also suggests that end users can create communication services if an appropriate service creation tool is available. In addition, it suggests that the tree-like representation of services is acceptable to users, because CUTE's user interface presents services as decision trees.

[Figure 6.5: Using CUTE to create services. Bar chart of the percentage of participants, overall and for groups 1, 2, and 3, selecting each answer to the survey question below.]

Survey question: Do you feel comfortable to use CUTE to create your telecommunication services?
1. I feel very comfortable to use CUTE to create my services.
2. I can easily use CUTE to create my services, but it requires some improvements to make the tool more useful and friendly.
3. I have some difficulties in using CUTE, but with some additional training, I am confident that I can use the tool.
4. I do not know how to use CUTE, and additional training will not help.
5. Other

Evaluating LESS

Since CUTE is based on LESS, we also asked survey participants to evaluate LESS. In general, end users need not read service source code. However, because HTML's ease of learning and the "view source" capability of browsers helped bootstrap the Web's popularity, we think the readability of LESS is very important. Thus, we asked survey participants to read the source code of the services they created, but we did not train them on LESS. Figure 6.6 shows that many survey participants (65%

of the participants, 70% for group 1, 80% for group 2, and 40% for group 3) can easily understand the source code of the services they created, even without any training. This supports the readability of LESS.

[Figure 6.6: Understanding LESS source code. Bar chart of the percentage of participants, overall and for groups 1, 2, and 3, selecting each answer to the survey question below.]

Survey question: Do you think that the "source code" of the services is easy to understand?
1. Yes, the "source code" is easy to understand.
2. I can only partially understand the "source code".
3. I cannot understand the "source code".
4. I had not read the "source code" of the services at all.
5. Other

Services of interest

In addition to collecting information on users' service creation experience, the survey also asks the participants what services they are interested in. Table 6.1 shows the answer.

[Table 6.1: User-interested services. Columns: Very interested (score: 3), Somewhat interested (score: 2), Useless (score: 1), Not understandable (score: 0), Average. Rows: location-based, time-based, user-based, alerts, changing context, priority-based, changing status, presence-based, language-based.]

In the table, location-based means handling calls based on the users' physical locations, e.g., "Automatically change the ring style to vibrating when I am in a movie theater"; time-based means handling calls based on the time, e.g., "Forward calls to voicemail before 8:00AM"; user-based means handling calls based on the caller's or the callee's address, e.g., "Forward calls to voicemail if Alice calls"; alert means notifying users by email for incoming calls; changing context means changing the communication environment for a conversation, e.g., "Pause Windows media player when accepting an incoming call"; priority-based means handling calls based on the priority of the call, e.g., "Forward urgent calls to my cell phone, otherwise, to voicemail"; changing status means detecting and updating users' status information automatically, e.g., "Change my status to on the phone when I accept a call"; presence-based means handling calls based on presence status, e.g., "Reject calls when my presence status is busy"; language-based means handling calls based on users' language preference, e.g., "Reject non-English calls". Users are most interested in location-based services, and least interested in language-based services. Note that none of the survey participants were working in an enterprise environment. For enterprise users, the answer may be different. For example, language-based services are not very useful for individual residential users, but can be very useful for a customer service center.

Handling feature conflicts

[Figure 6.7: Being aware of feature conflicts. Bar chart of the percentage of participants, overall and for groups 1, 2, and 3, selecting each answer to the survey question below.]

Survey question: You may create multiple services to handle different situations. For example, you may create a service to automatically answer calls from your boss. If you have a meeting tomorrow, you may then create another service to automatically reject calls during the meeting. Are you aware that there is a conflict between these two services?
1. Yes, I noticed the conflict as soon as I saw these two services, and could understand the conflict.
2. Yes, I can understand the conflict after I read the following hint. (Hint: If your boss calls you during the meeting, the first service will answer the call, but the second service will reject the call).
3. No, I cannot understand the conflict.
4. I cannot understand the question at all.
5. Other

Correctly handling feature conflicts is very important for successful service creation. We claim that end users must be involved in feature conflict resolution, which requires end users to understand feature conflicts. Hence, the survey asks participants whether they can understand feature conflicts. Figure 6.7 shows that 50% of the participants (50% for group 1, 60% for group 2, and 40% for group 3) were aware of the feature conflict we designed without any hint, and 40% of the participants (30% for group 1, 40% for group 2, and 60% for group 3) could understand the conflict with a hint. The result suggests that it is practical to present feature conflicts to users so as to involve them in resolving the conflicts.

[Figure 6.8: Detecting feature conflicts. Bar chart of the percentage of participants, overall and for groups 1, 2, and 3, selecting each answer to the survey question below.]

Survey question: For the conflicting services, how would you like to have the conflict be detected?
1. By myself.
2. By the service creation tool, such as CUTE.
3. By other professional service creators.
4. No need to detect the conflicts at all.
5. Other

[Figure 6.9: Handling feature conflicts. Bar chart of the percentage of participants, overall and for groups 1, 2, and 3, selecting each answer to the survey question below.]

Survey question: For the services conflicting, how would you handle the conflict?
1. I will manually change my services (e.g., manually set the second service with a higher priority than the first service so calls during the meeting will be rejected).
2. I would expect CUTE to provide some options (e.g., if your boss calls you during the meeting, option 1, answer the call; option 2, reject the call; option 3, forward the call to my voicemail) to me so I can select one option for handling the conflict.
3. I would expect CUTE to learn from my previous communication behaviors and automatically resolve the conflict for me.
4. I did not see any problems for the conflict so I do not need to handle the conflict.
5. Other

There are many different ways to involve users in resolving feature conflicts. Users may detect feature conflicts by themselves and manually resolve the conflicts. They may also use a tool, such as CUTE, to help them detect and resolve the conflicts. Two survey questions investigate users' preferences for detecting and resolving feature conflicts. Figure 6.8 shows that 85% of the survey participants (70% for group 1, 100% for group 2, and 100% for group 3) would like to use CUTE to detect feature conflicts. Figure 6.9 shows that 80% of the participants (90% for group 1, 80% for group 2, and 60% for

group 3) would like to handle feature conflicts based on the choices provided by CUTE, and 15% of the participants would like CUTE to automatically handle feature conflicts. Based on Figures 6.8 and 6.9, it is important to develop a feature interaction handling algorithm in CUTE to handle LESS-based feature conflicts.

Summary

In summary, the survey shows that relatively inexperienced users are willing and able to create their desired services, and that our LESS-based service creation tool, CUTE, fits their needs. In addition, many users can easily understand LESS source code. The survey also reveals that users can understand feature conflicts and would like to resolve them based on the choices provided by service creation tools, such as CUTE.

Part II

Handling feature interactions in LESS

Chapter 7

Handling feature interactions in LESS

In telecommunication systems, different services may interfere with each other, a problem known as feature interaction. "Feature interaction is an inevitable by-product of feature modularity" [149]. While modularity enables efficient service creation, it may also cause feature interactions among multiple service scripts running on one or more devices. When users create new services, they often focus on their immediate needs without checking existing services, so feature interactions are likely to arise between newly created scripts and the old ones. Feature interactions exist in all telecommunication systems, including telecommunication end systems. Because LESS is used to describe services running on telecommunication end systems, feature interactions can also happen among LESS scripts.

One design goal of LESS is to facilitate the easy detection and resolution of feature interactions among LESS scripts. In this chapter, I introduce a tree-merging algorithm to detect interactions among LESS scripts. The algorithm is based on the LESS action conflict tables, which I define for analyzing LESS-based feature interactions. Once a feature interaction is detected, my algorithm can identify the conditions that may cause the interaction, and the service management system I developed can then guide users in resolving any feature conflicts detected.

This chapter is organized as follows. In Chapter 7.1, I detail the differences between LESS and CPL in handling feature interactions and discuss the existing methods. I then present the LESS action conflict tables in Chapter 7.2. Chapter 7.3 discusses the tree-merging algorithm I developed based on those action conflict tables. In Chapter 7.4, I present the implementation of the LESS-based service creation and execution environment for handling feature interactions.

7.1 Related work

Several methods already exist to handle feature interactions in CPL, but because feature interactions in LESS are very different from those in CPL, those methods cannot fully handle the feature interactions in LESS. Xu et al. proposed to translate CPL into a formal language to check feature interactions [148]. Amyot et al. developed a tool called FIAT for filtering inconsistencies among features, then implemented a translator to convert CPL to the FIAT input language [12]. FIAT has a user-friendly web-based user interface for handling conflicts detected among CPL scripts. Nakamura et al. analyzed possible semantic warnings in an individual CPL script, then extended the analysis to multiple scripts by defining an operator to combine multiple scripts into one [86].

Because LESS is extended from CPL, we can use these approaches to detect and resolve some feature interactions. However, all the existing work focuses on call routing services because it is CPL-based. Thus, it does not handle signaling actions for endpoints such as accept, terminate, and call. These new signaling actions in LESS can introduce many interesting feature

interactions that could not occur among CPL scripts. Though the existing work may be extended to handle feature interactions in LESS, the work in this chapter, especially the action conflict tables, can be of great help in extending it.

I also noticed that the work of both Nakamura and Xu handles feature interactions among multiple users' scripts, because CPL scripts run on proxy servers that are usually owned by ISPs. ISPs may have the right to access multiple users' service scripts. However, LESS scripts usually run on endpoints that cannot access other users' scripts. Thus, in general, I do not expect to handle feature interactions among multiple users' LESS scripts. In fact, even if there were a centralized server that could access all the LESS scripts of multiple users, privacy concerns would make it difficult to resolve the detected interactions. For example, suppose user Bob has a script that keeps calling Alice every 10 minutes, but Alice has a script that rejects all calls from Bob. According to Nakamura's work, the combined script causes a CRAE (call rejection in all execution paths) semantic warning. But can we inform Bob that his calls to Alice will always be rejected? The answer depends heavily on the social context surrounding Bob and Alice. For example, in an enterprise environment that requires employees to share service information, Bob may be informed about the conflict. In this case, multi-user feature interaction handling is possible and very useful. But in a residential environment, to protect Alice's privacy, Bob should not be informed. In the latter case, the conflict cannot be resolved.

Neither Nakamura nor Xu addressed how to resolve feature interactions. The FIAT system of Amyot et al. provides a viable way to resolve feature interactions. FIAT has a web interface and involves end users in resolving feature interactions. It provides suggestions like Add EXCEPTION, DISABLE, SET PRIORITY, and TOLERATE, as

well as human-understandable explanations to guide users in resolving the detected feature interactions. I consider it one practical way to involve end users in resolving feature conflicts. My implementation also uses a popup dialog that shows users the detected feature conflicts and asks them to make a choice. My feature interaction detection algorithm can identify the conditions that may cause the detected interactions. The conditions can then be presented to users in a human-readable way so that the users can make decisions. The users' decisions can be saved in a service management system. The user interface in my system for resolving feature interactions differs from that in the FIAT system in that I do not define different suggestions for users, but instead ask users to choose among the actions they originally defined in their own services. Based on users' choices, my system can automatically prioritize, disable, or merge services, or tolerate the conflicts. In addition, I integrated my work on service learning and service risk management into my feature interaction handling implementation. Users can receive suggestions that have been learned from their call histories and have more options for reduced-risk call handling actions. I detail my implementation in Chapter 7.4.

There is also some existing work on feature interaction in policies [41], and on using logic programming [19][49] to detect feature interactions. These efforts can handle event-based call processing, which is also the call processing model for LESS and CPL. But they do not address the feature interactions in endpoints among services created by end users, which is the focus of this chapter.

Some researchers have proposed architectural approaches to deal with feature interactions in general. Architectural approaches attempt to clearly define the relationships among features to make feature composition possible. One simple architectural

approach statically defines the precedence of all the features, then executes the features in order. This approach is insufficient to deal with complex communication services because in many cases, the precedence of features is dynamic when performing communication services. Other more sophisticated architectural approaches are available to handle interactions among well-modularized features, such as the pipe-and-filter architecture [46], Distributed Feature Composition (DFC) [65], and the agent-based architecture [50]. I will not describe these architectural approaches in detail but note one fact that makes them unsuitable for end-user-created LESS scripts. All these approaches assume that features are carefully designed and modularized, and handle feature interactions based on this assumption. They assume that features follow the pipe-and-filter architectural style: "Feature components are independent, they do not share state, they do not know or depend on which other feature components are at the other ends of their calls (pipes), they behave compositionally, and the set of them is easily enhanced" [46][65]. This assumption holds for many existing PSTN services, which are designed by professional service designers and are carefully checked to make them compositional. However, for services created by inexperienced service creators, such as CPL or LESS scripts, this assumption is unlikely to hold. Users may create an ill-formed feature that overlaps with existing features or a monolithic script that should be divided into multiple modules. It is difficult to use architectural approaches to handle these ill-formed features in end systems. Sometimes, architectural approaches may need to reconstruct the ill-formed features and compose the reconstructed features to achieve the expected results. The reconstruction certainly gives the features a better format, but it is not the original format the users are familiar with.
For example, in Figure 7.1,

feature1 and feature2 conflict when the time is between 2:00PM and 3:00PM on Dec 25, 2004. The preferable service shown in the figure takes feature1's decision if the caller is sip:t@a.com, but takes feature2's decision if not. Defining the precedence between the two services cannot resolve the conflict. However, merging the two trees into one, as the figure shows, can easily resolve it.

[Figure 7.1: Merging two trees to get a preferable service. The figure shows three decision trees for incoming calls (feature-1, feature-2, and the preferable service), built from a time condition (2:00PM to 3:00PM, 12/25/04), a caller condition (caller is sip:t@a.com), and the actions redirect to sip:s@b.com, accept, and reject.]

7.2 Feature interaction detection in LESS

Feature interactions among multiple LESS scripts may occur if multiple actions are invoked at the same time. Sometimes there are no interactions among the features involved; sometimes the interactions are desired; but in many situations the features conflict. Consider the example of handling an incoming call in which one script performs an action to accept the call, while another script performs an action to log the call. The accept and log actions do not interact. In another example, when a user is already in a call session and receives a new call, one script performs an action to transfer the existing calls, while another script performs an action to automatically accept the new call. If there is only one audio input/output resource in the user agent, the accept action must be performed

after the transfer action. In this case, the feature interaction is desired. Based on this observation, the relationships among actions must always be analyzed. Because we must also consider the availability of resources of an endpoint, such as the number of audio devices, in deciding whether two actions conflict, we must analyze the relationship between feature interactions and end system capabilities.

E.J. Cameron et al. [29] classified feature interactions along three dimensions (customer-system, single-multiple user, and single-multiple component) and into five categories: SUSC (single-user-single-component), SUMC (single-user-multiple-component), MUSC (multiple-user-single-component), MUMC (multiple-user-multiple-component), and CUSY (customer-system). A component represents an entity that can perform services in a telecommunication system, such as a proxy server, a switch, or an IP phone. End system services usually experience single-user interactions. Therefore, in this chapter I focus only on SUSC and SUMC feature interactions. One exception is the interaction between the caller's preferences and the callee's service scripts, which involves multiple users. I discuss this kind of multi-user feature interaction in Chapter .

In the PSTN, feature interactions may occur when multiple users share one end device. However, in Internet telephony, different people usually have different URIs even if they own the same end device. This is different from the PSTN situation, in which multiple persons sharing one end device also share one logical address, the phone number. Because of this difference, in Internet telephony end systems, we can still perform single-user feature interaction handling even if multiple users use one device.

For single-user feature interactions, we can define preconditions and expected results for LESS actions. Based on the preconditions and expected results, we can construct action conflict tables and use the tables to detect feature interactions. Because action conflict tables are required to handle LESS-based feature interactions, the designers of new LESS extensions must define action conflict tables for their extensions. In the sections that follow, I categorize LESS actions into call control actions, presence notification actions, and other actions such as instant messaging and networked appliance control. For each category of actions, I first analyze the preconditions and expected results, then check both SUSC and SUMC feature interactions.

End system call control actions

The call control actions can be signaling or nonsignaling actions, and can take place in different call stages. Table 7.1 shows the actions.

Stage | Signaling actions | Nonsignaling actions
Incoming call | accept, reject, redirect | log, mail, wait (all stages)
Outgoing call | call |
Mid-call | transfer, media-update, merge |
Call termination | terminate |

Table 7.1: Call control actions

For signaling actions, the actions that belong to the same call stage usually conflict. For example, an end system can only choose one of the accept, reject, or redirect actions to handle an incoming call. Actions at different call stages may also interact. For example, accepting an incoming call and then transferring the call is a desirable interaction; however, rejecting a call and then transferring the call is an undesirable interaction. The nonsignaling actions do not conflict with the signaling actions.

To check feature interactions between two actions, we must define the execution order of the actions and check possible interactions in different orders. For example,

if we want to check the interactions between action A and action B, we first check the situation in which A is performed before B. It consists of two steps: checking whether A's result changes or conflicts with the precondition of B, and checking whether B's result changes the expected result of A. We then check the interactions in a different execution order with B performed first. The preconditions and expected result of each action are shown in Table 7.2.

action | precondition | expected call states | expected device states
accept | incoming call setup pending, media devices available | call setup finalized, a session is set up | media devices occupied
reject | call setup pending | call setup finalized | no change
redirect | call setup pending | call setup finalized | no change
call | media devices available | if accepted, a session is set up | if accepted, media devices occupied
transfer | one or more sessions, media devices occupied | if succeeds, all sessions terminated; otherwise, all sessions alive | if succeeds, media devices available; otherwise, media devices occupied
media-update | one or more sessions, media devices occupied | media transmission changed, e.g., held or muted |
merge | one or more sessions, media devices occupied | all sessions merged | media devices occupied
terminate | one or more sessions, media devices occupied | all sessions terminated | media devices available
Table 7.2: The context assumption and expected result of call control actions

I further investigate the cause of feature interactions and find five kinds of interactions. I call the first action conflict, in which the expected results of two actions conflict, e.g., the conflict between an accept and a reject action.

I call the second attribute conflict, in which two actions have the same name but different attributes. LESS modifiers should be treated as action attributes. Two actions with the same name conflict if their modifiers are different. For example, two scripts conflict if they both perform redirect actions, but to separate locations.

I call the third interaction resource competition conflict. Two actions may compete for resources such as audio devices. For example, if there is only one audio device in an end system, two calls using the one audio device will cause a conflict, such as accepting a call and making an outgoing call to a new address at the same time.

I call the fourth interaction disabling conflict. It occurs when one action's expected results make another action's preconditions impossible. For example, if a script terminates existing sessions, another script cannot execute a media-update action because the precondition of media-update assumes one or more existing sessions. The disabling conflicts are similar to the semantic warnings in the paper by Nakamura et al. [86].

I call the last kind of interaction enabling interaction. It occurs when one action's expected results enable another action's preconditions. This kind of interaction is desirable. For example, the accept action can enable the transfer action for the same call.

SUSC feature interactions

Table 7.3 shows the conflict table for call control actions. One assumption in the table is that a call usually requires audio, and there is only one audio device in an end device. For video and text conversations, people can watch multiple video windows and handle multiple instant messaging sessions simultaneously if the CPU power or network bandwidth

allows, so we do not consider resource competition for video and text conversations.

First\Later | accept | reject | redirect | transfer | merge | m-update | term | call
accept | A(m) | C | C | E | E | E | E | R
reject | C | A(r) | C | D | D | D | D | E
redirect | C | C | A(a) | D | D | D | D | E
transfer | E | - | - | A(a) | C | C | C | E
merge | R | - | - | C | - | C | C | R
m-update | R1,E1 |  |  | C | C | A(m) | C | R1,E1
term | E | - | - | D | D | D | - | E
call | R | - | - | E | E | E | E | R

m-update: media-update, term: terminate, -: no interaction
C: action conflict
A(m): attribute conflict on media, e.g., using video or not
A(r): attribute conflict on code/reason, e.g., using 4xx or 6xx response in SIP
A(a): attribute conflict on address, e.g., redirect to different addresses
E: enabling, D: disabling conflict, R: resource competition
R1: media-update for unholding calls competes for resources with call or accept
E1: media-update for holding calls enables call or accept
Table 7.3: Call control action conflict table for handling incoming trigger

The table is asymmetrical. The cell of row m, column n and the cell of row n, column m may not have the same values. In this table, row actions are performed before column actions. For example, row 1, column 4 means accept then transfer.
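The ordered lookup described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the SIPC implementation: the dictionary copies a handful of cells from Table 7.3, and the names CONFLICT_TABLE, check_pair, and may_conflict are hypothetical.

```python
# A few cells copied from Table 7.3. Keys are (first action, later action),
# so the table is naturally asymmetrical.
CONFLICT_TABLE = {
    ("accept", "reject"): "C",         # action conflict
    ("accept", "transfer"): "E",       # accept enables a later transfer
    ("accept", "call"): "R",           # both need the single audio device
    ("reject", "transfer"): "D",       # reject disables a later transfer
    ("transfer", "reject"): "-",       # no interaction in this order
    ("terminate", "transfer"): "D",    # terminate disables a later transfer
    ("redirect", "redirect"): "A(a)",  # conflict only if addresses differ
}

# Codes treated as undesirable in this sketch (hypothetical grouping).
UNDESIRABLE = {"C", "D", "R"}


def check_pair(first, later):
    """Return the table code for `first` performed before `later`."""
    return CONFLICT_TABLE[(first, later)]


def may_conflict(first, later):
    """True for C, D, R, and for attribute conflicts A(...), which still
    need an attribute comparison before they count as real conflicts."""
    code = check_pair(first, later)
    return code in UNDESIRABLE or code.startswith("A(")


if __name__ == "__main__":
    print(check_pair("reject", "transfer"))    # reject, then transfer
    print(check_pair("transfer", "reject"))    # the reverse order
    print(may_conflict("accept", "transfer"))  # enabling is not a conflict
```

Keying the dictionary on the ordered pair keeps the asymmetry explicit: the reverse order is a separate entry rather than an implied mirror of the first.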

SUMC feature interactions

A user's service scripts can be hosted on the user's end devices and signaling servers in the network. However, scripts in different places may interact. For example, if the scripts on a SIP proxy server reject all calls, the scripts on the destination end devices can never be executed. When a proxy server proxies a call to all the end devices of a user in parallel, if one of the end devices (e.g., the voicemail server) automatically accepts the call immediately, the other end devices never have the chance to accept the call. These examples involve one user but multiple end devices for the user, so they are SUMC feature interactions. I divide these kinds of SUMC feature interactions into two categories: end system–proxy server and end system–end system feature interactions.

End system–proxy server feature interactions: End system–proxy server feature interactions are caused when the CPL scripts on proxy servers interact with the LESS scripts on end systems. There are only three signaling actions for CPL: proxy, redirect, and reject. Every action may interact with the actions on end systems. For incoming calls, proxy server scripts are executed before end system scripts. They may interact in two ways: proxy server scripts may block the execution of end system scripts, or proxy server scripts may overlap with end system scripts. For outgoing calls, end system scripts are executed before proxy server scripts, and proxy server scripts may modify the result of end system service scripts.

Table 7.4 shows the possible interactions for incoming call handling. As shown in the table, if a proxy server uses the proxy action to handle an incoming call, and an end system tries to use the accept action to handle the same call, the proxy action blocks the execution of the accept action. The accept action is executed only if the target URI of the proxy action is equal to the end system's URI. So

I use block+ to mark the interactions that depend on the target URI of the redirect or proxy actions.

Server\End system | accept | reject | redirect
reject | block | overlap | block
redirect | block+ | block+ | block/overlap+
proxy | block+ | block+ | block+
+: depending on the URI to which to redirect or proxy a call
Table 7.4: Interactions between services on end systems and proxy servers for incoming call handling

Table 7.5 shows the possible interactions for outgoing calls and call termination handling. In this table, end system actions are performed before proxy server actions.

End system\Server | reject | redirect | proxy
call | block | modify | -
transfer | block | modify | -
terminate | N/A | N/A | -
Table 7.5: Interactions between services on end systems and proxy servers for outgoing call and call termination handling

End system–end system feature interactions: End system–end system feature interactions involve multiple end devices belonging to one user. These kinds of feature interactions are the most complicated feature interactions for end system services. They sometimes also involve service scripts running on proxy servers. For example, if a SIP proxy server sequentially proxies a call with a voicemail server as the last one, the auto-accept script running on the voicemail server does not affect other end devices' behavior. However, a proxy server can also proxy a call to a number of locations at the same time. This

type of parallel proxying is known as parallel forking [106]. When a proxy server does parallel forking, if the timeout value of the auto-accept script running on the voicemail server is set to zero or to a very small value, it may render other end devices unable to accept incoming calls. Thus we must also consider proxy server scripts when we handle end system–end system feature interactions. Table 7.6 summarizes the action conflicts between two end systems.

End system 1\End system 2 | accept | reject | redirect
accept | C | - | C
reject | - | - | A(r)
redirect | C | A(r) | A(am)
C: action conflict, A(r): conflict based on reason attribute, A(am): conflict based on address and media attribute
Table 7.6: Action conflicts between two end systems for incoming call handling

The table shows that during the call setup stage, the forking proxy in Internet telephony systems may cause multiple end devices to receive an incoming call setup request at the same time. If they all try to accept the call, or try to redirect the call to separate locations, a feature conflict occurs. To resolve this kind of feature conflict, the forking proxy must choose a single best response. In SIP, there are different status codes for rejecting a call. If one end system rejects a call using a 600 Busy Everywhere status code, other end systems should not try to redirect the call. This is similar to the Basic Rule BR1 in [148]. Typically, a person would not talk with another person using two different end devices of the same media type. The A(am) in the table indicates such feature interactions. The redirect action may also cause call forwarding loops, which can be detected by checking the locations of the actions based on the table.
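The loop check mentioned above can be sketched as a walk over redirect targets. This is an illustrative sketch only: representing each device's redirect action as a mapping from its own URI to a target URI, the example.com addresses, and the name find_forwarding_loop are all assumptions, not part of LESS or SIPC.

```python
def find_forwarding_loop(redirects, start):
    """Follow redirect targets from `start`.

    `redirects` maps a URI to the URI its script redirects calls to
    (a hypothetical representation of the redirect actions).
    Returns the URIs that form a loop, in order, or None if the chain ends.
    """
    path = []    # URIs visited so far, in order
    seen = {}    # URI -> position in `path`
    current = start
    while current in redirects:
        if current in seen:
            # We came back to a URI already on the path: that tail is the loop.
            return path[seen[current]:]
        seen[current] = len(path)
        path.append(current)
        current = redirects[current]
    return None  # reached a URI that does not redirect further


if __name__ == "__main__":
    rules = {
        "sip:desk@example.com": "sip:mobile@example.com",
        "sip:mobile@example.com": "sip:home@example.com",
        "sip:home@example.com": "sip:desk@example.com",
    }
    print(find_forwarding_loop(rules, "sip:desk@example.com"))
```

Because each URI has at most one redirect target in this representation, a single walk is enough; a script that redirects to several locations would need a graph search instead.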

Action conflicts between two end systems can be more complicated, and not just for incoming call handling. For example, two end systems that handle two different calls but transfer their calls to the same destination may cause the destination to reject one call. Table 7.7 shows the more complicated action conflicts between two end systems.

End system 1\End system 2 | call | transfer | terminate
accept | A(am) | A(am) | -
reject |  |  |
redirect | A(am) | A(am) | -
call | A(am) | A(am) | -
transfer | A(am) | A(am) | -
terminate |  |  |
A(am): conflict based on address and media attribute
Table 7.7: More complicated action conflicts between two end systems

End system presence and event notification actions

I base the event-based services on the SIP event notification architecture [98]. This architecture is explained elsewhere in this thesis. Feature interactions for event-based services can be SUSC or SUMC interactions, depending on whether the PUA and the PA of a user are co-located or not. If a PUA and a PA are co-located, we must deal with SUSC feature interactions; otherwise, we must deal with SUMC interactions.

Actions related to event-based services can be divided into two categories. One category handles incoming subscriptions, such as approve or deny. The other sends outgoing messages, such as the subscribe and notify actions. Feature conflicts can be categorized into action conflicts and action attribute conflicts, the same as those I defined above for call control actions.

SUSC feature interactions

For an incoming subscription, approve and deny actions conflict. Approve actions with different attributes, such as different subscription expiration times, also conflict. Similarly, deny actions having different reasons conflict. Subscribe and notify actions do not conflict with other actions, but they may cause action attribute conflicts. Two subscribe actions will conflict if they have the same destination and the same event package, but differ in other attributes such as the expiration time. Two notify actions will conflict if they have the same destination and the same event package (e.g., presence), but different event descriptions (e.g., one presence status is open, the other is closed).

SUMC feature interactions

For an incoming subscription, a PA can decide whether to approve or deny it. A PA can also set the subscription status as pending, and send a notification to the PUA about the watcher-list changes [101]. Once the PUA gets the watcher-list, the PUA will use XCAP [99] to update the watcher-list document on the PA to authorize the subscription. A user may have multiple PUAs. Feature interactions may occur among the PA and all the PUAs involved in handling incoming subscriptions. For an incoming subscription, if a PA and a PUA make different decisions (for example, the PA approves a subscription but the PUA denies the subscription), an action conflict occurs.

Other end system services

In Internet telephony, end systems can support instant messaging and networked appliance control. Therefore, I also analyze feature interactions for these services.
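Before turning to these services, the attribute-conflict rule given above for notify actions can be written down as a small predicate. This is a hedged sketch: the Notify record, its field names, and notify_conflict are hypothetical and only mirror the three attributes named in the text (destination, event package, event description).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Notify:
    """Hypothetical record for a LESS notify action."""
    destination: str    # watcher that receives the notification
    event_package: str  # e.g. "presence"
    description: str    # e.g. "open" or "closed"


def notify_conflict(a, b):
    """Two notify actions conflict when they target the same destination
    and event package but carry different event descriptions."""
    return (a.destination == b.destination
            and a.event_package == b.event_package
            and a.description != b.description)


if __name__ == "__main__":
    first = Notify("sip:watcher@example.com", "presence", "open")
    second = Notify("sip:watcher@example.com", "presence", "closed")
    print(notify_conflict(first, second))
```

The same shape would apply to subscribe actions, with the expiration time taking the place of the event description.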

Feature interactions for instant messaging

LESS has only one action for instant messaging, namely sendmsg, which is used to send a message. If we ignore the content of messages, there are no conflicts between multiple sendmsgs. However, the content of a sendmsg may have special meanings in some circumstances. For example, if we use SIP MESSAGE to perform shared web browsing [139], the message content is used to convey URL information. In general, two sendmsgs with different content conflict.

Instant messaging may also experience SUMC feature interactions. One incoming message may be sent to multiple contacts of a user. If more than one contact can automatically send a message back, SUMC feature interactions may occur. There is little difference between SUSC and SUMC instant messaging feature interactions: both depend on whether the message content conflicts.

Feature interactions for networked appliance control

Internet telephony user agents can control networked appliances. For example, when getting an incoming call, a UA can automatically turn off a nearby stereo. Networked appliance control services can be very complicated. Different sensors may trigger different control actions, and the actions performed by multiple networked appliances may conflict. For example, turning on an air conditioner to cool a room and turning on a heater to warm the room conflict. Kolberg et al. described these kinds of feature interactions in detail in [69].

If multiple LESS scripts try to control a networked appliance to perform different actions, feature interactions may occur. Different networked appliances may have different interactions. For example, when controlling a lamp, power on and power off

conflict. To control a stereo, the power on action enables the tune action. To analyze feature interactions, we must first identify the appliance we want to control and its available control actions. We must then build the preconditions and expected result tables for the control actions. From these tables, we can build the action conflict tables for the device.

I choose to use lamp control as an example for feature interaction analysis. Keep in mind that networked appliance control actions may interact with call control actions. For example, the brightness of the lamp in a person's room may affect the perceptual quality of the person's video communications.

The commands for a lamp can be power on, power off, dim, and bright. Table 7.8 shows the preconditions and expected results of the lamp control actions.

action | precondition | expected result
power on | - | The lamp is on.
power off | - | The lamp is off.
dim | The lamp is on. | The lamp is dimmer and still on.
bright | The lamp is on. | The lamp is brighter and still on.
Table 7.8: Context assumption and expected results of lamp control actions

Table 7.9 shows the conflict table, which assumes that multiple scripts are trying to control the same device at the same time. I also added two call control actions, accept and call, to show that networked appliance control actions may cause feature interactions with call control actions. In the table, the power on action makes dim and bright possible, and the bright or power on action may provide a better environment for a video call, so I use enabling to mark this kind of interaction.

First\Later | power on | power off | dim | bright | accept | call
power on | - | C | E | E | E(V) | E(V)
power off | C | - | C | C | C | C
dim | - | C | A | C | C | C
bright | - | C | C | A | E(V) | E(V)
A: attribute conflict, C: conflict, E: enabling, E(V): enabling video communications
Table 7.9: Interactions between lamp control actions

Networked appliance control intrinsically involves multiple components: the controller and the appliances. If multiple users try to control one networked appliance at the same time, MUMC feature interactions may occur. If all users access one device through the same appliance controller, e.g., a networked appliance gateway, the policies residing on the controller may help resolve the conflicts. For example, the controller administrator may define user priorities. User actions having higher priority may override user actions having lower priority. If multiple users access a device through different controllers, intercontroller communication is required to solve possible feature conflicts. This kind of feature interaction is beyond the scope of the LESS-based feature interaction handling.

Feature interactions between caller's preference, end system's capabilities and users' service scripts

Sometimes, a caller may explicitly express preferences in a call signaling message [108]. These preferences include the ability to select which URI a request is routed to, and to specify request handling directives in proxy servers. For example, the Reject-Contact: *;mobility="mobile" header in a SIP INVITE request expresses a desire not to route a call to a mobile device. The caller's preferences may conflict with the callee's service scripts. These kinds of conflicts cannot be detected offline. However, they

are easy to detect by checking the callee's service script actions and the value of the Reject-Contact header in the caller's SIP message. If a feature interaction occurs, the caller's preferences should override the callee's service script actions. For example, Alice uses a mobile phone and has a service script that automatically accepts calls from Bob. However, Bob does not want to talk to a mobile phone, so he puts Reject-Contact: *;mobility="mobile" in his request. In this case, Alice's phone should not accept the call.

Sometimes, service script actions may also conflict with end system capabilities. For example, for an incoming video-only call to an end system with only audio capability, an accept action is not appropriate. In this case, system capabilities should always override script actions. In this example, the end system should prompt the user for proper handling.

The action conflict tables in previous sections contain several assumptions about end system capabilities, e.g., with one audio input/output device, or with enough bandwidth and CPU resources to handle multimedia calls. Although these assumptions hold for many Internet telephony endpoints, some endpoints may have more or fewer resources. Most of the action conflict table elements will not be affected by the difference. But some action conflicts, such as the conflicts caused by resource competition, should be adjusted to reflect the difference. In my implementation, I collect system information and adjust action conflict tables based on that information.
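One way to picture the adjustment is a function that rewrites resource competition entries once more audio devices are available. The sketch below is an assumption about how such an adjustment could look, not a description of the SIPC code: the rule that two or more audio devices turn an "R" entry into "-" and the name adjust_for_capabilities are both hypothetical.

```python
def adjust_for_capabilities(table, audio_devices):
    """Return a copy of an action conflict table adapted to the device.

    `table` maps (first action, later action) to a conflict code.
    Assumption made for this sketch: with two or more audio devices,
    resource competition ("R") no longer applies and becomes "-".
    All other codes are left untouched.
    """
    if audio_devices < 2:
        return dict(table)
    return {pair: ("-" if code == "R" else code)
            for pair, code in table.items()}


if __name__ == "__main__":
    base = {("accept", "call"): "R", ("accept", "reject"): "C"}
    print(adjust_for_capabilities(base, audio_devices=1))
    print(adjust_for_capabilities(base, audio_devices=2))
```

Returning a new dictionary keeps the default table intact, so the adjustment can be recomputed whenever the collected system information changes.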

Using tree-merging to detect and resolve feature interactions

Because LESS has a tree-like structure, it is straightforward to merge multiple LESS decision trees into one to resolve feature conflicts. After merging, there is only one active LESS script for an end device for each trigger. The merging algorithm holds for services running on the same device. However, for service scripts on different devices, such as the SUMC feature interactions discussed in Chapter 7.2.1, the merging algorithm can only detect feature interactions; it cannot resolve them.

In my implementation, I keep the original scripts after merging so that users can modify them independently of the merged script. Although users can edit services based on a merged script, editing services based on the original scripts could be easier since they are created by the users themselves, while the merged script is created by a machine. In this way, no conflicts arise when we execute service scripts, and we can still keep service modularity so users can easily maintain their services and create new services efficiently. One potential problem in this approach is that modifications to the original scripts may interact with users' decisions for resolving earlier conflicts. This problem can be easily resolved by using the same algorithm to detect feature conflicts between the merged tree and the modified script.

Tree merging algorithm

In a LESS decision tree, the path from the root of the tree to each leaf node is called a decision rule. A rule is composed of a trigger, the actions in accordance with the trigger, and a list of switch nodes and action nodes along the path from root to action node. I call the list of switch nodes and action nodes a rule path. For example, for the script in

Figure 7.2, a decision rule can be represented as {incoming, accept, {{string-switch, organization="abc Inc."}, {address-switch, origin="sip:tom@abc.com"}, {string-switch, subject="group meeting"}}}.

    <less>
      <incoming>
        <string-switch field="organization">
          <string is="abc Inc.">
            <address-switch field="origin">
              <address is="sip:tom@abc.com">
                <string-switch field="subject">
                  <string is="group meeting">
                    <accept/>
                  </string>
                </string-switch>
              </address>
              <otherwise>
                <location ...>
                  <redirect/>
                </location>
              </otherwise>
            </address-switch>
          </string>
        </string-switch>
      </incoming>
    </less>
Figure 7.2: Sample script for defining decision rules

A decision consists of the items in a rule path and the actions in accordance with the trigger. A rule path can be validated based on Nakamura's work [86]. Changing the order of items in a valid rule path does not affect the decision.

To facilitate rule merging, we must normalize the rules generated from LESS decision trees. The normalization process sorts switches in a rule path, e.g., ordered as address-switch, time-switch, status-switch, string-switch, priority-switch, where-switch, language-switch, and event-switch. It also merges the switch nodes with the same switch name into one node in a rule path. For example, a normalized rule for the script above is {incoming, accept,

{{address-switch, origin="sip:tom@abc.com"}, {string-switch, subject="group meeting", organization="abc Inc."}}}. Because switches are independent of each other, normalized rules are functionally equal to the original rules. The overall multiscript merging process is shown in Figure 7.3.

    set base-rule-set empty
    foreach LESS-tree {
        convert the LESS-tree into a rule set
        foreach rule in the rule set {
            normalize the rule
        }
        merge the normalized rule set into base-rule-set
    }
    convert merged base-rule-set into a decision tree
Figure 7.3: Tree merging process

    if (two rules have different triggers) {
        no rule conflict
    } elseif (actions in two rules do not conflict) {
        no rule conflict
    } elseif (no overlap between rule path in two rules) {
        no rule conflict
    } else {
        two rules conflict, return the rule path overlap and action conflict information
        prompt the script owner to judge
    }
Figure 7.4: Checking two rules conflict or not

Figure 7.4 shows the conflict checking process of two rules. During the process, we can use the action conflict table in Chapter 7.2 to check whether two actions conflict. If the actions in two rules conflict, we must check whether any conditions match both

rule paths. I call the conditions matching two or more rule paths an overlap. Figure 7.5 shows the algorithm for determining an overlap.

    set overlap-set empty
    foreach switch-node1 in rule-path1 {
        if (there is a switch-node2 in rule-path2 that has the same switch name) {
            if (the overlap between switch-node1 and switch-node2 is empty) {
                return empty overlap-set
            } else {
                insert the overlap into overlap-set
            }
        } else {
            insert switch-node1 into overlap-set
        }
    }
    foreach switch-node2 in rule-path2 {
        if (there is not a switch-node1 in rule-path1 that has the same switch name) {
            insert switch-node2 into overlap-set
        }
    }
    return overlap-set
Figure 7.5: Determining overlap between two rule paths

Once we find the overlap and the conflicting actions, we can present the information to users to make decisions. It is straightforward to form a human-understandable description because LESS defines a very limited number of switches and actions. We can simply design the description format for each switch and action, and then compose a sentence based on the description format. In general, there is no need to use complicated natural language processing techniques to present the conflict information. For example, for the situation in Figure 7.1, we can ask the user: For an incoming call, if the time is between 2:00PM and 3:00PM on Dec 25, 2004, and if sip:tom@abc.com calls you, what

would you like to do? We can provide two choices for the user: redirect the call or automatically accept the call. We can record the user's decision and build a normalized rule set without conflicts. Any resolution of feature interactions for end system services must involve end users because only they can make decisions about what they need. Because end systems can directly interact with end users, and end users can directly modify their scripts, involving end users in feature interaction resolution is practical and necessary.

After resolving feature interactions, the normalized rules should be converted back to a decision tree for service execution. The conversion process is straightforward. It starts with an empty tree, then goes through every rule and incorporates the rule's switches into the tree. Finally, it puts the rule's actions at the appropriate branches of the tree.

Feature interactions caused by multiple triggers

    for decision trees with different triggers {
        rename every trigger as common-trigger
        store the original trigger information
    }
    perform regular tree-merging algorithm for common-trigger
    if (there is resource competition feature conflict) {
        present the conflict along with the original trigger information
        modify service scripts based on the decision of the user
    }
Figure 7.6: Detecting feature interactions among scripts with different triggers

In most cases, no feature interactions arise between decision trees having different triggers (root nodes). However, occasionally, actions caused by different triggers compete for resources. For example, a timer trigger may invoke a call action at the same


time as an incoming call is automatically accepted. Both the outgoing call caused by the timer and the accepted incoming call will try to use the one audio device and cause a conflict. For an end system with limited resources, if multiple scripts have different triggers, we can use the tree-merging algorithm to detect possible resource competition. Figure 7.6 shows the algorithm.

7.4 Implementation

Figure 7.7: SIPC's service manager

I have implemented a SIP-based user agent, SIPC, which I discuss in detail in Chapter 11. SIPC has two interfaces for service programming, a SIP common gateway

interface (SIP CGI) [71], and a built-in LESS interpreter to handle LESS service scripts. The SIPC service manager cannot detect feature interactions between SIP CGI programs and LESS scripts because SIP CGI programs can be written in a variety of programming languages, such as C/C++, Perl, and Java. It can, however, detect and help resolve feature interactions among LESS scripts.

As I introduced in Chapter 5, I built a service creation tool called CUTE and integrated it into SIPC. Users can use CUTE to build LESS decision trees. Once a LESS service script is created, SIPC can show the script in its service manager, as shown in Figure 7.7, and use it to handle users' communication events. A user can activate or deactivate services in the SIPC service manager. If there is more than one active LESS script and they are in conflict, the service manager can use the algorithm in this chapter to detect the interactions and ask the user to make a decision.

Figure 7.8: Resolve feature interactions

Figure 7.8 shows the user interface for resolving a feature conflict. As the figure

155 135 shows, the SIPC service manager translates the context that can cause feature conflicts into human-readable language, then asks users to choose an action to perform. The translation is made straightforward by using the trigger and rule-path overlap information of two services. Once a user makes a decision, the SIPC service manager tries to prioritize services to resolve the conflict because the prioritization is easy for people to understand. As Figure 7.7 shows, services are listed in order with the services listed first executed first. Figure 7.8 shows that if a user chooses to reject a call according to the In conference service, the In conference service is listed at the top of the Accepting calls from William service. Users can also manually change the order of services by clicking the Move up or Move down button. This is similar to the method defined in FIAT[12], but I do not explicitly ask users to set priority among or disable features. If all the branches of a feature are redundant or shadowed [12] by features having higher priority, the feature will be automatically disabled. Figure 7.7 shows the disabled feature in light gray. Compared with merging services, prioritizing services can help improve the feature conflict resolution. In general, end users are not expected to create a very complicated service manually. I assume that the depth of LESS decision trees created by end users will be no more than 10. In our performance test, which was run on a modest PC (2.0 GHz AuthenticAMD processor, 1.0 GB memory, running Windows XP Professional), with a modest load (around 50 percent CPU usage by a load generator), and using Tcl (an interpreted language) to implement our feature interaction detection algorithm, the longest delay I observed for detecting feature conflicts between two LESS decision trees, each with a tree depth of 10, was 2.5ms. 
However, after merging, the depth of a merged tree can, in the worst case, be the sum of the depths of all the trees joining the

merge. The outcome can cause longer delays in calculating feature conflicts. Thus, the SIPC service manager always tries to prioritize services to resolve feature conflicts. However, as I discussed in Chapter 7.1, some conflicts cannot be resolved by prioritizing services, especially when three or more services are in conflict. For example, a user has three features in conflict: Fa, Fb, and Fc. Because the conflicts may be on different branches of the decision trees of the features, the user's choices may cause Fa>Fb, Fb>Fc, and Fc>Fa (here, Fa>Fb means feature Fa has higher priority than feature Fb). We cannot prioritize Fa, Fb, and Fc here to resolve the feature interactions. In fact, the orders of features imply a directed graph. If the graph contains a cycle, we cannot prioritize features to resolve the feature interactions. Instead, we must merge the features in the cycle into one by using the tree-merging algorithm. Figure 7.9 shows the algorithm to find the cycle.

    // For a feature F, I use F.higher to represent the set of
    // features that have higher priority than F, and F.lower for
    // the features with lower priority than F.
    for every two features Fa and Fb in a user's feature set {
        if (Fa has higher priority than Fb) {
            if (Fa.higher contains Fb) {
                there is a cycle
                use DFS (Depth-First Search) to find out the cycle and merge
            } else {
                add Fa into Fb.higher
                foreach feature Fx in Fb.lower, add Fa into Fx.higher
                add Fb into Fa.lower
                foreach feature Fy in Fa.higher, add Fb into Fy.lower
            }
        }
    }

Figure 7.9: Finding the scripts to merge

The SIPC service manager does not list the merged services; instead, it still lists

the original scripts but groups them in the list. Each original script has a reference to the merged script, and every action in the merged script has a reference to its original script. This way, users can still edit their original scripts while the merged script is used for call handling.

The options provided by the SIPC service manager are not limited to the services defined by the user. Integrating our service risk management implementation [145] into the SIPC service manager offers users additional options. These options give users a chance to choose a reduced-risk action to handle a conflict. For example, as Figure 7.8 shows, the reject and accept actions are defined by the user. For each user-defined option, there are several alternatives within that option. For example, a conference attendee has an incoming call from sip:william@cs.columbia.edu, but the user does not want to answer the call during a conference session. However, the attendee also does not want to risk losing important calls from William by simply rejecting the call. The additional options allow the user to transfer the call to a contact person, or to reduce risk by using another communication method such as email or instant messaging. Figure 7.7 shows the alternative options for rejecting and accepting calls. The additional contact information can be configured in the configuration dialog of the SIPC service manager, as shown in Figure 7.10. If a user chooses an alternative option, he or she in fact creates a new service. The new service is listed in the SIPC service manager and receives higher priority than the two existing services.

The configuration dialog also shows system properties. On a Windows platform, I use the systeminfo command to obtain CPU and memory information, the ping command to obtain approximate bandwidth information, the waveInGetNumDevs() and waveOutGetNumDevs() functions to obtain the number of available audio devices, and the ICreateDevEnum interfaces to obtain the number of available video capture devices. From this information, the SIPC service manager calculates the maximum number of audio and video sessions the system can support and adjusts the conflicts that may cause resource competition, shown in Tables 7.3, 7.6, and 7.7, to reflect the end system's capabilities. Those capabilities are also used to handle feature interactions between a caller's preferences and a callee's capabilities, and are sent to other parties by using the SIP NOTIFY method.

Figure 7.10: Configuration dialog

Figure 7.11: Events and actions log

If a user does not want to handle the detected feature conflicts, he or she can simply click the Cancel button shown in Figure 7.8 to tolerate the conflicts. This action keeps the current order of the features unchanged.

To make feature interaction resolution transparent to users, I am integrating the service learning implementation [145], which I introduce in detail in Chapter 8, into the feature interaction handling process. By learning from users' call histories, I can infer users' preferences for handling feature conflicts and suggest a default choice for each user. In my current implementation, I check the probability of the occurrence of

a feature conflict from users' event logs and action logs, as shown in Figure 7.11. The SIPC service manager can offer suggestions to users based on these probabilities. Currently, as a default setting, the SIPC service manager checks events for the last 30 days. If a situation that caused a feature conflict did not occur within the last 30 days, the SIPC service manager suggests that the user ignore the feature conflict. Otherwise, it tells the user the number of times the situation occurred in the last 30 days and suggests that the user choose an option instead of clicking the Cancel button. As shown at the bottom of Figure 7.8, the service manager finds that sip:william@cs.columbia.edu did not call the user between 09:00 AM and 10:00 AM in the last 30 days. It then infers that sip:william@cs.columbia.edu may not call during the same time period in the future and suggests that the user ignore the feature interaction.

7.5 Conclusion and future work

One LESS design goal is to make it easy to detect feature interactions among LESS scripts. I propose a tree-merging algorithm to handle LESS feature interactions based on the action conflict tables presented in this chapter. I investigated a variety of feature interactions among LESS scripts and showed that my method can easily handle feature interaction detection and resolution for a language, such as LESS, that has a tree-like structure. I have developed a LESS-based service creation and management environment with my feature interaction handling method built in. I then integrated this service creation and management environment into SIPC, the SIP user agent I developed. I also did some preliminary work on building a user-friendly interface to help end users better understand feature interactions and resolve any interactions they encounter. My work includes integrating my service learning implementation into the feature interaction

handling process. Service learning may make feature interaction resolution transparent to users in some situations, although I believe that in many cases feature interaction handling for end system services requires involving the end users.
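The priority-cycle check of Figure 7.9 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Tcl code used in SIPC: the class name `PriorityGraph` and its methods are hypothetical, and the sketch keeps the full transitive closure of the `higher` and `lower` sets, which is slightly stronger than the updates written out in Figure 7.9.

```python
from collections import defaultdict


class PriorityGraph:
    """Records user choices of the form 'fa outranks fb' and reports cycles."""

    def __init__(self):
        self.edges = defaultdict(set)   # direct choices: fa -> {fb, ...}
        self.higher = defaultdict(set)  # F.higher: every feature that outranks F
        self.lower = defaultdict(set)   # F.lower: every feature that F outranks

    def add_priority(self, fa, fb):
        """Record fa > fb. Return the features on a cycle, or None if consistent."""
        if fb in self.higher[fa]:
            # fb already outranks fa, so adding fa > fb would close a cycle.
            return self._path(fb, fa)
        self.edges[fa].add(fb)
        above = self.higher[fa] | {fa}
        below = self.lower[fb] | {fb}
        for feature in above:
            self.lower[feature] |= below
        for feature in below:
            self.higher[feature] |= above
        return None

    def _path(self, start, goal):
        """Depth-first search along direct choices from start to goal."""
        stack = [(start, [start])]
        visited = set()
        while stack:
            node, path = stack.pop()
            if node == goal:
                return path
            if node in visited:
                continue
            visited.add(node)
            for nxt in self.edges[node]:
                stack.append((nxt, path + [nxt]))
        return None


graph = PriorityGraph()
graph.add_priority("Fa", "Fb")
graph.add_priority("Fb", "Fc")
cycle = graph.add_priority("Fc", "Fa")  # ['Fa', 'Fb', 'Fc']
```

The features returned in `cycle` are the ones that would be handed to the tree-merging algorithm; the conflicting choice itself is not recorded.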

Part III Service learning and service risk management

Chapter 8 Service learning and service risk management

One of the key advantages of Internet telephony is its ability to provide many new communication services. However, most users may not be aware of what services are available and may not know how to customize or create their own services. It would be a great help to users if a service creation environment could automatically generate the desired services. Thus, I consider using machine learning for automatic service creation.

Machine learning for service creation is applicable in Internet telephony systems for the following reasons. Internet telephony signaling can convey more information to end users than the signaling in the PSTN. The information allows people to make sensible call decisions, so the information and people's call decisions have a causal relationship. This causal relationship makes service learning possible and useful. In addition, Internet telephony end systems usually have more computational capabilities, so they can execute programmable call handling services and can easily collect users' communication behaviors for learning.

I believe many services can be learned from users' communication behaviors, for example, phone spam filtering; call handling based on the callee's location (as defined in the GEOPRIV location object format [93]), media capabilities (as defined in the User Agent Capability Extension to Presence Information Data Format [74]), and status (as defined in Rich Presence Extensions to the Presence Information Data Format (RPID) [116]); call routing based on the caller's address; time-based call handling; and call handling based on the priority, preferred language, and subject of calls. There are many parameters for call decision making. Users may not even know some of the parameters, let alone how to use them. In addition, creating such services manually is tedious and error-prone. Thus, it would be very helpful to users if proxies and end systems could create rules for call decision making automatically by learning from observed user behaviors.

There are four tasks for service learning: representing communication behaviors, finding a learning algorithm, representing the learned results, and handling service risk management. I chose to use decision trees to represent communication behaviors and detail the rationale of the choice in Chapter 8.1. Chapter 8.2 gives the criteria for choosing a decision tree learning algorithm. Chapter 8.3 shows how to represent a decision tree as a LESS script. Chapter 8.4 analyzes the potential service risks caused by service learning and presents several approaches to create fail-safe services. Chapter 8.5 describes how to integrate service learning functions in our SIP user agent, SIPC [64]. Chapter 8.6 concludes this part and discusses future work.

8.1 Representing users' communication behaviors

There are many possible ways to represent users' communication behaviors, such as finite state machines, Use Case Maps [11], rule sets, decision trees, and Bayesian networks [67]. The tree-like structure of LESS suggested that using decision trees to represent communication behaviors could be a viable approach, and that the learned results could be easily converted to LESS scripts and uploaded to endpoints for execution.

There are several advantages of using decision trees to represent users' communication behaviors. We can clearly identify learning targets in a binary decision tree due to its simple structure. The learning target of a binary decision tree is to find the non-leaf tree nodes that best partition a user's behavioral data. For example, in Figure 8.1, the "priority < urgent" node partitions a user's behavioral data into two parts, one with call priority lower than urgent, and the other with call priority equal to or higher than urgent. For the part with lower priorities, the "where = conf" node and the "caller = Bob" node further partition the data. If we use finite state machines or Use Case Maps, it is hard to identify the learning targets.

The cost of this simplicity is some loss of functionality. A binary decision tree is not Turing-complete, so it cannot represent all possible communication behaviors. However, as the analysis of LESS suggests, most commonly used communication services can be represented as a decision tree. Also, learning complicated services is error-prone and causes higher service risks.

Using rule sets to represent communication behaviors is another viable way. We can convert a rule set into a binary decision tree, and vice versa. However, for service execution, if we use a rule set to represent a binary decision tree with N leaf nodes, the average time to go through the rules will be O(N). But if we use the binary tree for service execution, in general, the average time will be O(log_2 N). Thus, decision trees are more efficient for service execution.

Figure 8.1: Service decision tree

Bayesian networks [67] are commonly used in machine learning. However, I found Bayesian networks to be inappropriate for communication service learning because different decision factors may be related to each other when making a communication decision. For example, domain-based call rejection may overlap with caller-based call rejection. It is hard to build a Bayesian network with correlated decision factors [67].

8.2 Decision tree learning

To perform learning, we need to identify input variables as well as expected outputs. As I mentioned earlier, the learning target is to find the non-leaf tree nodes that best partition a user's communication behavior data, which are stored at leaf nodes. I define user communication behaviors as the actions a user can perform, such as accept,

reject, proxy, redirect for incoming call handling, call for outgoing call setup, transfer, hold, mute for mid-call handling, approve for event subscription handling, and power-on, power-off for networked appliance control. The actions are the leaf nodes of the tree. The non-leaf nodes represent the parameters of a call and its context to match, such as the identity, status, location, media capabilities, and device type of the caller and callee, and the time, priority, subject, and preferred language of the call. The values of the parameters can be acquired from SIP INVITE messages and SIP event notification messages [98]. Once we get these data, we can perform decision tree learning.

Criteria on choosing a learning algorithm

There are several existing decision tree learning algorithms [96][131]. To choose an appropriate algorithm for communication decision tree learning, I defined several requirements for the algorithm.

The algorithm should perform incremental learning. Behavior learning is a dynamic process. New samples and new rules may be introduced from time to time, and old rules may be broken. Building decision trees incrementally reflects the dynamics of people's communication behavior.

The algorithm should have an appropriate tree quality measurement mechanism [138]. As shown in Figure 8.2, two trees represent the same user behavior over 47 calls. They have the same height of 5. The number in a leaf node indicates how many calls are partitioned to that leaf node. For example, accept(7) means that 7 calls with priority equal to or higher than urgent were accepted. The service execution efficiency of a tree can be determined by the total number of internal nodes of the tree. For a given call, each internal node checks some conditions to decide how to handle the call.

Figure 8.2: Two different trees representing the same data set

I call each internal node check a step. For example, in Figure 8.2, rejecting a call by checking "priority < urgent", "caller = Alice", and "caller = Bob" takes 3 steps. In the figure, the left tree has fewer leaves and fewer internal nodes, but the right tree takes fewer steps to partition the whole set of training samples (the left tree needs 117 (7 × 1 + 10 × 2 + 30 × 3) steps, and the right tree needs 108 (30 × 2 + 3 × 2 + 10 × 3 + 4 × 3) steps). In terms of simplicity (total number of nodes), the left tree is better. In terms of efficiency (number of steps to partition samples), the right tree is better. Because the purpose of service learning is to help users create and understand communication services, a simpler decision tree is preferable. For example, in Figure 8.2, the left tree is the desired result.

Because people sometimes make call decisions randomly, the algorithm must have an effective pruning method to handle overfitting. Overfitting may cause the learner to generate rules that fit very specific random samples of the training data. The overfitted rules may cause misclassification of future data. In addition, due to the dynamics

of people's communication habits, the algorithm should be able to save and reactivate pruned branches because a pruned branch may be restored with additional learning samples.

Incremental Tree Induction (ITI) algorithm

Based on the above requirements, I chose to use the Incremental Tree Induction (ITI) [131] algorithm. The pseudo code below shows the basic ITI algorithm.

    incremental_update(node, example-x) {
        add_example_to_tree(node, example-x);
        ensure_best_test(node);
    }

    add_example_to_tree(node, example-x) {
        if (node is NULL) {
            convert node to an empty leaf
        }
        if (node is a leaf) {
            save example-x at node
            if (node should be converted to a decision node) {
                construct information at node
                mark node fresh
                foreach example-j saved at node {
                    if (test_is_true(node, example-j)) {
                        add_example_to_tree(node->left, example-j)
                    } else {
                        add_example_to_tree(node->right, example-j)
                    }
                }
            }
        } else {
            update information at node
            mark node stale
            if (test_is_true(node, example-x)) {
                add_example_to_tree(node->left, example-x)
            } else {
                add_example_to_tree(node->right, example-x)
            }
        }
    }

    ensure_best_test(node) {
        if (node is a decision node and node is stale) {
            find best_test for this node;
            if (best_test is not already installed) {
                transpose_tree(node, best_test);
                if node is a decision node {
                    mark node fresh;
                    ensure_best_test(node->left);
                    ensure_best_test(node->right);
                }
            }
        }
    }

In the algorithm, the add_example_to_tree function incorporates new training examples into a tree, while the ensure_best_test function maps an existing tree to a new tree based on several tree transformation mechanisms. The detailed tree transformation mechanisms can be found in [131]. The average incremental cost of updating a tree is much lower than the average cost of building a new tree from scratch each time. For example, in the performance test for a sample set with 700 samples (as shown in Figure 8.3(a)), the average time of non-incremental learning is about 0.95 seconds, but the average time of incremental learning is 0.12 seconds.

ITI uses an algorithm called direct metric tree induction [131] to map one tree to another based on tree quality measurements. The algorithm introduces five tree quality metrics, namely expected-number-of-tests, leaf-count, minimum-description-length, expected-classification-cost, and expected-misclassification-cost [131]. The expected-number-of-tests metric returns the average number of tests that one would expect to evaluate in order to classify an example. The leaf-count metric returns the number of leaf nodes in the learned tree. The minimum-description-length metric returns the number of bits needed to encode the learned tree. The expected-classification-cost metric is identical to expected-number-of-tests except that each test has a specified evaluation cost. The expected-misclassification-cost metric measures the penalty that one would pay when misclassifying examples. I chose the leaf-count metric for the service learning process because it reflects the simplicity of a decision tree.
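The trade-off between simplicity (leaf-count) and efficiency (number of steps) for the two trees of Figure 8.2 can be checked with a short Python sketch. The tree shapes below are my reading of Figure 8.2, chosen so that the leaf counts add up to the 47 calls and the step totals match the 117 and 108 quoted in the text; the tuple encoding and the function names are hypothetical and are not part of ITI.

```python
# Internal node: (test, yes_subtree, no_subtree). Leaf: (action, number_of_calls).
# None marks a branch that no training call reached.
LEFT = ("priority < urgent",
        ("caller = Alice",
         ("reject", 10),
         ("caller = Bob", ("reject", 30), None)),
        ("accept", 7))

RIGHT = ("caller = Bob",
         ("priority < urgent", ("reject", 30), ("accept", 3)),
         ("caller = Alice",
          ("priority < urgent", ("reject", 10), ("accept", 4)),
          None))


def is_leaf(node):
    return len(node) == 2


def leaf_count(node):
    """Simplicity measure: number of leaves in the tree."""
    if node is None:
        return 0
    if is_leaf(node):
        return 1
    return leaf_count(node[1]) + leaf_count(node[2])


def call_count(node):
    """Number of training calls stored at the leaves."""
    if node is None:
        return 0
    if is_leaf(node):
        return node[1]
    return call_count(node[1]) + call_count(node[2])


def total_steps(node, depth=0):
    """Efficiency measure: internal-node checks needed to partition all calls."""
    if node is None:
        return 0
    if is_leaf(node):
        return depth * node[1]
    return total_steps(node[1], depth + 1) + total_steps(node[2], depth + 1)


for name, tree in (("left", LEFT), ("right", RIGHT)):
    print(name, leaf_count(tree), call_count(tree), total_steps(tree))
# prints "left 3 47 117" and then "right 4 47 108"
```

Dividing the step total by the number of calls gives the average number of tests per call, which is the quantity the expected-number-of-tests metric describes; the leaf-count metric prefers the left tree, as the text does.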

ITI also introduces a pruning technique named virtual pruning. Instead of deleting pruned branches, it uses one bit on every node to mark prune decisions. If the bit value is true, the branch rooted at the node should be hidden from users. When new learning samples are added, ITI uses the unpruned tree for learning, so pruned branches may be reactivated for useful services.

Accuracy of the ITI algorithm

There are two ways to measure the accuracy of the ITI algorithm for communication service learning: one performs real-world usage testing and the other uses simulation. Real-world usage testing requires communication behavior data collected from real users as well as users' evaluation of the learned services. This requires a large deployment of VoIP systems and people using the systems for their daily communications. Currently, people are making efforts to deploy VoIP systems, such as the SIP.edu [125] initiative in the Internet2 community, and many companies have deployed VoIP solutions in their intranets. However, when I was conducting the service learning research, most of the deployments were still in their testing stage and were not used as people's primary means of daily communication. Thus, I chose simulation to test the accuracy of the ITI algorithm.

I have built a simulation environment to generate random simulated calls. In the simulation environment, the distribution of call arrival times, the priority of calls, and callers' addresses are adjustable. By default, I use the Poisson distribution for call arrival times, and a uniform distribution multiplied by weights for setting the priority of calls and picking callers: 98% of the calls are of normal priority, 0.9% are urgent calls, 0.1% are emergency calls, and 1% are non-urgent calls. 30% of the calls are from user1, 10% each from user2,

user3, and user4, and the rest from all the other users. The simulation environment also simulates people's daily life, such as time for meals and sleep, and can load calendars in ical [38] format for meeting and appointment information.

I then randomly created several expected services, applied the services to the generated calls, and simulated the call handling process. For example, if there is an expected service "reject all calls from sip:bob@example.com", when I applied the service to the generated calls, all calls from sip:bob@example.com were rejected. The other calls were handled based on the default simulation setup. With the default setup, the simulated user rejects 50% of incoming calls when he is sleeping, rejects 20% of incoming calls when he is having a meal, and rejects 40% of incoming calls when he is in an appointment. In normal cases, he accepts 85% of incoming calls. The default setup introduces reasonable noise for service learning.

Once the simulation completes, the simulated communication behavioral data is saved in C4.5 [96] format and learned by the ITI algorithm to generate decision trees. The pruned decision trees generated by the ITI algorithm should match the expected services. I have tested 40 expected services, each applied to 300 simulated calls. The tests show that 80% of the learned decision trees exactly match their expected services, 10% of the learned decision trees represent the expected services in different ways, and 10% of the learned decision trees do not match the expected services. The mismatch comes from the randomness of the simulation data. For example, if an expected service is "accept all emergency calls", there may be no emergency calls in the simulated call set because only 0.1% of calls are emergency calls; the learned decision tree then will not match the expected service. Based on the simulation, I believe that the ITI algorithm fits the need.
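The default simulation setup described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the simulation environment itself: the function names, the dictionary layout, and the 30-minute mean gap are hypothetical, and I assume that "Poisson distribution for call arrival time" means a Poisson arrival process, modeled here with exponentially distributed gaps between calls. Calendar loading and the daily-life schedule are left out.

```python
import random

# Default weights quoted in the text (percentages).
PRIORITY_WEIGHTS = {"normal": 98.0, "urgent": 0.9, "emergency": 0.1, "non-urgent": 1.0}
CALLER_WEIGHTS = {"user1": 30, "user2": 10, "user3": 10, "user4": 10, "other": 40}
# Default rejection rates by the callee's current activity.
REJECT_RATE = {"sleeping": 0.50, "meal": 0.20, "appointment": 0.40, "normal": 0.15}


def generate_calls(count, mean_gap_minutes, rng):
    """Generate simulated calls with Poisson arrivals and weighted attributes."""
    calls, clock = [], 0.0
    for _ in range(count):
        clock += rng.expovariate(1.0 / mean_gap_minutes)
        calls.append({
            "time": clock,
            "priority": rng.choices(list(PRIORITY_WEIGHTS),
                                    weights=list(PRIORITY_WEIGHTS.values()))[0],
            "caller": rng.choices(list(CALLER_WEIGHTS),
                                  weights=list(CALLER_WEIGHTS.values()))[0],
        })
    return calls


def handle(call, activity, expected_service, rng):
    """Apply the expected service first; otherwise fall back to the noisy default."""
    decision = expected_service(call)
    if decision is not None:
        return decision
    return "reject" if rng.random() < REJECT_RATE[activity] else "accept"


def reject_user2(call):
    """Expected service used for illustration: reject all calls from user2."""
    return "reject" if call["caller"] == "user2" else None


rng = random.Random(2006)
calls = generate_calls(300, 30.0, rng)
records = [(call, handle(call, "normal", reject_user2, rng)) for call in calls]
```

Each `(call, action)` pair in `records` corresponds to one training sample; writing the pairs out in C4.5 format is the step that would feed them to ITI.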

Performance of the ITI algorithm

I conducted several tests to evaluate the performance of the ITI algorithm. The performance testing shows that the cost of incremental training is much lower than that of non-incremental training as the number of samples increases. Figure 8.3(a) and Figure 8.3(b) show the performance of the ITI algorithm running on an IBM ThinkPad laptop with the Linux operating system, a 1 GHz Intel Pentium III Mobile CPU, and 256 MB memory. The expected services make call decisions based on the priority of calls, callers' addresses, and the callee's ongoing activities, such as appointment, meeting, meal, and sleeping. For example, one simple expected service is to perform call screening based on the caller's address.

Figure 8.3: Training time of the ITI algorithm. (a) Fast vs. incremental training: training time (sec) against number of samples, for non-incremental and incremental training. (b) 20 vs. 250 incremental samples: training time (sec) against number of internal tree nodes.

Figure 8.3(a) shows that the training time for non-incremental training increases polynomially as more training samples are added, while the training time for incremental training is constant when training only on the newly added samples. For 20 new

added samples, the training time for incremental training is about 0.12 seconds, which is quick enough to provide automatic service creation to users. Figure 8.3(b) shows that for incremental training, the training time is independent of the number of internal nodes in the expected services. This is because the ITI algorithm uses the virtual pruning technique to handle the overfitting problem, so it always constructs complete decision trees. The number of nodes in a pruned tree does not affect the training time of building a complete decision tree.

8.3 Using LESS to represent learned results

Since LESS has a tree-like structure, it is straightforward to convert learned results to LESS scripts. The non-leaf tree nodes can be converted to LESS switches, the leaf nodes can be converted to LESS actions, and the root nodes can be converted to LESS triggers. For example, we can convert the decision tree in Figure 8.1 to the LESS script below:

    <less>
      <incoming>
        <priority-switch>
          <priority less="urgent">
            <where-switch type="civil">
              <where LOC="conf">
                <address-switch field="origin" subfield="display">
                  <address is="bob">
                    <accept/>
                  </address>
                  <otherwise>
                    <reject/>
                  </otherwise>
                </address-switch>
              </where>
            </where-switch>
          </priority>
          <otherwise>
            <accept/>
          </otherwise>
        </priority-switch>
      </incoming>
    </less>

8.4 Service risk management

Automatically generated services may introduce unexpected side effects that users may not be aware of. Users of the services may have to take the risk of losing calls, money, or privacy. However, risk is a part of any activity and can never be eliminated, nor can all risks ever be known. The opportunity for a better communication experience cannot be achieved without taking risk. What we should do is balance the possible negative consequences of risk against the potential benefits of its associated opportunity [120]. I consider that service risk should be handled in the service creation stage. As Charette [33] has said, "Risk engineering does not deal with future decisions, but with the future of present decisions."
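Since the risks discussed in this chapter attach to the LESS actions of a generated script, it helps to see how mechanical the conversion of Chapter 8.3 is. The Python sketch below builds the tree of Figure 8.1 and emits a LESS script with the same elements as the example in Chapter 8.3. It is a minimal illustration under stated assumptions: the `Switch` and `Action` classes and the `to_less` function are hypothetical names, not part of SIPC or of the LESS specification, and only the element and attribute names that appear in the example script are used.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Optional, Union


@dataclass
class Action:
    """Leaf node: a LESS action such as accept or reject."""
    name: str


@dataclass
class Switch:
    """Non-leaf node: a LESS switch with one matching case and an optional otherwise."""
    switch_tag: str
    case_tag: str
    case_attrs: dict
    yes: "Node"
    no: Optional["Node"] = None
    switch_attrs: dict = field(default_factory=dict)


Node = Union[Action, Switch]


def append_node(parent, node):
    """Append the LESS elements for a decision tree node under parent."""
    if isinstance(node, Action):
        ET.SubElement(parent, node.name)
        return
    switch = ET.SubElement(parent, node.switch_tag, node.switch_attrs)
    append_node(ET.SubElement(switch, node.case_tag, node.case_attrs), node.yes)
    if node.no is not None:
        append_node(ET.SubElement(switch, "otherwise"), node.no)


def to_less(trigger, root):
    """Wrap the tree in a trigger element (the tree root) and serialize it."""
    less = ET.Element("less")
    append_node(ET.SubElement(less, trigger), root)
    return ET.tostring(less, encoding="unicode")


# The decision tree of Figure 8.1, as read from the LESS example in Chapter 8.3.
FIGURE_8_1 = Switch(
    "priority-switch", "priority", {"less": "urgent"},
    yes=Switch(
        "where-switch", "where", {"LOC": "conf"},
        yes=Switch(
            "address-switch", "address", {"is": "bob"},
            yes=Action("accept"), no=Action("reject"),
            switch_attrs={"field": "origin", "subfield": "display"},
        ),
        switch_attrs={"type": "civil"},
    ),
    no=Action("accept"),
)

script = to_less("incoming", FIGURE_8_1)
```

Because every leaf becomes exactly one action element, the action elements in `script` are the only places where the losses discussed below can occur.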

There are three key steps for risk management [120][18]:

Identify: We must find the cause of service risks.
Analyze: We must evaluate potential loss caused by service risks.
Resolve: We must help users to act on the risks.

Identify service risks

During the learning, we may get both false positive and false negative errors. A false positive error means that the learning algorithm generates an unwanted service. A false negative error means that a desired service is missed. False negative errors are less important than false positive errors because they usually just inconvenience users. When a false negative error causes a missed service, the basic call handling services can still handle calls, as I discussed earlier. Thus, I focus only on false positive errors for service risk management.

There are two factors that comprise a risk [120]: the loss resulting from its occurrence and the probability or likelihood that it will occur. I identify possible losses in this chapter and analyze the probability in the next section. I consider the following losses possible for communication services: losing communication, compromising privacy, costing money, and distracting attention. Because I use LESS to represent services, the loss can only occur when performing LESS actions. Because the number of LESS actions is very limited, we can simply check the relationship between LESS actions and potential losses to identify service risks. The relationship is shown in Table 8.1.

Loss                     LESS actions that may cause the loss
Losing communication     reject, redirect, transfer, disconnect, accept on a wrong branch*, media-update, hold
Compromising privacy     call, accept, notify
Costing money            accept, redirect, or transfer calls to a device with a higher charge rate
Distracting attention    alert, accept, appliance control

*: When a call is forked to multiple branches, e.g., the callee's phone and the callee's voicemail server, if one branch always automatically accepts the call immediately, the other branch cannot accept the call.

Table 8.1: LESS actions that may cause loss

Analyze service risks

Three main steps can be used to analyze service risks: estimating the probability of a risk, evaluating the impact of a risk, and determining the overall risk of a service.

Probability

The probability of a risk can be estimated quantitatively as well as qualitatively. When the ITI algorithm learns a service, it associates the number of matching instances with each tree branch. We can use these numbers to perform a quantitative analysis to estimate how likely a risk is to occur in a tree branch. The bigger the number, the more likely the branch is to introduce service risks.

We can also perform a qualitative analysis. In a decision tree, the internal nodes along the path to a leaf node comprise the conditions under which the corresponding service action will execute. To estimate the probability of a risk, we must analyze the internal nodes. In LESS, the internal nodes are switches, such as address-switch and time-switch. Different arguments of a switch have different risk characteristics.

For example, in an address-switch, domain matching or wildcard matching is more likely to introduce risk than specific URL matching. Changing arguments with higher risk probability to arguments with lower risk probability can help to resolve service risks. For example, for spam handling, filtering based on a specific address is safer than filtering on a domain or a wildcard string.

Impact

Different risks have different impacts on users. The impact can be categorized as negligible, marginal, critical, and catastrophic [9], or, in a commonly used classification, as very low, low, moderate, high, and very high. In general, for communication services, I consider risks that cause irreversible loss to have a higher impact than other risks. Thus, compromising privacy, losing communication, and costing money are more severe than distracting attention. Different users may have different views about the impact. In my implementation, by default, I set the impact of compromising privacy and losing communication as high, costing money as moderate, and distracting attention as low.

Overall risk

Service risks are not independent of each other. For example, when a user tries to preserve communications, he may take the risk of losing privacy, money, or attention. To resolve service risks, we must try to avoid or mitigate risks with higher impact, even though doing so may introduce risks with lower impact.
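The identification and impact steps above amount to a table lookup, which the following Python sketch illustrates. It is a minimal sketch under stated assumptions: the action-to-loss mapping follows my reading of Table 8.1 (with "accept on a wrong branch" folded into accept), the impact levels are the defaults quoted in the text, and the function names are hypothetical rather than taken from the SIPC implementation.

```python
# Table 8.1 as data: loss -> LESS actions that may cause it.
LOSSES = {
    "losing communication": {"reject", "redirect", "transfer", "disconnect",
                             "accept", "media-update", "hold"},
    "compromising privacy": {"call", "accept", "notify"},
    "costing money": {"accept", "redirect", "transfer"},
    "distracting attention": {"alert", "accept", "appliance control"},
}

# Default impact levels quoted in the text; users may hold different views.
DEFAULT_IMPACT = {
    "compromising privacy": "high",
    "losing communication": "high",
    "costing money": "moderate",
    "distracting attention": "low",
}

LEVELS = ["very low", "low", "moderate", "high", "very high"]


def losses_for(action):
    """Return the losses that a LESS action may cause, in alphabetical order."""
    return sorted(loss for loss, actions in LOSSES.items() if action in actions)


def worst_impact(action, impact=DEFAULT_IMPACT):
    """Return the highest impact level among the action's losses, or None."""
    levels = [impact[loss] for loss in losses_for(action)]
    return max(levels, key=LEVELS.index) if levels else None


def lowers_risk(old_action, new_action):
    """True if replacing old_action with new_action lowers the worst impact."""
    def rank(level):
        return -1 if level is None else LEVELS.index(level)
    return rank(worst_impact(new_action)) < rank(worst_impact(old_action))
```

For instance, `lowers_risk("reject", "alert")` is true under these defaults: alerting trades the high-impact risk of losing communication for the low-impact risk of distracting attention, which is the kind of exchange the overall-risk rule calls for.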

Resolving risks

Because a risk is composed of the probability of its occurrence and the loss due to its outcome [18], we can resolve or mitigate it by reducing either its probability or its potential loss. The design rule for resolution is to ensure that either the overall impact of service risks is low, or users are alerted when a risk occurs and correction is viable. There are several risk resolution options [18], e.g., risk avoidance, risk transfer, risk impact reduction, and contingency planning.

Reducing or avoiding risks

We can reduce or avoid a service risk by reducing the probability of its occurrence. As I discussed in Chapter 8.4.2, we can adjust the arguments of switches in a LESS decision tree to reduce risk probability, for example, by changing domain-based address matching to URI-based address matching.

Risk transfer

Service risks can be transferred to another person. For example, in a meeting, instead of rejecting calls, a boss can transfer calls to his secretary, so the risk of losing calls is decreased, but the risk of being distracted is transferred to his secretary.

Risk impact reduction

To reduce overall risk impact, we should focus on risks with high loss impact, such as the risk of losing communication and compromising privacy, as defined earlier in this chapter. Table 8.1 shows the actions that may cause these risks. Here I use the reject action as an example to show how to reduce the risk of losing communication. Automatically rejecting a call is dangerous. People reject calls for many different reasons, but in most cases the reason is their availability or the subject of the call. Nowadays, there are many different communication methods, and they differ in their disturbance factors and expected response times, as shown in Table 8.2. The differences suggest that when a call is rejected, we may use other communication methods to mitigate the potential loss. For example, if a user has a service that automatically rejects all calls when he is in a meeting, we can modify the service to make it safer: in an unimportant meeting, change the alerting style from ringing to vibrating; in an important conference, forward the call to voicemail and provide a voicemail indication; if the user is giving a presentation in a conference and does not want to be disturbed in any way, forward the call to voicemail without providing any indication. Note that vibrating a user's end device or showing a voicemail indication may somewhat distract the user's attention. But as I discussed in Chapter 8.4.2, the overall risk is lowered.

Action                               Disturbed entities   Expected response time
Call/ringing                         Callee and others    seconds - minutes; immediate attention, otherwise the call is lost
Call/vibrating                       Callee               seconds - minutes; immediate attention, otherwise the call is lost
Instant messaging                    Callee               minutes - hours; immediate attention, but delay is ok
Voicemail/email with indication      Callee               minutes - days; immediate attention, but delay is ok
Voicemail/email without indication   Callee               hours - days; no immediate attention

Table 8.2: The characteristics of different communication methods
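The rewrite of the "reject all calls in a meeting" service described above can be sketched as a small lookup. The context labels and action names below are hypothetical placeholders chosen for this sketch; they are not LESS keywords.

```python
# Safer replacements for an automatic reject, keyed by the user's context.
# The context labels and action tuples are illustrative assumptions.
SAFER_ACTION = {
    "unimportant meeting": ("alert", "vibrate"),
    "important conference": ("forward-to-voicemail", "with indication"),
    "presenting": ("forward-to-voicemail", "without indication"),
}


def replace_reject(context):
    """Return a lower-risk action for the context, or keep ringing by default."""
    return SAFER_ACTION.get(context, ("alert", "ring"))
```

Falling back to ringing for unknown contexts is a design choice of the sketch: it trades the low-impact risk of distraction against the high-impact risk of losing the call.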

Building contingency plans

It is very important to make correction viable when a service risk occurs. For example, if a script mistakenly rejects a call without keeping a record of the call, the user cannot correct the mistake. Thus, a service engine must log all the actions performed by its service scripts. For each service action, there must be a convenient way for users to remedy the potential loss caused by the action. For example, if a call is automatically rejected, the service engine should allow users to easily retrieve the caller's number and call back.

8.5 Implementations

I have integrated service learning and service risk management into SIPC, the SIP user agent I developed. SIPC records call-related information and user-performed actions in a file in C4.5 [96] format, the same format as that used in the simulation environment. Once the number of new call records reaches a threshold, e.g., 20 new call records, SIPC uses the ITI algorithm to perform service learning. The learned result is saved in two files: a LESS script that represents the pruned tree, and a binary file that represents the whole tree with virtual pruning marks. SIPC then loads the LESS script into its LESS interpreter to handle calls.

8.6 Conclusion and future work

This part proposes a model for communication service learning. Because end users are usually not trained to create communication services, they may not know how to customize or create their own services. Service learning can help them to automatically generate communication services based on their communication behaviors. I have developed a simulation environment to test the viability of the algorithm and measured its accuracy and performance. The simulation shows that the ITI algorithm fits the learning model. I noticed that communication services may introduce unexpected side effects and proposed several approaches for service risk management. How to present potential risks and possible solutions to users in a friendly and easy-to-understand way still requires more investigation.
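The contingency-plan requirement from this chapter (log every action a script performs, and let the user recover the callers of rejected calls) can be sketched as follows. The class and field names are hypothetical; the sketch does not reproduce SIPC's actual log format.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ActionRecord:
    caller: str   # e.g. a SIP URI
    action: str   # e.g. "reject", "forward", "accept"
    when: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class ActionLog:
    """In-memory log of actions taken by service scripts (illustrative only)."""

    def __init__(self):
        self._records = []

    def record(self, caller, action):
        self._records.append(ActionRecord(caller, action))

    def rejected_callers(self):
        """Callers whose calls were rejected, in order, for easy call-back."""
        return [r.caller for r in self._records if r.action == "reject"]
```

A user interface built on such a log could list `rejected_callers()` with a call-back button next to each entry.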

Part IV

Location-based services and ubiquitous computing using SIP

This part discusses a topic that is different from, but related to, LESS: location-based services in end systems. It consists of two chapters. The first chapter analyzes location-based services in Internet telephony and extends LESS to handle location-based services. The second chapter presents the location-based service architectures we designed for two projects: ubiquitous computing and emergency call handling. Both projects are group efforts by several members of the Internet Real-Time Laboratory (IRT) at Columbia University, including myself.

Chapter 9

Location-based services in Internet telephony systems

9.1 Introduction

Location information describes a physical position or attributes of a place, such as place type, e.g., airport, and privacy status, e.g., private, that may correspond to the past, present, or future location of a person, event, or device. Many applications used in the Internet, such as tracking applications, emergency services, ubiquitous computing, and equipment management, benefit from using location information. In Internet telephony, location information can introduce many new services. For example, a user agent can automatically adjust its alerting style to vibrating when the user is in a movie theatre. To make better use of location information in Internet telephony applications, I analyzed location information and its uses in Internet telephony. In addition, my colleagues and I have implemented several location-based services, which I present in this chapter.

Previous research on location-based services [54] gives us the technologies needed to acquire location information and to handle network-layer location-based call routing and QoS management. In this chapter, I will not discuss specific location tracking techniques, which have been well defined and analyzed in many articles [54][132][91][55][136]. Neither will I discuss network-layer location-based packet routing and QoS management [30][31][73][123]. Instead, I focus on application-layer, human-understandable location descriptions and end-user-oriented location-based services. I first investigate how to describe a location and how to acquire physical locations from end users' point of view in Chapter 9.2. I then summarize and categorize different location-based services in Chapter 9.3. The analysis leads me to define an extension to LESS to support location-based services, which is discussed in Chapter 9.4. Chapter 9.5 introduces the location-based services in our SIPC and CINEMA implementations. Chapter 9.6 concludes this chapter and discusses future research on location-based services.

9.2 Location description and detection

People's physical locations can be represented in three ways: geospatial coordinates, civic locations, and location attributes.

Geospatial coordinates give the longitude, latitude, and altitude of a physical location. They are usually used for outdoor location tracking because the coordinates are usually acquired by GPS receivers, which do not work indoors. Geospatial coordinates describe a physical position in a unique and standard way and can be encoded in the Geography Markup Language (GML) [34].

Civic locations provide street address information, similar to postal addresses.

For outdoor locations, a civic location can refer to a specific building; for building-level indoor locations, a civic location can refer to a specific room; for room-level indoor locations, a civic location can refer to a specific part of a room, e.g., the middle of room 123. In the United States, civic locations can be represented by following the standards published by the United States National Emergency Number Association (NENA) [147] and the United States Postal Service Publication 28 [122]. Civic locations can also be used for location tracking or resource discovery, e.g., to find nearby restaurants or available devices, like printers. Compared to geospatial coordinates, civic locations are in general easier for end users to understand, but more cumbersome and less accurate for computers to use in pinpointing a place. Sometimes, pinpointing a street address on a map requires human intervention, for example because two different places may have the same name, or two different names may represent the same place. In most cases, geospatial coordinates can be mapped to civic locations and vice versa. The mapping can help people to choose proper formats for location-based services when only one type of location information is available.

Location attributes describe facets of a location, e.g., the type of a place or the number of persons inside a certain area. Location attributes can be applied to both indoor and outdoor locations. The IETF Rich Presence Format [116] draft provides some commonly used attributes, such as the type and privacy status of a place. We can use location attributes to make communication decisions, e.g., rejecting incoming calls requiring privacy when the caller is in a public place.

Different location descriptions require different location detection technologies. As shown in Figure 9.1, geospatial coordinates are usually acquired by using GPS receivers. They can be transmitted through a serial port, a USB port, or other I/O interfaces to a user's communication agent. Civic location information or location attributes of a room can be saved in a Bluetooth, an infrared (IR), or a radio frequency (RF) device. When a communication agent enters the room, it can get the location information of the room through Bluetooth, IR, or RF beacons. Both the civic location and the geospatial location of a user can be stored in a local DHCP server and transmitted in DHCP options [94][111]. With the above approaches, the location information is sent directly to users' communication agents, and the agents associate users' identities with their locations. I call this approach agent-centric location detection. In a SIP-based Internet telephony system, the result of agent-centric location detection can be sent in SIP PUBLISH requests [88] to a presence agent acting as a location server.

[Figure 9.1: Location detection]

Location detection can also be server-centric when communication agents cannot get their locations directly. In Figure 9.1, a user can put his ID in a small device, such as a swipe card, an IR/RF programmable badge, or an i-button [121]. The device readers in a location can read the user's ID and send it to a location server. The location server knows the device reader's location and will associate the user's ID

with the location. A location server may also use triangulation to compute a user agent's location based on the signal information of WiFi access points. In SIP-based Internet telephony systems, SIP user agents can subscribe to their own location events by following the SIP event notification architecture [98] to get location information from location servers. Table 9.1 shows the differences between the two approaches to location detection.

               Server-centric                  Agent-centric
User devices   Cheaper                         More expensive
Privacy        Limited                         Better control
Setup          Device ID to user mapping       No device ID to user mapping required

Table 9.1: Differences between two approaches

As shown in the table, the user devices used in the server-centric approach are usually cheaper than those in the agent-centric approach. For example, a swipe card or an RF badge may cost less than one dollar, but a Bluetooth device or a GPS receiver is much more expensive. However, the agent-centric approach gives users better control over their location privacy than the server-centric approach does. With the server-centric approach, the control is limited to what the server can offer. In addition, the server-centric approach usually needs to know the mapping between users' profiles and device IDs, while the agent-centric approach does not require this mapping.

In a large social event, such as a big conference, people come to communicate with each other, so they are more likely to release their location information and location privacy is not an essential concern. Meanwhile, people usually must register to join a conference, so a location server can easily get the participants' profiles. Hence, the

server-centric mode is an economical way to handle location detection for big conferences. For a place that often has visitors, such as a hotel room, the agent-centric approach is more appropriate. Users do not have to provide their profiles beforehand for location detection. The location information is stored only in users' own communication agents, so users can fully control their location privacy.

9.3 Location-based services

Location-based services can be divided into five categories. First, location information can be sent to remote parties. This set of services is commonly used today, e.g., in location tracking applications. Second, location information can be used to make communication decisions, e.g., automatically rejecting non-urgent calls when driving. Third, location changes can trigger communication actions, e.g., when a person enters a room, his user agent can automatically turn on the light of the room. Fourth, location information can be used in resource discovery, e.g., we can put location information in a Service Location Protocol (SLP) [51] query to find available multimedia input/output devices nearby. Fifth, a location can be treated as a communication entity, e.g., an instant message to sip:room123@examples.com will be broadcast to all the persons in room 123. I discuss each set of services in detail below.

Sending location information to remote parties for location tracking

Locations are usually represented in geospatial coordinates or civic locations for tracking. In SIP-based Internet telephony systems, location tracking is based on the SIP event

notification architecture [98]. A watcher sends a SIP SUBSCRIBE request to a presentity or a presence agent (PA). Once the subscription is accepted, the presentity or the PA can use SIP NOTIFY requests to send the location information to the watcher. Safety and privacy are important issues for location tracking. These issues have been addressed in IETF Geopriv working group drafts [37][115][112]. Another way to send location information is to put the information in a SIP INVITE, INFO, or UPDATE request, encoded in MIME multipart format [45]. This can be used in emergency call handling. When a Public Safety Answering Point (PSAP) receives an emergency call, it can pinpoint the caller based on the location information in the INVITE request.

Making communication decisions

Different locations may require different communication behaviors. For example, video or text conversation is not appropriate when driving. In general, appropriate communication behaviors are decided by location attributes, rather than geospatial coordinates or civic addresses. User agents should respect the required communication behaviors when making communication decisions. For a call setup, either the caller's location or the callee's location may affect communication behaviors. Other information, such as calendar information, may also be combined with location information to deduce appropriate communication behaviors. I illustrate different kinds of location-based decisions below, based on who makes the decision and the source of the location information.

A caller makes a decision based on the caller's location: When making an outgoing call, a caller's user agent may check its own location to decide how to handle the outgoing call. For example, if the caller is in a place

requiring quiet, the caller's user agent may only enable text and video conversation, and mute audio devices.

A caller makes a decision based on the callee's location: A caller may subscribe to the callee's location and make call decisions based on it. For example, if the callee is driving, the caller's user agent may suggest calling at a later time.

A callee makes a decision based on the caller's location: A callee may also subscribe to the caller's location for call decision making. For example, if the callee prefers to have a private conversation, the callee's user agent can check the caller's location privacy. Location privacy status indicates whether third parties may be able to hear or view a conversation. The value can be public or private [116]. If the callee finds that the caller's location privacy is public, it may reject the call.

A callee makes a decision based on the callee's location: A callee's user agent can check its own location for incoming call handling. For example, if the callee is in a place requiring quiet, the callee's user agent can choose to vibrate the device for incoming calls.

Call decisions based on both the caller's and the callee's locations: In many cases, both the caller's and the callee's locations are taken into account for communication decision making. For example, a private conversation requires both the caller's and the callee's location privacy to be private.

Combining location information and other information: Location information can be combined with other information, such as calendar

information, to deduce appropriate communication behaviors. For example, in a conference, when a session is going on, the room of the session should be quiet. By checking the conference calendar, a user agent can decide whether to vibrate or ring the device.

Location-triggered actions

User agents may invoke actions when detecting location changes. A location change can arrive in an incoming location notification from a location server, or be detected through locally connected location sensors. This kind of service can be programmed in LESS. I divide location-triggered actions into three classes based on the source of the location changes.

Actions triggered by a user's own location changes: For example, when a user drives towards his office, his user agent will get a location notification and automatically turn on the air-conditioner in his office. As another example, when a user moves from one location to another, his user agent can transfer his ongoing media session to the new location [15]. For this class of actions, because users subscribe to their own location information, no authorization is needed.

Actions triggered by remote parties' location changes: For example, in a day care center, a teacher's user agent can subscribe to all the children's location information. When a child leaves the playground, the teacher's user agent will receive an event notification and alert the teacher. For this set of services, because a user agent needs to subscribe to other users' location information, authorization is required.

Actions triggered by location relationship changes: For example, when two friends are close to each other on the street, their user agents can automatically make a call or send an instant message to each other so they will not miss each other.

Resource discovery

We can use location information in resource discovery queries to find nearby resources. For example, a map service can help users to find nearby restaurants and points of interest based on a civic location. In Internet telephony systems, a user agent with limited multimedia I/O capabilities may encode its location in a Service Location Protocol (SLP) query to find available multimedia input/output resources in the surrounding area. The user agent can then control the resources for multimedia call handling [17].

Treat a location as a communication entity

We can assign a URI to a room or a building and treat the location as a communication entity. We can then use the URI to represent all the people in that location. For example, we can send an instant message or an email to the location URI, and the instant message or the email will be broadcast to all the people in that location. We can invite all the people in that location to a conference by simply sending an invitation to the location URI. We can also subscribe to the location URI to acquire location attributes, such as the number of persons in the location, and make communication decisions based on the location attributes, e.g., starting a conversation with Alice, who is in room 123, when the number of persons in room 123 is 1.
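The "two friends are close to each other" trigger described in this section reduces to a distance test between two geospatial coordinates. The sketch below uses the haversine great-circle formula, which is a standard approximation and not necessarily what SIPC uses; the function names, the mean Earth radius constant, and the 100-meter default threshold are assumptions of the sketch.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; an approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def is_nearby(pos_a, pos_b, limit_m=100.0):
    """True if two (latitude, longitude) pairs are within limit_m meters."""
    return haversine_m(*pos_a, *pos_b) <= limit_m
```

A user agent could evaluate `is_nearby` on each location notification and send an instant message when the result changes from false to true.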

9.4 Extending LESS for location-based services

This extension introduces two new switches, namely where-switch and where-relation-switch. The new switches can check people's physical locations and make communication decisions based on the locations.

where-switch: A where-switch allows a LESS script to make decisions based on the physical location of a person. The person can be the script owner, or another person defined in the uri attribute of the switch. The switch also has an attribute, type, indicating whether the location information is geospatial coordinates or a civic address. A where-switch can match against the following location values: longitude, latitude, altitude, country, A1, A2, A3, A4, A5, A6, PRD, POD, STS, HNO, HNS, LMK, LOC, FLR, NAM, PC, and distance. A detailed description of these values can be found in the Internet draft "A Presence-based GEOPRIV Location Object Format" [93].

where-relation-switch: A where-relation-switch allows a LESS script to make decisions based on the relative locations of two persons, e.g., the distance between two persons. The persons are identified by the uri1 and uri2 attributes of the switch. If uri2 is not specified, it defaults to the URI of the script owner. For example, the script below checks the distance between the script owner and sip:bob@example.com. If the distance is smaller than 100 meters, the script sends an instant message to sip:bob@example.com.

...
<where-relation-switch uri1="sip:bob@example.com">
  <where-relation distance="100" condition="in" unit="m">
    <location ...>
      <sendmsg message="hi, Bob, come to find me. My location is {user.location}"/>
    </location>
  </where-relation>
</where-relation-switch>

9.5 Implementations

Location-based services in SIPC

I have integrated the location-based services discussed in this chapter into SIPC. The implementation includes five parts: acquiring location information, displaying location information, sending location information, using location information for resource discovery, and programming communication services based on location information.

SIPC allows users to input location information manually, and it can also get location information from a GPS receiver through a serial port or from SIP event notifications. Once SIPC gets location information, it has five ways to display the information to a user: showing the location information in plain text in its contact list, as shown in Figure 9.2; pinpointing a civic location on a floor map, as shown in Figure 9.3; pinpointing a geographic location on a local map, as shown in Figure 9.4; using a TCP socket to send the location information to GeoComm's GeoLynx Dispatch Mapping System [48]; and generating a URL to display the location on a web-based map service. For example,

with the civic address of our department (500 West 120th Street, New York, NY 10027, USA), SIPC can generate a URL as below to display the location on Google Maps.

...q=500+west+120th+street%2c+new+york%2C+NY+10027%2C+USA

[Figure 9.2: Displaying the locations in contact list]

[Figure 9.3: Displaying the locations on a floor map]

[Figure 9.4: Display geographic location on a local map. Geospatial coordinates: 40:48:33N 73:57:36W]
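The two display inputs above, a URL query string built from a civic address and the degrees:minutes:seconds coordinates in Figure 9.4, can be sketched in Python. SIPC itself is not written in Python, and the helper names below are hypothetical; only `urllib.parse.quote_plus` is a real standard-library call. The host part of the map URL is not recoverable from the text, so the sketch builds only the `q=` query string.

```python
from urllib.parse import quote_plus


def map_query(civic_address):
    """Encode a civic address as a 'q=' query string (spaces become '+')."""
    return "q=" + quote_plus(civic_address)


def dms_to_decimal(dms):
    """Convert 'DD:MM:SS' plus a hemisphere letter (N, S, E, W) to decimal degrees."""
    hemisphere = dms[-1].upper()
    degrees, minutes, seconds = (float(part) for part in dms[:-1].split(":"))
    value = degrees + minutes / 60.0 + seconds / 3600.0
    return -value if hemisphere in ("S", "W") else value
```

For the department address, `map_query` yields `q=500+West+120th+Street%2C+New+York%2C+NY+10027%2C+USA`, which differs from the string in the text only in letter case.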

Instead of hardcoding location-based services one by one, SIPC uses where-switch and where-relation-switch in its LESS interpreter to program location-based services. This way, users can easily create personalized location-based services. For example, SIPC can use the script below to automatically reject incoming calls in a quiet place.

<?xml version="1.0"?>
<less>
  <incoming>
    <where-switch type="place-is">
      <where is="quiet">
        <reject status="480" reason="i am in a place requiring quiet"/>
      </where>
    </where-switch>
  </incoming>
</less>

Location-based services demo in our lab environment

My colleagues, Ron Shacham, Matthew J. Mintz-Habib, Kundan Singh, and I have implemented a location-based services demo in our lab environment. The demo consists of five parts, namely location sensing, location tracking, location-based device control, ubiquitous computing, and using location information in emergency call handling. I implemented the RFID-based location sensing, location tracking, and location-based device control, and also contributed to the other parts.
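To make the behavior of the quiet-place script concrete, the following Python sketch walks the same XML and returns the action it would take for a given set of place attributes. This is not SIPC's LESS interpreter; the function name, the returned dictionary, and the default of accepting the call when nothing matches are assumptions of the sketch.

```python
import xml.etree.ElementTree as ET

SCRIPT = """<less>
  <incoming>
    <where-switch type="place-is">
      <where is="quiet">
        <reject status="480" reason="i am in a place requiring quiet"/>
      </where>
    </where-switch>
  </incoming>
</less>"""


def handle_incoming(script, place_attributes):
    """Return the first matching reject action, or accept if none matches."""
    root = ET.fromstring(script)
    for switch in root.findall("./incoming/where-switch"):
        for where in switch.findall("where"):
            if where.get("is") in place_attributes:
                reject = where.find("reject")
                if reject is not None:
                    return {"action": "reject",
                            "status": reject.get("status"),
                            "reason": reject.get("reason")}
    return {"action": "accept"}  # assumed default when no branch matches
```

Calling `handle_incoming(SCRIPT, {"quiet"})` selects the reject branch, while any other attribute set falls through to the assumed default.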

[Figure 9.5: Location-based communication services prototype]

As shown in Figure 9.5, when Bob enters room 7LW2, the location agent gets Bob's profile from an i-button [121] or a radio frequency ID (RFID) tag carried by Bob. The location agent then associates Bob's profile with the room number and sends the information to the location server. Bob's user agent is watching Bob's location at the location server and gets Bob's updated location. The updated location then triggers the LESS script below, which performs a device control action to turn on the lamp in the room.

<?xml version="1.0"?>
<less>
  <notification>
    <where-switch type="civic">
      <where LOC="7LW2">
        <location ...>
          <control action="poweron"/>
        </location>
      </where>
    </where-switch>
  </notification>
</less>

With the updated location, Bob's user agent may also send a resource query to find available resources in the room. For example, in our demo, it will find that a SIP phone is available in room 7LW2. Bob's user agent can then control the SIP phone to communicate as long as he is in the room. Chapter 10 details how to control available resources in a SIP-based ubiquitous computing environment.

The updated location information can also be used for location tracking. For example, Bob's user agent can send the location information to Bob's friends so they can pinpoint him on a map. When making an emergency call, Bob's user agent can also encode the location information in the call setup messages so emergency call takers can easily find Bob. I detail the emergency call handling in the next section.

Emergency services architecture and prototype

One feature of the traditional public switched telephone network (PSTN) that is essential for VoIP telephony is the ability to summon emergency services, such as by dialing

911 in the United States and 112 in parts of Europe. We are developing an emergency call handling architecture based on SIP. This is joint work by Anshuman Singh Rawat, Matthew J. Mintz-Habib, Amrita Rajagopal, Jong Yul Kim, Wonsang Song, and me. I contributed to the design of the architecture and to the parts of the implementation related to location detection and transmission in SIPC.

Emergency call handling relies on public safety answering points (PSAPs) to handle calls. Each PSAP is dedicated to a specific geographic area. It is responsible for coordinating local or regional emergency services, such as police, fire, and medical services. Proxy servers require the emergency caller's location information to route calls to a proper PSAP. Location information is also necessary for dispatching help to emergency callers. Figure 9.6 shows the architecture of the system.

[Figure 9.6: Emergency call handling architecture]

Emergency call handling can be divided into four steps that are executed in sequence for each emergency call. Each step involves one or more entities in the system

architecture as shown in Figure 9.6. The first step identifies emergency calls. For outgoing calls, the caller's user agent and outbound proxy server are responsible for checking whether the call is an emergency call. Once an emergency call is identified, the second step determines the caller's location and integrates the location information into the call setup messages. The third step finds an appropriate PSAP based on the location information. We use DNS Naming Authority Pointer (NAPTR) resource records [83] to find appropriate PSAPs. A proxy server can then route the emergency call to the PSAP. The fourth step presents the emergency call to the emergency call taker at the PSAP. The emergency call taker uses the information in the call setup messages to handle the emergency call, for example to pinpoint the caller on a map and to bring police, fire, and medical services into a conference call. At any point, a SIP entity may query third-party services for information, such as the caller's location or medical records.

9.6 Conclusion

This chapter summarizes and categorizes different location-based communication services and presents our implementations of location-based services. Location information can also be used in ubiquitous computing, which I describe in detail in the next chapter.
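The four emergency call handling steps described in Chapter 9.5 can be sketched as a small routing function. This is only an outline under stated assumptions: the in-memory `PSAP_BY_CITY` table is a hypothetical stand-in for the DNS NAPTR lookup, and the PSAP URI, the `city` key, and the function names are invented for illustration.

```python
EMERGENCY_NUMBERS = {"911", "112"}

# Hypothetical stand-in for the DNS NAPTR lookup described in the text.
PSAP_BY_CITY = {"New York": "sip:psap@nyc.example.org"}


def is_emergency(dialed):
    """Step 1: identify emergency calls by the dialed number."""
    return dialed in EMERGENCY_NUMBERS


def route_call(dialed, location):
    """Steps 2-4: attach the location, find a PSAP, and hand over the call."""
    if not is_emergency(dialed):
        return {"emergency": False, "target": dialed}
    psap = PSAP_BY_CITY.get(location.get("city"))  # step 3: PSAP lookup
    if psap is None:
        raise LookupError("no PSAP known for this location")
    # Step 4: the call taker receives the call together with the location.
    return {"emergency": True, "target": psap, "location": location}
```

Raising an error when no PSAP is found is a simplification; a deployed system would need a fallback route rather than a failed call.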

Chapter 10

Ubiquitous computing using SIP

Ubiquitous computing aims to enhance computer use by making many computers available throughout the physical environment, but making them effectively invisible to the user [137]. We are developing such a system based on SIP. The original design of the system is joint work with my colleagues Stefan Berger and Stylianos Sidiroglou, guided by Prof. Henning Schulzrinne. Later, Ron Shacham joined to work on session mobility. My contribution involves jointly designing the overall system architecture and developing the SIP third-party call control [104] functions, the Tcl-based Service Location Protocol (SLP) [51] stack, and the SIP event notification architecture in SIPC. We use SIPC as the SIP user agent to prototype the system.

In the past decade, ubiquitous computing has been pursued in a large number of prototypes. We believe that it is time to move from special-purpose, one-of-a-kind systems to more widely deployable systems that scale to the global Internet. Such global-scale ubiquitous computing systems need to be securable, administered by multiple, independent non-specialist administrators and operators, and integrate off-the-shelf hardware and software. An example usage of such a system is in a home environment, where it can automatically change communication devices when the talker roams from one room to another. The talker always uses the wired devices of the room he is in. Compared with using wireless devices for roaming in the home, using the wired devices can give better conversation quality, since wired devices usually have higher bandwidth and bigger displays. Another example is to allow a traveler to easily use the devices in a hotel room. We describe the example scenarios in detail later in this chapter.

We are developing such a scalable ubiquitous communications and computing system in our lab. The system incorporates core tenets of ubiquitous, pervasive, and context-aware computing, in particular:

Multimedia: We believe that communications incorporates all types of media, from continuous media to application sharing, and we consider multimedia support a core component of a ubiquitous computing environment.

Device integration: Our system integrates mobile devices such as programmable active badges, PDAs, and laptops with resources embedded into the environment, such as large displays, video projectors, high-resolution video cameras, loudspeakers, stereos, and lights. Active multimedia sessions can be moved from one device to another and can be split across devices. In a home or a hotel room, a multimedia conversation might be controlled by a PDA, with video shown on a TV and audio played through a stereo system. A user can use his PDA as a universal communication and control agent, not only for communication, but also as a controller for network-connected appliances.

Event-based: We believe that events offer a useful abstraction for tying together diverse

systems while requiring modest knowledge about their properties. We chose the SIP event model [98] as a core component of our system, as it can scale to large numbers of users spread across administrative domains and is also suitable for small-scale networks.

Location-aware: Location is one of the key contexts that determine the types of devices that are available and how communication should be conducted to minimize disruption to the user. Rather than using only geographic or civic location information, our system can also apply higher-level information that describes the category of a place, such as "theater" or "public transport", and its properties. Other user context, such as the number of persons in a room, active conversations or how recently a device has been used, also influences system behavior. In a home environment, when a user roams from one room to another, the system is aware of the user's location and enables the devices in the current room for conversation, as introduced in Chapter 9.

Privacy-conscious: We aim to give users maximum control over their incoming communication and the amount of information about their context that is revealed to others.

Invisible to user: Wherever possible, we delegate system behavior to user-defined policies rather than requiring direct user interaction. Policies can be programmed as LESS scripts.

The system is designed to support a range of activities, from home-based settings to collaboration between distant sites. It is designed so that participation only requires standard SIP-speaking tools such as Microsoft Windows Messenger or even a basic cell

or landline phone. Our system derives its scalability from the underlying SIP architecture. The work builds on our Columbia InterNet Extensible Multimedia Architecture (CINEMA) infrastructure for multimedia collaboration [68].

The remainder of this chapter is organized as follows. Chapter 10.1 discusses earlier research in the area. Chapter 10.2 shows our system architecture. Chapter 10.3 shows how to use SLP to find available resources and control them. Following that, Chapter 10.4 discusses how to handle calls using the system, and Chapter 10.5 presents a service scenario. Finally, I conclude the chapter.

Related work

A number of projects in ubiquitous computing have motivated our work. Some examples are the Intelligent Room (MIT) [26], the Interactive Workspaces Project (Stanford University) [20], the Aura Project (CMU) [47], the Composable ad hoc location-based services (UC Berkeley) [56], and the Easy Living (Microsoft) project [27]. While these projects have successfully built systems that effectively interact with the user and the environment, they use proprietary systems, are primarily based on non-standard protocols, and are generally limited to a single organization or building. The work presented in this chapter is centered around open protocol standards such as SIP and SLP [51], Bluetooth technology, and ongoing efforts in the IETF, such as the Rich Presence Information Data Format (RPID) [116] and the DHCP option for civic location [111].

Location-based services have received a lot of attention from wireless providers and are integrated into the Third Generation Partnership Project (3GPP) Universal Mobile Telecommunications System (UMTS) service architecture and WAP [75]. There are a

number of efforts currently underway for establishing a standard for positioning techniques and a standard for relaying location information. Proposals are offered by the Location Inter-operability Forum (LIF) under the Mobile Location Protocol [89] and the Open GIS Consortium. The GEOPRIV working group in the IETF deals with location-dependent privacy issues. Civic and categorical presence information has also been proposed as part of the Rich Presence Information Data Format (RPID) [116] and the DHCP option for civic location [111].

System architecture

As shown in Figure 10.1, the system involves the context around the user and the intelligent end system the user holds.

[Figure 10.1: Ubiquitous computing system architecture]

The ubiquitous computing context provides location information and multimedia resources to users. The intelligent end system interacts with the context, retrieves the location and resource information, transfers the information to the SIP server, and controls the resources in the context. The intelligent end system consists of three core components: location sensing, resource discovery and management, and call control. The location sensing part is introduced in Chapter 9. We use SLP [51] for resource discovery and use SIP, HTTP, or SOAP [134] to control available resources. The call control part receives events from and sends control commands to the other two parts, and uses the SIP third-party call control architecture [104] to utilize available resources for call handling. I introduce resource discovery and call control in detail below.

Resource discovery and control

Resource discovery using SLP

Service discovery is an essential step in mobile computing when a user with a wirelessly connected device enters a new environment and wants to use services in the surrounding area. In our scenario we decided to use SLP [51] as the service discovery protocol for a number of reasons. First, SLP is an open standard. Second, the query language of the Service Location Protocol is fairly capable. It allows not only simple matching for equality or prefixes of names, but also comparisons with relational operators, which is particularly interesting when used with location-based services. By using these operators, we can easily search for services within a given area.
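The area search described above can be sketched as follows. This is a minimal illustration, assuming SLPv2-style predicates written as LDAPv3 search filters, where `>=` and `<=` are the ordering comparisons. The attribute names `x` and `y` and the function name are hypothetical; the source does not say how the prototype encodes coordinates.

```python
def slp_area_predicate(x_min, x_max, y_min, y_max):
    """Build a predicate matching services inside a rectangular area.

    'x' and 'y' are hypothetical attribute names used only for illustration.
    The filter syntax follows the LDAPv3 string form: (&(a>=1)(a<=2)).
    """
    terms = [
        f"(x>={x_min})",
        f"(x<={x_max})",
        f"(y>={y_min})",
        f"(y<={y_max})",
    ]
    # '&' requires every term to hold, i.e. the service lies inside the box.
    return "(&" + "".join(terms) + ")"


if __name__ == "__main__":
    print(slp_area_predicate(0, 10, 5, 15))
```

A predicate like this would accompany the requested service type in an SLP service request, so the directory agent returns only the matching services.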

Resource control using SIP, HTTP and SOAP

We propose that devices, such as a networked projector or a SIP phone, be controlled using existing control protocols. There are at least three choices: SIP, HTTP and SOAP [134]. Both the SIP DO [84] and MESSAGE [32] methods can be used for device control. The SIP MESSAGE method was originally designed for text-based instant messaging between human users. However, we consider that sending a control message, say, "turn on the lights", to a device is not fundamentally different from talking to a person, except for the message content type. The message content could be an XML document for complicated devices such as stereos, pan-tilt-zoom video cameras and projectors, or a simple command such as "on" and "off" for basic devices such as lights and blinds. One major advantage of the SIP approach is that the SIP proxy infrastructure can map a generic, long-term-stable identifier such as sip:lamp@cs.columbia.edu into a current IP address and port.

HTTP offers a second approach, with the request encoded as URI query parameters. Normal HTTP user authentication can then be used to restrict access to devices. SOAP, the third approach, is the most powerful, but also adds the most implementation complexity. A single device can easily be made accessible by all three mechanisms. The SLP entry enumerates all service interfaces.

In the user-centric approach, the device interface needs to ensure that actions initiated by different users do not interfere with each other. In a device-centric approach, the device controller can more readily devise a priority algorithm that, for example, keeps the setting preferred by the first person entering the room.
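To make the first two control options concrete, the sketch below builds a SIP MESSAGE request carrying a plain-text command and an HTTP control URL with the command in the query string. It is only an illustration: the header values, the sender address, the host `lamp.example.com`, the path `/control`, and the parameter name `action` are all assumptions, not values taken from the prototype.

```python
from urllib.parse import urlencode, urlunsplit


def sip_message_request(target, sender, body):
    """Return the text of a SIP MESSAGE request with a plain-text command.

    Via, tag, Call-ID and CSeq values are placeholders for illustration.
    """
    payload = body.encode("utf-8")
    headers = [
        f"MESSAGE {target} SIP/2.0",
        "Via: SIP/2.0/UDP pda.example.com;branch=z9hG4bK776asdhds",
        "Max-Forwards: 70",
        f"From: <{sender}>;tag=49583",
        f"To: <{target}>",
        "Call-ID: a84b4c76e66710@pda.example.com",
        "CSeq: 1 MESSAGE",
        "Content-Type: text/plain",
        f"Content-Length: {len(payload)}",
    ]
    # Header lines end with CRLF; an empty line separates headers and body.
    return "\r\n".join(headers) + "\r\n\r\n" + body


def http_control_url(host, path, **params):
    """Encode a device command as URI query parameters (hypothetical layout)."""
    return urlunsplit(("http", host, path, urlencode(params), ""))


if __name__ == "__main__":
    print(sip_message_request("sip:lamp@cs.columbia.edu",
                              "sip:alice@columbia.edu", "on"))
    print(http_control_url("lamp.example.com", "/control", action="on"))
```

The SIP form addresses the device by its stable SIP URI and leaves address resolution to the proxy infrastructure, while the HTTP form needs the device's current host name or address.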

Call Control

Event-Triggered Actions

Actions can be triggered by the events received from the location sensing and service discovery parts, by inbound or outbound calls, or by presence information. The actions can be executed in SIP proxy servers, for example, described by CPL scripts [72], SIP servlets [95] or SIP CGI scripts [71]. The actions can also be executed in users' end systems, for example, described by LESS scripts. The call control module of the network servers or end systems executes the action scripts and sends control messages to related network components, such as an echo-canceling microphone in the user's environment.

Events

Events are one of the core abstractions in our ubiquitous computing architecture. We use them to propagate information about the presence of people and devices. When a user enters a room with a device containing the user's identification, such as an RFID tag, an access point located in the room senses the presence of the tag, reads the identification number from the device, maps it to the user's identifier, and propagates a presence detection message to a presence server located in the network. Similar behavior can be expected if a user authenticates himself with the environment using another passive device such as a swipe card or a simple iButton, where neither device has the capability of actively sending a message.

We use the SIP event framework [98] for event transmission. The framework defines mechanisms for transferring detailed status information, the kind of place the user is in (e.g., home, office, etc.) and the location-dependent privacy model the user is

holding (e.g., whether a person is allowed to make outgoing calls). Generally it contains additional information that we use to dynamically enforce policies depending on the current local restrictions of the user. When the call control part receives an event, it executes the corresponding CPL, SIP CGI or SIP servlet scripts.

Using back-to-back user agent (B2BUA) to control resources

Once available resources have been discovered, there must be a way to utilize them for calls. For example, a user should be able to use his PDA to instruct an echo-canceling microphone in the context to start recording and send packets to a certain address, and to enable a wall-hanging plasma display to accept packets and play the video streams. In a SIP environment, a secure and flexible way to utilize the resources in an environment is to have the visitor's device work as a SIP back-to-back user agent (B2BUA) and follow the third-party call control architecture [104] to control the devices. A back-to-back user agent is a logical entity that receives a request and processes it as a user agent server (UAS). In order to determine how the request should be answered, it acts as a user agent client (UAC) and generates requests [106].

For an incoming call, the visitor's PDA generates calls to the resources with the necessary information (such as the remote party's SDP [52] information). All the calls to the resources must go through the visited domain SIP server for authentication and authorization. The resources in the context automatically accept the authenticated and authorized calls, send their own information (such as the IP address and port number for receiving packets) to the B2BUA, and start the communication with the remote party. Upon receiving the resources' information, the B2BUA uses that information in its response to the remote party.
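The incoming-call flow just described can be sketched as a small in-memory simulation. It is an illustrative model only: no SIP messages are sent, authentication through the visited domain server is omitted, and the class and function names, addresses, ports, and the SDP fragment are hypothetical.

```python
class NetworkedResource:
    """Stand-in for a device (camera, display) that auto-accepts calls."""

    def __init__(self, uri, ip, port):
        self.uri, self.ip, self.port = uri, ip, port
        self.peer_sdp = None  # where this device will exchange media

    def handle_invite(self, remote_sdp):
        # The device learns the remote party's media address from the INVITE
        # and answers with its own address and port.
        self.peer_sdp = remote_sdp
        return f"c=IN IP4 {self.ip}\r\nm=video {self.port} RTP/AVP 31\r\n"


def b2bua_handle_incoming(remote_sdp, resource):
    """Relay the caller's offer to the resource; answer with the resource's SDP."""
    # UAC role: call the resource, passing the remote party's description.
    resource_sdp = resource.handle_invite(remote_sdp)
    # UAS role: the answer to the remote party carries the resource's address,
    # so media flows directly between the remote party and the resource.
    return resource_sdp


if __name__ == "__main__":
    camera = NetworkedResource("sip:camera@hotel.com", "10.0.0.5", 5004)
    bob_offer = "c=IN IP4 192.0.2.10\r\nm=video 6000 RTP/AVP 31\r\n"
    print(b2bua_handle_incoming(bob_offer, camera))
```

The point of the sketch is that the B2BUA handles only signaling: neither SDP body names the PDA's address, so the PDA never has to carry the video stream.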

Access Control

Remote control of devices and access to services can expose the infrastructure to significant risk. For example, we do not want to allow random strangers to turn the home video camera into a surveillance camera or to turn the lights off. Three security models are plausible.

In the first one, users and visitors are explicitly registered with the local SIP server, obtaining a suitable shared temporary secret or, if a public-key infrastructure exists, simply verifying their identity against a local access list with an expiration date. Manually enrolling and removing users is tedious, but it can be simplified by automating enrollment with physical tokens such as swipe or smart cards, as described earlier. Also, having once visited a room should not grant a person access to its equipment after returning home.

A second mechanism employs cross-domain AAA (authentication, authorization and accounting). For example, when the user alice@example.com visits visited.com, the visited domain queries the AAA server in the example.com domain, using RADIUS [97] or DIAMETER [28], and ascertains that Alice is a valid user in her home domain. This approach requires some kind of roaming agreement between domains.

A third approach makes use of location information. A user who is physically in the visited domain can probably already manipulate the equipment, so granting control protocol access to those in the room does not add much vulnerability. For example, the Bluetooth location server can tell the user a secret that is tied to the visitor's temporary network address or public-key identity, somewhat similar to a Kerberos ticket. The ticket can then be used in call requests or control messages.
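One possible shape for the location-bound secret in the third approach is a keyed hash over the visitor's address and an expiry time. The sketch below is an assumption about how such a ticket could be built, not the mechanism used in the prototype; the ticket layout, the function names, and the use of HMAC-SHA256 are all illustrative choices.

```python
import hashlib
import hmac


def issue_ticket(server_key: bytes, visitor_addr: str, expires_at: int) -> str:
    """Issue a ticket bound to the visitor's network address and an expiry time."""
    msg = f"{visitor_addr}|{expires_at}".encode()
    mac = hmac.new(server_key, msg, hashlib.sha256).hexdigest()
    return f"{expires_at}:{mac}"


def verify_ticket(server_key: bytes, visitor_addr: str, ticket: str, now: int) -> bool:
    """Accept the ticket only if it is unexpired and was issued for this address."""
    try:
        expires_str, mac = ticket.split(":", 1)
        expires_at = int(expires_str)
    except ValueError:
        return False  # malformed ticket
    msg = f"{visitor_addr}|{expires_at}".encode()
    expected = hmac.new(server_key, msg, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through comparison timing.
    return now < expires_at and hmac.compare_digest(mac, expected)


if __name__ == "__main__":
    key = b"room-123-server-key"
    ticket = issue_ticket(key, "10.0.0.7", expires_at=1000)
    print(verify_ticket(key, "10.0.0.7", ticket, now=999))
    print(verify_ticket(key, "10.0.0.8", ticket, now=999))
```

Binding the expiry into the keyed hash also addresses the concern raised for the first model: a ticket obtained during a visit stops working once the visitor has left and the expiry has passed.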

Privacy

Location information is highly sensitive. Thus, users will be reluctant to allow a system to acquire such information unless they can tightly control who obtains this information and under what circumstances. SIP for instant messaging and presence already offers mechanisms to restrict subscribing to presence information, but this is a binary decision that is too coarse-grained for location information. The IETF GEOPRIV working group has defined a document format for expressing privacy preferences [115] that makes it easy for a user to construct privacy profiles that satisfy the IETF GEOPRIV requirements [35]. We can also use CPL or LESS to handle privacy preferences. For example, a user might restrict delivery of location information by checking location (using location-switch), time of day (using time-switch), or subscriber (using address-switch).

Service examples

[Figure 10.2: SIP-based ubiquitous computing in a hotel]

Figure 10.2 shows an example of a hotel environment with the SIP-based ubiquitous computing configuration. In this configuration, the network environment of the hotel is the visited domain, and the environment containing the visitor's profile is the home domain. In the visited domain, the Bluetooth access point sends location information, such as the hotel room number, to the visitor's device. The SLP server provides information about available services, such as the SIP addresses of the audio and video devices in the visitor's room, to the visitor's device. The SIP server in the visited domain can contact the home domain AAA server to authenticate the visitor and perform authorization for using available services in the surrounding area. For example, the SIP server may allow the visitor to use the devices in his own room, but not the devices in the hotel conference room. The home domain SIP server may host call control service scripts. The visitor's device should send its location information to the home domain SIP server so that the server can make location-based call routing decisions.

Figure 10.3 shows the protocol messages exchanged when Alice, who has the SIP URI sip:alice@columbia.edu, is in the hotel room and accepts an incoming call. The figure only shows the messages for setting up the video stream. The audio stream flows directly between Alice's PDA and Bob and is not shown in the figure. When Alice enters the hotel, she first registers with the front desk. The hotel SIP server now has her information, for example, her SIP address and her room number. The SIP server will allow her to access the resources in her room. When she enters her room, her PDA first acquires location information from the local Bluetooth server. (The Bluetooth server can also work as the access point to the Internet.) The Bluetooth message indicates the room number and the service domain, here hotel.com.
Alice sends out an SLP query to the SLP server for that domain and finds out that the room

has a network-attached camera, with the SIP address camera@hotel.com, and a video display, plasma@hotel.com. Alice conveys her new network location to her home SIP server via REGISTER and indicates, via the caller preferences mechanism, that she is now capable of sending and receiving video.

[Figure 10.3: Session setup using the visited domain camera]

When a remote caller, Bob, tries to invite Alice to a video session using a SIP INVITE request, Alice's home domain SIP server forwards the call to her PDA. Using third-party call control, Alice sends an INVITE to the network-attached camera and display, with Bob's address in the session description. The INVITE request traverses the hotel SIP server in the hotel.com domain. Since Alice has registered at the front desk, she is authorized to use the resources in her room. However, the SIP server in the hotel.com

domain does not have Alice's credentials for authentication, so it consults Alice's home domain AAA server. In our example, the hotel.com domain has a roaming agreement with the columbia.edu domain and thus authenticates Alice. With the AAA authorization in hand, the hotel SIP server permits the INVITE to reach the camera and display, which automatically join the call in progress.

Conclusion

In this chapter, I have presented a global-scale ubiquitous computing architecture based on open standards such as SIP and SLP. The system is part of our location-based services prototype. It can be used to augment end-system capabilities and to change device behavior using location-aware user preferences.

Part V

SIPC: a multi-function programmable SIP user agent

This part presents the implementation of the research work introduced in the previous parts. All of the implementation has been integrated into SIPC, the SIP user agent I developed. SIPC has a built-in LESS interpreter and can perform location-based services. This part also introduces a research topic on how to integrate multiple Internet functions into one user agent and how to enable new communication services based on multi-function interactions.

Chapter 11

SIPC, a multi-function SIP user agent

I have developed a SIP user agent, SIPC, as a testing platform to experiment with the research work introduced in previous chapters. My colleagues Anshuman Singh Rawat, Matthew J. Mintz-habib, Amrita Rajagopal, Jong Yul Kim, Wonsang Song, and I are also using SIPC for emergency call handling and ubiquitous computing research, and Ron Shacham is using SIPC for his session mobility research. In addition, SIPC can be easily modified for different projects. Prakash GS, Venkata Sastry Malladi, and I developed a traffic controller training system for the Federal Aviation Administration (FAA) based on the SIP implementation in SIPC. I also developed a simplified version of SIPC, SIPCLOBBY, which has been installed on the machine in our department's lobby. A visitor can click on the SIP URI of a person in our department and invoke SIPCLOBBY to make a call.

Beyond Internet telephony session setup functions, SIPC also incorporates many other functions, such as networked appliance control, real-time multimedia streaming, networked resource discovery, third-party call control, Internet multicast streaming, location sensing, emergency call handling, and conference floor control. In addition, SIPC

can interact with web browsers and email clients. Putting all these functions together and allowing them to interact with each other can introduce many new services. In SIPC, these new services are programmable because SIPC contains a LESS interpreter. This chapter discusses how to integrate multiple functions in SIPC in a programmable way and presents the implementation details of SIPC.

Introduction

In Internet telephony systems, traditional telephony services, such as call transfer, can be enhanced by the integration of Internet services, such as email, web, instant messaging, presence notification and directory lookups. The enhancements require Internet telephony end systems to perform more functions in addition to audio and video communications. Our SIP user agent, SIPC [64], supports all the functions listed in Table 11.1.

Table 11.1: Functions in SIPC

Function | Protocol | External applications
Multimedia call | SIP [106]; RTP [118]; SDP [52]; RFB [6] | RAT [130] for audio; VIC [129] for video; WB [7] for whiteboard; VNC [6] for desktop sharing
Instant messaging | SIP MESSAGE method [32] | -
Presence notification | SIP event notification architecture [98] | -
Networked appliance control | SIP DO method [84] | SIP/X10 [8] control gateway; X10 lamp module (LM465); SIP/sLink-e [58] control gateway; slink-e infrared (IR) controller
Real-time streaming | RTSP [119] | rtspd (RTSP server)
Voicemail | SIP MWI [77] | -
Resource discovery | SLP [51] | mslp (SLP DA/SA)
Internet TV | SAP [53] | Quicktime for watching video
Conference floor control | SIP; SOAP [134] [146] | sipconf (SIP conference server)
Session mobility | SIP 3PCC [104]; SLP [51] | mslp (SLP DA/SA)
Location sensing | Presence-based GEOPRIV Location Object Format [93]; TCP socket | GeoComm's GeoLynx Dispatch Mapping System [48]; gpsgw (retrieving GPS data)
Emergency call handling | Emergency services URI for SIP [113]; DNS NAPTR [83]; MIME multipart format [45]; SIP REFER method [126] | GeoComm's GeoLynx Dispatch Mapping System [48]; MapInfo Envinsa Location Platform [81]; Snowshore Media Server [1]
Web | HTTP; Windows registry | http URL handler defined in registry; update registry for SIP URL handling
Email | SMTP; Windows registry | mailto URL handler defined in registry

With all these functions integrated, SIPC can perform many new services that would be impossible if the functions were independent of each other. I describe these functions and the new services in detail in the next section. Too many functions in one application may make the application too complicated to maintain. Chapter 11.3 analyzes the relationship among all the functions in SIPC and discusses integration approaches that can minimize the overall application complexity while still providing convenient ways for functions to interact. SIPC handles multi-function interactions in a programmable way. Chapter 11.4 shows how to program multi-function interactions by using LESS scripts. Chapter 11.5 briefly introduces the implementation details of SIPC. Chapter 11.6 concludes this chapter.

New services introduced by multi-function integration

In traditional telephony systems, communication services are provided by switches in communication networks. The services are performed based on very limited information, such as the address and the busy status of the caller and the callee, and allow only a small set of actions, in most cases routing calls. In Internet telephony systems, as discussed before, services can be implemented in both network servers and intelligent end systems. With the integration of Internet services, such as presence indication, Internet telephony services have access to much more information and offer a richer set of service actions. The actions are not limited to call routing; they can also include networked appliance control, instant messaging, and web browsing. I describe a few of the new services in SIPC below.

Setting up a favorable communication environment

Communication quality is determined not only by the quality of the audio and video streams transmitted between endpoints, but also by the talkers' communication environment. For example, background noise may affect audio conversation, and the brightness of lights may affect video conversation. In a networked home with network-controllable appliances, integrating networked appliance control into a communication agent may help to set up an environment conducive to communication. In our lab environment, SIPC can automatically pause the stereo through a Slink-e controller [58] when receiving an incoming call. If the call requires video communication, SIPC can also automatically adjust the brightness of a lamp through an X10 controller. SIPC uses the SIP DO [84]

method to perform networked appliance control.

Call handling based on presence information

Presence information can help to make call decisions. In traditional telephony systems, a caller usually does not know a callee's status before making a call. In Internet telephony, by contrast, a caller can acquire different kinds of information about the callee before making a call. The information is not limited to online and offline status; it can also be location privacy preference, ongoing activities, device capabilities, etc. Based on the status information, SIPC can generate many new services. For example, SIPC can automatically make a call when the called party is online, or start a conference when all the essential participants are available.

Using networked resources

Usually the capability of an end system is in inverse proportion to its portability. A portable end system usually has a small display, low-quality audio, and inconvenient input devices. To ensure good conversation quality while keeping device portability, it is desirable to allow portable devices to find and control surrounding devices with better communication capabilities. SIPC is able to find the available devices by using SLP and control them based on the SIP third-party call control architecture.

Location sensing and location-based services

As I discussed in Chapter 9, in Internet telephony systems, location information can help to make call decisions and trigger automatic communication actions. SIPC supports

location information handling. SIPC uses location information in three ways: it sends the information to remote parties; it automatically performs communication actions when receiving a location update; and it ensures appropriate communication behavior based on the location information.

Location information can be revealed to remote parties for location tracking as part of the presence notification, or encoded in MIME [21] with other content. Location information could be room (name or function information), civic (street and community), categorical (such as movie theater), activity (such as travel) and privacy preference (such as quiet). SIPC can convey location information, for example, in a SIP NOTIFY request in RPID format [116] or GEOPRIV Location Object Format [93], to the parties that have explicitly shown interest in the information. SIPC can also include the location information in SIP REGISTER or PUBLISH [88] requests to upload the location information to a location server. When sending an emergency call, SIPC encodes its location information in MIME in a SIP INVITE request. The emergency call taker can conveniently track the caller with the location information.

When SIPC receives a location update, it may invoke a service script, such as a LESS [140] script, to perform automatic actions. For example, SIPC can automatically turn on the light of a room when it enters that room. When SIPC receives the location of a person and the distance to the person is less than a certain value, SIPC can automatically send an instant message to the person.

Integrating location information with call control services can also help to govern appropriate communication behavior. For example, in a movie theatre, the Bluetooth device in the movie theatre may broadcast its location attribute as "quiet"; when SIPC receives the location information, it may automatically block incoming calls unless the

priority of the call is emergency.

Multimedia session sharing

The Session Announcement Protocol (SAP) [53] advertises multicast multimedia sessions and their parameters to prospective participants. SIPC has a built-in SAP user agent, so it allows users to watch sessions announced by SAP. SIPC also allows users to easily share interesting media sessions with their friends. If a user finds an interesting media session and wants to ask his friends to watch the same session, the user must convey the session description to his friends. But we would not expect any user to understand the details of session descriptions, which are encoded in the Session Description Protocol (SDP) [52]. Because SIP also uses SDP for session description, SIPC can simply copy the SDP content of the interesting session and put it in a SIP INVITE request. This way, the user can simply call his friends with the SDP content without knowing the session details.

Voicemail handling

The integration of web, email and SIP message waiting indication [77] provides various ways of handling voicemail. A voicemail can be sent as an email attachment, or as an HTTP [43] URL or a Real Time Streaming Protocol (RTSP) [119] URL in an email. The voicemail information can also be carried in a SIP message waiting indication [77] notification. SIPC can dial in to the voicemail server to get the voicemail, play the attachment, or start a web browser to retrieve the voicemail.
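The choice among these retrieval paths can be sketched as a dispatch on the form of the voicemail reference. This is an illustrative sketch, not SIPC's code: the function name, the returned action labels, and the rule that a reference without a recognized URL scheme is treated as a local attachment are assumptions.

```python
from urllib.parse import urlsplit


def voicemail_action(reference: str) -> str:
    """Pick a retrieval action from the scheme of a voicemail reference."""
    scheme = urlsplit(reference).scheme.lower()
    if scheme in ("http", "https"):
        return "open-in-web-browser"
    if scheme == "rtsp":
        return "stream-with-rtsp-client"
    if scheme in ("sip", "sips"):
        return "dial-voicemail-server"
    # No recognized URL scheme: assume the reference names a saved attachment.
    return "play-attachment"


if __name__ == "__main__":
    for ref in ("http://mail.example.com/vm/42",
                "rtsp://media.example.com/vm/42",
                "sip:voicemail@example.com",
                "message42.wav"):
        print(ref, "->", voicemail_action(ref))
```

In an agent with external web and email applications, the first branch would hand the URL to the registered browser rather than fetch it directly.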

Conference floor control with active talker indicator

During a conference, floor control [114] helps to assign talking rights. Only the floor holder's voice gets delivered to each participant. In a classroom environment, when a student gets the floor, turning on the light on the student's desk or adjusting the video camera to face the student may help others find the talker. The integration of networked appliance control with conference floor control in SIPC can handle this task gracefully.

How to integrate multiple functions

The above service examples show that multi-function integration can bring many innovative services. However, integrating too many functions in one application may make the application too complicated and may confuse users if the application contains unnecessary functions. Since SIPC directly interacts with users, any user confusion may impair its usability. It is very important to choose an integration method that enables the new services in SIPC without making it too complicated and without adding too much implementation effort. Before discussing the integration methods, I first list the functions integrated in SIPC and investigate the relationship among these functions. The investigation helps us choose appropriate integration methods.

Functions integrated in SIPC

SIPC supports a range of media types, such as audio, video, whiteboard and desktop sharing, and can perform functions beyond multimedia calls. SIPC uses the SIP DO [84] method to perform networked appliance control, uses the SIP event notification architecture [98] to perform presence notification, uses the Session Announcement Protocol

(SAP) [53] to retrieve multicast multimedia session information, uses RTSP [119] and SIP message waiting indication [77] to retrieve voicemail, uses DHCP Options for Civic Addresses [111] and GEOPRIV Location Object Format [93] for location sensing, uses the Service Location Protocol (SLP) [51] to find available networked resources, and uses SIP third-party call control [104] to control networked resources. SIPC uses external applications to handle email and web browsing. SIPC has a built-in SIP CGI [71] interpreter and a built-in LESS [140] interpreter to handle service scripts.

When I was integrating all these functions in SIPC, I noticed that many functions overlap with each other and that the functions can interact with each other in different ways. I show the relationship among these functions below and then discuss the integration approaches based on the relationship.

Overlap among SIPC functions

[Figure 11.1: Overlap among SIPC functions]

Many of the functions in SIPC have some overlap with each other. As shown in Figure 11.1, SAP user agents, RTSP user agents and SIP [106] user agents all use SDP

[52] for session description and RTP [118] for real-time media stream transmission. The presence status, conference status, and location information can all be transmitted by SIP SUBSCRIBE and NOTIFY requests [98]. SIP event notification, SIP multimedia session setup, and SIP networked appliance control can all share the same SIP implementation. Because of this overlap, integrating all the functions mentioned above did not make SIPC too complicated to manage.

Interaction among SIPC functions

Figure 11.2: Interaction among SIPC functions

Multiple functions integrated in SIPC may interact with each other and introduce new services. As shown in Figure 11.2, the SAP user agent passes the session description information to the SIP user agent so that the SIP user agent can invite another SIP user agent, SIP UA2, to watch the same multicast media session. When the SIP user agent gets a message waiting indication from the voicemail server, it can instruct the RTSP user agent to retrieve the voicemail. When the SIP user agent gets a location notification, it can control networked appliances, for example, turning on a lamp.
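The three interactions above can be pictured as events passed between functions that live in one process. The following Python sketch is only an illustration under that assumption; it is not SIPC code (SIPC is written in Tcl/Tk and C/C++), and the class `FunctionBus`, the event names, the payload keys, and the returned action strings are all hypothetical.

```python
from typing import Callable, Dict, List

Handler = Callable[[dict], str]


class FunctionBus:
    """Hypothetical in-process bus: functions register handlers for named events."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = {}

    def on(self, event: str, handler: Handler) -> None:
        # Remember which function wants to react to which event.
        self._handlers.setdefault(event, []).append(handler)

    def emit(self, event: str, payload: dict) -> List[str]:
        # Call every registered handler and collect the action each one describes.
        return [handler(payload) for handler in self._handlers.get(event, [])]


def invite_friend(payload: dict) -> str:
    # SAP user agent -> SIP user agent: reuse the announced session description.
    return f"SIP INVITE {payload['friend']} carrying SDP for {payload['session']}"


def retrieve_voicemail(payload: dict) -> str:
    # Message waiting indication -> RTSP user agent.
    return f"RTSP retrieve {payload['mailbox_url']}"


def turn_on_lamp(payload: dict) -> str:
    # Location notification -> networked appliance control.
    return f"SIP DO lamp on in {payload['room']}"


bus = FunctionBus()
bus.on("sap.session_selected", invite_friend)
bus.on("sip.message_waiting", retrieve_voicemail)
bus.on("sip.location_notify", turn_on_lamp)

actions = bus.emit("sip.location_notify", {"room": "office"})
```

Because every handler is an ordinary function call, the sketch reflects the tightly coupled style in which functions share one process; an interprocess design would replace `emit` with a message sent to another application.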

Based on the function overlap and function interaction illustrated in Figure 11.1 and Figure 11.2, I present the approaches to multi-function integration below.

Approaches for multi-function integration

There are two integration methods: built-in and interprocess-control. The built-in method hardcodes functions in a user agent so that the user agent can invoke them through API calls. Functions integrated by the built-in method are tightly coupled. They can share code and easily interact with each other through API calls. The interprocess-control method puts functions outside the user agent. Functions integrated by the interprocess-control method interact with each other via interprocess communication, such as Dynamic Data Exchange (DDE [87]) and Message Bus (MBUS [90]).

When adding a new function, three criteria may help to choose an appropriate integration method. First, if the new function can share components with the existing functions, so that adding it will not increase the overall application complexity much, the built-in method is applicable. With built-in functions, one function can more easily interact with another through function calls. Second, if the new function interacts with the existing functions extensively, the built-in method is preferable. Even though interprocess communication can handle function interactions, function calls are simpler and more efficient. Third, people like to use software they are already familiar with. If popular existing applications already support a function, controlling those applications to perform the function is better than developing a new built-in software component for it. In this case, the interprocess-control method is more appropriate. Below I illustrate how to apply the criteria in integrating the function set of SIPC.
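The three criteria can be written down as a small decision function. The Python sketch below is a reading aid only: the names `FunctionProfile` and `choose_integration` are hypothetical, and the precedence among the criteria (code sharing and extensive interaction are checked before the availability of a popular external application) is an assumption, since the text does not rank them.

```python
from dataclasses import dataclass


@dataclass
class FunctionProfile:
    """Hypothetical summary of a candidate function, one flag per criterion."""
    shares_components: bool      # criterion 1: can reuse existing components
    interacts_extensively: bool  # criterion 2: heavy interaction with existing functions
    popular_external_app: bool   # criterion 3: users already have a familiar application


def choose_integration(profile: FunctionProfile) -> str:
    # Assumed precedence: criteria 1 and 2 favor the built-in method.
    if profile.shares_components or profile.interacts_extensively:
        return "built-in"
    # Criterion 3: control the familiar external application instead.
    if profile.popular_external_app:
        return "interprocess-control"
    # No criterion applies strongly, so either method is acceptable.
    return "either"


# A function that shares session description and media transport code.
sharing_function = choose_integration(FunctionProfile(True, True, False))
# A function already covered by popular external applications.
external_function = choose_integration(FunctionProfile(False, False, True))
```

Under these assumptions, `sharing_function` is `"built-in"` and `external_function` is `"interprocess-control"`.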

Figure 11.3: Function set in SIPC

Figure 11.3 shows the function set of SIPC. The functions inside the big thick-line rectangle are integrated in the SIPC core; the others run in separate processes and are controlled by the core through interprocess communication. In SIPC's function set, all SIP-related functions, such as SIP call setup, the SIP DO method for networked appliance control, SIP event notification, SIP for instant messaging, SIP third-party call control, and SIP emergency call handling, share the same SIP implementation and are tightly related to each other. These functions should be integrated in the built-in way and put together in one application. SIPC uses SIP and SOAP [134] for conference floor control [114], so the floor control function should also be integrated in SIPC. To support the Session Announcement Protocol (SAP) and the Real Time Streaming Protocol (RTSP), based on the investigation in the Chapter and , and the service examples in Chapter 11.2, I consider it best to implement them as built-in functions in SIPC. Both SAP and RTSP sessions use SDP for session description and RTP for multimedia transmission, the same as SIP multimedia sessions, so code sharing is possible. Using external SAP and RTSP applications requires communication interfaces between SIP functions and SAP or RTSP functions. The communication interface

is not trivial to build when it has to handle function interactions. The Service Location Protocol (SLP) integration can be either built-in or interprocess-control because there is not much code sharing between SLP support and other functions. The communication interface between an SLP client and a SIP user agent can be simple. We chose to build an SLP client in SIPC because the implementation effort was small and a built-in client makes it easier to announce detected resources to other components.

As we discussed in Chapter 9.2, location sensing can be agent-centric or server-centric. SIPC uses the interprocess-control method for agent-centric location sensing. When SIPC starts, it listens on a TCP port (port 5622) for location information. Location sensors can send location documents in GEOPRIV Location Object Format [93] to that port. For server-centric location sensing, SIPC uses the SIP event notification architecture [98] to handle location subscriptions and notifications.

For email and web browsing, because there are many existing email clients and web browsers, the preferable way is to control those existing applications in an interprocess way instead of building email and web functions in SIPC. For example, on a Windows platform, SIPC can invoke a web browser by running the command

    rundll32 url.dll,fileprotocolhandler $url

During installation, SIPC's installer sets the Windows Registry values below so that people can then invoke SIPC from a web browser.

    REGEDIT4
    SIP Protocol"
    "URL Protocol"="RFC 3261 SIP"
    [HKEY_CLASSES_ROOT\sip\shell]
    [HKEY_CLASSES_ROOT\sip\shell\open]
    %1"

11.4 Programming multi-function interactions

Chapter 11.2 presents some new services without describing how to perform them. Instead of hardcoding these services one by one, it is more flexible and powerful to make the services programmable and customizable by using LESS scripts. SIPC has a built-in graphical LESS service creation environment, which I introduced in Chapter 5.2. SIPC can also perform service learning, which I introduced in Chapter 8, and handle feature interactions, which I introduced in Chapter 7. Users can use these capabilities to program multi-function interactions in SIPC.

Implementation

SIPC is written in Tcl/Tk and C/C++. It can run on different platforms, including Windows 98/2000/NT/XP, Linux, Sun Solaris, and FreeBSD. It contains about 100,000 lines of Tcl/Tk code and 40,000 lines of C/C++ code. Of these, about 20,000 lines of Tcl/Tk code and 35,000 lines of C/C++ code come from open source. The executable file of SIPC is about 4.5 Mbytes. SIPC uses external media applications for multimedia communications. I


have introduced SIPC's functions in previous sections. Below, I briefly introduce the user interface of SIPC for invoking the functions.

Figure 11.4: Main user interface of SIPC (labeled elements: speed dial bar, contact list, callee address entry, information and instant messaging frame, service entry, media frame, istyping indication, instant messaging input field, user-created service scripts)

Figure 11.4 shows the main user interface of SIPC. SIPC's interface follows the style of the Mozilla Firefox web browser. SIPC allows multiple simultaneous calls. When a user clicks the New call button or presses Ctrl+T, SIPC creates a new tab to host the new call. The user can enter the remote party's address in the callee address entry, then click the Call button to make a call. Once a call is established, the Mute, Hold, and Transfer functions are enabled. To make an emergency call,

users can click the red SOS button. In the service frame, a user can add new services written in LESS and enable end system services. The service creation interface is the one I described in Chapter 5. In SIPC's contact list, in addition to the online/offline information, SIPC can also display the media capabilities, location, and activity information of the contacts. In SIPC's People menu, a user can choose what information to display, as shown in Figure 11.5.

Figure 11.5: People menu of SIPC

In SIPC's Status menu, as shown in Figure 11.6, a user can set his own status information, such as activities, location, and mood. For location information, SIPC uses the interface described in Chapter 9.5 to display the user's location. Users can access the functions we mentioned in previous sections through SIPC's Action menu, as shown in Figure 11.7. Figure 11.8 shows the user interface for the Internet TV function. A user can pick a session, right-click on that session, then choose Invite someone for session to send a SIP INVITE with the session description to his friend. We have built two device control gateways, an x10 device control gateway for


lamp control, and a slink-e device control gateway for stereo control. SIPC can send control messages to these gateways to control networked appliances, as shown in Figure 11.9.

Figure 11.6: Status menu of SIPC

Figure 11.7: Action menu of SIPC

Conclusion

This chapter introduces how to integrate multiple communication functions in SIPC and presents the new services facilitated by the multi-function integration. The integration does not simply put all the functions together and run them separately; instead, a careful


design is required to minimize the overall complexity and enable function interactions. I use LESS service scripts to automate the interactions. Multi-function interactions enable many innovative services that are otherwise impossible. We plan to release SIPC as an open-source project and hope to see more interesting functions contributed to SIPC by the open-source community.

Figure 11.8: Internet TV in SIPC

Figure 11.9: SIPC networked appliance control
