GENERIC STREAMING OF MULTIMEDIA CONTENT

Similar documents
Format-Independent Multimedia Streaming

GENERIC MULTIMEDIA CONTENT ADAPTATION

Delivery Context in MPEG-21

MPEG-4. Today we'll talk about...

AN INTEROPERABLE DELIVERY FRAMEWORK FOR SCALABLE MEDIA RESOURCES

MPEG-21: The 21st Century Multimedia Framework

Internet Streaming Media

Enhancing RSS Feeds: Eliminating Overhead through Binary Encoding

MPEG-21 MULTIMEDIA FRAMEWORK

Evaluation of Models for Parsing Binary Encoded XML-based Metadata

An Architecture for Distributing Scalable Content over Peer-to-Peer Networks

Internet Streaming Media. Reji Mathew NICTA & CSE UNSW COMP9519 Multimedia Systems S2 2007

ABSTRACT. Keywords: Multimedia, Adaptation, Interoperability, XML, MPEG-21, Digital Item Adaptation

KNOWLEDGE-BASED MULTIMEDIA ADAPTATION DECISION-TAKING

IST MPEG-4 Video Compliant Framework

The RTP Encapsulation based on Frame Type Method for AVS Video

Network-Adaptive Video Coding and Transmission

ADAPTIVE PICTURE SLICING FOR DISTORTION-BASED CLASSIFICATION OF VIDEO PACKETS

Unified Communication Specification for H.264/MPEG- 4 Part 10 Scalable Video Coding RTP Transport Version 1.0

Interworking Between SIP and MPEG-4 DMIF For Heterogeneous IP Video Conferencing

MPEG-4: Overview. Multimedia Naresuan University

Internet Streaming Media. Reji Mathew NICTA & CSE UNSW COMP9519 Multimedia Systems S2 2006

A MULTIPOINT VIDEOCONFERENCE RECEIVER BASED ON MPEG-4 OBJECT VIDEO. Chih-Kai Chien, Chen-Yu Tsai, and David W. Lin

EE Multimedia Signal Processing. Scope & Features. Scope & Features. Multimedia Signal Compression VI (MPEG-4, 7)

Internet Streaming Media

Lecture 7: Internet Streaming Media. Reji Mathew NICTA & CSE UNSW COMP9519 Multimedia Systems S2 2007

Lecture 7: Internet Streaming Media

Request for Comments: 5109 December 2007 Obsoletes: 2733, 3009 Category: Standards Track. RTP Payload Format for Generic Forward Error Correction

Cobalt Digital Inc Galen Drive Champaign, IL USA

USING METADATA TO PROVIDE SCALABLE BROADCAST AND INTERNET CONTENT AND SERVICES

System Modeling and Implementation of MPEG-4. Encoder under Fine-Granular-Scalability Framework

Request for Comments: 4425 Category: Standards Track February 2006

ISO/IEC INTERNATIONAL STANDARD. Information technology JPEG 2000 image coding system Part 12: ISO base media file format

[MS-RTPRAD]: Real-Time Transport Protocol (RTP/RTCP): Redundant Audio Data Extensions

Multimedia Protocols. Foreleser: Carsten Griwodz Mai INF-3190: Multimedia Protocols

Internet Streaming Media Alliance Hyperlinked Video Specification Version 1.0 September 2006

SDP Capability Negotiation

Network Working Group. Intended status: Standards Track Columbia U. Expires: March 5, 2009 September 1, 2008

RTP. Prof. C. Noronha RTP. Real-Time Transport Protocol RFC 1889

ISO/IEC INTERNATIONAL STANDARD. Information technology Multimedia content description interface Part 1: Systems

R&D White Paper WHP 087. A quantitive comparison of TV-Anytime and DVB-SI. Research & Development BRITISH BROADCASTING CORPORATION.

MISB EG Motion Imagery Standards Board Engineering Guideline. 24 April Delivery of Low Bandwidth Motion Imagery. 1 Scope.

Streaming Technologies Delivering Multimedia into the Future. May 2014

System Modeling and Implementation of MPEG-4. Encoder under Fine-Granular-Scalability Framework

SCALABLE HYBRID VIDEO CODERS WITH DOUBLE MOTION COMPENSATION

over the Internet Tihao Chiang { Ya-Qin Zhang k enormous interests from both industry and academia.

draft-ietf-avt-rtp-mime-02.txt P. Hoschka W3C/INRIA/MIT March 10, 2000 Expires: September 10, 2000 MIME Type Registration of RTP Payload Formats

UNIVERSAL multimedia access (UMA) has become the

RTP Payload for Redundant Audio Data. Status of this Memo

FRACTAL COMPRESSION USAGE FOR I FRAMES IN MPEG4 I MPEG4

Enhancing Interoperability via Generic Multimedia Syntax Translation

VoIP. ALLPPT.com _ Free PowerPoint Templates, Diagrams and Charts

Optimized architectures of CABAC codec for IA-32-, DSP- and FPGAbased

MEDIA TRANSPORT USING RTP

Week 14. Video Compression. Ref: Fundamentals of Multimedia

MPEG-21 SESSION MOBILITY FOR HETEROGENEOUS DEVICES

Video Redundancy Coding in H.263+ Stephan Wenger Technische Universität Berlin

Audio/Video Transport Working Group. Document: draft-miyazaki-avt-rtp-selret-01.txt. RTP Payload Format to Enable Multiple Selective Retransmissions

Internet Engineering Task Force (IETF) Request for Comments: 5725 Category: Standards Track ISSN: February 2010

RTP for Application Sharing Payload Format Extensions

A Packet-Based Caching Proxy with Loss Recovery for Video Streaming

A Method on Multimedia Service Traffic Monitoring and Analysis 1

[MS-RTPRAD-Diff]: Real-Time Transport Protocol (RTP/RTCP): Redundant Audio Data Extensions

Internet Engineering Task Force (IETF) Request for Comments: 7198 Category: Standards Track. April 2014

Lecture 3 Image and Video (MPEG) Coding

Internet Engineering Task Force (IETF) Category: Informational August 2012 ISSN:

DRAFT. Encapsulation of Dirac Video content and time code markers in ISO/IEC Transport Streams

VIDEO COMPRESSION STANDARDS

Latest Technology for Video-Streaming Gateway of M-stage V Live

How to achieve low latency audio/video streaming over IP network?

draft-ietf-avt-rtp-mime-04.txt P. Hoschka W3C/INRIA/MIT March 2, 2001 Expires: August 2, 2001 MIME Type Registration of RTP Payload Formats

Request for Comments: 3016 Category: Standards Track NEC S. Fukunaga Oki Y. Matsui Matsushita H. Kimata NTT November 2000

ISO/IEC Information technology Multimedia content description interface Part 7: Conformance testing

Comparison of Shaping and Buffering for Video Transmission

A Survey of Adaptive Layered Video Multicast using MPEG-2 Streams

MPEG-4 AUTHORING TOOL FOR THE COMPOSITION OF 3D AUDIOVISUAL SCENES

Signaling layered coding structures and the SVC payload format

Video Quality Monitoring

ES623 Networked Embedded Systems

RTP model.txt 5/8/2011

Fast Region-of-Interest Transcoding for JPEG 2000 Images

The difference between TTC JT-Y1221 and ITU-T Y.1221

MPEG: It s Need, Evolution and Processing Methods

1 INTRODUCTION CE context and Previous works DESCRIPTION OF THE CE... 3

Adaptation of Scalable Video Coding to Packet Loss and its Performance Analysis

A Transport Infrastructure Supporting Real Time Interactive MPEG-4 Client-Server Applications over IP Networks

MULTIMEDIA ADAPTATION FOR DYNAMIC ENVIRONMENTS

Surveillance System with Mega-Pixel Scalable Transcoder

ANSI/SCTE

4 rd class Department of Network College of IT- University of Babylon

Transport protocols Introduction

NextSharePC: An Open-Source BitTorrent-based P2P Client Supporting SVC

Lecture 27 DASH (Dynamic Adaptive Streaming over HTTP)

Watching the Olympics live over the Internet?

MPEG's Dynamic Adaptive Streaming over HTTP - An Enabling Standard for Internet TV. Thomas Stockhammer Qualcomm Incorporated

Compressed-Domain Video Processing for Adaptation, Encryption, and Authentication

ISO/IEC INTERNATIONAL STANDARD. Information technology Coding of audio-visual objects Part 12: ISO base media file format

Reflections on Security Options for the Real-time Transport Protocol Framework. Colin Perkins

MISB ST STANDARD. MPEG-2 Transport Stream for Class 1/Class 2 Motion Imagery, Audio and Metadata. 27 October Scope.

Module 10 MULTIMEDIA SYNCHRONIZATION


GENERIC STREAMING OF MULTIMEDIA CONTENT

Michael Ransburg and Hermann Hellwagner
Department of Information Technology, University Klagenfurt, Austria
{michael.ransburg, hermann.hellwagner}@itec.uni-klu.ac.at

ABSTRACT

The growing demand for multimedia information by different types of users, equipped with a large variety of devices and connecting through different kinds of networks, results in an increasing number of different multimedia formats. Research is currently concentrating on the adaptation of the contents in order to provide Universal Multimedia Access (UMA) for the content consumer. But this does not solve the problem of the content provider, who still has to signal this variety of different multimedia formats to the consumer. In this contribution, we show a way to stream any type of multimedia format based on generic hint information. This hint information is based on a generic bitstream syntax description (gbsd) which is used for format-independent content adaptation within the MPEG-21 Multimedia Framework. Ultimately, this can lead to a framework which allows generic streaming and generic adaptation anywhere in the network.

KEY WORDS

Streaming, Metadata, Multimedia, MPEG-21, XML and Digital Item Adaptation

1. Introduction

The demand for information increases steadily. While it was still a novelty to read e-mail and surf the Internet from a personal computer at home a few years ago, today people access the Internet from their mobile phones and even watch the live stream of the current match of their favourite soccer team. The variety of Internet-capable devices grows steadily, ranging from mobile phones integrated in a wristwatch to full-blown personal computers with huge amounts of processing power and impressive display capabilities. This appetite for information and variety of devices leads to some new challenges for multimedia content providers. The challenge is to provide any content on every device in the best available quality.
Sometimes it is sufficient to precompute streams of different quality for a given content and then deliver the stream which is most adequate to the current user, network, and device. However, such stream selection and switching mechanisms quickly become inefficient for a large variety of user preferences, network conditions, and device capabilities. In this case, on-the-fly adaptation is more adequate.

Because of the large variety of different contents, the adaptation itself should be generic, i.e., there is one adaptation engine which is able to adapt any type of content. The descriptive metadata necessary to enable such an adaptation engine has been specified in Part 7 of the MPEG-21 Multimedia Framework, the Digital Item Adaptation (DIA) standard [1][2]. To enable generic adaptation within the MPEG-21 Multimedia Framework, a (generic) Bitstream Syntax Description (gbsd) in XML format is used [2][3]. This codec-independent description of the bitstream enables the adaptation of the bitstream without further knowledge about the bitstream itself and can be generated automatically during the encoding of content. In a first step, an adaptation decision is computed based on the usage context and a description of all possible adaptation options and trade-offs (called Adaptation QoS). Then, the gbsd of the bitstream is transformed in order to reflect the adaptation decision; e.g., all parts of the gbsd which describe a special type of access units, such as B-VOPs [4], are removed. Afterwards, the bitstream itself is adapted based on the transformed description, and the address references in the gbsd are updated since the location of the different parts of the bitstream (e.g., VOPs) changed due to the adaptation.

However, in addition to the adaptation of contents, there is a second important factor in enabling Universal Multimedia Access, namely the delivery of the variety of different multimedia formats to the consumer.
Given the possible constraints from the network and the device, this must happen in a streaming scenario using, e.g., the Real-time Transport Protocol (RTP) [5]. Fragmentation and timing information is needed for this. For streaming, the content is fragmented into access units. An access unit (AU) [6] is the smallest data entity to which timing information is attributed. For example, an AU of a video is usually a frame. An access unit may be further divided into packets which form a so-called packet group. The important thing is that all packets within such a packet group have the same time stamp, e.g., the composition timestamp (CTS) in case of MPEG-4 visual elementary streams (VES). This timing information is needed by the streaming server to determine how fast the AUs need to be streamed into the network.
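The timing relation above can be sketched in a few lines. This is an illustrative sketch, not the paper's implementation; the 90 kHz clock is simply the conventional RTP clock rate for video payloads, and all names are ours:

```python
# Illustrative sketch: derive wall-clock send times and RTP timestamps for a
# sequence of access units (AUs) from the AU rate. All packets of one AU
# would carry the same timestamp, forming a packet group.

RTP_CLOCK_HZ = 90_000  # conventional RTP clock rate for video payloads

def au_schedule(num_aus, au_rate):
    """Return one (send_time_s, rtp_timestamp) pair per access unit."""
    schedule = []
    for i in range(num_aus):
        t = i / au_rate                    # seconds since stream start
        schedule.append((t, int(round(t * RTP_CLOCK_HZ))))
    return schedule

# 20 AUs (frames) per second: AU 1 is sent at 0.05 s, RTP timestamp 4500
print(au_schedule(3, 20))
```

The server only needs the AU rate (or per-AU timestamps, as proposed later in Section 2.3) to derive this schedule; the content format itself never enters the calculation.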

In order to fragment content into access units, content-specific knowledge is needed. Usually this information is either hard-coded into the streaming server, supplied through hint files, or encoded into the content, e.g., as a hint track. One example of such a hint file is the NHNT format which is used in the MPEG-4 reference software [7]. These hint files are usually specific to a certain type of content, i.e., a streaming server needs to be able to understand different types of hint files and/or hint tracks in order to service different types of content. This is one of the reasons why streaming servers usually only service one or very few different types of content. In this paper, we introduce a generic streaming concept which enables a server to stream any type of content, given an extended gbsd. We will show the necessary extensions and limitations to the gbsd and present initial evaluation experiments which have been executed for different types of content.

2. Generic Streaming Server Concept

We propose that a generic streaming server use an extended version of the gbsd as a hint file. Based on the information in the gbsd, the server can fragment the content into meaningful chunks (access units) which can then be streamed into the network. Additionally, the gbsd itself can be fragmented as well, in such a way that each gbsd fragment describes one chunk (access unit) of the content [8]. This allows each chunk of the content to be generically adapted on its own and therefore also facilitates generic adaptation in the network in a streaming scenario. Ideally, the gbsd for any content which is to be distributed in a multimedia system is produced during content creation. Thus, such a description is created only once and at the place where all necessary knowledge (the encoding parameters of the content) must be available, since it is needed for encoding purposes.
When a generic streaming server is asked to deliver such content, it will parse the gbsd which comes with the content. This gbsd includes fragmentation information (i.e., how to fragment the content into access units) and timing information (i.e., how fast to stream the access units into the network). Environmental (context) information which is relevant to the streaming server, such as the maximum transmission unit (MTU) for packetization of the access units, can be supplied using XML descriptors, similarly to what is done in [1] for a generic adaptation engine. In the following, different types of content are analyzed in order to evaluate the feasibility of a generic streaming server as introduced above. The content formats considered include MPEG-4 visual elementary streams (VES) [4], MPEG-4 Bit Sliced Arithmetic Coding (BSAC) audio streams [9], and EZBC video streams [10]. Necessary extensions to the gbsd are introduced in order to carry fragmentation and timing information. Constraints on the gbsd are introduced as well.

2.1 Sample Content Formats and Their gbsds

Figure 1 shows the gbsd of an MPEG-4 visual elementary stream (VES)¹. This type of stream only allows temporal adaptation (i.e., skipping of access units - VOPs in this case) and therefore its corresponding gbsd is very simple. Each gBSDUnit in this gbsd describes one access unit of the bitstream, and the syntacticalLabel attribute indicates the type of VOP.
<dia:DIA>
  <dia:DescriptionMetadata>
    <dia:ClassificationSchemeAlias alias="MV4"
        href="urn:mpeg:mpeg4:video:cs:SyntacticalLabels"/>
  </dia:DescriptionMetadata>
  <dia:Description xsi:type="gBSDType" id="akiyo_gbsd" bs1:bitstreamURI="akiyo.mpg4">
    <gBSDUnit syntacticalLabel=":MV4:VO" start="0" length="18"/>
    <gBSDUnit start="18" length="5830" marker="violence-0">
      <gBSDUnit syntacticalLabel=":MV4:I_VOP" start="18" length="4641" marker="temporal-0"/>
      <gBSDUnit syntacticalLabel=":MV4:P_VOP" start="4659" length="98" marker="temporal-1"/>
      <gBSDUnit syntacticalLabel=":MV4:B_VOP" start="4757" length="16" marker="temporal-2"/>
      <gBSDUnit syntacticalLabel=":MV4:B_VOP" start="4773" length="23" marker="temporal-3"/>
      ...
      <gBSDUnit syntacticalLabel=":MV4:B_VOP" start="5780" length="68" marker="temporal-3"/>
    </gBSDUnit>
    <gBSDUnit start="5848" length="9244" marker="violence-1">
      <gBSDUnit syntacticalLabel=":MV4:I_VOP" start="5848" length="4670" marker="temporal-0"/>
      ...

Figure 1: VES gbsd Fragment

Figure 2 shows the gbsd of a BSAC audio stream which supports fine granular quality adaptation. Therefore the gbsd shows more details of the bitstream. Each access unit is described by a nested gBSDUnit which carries the :MA4:BSAC:BSAC_frame_element syntacticalLabel attribute. Within this gBSDUnit one can find the description of 48 different quality layers, identified by the markers L:1 to L:48. Depending on the given constraints, an adaptation engine can decide how many quality layers to skip.
<dia:DIA>
  <dia:DescriptionMetadata>
    <dia:ClassificationSchemeAlias alias="MA4"
        href="urn:mpeg:mpeg4:audio:cs:SyntacticalLabels"/>
  </dia:DescriptionMetadata>
  <dia:Description xsi:type="gBSDType" id="bsac_gbsd" bs1:bitstreamURI="bsac_22khz_101.raw">
    <gBSDUnit syntacticalLabel=":MA4:BSAC:BSAC_frame_element" start="0" length="1576">
      <Parameter start="0" length="11" marker="offset">
        <Value xsi:type="bt:b11">197</Value>
      </Parameter>
      <gBSDUnit start="11" length="5"/>
      <Parameter start="16" length="6" marker="update">
        <Value xsi:type="bt:b6">48</Value>
      </Parameter>
      <gBSDUnit start="22" length="370"/>
      <gBSDUnit start="392" length="1184" marker="bitrate">
        <gBSDUnit start="392" length="24" marker="L:1"/>
        <gBSDUnit start="416" length="24" marker="L:2"/>
        ...
        <gBSDUnit start="1456" length="24" marker="L:47"/>
        <gBSDUnit start="1480" length="96" marker="L:48"/>
      ...

Figure 2: BSAC gbsd Fragment

¹ It must be noted that, in all XML documents, the namespace declarations are omitted for the sake of brevity.

Figure 3 shows the gbsd of an EZBC video stream. This sort of bitstream offers three layers of scalability (temporal, spatial, and quality) and therefore its gbsd is quite detailed. The different layers are identified through the marker attribute. Each temporal layer includes six spatial layers. Each of the spatial layers includes five quality layers. A marker with the value T:0 S:1 Q:3, for example, indicates that this is the fourth quality layer of the second spatial layer of the first frame of the current GOP. Here an access unit is described by several gBSDUnits. Each gBSDUnit with the same timestamp, e.g., T:0 in the example below, belongs to the same access unit (in the current GOP).
<dia:DIA>
  <dia:Description xsi:type="gBSDType" id="foreman_gbsd" bs1:bitstreamURI="foreman.bit">
    <gBSDUnit length="160" start="0"/>
    <gBSDUnit length="414586" marker="GOP-0" start="160">
      <Parameter addressMode="consecutive" length="1" name=":mcezbc:ntemporal">
        <Value xsi:type="xsd:unsignedByte">5</Value>
      </Parameter>
      <Parameter addressMode="consecutive" length="1" name=":mcezbc:nspatial">
        <Value xsi:type="xsd:unsignedByte">6</Value>
      </Parameter>
      <gBSDUnit addressMode="consecutive" length="4"/>
      <gBSDUnit length="123" marker="T:0 S:0 Q:0" start="166">
        <Parameter addressMode="consecutive" length="1" name=":mcezbc:flag">
          <Value xsi:type="b1">1</Value>
        </Parameter>
        <Parameter addressMode="consecutive" length="31" name=":mcezbc:spatiotemporallen">
          <Value xsi:type="xsd:unsignedLong">169</Value>
        </Parameter>
        <gBSDUnit addressMode="consecutive" length="119"/>
      </gBSDUnit>
      <gBSDUnit length="13" marker="T:0 S:0 Q:1" start="289"/>
      <gBSDUnit length="12" marker="T:0 S:0 Q:2" start="302"/>
      <gBSDUnit length="13" marker="T:0 S:0 Q:3" start="314"/>
      <gBSDUnit length="12" marker="T:0 S:0 Q:4" start="327"/>
      ...
      <gBSDUnit length="3807" marker="T:0 S:5 Q:0" start="26606">
        <Parameter addressMode="consecutive" length="1" name=":mcezbc:flag">
          <Value xsi:type="b1">1</Value>
        </Parameter>
        <Parameter addressMode="consecutive" length="31" name=":mcezbc:spatiotemporallen">
          <Value xsi:type="xsd:unsignedLong">43918</Value>
        </Parameter>
        <gBSDUnit addressMode="consecutive" length="3803"/>
      </gBSDUnit>
      <gBSDUnit length="4953" marker="T:0 S:5 Q:1" start="30413"/>
      <gBSDUnit length="8396" marker="T:0 S:5 Q:2" start="35366"/>
      <gBSDUnit length="11965" marker="T:0 S:5 Q:3" start="43762"/>
      <gBSDUnit length="14801" marker="T:0 S:5 Q:4" start="55727"/>
      ...

Figure 3: EZBC gbsd Fragment

2.2 Fragmentation of the Content

After introducing the sample gbsds, we will now take a look at the extensions necessary in order to use these gbsds in a generic streaming server. A streaming server needs to know two fundamental things in order to stream content. First, it needs to know how to fragment the content into meaningful pieces, which may then be further packetized according to the MTU and streamed into the network. Second, it needs to know how fast to send these packets.
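Packetizing an access unit according to the MTU, with every packet of the resulting packet group sharing one timestamp, can be sketched as follows (the function and parameter names are ours, not from the paper):

```python
# Sketch: split one access unit into MTU-sized packets (a "packet group").
# Every packet of the group carries the same timestamp.

def packetize(au_bytes, timestamp, mtu=1400):
    """Return (timestamp, payload) pairs covering au_bytes in MTU-sized slices."""
    return [(timestamp, au_bytes[i:i + mtu])
            for i in range(0, len(au_bytes), mtu)]

# A 3000-byte AU with an MTU of 1400 yields packets of 1400, 1400 and 200
# bytes, all stamped 4500.
group = packetize(b"\x00" * 3000, timestamp=4500, mtu=1400)
```

Only the AU boundaries and the timestamps are content-dependent; the MTU slicing itself is fully generic, which is why the gbsd only needs to convey the former.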
The streaming server needs to know where an access unit starts and where it ends. The gbsd already offers the means to describe the start and the end of such access units through the syntacticalLabel and marker attributes, as explained above. As the start and the end of access units are content-specific, and as we would like to avoid the need for content-specific knowledge within the generic streaming server, we propose to mark the gbsd accordingly during the content creation phase. Figure 4 shows the BSAC version of such a marked gbsd. The only difference to an unmarked gbsd is the additional gBSDUnit with the access_unit marker which encapsulates each access unit description in this gbsd. This can be done in a similar way for the other types of content introduced in this paper. The generic streaming server then no longer needs content-specific knowledge in order to find out how to fragment the content; it just needs to be able to parse and interpret this extended gbsd.

<dia:DIA>
  <dia:DescriptionMetadata>
    <dia:ClassificationSchemeAlias alias="MA4"
        href="urn:mpeg:mpeg4:audio:cs:SyntacticalLabels"/>
  </dia:DescriptionMetadata>
  <dia:Description xsi:type="gBSDType" id="bsac_gbsd" bs1:bitstreamURI="bsac_22khz_101.raw">
    <gBSDUnit marker="access_unit" start="0" length="1576">
      <gBSDUnit syntacticalLabel=":MA4:BSAC:BSAC_frame_element" start="0" length="1576">
      ...

Figure 4: Marked BSAC gbsd Fragment

2.3 Timing of the Content Packets

In addition to the fragmentation of the content, the timing information, which indicates when to send a content packet, needs to be signalled through the gbsd. Usually this is given by an access-units-per-second value, e.g., frames per second. A simple solution would be to include

this value in the header section of the gbsd. As soon as temporal adaptation is taken into consideration, however, a more flexible solution must be devised. In this case, it would be better to include a timestamp in every access unit description. Thus, in case of temporal adaptation, this timestamp can be adjusted to compensate for the adaptation in time. We therefore suggest including the timestamp in the marker attribute as well. Commonly, if there are several marker values for one gBSDUnit, they are separated by spaces. Figure 5 shows a marked gbsd including fragmentation and timing information (in milliseconds).

<gBSDUnit marker="access_unit 21" start="0" length="1576">

Figure 5: Marked gbsd Fragment with Fragmentation and Timing Information

2.4 Constraints on the gbsd

The only constraint is that the gbsd must be detailed enough to describe the access units of a bitstream. There might, for example, be a gbsd which only describes that there are four different scenes with specific semantic properties, e.g., violence levels. Such a gbsd would clearly not be detailed enough for a streaming use case. A solution to that situation would be to enrich the gbsd with exactly the access_unit gBSDUnits required to convey the fragmentation and timing information for generic stream-out, using a program that is aware of the specific content format and that extracts this information by parsing the media bitstream.

2.5 Streaming of the Content

After the fragmentation of the content based on the marked gbsd, the gbsd itself is fragmented as well. Every access unit of the content is streamed in synchronization with the appropriate fragment of the gbsd. The RTP timestamp is used to perform the synchronization. This enables generic adaptation in the network.

3. Use Cases

One possible application for a generic streaming server is to act as a service provider to content creators. Content creators can negotiate the distribution of their content with the generic streaming service. Once an agreement has been reached, they provide their content together with a marked gbsd to the generic streaming service. The generic streaming service can then fulfil stream requests from the content creator's consumers without needing any resource-specific knowledge. Such a generic streaming service could stream any type of resource, given that it comes with an extended gbsd.

In addition to the generic streaming service, our approach enables generic distributed adaptation. Figure 6 illustrates a possible use case for a distributed adaptation scenario. Imagine an Internet-based pay-per-view provider which streams the same movie continuously (e.g., for an entire week) on the same channel. For a fee, users can join this channel for the duration of the movie. After the payment (which is not covered in the figure), the user sends a request to join the channel to the server. The server then selects the most appropriate adaptation node (e.g., based on a location descriptor included in the user's request) and forwards the request to the adaptation node. The adaptation node then responds to the request and the user sends context information to the adaptation node (e.g., terminal capabilities and current network conditions). The adaptation node then compares this context information with the context information of all other users which it serves on this channel and decides whether the quality of the stream which it receives from the server is sufficient or not. If not, an improved quality is requested from the server. The content is adapted for every user on the adaptation node in a generic way, based on the gbsd fragments being streamed alongside the resource.

Figure 6: Distributed Adaptation Scenario

4. Implementation

The implementation is done using Visual C++ 6.0 on a Windows XP (SP2) platform. Libxml2 [11] is used for XML handling. Two different tools have been implemented so far.
The gbsdmarker marks a gbsd in accordance with the extensions listed above. It takes an unmarked gbsd and configuration parameters as inputs. The configuration parameters indicate where to include access unit markers in the gbsd and they provide timing information for the bitstream. The marking information is resource-specific;

for VES, for example, the parameter might indicate to mark the gbsd based on the syntacticalLabel attribute, monitoring its value for VOP declarations. The output of this step is a marked gbsd. An example of such a gbsd fragment can be found in Figure 5. This marking is an offline process which can happen during content creation.

The gbsdreader fragments the content into access units based on the marked gbsd. Additionally, the gbsd itself is fragmented into separate XML documents where each document describes exactly one access unit. These gbsd fragments are independent XML documents including full namespace declarations. Each fragment is stored in memory as a DOM structure and can be made available to a packetizer component which takes care of the packetization of gbsd and resource fragments. The fragmentation of the gbsd facilitates the streaming of the gbsd fragments in synchronization with the corresponding content AUs and, further on, generic adaptation in the network.

5. Results

The implementation shows that it is possible to extend the gbsd in order to use it as a hint file for a generic streaming server. In this section, a quantitative evaluation of our implementation is presented. Additionally, we compare our approach to the NHNT hint format which is used in the MPEG-4 reference software. All tests are performed on a Dell Latitude D600 notebook with an Intel Pentium M 1.8 GHz processor and 512 MB of RAM. Time measurements were performed using the QueryPerformance methods, which offer a resolution of 1/(3.57955 × 10^6) seconds on the test machine. Compression was performed using the MPEG-7 Systems (BiM) reference software [12] with ZLib enabled. BiM encodes XML documents into a binary format [13]. These binary XML documents can still be interpreted without having to decode them. For a comparison of different XML compression techniques, please consult [14].
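The gbsdreader step, i.e. cutting both the bitstream and the marked gbsd into per-AU pieces, can be sketched with standard XML tooling. The element and marker names follow the paper's figures; the parsing logic and the namespace-free toy input are our assumptions:

```python
# Sketch of a gbsdreader-like step: for every gBSDUnit whose marker starts
# with "access_unit", emit (timestamp_ms, gbsd_fragment_xml, au_bytes).
import xml.etree.ElementTree as ET

def fragment(marked_gbsd_xml, bitstream):
    root = ET.fromstring(marked_gbsd_xml)
    result = []
    for unit in root.iter("gBSDUnit"):
        parts = unit.get("marker", "").split()          # e.g. "access_unit 21"
        if parts and parts[0] == "access_unit":
            start, length = int(unit.get("start")), int(unit.get("length"))
            ts_ms = int(parts[1]) if len(parts) > 1 else None
            frag = ET.tostring(unit, encoding="unicode")  # standalone fragment
            result.append((ts_ms, frag, bitstream[start:start + length]))
    return result

# Toy example: two marked AUs covering a 6-byte "bitstream".
gbsd = """<desc>
  <gBSDUnit marker="access_unit 0" start="0" length="4"/>
  <gBSDUnit marker="access_unit 50" start="4" length="2"/>
</desc>"""
aus = fragment(gbsd, b"ABCDEF")  # AU payloads b"ABCD" and b"EF"
```

Each returned fragment can then be handed to a packetizer together with its AU, which is exactly the pairing needed to stream gbsd fragments in synchronization with the content.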
The sample data used consists of an EZBC gbsd describing 1360 access units, a BSAC gbsd describing 1275 access units, a VES gbsd describing 597 access units, and the corresponding content files.

5.1 Size Overhead Compared to Streaming without Metadata

The size overhead depends on the complexity of the gbsd, which in turn depends on the scalability of the content. The gbsd of highly scalable content, such as EZBC, will be much more complex than the gbsd of content with limited scalability, such as VES. In the following, we evaluate the overhead introduced by fragmenting and streaming the gbsd in synchronization with the resource, e.g., for distributed adaptation. We use a frame (AU) rate of 20 for EZBC and VES. For BSAC, a frame (AU) rate of 25 is suggested and is therefore used.

              gbsd Size [kB]            Streaming Overhead [kbps]
          XML Text    Compressed        XML Text    Compressed
  BSAC        5018           470          787.14         73.76
  EZBC        4014           604          472.24         71.04
  VES           99             9           26.53          2.40

Table 1: gbsd Overhead in a Distributed Adaptation Scenario

Table 1 shows the results of this evaluation. One can see that the overhead for the more scalable contents is higher than the overhead for the less scalable VES content. Ultimately, one has to weigh the possible bandwidth benefit of the distributed adaptation against the overhead which is introduced by streaming the gbsd fragments, in order to decide whether the approach is feasible or not.

5.2 Size Overhead in the gbsd due to Special Marking

An additional gBSDUnit node is included in order to mark the access unit descriptions in the gbsd. On average (depending on marker, start, and length values), this node accounts for about 69 bytes (uncompressed) per access unit. While this overhead is almost negligible for the more scalable contents, it is considerable for VES. Therefore, an optimization of the marking process could be considered, where no additional node is introduced, but the access_unit value is added to the marker attribute of the given gBSDUnit.
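The streaming-overhead column of Table 1 can be reproduced from the gbsd sizes, AU counts, and AU rates given above. A sketch of the arithmetic follows; kB and kbps use a factor of 1000 here, which matches the published XML text figures exactly (the compressed figures deviate in the second decimal because the compressed sizes are rounded to whole kB):

```python
# Sketch: streaming overhead in kbps = gbsd size in kbit / stream duration,
# where the duration follows from the AU count and the AU (frame) rate.

def overhead_kbps(gbsd_size_kb, num_aus, au_rate):
    duration_s = num_aus / au_rate
    return gbsd_size_kb * 8 / duration_s

print(round(overhead_kbps(5018, 1275, 25), 2))  # BSAC XML text: 787.14
print(round(overhead_kbps(4014, 1360, 20), 2))  # EZBC XML text: 472.24
print(round(overhead_kbps(99, 597, 20), 2))     # VES XML text: 26.53
```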
5.3 Performance of Marking

The marking of the uncompressed gbsd usually happens offline, ideally during the content creation process. Given this, the time needed for the marking process is not critical. There are, however, use cases, e.g., live streaming, where the content creation is part of the online scenario. In such a case, the marking of the gbsd can become time-critical. The following measurements of the marking process include opening the unmarked gbsd and storing the marked gbsd as a separate file.

          Size [kB]    # AUs    Marking Time [ms]    Time / AU [ms]
  BSAC         5018     1275                  750              0.59
  EZBC         4014     1360                 1300              0.96
  VES            99      597                   31             0.052

Table 2: Marking Performance

Table 2 shows the results of this evaluation. It shows that the marking of an AU takes less than a millisecond on average and can therefore also be used in a live scenario.

5.4 Performance of Fragmentation

The content is fragmented into access units based on the marked, uncompressed gbsd. Additionally, the gbsd is fragmented into gbsd fragments, each of which describes one content access unit. The following measurements of the combined gbsd and content fragmentation process include opening the marked gbsd, providing a document pointer to each individual gbsd fragment, and a buffer pointer to each individual content access unit.

          gbsd Size [kB]    Content Size [kB]    Time / AU [ms]
  BSAC              5018                  285               2.2
  EZBC              4014                12791               1.7
  VES                 99                  384               0.6

Table 3: Fragmentation Performance

Based on the results shown in Table 3, about 450 BSAC access units could be processed every second. The RTP packetization and signalling will certainly decrease this performance; one has to note, however, that the current implementation is not optimized.

5.5 Comparison of NHNT and gbsd

In this section, the size of an NHNT hint file is compared to a gbsd which only contains streaming information and to a gbsd which also contains scalability information of the content. In order to be able to compare the gbsd to the NHNT, a hint gbsd was created by reducing the complete gbsd to the same level of information that the NHNT provides; all scalability information is removed. It contains the file reference, the type of content, the start and end value of each AU, and the type of AU, e.g., B-, P-, or I-VOP for a VES. This hint gbsd was compressed as described in Section 5 in order to provide a fair comparison to the binary NHNT file.

          NHNT File [kB]    Hint gbsd [kB]    Complete gbsd [kB]
  BSAC                20                 6                   470
  EZBC                21                 9                   604
  VES                 10                 3                     9

Table 4: Comparison of gbsd and NHNT Sizes

Table 4 shows the results of this comparison. While the hint gbsd and the NHNT hold comparable information, the hint gbsd is always less than half the size of the NHNT.
For the complete gbsd, which also contains scalability information, the size is even more relevant, as it is signalled over the network in order to enable distributed adaptation. Notably, the VES gbsd is still smaller than the NHNT file, while the gbsds of the more scalable contents are considerably larger.

6. Conclusion and Further Work

In this contribution, we introduced a generic streaming server concept and described the motivation and applications for such a service. We evaluated the current implementation and compared the generic hint information for streaming with another hint file format. Our results so far show the usability of an extended gbsd to convey hint information for streaming. The combination of hint information and bit stream description makes generic streaming and generic adaptation possible anywhere in the network. This enables a content delivery framework in which any type of content described by such a gbsd can be signalled and adapted without the need for a variety of content-specific tools.

Based on this evaluation, we will extend the current implementation to a complete generic streaming server. This includes the mapping of the gbsd fragments to an appropriate transport format in order to remove unnecessary overhead; compressed TeM or BiM [13] are considered for this. The next step is the RTP packetization of the content AUs and gbsd fragments. Separate RTP streams, synchronized by the RTP timestamp, are considered for this. This makes it possible to treat the metadata stream differently from the content stream, e.g., to protect it with a different mechanism. The generic streaming server will also have to handle certain configuration/context information from the outside world.
Examples include the MTU of the network(s) it serves, the protocol to be used for content and session management (e.g., RTSP and SDP) and streaming (e.g., RTP), and specific characteristics of the payload format which are not already covered by the existing protocols. The latter two might be signalled by a streaming client in order to inform the server of supported ways to stream the content. There should be an interoperable way to inform a streaming server of such constraints; further work will therefore focus on evaluating them. Ultimately, the possible constraints should be specified, e.g., using XML Schema, in order to provide an interoperable way of conveying them to the streaming server. Existing XML-based languages used to convey such information (e.g., the Session Description and Capability Negotiation (SDPng) language [16]) will be evaluated and possibly extended for the purpose of carrying these descriptors.

Acknowledgement

Part of this work is supported by the EC in the context of the DANAE project (IST-1-507113) [17].

References:
[1] A. Vetro, C. Timmerer, S. Devillers (eds.): ISO/IEC 21000-7 FDIS Part 7: Digital Item Adaptation. December 2003, Hawaii, USA.
[2] A. Vetro, C. Timmerer: Overview of the Digital Item Adaptation Standard. To appear in: IEEE Trans. on Multimedia (Special Issue on MPEG-21), Feb. or April 2005.
[3] C. Timmerer, G. Panis, H. Kosch, J. Heuer, H. Hellwagner, A. Hutter: Coding format independent multimedia content adaptation using XML. Proc. SPIE Int'l Symp. ITCom 2003, vol. 5242, Orlando, FL, USA, Sept. 2003.
[4] ISO/IEC 14496-2:2004: Information technology - Coding of audio-visual objects - Part 2: Visual. 2004.
[5] H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson: RFC 3550 - RTP: A Transport Protocol for Real-Time Applications. July 2003.
[6] ISO/IEC 14496-1:2001: Information technology - Coding of audio-visual objects - Part 1: Systems. 2001.
[7] ISO/IEC 14496-5:2001: Information technology - Coding of audio-visual objects - Part 5: Reference software. 2001.
[8] S. Devillers, C. Timmerer, J. Heuer, H. Hellwagner: Bitstream Syntax Description-Based Adaptation in Streaming and Constrained Environments. To appear in: IEEE Trans. on Multimedia (Special Issue on MPEG-21), Feb. or April 2005.
[9] ISO/IEC 14496-3:2001: Information technology - Coding of audio-visual objects - Part 3: Audio. 2001.
[10] S.-T. Hsiang, J. W. Woods: Embedded image coding using zeroblocks of subband/wavelet coefficients and context modeling. MPEG-4 Workshop and Exhibition at ISCAS 2000, Geneva, Switzerland, May 2000.
[11] Libxml2 Website: http://www.xmlsoft.org
[12] ISO/IEC 15938-6:2003: Information technology - Multimedia content description interface - Part 6: Reference software. 2003.
[13] ISO/IEC 15938-1:2002: Information technology - Multimedia content description interface - Part 1: Systems. 2002.
[14] C. Timmerer, H. Hellwagner, J. Heuer, C. Seyrat, A. Hutter: BinaryXML - A Comparison of Existing XML Compression Techniques. ISO/IEC JTC1/SC29/WG11 M10718, March 2004, Munich, Germany.
[15] G. Panis, A. Hutter, J. Heuer, H. Hellwagner, H. Kosch, C. Timmerer, S. Devillers, M. Amielh: Bitstream Syntax Description: A Tool for Multimedia Resource Adaptation within MPEG-21. EURASIP Signal Processing: Image Communication Journal, vol. 18, no. 8, pp. 721-747, Sept. 2003.
[16] D. Kutscher et al.: Session description and capability negotiation. IETF Internet-Draft, work in progress: draft-ietf-mmusic-sdpng-07, October 2003.
[17] DANAE Website: http://danae.rd.francetelecom.com