1 DECE Comments on ISO/IEC AMD 2, and Future File Format Amendments DECE (Digital Entertainment Content Ecosystem) is a consortium of approximately sixty companies* that are industry leaders in commercial video creation, delivery, and playback. Many of these companies are also active in MPEG, and have adopted MPEG standards to define a video format optimized for internet delivery and interoperable playback on an ever increasing variety of video devices ranging from cell phones, tablets, game machines, and computers, to high definition televisions. DECE used ISO Base Media File Format, AVC video, and AAC audio as the basis to define specific interoperability profiles under the consumer brand name UltraViolet. The following comments primarily consist of portions of the (draft) DECE Media Format Specification, which is in the final approval process within DECE, and will be published later this year. It includes an overview of DECE defined constraints, additions, and modifications of the ISO/IEC ISO Base Media File Format, referred to in DECE as the Common File Format (CFF). The UltraViolet application of ISO Base Media Format was developed over a two year period by a collaborative analysis of use cases prioritized based on consumer and commercial importance, including uses such as download, streaming, super-distribution, local area network playback, playback from removable media, and portable devices, as used by multiple individuals in families and households using a variety of publishers, devices, and content protection technologies. The Common File Format should enable a single encoding (of three different resolutions) to be used by many retailers and service providers using different delivery methods, while enabling consumers simple interoperability between devices comparable to DVD video format. DECE requests that MPEG consider the ISO Base Media File Format extensions included for incorporation in amendments to the MPEG-4 Part 12 standard. DECE will provide members, who are also MPEG members, to participate in MPEG, contribute specific amendment proposals, supply additional technical background, and participate in the standardization process. *At last count the DECE membership includes Adobe, Akamai, Alcatel-Lucent, Ascent Media, Best Buy, BT, CableLabs, Catch Media, CinemaNow, Cineplex Entertainment, Cisco, Comcast, Cox Communications, CSG Systems, Deluxe, DivX, Dolby, DTS, ExtendMedia, FilmFlex, Fox Entertainment, Hewlett- Packard, Huawei Technologies, IBM, Intel, Irdeto, LG Electronics, Liberty Global, Lionsgate, LOVEFiLM, Marvell Semiconductor, Microsoft, MOD Systems, Motorola, Nagravision, NBC Universal, NDS Group, Netflix, Neustar, Nokia, Panasonic, Paramount Pictures, Philips, RIAA, Red Bee Media, Rovi, Saffron Digital, Samsung, Sonic Solutions, Sony, Switch Communications, Tesco, Thomson, Toshiba, Verance, Verimatrix, VeriSign, Warner Brothers, Widevine Technologies, Zoran.
2 DECE Application of ISO/IEC ISO Base Media File Format to UltraViolet Common File Format DECE has specified an application of the MPEG ISO Base Media File Format consumer branded UltraViolet or UVVU, and wishes to inform MPEG and propose inclusion of newly defined boxes in MPEG specifications and amendments, such as N11137, where appropriate. DECE s primary interest is interoperability of internet delivered video, and to that end it plans to publish an openly available media format specification this year that defines specific profiles of AVC video, AAC audio, and the ISO Base Media container partially described in this communication. The contents of this communication are extracted from a DECE specification titled Common File Format & Media Formats Specification (CFF) that specifies the ISO container, audio, video, and subtitle track contents, and descriptive metadata for three Media Profiles intended to enable interchange and playback of internet delivered video on a wide variety of devices. Common Encryption is specified to enable content protection of files that can be played by multiple compliant player DRM systems. The CFF specification is currently in the final approval process. Several new ISO Base Media boxes have been defined, and assigned four character codes with intent to register them with MPEG 4 Registration Authority when the specification is finalized. A version 1 of the stsd Sample Description Box is defined that recognizes Subtitle Tracks as defined in this specification. A variation of the tfdt Track Fragment Base Decode Time Box, currently under consideration in a Part 12 PDAM, is defined and recommended for consideration by MPEG. DECE understands that MPEG may not adopt the changes to tfdt proposed, so DECE will consider naming a different box to support these functions if the final specification of tfdt does not adopt the proposed additions. The intention is to use MPEG specified boxes wherever possible. Delivered on approval of DECE by: Kilroy Hughes Microsoft
3 The Common File Format The following excerpt of the CFF specification provides a high level view of the ISO container, the new box definitions, and their location and use in the container The Common File Format (CFF) is based on an enhancement of the ISO Base Media File Format defined by [ISO]. The principal enhancements to the ISO Base Media File Format are support for multiple DRM technologies in a single container file and separate storage of audio, video, and subtitle samples in track fragments to allow flexible delivery methods (including progressive download and late binding of separately stored tracks) and playback with random access. 1.1 Common File Format The Common File Format is a code point on the ISO Base Media File Format defined by [ISO]. Error: Reference source not found shows the box type, structure, nesting level and crossreferences for the CFF. The media type SHALL be video/vnd.dece.mp4 and the file extension SHALL be either.uvvu or.uvv, as registered with [IANA]. The following boxes are extensions for the Common File Format: ainf : Asset Information Box avcn : AVC NAL Unit Storage Box bloc : Base Location Box pssh : Protection System Specific Header Box stsd : Sample Description Box sthd : Subtitle Media Header Box senc : Sample Encryption Box tenc : Track Encryption Box tfdt : Track Fragment Base Media Decode Time Box trik : Trick Play Box
4 DECE CFF Container (DCC) DCC Header DCC Movie Fragment - 1 DCC Movie Fragment - 2 DCC Movie Fragment - n DCC Footer Figure 2-1 Structure of a DECE CFF Container (DCC)
5 DCC Header File Type Box ( ftyp ) Progressive Download Information Box ( pdin ) Base Location Box ( bloc ) Movie Box ( moov ) Movie Header Box ( mvhd ) Asset Information Box ( ainf ) Object Descriptor Box ( iods ) for DRM-specific Information (IPMP) Metadata Box ( meta ) for DECE required metadata Track Box ( trak ) - 1 Track Box ( trak ) - n Movie Extends Box ( mvex ) Protection System Specific Box ( pssh ) for DRM-specific Information (multiple) Free Space Box ( free ) Media Data Box ( mdat ) for DRM-specific Object Descriptors (IPMP) Mandatory Box Optional Box Figure 2-2 Structure of a CFF Header
6 DCC Movie Fragment Movie Fragment Box ( moof ) Movie Fragment Header Box ( mfhd ) Track Fragment Box ( traf ) Track Fragment Header Box ( tfhd ) Track Fragment Base Media Decode Time Box ( tfdt ) Trick Play Box ( trik ) present for video tracks only Track Fragment Run Box ( trun ) Independent and Disposable Samples Box ( sdtp ) optional for nonvideo AVC NAL Unit Storage Box ( avcn ) not present for non-video tracks Sample Encryption Box ( senc ) Movie Data Box ( mdat ) Movie fragment samples (all of one type) Mandatory Box Optional Box Figure 2-3 DCC Movie Fragment Structure DCC Footer Metadata Box ( meta ) for DECE Optional Metadata Handler Reference Box ( hdlr ) for Common File Metadata XML Box ( xml ) for Optional Metadata Movie Fragment Random Access Box ( mfra ) Track Fragment Random Access Box ( tfra ) 1 Track Fragment Random Access Box ( tfra ) n Movie Fragment Random Access Offset Box ( mfro ) Mandatory Box Optional Box Figure 2-4 Structure of a DCC Footer
7 1.1.1 Protection System Specific Header Box ( pssh ) Box Type Container Mandator y Quantity pssh Movie Box ( moov ) or Movie Fragment Box ( moof ) No Any number The Protection System Specific Header Box contains data specific to the content protection system it represents. Typically this may include but is not limited to license server URLs, list of key identifiers used by the file, and embedded licenses in a format specified by each protection system. A single DECE CFF Container MAY contain zero, one, or multiple different Protection System Specific Header Boxes. For instance, there could be one for DRM A specific data and one for DRM B specific data. There SHALL be only one Protection System Specific Header Box for any particular content protection system, which SHALL interpret and control the contents of its Protection System Specific Header Box Syntax aligned(8) class ProtectionSystemSpecificHeaderBox extends FullBox( pssh, version=0, flags=0) UUID (8) SystemID; unsigned int(32) DataSize; unsigned int(8)[datasize] Data; Semantics SystemID specifies a UUID that uniquely identifies the content protection system that this header belongs to. DECE approved Protection Systems and SystemID values are specified in [DSystem]. DataSize specifies the size in bytes of the Data member. Data holds the content protection system specific data. This data structure MAY be defined by each Protection System, is in general opaque to DECE and is not constrained by this specification.
8 1.1.2 AVC NAL Unit Storage Box ( avcn ) Box Type avcn Container Track Fragment Box ( traf ) Mandator No y Quantity Zero, or one in every AVC track fragment in a file An AVC NAL Unit Storage Box SHALL contain an AVCDecoderConfigurationRecord, as defined in section of [ISOAVC] Syntax aligned(8) class AVCNALBox extends Box( avcn ) AVCDecoderConfigurationRecord() AVCConfig; Semantics AVCConfig SHALL contain sufficient sequenceparametersetnalunit and pictureparametersetnalunit entries to describe the configurations of all samples referenced by the current track fragment. Note: AVCDecoderConfigurationRecord contains a table of each unique Sequence Parameter Set NAL unit and Picture Parameter Set NAL unit referenced by AVC Slice NAL Units contained in samples in this track fragment, sequenced in order of sample composition time. As defined in [ISOAVC] Section semantics: sequenceparametersetnalunit contains a SPS NAL Unit, as specified in [H264]. SPSs shall occur in order of ascending parameter set identifier with gaps being allowed. pictureparametersetnalunit contains a PPS NAL Unit, as specified in [H264]. PPSs shall occur in order of ascending parameter set identifier with gaps being allowed Base Location Box ( bloc ) Box Type Container Mandator y Quantity bloc File Yes One The Base Location Box is a fixed-size box that contains critical information necessary for purchasing and fulfilling licenses for the contents of the CFF. The values found in this box are
9 used to determine the location of the license server and retailer for fulfilling licenses, as defined in Sections and of [DSystem] Syntax aligned(8) class BaseLocationBox extends FullBox( bloc, version=0, flags=0) byte baselocation; byte purchaselocation; // optional byte Reserved; Semantics baselocation SHALL contain the Base Location defined in Section of [DSystem], encoded as a string of ASCII bytes as defined in [ASCII], followed by null bytes (0x00) to a length of 256 bytes. purchaselocation MAY contain the Purchase Location defined in Section of [DSystem], encoded as a string of ASCII bytes as defined in [ASCII], followed by null bytes (0x00) to a length of 256 bytes. If no Purchase Location is included, this field SHALL be filled with null bytes (0x00). Reserved Reserve space for future use. Implementations conformant with this specification SHALL ignore this field Asset Information Box ( ainf ) Box Type Container Mandator y Quantity ainf Movie Box ( moov ) Yes One The Asset Information Box contains required file metadata necessary to identify, license and play the content within the DECE ecosystem Syntax aligned(8) class AssetInformationBox extends FullBox( ainf, version=0, flags=0) int(32) profile_version; string APID; Box other_boxes; // optional
10 Semantics profile_version indicates the Media Profile to which this container file conforms. APID indicates the Asset Physical Identifier (APID) of this container file, as defined in Section Asset Identifiers of [DSystem]. other_boxes Available for private and future use Sample Description Box ( stsd ) Box Type stsd Container Sample Table Box ( stbl ) Mandator Yes y Quantity Exactly one Version 1 Version one (1) of the Sample Description Box defined here extends the version zero (0) definition in Section of [ISO] with the additional support for the handler_type value of subt, which corresponds to the SubtitleSampleEntry() defined here Syntax class SubtitleSampleEntry() extends SampleEntry(codingname) string namespace; string schema_location; // optional string image_mime_type; // required if Subtitle images present 8.5.2) BitRateBox(); // optional (defined in [ISO] aligned(8) class SampleDescriptionBox(unsigned int(32) handler_type) extends FullBox( stsd, version=1, flags=0) int i; unsigned int(32) entry_count; for (i = 1; i <= entry_count; i++) switch (handler_type) case soun : // for audio tracks AudioSampleEntry(); break; case vide : // for video tracks VideoSampleEntry(); break; case hint : // for hint tracks HintSampleEntry(); break; case meta : // for metadata tracks MetadataSampleEntry();
11 break; case subt : // for subtitle tracks SubtitleSampleEntry(); break; Semantics All of the semantics of version zero (0) of this box, as defined in [ISO], apply to this version of the box with the following additional semantics specifically for SubtitleSampleEntry(): namespace gives the namespace of the schema for the subtitle document. This is needed for identifying the type of subtitle document, e.g. SMPTE Timed Text. schema_location optionally provides an URL to find the schema corresponding to the namespace. image_mime_type indicates the media type of any images present in subtitle samples, including images that are embedded in-line in the subtitle document. An empty string indicates that images are not present in the subtitle sample or document. All samples in a track SHALL have the same image_mime_type value. An example of this field is image/png Sample Encryption Box ( senc ) Box Type Container Mandator y Quantity senc Track Fragment Box ( traf ) No (Yes, if tenc is included in track) Zero or one The Sample Encryption Box contains the sample specific encryption data, including the initialization vectors needed for decryption and, optionally, alternative decryption parameters. It is used when the sample data in the fragment might be encrypted. The box is mandatory for a track fragment in a track that contains a Track Encryption Box ( tenc ).
12 Syntax aligned(8) class SampleEncryptionBox extends FullBox( senc, version=0, flags=0) if (flags & 0x000001) unsigned int(24) AlgorithmID; unsigned int(8) IV_size; UUID KID; unsigned int(32) sample_count; unsigned int(iv_size*8) InitializationVector; if (flags & 0x000002) unsigned int(16) subsample_count; unsigned int(16) BytesOfClearData; unsigned int(32) BytesOfEncryptedData; [ subsample_count ] [ sample_count ] Semantics flags is inherited from the FullBox structure. The SampleEncryptionBox currently supports the following flag values: 0x1 OverrideTrackEncryptionBox parameters 0x2 UseSubSampleEncryption If the OverrideTrackEncryptionBox parameters flag is set, then the SampleEncryptionBox specifies the AlgorithmID, IV_size, and KID parameters. If not present, then the default values from the TrackEncryptionBox SHALL be used for this fragment and only the sample_count and InitializationVector vector are present in the Sample Encryption Box. If the UseSubSampleEncryption flag is set, then the track fragment that contains this Sample Encryption Box SHALL use the sub-sample encryption as described in Section Error: Reference source not found. When this flag is set, sub-sample mapping data follows each InitilizationVector. The sub-sample mapping data consists of the number of subsamples for each sample, followed by an array of values describing the number of bytes of clear data and the number of bytes of encrypted data for each sub-sample. AlgorithmID is the identifier of the encryption algorithm used to encrypt the samples in the track fragment. The currently supported algorithms are:
13 0x0 Not Encrypted 0x1 AES 128-bit in CTR mode (AES-CTR) If the AlgorithmID is 0x0 (Not Encrypted), then the key identifier KID SHALL be ignored and SHALL be set to all zeros and the sample_count SHALL be set to 0 (since no initialization vectors are needed). IV_size is the size in bytes of the InitializationVector field. Supported values: 8 Specifies 64-bit initialization vectors 16 Specifies 128-bit initialization vectors KID is a key identifier that uniquely identifies the key needed to decrypt samples referred to by this Sample Encryption Box. This allows the identification of multiple encryption keys per file or track. Unencrypted fragments in an encrypted track SHALL be identified by setting the algorithmid parameter to 0x0 and setting the OverrideTrackEncryptionBox flags bit to 0x1. sample_count is the number of encrypted samples in this track fragment. This value SHALL be either zero (0) or the total number of samples in the track fragment. InitializationVector specifies the initialization vector (IV) needed for decryption of a sample. The n th InitializationVector in the table SHALL be used for the n th sample in the track fragment. For an AlgorithmID of Not Encrypted, no initialization vectors are needed and this table SHALL be omitted. For an AlgorithmID of AES-CTR, if the IV_size field is 16 then InitializationVector specifies the entire 128-bit IV value used as the counter block. If the IV_size field is 8, then its value is copied to bytes 0 to 7 of the counter block and bytes 8 to 15 of the counter block are set to zero. For an AlgorithmID of AES-CTR, counter values SHALL be unique per KID. If an IV_size of 8 is used, then the InitializationVector values for a given KID SHALL be unique for each sample in all tracks and samples must be less than 2 64 blocks in length. If an IV_size of 16 is used, initialization vectors SHALL have large enough numeric differences to prevent duplicate counter values for any encrypted block using the same KID. See Section Error: Reference source not found for further details on how encryption is applied. subsample_count specifies number of sub-sample encryption entries present for this sample.
14 BytesOfClearData specifies number of bytes of clear data at the beginning of this subsample encryption entry. (Note, that this value can be zero if no clear bytes exist for this entry.) BytesOfEncryptedData specifies number of bytes of encrypted data following the clear data. (Note, that this value can be zero if no encrypted bytes exist for this entry.) The sub-sample encryption entries SHALL NOT include an entry with a zero value in both the BytesOfClearData field and in the BytesOfEncryptedData field. The total length of all BytesOfClearData and BytesOfEncryptedData for a sample SHALL equal the length of the sample. Further, it is recommended that the sub-sample encryption entries be as compactly represented as possible. For example, instead of two entries with 15 clear, 0 encrypted, 17 clear, 500 encrypted use one entry of 32 clear, 500 encrypted Track Encryption Box ( tenc ) Box Type Container Mandator y Quantity tenc Scheme Information Box ( schi ) No (Yes, for encrypted tracks) Zero or one The TrackEncryptionBox contains default values for the AlgorithmID, IV_size, and KID for the entire track. These values SHALL be used as the encryption parameters for this track unless overridden by a SampleEncryptionBox with the OverrideTrackEncryptionBox parameter flag set. For files with only one key per track, this box allows the basic encryption parameters to be specified once per track instead of being repeated in each fragment. Note that the TrackEncryptionBox is mandatory for encrypted tracks Syntax aligned(8) class TrackEncryptionBox extends FullBox( tenc, version=0, flags=0) unsigned int(24) default_algorithmid; unsigned int(8) default_iv_size; UUID default_kid; Semantics default_algorithmid is the default encryption algorithm identifier used to encrypt samples in the track. It can be overridden in any track fragment by specifying the OverrideTrackEncryptionBox parameter flag in the Sample Encryption Box. See the AlgorithmID field in the Sample Encryption Box for further details.
15 default_iv_size is the default IV_size. It can be overridden in any track fragment by specifying the OverrideTrackEncryptionBox parameter flag in the Sample Encryption Box. See the IV_size field in the Sample Encryption Box for further details. default_kid is the default key identifier used for this track. It can be overridden in any track fragment by specifying the OverrideTrackEncryptionBox parameter flag in the Sample Encryption Box (see Section 1.1.1). See the KID field in the Sample Encryption Box for further details Track Fragment Base Media Decode Time Box ( tfdt ) Box Type tfdt Container Track Fragment Box ( traf ) Mandator No y Quantity Zero or one Version 1 The Track Fragment Base Media Decode Time Box ( tfdt ), if present, SHALL be positioned after the Track Fragment Header Box ( tfhd ) and before the first Track Fragment Run Box ( trun ) Syntax aligned(8) class TrackFragmentBaseMediaDecodeTimeBox extends FullBox( tfdt, version, flags=0) if (version==1) unsigned int(64) basemediadecodetime; unsigned int(64) trackfragmentduration; else // version==0 unsigned int(32) basemediadecodetime; unsigned int(32) trackfragmentduration; if (flags & 0x000001) unsigned int(32) ntp_timestamp_integer; unsigned int(32) ntp_timestamp_fraction; if (flags & 0x000002) Box other_box(); // optional
16 Semantics flags is inherited from the FullBox structure. The TrackFragmentBaseMediaDecodeTimeBox supports the following values: 0x1 NTP Timestamp present, indicates that the optional NTP timestamp values are set in this box. 0x2 indicates that another box is contained in this tfdt. version is an integer that specifies the version of this box (0 or 1 allowed in this specification). basemediadecodetime is an integer equal to the sum of the decode durations of all earlier samples in the media, expressed in the media's timescale. It does not include the samples added in the enclosing track fragment. trackfragmentduration is a 32-bit or 64-bit integer that indicates the sum of the durations of the samples contained in this track fragment, expressed in the media s timescale. ntp_timestamp_integer is a 32-bit integer that represents the NTP timestamp integer value (seconds component) per [NTPv4]. The reference clock shall be UTC. ntp_timestamp_fraction is a 32-bit integer that represents the NTP timestamp fractional value (sub-second component) per [NTPv4]. other_box Optional storage of one additional box within tfdt Trick Play Box ( trik ) Box Type Container Mandator y Quantity trik Sample Table Box ( stbl ) or Track Fragment Box ( traf ) No Zero or one This box answers three questions about AVC sample dependency: Is this sample independently decodable (i.e. does this sample NOT depend on others)? Can normal-speed playback be started from this sample with full reconstruction of all subsequent pictures in output order? Can this sample be discarded without interfering with the decoding of a known set of other samples? In the absence of this table:
17 1. The sync sample table partially answers the first and second questions, above; in AVC video codec, IDR-pictures are listed as sync points, but there may be additional Random Access I-picture sync points and additional I-pictures that are independently decodable. 2. The dependency of other samples on this one is unknown. 3. The sdtp table, if present, may be used to identify samples that are always disposable, but does not indicate other samples that can additionally be disposed. When performing random access (i.e. starting normal playback at a location within the track), beginning decoding at samples of picture type 1 and 2 ensures that all subsequent pictures in output order will be fully reconstructable. Note: Pictures of type 3 (unconstrained I-picture) may be followed in output order by samples that reference pictures prior to the entry point in decoding order, preventing those pictures following the I-picture from being fully reconstructed if decoding begins at the unconstrained I- picture. When performing trick mode playback, such as fast forward or reverse, it is possible to use the dependency level information to locate independently decodable samples (i.e. I-pictures), as well as pictures that may be discarded without interfering with the decoding of subsets of pictures with lower dependency_level values. If this box appears in a Sample Table Box, then the size of the table, sample_count, is taken from the sample_count in the Sample Size Box ('stsz') or Compact Sample Size Box ( stz2 ) of the stbl that contains it. Alternatively, if this box appears in a Track Fragment Box, then sample_count is taken from the sample_count in the corresponding Track Fragment Run Box ( trun ). If used, the Trick Play Box MAY be present in the Sample Table Box ( stbl ) and SHOULD be present in the Track Fragment Box ( traf ) for all video track fragments in fragmented movie files Syntax aligned(8) class TrickPlayBox extends FullBox( trik, version=0, flags=0) for (i=0; I < sample_count; i++) unsigned int(2) pic_type; unsigned int(6) dependency_level;
18 Semantics pic_type takes one of the following values: 0 The type of this sample is unknown. 1 This sample is an IDR picture. 2 This sample is a Random Access (RA) I-picture, as defined below. 3 This sample is an unconstrained I-picture. dependency_level indicates the level of dependency of this sample, as follows: 0x00 The dependency level of this sample is unknown. 0x01 to 0x3E This sample does not depend on samples with a greater dependency_level values than this one. 0x3F Reserved Random Access (RA) I-Picture A Random Access (RA) I-picture is defined in this specification as an I-picture that is followed in output order by pictures that do not reference pictures that precede the RA I-picture in decoding order, as shown in Figure 2-5. NO OK Display Order Random Access (RA) I-picture
19 Figure 2-5 Example of a Random Access (RA) I picture CFF Constraints on Trick Play Box The Trick Play Box is generally defined as optional and can apply to both fragmented and nonfragmented movie files. The Common File Format, however, defines the following additional requirements: The Trick Play Box ( trik ) SHALL be present in every Track Fragment Box ( traf ) for AVC video tracks in the file.