VoiceAge Corporation
750 Chemin Lucerne, Suite 250
Ville Mont-Royal (Quebec) H3R 2H6
Canada
(514) 737-4940
Fax (514) 908-2037
www.voiceage.com

Open AMR Initiative
Technical Documentation
Version 1.0
Revision 2004-07-15

Copyright 2004 VoiceAge Corporation. No part of this manual may be reproduced in any form, written or otherwise, without the express written permission of VoiceAge Corporation.
Table of Contents

PACKAGE CONTENTS
INPUT/OUTPUT FORMAT
DISCONTINUOUS TRANSMISSION (DTX)
ABOUT THE ENCODER/DECODER SAMPLE PROGRAMS
AMR-NB API FUNCTIONS
LIST OF REFERENCED 3GPP AMR SPECIFICATIONS

Open AMR Initiative/TD/2004-07-15
VoiceAge AMR is an adaptive multirate narrowband speech coder with eight bit rate modes ranging from 4.75 kbit/s to 12.2 kbit/s and an additional low bit rate background noise mode. The codec includes a voice activity detector, a comfort noise generator, and an error concealment mechanism, all of which improve speech quality over lossy transmission media.

This package provides the AMR floating-point speech encoder and a fast fixed-point speech decoder. The encoder produces output that is compatible with the AMR-NB IF2 format. The decoder is bit-exact with 3GPP TS 26.173 [1].

PACKAGE CONTENTS

AMR-NB.pdf      This document.
AMR-NB.lib      Win32 statically linkable library of the AMR-NB floating-point
                encoder / fixed-point decoder for Pentium and compatible
                processors.
encoder.c       Source code for the encoder test program.
decoder.c       Source code for the decoder test program.
interf_enc.h    Header files needed to compile the encoder and decoder test
interf_dec.h    programs.
typedef.h
encoder.exe     Encoder test program executable.
decoder.exe     Decoder test program executable.
INPUT/OUTPUT FORMAT

Input to the encoder is 16-bit pulse code modulation (PCM) speech data sampled at 8 kHz. The decoder outputs the reconstructed speech data in the same format. Each input speech frame of 20 ms consists of 160 16-bit PCM words containing 14-bit left-aligned uniform samples.

The encoder outputs compressed speech data in octet-aligned (by using bit stuffing) AMR-NB Interface Format 2 (IF2), as defined in 3GPP TS 26.101 [2].

+------------+---------------+----------------------------+--------------+
| Frame Type | Frame Quality | AMR-NB core speech frame   | Bit Stuffing |
| (4 bits)   | Indicator     | (size depends on bit rate  | (n bits)     |
|            | (1 bit)       | mode)                      |              |
+------------+---------------+----------------------------+--------------+

Frame structure for AMR-NB IF2

An AMR-NB IF2 frame contains a header with the fields Frame Type and Frame Quality Indicator (FQI). The 4-bit Frame Type field identifies the current frame as either an AMR-NB codec mode, comfort noise, lost speech, or an empty frame. It is followed by the 1-bit FQI field, which indicates a bad or corrupted frame when equal to zero and a good frame when equal to one.

The AMR-NB core frame is the compressed speech data or comfort noise data within a 20 ms frame. The size of this data depends on the current AMR-NB codec mode.

The last field contains stuffing bits, which pad the AMR-NB IF2 frame length to the next multiple of eight bits. The following table shows how bits are allocated for each codec mode.
Frame Type  Bit rate         Frame type  AMR-NB     Padding  Total bytes per
Index       (kbit/s)         bits        core bits  bits     AMR-NB IF2 frame
----------  ---------------  ----------  ---------  -------  ----------------
0           4.75             4           95         5        13
1           5.15             4           103        5        14
2           5.90             4           118        6        16
3           6.70             4           134        6        18
4           7.40             4           148        0        19
5           7.95             4           159        5        21
6           10.2             4           204        0        26
7           12.2             4           244        0        31
8           AMR SID*         4           39         5        6
9           GSM-EFR SID      4           43         1        6
10          TDMA-EFR SID     4           38         6        6
11          PDC-EFR SID      4           37         7        6
12-14       (for future use) -           -          -        -
15          no data          4           0          4        1

AMR-NB core bits: total bits used for an AMR-NB core frame.
* The bit rate of comfort noise (FT index 8) is 1.75 kbit/s when assuming continuous transmission.

[Figure: Bit mapping of an AMR-NB IF2 frame. Columns run from bit 8 (MSB) to bit 1 (LSB). Byte 1 holds the Frame Type field and the first core frame bits d(0), d(1), d(2); byte 2 continues with d(3) through d(9); the last byte n ends with the unused stuffing bits (UB).]
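The padding and byte totals in the table follow from rounding the 4 frame type bits plus the core bits up to a whole number of bytes. The following C fragment is a minimal sketch of that arithmetic, not part of the VoiceAge package. The names if2_core_bits, if2_frame_bytes and if2_padding_bits are hypothetical, and the core bit counts are entered so that each row satisfies 4 + core bits + padding bits = 8 x total bytes.

```c
#include <assert.h>

#define IF2_FRAME_TYPE_BITS 4

/* Core frame size in bits for each Frame Type index (0-15).
   Hypothetical helper table; -1 marks the reserved indices 12-14. */
static const int if2_core_bits[16] = {
    95, 103, 118, 134, 148, 159, 204, 244,  /* speech modes 0-7 */
    39, 43, 38, 37,                         /* SID frames 8-11 */
    -1, -1, -1,                             /* 12-14: for future use */
    0                                       /* 15: no data */
};

/* Length of one IF2 frame in bytes, or -1 for a reserved/invalid index. */
int if2_frame_bytes(int frame_type)
{
    if (frame_type < 0 || frame_type > 15 || if2_core_bits[frame_type] < 0)
        return -1;
    /* frame type bits + core bits, rounded up to a whole number of bytes */
    return (IF2_FRAME_TYPE_BITS + if2_core_bits[frame_type] + 7) / 8;
}

/* Number of stuffing bits needed to reach the byte boundary, or -1. */
int if2_padding_bits(int frame_type)
{
    int bytes = if2_frame_bytes(frame_type);
    if (bytes < 0)
        return -1;
    int pad = bytes * 8 - IF2_FRAME_TYPE_BITS - if2_core_bits[frame_type];
    assert(pad >= 0 && pad < 8);  /* padding never fills a whole byte */
    return pad;
}
```

For example, if2_frame_bytes(7) gives 31 for the 12.2 kbit/s mode, and if2_padding_bits(0) gives 5 for the 4.75 kbit/s mode. A reader of a stored bitstream could use such a lookup to know how many bytes to fetch once the frame type is known.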
DISCONTINUOUS TRANSMISSION (DTX)

In a typical telephone conversation, voice transmission alternates regularly between both sides, leaving long pauses of silence. These pauses can be represented more efficiently as background noise that is transmitted at a much lower bit rate. The discontinuous transmission mode (also called source controlled rate operation) is used to encode frames that contain only background noise.

When operating in DTX mode, a voice activity detector (VAD) on the TX side evaluates whether a 20 ms frame contains any voice data. In the absence of speech, a silence identifier (SID) frame is transmitted, which contains characteristics describing the background noise. On the RX side, a comfort noise generator synthesizes background noise based on the SID frame parameters.
ABOUT THE ENCODER/DECODER SAMPLE PROGRAMS

The sample programs encoder.c and decoder.c demonstrate how to initialize and call the encoding and decoding processes. Input to the encoder and output from the decoder are 16-bit PCM words containing 14-bit left-aligned uniform speech samples.

Usage of the encoder:

    encoder (-dtx) mode speech_file bitstream_file

-dtx enables discontinuous transmission mode.

mode specifies encoding at one of the 8 AMR-NB bit rates.

modefile filename can be used instead of the mode argument to specify the encoding mode for each frame from a mode control file. This text file should contain one mode number (0-7) per line.

mode               0     1     2     3     4     5     6      7
bit rate (kbit/s)  4.75  5.15  5.90  6.70  7.40  7.95  10.20  12.20

Usage of the decoder:

    decoder bitstream_file synth_file

To build the speech encoder or decoder sample program, compile the file encoder.c (or decoder.c) and link the resulting object file with the AMR-NB static library.
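The mode control file format described above (one mode number from 0 to 7 per line) and the mode to bit rate mapping can be sketched in C as follows. This is an illustration only and is not taken from encoder.c; the names parse_mode_line, mode_bit_rate_kbits and amr_nb_kbits are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Bit rate in kbit/s for each AMR-NB mode (0-7), as listed in the table. */
static const double amr_nb_kbits[8] = {
    4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.20, 12.20
};

/* Parses one line of a mode control file.
   Returns the mode (0-7), or -1 if the line is not a valid mode number. */
int parse_mode_line(const char *line)
{
    char *end;
    long value;

    assert(line != NULL);
    value = strtol(line, &end, 10);
    if (end == line)
        return -1;                      /* no digits found */
    while (*end == ' ' || *end == '\t' || *end == '\r' || *end == '\n')
        end++;                          /* allow trailing whitespace */
    if (*end != '\0')
        return -1;                      /* unexpected trailing characters */
    if (value < 0 || value > 7)
        return -1;                      /* outside the 8 AMR-NB modes */
    return (int)value;
}

/* Returns the bit rate in kbit/s for a mode, or -1.0 if out of range. */
double mode_bit_rate_kbits(int mode)
{
    if (mode < 0 || mode > 7)
        return -1.0;
    return amr_nb_kbits[mode];
}
```

A caller would typically read the file line by line with fgets, pass each line to parse_mode_line, and stop or report an error on -1 before encoding the corresponding frame.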
AMR-NB API FUNCTIONS

E_IF_init

Allocates and initializes encoder state memory.

Syntax
    #include "interf_enc.h"
    void * E_IF_init (dtx);

Arguments
    dtx : dtx = 1 to enable discontinuous transmission

Returned value
    void * : Pointer to state memory used by the encoder

E_IF_encode

Encodes one frame of speech data into a byte-aligned, IF2-compatible packed data stream.

Syntax
    #include "interf_enc.h"
    int E_IF_encode (Word16 mode, Word16 *speech, Uword8 *serial);

Arguments
    mode   : Encoding mode at one of the 8 AMR-NB bit rates (0-7)
    speech : Input buffer containing one frame of speech samples
    serial : Output buffer containing compressed data

Returned value
    Number of bytes written to the output buffer

E_IF_exit

Frees encoder state memory.

Syntax
    #include "interf_enc.h"
    void E_IF_exit ();

Arguments
    none

Returned value
    none
D_IF_init

Allocates and initializes decoder state memory.

Syntax
    #include "interf_dec.h"
    void * D_IF_init (void);

Arguments
    none

Returned value
    void * : Pointer to state memory used by the decoder

D_IF_decode

Decodes one compressed speech frame.

Syntax
    #include "interf_dec.h"
    void D_IF_decode (Uword8 *bits, Word16 *synth);

Arguments
    bits  : Input buffer containing compressed data from the encoder
    synth : Output buffer containing one frame of decoded speech samples

Returned value
    none

D_IF_exit

Frees decoder state memory.

Syntax
    #include "interf_dec.h"
    void D_IF_exit ();

Arguments
    none

Returned value
    none
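The call order implied by the function descriptions (initialize, process one 20 ms frame at a time, then free the state) can be sketched as below. This is an illustration under several assumptions, not the code of encoder.c or decoder.c. The functions defined here are hypothetical stand-in stubs that only mimic the listed signatures so that the fragment builds without AMR-NB.lib; they do not compress anything. The Word16 and Uword8 typedefs stand in for typedef.h, the int type of the dtx argument is assumed, and roundtrip, L_FRAME and MAX_IF2_BYTES are names chosen for this example. The signatures listed above do not show how the state pointer returned by the init functions reaches the encode and decode calls, so the sketch simply follows them as written.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* ASSUMPTION: stand-ins for the types declared in typedef.h. */
typedef int16_t Word16;
typedef uint8_t Uword8;

#define L_FRAME       160  /* samples per 20 ms frame at 8 kHz */
#define MAX_IF2_BYTES 31   /* largest IF2 frame (12.2 kbit/s mode) */

/* HYPOTHETICAL STUBS: placeholders for the library functions. With the
   real package, remove them, include interf_enc.h / interf_dec.h and
   link with the AMR-NB static library instead. */
static int stub_state;
static int enc_calls, dec_calls;

void *E_IF_init(int dtx) { (void)dtx; enc_calls = 0; return &stub_state; }
int E_IF_encode(Word16 mode, Word16 *speech, Uword8 *serial)
{
    (void)mode; (void)speech;
    memset(serial, 0, MAX_IF2_BYTES);   /* fake payload */
    enc_calls++;
    return MAX_IF2_BYTES;               /* fake byte count */
}
void E_IF_exit(void) {}
void *D_IF_init(void) { dec_calls = 0; return &stub_state; }
void D_IF_decode(Uword8 *bits, Word16 *synth)
{
    (void)bits;
    memset(synth, 0, L_FRAME * sizeof(Word16));  /* fake silence */
    dec_calls++;
}
void D_IF_exit(void) {}

/* Encodes and immediately decodes n_frames frames of PCM.
   Returns the total number of encoded bytes, or -1 if init fails. */
long roundtrip(const Word16 *pcm_in, Word16 *pcm_out, int n_frames,
               Word16 mode, int dtx)
{
    Word16 speech[L_FRAME];
    Uword8 serial[MAX_IF2_BYTES];
    long total = 0;

    assert(pcm_in != NULL && pcm_out != NULL && n_frames >= 0);

    if (E_IF_init(dtx) == NULL)
        return -1;
    if (D_IF_init() == NULL) {
        E_IF_exit();
        return -1;
    }
    for (int i = 0; i < n_frames; i++) {
        /* copy one frame, since E_IF_encode takes a non-const buffer */
        memcpy(speech, pcm_in + (size_t)i * L_FRAME, sizeof speech);
        total += E_IF_encode(mode, speech, serial);
        D_IF_decode(serial, pcm_out + (size_t)i * L_FRAME);
    }
    E_IF_exit();
    D_IF_exit();
    return total;
}
```

The output buffer for E_IF_encode is sized for the largest IF2 frame so that any of the 8 modes fits. In a real program the encoded bytes would normally be written to a bitstream file between the encode and decode steps rather than decoded in the same loop.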
LIST OF REFERENCED 3GPP AMR SPECIFICATIONS

[1] 3GPP TS 26.071: AMR Speech Codec; General Description.
[2] 3GPP TS 26.101: AMR Narrowband Speech Codec; Frame Structure.