AMR format
AMR audio codec overview and file format analysis
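AMR stands for Adaptive Multi-Rate. It is used mainly for audio on mobile devices. Its compression ratio is relatively high, so quality is lower than with other compressed formats, but because it is used mostly for speech and telephony, the results are still good.
I. Classification
1. AMR: also called AMR-NB (narrowband), in contrast to the WB variant below. Speech bandwidth: 300-3400 Hz, 8 kHz sampling.
2. AMR-WB: AMR WideBand. Speech bandwidth: 50-7000 Hz, 16 kHz sampling.
"AMR-WB" stands for "Adaptive Multi-Rate Wideband". It samples at 16 kHz and is a wideband speech coding standard adopted by both ITU-T and 3GPP; it is also known as the G.722.2 standard. AMR-WB covers a speech bandwidth of 50 to 7000 Hz, so listeners subjectively perceive the speech as more natural, more comfortable, and easier to understand than before.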
.
.
.
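V. AMR file storage format (RFC 3267):
AMR IF1 and IF2 define the AMR frame formats used for radio transmission. RFC 3267 defines the file format used to store AMR data in a file.
The AMR file format is shown in Figure 1.
A file consists of a file header followed by the AMR data, one frame after another.
1. File header format:
AMR files support mono and multi-channel audio. The single-channel and multi-channel file headers differ.
Mono: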
AMR-NB file header: "#!AMR\n" (or 0x2321414d520a in hexadecimal) (the part inside the quotation marks)
AMR-WB file header: "#!AMR-WB\n" (or 0x2321414d522d57420a in hexadecimal) (the part inside the quotation marks)
Multi-channel:
The multi-channel file header contains a magic number and a 32-bit channel description field.
AMR-NB magic number: "#!AMR_MC1.0\n"
(or 0x2321414d525F4D43312E300a in hexadecimal).
AMR-WB magic number: "#!AMR-WB_MC1.0\n"
(or 0x2321414d522d57425F4D43312E300a in hexadecimal).
The 32-bit channel description field is defined as follows.
The reserved bits must be 0. CHAN indicates how many channels the file contains.
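The header check described above can be sketched in a few lines of Python. This is a minimal illustration, not a full parser: the names MAGICS and parse_amr_file_header are made up for this example, the magic strings are taken from the hexadecimal values quoted above, and reading CHAN from the low 4 bits of the big-endian 32-bit field is an assumption about the channel description layout.

```python
# Magic strings, written out from the hexadecimal values quoted above.
# The trailing newline (0x0a) makes each magic unambiguous.
MAGICS = [
    (b"#!AMR\n", "AMR-NB", False),
    (b"#!AMR-WB\n", "AMR-WB", False),
    (b"#!AMR_MC1.0\n", "AMR-NB", True),
    (b"#!AMR-WB_MC1.0\n", "AMR-WB", True),
]


def parse_amr_file_header(data: bytes):
    """Return (codec, channels, header_length), or None if no magic matches."""
    for magic, codec, multichannel in MAGICS:
        if not data.startswith(magic):
            continue
        if not multichannel:
            return codec, 1, len(magic)
        desc = data[len(magic):len(magic) + 4]
        if len(desc) < 4:
            return None  # file ends inside the channel description field
        # Assumption: CHAN occupies the low 4 bits of the big-endian field.
        channels = int.from_bytes(desc, "big") & 0x0F
        return codec, channels, len(magic) + 4
    return None
```

The returned header_length is the offset of the first frame, so a caller can slice the file at that point and start reading frames.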
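Frame header format:
The frame header occupies 1 byte (8 bits), as shown in Figure 2.
P: padding bits, set to 0.
FT: the coding mode, i.e. one of the 16 coding modes described above.
Q: the frame quality indicator; 0 means the frame is damaged.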
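To make the P, FT and Q fields concrete, here is a small sketch that splits a header byte. It assumes the bit order P FT FT FT FT Q P P from most to least significant bit, which matches the 0x3C header byte used for 12.2 kbit/s frames later in this post; the function name parse_frame_header is hypothetical.

```python
def parse_frame_header(byte: int):
    """Split a storage-format frame header byte into (ft, q, padding_ok)."""
    ft = (byte >> 3) & 0x0F          # 4-bit frame type (coding mode)
    q = (byte >> 2) & 0x01           # quality bit: 0 means a damaged frame
    padding_ok = (byte & 0x83) == 0  # the three P bits should all be 0
    return ft, q, padding_ok
```

For example, parse_frame_header(0x3C) yields frame type 7 with the quality bit set and clean padding.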
Figure 3 shows the format of one AMR-NB 5.9 kbit/s frame.
A 5.9 kbit/s frame carries 118 bits of data. Since 15 * 8 = 120 = 118 + 2, the frame ends with 2 padding bits.
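The same arithmetic works for any mode. The sketch below derives the bit count from the bit rate and the 20 ms frame duration mentioned later in this post, then rounds up to whole bytes; the function names are made up for illustration.

```python
def bits_per_frame(bit_rate: int, frame_ms: int = 20) -> int:
    """Speech bits in one frame, given the bit rate in bit/s."""
    return bit_rate * frame_ms // 1000


def padded_size(speech_bits: int):
    """Return (data_bytes, padding_bits) for a frame body of speech_bits bits."""
    data_bytes = (speech_bits + 7) // 8  # round up to whole bytes
    return data_bytes, data_bytes * 8 - speech_bits
```

For the 5.9 kbit/s mode, bits_per_frame(5900) gives 118 and padded_size(118) gives 15 bytes with 2 padding bits, as in the text above.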
References:
RFC 3267, Real-Time Transport Protocol (RTP) Payload Format and File Storage Format for the Adaptive Multi-Rate (AMR) and Adaptive Multi-Rate Wideband (AMR-WB) Audio Codecs
http://tools.ietf.org/html/rfc3267
3GPP TS 26.201 V6.0.0
3GPP TS 26.101 V6.0.0
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
AMR format
Each frame can be encoded at one of 8 bit rates (AMR modes 0-7), each giving a different level of compression. The AMR modes and their bit rates are:
MODE   NAME       BIT RATE
0      AMR 4.75   4.75 kbit/s
1      AMR 5.15   5.15 kbit/s
2      AMR 5.9    5.9 kbit/s
3      AMR 6.7    6.7 kbit/s
4      AMR 7.4    7.4 kbit/s
5      AMR 7.95   7.95 kbit/s
6      AMR 10.2   10.2 kbit/s
7      AMR 12.2   12.2 kbit/s
Each frame consists of a 1-byte header followed by the audio data. The entire frame, header included, is fed into the AMR decoder. The frame size can be deduced from the frame header.
Bits 2 through 5 of the header (counting the most significant bit as bit 1) hold the 4-bit frame type (FT). Values 0-7 are the AMR modes listed above, so the top bit of the field is 0 for every speech frame. The CMR (Codec Mode Request) used when AMR is carried in RTP payloads follows the same mode numbering. The bit after FT is the quality bit Q, and the remaining bits are padding. Viewed from most significant bit to least significant bit, the layout is PFFFFQPP, where the Ps are padding (set to 0), the Fs are the frame type, and Q is the quality bit.
The frame sizes of the AMR modes in bytes (including the header byte) are shown below:
Index  Mode       Frame size (bytes)
0      AMR 4.75   13
1      AMR 5.15   14
2      AMR 5.9    16
3      AMR 6.7    18
4      AMR 7.4    20
5      AMR 7.95   21
6      AMR 10.2   27
7      AMR 12.2   32
The above frame specifications and header information apply only to AMR-NB; the frame format may differ for AMR-WB.
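Because the frame size follows from the header byte, a stored AMR-NB stream can be walked frame by frame. The sketch below is one way to do that, assuming a single-channel stream with the file magic already stripped and only the eight speech modes present; FRAME_SIZES and split_frames are names invented for this example, and the mode is read from the 4-bit field in bits 2 through 5 of the header.

```python
# Frame sizes in bytes, header included, indexed by AMR-NB mode (table above).
FRAME_SIZES = {0: 13, 1: 14, 2: 16, 3: 18, 4: 20, 5: 21, 6: 27, 7: 32}


def split_frames(body: bytes):
    """Split the data after the file magic into (mode, frame_bytes) pairs."""
    frames = []
    pos = 0
    while pos < len(body):
        mode = (body[pos] >> 3) & 0x0F  # 4-bit field in header bits 2..5
        if mode not in FRAME_SIZES:
            raise ValueError(f"unsupported frame type {mode} at offset {pos}")
        size = FRAME_SIZES[mode]
        if pos + size > len(body):
            raise ValueError(f"truncated frame at offset {pos}")
        frames.append((mode, body[pos:pos + size]))
        pos += size
    return frames
```

Each returned frame still starts with its header byte, so it can be handed to a decoder unchanged.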
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
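AMR over RTP:
RTP packet size 1389
RTP header size 12
--------
Payload size 1377
-----------
(1377 - 1) / frame size = number of ToC entries. The payload starts with one payload header byte, and every frame contributes one ToC entry byte plus its speech data, which together make up one frame size. With 32-byte frames (AMR 12.2) this gives 1376 / 32 = 43 frames.
Alternatively, just count the ToC entry bytes to find how many frames the RTP packet carries.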
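The frame count can be worked out as below. This is only a sketch under stated assumptions: a 12-byte RTP header with no CSRC list or extension, a 1-byte payload header, and every frame in the packet using the same mode, so that each frame accounts for exactly one frame size (1 ToC byte plus the speech data). The function name frames_in_packet is hypothetical.

```python
def frames_in_packet(packet_size: int, frame_size: int,
                     rtp_header_size: int = 12) -> int:
    """Number of equally sized AMR frames carried by one RTP packet."""
    payload = packet_size - rtp_header_size
    # Subtract the 1-byte payload header; the rest is N * frame_size bytes.
    count, remainder = divmod(payload - 1, frame_size)
    if remainder:
        raise ValueError("payload is not a whole number of frames")
    return count
```

For the packet above, frames_in_packet(1389, 32) returns 43.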
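How can we extract the AMR mode information from the RTP packet?
The CMR (Codec Mode Request) at the start of the payload and the FT field of each ToC entry both use the mode index below. The CMR names the mode the sender would like to receive, while the FT field gives the mode of the frame actually carried.
Each mode has a fixed bit rate, and for each bit rate the frame size is fixed:
Index Mode Frame size (bytes)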
0 AMR 4.75 13
1 AMR 5.15 14
2 AMR 5.9 16
3 AMR 6.7 18
4 AMR 7.4 20
5 AMR 7.95 21
6 AMR 10.2 27
7 AMR 12.2 32
Every AMR frame holds 20 ms of audio data.
So if 40 frames are available, the play time of those frames is
40 * 20 = 800 milliseconds
(1000 milliseconds = 1 second).
In the source filter, we need to set the start and stop timestamps based on the number of frames.
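The timestamp calculation is simple enough to show directly. The sketch below works in milliseconds; the function name playback_window and the choice of unit are assumptions made for illustration, since the source filter's actual time base is not given here.

```python
FRAME_MS = 20  # every AMR frame holds 20 ms of audio


def playback_window(start_ms: int, frame_count: int):
    """Return (start, stop) timestamps in milliseconds for a run of frames."""
    return start_ms, start_ms + frame_count * FRAME_MS
```

A packet with 40 frames that starts at 0 ms therefore spans 0 to 800 ms, and the next packet would start at 800 ms.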
AMR over RTP is as follows:
+----------------+-------------------+----------------
| payload header | table of contents | speech data ...
+----------------+-------------------+----------------
The payload header carries the 4-bit CMR (Codec Mode Request). In the octet-aligned layout used in the examples here, it is followed by 4 reserved zero bits, so the payload header fills one byte.
The CMR is a request to the peer; the size of each frame actually carried is identified from its ToC entry.
ToC (Table of contents):
Every frame has an entry in the ToC.
If the RTP packet carries 43 audio frames, there must be 43 ToC bytes.
After the F0 byte, strip the run of ToC bytes; each of these bytes indicates the bit rate of one audio frame.
But how can we know the frame type?
1011 1100 (BC) 12.2 kbps
1011 0100 (B4) 10.2 kbps
1010 1100 (AC) 7.95 kbps
RTP packet:
---------------
If a terminal has no preference in which mode to receive, it SHOULD set CMR=15 in all its outbound payloads.
Each AMR RTP payload here begins with F0 (CMR = 15), followed by a run of ToC entries such as 0xAC, 0xBC, or 0xB4.
If the RTP packet has N frames, it has N ToC entries.
The ToC entry determines the size of the audio frame.
A ToC entry has the following form:
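---------------------------------------------
F (1 bit) | FT (4 bits) | Q (1 bit) | P (2 bits)
---------------------------------------------
1 0111 1 00 - 12.2 kbps (0xBC)
1 0110 1 00 - 10.2 kbps (0xB4)
1 0101 1 00 - 7.95 kbps (0xAC)
------------------------------------------------
F = 1 means another ToC entry follows, FT is the frame type, Q is the quality bit, and P is padding.
F0 BC BC BC BC BC BC BC BC BC 3C
Here 3C is the last ToC entry: its F bit is 0, which marks the end of the table of contents.
With the F bit cleared, a ToC entry has the same layout as the storage-format frame header, so 3C is also the header byte that every 12.2 kbps audio frame must begin with when it is handed to the decoder.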
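Reading the ToC can be sketched as follows, assuming the octet-aligned layout shown above: one payload header byte with the CMR in its top 4 bits, then one byte per ToC entry, the last of which has F = 0. The function name parse_toc is made up for this example.

```python
def parse_toc(payload: bytes):
    """Return (cmr, [(ft, q), ...], offset of the speech data)."""
    cmr = payload[0] >> 4  # top 4 bits of the payload header
    entries = []
    pos = 1
    while True:
        if pos >= len(payload):
            raise ValueError("payload ends inside the table of contents")
        toc = payload[pos]
        pos += 1
        entries.append(((toc >> 3) & 0x0F, (toc >> 2) & 0x01))
        if not toc & 0x80:  # F = 0 marks the last entry
            break
    return cmr, entries, pos
```

Applied to the bytes F0 BC BC BC BC BC BC BC BC BC 3C, this reports CMR 15 and ten entries, all with frame type 7 (12.2 kbps), with the speech data starting at offset 11.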
From the bit rate we can determine the frame size, using the same frame-size table as above (13 bytes for index 0, AMR 4.75, up to 32 bytes for index 7, AMR 12.2).
The frame size includes the frame header.
But the RTP packet lays the frames out differently:
The speech data of all frames follows the ToC back to back. Only the first frame is directly preceded by a byte that can serve as its 1-byte audio frame header (the last ToC entry); the remaining frames have no header, so we need to add it manually.
AC 12 20 39 40 29 20 39 33
Here the first byte is the frame header, and from the header onwards we can count off the bytes of each frame. Suppose, for illustration, that the header indicates 4.75 kbps, with a frame size of 13: count 13 bytes starting at the frame header, insert the first frame's header again, count 13 bytes from that header, and then insert the header for the third frame.
In other words, insert the first frame's header after 13 bytes, then after 26 bytes, then after 39 bytes, and so on. This assumes every frame in the packet uses the same mode.
If the frame header is not inserted at the start of every frame, the AMR decoder will still decode the data, but you will not get any audible audio.
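The header insertion described above can be sketched as follows. Rather than copying the first frame's header, this variant derives each frame's header from its own ToC entry by clearing the F bit, which gives the same result when all frames share one mode. It assumes an octet-aligned, single-channel payload containing only the eight speech modes; SPEECH_BYTES and payload_to_storage_frames are names invented for this example.

```python
# Speech data bytes per frame (frame size minus the 1-byte header).
SPEECH_BYTES = {0: 12, 1: 13, 2: 15, 3: 17, 4: 19, 5: 20, 6: 26, 7: 31}


def payload_to_storage_frames(payload: bytes) -> bytes:
    """Rebuild decoder-ready frames (header + data) from an RTP payload."""
    pos = 1  # skip the payload header byte (CMR)
    tocs = []
    while True:
        if pos >= len(payload):
            raise ValueError("payload ends inside the table of contents")
        toc = payload[pos]
        pos += 1
        tocs.append(toc)
        if not toc & 0x80:  # F = 0 marks the last ToC entry
            break
    out = bytearray()
    for toc in tocs:
        ft = (toc >> 3) & 0x0F
        if ft not in SPEECH_BYTES:
            raise ValueError(f"unsupported frame type {ft}")
        size = SPEECH_BYTES[ft]
        if pos + size > len(payload):
            raise ValueError("payload ends inside a frame")
        out.append(toc & 0x7F)  # ToC entry with F cleared = frame header
        out += payload[pos:pos + size]
        pos += size
    return bytes(out)
```

Prefixing the result with the "#!AMR\n" magic would give a byte stream in the storage format described at the top of this post.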
POSTED BY SUNDAR
AT THURSDAY, APRIL 03, 2008