
JOURNAL OF SCIENCE AND TECHNOLOGY, THE UNIVERSITY OF DANANG - NO. 4(39).2010

OVERVIEW OF HIGH QUALITY AUDIO COMPRESSION TECHNOLOGIES MP3 AND AAC FOR TODAY'S DIGITAL MEDIA
Hoàng Lê Uyên Thục, Phạm Văn Tuấn
University of Technology, The University of Danang
ABSTRACT
Digital audio compression technologies have spread rapidly in the past few years, especially MP3 (Moving Picture Experts Group 1 - Layer 3) and AAC (Moving Picture Experts Group 2 - Advanced Audio Coding). MP3 and AAC are two high-quality audio compression standards, of which AAC performs better than MP3. The reconstructed audio signal sounds almost the same as the original signal before compression. The compression ratio can be chosen according to the required sound quality; near-CD quality can be reached at a compression ratio of about 11:1. This article presents an overview of how MP3 and AAC encoding and decoding exploit the perceptual characteristics of the human ear. It also compares MP3 and AAC in terms of audio quality, bit rate and compression ratio, using subjective evaluation based on listening tests. The experimental results agree with previously published studies.

1. Introduction
In 1982, the electronics companies Philips and Sony achieved resounding success by bringing to market a new medium for storing audio in digital form: the compact disc (CD). Recording CD-quality digital audio requires a bit rate of about 1.411 Mbps, that is: 44100 (samples/second) x 16 (bits/sample) x 2 channels = 1,411,200 bps, or roughly 635 MB for one hour (3600 seconds) of audio. The rapid growth of portable music players, of services for sharing audio files between computers over the internet, and of digital television services (with accompanying audio) has had a profound effect on audio transmission and storage applications. This has driven the development of new audio compression standards, the most popular of which are MP3 and AAC. MP3 is used to transmit audio over the internet and to store audio in portable music players. AAC is the successor to MP3 and is used in iTunes, Apple's online music store.

2. Perceptual audio coding
Perceptual encoding is a technique that exploits the perceptual characteristics of the human ear to achieve a high compression ratio with good quality. As shown in [1], the sensitivity of the ear differs across frequency components, so the audio signal can be quantized with a different number of bits in each subband, which reduces the average number of bits per sample (Figure 1).
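As a rough illustration of why per-subband quantization lowers the average bit count, the sketch below computes the average number of bits per sample for a made-up allocation. The number of subbands (32 equal-width bands) and the bit depths are illustrative assumptions, not values taken from [1].

```python
def average_bits(allocation):
    """Average bits per sample for a list of (samples_in_band, bits_per_sample)."""
    total_samples = sum(n for n, _ in allocation)
    if total_samples == 0:
        raise ValueError("allocation contains no samples")
    total_bits = sum(n * b for n, b in allocation)
    return total_bits / total_samples


# Hypothetical allocation: 32 equal-width subbands in four groups of 8.
# Bands where the ear is more sensitive get more bits; the last group gets none.
allocation = [(8, 10), (8, 6), (8, 4), (8, 0)]

avg = average_bits(allocation)
print("average bits per sample:", avg)
print("saving relative to 16-bit PCM:", 16 / avg)
```

In a real encoder the allocation changes from block to block according to the masking threshold, so the average varies over time.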

When two sounds of different loudness and different frequency occur at the same time, the louder sound can mask the weaker one so that the ear does not hear it. This effect is called frequency masking. Similarly, a weaker sound emitted just before or just after a louder sound is also masked. This effect is called temporal masking. Figure 2 illustrates the combination of the two effects.

Figure 1. Division of the audible frequency range into subbands and quantization of the samples in each band with a different number of bits [1]

Figure 2. Combination of frequency masking and temporal masking [1]

2.1. The MP3 audio coding standard
MPEG is a family of high-quality perceptual audio coding standards. MPEG-1 audio operates in three different modes called layers, with complexity and efficiency increasing from Layer 1 to Layer 3 [1]. MPEG-1 Layer 3 (also known as MP3) is the most complex of the three and delivers near-CD audio quality at low bit rates. MP3 supports sampling frequencies of 32 kHz, 44.1 kHz and 48 kHz; bit rates from 32 to 320 kbps; and several coding modes: mono, dual mono, stereo and joint stereo. Figure 3 shows the block diagram of a typical MP3 encoder.
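The frequency-masking effect described in Section 2 can be sketched with a toy model. The Bark-scale conversion below is the widely used Zwicker-Terhardt approximation; the masking offset and the two slopes are illustrative assumptions chosen only to show the shape of the decision, and they are not the psychoacoustic model of the MP3 or AAC standards. Temporal masking is not modelled.

```python
import math


def hz_to_bark(f_hz):
    """Zwicker-Terhardt approximation of the critical-band (Bark) scale."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def masking_threshold_db(masker_hz, masker_db, probe_hz,
                         offset_db=10.0, lower_slope=27.0, upper_slope=12.0):
    """Level below which a probe tone is assumed inaudible next to the masker.

    offset_db, lower_slope and upper_slope (dB per Bark) are hypothetical
    parameters of a triangular spreading function.
    """
    dz = hz_to_bark(probe_hz) - hz_to_bark(masker_hz)
    slope = upper_slope if dz >= 0 else lower_slope
    return masker_db - offset_db - slope * abs(dz)


def is_masked(masker_hz, masker_db, probe_hz, probe_db):
    """True if the weaker probe tone falls under the masker's threshold."""
    return probe_db < masking_threshold_db(masker_hz, masker_db, probe_hz)


# A loud 1 kHz tone hides a quiet neighbour at 1.1 kHz but not one at 5 kHz.
print(is_masked(1000, 80, 1100, 40))
print(is_masked(1000, 80, 5000, 40))
```

Components that the model marks as masked need few or no bits, which is where a perceptual coder gains most of its compression.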

Filterbank: splits the input signal into 32 subbands. The output of each subband filter feeds a Modified Discrete Cosine Transform (MDCT), which further divides the filterbank outputs into 576 subbands (frequency lines) to obtain better resolution in the frequency domain. The subband split exploits the fact that the sensitivity of the ear varies across frequency components.
Psychoacoustic model: this stage determines the quality of the MP3 signal. The MP3 encoder maps the signal from the time domain to the frequency domain with a 1024-point Fast Fourier Transform (FFT), whose finer frequency resolution allows the masking threshold to be estimated more accurately.
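To make the MDCT step more concrete, here is a minimal sketch of a direct-form MDCT with 50% overlapping blocks. It is not the hybrid filterbank of MP3 (32 subbands followed by an MDCT per subband); the sine window, the tiny block size and all function names are assumptions made for illustration. The sketch shows the key property that windowed overlap-add of the inverse transform reconstructs the input even though each block of 2N samples yields only N coefficients.

```python
import math


def sine_window(n_half):
    """Sine window of length 2N; it satisfies w[n]^2 + w[n+N]^2 = 1."""
    length = 2 * n_half
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(block):
    """Direct O(N^2) MDCT: 2N input samples -> N coefficients."""
    n_half = len(block) // 2
    n0 = 0.5 + n_half / 2
    return [sum(x * math.cos(math.pi / n_half * (n + n0) * (k + 0.5))
                for n, x in enumerate(block))
            for k in range(n_half)]


def imdct(coeffs):
    """Inverse MDCT: N coefficients -> 2N time-aliased samples.

    The 2/N scaling is chosen so that windowed overlap-add restores the input.
    """
    n_half = len(coeffs)
    n0 = 0.5 + n_half / 2
    return [2.0 / n_half * sum(c * math.cos(math.pi / n_half * (n + n0) * (k + 0.5))
                               for k, c in enumerate(coeffs))
            for n in range(2 * n_half)]


def analysis_synthesis(signal, n_half):
    """Window, MDCT, IMDCT, window again, then overlap-add with hop size N."""
    w = sine_window(n_half)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n_half + 1, n_half):
        block = [signal[start + n] * w[n] for n in range(2 * n_half)]
        y = imdct(mdct(block))
        for n in range(2 * n_half):
            out[start + n] += y[n] * w[n]
    return out


signal = [math.sin(0.3 * i) + 0.1 * i for i in range(40)]
rebuilt = analysis_synthesis(signal, 8)
# Only samples covered by two overlapping blocks are fully reconstructed.
max_err = max(abs(rebuilt[i] - signal[i]) for i in range(8, 32))
print("max reconstruction error:", max_err)
```

Production encoders use fast FFT-based algorithms instead of the direct sums shown here, and quantize the coefficients between the forward and inverse transforms.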

Figure 3. Block diagram of an audio encoder according to the MP3 standard [2]

Quantization and coding: the spectral components are quantized and coded under the requirement that the quantization noise stay below the masking threshold. The quantized values are Huffman coded, with different code tables for different frequency ranges so that the code adapts better to the signal. Because Huffman codes have variable-length codewords and the noise must be kept below the masking threshold, the gain and the scale factors have to be computed before quantization. To find the optimal gain and scale factors for a given block, MP3 uses two nested loops.
Inner loop, or rate control loop: adjusts the gain to increase the quantization step size step by step, which reduces the number of quantization levels, until the number of bits required by Huffman coding, and hence the bit rate of the MP3 signal, is small enough.
Outer loop, or distortion control loop: adjusts the scale factors to reduce the quantization noise step by step. This increases the number of quantization levels and therefore the bit rate, so the inner loop has to be run again. If the bit-rate and audio-quality requirements cannot be met at the same time, the two loops do not converge. To avoid this, the coding parameters must be adjusted when the encoder operates at different bit rates.
Bitstream formatting: the MP3 bitstream is organized into frames. Each frame carries the coded spectral coefficients and starts with a header containing the sync word, bit rate, sampling frequency, layer and coding mode. Because this information is repeated in every frame, decoding can start at any point in the stream.
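The two nested quantization loops described in this section can be sketched as follows. This is a toy model under stated assumptions, not the procedure of the standard: it uses a uniform quantizer, one coefficient per scale-factor band, and a made-up bit-cost function (`coded_bits`) in place of real Huffman tables. All names and constants are hypothetical.

```python
def coded_bits(q):
    """Hypothetical stand-in for the Huffman coder: 1 bit for a zero,
    2 * bit_length + 1 bits for a nonzero value."""
    return sum(1 if v == 0 else 2 * abs(v).bit_length() + 1 for v in q)


def quantize(coeffs, scale, step):
    """Uniform quantizer applied to coefficients amplified by their scale factors."""
    return [round(c * s / step) for c, s in zip(coeffs, scale)]


def rate_loop(coeffs, scale, budget_bits):
    """Inner loop: grow the step size until the coded block fits the bit budget."""
    step = 1.0
    while True:
        q = quantize(coeffs, scale, step)
        if coded_bits(q) <= budget_bits:
            return q, step
        step *= 2 ** 0.25


def encode_block(coeffs, mask, budget_bits, max_passes=32):
    """Outer loop: amplify bands whose noise exceeds the masking threshold.

    Returns (quantized values, step, scale factors, converged flag).
    """
    if budget_bits < len(coeffs):
        raise ValueError("budget cannot hold even an all-zero block")
    scale = [1.0] * len(coeffs)
    for _ in range(max_passes):
        used_scale = list(scale)
        q, step = rate_loop(coeffs, used_scale, budget_bits)
        noise = [(c - v * step / s) ** 2
                 for c, v, s in zip(coeffs, q, used_scale)]
        loud = [i for i, (n, m) in enumerate(zip(noise, mask)) if n > m]
        if not loud:
            return q, step, used_scale, True
        if len(loud) == len(coeffs):
            break  # every band is too noisy: amplifying all of them cannot help
        for i in loud:
            scale[i] *= 2 ** 0.5
    return q, step, used_scale, False


coeffs = [100.0, 50.3, 10.2, 3.7]
q, step, scale, ok = encode_block(coeffs, [1.0, 1.0, 0.001, 0.001], 1000)
print(q, step, scale, ok)
```

The `converged` flag mirrors the remark above: when the bit budget and the masking threshold cannot both be met, the loops stop without a solution and the encoder has to relax one of the two requirements.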

2.2. The AAC audio coding standard
AAC has an architecture similar to MP3 but differs in that it follows a modular approach (Figure 4) and adds many new coding tools that improve audio quality at low bit rates.

Figure 4. Block diagram of an audio encoder according to the MPEG-2 AAC standard [2]

The main tools are:
- Filterbank: AAC replaces the MP3 filterbank with an MDCT whose long block has 1024 frequency lines (instead of 576 in MP3). This gives a higher frequency resolution than MP3.

- TNS (Temporal Noise Shaping): a new technique that is very successful at improving speech quality at low bit rates. TNS shapes the noise in the time domain by means of open-loop prediction in the frequency domain [1].
- Prediction: a prediction block can be used to increase the compression ratio by directing the quantizer to concentrate on the signal samples that matter most [1].
- Joint stereo coding: M/S (mid/side) coding and coupling are more flexible than in MP3, which allows the bit rate to be reduced.
- Huffman coding: variable-length codewords further reduce the redundancy in the scale factors and in the values of the quantized spectral lines.
- Bitstream multiplexer: as in MP3, the AAC bitstream is formatted into frames. An AAC frame also carries a sync word and coding parameters, but these are not tied together in a fixed layout; the layout depends on the application. For example, ADIF (Audio Data Interchange Format) places all decoder control information in a single header ahead of the audio stream, which makes file exchange easier but does not allow decoding to start at an arbitrary point. ADTS (Audio Data Transport Stream) uses a header format similar to MP3 and allows decoding to start at any point.

3. Comparison of MP3 and AAC
3.1. Coding quality
There are essentially three ways to evaluate the quality of coded audio: listening tests (subjective evaluation), objective evaluation and perceptual measurement. To date, listening tests remain the simplest and most effective way to assess the quality of different audio coding algorithms. ITU-R (International Telecommunications Union, Radiocommunications sector), together with broadcasters and the MPEG audio group, has proposed a detailed set of rules for evaluating quality by listening.
Objective evaluation is based on the signal-to-noise ratio (SNR). However, this approach runs against the purpose of perceptual coding: a perceptual coder improves perceived quality by deliberately placing noise in the time and frequency regions where the ear cannot perceive it, which can result in a low SNR even when the audio sounds good.
ITU-R has standardized a quality assessment method called perceptual measurement, which uses a perceptual model of the ear to evaluate the quality of compressed audio [5].

3.2. Bit rate
MPEG does not operate at a fixed bit rate; the user can choose the bit rate. A lower bit rate gives a higher compression ratio but lower quality. However, there are particular bit rates, called sweet spots, at which the algorithm works best. Above a sweet spot, audio quality increases very slowly with bit rate, whereas below it quality drops very quickly.

3.3. Experimental comparison of MP3 and AAC
Using the Recording function in the Audio Compression module of the program in [1], we recorded 20 music files in *.wav format: 10 classical pieces and 10 Rap pieces. Recording was in stereo at sampling frequencies of 32 kHz and 44.1 kHz. The wav files were then compressed with the Audio codec function, using the MP3 and AAC algorithms in turn. MP3 files were compressed at 32 kbps, 64 kbps and 128 kbps; AAC files were compressed at 64 kbps, 128 kbps and 192 kbps.
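For reference, the SNR-based objective measure mentioned in Section 3.1 can be computed as in the sketch below. It was not part of the experiment reported here; the function name and the test signal are assumptions made for illustration. The example quantizes a sine wave uniformly at two bit depths, which is a case where SNR is meaningful; for a perceptual coder the same number can be misleading, for the reason given in Section 3.1.

```python
import math


def snr_db(reference, decoded):
    """Signal-to-noise ratio in dB between a reference and a decoded signal."""
    signal_energy = sum(x * x for x in reference)
    noise_energy = sum((x - y) ** 2 for x, y in zip(reference, decoded))
    if signal_energy == 0:
        raise ValueError("reference signal has zero energy")
    if noise_energy == 0:
        return math.inf
    return 10.0 * math.log10(signal_energy / noise_energy)


def uniform_quantize(samples, bits):
    """Round samples in [-1, 1] to a symmetric grid with the given bit depth."""
    levels = 2 ** (bits - 1) - 1
    return [round(x * levels) / levels for x in samples]


tone = [math.sin(0.1 * n) for n in range(1000)]
print("8-bit SNR (dB):", snr_db(tone, uniform_quantize(tone, 8)))
print("4-bit SNR (dB):", snr_db(tone, uniform_quantize(tone, 4)))
```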
We then compared quality by listening, checking for noise, distortion, pitch accuracy of the notes, stability and so on, and obtained the following results:

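Table 1. Quality comparison of the MP3 files

| Bit rate | Classical music, sampled at 32 kHz | Classical music, sampled at 44.1 kHz | Rap music, sampled at 32 kHz | Rap music, sampled at 44.1 kHz |
|---|---|---|---|---|
| 32 kbps | Very distorted, very noisy, long notes are interrupted | No real improvement over 32 kHz sampling | Very distorted, very noisy, long spoken passages are interrupted | No real improvement over 32 kHz sampling |
| 64 kbps | Still distorted and noisy, but long notes are interrupted less | Slight distortion and interruption on long notes; better than 32 kHz sampling at 64 kbps | Still distorted and noisy; long spoken passages are interrupted less | Better than Rap sampled at 32 kHz, but distortion, noise and interruptions are still noticeable |
| 128 kbps | Still slightly distorted, but long notes are almost never interrupted | Close to the original; hard to distinguish from the wav file | Still distorted, noisy and interrupted, but acceptable | Close to the original; hard to distinguish from the original file |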

3.4. Remarks
The MP3 results in Table 1 show that recordings sampled at 32 kHz give poor quality. Both types of music reach acceptable quality at a sampling frequency of 44.1 kHz and a bit rate of 64 kbps, but good quality requires a bit rate of 128 kbps. At that bit rate the compression ratio is fairly high: 1.411 (Mbps) : 128 (kbps) ≈ 11 : 1.
For AAC, as shown in Table 2, classical music recorded at 44.1 kHz and compressed at 64 kbps has acceptable quality and sounds better than Rap, because the high notes are clearer and cleaner; in addition, the spoken passages in Rap are distorted more than the music. Both types of music recorded at 44.1 kHz and compressed at 128 kbps and 192 kbps have excellent quality, and it is very hard to tell 128 kbps from 192 kbps.
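The compression-ratio arithmetic in the remarks can be checked with a few lines of code. The sketch assumes uncompressed 16-bit stereo PCM at 44.1 kHz (the CD format) as the reference; the function name is illustrative.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Bit rate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


cd_rate = pcm_bit_rate(44100, 16, 2)      # CD audio, bits per second
ratio_128 = cd_rate / 128_000             # compression ratio at 128 kbps
bytes_per_hour = cd_rate * 3600 // 8      # storage for one hour of CD audio

print("CD bit rate (bps):", cd_rate)
print("compression ratio at 128 kbps:", ratio_128)
print("bytes per hour of CD audio:", bytes_per_hour)
```

The same function gives the reference rate for the 32 kHz recordings if they are also assumed to be 16-bit stereo.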
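Table 2. Quality comparison of the AAC files

| Bit rate | Classical music, sampled at 32 kHz | Classical music, sampled at 44.1 kHz | Rap music, sampled at 32 kHz | Rap music, sampled at 44.1 kHz |
|---|---|---|---|---|
| 64 kbps | Slightly noisy; some high notes sound harsh | Very little noise, slight distortion, sounds fairly good | Slightly noisy; some high notes sound harsh | Slightly noisy; some high notes sound harsh |
| 128 kbps | Close to the original quality; hard to distinguish from the wav file | Close to the original quality; hard to distinguish from the wav file | Close to the original quality; hard to distinguish from the wav file | Close to the original quality; hard to distinguish from the wav file |
| 192 kbps | Close to the original quality; hard to distinguish from 128 kbps | Close to the original quality; hard to distinguish from 128 kbps | Close to the original quality; hard to distinguish from 128 kbps | Close to the original quality; hard to distinguish from 128 kbps |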


The experimental quality assessment of the music files compressed with MP3 and AAC therefore agrees with the results published in [1], [2], [3].

4. Conclusion
Both MP3 and MPEG-2 AAC can compress audio with quality close to that of a CD. Of the two standards, MP3 is less complex than AAC, while AAC delivers better quality than MP3 at the same sampling frequency and compression ratio.
Future work: study and develop new audio compression standards based on MPEG-4, and carry out the full range of quality assessment methods such as single stimulus rating, paired rating with reference, multiple stimulus rating, ITU-R BS.1116-1 and MUSHRA.

REFERENCES
[1] Jenq-Neng Hwang, Multimedia Networking, Cambridge University Press, 2009.
[2] Karl-Heinz Brandenburg, MP3 and AAC explained, AES 17th International Conference on High Quality Audio Coding.
[3] Stephen Bunting, A subjective comparison of MPEG-4 AAC codecs, 4B Technical Project, 2004.
[4] Serkan Kiranyaz, Mathieu Aubazac, Moncef Gabbouj, Unsupervised Segmentation and Classification over MP3 and AAC Audio Bitstreams, WIAMIS 2003.
[5] C. Colomes, C. Schmidmer, and W.C. Treurniet, Perceptual quality assessment for digital audio: PEAQ-the proposed ITU standard for objective measurement of perceived audio quality, AES 17th International Conference.
