You are on page 1of 7
Recognizing Human Action in Time-Sequential Images using Hidden Markov Model Junji YAMATO Jun OHYA™ Kenichiro [SHIT NTT Human Interface Laboratories. Take 1-2956, Yokosuka, Japan Abstract “This pope propote «new human ection recognition method asad on a Hidden Markov Model (HMM). We do not adopt mode Sosd top-down approcch becouse out purpoveitnot tor Coatract » geometric presentation of the human Body but to ‘cogs human action, ene a featre based bottom pap [rocchwith HA that a characterized by earning apa hd time-sale invariably, To aply HMMs to our aim one of time eguential images tranformed tt om image Jo {Srenectorteuence, andthe sequence i coveted iio 9 {el sequen by sector quantisation. In lerning human action ntegrian, the paramters of te HMMs, one per category, are pts sat Best deverbe the training nequences fom he ‘egry. To recoeae on oberoed sequence, the IMM which Stat matcher the sequence chosen. Emperimental resus of real Limesequenil imager of sports scener show recotion Fates Iigher than 80%. We confirm thet the recon rte i im. proved by icreving the marber of people ued f peeve he ‘ening dats The indicates he pour of etelahing 0 per fom independent action recoeser, ‘This method con be any “pled te other fale acute no domain knowledge ir wed 1 Introduction In recent yest, motion related topce have een maler cor cerns in computer viion, Considering moving object in rel ‘Sener, human being are very important rcogiton target ‘Sncebuman ation eopssion algorthn ean greatly cosebate fo the relation of automatic monitoring aytems fr asus Importan applications. This paper introduces » new algorishin {or eecogssing human ation from timecequental mages. The theerved hunt action ean be camifed ar one hutan action ‘Busing approsches tlted to hus action secopition laclude the topdows methods based on geomet. body ‘eeenrutioasy fy 16) and the Botlomp melhods based ot lowclevel image fears 4) ‘Most eye bused on the top-down approach employ 9 ge mesial model of the human bodys buman body pats ae Alesribed as eplindery, super quai}, and 90 ‘ng apaio-terporal anabys(t6),consteint propagation] or IReda snaps}, model parameters Ate determined Gomi ees, That means the posture ofthe human is exracled fom {Uclimage and tepeesentation is obtained. Mode perameters tr ebioed from the sequence of images Reconstructing the aman Sody shape (Le. extrocting model parneters) lech thd eal tepreenttins such ay joint angle parameter if the Taare alr AT Comsinion Sytem Revere boro: (0-146 2855.39 $3.0 01992 EEE ™ econstoction i ucessfl. Howeves, the rconsrution proce. {Royer ae neler robust or eeable for real image. Thea be. ‘aurea images are usualy toons to permit exsy model ft. lings Thus extencingsucetfl high-level representations frm Images ie vey aif. ‘Therefore almont no reeit on haman ston recogeton by the model bused approach has been te- povied, wile putters catcationtechniquer can be seed 10 [ecogsit nseqhence of model parameters extracted fom image ‘Sequences, Improving the sabuxtneas of reconstruction i impor {Satin ths kindof approoch, Beene flare in reconstruction prevents suceaflreogition, However, arth quatetive ‘Bon paredgn[s) polnt ou, i recoortrecion sely necenary for human ction tecopition? Oat purpose to reopaise he ‘tun action om tine-eeguental image, no obtaining geomet ‘epreeatatios of human bodies. Since we oces 08 reogition, ire avid secostucton becasue the representation baled by {rometscrconsroction sot easenia.Tnteed, we tlie low Toveliage fear in Bottom-up manner. Bottom ap appreschen which beaitically tlie low Jere {eater extracted fom rallmage hare been the sbject of vars lout atic 8. Lowlevel age features such asthe areas of frm condate repens have been aed for couatng the au ber of pedentrane i rel acencn. Tn general lowlee fetazer do not provide decipiens ae ch av tho of made based sep {eventation, bat thelr extraction process ze more robust thas ‘model ig procedure ‘We emphasie the svbustaess of the bottom-up approach , ‘but existing sai have only ben able to coust the number ‘of prope, aot recognising thelr ection Another problem with oltomsp approaches ie that to dencrib ation categories © Towle tpreenttin it ore dfs than fn phere ( snodeLbuted)rpseenttion, Thi i Deus featons between ‘ivensions ofthe fentre vector andthe hghlevel description ofcaepory ate not explicit. Furthermore, th dimension ofthe feature recor uualy to large tobe uadersiond intial. ‘There are two problem in applying the bottom-up apprtch How can we enable the system to recognize more completed as ‘ons! How can we deveibe the defsition of etegris witout ‘odel based high-level representation? ‘Thee problems can be eliminated with a lating procedure that wee lowlevel image fester. Out goal to elsif ob- ‘ivedle lee image fentve sequences ino human action ct ‘ovis. Ural thn Kind of tsk normalized usw supervised Tesringprsbem in the pattern damieaton Rel. Unlike sort ‘Shc patie cluifctiontechaiqun, the data to be clas. fed ime seguenil dat. "To sealze thie lenring capability for ie sequential image Auta, we employ the Hiden Markov Model (HMB) [5 which as del wth Simeequental date and can provide timescale Invaoity ae wel ax lating capability for recogution, AL ‘Gough HMMs ave been accent aed in speech ecosition, bite have been applied to only few problems in computer ison fld= pasar shape caeiaton, i] handwritten word ‘cognition and modeling eye movemeat[0) In other words, Me have sever been applied to motion recognition. In ou propoed approch,tme-sequental images expreing uma action ae transformed to an image fentare vetor se quence by extracting feature vector fom sch images Lao furzet implementation, the tesh fetal] sted ate lowe Teel image fetare. Each letare vector ofthe sequence ie signed ta aymbol which correspond tos codeword in the code. Took created by Vector Quastzation(1), The faint veins of 1Re sample date set for ting are vector geanired. Conse ‘seal, the timesegueail images are converted fo sya Sequence. Inthe learning phase, the model parameter of the FHM of each eaegory are optimized soa to bent desrbe the ‘talniag symbol veqsences fom the categories of human acti tobe recopaised. For human setonrecopition he todel which Dest matcher the oberved symbol sequence echoes ie ee ecnned caterer ‘Section? deta the HMAGa aad th recopiion and lesen procedates Section 3 iurtzstes how a set of timeaeqsentia Timges are converted to symbol sequence tat the IMM cas proces. Section {giver experimental resis ung tenia ston scenee, Secon 6 sttunaies ont method 2 Hidden Markov Model 2.1 Outline Mie, which have secntly been applied with pasicala success to speech recngtin, are « Lind of stochastic state leasot models]. EMMa make it posible fo del with Une: sequential dat and can provide me acale nearby i eo titles. Moreover, HMMs are charocteiaed by thi learaing hiliy which ie achieved by presenting timeseqaental data tos HMM snd automatically optininng the model with the data ‘ATHMM conse oft numberof sates each of whic i a signed a probity aftranuitio rom one tate to another tate ‘With ine state transitions occu stochastically. Like Mariow model stater at any time depend only onthe nate a ihe poe: Ceding tine. One aybol eyelets one of the HMM sates ‘cording the probabiter ign othe tater HM cates fe not ceil observable and canbe oboerved ony through = eqaence of obuerved ‘ymbols. Te describe n dete HMM! 5 the flowing notation ae defined Te length af the obuerraton sequence. @ re }aet of states 1 = numberof tenn the model V = fenonyensenaet of pone output aymbals M =numbes af servation symbols A foley = Pays = ie = Mate ma proba, 1 tthe probably of trnsiting Grom tae gt state 4 3B (86) = Paenoc =) Sobol nia pod, (2) isthe probably of output epmbol vy at state uel Bara ye dc ie ach up pe ‘Ede can be avoided, but wnforepotely, he leming proces i oe open {ribs =Pelos = gy) Iii state probability {A.B,7} complete parameter st ofthe mode ‘ain tis model, transitions ate deveribed a flows SS ahe f= dyes T! State 4 the t Oe sate (ane sheen) 08 04,05,..0r : Oberred symbol neguense (length=T) Figuret iuszstr the concept of the 2 with « tension raphe There ae four sates in tne example indicted es cols ‘ach directed ine ina transition frm one state to another, ete the easton probability i indicated by the character slngade feline. Note that there are ale traseiion pate fom state to then setter Thee pat cos provide the HMM wit siecle ina {Ole Beene they allow the HOM to stay in the sme sate Tor any dertion. Bach stoe ofthe HMM stochartillyoxtpts a symbol. [a state qj, aymbal my iv opt wth» probably of). there See M kinds of pmbol, (4) becomes an N 1 M matic. The HMM outpets the aymbol sequence O = O1,02,-—sOr fom tine 1 to We ean obuere the symbol sequencer ont by {he HMM bet we can aot observe the HMM sates, The nl slate ofthe EM is ue etertinedsochariaiy by be ntl Plate probability +. A HMM is charctried by thee malice Sate transit probality matrix A, aymbel output probabiy Toatic Bad atl tae probeblty mutex ‘The peremeter of A,B, and ae determined during the learning process derived in 2.2. As decried in 32 ne MIM. ‘Seated foreach etegry to be recopnned. Recopnsing Sie ‘equental ayobols inequivalent fo determining which EMM pre deed the observed ymbol sequence 2. and 24 expla the 2.2 Recognition ‘Torecopnlae observed nyo sequence, we create one HMA for euch elegoy. For danifer of © caegoie, we choose {he model which best matches the observations from © AMIMG Ais (AigBretsjt= Ise-C. Thlo enna that when a sequence Sl onknown category ie given, we calcslate P(3[0) fr ech MAO, and select ee, where = argemgx(P1010)) 0 Gives the obervtion sequence O = O,,..Op andthe IMM, 2s according to the Bayes rule the problem i hw to eralaste r(O[i), the probability that the sequence was generated by HMO 3 Thi probability i calalted by ain the “onward poeta) "The forward algorithm ia dened a flows: au() = PO, a. ® auf) ie cle the forward vaiale and cap be calculated cusiely a follows ai) = ( 11800) « Then P18) seemax(Px0410)) Wie an calculate the kaliood of ach HMM etng the above sqoation and elect the mov lkely TIM as the secopaitio re rts Since the lliood ie calculated Som the entire length ax described above, tne senle variance, time shits tome lore In vector qeeatinaton bev Ble infuence on the ‘Scuacy of determining the Ekeod. The advantage of HMMs {or tne sequent pattern recog, which i abust to time seal rtance end shi eute fom thie factor 2.3 Learning be tenn so that fori ete {un Trinng an HMM means optimising the model param ‘Reve (dnBy4) to maximise the probabiity ofthe observation Sequence PROD), The Baun-Weich algorithm ie weed for thee ‘mations Defoe: Bi) = PCO rye Orlee= 663) o called the bcknard variable and can alo be solved induc- {hey in sangeet ed fr the forward aie ans). WO = P= gira) ‘a. POR) e Poorman =al0ie0ra) stash Ques eosl3) 10) ~ POW oo ‘Using these equations, HMM parameters A can be improved to A, The re-estimation equations ftom 2 = (4.x) to 3 (B,#) are: ay = Ea a, aw YEE wo! aay = Bogus 2, on 100) 9) Learning converges if X = 2. The Baum. Welch algorithm doesnot alway fad the global maser, but it does And the local masiniom of PXO[S). Our expetince shows that ths sot sgnifiant problem. 3. Applying HMM to time-sequential im- ages ‘Te spply HMM: to simeseguential mages 1 islet, the imagen wart be tassforned inte symbol vex {ences O(Blgure2) in the Tearing and recopition pastes, ‘rom cach fnme Ly of an image sequence, 4 feature vee for fe @ RY, (= Le2jonTyn ib the dimension of the fe tere vector) ia extracted, and fs assigned to a eymbol ‘hore fom the aymbol st V. Is necessary to associate the Symbol st V withthe fntae space Hor thi the fentre Spec i divided into elutes by » pattern clasifeation tech ee, ead the eyols re assigned to the clusters. We ated Sector quastsatiog/s] her, aes cal in EMM For veclor quantisation, codewords g, € RY which represent the centers ofthe elutes inthe fate apace, ate needed CCodeword ye sgned to symbols) Consequently, the se ike code book equals the number of HSM output symbol a ack fate vector fe transformed ita the symbol which ia aeigned to tbe codeword nearest to the vector inthe fare Ghace, This menos fre teasformed into symbol C3 rami, (fs) (lea) tance between s and 9). image Js i tanaormed into the symbol which i assigned to the codeword neatest to fetane vector fj, and O,(= 8) 8 Fede. We ue mesh fostarens) a the fate vectors becase they sere saccefily applied to compler 2D patterse such as mali font characters, Figured expe the elealtion of mesh fee tres, Bach biassed image (My ise) in divided into ‘Mie & Nag pas meshes, The ato of black zal in ach mesh ‘becomes an element of the fate vector, Mui) = wu of Back pine) /(Mia x Nae) (14) “Thus, the feature vector sequence F extracted Som a iege sequence [is trasformed tow symbol sequence O by exrn- Ig the mesh ectarevecos ofeach fame and vector quanling ten. “These transformations have the mei that they ae appliable to varios images without mantal tuning Becoae aller extract ing the hum tvs thie ethod een no patter except mesh {Ste nck can be decided fom thew ti eberved ange sequence eabattted into Bgu()(8), 238 {he recogiton svat in otained as the category C» Which mas: imisee BS) ln leaching, the symbol sequences ate obisined {fom tisiag wits timeequntial image dasa and TDM pee acer ate optiniaed foreach eatgory by Bqe(11}(13. 4 Experimental results and discussions 4.1 Experimental conditions We leved ost agers on el timesequetial imager, Oar experiment ated tenia actions "The categories 10 be ce Stnlued mere ni tonne irae: Yorshand robe backend Mroke\Yorehand volley backhand voley’, mash’, and er “Theee persons performed each ofthe sx tennis actions 10 timer, The peiformances were caplated by a TV exmera( NTSC, 90 feamen/vcond) and igtined into 20> 200 ne 258, roplevel images Figured shows an orginal image sequence of {he tennis action Norhand volley whete every forth fame’ Asplaged ‘A shows in Figure, the cignal loge sequence contains & complicated bocigoundy and the postions of he player mover ‘Iihing eequnce, Thor bumaa aren exracton nad tacking are Seeded We used the image preproceniog operations described teow 1)Peparing backgrownd imege(Figure®(b)) for is rg Ina action tmage(Pigures-()) Bling both images with slow pase ltr 5)Bxtecting human ares (Flguret()) bythe flowing cond- Sflsena) — Maa) < thy 159) ive Ke) = le) Tuma erence nage, Zeorgial age Tydackgrouad image, thresbold ‘he thes binacige tke extracted imager to thatthe white and back piselacorespoaded to Noman areas and bachgyound1e- pectin Figure? shows bina image examples of thee ‘The mesh featur extraction described in Section 3 wat pet: formed onthe Maat images. The sine Mag Ny ofthe mesh featee vector fin Be(04) waa 8 Oct consequent the lmensian naff, was 625. An the potions aad sis of has ‘seas vsied in the binaied images, the mesh center was placed the center of ravity ofeach human ace to ved the iadnce f grombody diplacement nother words, to normale the po- ‘isons. The body ate was lao sormelrd by sealing each fame othe binned images so nt eqalz the mean rail of human For vetorquastization, me selected 12 images foreach cat ‘gory us colenords inthe codebook forthe experiment. Thu, {ie tumber of HMM symbols was T; symbols Oto 11 repress! "ynckband stoke’, 12 to 23 the bachand valley 3 fo 35th werrice’ 36 to 47 the mash’, 48 to 59 the oreband stoke 52 607 the forehand vole ‘As shows ia Figures, symbole were correctly assigned 9 ‘he Bnaced images, where tach auigned symbol ison of he ‘ymbelsreprerentng“orehand volley” nthe HMMs wed in our experiment, the number W of states Q was et 1038. As the durations ofthe action were st com: ant the leaglh ofthe bwerved symbal request vated be- {ween 25 and 70, In traning the HMM, wile the lkebood fonverged after about 10D iteration, we performed 160 iter ‘one fa each HMR 4.2. Experiment 1 42.1 Experiment details We secoguied the actions performed by three people aiag LHMBG which were teined by the date of tbe abject. Taree subject perion Ay B, and C ) performed each of the ix ac Une ten finan, Fie sequences wee ued to tain the HMR while te semaining Sve seqeencs mere ed to test scopiion perormance 422 Rerults ‘The regulon reelts ze shows in Tablet and Table2. ‘Tablet shows the ikeliood of Yorehand voley' wih ix FMS lusned path the data of eubject C. The likened ofthe cote category isthe hight ‘abled shows the recognition rte ofthe thre subject wth 10 diferent combinations of fie training sequences out ofthe 10 vale; that iy foreach person, thee ate 300(= 10.5 % 6) et ata. The average fr the thee subjects wen 96.0%, Recognition perfomance werres atthe numberof tuning tenn decetcs, We tse tasted the recopazer constructed BY Ne ined by three taining patterns inten combinations For ore combination, the recognition rte wa 100%(42/#2) bat (or vme other combinations tlt 76.5% ‘The performance of our recognition stem depends not only ‘on the nomber a teining patter but also Bow wal the tning atlenscepevent the elegy. When the numberof traning Patterns is ral the recognition rate Becomes more stable ‘This in because the HMM parameters citialy depead on the selection of training pattern "To contact a robust recognition system, appropriate tri lng pltern are inspovtant. Ths means trnning patterns should cover the aman et pettern scatter posible I covering the er with one HMM ie ital, we shold divide the eatery into vebcategoie, 3 4.3, Bxperiment 2 43.1 Experiment details, In experiment (2), rining aad tnt subjects were complet it. ferent. The HMMs were tained with the sequen of one or two subject and tested with howe ofother subject, The HMDS tree tained using 10 mized date ete cli om one ot two ‘abject and tered with thove of other subject ese diferest oot only in HMM peoceing but as vector gan ation, The code bok of vector quantisation was conereted sing only taining data. This experimen ie more sppropine for eatimatng the asefuineus ofthe method in rel atuations ‘becaue real word secogition apse sould e peton inde desk Tee i ao any ope mute bed forlearing, This redacedacle experiment sed ited tA date ealected Gom more thas one perton to recogane aster 432° Results In experiment (2), a shown In Table, the cecogition rte of ‘he HMMs were tol es good ae recorded in expesment i). This Is benuse test peter robjecte and taining Patera subjects tree diferent Zach person has some wniguesen ins actions Sut the mance engin nite sch hat bemans ca couse ‘Thus we es improve the ecoition rate by ung mixed tri lng dala. The recgstion rate of the HMA tated withthe date of wo abjece war improved to TOA. Thos performance ta be improved by collecting more tng patterns Which we (lable for tpreenting the Category. ‘Our caren implementation uses the sh feature frit si ‘ty. However, thir doesnot satch the hasan poste space [earning capably ny well developed bet we should futher 5 Conclusion proach or inn aeton reogution rom ase of me-sequnsal Images. I our algorithm, mech fentare vector squence © symbole which correspond to codewordnin the codebook cesed Sy vector quantination. Ip leatnng, sya sequence obtuned ‘Hom sling ime sequence dat are wed to optimize FMS for nction entegores In tecopsition, a aymbol sequence oe ak served nage sequences processed by HDIMo, andthe reco: Aiton real ie deterrind te the cabeocy hich best ches {he observed sequence "Phe min expesnestal rl axing tenis ations peformed ty three subject ae flows, When telning date and ter ate are thon ofthe seme subject, a recognition rate of over 0% was achieved. On the other han, hen tuning date tod tert daa ae thee of diferent subjects the perfomance drops ‘These revalta show that our method is promising to sealte suman scion rengntion for vasousapplealon sucha fang ‘hopliters in department siere ued dangerous evi nt [Endergerten. To improve oor current implementation, we will lap alarge sale experiment and ferther refi feature exten Tia method buscally deal with 2D image bt ican be ex tended to SD objet actone sung for example, spect raphe In anigning aymbols or making HDIMs, HMMa cane applied fo actin recognition in other vsious ways Lhllawait) wee HMA to secogain words by analysing round und the beh of lp data, Thin shows the appLcabilty of HMM to mull ‘Sneeqoenial pattern recognition. Ts other word, saaot fox ton poblems, Fatere improvement in recognition acaracy and {ecopisingcompllented actions witout any reduction in rbust- irocare now beng sought Acknowledgements ‘The authors would like to thank De.Yakio Kobayashi and ‘Tala Badoof NTT Human Interface Laberatris fat thet ex Couragemeat and wappott for this work. The author lo thank ‘Aare inanare of NTT Harman Intertace Laboratories fr hin Spl advice on HDMs, laa, thanks are due to Inboratory ‘member for saiataiaing » postive rresch environment and forverring ws the modea, References [u) BHorowits and A. Pentland, *Recorey of Now Rigid Mo- Aion and Strectaze™ In Proceding of CVPR, pp. $25-830, 199 (a) Yang He and Aman Kunde. *Plnar Shape Clamiaton ‘ng Hidden Mashov Model In Pre.CVPR, pp. 10-18, 9 [3] Devi Hogg. "Model-based vision: program to ace «walle “mage and ison computing, Vol 1, Noy (i) B.W, Hwang and 5. Thlaba. "Real-Time Mearoremeat of Peden Flow Using Proceming of ITV Images". Tren. {TEIGE, Va. 36, No pp 917-924, 198. (in Japasee)- {6} 1Asimonon, *Purpoive and Qualitative Active Vision". in Preceding of ICPR, pp. 346-360, 190. (6) A. Kunde, ¥. He, and P, Bahl "landwstien Word Recor- {kina Hidden Maskoy Model Booed Approach". Petters ‘Recgtionpp- 243-207, may 1088 (PJ, ORouke and NBadler. °Model Based mage Analyis {Hamas Motion Using Constraint Propagation”. IEEE Tronr PAM, Vol PAME2, No 6, pp. 522-536, Nov 1980. [s) B.Oveamon. *TV.Camera Detecting Pedestrians fr Ta feLight Conia”. In Tecnologia and Methodloial Ade ences Mesurement (ACTA IMEKO), volume 3, 9p. 215 m2, 1002 (0) LR. Rabiner and BA Juang, "An Introduction to Hidden iskor Models", IEEE ASSP MAGAZINE, yp, 4-16, 303 1966 [to] RDRimey and CALBrown, "Selective Attention Se ‘questa Behavior Modeling Bye Movements with an Auge {Bealed Hidden Matkor Mode”. In Proc. DARPA Inepe Understanding Workshor, pp. 640-149, 1890. ‘T.Acso snd Mishikaws, "Sensor Fsin ualng Stochastic Proc”. In fad Symposia of Autonomous Distributed ‘Systoma pp- 115-116, 1991, (in Jepanese). D. Tersopenlo and D. Metaxas, "Dynamic 3D Mod {hs with Leeland Global Deformations" Deoraable Se- pesquader". IEEE Trans. on PAM, Val. 18, No. 7, pp 10-114, 1981. M. Umeda, "Recognition of MaltLFont Printed Chinese Characters. In Pro. 6th ICPR, pp. 782-726, 198. ty oy) 03) (QU W.D.ae and SLY. Kung, “An Object Recopation Spe. s ‘en Using Stochatc Knomledge Soerer aad VEST Perle ‘Achitecetue” In Proc 10 ICPR, pp. 881-838, 100. 5, Y- As and M.AJuck. “Hidden Marhow Mod. (U6) M. Yamamoto, "Hamas Motion Analysis Bused on A Robot aa) 4a) aa nay se) 9) 86) BO) oA) MC HO i OW We Tata oes Symbol Sepce O.OrO,040 4.0 Figure 1: HMM Concept vite ie 4 Mesh festa Bit Ronee ian (010 Or Symbol segaence aN Figure 2: Processing fow 38 ‘Table 1: Likehood (backhand volley ) TB = 4 TacHiand voleyconectont) Trcliand wrcke " forehand voley | “233.9752 forebend stroke “42.948 eas mah Sano apsunber of ak metas Figure 3: Mesh feature ‘Table 2: Recognition rate (expeviment 3) Fiver A | voce (TST layer B | $2.33 (292/300) layer C | 1000 (300/300) Figure §: Human area extraction ‘Table 3: Recognition ete (5) (experiment 2) ‘oti, b)bockground,cstacted "Hise Bae Tare ae ais ata on [Sool segue sequence | 60 61 61 62 62 62 63 63 64 64 65 66 66 66 67 68 68 69 69 70 70 70 71 71 Figure & Example of extracted tennis action and symbol Sequence(forehand voll). ((Gadestined symbol ie asigned to fame of above figure. ) TWEE

You might also like