
2020 International Conference on Wireless Communications and Smart Grid (ICWCSG)

The Analysis of Product Evaluation and Sales Characteristic Model Based on Data Mining

Xing Liu
School of Business
Sichuan Normal University
Chengdu, China
Corresponding author: liuxingsicnu@[…].com

Jinfeng Liu
School of Business
Sichuan Normal University
Chengdu, China
ljj@icloud.com

Jiayang Zhang
School of Business
Sichuan Normal University
Chengdu, China
jerry_zhang@[…].com

Yu Deng
School of Computer Science
Sichuan Normal University
Chengdu, China
dengyufight@[…].com

Xiaohui Li
School of Business
Sichuan Normal University
Chengdu, China
xiaohuili@[…].com

Abstract-With the application and development of the e-commerce industry, a large amount of online evaluation information has accumulated, and sentiment analysis technology can be used to mine and utilize it effectively, so as to provide a reference for e-commerce industry management. First, a sentiment semantic classification dictionary was constructed based on frame semantics theory. Then, based on the dictionary and a set of rules, the sentiment of online reviews was analyzed. Finally, tests on online review data yielded higher accuracy and recall rates, which demonstrates the validity and effectiveness of the research method.

Keywords-frame semantics; sentiment analysis; online reviews

I. INTRODUCTION

With the development of the internet, e-commerce has developed rapidly, and online shopping accounts for an increasing proportion of market consumption. More and more customers are willing to share their opinions on or experiences with products on the internet, forming a wealth of reviews on the use of online shopping products. Obtaining valuable information from these massive product reviews has attracted a great deal of research attention [1], and the sentiment analysis technology that subsequently emerged provides better support for this goal [2]. Zhao et al. applied appraisal expressions to sentence sentiment classification, designing semantic, syntactic, lexical, and polarity features to classify sentiment sentences as positive or negative [4]. By analyzing text reviews, customers' sentiment polarity towards the products is obtained, that is, positive (satisfied) or negative (dissatisfied). Sentiment analysis not only supports consumers in making purchase decisions but also helps managers understand consumers' needs so as to improve products and services [3]. Therefore, based on frame semantics theory, this paper constructs a special semantic classification dictionary for the field of online shopping reviews, tags the basic sentiment information of review sentences with a dictionary-and-rules method, and thereby realizes the semantic understanding of the sentiment information in online shopping reviews.

II. LITERATURE REVIEW

With the rapid development of the internet, the e-commerce industry is growing quickly and the demand for online shopping is increasing. Some scholars have studied how to optimize customers' online shopping service experience. Kang et al. [5] suggested a framework for measuring customer satisfaction in mobile services, employing sentiment analysis to extract information on customer satisfaction. Pujari et al. [6] designed a new framework built on the inbuilt packages of Python, which mines many customers' opinions about a product and groups them according to their sentiments, helping potential buyers form an overall view of the product. Yussupova et al. [8] described the application of a novel domain-independent decision support approach for customer satisfaction research through deep analysis of consumer reviews posted on the internet in natural language. Customer reviews are recognized as fruitful information sources for monitoring and enhancing customer satisfaction levels and for helping managers make decisions, particularly as they convey the real voices of actual customers expressing relatively unambiguous opinions.

978-1-7281-9820-0/20/$31.00 ©2020 IEEE 151


DOI 10.1109/ICWCSG50807.2020.00040

Authorized licensed use limited to: CALIFORNIA INSTITUTE OF TECHNOLOGY. Downloaded on September 26,2020 at 13:40:23 UTC from IEEE Xplore. Restrictions apply.
A growing number of recent studies have focused on the economic value of reviews, exploring the relationship between the sales performance of products and their reviews [7], and several sentiment analysis approaches have proposed extracting sentiment information from customer reviews. Kang et al. [9] developed a new framework for measuring customer satisfaction with mobile services by combining the VIKOR approach and sentiment analysis. The foregoing literature has conducted meaningful exploration of the sentiment analysis of online reviews. This paper constructs a sentiment semantic classification dictionary based on frame semantics theory and, based on the dictionary and a set of rules, analyzes the sentiment of online reviews.

III. RESEARCH DESIGN

The research on online reviews includes three parts: text extraction, preprocessing of review text, and sentiment semantic tagging. Text extraction includes the basic implementation principle of the web crawler and the implementation of online shopping review data crawling in Python. Preprocessing of review text includes word segmentation and part-of-speech tagging. Sentiment semantic tagging of online reviews contains the establishment of the sentiment dictionary, frame and sentiment theme tagging, and the calculation of the sentiment tendency value. The research model is presented in Figure 1.

[Figure 1 diagram. Online reviews; Text extraction (basic implementation principle of web crawler; implementation of online shopping reviews data crawling in Python); Preprocessing of review text (word segmentation; part-of-speech tagging); Sentiment semantic tagging (establishment of sentiment dictionary; frame and sentiment theme tagging; calculation of sentiment tendency value); Online reviews sentiment information.]
Figure 1. Sentiment analysis of online reviews based on frame semantics

IV. TEXT EXTRACTION

A. Basic implementation principle of web crawler

Crawler technology, as an important technical means of obtaining network data, is widely used in search engines, data analysis, and other fields, and can also be used to collect valuable data. The basic implementation principle of a general web crawler is as follows:

Step 1: obtain the initial URL.

Step 2: according to the initial URL, crawl the page, store the data obtained in a data file or database, and obtain new URLs.

Step 3: put the new URLs into the queue, crawl web pages according to them, store the data, and repeat the above crawling process. When the stop condition of the crawler is met, crawling stops.

Web crawlers are mostly implemented in Java, Python, C, and other programming languages. To save time and improve program quality, crawler frameworks are used to develop crawler programs in practical projects. Currently, Python is the preferred programming language in the field of machine learning; it has very powerful third-party libraries and provides high-quality open-source crawler frameworks.

B. Implementation of online shopping reviews data crawling in Python

For the machine learning analysis, online shopping review data are crawled with Python. The specific steps are as follows:

Step 1: prepare the "Requests" library and the "User Agent" library. The "Requests" library implements the crawler function; the "User Agent" library is used to send requests to the server of the target page through the browser, disguised as a normal user.

Step 2: determine the target page and analyze its structure. The page containing online shopping review data is the data to be crawled by the web crawler.

Step 3: crawl the target page and save it as a local file in Python. The "Requests" library is used to request the content of the required page, the "BeautifulSoup" library is used for HTML parsing, and the "re" library provides regular expression functions.

V. PREPROCESSING OF REVIEW TEXT

Text preprocessing is a key step in data processing that helps the algorithm perform better. Because online reviews are freely written and colloquial, they contain some redundant information, and directly extracted reviews are often irregular. Therefore, word segmentation and part-of-speech tagging should be carried out before semantic tagging.

A. Word segmentation

The word is the basic constituent unit of an English sentence. Spaces separate English words, and punctuation separates English sentences. Therefore, for general English paragraphs, punctuation can be used for sentence segmentation and white space for word segmentation, while for complex text with special expressions, the Python-based natural language processing platform NLTK can be used as the word segmentation tool.
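The punctuation-and-white-space rule for general English paragraphs can be sketched with Python's standard library alone. This is an illustrative sketch, not the paper's implementation: the regular expression and the helper names `split_sentences` and `split_words` are assumptions introduced here, and the rule is deliberately naive (an abbreviation such as "Mr." would be split as a sentence end).

```python
from __future__ import annotations

import re


def split_sentences(text: str) -> list[str]:
    """Sentence segmentation: break after . ! or ? when followed by white space."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def split_words(sentence: str) -> list[str]:
    """Word segmentation: split on white space, then trim surrounding punctuation."""
    tokens = (tok.strip(".,!?;:\"'()") for tok in sentence.split())
    return [tok for tok in tokens if tok]


review = "The clothes are very big. I don't fit well! Price was $31.00, too high."
for sentence in split_sentences(review):
    print(split_words(sentence))
# ['The', 'clothes', 'are', 'very', 'big']
# ['I', "don't", 'fit', 'well']
# ['Price', 'was', '$31.00', 'too', 'high']
```

The naive split keeps contractions such as "don't" as a single token, whereas NLTK's `word_tokenize` splits it into "do" and "n't"; special expressions of this kind are the reason the text hands complex input to NLTK.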
Step 1: use the "tokenize" word segmentation package in NLTK to segment words;

Step 2: remove punctuation and stop words. After simple word segmentation, there are still meaningless articles, prepositions, adverbs, and conjunctions that have no influence on the meaning of the whole sentence, such as "a", "the", "or", and "and", which need to be filtered out with the English stop-word dictionary provided by NLTK;

Step 3: word stem processing. Stemming is realized by removing the affixes of a word to obtain its base form. For example, "wearing" and "wore" differ in form but correspond to the same root, so they should be treated as the same word and it is necessary to extract the stem. The "stem" package in NLTK provides LancasterStemmer, PorterStemmer, and other modules for stemming. Note that suffix-stripping stemmers reduce regular forms such as "wearing" to "wear" but leave irregular forms such as "wore" unchanged; those require lemmatization, e.g., with NLTK's WordNetLemmatizer.

B. Part-of-speech tagging

Because the attribute extraction of sentiment words involves nouns, adjectives, verbs, adverbs, and so on, the parts of speech need to be tagged. Part-of-speech tagging can be carried out with the "tag" package in NLTK, which provides classes and interfaces for part-of-speech tagging and defines several part-of-speech taggers. For given words, a lookup-based tagger finds the most likely part of speech of each word in the training dataset and marks it; words that are not in the training dataset are marked as None. This paper employs the pos_tag function to tag the parts of speech of a specified word list with the part-of-speech tagger recommended by NLTK. The result is a list of tuples, each formed by a word and its corresponding part of speech.

The sentiment analysis of products concerns consumers' sentiment expressions and attitudes toward the products. This paper scores consumers' sentiment expressions as positive or negative: negative expresses relative dissatisfaction, and positive expresses relative satisfaction. At present, sentiment analysis methods are mainly divided into two types: those based on sentiment dictionaries and those based on machine learning. The machine learning method treats the task as a classification problem, which requires a lot of manual tagging of the training text, consumes a lot of time and effort, and produces training results that apply only to a specialized field without universality. Therefore, this paper chooses the sentiment analysis method based on a sentiment dictionary. The next step is to construct the sentiment dictionary.

VI. SENTIMENT SEMANTIC TAGGING

A. Establishment of sentiment dictionary

A sentiment dictionary is a dictionary that judges sentiment polarity based on semantics. The quality of the sentiment dictionary has a great influence on the accuracy of sentiment analysis. Based on several common English sentiment dictionaries, the sentiment dictionary in this paper is constructed by adding sentiment words and internet expressions from this field. The relevant dictionaries and literature have provided an excellent reference. Early research produced several general and open English sentiment dictionary resources:

(1) GI (The General Inquirer) [10], which gives very comprehensive information about each entry.

(2) MPQA Subjectivity Cues Lexicon [11], which contains lists of negative and positive sentiment words.

(3) Bing Liu Opinion Lexicon [12], which contains lists of negative and positive sentiment words. However, the dictionary contains many misspellings, such as repeated letters, missing letters, slang, and grammatical inflections.

(4) LIWC (Linguistic Inquiry and Word Count) [13], which describes the patterns of sentiment words across different categories.

(5) SentiWordNet [14], which classifies WordNet entries into affective types and assigns each entry a weight for the positive and negative types.

As the above dictionaries are all general English sentiment dictionaries, they contain many general English sentiment words. However, these sentiment words were found to be too general to serve sentiment analysis in a particular field. Moreover, words in these dictionaries may be misclassified, so using these sentiment dictionaries alone is not effective. Therefore, this paper removes the repeated words from the above sentiment dictionaries to obtain a novel sentiment dictionary. In addition, this paper adds special reviews from each specific field under study to the sentiment dictionary. For example, for clothing, "The clothes are too tight to be worn" and "The clothes are very big, I don't fit well" are added. In early studies, adjectives were usually used as sentiment words; later it was found that not only adjectives but also many nouns and verbs can express sentiment and indicate sentiment tendency. If only adjectives are considered sentiment words, the accuracy of sentiment analysis will be much lower; therefore, nouns and verbs should also be included in the sentiment dictionary. Moreover, conventional sentiment dictionaries do not include colloquial or internet language. For example, "Yahoo", "Oh boy", "Whoopee", "What nerve", "Oh der", and "Booboo" were added manually to the dictionary of praises.

B. Frame and sentiment theme tagging

Online review sentiment analysis includes determining the frame of the sentiment words, identifying the subject of the sentiment words, and calculating the sentiment tendency values, which is formally defined as follows:

$$\langle F_i,\ E_i,\ v_i \rangle \qquad (1)$$

Herein, $F_i$ denotes the frame of the sentiment words, $E_i$ represents the sentiment theme, and $v_i$ is the sentiment tendency value. The frame $F_i$ in the sentiment semantic structure can be determined by locating the verbs and adjectives in the sentence and matching them against the frame semantic dictionary. The sentiment theme $E_i$ in the sentiment semantic structure is one of the core frame elements of the semantic role, that is, the evaluated object or evaluated subject, whose syntactic features correspond strongly to the dependency syntactic structure. Therefore, the method based on
dependency syntactic structure is employed. The matching rules of frame and semantic theme are as follows:

$$LU_{\mathrm{frame}}\ [\mathrm{SBV} \oplus \mathrm{ATT}_{\mathrm{head}}]_{\mathrm{theme}}\ [\mathrm{ADV}_d]_{\mathrm{degree}}\ [\mathrm{ADV}_n]_{\mathrm{negative}} \qquad (2)$$

Herein, $LU$ denotes the sentiment word of the review text, and its corresponding frame is determined according to the frame semantic dictionary. Due to the characteristics of online reviews, such as short sentences, subject ellipsis, and even single-word sentences, the sentiment word $LU$ is a necessary component, while the other subjects and modifiers are optional and are enclosed in square brackets [ ] according to the result of dependency parsing.

Based on the result of dependency syntax analysis, if the sentiment word governs a subject component (SBV), that component is tagged as the frame element of the sentiment theme class. If the sentiment word occupies the attribute-head ($\mathrm{ATT}_{\mathrm{head}}$) position of a modifier-head structure, the head is tagged as the sentiment theme. The symbol $\oplus$ indicates a logical exclusive-or relationship between the two cases: either SBV holds or $\mathrm{ATT}_{\mathrm{head}}$ holds, but not both at the same time. If an adverb of degree ($\mathrm{ADV}_d$) appears in the dependency grammar structure, it is tagged as the degree frame element (degree). If there is an adverbial with negative meaning ($\mathrm{ADV}_n$), it is tagged as "negative". Both tags are used as the basis for judging the sentiment tendency value.

C. The calculation of sentiment tendency value

Based on the sentiment tendency values of the words in the frame semantic dictionary, the final assignment is obtained by adjusting for the degree and negation modification information in the frame element tagging results.

For words whose dictionary sentiment value is neutral, that is, central words, the sentiment tendency value is determined according to the subject information. For example, the central word "high" takes a negative value in the review phrase "high cost" but a positive value in "high technology". For words whose sentiment tendency value is not neutral, if there is an adverb of degree, the value is adjusted according to the adverb's adjustment amount; if there is a negative modification in the review sentence, the degree value is assigned as […] the original value.

VII. CONCLUSIONS AND FUTURE WORK

This paper studies the sentiment analysis of online shopping reviews based on frame semantics, which refines the expression of sentiment information down to the sentence level and provides an effective classification system of sentiment semantics and a semantic tagging technique. On the one hand, this paper establishes a frame semantic classification dictionary for the e-commerce industry, which provides practical and useful dictionary knowledge resources for the semantic analysis of online shopping reviews. On the other hand, based on the frame semantic classification dictionary and dependency syntactic rules, frame semantic tagging of online shopping reviews is carried out. The research in this paper is a beneficial exploration for the development of sentiment analysis of online shopping reviews in the e-commerce industry.

However, for sentiment classification this paper assumes that a review is either positive or negative, which ignores neutral views, so objective descriptions of the product are not captured. The classification of neutral sentiment polarity needs further study. Review length can increase the cognitive value of the information, especially if it is available at no additional search cost; therefore, the amount of useful information contained in a review can be measured by taking the review length into account. However, some lengthy reviews describe information unrelated to the product or the business, which is of no use to other consumers. In this context, future research can focus on applying text mining to measure the useful information contained in review content, so as to improve the reliability and effectiveness of review sentiment analysis.

REFERENCES

[1] L. Zhang and B. Liu, "Sentiment Analysis and Opinion Mining," in C. Sammut and G. I. Webb (eds.), Encyclopedia of Machine Learning and Data Mining. Boston, MA: Springer.
[2] M. Nissim and V. Patti, "Semantic Aspects in Sentiment Analysis," in Sentiment Analysis in Social Networks.
[3] J. Karlgren, M. Sahlgren, F. Olsson, F. Espinoza, and O. Hamfors, "Usefulness of sentiment analysis," in European Conference on Information Retrieval.
[4] Y. Zhao, B. Qin, W. Che, et al., "Appraisal Expression Recognition with Syntactic Path for Sentence Sentiment Classification," International Journal of Computer Processing of Languages.
[5] D. Kang and Y. Park, "Review-based measurement of customer satisfaction in mobile service: Sentiment analysis and VIKOR approach," Expert Systems with Applications.
[6] C. Pujari, Aiswarya, and N. P. Shetty, "Comparison of classification techniques for feature oriented sentiment analysis of product review data."
[7] Y. Liu, X. Yu, X. Huang, and A. An, "Blog Data Mining: The Predictive Power of Sentiments," in Data Mining for Business Applications. Springer.
[8] N. Yussupova, M. Boyko, D. Bogdanova, et al., "A Decision Support Approach based on Sentiment Analysis Combined with Data Mining for Customer Satisfaction Research."
[9] D. Kang and Y. Park, "Review-based measurement of customer satisfaction in mobile service: Sentiment analysis and VIKOR approach," Expert Systems with Applications.
[10] P. J. Stone, "General Inquirer: Computer Approach to Content Analysis," Information Storage and Retrieval.
[11] T. Wilson, J. Wiebe, P. Hoffmann, et al., "Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis," in Empirical Methods in Natural Language Processing.
[12] M. Hu and B. Liu, "Mining and summarizing customer reviews," in Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, Washington, USA. ACM.
[13] J. W. Pennebaker, R. J. Booth, and M. E. Francis, Linguistic Inquiry and Word Count: LIWC Operator's Manual.
[14] S. Baccianella, A. Esuli, and F. Sebastiani, "SentiWordNet 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining," in Proceedings of the International Conference on Language Resources and Evaluation (LREC), Valletta, Malta. DBLP.
