
hadi_abdighavidel@mehr.sharif.ir
bahram@sharif.ir
bahrani@sharif.ir
mehdi_moradi@mehr.sharif.ir


F-measure: 0.65.

1.

1960
90



1.
2.
3.







2.


Yang and Liu (1999) [1]; Joachims (1998) [2]; Bellegarda (2000), latent semantic analysis (LSA) [3]; Wood and Gedeon (2001) [4]; Torkkola [5]; Blei et al. (2003), latent Dirichlet allocation (LDA) [6].

2.1


1.

Guandong et al. (2005), probabilistic latent semantic analysis (PLSA) [7].


[8].


(LSA). LSA.


[9].

[10].

[11].





NSWS.

2.2

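[12]
10
60
10
8486

$$t = \arg\min_{i} \cos^{-1}\left(\frac{\mathbf{v}\cdot\mathbf{w}_i}{\lVert\mathbf{v}\rVert\,\lVert\mathbf{w}_i\rVert}\right) \tag{1}$$

where $\mathbf{v}$ is the document vector, $\mathbf{w}_i$ is the vector of category $i$, and $t$ is the category assigned to the document.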
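Equation (1) assigns a document to the category whose vector forms the smallest angle with the document vector. Because $\cos^{-1}$ is decreasing on $[-1, 1]$, this is the same as choosing the category with the largest cosine similarity. The sketch below is a minimal illustration. It assumes documents and category vectors are already dense lists of term weights over a shared vocabulary. The function names `cosine` and `classify` and the toy categories `sport` and `politics` are hypothetical and do not come from the paper.

```python
import math
from typing import Dict, List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine of the angle between u and v (0.0 if either vector is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def classify(doc: List[float], categories: Dict[str, List[float]]) -> str:
    """Return the category whose vector makes the smallest angle with doc."""
    def angle(name: str) -> float:
        # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
        c = max(-1.0, min(1.0, cosine(doc, categories[name])))
        return math.acos(c)

    # arg min over categories of cos^-1(...), as in Equation (1)
    return min(categories, key=angle)


# Hypothetical toy category vectors over a three-term vocabulary.
categories = {
    "sport": [0.9, 0.1, 0.0],
    "politics": [0.1, 0.8, 0.3],
}
print(classify([0.7, 0.2, 0.1], categories))  # sport
```

Keeping the explicit `acos` mirrors the equation as written. A production version would usually just maximise the cosine and skip the inverse cosine entirely.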
2.4

tf-idf [14], tf.
idf.
tf-idf of term i:

[13].

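$$\text{tf-idf}_i = \text{tf}_i \cdot \text{idf}_i = \text{tf}_i \cdot \log\frac{M}{n_i} \tag{2}$$

where $\text{tf}_i$ is the frequency of term $i$ in the document, $M$ is the total number of documents, and $n_i$ ($= \text{df}_i$) is the number of documents that contain term $i$.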
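The weighting in Equation (2) can be sketched in a few lines. The text does not say how $\text{tf}_i$ is normalised or which logarithm base is used, so this sketch assumes two things, both marked in the code: $\text{tf}_i$ is the raw count of term $i$ in the document, and the logarithm is natural. The function name `tf_idf` and the toy corpus are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(doc: List[str], corpus: List[List[str]]) -> Dict[str, float]:
    """Weight each term i of doc by tf_i * log(M / n_i), following Equation (2).

    Assumptions (not fixed by the text): tf_i is the raw count of term i in doc,
    and log is the natural logarithm (another base only rescales every weight).
    """
    M = len(corpus)                # total number of documents
    n = Counter()                  # n_i: number of documents containing term i
    for d in corpus:
        n.update(set(d))           # count each term at most once per document
    tf = Counter(doc)
    # Terms absent from the corpus (n_i = 0) are skipped to avoid dividing by zero.
    return {t: tf[t] * math.log(M / n[t]) for t in tf if n[t] > 0}


corpus = [["a", "b", "a"], ["b", "c"], ["c", "d"]]
print(tf_idf(corpus[0], corpus))  # "a": 2*log(3) ~ 2.197, "b": log(1.5) ~ 0.405
```

A term that occurs in every document gets $\log(M/M) = 0$. This is the intended effect of idf: words that appear everywhere carry no information for telling categories apart.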
tf-idf.
tf-idf. tf-idf,
tf.
tf.
idf.
idf.

2.3



Table 2

Category   Total   Subset 1   Subset 2
1            225        200         25
2           1224       1117        107
3           1287       1237         50
4           1006        958         48
5            469        422         47
6            349        314         35
7            714        639         75
8           2427       2210        217
9            273        259         14
10           460        414         46
All         8486       7840        646



[15].

[16].

3.
8486
10.
2
4
tf-idf.
1
Table 3

0.33   0.76   0.46

-1


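Precision and recall are defined as:

$$P = \frac{a}{a+b} \tag{3}$$

$$R = \frac{a}{a+c} \tag{4}$$

where $a$ is the number of documents correctly assigned to the category, $b$ is the number of documents incorrectly assigned to it, and $c$ is the number of documents of the category that were not assigned to it.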
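Equations (3) and (4) can be computed per category from the three counts $a$, $b$ and $c$. This sketch reads the counts in the usual contingency-table way: correct assignments, wrong assignments, and missed documents. It also computes F as the balanced harmonic mean of precision and recall. That form of F is an assumption here, although the triples of values reported in the results are consistent with it. The function name `precision_recall_f` is hypothetical.

```python
from typing import Tuple


def precision_recall_f(a: int, b: int, c: int) -> Tuple[float, float, float]:
    """Precision (Eq. 3), recall (Eq. 4) and their harmonic mean F.

    a: documents correctly assigned to the category
    b: documents wrongly assigned to the category
    c: documents of the category that were not assigned to it
    """
    precision = a / (a + b) if a + b else 0.0
    recall = a / (a + c) if a + c else 0.0
    # Balanced F-measure (assumed): harmonic mean of precision and recall.
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


p, r, f = precision_recall_f(a=6, b=2, c=6)
print(round(p, 2), round(r, 2), round(f, 2))  # 0.75 0.5 0.6
```

The zero guards return 0.0 instead of raising when a category has no predicted or no true documents. This is a common convention, but it is a choice the text does not specify.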
3.
F-measure (F).
5

Measure A   Measure B   F
0.84        0.88        0.86
0.57        0.42        0.48
0.76        0.45        0.56
0.50        0.72        0.59
0.57        0.75        0.64
0.88        0.91        0.89
0.46        0.89        0.61
0.40        0.50        0.44
0.90        0.94
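As a consistency check, each complete group of three result values behaves like the balanced F-measure: the third value is the harmonic mean of the other two. The formula is symmetric, so the check works regardless of which of the first two values is precision and which is recall. For the first three values (0.84, 0.88, 0.86):

$$F = \frac{2 \cdot 0.84 \cdot 0.88}{0.84 + 0.88} = \frac{1.4784}{1.72} \approx 0.86$$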

4.

pp. 151-161, 1385.
[9]

pp. 190-201, 1385.


tf-idf.

[10] T. Pilehvar, H. Faili, M. Soltani, "Classification of
Persian textual documents using Learning Vector
Quantization", 4th IEEE Conference on Knowledge
Engineering and Natural Language Processing (NLPKE), 2009.
[11] M. Farhoodi, A. Yari, M. Mahmoudi, "A Persian
Web Page Classifier Applying a Combination of
Content-Based and Context-Based Features",
International Journal of Information Studies, Vol. 1, No. 4,
2009.
[12] M. Bijankhan, J. Seikhzadeghan, M. Bahrani,
M. Ghayoomi, "Lessons from Creation of a Persian
Written Corpus: Peykare", Language Resources and
Evaluation, Vol. 45, No. 2, pp. 143-164, 2011.
[ ] pp. 6-11, 1383.

[14] G. Salton and C. Buckley, "Term-weighting
approaches in automatic text retrieval", Information
Processing and Management, Vol. 24, No. 5, pp. 513-523, 1988.

[1] Y. Yang and X. Liu, "A Re-examination of Text
Categorization Methods", Proceedings of the 22nd
Annual International ACM SIGIR Conference on Research
and Development in Information Retrieval, pp. 42-49,
1999.
[2] T. Joachims, "Text Categorization with Support
Vector Machines: Learning with Many Relevant Features",
Machine Learning: ECML-98, 10th European Conference on
Machine Learning, pp. 137-142, 1998.
[3] J.R. Bellegarda, "Exploiting Latent Semantic
Information in Statistical Language Modeling",
Proceedings of the IEEE, Vol. 88, No. 8, pp. 1279-1296,
2000.
[4] S.A. Wood and T.D. Gedeon, "A Hybrid Neural
Network for Automated Classification", Proceedings of
the 6th Australasian Document Computing Symposium,
2001.
[5] K. Torkkola, "Linear Discriminant Analysis in
Document Classification", IEEE ICDM Workshop on Text
Mining, 2001.
[6] D. Blei, A. Ng, M. Jordan, "Latent Dirichlet
Allocation", Journal of Machine Learning Research, Vol.
3, pp. 993-1022, 2003.
[7] X. Guandong, Y. Zhang, Z. Zhou, "Using
Probabilistic Latent Semantic Analysis for Web Page
Grouping", Proceedings of Research Issues in Data
Engineering: Stream Data Mining and Applications, pp.
29-36, 2005.

[15] 1389.
[16] 1371.

[8]
