
**Using Fuzzy Support Vector Machine in Text Categorization Base on Reduced Matrices**

**Vu Thanh Nguyen**

University of Information Technology HoChiMinh City, VietNam email: nguyenvt@uit.edu.vn

Abstract - In this article, the authors compare the results of a Fuzzy Support Vector Machine (FSVM) with those of a Fuzzy Support Vector Machine combined with Latent Semantic Indexing and Random Indexing on reduced matrices (FSVM_LSI_RI). Our results show that FSVM_LSI_RI provides better Precision and Recall than FSVM. The experiments use a corpus of 3299 documents drawn from the Reuters-21578 corpus.

Keywords – SVM, FSVM, LSI, RI

I. INTRODUCTION

Text categorization is the task of assigning a text to one or more of a set of predefined categories. As with most other natural language processing applications, representational factors are decisive for the performance of the categorization. By far the most common representational scheme in text categorization is the Bag-of-Words (BoW) approach, in which a text is represented as a vector $t = (w_1, \ldots, w_n)$ of word weights, where $w_i$ is the weight of the $i$-th word in the text. The BoW representation ignores all semantic or conceptual information; it simply looks at the surface word forms. The BoW approach builds on three models: the Boolean model, the Vector Space model, and the Probability model.

There have been attempts at deriving more sophisticated representations for text categorization, including the use of n-grams or phrases (Lewis, 1992; Dumais et al., 1998), or augmenting the standard BoW approach with synonym clusters or latent dimensions (Baker and McCallum, 1998; Cai and Hofmann, 2003). However, none of the more elaborate representations manages to significantly outperform the standard BoW approach (Sebastiani, 2002). In addition, they are typically more expensive to compute.

To address this, the authors introduce a new method for producing concept-based representations for natural language data. The method combines Random Indexing (RI) and Latent Semantic Indexing (LSI); the computation time for Singular Value Decomposition on an RI-reduced matrix is almost halved compared to LSI alone. The authors use this method to create concept-based representations for a standard text categorization problem and use the representations as input to an FSVM classifier. The categorization results are compared to those reached using standard BoW representations by the Vector Space Model (VSM), and the authors also demonstrate how the performance of the FSVM can be improved by combining representations.

II. VECTOR SPACE MODEL (VSM) ([14])

1. Data Structuring

In the vector space model, documents are represented as vectors in t-dimensional space, where t is the number of indexed terms in the collection. The weight of a term is evaluated as

$$w_{ij} = l_{ij} \cdot g_i \cdot n_j$$

where:
- $l_{ij}$ denotes the local weight of term $i$ in document $j$, computed as $l_{ij} = \log(1 + f_{ij})$;
- $g_i$ is the global weight of term $i$ in the document collection;
- $n_j$ is the normalization factor for document $j$;
- $f_{ij}$ is the frequency of token $i$ in document $j$;
- [...] is the probability of token $i$ occurring in document $j$.

2. Term-Document Matrix

The VSM is implemented by forming a term-document matrix, an m×n matrix where m is the number of terms and n is the number of documents.

$$A = \begin{pmatrix} d_{11} & d_{12} & \cdots & d_{1n} \\ d_{21} & d_{22} & \cdots & d_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ d_{m1} & d_{m2} & \cdots & d_{mn} \end{pmatrix}$$

Each row of the term-document matrix corresponds to a term, and each column corresponds to a document.
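As a minimal sketch of the weighting scheme of this section, the following Python snippet builds a weighted term-document matrix from raw counts. The function name `weighted_term_document_matrix`, the default of $g_i = 1$, and the choice of $n_j$ as the inverse Euclidean norm of each document column are assumptions made for illustration; the paper does not specify them.

```python
import numpy as np


def weighted_term_document_matrix(counts, global_weights=None):
    """Return the m x n matrix of weights w_ij = l_ij * g_i * n_j.

    counts: m x n array of raw frequencies f_ij (rows = terms, columns = documents).
    global_weights: optional length-m array of g_i; defaults to all ones
    (an assumption, since the paper's formula for g_i is not available).
    """
    f = np.asarray(counts, dtype=float)
    local = np.log1p(f)                       # l_ij = log(1 + f_ij)
    if global_weights is None:
        g = np.ones(f.shape[0])               # assumed g_i = 1
    else:
        g = np.asarray(global_weights, dtype=float)
    weighted = local * g[:, None]             # l_ij * g_i
    norms = np.linalg.norm(weighted, axis=0)  # assumed n_j = 1 / ||column j||
    norms[norms == 0.0] = 1.0                 # leave empty documents as zero columns
    return weighted / norms


# Toy example: 3 terms (rows) and 3 documents (columns).
counts = [[2, 0, 1],
          [0, 3, 0],
          [1, 1, 0]]
A = weighted_term_document_matrix(counts)
```

With this normalization every non-empty document column has unit length, so the inner product between two columns can be read as a cosine similarity.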


http://sites.google.com/site/ijcsis/ ISSN 1947-5500

(IJCSIS) International Journal of Computer Science and Information Security, Vol. 8, No. 7, October 2010

In the matrix $A$, $d_{ij}$ is the weight associated with token $i$ in document $j$.

III. LATENT SEMANTIC INDEXING (LSI) ([1][4])

The vector space model presented in Section II suffers from the curse of dimensionality: as the problem size increases, the processing time required to construct the vector space and to query the document space increases as well. In addition, the vector space model exclusively measures term co-occurrence; that is, the inner product between two documents is nonzero if and only if they share at least one term. Latent Semantic Indexing (LSI) is used to overcome the problems of synonymy and polysemy.

1. Singular Value Decomposition (SVD) ([5]-[9])

LSI is based on a mathematical technique called Singular Value Decomposition (SVD). The SVD decomposes a term-by-document matrix $A$ into three matrices: a term-by-dimension matrix $U$, a singular-value matrix $\Sigma$, and a document-by-dimension matrix $V$, which enters the product as its transpose $V^T$. The purpose of the SVD analysis is to detect semantic relationships in the document collection. The decomposition is performed as follows:

$$A = U \Sigma V^T$$

where:
- $U$ is an orthogonal m×m matrix whose columns are the left singular vectors of $A$;
- $\Sigma$ is a diagonal matrix whose diagonal holds the singular values of $A$ in descending order;
- $V$ is an orthogonal n×n matrix whose columns are the right singular vectors of $A$.

To generate a rank-k approximation $A_k$ of $A$, where $k \ll r$ and $r$ is the rank of $A$, each matrix factor is truncated to its first k columns. That is, $A_k$ is computed as:

$$A_k = U_k \Sigma_k V_k^T$$

where:
- $U_k$ is the m×k matrix whose columns are the first k left singular vectors of $A$;
- $\Sigma_k$ is the k×k diagonal matrix whose diagonal is formed by the k leading singular values of $A$;
- $V_k$ is the n×k matrix whose columns are the first k right singular vectors of $A$.

In LSI, the approximation $A_k$ of $A$ is the important object: it captures the main associations between the terms used in the documents while removing the variability in term usage that harms index-based search methods [6], [7], [8]. Because the k-dimensional LSI space is used ($k \ll r$), unimportant differences in "meaning" are removed. Terms that often appear together in documents lie close to each other in the k-dimensional LSI space, even if they never occur together in the same document.

2. Drawbacks of the LSI model

- SVD is often treated as a "magical" process.
- SVD is computationally expensive.
- It requires an initial "huge matrix step".
- It is linguistically agnostic.

IV. RANDOM INDEXING (RI) ([6],[10])

Random Indexing is an incremental vector space model that is computationally less demanding (Karlgren and Sahlgren, 2001). Instead of giving each word a whole dimension, the Random Indexing model reduces dimensionality by giving each word a random vector of much lower dimensionality than the total number of words in the text.

Random Indexing differs from the basic vector space model in that it does not give each word an orthogonal unit vector. Instead, each word is given a vector of length 1 in a random direction. The dimension of this random vector is chosen to be smaller than the number of words in the document, with the end result that not all words will be orthogonal to each other, since the rank of the matrix will not be high enough. This can be formulated as

$$A T = \tilde{A}$$

where $A$ is the original d × w word-document matrix of the basic vector space model, $T$ is the w × k matrix of random vectors that maps each word $w_i$ to its k-dimensional random vector, and $\tilde{A}$ is $A$ projected down into d × k dimensions. A query is then matched by first multiplying the query vector by $T$ and then finding the row of $\tilde{A}$ that gives the best match.

$T$ is constructed as follows: for each row of $T$, corresponding to one word (one column of $A$), n distinct positions among the k dimensions are selected; n/2 of these are assigned the value $1/\sqrt{n}$ and the rest are assigned $-1/\sqrt{n}$. This ensures unit length, and that the vectors are distributed evenly on the unit sphere of dimension k (Sahlgren, 2005). An even distribution ensures that every pair of vectors has a high probability of being orthogonal.

Information is lost during this process (by the pigeonhole principle, since the rank of the reduced matrix is lower). However, when the method is used on a matrix with very few nonzero elements, the induced error decreases, because the likelihood of a conflict within each document, and between documents, decreases.

Using Random Indexing on a matrix therefore introduces a certain error into the results. The errors are introduced by words that match other words, i.e. words whose random vectors have a scalar product ≠ 0. In the matrix this shows up as false positive matches for every word whose vector has a nonzero scalar product with another vector in the vector space of the matrix. False negatives can also be created by words whose corresponding vectors cancel each other out.

Advantages of Random Indexing

- Based on Pentti Kanerva's theories on Sparse Distributed Memory.
- Uses distributed representations to accumulate context vectors.
- Incremental method that avoids the "huge matrix step".
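The two reductions described above can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the authors' implementation: the helper names `random_index_matrix` and `rank_k_approximation`, the toy Poisson count matrix, and the parameter values (k = 16, n = 4, rank 5) are all hypothetical.

```python
import numpy as np


def random_index_matrix(n_words, k, n_nonzero, rng):
    """Build a w x k matrix T with one random index vector per word (row).

    Each row has n_nonzero entries: half are +1/sqrt(n_nonzero) and half are
    -1/sqrt(n_nonzero), so every row has unit Euclidean length.
    """
    if n_nonzero % 2 != 0 or n_nonzero > k:
        raise ValueError("n_nonzero must be even and at most k")
    T = np.zeros((n_words, k))
    value = 1.0 / np.sqrt(n_nonzero)
    half = n_nonzero // 2
    for i in range(n_words):
        positions = rng.choice(k, size=n_nonzero, replace=False)
        T[i, positions[:half]] = value
        T[i, positions[half:]] = -value
    return T


def rank_k_approximation(A, k):
    """Return A_k = U_k S_k V_k^T, the truncated-SVD approximation of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]


rng = np.random.default_rng(0)
A = rng.poisson(0.3, size=(20, 50)).astype(float)  # toy d x w count matrix
T = random_index_matrix(n_words=50, k=16, n_nonzero=4, rng=rng)
A_tilde = A @ T                                    # d x k projection (A T)
A_k = rank_k_approximation(A_tilde, k=5)           # rank-5 approximation
```

The SVD here runs on the 20 × 16 projected matrix rather than on the 20 × 50 original, which is where the saving in computation comes from; the price is the projection error discussed in the text.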


V. COMBINING RI AND LSI

We have seen the advantages and disadvantages of both RI and LSI: RI is efficient in terms of computational time but does not preserve as much information as LSI; LSI, on the other hand, is computationally expensive but produces highly accurate results, in addition to capturing the underlying semantics of the documents. As mentioned earlier, a hybrid algorithm combines the two approaches to benefit from the advantages of both. The algorithm works as follows:

- First, the data is pre-processed with RI to a lower dimension k1.
- Then LSI is applied to the reduced, lower-dimensional data to further reduce it to the desired dimension k2.

This algorithm is expected to improve the running time of LSI and the accuracy of RI. The time complexity of SVD is O(cmn) for large, sparse datasets. It is therefore reasonable to assume that a lower dimensionality will result in faster computation, since the cost depends on the dimensionality m.

VI. TEXT CATEGORIZATION

1. Support Vector Machines

Support vector machines are a very specific class of algorithms, characterized by the use of kernels, the absence of local minima, the sparseness of the solution, and the capacity control obtained by acting on the margin or on other "dimension independent" quantities such as the number of support vectors.

Let $S = \{(x_i, y_i)\}$ be a training sample set, where $x_i \in R^n$ with corresponding binary class labels $y_i \in \{1, -1\}$. Let $\varphi$ be a non-linear mapping from the original data space to a high-dimensional feature space; we therefore replace the sample points $x_i$ and $x_j$ with their images $\varphi(x_i)$ and $\varphi(x_j)$. Let the weight and the bias of the separating hyperplane be $w$ and $b$, respectively. We define a hyperplane that may act as the decision surface in feature space as follows:

$$w \cdot \varphi(x) + b = 0 \quad (1)$$

To separate the data linearly in the feature space, the decision function must satisfy the following constraint conditions. The optimization problem is

Minimize:

$$\frac{1}{2}\|w\|^2 + C \sum_i \xi_i \quad (2)$$

Subject to:

$$y_i \left( w \cdot \varphi(x_i) + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0 \quad \text{for all } i$$

where $\xi_i$ is a slack variable introduced to relax the hard margin constraints, and the regularization constant $C > 0$ implements the trade-off between the maximal margin of separation and the classification error. To solve the optimization problem, we introduce the following Lagrange function:

$$L(w, b, \xi, \alpha, \beta) = \frac{1}{2}\|w\|^2 + C \sum_i \xi_i - \sum_i \alpha_i \left[ y_i \left( w \cdot \varphi(x_i) + b \right) - 1 + \xi_i \right] - \sum_i \beta_i \xi_i \quad (3)$$

where $\alpha_i \ge 0$ and $\beta_i \ge 0$ are Lagrange multipliers. Differentiating $L$ with respect to $w$, $b$ and $\xi_i$ and setting the results to zero, the optimization problem (2) translates into the following simpler dual problem.

Maximize:

$$W(\alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j K(x_i, x_j) \quad (4)$$

Subject to:

$$\sum_i y_i \alpha_i = 0, \quad 0 \le \alpha_i \le C \quad \text{for all } i$$

where $K(x_i, x_j) = \varphi(x_i) \cdot \varphi(x_j)$ is a kernel function that satisfies Mercer's theorem. Let $\alpha^*$ be the optimal solution of (4), with corresponding weight and bias $w^*$ and $b^*$. According to the Karush-Kuhn-Tucker (KKT) conditions, the solution of the optimization problem (4) must satisfy

$$\alpha_i^* \left[ y_i \left( w^* \cdot \varphi(x_i) + b^* \right) - 1 + \xi_i \right] = 0, \quad (C - \alpha_i^*)\, \xi_i = 0 \quad \text{for all } i \quad (5)$$

where the $\alpha_i^*$ are nonzero only for a subset of the vectors $x_i$, called support vectors. Finally, the optimal decision function is

$$f(x) = \mathrm{sign}\left( \sum_i \alpha_i^* y_i K(x_i, x) + b^* \right) \quad (6)$$

2. Fuzzy Support Vector Machines ([12]-[13])

Consider the aforementioned binary training set $S$. We choose a proper membership function and obtain $s_i$, the fuzzy membership value of the training point $x_i$. The training set $S$ then becomes the fuzzy training set $S' = \{(x_i, y_i, s_i)\}$, where $x_i \in R^n$, with corresponding binary class labels $y_i \in \{1, -1\}$ and $0 \le s_i \le 1$. The quadratic programming problem for classification can then be described as follows.

Minimize:

$$\frac{1}{2}\|w\|^2 + C \sum_i s_i \xi_i \quad (7)$$


Subject to:

$$y_i \left( w \cdot \varphi(x_i) + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0 \quad \text{for all } i$$

where $C > 0$ is the penalty parameter and $\xi_i$ is a slack variable. The fuzzy membership $s_i$ is the attitude of the corresponding point $x_i$ toward one class. A smaller $s_i$ reduces the effect of the parameter $\xi_i$ in the objective above, so the corresponding point $x_i$ can be treated as less important. Using the KKT conditions and Lagrange multipliers, we can form the following equivalent dual problem.

Maximize:

$$W(\alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j K(x_i, x_j)$$

Subject to:

$$\sum_i y_i \alpha_i = 0, \quad 0 \le \alpha_i \le s_i C \quad \text{for all } i$$

If $\alpha_i > 0$, the corresponding point $x_i$ is a support vector. Moreover, if $0 < \alpha_i < s_i C$, the support vector $x_i$ lies on the margin around the separating surface; if $\alpha_i = s_i C$, the support vector $x_i$ is an error sample. The decision function of the corresponding optimal separating surface then becomes

$$f(x) = \mathrm{sign}\left( \sum_i \alpha_i y_i K(x_i, x) + b \right)$$

where $K(x_i, x)$ is the kernel function.

VII. EXPERIMENT

We investigate the performance of two techniques: (1) FSVM classification on the original matrix, where the Vector Space Model is used, and (2) FSVM on a matrix whose dimensionality is reduced by Random Indexing before singular value decomposition. Performance is measured as calculation time as well as precision and recall. We use a subset of the Reuters-21578 text corpus. The subset comprises 3299 documents covering the 5 most frequent categories: earn, acquisition, money, grain, crude.

| No | Category | F-score (FSVM) | F-score (FSVM+LSI+RI) |
|---|---|---|---|
| 1 | Earn | 0.97 | 0.993 |
| 2 | Acquisition | 0.93 | 0.965 |
| 3 | Money | 0.95 | 0.972 |
| 4 | Grain | 0.79 | 0.933 |
| 5 | Crude | 0.92 | 0.961 |
| | Average | 0.912 | 0.9648 |

Table 1: The experiment results of FSVM and FSVM+LSI+RI classifiers.

VIII. CONCLUSION

This article introduces Fuzzy Support Vector Machines for text categorization based on reduced matrices, using Latent Semantic Indexing combined with Random Indexing. Our results show that FSVM_LSI_RI provides better results on Precision and Recall than FSVM. Due to time limits, experiments were run on only 5 categories. Future work includes using this scheme to classify students' ideas at the University of Information Technology, HoChiMinh City.

REFERENCES

[1]. April Kontostathis (2007), "Essential Dimensions of Latent Semantic Indexing", Department of Mathematics and Computer Science, Ursinus College, Proceedings of the 40th Hawaii International Conference on System Sciences, 2007.

[2]. Cherukuri Aswani Kumar, Suripeddi Srinivas (2006), "Latent Semantic Indexing Using Eigenvalue Analysis for Efficient Information Retrieval", Int. J. Appl. Math. Comp. Sci., 2006, Vol. 16, No. 4, pp. 551–558.

[3]. David A. Hull (1994), Information Retrieval Using Statistical Classification, Doctor of Philosophy Degree, The University of Stanford.

[4]. Gabriel Oksa, Martin Becka and Marian Vajtersic (2002), "Parallel SVD Computation in Updating Problems of Latent Semantic Indexing", Proceedings of the ALGORITMY 2002 Conference on Scientific Computing, pp. 113–120.

[5]. Katarina Blom (1999), Information Retrieval Using the Singular Value Decomposition and Krylov Subspace, Department of Mathematics, Chalmers University of Technology, S-412 Goteborg, Sweden.

[6]. Kevin Erich Heinrich (2007), Automated Gene Classification using Nonnegative Matrix Factorization on Biomedical Literature, Doctor of Philosophy Degree, The University of Tennessee, Knoxville.

[7]. Miles Efron (2003), Eigenvalue-Based Estimators for Optimal Dimensionality Reduction in Information Retrieval, ProQuest Information and Learning Company.

[8]. Michael W. Berry, Zlatko Drmac, Elizabeth R. Jessup (1999), "Matrices, Vector Spaces, and Information Retrieval", SIAM Review, Vol. 41, No. 2, pp. 335–352.

[9]. Nordianah Ab Samat, Masrah Azrifah Azmi Murad, Muhamad Taufik Abdullah, Rodziah Atan (2008), "Term Weighting Schemes Experiment Based on SVD for Malay Text Retrieval", Faculty of Computer Science and Information Technology, University Putra Malaysia, IJCSNS International Journal of Computer Science and Network Security, Vol. 8, No. 10, October 2008.

[10]. Jussi Karlgren and Magnus Sahlgren (2001), "From words to understanding", in Y. Uesaka, P. Kanerva, and H. Asoh, editors, Foundations of Real-World Intelligence, chapter 26, pp. 294–308, Stanford: CSLI Publications.

[11]. Magnus Rosell, Martin Hassel, Viggo Kann, "Global Evaluation of Random Indexing through Swedish Word Clustering Compared to the People's Dictionary of Synonyms" (Rosell et al., 2009).

[12]. Shigeo Abe and Takuya Inoue (2002), "Fuzzy Support Vector Machines for Multiclass Problems", ESANN'2002 proceedings, pp. 113–118.

[13]. Shigeo Abe and Takuya Inoue (2001), "Fuzzy Support Vector Machines for Pattern Classification", in Proceedings of the International Joint Conference on Neural Networks (IJCNN '01), volume 2, pp. 1449–1454.

[14]. T. Joachims (1998), "Text Categorization with Support Vector Machines: Learning with Many Relevant Features", in Proceedings of ECML-98, 10th European Conference on Machine Learning, number 1398, pp. 137–142.

AUTHORS PROFILE

The author was born in 1969 in Da Nang, VietNam. He graduated from the University of Odessa (USSR) in 1992, specializing in Information Technology. He completed his doctoral thesis in 1996 at the Academy of Science of Russia, specializing in IT. He is now the Dean of Software Engineering at the University of Information Technology, VietNam National University HoChiMinh City. Research interests: Knowledge Engineering, Information Systems, and Software Engineering.
