Professional Documents
Culture Documents
n
n
(S)
where, w
i
is the weight representing an estimate of the importance of i
th
component
of document structure and n
i
is the number of occurrences of the keyphrase k in this
component, with constraints as w
e |u,1] and w
= 1.
5.2 Relevance Evaluation
A keyphrase graph is constituted by keyphrases and relations, so the direction to
measure semantic similarity between graphs is to calculate the similarity between
keyphrases and the similarity between relations used in the graphs.
Let o: K K - |u,1] and [: R
KK
R
KK
- |u,1] be two mappings to measure
semantic similarity between two keyphrases and two relations defined in the
484 V. Do, T.T. Huynh, and T. PhamNguyen
CK_ONTO ontology. 1 represents the equivalence between two objects and 0 corres-
ponds to the lack of any semantic link between them. The values of are selected
manually based on the opinions of experts of the field. Determining manually the
values of is possible because of the small number of relations.
Definition: Let k, k K, a binary relation P on K defined as : P (k,k) iff k = k or
+S = (s
1
, s
2
, , s
n
) a sequence of integers [1, t] ( t = |R
KK
|) such that
k r
s
1
k
1
, k
1
r
s
2
k
2
, , k
n-1
r
s
n
k' with r
i
is a relation of R
KK
(for all x and y in K, x
has a relation r with y if and only if (x, y) r, written as x r y ).
The mapping may be defined by using the sequence used in the relation P as
follows:
o(k, k
i
) = u i notP(k, k
i
)
o(k, k
i
) = Hox|I(k r
s
1
k
1
, k
1
r
s
2
k
2
, , k
n-1
r
s
n
k')| i +S = (s
1
, s
2
, , s
n
) a se-
quence of integers [1, t] (t = |R
KK
|) such that k r
s
1
k
1
, k
1
r
s
2
k
2
, , k
n-1
r
s
n
k
i
. (4)
Mapping V allows considering the various semantic relations used in the sequence, is
defined as follows:
I(k r
s
1
k
1
, k
1
r
s
2
k
2
, , k
n-1
r
s
n
k
i
) = xiJ(k) _:ol
s
i
n
1
(k
-1
, k
)xiJ(k
)
(k
n
k
i
)(S)
, in which, xidf(k) is a function that has the same meaning as idf (k), but returns a
value in the range [0,1], xiJ(k) = iJ(k)log (||) and val
s
i
( k
-1
, k
) is the
weight assigned to relations r
s
i
over pair of keyphrases (k
i-1
, k
i
). This weight is a
measure of semantic similarity between the keyphrases k
i-1
and k
i
that are directly
linked by the relationr
s
i
. The value of val
s
i
(k
-1
, k
(r)) = oJ]
((r)), adj
i
(r) denotes the i
th
vertex adjacent to relation
vertex r.
r e RE, [(r, (r)) = u
k e KE, o(k, g(k)) = u
Below described formula allows valuation of one projection. The purpose of a
searching method is the semantic resemblance calculus between a query and
an document, therefore in valuation formula of the projection from H to G, H is a
query graph and G is a document graph.
Definition: A valuation pattern of a projection = (, g) from a keyphrase graph
H to a keyphrase graph G is defined as follows:
v() =
t(g(k), 0)
keKH
- o(k, g(k)) - ip(g(k), 0) + [(r, (r))
eRH
(|KE| + |RE|)
(6)
Figure 3 shows the indexation of the Document #30 and the projection from the
Query 1 with relevance ratio 46%
Fig. 3. Matching keyphrase graph
There is a partial projection from a keyphrase graph H to a keyphrase graph G iff
there exists a projection from H, a sub keyphrase graph (subKG) of H, to G. A valua-
tion pattern of partial projection v(
putuI
) only depends on vertices of H and is
defined like projection v().
Based on valuation of projections, semantic similarity between two keyphrase
graphs calculus is defined as follows:
486 V. Do, T.T. Huynh, and T. PhamNguyen
Rcl(E, 0) = Hox{v()| is o portiol pro]cction rom E to 0] (7)
Determining if a document is relevant for a user query and estimate this relevance is
done by calculating the semantic similarity between the keyphrase graphs that
represent them.
Conclusion
A solution for the organization of a semantic document repository that supports semantic
representation and processing in search is described. The proposed solution is a model
that shows the integration of components such as an ontology of the relevant domain, a
database, semantic representation for documents and a file system; with semantic
processing and searching techniques. The solution is applied to build a document retriev-
al system with semantic search function. The application has been implemented, tested at
the University of Information Technology Ho Chi Minh City, Vietnam and search results
have been highly appreciated by users. Recall and precision are used to evaluate the ef-
fectiveness of the document retrieval systems. Document repository for testing consists
of 10,000 documents as papers, books, slides, essays, etc., evenly distributed into five
branches of Information Technology such as Computer Science, Information Systems,
Networking and Communication, Computer Engineering, Software Engineering. Ac-
cording to test results on 200 selected queries, the average precision of the system is
87.16% and the average recall is 88, 32%. The research results will be the basis and tools
for building many resource management systems in various different fields.
References
1. Aly, A.A.: Using a query expansion technique to improve document retrieval. International
Journal Information Technologies and Knowledge (2008)
2. Bonino, D., Corno, F., Farinetti, L., Bosca, A.: Ontology Driven Semantic Search. WSEAS
Transaction on Information Science and Application 1(6), 15971605 (2004)
3. Genest, D., Chein, M.: An Experiment in Document Retrieval Using Conceptual Graph. In:
Lukose, D., Delugach, H., Keeler, M., Searle, L., Sowa, J. (eds.) ICCS 1997. LNCS,
vol. 1257, pp. 489504. Springer, Heidelberg (1997)
4. Styltsvig, H.B.: Ontology-based Information Retrieval. A dissertation Presented to the Fa-
culties of Roskilde University in Partial Fulfillment of the Requirement for the Degree of
Doctor of Philosophy (2006)
5. Eriksso, H.: The semantic-document approach to combining documents and ontologies. In-
ternational Journal of Human-Computer Studies 65(7), 624639 (2007)
6. Stokoe, C., Oakes, M.P., Tait, J.: Word sense disambiguation in information retrieval revi-
sited. In: Annual ACM Conference on Research and Development in Information Retrieval
Toronto, Canada (2003)
7. Tran, T., Cimiano, P., Rudolph, S., Studer, R.: Ontology-Based Interpretation of Keywords
for Semantic Search. In: Aberer, K., Choi, K.-S., Noy, N., Allemang, D., Lee, K.-I., Nixon,
L.J.B., Golbeck, J., Mika, P., Maynard, D., Mizoguchi, R., Schreiber, G., Cudr-Mauroux,
P. (eds.) ISWC/ASWC 2007. LNCS, vol. 4825, pp. 523536. Springer, Heidelberg (2007)
8. Vallez, M., Pedraza-Jimenez, R.: Natural Language Processing in Textual Information Re-
trieval and Related Topics. I.S.S.o.t.P.F. University (2007)