Quries in ORACLE 9i Text

Oracle Text Queries Nina Bergli
Draft 30.10.03 / 1
Queries in Oracle 9i Text

Draft
Nina Bergli
Oct. 30, 2003
The context index in Oracle Text supports a wide range of query operators with the
CONTAINS operator that is defined through the index. CONTAINS must be used when the
index is of type CONTEXT. Contains queries can be used in combinations with all of the
operators discussed below.
ABOUT
An ABOUT query is a query on a document theme, This implies that text documents can be
queried on their theme and not just the words it contain. The ABOUT operator is applied in
order to make Oracle retrieve documents that contain concepts that are related to a specific
query word or query phrase. Concepts in the knowledge-base are both concrete, for example,
issuing a query for whales might return documents about orcas or other specific instances of
the species, and abstract, for example documents about lions and happiness. The document
need not contain the term whale to be retrieved. It follows that the term or phrase specified in
the query need not exactly match the themes stored in the index. Oracle normalizes the word
or phrase before performing lookup in the index. The normalization process of a query is
based on the knowledge base supplied by Oracle, and which is case-sensitive. Thus, ABOUT
queries yield the best results if queries are formulated in proper case. This must not be
interpreted as a definite requirement since Oracle will look for other case combinations if no
results are found for the case as in the query. For example, a query for Blue Whale might
return documents about blue whales
Themes are extracted from documents during indexing and enabled by a hierarchical list of
categories and concepts that represents a view of the world. This hierarchical list is called a
knowledge-base or more commonly a thesaurus. Oracle supplies a knowledge-base/thesaurus
for English and French. This knowledge base is enabled through the theme component of a
CONTEXT index. The system is then able to look up concept information in the index.
The knowledge-base supplied is basically a hierarchy of, terms within different categories for
example science and business. Since the ABOUT operator uses the supplied knowledge base
in English and French to interpret the phrase that is entered, the ABOUT query is limited to
knowing and interpreting the concepts in the knowledge base. By adding a domain specific
knowledge base, the results of an ABOUT query can be improved.
The about query increases the number of relevant documents retrieved with respect to a
specific query. Oracle scores results for an ABOUT query with the most relevant documents
retrieving the highest score.
An ABOUT query returns a score, which is a relevance measure. Thus, high scores (scores
that are close to or equal 1) represent documents that are likely to be highly relevant, while
Draft 30.10.03 / 2
low scores represent documents not that likely to be relevant. A threshold for score values to
be returned from queries can be defined in the index.1
ACCUMULATE (,)
The ACCUMULATE operator is used to search for documents that contain at least one
occurrence of any of the query terms. The ACCUMULATE operator ranks documents
according to the total term weight of a document. Term weight refers to the weight that is
assigned to a query term. A query where the query terms are not explicitly assigned weights
will automatically be assigned the weight 1 for each term.
Accumulate scoring guarantees that if a document A matches p terms with a total term weight
of m, and document B matches q terms with a total term weight of m+1, document B is
guaranteed to have a higher relevance score than document A, regardless of the numbers p
and q.
If two documents have the same weight M, the higher relevance score goes to the document
with the higher weighted average term score.
Other operators
- And (&): The score returned is the score of the lowest query term.
- Equivalence (=): Specifies an acceptable substitution for a word in a query. The
following example returns all documents that contain either the phrase “bluewhales
are social animals” or “minkewhales are social animals”: ‘bluewhales=minkewhales
are social animals’.
- Minus (-): The operator is used to search for documents that contain one query term
and the presence of another term should cause the document to be ranked lower.
- Near (;): The near operator returns a score based on the proximity of two or more
query terms. Oracle returns higher scores for terms closer together and lower scores
for terms further apart in a document. The scoring for the NEAR operator combines
frequency of the terms with proximity of the terms. A score between 1 and 100 is
generated for each document that satisfies the query condition. Near can not be used in
ABOUT queries. The operator takes the following arguments:
o Word 1-n: The terms in the query can be either single words or phrases and
must be separated by commas.
o Max_span: Can be used to specify the maximum distance allowed between
two query terms. Generally, it is used to specify the size of the biggest clump.
Clump is defined as the smallest group of words in which all query terms
occur. All clumps begin and end with a query term.
o Order: Can be specified to true if the documents retrieved should contain the
query terms in the same order as in the query formulation. Max_span must be
specified when using the order argument.
- Not ( ): The Not operator is used to search for documents containing one query term
but not another.
1
The Threshold operator is discussed further in the section”Other operators”.
Draft 30.10.03 / 3
- Or (|): The Or operator is used to search for documents that contains at least one
occurrence of one of the query terms.
- Soundex (!): This operator is used to expand queries with words that have similar
sounds, that is, words that sound like each other. This function is only available in
English.
- Stem ($): The stem operator is used to search for terms that have the same linguistic
root as the query term.
- Fuzzy: The operator is used to expand queries to include words that are spelled
similarly to the specified query term. This type of expansion is helpful for finding
more accurate results when there are frequent misspellings in the document set. The
syntax enables ranking of the result set so that documents that contain words with high
similarity to the query word are scored higher than documents with lower similarity.
The number of words generated by a fuzzy expansion depends on what is in the index.
Thus, results can vary significantly.
The parameters for fuzzy operator are
o Query term
o Score: specifies a similarity score between 1 and 80. Default is 60.
o Numresults: Specifies the maximum number of terms to use. Valid range is
from 1 – 5000. Default is 100.
o Weight: Specifies whether or not the results are to be weighted according to
their similarity scores.
- Broader term operators (BTO): The BTO’s are used to expand a query to include
terms that has been defined in a thesaurus as the broader or higher-level term for a
specific query term. Expansion of a broader term is also possible until the top of the
thesaurus hierarchy is reached. A level argument can be optionally specified for a
BTO query. The level then specifies how many levels of the thesaurus hierarchy that is
to be traversed. The default level value is one. If the name of the thesaurus to be used
is specified, the levels to be traversed must also be specified. The default value for the
thesaurus argument is DEFAULT. A thesaurus called DEFAULT must therefore exist
in the thesaurus tables.
o BT: Expands a query to a broader term as defined in the thesaurus.
o BTG: Expands a query to include all broader generic terms as defined in the
thesaurus.
o BTP: Expands a query to include all terms defined as broader partitive terms
for the specific term in the thesaurus.
o BTI: Expands a query to include all terms defined as broader instance terms for
the specific term in the thesaurus.
- Narrower term operators (NTO): The use of NTOs are equal to the use of BTOs but
it works opposite. The NTOs can as the BTOs be further divided into:
o NT
o NTG
o NTP
o NTI
All these operators equals the operators under BTO so they will not be commented
further.
- Preferred term (PT): Used to replace a query term with the preferred term as defined
in the thesaurus for the query term.
Draft 30.10.03 / 4
- Related term (RT): Expands a query to include all related terms that have been
defined in the thesaurus.
- Synonym (SYN): Expands a query to include all terms that have been defined as
synonyms for the query term in the thesaurus.
- Translation term (TR): Expands a query to include all defined foreign language
terms. The following are arguments to the operator:
o Term
o Language
o Thesaurus
- Translation Term Synonym (TRSYN): Combination of TR and SYN.
- Top Term (TT): Replaces a term in the query with the top term that has been defined
for the specific term in the standard hierarchy (BT, NT) in the thesaurus. Top terms in
the generic (BTG, NTG), partitive (BTP, NTP), and instancr (BTI, NTI) hierarchies
are not returned.
- Threshold (>): The operator can be used at two levels, the expression and the query
term level. At the expression level documents that score below the defined threshold
are eliminated. In the case of the query term level, the threshold operator selects a
document based on how a term scores in the document.
- Weight (*): This operator is used to mark the more important terms in a query given
that the query has at least two terms. Terms that are considered to be negative for the
overall result can be given weights below 1 so that documents containing this term
obtain a reduced score. Weight multiplies the score by the supplied factor and the
maximum sum that can be achieved is 100, thus implying that documents that score
above 100 will be set to a score of 100. The weight operator is useful in accumulate,
OR and AND queries when the expression has more than one query term. When there
is no weighting on individual terms, the score will not tell which of the terms that
appears more often in the document.
- Wildcards (%, _): Can be used to expand a search to a pattern search. The wildcard
characters are % and _. % specifies that any characters can appear in multiple
positions represented by the wildcard. _ specifies a single position in which any
character can occur. Wildcard query performance is improved by adding a substring
and a prefix index.
Draft 30.10.03 / 5
Example queries in Oracle Text
The text table text_tab defined in Appendix A is used in order to illustrate the query operators
in Oracle Text. The definition is as follows:
create type keyword_nt as object(

keyword varchar2(20));
create or replace type keyword_tab as table of keyword_nt;
create or replace type text_type as object

(
identifier number,
dummy char(1), --dummy column for indexing
title varchar2(200),
topic varchar2(200),
format varchar2(10),
published_date date,
media_type varchar2(200),
description varchar2(4000),
source varchar2(200),
keyword_list keyword_tab,
language varchar2(100),
text clob,
static procedure addKeyword(k varchar2, i in number));
create table text_tab of text_type

(
primary key (identifier)
)
nested table keyword_list store as text_ntab(
(primary key (nested_table_id, keyword))
organization index compress)
return as locator;
Draft 30.10.03 / 6
Operator Example query Result

About ‘about(Whales)’ -- Documents about whales ok
‘about(whales and reproduction)’ – Documents about whales
and reproduction
In terms of scoring, documents that contain both themes will
score higher than documents that have only one of the themes in
the query.
The ABOUT operator can be combined with other operators for

example AND or NOT.
about (minke whales) AND reproduction
about(blue whales) not about (reproduction)
Select score (1),identifier,title

From text_tab
Where CONTAINS(text,’about(whales)’,1)>0
Order by score(1) desc;
select identifier,title,score(1)
from text_tab
where contains ( text, 'about(whales)',1 ) > 0;
Accumulate SELECT SCORE(1), title,source SCORE(1)
FROM text_tab TITLE
WHERE CONTAINS(text, ' ---------------
bluewhale,orcas,beluga,killer-whale', 1) > 0 -------
ORDER BY SCORE(1) DESC; 28 Minke
whales
1
1Communic
ation -Music
to swim by
And SELECT SCORE(1), title,description no rows
FROM text_tab selected
WHERE CONTAINS(text, ' bluewhale AND killer-whale',
1) >0
Equivalence SELECT SCORE(1), title from text_tab no rows

WHERE CONTAINS(text, ' bluewhales=minkewhales selected
are social animals ', 1) > 0
ORDER BY SCORE(1) DESC;
Minus SELECT SCORE(1), title,description ok

FROM text_tab
WHERE CONTAINS(text, ' bluewhale - killer-whale', 1)
>0
Near SELECT SCORE(1), title, published_date from text_tab ok

WHERE CONTAINS(text, 'whales;reproduction;death',
1) > 0
Draft 30.10.03 / 7
AND published_date >= ('01-01-2000')

Not SELECT SCORE(1), title,description,source from text_tab ok

WHERE CONTAINS(text, ' whales~minkewhales', 1) >
0
Or SELECT SCORE(1), title,description,source from text_tab ok

WHERE CONTAINS(text, ' bluewhales OR
minkewhales', 1>0
Soundex select title from text_tab where contains ( text, '!whale',1 ) > 0 ok
Stem select identifier,title,score(1) from text_tab where contains ( ok
text, '$moving',1 ) > 0;
Fuzzy SELECT identifier, title ok
FROM text_tab
WHERE CONTAINS(TEXT, ‘fuzzy(whale, 70, 6, weight)’ ,
1)>0;
This query expands to the first 6 variations of whale in the index
that have a similarity score over 70. Documents in the result set
are also weighted so that documents that have the more similar
words compared to whale are ranked highest.
Using the appropriate number of commas can skip unnecessary

parameters. For example:
SELECT identifier, title
FROM text_tab
WHERE CONTAINS(TEXT, ‘fuzzy(whale, , , weight)’ ,
1)>0;
In this case, documents that score above 60 are retrieved, since
60 is the default value for score. The parameter numresults will
have the value 100.
select * from fuzzy_tbl where contains (text,'?hales')>0;
Treshold select identifier,title,score(1) ok

from text_tab
where contains ( text, '(pilot-whale > 4) and communication',1 )
>0
order by score(1) desc;
Weight select identifier,title,score(1) ok
from text_tab
where contains ( text, 'pilot-whale or communication*3',1 ) > 0
Wildcards select identifier,title,score(1) ok
from text_tab
where contains ( text, 'pilot% or communication*3',1 ) > 0
Draft 30.10.03 / 8
Comments about thesaurus queries in Oracle

Oracle supplies a test thesaurus in English. The following path locates the thesaurus:
C:\oracle\ora92\ctx\sample\thes. The thesaurus is by default called ‘drOthsus’. If the above
queries were to be defined without specifying the name of the thesaurus, Oracle would not
find it unless it is called ‘default’. Before using the thesaurus in queries, it has to be loaded.
The CTXLOAD procedure is described in McGregor (2002).
The examples are syntactically correct, but some of them do not provide semantic meaning.
This is because they are only intended as examples at the present point of time.
BT ‘BT(Bluewhale)’
‘BT(Bluewhale, 2, drOthsus)’
When homographs are under
consideration a qualifier can
be specified in order to avoid
misconception.
BT(crane{(waterfowl)}) --
crane or bird
select text from text_tab

where contains ( text,
'BT ( orcas, 1, dr0thsus )' ) >
0;
BTG select text from text_tab

'BTG ( Whale, 1, dr0thsus)' )
> 0;
BTP select text from text_tab

'BTP ( Pilot-Whale, 1,
dr0thsus )' ) > 0;
BTI select text from text_tab

'BTI ( Orcas, 1, dr0thsus )' )
> 0;
NT select text from text_tab

'NT ( Whale, 1, dr0thsus )' )
> 0;
NTG select text from text_tab

Draft 30.10.03 / 9
'NTG ( Cats, 1, dr0thsus)' ) >

0;
NTP select text from text_tab

'NTP ( Animal, 1, dr0thsus )'
) > 0;
NTI select text from text_tab

'NTI ( Animal, 1, dr0thsus )'
) > 0;
PT select text from text_tab

'PT ( Africa, dr0thsus )' ) >
0;
RT select text from text_tab

'Wolf ( Term, dr0thsus )' ) >
0;
SYN select text from text_tab

'SYN ( African lion,
dr0thsus )' ) > 0;
TR select text from text_tab

'TR ( Whale,baleine ,
dr0thsus )' ) > 0;
-- Term
--
Baleine_French_For_Whale
TRSYN select text from text_tab

'TRSYN ( Killer-Whale, all,
dr0thsus )' ) > 0;
Draft 30.10.03 / 10
Combining structural predicates with the CONTAINS operator:
SELECT SCORE(1), title, published_date from text_tab

WHERE CONTAINS(text, 'lions', 1) > 0
AND published_date >= ('01-01-2000')
REFERENCES
McGregor (2002). Oracle Text - Reference, Oracle. 2002. available at:
http://www.csis.gvsu.edu/GeneralInfo/Oracle/text.920/

Quries in ORACLE 9i Text

Uploaded by

Document Information

Original Description:

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Quries in ORACLE 9i Text

Uploaded by

Copyright:

Available Formats

Oracle Text Queries Nina Bergli

Queries in Oracle 9i Text

Example queries in Oracle Text

create type keyword_nt as object(

create or replace type keyword_tab as table of keyword_nt;

create or replace type text_type as object

static procedure addKeyword(k varchar2, i in number));

create table text_tab of text_type

Operator Example query Result

The ABOUT operator can be combined with other operators for

Select score (1),identifier,title

Equivalence SELECT SCORE(1), title from text_tab no rows

Minus SELECT SCORE(1), title,description ok

Near SELECT SCORE(1), title, published_date from text_tab ok

AND published_date >= ('01-01-2000')

Not SELECT SCORE(1), title,description,source from text_tab ok

Or SELECT SCORE(1), title,description,source from text_tab ok

Using the appropriate number of commas can skip unnecessary

select * from fuzzy_tbl where contains (text,'?hales')>0;

Treshold select identifier,title,score(1) ok

Comments about thesaurus queries in Oracle

select text from text_tab

BTG select text from text_tab

BTP select text from text_tab

BTI select text from text_tab

NT select text from text_tab

NTG select text from text_tab

'NTG ( Cats, 1, dr0thsus)' ) >

NTP select text from text_tab

NTI select text from text_tab

PT select text from text_tab

RT select text from text_tab

SYN select text from text_tab

TR select text from text_tab

TRSYN select text from text_tab

Combining structural predicates with the CONTAINS operator:

SELECT SCORE(1), title, published_date from text_tab

You might also like