
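One purpose of the caret (^) is negation: placed first inside square brackets, it negates the character class.

For example:
[^A-Z] means not an uppercase letter.
The second purpose, outside the brackets, is to anchor the match to the start of a line, which lets us identify words that don't start with a given letter:
^[^A-Z] matches all those words (each tested on its own line) which don't start with a capital letter.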
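
The two uses of the caret can be checked with a minimal sketch in Python's standard re module. The sample strings and variable names are illustrative assumptions, not part of the original notes.

```python
import re

# Inside brackets, ^ negates the class: [^A-Z] matches any single
# character that is not an uppercase letter.
non_upper = re.findall(r"[^A-Z]", "AbC1")  # ['b', '1']

# Outside brackets, ^ anchors the match to the start of the string, so
# ^[^A-Z] only succeeds when the first character is not uppercase.
words = ["Apple", "banana", "Cherry", "date", "9lives"]  # sample data (assumed)
not_capitalized = [w for w in words if re.search(r"^[^A-Z]", w)]

print(non_upper)
print(not_capitalized)  # ['banana', 'date', '9lives']
```

Note that [^A-Z] also accepts digits and punctuation, so "9lives" counts as not starting with a capital letter.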

* means zero or more of the previous character
o*h will select h, oh, ooh

+ means one or more of the previous character
o+h will select oh, ooh, oooh but not h alone
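
A short sketch of the difference between * and +, using re.fullmatch so that each candidate string must match as a whole. The candidate list is an assumed example.

```python
import re

candidates = ["h", "oh", "ooh", "oooh", "ah"]  # sample strings (assumed)

# o*h: zero or more 'o' followed by 'h', so a bare "h" is accepted.
star_matches = [s for s in candidates if re.fullmatch(r"o*h", s)]

# o+h: at least one 'o' is required, so a bare "h" is rejected.
plus_matches = [s for s in candidates if re.fullmatch(r"o+h", s)]

print(star_matches)  # ['h', 'oh', 'ooh', 'oooh']
print(plus_matches)  # ['oh', 'ooh', 'oooh']
```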

The process we just went through was based on fixing two kinds of errors:
Matching strings that we should not have matched (there, then, other): false positives (Type I)
Not matching things that we should have matched (The): false negatives (Type II)
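
The two error types can be reproduced with a small sketch. The notes do not show the patterns that were actually used, so the naive pattern, the refined pattern \b[tT]he\b, and the sample sentence below are all assumptions chosen to illustrate the idea.

```python
import re

text = "The other one went there, then the dog left."  # sample text (assumed)

# Naive pattern: matches "the" inside other, there and then (false
# positives) and misses the capitalized "The" (false negative).
naive = re.findall(r"the", text)

# One possible refinement: allow either case for the first letter and
# require word boundaries on both sides.
refined = re.findall(r"\b[tT]he\b", text)

print(len(naive))  # 4
print(refined)     # ['The', 'the']
```

Accepting both cases removes the false negative, and the word boundaries remove the false positives. In general, reducing one kind of error tends to risk increasing the other, so both have to be checked after each change.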

Ambiguity arises when the meaning of a sentence, word or phrase is uncertain, that is,
when there could be more than one meaning of the same sentence. The best way to avoid
ambiguity is to write clear, simple sentences.

For example:
I have never tasted a cake quite like that one before!
Ambiguity: Was the cake good or bad?
To avoid the ambiguity, the sentence should read:
I have never eaten such good cake.

Lemmatization: finding the correct dictionary headword form, that is, converting words
that share the same lemma into a single dictionary word.
For example:
car, cars, car's, cars' all become car
am, are, is all become be
the boy's cars are different colours becomes the boy car be different color
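
The examples above can be reproduced with a toy lookup table. This is only a sketch: the LEMMAS dictionary and the lemmatize function are hypothetical names, the table covers just the words in these notes, and a real lemmatizer relies on a full dictionary and part-of-speech information.

```python
# Hand-written lemma table covering only the examples in the notes (assumed).
LEMMAS = {
    "cars": "car", "car's": "car", "cars'": "car",
    "am": "be", "are": "be", "is": "be",
    "boy's": "boy", "colours": "color",
}

def lemmatize(sentence: str) -> str:
    """Replace each whitespace-separated token with its lemma, if known."""
    tokens = sentence.lower().split()
    return " ".join(LEMMAS.get(token, token) for token in tokens)

print(lemmatize("the boy's cars are different colours"))
# the boy car be different color
```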

Stemming is the crude chopping of affixes.

It reduces terms to their stems and is used in information retrieval.
It is language dependent.
e.g. automate(s), automatic, automation are all reduced to automat.
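
To make "crude chopping" concrete, here is a toy suffix stripper. It is not the Porter stemmer or any other published algorithm: the SUFFIXES list and the crude_stem function are assumptions, chosen only so that the example words above come out as automat.

```python
# English-only suffix list, tried in order (assumed for illustration).
# Stemming is language dependent: another language needs another list.
SUFFIXES = ("ion", "ic", "es", "e", "s")

def crude_stem(word: str) -> str:
    """Chop the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

for w in ["automate", "automates", "automatic", "automation"]:
    print(w, "->", crude_stem(w))  # every word maps to "automat"
```

The result, automat, is not a dictionary word. That is the main difference from lemmatization, which always returns a valid headword.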
