Section 5
Eng. Nesma Mahmoud
In this section
Discusses sheets 0 and 1
Discusses Task 1
Task 1
• Implement
1. Incidence matrix
2. AND, OR, NOT query on the incidence matrix
1 How to implement the Incidence Matrix?
1. Prepare the dictionary
2. Build the matrix
• Consider these documents: (guide example)
– Doc 1 breakthrough drug for schizophrenia
– Doc 2 new schizophrenia drug
– Doc 3 new approach for treatment of schizophrenia
– Doc 4 new hopes for schizophrenia patients
Each document refers to a file stored on your device.
1. Prepare the dictionary
1. Create an ArrayList “AllTokens” to store the dictionary terms
2. Read all documents (N)
– Loop over the documents; for each document:
• Tokenize its content and add the tokens to AllTokens
• Note: use the Java StringTokenizer class
3. Lowercase and sort all terms in “AllTokens”
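A minimal Java sketch of the dictionary steps above, using StringTokenizer as the slide suggests. The class and method names (DictionaryBuilder, readDocuments, buildDictionary) are hypothetical. Two assumptions go beyond the slide: buildDictionary takes the document contents as strings (readDocuments shows one way to load them from files on your device, with paths you supply; Files.readString needs Java 11+), and duplicate terms are skipped so that each term later gets exactly one row in the matrix.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.StringTokenizer;

public class DictionaryBuilder {

    // Reads each document file into one String per document (paths are up to you).
    static List<String> readDocuments(List<Path> files) throws IOException {
        List<String> contents = new ArrayList<>();
        for (Path file : files) {
            contents.add(Files.readString(file));
        }
        return contents;
    }

    // Tokenize every document, lowercase each token, then sort AllTokens.
    static ArrayList<String> buildDictionary(List<String> documents) {
        ArrayList<String> allTokens = new ArrayList<>();
        for (String content : documents) {
            // Default delimiters: space, tab, newline, carriage return, form feed.
            StringTokenizer tokenizer = new StringTokenizer(content);
            while (tokenizer.hasMoreTokens()) {
                String term = tokenizer.nextToken().toLowerCase();
                // Assumption: keep each term once so it maps to a single matrix row.
                if (!allTokens.contains(term)) {
                    allTokens.add(term);
                }
            }
        }
        Collections.sort(allTokens);
        return allTokens;
    }

    public static void main(String[] args) {
        // The four guide-example documents, given inline instead of read from files.
        List<String> docs = List.of(
                "breakthrough drug for schizophrenia",
                "new schizophrenia drug",
                "new approach for treatment of schizophrenia",
                "new hopes for schizophrenia patients");
        System.out.println(buildDictionary(docs));
    }
}
```

For the guide documents, main prints [approach, breakthrough, drug, for, hopes, new, of, patients, schizophrenia, treatment]. Because StringTokenizer splits on whitespace only, punctuation such as a trailing period would stay attached to a token unless you pass extra delimiters to its constructor.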
2. Build the matrix
1. Create a matrix IncidMatrix[Z][N]
– Where Z is the number of terms in AllTokens (the length of AllTokens) and N is the number of documents
2. Loop over each document
– Loop over each term in AllTokens
– If the term exists in the current document, set its cell to 1
– Else set it to 0
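A sketch of building IncidMatrix[Z][N] from a sorted term list and the document contents. As a choice of this sketch (not of the slides), each document's lowercased tokens are first collected into a HashSet, so checking whether a term exists in the current document is a quick lookup instead of a rescan. IncidenceMatrixBuilder and tokenSet are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.StringTokenizer;

public class IncidenceMatrixBuilder {

    // Lowercased set of the tokens that occur in one document.
    static Set<String> tokenSet(String content) {
        Set<String> tokens = new HashSet<>();
        StringTokenizer tokenizer = new StringTokenizer(content);
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken().toLowerCase());
        }
        return tokens;
    }

    // Row t = term allTokens.get(t), column d = document d; cell is 1 if the term occurs.
    static int[][] buildMatrix(List<String> allTokens, List<String> documents) {
        int z = allTokens.size();
        int n = documents.size();
        int[][] incidMatrix = new int[z][n];
        for (int d = 0; d < n; d++) {                  // loop over each document
            Set<String> docTerms = tokenSet(documents.get(d));
            for (int t = 0; t < z; t++) {              // loop over each term
                incidMatrix[t][d] = docTerms.contains(allTokens.get(t)) ? 1 : 0;
            }
        }
        return incidMatrix;
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
                "breakthrough drug for schizophrenia",
                "new schizophrenia drug",
                "new approach for treatment of schizophrenia",
                "new hopes for schizophrenia patients");
        // Sorted dictionary of the guide example.
        List<String> allTokens = List.of("approach", "breakthrough", "drug", "for", "hopes",
                "new", "of", "patients", "schizophrenia", "treatment");
        int[][] incidMatrix = buildMatrix(allTokens, docs);
        for (int t = 0; t < allTokens.size(); t++) {
            System.out.println(allTokens.get(t) + " " + Arrays.toString(incidMatrix[t]));
        }
    }
}
```

With the guide documents, the row for drug is [1, 1, 0, 0] and the row for new is [0, 1, 1, 1]; each row is the incidence vector of that term across Doc 1 to Doc 4.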
2 AND, OR, NOT query
• Guide example: answer the query “new AND drug”
3. Compare the two incidence vectors by looping over them
4. Display the documents that match the query
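A sketch of the query steps, assuming each query term's incidence vector is simply its row in IncidMatrix. vectorFor is a hypothetical lookup helper that returns an all-zero vector for a term missing from the dictionary. AND, OR and NOT are computed position by position while looping over the two vectors, and matching positions are reported as 1-based document numbers. For brevity, main uses a trimmed matrix holding only the drug and new rows of the guide example.

```java
import java.util.ArrayList;
import java.util.List;

public class BooleanQuery {

    // Hypothetical helper: the incidence vector (matrix row) of a term, or all zeros if absent.
    static int[] vectorFor(String term, List<String> allTokens, int[][] incidMatrix, int numDocs) {
        int row = allTokens.indexOf(term.toLowerCase());
        return row >= 0 ? incidMatrix[row].clone() : new int[numDocs];
    }

    // AND: 1 only where both vectors have 1.
    static int[] and(int[] a, int[] b) {
        int[] result = new int[a.length];
        for (int d = 0; d < a.length; d++) {
            result[d] = (a[d] == 1 && b[d] == 1) ? 1 : 0;
        }
        return result;
    }

    // OR: 1 where at least one vector has 1.
    static int[] or(int[] a, int[] b) {
        int[] result = new int[a.length];
        for (int d = 0; d < a.length; d++) {
            result[d] = (a[d] == 1 || b[d] == 1) ? 1 : 0;
        }
        return result;
    }

    // NOT: flip every entry (1 becomes 0, 0 becomes 1).
    static int[] not(int[] a) {
        int[] result = new int[a.length];
        for (int d = 0; d < a.length; d++) {
            result[d] = 1 - a[d];
        }
        return result;
    }

    // Step 4: turn a result vector into 1-based document numbers.
    static List<Integer> matchingDocs(int[] result) {
        List<Integer> docs = new ArrayList<>();
        for (int d = 0; d < result.length; d++) {
            if (result[d] == 1) {
                docs.add(d + 1);
            }
        }
        return docs;
    }

    public static void main(String[] args) {
        // Trimmed guide-example matrix: only the rows for "drug" and "new" (Doc 1..Doc 4).
        List<String> allTokens = List.of("drug", "new");
        int[][] incidMatrix = {
                {1, 1, 0, 0},   // drug
                {0, 1, 1, 1}    // new
        };
        int[] newVec = vectorFor("new", allTokens, incidMatrix, 4);
        int[] drugVec = vectorFor("drug", allTokens, incidMatrix, 4);
        // Query: new AND drug
        for (int doc : matchingDocs(and(newVec, drugVec))) {
            System.out.println("Doc " + doc);
        }
    }
}
```

Running main prints Doc 2, the only document (“new schizophrenia drug”) that contains both terms. The helpers compose for other queries: for example, and(newVec, not(drugVec)) answers “new AND NOT drug” and yields Doc 3 and Doc 4.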