Introduction to Telecom Technologies (Telecom

Getachew Mamo
Department of Information Technology College of Engineering and Technology Jimma University
E. Mail,


Chapter 1: Understanding the Telecommunications Revolution


Introduction  Changes in Telecommunications




What is Telecommunications?

• The word telecommunications has two roots:
  - tele means "over a distance," and
  - communicare means "the ability to share."

• Hence,
  - telecommunications literally means "the sharing of information over a distance."


Text Retrieval (TR) vs. Database Retrieval

• Information
  - Unstructured/free text vs. structured data
  - Ambiguous vs. well-defined semantics

• Query
  - Ambiguous vs. well-defined semantics
  - Incomplete vs. complete specification

• Answers
  - Relevant documents vs. matched records

• TR is an empirically defined problem!

TR is Hard!

• Under/over-specified query
  - Ambiguous: "buying CDs" (money or music?)
  - Incomplete: what kind of CDs?
  - What if "CD" is never mentioned in the document?

• Vague semantics of documents
  - Ambiguity: e.g., word-sense, structural
  - Incomplete: inferences required

• Even hard for people!
  - 80% agreement in human judgments

TR is "Easy"!

• TR CAN be easy in a particular case
  - Ambiguity in query/document is RELATIVE to the database
  - So, if the query is SPECIFIC enough, just one keyword may get all the relevant documents

• PERCEIVED TR performance is usually better than the actual performance
  - Users can NOT judge the completeness of an answer


History of TR on One Slide

• Birth of TR
  - 1945: V. Bush's article "As We May Think"
  - 1957: H. P. Luhn's idea of word counting and matching

• Indexing & Evaluation Methodology (1960's)
  - Smart system (G. Salton's group)
  - Cranfield test collection (C. Cleverdon's group)
  - Indexing: automatic can be as good as manual (controlled vocabulary)

• TR Models (1970's & 1980's) ...

• Large-scale Evaluation & Applications (1990's-Present)
  - TREC (D. Harman & E. Voorhees, NIST)
  - Web search, PubMed, ...
  - Boundaries with related areas are disappearing

Short vs. Long Term Info Need

• Short-term information need (Ad hoc retrieval)
  - "Temporary need", e.g., info about used cars
  - Information source is relatively static
  - User "pulls" information
  - Application examples: library search, Web search

• Long-term information need (Filtering)
  - "Stable need", e.g., new data mining algorithms
  - Information source is dynamic
  - System "pushes" information to user
  - Applications: news filter

Importance of Ad hoc Retrieval

• Directly manages any existing large collection of information
• There are many, many "ad hoc" information needs
• A long-term information need can be satisfied through frequent ad hoc retrieval
• Basic techniques of ad hoc retrieval can be used for filtering and other "non-retrieval" tasks, such as automatic summarization.


Formal Formulation of TR

• Vocabulary $V = \{w_1, w_2, \ldots, w_N\}$ of language
• Query $q = q_1, \ldots, q_m$, where $q_i \in V$
• Document $d_i = d_{i1}, \ldots, d_{i m_i}$, where $d_{ij} \in V$
• Collection $C = \{d_1, \ldots, d_k\}$
• Set of relevant documents $R(q) \subseteq C$
  - Generally unknown and user-dependent
  - The query is a "hint" on which docs are in $R(q)$
• Task = compute $R'(q)$, an "approximate $R(q)$"

Computing R(q)

• Strategy 1: Document selection
  - $R(q) = \{d \in C \mid f(d,q) = 1\}$, where $f(d,q) \in \{0,1\}$ is an indicator function or classifier
  - System must decide if a doc is relevant or not ("absolute relevance")

• Strategy 2: Document ranking
  - $R(q) = \{d \in C \mid f(d,q) > \theta\}$, where $f(d,q) \in \Re$ is a relevance measure function; $\theta$ is a cutoff
  - System must decide if one doc is more likely to be relevant than another ("relative relevance")
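The two strategies can be contrasted with a minimal sketch. The score table `SCORES`, the helper `score`, and the cutoff values are hypothetical stand-ins for a real relevance function f(d,q); they are not part of the lecture.

```python
# Toy relevance scores standing in for f(d, q); purely illustrative values.
SCORES = {"d1": 0.98, "d2": 0.95, "d3": 0.83, "d4": 0.80, "d5": 0.76,
          "d6": 0.56, "d7": 0.34, "d8": 0.21, "d9": 0.21}


def score(doc, query):
    """Hypothetical real-valued relevance measure f(d, q)."""
    return SCORES[doc]


def select_documents(collection, query, classifier):
    """Strategy 1: keep every doc the binary classifier labels 1."""
    return {d for d in collection if classifier(d, query) == 1}


def rank_documents(collection, query, f, theta):
    """Strategy 2: order docs by f(d, q) and keep those above the cutoff."""
    ranked = sorted(collection, key=lambda d: f(d, query), reverse=True)
    return [d for d in ranked if f(d, query) > theta]


def strict_classifier(doc, query):
    """An assumed, deliberately strict classifier built on the same scores."""
    return 1 if score(doc, query) > 0.9 else 0
```

With ranking, the user (or the caller) chooses where to stop by moving `theta`; with selection, the boundary is fixed inside the classifier.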

Document Selection vs. Ranking
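[Figure: the true R(q) compared with the result of doc selection (f(d,q) = ?) and doc ranking (f(d,q) = ?), with relevant (+) and non-relevant (-) documents marked. Ranked list shown in the figure: 0.98 d1 +, 0.95 d2 +, 0.83 d3, 0.80 d4 +, 0.76 d5, 0.56 d6, 0.34 d7, 0.21 d8 +, 0.21 d9 -]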



Problems of Doc Selection

• The classifier is unlikely to be accurate
  - "Over-constrained" query (terms are too specific): no relevant documents found
  - "Under-constrained" query (terms are too general): over-delivery
  - It is extremely hard to find the right position between these two extremes

• Even if it is accurate, not all relevant documents are equally relevant

• Relevance is a matter of degree!

Ranking is often preferred

• Relevance is a matter of degree
• A user can stop browsing anywhere, so the boundary is controlled by the user
  - High-recall users would view more items
  - High-precision users would view only a few

• Theoretical justification: Probability Ranking Principle [Robertson 77]


Probability Ranking Principle
[Robertson 77]

• As stated by Cooper:
  "If a reference retrieval system's response to each request is a ranking of the documents in the collection in order of decreasing probability of usefulness to the user who submitted the request, where the probabilities are estimated as accurately as possible on the basis of whatever data have been made available to the system for this purpose, then the overall effectiveness of the system to its users will be the best that is obtainable on the basis of that data."

• Robertson provides two formal justifications
• Assumptions: independent relevance and sequential browsing (neither necessarily holds in reality)


According to the PRP, all we need is "a relevance measure function f" which satisfies:

For all $q, d_1, d_2$: $f(q, d_1) > f(q, d_2)$ iff $p(\mathrm{Rel} \mid q, d_1) > p(\mathrm{Rel} \mid q, d_2)$

Most IR research has focused on finding a good function f.


Evaluation in Information Retrieval


Evaluation Criteria

• Effectiveness/Accuracy
  - Precision, Recall

• Efficiency
  - Space and time complexity

• Usability
  - How useful for real user tasks?


Methodology: Cranfield Tradition

• Laboratory testing of system components
  - Precision, Recall
  - Comparative testing

• Test collections
  - Set of documents
  - Set of questions
  - Relevance judgments


The Contingency Table
Doc \ Action    Retrieved               Not Retrieved
Relevant        Relevant Retrieved      Relevant Rejected
Not relevant    Irrelevant Retrieved    Irrelevant Rejected

Precision = Relevant Retrieved / Retrieved
Recall    = Relevant Retrieved / Relevant
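As a small illustration of the two ratios in the contingency table, here is a sketch that computes set-based precision and recall. The function name and the choice to return 0.0 for an empty set are assumptions made for this example.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: iterable of doc ids returned by the system
    relevant:  iterable of doc ids judged relevant
    """
    retrieved, relevant = set(retrieved), set(relevant)
    relevant_retrieved = len(retrieved & relevant)
    # Assumption: an empty denominator yields 0.0 rather than an error.
    precision = relevant_retrieved / len(retrieved) if retrieved else 0.0
    recall = relevant_retrieved / len(relevant) if relevant else 0.0
    return precision, recall
```

For example, retrieving d1 to d4 when d1, d3 and d7 are relevant gives two relevant retrieved documents, so precision is 2/4 and recall is 2/3.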

How to measure a ranking?

• Compute the precision at every recall point
• Plot a precision-recall (PR) curve

[Figure: precision-recall curves of different systems. Which is better?]


Summarize a Ranking: MAP
• Given that n docs are retrieved
  - Compute the precision (at rank) where each (new) relevant document is retrieved => p(1), ..., p(k), if we have k rel. docs
  - E.g., if the first rel. doc is at the 2nd rank, then p(1) = 1/2
  - If a relevant document never gets retrieved, we assume the precision corresponding to that rel. doc to be zero

• Compute the average over all the relevant documents
  - Average precision = (p(1) + ... + p(k))/k

• This gives us (non-interpolated) average precision, which captures both precision and recall and is sensitive to the rank of each relevant document

• Mean Average Precision (MAP)
  - MAP = arithmetic mean of average precision over a set of topics
  - gMAP = geometric mean of average precision over a set of topics (more affected by difficult topics)
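The procedure above can be sketched as follows. A ranking is represented as a list of booleans (True where the document at that rank is relevant); that representation, the function names, and the small `eps` used to keep a zero average precision from collapsing the geometric mean are all assumptions of this sketch.

```python
import math


def average_precision(ranking, num_relevant):
    """Non-interpolated average precision for one topic.

    ranking: list of booleans in ranked order (True = relevant doc)
    num_relevant: total number of relevant docs for the topic;
                  relevant docs that are never retrieved contribute 0.
    """
    hits = 0
    total = 0.0
    for rank, is_relevant in enumerate(ranking, start=1):
        if is_relevant:
            hits += 1
            total += hits / rank  # precision at this relevant doc
    return total / num_relevant if num_relevant else 0.0


def mean_average_precision(ap_values):
    """MAP: arithmetic mean of per-topic average precision."""
    return sum(ap_values) / len(ap_values)


def geometric_map(ap_values, eps=1e-5):
    """gMAP: geometric mean of per-topic average precision.

    eps is an assumed smoothing constant so that log(0) never occurs.
    """
    log_sum = sum(math.log(ap + eps) for ap in ap_values)
    return math.exp(log_sum / len(ap_values))
```

For a ranking with relevant documents at ranks 1, 2 and 5 and four relevant documents in total, `average_precision` evaluates (1/1 + 2/2 + 3/5 + 0)/4. Because the geometric mean is pulled down by small values, gMAP reacts more strongly than MAP to topics with low average precision.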


Summarize a Ranking: NDCG
• What if relevance judgments are on a scale of [1, r], with r > 2?

• Cumulative Gain (CG) at rank n
  - Let the ratings of the n documents be $r_1, r_2, \ldots, r_n$ (in ranked order)
  - $CG = r_1 + r_2 + \cdots + r_n$

• Discounted Cumulative Gain (DCG) at rank n
  - $DCG = r_1 + r_2/\log_2 2 + r_3/\log_2 3 + \cdots + r_n/\log_2 n$
  - We may use any base for the logarithm, e.g., base = b
  - For rank positions above b (i.e., ranks smaller than b), do not discount

• Normalized Discounted Cumulative Gain (NDCG) at rank n
  - Normalize DCG at rank n by the DCG value at rank n of the ideal ranking
  - The ideal ranking would first return the documents with the highest relevance level, then the next highest relevance level, etc.

• NDCG is now quite popular in evaluating Web search
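A minimal sketch of DCG and NDCG as defined above. It assumes the list of ratings covers every judged document for the query, so that sorting it gives the ideal ranking; the function names and the `base` parameter are choices made for this example.

```python
import math


def dcg(ratings, base=2):
    """Discounted cumulative gain of ratings given in ranked order.

    Ranks up to `base` are not discounted; later ranks are divided
    by log_base(rank), matching r1 + r2/log2(2) + r3/log2(3) + ...
    """
    total = 0.0
    for rank, rating in enumerate(ratings, start=1):
        discount = math.log(rank, base) if rank > base else 1.0
        total += rating / discount
    return total


def ndcg(ratings, n=None, base=2):
    """DCG at rank n divided by the DCG at rank n of the ideal ranking.

    Assumption: `ratings` contains all judged docs, so sorting it in
    decreasing order yields the ideal ranking.
    """
    n = len(ratings) if n is None else n
    ideal = sorted(ratings, reverse=True)
    ideal_dcg = dcg(ideal[:n], base)
    return dcg(ratings[:n], base) / ideal_dcg if ideal_dcg > 0 else 0.0
```

A ranking that already lists documents in decreasing order of rating scores 1.0; any other order scores less.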

When There's Only 1 Relevant Document

• Scenarios:
  - Known-item search
  - Navigational queries

• Search Length = Rank of the answer:
  - Measures a user's effort

• Mean Reciprocal Rank (MRR):
  - Reciprocal Rank: 1/Rank-of-the-answer
  - Take an average over all the queries
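A short sketch of reciprocal rank and MRR. Scoring a query whose answer is never retrieved as 0 is an assumption of this example, as are the function names.

```python
def reciprocal_rank(ranking, answer):
    """1 / rank of the single relevant document, or 0.0 if it is missing."""
    for rank, doc in enumerate(ranking, start=1):
        if doc == answer:
            return 1.0 / rank
    return 0.0  # assumption: an unretrieved answer contributes zero


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (ranking, answer) pairs, one per query."""
    return sum(reciprocal_rank(ranking, answer) for ranking, answer in runs) / len(runs)
```

Three queries whose answers appear at rank 1, at rank 2 and not at all give an MRR of (1 + 1/2 + 0)/3.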


Precision-Recall Curve

[Figure: precision-recall curve with the following annotations]
- Out of 4728 rel docs, we've got 3212
- About 5.5 docs in the top 10 docs are relevant
- Breakeven Point (prec = recall)
- Mean Avg. Precision (MAP)

Example: the system returns 6 docs: D1 +, D2 +, D3 -, D4 -, D5 +, D6 -. Total # rel docs = 4.
Average Prec = (1/1 + 2/2 + 3/5 + 0)/4

What Query Averaging Hides
[Figure: plot with both axes running from 0 to 1 in steps of 0.1]

Slide from Doug Oard's presentation, originally from Ellen Voorhees' presentation


The Pooling Strategy
• When the test collection is very large, it's impossible to completely judge all the documents

• TREC's strategy: pooling
  - Appropriate for relative comparison of different systems
  - Given N systems, take the top K from the results of each and combine them to form a "pool"
  - Users judge all the documents in the pool; unjudged documents are assumed to be non-relevant

• Advantage: less human effort

• Potential problem:
  - Bias due to incomplete judgments (okay for relative comparison)
  - Favors a system contributing to the pool; when the collection is reused, a new system's performance may be under-estimated

• Reuse the data set with caution!
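The pooling step can be sketched in a few lines. The function names and the dictionary of judgments are illustrative assumptions; the sketch only shows how the pool is formed and how an unjudged document is treated.

```python
def build_pool(system_rankings, k):
    """Union of the top-k documents from each system's ranked list."""
    pool = set()
    for ranking in system_rankings:
        pool.update(ranking[:k])
    return pool


def is_relevant(doc, judgments):
    """Look up a judgment; docs outside the pool were never judged
    and are assumed to be non-relevant."""
    return judgments.get(doc, False)
```

The second function is where the bias mentioned above enters: a new system that retrieves a relevant document nobody judged is scored as if that document were non-relevant.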

User Studies
• Limitations of the Cranfield evaluation strategy:
  - How do we evaluate a technique for improving the interface of a search engine?
  - How do we evaluate the overall utility of a system?

• User studies are needed

• General user study procedure:
  - Experimental systems are developed
  - Subjects are recruited as users
  - Variation can be in the system or the users
  - Users use the system and user behavior is logged
  - User information is collected (before: background; after: experience with the system)

• Clickthrough-based real-time user studies:
  - Assume clicked documents to be relevant
  - Mix results from multiple methods and compare their clickthroughs

Common Components in a TR System


Typical TR System Architecture
[Figure: system architecture diagram. Recoverable labels: docs, query, Tokenizer, Doc Rep (Index), Query Rep, Index, judgments]






Text Representation/Indexing

• Making it easier to match a query with a document
• Query and document should be represented using the same units/terms
• Controlled vocabulary vs. full-text indexing
• Full-text indexing is more practically useful and has proven to be as effective as manual indexing with controlled vocabulary


What is a good indexing term?

• Specific (phrases) or general (single word)?
• Luhn found that words with middle frequency are most useful
  - Not too specific (low utility, but still useful!)
  - Not too general (lack of discrimination, stop words)
  - Stop word removal is common, but rare words are kept

• All words or a (controlled) subset? When term weighting is used, it is a matter of weighting, not selecting, indexing terms (more later)

• Word segmentation is needed for some languages
  - Is it really needed?

• Normalize lexical units: words with similar meanings should be mapped to the same indexing term
  - Stemming: mapping all inflectional forms of words to the same root form, e.g.
    • computer -> compute
    • computation -> compute
    • computing -> compute (but king -> k?)
  - Are we losing finer-granularity discrimination?

• Stop word removal
  - What is a stop word? What about a query like "to be or not to be"?
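The stop-word question can be made concrete with a minimal sketch. The stop list below is a hypothetical, hand-picked set and the regular-expression tokenizer is a simplification that only handles lowercase ASCII letters and digits.

```python
import re

# Hypothetical stop list, chosen only for illustration.
STOP_WORDS = {"a", "an", "and", "be", "in", "is", "not", "of", "or", "the", "to"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def index_terms(text, stop_words=STOP_WORDS):
    """Tokens that remain after stop-word removal."""
    return [token for token in tokenize(text) if token not in stop_words]
```

With this stop list, the query "to be or not to be" is reduced to no terms at all, which is the problem the slide hints at.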

Relevance Feedback
[Figure: feedback loop. The Retrieval Engine searches the Document collection and returns Results (d1 3.5, d2 2.4, ..., dk 0.5); Judgments on those results (d1 +, d2, d3 +, ..., dk) produce an Updated query]



Pseudo/Blind/Automatic Feedback
[Figure: feedback loop. The Retrieval Engine searches the Document collection and returns Results (d1 3.5, d2 2.4, ..., dk 0.5); the top 10 are taken as Judgments (d1 +, d2 +, d3 +, ..., dk) and produce an Updated query]


What You Should Know
• How TR is different from DB retrieval
• Why ranking is generally preferred to document selection (justified by PRP)
• How to compute the major evaluation measures (precision, recall, precision-recall curve, MAP, gMAP, breakeven precision, NDCG, MRR)
• What is pooling
• What is tokenization (word segmentation, stemming, stop word removal)
• What is relevance feedback; what is pseudo relevance feedback

Overview of Retrieval Models
• Similarity: $\Delta(\mathrm{Rep}(q), \mathrm{Rep}(d))$
  - Different rep & similarity
    • Vector space model (Salton et al., 75)
    • Prob. distr. model (Wong & Yao, 89)
    • ...

• Probability of Relevance: $P(r=1 \mid q,d)$, $r \in \{0,1\}$
  - Regression Model (Fox 83)
  - Learn to Rank (Joachims 02) (Burges et al. 05)
  - Generative Model
    • Doc generation: classical prob. model (Robertson & Sparck Jones, 76)
    • Query generation: LM approach (Ponte & Croft, 98) (Lafferty & Zhai, 01a)

• Probabilistic inference: $P(d \to q)$ or $P(q \to d)$
  - Different inference system
    • Prob. concept space model (Wong & Yao, 95)
    • Inference network model (Turtle & Croft, 91)


Retrieval Models: Vector Space


The Basic Question
Given a query, how do we know if document A is more relevant than B?

One Possible Answer
If document A uses more query words than document B (Word usage in document A is more similar to that in query)

Relevance = Similarity

• Assumptions
  - Query and document are represented similarly
  - A query can be regarded as a "document"
  - Relevance(d,q) corresponds to similarity(d,q)

• $R(q) = \{d \in C \mid f(d,q) > \theta\}$, $f(q,d) = \Delta(\mathrm{Rep}(q), \mathrm{Rep}(d))$

• Key issues
  - How to represent the query/document?
  - How to define the similarity measure $\Delta$?


Vector Space Model

• Represent a doc/query by a term vector
  - Term: basic concept, e.g., word or phrase
  - Each term defines one dimension
  - N terms define a high-dimensional space
  - Element of vector corresponds to term weight
  - E.g., $d = (x_1, \ldots, x_N)$, where $x_i$ is the "importance" of term i

• Measure relevance by the distance between the query vector and document vector in the vector space

VS Model: illustration

[Figure: document vectors (D1, D3, D4, ..., D11) and the query vector plotted in the term space]




What the VS model doesn't say

• How to define/select the "basic concept"
  - Concepts are assumed to be orthogonal

• How to assign weights
  - Weight in query indicates importance of term
  - Weight in doc indicates how well the term characterizes the doc

• How to define the similarity/distance measure


What's a good "basic concept"?

• Orthogonal
  - Linearly independent basis vectors
  - "Non-overlapping" in meaning

• No ambiguity
• Weights can be assigned automatically and hopefully accurately
• Many possibilities: words, stemmed words, phrases, "latent concept", ...


How to Assign Weights?
• Very, very important!
• Why weighting
  - Query side: not all terms are equally important
  - Doc side: some terms carry more information about contents

• How?
  - Two basic heuristics
    • TF (Term Frequency) = within-doc frequency
    • IDF (Inverse Document Frequency)
  - TF normalization

TF Weighting

• Idea: A term is more important if it occurs more frequently in a document

• Some formulas: Let f(t,d) be the frequency count of term t in doc d
  - Raw TF: TF(t,d) = f(t,d)
  - Log TF: TF(t,d) = log f(t,d)
  - Maximum frequency normalization: TF(t,d) = 0.5 + 0.5*f(t,d)/MaxFreq(d)
  - "Okapi/BM25 TF": TF(t,d) = k f(t,d)/(f(t,d) + k(1-b+b*doclen/avgdoclen))

• Normalization of TF is very important!
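The four TF formulas can be written out as a sketch. The function names, the defaults `k=1.2` and `b=0.75` (values commonly quoted for BM25, not taken from the slides), and the decision to return 0 for an absent term in the log variant are all assumptions.

```python
import math


def raw_tf(f):
    """Raw TF: the frequency count itself."""
    return float(f)


def log_tf(f):
    """Log TF: log f(t,d). Assumption: an absent term (f = 0) gets 0."""
    return math.log(f) if f > 0 else 0.0


def max_norm_tf(f, max_freq):
    """Maximum frequency normalization: 0.5 + 0.5 * f / MaxFreq(d)."""
    return 0.5 + 0.5 * f / max_freq


def bm25_tf(f, doclen, avgdoclen, k=1.2, b=0.75):
    """Okapi/BM25-style TF as written on the slide:
    k*f / (f + k*(1 - b + b*doclen/avgdoclen)).
    """
    return k * f / (f + k * (1 - b + b * doclen / avgdoclen))
```

Two properties are worth checking by hand: the BM25-style TF grows with f but stays below k, and log f(t,d) is zero for a term that occurs exactly once, which is one reason variants such as 1 + log f are often used instead.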

TF Normalization

• Why?
  - Document length variation
  - "Repeated occurrences" are less informative than the "first occurrence"

• Two views of document length
  - A doc is long because it uses more words
  - A doc is long because it has more content

• Generally penalize long docs, but avoid over-penalizing (pivoted normalization)

TF Normalization (cont.)
[Figure: normalized TF plotted against raw TF, with several candidate curves]

• Which curve is more reasonable?
• Should normalized TF be upper-bounded?
• Normalization interacts with the similarity measure

Regularized/"Pivoted" Length Normalization

[Figure: normalized TF plotted against raw TF]

• "Pivoted normalization": using avg. doc length to regularize normalization
  - 1-b+b*doclen/avgdoclen (b varies from 0 to 1)
• What would happen if doclen is {>, <, =} avgdoclen?
• Advantage: stabilizes parameter setting
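The question about doclen versus avgdoclen can be answered by evaluating the normalizer directly. This is a minimal sketch; the function name and the default `b=0.5` are assumptions.

```python
def pivoted_normalizer(doclen, avgdoclen, b=0.5):
    """Length normalizer 1 - b + b * doclen / avgdoclen, with 0 <= b <= 1."""
    return 1 - b + b * doclen / avgdoclen
```

The normalizer equals 1 for a document of average length, exceeds 1 for a longer document (so its TF is penalized when divided by it), and falls below 1 for a shorter one. With b = 0 it is always 1, which switches length normalization off.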

IDF Weighting

• Idea: A term is more discriminative if it occurs in fewer documents

• Formula: IDF(t) = 1 + log(n/k)
  - n: total number of docs
  - k: number of docs with term t (doc freq)


TF-IDF Weighting

• TF-IDF weighting: weight(t,d) = TF(t,d)*IDF(t)
  - Common in doc -> high tf -> high weight
  - Rare in collection -> high idf -> high weight

• Imagine a word count profile: what kind of terms would have high weights?
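A minimal sketch of TF-IDF weighting with raw TF and IDF(t) = 1 + log(n/k). The three toy documents, the use of the natural logarithm, and the choice to give a term that occurs in no document an IDF of 0 are assumptions made for this example.

```python
import math
from collections import Counter

# Toy collection; each document is a list of tokens.
DOCS = [
    ["information", "retrieval", "search", "engine", "information"],
    ["travel", "information", "map", "travel"],
    ["government", "president", "congress"],
]


def idf(term, docs):
    """IDF(t) = 1 + log(n / k), where k is the number of docs containing t."""
    n = len(docs)
    k = sum(1 for doc in docs if term in doc)
    return 1 + math.log(n / k) if k else 0.0  # assumption: unseen term -> 0


def tfidf_vector(doc, docs):
    """Map each term of doc to raw TF times IDF."""
    counts = Counter(doc)
    return {term: freq * idf(term, docs) for term, freq in counts.items()}
```

In this collection "retrieval" occurs in one document and "information" in two, so "retrieval" receives the higher IDF even though "information" has the higher raw count in the first document.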


How to Measure Similarity?
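$\vec{D}_i = (w_{i1}, \ldots, w_{iN})$
$\vec{Q} = (w_{q1}, \ldots, w_{qN})$
$w = 0$ if a term is absent

Dot product similarity:
$sim(\vec{Q}, \vec{D}_i) = \sum_{j=1}^{N} w_{qj} \, w_{ij}$

Cosine (= normalized dot product):
$sim(\vec{Q}, \vec{D}_i) = \frac{\sum_{j=1}^{N} w_{qj} \, w_{ij}}{\sqrt{\sum_{j=1}^{N} (w_{qj})^2} \, \sqrt{\sum_{j=1}^{N} (w_{ij})^2}}$

How about Euclidean (a distance, so smaller means more similar)?
$sim(\vec{Q}, \vec{D}_i) = \sqrt{\sum_{j=1}^{N} (w_{qj} - w_{ij})^2}$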
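The three measures can be sketched over sparse vectors stored as term-to-weight dictionaries, where a missing key plays the role of a zero weight. The dictionary representation and the function names are assumptions of this example.

```python
import math


def dot(q, d):
    """Dot product of two sparse vectors."""
    return sum(weight * d.get(term, 0.0) for term, weight in q.items())


def norm(v):
    """Euclidean length of a sparse vector."""
    return math.sqrt(sum(weight * weight for weight in v.values()))


def cosine(q, d):
    """Dot product normalized by both vector lengths (0.0 if either is empty)."""
    denom = norm(q) * norm(d)
    return dot(q, d) / denom if denom else 0.0


def euclidean(q, d):
    """Euclidean distance; smaller values mean the vectors are closer."""
    terms = set(q) | set(d)
    return math.sqrt(sum((q.get(t, 0.0) - d.get(t, 0.0)) ** 2 for t in terms))
```

Doubling every weight in a document doubles its dot product with the query and changes its Euclidean distance, but leaves the cosine unchanged, which is why cosine is described as a normalized dot product.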


VS Example: Raw TF & Dot Product
doc1: information retrieval search engine information
doc2: travel information map travel
doc3: government president congress
query = "information retrieval"

Each cell is raw TF (TF*IDF weight):

              info    retrieval  travel  map     search  engine  govern  president  congress
IDF (faked)   2.4     4.5        2.8     3.3     2.1     5.4     2.2     3.2        4.3
doc1          2(4.8)  1(4.5)                     1(2.1)  1(5.4)
doc2          1(2.4)             2(5.6)  1(3.3)
doc3                                                             1(2.2)  1(3.2)     1(4.3)
query         1(2.4)  1(4.5)

Sim(q,doc1) = 4.8*2.4 + 4.5*4.5
Sim(q,doc2) = 2.4*2.4
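Carrying the arithmetic of this example through, as a worked check. The value for doc3 is not stated on the slide; it follows from doc3 sharing no term with the query.

```latex
\begin{align*}
sim(q, doc_1) &= 4.8 \times 2.4 + 4.5 \times 4.5 = 11.52 + 20.25 = 31.77 \\
sim(q, doc_2) &= 2.4 \times 2.4 = 5.76 \\
sim(q, doc_3) &= 0
\end{align*}
```

So the dot product ranks doc1 first, doc2 second, and doc3 last.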

What Works the Best?

• Use single words
• Use stat. phrases
• Remove stop words
• Stemming
• Others(?)

(Singhal 2001)

Relevance Feedback in VS
• Basic setting: learn from examples
  - Positive examples: docs known to be relevant
  - Negative examples: docs known to be non-relevant
  - How do you learn from this to improve performance?

• General method: query modification
  - Adding new (weighted) terms
  - Adjusting weights of old terms
  - Doing both

• The most well-known and effective approach is Rocchio [Rocchio 1971]


Rocchio Feedback: Illustration

[Figure: scatter of relevant (+) and non-relevant (-) documents, with the original and modified query vectors q]


Rocchio Feedback: Formula
[Formula figure not recoverable from the extraction. Labels: new query, parameters, original query, rel docs, non-rel docs]
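For reference, the form in which the Rocchio formula is usually stated is given below. The symbols are assumptions of this rendering rather than the slide's own notation: $\vec{q}_m$ is the new query, $\vec{q}_0$ the original query, $D_r$ and $D_n$ the sets of relevant and non-relevant document vectors, and $\alpha, \beta, \gamma$ the parameters.

```latex
\vec{q}_m = \alpha \, \vec{q}_0
          + \frac{\beta}{|D_r|} \sum_{\vec{d}_j \in D_r} \vec{d}_j
          - \frac{\gamma}{|D_n|} \sum_{\vec{d}_j \in D_n} \vec{d}_j
```

The new query is the original query moved toward the centroid of the relevant documents and away from the centroid of the non-relevant ones.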


Rocchio in Practice
• Negative (non-relevant) examples are not very important (why?)
• Often project the vector onto a lower dimension (i.e., consider only a small number of words that have high weights in the centroid vector) (efficiency concern)
• Avoid "training bias" (keep relatively high weight on the original query weights) (why?)
• Can be used for relevance feedback and pseudo feedback
• Usually robust and effective
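These practical points can be illustrated with a sketch of Rocchio-style query modification over sparse term-weight dictionaries. The parameter defaults (`alpha=1.0`, `beta=0.75`, `gamma=0.15`), the dropping of non-positive weights, and the `top_k` truncation are illustrative assumptions, not values or rules taken from the slides.

```python
def centroid(vectors):
    """Average of a list of sparse vectors (empty dict for an empty list)."""
    if not vectors:
        return {}
    total = {}
    for vector in vectors:
        for term, weight in vector.items():
            total[term] = total.get(term, 0.0) + weight
    return {term: weight / len(vectors) for term, weight in total.items()}


def rocchio(query, rel_docs, nonrel_docs,
            alpha=1.0, beta=0.75, gamma=0.15, top_k=None):
    """Move the query toward relevant docs and away from non-relevant ones."""
    rel_c = centroid(rel_docs)
    non_c = centroid(nonrel_docs)
    new_query = {}
    for term in set(query) | set(rel_c) | set(non_c):
        weight = (alpha * query.get(term, 0.0)
                  + beta * rel_c.get(term, 0.0)
                  - gamma * non_c.get(term, 0.0))
        if weight > 0:  # assumption: terms with non-positive weight are dropped
            new_query[term] = weight
    if top_k is not None:
        # Keep only the highest-weighted terms (the efficiency concern above).
        keep = sorted(new_query, key=new_query.get, reverse=True)[:top_k]
        new_query = {term: new_query[term] for term in keep}
    return new_query
```

Keeping `alpha` high relative to `beta` preserves the weight of the original query terms. For pseudo feedback, the same function can be called with the top-ranked results as `rel_docs` and an empty list as `nonrel_docs`.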


"Extension" of VS Model

• Alternative similarity measures
  - Many other choices (tend not to be very effective)
  - P-norm (Extended Boolean): matching a Boolean query with a TF-IDF document vector

• Alternative representation
  - Many choices (performance varies a lot)
  - Latent Semantic Indexing (LSI) [TREC performance tends to be average]

• Generalized vector space model
  - Theoretically interesting, not seriously evaluated

Advantages of VS Model

• Empirically effective! (Top TREC performance)
• Intuitive
• Easy to implement
• Well-studied/most evaluated
• The Smart system
  - Developed at Cornell: 1960-1999
  - Still widely used

• Warning: Many variants of TF-IDF!

Disadvantages of VS Model

• Assumes term independence
• Assumes query and document to be the same
• Lack of "predictive adequacy"
  - Arbitrary term weighting
  - Arbitrary similarity measure

• Lots of parameter tuning!


What You Should Know

• What is the Vector Space Model (a family of models)
• What is TF-IDF weighting
• What is pivoted normalization weighting
• How Rocchio works



• This lecture
  - Basic concepts of TR
  - Evaluation
  - Common components
  - Vector space model

• Next lecture: continue overview of IR
  - IR system implementation
  - Other retrieval models
  - Applications of basic TR techniques
