
Social Network Analysis

• Early search engines retrieved relevant pages for the user based primarily on the
content similarity between the user query and the pages indexed by the search
engine.
• Starting in 1996, it became clear that content similarity alone was no longer
sufficient for search, for two reasons.
• First, the number of Web pages grew rapidly during the middle to late 1990s.
Given any query, the number of relevant pages can be huge. For example, given
the search query “classification technique”, the Google search engine estimates
that there are about 10 million relevant pages. This abundance of information
causes a major problem for ranking, i.e., how to choose only 10–30 pages and
rank them suitably to present to the user.
• Second, content similarity methods are easily spammed. A page owner can
repeat some important words and add many remotely related words to his/her
pages to boost their rankings and/or to make them appear relevant to a large
number of possible queries.
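
The spamming problem can be illustrated with a minimal sketch. It assumes a very simple content similarity measure (cosine similarity over raw term counts), which is only a stand-in and not the scoring function of any particular search engine. The function names and the sample page texts are hypothetical.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term counts for a lowercased, whitespace-tokenized text."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


query = tf_vector("classification technique")

# Hypothetical honest page: mentions the query terms once, in context.
honest = tf_vector("a survey of one classification technique for text data")

# Hypothetical spammed page: repeats the query terms and adds unrelated words.
stuffed = tf_vector(
    "classification technique classification technique "
    "classification technique cheap flights"
)

honest_score = cosine_similarity(query, honest)
stuffed_score = cosine_similarity(query, stuffed)
print(f"honest page:  {honest_score:.3f}")
print(f"stuffed page: {stuffed_score:.3f}")
```

Under this assumed measure the keyword-stuffed page scores higher than the honest one, even though it carries less useful content. A ranking based purely on page content is under the page owner's control in this way.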
SNA
• During 1997-1998, the two most influential hyperlink-based search algorithms,
PageRank and HITS, were designed. PageRank is the algorithm that powers the
successful search engine Google.

• Both PageRank and HITS originated from social network analysis. They both
exploit the hyperlink structure of the Web to rank pages according to their levels
of “prestige” or “authority”.
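
As a rough illustration of ranking by link structure, the sketch below implements a simplified PageRank-style power iteration. The damping factor of 0.85, the fixed iteration count, the handling of pages without outgoing links, and the four-page toy graph are all assumptions made for this example. It is not a description of Google's production system.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank-style scores by power iteration.

    links maps each page to the list of pages it links to.
    Every link target must also appear as a key in links.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share of the total score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page splits its damped score evenly among its outlinks.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outlinks spreads its score evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy Web: three pages link to C, and C links back to A.
toy_web = {
    "A": ["C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

scores = pagerank(toy_web)
for page, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy graph, page C collects links from three other pages and ends up with the highest score. Page A ranks second only because the prestigious page C links to it. This is the intuition behind prestige: a link from an important page counts for more than a link from an obscure one.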

• Apart from search ranking, hyperlinks are also useful for finding Web
communities. A Web community is a cluster of densely linked pages representing
a group of people with a common interest. Beyond explicit hyperlinks on the
Web, explicit or implicit links in other contexts are useful too, e.g., for
discovering communities of named entities (e.g., people and organizations) in
free text documents and for analyzing social phenomena in emails and friendship
networks on social networking sites.