Splog Detection Using Self Similarity Analysis

Value                This Doc    Scribd Average
Pages                8           43
Words                6894        13640
Characters           42605       81678
Lines                394         623
Letters per word     6.18        5.99
Words per line       17.5        21.89
Words per page       861.75      317.21

Document Information

480 Reads | 0 Comments

Description

Research paper from May 8, 2007.

"The machine-generated nature of splogs makes them good candidates for statistical assessment. This assessment can include looking at:

1. Post Time: Two measures that capture regularity in posting time: a micro view (e.g., posts that go live in the morning before the blogger's "real" job) and a macro view (e.g., a large gap in posting due to a vacation).
2. Post Content: A measure of the blogger's topic drift. A blogger commonly stays focused on one topic but will sometimes write about others.
3. Post Links: The links on the blog can be telling. A large proportion directed to a particular domain, for example, suggests a relationship with that destination domain.
"

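The "Post Links" measure in the description can be illustrated with a minimal sketch. This is not the paper's method; it only computes the share of a blog's outbound links that point to its single most-linked domain. The function name `top_domain_share` and the example URLs are hypothetical, chosen for illustration.

```python
from collections import Counter
from urllib.parse import urlparse


def top_domain_share(links):
    """Return (domain, share) for the most frequently linked domain.

    `links` is an iterable of absolute URLs. Links without a host
    (for example, relative links) are ignored. Illustrative only;
    not the measure defined in the paper.
    """
    domains = [urlparse(url).netloc.lower() for url in links]
    domains = [d for d in domains if d]  # drop links with no host part
    if not domains:
        return None, 0.0
    domain, count = Counter(domains).most_common(1)[0]
    return domain, count / len(domains)


if __name__ == "__main__":
    # Hypothetical outbound links collected from one blog.
    example_links = [
        "http://spam.example/offer1",
        "http://spam.example/offer2",
        "http://news.example/story",
        "http://spam.example/offer3",
    ]
    print(top_domain_share(example_links))  # ('spam.example', 0.75)
```

A share close to 1.0 would indicate that nearly all links target one domain, which is the kind of concentration the description says "suggests a relationship with that destination domain". Any threshold for flagging a blog would be an additional assumption.
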
8 Pages


Date Added

01/09/2009

Category

Uncategorized.

Copyright

Attribution Non-commercial
