Monitoring of Suspicious Discussions On Online Forums Using Data Mining
All content following this page was uploaded by Shailesh Dudala on 04 October 2019.
1Tanya Srivastava, 2R. Mangalagowri, 3Shailesh S. Dudala
2Assistant Professor, 1,2,3Department of CSE,
SRM University, Chennai, India 603203
1tanyasriva96@gmail.com, 3shailesh.dudala@hotmail.com
Abstract: Over the years, the internet has changed the lives of many people, for better or worse. As internet technology has progressed, illegal activities have also increased rapidly. The internet is an unacknowledged path for illegal activities such as hacking, trafficking, betting, fraud and scams. Cyber-crime branches are looking for ways to detect illegal feedback, comments or reviews on these forums and to download questionable postings as evidence for their investigations. Our proposed system will monitor for suspicious postings, collect them from a few discussion forums, apply data mining techniques and extract meaningful data. To this end, we focus on Data Mining and Sentiment Analysis to enhance these techniques and to extract features that represent the text.

Keywords: Illegal Activities, Discussion forums, Sentiment Analysis.

1. Introduction

The growth of crime on digital media has alerted law enforcement bodies to the need to monitor online activities continuously. To achieve this, we need to build a system which detects suspicious postings on online forums. Surveys have shown that it is difficult to manage information that constantly changes on the internet, so data mining is a suitable choice for gathering and analysing such data. Using various data mining techniques, raw data is extracted from a large text corpus, and this raw, unstructured data is transformed into structured data during pre-processing. This paper highlights the data mining techniques and the sentiment analysis algorithm, which are prototyped and implemented in Python using the Natural Language Toolkit (NLTK) library for natural language processing.

A. Existing systems

Existing systems analyse text sources from social media and classify the text into different groups. The system distinguishes between legal and illegal data using Stop-Word Selection, a Stemmer algorithm and the Levenshtein algorithm. In Stop-Word Selection, commonly used English words such as "we", "he" and "they" are removed. More such words can be removed using this algorithm.

The Porter Stemmer algorithm removes suffixes from English words and reduces each word to its root, for example:
The words "Stemmed", "Stemmer" and "Stemming" have the suffixes "ED", "ER" and "ING", which are removed during information retrieval, leaving the single root word "STEM".

In the Levenshtein algorithm, a large set of words are compared with each other. For example, the Levenshtein distance between "fitten" and "bitting" is 3, since the following three edits change one into the other, and there is no way to do it with fewer than three edits:

fitten → bitten (substitution of "b" for "f")
bitten → bittin (substitution of "i" for "e")
bittin → bitting (insertion of "g" at the end).

Levenshtein distance is a measure of similarity between two words: the smaller the distance, the more similar the words.

B. Some Limitations of current systems

Although the existing system works, there is still scope for improvement. The performance of data retrieval and analysis on real online forums remains debatable due to a lack of tools. The system is meant to monitor suspicious discussions on an online forum automatically, but it does not accept large amounts of data as input. The system is difficult to moderate. It is limited to spam reviews and feedback. Security vulnerability is also one of the major disadvantages.
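The three preprocessing techniques described under "Existing systems" (stop-word removal, suffix stemming and Levenshtein distance) can be sketched in Python, the language the paper reports using. This is a minimal illustration under stated assumptions, not the authors' implementation: the stop-word list and the function names are hypothetical, and naive_stem is a deliberately simplified suffix stripper rather than the full Porter algorithm. A real system would more likely rely on NLTK's PorterStemmer and stop-word corpus, whose output can differ from this sketch for some words.

```python
# Illustrative stop-word list (an assumption; real lists are much larger).
STOP_WORDS = {"we", "he", "they", "the", "a", "is", "are", "to"}

# Suffixes taken from the example in the text; NOT the full Porter rule set.
SUFFIXES = ("ing", "ed", "er")


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list (case-insensitive)."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]


def naive_stem(word):
    """Strip one known suffix, then undo consonant doubling (stemm -> stem)."""
    w = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            w = w[: -len(suffix)]
            if len(w) >= 2 and w[-1] == w[-2] and w[-1] not in "aeiou":
                w = w[:-1]
            break
    return w


def levenshtein(a, b):
    """Edit distance between a and b, computed row by row."""
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb (or match)
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    tokens = "We saw they stemmed the stemming stemmer".split()
    kept = remove_stop_words(tokens)
    print(kept)
    print([naive_stem(t) for t in kept])
    print(levenshtein("fitten", "bitting"))
```

The distance function keeps only two rows of the dynamic-programming table, so memory grows with the length of the second word rather than with the product of both lengths. That matters when, as the text describes, a large set of words is compared pairwise.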
International Journal of Pure and Applied Mathematics Special Issue