
MONITORING SUSPICIOUS DISCUSSION ON ONLINE FORUM
Department of Computer Science and Engineering, MLR Institute of Technology, Dundigal, Hyderabad
P. Subhashini1, K. Chetana2, P. Rachana3, CH. Tharun4, S. Anil5

subhashinivalluru@gmail.com1, chetanahoney1@gmail.com2, rachanapanja98@gmail.com3, Tharunlegacy7@gmail.com4, anilbunny1912@gmail.com5

Abstract:
An online forum is a large space where people can express their individual opinions and views on any aspect of life, for purposes such as communication and marketing. To measure the loyalty of users, one can keep an eye on their everyday posts and monitor them for suspicious discussions. Some people use these discussion forums for illegal purposes, posting suspicious chats in the form of text, video, and images and exchanging them online with other users. Law enforcement agencies are looking for ways to detect such malicious posts, which may take the form of text, for criminal investigation. Since most of the data in chat forums is stored as text, the proposed system focuses on text posts.

I. INTRODUCTION

Nowadays people use social networking sites as a medium of communication. A forum is a large space where people can express and share their views on any aspect of life, for marketing and communication. Monitoring suspicious discussions is the best way to measure the loyalty of users, by keeping an eye on their everyday posts. Many malicious people use these discussion forums for illegal purposes, posting suspicious chats in the form of text, video, and images and exchanging them online with other users. Law enforcement agencies are looking for ways to detect such illegal posts, which take the form of text, for criminal investigation. Since the data stored in chat forums is mostly text, the proposed system concentrates only on text posts.

The system monitors suspicious discussions on an online forum using data mining, in which a small amount of useful information is extracted from a large volume of data. The system uses text mining to extract suspicious words from the whole chat. It provides a forum for chatting, reduces the use of illegal words during chat, and provides a database for criminal investigation if a crime is committed by a person using the forum. If the system detects a suspicious word in a chat, the word is replaced; if this happens three times, the user's account is blocked for 24 hours; and if the account is blocked multiple times, the user is removed from the forum permanently.

II. LITERATURE SURVEY

This section reviews some of the related work already done in this area. A large number of researchers have contributed to this important research field, as follows:

● E. Allman [1] discusses the economics of spam. An estimated 306.4 billion emails were sent every day to valid email addresses around the world in 2020, and around 55 percent of this worldwide email traffic was spam. This spam is illegal under current laws. How does spam differ from legitimate advertising? If we enjoy watching network television, using a social networking site, or checking stock quotes online, we know that we may be subjected to advertisements, many of which may be irrelevant or even annoying to us. Most valuable consumer services, such as social networking, news, and email, are supported entirely by advertising revenue. While people may dislike advertising, most consumers accept that it is a price they pay for accessing valuable content and services. Unsolicited commercial email, in contrast, imposes a cost on consumers without any market-mediated benefit and without the opportunity to opt out.

● A. Andoni [2] observed that in recent years modern technologies have stored data ranging from photo collections to genetic data and network traffic statistics, forming huge datasets [13]. The ever-growing size of these datasets has made it crucial to design new algorithms capable of handling the data with extreme efficiency. One of the fundamental computational primitives for managing such massive datasets is the Nearest Neighbor (NN) problem [17]. The goal is to preprocess a set of objects so that, given a query object, one can efficiently find the data object most similar to the query [15]. This approach has a broad set of applications in data analysis and processing. For example, it forms the basis of a widely used classification method in machine learning: to label a new object, find the most similar labeled object and copy its label. Other applications include information retrieval, searching image databases, and finding duplicate files and web pages, among many others. Geometric notions are used to represent the objects and their similarity measures.

● P. Barford and V. Yegneswaran [3] observed that the continued growth and diversification of the Internet has been accompanied by an increasing prevalence of attacks and intrusions. They argue, however, that the motivation for malicious activity has shifted from recognition within the hacker community to attacks and intrusions for gain [20]. This shift has been marked by a growing sophistication in the tools and methods used to conduct attacks, thereby escalating the network security arms race. The reactive methods for network security that are predominant today are ultimately insufficient, so more proactive methods are required [18]. They then begin codifying the capabilities of malware by dissecting four widely used Internet Relay Chat (IRC) botnet codebases. Each codebase is classified along seven key dimensions, including botnet control mechanisms, host control mechanisms, propagation mechanisms, exploits, delivery mechanisms, obfuscation, and deception mechanisms. Their study reveals the complexity of botnet software, and they discuss the implications for defense strategies based on their analysis [7].

● A. Bergholz, J. De Beer, S. Glahn, M.-F. Moens, G. Paaß, and S. Strobel [4] note that phishing emails usually contain a message from a credible-looking source asking the user to click a link to a website where she or he is asked to enter a password or other confidential information [19]. Most phishing emails aim at withdrawing money from financial institutions or gaining access to private information. Phishing has increased enormously in recent years and is a significant threat to global security and the economy. There are a number of possible countermeasures to phishing. These include statistical models for low-dimensional descriptions of email topics, sequential analysis of email text and external links, and the detection of embedded logos as well as indicators of hidden salting. Hidden salting is the intentional addition or distortion of content that is not perceivable by the reader. For empirical evaluation, they obtained a large realistic corpus of emails pre-labeled as spam, phishing, and ham (legitimate). Finally, they describe how the filters can be updated and adapted to new types of phishing [14].

III. PROPOSED SYSTEM

Data mining [5] is used to monitor social media as well as discussion forums for suspicious feedback or comments. Discussion forums can be used to spread a message to a large population almost instantly. Many people share their views and ideas on politics and religion, and there are also those who intentionally hurt religious or racial sentiments through malicious posts. Hence it becomes important to monitor the posts on these forums. In this paper, we use a set of data collected from different online forums. This data is stored in a CSV file. On the other hand, as part of this system, each user is given his or her own account and credentials for the website, where he or she must log in and can start a discussion on any topic. Whenever the user uses suspicious words, the administrator of the site is notified, and the user is warned about the activity.

Fig 1

The above image shows the architecture of the system [6]. It contains steps such as the identification and removal of stop words (the, an, a, etc.) and sentiment analysis for the identification of suspicious words.
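
The moderation behaviour described in this paper (a suspicious word is replaced, three such incidents block the account for 24 hours, and repeated blocks remove the user permanently) can be illustrated with a minimal sketch. This is not the authors' implementation: the word list, the `UserState` and `moderate` names, the choice of one strike per offending post, the one-asterisk-per-character masking, and the `MAX_BLOCKS` threshold (the paper only says "multiple times") are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical word list; the paper does not publish its actual list.
SUSPICIOUS_WORDS = {"bomb", "attack", "kill"}

STRIKES_PER_BLOCK = 3   # from the paper: three replacements trigger a block
BLOCK_HOURS = 24        # from the paper: a block lasts 24 hours
MAX_BLOCKS = 3          # assumption: the paper only says "multiple times"


@dataclass
class UserState:
    strikes: int = 0      # offending posts since the last block
    blocks: int = 0       # number of 24-hour blocks so far
    banned: bool = False  # True once the user is removed permanently


def moderate(post: str, user: UserState) -> tuple[str, list[str]]:
    """Mask suspicious words in a post and update the user's strike count."""
    flagged: list[str] = []

    def mask(match: re.Match) -> str:
        word = match.group(0)
        if word.lower() not in SUSPICIOUS_WORDS:
            return word
        flagged.append(word.lower())
        return "*" * len(word)  # assumption: one asterisk per character

    masked = re.sub(r"[A-Za-z']+", mask, post)

    if flagged:
        # Assumption: one strike per offending post, not per offending word.
        user.strikes += 1
        if user.strikes >= STRIKES_PER_BLOCK:
            user.strikes = 0
            user.blocks += 1  # the account would now be blocked for BLOCK_HOURS
            user.banned = user.blocks >= MAX_BLOCKS
    return masked, flagged
```

In a real deployment the returned `flagged` list would be what is reported to the site administrator, and the block would also need a timestamp so that it expires after `BLOCK_HOURS`.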

Fig 2: Details of subject and message dataset format.

The above image shows the dataset format, which contains email details such as the subject and the message of the mail.

IV. SYSTEM DESIGN

A use case diagram in the Unified Modeling Language (UML) is a type of behavioral diagram defined by and created from a use-case analysis. Its purpose is to present a graphical overview of the functionality provided by a system in terms of actors, their goals (represented as use cases), and any dependencies between those use cases. The main purpose of a use case diagram is to show which system functions are performed for which actor.

Fig 3: Buttons of different tasks.

The above image shows our application, in which different buttons are assigned to different tasks. Click the ‘Upload Dataset’ button to load the forum data.
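
Loading the forum data can be sketched as follows. This is only an illustration of reading a CSV file with a subject and a message per row, as in the dataset format described above; the column names `subject` and `message` and the function name `load_forum_posts` are assumptions, since the paper does not give the exact file layout.

```python
import csv
import io


def load_forum_posts(source) -> list[dict]:
    """Read posts from an open CSV file with 'subject' and 'message' columns."""
    reader = csv.DictReader(source)
    posts = []
    for row in reader:
        subject = (row.get("subject") or "").strip()
        message = (row.get("message") or "").strip()
        if subject or message:  # skip rows with no usable text
            posts.append({"subject": subject, "message": message})
    return posts


# Usage with an in-memory sample instead of a real file on disk.
sample = io.StringIO(
    "subject,message\n"
    "Meeting,See you at 5 pm\n"
    'Offer,"Win money now, click here"\n'
)
posts = load_forum_posts(sample)
```

With a real dataset, the same function could be called on a file opened with `open(path, newline="", encoding="utf-8")`.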

V. EVALUATION

SCREENSHOTS:

Fig 4

This image shows the location of the datasets to be uploaded.

Fig 5: Uploaded data.

The above image shows the uploaded data, which contains suspicious words, numeric values and stop words.

Fig 6: Generated training model.

All numeric values have been removed from the first and the remaining rows. Now click ‘Data Stemming’ [8] to remove stop words such as off, the, where, why, etc. Then click the ‘Features Extraction & Generate SVMPSO Model’ button to generate the training model [9].

Fig 7: Model of successful generation.

The above image shows the prompt confirming successful generation of the model [11].

Fig 8: Detecting the suspicious word.

Now click the ‘Detect Suspicious Word’ button to detect all forum posts that contain suspicious words [12][16].

Fig 9: Suspicious words.

This image shows the emails containing suspicious words in the uploaded data [10].
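
The preprocessing steps walked through in this section (removing numeric values, removing stop words, stemming, and extracting features) can be sketched in a few lines. This is a simplified illustration, not the application's code: the stop-word set is a small sample, `naive_stem` is a crude stand-in for the Porter algorithm [8], the bag-of-words features are an assumed representation, and the SVM-PSO training step itself is not shown.

```python
import re
from collections import Counter

STOP_WORDS = {"off", "the", "where", "why", "a", "an"}  # illustrative subset
SUFFIXES = ("ing", "ed", "es", "s")                     # crude stand-in for Porter stemming


def naive_stem(token: str) -> str:
    """Strip one common suffix; a simplification, not the Porter algorithm."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list[str]:
    """Lowercase, drop numeric tokens and stop words, then stem what remains."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    kept = [t for t in tokens if not t.isdigit() and t not in STOP_WORDS]
    return [naive_stem(t) for t in kept]


def bag_of_words(tokens: list[str], vocabulary: list[str]) -> list[int]:
    """Count how often each vocabulary term occurs in the token list."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]
```

The resulting count vectors are the kind of input a classifier could be trained on; how the application's SVM-PSO model consumes its features is not specified in the paper.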
Fig 10: Graphical representation.

The above image shows a graphical representation of the suspicious chats detected in the uploaded dataset. It is a bar graph comparing the number of suspicious chats with the total number of chats. The x-axis represents the total and detected chats, and the y-axis represents the count. In the graph above there are 30 chats in total, of which 5 contain suspicious words. The screen below shows the suspicious words used in this project.

Fig 11: Detected suspicious words.

This shows the list of suspicious words detected in the chats.

VI. CONCLUSION

The main objective was to monitor suspicious activity in various online forums, and this application meets that objective. A user's activity is monitored from the time of login, including his or her discussion of any topic available in the forum. Once a suspicious word is found, it is replaced by * and the website administrator is notified. Social networking sites affect human life. This technique successfully detects suspicious words in chats and helps prevent suspicious activities. It is applicable to any department that needs it: not only to social networking sites but also to the forest department and disaster management systems, to stop illegal activities. Text mining technology is used to detect suspicious words in the discussion forum.

REFERENCES

[1] E. Allman. The economics of spam. Queue, 1(9):80, 2003.

[2] A. Andoni. Nearest neighbor search: the old, the new, and the impossible, 2009.

[3] P. Barford and V. Yegneswaran. An inside look at botnets. Malware Detection, pages 171–191, 2007.

[4] A. Bergholz, J. De Beer, S. Glahn, M.-F. Moens, G. Paaß, and S. Strobel. New filtering approaches for phishing email. J. Comput. Secur., 18(1):7–35, 2010.

[5] A. Bratko, B. Filipič, G. V. Cormack, T. R. Lynam, and B. Zupan. Spam filtering using statistical data compression models. Journal of Machine Learning Research, 7:2673–2698, 2006.

[6] J. Caballero, C. Grier, C. Kreibich, and V. Paxson. Measuring pay-per-install: The commoditization of malware distribution. In Proceedings of the 20th USENIX Security Symposium, 2011.

[7] M. Suruthi Murugesan, R. Pavitha Devi, S. Deepthi, V. Shri Lavanya, and Dr. Annie Princy. Automated Monitoring Suspicious Discussions on online forum by data mining. Imperial Journal of Interdisciplinary Research, Vol-2, ISSN: 2454-1362.

[8] M. F. Porter. An algorithm for suffix stripping. Program, 14(3):130–137, 1980.

[9] B. O'Connor, R. Balasubramanyan, B. R. Routledge, and N. A. Smith. From tweets to polls: Linking text sentiment to public opinion time series. In Proceedings of the Fourth International AAAI Conference on Weblogs and Social Media, 2010.

[10] K. T. Frantzi, S. Ananiadou, and J. Tsujii. The C-Value/NC-Value Method of Automatic Recognition for Multi-Word Terms. Proc. Second European Conf. Research and Advanced Technology for Digital Libraries (ECDL ’98), pp. 585-604, 1998.

[11] T. K. Ho. Fast identification of stop words for font learning and keyword spotting. In Proc. of Document Analysis and Recognition, Fifth International Conference on (ICDAR), IEEE, pp. 333-336, Sep. 1999.

[12] Murugesan, M. Sururthi, R. Pavitha Devi, S. Deepthi, V. Shri Lavanya, and Annie Princy. Automated Monitoring Suspicious Discussions on Online Forums Using Data Mining Statistical Corpus Based Approach. Imperial Journal of Interdisciplinary Research (IJIR), Vol-2, Issue-5, 2016.

[13] Harika Upgaganlawar, Nilesh Sambhe. Surveillance of Suspicious Discussions on Online Forums Using Text Mining. International Journal of Advances in Electronics and Computer Science, Volume 4, Issue-4, April 2017.

[14] Suhas Pandhe and Sahil Pawar. Algorithm to Monitor Suspicious on Social Networking Sites Using Data Mining Techniques. International Journal of Computer Applications, Volume 116, No. 12, April 2015.

[15] Javad Hosseinkhani, Mohammad Koochakazei, Solmaaz Keikhaee and Yahaya Hamedi Amin. Detecting Suspicion Information on Web Crime Using Crime Data Mining Techniques. International Journal of Advanced Computer Science and Information Technology (IJACSIT), Vol. 3, No. 1, 2014, pp. 32-41.

[16] Pushpa Rani, K., Jhansi, M., Chandrasekhara Reddy, T. Best keyword cover search using keyword-nne algorithm. International Journal of Mechanical Engineering and Technology, Volume 8, Issue 7, July 2017, pp. 37-43.

[17] Susmitha Valli Gogula, Karthik Rajendra. A Novel Approach For Noise Filtering From Diabetic Retinopathy Images Using Improved Pillar K-Means Algorithm. Journal of Fundamental and Applied Sciences, 10(6s).

[18] Lakshmi, L., Pushpa Rani, K., Purushotham Reddy, M. A comparative study of navigation techniques and information retrieval algorithms for web mining. International Journal of Advanced Trends in Computer Science and Engineering, 8(1.3):10-14, July 2019.

[19] Y. Md. Riyazuddin, G. Susmithavalli, G. Victor Daniel. An Approach For The Implementation Of Double Guard Method For Detecting The Ids In Web Applications. International Journal of Advanced Research in Management, Architecture, Technology and Engineering, Vol 4(6), 2018.

[20] G. Prabhakar Reddy, K. Sai Prasad, N. Chandra Shekar Reddy and R. Karthik. Privacy Preserving and Data Publishing using Tuple Grouping Algorithm. Journal of Engineering and Applied Sciences, 13:930-933, 2018.
