
EVENT DETECTION USING MACHINE LEARNING TECHNIQUES


VASANTHARAN K [21ADR056]
PRETHISH G A [21ADR035]
SANKAR V [21ADR066]
Kongu Engineering College,
Perundurai, Erode

Dr. KOGILAVANI S. V.
Associate Professor (Sr. G) – Department of AI
Outline

➢ Problem Statement
➢ Objectives
➢ Introduction
➢ Literature Survey
➢ Methodology
➢ Performance Evaluation
➢ Conclusion
➢ References
Problem Statement

➢ Detecting and extracting significant events from a collection of news
articles manually is a time-consuming and challenging task.
➢ As the volume of news data continues to grow, there is an increasing
need for an automated system that can efficiently identify
noteworthy events.
➢ This problem is compounded by the unstructured nature of textual
data, requiring advanced techniques to sift through vast amounts of
information and pinpoint relevant events accurately.
Objectives
➢ Primary Objective:
To develop a robust machine learning-based system for automated event detection from
news articles using Count Vectorizer and TF-IDF feature extraction methods.

➢ Secondary Objectives:

1. To evaluate the performance of various machine learning algorithms, including Logistic
Regression, Gaussian Naïve Bayes, Random Forest, Multinomial Naïve Bayes, K-NN,
Decision Tree, and Support Vector Classifier, for event detection.

2. To compare the effectiveness of Count Vectorizer and TF-IDF in transforming raw text data
into numerical representations for event detection.

3. To identify the key challenges and limitations in the current approach and propose potential
enhancements for more accurate event detection.

➢ Tertiary Objectives:

1. To assess the impact of different hyperparameter configurations on the performance of the
chosen machine learning algorithms.

2. To investigate the scalability of the proposed system for handling large volumes of news
articles in real time.
Introduction
➢ Categorizing events makes it easier to find and read the
news articles we need.
➢ Without categorization, finding a desired article is
time-consuming.
➢ We have built a machine learning system that
categorizes these articles using the words that
represent a particular event.
➢ Events are detected using machine learning
classification algorithms.
➢ Random Forest performed best, with an accuracy of
98.43% and a precision of 0.98.
Literature Survey
| S.No | Title | Authors | Algorithm used |
|------|-------|---------|----------------|
| 1 | A Universal System for Extracting Main Events from News Articles | Felix Hamborg, Corinna Breitinger, Bela Gipp | Random Forests |
| 2 | Event detection from News in Indian Languages using linear SVC | Fazlourrahman Balouchzahi, H L Shashirekha | Space Syntax Analysis, Linear Regression Model, Spatial Regression Model, OLS Regression |
| 3 | Temporal Information retrieval and text classification | Shafiq Ur Rehman Khan, Muhammad Arshad Islam | Neural Networks, Random forests, IDW, and kriging |
| 4 | Detect news event using a label-based clustering approach | Hassan Sayyadi, Alireza Sahraei, Hassan Abolhassani | XGBoost Regression; Gradient Boost; Ensemble Learning |
Methodology
We use two datasets, BBC News Train and BBC News Test, to train
and test the ML model.
The training dataset consists of 1490 rows x 30 columns.
1. Data Pre-processing
The training data must be pre-processed before it can be fed to the
system. The category column of the training data takes the values
‘business’, ‘tech’, ‘politics’, ‘sport’ and ‘entertainment’. These
string values are label encoded to the integers 0 to 4 so that the
machine learning algorithms can be applied.
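The label encoding step can be sketched in plain Python as follows. This is a minimal illustration, not our exact code: the helper names `fit_label_encoder` and `encode` and the `sample` labels are assumptions made for the example, and the integers are assigned in alphabetical order of the category names.

```python
# Category names as listed in the pre-processing step.
categories = ["business", "tech", "politics", "sport", "entertainment"]


def fit_label_encoder(labels):
    """Map each distinct label to an integer, in alphabetical order."""
    return {label: index for index, label in enumerate(sorted(set(labels)))}


def encode(labels, mapping):
    """Replace every string label with its integer code."""
    return [mapping[label] for label in labels]


mapping = fit_label_encoder(categories)

# Hypothetical category column values, for illustration only.
sample = ["sport", "tech", "business", "sport"]
encoded = encode(sample, mapping)

print(mapping)
print(encoded)  # [3, 4, 0, 3]
```

Five categories give the codes 0 to 4. Keeping the mapping around makes it possible to translate predicted integers back to category names.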
2. Data Analysis
The data is analysed before further processing. The
number of articles in each category is visualized in
Fig. 1 and Fig. 2.
[Fig. 1 and Fig. 2: count of articles per category]
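Counting the articles per category, which is the quantity the figures visualize, can be sketched as below. The `labels` list is made-up illustration data, not the BBC News counts, and the text bar chart only stands in for a real plot.

```python
from collections import Counter

# Hypothetical category column; the real data set has 1490 rows.
labels = ["sport", "business", "sport", "tech", "politics", "sport", "business"]

# Count how many articles fall into each category.
counts = Counter(labels)

# Print a simple text bar chart, most frequent category first.
for category, n in counts.most_common():
    print(f"{category:<10} {n:>3} {'#' * n}")
```

A strongly unbalanced count would suggest looking at per-class metrics rather than accuracy alone.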

Tags, special characters and stop words are removed
because they add little meaning and are typically ignored
by search engines as well. All words are converted to lower
case and lemmatized to obtain a clean dataset for training
the model.
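The cleaning steps can be sketched as a single function. This is a simplified illustration under stated assumptions: `STOP_WORDS` is a tiny hand-picked set and `LEMMAS` is a toy lookup table standing in for a real lemmatizer, and the function name `clean_text` is hypothetical.

```python
import re

# Assumption: a tiny stop-word set for illustration; a real list is much longer.
STOP_WORDS = {"the", "a", "an", "is", "are", "was", "in", "on", "of", "and", "to", "for", "by"}

# Assumption: a toy lemma lookup standing in for a real lemmatizer.
LEMMAS = {"shares": "share", "rose": "rise", "markets": "market"}


def clean_text(text):
    """Strip tags and special characters, lower-case, drop stop words, lemmatize."""
    text = re.sub(r"<[^>]+>", " ", text)   # remove HTML-style tags
    text = text.lower()                    # convert to lower case
    text = re.sub(r"[^a-z\s]", " ", text)  # remove digits, punctuation, special characters
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return [LEMMAS.get(t, t) for t in tokens]


example = "<p>Shares ROSE by 5% in the Asian markets!</p>"
print(clean_text(example))  # ['share', 'rise', 'asian', 'market']
```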
3. Proposed Model
Performance Evaluation
We used Count Vectorizer and TF-IDF for feature extraction.
Of the two, TF-IDF gives the better accuracy and precision.
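The difference between the two feature extraction methods can be sketched on a toy corpus. This uses the textbook definition tf × log(N / df), which is an assumption for illustration; library implementations often add smoothing and normalization, so their numbers will differ. The `docs` corpus and the function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical pre-tokenized articles.
docs = [
    ["market", "share", "rise"],
    ["team", "win", "match"],
    ["market", "team", "deal"],
]

# Fixed column order shared by both representations.
vocabulary = sorted({term for doc in docs for term in doc})


def count_vector(doc):
    """Raw term counts, one entry per vocabulary term."""
    counts = Counter(doc)
    return [counts[term] for term in vocabulary]


def idf(term):
    """Inverse document frequency: terms found in fewer documents score higher."""
    df = sum(1 for doc in docs if term in doc)
    return math.log(len(docs) / df)


def tfidf_vector(doc):
    """Term frequency (count / document length) weighted by idf."""
    counts = Counter(doc)
    return [counts[term] / len(doc) * idf(term) for term in vocabulary]


count_matrix = [count_vector(d) for d in docs]
tfidf_matrix = [tfidf_vector(d) for d in docs]

print(vocabulary)
print(count_matrix[0])
print([round(w, 3) for w in tfidf_matrix[0]])
```

In the first document, "market" and "rise" both have a count of 1, but "rise" receives the larger TF-IDF weight because it appears in only one document while "market" appears in two.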
TF-IDF VECTORIZER:
1. Logistic Regression
2. Random Forest

3. Multinomial Naive Bayes


4. Support Vector Classifier

5. Decision Tree Classifier


6. K Nearest Neighbour

7. Gaussian Naive Bayes
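To illustrate how one of the listed classifiers works on word-count features, here is a minimal from-scratch sketch of Multinomial Naive Bayes with Laplace smoothing. It is not the implementation evaluated in these slides; the training data, the function names and the `alpha` parameter are assumptions for the example.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit class priors and Laplace-smoothed word likelihoods from token lists."""
    vocabulary = {t for doc in docs for t in doc}
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc)

    model = {}
    for label, n_docs in class_docs.items():
        total = sum(word_counts[label].values())
        denom = total + alpha * len(vocabulary)
        model[label] = {
            "log_prior": math.log(n_docs / len(docs)),
            "log_likelihood": {
                t: math.log((word_counts[label][t] + alpha) / denom)
                for t in vocabulary
            },
        }
    return model


def predict(model, doc):
    """Return the class with the highest log posterior; unseen words are skipped."""
    def score(label):
        params = model[label]
        known = (t for t in doc if t in params["log_likelihood"])
        return params["log_prior"] + sum(params["log_likelihood"][t] for t in known)
    return max(model, key=score)


# Hypothetical training articles, already cleaned and tokenized.
train_docs = [
    ["market", "share", "profit"],
    ["bank", "profit", "deal"],
    ["team", "match", "goal"],
    ["match", "win", "team"],
]
train_labels = ["business", "business", "sport", "sport"]

model = train_multinomial_nb(train_docs, train_labels)
print(predict(model, ["profit", "market"]))  # business
print(predict(model, ["team", "goal"]))      # sport
```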


REPORT OF ACCURACY
TF-IDF:

COUNT VECTORIZER:
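The two reported metrics, accuracy and precision, can be computed as sketched below. The slides do not state how precision is averaged over the five categories, so macro averaging is an assumption here; `y_true` and `y_pred` are made-up labels, not our results.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def precision_per_class(y_true, y_pred):
    """For each class: of the items predicted as that class, how many were right."""
    result = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = [t for t, p in zip(y_true, y_pred) if p == label]
        if predicted:
            result[label] = sum(t == label for t in predicted) / len(predicted)
        else:
            result[label] = 0.0  # class never predicted
    return result


def macro_precision(y_true, y_pred):
    """Unweighted mean of the per-class precision values."""
    per_class = precision_per_class(y_true, y_pred)
    return sum(per_class.values()) / len(per_class)


# Hypothetical encoded labels (0 to 4) for eight test articles.
y_true = [0, 0, 1, 1, 2, 2, 3, 4]
y_pred = [0, 1, 1, 1, 2, 2, 3, 4]

print(accuracy(y_true, y_pred))                   # 0.875
print(round(macro_precision(y_true, y_pred), 4))  # 0.9333
```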
Conclusion
➢ Machine learning with Count Vectorizer and TF-IDF feature
extraction proves robust for event detection in news articles.
➢ Count Vectorizer captures word frequencies, while TF-IDF
emphasizes rare yet significant terms.
➢ Algorithms such as Logistic Regression and Random Forest
process the data efficiently, with Random Forest performing
best on TF-IDF features.
➢ The results affirm the superiority of the Random Forest model
on both training and test data.
References
[1] Hamborg, F., Breitinger, C., & Gipp, B. (2019). Giveme5W1H: A universal system for
extracting main events from news articles. arXiv preprint arXiv:1909.02766.
[2] Balouchzahi, F., & Shashirekha, H. L. (2020, December). An Approach for Event
Detection from News in Indian Languages using Linear SVC. In FIRE (Working
Notes) (pp. 829-834).
[3] Khan, S. U. R., & Islam, M. A. (2019). Event-Dataset:
Temporal information retrieval and text classification dataset. Data in Brief, 25,
104048.
[4] Toda, H., & Kataoka, R. (2005, November). A search result clustering method
using informatively named entities. In Proceedings of the 7th annual ACM international
workshop on Web information and data management (pp. 81-86).
[5] L. Hu, B. Zhang, L. Hou, J. Li, Adaptive online event detection in news streams,
Knowledge-Based Systems 138 (2017) 105-112.
[6] J. Weng, B.-S. Lee, Event detection in Twitter, ICWSM 11 (2011) 401-408.
[7] Khodra, M.L. 2015. Event extraction on Indonesian news article using multiclass
categorization. ICAICTA 2015 - 2015 International Conference on Advanced Informatics:
Concepts, Theory and Applications (2015).
[8] Lejeune, G. et al. 2015. Multilingual event extraction for epidemic detection.
Artificial Intelligence in Medicine. (2015).
[9] R. Campos, G. Dias, A.M. Jorge, A. Jatowt, Survey of temporal information
retrieval and related applications, ACM Comput. Surv. 47 (2) (2014) 15.
[10] P.K. Choubey, K. Raju, R. Huang, Identifying the most dominant event in a news
article by mining event coreference relations, in: Proceedings of the 2018 Conference of
the North American Chapter of the Association for Computational Linguistics: Human
Language Technologies, vol. 2 (Short Papers), 2018, pp. 340-345.
[11] S. Upadhyay, C. Christodoulopoulos, D. Roth, 'Making the news': identifying
noteworthy events in news articles, in: Proceedings of the Fourth Workshop on Events,
2016, pp. 1-7.
[12] A. Jatowt, C. Man, A. Yeung, K. Tanaka, Generic method for detecting focus time
of documents, Inf. Process. Manag. 51 (6) (2015) 851-868.

May 8, 2024
THANK YOU
