A Machine Learning Framework for Automated News Article Title Classification in Albanian
2024 International Conference on INnovations in Intelligent SysTems and Applications (INISTA) | 979-8-3503-6813-0/24/$31.00 ©2024 IEEE | DOI: 10.1109/INISTA62901.2024.10683815

Evis Plaku∗, Klei Jahaj†, Arben Cela‡, and Nikolla Civici§

∗ AI Laboratory, University Metropolitan Tirana
Sotir Kolea Street, Tirana, Albania
Email: [Link]@[Link]

† Faculty of Computer Science and IT, University Metropolitan Tirana
Sotir Kolea Street, Tirana, Albania
Email: [Link]@[Link]

‡ Laboratory of Images, Signals and Intelligent Systems, ESIEE Paris
Noisy-le-Grand CEDEX, Paris, France
Email: [Link]@[Link]

§ Faculty of Engineering, University Metropolitan Tirana
Sotir Kolea Street, Tirana, Albania
Email: ncivici@[Link]

Abstract—Automated news article classification is a method of categorizing textual data into predefined classes. Addressing this problem finds applications in diverse domains, including information retrieval, topic modeling, sentiment analysis and content recommendation systems. In Albanian, though there is a rapid increase of digital content, there is limited availability of text corpora, presenting significant obstacles for the advancement of natural language processing research and applications.

The contribution of this paper is twofold. First, we introduce a dataset consisting of 9600 news article titles spanning various categories. Second, we utilize this dataset to assess the effectiveness of several machine learning algorithms for topic classification. Experimental results demonstrate the efficacy of recurrent neural networks in comparison to simpler classifiers and ensemble methods.

Index Terms—news classification, machine learning, NLP

I. INTRODUCTION

Text classification is a central problem in natural language processing with applications in information retrieval and summarization, aggregation of news sources by topic, customer feedback segmentation, and content personalization [1]. Due to the vast volume of digital content, it is necessary to understand, examine and organize text [2]. This work focuses on the classification of news articles in the Albanian language. Currently available content is characterized by a small volume of information, a wide variety of lexical and grammatical structures, and a direct and formal writing style.

Recent advancements in machine learning, natural language processing (NLP) and large language models are playing a pivotal role in achieving high accuracy in text understanding, generation and classification [3]. Such methods leverage large training data and sophisticated algorithms to extract text patterns, semantic representations and an understanding of language nuances. Despite these advancements, classification of news article titles in Albanian presents several significant challenges [4]. Currently, the majority of available text corpora, which algorithms are trained on, are in English. For under-represented languages such as Albanian, the available text corpora are limited and small in size. An additional challenge is caused by the inherent ambiguity of news articles, which often fall under multiple categories. For example, an article discussing sporting events may also include elements of social or cultural significance, making it difficult to assign a single, definitive label. Moreover, the grammatical structure and writing style of Albanian text differ significantly from those of English, posing further obstacles for accurate classification. Collectively, these limitations affect the ability of machine learning algorithms to semantically understand text in low-resource languages [4], [5].

To address these challenges, we introduce a comprehensive dataset of 9600 news article titles in Albanian, covering six distinct categories. News titles are sourced from several newspapers, ensuring a balanced representation across categories such as politics, economy, sport, culture, lifestyle and current affairs. This labeled dataset provides a substantial body of news article titles available for training. We leverage the collected dataset to address the task of news article classification by examining the effectiveness of various machine learning models. We begin by testing traditional models such as logistic regression, support vector machines and decision trees, which serve as benchmarks for comparison. Additionally, we investigate the performance of ensemble learning methods like random forest and gradient boosting. Experimental results identify recurrent neural networks as the best performing model, able to capture sequential and contextual information in news headline data.
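The benchmark comparison described above amounts to choosing, among candidate classifiers, the one with the smallest empirical loss on labeled titles. A minimal pure-Python sketch with invented toy data (the titles, labels and rules below are illustrative only, not from the paper's dataset):

```python
# Toy illustration: pick the candidate classifier that minimizes
# the empirical 0-1 loss over a labeled set of (title, category) pairs.
# All data and rules here are invented for illustration.

titles = ["stock markets rally", "midfielder scores twice",
          "parliament passes budget", "cup final tonight"]
labels = ["economy", "sport", "politics", "sport"]

def rule_keyword(title):
    # Classify by a simple keyword lookup (illustrative only).
    if "scores" in title or "cup" in title:
        return "sport"
    if "parliament" in title:
        return "politics"
    return "economy"

def rule_majority(title):
    # Always predict the most frequent class in the training labels.
    return max(set(labels), key=labels.count)

def empirical_loss(f, xs, ys):
    # 0-1 loss: fraction of titles whose predicted category is wrong.
    return sum(f(x) != y for x, y in zip(xs, ys)) / len(xs)

# Argmin over a (tiny) hypothesis space of two candidate rules.
best = min([rule_keyword, rule_majority],
           key=lambda f: empirical_loss(f, titles, labels))
```

Real classifiers search much richer hypothesis spaces, but the selection criterion is the same.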
II. PROBLEM DEFINITION AND RELATED WORK

Let D be a dataset containing N news article titles, where each title x_i is associated with a category label y_i such that y_i ∈ {1, 2, ..., K}, with K being the total number of categories. Each news article title x_i is composed of a sequence of words represented as x_i = (w_i1, w_i2, ..., w_iN_i), where N_i is the number of words in title x_i. These words form the features used for classification.

The goal is to leverage the training data to learn a function f : X → Y, where X denotes the set of news article titles, while Y represents the set of corresponding true labels. Function f maps each title x_i to its corresponding category y_i, with the goal of accurately predicting the category of new, previously unseen titles. In other words, the objective is to identify the optimal function f* that minimizes a predefined loss function L(f(x_i), y_i) over the entire dataset D, such that

    f* = argmin_f  Σ_{i=1}^{N} L(f(x_i), y_i)    (1)

where N denotes the total number of news article titles, while L represents the loss incurred by the prediction of the model.

Various methods are employed to address this problem, from probabilistic models to recent deep neural architectures. Early contributions to topic modeling include methods such as Latent Semantic Indexing (LSI) [6] and Probabilistic Latent Semantic Indexing (PLSI) [7], aiming to discover hidden thematic structures within a body of text based on word usage statistics. While these approaches seek to discover latent topics without prior knowledge of categories, we use labeled training data to predict the category of previously unseen article titles. Traditional machine learning approaches such as logistic regression [8] model the likelihood of a given news article title belonging to a particular category. Support Vector Machines (SVM) [9] aim to find an optimal hyperplane that separates data points into distinct categories by creating decision boundaries that maximize the separation between classes. Decision trees maximize information gain and assign categories based on the majority of data points [10], while ensemble models such as random forests leverage multiple decision trees and use majority votes to improve prediction accuracy [11]. Other ensemble models such as gradient boosting build a series of decision trees, each aiming to correct and improve the performance of the previous ones [12]. In comparison, our work extends beyond linear relationships in input data by incorporating deep learning models, such as recurrent neural networks, which have demonstrated high performance in capturing complex patterns and non-linear relationships in the data.

News article topic classification becomes even more challenging when addressing low-resource languages where the available labeled data is limited and scarce. Classical supervised learning classifiers have been developed for several languages besides English, including Arabic [10], Polish [11], Italian [12] and German [13], among others. Deep learning models have also been utilized to address news article headline classification, including convolutional neural networks and recurrent neural networks [14]. We specifically address the classification of Albanian news article titles, demonstrating the effectiveness of deep learning methods for low-resource languages.

News article classification tasks with Albanian text corpora have also been addressed by previous research. A study comparing the effectiveness of various classifiers such as Multinomial Naïve Bayes, Logistic Regression, SVM, and others in terms of accuracy and execution time has shown that the Passive Aggressive algorithm achieves the highest accuracy, while Random Forest performs the poorest [15]. In other work, besides traditional classifiers, bag-of-words models and hierarchical classifiers are also employed, focused on semantic and syntactical similarities between words, resulting in models that achieve high accuracy in multi-label text classification [16]-[17]. In a more closely related work, a series of traditional and ensemble classification algorithms is employed on a relatively small dataset of Albanian news article headlines. Experimental results demonstrate that basic models outperform ensemble learning methods [18]. Though previous research has examined news article topic classification in Albanian relying mostly on traditional classification algorithms, our approach distinguishes itself with a large dataset of over 9000 news article titles. In addition, we also employ deep learning methods, such as recurrent neural networks, and achieve higher accuracy on news article topic classification.

III. PROPOSED METHODOLOGY

We employ an approach with three key modules. First, we construct a dataset by web scraping news article titles spanning several categories. Second, we process the dataset by tokenizing it and converting it to numerical tokens. Labels are also encoded into numerical values. Third, we build a variety of classification algorithms, from traditional approaches to ensemble and deep learning models, and train them with the objective of building a model that classifies previously unseen news article titles. Figure 1 provides an illustration.

[Figure 1 shows three modules: a Data Extraction Module (newspapers' online editions → web scraping → filtering of raw article headlines), a Data Processing Module (preprocessing and tokenization → label encoding → train/test/validation split), and a Classification Module (vectorize input features → train classification model → predict on unseen data → evaluate performance).]

Fig. 1. Schematic representation of key modules in our approach
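The data processing module above (preprocessing, tokenization, and integer encoding of tokens and labels) can be sketched as follows. This is a minimal stdlib-only illustration, not the authors' code, and the sample titles are invented:

```python
import re

# Minimal sketch of the data processing module: clean each title,
# tokenize it, and map tokens and labels to integer ids.
# Sample data is invented; the paper's own pipeline is not published here.

raw = [("Reforma zgjedhore: vota e emigranteve", "pol"),
       ("Rritja ekonomike: turizmi ne zhvillim", "eko")]

def preprocess(title):
    # Lowercase, strip punctuation/special symbols, split on whitespace.
    title = re.sub(r"[^\w\s]", " ", title.lower())
    return title.split()

# Build a vocabulary over all tokens; id 0 is reserved for padding.
vocab = {}
for title, _ in raw:
    for tok in preprocess(title):
        vocab.setdefault(tok, len(vocab) + 1)

# Encode category labels as integers.
label_ids = {lab: i for i, lab in enumerate(sorted({lab for _, lab in raw}))}

encoded = [([vocab[t] for t in preprocess(title)], label_ids[lab])
           for title, lab in raw]
```

Note that `\w` matches Unicode letters in Python 3, so Albanian characters such as ë survive the cleaning step.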
A. Data Preparation and Text Representation

We build a dataset of records by scraping the web for news article titles, covering six high-interest distinct categories, namely: politics, economy, current affairs, sport, culture and lifestyle. To ensure robustness of our approach, we include potentially overlapping categories, such as culture and lifestyle. An example is shown in Table I.

Once the data are assembled, to ensure uniform formatting, a preprocessing phase takes place to remove punctuation and special symbols, convert all data to lowercase and remove titles of insufficient length. We ensure a well balanced representation of each category across the entire dataset. A human supervisor checks the quality and correctness of the data to ensure the quality and relevance of the training data.

TABLE I
ILLUSTRATION OF DATA SAMPLES

  Text                                                    Topic
  Liverpool mposht Atalanten por eliminohet nga            spo
    Europa, Leverkusen mbetet e pathyeshme
  Reforma zgjedhore: vota e emigranteve, lehtesisht        pol
    e arritshme
  Disa arsye pse mund te jeni beqare, sipas shkences       lif
  Rritja ekonomike: Turizmi ne zhvillim, kete vit          eko
    presim mbi 5 miliarde euro
  Drejtonte urbanin ne gjendje te dehur                    kro
  RTSH publikon vendimin e jurise                          kul

Preprocessed, cleaned raw data are then transformed into an appropriate format that machine learning algorithms can be trained on. That involves tokenizing text, converting it to a numerical representation and adding padding to ensure all input sequences have the same length. Tokenization consists of splitting text into smaller units known as tokens; that is, news article titles are broken down into a series of individual words. The tokenized input is then converted into numerical representations, a form that can be understood and processed by classification algorithms. This process, known as text vectorization, aims to transform a series of tokens into numerical vectors. We leverage the Term Frequency - Inverse Document Frequency (TF-IDF) method [19], which assigns weights to words based on their relevance in a document (that is, a news article title) and across the entire dataset. This method combines term frequency, that is, word occurrence in a document, with inverse document frequency, a way to penalize terms that are common across all documents. The objective is to create a sparse matrix representation where values denote the importance of each word in the document relative to the entire dataset.

As is common in natural language processing tasks, uniformity in data size is required to effectively process text data as input. Since news article titles can vary in length, we use padding to achieve a constant length for all inputs. That is, we add a series of zeros to the beginning of sequences to ensure that all inputs have the same length. Clearly, by adding a series of zeros we do not affect the semantic meaning of data points, but rather ensure efficient computation and processing. Table II provides an example of a single news article title, in its original form and then its representation after tokenization, vectorization and padding. The objective is to convert raw text data into a format that is suitable to be utilized as input for machine learning algorithms.

TABLE II
ILLUSTRATION OF TEXT TOKENIZATION AND PADDING

  Original Text                       Vectorized and Padded Text
  tirana mposht pastër teutën dhe     [ 0 0 0 0 0 0 0 0 0 0 0 35 1
  rikthehet tek fitorja dopietë e       29 12 148 8 360 39 4 38 118
  florent hasanit                       363 32]

B. Recurrent Neural Networks for News Article Classification

When analysing text, humans adopt an almost instinctive principle of breaking down larger pieces of content into smaller, absorbable chunks, creating an internal model to remember the most relevant aspects that convey meaning and understanding. Recurrent neural networks (RNN) refer to a deep learning approach that, in order to predict an output, relies on information from prior inputs while maintaining a state of what the network has observed up to that point [20]. RNNs possess an internal memory that empowers them to make past information persist. This peculiar characteristic has made RNNs widely applicable in natural language processing tasks such as news article topic classification. In our context, when analysing a news article title, the memory unit of an RNN is utilized to maintain information about past words in the sequence. As new information comes in, the internal state is continuously updated, therefore establishing connections between past and present elements in the text.

However, as shown in [21], RNNs in their basic architecture can face significant challenges in properly assigning the correct weight (importance) to words that are distant from each other in the sequence. This is known as the vanishing gradient problem, where gradients for inputs that are too distant become extremely small during the training process, therefore preventing the model from learning well. To address this issue, a variant of recurrent neural networks known as Long Short-Term Memory (LSTM) units was proposed [22]. LSTMs are able to handle long-term dependencies well, because they preserve relevant information from earlier sequences and carry it forward in the network. These types of networks can even learn from events that have a significant time lag between them.

The particularity of a long short-term memory unit is the way that the next state of the carried information is computed. The LSTM can add or remove information from the cell state based on regulations imposed by some special structures, known as gates. The role of gates is to decide whether information passes through or not. Three distinct transformations are involved, described by three types of gates that control the flow of information in the cell state. Figure 2 provides an illustration.
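The three gate transformations referenced above are commonly written as follows. This is the standard LSTM formulation from the literature [22], not equations reproduced from this paper; W, U and b denote learned weights and biases, σ the sigmoid function, and ⊙ element-wise multiplication:

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) &&\text{(forget gate)}\\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) &&\text{(input gate)}\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) &&\text{(output gate)}\\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) &&\text{(candidate cell state)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t\\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

The sigmoid outputs of each gate play exactly the roles described in the text: values near zero discard information, values near one retain it.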
Fig. 2. Illustration of Recurrent Neural Network Gates. The figure depicts the architecture of a typical LSTM cell, featuring a cell state, an input gate, a forget gate, an output gate, and a candidate cell state.

The forget gate has to decide which information is relevant to keep from the prior cell state. It considers the input at a given timestep X_t and the hidden state h_{t-1} and applies a sigmoid function, which generates a result between zero and one. Values of zero denote information that is discarded, while values of one denote information that is remembered, with anything in between being partially remembered.

The role of the input gate is to identify the elements that need to be added to the cell state and the long-term memory of the network. The input gate decides which values will be updated. To achieve that, a sigmoid function is applied to the current input X_t and the hidden state h_{t-1}, transforming the values into the range from zero (not relevant) to one (relevant). The input gate is charged with the task of deciding what information is relevant to update in the current cell state. At this point, the network has enough information to calculate the cell state and it is ready to store the information in it.

The objective of the output gate is to decide what to output from the memory cell. It contains information on previous inputs and decides the value of the next hidden state.

1) Model Architecture: To address the news article topic classification problem, we propose a neural network architecture composed of several sequential layers, whose goal is to process vectorized text data and learn hidden relationships to perform classification of new, unseen article titles.

To map each word in the input news article title to a dense vector representation, we use an embedding layer, whose goal is to capture the semantic meaning of words based on their contextual usage in the dataset. Next, we connect the embedding layer to an LSTM layer consisting of 128 memory units. The goal is to leverage the LSTM layer to capture long-term dependencies in the input sequences, enabling the algorithm to gain the relevant context for topic understanding, and ultimately accurate predictions. Next, after the LSTM layer has processed the input sequences, our model uses a dense layer with 64 neuron units aiming to extract high-level features and abstract representations of the input data. This dense layer drives the classification process to discriminate between different news article topics. Finally, a dense layer with six units is used as the output layer, with the goal of performing the final classification, yielding the probability distribution that enables the model to predict the most likely category given a news article title.

2) Model Optimization: Once the architecture of the neural network is defined, the model is trained using the Adam optimizer, a well known enhancement of gradient descent, widely applied in neural network training. Adam is renowned for its robustness in dealing with sparse gradients and noisy data. Categorical crossentropy is chosen as the loss function for our multi-class classification task. It measures the discrepancy between the predicted probability distribution and the true distribution of the classes. Because neural network architectures tend to have a larger number of trainable parameters in comparison to more traditional approaches, we implement early stopping as a regularization technique to avoid overtraining and optimize model performance. In addition, we employ other regularization techniques such as Dropout and Batch Normalization to ensure that the model is robust enough to learn meaningful patterns and generalizes well to previously unseen data.

C. Traditional Classification Algorithms

In our work, we consider several traditional classification models to address the task of news article title classification, including simpler models such as logistic regression, support vector machines and decision trees, and ensemble models including random forest and gradient boosting.

Logistic regression is a simple, yet effective linear model whose objective is to estimate the probability of a data point belonging to a particular category [8]. Logistic regression uses the logistic function to identify the underlying relationship between the input features (that is, word embeddings) and category outputs. Logistic regression is computationally efficient, highly interpretable and has been demonstrated to work well, especially if classes are well separated from one another.

Support vector machines, on the other hand, identify a hyperplane to separate data points into distinct categories, maximizing the distance between categories [9]. In comparison with logistic regression, SVMs are better suited to handle high-dimensional data and capture complex non-linear relationships, while logistic regression inherently assumes a linear relationship between the input features and output categories.

Another traditional classification model we utilize is the decision tree [10]. The objective of decision trees, as non-parametric models, is to partition the input space into non-overlapping regions, recursively splitting this space at decision nodes represented by the most informative features. Decision trees aim to maximize information gain and assign categories based on the majority of data points within a region. Though highly interpretable, decision trees may suffer from poor generalization capabilities and be prone to overfitting.

To improve the performance of single decision trees, another common approach is to utilize random forests, an ensemble learning methodology that constructs multiple decision trees and makes a decision based on the majority vote [11]. In particular, each decision tree is trained on a random subset of the data, with the goal of introducing randomness and diversity in the ensemble.
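The architecture and training setup described in Sections III-B.1 and III-B.2 can be sketched as below. This is a plausible reconstruction, not the authors' code: the framework (TensorFlow/Keras), embedding dimension, dropout rate and early-stopping patience are all assumptions; only the layer sizes (128 LSTM units, 64 dense units, 6 outputs), the Adam optimizer, categorical crossentropy, Dropout and Batch Normalization come from the text.

```python
import tensorflow as tf
from tensorflow.keras import layers, models, callbacks

def build_classifier(vocab_size=10000, num_classes=6):
    # Embedding -> LSTM(128) -> Dense(64) -> softmax over six categories,
    # as described in Section III-B.1. Embedding dim 100 is an assumption.
    model = models.Sequential([
        layers.Embedding(input_dim=vocab_size, output_dim=100),
        layers.LSTM(128),                     # captures long-term dependencies
        layers.Dense(64, activation="relu"),  # high-level feature extraction
        layers.BatchNormalization(),          # regularization (Section III-B.2)
        layers.Dropout(0.5),                  # dropout rate is an assumption
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Early stopping as in Section III-B.2; the patience value is an assumption.
early_stop = callbacks.EarlyStopping(monitor="val_loss", patience=3,
                                     restore_best_weights=True)
# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           callbacks=[early_stop])
```

The softmax output layer yields the per-category probability distribution mentioned in the text, from which the most likely category is taken as the prediction.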
More concretely, a random forest model aggregates individual predictions to allow the model to improve its overall performance. Because random forests are ensemble models, they are less prone to overfitting in comparison with decision trees, and generally more robust.

Another ensemble learning method utilized in our approach is gradient boosting. The objective is to iteratively build a sequence of decision trees, each aiming to correct errors made by preceding models [12]. In the context of news article topic classification, the gradient boosting model focuses on the most informative parts of the input space, gradually improving predictive capabilities. Gradient boosting is able to capture nonlinear relationships in the input data, but is computationally expensive and sensitive to the choice of hyperparameters.

IV. EXPERIMENTS AND RESULTS

A. Data Preparation

We constructed a dataset consisting of 9600 news article titles, equally distributed among six categories of 1600 titles each. The included categories, namely politics, economy, current affairs, sport, culture and lifestyle, provide a diverse representation of news topics. A balanced distribution across categories helps to train on a diverse range of topics, while minimizing bias that might result from uneven class representation. However, classification of news article headlines into these categories is far from straightforward. Textual data are inherently ambiguous. Furthermore, several categories can overlap and the distinction between them might be blurred. For example, distinguishing cultural phenomena from lifestyle trends, or similarly, current affairs from political or economic events, is challenging due to subjectivity and overlap in information categorization.

To better understand the structure of the dataset, we conducted character and token length analysis. News article titles range from 19 to 143 characters, with approximately 73 characters on average. In addition, each title contains between 6 and 30 tokens, with an average of 13 tokens. That means that news article titles are short and concise, yet contain sufficient information for classification.

The entire dataset was constructed by web scraping major online editions of Albanian newspapers. Titles were annotated with their respective category through an automated process, whenever such information was clearly available on the web. A human supervisor labeled the rest of the data and double checked the accuracy of headlines categorized automatically, to ensure accuracy and correctness of the data.

B. Training Procedure and Data Splitting

The entire dataset comprising 9600 records was randomly divided into distinct subsets for training, validation and testing. In particular, following commonly established practices, 80% of the data was allocated for training and the remaining 20% for testing. To increase the robustness of the model and minimize the risk of overfitting, the training set was further divided, setting aside 25% of it for validation. Reproducibility of results across different runs is also ensured.

C. Performance of Classification Algorithms

A series of experiments was conducted to assess the performance of the various classification models. The accuracy score on test data is used as the evaluation measure. Note that accuracy is calculated as the proportion of instances classified correctly (i.e., true positives and true negatives) over the entire number of records. Figure 3 shows the performance of the various classification algorithms.

Fig. 3. Performance of classification algorithms, with the recurrent neural network achieving the highest accuracy and the decision tree the lowest.

The recurrent neural network architecture achieves the highest accuracy score of 90%. This performance can be attributed to several key factors. First, RNNs are well-suited to process sequential data such as text, as is the case in news article topic classification. RNN architectures are able to capture dependencies between words in a news article headline, leveraging their ability to remember information from previous tokens in the input. This capability empowers RNNs to extract the necessary context and meaning of news article titles by considering the entire headline when making predictions, and not only words in isolation. Moreover, RNNs are highly adaptable, and therefore such models are able to recognize well the subtle differences or nuances between news article titles belonging to different categories.

Besides RNNs, simpler and more traditional models such as logistic regression and support vector machines achieve a notably high accuracy score of 85%. Despite their simplicity, such models are effective when required to separate data points into distinct classes, especially when the data exhibit clear boundaries between categories. That is often the case for some of the chosen categories, such as sport, politics and economy. Further digging into the results shows that both models struggle more when the boundaries between categories (as is the case for culture and lifestyle) are blurred and often overlapping.

In contrast, ensemble models such as random forest and gradient boosting showed poorer performance in comparison, with accuracy scores of 77% and 75%. Though such models are generally less prone to overfitting, they might struggle to distinguish between nuances present in the text.
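The splitting procedure in Section IV-B (80/20 train/test, then 25% of the training portion held out for validation, giving a 60/20/20 split overall) can be sketched as follows. This is a stdlib-only illustration; the seed value is an assumption made to show the reproducibility requirement:

```python
import random

def split_dataset(records, seed=42):
    # 80% train / 20% test, then 25% of the training part for validation,
    # matching Section IV-B; a fixed seed keeps runs reproducible.
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_test = len(shuffled) // 5              # 20% held out for testing
    test, rest = shuffled[:n_test], shuffled[n_test:]
    n_val = len(rest) // 4                   # 25% of the remaining 80%
    val, train = rest[:n_val], rest[n_val:]
    return train, val, test

# For the paper's 9600 records this yields 5760 / 1920 / 1920
# (60% / 20% / 20%).
train, val, test = split_dataset(list(range(9600)))
```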
not be able to fully capture the overlapping nuances of various R EFERENCES
news article topic headlines, leading to lower accuracy in [1] A. Palanivinayagam, C. El-Bayeh, R. Damaševičius. “Twenty Years
comparison to RNNs and logistic regression. of Machine-Learning-Based Text Classification: A Systematic Review,”
Lastly, decision tree models achieved a low accuracy of Algorithms, 2023.
[2] A. Joshi, E. Fidalgo, E. Alegre, L. Fernández-Robles. “L DeepSumm:
only 58%. Decision trees segment the input feature space using Exploiting topic models and sequence to sequence networks for extrac-
decision nodes and categorize each news article headline based tive text summarization,” Expert Systems with Applications, 2023.
on a majority class. This procedure, however, tends to overfit [3] D. Khurana, A. Koli, K. Khatter, K, S. Singh, S. “Natural language
processing: State of the art, current trends and challenges,” Multimedia
the training data, resulting therefore in poor generalization tools and applications, 2023, vol 82, pp. 3713–3744.
capabilities, and consequently, low performance on previously [4] E. Çano, D. Lamaj, D. “AlbNews: A Corpus of Headlines for Topic
unseen test data. Modeling in Albanian, ” arXiv preprint arXiv:2402.04028, 2024.
D. Limitations

One limitation is the inherent ambiguity and overlap between some class categories. Exploring various cases of misclassification revealed that in several instances, even for a human supervisor, it would be difficult to assign one unique category to a given news article headline. For example, distinguishing between the culture and lifestyle categories is challenging, as both classes share common themes. Similarly, the current events, politics, and economics topics can be ambiguous, as all three categories may intersect with one another.
Another limitation of neural networks, despite their having achieved the highest performance, is that they are less explainable and interpretable, making it challenging to understand their underlying decision-making process. To address these limitations, as future work we will consider refining classification boundaries by integrating domain-specific knowledge. Furthermore, gradient-based attribution methods and human-in-the-loop techniques can help increase the interpretability of deep learning models.
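As a minimal sketch of how gradient-based attribution works, the toy example below scores input features by the gradient of the predicted-class logit; the one-hidden-layer network, its weights, and the input vector are hypothetical stand-ins (plain NumPy is used for brevity instead of a deep learning framework):

```python
# Hypothetical toy model, for illustration only: gradient-based attribution
# scores each input feature i by |d logit_c / d x_i|, i.e. how strongly the
# predicted-class score responds to that feature.
import numpy as np

rng = np.random.default_rng(0)
n_features, n_hidden = 6, 4

# Stand-in "trained" weights for a one-hidden-layer classifier (3 classes).
W1 = rng.normal(size=(n_hidden, n_features))
W2 = rng.normal(size=(3, n_hidden))

def saliency(x, target):
    """Absolute gradient of the target-class logit with respect to x."""
    h = np.tanh(W1 @ x)               # forward pass: hidden activations
    dh = W2[target] * (1.0 - h ** 2)  # backprop through the tanh nonlinearity
    return np.abs(dh @ W1)            # chain rule down to the input features

x = rng.normal(size=n_features)       # e.g. a TF-IDF vector of one headline
attributions = saliency(x, target=0)  # one importance score per input feature
```

Features with the largest scores are the ones the model relied on most for that prediction; for real RNN or transformer classifiers the same gradient is obtained automatically, e.g. via `torch.autograd.grad`.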
V. CONCLUSION

This paper presented an approach that utilizes machine learning classification algorithms to address the problem of categorizing Albanian news article headlines into predefined classes. A novel dataset consisting of 9600 records was meticulously constructed and leveraged to advance research in natural language processing for low-resource languages such as Albanian.

The findings of this work underscore the effectiveness of deep learning methods such as recurrent neural networks in automated news article classification. Moreover, a wide range of traditional machine learning approaches, such as logistic regression, support vector machines, and decision trees, along with ensemble methods including random forest and gradient boosting, were evaluated on the constructed dataset.
This work opens up several potential research directions. One approach is to further explore deep learning architectures through techniques such as attention mechanisms, or to leverage pre-trained large language models. Secondly, to overcome the inherent ambiguity between certain news article categories, we might explore more sophisticated models or human intervention to better capture subtle textual nuances. Thirdly, expanding the current dataset in terms of records, features, and categories could increase the generalization capabilities of our model.

Authorized licensed use limited to: National University Fast. Downloaded on November 01,2024 at [Link] UTC from IEEE Xplore. Restrictions apply.