IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, VOL. 8, NO. 5, OCTOBER 2021
Abstract— Crop cultivation prediction is an integral part of agriculture and is primarily based on factors such as soil, environmental features like rainfall and temperature, and the quantum of fertilizer used, particularly nitrogen and phosphorus. These factors, however, vary from region to region; consequently, farmers are unable to cultivate the same crops in every region. This is where machine learning (ML) techniques step in to help find the most suitable crops for a particular region, thus assisting farmers a great deal in crop prediction. The feature selection (FS) facet of ML is a major component in the selection of key features for a particular region and keeps the crop prediction process constantly updated. This work proposes a novel FS approach called modified recursive feature elimination (MRFE) to select appropriate features from a data set for crop prediction. The proposed MRFE technique selects and ranks salient features using a ranking method. The experimental results show that the MRFE method selects the most accurate features, while the bagging technique helps accurately predict a suitable crop. The performance of the proposed MRFE technique is evaluated by various metrics such as accuracy (ACC), precision, recall, specificity, F1 score, area under the curve, mean absolute error, and log loss. The performance analysis shows that the MRFE technique, with 95% ACC, performs better than other FS methods.

Index Terms— Agriculture, classification, crop prediction, feature selection (FS), modified recursive feature elimination (MRFE).

I. INTRODUCTION

AGRICULTURAL research has strengthened the economy worldwide, and is an area that offers humanity, as a whole, inestimable benefits. Crop prediction in agriculture continues to present difficulties, notwithstanding current developments that include the use of an array of technological resources, tools, and procedures. Agri-technology and precision farming, now termed virtual farming, have emerged as new scientific areas of interest that use data-intensive methods to boost agricultural productivity and reduce the impact on the environment. Accurately identifying crops for cultivation, based on soil and environmental factors, is critical to agricultural productivity and has been an active research topic for decades. Most of the existing approaches use machine learning (ML) for crop yield estimation, though very little has been done to predict region-specific crops based on soil and environmental parameters. Parameters such as soil type, nutrients (nitrogen, phosphorus, and potassium), micronutrients (iron, boron, and manganese), temperature, and rainfall influence crop cultivation. Since the parameters differ for every zone, thus making for a massive crop prediction data set, there is a need to select crucial features that help identify suitable crops for specific areas of land. The process is carried out using feature selection (FS) techniques.

ML algorithms play a major role in prediction. For enhanced ML performance, FS techniques [1]–[6] are used to reduce overfitting and ascertain salient features from the data set for the prediction process. FS techniques are divided into three categories: filter [7], wrapper [8], and embedded [9]. Filter methods are independent of the performance of the classifier, whereas wrapper methods select features based on it. The embedded method, which combines the filter and wrapper methods, is somewhat similar to the latter. This work pays special attention to wrapper FS techniques. The features selected are fed to the k-nearest neighbor (kNN), Naive Bayes (NB), decision tree (DT), support vector machine (SVM), random forest (RF), and bagging classifiers to predict a suitable crop and to evaluate the performance of the FS process. The objective of this work is to select key features from a data set and improve crop prediction performance. The main contribution of this work is a novel modified recursive feature elimination (MRFE) technique that selects the most appropriate key features using a permuted crop data set based on soil and environmental factors. With the permuted data set, the algorithm need not be updated with the data set at each iteration, which reduces the computational time compared with the existing RFE method.

Manuscript received January 28, 2021; revised March 31, 2021; accepted April 17, 2021. Date of publication May 5, 2021; date of current version September 30, 2021. (Corresponding author: G. Mariammal.)
G. Mariammal and A. Suruliandi are with the Department of Computer Science and Engineering, Manonmaniam Sundaranar University, Tirunelveli 627012, India (e-mail: suba.g1212@gmail.com; suruliandi@yahoo.com).
S. P. Raja is with the School of Computer Science and Engineering, Vellore Institute of Technology, Vellore 632014, India (e-mail: avemariaraja@gmail.com).
E. Poongothai is with the Department of Computer Science and Engineering, SRM University, Chennai 603203, India (e-mail: poongothai.rp@gmail.com).
Digital Object Identifier 10.1109/TCSS.2021.3074534

A. Related Work

Several studies on FS that have been undertaken for improved prediction are discussed in this section.
2329-924X © 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: NATIONAL INSTITUTE OF TECHNOLOGY WARANGAL. Downloaded on November 15,2022 at 06:03:48 UTC from IEEE Xplore. Restrictions apply.
MARIAMMAL et al.: PREDICTION OF LAND SUITABILITY FOR CROP CULTIVATION BASED ON SOIL AND ENVIRONMENTAL CHARACTERISTICS 1133
A. Existing FS Techniques

1) Sequential Forward Feature Selection: Sequential feature selection (SFS) is a wrapper-based FS technique. The algorithm has two variants, sequential forward feature selection (SFFS) and sequential backward feature selection (SBFS). This work uses the SFFS for the FS process, the working of which is given in [29]. It starts with an empty set, selects important features from the data set, and repeats the process until every important feature is selected. The SFFS algorithm is based on the Akaike information criterion (AIC) value for FS [30].

2) Boruta: The Boruta algorithm is a wrapper FS technique built around the RF classification algorithm. The advantage of RF classification is that it runs quickly and, in addition, estimates the importance of features [16]. The results provide a Z score. In Boruta, the Z score has a great impact on the FS technique. The pseudo code of the Boruta algorithm is given in [31].

3) Recursive Feature Elimination: RFE is the most frequently used wrapper FS technique. RFE starts with the whole data set and removes its weak features using a ranking method. It then updates the data set and continues the process until all the weak features are eliminated. In RFE, the Gini importance ranking method is used for feature elimination. The pseudo code for the RFE technique is given in Algorithm 1.

Algorithm 1 The Pseudo Code of the RFE Algorithm [13]
Inputs:
  Training set T
  Set of p features F = {f1, ..., fp}
  Ranking method RM(T, F)
Output:
  Final ranking R
Code:
  Repeat for i in (1:p)
    Rank set F using RM(T, F)
    f* ← last ranked feature in F
    R(p − i + 1) ← f*
    F ← F − f*

B. Proposed FS Technique

1) Modified Recursive Feature Elimination: The proposed MRFE technique removes weak features from the data set using a permuted data set and a ranking method. The permutation process duplicates the crop data set fed as input and shuffles the values in each field. Fig. 2 shows the process of the MRFE technique.

Fig. 2. Flow diagram of MRFE process.

Step 1: Initiating the Permutation Process:
1) The given input crop data set is considered an n × m matrix, where n represents the records of each area and m represents the features.

Example: Given matrix
  1 4 7
  2 5 8
  3 6 9
Shuffled matrix
  4 1 7
  8 5 2
  3 6 9

This process does not affect the feature values. A data set that contains n × m records shows no change following the application of the permutation.

2) The shuffled data set is then combined with the input data set, i.e., the crop data set. In the example below, the given matrix is combined with the shuffled matrix.

Combined matrix
  1 4 7 4 1 7
  2 5 8 8 5 2
  3 6 9 3 6 9

3) Finally, the combined crop data set is merged with the original data set, and the shuffling process is applied to obtain the extended data set, also known as the permuted crop data set. This data set is then used to find the importance of the features. In the example below, the given matrix is extended with the help of the combined matrix.

Extended matrix
  3 4 9 4 7 1 8 5 9
  1 6 7 5 2 8 4 6 2
  2 5 8 6 9 3 3 1 7

Extending the data set results in a drop in the standard deviation value, indicating that the values are close to the mean. The permutation process offers two distinct advantages. The first is its ability to standardize the coefficients of the variables to help the ranking process eliminate weak features from the data set. The second is that the model needs no retraining, forward or backward, thus making it faster than the existing RFE technique.

Step 2: Finding the Most Important Features:
The RF classifier is used to discover the most important features as well as the mean decrease value that helps find the Z score. The extended crop data set is fed into the RF classifier to find the most important features. The two main parameters of the RF classifier are as follows.
1) mtry: This refers to the number of variables used at each split, and is called the mtry parameter. The recommended value for mtry is the square root of the number of features.
2) ntree: This refers to the number of trees, called the ntree parameter, which decides the splitting range of trees in the forest; the default ntree used in the RF classifier is 100.

Step 3: Finding the Z Score:
The Z score is the standard score used to compare the importance of the selected features. To fine-tune the performance of the RF classifier and evaluate it in this work, the ntree value is varied from 100 to 500. The basic Z score formula is given as follows:

  Z score = (x − μ) ÷ σ

where x represents the observed value (the mean decrease in accuracy), μ the mean value of the samples, and σ the standard deviation of the samples.

Step 4: Applying the Ranking Method:
Finally, a ranking method is applied to find the weak soil and environmental features in the data set. Several ranking methods [32], [33] are used for FS. This work evaluates the performance of rank aggregation [34], Gini importance [27], PIMP [15], and actual impurity reduction importance (AIRIMP) [35] to find the best ranking method for FS so as to refine the crop prediction process. The AIRIMP ranking method outperforms the others and is discussed below in the section on results. Hence, it is used in the proposed MRFE FS technique to rank every feature, from the best to the worst.

2) Algorithm for MRFE: The pseudo code for the proposed MRFE technique is given in Algorithm 2.

Algorithm 2 The Pseudo Code for the Proposed MRFE Technique
Input: D ← data set, F ← features f1, f2, ..., fp, RM ← ranking method
Output: F1 ← selected features
Let a be the entire crop data set
Let b be the shuffled crop data set
Let c be the combination of the crop data set and the shuffled crop data set
Let d be the extended crop data set
Begin
  Initialization: F = f1, f2, ..., fp; F1 = ∅
  The algorithm starts with the whole data set D
  a ← D
  b ← shuffle(a)
  c ← cbind(a, b)
  d ← shuffle(cbind(a, c))
  Apply RF to d to find the importance of the features
  Zscore ← RF(d)
  Apply the ranking method after predicting the Z score for all weak features
  repeat for 1 to p
    R1 ← RM(d, F)
    W1 ← calculate the weak features from R1
    if R1 ≤ W1
      remove weak features from F
    else if R1 > W1
      return F1
    end if
  end for
  Repeat this procedure until the termination condition is satisfied
  Terminate: if all weak features are eliminated from the data set, stop the process
End

III. CLASSIFICATION TECHNIQUE

Classification is the learning process used in ML to predict the target class of a given input. Classification techniques are divided into two categories, supervised and unsupervised. In this work, supervised learning methods such as the kNN, NB, DT, SVM, RF, and bagging are used for the crop prediction process. In addition, they help evaluate the performance of the FS technique.

A. k-Nearest Neighbor

The kNN is a supervised learning process that predicts a suitable crop based on the closest training samples, and is centered on a distance measurement for the prediction process [14]. Using the distance measurement, a new sample from the testing set is allocated to a particular target class, based on how closely it matches the training set.

C. Decision Tree

The DT is a supervised learning model with a tree-like structure. Each internal node is labeled with an input feature [25] and follows a top-down approach. Each leaf node is labeled with the class used to predict the target variable [25]. For the DT, which holds the prediction class, tree splitting is important. Using the splitting, data values from the testing set are used to identify a suitable crop.
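The MRFE procedure above (build a permuted "shadow" copy of the data, score features with an RF, compare each feature's importance against the shuffled baseline via a Z-like score, and eliminate the weak ones) can be sketched in Python. This is an illustrative reading of the method, not the authors' implementation: the synthetic data, the use of scikit-learn's RandomForestClassifier, and the stopping rule are all assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Toy stand-in for the crop data set (sizes are illustrative).
X, y = make_classification(n_samples=300, n_features=8,
                           n_informative=4, random_state=0)

def shuffle_columns(a, rng):
    """Permute each column independently (the paper's 'shuffled' matrix)."""
    b = a.copy()
    for j in range(b.shape[1]):
        rng.shuffle(b[:, j])  # in-place shuffle of one column
    return b

features = list(range(X.shape[1]))
while True:
    Xc = X[:, features]
    # Step 1: the permuted/extended data set -- original columns side by
    # side with their shuffled copies (cbind in the paper's notation).
    extended = np.hstack([Xc, shuffle_columns(Xc, rng)])
    # Step 2: RF importances on the extended data set (ntree-style parameter).
    rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(extended, y)
    imp = rf.feature_importances_
    real, shadow = imp[:len(features)], imp[len(features):]
    # Step 3: a Z-like score -- how far each real importance sits above the
    # distribution of importances of the shuffled (uninformative) columns.
    z = (real - shadow.mean()) / (shadow.std() + 1e-12)
    # Step 4: rank and drop the weakest feature until every remaining
    # feature clears the shuffled-copy baseline (assumed stopping rule).
    weakest = int(np.argmin(z))
    if z[weakest] > 0 or len(features) == 1:
        break
    features.pop(weakest)

print("selected features:", features)
```

Because the shuffled copies are refreshed from the current feature subset at each pass, no separately retrained forward/backward model is needed, which is the speed advantage the text claims over plain RFE.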
F. Bagging

Bagging, also known as bootstrap aggregating, is an ensemble meta-algorithm used to improve the stability and ACC of ML algorithms [28]. It combines weak learners for the prediction process. The samples taken from the training data set are used to train the classifier to predict the most suitable crop for cultivation. The classifier then takes votes for each sample, and this is used to improve the prediction level. Bagging is a special case of the model averaging approach [28] and eliminates the need for weight updates. The pseudo code for the bagging technique is given in Algorithm 3.

Algorithm 3 The Pseudo Code [38] for the Bagging Algorithm
Input: D → data set, T → training set of examples of size N, k → number of bootstrap samples, LA → learning algorithm
Output: C* → bagging ensemble with k component classifiers
Learning phase:
  for i = 1 to k do
    Si ← bootstrap sample from T
    Generate classifier Ci ← LA(Si)
  end for
Predicting the class label for a new instance x:
  C*(x) = arg max_y Σ_{i=1}^{k} [Ci(x) = y]

B. Testing Phase

The remaining subset of the data set is used to validate the classifier in the testing phase.
Step 8: The remaining 30% of the samples from the reduced data set are taken as testing samples.
Step 9: The learned classifier is applied to the testing samples for the prediction process.
Step 10: The learned classifier finds the target class for the new given input and predicts the most suitable crop for a specific land area.
Step 11: From the results, the most suitable crop for cultivation is recommended.
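Algorithm 3 has two phases: train one component classifier per bootstrap sample, then take a majority vote for a new instance. A minimal Python sketch follows; the synthetic data, the choice of decision trees as the learning algorithm LA, and k = 25 are illustrative assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)

# Illustrative data standing in for the crop training set T.
X, y = make_classification(n_samples=200, n_features=6, random_state=1)

k = 25          # number of bootstrap samples (assumed value)
ensemble = []

# Learning phase: one component classifier C_i per bootstrap sample S_i.
for i in range(k):
    idx = rng.integers(0, len(X), size=len(X))   # S_i: sample T with replacement
    clf = DecisionTreeClassifier(random_state=i).fit(X[idx], y[idx])
    ensemble.append(clf)

def bagging_predict(x):
    """C*(x) = argmax_y sum_i [C_i(x) = y]: majority vote of the ensemble."""
    votes = np.array([clf.predict(x.reshape(1, -1))[0] for clf in ensemble])
    return int(np.bincount(votes).argmax())

pred = bagging_predict(X[0])
```

Each tree is a comparatively weak, high-variance learner; averaging their votes is what stabilizes the prediction, which is why no per-sample weight updates (as in boosting) are needed.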
absolute error (MAE), area under the curve (AUC), log loss
(LL), and out of bag (OOB).
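Most of the metrics listed (ACC, precision, recall, specificity, F1 score, AUC, MAE, and LL) are available directly in scikit-learn; a small sketch with made-up labels and predicted probabilities (the values are purely illustrative):

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, mean_absolute_error,
                             log_loss)

# Hypothetical ground truth and classifier outputs for a binary problem.
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 0, 1])
y_prob = np.array([.9, .2, .8, .4, .1, .7, .3, .6])  # P(class 1)

acc  = accuracy_score(y_true, y_pred)
prec = precision_score(y_true, y_pred)
rec  = recall_score(y_true, y_pred)
# Specificity is the recall of the negative class.
spec = recall_score(y_true, y_pred, pos_label=0)
f1   = f1_score(y_true, y_pred)
auc  = roc_auc_score(y_true, y_prob)
mae  = mean_absolute_error(y_true, y_prob)
ll   = log_loss(y_true, y_prob)
```

The OOB error, by contrast, comes from the RF model itself (e.g., `1 - oob_score_` when the classifier is fitted with `oob_score=True`) rather than from a metric function.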
TABLE IV: Performance Evaluation of the Modified RFE Varying RF Parameter ntree
TABLE V: Performance Evaluation of the Modified RFE Varying RF Parameter mtry
TABLE VI: Performance Evaluation of the Modified RFE Varying Ranking Method
TABLE VII: Performance Evaluation of the Modified RFE Using Various Classifier Techniques
TABLE VIII: Performance Evaluation of Various FS Techniques With the Bagging Classifier

Tables IV and V describe the performance of the proposed MRFE FS technique when ntree and mtry are varied.

Table IV depicts the fine-tuned performance of the RF classifier. The result is used to select the number of trees to be generated, ranging from 100 to 500, to find the most important features. The range chosen is based primarily on the OOB error rate. It is evident from Table IV that ntree = 400 minimizes the OOB error rate and makes better predictions. Hence, the RF performs better in the splitting range of 400. Furthermore, a lower LL value offers better predictions, and the results show that the ntree splitting range of 400 shows a lower LL value and a better AUC rate, compared with the other splitting ranges.

Table V illustrates the fine-tuned performance of the RF classifier while varying the parameter mtry, which sets the number of attributes considered at each split of the RF model. From the result, it is observed that the RF classifier performs better for mtry = 4 than for other values.

2) Performance Evaluation of the Modified RFE Varying Ranking Method: The ranking method ranks features in order, from the best to the worst. It eliminates irrelevant features in the process, greatly impacting the FS process. Table VI illustrates the performance of the MRFE FS technique using ranking methods such as rank aggregation, Gini importance, PIMP, and AIRIMP.

The ranking methods mentioned above play a major role in FS techniques. These methods help order features in terms of importance and eliminate weak ones from the data set. It is observed from Table VI that the AIRIMP method provides better results than the others, involving less time and bias. The rank aggregation method combines different ranking orders to obtain a better ranking; since it combines different ranking orders to find feature importance, it is more complicated than the other methods. The PIMP results are better than those of the Gini, though its running time is slower than that of the Gini and AIRIMP. The AIRIMP ranking method, which performs best of all, is used for feature ranking in the proposed technique.

3) Performance Evaluation of the Modified RFE Technique Using Various Classifiers: This evaluation identifies the best classifier to predict a suitable crop for cultivation. Table VII shows the performance of the MRFE technique using classifiers such as the kNN, NB, DT, SVM, RF, and bagging.

Table VII shows that the MRFE technique produces a better prediction rate with the bagging classifier. Furthermore, it has a lower LL value, indicative of higher prediction ACC. Bagging combines several weak learners for the prediction process, takes votes for each sample, and aggregates them. It is on this basis that the bagging method accurately classifies crops, for particular stretches of land, better than the other classification techniques.

4) Performance Evaluation of Various Feature Selection Techniques With the Bagging Classifier: Table VIII presents a
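The ntree/mtry grid search behind Tables IV and V can be approximated in scikit-learn, where `n_estimators` and `max_features` play the roles of ntree and mtry and `oob_score_` exposes the OOB estimate. The data and the grid below are illustrative, not the paper's; a sketch of selecting the setting with the lowest OOB error rate:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Illustrative stand-in for the crop data set.
X, y = make_classification(n_samples=300, n_features=10,
                           n_informative=5, random_state=0)

best = None
for n_trees in (100, 300, 500):          # ntree analogue (paper scans 100..500)
    for mtry in (2, 3, 4):               # mtry analogue, near sqrt(#features)
        rf = RandomForestClassifier(n_estimators=n_trees, max_features=mtry,
                                    oob_score=True, random_state=0).fit(X, y)
        oob_error = 1.0 - rf.oob_score_  # OOB error rate, as in Table IV
        if best is None or oob_error < best[0]:
            best = (oob_error, n_trees, mtry)

print("lowest OOB error %.3f at ntree=%d, mtry=%d" % best)
```

The OOB estimate comes for free from the bootstrap, so no separate validation split is needed for this tuning step.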
TABLE IX: Performance Evaluation of MRFE With the Bagging Classifier Based on Soil Factors
TABLE X: Performance Evaluation of MRFE With the Bagging Classifier Based on Environmental Factors
TABLE XIV: Performance Evaluation of MRFE With Other FS Techniques Using the Bagging Classifier for Benchmark Data Sets
used to examine the FS technique with the classifier for the multiclass problem. Table XII indicates that the MRFE with all the classifiers performed well for the class chickpeas, but poorly for the class maize. The results show that the MRFE with the bagging technique offers higher ACC than the other classifiers.

9) Performance Evaluation of MRFE With Other FS Techniques Using the Bagging Classifier for Other Benchmark Data Sets: To ensure the relevance of the proposed MRFE technique to data sets other than crop-related ones, miscellaneous data sets were downloaded from the UCI Repository. Table XIII describes the three data sets used to examine the MRFE technique, showing that the technique is suitable for all kinds of data sets. Table XIV displays the performance of the MRFE wrapper FS technique as against the others, and analyzes its suitability to diverse benchmark data sets.

Table XIV reveals that the proposed MRFE technique is as relevant to crop data sets as it is to others. Owing to the use of permutation and ranking, the MRFE selects the most appropriate features with improved prediction ACC in the least time, compared with the other techniques.

VI. CONCLUSION

Predicting a suitable crop for cultivation is critical to agriculture. In this work, the MRFE, a novel approach, has been proposed for selecting salient features using a permuted crop data set and a ranking method to identify the most suitable crop for a particular region. Experiments were conducted to evaluate the efficiency of the proposed MRFE technique using the kNN, NB, DT, SVM, RF, and bagging classification techniques to predict the most suitable crops for cultivation. Soil and environmental factors were considered for an analysis of the crop prediction process. The results indicate that the MRFE with the bagging classifier gives better crop prediction ACC than the MRFE with other classifiers. The performance of the MRFE technique for the crop data set was assessed and compared with existing techniques like the SFFS, Boruta, and RFE. Furthermore, the suitability of the proposed MRFE technique was evaluated using three benchmark data sets. The results show that the proposed MRFE technique outperforms the others. Nevertheless, the MRFE technique needs performance-wise improvements before it can be used on large feature data sets.

ACKNOWLEDGMENT

The authors would like to thank the Department of Agriculture, Sankarankovil Taluk, Tenkasi, India, for providing data for the analysis.

REFERENCES

[1] A. Mark Hall, "Feature selection for discrete and numeric class machine learning," Comput. Sci., Univ. Waikato, pp. 359–366, Dec. 1999.
[2] I. Guyon and A. Elisseeff, "An introduction to variable and feature selection," J. Mach. Learn. Res., vol. 3, pp. 1157–1182, Jan. 2003.
[3] R. Gilad-Bachrach, A. Navot, and N. Tishby, "Margin based feature selection: Theory and algorithms," in Proc. 21st Int. Conf. Mach. Learn. (ICML), 2004, p. 43.
[4] P. S. Maya Gopal and R. Bhargavi, "Feature selection for yield prediction in Boruta algorithm," Int. J. Pure Appl. Math., vol. 118, no. 22, pp. 139–144, 2018.
[5] S. Ji, S. Pan, X. Li, E. Cambria, G. Long, and Z. Huang, "Suicidal ideation detection: A review of machine learning methods and applications," IEEE Trans. Comput. Social Syst., vol. 8, no. 1, pp. 214–226, Feb. 2021.
[6] K. Ranjini, A. Suruliandi, and S. P. Raja, "An ensemble of heterogeneous incremental classifiers for assisted reproductive technology outcome prediction," IEEE Trans. Comput. Social Syst., early access, Nov. 3, 2020, doi: 10.1109/TCSS.2020.3032640.
[7] H. Liu and R. Setiono, "A probabilistic approach to feature selection: A filter solution," in Proc. 13th Int. Conf. Mach. Learn., vol. 96, 1996, pp. 319–327.
[8] R. Kohavi and G. H. John, "Wrappers for feature subset selection," Artif. Intell., vol. 97, nos. 1–2, pp. 273–324, Dec. 1997.
[9] H. Wang, M. Taghi Khoshgoftaar, and K. Gao, "Ensemble feature selection technique for software quality classification," in Proc. 22nd Int. Conf. Softw. Eng. Knowl. Eng., 2010, pp. 215–220.
[10] B. Gregorutti, B. Michel, and P. Saint-Pierre, "Correlation and variable importance in random forests," Statist. Comput., vol. 27, no. 3, pp. 659–678, May 2017.
[11] M. A. Hall and G. Holmes, "Benchmarking attribute selection techniques for discrete class data mining," IEEE Trans. Knowl. Data Eng., vol. 15, no. 6, pp. 1437–1447, Nov. 2003.
[12] H. Liu and L. Yu, "Toward integrating feature selection algorithms for classification and clustering," IEEE Trans. Knowl. Data Eng., vol. 17, no. 4, pp. 491–502, Apr. 2005.
[13] P. M. Granitto, C. Furlanello, F. Biasioli, and F. Gasperi, "Recursive feature elimination with random forest for PTR-MS analysis of agroindustrial products," Chemometric Intell. Lab. Syst., vol. 83, no. 2, pp. 83–90, Sep. 2006.
[14] A. Arauzo-Azofra and J. M. Benítez, "Empirical study of feature selection methods in classification," in Proc. 8th Int. Conf. Hybrid Intell. Syst., Barcelona, Spain, Sep. 2008, pp. 584–589.
[15] A. Altmann, L. Toloşi, O. Sander, and T. Lengauer, "Permutation importance: A corrected feature importance measure," Bioinformatics, vol. 26, no. 10, pp. 1340–1347, May 2010.
[16] M. B. Kursa and W. R. Rudnicki, "Feature selection with the Boruta package," J. Stat. Softw., vol. 36, no. 11, pp. 1–13, 2010.
[17] G. Ruß and R. Kruse, "Feature selection for wheat yield prediction," in Research and Development in Intelligent Systems. London, U.K.: Springer, 2010, pp. 465–478.
[18] B. F. Darst, K. C. Malecki, and C. D. Engelman, "Using recursive feature elimination in random forest to account for correlated variables in high dimensional data," BMC Genet., vol. 19, no. S1, p. 65, Sep. 2018.
[19] J.-Y. Hsieh, W. Huang, H.-T. Yang, C.-C. Lin, Y.-C. Fan, and H. Chen, "Building the rice blast disease prediction model based on machine learning and neural networks," EasyChair World Sci., vol. 1197, pp. 1–8, Dec. 2019.
[20] H. Liu, J. Li, and L. Wong, "A comparative study on feature selection and classification methods using gene expression profiles and proteomic patterns," Genome Informat., vol. 13, no. 13, pp. 51–60, 2002.
[21] J. Camargo and A. Young, "Feature selection and non-linear classifiers: Effects on simultaneous motion recognition in upper limb," IEEE Trans. Neural Syst. Rehabil. Eng., vol. 27, no. 4, pp. 743–750, Apr. 2019.
[22] M. B. Kursa, A. Jankowski, and W. R. Rudnicki, "Boruta—A system for feature selection," Fundam. Inf., vol. 101, no. 4, pp. 271–285, 2010.
[23] R. Rajasheker Pullanagari, G. Kereszturi, and I. Yule, "Integrating airborne hyperspectral, topographic, and soil data for estimating pasture quality using recursive feature elimination with random forest regression," Remote Sens., vol. 10, no. 7, pp. 1117–1130, 2018.
[24] A. Choudhary, S. Kolhe, and H. Rajkamal, "Performance evaluation of feature selection methods for mobile devices," Int. J. Eng. Res. Appl., vol. 3, no. 6, pp. 587–594, 2013.
[25] F. Balducci, D. Impedovo, and G. Pirlo, "Machine learning applications on agricultural datasets for smart farm enhancement," Machines, vol. 6, no. 3, pp. 38–59, 2018.
[26] J. Weston, S. Mukherjee, O. Chapelle, M. Pontil, T. Poggio, and V. Vapnik, "Feature selection for SVMs," in Proc. Adv. Neural Inf. Process. Syst., vol. 13, 2001, pp. 668–674.
[27] A. Bahl et al., "Recursive feature elimination in random forest classification supports nanomaterial grouping," NanoImpact, vol. 15, Mar. 2019, Art. no. 100179.
[28] D. H. Zala and M. B. Chaudhri, "Review on use of bagging technique in agriculture crop yield prediction," Int. J. Sci. Res. Develop., vol. 6, no. 8, pp. 675–677, 2018.
[29] F. Shirbani and H. Soltanian Zadeh, "Fast SFFS-based algorithm for feature selection in biomedical datasets," Amirkabir Int. J. Sci. Res., vol. 45, pp. 43–56, Dec. 2013.
[30] P. S. Maya Gopal and R. Bhargavi, "Selection of important features for optimizing crop yield prediction," Int. J. Agricult. Environ. Inf. Syst., vol. 10, no. 3, pp. 54–71, Jul. 2019.
[31] W. Paja, K. Pancerz, and P. Grochowalski, "Generational feature elimination and some other ranking feature selection methods," in Advances in Feature Selection for Data and Pattern Recognition, vol. 138. Cham, Switzerland: Springer, 2018, pp. 97–112.
[32] H. Stoppiglia, G. Dreyfus, R. Dubois, and Y. Oussar, "Ranking a random feature for variable and feature selection," J. Mach. Learn. Res., vol. 3, pp. 1399–1414, 2003.
[33] Z. Karimi, M. Mansour Riahi Kashani, and A. Harounabadi, "Feature ranking in intrusion detection dataset using combination of filtering methods," Int. J. Comput. Appl., vol. 78, no. 4, pp. 21–27, Sep. 2013.
[34] V. Pihur, S. Datta, and S. Datta, "RankAggreg, an R package for weighted rank aggregation," BMC Bioinf., vol. 10, no. 1, p. 62, 2009.
[35] S. Nembrini, I. R. König, and M. N. Wright, "The revival of the Gini importance?" Bioinformatics, vol. 34, no. 21, pp. 3711–3718, Nov. 2018.
[36] M. A. Al Maruf and S. Shatabda, "iRSpot-SF: Prediction of recombination hotspots by incorporating sequence based features into Chou's pseudo components," Genomics, vol. 111, no. 4, pp. 966–972, Jul. 2019.
[37] I. Guyon, J. Weston, S. Barnhill, and V. Vapnik, "Gene selection for cancer classification using support vector machines," Mach. Learn., vol. 46, nos. 1–3, pp. 389–422, 2002.
[38] M. Lango and J. Stefanowski, "Multi-class and feature selection extensions of roughly balanced bagging for imbalanced data," J. Intell. Inf. Syst., vol. 50, no. 1, pp. 97–127, 2018.

G. Mariammal received the B.E. degree in computer science and engineering from Francis Xavier Engineering College, Tirunelveli, India, in 2011, and the M.E. degree in computer science and engineering from Manonmaniam Sundaranar University, Tirunelveli, in 2017, where she is currently pursuing the Ph.D. degree in computer science and engineering. Her research areas are machine learning, data analytics, and image processing.

A. Suruliandi received the B.E. degree in electronics and communication engineering from the Coimbatore Institute of Technology, Coimbatore, India, in 1987, the M.E. degree in computer science and engineering from the Government College of Engineering, Tirunelveli, India, in 2000, and the Ph.D. degree from Manonmaniam Sundaranar University, Tirunelveli, in 2009. He is currently working as a Professor with the Department of Computer Science and Engineering, Manonmaniam Sundaranar University. He has more than 29 years of teaching experience. He has authored 50 articles in international journals, 23 in IEEE Xplore publications, 33 in national conferences, and 13 in international conferences. His research areas are remote sensing, image processing, and pattern recognition.

S. P. Raja was born in Tuticorin, India. He received the B.Tech. degree in information technology from the Dr. Sivanthi Aditanar College of Engineering, Tiruchendur, India, in 2007, and the M.E. degree in computer science and engineering and the Ph.D. degree in image processing from Manonmaniam Sundaranar University, Tirunelveli, India, in 2010 and 2016, respectively. He is currently working as an Associate Professor with the School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India. He has more than 14 years of teaching experience in engineering colleges. He has authored 42 articles in international journals, 24 in international conferences, and 12 in national conferences. His areas of interest are image processing and cryptography. Dr. Raja is an Associate Editor of the International Journal of Interactive Multimedia and Artificial Intelligence, Brazilian Archives of Biology and Technology, Journal of Circuits, Systems and Computers, Computing and Informatics, International Journal of Image and Graphics, and International Journal of Biometrics.

E. Poongothai received the B.E. degree from the Bhajarang Engineering College, Anna University, Chennai, India, in 2011, and the M.E. and Ph.D. degrees in computer science and engineering from Manonmaniam Sundaranar University, Tirunelveli, India, in 2013 and 2020, respectively. She is currently working as an Assistant Professor with the Department of Computer Science and Engineering, SRM University, Chennai, India. Her research areas are machine learning and computer vision.