Assignment 4

A.
Write a research proposal including the following details:
1. Background to the study:
Section 1 discusses the background of big data classification, and section 2
reviews the literature on existing big data classification methods. Section 3
presents the proposed method of big data classification, and section 4
discusses its results. Finally, section 5 concludes the paper.
2. Statement of the problem

The need to solve multimodal optimization problems with highly complex and
non-linear constraints pushes researchers to develop better optimization
algorithms that assure globally optimal solutions without conflicting
constraints. Metaheuristics pave the way for solving multi-objective problems,
which never conclude with a single best solution; instead, metaheuristics
generate a set of solutions for a better approximation. However, most
metaheuristic-based algorithms are suited to single-objective optimization
rather than multi-objective optimization, and these existing algorithms convert
the multiple objectives into a single objective with the help of weights.
Generating solutions with better diversity is another challenge faced by
existing metaheuristics. Additionally, real-world issues such as uncertainty and
noise should not affect the algorithm: it should be robust enough to tolerate
inhomogeneity and should offer decision-makers a good basis for effective
decision-making. Thus, metaheuristic algorithms contribute much to
multi-objective global optimization. Keeping all of this in mind, a novel
metaheuristic search algorithm, called the E-Bat algorithm, is developed. The
proposed E-Bat algorithm integrates the Exponential Weighted Moving Average
(EWMA) [22] with the Bat Algorithm (BA) [2].
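The two building blocks named above can be sketched separately. The block below is a minimal illustration, assuming the textbook EWMA recurrence and the standard bat algorithm (frequency, velocity, loudness, and pulse-rate updates). It does not show how E-Bat combines them, since that update rule is not given in this proposal. All function names, parameter defaults, and the sphere test objective are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def ewma(values: Sequence[float], lam: float = 0.3) -> List[float]:
    """Textbook EWMA recurrence: s_t = lam * x_t + (1 - lam) * s_{t-1}."""
    smoothed: List[float] = []
    for x in values:
        prev = smoothed[-1] if smoothed else x  # seed with the first value
        smoothed.append(lam * x + (1.0 - lam) * prev)
    return smoothed


def sphere(x: Sequence[float]) -> float:
    """Hypothetical test objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


def bat_algorithm(
    objective: Callable[[Sequence[float]], float],
    dim: int,
    n_bats: int = 20,
    n_iter: int = 100,
    bounds: Tuple[float, float] = (-5.0, 5.0),
    f_min: float = 0.0,
    f_max: float = 2.0,
    alpha: float = 0.9,   # loudness decay factor
    gamma: float = 0.9,   # pulse-rate growth factor
    r0: float = 0.5,      # limiting pulse rate
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Standard bat algorithm for minimization (not the E-Bat variant)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_bats)]
    vel = [[0.0] * dim for _ in range(n_bats)]
    fit = [objective(p) for p in pos]
    loud = [1.0] * n_bats
    pulse = [0.0] * n_bats
    best_i = min(range(n_bats), key=lambda i: fit[i])
    best, best_fit = pos[best_i][:], fit[best_i]

    for t in range(1, n_iter + 1):
        mean_loud = sum(loud) / n_bats
        for i in range(n_bats):
            # Global move: random frequency scales the pull relative to the best bat.
            freq = f_min + (f_max - f_min) * rng.random()
            vel[i] = [v + (x - b) * freq for v, x, b in zip(vel[i], pos[i], best)]
            cand = [x + v for x, v in zip(pos[i], vel[i])]
            # Local move: random walk around the current best solution.
            if rng.random() > pulse[i]:
                cand = [b + rng.uniform(-1.0, 1.0) * mean_loud for b in best]
            cand = [min(hi, max(lo, c)) for c in cand]  # keep inside the bounds
            cand_fit = objective(cand)
            # Accept an improving candidate with probability equal to the loudness.
            if cand_fit <= fit[i] and rng.random() < loud[i]:
                pos[i], fit[i] = cand, cand_fit
                loud[i] *= alpha
                pulse[i] = r0 * (1.0 - math.exp(-gamma * t))
            if cand_fit < best_fit:
                best, best_fit = cand[:], cand_fit
    return best, best_fit


best, best_fit = bat_algorithm(sphere, dim=2)
print(best, best_fit)
print(ewma([0.0, 10.0], lam=0.5))
```

The best fitness never increases across iterations, because the global best is replaced only by a strictly better candidate.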
3. Aims and objectives
The main aim of the research is to establish a big data classification model
using an optimization algorithm. The classification is carried out in the
MapReduce framework using the proposed optimization algorithm, named the
Exponential Bat (E-Bat) algorithm, which is obtained by integrating the EWMA
with the BA. The big data obtained from the distributed sources is fed to the
mapper phase, which performs feature selection using the E-Bat algorithm.
Effective feature selection supports the classification of the data so that the
classification accuracy is enhanced. The selected features are fed to the
reducer for data classification, and the reducer uses a Neural Network (NN),
trained using the proposed E-Bat algorithm, to classify the data into various
classes.
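The mapper/reducer data flow described above can be sketched in a few lines. This is only an illustration under stated assumptions: the feature mask is hard-coded, whereas in the proposal the E-Bat algorithm would search for it, and `threshold_classifier` is a hypothetical stand-in for the trained NN. The record layout and all names are assumptions, and no real MapReduce runtime is used.

```python
from typing import Callable, List, Sequence, Tuple

Record = Tuple[List[float], int]  # (feature vector, class label)


def mapper(partition: Sequence[Record], mask: Sequence[int]) -> List[Record]:
    """Feature selection on one partition: keep features whose mask bit is 1."""
    return [
        ([x for x, keep in zip(features, mask) if keep], label)
        for features, label in partition
    ]


def reducer(
    mapped: Sequence[Sequence[Record]],
    classify: Callable[[List[float]], int],
) -> List[int]:
    """Merge the mappers' outputs and assign a class to every reduced record."""
    return [classify(features) for part in mapped for features, _ in part]


def threshold_classifier(features: List[float]) -> int:
    """Stand-in for the trained NN: class 1 if the feature sum is positive."""
    return 1 if sum(features) > 0 else 0


# Two partitions standing in for data from distributed sources.
partitions = [
    [([0.2, 9.0, 1.5], 1), ([-0.4, 7.0, -2.0], 0)],
    [([1.1, 3.0, 0.3], 1)],
]
mask = [1, 0, 1]  # hypothetical mask; E-Bat would select it in the proposal

mapped = [mapper(p, mask) for p in partitions]
predictions = reducer(mapped, threshold_classifier)
print(predictions)
```

Each mapper works on its own partition, so the feature-selection step can run in parallel before the reducer sees any data.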
4. Assumptions
5. Hypothesis
6. Research questions
7. Literature review
Eight works from the literature on data classification frameworks in the big data
environment are presented in Table 1.
| Authors | Methodology | Pros | Cons |
|---|---|---|---|
| H. Ke, et al. [1] | Big Data Classification with Lightweight VGGNet | The proposed approach achieves its performance without the need to denoise the EEG. Furthermore, the approach requires only one hyperparameter, which avoids the potential errors caused by excessive parameter settings. Cost-effective solution. | Neglected the use of deep learning and optimal partitioning. |
| P. Ezatpoor, et al. [2] | MapReduce Enhanced Bitmap Index Guided Algorithm | Faster processing time. | Data classification for incomplete data has been left unaddressed. Suitable only for data environments with a moderate miss rate. |
| Elkano, M. et al. [3] | Distributed MapReduce prototype generation method CHI-PG | Improves execution time without compromising accuracy and reduction rates. | Fails to provide better results in high-dimensional environments. |
| Mikel Elkano, et al. [4] | Fuzzy Rule-Based Classification System | The model improved classification accuracy regardless of the total number of computing nodes. | The problem of sizeup is left unaddressed. The algorithm has a linear relation on execution time and scaleup. |
| S. Ramírez-Gallego, et al. [5] | Nearest Neighbor Classification | Alleviates the time problem that occurs in high-dimensional scenarios. | Drift changes in the data are neglected during data classification. |
| Zhai, J., et al. [6] | Fuzzy integral-based ELM ensemble | Simple structure and easy to implement. Deals with particle problem of medium size. | Not suitable for multi-classification of imbalanced data. |
| Murugan, N.S. and Devi, G.U. [7] | LR-PCA hybridization | Achieved increased detection rate and accuracy. | Inefficient on large datasets, and neglected the effects of the noise in the data. |
| R. Varatharajan et al. [8] | LDA with an enhanced SVM | Use of a reduced feature set enhanced the data classification. | Using large data environments may result in reduced performance. |
8. Research Methodology
a) Sample
b) Research Design
c) Tools for data collection
9. Data Analysis (Methods)
10. Expected research findings
11. Expected research implications
12. Conclusion
The paper deals with a proposed big data classification method that aims at
meeting the rising demands of high volume, high velocity, high value,
high veracity, and huge variety. The classification is performed using
the MapReduce framework so that the data from the distributed sources is
handled in parallel. The MapReduce framework analyzes the big data in two
steps to yield the classified results. The first step is feature
selection, which extracts the optimal features from the data using the
proposed E-Bat algorithm in the mappers. The second step is
classification, which is performed in the reducers using the NN. The
weights of the NN are optimally tuned using the proposed EBatNN
algorithm. The final output of the MapReduce framework is the classified
big data, which forms the clusters for the whole big data. The
experimentation of the proposed big data classification is performed
using four standard databases taken from the UCI Machine Learning
Repository.
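Tuning NN weights with a metaheuristic requires a fitness function that maps one candidate weight vector to a score. The sketch below assumes a one-hidden-layer network with a tanh hidden layer and a sigmoid output, and uses the classification error as the fitness to minimize. The architecture, the flat weight layout, and all names are assumptions for illustration, not the EBatNN design itself.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[List[float], int]  # (feature vector, binary class label)


def nn_forward(weights: Sequence[float], x: Sequence[float], n_hidden: int) -> float:
    """One-hidden-layer network read from a flat weight vector.

    Assumed layout: for each hidden unit, n_in input weights followed by a
    bias; then n_hidden output weights followed by the output bias.
    """
    n_in = len(x)
    idx = 0
    hidden: List[float] = []
    for _ in range(n_hidden):
        z = weights[idx + n_in]  # hidden-unit bias
        z += sum(w * xi for w, xi in zip(weights[idx:idx + n_in], x))
        hidden.append(math.tanh(z))
        idx += n_in + 1
    z_out = weights[idx + n_hidden]  # output bias
    z_out += sum(w * h for w, h in zip(weights[idx:idx + n_hidden], hidden))
    return 1.0 / (1.0 + math.exp(-z_out))


def classification_error(
    weights: Sequence[float], data: Sequence[Sample], n_hidden: int
) -> float:
    """Fitness to minimize: fraction of samples the network misclassifies."""
    wrong = sum(
        1 for x, y in data
        if (nn_forward(weights, x, n_hidden) >= 0.5) != bool(y)
    )
    return wrong / len(data)


# Toy data: one input feature, class 1 for positive values.
data = [([2.0], 1), ([-2.0], 0)]
good = [1.0, 0.0, 5.0, 0.0]   # hidden weight, hidden bias, output weight, output bias
bad = [-1.0, 0.0, 5.0, 0.0]   # same network with the hidden weight sign flipped
print(classification_error(good, data, n_hidden=1))
print(classification_error(bad, data, n_hidden=1))
```

A search algorithm would treat each candidate solution as one flat weight vector of length `n_hidden * (n_in + 1) + n_hidden + 1` and call `classification_error` to score it.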
13. References
[1] A. Alexandrov et al., The Stratosphere platform for big data analytics, The VLDB
Journal, 23(6) (2014), 939-964.
[2] A. Fernandez et al., Fuzzy rule based classification systems for big data with
MapReduce: granularity analysis, Advances in Data Analysis and Classification,
11(4) (2017), 711-730.
[3] A.J.C. Slooter et al., Seizure detection in adult ICU patients based on changes in EEG
synchronization likelihood, Neurocritical Care, 5(3) (2006), 186-192.
[4] B. Xue, M. Zhang and W. N. Browne, Particle Swarm Optimization for Feature
Selection in Classification: A Multi-Objective Approach, IEEE Transactions on
Cybernetics, 43(6) (2013), 1656-1671.
[5] Breast cancer dataset,
http://archive.ics.uci.edu/ml/datasets/breast+cancer+wisconsin+%28diagnostic%29,
accessed March 2018.
[6] D. Singh, D. Roy and C. K. Mohan, DiP-SVM: Distribution Preserving Kernel
Support Vector Machine for Big Data, IEEE Transactions on Big Data, 3(1) (2017),
79-90.
[7] D. Cui et al., Estimation of genuine and random synchronization in multivariate neural
series, Neural Networks, 23(6) (2010), 698-704.
[8] G. Chatzigeorgakidis et al., FML-kNN: scalable machine learning on Big Data using
k-nearest neighbor joins, Journal of Big Data, 5(1) (2018), p. 4.
[9] G. Manogaran and D. Lopez, Spatial cumulative sum algorithm with big data
analytics for climate change detection, Computers & Electrical Engineering, 2017.
[10] H. Ke et al., Towards Brain Big Data Classification: Epileptic EEG Identification
with a Lightweight VGGNet on Global MIC, IEEE Access 99, 1-1.
