
A.

Write a research proposal including the following details:

1. Background to the study:

Section 1 discusses the background of big data classification, and Section 2 reviews the literature on existing big data classification methods. Section 3 presents the proposed big data classification method, and Section 4 discusses its results. Finally, Section 5 concludes the paper.

2. Statement of the problem


The need to solve multimodal optimization problems with highly complex,
non-linear constraints compels researchers to develop better optimization
algorithms that reach global optima without violating conflicting
constraints. Multi-objective problems never conclude with a single best
solution; instead, metaheuristics generate a set of solutions that gives a
better approximation. However, most metaheuristic algorithms are suited to
single-objective optimization rather than multi-objective optimization, and
these existing algorithms convert the multiple objectives into a single
objective with the help of weights. Generating solutions with good diversity
is another challenge faced by existing metaheuristics. Additionally,
real-world issues such as uncertainty and noise should not affect the
algorithm: it should be robust enough to tolerate inhomogeneity and should
give decision-makers a good basis for effective decision-making.
Metaheuristic algorithms thus contribute much to multi-objective global
optimization. Keeping all this in mind, a novel metaheuristic search
algorithm, called the E-Bat algorithm, is developed. The proposed E-Bat
algorithm integrates the Exponentially Weighted Moving Average (EWMA) [22]
with the Bat Algorithm (BA) [2].
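
As a rough illustration of the two ingredients named above, the sketch below shows the textbook forms of the exponentially weighted moving average and of a single position update in the standard Bat Algorithm. It is not the combined E-Bat update, which this proposal does not spell out. The function names, the smoothing factor alpha, and the frequency bounds f_min and f_max are assumptions chosen for illustration.

```python
import random


def ewma(values, alpha=0.3):
    """Textbook EWMA: s_t = alpha * x_t + (1 - alpha) * s_(t-1), with s_0 = x_0."""
    smoothed = []
    for x in values:
        if not smoothed:
            smoothed.append(x)  # the first value seeds the average
        else:
            smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


def bat_step(position, velocity, best, f_min=0.0, f_max=2.0, rng=random):
    """One move of the standard Bat Algorithm for a single bat.

    frequency = f_min + (f_max - f_min) * beta, with beta drawn from U(0, 1)
    velocity  = velocity + (position - best) * frequency
    position  = position + velocity
    """
    beta = rng.random()
    freq = f_min + (f_max - f_min) * beta
    new_velocity = [v + (x - b) * freq
                    for v, x, b in zip(velocity, position, best)]
    new_position = [x + v for x, v in zip(position, new_velocity)]
    return new_position, new_velocity


# Illustrative values only.
print(ewma([1.0, 2.0, 3.0], alpha=0.5))  # [1.0, 1.5, 2.25]
print(bat_step([0.5, 0.5], [0.0, 0.0], best=[0.2, 0.9]))
```

How the EWMA term enters the bat update in E-Bat is a design decision of the proposed method and is deliberately left out of this sketch.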

3. Aims and objectives

The main aim of the research is to establish a big data classification model using an optimization algorithm. Big data classification is carried out in the MapReduce framework, which uses the proposed optimization algorithm, named the Exponential Bat (E-Bat) algorithm. The proposed algorithm is obtained by integrating the EWMA with the BA. The big data obtained from the distributed sources is fed to the mapper phase, which performs feature selection using the E-Bat algorithm. Effective feature selection enhances the classification accuracy. The selected features are fed to the reducer for data classification; the reducer uses a Neural Network (NN), trained with the proposed E-Bat algorithm, to classify the data into various classes.
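
The data flow described above (mappers reduce each record to its selected features, reducers classify) can be sketched in plain Python as follows. This is a minimal single-process sketch under loud assumptions: the selected feature indices are hard-coded rather than chosen by E-Bat, and a nearest-centroid rule stands in for the E-Bat-trained NN. All names (mapper, reducer, classify, selected) are hypothetical.

```python
from collections import defaultdict


def mapper(partition, selected):
    """Map phase: keep only the selected feature indices of each record.

    `selected` is a fixed list here; in the proposal it would come from E-Bat.
    """
    for features, label in partition:
        yield label, [features[i] for i in selected]


def reducer(pairs):
    """Reduce phase: one centroid per class (a stand-in for the trained NN)."""
    grouped = defaultdict(list)
    for label, reduced in pairs:
        grouped[label].append(reduced)
    return {label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in grouped.items()}


def classify(centroids, reduced):
    """Assign the class whose centroid is nearest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(centroids[c], reduced)),
    )


# Two partitions stand in for data arriving from distributed sources.
partitions = [
    [([1.0, 9.0, 0.1], "a"), ([1.2, 3.0, 0.2], "a")],
    [([5.0, 7.0, 0.9], "b"), ([5.2, 1.0, 1.1], "b")],
]
selected = [0, 2]  # assumed feature subset, not an E-Bat result

pairs = [pair for part in partitions for pair in mapper(part, selected)]
centroids = reducer(pairs)
print(classify(centroids, [5.1, 1.0]))  # b
```

In a real deployment each mapper would run on its own partition in parallel, and the reducer would host the trained network rather than class centroids.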

4. Assumptions

5. Hypothesis

6. Research questions

7. Literature review

Eight works from the literature on data classification frameworks in the big data environment
are presented in Table 1.

| Authors | Methodology | Pros | Cons |
|---|---|---|---|
| H. Ke, et al. [1] | Big Data Classification with Lightweight VGGNet | The proposed approach achieves the performance without the need for denoising the EEG. Furthermore, the approach requires only one hyperparameter, which avoids the potential errors caused by excessive parameter settings. Cost-effective solution. | Neglected the use of deep learning and optimal partitioning. |
| P. Ezatpoor, et al. [2] | MapReduce Enhanced Bitmap Index Guided Algorithm | Faster processing time. | Data classification for incomplete data has been left unaddressed. Suitable only for data environments with a moderate miss rate. |
| Elkano, M. et al. [3] | Distributed MapReduce prototype generation method CHI-PG | Improves execution time without compromising accuracy and reduction rates. | Fails to provide better results in high-dimensional environments. |
| Mikel Elkano, et al. [4] | Fuzzy Rule-Based Classification System | The model improved classification accuracy regardless of the total number of computing nodes. | The problem of sizeup is left unaddressed. The algorithm has a linear relation on execution time and scaleup algorithm. |
| S. Ramírez-Gallego, et al. [5] | Nearest Neighbor Classification | Alleviates the time problem occurring in high-dimensional scenarios. | Drift changes in data are neglected during data classification. |
| Zhai, J., et al. [6] | Fuzzy integral-based ELM ensemble | Simple structure and easy to implement. Deals with particle problem of medium size. | Not suitable for multi-class classification of imbalanced data. |
| Murugan, N.S. and Devi, G.U. [7] | LR-PCA hybridization | Achieved increased detection rate and accuracy. | Inefficient on large datasets, and neglected the effects of noise in the data. |
| R. Varatharajan et al. [8] | LDA with an enhanced SVM | Use of a reduced feature set enhanced the data classification. | Using large data environments may result in reduced performance. |

8. Research Methodology
a) Sample
b) Research Design
c) Tools for data collection

9. Data Analysis (Methods)

10. Expected research findings

11. Expected research implications

12. Conclusion
The paper presents a big data classification approach aimed at meeting
the rising demands of high volume, high velocity, high value, high
veracity, and huge variety. The classification is performed using the
MapReduce framework so that data from distributed sources is handled in
parallel. The MapReduce framework analyzes the big data in two steps to
yield the classified results. The first step is feature extraction, which
extracts the optimal features from the data using the proposed E-Bat
algorithm in the mappers. The second step is classification, which is
performed in the reducers using the NN. The weights of the NN are tuned
optimally using the proposed EBatNN algorithm. The final output of the
MapReduce framework is the classified big data, which groups the whole
data set into classes. The experimentation is performed using four
standard databases taken from the UCI Machine Learning Repository.
13. References

[1] A. Alexandrov et al., The stratosphere platform for big data analytics, The VLDB Journal, 23(6) (2014), 939-964.

[2] A. Fernandez et al., Fuzzy rule based classification systems for big data with MapReduce: granularity analysis, Advances in Data Analysis and Classification, 11(4) (2017), 711-730.

[3] A.J.C. Slooter et al., Seizure detection in adult ICU patients based on changes in EEG synchronization likelihood, Neurocritical Care, 5(3) (2006), 186-192.

[4] B. Xue, M. Zhang and W. N. Browne, Particle Swarm Optimization for Feature Selection in Classification: A Multi-Objective Approach, IEEE Transactions on Cybernetics, 43(6) (2013), 1656-1671.

[5] Breast cancer dataset, http://archive.ics.uci.edu/ml/datasets/breast+cancer+wisconsin+%28diagnostic%29, accessed on March 2018.

[6] D. Singh, D. Roy and C. K. Mohan, DiP-SVM: Distribution Preserving Kernel Support Vector Machine for Big Data, IEEE Transactions on Big Data, 3(1) (2017), 79-90.

[7] D. Cui et al., Estimation of genuine and random synchronization in multivariate neural series, Neural Networks, 23(6) (2010), 698-704.

[8] G. Chatzigeorgakidis et al., FML-kNN: scalable machine learning on Big Data using k-nearest neighbor joins, Journal of Big Data, 5(1) (2018), 4.

[9] G. Manogaran and D. Lopez, Spatial cumulative sum algorithm with big data analytics for climate change detection, Computers & Electrical Engineering, 2017.

[10] H. Ke et al., Towards Brain Big Data Classification: Epileptic EEG Identification with a Lightweight VGGNet on Global MIC, in IEEE Access 99, 1-1.
