
Scheduling Algorithms Used in Big Data Platforms

Athar Hussein Ali Al-Azzawi


Department of Information Technology
Altinbas University
Istanbul, Turkey
lolo89.adm@gmail.com

183104946

Abstract
Big data is a new technology that has emerged in the computing sector. In
this paper we describe the platforms used for big data. The study presents
different types of scheduling algorithms, which are a significant part of
big data platforms.
I. INTRODUCTION

Big data is considered one of the most significant developments in modern
technology because of its effect on the computing sector. Data volumes
have grown enormously in the last 10 years. This increase in data creation
comes from many different types of processes carried out in companies
from diverse fields, such as oceanography, astronomy, manufacturing and
many others [1]. Traditional tools were created to handle such data, but
they cannot cope with this dramatic increase. For that reason, developers
invented new tools, for instance Hadoop [2], Spark, Flink, Storm and many
others, for processing and storing these huge volumes of data. With regard
to the need for processing data, Hadoop is a standard platform for
processing and storing several types of batch data, while Spark is also
used for processing stream data.
II. MOTIVATIONS
1- Which types of data can each platform process?
2- Which scheduling algorithms are built into each platform?

III. METHODOLOGY

Hadoop is used to process and store batch data. Hadoop is composed of
YARN and HDFS, where YARN is the resource manager and scheduling
algorithms are either built in or plugged in. Spark can use YARN for
scheduling tasks and HDFS for storage [3]. Task scheduling is an essential
part of Apache Hadoop, and the most important aspect is how resources are
shared without overloading a node. Hadoop uses FIFO as its default
scheduling algorithm, whereas each scheduling algorithm has its own
mechanism for scheduling tasks [4]. The Fair algorithm is also used for
scheduling, but the mechanisms of the two differ considerably. FIFO
schedules tasks according to their arrival time, and the node gives all of
its resources to one task until that task has finished processing. Fair
scheduling, by contrast, shares resources between tasks, and all tasks can
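
The difference between the two mechanisms can be illustrated with a small
simulation. The sketch below is a simplified single-node model written for
illustration only; it is not the Hadoop FIFO or Fair Scheduler
implementation. The names Task, fifo_finish_times and fair_finish_times
are hypothetical, and "fair" is assumed here to mean that every task that
has arrived receives an equal share of the node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    arrival: float  # time at which the task is submitted
    work: float     # run time if the task had the whole node to itself


def fifo_finish_times(tasks):
    """FIFO: run tasks one at a time in arrival order, each holding the whole node."""
    clock = 0.0
    finish = {}
    for task in sorted(tasks, key=lambda t: t.arrival):
        # A task starts when it has arrived and the previous one is done.
        clock = max(clock, task.arrival) + task.work
        finish[task.name] = clock
    return finish


def fair_finish_times(tasks):
    """Fair sharing (assumed model): arrived tasks split the node equally."""
    pending = sorted(tasks, key=lambda t: t.arrival)
    remaining = {}  # task name -> work still to do
    finish = {}
    clock = 0.0
    while pending or remaining:
        if not remaining:
            # Node is idle: jump forward to the next arrival.
            clock = max(clock, pending[0].arrival)
        while pending and pending[0].arrival <= clock:
            task = pending.pop(0)
            remaining[task.name] = task.work
        share = 1.0 / len(remaining)
        # Advance to whichever comes first: a task finishing or a new arrival.
        to_finish = min(remaining.values()) / share
        to_arrival = pending[0].arrival - clock if pending else float("inf")
        step = min(to_finish, to_arrival)
        clock += step
        for name in list(remaining):
            remaining[name] -= step * share
            if remaining[name] <= 1e-9:
                finish[name] = clock
                del remaining[name]
    return finish


if __name__ == "__main__":
    # A long task arrives first, a short task arrives one time unit later.
    tasks = [Task("A", 0.0, 6.0), Task("B", 1.0, 1.0)]
    print("FIFO:", fifo_finish_times(tasks))
    print("Fair:", fair_finish_times(tasks))
```

In this example the short task B finishes at time 7 under FIFO, because it
waits for A, but at time 3 under fair sharing. The long task A finishes at
time 6 under FIFO and at time 7 under fair sharing, so the short task gains
at a small cost to the long one.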

IV. CONCLUSION
Hadoop is the de facto standard for processing big data, and other
platforms can also make use of Hadoop. Of the scheduling algorithms
investigated in this study, Fair was the best compared with the others in
several respects, such as processing time, resource sharing, homogeneity
of cluster nodes, etc.
References
[1] J. V. Gautam, H. B. Prajapati, V. K. Dabhi, and S.
Chaudhary, "A survey on job scheduling algorithms in
Big data processing," in Electrical, Computer and
Communication Technologies (ICECCT), 2015 IEEE
International Conference on, 2015, pp. 1-11: IEEE.
[2] T. White, Hadoop: The Definitive Guide, 4th Edition.
2015.
[3] D. Cheng, X. Zhou, P. Lama, J. Wu, and C. Jiang,
"Cross-platform resource scheduling for spark and
MapReduce on YARN," IEEE Transactions on
Computers, vol. 66, no. 8, pp. 1341-1353, 2017.
[4] M. Usama, M. Liu, and M. Chen, "Job schedulers for
Big data processing in Hadoop environment: Testing
real-life schedule with benchmark programs," Digital
Communications and Networks, vol. 28, no. 1, pp. 34-
39, 2017.
