Preeti Gulia
Maharshi Dayanand University
All content following this page was uploaded by Preeti Gulia on 15 July 2020.
Ayushi Chahal
Research Scholar, Department of Computer Science and Application,
MDU, Rohtak, Haryana, India
ABSTRACT
The Internet has helped technology and communication grow very fast, which has
further increased the connections between different machines and sensor-based devices.
This connection of machines or devices through the Internet gives rise to the concept of
the IoT (Internet of Things). Wearable devices such as smart watches, cars, and home
appliances such as washing machines, doors, door locks, and lights are now connected
over the Internet of Things. These sensor devices produce Big Data in bulk every day. This
data can be analyzed to solve different day-to-day problems. This paper
discusses different Big Data tools and techniques that can be used for IoT frameworks.
It also presents a way in which Big Data can be used to analyze IoT data sets intelligently.
Different Big Data analytics platforms are explained in detail, and light is shed on
which of them are best suited to IoT data.
Keywords: Big data, Frameworks, Internet of Things (IoT), Architecture, Big Data
Analytics (BDA)
Cite this Article: Preeti Gulia and Ayushi Chahal, Big Data Analytics for IoT,
International Journal of Advanced Research in Engineering and Technology (IJARET),
11(6), 2020, pp. 593-603.
http://www.iaeme.com/IJARET/issues.asp?JType=IJARET&VType=11&IType=6
1. INTRODUCTION
Big Data is developing briskly, and so is IoT [31]. This recent advancement is affecting all
areas of business and technology. Data produced by IoT devices play an essential role in the
conversion of raw data to knowledge. This can be done by applying the correct methods of big
data analytics to raw data. Gartner has characterized Big Data by three qualities [1], i.e.,
volume, variety, and velocity, which are discussed in detail in section 2 of the paper.
IoT collects data in different forms and from different sources; that is why it is called
heterogeneous data [2]. IoT can collect data from healthcare industries, smart homes, smart
traffic management, airplane systems, railway systems, weather forecasting systems, agricultural
sensors, and many more, as shown in figure 1. IoT data is unstructured, having no pattern. By
applying the right big data analytical techniques, one can find hidden patterns, new
information, and hidden correlations, reveal trends, etc. [3], [36], [50] from this unstructured data.
1.1. IoT
When anyone and anything can be connected at any time and in any place, through any service
and any network, the result is the Internet of Things (IoT). Researchers give different
definitions and architectures for IoT.
IoT is a system of interrelated things or machines (computing, mechanical, or digital
devices) that can connect to one another without human intervention. It is a
Machine to Machine (M2M) communication process.
[Figure: layered IoT architecture showing the Application Layer, Network Layer, and Perception Layer]
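The three layers named in the figure can be read as a simple pipeline. The following is a minimal sketch, assuming the common interpretation in which the perception layer senses, the network layer transports, and the application layer consumes the data. All device names, metrics, and values are hypothetical and serve only as illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Reading:
    device_id: str   # hypothetical device identifier
    metric: str      # e.g. "temperature"
    value: float


def perception_layer() -> List[Reading]:
    """Perception layer: sensors sample the physical world (values are made up)."""
    return [
        Reading("thermo-1", "temperature", 21.5),
        Reading("thermo-2", "temperature", 23.5),
        Reading("lock-1", "door_open", 0.0),
    ]


def network_layer(readings: List[Reading]) -> List[dict]:
    """Network layer: serialize readings into messages for transport."""
    return [{"id": r.device_id, "metric": r.metric, "value": r.value} for r in readings]


def application_layer(messages: List[dict]) -> Dict[str, float]:
    """Application layer: reduce the transported messages to a per-metric average."""
    grouped: Dict[str, List[float]] = {}
    for m in messages:
        grouped.setdefault(m["metric"], []).append(m["value"])
    return {metric: sum(vals) / len(vals) for metric, vals in grouped.items()}


summary = application_layer(network_layer(perception_layer()))
print(summary)
```

Keeping each layer behind its own function mirrors the separation in the figure: a sensor can be swapped without touching the analytics step.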
processors and techniques cannot handle such a large amount of heterogeneous data.
So, there is a strict requirement for enhanced techniques to process such data.
• Velocity: Velocity represents the rate at which data arrives from various devices, that
is, the speed at which machines generate data over the network. It is an essential
factor of big data. One of the most common examples of data generation speed is
social media, which also creates a variety of data: people now post frequent
updates about themselves (tweets, Instagram posts, WhatsApp status updates,
etc.).
• Variety: By definition, Big Data is a large amount of heterogeneous data, so
variety is an essential property of big data. Data generation devices now produce
different kinds of data (structured, semi-structured, or unstructured).
Sometimes the collected data arrives in a different format than
expected, and this unexpected format may cause trouble in data processing. To avoid
this trouble, an organization should have a data storage system that can
examine and process any form of data irrespective of its structure.[5]
• Value: Continuous data generation creates Big Data. This data is of
no use unless it has some value; thus the value of data is an
essential factor of big data. Big data analytics, which has become an integral
part of society, is based on the valuable data that different devices provide to the
analyst or data scientist. Big data does not always have value.
• Veracity: Veracity does not refer to the quantity of data. It relates to the
understandability of the data that Big Data provides to its users. Any organization working
with a large amount of data should remove “dirty data” before it accumulates in its
systems.
• Validity: For future use, data must be precise and accurate. An organization should
validate its data if it wants to make correct decisions based on the data
collected by the devices. So, validity is considered an essential factor of big data.
• Variability: Variability includes data consistency and value of data.
• Viscosity: Viscosity is considered a part of velocity. It describes the delay
or lag time that occurs between the sender and receiver during data transmission.[5]
• Virality: Virality describes data speed, that is, the speed at which senders and
receivers access data from different devices.
• Visualization: This property concerns representing big data symbolically.
Visualization helps to find hidden patterns, which in turn support decision-making
for any query on big data.
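Several of the properties above can be measured directly on a stream of records. The following is a rough sketch, assuming hypothetical sensor records with send and receive timestamps in seconds and an assumed valid temperature range. It estimates velocity as an arrival rate, viscosity as the average transmission lag, and applies a simple validity filter to remove "dirty data".

```python
from statistics import mean

# Hypothetical sensor records: (sent_at, received_at, temperature_celsius).
records = [
    (0.0, 0.2, 21.0),
    (1.0, 1.4, 22.0),
    (2.0, 2.1, None),     # missing value: "dirty data"
    (3.0, 3.5, 999.0),    # physically implausible reading
    (4.0, 4.3, 23.0),
]


def is_valid(value, low=-40.0, high=85.0):
    """Validity check: value must be present and within an assumed sensor range."""
    return value is not None and low <= value <= high


# Veracity/validity: drop dirty records before they accumulate.
clean = [r for r in records if is_valid(r[2])]

# Velocity: records received per second over the observed window.
window = records[-1][1] - records[0][1]
velocity = len(records) / window

# Viscosity: average lag between sending and receiving a record.
viscosity = mean(received - sent for sent, received, _ in records)

print(len(clean), round(velocity, 2), round(viscosity, 2))
```

The range limits in `is_valid` are placeholders; in practice they would come from the sensor's specification.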
For handling such a massive amount of data, reliable software systems are required.
Software testing plays a crucial role in ensuring the quality of the software [37-49].
After analyzing the data, these tools are also used to visualize the outcomes in the form
of graphs, tables, pie charts, bar charts, etc. In this section, various platforms that can
analyze IoT data are discussed. Big data analytics platforms are described below [12]:
3.3. Dryad
It models a computation as a data flow graph over parallel and distributed data sets. A user
can use multiple machines at a time without knowing concurrent programming. It efficiently
handles faults in the cluster, graph generation, scheduling of jobs onto available machines,
visualization of jobs, etc. [21]
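The data flow graph idea can be illustrated with a small sketch. This is not Dryad's API: it is a simplified stand-in that uses Python's standard `graphlib` and a thread pool, with hypothetical vertex functions. Each vertex is plain sequential code, and the scheduler decides which vertices can run concurrently once their inputs are ready.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter  # Python 3.9+


# Each vertex is an ordinary sequential function; inputs come from its predecessors.
def read_a():
    return [1, 2, 3]


def read_b():
    return [10, 20, 30]


def double(xs):
    return [2 * x for x in xs]


def merge(xs, ys):
    return sorted(xs + ys)


vertices = {"read_a": read_a, "read_b": read_b, "double": double, "merge": merge}
# Edges: vertex -> ordered list of the vertices it depends on.
deps = {"read_a": [], "read_b": [], "double": ["read_a"], "merge": ["double", "read_b"]}


def run_graph(vertices, deps, workers=2):
    """Run every vertex once, submitting all currently ready vertices together."""
    sorter = TopologicalSorter({v: set(d) for v, d in deps.items()})
    sorter.prepare()
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while sorter.is_active():
            ready = sorter.get_ready()  # vertices whose inputs are all available
            futures = {
                v: pool.submit(vertices[v], *(results[d] for d in deps[v]))
                for v in ready
            }
            for v, fut in futures.items():
                results[v] = fut.result()
                sorter.done(v)
    return results


results = run_graph(vertices, deps)
print(results["merge"])
```

The user writes only the four sequential functions and the edge list; concurrency is confined to `run_graph`, which is the point the section makes about not needing concurrent programming.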
3.5. Storm
It is used for extensive data processing. It is a distributed, fault-tolerant system that
works on real-time data. It forms a cluster that is similar to a Hadoop cluster and likewise
consists of a master node and worker nodes.
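The master and worker arrangement for real-time data can be sketched in a few lines. The example below is only an illustration with threads and queues and does not use Storm's own API. The function names and device identifiers are hypothetical, and fault tolerance is deliberately left out.

```python
import queue
import threading
from collections import Counter


def worker(inbox: queue.Queue, counts: Counter) -> None:
    """Worker node: process tuples as they arrive until a stop marker is seen."""
    while True:
        item = inbox.get()
        if item is None:          # stop marker sent by the master
            break
        counts[item] += 1         # running count, updated per tuple


def master(stream, n_workers=2):
    """Master node: start the workers and route each tuple by its key."""
    inboxes = [queue.Queue() for _ in range(n_workers)]
    counts = [Counter() for _ in range(n_workers)]
    threads = [
        threading.Thread(target=worker, args=(inboxes[i], counts[i]))
        for i in range(n_workers)
    ]
    for t in threads:
        t.start()
    for device_id in stream:
        # The same key always goes to the same worker, so its count lives in one place.
        inboxes[sum(device_id.encode()) % n_workers].put(device_id)
    for q in inboxes:
        q.put(None)
    for t in threads:
        t.join()
    return sum(counts, Counter())


totals = master(["lamp", "lock", "lamp", "car", "lamp", "lock"])
print(totals)
```

Routing by key rather than round-robin is a design choice: it lets each worker keep its state locally without coordinating with the others.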
3.6. Splunk
It is a combination of Big Data and cloud technology. It uses a web interface to allow the user
to analyze, search, and monitor the data. It helps to index structured and unstructured data
generated by machines; hence, it is useful for IoT Big Data sets. It is an intelligent support
system for real-time and business-oriented data exploration. [23]
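Indexing machine-generated data for search can be illustrated with a tiny inverted index. This is a generic sketch and not Splunk's implementation or query language. The log lines, the tokenization rules, and the function names are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Hypothetical machine-generated events, mixing key=value fields and free text.
events = [
    "2020-07-15T10:00:00 device=thermostat temp=21.5 status=ok",
    "2020-07-15T10:00:05 door lock jammed, retrying",
    "2020-07-15T10:00:09 device=washer status=error code=E42",
]


def tokenize(line):
    """Split a line into lowercase terms; key=value pairs are also kept whole."""
    lowered = line.lower()
    terms = set(re.findall(r"[a-z0-9_.]+", lowered))
    terms |= set(re.findall(r"[a-z0-9_]+=[a-z0-9_.]+", lowered))
    return terms


def build_index(lines):
    """Map every term to the set of event ids that contain it."""
    index = defaultdict(set)
    for event_id, line in enumerate(lines):
        for term in tokenize(line):
            index[term].add(event_id)
    return index


def search(index, *terms):
    """Return the ids of events that contain every term (logical AND)."""
    postings = [index.get(t.lower(), set()) for t in terms]
    return sorted(set.intersection(*postings)) if postings else []


index = build_index(events)
print(search(index, "status=error"))
print(search(index, "device", "status"))
print(search(index, "jammed"))
```

Because structured fields and free text share one index, the same `search` call covers both kinds of event.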
3.7. Jaspersoft
It is an open-source tool that is used for real-time data analysis. It visualizes data from various
storage platforms such as MongoDB, Cassandra, and Redis. It can create powerful HTML reports.
3.9. 1010data
It uses a columnar database. It deals with semi-structured data. It supports very
large-scale infrastructure. It is not considered adequate for extracting, transforming,
and loading (ETL) data. It also provides advanced analytical services, including statistical
analysis and optimization. [14]
3.11. SAP-Hana
It uses in-memory processing to handle transactions for big IoT data analytics. It offers
solutions for large volumes of unstructured IoT data. SAP-Hana contains libraries for spatial
processing and text analysis, and supports the R language. [16]
3.12. HP-HAVEn
HP introduced Hadoop Autonomy Vertica Enterprise (HAVEn). A large number of HP systems
use this platform for Big IoT data analytics. It targets massive data, which is analyzed in a
columnar database. It provides parallel processing. [17]
3.13. Hortonworks
Hortonworks is a Hadoop-based platform. It is used for Big IoT data analytics. It is open-source
software and an improved version of Hive. It cannot minimize the number of node groups.
[18]
3.15. Infobright
It is suitable for the analysis of machine-generated data such as IoT data. It can analyze up to 50
TB at a time. It works with large-scale data systems such as Hadoop. It is a columnar
tool with data skipping and automatic indexing. [20]
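Data skipping in a columnar store can be sketched with per-block minimum and maximum metadata. The example below is a simplified illustration of the general idea and not Infobright's actual engine. The block size, the column values, and the function names are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    values: List[float]  # one contiguous slice of a single column
    lo: float            # smallest value in the block
    hi: float            # largest value in the block


def build_blocks(column, block_size=4):
    """Split a column into fixed-size blocks and record min/max per block."""
    blocks = []
    for i in range(0, len(column), block_size):
        chunk = column[i:i + block_size]
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks


def count_greater_than(blocks, threshold):
    """Count rows with value > threshold, reading block data only when needed."""
    matched = scanned = 0
    for b in blocks:
        if b.hi <= threshold:
            continue                      # no row can match: skip the block
        if b.lo > threshold:
            matched += len(b.values)      # every row matches: answer from metadata
            continue
        scanned += 1                      # mixed block: the values must be read
        matched += sum(v > threshold for v in b.values)
    return matched, scanned


temperatures = [18, 19, 20, 21, 22, 23, 24, 25, 30, 31, 32, 33]
blocks = build_blocks(temperatures)
matched, scanned = count_greater_than(blocks, 23)
print(matched, scanned)
```

In this run only one of the three blocks has to be read. Sensor data that arrives roughly in order tends to benefit, since neighbouring values fall into narrow ranges.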
4. CONCLUSION
IoT has now become a significant source of Big Data, which is useless if not analyzed properly.
This paper focuses on Big Data in the context of the Internet of Things. It describes the basic
concepts of IoT and its architecture. It extends Gartner’s 3 V’s model of Big Data into a
10 V’s model. This paper improves the reader's understanding of the relation between IoT,
Big Data, and analytics. It familiarizes the reader with different Big Data
analytics platforms that can handle various IoT datasets. After reading this paper, a reader
will be aware of different platforms and will be able to select one for their particular problem.
REFERENCES
[1] M. Beyer, “Gartner says solving ‘Big Data’ challenge involves more than just managing
volumes of data,” Tech. Rep., AaltoDoc, Aalto Univ., 2011.
[2] R. Mital, J. Coughlin, and M. Canaday, “Using big data technologies and analytics to predict
sensor anomalies,” in Proc. Adv. Maui Opt. Space Surveill. Technol. Conf., Sep. 2014, p. 84.
[3] N. Golchha, “Big data-the information revolution,” Int. J. Adv. Res., vol. 1, no. 12, pp. 791–794,
2015.
[4] Y. Wang, L. Kung, W. Y. C. Wang, and C. G. Cegielski, “An integrated big data analytics-
enabled transformation model: Application to health care,” Inf. Manage., vol. 55, no. 1, pp. 64–
79, Jan. 2018.
[5] R. Khan, S. Khan, R. Zaheer & S. Khan, “Future Internet: The internet of things architecture,
possible applications, and key challenges,” In Proceedings of international conference on
frontiers of information technology, pp. 275-260, 2012.
[6] A. Ilapakurti, J. S. Vuppalapati, S. Kedari, S. Kedari, C. Chauhan, and C. Vuppalapati,
“iDispenser - Big Data Enabled Intelligent Dispenser,” in 2017 IEEE Third International
Conference on Big Data Computing Service and Applications (BigDataService), pp. 124–130,
2017.
[7] Y. Wang, L. Kung, and T. A. Byrd, “Big data analytics: Understanding its capabilities and
potential benefits for healthcare organizations,” Technol. Forecast. Soc. Change, vol. 126, pp.
3–13, Jan. 2018.
[8] M. Marjani, F. Nasaruddin, A. Gani, A. Karim, I.A.T. Hashem, A. Siddiqa, I. Yaqoob “Big IoT
Data Analytics: Architecture, Opportunities, and Open Research Challenges,” IEEE Access, vol.
5, pp. 5247–5261, 2017.
[9] E. Ahmed, Ibrar Yaqoob, Ibrahim Abaker Targio Hashem, Imran Khan, Abdelmuttlib Ibrahim
Abdalla Ahmed, Muhammad Imran, Athanasios V. Vasilakos, “The role of big data analytics
in Internet of Things,” Computer Networks, vol. 129, pp. 459–471, Dec. 2017.
[10] T. O. Center: Introducción a Hadoop y su ecosistema.
http://www.ticout.com/blog/2013/04/02/introduccion-a-Hadoop-y-su-ecosistema/
[11] Acharjya, D.P., Ahmed, K., “A survey on Big Data analytics: challenges, open research issues,
and tools,” in Int. J. Adv. Comput. Sci. Appl., vol. 7, issue 2, pp. 511–518, 2016.
[12] F. Constante Nicolalde, F. Silva, B. Herrera, and A. Pereira, “Big Data Analytics in IoT:
Challenges, Open Research Issues and Tools,” in Trends and Advances in Information Systems
and Technologies, Cham, 2018, pp. 775–788.
[13] A. S. Foundation: Spark 0.8.0: This document gives a short overview of how Spark runs on
clusters, to make it easier to understand the components involved, 2014,
https://spark.apache.org/docs/0.8.0/cluster-overview.html
[14] V. Morabito, “Managing change for big data driven innovation,” in Big Data and Analytics.
Springer, 2015, pp. 125–153.
[15] A. Bhardwaj, S. Bhattacherjee, A. Chavan, A. Deshpande, A. J. Elmore, S. Madden, and A. G.
Parameswaran, “Datahub: Collaborative data science & dataset version management at scale,”
arXiv preprint arXiv:1409.0798, 2014.
[16] F. Farber, S. K. Cha, J. Primsch, C. Bornhövd, S. Sigg, and W. Lehner, “Sap hana database:
data management for modern business applications,” ACM Sigmod Record, vol. 40, no. 4, pp.
45–51, 2012.
[17] S. Burke, “Hp haven big data platform is gaining partner momentum,” CRN [online]
http://www.crn.com/news/applications-os/240161649, 2013.
[18] (2019, Accessed on 3rd December) Hortonworks. [Online]. Available:
https://hortonworks.com/
[19] Y. Zhuang, Y. Wang, J. Shao, L. Chen, W. Lu, J. Sun, B. Wei, and J. Wu, “D-ocean: an
unstructured data management system for data ocean environment,” Frontiers of Computer
Science, vol. 10, no. 2, pp. 353–369, 2016. [Online]. Available:
http://dx.doi.org/10.1007/s11704-015-5045-6
[20] D. Slezak, P. Synak, J. Wróblewski, and G. Toppin, “Infobright analytic database engine using
rough sets and granular computing,” in Granular Computing (GrC), 2010 IEEE International
Conference on. IEEE, 2010, pp. 432–437.
[21] Isard, M., Budiu, M., Yu, Y., Birrell, A., Fetterly, D., “Dryad: distributed data-parallel programs
from sequential building blocks,” in ACM SIGOPS Oper. Syst. Rev. 41, pp. 59–72, 2007.
[22] Kelly, J.: Apache Drill Brings SQL-Like, Ad Hoc Query Capabilities to Big Data (2013).
http://wikibon.org/wiki/v/Apache_Drill_Brings_SQL-Like,_Ad_Hoc_Query_Capabilities_to_Big_Data
[23] C. L. P. Chen and C. Y. Zhang, “Data-intensive applications, challenges, techniques, and
technologies: a survey on Big Data,” in Inf. Sci. 275, pp. 314–347, 2014.
[24] G. Ingersoll, “Introducing apache mahout: Scalable, commercial-friendly machine learning for
building intelligent applications,” White Paper, IBM Developer Works, pp. 1–8, 2009.
[25] A. Verma, “Internet of Things and Big Data - Better Together,” Whizlabs Blog, 01-Aug-2018.
[Online]. Available: https://www.whizlabs.com/blog/iot-and-big-data/. [Accessed: 11-Mar-2020].
[26] “Integrating IoT with Big Data, a Revolutionary Step,” Experfy Insights. [Online]. Available:
https://www.experfy.com/blog/integrating-iot-with-big-data-a-revolutionary-step.
[Accessed: 11-Mar-2020].
[27] C.-W. Tsai, C.-F. Lai and A. V. Vasilakos, “Future Internet of Things: open issues and
challenges,” Wireless Netw, vol. 20, no. 8, pp. 2201–2217, Nov. 2014,
DOI: 10.1007/s11276-014-0731-0.
[28] M. Chen, S. Mao, and Y. Liu, “Big Data: A Survey,” Mobile Netw Appl, vol. 19, no. 2, pp. 171–
209, Apr. 2014, DOI: 10.1007/s11036-013-0489-0.
[29] G. Manogaran, D. Lopez, C. Thota, K. M. Abbas, S. Pyne, and R. Sundarasekar, “Big Data
Analytics in Healthcare Internet of Things,” in Innovative Healthcare Systems for the 21st
Century, H. Qudrat-Ullah and P. Tsasis, Eds. Cham: Springer International Publishing, 2017,
pp. 263–284.
[30] F. Alshohoumi, M. Sarrab, A. AlHamadani, and D. Al-Abri, “Systematic Review of Existing
IoT Architectures Security and Privacy Issues and Concerns,” International Journal of
Advanced Computer Science and Applications (IJACSA), vol. 10, no. 7, 57/31 2019, DOI:
10.14569/IJACSA.2019.0100733.
[31] “LNCS Titles published in 2015,” springer.com.
http://www.springer.com/computer/lncs?SGWID=4-164-66-653429-0 (accessed May 03, 2020).
[32] M. Mittal, V. E. Balas, L. M. Goyal, and R. Kumar, Eds., Big Data Processing Using Spark in
Cloud. Springer Singapore, 2019.
[33] S. Tanwar, S. Tyagi, and N. Kumar, Eds., Multimedia Big Data Computing for IoT Applications:
Concepts, Paradigms, and Solutions. Springer Singapore, 2020.
[34] A. Dhankhar, K. Solanki, A. Rathee and Ashish, “Predicting Student’s Performance by using
Classification Methods,” International Journal of advanced trends in computer science and
engineering, Volume 8 No. 4, 2019.
[35] A. Dhankhar and K. Solanki, State of the Art of Learning Analytics in Higher Education,
International journal of emerging trends in engineering research, Vol. 8 No. 3, pp. 868-877,
2020.
[36] M. Hooda and C. Rana, Learning Analytics Lens: Improving Quality of Higher Education,
International journal of emerging trends in engineering research, Vol. 8 No. 5, pp. 1626-1646,
2020.
[37] A. Dhankhar and K. Solanki, A Comprehensive Review of Tools & Techniques for Big Data
Analytics, International journal of emerging trends in engineering research, Vol. 7 No. 11, pp.
556-562, 2019.
[38] O. Dahiya and K. Solanki, S. Dalal, A. Dhankhar, Regression Testing: Analysis of its
Techniques for Test Effectiveness, International Journal of advanced trends in computer science
and engineering, Vol. 9, No. 1, pp. 737-744, 2020.
[39] O. Dahiya and K. Solanki, Comprehensive cognizance of Regression Test Case Prioritization
Techniques, International journal of emerging trends in engineering research, Vol. 7 No. 11, pp.
638-646, 2019.
[40] O. Dahiya and K. Solanki, S. Dalal, A. Dhankhar, An Exploratory Retrospective Assessment on
the Usage of Bio-Inspired Computing Algorithms for Optimization, International journal of
emerging trends in engineering research, Vol. 8 No. 2, pp. 414-434, 2020.
[41] O. Dahiya and K. Solanki, and A. Dhankhar, Risk-Based Testing: Identifying, Assessing,
Mitigating & Managing Risks Efficiently In Software Testing, International Journal of advanced
research in engineering and technology (IJARET), Vol. 11, Issue 3, pp. 192-203, 2020.
[42] O. Dahiya, and K. Solanki, A systematic literature study of regression test case prioritization
approaches, International Journal of Engineering & Technology, 7(4), pp.2184-2191, 2018.
[43] O. Dahiya, K. Solanki and S. dalal, Comparative Analysis of Regression Test Case Prioritization
Techniques, International Journal of advanced trends in computer science and engineering, Vol.
8 No. 4, pp. 1521-1531, 2019.
[44] K. Solanki, Y. Singh, and S. Dalal, “Experimental analysis of m-ACO technique for regression
testing,” Indian Journal of Science and Technology, 9(30), pp.1-7.
[45] K. Solanki, and S. Kumari, “Comparative study of software clone detection techniques.”
In 2016 Management and Innovation Technology International Conference (MITicon), pp.
MIT-152, IEEE, 2016.
[46] Shivani Yadav and Bal Kishan, “Reliability of Component-Based Systems – A Review”,
International Journal of Advanced Trends in Computer Science and Engineering, vol. 8, no. 2,
pp. 293-299, 2019. doi: doi.org/10.30534/ijatcse/2019/31822019
[47] Shivani Yadav and Bal Kishan, “Assessment of software quality models to measure the
effectiveness of software quality parameters for Component Based Software (CBS)”, Journal of
Applied Science and Computations, vol. 6, no. 4, pp. 2751-2756, 2019.
[48] S. Yadav and B. Kishan, “Analysis and Assessment of Existing Software Quality Models to
Predict the Reliability of Component-Based Software”, International journal of emerging trends
in engineering research, vol. 8, no. 6, 2020. [In Press]
[49] P. Gulia and Palak, “Nature-inspired soft computing-based software testing techniques for
reusable software components” Journal of Theoretical & Applied Information Technology,
95(24), 2017.
[50] P. Gulia, and Palak, “Hybrid swarm and GA based approach for software test case selection.”
International Journal of Electrical & Computer Engineering, pp. 2088-8708, Issue-9, 2019.
[51] R. Ratra, and P. Gulia, “Big Data Tools and Techniques: A Roadmap for Predictive Analytics.”,
International Journal of Engineering and Advanced Technology (IJEAT), Vol. 9, Issue-2, pp.
4986-4992, 2019.
[52] K. Vikram, Ch.Aparna, Harshitha.B and Ishpreet Kaur, A Secure and Certifiable Access
Mechanism System Designed For Big Data Storage In Clouds. International Journal of
Computer Engineering & Technology, 9(2), 2018, pp. 86–90.
[53] Azhagammal Alagarsamy and Dr. K. Ruba Soundar, A Survey Paper on Deep Belief Network
for Big Data. International Journal of Computer Engineering and Technology, 9(5), 2018, pp.
161-166.
[54] Dr. Nirmal Kumar Gupta, Addressing Big Data Security Issues and Challenges. International
Journal of Computer Engineering & Technology, 9(4), 2018, pp. 229-237.
[55] Kodimalar Palanivel and Chellammal Surianarayanan, An Approach for Prediction of Crop
Yield Using Machine Learning and Big Data Techniques, International Journal of Computer
Engineering and Technology 10(3), 2019, pp. 110-118.